OpenAI Swarm: Lightweight Multi-Agent Orchestration That Actually Works

OpenAI Swarm is a lightweight multi-agent framework built on two primitives: routines and handoffs. Agents are instructions plus functions. Handoffs transfer control by returning an Agent object. Stateless, no dependencies beyond the OpenAI SDK. Here's how it works, when to use it, and how it compares to LangGraph and CrewAI.

April 5, 2026 · 2 min read

What Is OpenAI Swarm

Most multi-agent frameworks solve the orchestration problem by adding layers: graph definitions, role schemas, memory backends, process managers. Swarm solves it by subtracting. An agent is a system prompt plus a list of functions. A handoff is a function that returns a different agent. That is the entire API surface.

OpenAI released Swarm in October 2024 as an open-source Python package. It is built directly on the Chat Completions API, adding no dependencies beyond the OpenAI SDK. The framework is stateless: each call to run() starts from scratch, processes messages through one or more agents, and returns the result. No session state survives between calls.

Swarm is explicitly educational. OpenAI describes it as a reference implementation for the patterns described in their Orchestrating Agents cookbook. It is not a managed service, not a production runtime, and not actively maintained. In March 2025, OpenAI released the Agents SDK as the production successor, keeping Swarm's primitives but adding guardrails, tracing, and TypeScript support.

The framework still works. The patterns it introduced are now the default way most developers think about agent coordination.

17K+ GitHub stars · 2 core primitives (Agent + Handoff) · 0 external dependencies beyond the OpenAI SDK

Core Concepts: Routines and Handoffs

Swarm is built on two ideas from the OpenAI cookbook: routines and handoffs.

Routines

A routine is a natural language instruction paired with the tools needed to execute it. In Swarm, this maps directly to the Agent class: instructions (a system prompt) plus functions (a list of Python callables). The language model reads the instructions, decides which functions to call, and follows the routine step by step.

The insight is that LLMs handle conditional logic well enough that you don't need explicit branching. Instead of coding "if the customer wants a refund, go to state X," you write "if the customer wants a refund, call the execute_refund function." The model interprets the instruction and picks the right tool. Routines replace state machines with natural language.

Handoffs

A handoff transfers control from one agent to another. In code, it is a function that returns an Agent object instead of a string. When Swarm's execution loop sees a function return an Agent, it switches: the new agent's instructions replace the old one's in the system prompt, and the new agent's functions become available. The full conversation history carries over.

This is the equivalent of a phone transfer. You call customer service, the triage agent listens to your problem, then connects you to the refund department. The refund agent has the full transcript of your conversation but a different set of capabilities.

Routines = Instructions + Tools

A system prompt defines what the agent does. Python functions define what it can do. The model reads the instructions and picks tools. No state machine, no graph definition, no role schema.

Handoffs = Functions That Return Agents

A transfer_to_refund_agent() function returns the refund Agent object. Swarm detects the return type, swaps the active agent, and continues the conversation. One line of code per handoff.

How It Works

The Swarm client exposes one function: run(). It takes an initial Agent, a list of messages, and optional context variables. Internally, it runs a loop:

  1. Send the current agent's instructions and available functions to the Chat Completions API
  2. If the model returns tool calls, execute each function in order
  3. If any function returns an Agent object, switch the active agent
  4. If any function updates context variables, merge the updates
  5. Repeat until the model returns a message with no tool calls
  6. Return a Response containing the messages, final agent, and context variables
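Boiled down, the loop can be sketched in a few lines. This is a simplified illustration with stand-in types, not Swarm's actual source: the real framework speaks the Chat Completions tool-call format and wraps context updates in a Result type, while this sketch uses a stubbed `model` callable and merges raw dicts.

```python
from dataclasses import dataclass, field

# Minimal stand-ins for Swarm's types (illustrative, not the real classes).
@dataclass
class Agent:
    name: str
    instructions: str
    functions: list = field(default_factory=list)

def run_loop(agent, messages, context_variables, model):
    """Simplified sketch of Swarm's execution loop. `model` stands in for
    the Chat Completions call and returns either
    {"tool_calls": [(name, args), ...]} or {"content": "..."}."""
    active_agent = agent
    while True:
        reply = model(active_agent, messages, context_variables)
        tool_calls = reply.get("tool_calls")
        if not tool_calls:
            # No tool calls: append the final message and stop.
            messages.append({"role": "assistant", "content": reply["content"]})
            return messages, active_agent, context_variables
        for name, args in tool_calls:
            fn = next(f for f in active_agent.functions if f.__name__ == name)
            result = fn(**args)
            if isinstance(result, Agent):       # handoff: switch the active agent
                active_agent = result
                result = f"Transferred to {result.name}"
            elif isinstance(result, dict):      # context update: merge the dict
                context_variables.update(result)
                result = str(result)
            messages.append({"role": "tool", "content": str(result)})
```

The `isinstance(result, Agent)` check is the entire handoff mechanism: a function that returns an Agent swaps who is in control, and everything else is appended to the transcript as a tool result.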

The entire execution model is analogous to chat.completions.create() with a loop around it. Swarm converts Python function signatures into JSON Schema automatically: type hints become parameter types, docstrings become descriptions, parameters without defaults become required fields.
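The signature-to-schema conversion can be approximated with the standard library. This is a hedged sketch of the idea, not Swarm's actual converter (which handles more types and edge cases):

```python
import inspect

def function_to_schema(fn):
    """Sketch of signature-to-JSON-Schema conversion: type hints become
    parameter types, the docstring becomes the description, and
    parameters without defaults become required fields."""
    type_map = {str: "string", int: "integer", float: "number", bool: "boolean"}
    properties, required = {}, []
    for name, param in inspect.signature(fn).parameters.items():
        properties[name] = {"type": type_map.get(param.annotation, "string")}
        if param.default is inspect.Parameter.empty:
            required.append(name)
    return {
        "name": fn.__name__,
        "description": (fn.__doc__ or "").strip(),
        "parameters": {"type": "object", "properties": properties,
                       "required": required},
    }

def get_weather(location: str, units: str = "fahrenheit") -> str:
    """Get the current weather for a location."""
    ...

schema = function_to_schema(get_weather)
# location has no default, so it is required; units is optional
```

This is why plain Python functions with type hints and docstrings are all Swarm needs: the schema the model sees is derived mechanically from the signature.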

Context Variables

Context variables are a dictionary passed into run() that any function can read or update. If an agent's instructions are a callable (rather than a static string), the function receives the context variables and can generate dynamic instructions. This is how agents share state within a single run() call: the triage agent writes the customer's account ID to context, and the refund agent reads it.

Context variables do not persist between calls. When run() returns, the updated context comes back in the response, and you decide whether to pass it into the next call.

Stateless by Design

Swarm stores nothing between calls. No sessions, no database, no cache. Each run() invocation is independent. This makes the framework easy to reason about and test, but means you build your own persistence layer for anything beyond single-turn interactions.
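That persistence layer can be as small as a file per session. The sketch below is a hypothetical helper, not part of Swarm: it saves the messages and context variables that `run()` returns and loads them before the next call.

```python
import json
from pathlib import Path

class SessionStore:
    """Minimal persistence layer around Swarm's stateless run():
    save messages and context after each call, load them before the next.
    (Hypothetical helper; not part of the Swarm package.)"""

    def __init__(self, directory="sessions"):
        self.directory = Path(directory)
        self.directory.mkdir(exist_ok=True)

    def load(self, session_id):
        path = self.directory / f"{session_id}.json"
        if path.exists():
            return json.loads(path.read_text())
        return {"messages": [], "context_variables": {}}

    def save(self, session_id, messages, context_variables):
        path = self.directory / f"{session_id}.json"
        path.write_text(json.dumps(
            {"messages": messages, "context_variables": context_variables}))

# Usage pattern around client.run():
#   state = store.load(session_id)
#   response = client.run(agent=triage_agent,
#                         messages=state["messages"] + new_messages,
#                         context_variables=state["context_variables"])
#   store.save(session_id, response.messages, response.context_variables)
```

Because each `run()` call is independent, swapping this for Redis, Postgres, or anything else changes nothing about the agents themselves.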

Code Examples

Defining an Agent

Basic Agent with a function

from swarm import Swarm, Agent

client = Swarm()

def get_weather(location: str) -> str:
    """Get the current weather for a location."""
    return f"The weather in {location} is sunny, 72°F."

weather_agent = Agent(
    name="Weather Agent",
    instructions="You help users check the weather.",
    functions=[get_weather],
)

Agent Handoff

Triage agent that hands off to specialists

# execute_refund, check_order_status, search_products, and place_order
# are ordinary Python functions assumed to be defined elsewhere.
refund_agent = Agent(
    name="Refund Agent",
    instructions="You handle refund requests. Ask for the order ID, then process the refund.",
    functions=[execute_refund, check_order_status],
)

sales_agent = Agent(
    name="Sales Agent",
    instructions="You handle purchase inquiries. Help the customer find the right product.",
    functions=[search_products, place_order],
)

def transfer_to_refunds():
    """Transfer the customer to the refund department."""
    return refund_agent

def transfer_to_sales():
    """Transfer the customer to the sales department."""
    return sales_agent

triage_agent = Agent(
    name="Triage Agent",
    instructions="""You are a customer service triage agent.
Determine what the customer needs and transfer them:
- Refund or order issues -> transfer to refunds
- Purchase or product questions -> transfer to sales""",
    functions=[transfer_to_refunds, transfer_to_sales],
)

Running the Agent

Execute and read the response

response = client.run(
    agent=triage_agent,
    messages=[{"role": "user", "content": "I need a refund for order #1234"}],
)

# response.agent is now refund_agent (after handoff)
# response.messages contains the full conversation
print(response.messages[-1]["content"])

Context Variables

Sharing state between agents via context

def instructions(context_variables):
    name = context_variables.get("customer_name", "there")
    return f"You are a support agent. The customer's name is {name}. Be helpful."

def log_issue(context_variables, issue: str):
    """Log a customer issue and return confirmation."""
    customer = context_variables.get("customer_name", "Unknown")
    # In production, write to your database here
    return f"Issue logged for {customer}: {issue}"

support_agent = Agent(
    name="Support Agent",
    instructions=instructions,  # callable, receives context
    functions=[log_issue],
)

response = client.run(
    agent=support_agent,
    messages=[{"role": "user", "content": "My dashboard is not loading"}],
    context_variables={"customer_name": "Alex", "account_tier": "pro"},
)

Swarm vs LangGraph vs CrewAI

Swarm, LangGraph, and CrewAI solve the same problem (coordinating multiple agents) with fundamentally different abstractions. The right choice depends on where you are in the build cycle and how much control you need.

| Feature | OpenAI Swarm | LangGraph | CrewAI |
| --- | --- | --- | --- |
| Core abstraction | Agents + handoff functions | Directed graphs of nodes and edges | Crews with roles and tasks |
| State management | Stateless (context vars per call) | Built-in checkpointing | Built-in memory system |
| Learning curve | Minutes | Hours to days | 30-60 minutes |
| Production readiness | Experimental / educational | Production (LinkedIn, Uber, 400+ companies) | Production (150+ enterprise, 60% of F500) |
| Parallel execution | Sequential only | Yes (parallel graph branches) | Yes (parallel task processes) |
| Guardrails | None | Custom via graph logic | Input/output validation |
| Observability | Debug flag only | LangSmith integration | Built-in logging |
| Model support | OpenAI API format only | Any (via LangChain) | Any (via LiteLLM) |
| Lines to define an agent | 3-5 | 15-30+ | 5-10 |
| Best for | Prototyping, learning, simple routing | Complex stateful workflows | Role-based team collaboration |

Swarm vs LangGraph

LangGraph gives you a graph where nodes are agent actions and edges define control flow. You get checkpointing, parallel branches, and visualization of your workflow. The tradeoff is complexity: defining a LangGraph workflow requires understanding graph construction, node functions, conditional edges, and state schemas. Swarm gives you a while loop and function calls. If your workflow is "triage, then specialist," Swarm gets you there in 20 lines. If your workflow has parallel branches that rejoin, retries with different strategies, and conditional loops, you need LangGraph.

Swarm vs CrewAI

CrewAI is more opinionated than Swarm but less complex than LangGraph. You define agents with roles, goals, and backstories, then assign them tasks with a process flow (sequential or hierarchical). CrewAI has built-in memory, structured outputs, and reports 40% faster time-to-production than LangGraph for standard business workflows. Swarm is smaller and simpler but gives you less: no memory, no structured output format, no process management. If you want a team of agents that collaborate on a defined task, CrewAI is faster to set up. If you want to understand the mechanics of agent coordination and build your own abstractions, Swarm is the better starting point.

When Swarm Is the Right Choice

Prototyping Multi-Agent Systems

You want to test whether a triage-to-specialist pattern works for your use case. Swarm lets you define the agents, wire up the handoffs, and run conversations in under an hour. No graph definitions, no role schemas, no configuration files.

Learning Agent Coordination

Swarm's source code is small enough to read in one sitting. The execution loop is a single while loop. There's no magic: you can trace exactly how the model's tool calls become function executions become agent switches. If you want to understand multi-agent systems, start here.

Simple Routing Workflows

Customer service triage, intent classification, or any workflow where one agent decides which specialist handles the request. If the routing logic is "listen, classify, transfer," Swarm handles it with less overhead than any alternative.

Embedding in Larger Systems

Because Swarm is stateless and has no external dependencies, you can embed it inside a larger application without worrying about database connections, session stores, or daemon processes. Call run(), get messages, done.

Limitations

Swarm is deliberately minimal. That means several capabilities you might expect from a multi-agent framework are not included.

No Persistent Memory

Context variables exist only within a single run() call. For multi-turn conversations, you pass messages and context back in on each call. There is no built-in conversation store, no long-term memory, no RAG integration.

No Guardrails or Validation

Swarm does not validate inputs, check outputs, or enforce safety constraints. If an agent hallucinates a function call with wrong parameters, the function receives those wrong parameters. You build your own validation.

No Parallel Execution

Agents execute sequentially. If three agents could work on independent subtasks simultaneously, Swarm still processes them one at a time. LangGraph's graph branches and CrewAI's parallel processes handle this. Swarm does not.

OpenAI API Only

Swarm is built on the OpenAI client library. It expects the Chat Completions API format. You can use OpenAI-compatible endpoints (via LiteLLM or similar), but there is no native support for Anthropic, Google, or other API formats.

Not Actively Maintained

OpenAI has not updated the Swarm repository since releasing the Agents SDK in March 2025. Bug reports and PRs are not being triaged. If you need a supported framework with the same patterns, the Agents SDK is the intended upgrade path.

From Swarm to the OpenAI Agents SDK

The Agents SDK, released in March 2025, is the production evolution of Swarm. The conceptual model is identical: agents with instructions and tools, handoffs via function returns. The Agents SDK adds what Swarm deliberately left out:

| Capability | Swarm | Agents SDK |
| --- | --- | --- |
| Status | Experimental, unmaintained | Production, actively maintained |
| Guardrails | None | Input/output validation, content filtering |
| Tracing | Debug flag (stdout) | Full execution traces, visualization |
| Language support | Python only | Python and TypeScript |
| Handoff model | Function returns Agent | Managed handoffs with context transfer |
| Tool integration | Manual function definitions | Built-in tool types (code interpreter, file search) |

If you are starting a new project, use the Agents SDK. If you are learning how multi-agent systems work, Swarm's simplicity makes it a better teaching tool. The patterns transfer directly.

Multi-Agent Infrastructure: The Hard Part

Frameworks like Swarm, the Agents SDK, and LangGraph solve the coordination problem: which agent runs when, how state flows between them, when to hand off. For most multi-agent use cases, coordination is the easy part. The infrastructure each agent needs to actually do its job is harder.

Consider a coding agent system built with Swarm's patterns. A triage agent classifies the task. A planning agent breaks it into subtasks. An implementation agent writes code. A review agent checks the output. Each of these agents needs:

  • An isolated sandbox to execute code without affecting other agents or the host system
  • Fast code search to find relevant files across the codebase
  • A way to apply edits to files reliably, not just generate diffs but validate and apply them
  • Context management to stay within token limits as codebases scale

Swarm gives you the orchestration layer. Morph gives you the execution layer: sandboxed code execution, codebase search, and fast apply for each agent in your system. When your triage agent hands off to an implementation agent, that agent needs a sandbox and search index ready in milliseconds, not minutes.

Anthropic's own research reported a roughly 90% improvement from a multi-agent architecture on complex research tasks. Cognition measured that coding agents spend 60% of their time on search. The orchestration pattern matters. But the per-agent infrastructure (fast search, isolated execution, reliable file editing) determines whether the agents can actually deliver.

Frequently Asked Questions

What is OpenAI Swarm?

An experimental, open-source Python framework for multi-agent orchestration. It uses two primitives: Agents (instructions plus functions) and handoffs (functions that return another Agent). Stateless, minimal, built on the Chat Completions API. Released October 2024, now superseded by the Agents SDK.

Is OpenAI Swarm production-ready?

No. OpenAI labels it as educational and experimental. It has no session management, no guardrails, no observability, and is not actively maintained. Use the Agents SDK for production systems.

What is the difference between Swarm and the Agents SDK?

Same conceptual model, different maturity level. The Agents SDK adds guardrails, tracing, managed handoffs, built-in tool types, and TypeScript support. It is actively maintained and recommended for production.

How does Swarm compare to LangGraph?

LangGraph uses directed graphs with nodes and edges. It has built-in state management, checkpointing, parallel execution, and is production-grade (used by LinkedIn, Uber, 400+ companies). Swarm uses a simple while loop with function-based handoffs. Swarm is better for prototyping and learning. LangGraph is better for complex, stateful production workflows.

How does Swarm compare to CrewAI?

CrewAI organizes agents into crews with roles, goals, and backstories. It has built-in memory, structured outputs, and a defined process flow. CrewAI is more opinionated but reports 40% faster time-to-production for standard business workflows. Swarm is more minimal: two primitives, no opinions. CrewAI serves 150+ enterprise customers. Swarm is a reference implementation.

Can I use Swarm with non-OpenAI models?

Swarm uses the OpenAI API format. Any provider with an OpenAI-compatible endpoint works (local models via LiteLLM, vLLM, or Ollama). Native support for Anthropic, Google, or other API formats is not included.

Related

Give Your Swarm Agents Code Execution with Morph

Multi-agent systems need infrastructure per agent: sandboxed execution, codebase search, reliable file editing. Morph provides the execution layer that Swarm agents call into. Define the orchestration with Swarm or the Agents SDK. Let Morph handle what each agent actually does.