What Changed
Codex CLI gained the ability to spawn specialized sub-agents in parallel, wait for all results, and return a consolidated response. Before this, Codex was single-threaded: one model, one context window, one task at a time. Multi-agent breaks that constraint for problems that split naturally into parallel subtasks.
The implementation is config-driven. You define agent roles in config.toml with per-role models, instructions, sandbox modes, and even dedicated MCP servers. Codex decides when to spawn agents automatically, or you can tell it explicitly. A built-in CSV batch tool handles structured fan-out for audits and reviews.
The feature is experimental and requires an opt-in flag. It works in the CLI today; visibility in the Codex desktop app and IDE extensions is coming.
Enable Multi-Agent
Two ways to enable it. From the CLI, run /experimental and toggle "Multi-agents", then restart Codex. Or add the flag directly to your config file:
~/.codex/config.toml
[features]
multi_agent = true
Restart Required
After changing the config, restart Codex for the flag to take effect. The multi-agent UI surfaces in the CLI only. Activity in the desktop app and IDE extension is not yet visible.
How It Works
Codex handles orchestration: spawning sub-agents, routing instructions, waiting for results, and closing threads. When multiple agents are running, Codex waits until all requested results are available before returning a consolidated response.
You can let Codex decide when to spawn agents, or request it explicitly. A typical prompt:
Example: parallel PR review
Review this PR (branch vs main). Spawn one agent per point,
wait for all of them, and summarize each result.
1. Security issues
2. Code quality
3. Bugs
4. Race conditions
5. Test flakiness
6. Maintainability
Codex spawns six sub-agents, each focused on one review dimension. Each agent gets its own context window and can read the codebase independently. Results come back as a single consolidated response.
For long-running commands, Codex can use the built-in monitor role, which is tuned for waiting and repeated status checks. The wait tool supports polling windows up to one hour per call.
Use /agent in the CLI to switch between active agent threads and inspect ongoing work. You can also steer running agents by asking Codex directly to redirect, stop, or close them.
Built-in Agent Roles
Codex ships four roles out of the box. Each is tuned for a specific type of work:
default
General-purpose fallback. Used when no specific role is requested. Full read-write access, standard model settings.
worker
Execution-focused. Designed for implementation tasks and fixes. Gets write access and runs at the standard model configuration.
explorer
Read-heavy codebase exploration. Traces execution paths, searches for patterns, and gathers evidence without proposing changes.
monitor
Long-running command and task monitoring. Optimized for waiting, polling, and repeated status checks. Supports up to 1-hour polling windows.
If you define a custom role with the same name as a built-in (e.g., explorer), your definition takes precedence. Any configuration not set by the agent role is inherited from the parent session.
Custom Agent Roles
Custom roles live in the [agents] section of config.toml. Each role has a name, optional description for when Codex should use it, and an optional config_file that sets model, sandbox mode, instructions, and MCP servers.
Project config: .codex/config.toml
[agents]
max_threads = 6
max_depth = 1
[agents.explorer]
description = "Read-only codebase explorer for gathering evidence."
config_file = "agents/explorer.toml"
[agents.reviewer]
description = "PR reviewer: correctness, security, missing tests."
config_file = "agents/reviewer.toml"
[agents.docs_researcher]
description = "Documentation specialist using docs MCP server."
config_file = "agents/docs-researcher.toml"
Each config file overrides the parent session's defaults for that role:
agents/explorer.toml
model = "gpt-5.3-codex-spark"
model_reasoning_effort = "medium"
sandbox_mode = "read-only"
developer_instructions = """
Stay in exploration mode.
Trace the real execution path, cite files and symbols.
Avoid proposing fixes unless the parent agent asks.
Prefer fast search and targeted file reads over broad scans.
"""
agents/reviewer.toml
model = "gpt-5.3-codex"
model_reasoning_effort = "high"
sandbox_mode = "read-only"
developer_instructions = """
Review code like an owner.
Prioritize correctness, security, behavior regressions, missing tests.
Lead with concrete findings. Include reproduction steps.
Avoid style-only comments unless they hide a real bug.
"""
agents/docs-researcher.toml (with MCP server)
model = "gpt-5.3-codex-spark"
model_reasoning_effort = "medium"
sandbox_mode = "read-only"
developer_instructions = """
Use the docs MCP server to confirm APIs and version-specific behavior.
Return concise answers with links or exact references.
Do not make code changes.
"""
[mcp_servers.openaiDeveloperDocs]
url = "https://developers.openai.com/mcp"
Key Design Principle
The best role definitions are narrow and opinionated. Give each role one clear job, a tool surface that matches that job, and instructions that keep it from drifting into adjacent work.
CSV Batch Processing
spawn_agents_on_csv is the structured fan-out tool. It reads a CSV, spawns one worker per row, waits for all to finish, and exports combined results to a new CSV. Each worker gets a templated instruction with {column_name} placeholders filled from its row.
CSV batch example
Create /tmp/components.csv with columns path,owner
and one row per frontend component.
Then call spawn_agents_on_csv with:
- csv_path: /tmp/components.csv
- id_column: path
- instruction: "Review {path} owned by {owner}. Return JSON
with keys path, risk, summary, follow_up via
report_agent_job_result."
- output_csv_path: /tmp/components-review.csv
- output_schema: { path, risk, summary, follow_up }
The tool accepts max_concurrency and max_runtime_seconds for job control. Each worker must call report_agent_job_result exactly once. Workers that exit without reporting are marked as failed in the exported CSV.
Good use cases for CSV batching:
- Reviewing one file, package, or service per row
- Checking a list of incidents, PRs, or migration targets
- Generating structured summaries for many similar inputs
- Auditing security configurations across microservices
The exported CSV includes original row data plus metadata: job_id, item_id, status, last_error, and result_json. When run through codex exec, a single-line progress update shows on stderr while the batch runs.
Config Schema Reference
| Field | Type | Default | Purpose |
|---|---|---|---|
| agents.max_threads | number | — | Max concurrently open agent threads |
| agents.max_depth | number | 1 | Max nesting depth (root = 0) |
| agents.job_max_runtime_seconds | number | 1800 | Default per-worker timeout for CSV jobs |
| agents.<name>.description | string | — | Role guidance shown to Codex |
| agents.<name>.config_file | string (path) | — | TOML config layer for this role |
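Putting the schema together, a parent config that caps concurrency, nesting, and worker runtime might look like the sketch below. The values and the auditor role name are illustrative, not defaults:

```toml
# Illustrative parent-session limits for multi-agent runs.
[agents]
max_threads = 4                  # at most four agent threads open concurrently
max_depth = 1                    # children cannot spawn their own sub-agents
job_max_runtime_seconds = 900    # CSV batch workers time out after 15 minutes

# "auditor" is a hypothetical role name for this example.
[agents.auditor]
description = "Security configuration auditor for CSV batch runs."
config_file = "agents/auditor.toml"
```

Remember that config_file paths resolve relative to the config.toml that defines the role and are validated at load time.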
Validation Rules
Unknown fields in [agents.<name>] are rejected. The config_file path is validated at load time and must point to an existing file. Relative paths resolve from the config.toml that defines the role. If a role config file fails to load, agent spawns fail until you fix it.
Common settings to override per role: model, model_reasoning_effort, sandbox_mode, and developer_instructions. Any setting not specified in the role config inherits from the parent session.
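Because unspecified settings inherit from the parent session, a role file can be as small as a single override. A minimal sketch (the filename is hypothetical; the model name mirrors the examples above):

```toml
# agents/fast-explorer.toml — hypothetical minimal role config.
# Only the model is overridden; reasoning effort, sandbox mode,
# and instructions all inherit from the parent session.
model = "gpt-5.3-codex-spark"
```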
Example: PR Review Team
Three roles split review into focused concerns. The explorer maps affected code paths, the reviewer finds real risks, and the docs_researcher verifies framework APIs.
Prompt
Review this branch against main. Have explorer map the affected
code paths, reviewer find real risks, and docs_researcher verify
the framework APIs that the patch relies on.The explorer uses gpt-5.3-codex-spark at medium reasoning in read-only mode. The reviewer uses the full gpt-5.3-codex at high reasoning effort. The docs_researcher connects to an MCP server for API reference lookups. All three run in parallel, and Codex consolidates their findings into one response.
This pattern works because each role has one clear job with no overlap. The explorer does not propose fixes. The reviewer does not chase documentation. The docs_researcher does not edit code. Narrow scope reduces hallucination and context waste.
Example: Frontend Integration Debugging
A three-role setup for UI regressions and integration bugs:
explorer
Maps the code that owns the failing UI flow. Identifies entry points, state transitions, and likely files before the worker starts editing.
browser_debugger
Reproduces the issue in the browser, captures screenshots, console output, and network evidence. Uses a Chrome DevTools MCP server. Does not edit application code.
worker
Owns the fix once the issue is reproduced. Makes the smallest defensible change. Validates only the behavior it changed.
agents/browser-debugger.toml
model = "gpt-5.3-codex"
model_reasoning_effort = "high"
sandbox_mode = "workspace-write"
developer_instructions = """
Reproduce the issue in the browser.
Capture exact steps and report what the UI actually does.
Use browser tooling for screenshots, console output, network evidence.
Do not edit application code.
"""
[mcp_servers.chrome_devtools]
url = "http://localhost:3000/mcp"
startup_timeout_sec = 20
The sequence matters here: browser_debugger and explorer run in parallel to gather evidence, then worker takes over once the failure mode is clear. Codex handles this coordination automatically.
Approvals and Sandboxing
Sub-agents inherit the parent session's sandbox policy. In interactive CLI sessions, approval requests can surface from inactive agent threads while you're looking at the main thread. The approval overlay shows the source thread label, and you can press o to open that thread before deciding.
In non-interactive flows (like codex exec), actions that need fresh approval fail and the error surfaces to the parent workflow. Live runtime overrides (such as --yolo or /approvals changes) propagate to child agents at spawn time, even if the role's config file specifies different defaults.
You can also restrict specific roles. An explorer that should never write files gets sandbox_mode = "read-only" in its role config, regardless of what the parent session allows.
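That restriction is a single line in the role's config file:

```toml
# agents/explorer.toml — keep this role read-only
# regardless of the parent session's sandbox policy.
sandbox_mode = "read-only"
```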
Codex vs Claude Code Multi-Agent
Both Codex and Claude Code Agent Teams solve the same problem: parallelizing work across multiple AI agents. The implementations differ significantly.
| Dimension | Codex CLI | Claude Code Agent Teams |
|---|---|---|
| Configuration | config.toml with role files | Settings JSON flag + inline prompts |
| Per-agent model | Yes, per-role config_file | No, all teammates use same model |
| Inter-agent communication | Results return to parent only | Direct messaging + shared task list |
| Agent coordination | Parent orchestrates all work | Self-coordinating via task claims |
| CSV batch processing | spawn_agents_on_csv built-in | No equivalent |
| MCP servers per role | Yes, per-agent config | Shared from parent session |
| Nesting depth | Configurable (default: 1) | Configurable (default: 1) |
| Sandbox control | Per-role override | Inherited from parent |
| Worktree isolation | Per-agent thread (app) | Per-teammate branch |
| Best for | Structured audits, role-based reviews | Collaborative exploration, big features |
Codex's strength is structured configuration. You get per-role model selection, dedicated MCP servers per agent, and CSV batch processing for repetitive audits. Claude Code's strength is emergent collaboration. Teammates message each other, claim tasks autonomously, and coordinate without the lead acting as bottleneck.
Choose Codex multi-agent when you know the roles and structure in advance. Choose Claude Code agent teams when the problem requires agents to discover and divide work themselves.
Where Morph Fits
Multi-agent workflows amplify both the speed and cost of every tool the agents call. When six agents each do file rewrites through a frontier model, you pay frontier token prices six times. When each agent searches the codebase sequentially, the wall-clock time multiplies.
Two Morph services address these bottlenecks directly:
WarpGrep for Explorer Agents
WarpGrep runs parallel sub-agent searches across your codebase, returning precise results in under 6 seconds. Instead of each Codex explorer scanning files sequentially, connect WarpGrep as an MCP server and let it handle the search layer. 8 parallel tool calls per turn, 4 turns deep.
Fast Apply for Worker Agents
Every file edit through a worker agent costs 3,500-4,500 tokens at frontier prices if you do full-file rewrites. Fast Apply merges edits at 10,500 tok/s for $0.80/M input tokens. A 500-line file merge takes 0.8 seconds instead of 8-10 seconds through GPT-5.3.
Adding WarpGrep as an MCP server to a Codex explorer role:
agents/explorer-with-warpgrep.toml
model = "gpt-5.3-codex-spark"
sandbox_mode = "read-only"
developer_instructions = """
Use WarpGrep for all codebase searches.
Prefer semantic queries over exact string matching.
"""
[mcp_servers.warpgrep]
url = "https://mcp.morphllm.com/warpgrep"
Speed Up Multi-Agent Workflows
WarpGrep and Fast Apply reduce token costs and latency across every agent in your team. Try them on a real codebase.
Limitations
- Experimental only. Requires explicit opt-in via config flag. The API and config schema may change.
- CLI-only visibility. Multi-agent activity is not yet visible in the Codex desktop app or IDE extensions.
- No inter-agent messaging. Sub-agents can only return results to the parent. They cannot message each other directly (unlike Claude Code agent teams).
- Nesting defaults to depth 1. A child agent cannot spawn further sub-agents by default. Increasing max_depth increases cost and complexity.
- Non-interactive approval failures. In codex exec or batch jobs, actions needing fresh approval fail and the errors surface to the parent workflow.
- Token costs scale linearly. Each agent maintains its own context window. A 6-agent review costs roughly 6x a single-agent review in tokens.
- CSV batch requires structured output. Workers must call report_agent_job_result exactly once. Workers that forget this call get marked as failed.
FAQ
How do I enable Codex CLI multi-agent?
Add multi_agent = true under [features] in ~/.codex/config.toml and restart Codex. Or use /experimental in the CLI to toggle it interactively.
What are the four built-in agent roles?
default (general-purpose fallback), worker (implementation and fixes), explorer (read-only codebase exploration), and monitor (long-running command/task monitoring with up to 1-hour polling windows).
Can I use different models per agent?
Yes. Each role's config_file can set model and model_reasoning_effort. A common pattern: gpt-5.3-codex-spark for explorers (fast, cheap) and gpt-5.3-codex for reviewers (thorough).
What is spawn_agents_on_csv?
A tool for batch fan-out. It reads a CSV, spawns one worker per row with a templated instruction, waits for all workers, and exports results to a new CSV. Each worker must call report_agent_job_result exactly once. Failed workers are flagged in the output.
How does this compare to Claude Code agent teams?
Codex multi-agent is config-driven with per-role models, MCP servers, and CSV batching. Claude Code agent teams are collaboration-driven with shared task lists and direct inter-agent messaging. Codex is better for structured, role-based workflows. Claude Code is better for emergent, self-coordinating work. See the full comparison.
Do sub-agents inherit my sandbox settings?
Yes. Sub-agents inherit the parent's sandbox policy and live runtime overrides (including --yolo). You can also restrict roles individually, for example marking an explorer as sandbox_mode = "read-only".
Can I connect MCP servers to specific agent roles?
Yes. Add [mcp_servers.<name>] in a role's config file. The MCP server is available only to agents spawned with that role. Useful for giving explorers access to WarpGrep, or giving docs researchers access to documentation APIs.