Codex CLI Multi-Agent: Agent Roles, CSV Batching, and Config Guide (2026)

Codex CLI now ships multi-agent workflows: spawn specialized sub-agents in parallel, define custom agent roles in config.toml, and batch CSV tasks with spawn_agents_on_csv. A full setup guide with examples.

March 5, 2026

What Changed

Codex CLI gained the ability to spawn specialized sub-agents in parallel, wait for all results, and return a consolidated response. Before this, Codex was single-threaded: one model, one context window, one task at a time. Multi-agent breaks that constraint for problems that split naturally into parallel subtasks.

The implementation is config-driven. You define agent roles in config.toml with per-role models, instructions, sandbox modes, and even dedicated MCP servers. Codex decides when to spawn agents automatically, or you can tell it explicitly. A built-in CSV batch tool handles structured fan-out for audits and reviews.

At a glance: 4 built-in agent roles · per-role model selection · built-in CSV batch fan-out tool · wait/poll windows up to 1 hour.

The feature is experimental and requires an opt-in flag. It works in the CLI today; visibility in the Codex desktop app and IDE extensions is coming.

Enable Multi-Agent

Two ways to enable it. From the CLI, run /experimental and toggle "Multi-agents", then restart Codex. Or add the flag directly to your config file:

~/.codex/config.toml

[features]
multi_agent = true

Restart Required

After changing the config, restart Codex for the flag to take effect. The multi-agent UI currently surfaces only in the CLI; activity is not yet visible in the desktop app or IDE extensions.

How It Works

Codex handles orchestration: spawning sub-agents, routing instructions, waiting for results, and closing threads. When multiple agents are running, Codex waits until all requested results are available before returning a consolidated response.
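The wait-for-all behavior can be modeled with a toy sketch (illustrative Python, not Codex's actual implementation): fan out the subtasks, block until every result returns, then consolidate.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical review dimensions, standing in for spawned sub-agents.
subtasks = ["security", "code quality", "bugs"]

def run_agent(focus: str) -> str:
    # Stand-in for a sub-agent doing real work in its own context window.
    return f"{focus}: no findings"

with ThreadPoolExecutor() as pool:
    # map() blocks until every worker has finished, preserving order.
    results = list(pool.map(run_agent, subtasks))

# The parent returns one consolidated response.
consolidated = "\n".join(results)
```

The key property mirrored here is that the parent does not stream partial answers: nothing is returned until the slowest sub-agent has reported.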

You can let Codex decide when to spawn agents, or request it explicitly. A typical prompt:

Example: parallel PR review

Review this PR (branch vs main). Spawn one agent per point,
wait for all of them, and summarize each result.

1. Security issues
2. Code quality
3. Bugs
4. Race conditions
5. Test flakiness
6. Maintainability

Codex spawns six sub-agents, each focused on one review dimension. Each agent gets its own context window and can read the codebase independently. Results come back as a single consolidated response.

For long-running commands, Codex can use the built-in monitor role, which is tuned for waiting and repeated status checks. The wait tool supports polling windows up to one hour per call.

Use /agent in the CLI to switch between active agent threads and inspect ongoing work. You can also steer running agents by asking Codex directly to redirect, stop, or close them.

Built-in Agent Roles

Codex ships four roles out of the box. Each is tuned for a specific type of work:

default

General-purpose fallback. Used when no specific role is requested. Full read-write access, standard model settings.

worker

Execution-focused. Designed for implementation tasks and fixes. Gets write access and runs at the standard model configuration.

explorer

Read-heavy codebase exploration. Traces execution paths, searches for patterns, and gathers evidence without proposing changes.

monitor

Long-running command and task monitoring. Optimized for waiting, polling, and repeated status checks. Supports up to 1-hour polling windows.

If you define a custom role with the same name as a built-in (e.g., explorer), your definition takes precedence. Any configuration not set by the agent role is inherited from the parent session.

Custom Agent Roles

Custom roles live in the [agents] section of config.toml. Each role has a name, optional description for when Codex should use it, and an optional config_file that sets model, sandbox mode, instructions, and MCP servers.

Project config: .codex/config.toml

[agents]
max_threads = 6
max_depth = 1

[agents.explorer]
description = "Read-only codebase explorer for gathering evidence."
config_file = "agents/explorer.toml"

[agents.reviewer]
description = "PR reviewer: correctness, security, missing tests."
config_file = "agents/reviewer.toml"

[agents.docs_researcher]
description = "Documentation specialist using docs MCP server."
config_file = "agents/docs-researcher.toml"

Each config file overrides the parent session's defaults for that role:

agents/explorer.toml

model = "gpt-5.3-codex-spark"
model_reasoning_effort = "medium"
sandbox_mode = "read-only"
developer_instructions = """
Stay in exploration mode.
Trace the real execution path, cite files and symbols.
Avoid proposing fixes unless the parent agent asks.
Prefer fast search and targeted file reads over broad scans.
"""

agents/reviewer.toml

model = "gpt-5.3-codex"
model_reasoning_effort = "high"
sandbox_mode = "read-only"
developer_instructions = """
Review code like an owner.
Prioritize correctness, security, behavior regressions, missing tests.
Lead with concrete findings. Include reproduction steps.
Avoid style-only comments unless they hide a real bug.
"""

agents/docs-researcher.toml (with MCP server)

model = "gpt-5.3-codex-spark"
model_reasoning_effort = "medium"
sandbox_mode = "read-only"
developer_instructions = """
Use the docs MCP server to confirm APIs and version-specific behavior.
Return concise answers with links or exact references.
Do not make code changes.
"""

[mcp_servers.openaiDeveloperDocs]
url = "https://developers.openai.com/mcp"

Key Design Principle

The best role definitions are narrow and opinionated. Give each role one clear job, a tool surface that matches that job, and instructions that keep it from drifting into adjacent work.

CSV Batch Processing

spawn_agents_on_csv is the structured fan-out tool. It reads a CSV, spawns one worker per row, waits for all to finish, and exports combined results to a new CSV. Each worker gets a templated instruction with {column_name} placeholders filled from its row.

CSV batch example

Create /tmp/components.csv with columns path,owner
and one row per frontend component.

Then call spawn_agents_on_csv with:
- csv_path: /tmp/components.csv
- id_column: path
- instruction: "Review {path} owned by {owner}. Return JSON
  with keys path, risk, summary, follow_up via
  report_agent_job_result."
- output_csv_path: /tmp/components-review.csv
- output_schema: { path, risk, summary, follow_up }

The tool accepts max_concurrency and max_runtime_seconds for job control. Each worker must call report_agent_job_result exactly once; workers that exit without reporting are marked as failed in the exported CSV.
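The per-row templating can be illustrated with a small sketch (plain Python, not the tool's implementation): each {column_name} placeholder in the instruction is filled from that worker's CSV row. The file contents and paths below are hypothetical.

```python
import csv
import io

# Instruction template using {column_name} placeholders, mirroring the
# spawn_agents_on_csv example above.
instruction = "Review {path} owned by {owner}."

# Stand-in for /tmp/components.csv (hypothetical rows).
rows = csv.DictReader(io.StringIO(
    "path,owner\n"
    "src/App.tsx,alice\n"
    "src/Nav.tsx,bob\n"
))

# One worker per row, each receiving its own filled-in instruction.
jobs = [instruction.format(**row) for row in rows]
print(jobs[0])  # Review src/App.tsx owned by alice.
```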

Good use cases for CSV batching:

  • Reviewing one file, package, or service per row
  • Checking a list of incidents, PRs, or migration targets
  • Generating structured summaries for many similar inputs
  • Auditing security configurations across microservices

The exported CSV includes original row data plus metadata: job_id, item_id, status, last_error, and result_json. When run through codex exec, a single-line progress update shows on stderr while the batch runs.
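Post-processing the exported CSV is ordinary CSV work. A minimal sketch, assuming the metadata columns listed above and an illustrative "completed"/"failed" status vocabulary (the exact status strings are an assumption):

```python
import csv
import io
import json

# Stand-in for an exported results CSV; real exports also carry the
# original row columns. Values here are hypothetical.
exported = io.StringIO(
    "job_id,item_id,status,last_error,result_json\n"
    'j1,src/App.tsx,completed,,"{""risk"": ""low""}"\n'
    "j2,src/Nav.tsx,failed,worker exited without reporting,\n"
)

rows = list(csv.DictReader(exported))
failed = [r["item_id"] for r in rows if r["status"] == "failed"]
results = {r["item_id"]: json.loads(r["result_json"])
           for r in rows if r["result_json"]}
print(failed)  # ['src/Nav.tsx']
```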

Config Schema Reference

Field                          | Type          | Default | Purpose
agents.max_threads             | number        | —       | Max concurrently open agent threads
agents.max_depth               | number        | 1       | Max nesting depth (root = 0)
agents.job_max_runtime_seconds | number        | 1800    | Default per-worker timeout for CSV jobs
agents.<name>.description      | string        | —       | Role guidance shown to Codex
agents.<name>.config_file      | string (path) | —       | TOML config layer for this role

Validation Rules

Unknown fields in [agents.<name>] are rejected. The config_file path is validated at load time and must point to an existing file. Relative paths resolve from the config.toml that defines the role. If a role config file fails to load, agent spawns fail until you fix it.

Common settings to override per role: model, model_reasoning_effort, sandbox_mode, and developer_instructions. Any setting not specified in the role config inherits from the parent session.
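The inheritance rule amounts to a layered merge: any key the role config leaves unset falls back to the parent session's value. A minimal sketch (illustrative only, not Codex's config loader; the values are hypothetical):

```python
# Parent session defaults.
parent_session = {
    "model": "gpt-5.3-codex",
    "model_reasoning_effort": "high",
    "sandbox_mode": "workspace-write",
}

# Role config: sets only what it overrides.
explorer_role = {
    "model": "gpt-5.3-codex-spark",
    "sandbox_mode": "read-only",
}

# Role keys win; unset keys inherit from the parent.
effective = {**parent_session, **explorer_role}
print(effective["model"])                   # gpt-5.3-codex-spark (role override)
print(effective["model_reasoning_effort"])  # high (inherited)
```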

Example: PR Review Team

Three roles split review into focused concerns. The explorer maps affected code paths, the reviewer finds real risks, and the docs_researcher verifies framework APIs.

Prompt

Review this branch against main. Have explorer map the affected
code paths, reviewer find real risks, and docs_researcher verify
the framework APIs that the patch relies on.

The explorer uses gpt-5.3-codex-spark at medium reasoning in read-only mode. The reviewer uses the full gpt-5.3-codex at high reasoning effort. The docs_researcher connects to an MCP server for API reference lookups. All three run in parallel, and Codex consolidates their findings into one response.

This pattern works because each role has one clear job with no overlap. The explorer does not propose fixes. The reviewer does not chase documentation. The docs_researcher does not edit code. Narrow scope reduces hallucination and context waste.

Example: Frontend Integration Debugging

A three-role setup for UI regressions and integration bugs:

explorer

Maps the code that owns the failing UI flow. Identifies entry points, state transitions, and likely files before the worker starts editing.

browser_debugger

Reproduces the issue in the browser, captures screenshots, console output, and network evidence. Uses a Chrome DevTools MCP server. Does not edit application code.

worker

Owns the fix once the issue is reproduced. Makes the smallest defensible change. Validates only the behavior it changed.

agents/browser-debugger.toml

model = "gpt-5.3-codex"
model_reasoning_effort = "high"
sandbox_mode = "workspace-write"
developer_instructions = """
Reproduce the issue in the browser.
Capture exact steps and report what the UI actually does.
Use browser tooling for screenshots, console output, network evidence.
Do not edit application code.
"""

[mcp_servers.chrome_devtools]
url = "http://localhost:3000/mcp"
startup_timeout_sec = 20

The sequence matters here: browser_debugger and explorer run in parallel to gather evidence, then worker takes over once the failure mode is clear. Codex handles this coordination automatically.

Approvals and Sandboxing

Sub-agents inherit the parent session's sandbox policy. In interactive CLI sessions, approval requests can surface from inactive agent threads while you're looking at the main thread. The approval overlay shows the source thread label, and you can press o to open that thread before deciding.

In non-interactive flows (like codex exec), actions that need fresh approval fail and the error surfaces to the parent workflow. Live runtime overrides (such as --yolo or /approvals changes) propagate to child agents at spawn time, even if the role's config file specifies different defaults.

You can also restrict specific roles. An explorer that should never write files gets sandbox_mode = "read-only" in its role config, regardless of what the parent session allows.

Codex vs Claude Code Multi-Agent

Both Codex and Claude Code Agent Teams solve the same problem: parallelizing work across multiple AI agents. The implementations differ significantly.

Dimension                 | Codex CLI                             | Claude Code Agent Teams
Configuration             | config.toml with role files           | Settings JSON flag + inline prompts
Per-agent model           | Yes, per-role config_file             | No, all teammates use same model
Inter-agent communication | Results return to parent only         | Direct messaging + shared task list
Agent coordination        | Parent orchestrates all work          | Self-coordinating via task claims
CSV batch processing      | spawn_agents_on_csv built-in          | No equivalent
MCP servers per role      | Yes, per-agent config                 | Shared from parent session
Nesting depth             | Configurable (default: 1)             | Configurable (default: 1)
Sandbox control           | Per-role override                     | Inherited from parent
Worktree isolation        | Per-agent thread (app)                | Per-teammate branch
Best for                  | Structured audits, role-based reviews | Collaborative exploration, big features

Codex's strength is structured configuration. You get per-role model selection, dedicated MCP servers per agent, and CSV batch processing for repetitive audits. Claude Code's strength is emergent collaboration. Teammates message each other, claim tasks autonomously, and coordinate without the lead acting as bottleneck.

Choose Codex multi-agent when you know the roles and structure in advance. Choose Claude Code agent teams when the problem requires agents to discover and divide work themselves.

Where Morph Fits

Multi-agent workflows amplify both the speed and cost of every tool the agents call. When six agents each do file rewrites through a frontier model, you pay frontier token prices six times. When each agent searches the codebase sequentially, the wall-clock time multiplies.

Two Morph services address these bottlenecks directly:

WarpGrep for Explorer Agents

WarpGrep runs parallel sub-agent searches across your codebase, returning precise results in under 6 seconds. Instead of each Codex explorer scanning files sequentially, connect WarpGrep as an MCP server and let it handle the search layer: 8 parallel tool calls per turn, up to 4 turns deep.

Fast Apply for Worker Agents

Every file edit through a worker agent costs 3,500-4,500 tokens at frontier prices if you do full-file rewrites. Fast Apply merges edits at 10,500 tok/s for $0.80/M input tokens. A 500-line file merge takes 0.8 seconds instead of 8-10 seconds through GPT-5.3.
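The cost claim can be made concrete with back-of-envelope arithmetic. The token range and the $0.80/M Fast Apply price come from the text above; the frontier price per million tokens is an assumption for illustration:

```python
# Figures: token count and Fast Apply price quoted above;
# frontier_price_per_m is an assumed value, not a published price.
tokens_per_edit = 4000        # midpoint of the 3,500-4,500 range
frontier_price_per_m = 10.00  # assumed $/M tokens for a frontier model
fast_apply_price_per_m = 0.80 # quoted Fast Apply input price

frontier_cost = tokens_per_edit / 1_000_000 * frontier_price_per_m
fast_apply_cost = tokens_per_edit / 1_000_000 * fast_apply_price_per_m
print(f"${frontier_cost:.4f} vs ${fast_apply_cost:.4f} per edit")
```

Multiply the per-edit delta by six parallel workers and dozens of edits per session, and the gap compounds quickly.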

Adding WarpGrep as an MCP server to a Codex explorer role:

agents/explorer-with-warpgrep.toml

model = "gpt-5.3-codex-spark"
sandbox_mode = "read-only"
developer_instructions = """
Use WarpGrep for all codebase searches.
Prefer semantic queries over exact string matching.
"""

[mcp_servers.warpgrep]
url = "https://mcp.morphllm.com/warpgrep"

Speed Up Multi-Agent Workflows

WarpGrep and Fast Apply reduce token costs and latency across every agent in your team. Try them on a real codebase.

Limitations

  • Experimental only. Requires explicit opt-in via config flag. The API and config schema may change.
  • CLI-only visibility. Multi-agent activity is not yet visible in the Codex desktop app or IDE extensions.
  • No inter-agent messaging. Sub-agents can only return results to the parent. They cannot message each other directly (unlike Claude Code agent teams).
  • Nesting defaults to depth 1. A child agent cannot spawn further sub-agents by default. Increasing max_depth increases cost and complexity.
  • Non-interactive approval failures. In codex exec or batch jobs, actions that need fresh approval fail immediately, and the errors surface to the parent workflow.
  • Token costs scale linearly. Each agent maintains its own context window. A 6-agent review costs roughly 6x a single-agent review in tokens.
  • CSV batch requires structured output. Workers must call report_agent_job_result exactly once. Workers that forget this call get marked as failed.

FAQ

How do I enable Codex CLI multi-agent?

Add multi_agent = true under [features] in ~/.codex/config.toml and restart Codex. Or use /experimental in the CLI to toggle it interactively.

What are the four built-in agent roles?

default (general-purpose fallback), worker (implementation and fixes), explorer (read-only codebase exploration), and monitor (long-running command/task monitoring with up to 1-hour polling windows).

Can I use different models per agent?

Yes. Each role's config_file can set model and model_reasoning_effort. A common pattern: gpt-5.3-codex-spark for explorers (fast, cheap) and gpt-5.3-codex for reviewers (thorough).

What is spawn_agents_on_csv?

A tool for batch fan-out. It reads a CSV, spawns one worker per row with a templated instruction, waits for all workers, and exports results to a new CSV. Each worker must call report_agent_job_result exactly once. Failed workers are flagged in the output.

How does this compare to Claude Code agent teams?

Codex multi-agent is config-driven with per-role models, MCP servers, and CSV batching. Claude Code agent teams are collaboration-driven with shared task lists and direct inter-agent messaging. Codex is better for structured, role-based workflows. Claude Code is better for emergent, self-coordinating work. See the full comparison.

Do sub-agents inherit my sandbox settings?

Yes. Sub-agents inherit the parent's sandbox policy and live runtime overrides (including --yolo). You can also restrict roles individually, for example marking an explorer as sandbox_mode = "read-only".

Can I connect MCP servers to specific agent roles?

Yes. Add [mcp_servers.<name>] in a role's config file. The MCP server is available only to agents spawned with that role. Useful for giving explorers access to WarpGrep, or giving docs researchers access to documentation APIs.