AI Agent Workflows: 5 Patterns That Ship Code, and the Anti-Patterns That Burn Tokens

The 5 agent workflow patterns that actually work in production coding systems: plan-execute, iterative refinement, parallel fan-out, human-in-the-loop, and multi-agent pipeline. With cost analysis, real examples from Cursor and Devin, and the anti-patterns that waste thousands in tokens.

April 5, 2026 · 3 min read

Every coding agent runs on a workflow pattern, whether the team designed it or not. The pattern determines cost, reliability, and speed more than model choice does. Cognition measured that agents spend 60% of execution time on context retrieval alone. Anthropic found multi-agent architectures outperform single agents by 90% on parallelizable tasks. SWE-bench token analysis shows 10x variance in cost across runs of the same task. This page covers the 5 workflow patterns that survive production, the anti-patterns that drain budgets, and the cost math behind each.

Why the Workflow Matters More Than the Model

Three runs of Claude Opus 4.5 on SWE-Bench Pro with different agent scaffolding scored 50.2%, 53.1%, and 55.4%. Same model, same problems, same hardware. The 5-point spread came entirely from how the agent manages context and tool calls. The workflow is the variable.

A workflow defines four things: when the agent plans vs. acts, what goes into and out of the context window, how many agents run and whether they run in parallel, and where humans intervene. Get these wrong and you pay for it in tokens. Get them right and the model does more with less.

60%: Agent time spent on context retrieval
90%: Multi-agent improvement over single agent
10x: Token variance across identical tasks
15x: Multi-agent token cost vs. single chat

The tension in every workflow decision is the same: more structure means more reliable output but higher token cost. Less structure means cheaper individual runs but more frequent failures that require re-runs. The five patterns below sit at different points on that curve.

Pattern 1: Plan-Execute

Plan-execute separates thinking from doing. Instead of the ReAct loop where the agent reasons and acts on every step, a dedicated planner generates the full task breakdown upfront. An executor handles each step with tools. A replanner revises the remaining steps after each execution based on what actually happened.

When to use plan-execute

Multi-step tasks with dependencies between steps. Refactoring that touches a schema, API routes, and frontend components. Any task where executing step 3 before step 1 would produce garbage.

LangGraph's implementation makes the architecture concrete. Three nodes in a graph: a Planner that takes the user objective and outputs an ordered list of steps, an Executor that picks the next step and runs it with tool access, and a Replanner that receives the accumulated results and either revises the remaining plan or produces the final response.

Plan-execute state machine (LangGraph)

from langgraph.graph import StateGraph, END
from typing import TypedDict

class PlanExecuteState(TypedDict):
    objective: str
    plan: list[str]
    past_steps: list[tuple[str, str]]  # (step, result)
    response: str

# plan_step, execute_step, replan_step, and should_continue are the node
# functions described above: each takes the state and returns state updates.
graph = StateGraph(PlanExecuteState)
graph.add_node("planner", plan_step)
graph.add_node("executor", execute_step)
graph.add_node("replanner", replan_step)

graph.set_entry_point("planner")
graph.add_edge("planner", "executor")
graph.add_edge("executor", "replanner")
graph.add_conditional_edges(
    "replanner",
    should_continue,  # returns "executor" or END
    {"executor": "executor", END: END},
)

app = graph.compile()  # invoke with {"objective": "..."}

The key advantage over ReAct: the planner catches dependency errors before execution begins. A ReAct agent attempting to modify a database query that depends on a schema migration will discover the dependency only after the query edit fails. The plan-execute agent sequences them correctly from the start.

The key cost: an extra LLM call for planning and replanning at each cycle. For tasks with fewer than 3 steps, pure ReAct is cheaper. For tasks with 5+ steps, plan-execute saves tokens overall by avoiding the cascading retries that ReAct generates when it hits dependency errors mid-execution.
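The break-even point can be sketched with a toy cost model. Every constant below (tokens per step, retry rate, planning overhead) is an illustrative assumption, not a measured value:

```typescript
// Back-of-envelope token model comparing ReAct and plan-execute.
// Constants are illustrative assumptions, not measurements.
const TOKENS_PER_STEP = 4_000;    // context + tool output per executed step
const PLANNING_OVERHEAD = 10_000; // planner call plus replanner calls
const REACT_RETRY_RATE = 0.3;     // chance a ReAct step hits a dependency error

function reactTokens(steps: number): number {
  // Each dependency failure costs roughly one wasted step plus its retry.
  const retries = Math.ceil(steps * REACT_RETRY_RATE);
  return (steps + 2 * retries) * TOKENS_PER_STEP;
}

function planExecuteTokens(steps: number): number {
  // Planning is paid up front; dependency errors are mostly avoided.
  return PLANNING_OVERHEAD + steps * TOKENS_PER_STEP;
}

// With these assumptions, ReAct stays cheaper through ~3 steps and
// plan-execute wins from ~4-5 steps onward.
for (const n of [2, 5, 8]) {
  console.log(n, reactTokens(n), planExecuteTokens(n));
}
```

The exact crossover shifts with the retry rate, but the shape is the point: planning overhead is a fixed cost, while ReAct's retry cost grows with step count.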

Planner

Takes the objective and current state. Outputs an ordered list of concrete steps. Uses a larger model (Opus/GPT-5) for reasoning quality.

Executor

Picks the next step. Calls tools (file edit, shell, search). Can use a cheaper, faster model (Sonnet/Haiku) since the reasoning already happened.

Replanner

Reviews accumulated results against remaining steps. Adjusts the plan, adds steps, or removes steps that are no longer needed. Catches when reality diverges from the original plan.

Real-world example: Jules (Google) implements plan-execute as its core workflow. Before writing any code, it generates a plan and presents it to the developer for approval. Only after approval does the executor begin. The plan doubles as both a dependency graph and an audit trail.

Pattern 2: Iterative Refinement

Iterative refinement has the agent review its own output and revise it. The simplest version: generate code, run tests, read failures, fix, repeat. The sophisticated version: a separate critic model evaluates the output against explicit criteria before the agent revises.

This is the pattern behind every "agentic loop" in coding tools. Cursor's agent mode runs edits, checks terminal output, and iterates. Claude Code generates, runs lint and tests, then fixes. The underlying pattern is always the same: generate, evaluate, revise.

Iterative refinement loop

function iterativeRefinement(task: string, maxIterations = 5) {
  let output = generate(task);

  for (let i = 0; i < maxIterations; i++) {
    const evaluation = evaluate(output); // run tests, lint, type-check
    if (evaluation.passes) return output;

    output = revise(task, output, evaluation.feedback);
  }

  return output; // return best effort after max iterations
}

The critical insight from the "tests-first agent loop" research: writing tests before implementation cuts thrash by 50%. When the agent has a concrete pass/fail signal from the start, it converges faster because the evaluation function is unambiguous. Without tests, the agent evaluates against the vague instruction and often "refines" in the wrong direction.

Self-Refine Pattern

Single model generates and critiques. Cheaper but prone to blind spots. If the model misunderstands the requirement, it will confidently refine toward the wrong goal.

Reflection Pattern

Separate evaluator (tests, lint, a different model) provides external feedback. More expensive per iteration but converges faster because feedback comes from a different source than the generation.

Cost reality

Each iteration reprocesses the full context. A 3-iteration loop on a task with 50K tokens of context costs 150K input tokens, not 50K. Context accumulation is the hidden multiplier. Keeping context lean between iterations is the difference between a $0.30 task and a $3.00 task.
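The arithmetic is worth making explicit. A minimal sketch of cumulative input cost, assuming the full context is reprocessed on every call and grows as feedback accumulates:

```typescript
// Cumulative input-token cost of an iterative loop. Assumes each call
// reprocesses the entire context, which grows as feedback accumulates.
function loopInputTokens(
  baseContext: number,    // tokens in the initial context
  growthPerIter: number,  // tokens added per round (diffs, test output)
  iterations: number,
): number {
  let total = 0;
  let context = baseContext;
  for (let i = 0; i < iterations; i++) {
    total += context;          // each iteration pays for everything so far
    context += growthPerIter;  // feedback stays in the window
  }
  return total;
}

// 3 iterations over a flat 50K context: 150K input tokens.
// Add 10K of accumulated feedback per round and it becomes 180K.
```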

When to use it: Code generation where correctness is verifiable (tests exist or can be written). Bug fixes where reproducing the bug provides a clear signal. Any task where "run it and check" gives a concrete pass/fail.

When not to use it: Open-ended design tasks where "better" is subjective. Greenfield code with no tests. Tasks where each iteration adds to the context window without converging. In these cases, the agent will loop to the max iteration count and produce mediocre output at maximum cost.

Pattern 3: Parallel Fan-Out

Parallel fan-out runs multiple agents on independent subtasks simultaneously, then merges results. This is the pattern that converts wall-clock time into token spend. You pay more total tokens but get the work done faster.

Cognition's data explains why this matters for coding agents specifically. If a single agent spends 60% of its time on context retrieval, three parallel agents each searching a different part of the codebase eliminate redundant serial searching. The search work happens once per agent instead of accumulating in a single agent's context window.

Claude Code Agent Teams

3-5 agents in separate git worktrees. File-system isolation prevents conflicts. JSONL task files with dependency tracking. Each agent gets its own context window.

Cursor Background Agents

Up to 8 concurrent agents in cloud sandboxes. Event-driven triggers: GitHub PRs, Slack messages, Linear issues. Independent CI pipelines per agent.

Codex CLI Multi-Agent

Up to 6 concurrent threads. Sub-agents inherit sandbox policies. State in SQLite. spawn_agents_on_csv parallelizes batch work with max_concurrency controls.

Fan-out / fan-in execution

async function parallelFanOut(subtasks: Task[]) {
  // Fan-out: run independent tasks in parallel
  const results = await Promise.all(
    subtasks.map(task =>
      spawnAgent({
        task,
        worktree: createGitWorktree(task.id),
        context: getRelevantContext(task),
      })
    )
  );

  // Fan-in: merge results back to main branch
  for (const result of results) {
    if (result.status === "completed") {
      await mergeWorktree(result.worktree, "main");
    } else {
      await retryOrEscalate(result);
    }
  }
}

A content workflow benchmark measured 36% speedup (6:10 down to 3:56) with parallel execution. The M1-Parallel framework reported 1.8x speedup with early stopping and 2.2x with early termination while preserving accuracy. These numbers are consistent: parallel execution scales sub-linearly because coordination overhead eats into the theoretical speedup.

The independence requirement

Parallel fan-out only works when subtasks are genuinely independent. Agent A refactoring the auth module cannot safely run in parallel with Agent B modifying the auth middleware. Git worktrees provide file-system isolation, but logical dependencies still cause merge conflicts. The planner must identify independent work units before fanning out.
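One way to enforce the independence requirement, assuming the planner emits explicit dependency lists: group subtasks into waves, where each wave contains only tasks whose dependencies completed in an earlier wave. The helper below is a hypothetical sketch, not any specific tool's API:

```typescript
interface Subtask {
  id: string;
  deps: string[]; // ids of subtasks that must finish first
}

// Partition subtasks into parallelizable waves. Tasks within a wave
// have no dependencies on each other, only on earlier waves.
function parallelWaves(tasks: Subtask[]): Subtask[][] {
  const done = new Set<string>();
  const remaining = [...tasks];
  const waves: Subtask[][] = [];
  while (remaining.length > 0) {
    const wave = remaining.filter(t => t.deps.every(d => done.has(d)));
    if (wave.length === 0) throw new Error("dependency cycle detected");
    for (const t of wave) {
      done.add(t.id);
      remaining.splice(remaining.indexOf(t), 1);
    }
    waves.push(wave);
  }
  return waves;
}
```

Each wave can then be fanned out with the Promise.all pattern above; the waves themselves run serially.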

Pattern 4: Human-in-the-Loop

Human-in-the-loop (HITL) inserts approval gates into the agent workflow. The agent proposes an action. Execution pauses. A human approves, rejects, or modifies. Then execution resumes.

This is not just "ask before doing dangerous things." The production-grade version uses confidence-based routing where the agent scores its own confidence on each action. High-confidence actions (renaming a variable, adding an import) proceed automatically. Low-confidence actions (deleting a file, modifying a production config) enter an approval queue. The confidence threshold is tunable per team and per action type.

Confidence-based routing

interface Thresholds {
  auto_approve: number;  // e.g. 0.9
  warn: number;          // e.g. 0.6
}

interface AgentAction {
  type: "file_edit" | "file_delete" | "shell_command" | "git_push";
  confidence: number;  // 0.0 - 1.0
  description: string;
  payload: unknown;
}

function routeAction(action: AgentAction, thresholds: Thresholds) {
  if (action.confidence >= thresholds.auto_approve) {
    return executeImmediately(action);
  }
  if (action.confidence >= thresholds.warn) {
    return executeWithNotification(action);
  }
  return queueForApproval(action);  // agent pauses here
}

Claude Code's permission system is a HITL implementation. By default, file reads are auto-approved, file writes require confirmation, and shell commands require confirmation. The --dangerously-skip-permissions flag removes all gates, and the allowedTools config in .claude/settings.json provides fine-grained control over which actions bypass approval.

Cursor's plan-mode works differently. The agent generates a complete plan and presents it for approval before executing anything. This is HITL at the plan level rather than the action level. Jules (Google) takes the same approach: plan first, human approves, then autonomous execution.

Style | Approval Point | Trade-off
Action-level gates | Before each risky action | Maximum safety, most interruptions. Good for production systems.
Plan-level approval | Before execution begins | One approval per task. Agent may deviate from approved plan during execution.
Confidence routing | Automatic based on confidence score | Scales well. Requires calibrated confidence estimates. Most agents over-estimate confidence.
Checkpoint-based | At defined milestones (e.g., after schema change, before deploy) | Low interruption frequency. Risk between checkpoints.

The cost of HITL is latency, not tokens. The agent pauses until a human responds. For async workflows (Cursor background agents, GitHub Copilot coding agent), this is fine. For interactive coding sessions, each approval gate adds 5-30 seconds of human attention. The design question is always: what is the cost of a wrong action vs. the cost of waiting for approval?

Pattern 5: Multi-Agent Pipeline

Multi-agent pipelines use an orchestrator that decomposes a task, assigns subtasks to specialized worker agents, and aggregates their results. The difference from parallel fan-out: pipelines handle dependencies between stages and use an evaluator (judge) to decide whether output is acceptable.

The planner-worker-judge architecture has converged as the standard across tools. Claude Code's team lead decomposes tasks and manages dependencies. Workers (teammates) claim tasks, execute in isolated worktrees, and push changes. A judge evaluates output quality through hooks like TeammateIdle and TaskCompleted.

Orchestrator-worker-judge pipeline

// Orchestrator decomposes the task
const plan = await orchestrator.decompose(task);
// => [
//   { id: "schema", desc: "Add user_preferences table", deps: [] },
//   { id: "api", desc: "Create /preferences endpoint", deps: ["schema"] },
//   { id: "ui", desc: "Build preferences panel", deps: ["api"] },
//   { id: "tests", desc: "Write integration tests", deps: ["api"] },
// ]

// Workers execute in dependency order, parallelizing where possible
const results = await pipeline.execute(plan, {
  maxConcurrency: 4,
  workerModel: "claude-sonnet-4",
  isolation: "git-worktree",
});

// Judge evaluates each result
for (const result of results) {
  const verdict = await judge.evaluate(result, plan);
  if (verdict.decision === "reject") {
    await pipeline.retry(result, { feedback: verdict.feedback });
  }
}

Anthropic's internal research evaluation is the clearest benchmark for this pattern. A lead agent (Opus 4) coordinating 3-5 parallel subagents (Sonnet 4) outperformed a single Opus 4 agent by 90.2% on research tasks. The improvement came from spreading reasoning across multiple independent context windows, not from using more total compute. Each subagent gets a fresh context focused on its specific subtask, avoiding the context pollution that degrades single-agent performance on complex tasks.

Context Isolation Wins

Each worker gets a clean context window focused on its subtask. No accumulated noise from other workers' search results, failed attempts, or irrelevant code. This is why three focused agents outperform one generalist agent working three times as long.

Lightweight References

Workers store outputs in external state and pass short references back to the orchestrator. A 200KB code diff becomes a 100-byte pointer. This prevents the orchestrator's context window from filling with worker artifacts.
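A minimal sketch of the idea, using an in-memory map as a stand-in for real external storage (file store, object store, database):

```typescript
// External artifact store: workers persist large outputs and hand the
// orchestrator a short pointer instead of the artifact itself.
// Illustrative sketch; blob:// refs and the store are hypothetical.
const blobStore = new Map<string, string>();
let nextId = 0;

function storeArtifact(content: string): string {
  const ref = `blob://${nextId++}`;
  blobStore.set(ref, content);
  return ref; // a few bytes, regardless of content size
}

function resolveArtifact(ref: string): string {
  const content = blobStore.get(ref);
  if (content === undefined) throw new Error(`unknown ref: ${ref}`);
  return content;
}

// A 200KB diff enters the orchestrator's context as a short reference:
const ref = storeArtifact("x".repeat(200_000));
```

Only agents that actually need the diff resolve the pointer; everyone else passes it along for free.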

Multi-model routing

The pipeline pattern naturally supports multi-model routing. Use a large model (Opus, GPT-5) for the orchestrator where reasoning quality matters. Use a fast, cheaper model (Sonnet, Haiku) for workers where tool execution speed matters. Use a code-specialized model for the judge where evaluation accuracy matters. OpenCode demonstrated this by mixing GPT-5.3 Codex, Gemini 2.5 Pro, and Claude Sonnet 4 in a single team.
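The routing itself can be as simple as a role-to-model table. The model IDs below are illustrative placeholders, not exact API identifiers:

```typescript
// Role-to-model routing for a pipeline. The mapping is an illustrative
// configuration; model ID strings are placeholders, not real API values.
type Role = "orchestrator" | "worker" | "judge";

const modelByRole: Record<Role, string> = {
  orchestrator: "claude-opus-4",  // reasoning quality matters most
  worker: "claude-sonnet-4",      // tool-execution speed matters most
  judge: "gpt-5.3-codex",         // code-evaluation accuracy matters most
};

function modelFor(role: Role): string {
  return modelByRole[role];
}
```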

Cost Analysis by Pattern

Token cost is the operational constraint. Every workflow pattern trades tokens for some benefit: reliability, speed, correctness. Understanding the cost profile of each pattern prevents surprises when the monthly bill arrives.

Pattern | Tokens per Task | When the Cost Is Justified
Single-pass (no workflow) | 5K-15K | Simple, well-defined tasks. File renames, import additions, config changes.
Plan-execute | 20K-80K | Multi-step tasks with dependencies. The planning overhead pays for itself by avoiding cascading retry loops.
Iterative refinement | 50K-200K | Tasks with verifiable correctness. Each iteration reprocesses context, so lean context is critical.
Parallel fan-out | 100K-500K total (spread across agents) | Large independent workloads. Total tokens increase but wall-clock time decreases 36-55%.
Multi-agent pipeline | 200K-1M+ | Complex tasks requiring specialization. Anthropic's research eval showed 90% improvement but 15x token cost.

The cost trap is pattern mismatch. Running a multi-agent pipeline on a task that single-pass could handle wastes 10-100x tokens. Running single-pass on a complex task that needs iterative refinement wastes tokens on retries when the first attempt fails. The cheapest workflow is the simplest pattern that achieves the required reliability for the specific task.

The context accumulation problem

A session that starts at 5K tokens per call can reach 200K tokens per call by the end due to conversation history accumulation. A single complex debugging session with a frontier model can consume 500K+ tokens. Every workflow pattern that involves iteration or multi-turn conversation must account for this growth, either through context compression, context windowing, or fresh-context restarts per step.
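One windowing strategy, sketched under the assumption that per-turn token counts are tracked: drop bulky tool outputs first, then the oldest turns, and never drop the original objective. Real tools typically also summarize what they drop:

```typescript
interface Turn {
  role: "user" | "assistant" | "tool";
  text: string;
  tokens: number;
}

// Keep the conversation under a token budget. Tool outputs go first
// (they are the bulkiest and least reusable), then the oldest turns.
// The first turn (the objective) always survives. Illustrative sketch.
function windowContext(history: Turn[], budget: number): Turn[] {
  const [objective, ...rest] = history;
  const kept = [...rest];
  const total = () =>
    objective.tokens + kept.reduce((sum, t) => sum + t.tokens, 0);
  // Drop tool outputs first, oldest first.
  while (total() > budget && kept.some(t => t.role === "tool")) {
    kept.splice(kept.findIndex(t => t.role === "tool"), 1);
  }
  // Then drop the oldest remaining turns.
  while (total() > budget && kept.length > 0) kept.shift();
  return [objective, ...kept];
}
```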

The Anti-Patterns That Burn Tokens

Anti-patterns are not theoretical. They are the failure modes that teams discover after a $500 overnight run that produced nothing usable. Each corresponds to a missing constraint in the workflow.

Infinite Loops

Agent retries the same failing approach without changing strategy. Without circuit breakers, agents consume thousands of tokens on identical failing attempts. Some run all night, terminating only at a budget cap. Fix: dual-threshold circuit breakers. Warning at 80% budget, hard stop at 100%.

Context Bloat

Full API responses (40KB) dumped into context when 120 bytes of data are needed. Full file reads when one function is relevant. 50+ MCP tool definitions consuming 72K tokens before any work starts. Fix: memory pointers that store large outputs externally and pass short references to the LLM.

Retry Without Strategy Change

Agent gets an error, retries identically, gets the same error. Each retry adds the error message to context, making the next attempt worse because the model is now confused by the accumulated failure history. Fix: require the agent to explain what it will change before retrying.

Over-Decomposition

Breaking a 10-line change into 8 subtasks with an orchestrator, 3 workers, and a judge. The coordination overhead exceeds the actual work. Fix: start with the simplest pattern. Escalate to multi-agent only when single-agent fails or when wall-clock time is the binding constraint.

The Deadlock Anti-Pattern

In multi-agent systems, Agent A waits for Agent B's output. Agent B waits for Agent A's output. Both consume tokens polling for updates that never arrive. This is the classic computer science deadlock, but with tokens burning at every polling step. At 10K tokens per context per poll, a pair of agents polling each other once per minute burns $3/minute. Across 50 such polling pairs, that is $9,000/hour.
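The standard fix is a deadline on every wait. A hedged sketch, not any tool's real API: poll for the dependency, and if it never arrives, escalate to the orchestrator instead of polling forever:

```typescript
// Poll-with-deadline guard. Instead of waiting indefinitely on another
// agent's output, give up after timeoutMs and escalate. Illustrative
// sketch; the poll callback returns null while the output is not ready.
async function awaitDependency<T>(
  poll: () => Promise<T | null>,
  timeoutMs: number,
  intervalMs: number,
): Promise<T> {
  const deadline = Date.now() + timeoutMs;
  while (Date.now() < deadline) {
    const result = await poll();
    if (result !== null) return result; // dependency arrived
    await new Promise(resolve => setTimeout(resolve, intervalMs));
  }
  // Stop burning tokens; hand the stall to a human or orchestrator.
  throw new Error("dependency timed out; escalating to orchestrator");
}
```

Because both agents in a deadlocked pair hit their deadlines, the cycle breaks instead of running all night.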

The Context Pollution Spiral

Earlier mistakes and failed attempts stay in the context window. The agent sees its own wrong answers alongside the correct information. As context fills with noise, hallucination rate increases. More hallucinations mean more failed attempts added to context. Cognition's research found that polluting the main agent's context with search results was more detrimental than leaving some context out entirely. Their solution: delegate retrieval to a specialized subagent (SWE-grep) that uses its own context window, and pass only the relevant results to the main agent.

Circuit breaker implementation

class CircuitBreaker {
  private attempts = 0;
  private tokensUsed = 0;
  private lastError?: string;  // previous error, to detect identical retries

  constructor(
    private maxAttempts: number = 5,
    private warningTokens: number = 100_000,
    private maxTokens: number = 200_000,
  ) {}

  shouldContinue(lastResult: AgentResult): Decision {
    this.attempts++;
    this.tokensUsed += lastResult.tokensConsumed;

    if (this.tokensUsed >= this.maxTokens) return "hard_stop";
    if (this.attempts >= this.maxAttempts) return "hard_stop";
    if (this.tokensUsed >= this.warningTokens) return "deliver_now";
    if (lastResult.error && lastResult.error === this.lastError) {
      return "change_strategy";  // same failure twice: force a new approach
    }
    this.lastError = lastResult.error;

    return "continue";
  }
}

Choosing the Right Pattern

Pattern selection is a function of three variables: task complexity (number of steps and dependencies), verifiability (can you run a test?), and parallelizability (are subtasks independent?).

Task Type | Recommended Pattern | Why
Simple, well-defined edit | Single-pass (no workflow) | Adding overhead to a 5K-token task wastes more than it saves.
Multi-step with dependencies | Plan-execute | Dependency graph prevents cascading retries. Planning overhead pays for itself at 5+ steps.
Code with test suite | Iterative refinement | Tests provide an unambiguous evaluation signal. Write tests first, implement second.
Large independent workload | Parallel fan-out | Separate modules, separate files, no shared state. Convert time into tokens.
Production system changes | Human-in-the-loop | Cost of a wrong action (downtime, data loss) exceeds cost of waiting for approval.
Complex feature across system boundaries | Multi-agent pipeline | Specialized workers with isolated contexts outperform one agent juggling everything.

The patterns compose. A multi-agent pipeline where each worker uses iterative refinement internally. A plan-execute workflow with human-in-the-loop gates at high-risk steps. Parallel fan-out where each branch runs plan-execute. Start with the simplest pattern that works, then layer on complexity only when you have evidence that the simpler pattern fails.

Infrastructure Each Pattern Needs

Every workflow pattern calls the same set of execution primitives. The pattern defines when and how many times each primitive is called. The infrastructure provides the primitives themselves.

Primitive | What It Does | Which Patterns Need It
Sandboxed execution | Run agent code in isolation. Prevent file conflicts between parallel agents. Container or worktree. | Parallel fan-out, multi-agent pipeline
Fast apply | Merge LLM-generated diffs into source files. Must handle partial edits, fuzzy matching, and concurrent writes. | All patterns (every agent that edits code)
Semantic search | Find relevant code without reading entire files. Reduce the 60% context retrieval overhead. | All patterns, critical for iterative refinement and plan-execute
Context compression | Reduce token count between iterations. Summarize or drop irrelevant history. | Iterative refinement, multi-agent pipeline
Circuit breakers | Stop runaway loops. Budget caps, attempt limits, strategy-change requirements. | Iterative refinement, multi-agent pipeline (any pattern with loops)

Morph's Fast Apply model handles the apply primitive at 10,500+ tokens per second with deterministic merge behavior. Every workflow pattern generates code edits. The quality of the apply step determines whether those edits land cleanly or produce merge conflicts that trigger more retries, more tokens, and more cost. A fast, reliable apply layer reduces the iteration count across all five patterns.

WarpGrep addresses the search primitive. Cognition built SWE-grep specifically because context retrieval consumed 60% of agent execution time. WarpGrep takes the same approach: 8 parallel tool calls per turn across up to 4 turns, specialized for codebase search, keeping the main agent's context clean.

Frequently Asked Questions

What are AI agent workflows?

Execution patterns that structure how coding agents plan, execute, and verify their work. The five production patterns are plan-execute, iterative refinement, parallel fan-out, human-in-the-loop, and multi-agent pipeline.

What is the plan-execute agent pattern?

A workflow that separates planning from execution. A planner generates a structured task list. An executor handles one step at a time. A replanner adjusts remaining steps based on results. Avoids the cascading retries that ReAct-style agents generate when they hit dependency errors mid-execution.

How much do agent workflows cost in tokens?

Single-pass: 5K-15K tokens. Plan-execute: 20K-80K. Iterative refinement: 50K-200K. Parallel fan-out: 100K-500K total. Multi-agent pipeline: 200K-1M+. Anthropic reports multi-agent systems use approximately 15x more tokens than standard chat.

What is the most common agent anti-pattern?

The infinite loop. Agents retry failing approaches without changing strategy, consuming thousands of tokens on identical failures. The fix is circuit breakers: a warning threshold at 80% budget and a hard stop at 100%.

When should I use parallel fan-out?

When subtasks are genuinely independent: separate files, separate modules, no shared state. Benchmarks show 36% speedup for content workflows. Do not parallelize tasks with dependencies. Git worktrees provide file-system isolation, but logical dependencies still cause merge conflicts.

What is human-in-the-loop in agent workflows?

Approval gates at defined checkpoints. The scalable version uses confidence-based routing: high-confidence actions proceed automatically, low-confidence actions queue for human review. Claude Code's permission system and Cursor's plan-approval flow implement this pattern.

What causes context bloat in agent workflows?

Full tool outputs dumped into context (40KB JSON when 120 bytes are needed), full file reads when one function is relevant, and accumulated failure history. With 50+ MCP tools, definitions alone consume 72K tokens before work begins. The fix: external storage with short reference pointers passed to the LLM.

How do multi-agent coding pipelines work?

An orchestrator decomposes the task and spawns specialized workers, each in an isolated context (git worktree). Workers execute, store results externally, and pass lightweight references back. A judge evaluates output quality. Anthropic's research system outperformed single-agent Opus 4 by 90.2% using this pattern.

The Infrastructure Behind Every Agent Workflow

Every workflow pattern calls the same primitives: search, apply, execute. Morph's Fast Apply handles LLM-generated edits at 10,500+ tokens per second. WarpGrep handles codebase search with 8 parallel tool calls per turn. The reliability layer between your agent workflow and your repository.