Context Engineering: The Discipline That Replaced Prompt Engineering

Context engineering is the skill that separates 10x from 2x value with AI coding agents. Learn the techniques behind CLAUDE.md files, just-in-time loading, subagent isolation, and .claudeignore, drawing on Martin Fowler, Anthropic, and the 1M token wall.

February 15, 2026 · 6 min read

Context engineering is the skill that separates developers who get 10x value from AI coding agents from those who get 2x. Martin Fowler published on it. Anthropic formalized the patterns. And a clear "1M token wall" means you cannot just dump everything into a large context window and hope for the best.

  • 5.5x -- fewer tokens (Claude Code vs Cursor)
  • ~1M -- token wall (performance ceiling)
  • 95% -- context reduction via lazy loading
  • 32% -- of orgs cite quality as the #1 barrier

What Is Context Engineering?

Context engineering is the discipline of designing systems that give AI models access to the right information at the right time. It encompasses everything the model sees: system prompts, conversation history, retrieved documents, tool outputs, memory, and structured data.

Martin Fowler's article on context engineering for coding agents became a top Hacker News thread on the strength of one core insight: "Context is the bottleneck for coding agents now." Bigger windows do not automatically help; including irrelevant data actively worsens hallucinations.

Anthropic's best practices formalize this: good context engineering means finding the smallest possible set of high-signal tokens that maximize the likelihood of desired outcomes. Not the most tokens. The right tokens.
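The "smallest set of high-signal tokens" framing can be made concrete as a toy budgeted selection. A minimal sketch, assuming chunks have already been scored for relevance and counted by a tokenizer (the scores, file names, and greedy policy here are all illustrative, not Anthropic's method):

```python
# Hypothetical sketch: pick a small, high-signal context under a token budget.
# "score" and "tokens" would come from your own relevance model and tokenizer.

def select_context(chunks: list, budget: int) -> list:
    """Greedily pick chunks with the best signal-per-token until budget is spent."""
    ranked = sorted(chunks, key=lambda c: c["score"] / c["tokens"], reverse=True)
    chosen, used = [], 0
    for chunk in ranked:
        if used + chunk["tokens"] <= budget:
            chosen.append(chunk)
            used += chunk["tokens"]
    return chosen

chunks = [
    {"path": "auth.ts", "score": 0.9, "tokens": 400},
    {"path": "README.md", "score": 0.2, "tokens": 2000},
    {"path": "schema.ts", "score": 0.7, "tokens": 600},
]
print([c["path"] for c in select_context(chunks, budget=1000)])
# -> ['auth.ts', 'schema.ts']  (the large, low-signal README is dropped)
```

The point of the sketch is the objective, not the algorithm: maximize signal per token spent, rather than filling the window.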

Context Engineering vs. Prompt Engineering

Prompt engineering is a subset of context engineering. Prompt engineering asks "how should I phrase this?" Context engineering asks "what information does the model need access to right now?"

| Dimension | Prompt Engineering | Context Engineering |
|---|---|---|
| Focus | How to phrase the input | What information the model sees |
| Scope | Single interaction | Entire session / multi-turn |
| Output quality | First output is good | Thousandth output is still good |
| Artifacts | Prompt templates | CLAUDE.md, .claudeignore, subagents, RAG |
| Cost impact | Minimal | 5.5x fewer tokens = hundreds saved/month |
| Failure mode | Bad phrasing | Wrong or missing information |

The shift happened because modern AI applications are long-running agents, not single-turn chats. A well-crafted prompt means nothing if the agent cannot access the right files, remember what it already tried, or use the right tools at the right moment.

The 1M Token Wall

SWE-rebench maintainer @Shevan05 reported a critical finding: models hit a clear performance ceiling around 1 million tokens. Performance degrades meaningfully past this point regardless of what the context window technically supports.

This means context engineering is not optional -- it is the only way to work with large codebases. You cannot dump a 500-file repository into context and expect good results. Even if the window can hold it, the model's ability to attend to the right information drops as noise increases.

The cost angle

Every wasted token is wasted money. Claude Code uses 5.5x fewer tokens than Cursor for equivalent tasks, partly because of better context management. Over a month of heavy use, this difference saves hundreds of dollars. Context engineering is not just about quality -- it directly reduces your API bill.
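A back-of-the-envelope version of that claim, using assumed prices and usage (only the 5.5x ratio comes from this article; the rate and daily volume are placeholders, not published pricing):

```python
# Illustrative arithmetic only: rate and usage are assumptions.
PRICE_PER_MTOK = 3.00                # assumed blended $ per million tokens
BASELINE_TOKENS_PER_DAY = 2_000_000  # assumed heavy-use baseline
RATIO = 5.5                          # ~5.5x fewer tokens (per the article)

cursor_monthly = BASELINE_TOKENS_PER_DAY * 30 / 1e6 * PRICE_PER_MTOK
claude_monthly = cursor_monthly / RATIO
print(f"Baseline: ${cursor_monthly:.2f}/mo")
print(f"With 5.5x fewer tokens: ${claude_monthly:.2f}/mo")
print(f"Saved: ${cursor_monthly - claude_monthly:.2f}/mo")
```

Under these assumptions the gap lands in the hundreds-of-dollars-per-month range; plug in your own usage to check.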

The CLAUDE.md Playbook

The CLAUDE.md file is the single most important context engineering artifact. The community now treats it as essential infrastructure, not optional configuration, and writing a good one is the highest-impact context engineering move you can make.

Production CLAUDE.md template

# Project
Next.js 15, TypeScript, Tailwind, Drizzle ORM

# Architecture
- Server Components by default
- Server Actions for mutations in actions.ts
- API routes under /api
- All DB operations through Drizzle ORM

# Commands
bun run dev          # Dev server (port 3002)
bun run build        # Production build
bun run typecheck    # Type checking (run before commits)
bun run db:push      # Push schema changes

# Conventions
- Use bun, not npm
- Prefer editing existing files over creating new ones
- Run `bun run lint` before committing
- Use absolute imports (@/lib, @/components)
- Never commit .env files

# Key Paths
src/app/api/         # API routes
src/lib/db/schema.ts # Database schema (source of truth)
src/components/ui/   # Shared UI components

# Gotchas
- Dev server runs on port 3002, not 3000
- Stripe webhooks need STRIPE_WEBHOOK_SECRET
- Clerk middleware protects /dashboard/* routes
- PostHog proxied through /ingest/* (not direct)

CLAUDE.md Best Practices

  • Conciseness over completeness. It loads into every session. Every line must be universally applicable. If something only matters for one module, put it in a subdirectory CLAUDE.md instead.
  • Never send an LLM to do a linter's job. Code style guidelines add mostly-irrelevant context that degrades performance. Use linters and formatters, and have CLAUDE.md instruct the agent to run them.
  • Include an examples/ folder. AI coding assistants perform dramatically better when they can see patterns to follow. Reference the folder from CLAUDE.md instead of writing verbose instructions.
  • Document gotchas only. Things the agent would not discover on its own: non-standard ports, environment variable formats, deployment quirks, API rate limits.

Hierarchical context files

Claude Code supports CLAUDE.md at three levels: root (project-wide), subdirectory (module-specific), and user-level (~/.claude/CLAUDE.md for personal preferences). The agent merges them automatically, with more specific files taking precedence. Use subdirectory files for module-specific context that should not pollute every session.
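The precedence rule can be pictured as a key-by-key merge where more specific files win on conflicts. A toy sketch of the idea, not Claude Code's actual merge logic (the keys and values are made up):

```python
# Toy model of hierarchical context precedence: later (more specific)
# sources override earlier (broader) ones, key by key.

user_level   = {"package_manager": "npm", "style": "prefer functional"}  # ~/.claude/CLAUDE.md
project_root = {"package_manager": "bun", "port": "3000"}                # ./CLAUDE.md
subdirectory = {"port": "3002"}                                          # ./src/app/CLAUDE.md

# Dict unpacking applies left to right, so the subdirectory file wins conflicts.
merged = {**user_level, **project_root, **subdirectory}
print(merged)
# package_manager comes from the project root, port from the subdirectory file,
# and the personal style preference survives because nothing overrode it
```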

File Selection & .claudeignore

The .claudeignore pattern works like .gitignore but for AI context. It excludes files the agent should never need to read. Most developers do not use it, and their agents waste context on megabytes of irrelevant files.

.claudeignore example

# Build artifacts
dist/
.next/
build/
out/

# Dependencies (never read these)
node_modules/
.pnp.*

# Generated code
*.generated.ts
prisma/generated/

# Large binary files
*.woff2
*.png
*.jpg
*.mp4

# Lock files (use commands, not file reading)
bun.lockb
package-lock.json
yarn.lock

# Test snapshots (too large, low signal)
__snapshots__/

# Environment (security)
.env*
!.env.example

Strategic file selection is the second-highest impact context engineering move after CLAUDE.md. The agent should never be reading node_modules, build output, binary assets, or lock files. Excluding these alone can reduce context consumption by 80%+ on a typical project.
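The filtering itself is simple to sketch. A minimal stand-in using Python's stdlib fnmatch (real ignore implementations handle negation patterns like `!.env.example` and directory semantics more carefully):

```python
# Minimal .claudeignore-style filter using stdlib fnmatch.
# Note: fnmatch's "*" also matches path separators, which suits this toy.
from fnmatch import fnmatch

IGNORE = ["node_modules/*", "dist/*", "*.png", "__snapshots__/*"]

def is_ignored(path: str) -> bool:
    """True if any ignore pattern matches the path."""
    return any(fnmatch(path, pattern) for pattern in IGNORE)

files = [
    "src/lib/auth.ts",
    "node_modules/react/index.js",
    "logo.png",
    "dist/main.js",
]
readable = [f for f in files if not is_ignored(f)]
print(readable)  # only the source file survives
```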

Just-in-Time Context & Lazy Loading

Just-in-time context is Anthropic's recommended strategy for long-running agents. Instead of loading everything upfront, agents maintain lightweight references and dynamically load data at runtime.

Before & after: context loading strategies

// BAD: Load everything upfront (wastes tokens, hits 1M wall)
const context = await loadEntireCodebase();
// Result: 800K tokens, most irrelevant, model confused

// GOOD: Just-in-time loading (surgical context)
// 1. Agent reads task: "Fix auth token refresh"
// 2. Agent searches for auth-related files (WarpGrep)
// 3. Loads src/lib/auth.ts (the relevant file)
// 4. Discovers import from db/schema.ts
// 5. Loads schema.ts on demand
// Result: 2 files in context, not 200
// Same task, dramatically better output
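The same loop can be made runnable as a toy. The in-memory "repo" and the regex import matcher below are stand-ins for a real agent's file system and search tool:

```python
# Runnable toy of just-in-time loading: start from one search hit, then pull
# in files only as references to them are discovered.
import re

REPO = {
    "src/lib/auth.ts": 'import { users } from "src/db/schema.ts";\n// token refresh logic',
    "src/db/schema.ts": "export const users = {};",
    "src/unrelated.ts": "export const noise = 1;",
}

def load_just_in_time(entry: str) -> dict:
    """Load the entry file, then follow discovered imports on demand."""
    context, queue = {}, [entry]
    while queue:
        path = queue.pop()
        if path in context or path not in REPO:
            continue
        context[path] = REPO[path]
        # Discover referenced files and queue them lazily.
        queue += re.findall(r'from "([^"]+)"', context[path])
    return context

context = load_just_in_time("src/lib/auth.ts")
print(sorted(context))  # 2 files loaded; the unrelated file never enters context
```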

MCP Tool Search: 95% Context Reduction

Claude Code's tool lazy loading reduces context by 95% by not loading tool definitions until needed. Instead of every MCP tool definition consuming context from the start, the agent discovers and loads tools on demand. This is context engineering applied to the agent's own capabilities -- even tool descriptions are loaded just-in-time.
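The mechanism can be sketched as a registry that holds cheap names up front and loads heavy definitions only on first use. This is a conceptual sketch, not Claude Code's implementation; the tool names and schemas are made up:

```python
# Sketch of tool lazy loading: only names in context at startup, full
# (token-heavy) definitions fetched on demand.

FULL_DEFINITIONS = {
    "search_code": {"description": "(long schema)", "parameters": {"query": "string"}},
    "run_tests": {"description": "(long schema)", "parameters": {"path": "string"}},
}

class LazyToolRegistry:
    def __init__(self, names: list):
        self.names = names        # cheap: a few tokens per tool
        self.loaded = {}          # expensive definitions live here once needed

    def get(self, name: str) -> dict:
        if name not in self.loaded:
            self.loaded[name] = FULL_DEFINITIONS[name]  # loaded just-in-time
        return self.loaded[name]

registry = LazyToolRegistry(["search_code", "run_tests"])
registry.get("search_code")
print(len(registry.loaded), "of", len(registry.names), "definitions loaded")
```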

Subagent Context Isolation

Subagent context isolation is the most powerful context engineering pattern for large tasks. When you spawn a subagent in Claude Code, it gets its own context window with its own tool permissions. This is divide and conquer applied to context management.

Main Agent

Holds project-level context: CLAUDE.md, task plan, high-level progress. Never polluted with file-level details from specialized work.

Search Subagent

Gets codebase access and search tools. Reads many files, returns only the relevant results. Its context stays isolated from the main thread.

Apply Subagent

Gets exactly 3 pieces of context: instruction, code, update. Merges the edit and returns the result. No planning context, no irrelevant files.

Defining a subagent in .claude/agents/

# .claude/agents/code-reviewer.md
You are a code review specialist.

## Your context
- Only review the diff provided to you
- Check for: security issues, performance problems,
  missing error handling, type safety gaps
- Do NOT suggest style changes (linters handle that)

## Your tools
- Read files (to check surrounding context)
- Grep (to find related patterns)
- No write access (review only)

## Output format
Return a list of issues with severity (critical/warning/info),
file path, line number, and suggested fix.

The key insight: each subagent gets exactly the context it needs for its specific task and nothing else. The main conversation stays focused on orchestration. No context pollution between tasks.
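A toy orchestration sketch of the pattern: each subagent function receives only its slice of context, and only compact results flow back to the main thread. The merge logic is a stand-in, not a real apply model:

```python
# Toy subagent isolation: the search subagent reads the whole repo, but the
# main agent only ever sees the short list of hits it returns.

def search_subagent(query: str, files: dict) -> list:
    """Reads many files; returns only matching paths. Its inputs never
    enter the main agent's context."""
    return [path for path, text in files.items() if query in text]

def apply_subagent(instruction: str, code: str, update: str) -> str:
    """Gets exactly three pieces of context and returns the merged result."""
    return code.replace("// TODO", update)  # stand-in for a real merge model

main_context = ["CLAUDE.md", "task plan"]  # stays small throughout
repo = {"auth.ts": "refresh() { // TODO }", "ui.ts": "render() {}"}

hits = search_subagent("refresh", repo)    # subagent reads repo; main agent doesn't
result = apply_subagent("fix refresh", repo[hits[0]], "retryWithBackoff()")
print(hits, "->", result)
```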

Compaction & Long-Running Agents

Compaction takes a conversation nearing the context window limit, summarizes its contents, and restarts with the summary. Anthropic recommends combining this with two practices:

  • Git commits as checkpoints: Commit progress with descriptive messages so the agent can use git log and git diff to reconstruct state after compaction.
  • Progress files: Write summaries to a progress file that the agent reads after compaction to understand what was completed, what is in progress, and what is blocked.

Compaction-friendly workflow

# Agent commits progress as checkpoints
git commit -m "Refactored auth: extracted token refresh
into separate service, added retry with backoff"

# Agent writes progress summary
# .claude/progress.md
## Session 3 Progress
- DONE: Auth token refresh refactor
- DONE: Added retry logic with exponential backoff
- IN PROGRESS: Rate limiting middleware
- BLOCKED: Need REDIS_URL env var for rate limiter
- FILES MODIFIED: src/lib/auth.ts, src/services/token.ts

# After compaction, agent reads:
# 1. CLAUDE.md (project context)
# 2. .claude/progress.md (what happened)
# 3. git log --oneline -10 (recent commits)
# Result: full working state reconstructed

The goal: make compaction lossless in practice. The agent reconstructs everything it needs from git history, progress files, and the current codebase state.

Retrieval & Agentic RAG

RAG has evolved from static pipelines into context engines. Simply retrieving text snippets via vector search is not enough. Context has to be governed, explainable, and adaptive to the agent's purpose.

| Generation | Approach | Context Quality |
|---|---|---|
| Static RAG (2023) | Vector search, top-k, generate | Noisy, irrelevant chunks |
| Advanced RAG (2024) | Reranking, query expansion, hybrid search | Better relevance, still static |
| Agentic RAG (2026) | Agent-driven retrieval with reflection | High-signal, adaptive, governed |

Agentic RAG embeds autonomous agents into the retrieval pipeline. Instead of static "query, retrieve, generate," the agent decides what to retrieve, evaluates result quality, and iterates until context is sufficient. For coding agents, this means understanding the task, determining relevant code structures, searching across files and dependencies, and loading additional context only when needed.
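The retrieve-reflect-refine loop can be sketched in a few lines. The corpus, the substring matching, and the reflection step below are all stand-ins for real search and model calls:

```python
# Sketch of agentic retrieval: retrieve, reflect on sufficiency, refine the
# query, and iterate, instead of a single static top-k pass.

DOCS = {
    "auth token refresh": "src/lib/auth.ts handles token refresh",
    "auth middleware": "src/middleware.ts guards routes",
    "ui theming": "tailwind config",
}

def retrieve(query: str) -> list:
    """Stand-in for vector/hybrid search: keyword overlap with doc keys."""
    return [text for key, text in DOCS.items() if any(w in key for w in query.split())]

def sufficient(results: list) -> bool:
    """Stand-in for the agent's reflection step."""
    return any("refresh" in r for r in results)

def agentic_rag(task: str, max_rounds: int = 3) -> list:
    query = task
    for _ in range(max_rounds):
        results = retrieve(query)
        if sufficient(results):
            return results
        query += " refresh"  # stand-in for agent-driven query refinement
    return results

print(agentic_rag("auth bug"))  # the "ui theming" doc never enters context
```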

Augment Code's Context Engine is the enterprise expression of this: it indexes entire stacks (code, dependencies, architecture, git history) and charges premium prices for it. The market values context engineering highly enough that companies pay enterprise rates for better context selection.

Before & After: Context Engineering in Practice

The difference between bad and good context engineering is dramatic. Same model, same task, completely different output quality.

BAD: No context engineering

# What most developers do:
"Here's my entire codebase. Fix the authentication bug."

# What happens:
# - Agent reads 200+ files (most irrelevant)
# - 800K tokens consumed
# - Model hallucinates file paths
# - Edits wrong files
# - Takes 3 minutes, fails, requires re-prompting
# - Cost: ~$4.80 in API tokens
# Result: broken edit, wasted tokens, frustrated developer

GOOD: With context engineering

# 1. CLAUDE.md provides project structure (always loaded)
# 2. .claudeignore excludes node_modules, dist, etc.
# 3. Agent uses WarpGrep: "auth token refresh logic"
#    → Returns: src/lib/auth.ts, src/middleware.ts
# 4. Agent loads only those 2 files into context
# 5. Specific instruction with reproduction steps:

"Fix token refresh in src/lib/auth.ts.
The refresh token call at line 47 doesn't handle
expired refresh tokens. Add a catch that redirects
to /login when the refresh token itself is expired.
See the error in middleware.ts:23."

# 6. Apply step isolated in its own context via Morph
# Cost: ~$0.35 in API tokens
# Result: correct edit, 14x cheaper, first attempt success

Context Engineering for the Apply Step

The apply step is a pure context engineering problem. An edit needs exactly three pieces of context: the original file, the edit intent, and the update snippet. Too little and the merge fails. Too much and the model gets confused by irrelevant code.
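The three-piece contract can be sketched with a toy merge function. This marker-based replace is a stand-in for a dedicated apply model, which infers placement from the code itself rather than needing an explicit marker:

```python
# Toy apply step: instruction + original code + update snippet in,
# full merged file out. Nothing else enters this context.

def fast_apply(instruction: str, code: str, update: str) -> str:
    """Merge an update into the original code. A real apply model infers
    where the update belongs; this toy replaces an explicit marker."""
    return code.replace("// existing refresh logic", update)

original = "function refresh() {\n  // existing refresh logic\n}"
update = "return retry(refreshToken, { backoff: true });"
print(fast_apply("add retry to refresh", original, update))
```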

Morph's Fast Apply model is purpose-built for this. It takes instruction + code + update and returns the complete merged output at over 10,500 tokens per second. By isolating the merge in a specialized model, the coding agent's primary context stays clean for planning and reasoning.

WarpGrep handles the retrieval side: parallel semantic searches that return only high-signal results. Together, they implement the context engineering stack: retrieve the minimum viable context, then merge with surgical precision.

Frequently Asked Questions

What is context engineering?

Context engineering is designing systems that give AI models the right information at the right time. It encompasses system prompts, conversation history, retrieved documents, tool outputs, memory, and structured data -- the entire information environment, not just the prompt.

How is context engineering different from prompt engineering?

Prompt engineering is a subset. It focuses on how to phrase a single input. Context engineering focuses on what information the model has access to across sessions. Prompt engineering gets the first output right. Context engineering keeps the thousandth output right.

What is the 1M token wall?

SWE-rebench maintainer @Shevan05 found that models hit a clear performance ceiling around 1 million tokens. Performance degrades past this point regardless of window size. This makes context engineering mandatory for large codebases.

What is CLAUDE.md and why does it matter?

CLAUDE.md is the agent's constitution -- a project context file loaded into every Claude Code session. The community treats it as essential infrastructure, not optional config. It is the single highest-impact context engineering artifact.

What is .claudeignore?

Like .gitignore but for AI context. It excludes node_modules, build artifacts, binary files, and generated code. Most developers do not use it, wasting context on megabytes of irrelevant files. Simple to add, immediate impact.

What is subagent context isolation?

The most powerful pattern for large tasks. Each subagent gets its own context window with its own tool permissions. The main conversation stays clean while specialized agents handle isolated tasks with exactly the context they need.

How does context engineering reduce costs?

Every wasted token costs money. Claude Code uses 5.5x fewer tokens than Cursor for equivalent tasks, partly through better context management. Over a month of heavy use, this saves hundreds of dollars in API costs.

Context Engineering Infrastructure

Morph provides the specialized tools that make context engineering work: WarpGrep for high-precision code search and Fast Apply for deterministic merges at 10,500+ tokens per second. Keep your agent's context clean.