Agentic Context Engineering: How AI Agents Manage Their Own Context

Agentic context engineering is the discipline of curating, compressing, and routing tokens to AI agents during autonomous task execution. Learn how Claude Code, Codex, Devin, and Manus solve context rot, tool observation accumulation, and multi-step reasoning.

February 27, 2026 · 8 min read

Context engineering is the discipline of filling an AI model's context window with the right information. Agentic context engineering is what happens when the agent must do this itself, autonomously, across hundreds of tool calls and millions of tokens of intermediate state, with no human curating what goes in.

90.2% · Multi-agent gain over single-agent (Anthropic)
4x · Failure-rate increase when task duration doubles
100:1 · Input-to-output token ratio (Manus)
95% · Context reduction via lazy tool loading

What Is Agentic Context Engineering?

Andrej Karpathy defined context engineering as "the delicate art and science of filling the context window with just the right information for the next step." In a chatbot, a human does this by writing a good prompt and attaching relevant files. In an agentic system, the agent must do it for itself.

Agentic context engineering is the set of strategies an AI agent uses to curate and maintain its own context during autonomous task execution. This includes deciding what to retrieve, when to compress, which tasks to delegate to subagents, and what to discard from the context window entirely.

The distinction matters because agents face problems humans never do. A human using Claude or ChatGPT can paste in exactly the files they want analyzed. An autonomous agent processing a 500-file codebase must figure out which 3 files are relevant. A human can restart a conversation when it gets unwieldy. An agent running a 25-hour task needs to manage its own context window continuously, without supervision.

Agentic vs. general context engineering

General context engineering covers the full discipline: CLAUDE.md files, .claudeignore rules, prompt design, RAG pipelines, and manual curation. Agentic context engineering is the subset that happens during autonomous execution, where the agent itself makes all context management decisions. This page focuses on the agentic side. For the general discipline, see the complete context engineering guide.

Context Rot: The Core Problem Agents Face

Context rot is the performance degradation that occurs as an agent's context window fills during long-running tasks. It is the central problem that agentic context engineering exists to solve.

Chroma Research tested 18 frontier models and found a universal pattern: every model gets worse as input length increases. Models advertising 200K-token windows become unreliable well before that limit. The mechanisms compound each other:

Lost-in-the-Middle

Performance drops over 30% when relevant information sits in the middle of the context rather than at the beginning or end. As agent sessions grow, critical early decisions get buried.

Attention Dilution

Transformer attention scales quadratically with sequence length. At 10K tokens the model weighs roughly 100M pairwise relationships; at 100K tokens, roughly 10B. More context does not just dilute relevance. It makes the model measurably worse at attending to any individual token.

Observation Accumulation

Each tool call adds thousands of tokens. Processing 5-7 operations generates 5K+ tokens of results. Without management, input tokens hit 100K within minutes of autonomous work.

Research on long-running agents shows that every agent's success rate decreases after 35 minutes of continuous operation. Doubling task duration quadruples the failure rate. This is not a model quality problem. It is a context management problem. The agents that solve it outperform those that do not, regardless of which underlying model they use.

Automated Compaction: Fighting Context Rot at Scale

Compaction is the practice of summarizing a conversation nearing the context limit and restarting with the summary. In agentic systems, this must happen automatically. No human is watching the token counter.
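The trigger logic can be sketched in a few lines. This is a minimal illustration, not any specific agent's implementation: the `summarize_fn` stands in for an LLM summarization call, and the 4-characters-per-token estimate is a rough heuristic.

```python
# Minimal sketch of an automated compaction trigger. The summarizer is a
# placeholder (hypothetical summarize_fn); real systems call an LLM here.

def estimate_tokens(text: str) -> int:
    # Rough heuristic: ~4 characters per token for English text.
    return len(text) // 4

def maybe_compact(history: list[str], limit: int, threshold: float,
                  summarize_fn) -> list[str]:
    """Replace history with a summary once usage crosses the threshold."""
    used = sum(estimate_tokens(m) for m in history)
    if used < threshold * limit:
        return history  # plenty of room; leave history untouched
    summary = summarize_fn(history)
    # Restart with the summary plus the most recent message for continuity.
    return [summary, history[-1]]

# Example: trigger at 95% of a tiny, illustrative 60-token window.
history = ["read file A " * 10, "ran tests " * 10, "fixed bug in auth.ts"]
compacted = maybe_compact(history, limit=60, threshold=0.95,
                          summarize_fn=lambda h: f"[summary of {len(h)} messages]")
```

The important property is that the check runs on every loop iteration, so compaction fires without any human watching the token counter.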

| Agent | Trigger | Strategy |
| --- | --- | --- |
| Claude Code | ~95% context capacity | Hierarchical summarization, git checkpoints, progress files |
| OpenAI Codex | Token threshold exceeded | Replace input with smaller representative list |
| Manus | Workflow phase boundaries | Aggressive tool output pruning, rolling summaries |
| Devin | Context-aware (model senses limit) | Subagent offloading, LLM-driven history compression |

Factory AI evaluated compression strategies across real-world agent sessions spanning debugging, code review, and feature implementation. Their finding: structured summarization retains more useful information than generic approaches. Structured summaries use explicit sections for session intent, file modifications, decisions made, and next steps. This format gives the agent a clear map of what happened, not a narrative blob it has to parse.

Structured compaction output (what the agent sees after compaction)

## Session State After Compaction
### Intent
Migrate authentication from JWT to session-based auth

### Completed
- Removed JWT middleware from src/middleware.ts
- Added session store using Redis (src/lib/session.ts)
- Updated 4 API routes to use session validation
- Committed: "Replace JWT auth with Redis sessions" (a3f8c21)

### In Progress
- Rate limiting middleware needs session-aware logic

### Key Decisions
- Chose Redis over Postgres for sessions (latency)
- Kept JWT for API-key auth (backward compat)

### Files Modified
src/middleware.ts, src/lib/session.ts, src/lib/auth.ts,
src/app/api/chat/route.ts, src/app/api/usage/route.ts

Manus identified KV-cache hit rate as the critical metric for production agents. Their input-to-output token ratio averages 100:1. When compaction invalidates the KV-cache, all subsequent inference steps slow down. This is why Manus avoids dynamically adding or removing tools mid-iteration: tool definitions live near the front of context, and any change invalidates the cache for all following actions.
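Why front-of-context edits are so expensive can be shown with a toy prefix model. This is an illustration only, not how any inference engine computes its cache keys: cached attention state is reusable only up to the longest unchanged prefix, so a change near the front discards almost everything.

```python
# Toy illustration of KV-cache prefix reuse (not a real inference engine):
# cached work is reusable only for the longest common prefix of the old and
# new context, so edits near the front invalidate nearly all cached state.

def reusable_prefix(old_ctx: list[str], new_ctx: list[str]) -> int:
    """Number of leading segments shared by both contexts."""
    n = 0
    for a, b in zip(old_ctx, new_ctx):
        if a != b:
            break
        n += 1
    return n

tools = ["tool: read_file", "tool: run_tests"]
history = ["user: fix the bug", "agent: reading auth.ts"]

# Append-only update: everything cached so far stays valid.
appended = tools + history + ["agent: running tests"]
assert reusable_prefix(tools + history, appended) == 4

# Removing a tool definition at the front invalidates the whole cache tail.
edited = ["tool: read_file"] + history + ["agent: running tests"]
assert reusable_prefix(tools + history, edited) == 1
```

This is why stable tool definitions at the front of context matter: the append-only path reuses all four cached segments, while the front edit reuses only one.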

Morph Compact for agent context

Morph Compact provides context compression as infrastructure. Instead of building custom compaction logic, agents call Compact to reduce their context to the minimum viable token set. It preserves file references, decision history, and task state at over 10,500 tokens per second, fast enough to run inline without adding latency to the agent loop.

Subagent Delegation: Divide and Conquer for Context

Subagent delegation is the most effective technique for keeping an agent's primary context clean. Instead of one agent that searches, plans, writes code, runs tests, and reviews within a single context window, specialized subagents handle isolated tasks in their own context windows.

Anthropic's multi-agent research system uses an orchestrator-worker pattern. The lead agent decomposes queries into subtasks and spawns subagents to explore different aspects simultaneously. Each subagent gets its own context window, custom system prompt, specific tool access, and independent permissions. The result: 90.2% improvement over single-agent performance on research evaluations.

Context flow in a multi-agent coding task

// Orchestrator agent (clean context: ~8K tokens)
// Task: "Add rate limiting to the API"

// Step 1: Spawn retrieval subagent
//   Context: task description + codebase access
//   Action: searches for API routes, middleware patterns
//   Returns: 3 relevant file paths + key code snippets
//   Subagent context: discarded (not in orchestrator)

// Step 2: Spawn planning subagent
//   Context: task + file snippets from Step 1
//   Action: produces implementation plan
//   Returns: ordered list of changes
//   Subagent context: discarded

// Step 3: Orchestrator executes changes
//   Context: plan + only files being edited
//   Each file edit isolated via Morph Fast Apply

// Step 4: Spawn test subagent
//   Context: changed files + test framework config
//   Action: writes and runs tests
//   Returns: pass/fail + coverage report

// Total: orchestrator never exceeded 15K tokens
// Each subagent used 5-20K tokens independently
// Single-agent approach would have used 80K+ tokens

Cognition, the company behind Devin, found that context retrieval consumes 60% or more of agent time in coding workflows. This is exactly the work that benefits most from delegation. A retrieval subagent can explore extensively, using tens of thousands of tokens, but returns only a condensed summary of 1,000 to 2,000 tokens to the orchestrator.

Devin itself operates as a swarm of specialized models: a Planner for high-reasoning tasks, a Coder for implementation, a Critic for security review, and a Browser for documentation synthesis. Each model processes only the context relevant to its specialty.

Just-in-Time Retrieval: Load Only What You Need

The opposite of good agentic context engineering is loading everything upfront. Agents that dump entire codebases, full documentation sets, or complete conversation histories into context fail faster and cost more.

Just-in-time retrieval means the agent maintains lightweight references (file paths, URLs, stored queries) and loads data into context only when the current step requires it. Anthropic recommends this as the primary strategy for long-running agents.

Just-in-time vs. upfront loading

// UPFRONT (bad): Agent loads everything at session start
// 500 files loaded → 400K tokens consumed
// Agent edits 3 files → 397 files were noise
// Cost: high. Quality: degraded by irrelevant context.

// JUST-IN-TIME (good): Agent loads per-step
// Step 1: Read task → "Fix auth bug in refresh flow"
// Step 2: WarpGrep search → finds auth.ts, middleware.ts
// Step 3: Load auth.ts (2K tokens)
// Step 4: Read error → needs session.ts
// Step 5: Load session.ts (1.5K tokens)
// Total context: 3.5K tokens of code (+ task + system prompt)
// Same task, 100x less context, better results.

Claude Code implements this pattern through MCP Tool Search with lazy loading. Tool definitions are not loaded into context until the agent needs them. This single optimization reduces context usage by 95% for agents with access to many tools. It is agentic context engineering applied to the agent's own capability surface.
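The lazy-loading pattern itself is simple to sketch. This is a generic illustration, not the actual MCP Tool Search implementation: the context starts with only tool names, and a full definition is materialized the first time the agent requests it.

```python
# Generic sketch of lazy tool loading (not the actual MCP implementation):
# the context holds only a name index; full schemas load on first use.

class LazyToolRegistry:
    def __init__(self, definitions: dict[str, str]):
        self._definitions = definitions   # full schemas, kept out of context
        self.loaded: dict[str, str] = {}  # definitions currently in context

    def names(self) -> list[str]:
        # The agent's context starts with just this lightweight index.
        return sorted(self._definitions)

    def load(self, name: str) -> str:
        # Pull a full definition into context only when the step needs it.
        if name not in self.loaded:
            self.loaded[name] = self._definitions[name]
        return self.loaded[name]

registry = LazyToolRegistry({
    "read_file": "read_file(path) -> contents ... (long JSON schema)",
    "run_tests": "run_tests(target) -> report ... (long JSON schema)",
})
registry.load("read_file")  # only this schema enters the context
```

With hundreds of tools, the savings compound: the index costs a few tokens per tool, while each unloaded schema can cost hundreds.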

WarpGrep provides the same pattern for code retrieval. Instead of loading entire repositories, the agent calls WarpGrep with a semantic query and gets back only the files and functions relevant to its current task. This is surgical retrieval designed for agent workflows.

Reversible Compression: Drop Content, Keep References

Not all context needs to stay in the window. Reversible compression removes content from active context while preserving references that let the agent reload it later. A web page's full HTML can be dropped if the URL stays. A file's contents can be omitted if the path remains accessible. The agent operates with a smaller active window while maintaining access to the full information surface.

This technique is distinct from summarization. Summarization replaces detailed content with a condensed version, losing fidelity. Reversible compression removes content entirely but keeps a pointer, losing nothing. The tradeoff is latency: reloading from a reference takes a tool call. But for information the agent might need only 10% of the time, this is the right tradeoff.
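A minimal sketch of the pattern, using hypothetical helper names: content is dropped from the active window, but the pointer stays, so one tool call restores full fidelity.

```python
# Sketch of reversible compression (hypothetical helpers): drop bulky content
# from active context, keep a pointer, reload only if it is needed again.

active_context: dict[str, str] = {}  # what the model actually sees
references: dict[str, str] = {}      # pointers to dropped content

def load(ref: str, fetch) -> str:
    """Load content into active context, remembering how to refetch it."""
    active_context[ref] = fetch(ref)
    references[ref] = ref
    return active_context[ref]

def compress(ref: str) -> None:
    """Reversible: remove the content but keep the reference."""
    active_context.pop(ref, None)  # content gone from the window...

def reload(ref: str, fetch) -> str:
    """...but the pointer brings it back with one tool call."""
    return load(references[ref], fetch)

fake_fs = {"src/auth.ts": "export function login() { /* 2K tokens */ }"}
load("src/auth.ts", fake_fs.get)
compress("src/auth.ts")             # active context shrinks
reload("src/auth.ts", fake_fs.get)  # fidelity fully restored
```

Contrast with summarization: after a summary, the original bytes are gone; here, `reload` returns them exactly.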

Practical example

An agent reviews a 50-file pull request. After reading each file and noting issues, it drops the file contents from context but keeps the file paths and issue summaries. By the time it writes its review, active context contains only the issue list and file references, not 50 full files. If it needs to re-check a specific line, it reloads that one file.

Manual vs. Agentic Context Engineering

Manual context engineering and agentic context engineering are complementary. Manual engineering sets the foundation. Agentic engineering manages the runtime.

| Aspect | Manual (Human-Driven) | Agentic (Agent-Driven) |
| --- | --- | --- |
| When it happens | Before execution (setup) | During execution (runtime) |
| Who decides | Developer | The agent itself |
| Artifacts | CLAUDE.md, .claudeignore, prompt templates | Compaction summaries, subagent spawns, retrieval calls |
| Scope | Session-level (one conversation) | Task-level (hours or days of work) |
| Failure mode | Wrong files included, missing context | Context rot, observation accumulation, cache invalidation |
| Example tools | CLAUDE.md, .claudeignore, few-shot examples | Auto-compact, WarpGrep, Morph Compact, subagent delegation |

Both layers matter. A well-written CLAUDE.md gives the agent a strong starting context. Good .claudeignore rules prevent noise from entering the window in the first place. But once the agent starts a multi-hour autonomous run, it needs to manage its own context. Manual engineering cannot anticipate every retrieval decision, compaction trigger, or delegation opportunity that arises during execution.

The Three Context Challenges for Coding Agents

Coding agents face specific context engineering problems that distinguish them from general-purpose agents. These three dominate:

1. Tool Observation Accumulation

Every file read, every grep result, every test run, every shell command adds tokens to context. An agent processing 5 to 7 operations generates over 5,000 tokens of tool results. A typical debugging session involves dozens of tool calls. Without active management, the context window fills with stale observations from earlier steps that are no longer relevant.
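One common mitigation is to keep only the most recent tool observations verbatim and truncate older ones to a one-line stub the agent can re-fetch if needed. This is a generic sketch, not any specific agent's implementation:

```python
# Generic sketch of observation pruning: keep the newest tool results
# verbatim, truncate older ones to a short stub.

def prune_observations(observations: list[dict], keep_recent: int = 3) -> list[dict]:
    pruned = []
    cutoff = len(observations) - keep_recent
    for i, obs in enumerate(observations):
        if i < cutoff:
            # Older observation: keep only the first line, capped at 80 chars.
            stub = obs["result"].splitlines()[0][:80]
            pruned.append({"tool": obs["tool"], "result": f"[truncated] {stub}"})
        else:
            pruned.append(obs)  # recent observations stay verbatim
    return pruned

observations = [
    {"tool": "grep", "result": "auth.ts:12: validateToken(...)\n" + "..." * 500},
    {"tool": "read_file", "result": "contents of middleware.ts\n" + "..." * 500},
    {"tool": "run_tests", "result": "3 passed, 1 failed: test_refresh"},
]
pruned = prune_observations(observations, keep_recent=1)
```

The tradeoff is the same as reversible compression: if a truncated observation turns out to matter, the agent repeats the tool call rather than carrying stale output forever.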

2. Codebase Navigation

A 500-file repository does not fit in any context window. The agent must search surgically, loading only the files relevant to the current step. This requires effective code search, not just keyword grep but semantic understanding of which files relate to the current task. This is exactly what WarpGrep provides: parallel semantic search that returns high-signal results instead of everything that matches a string pattern.

3. The Apply Step

Merging an edit into a file requires exactly three pieces of context: the original file, the edit intent, and the update snippet. Too much surrounding context confuses the merge. Too little and the model cannot locate the edit target. Morph's Fast Apply model isolates this step in a specialized context, keeping the primary agent's window clean for planning and reasoning.
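The three-piece context can be made concrete with a hedged sketch. This is illustrative prompt assembly, not Morph's actual Fast Apply interface: the point is that nothing beyond the original file, the intent, and the snippet enters the merge context.

```python
# Illustrative assembly of an apply-step context (not Morph's actual API):
# exactly three pieces — original file, edit intent, update snippet.

def build_apply_context(original: str, intent: str, snippet: str) -> str:
    return (
        "## Edit intent\n" + intent.strip() + "\n\n"
        "## Update snippet\n" + snippet.strip() + "\n\n"
        "## Original file\n" + original
    )

prompt = build_apply_context(
    original="export function signup(data) {\n  return db.insert(data)\n}\n",
    intent="Add input validation before the insert",
    snippet="if (!data.email) throw new Error('email required')",
)
```

Everything else the primary agent has accumulated, such as planning notes, test output, and unrelated files, stays out of this context by construction.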

Context budget for a typical coding agent task

// Task: "Add input validation to the signup form"

// Context budget breakdown:
// System prompt + CLAUDE.md:     ~3,000 tokens
// Task description:              ~200 tokens
// Retrieved files (2-3 files):   ~4,000 tokens
// Tool definitions (loaded):     ~1,500 tokens
// Conversation history:          ~2,000 tokens
// -----------------------------------------
// Total active context:          ~10,700 tokens

// Compare to naive approach:
// System prompt + CLAUDE.md:     ~3,000 tokens
// Entire src/ directory:         ~180,000 tokens
// All tool definitions:          ~30,000 tokens
// Full conversation history:     ~15,000 tokens
// -----------------------------------------
// Total:                         ~228,000 tokens
// 21x more tokens. Slower. More expensive. Worse output.

Infrastructure for Agentic Context Engineering

Agentic context engineering is not just a set of techniques. It requires infrastructure. Agents need fast compression, surgical retrieval, and isolated execution contexts. Building these from scratch for every agent is wasteful.

Morph Compact

Compresses agent context to the minimum viable token set. Preserves file references, decision history, and task state. Runs at 10,500+ tok/s, fast enough for inline agent use. Purpose-built for structured agent context, not generic summarization.

WarpGrep

Parallel semantic code search that returns only high-signal results. Agents get exactly the files relevant to the current step. No noise from unrelated modules. API-first: works as a subagent in any framework.

Together, Compact and WarpGrep implement the two most critical agentic context engineering operations: compression (reduce what you have) and retrieval (load only what you need). Both run fast enough to operate inline during agent execution, which is a hard requirement. An agent cannot pause for 10 seconds to compress its context. The compression must be nearly invisible in the agent loop.

Frequently Asked Questions

What is agentic context engineering?

Agentic context engineering is the discipline of curating, compressing, and routing the optimal set of tokens to AI agents during autonomous task execution. The agent itself manages what enters and exits its context window, without human curation. It encompasses automated compaction, subagent delegation, just-in-time retrieval, and reversible compression.

How is it different from regular context engineering?

Regular context engineering involves a human setting up CLAUDE.md files, .claudeignore rules, and crafting targeted prompts. Agentic context engineering is what happens during autonomous execution: auto-compaction triggers without human input, subagent delegation happens based on the agent's assessment of task complexity, and retrieval decisions are made in real time. The two are complementary. Manual engineering sets the foundation, agentic engineering manages the runtime.

What is context rot and why does it matter?

Context rot is performance degradation as an agent's context fills during long tasks. Chroma Research found every frontier model gets worse as input length increases. The lost-in-the-middle effect drops performance by over 30%. Doubling task duration quadruples the failure rate. Context rot is the primary failure mode for autonomous coding agents.

How do Claude Code and OpenAI Codex handle context?

Claude Code auto-compacts at ~95% context capacity using hierarchical summarization, combined with git checkpoints and progress files. Codex replaces input with a smaller representative list when tokens exceed a threshold. Codex demonstrated a 25-hour uninterrupted run using ~13M tokens, which is only possible with robust autonomous context management.

What role do subagents play in context engineering?

Subagents provide context isolation. Each subagent gets its own context window with its own tools. A retrieval subagent might use 20K tokens exploring a codebase but returns only a 1K-token summary to the orchestrator. Anthropic reported a 90.2% improvement when using multi-agent systems compared to single-agent approaches, largely due to better context management.

How does Morph help with agentic context engineering?

Morph Compact provides context compression as infrastructure, reducing agent context to the minimum viable token set at 10,500+ tokens per second. WarpGrep provides surgical code retrieval via parallel semantic search. Together, they handle the two most critical operations: compressing what you have and retrieving only what you need.

Context Engineering Infrastructure for Agents

Morph Compact compresses agent context to the minimum viable token set. WarpGrep retrieves exactly the code your agent needs. Both run fast enough for inline use during autonomous execution.