Claude Code Context Window: Limits, Compaction, and How to Manage It

Claude Code has a 200K token context window. Performance degrades around 147K-152K tokens. Auto-compaction triggers at 64-75% capacity. This guide covers exact token breakdowns, compaction behavior, subagent isolation, CLAUDE.md strategies, and how to keep your agent effective across long sessions.

February 27, 2026

Claude Code's context window is 200,000 tokens. That sounds like a lot, but system prompts, tool definitions, MCP schemas, and memory files consume 30,000-40,000 tokens before you type anything. Performance starts degrading around 147K tokens, not 200K. This guide covers the exact token breakdown, how auto-compaction works, and seven strategies to keep your agent effective across long coding sessions.

- 200K: total context window (tokens)
- ~147K: effective quality ceiling
- 64-75%: auto-compaction trigger point
- 8-10%: tokens consumed by tool definitions

How the 200K Window Gets Allocated

Claude Code's 200K tokens are not all yours. The window is shared across every component the agent needs to function. Run /context in any Claude Code session to see the exact breakdown.

| Component | Tokens | % of Window | Notes |
| --- | --- | --- | --- |
| System prompt | ~2,600 | 1.3% | Base instructions for the agent |
| System tools | ~17,600 | 8.8% | Read, Write, Bash, Grep, etc. |
| MCP tools | 900-51,000 | 0.5-25% | Varies wildly by server count |
| Custom agents | ~935 | 0.5% | Subagent definitions |
| Memory files | ~302 | 0.2% | CLAUDE.md content |
| Autocompact buffer | ~33,000 | 16.5% | Reserved for compaction process |
| Free for conversation | ~114,000 | 57% | What you actually get to use |

In a clean session with minimal MCP tools, you get about 160K-170K tokens for actual work. Add a few MCP servers and that drops to 120K-130K. Add many MCP servers and you can lose 50K+ tokens to tool schemas before the session begins.

The /context command is your diagnostic tool

Run /context regularly during sessions to see where tokens are going. It is the fastest way to identify whether MCP tools, large file reads, or accumulated conversation history is eating your context budget.

When Performance Actually Degrades

The 200K limit is a hard ceiling. But the effective quality ceiling is much lower. Geoffrey Huntley, an engineer at Sourcegraph (makers of the Amp coding agent), found that context quality degrades around 147,000-152,000 tokens. That is 25% below the advertised limit.

This is the lost-in-the-middle problem. LLMs pay the most attention to tokens at the start and end of the context. Information in the middle gets deprioritized. As your session grows, earlier file reads, error messages, and decisions gradually lose influence on the model's output.

- 200K: advertised context window
- ~150K: quality starts degrading
- ~114K: usable after system overhead

The practical consequence: a Claude Code session at 60% capacity with 40% noise tokens produces worse output than the same agent at 30% capacity with clean context. Token quantity is not the bottleneck. Token quality is.

Context rot is not unique to Claude

Every LLM exhibits degraded recall and accuracy as context length increases. Research consistently shows this across GPT-4, Claude, Gemini, and open-source models. The context rot problem is a fundamental property of transformer attention, not a bug in any specific model.

How Auto-Compaction Works

When Claude Code approaches the context limit, it runs auto-compaction: an automated process that summarizes conversation history to free up space. This keeps sessions running indefinitely without hard crashes, but the summarization is lossy.

The Compaction Process

  1. Tool outputs cleared first. Old file reads, grep results, and bash outputs are removed or truncated. These are the largest and least valuable tokens in a long session.
  2. Conversation summarized. The full conversation history gets condensed into a structured summary: what was completed, what is in progress, what files were modified.
  3. Session restarts with summary. The compacted summary becomes the new context baseline. The agent continues from there.
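The three steps above can be sketched as a short loop. This is an illustrative stand-in, not Claude Code's actual implementation: the real summarizer is an LLM call, and all names here are hypothetical.

```python
# Illustrative sketch of compaction; the real process uses an LLM
# to write the summary. All names here are hypothetical.

def compact(messages, keep_recent=2):
    """Drop old tool outputs, summarize the rest, keep recent turns."""
    # Step 1: clear old tool outputs (the largest, least valuable tokens)
    trimmed = [m for m in messages[:-keep_recent] if m["role"] != "tool"]
    # Step 2: condense the remaining history into a structured summary
    summary = {
        "role": "system",
        "content": "Summary of prior work: " + "; ".join(
            m["content"][:40] for m in trimmed
        ),
    }
    # Step 3: restart with the summary as the new context baseline
    return [summary] + messages[-keep_recent:]

history = [
    {"role": "user", "content": "Fix the login bug"},
    {"role": "tool", "content": "grep output: 300 lines..."},
    {"role": "assistant", "content": "Patched auth.py"},
    {"role": "user", "content": "Now add tests"},
]
compacted = compact(history)
# The compacted history: one summary message plus the two most recent turns
```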

When It Triggers

Older versions of Claude Code waited until 90%+ capacity to compact. Current versions trigger much earlier, at 64-75% capacity. Anthropic's engineers built in a completion buffer so the agent has enough room to finish its current task before compaction interrupts.

| Dimension | Auto-Compact | Manual /compact |
| --- | --- | --- |
| Trigger | Automatic at 64-75% capacity | You decide when |
| Timing | Can interrupt mid-task | You pick a clean break point |
| Preservation | Generic summary | Custom: '/compact preserve file paths and error codes' |
| Risk | May lose critical details | You control what matters |

The problem with auto-compact is timing. It triggers based on token count, not task state. If compaction fires while the agent is mid-debugging, the summary may drop the exact error message or file path the agent needs for the next step. Manual compaction at logical breakpoints, after finishing a feature or fixing a bug, produces better summaries because the context is cleaner at those moments.

Custom compaction instructions

When running /compact manually, you can add instructions: /compact Preserve all file paths, error messages, and the list of modified files. This tells the summarizer what to prioritize. You can also add default compaction instructions to your CLAUDE.md file.
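A minimal sketch of what default compaction instructions in CLAUDE.md might look like. The heading and wording are illustrative, not a required format:

```markdown
## Compaction instructions
When compacting, always preserve:
- Full paths of every file modified this session
- Exact error messages and error codes
- The current task and its remaining steps
```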

Hidden Context Drains: MCP Tools and Tool Definitions

MCP (Model Context Protocol) tools are the most common hidden context drain. Each MCP server loads its full tool schema into context on every request, even when none of its tools are called. A server with 20 tools can consume 5,000-10,000 tokens just by existing.

- 8-30%: context consumed by tool definitions
- 85%: overhead reduction with Tool Search
- 51K to 8.5K: MCP tokens after optimization

Claude Code now auto-enables Tool Search when MCP tools would consume more than 10% of context. Instead of loading every tool schema upfront, Tool Search defers tool definitions and loads them on demand. This cut MCP token overhead from 51K to 8.5K in one benchmark, a 46.9% reduction in total context usage.

Other Hidden Drains

Large File Reads

Reading a 400-line file consumes thousands of tokens. Ask for specific line ranges so the agent reads only the relevant section. A targeted read can use 70% fewer tokens than reading the whole file.
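The savings from ranged reads are easy to quantify. This Python sketch uses a hypothetical `read_lines` helper (not a Claude Code API) to compare a full read against a 50-line slice of a 500-line file:

```python
def read_lines(path: str, start: int, end: int) -> str:
    """Return only lines start..end (1-indexed, inclusive) of a file."""
    with open(path) as f:
        lines = f.readlines()
    return "".join(lines[start - 1 : end])

# Build a sample 500-line file, then compare a full read to a ranged read
with open("/tmp/handler_demo.txt", "w") as f:
    f.writelines(f"line {i}\n" for i in range(1, 501))

full = open("/tmp/handler_demo.txt").read()
ranged = read_lines("/tmp/handler_demo.txt", 40, 90)
# The ranged read is a small fraction of the full file's size
```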

Verbose Command Output

Bash commands that dump hundreds of lines (npm install, test runners, build logs) fill context fast. Pipe through tail or grep before returning results to the agent.
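If you wrap command execution yourself, the same tail-style truncation can be done in code. A minimal sketch (the helper name and truncation marker are hypothetical):

```python
import subprocess

def run_truncated(cmd: list, tail_lines: int = 20) -> str:
    """Run a command but return only the last tail_lines lines of output,
    the part that usually contains the error or summary."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    lines = result.stdout.splitlines()
    kept = lines[-tail_lines:]
    if len(lines) > tail_lines:
        kept.insert(0, f"[... {len(lines) - tail_lines} lines truncated ...]")
    return "\n".join(kept)

# A command that prints 200 lines; the agent only sees the last 20
out = run_truncated(
    ["python3", "-c", "print('\\n'.join(str(i) for i in range(200)))"]
)
```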

Accumulated Conversation

Every message, including your one-word confirmations, stays in context. Long back-and-forth sessions accumulate thousands of tokens of low-signal dialogue that dilute the important parts.

Seven Strategies to Manage Claude Code Context

1. Put Persistent Instructions in CLAUDE.md

CLAUDE.md is a special file that Claude Code reads at the start of every session and preserves through every compaction cycle. It is the only reliable place for instructions that must survive the entire session. Put your coding conventions, project structure, key file paths, common commands, and workflow rules here.

Keep CLAUDE.md under 200 lines and 2,000 tokens. It loads into context on every request, so a bloated CLAUDE.md consumes its own share of the window. Write it for the model, not for humans: concise, structured, and specific.
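A quick way to enforce that budget is a lint check in Python. This sketch uses the common rough heuristic of ~4 characters per token; the function name and limits mirror the guidance above and are otherwise arbitrary:

```python
def claude_md_budget(text: str, max_lines: int = 200, max_tokens: int = 2000):
    """Rough budget check using the ~4 characters per token heuristic."""
    lines = text.count("\n") + 1
    est_tokens = len(text) // 4
    return {
        "lines": lines,
        "est_tokens": est_tokens,
        "within_budget": lines <= max_lines and est_tokens <= max_tokens,
    }

# A small sample CLAUDE.md body: 20 short convention lines
sample = "\n".join(["- Use pnpm, not npm", "- Run tests with pnpm test"] * 10)
report = claude_md_budget(sample)
```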

2. Use /clear Between Distinct Tasks

When you finish implementing a feature and start debugging something unrelated, run /clear. This resets the context window entirely. The context from the first task is pure noise for the second. Starting fresh gives the agent a clean 160K+ tokens instead of a polluted 80K.

3. Compact Manually at Logical Breakpoints

Do not wait for auto-compact. After completing a feature, fixing a bug, or reaching any natural stopping point, run /compact with custom preservation instructions. The summary will be higher quality because the context is clean at that moment.

4. Delegate Large-Output Tasks to Subagents

Each subagent (via the Task tool) gets its own isolated 200K context window. Running tests, fetching documentation, processing log files, or searching large codebases in a subagent keeps the verbose output contained. Only the relevant summary returns to your main conversation.

Up to 10 subagents run concurrently. For a complex task, three parallel subagents give you an effective 600K tokens of total context without polluting the main session.
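The delegation pattern looks like this in outline. `run_subagent` here is a hypothetical stand-in for the Task tool: in reality each call would run in its own isolated 200K-token context and return only a short summary.

```python
from concurrent.futures import ThreadPoolExecutor

def run_subagent(task: str) -> str:
    # Stand-in: the verbose work (test output, logs, search hits) stays
    # inside the subagent's own context; only a summary comes back.
    return f"summary of: {task}"

tasks = ["run the test suite", "fetch API docs", "grep the monorepo"]
with ThreadPoolExecutor(max_workers=3) as pool:
    summaries = list(pool.map(run_subagent, tasks))
# Only the three short summaries enter the main conversation
```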

5. Disable Unused MCP Servers

Run /context to see which MCP servers are consuming tokens. If you are not using a server's tools in the current session, disable it. Each disabled server frees up its full schema size, often 2,000-10,000 tokens.

6. Use Targeted File Reads

Instead of reading entire files, specify line ranges. A request like "Read lines 40-90 of src/api/handler.ts" uses a fraction of the tokens that reading all 500 lines would. This matters most in debugging loops, where the agent re-reads the same file many times.

7. Compress Tool Outputs with Morph Compact

Context compression reduces tool outputs before they enter the main conversation. Instead of a 5,000-token file read filling 2.5% of your context, compression reduces it to 1,500-2,500 tokens while preserving the exact code, file paths, and error messages the agent needs.

Context Compression with Morph Compact

Morph Compact is a purpose-built model for context compression. Unlike summarization, which rewrites your context and can alter file paths or code snippets, Compact uses verbatim compaction: it deletes low-signal tokens while keeping every surviving sentence identical to the original.

- 50-70%: token reduction
- 3,300+: tokens per second
- 98%: verbatim accuracy
- 0%: hallucination risk

For Claude Code specifically, Compact addresses the core problem: tool outputs (file reads, grep results, bash outputs) fill 60-80% of the context window. Compressing these outputs inline, as they arrive, keeps the context clean throughout the session instead of waiting for a threshold-based compaction that fires too late.

| Dimension | Built-in Auto-Compact | Morph Compact |
| --- | --- | --- |
| Method | Summarization (lossy rewrite) | Verbatim deletion (zero rewrite) |
| What gets compressed | Entire conversation at once | Individual tool outputs inline |
| When it runs | At 64-75% capacity | Per tool call (continuous) |
| Code fidelity | Low (paraphrasing) | 100% (exact original text) |
| File paths preserved | Sometimes lost | Always preserved |
| Can be combined | Yes, runs as fallback | Yes, runs before compaction |

The two approaches are complementary. Morph Compact reduces noise inline so auto-compaction fires less often. When compaction does fire, the conversation is already cleaner, so the summary is higher quality. The combination extends your effective session length significantly.

Works with any agent, not just Claude Code

Morph Compact works through the standard OpenAI SDK. Point the base URL at api.morphllm.com/v1 and use the morph-compact model. It works with any agent framework that manages context programmatically.

Code Example: Compressing Claude Code Tool Outputs

If you build custom tooling around Claude Code or run it programmatically via the API, you can compress tool outputs before they enter the conversation.

Inline compression for agent tool outputs (TypeScript)

import OpenAI from "openai";

const morph = new OpenAI({
  apiKey: process.env.MORPH_API_KEY,
  baseURL: "https://api.morphllm.com/v1",
});

async function compactToolOutput(output: string): Promise<string> {
  // Short outputs pass through; only compress large ones.
  // Length here is characters, a rough proxy for tokens (~4 chars/token).
  if (output.length < 2000) return output;

  const response = await morph.chat.completions.create({
    model: "morph-compact",
    messages: [{ role: "user", content: output }],
  });

  // Fall back to the original output if the response is empty
  return response.choices[0].message.content ?? output;
}

// In your agent loop:
// File read returned 8,000 tokens → compressed to ~3,000
// Grep result returned 4,500 tokens → compressed to ~1,500
// Agent sees clean, high-signal context throughout the session

Python: compress before adding to conversation

from openai import OpenAI

morph = OpenAI(
    api_key="your-morph-api-key",
    base_url="https://api.morphllm.com/v1"
)

def compact_if_large(content: str, threshold: int = 2000) -> str:
    """Compress content only if it exceeds the size threshold
    (characters, a rough proxy for tokens)."""
    if len(content) < threshold:
        return content

    response = morph.chat.completions.create(
        model="morph-compact",
        messages=[{"role": "user", "content": content}]
    )
    # Fall back to the original content if the response is empty
    return response.choices[0].message.content or content

# Example: compress a large file read before adding to context
file_content = read_file("src/api/webhooks/stripe.ts")  # 8K tokens
compressed = compact_if_large(file_content)               # ~3K tokens
# Every line in compressed output is verbatim from the original

Frequently Asked Questions

What is Claude Code's context window size?

200,000 tokens. This is shared across system prompts (~2.6K tokens), tool definitions (~17.6K tokens), MCP server schemas, CLAUDE.md memory files, and your conversation history. After system overhead, roughly 160K-170K tokens are available for actual conversation and tool outputs. With multiple MCP servers enabled, that number can drop to 120K or lower.

What is auto-compaction in Claude Code?

Auto-compaction is Claude Code's built-in mechanism for managing context as it approaches the window limit. It summarizes older messages, clears old tool outputs, and restarts with compressed state. Current versions trigger at 64-75% capacity with a completion buffer so the current task can finish. You can also run /compact manually with custom instructions for what to preserve.

Why does Claude Code forget things during long sessions?

Two reasons. First, the lost-in-the-middle problem: information in the center of a long context gets deprioritized by the attention mechanism. Second, auto-compaction is lossy. Summarization can drop specific file paths, error codes, and architectural decisions. The fix: put persistent instructions in CLAUDE.md, use subagents for large-output tasks, and compact manually at logical breakpoints.

How do I check my Claude Code context usage?

Run /context. It shows token counts and percentages for every component: system prompt, system tools, MCP tools, custom agents, memory files, messages, autocompact buffer, and free space. Use it to identify MCP servers or large conversation segments consuming disproportionate context.

Does Claude Code support a 1 million token context window?

Claude's API supports up to 1 million tokens for Opus 4.6, Sonnet 4.6, Sonnet 4.5, and Sonnet 4 via a beta header. Claude Code itself defaults to 200K. Requests beyond 200K are charged at premium rates (2x input, 1.5x output). Even with extended context, the lost-in-the-middle problem means performance degrades as token count grows. More tokens does not mean better output.

How can I prevent Claude Code from losing context?

Six strategies: (1) Put persistent rules in CLAUDE.md. (2) Run /compact manually at logical breakpoints with preservation instructions. (3) Use /clear between distinct tasks. (4) Delegate large-output tasks to subagents with isolated context windows. (5) Disable unused MCP servers. (6) Use Morph Compact to compress tool outputs inline, keeping the context clean throughout the session.

Keep Your Claude Code Context Clean

Morph Compact compresses tool outputs inline with 50-70% token reduction and zero hallucination risk. Every surviving line is verbatim from the original. Your Claude Code sessions stay effective longer.