What "Context Left Until Auto-Compact" Means
Short answer
Claude Code's 200K token context window is nearly full. The agent is about to summarize the entire conversation to free up space. After compaction fires, the agent loses detailed memory of earlier file reads, tool outputs, and debugging steps. Everything compresses to a short summary.
Claude Code operates within a fixed 200,000 token context window. Every message you send, every file Claude reads, every bash command output, every grep result, and every tool definition occupies space in that window. The "context left until auto-compact" warning appears when usage reaches approximately 75-80% of that capacity.
When the remaining space drops below the threshold, Claude Code automatically runs auto-compact: it summarizes the full conversation history, clears old tool outputs, and restarts from a compressed state. The warning is a countdown: once auto-compact fires, the session continues, but with a fraction of the original detail.
Why You See This Warning
Context fills up faster than most people expect. The 200K window sounds large, but a significant portion is consumed before you even start working:
| Component | Typical tokens | Notes |
|---|---|---|
| System prompt + built-in tools | ~20,000 | Fixed cost, always present |
| MCP tool schemas | 900-51,000 | More MCP servers = faster compaction |
| CLAUDE.md files | 300-2,000 | Survives compaction, loads every request |
| Auto-compact buffer | ~33,000 | Reserved, cannot be used |
| Your conversation + outputs | 100-140K | This is what fills up |
The biggest token consumers during a session are file reads and command outputs. A single file read can dump 2,000-5,000 tokens into context. A test suite output might add 10,000+. Ten file reads and a few debugging loops can consume 50K-80K tokens within 15-30 minutes of active work.
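As a rough sketch, the table and the per-read costs above can be combined to estimate how many file reads fit before the warning. All constants here are approximations drawn from the figures in this section; the ~80% trigger fraction and the per-read cost are assumptions, not exact values:

```python
# Rough estimate of how many file reads fit before the auto-compact warning.
# Every constant is an approximation taken from the table and prose above.

CONTEXT_WINDOW = 200_000
SYSTEM_PROMPT = 20_000        # system prompt + built-in tools
MCP_SCHEMAS = 10_000          # typical 2-3 MCP servers
CLAUDE_MD = 1_000
COMPACT_BUFFER = 33_000       # reserved, cannot be used

TRIGGER_FRACTION = 0.80       # warning appears around 75-80% usage
TOKENS_PER_FILE_READ = 3_500  # middle of the 2,000-5,000 range

fixed = SYSTEM_PROMPT + MCP_SCHEMAS + CLAUDE_MD + COMPACT_BUFFER
trigger_at = int(CONTEXT_WINDOW * TRIGGER_FRACTION)
budget_for_work = trigger_at - fixed

reads_until_warning = budget_for_work // TOKENS_PER_FILE_READ
print(f"Working budget before warning: ~{budget_for_work:,} tokens")
print(f"File reads until warning:      ~{reads_until_warning}")
```

With these assumed numbers, roughly 27 file reads fit before the warning, which lines up with the "warning after 20-30 minutes of active work" figure for a typical setup.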
Cognition (the team behind Devin) measured that coding agents spend 60% of their time searching for code: reading entire files to find specific functions, scanning grep results, navigating dependency chains. Each search dumps full files into context whether or not they contain what the agent needs. This is the primary driver of context exhaustion.
What Happens When Auto-Compact Triggers
Auto-compact runs three steps in sequence:
- Old tool outputs cleared. File read results, grep outputs, and bash command outputs from earlier in the conversation are removed or truncated. These are the largest token consumers.
- Conversation summarized. The full chat history gets compressed into a structured summary: what was completed, what's in progress, which files were touched.
- Session continues from summary. The agent picks up from the compressed state with a fresh token budget. CLAUDE.md files are re-injected from disk (they always survive compaction).
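The three steps above can be sketched as pseudocode. The data structures and the summary shape here are illustrative assumptions, not Claude Code's actual implementation:

```python
# Minimal sketch of the auto-compact sequence described above.
# The message format and summary contents are assumptions for illustration.

def auto_compact(history, claude_md):
    # 1. Clear old tool outputs (the largest token consumers).
    kept = [m for m in history if m["type"] != "tool_output"]

    # 2. Summarize the remaining conversation into a short structured note.
    summary = {
        "type": "summary",
        "text": "completed work, in-progress work, files touched",
        "source_messages": len(kept),
    }

    # 3. Continue from the summary; CLAUDE.md is re-injected from disk.
    return [{"type": "claude_md", "text": claude_md}, summary]

history = [
    {"type": "user", "text": "fix the auth bug"},
    {"type": "tool_output", "text": "...5,000 tokens of file contents..."},
    {"type": "assistant", "text": "patched the middleware"},
]
new_context = auto_compact(history, claude_md="project conventions")
print(len(new_context))  # the fresh context: re-injected CLAUDE.md + summary
```

Note that everything except the re-injected CLAUDE.md and the summary is gone from the new context, which is exactly why detail loss is the cost of compaction.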
Your files on disk are safe
Auto-compact does not modify files on disk. All code changes, file writes, and git commits are preserved. Only the conversation history in memory is affected. The risk is not data loss but amnesia: the agent forgets what it changed and may make contradictory decisions.
The session continues after auto-compact, but the agent is working from a summary that compresses 100K+ tokens into roughly 5K tokens. Specific error messages, file paths, line numbers, decision rationale, and architectural context are reduced to brief descriptions. A debugging session where the agent narrowed a bug to a specific function call becomes "debugged authentication issue in middleware."
How Much Context Do You Have Before It Triggers?
The exact trigger point depends on your setup, primarily how many MCP servers are configured. Here are realistic budgets for three common configurations:
Minimal setup (no MCP)
~140K usable tokens. System prompt takes ~20K, buffer takes ~33K, leaving the most room for work. You can read 30-50 files before the warning appears.
Typical setup (2-3 MCP servers)
~110-120K usable tokens. MCP tool schemas add 5K-15K. This is the most common configuration. Expect the warning after 20-30 minutes of active work.
Heavy setup (5+ MCP servers)
~80-100K usable tokens. MCP overhead can consume 30K-50K tokens. The warning appears much sooner, sometimes within 10-15 minutes of active debugging.
Check your exact budget
Run /context in any Claude Code session to see the exact token breakdown: system prompt, tool definitions, MCP overhead, conversation messages, auto-compact buffer, and free space. You can also configure a custom status line to display context percentage continuously as you work.
What Gets Lost During Auto-Compact
| Information type | Before compaction | After compaction |
|---|---|---|
| File contents | Full source code from every file read | Brief mention: 'read auth.ts' |
| Error messages | Full stack traces with line numbers | 'Encountered TypeError in auth flow' |
| Grep results | Every matching line with context | 'Searched for token validation usage' |
| Decision reasoning | Full analysis of why approach A over B | 'Chose approach A for auth' |
| Code edits made | Exact lines changed, before/after | 'Modified auth middleware' |
| CLAUDE.md content | Full content | Full content (survives compaction) |
| Task list | Full task breakdown | Persists (stored separately) |
The pattern that causes the most problems: the agent modified several files to implement a feature, then auto-compact fires. The summary says "implemented rate limiting in auth middleware" but does not record which files were changed, what rate limit values were set, or what edge cases were handled. The agent may then re-edit the same files with different logic, creating conflicts with its own earlier work.
What survives compaction
CLAUDE.md files always survive. They are re-loaded from disk after every compaction. Task lists persist because they are stored separately. Files on disk are not affected. If you put critical instructions in CLAUDE.md, the agent will have them after compaction. If you relied on the conversation to carry that context, it will be lost.
Manual /compact vs Auto-Compact
Claude Code offers two ways to compact: manual (/compact) and automatic. They use the same underlying mechanism but produce very different results.
| Dimension | Manual /compact | Auto-compact |
|---|---|---|
| When it runs | You choose the timing | Fires at ~75-80% context usage |
| Custom instructions | Yes: /compact preserve file paths and test results | No: uses generic summarization |
| Task awareness | You compact at logical breakpoints | Fires mid-task with no state awareness |
| Summary quality | Higher: you guide what to preserve | Lower: generic, loses more detail |
| Can be prevented | You choose not to run it | Cannot be disabled |
| CLAUDE.md behavior | Re-loaded from disk | Re-loaded from disk |
The best practice: run /compact manually at natural breakpoints in your work. After finishing a feature, after fixing a bug, after completing a research phase. Use specific preservation instructions:
Manual compact with custom instructions
# After finishing a feature
/compact preserve all modified file paths, the test results, and the remaining TODO items
# After a debugging session
/compact preserve the root cause analysis, the fix applied to src/auth/middleware.ts, and the test that validates it
# After code review
/compact preserve the list of files reviewed, issues found, and fixes applied

You can also add default compaction instructions to your CLAUDE.md file so auto-compact produces better summaries when it does fire:
CLAUDE.md compact instructions
# Compact instructions
When compacting, always preserve:
- All file paths that were modified
- Current test status (passing/failing)
- The specific error messages being debugged
- Which approach was chosen and why

How to Delay Auto-Compact
You cannot disable auto-compact, but you can significantly delay it. The strategies fall into two categories: reducing token waste per operation, and structuring your workflow to keep context clean.
1. Break work into smaller sessions
One task per session. When you finish implementing a feature, run /clear and start fresh. The context from task A is noise for task B. A clean session gives you the full usable budget instead of a polluted window with stale context from completed work.
2. Use subagents for large-output tasks
Each subagent gets its own isolated 200K context window. Delegate tasks that produce verbose outputs (running test suites, searching large codebases, processing log files) to subagents. Only the relevant result returns to your main session. This keeps your main context clean while still getting the full output.
3. Compact manually at logical breakpoints
After finishing a feature or fixing a bug, run /compact with specific instructions about what to preserve. Manual compaction at clean breakpoints produces summaries that are 3-5x more useful than auto-compact summaries fired mid-task.
4. Reduce MCP server overhead
Each MCP server adds tool schemas to your context on every request. Run /mcp to see per-server costs, and disable servers you are not actively using. With ENABLE_TOOL_SEARCH=auto:5, Claude Code defers tool definitions until they are actually needed, keeping idle MCP overhead near zero.
5. Put persistent context in CLAUDE.md
CLAUDE.md files survive every compaction cycle. Put project architecture, coding conventions, key file paths, and workflow rules there. The agent never needs to re-discover this information after compaction, saving thousands of tokens per cycle. Keep it under 200 lines for best adherence.
6. Use the /rewind summarize feature
Press Esc twice to open the rewind menu, then select "Summarize from here." This lets you compress specific portions of the conversation (like a verbose debugging session) while keeping earlier context intact. More targeted than /compact, which summarizes everything.
How FlashCompact Prevents Context Waste
The strategies above help manage context after it fills up. FlashCompact addresses the root cause: it reduces how many tokens each operation consumes in the first place.
The two biggest context consumers in a Claude Code session are file reads (dumping entire files to find specific code) and file writes (rewriting entire files to change a few lines). FlashCompact attacks both:
WarpGrep: Targeted Search
Returns only the relevant code snippets instead of entire files. One semantic search call replaces 5-10 sequential file reads. 0.73 F1 accuracy in 3.8 steps on SWE-Bench. Saves 5-10x tokens per search operation.
Fast Apply: Compact Diffs
Outputs only changed lines instead of rewriting the entire file. A 3-line edit in a 200-line file produces ~20 tokens of diff instead of ~2,000 tokens of full content. 10,500 tok/s throughput. ~90% fewer output tokens per edit.
Morph Compact: Verbatim Cleanup
Compresses remaining conversation noise without hallucination. 3,300+ tok/s processing speed. Zero hallucination rate because it operates on verbatim content, not summaries. Cleans up whatever WarpGrep and Fast Apply don't prevent.
| Operation | Default approach | With FlashCompact | Savings |
|---|---|---|---|
| Find a function | Read 5-8 files (10K-40K tokens) | 1 WarpGrep search (500-2K tokens) | 5-20x |
| Edit 3 lines in a file | Rewrite full file (2K+ tokens) | Compact diff (~20 tokens) | ~100x |
| Trace a dependency chain | Grep + read results (5K-15K tokens) | Scoped search (1K-3K tokens) | 3-5x |
| Refactor 15 files | 30K tokens of file rewrites | ~3K tokens of diffs | 10x |
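The "edit 3 lines" row can be sanity-checked with a back-of-the-envelope estimate. The ~4 characters-per-token heuristic and the average line length are assumptions, so this crude sketch only aims for the same order of magnitude as the table, not its exact figures:

```python
# Back-of-the-envelope comparison: full-file rewrite vs. compact diff.
# CHARS_PER_TOKEN and AVG_LINE_CHARS are rough assumptions; real tokenizers
# and real diffs will vary.

CHARS_PER_TOKEN = 4   # common heuristic for source code
AVG_LINE_CHARS = 40   # assumed average source line length

def est_tokens(num_lines: int) -> int:
    """Crude token estimate for num_lines of typical source code."""
    return (num_lines * AVG_LINE_CHARS) // CHARS_PER_TOKEN

file_lines = 200
changed_lines = 3

full_rewrite = est_tokens(file_lines)     # rewrite the whole file back
compact_diff = est_tokens(changed_lines)  # emit only the changed lines

print(f"Full rewrite: ~{full_rewrite:,} tokens")
print(f"Compact diff: ~{compact_diff} tokens")
print(f"Savings:      ~{full_rewrite // compact_diff}x")
```

Under these assumptions the rewrite costs ~2,000 tokens and the diff a few dozen, a savings in the tens to hundreds depending on line length and diff format.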
The combined effect: sessions run 3-4x longer before the "context left until auto-compact" warning appears. A session that would compact after 20 minutes of active work runs 60-80 minutes instead. Fewer compaction cycles means less information loss and fewer cases where the agent forgets its earlier work.
State-of-the-art on SWE-Bench Pro
FlashCompact tools achieve state-of-the-art results on SWE-Bench Pro, the benchmark for real-world software engineering tasks. The context efficiency gains translate directly to better task completion: agents that retain more working memory make fewer mistakes and complete tasks in fewer steps.
Frequently Asked Questions
What does "context left until auto-compact" mean?
Claude Code's 200K token context window is nearly full. When remaining space drops below roughly 20-25%, Claude Code automatically summarizes the conversation to free tokens. The warning tells you the agent is about to lose detailed memory of file reads, tool outputs, error messages, and conversation history. Run /compact manually with specific preservation instructions before it fires automatically.
Can I disable auto-compact?
No. Auto-compact is a built-in safety mechanism that prevents the context window from overflowing. You can delay it by reducing token waste and by running /compact manually at better timing, but the automatic trigger cannot be turned off.
What information gets lost during auto-compact?
Specific file contents, error messages, stack traces, grep results, debugging steps, and detailed reasoning all get compressed to brief descriptions. A 100K token conversation compresses to roughly 5K tokens. CLAUDE.md files and task lists survive compaction. Files on disk are not affected.
Does auto-compact delete my code changes?
No. Auto-compact only affects the conversation history in memory. All file writes, code edits, and git commits are preserved on disk. The risk is amnesia, not data loss: the agent forgets what it changed and may make contradictory edits because the summary lost those details.
What is the difference between /compact and auto-compact?
/compact lets you choose when to compact and what to preserve. Auto-compact fires automatically based on token count with no awareness of task state. Manual compaction at logical breakpoints (after finishing a feature, after fixing a bug) produces much better summaries. You can also add "Compact instructions" to your CLAUDE.md to improve auto-compact summaries.
How do I check how much context I have left?
Run /context to see the exact token breakdown. You can also configure a custom status line to display context percentage continuously as you work. The /cost command shows cumulative token usage for the session.
How does FlashCompact help?
FlashCompact reduces context consumption at the source. WarpGrep returns only relevant code snippets instead of entire files (5-10x fewer tokens per search). Fast Apply uses compact diffs instead of full file rewrites (~90% fewer output tokens per edit). Together, they extend effective session length by 3-4x.
How long can I work before auto-compact fires?
Depends on your setup and how you work. With a minimal configuration (no MCP servers), aggressive debugging sessions typically trigger compaction in 20-30 minutes. With 5+ MCP servers, it can be as fast as 10-15 minutes. Using FlashCompact tools extends this to 60-80+ minutes by reducing per-operation token consumption.
Related Guides
Stop the Context Countdown
FlashCompact tools reduce context waste from both reads and writes. The 'context left until auto-compact' warning appears 3-4x less often. No configuration changes, no workflow disruption.