What "Context Left Until Auto-Compact" Means
Short answer
Claude Code's 200K token context window is nearly full. The agent is about to summarize the entire conversation to free up space. After compaction fires, the agent loses detailed memory of earlier file reads, tool outputs, and debugging steps. Everything compresses to a short summary.
Claude Code operates within a fixed 200,000 token context window. Every message you send, every file Claude reads, every bash command output, every grep result, and every tool definition occupies space in that window. The "context left until auto-compact" warning appears when usage reaches approximately 75-80% of that capacity.
When the remaining space drops below the threshold, Claude Code automatically runs auto-compact: it summarizes the full conversation history, clears old tool outputs, and restarts from a compressed state. The warning is a countdown: once auto-compact fires, the session continues, but with a fraction of the original detail.
Why You See This Warning
Context fills up faster than most people expect. The 200K window sounds large, but a significant portion is consumed before you even start working:
| Component | Typical tokens | Notes |
|---|---|---|
| System prompt + built-in tools | ~20,000 | Fixed cost, always present |
| MCP tool schemas | 900-51,000 | More MCP servers = faster compaction |
| CLAUDE.md files | 300-2,000 | Survives compaction, loads every request |
| Auto-compact buffer | ~33,000 | Reserved, cannot be used |
| Your conversation + outputs | 100-140K | This is what fills up |
The biggest token consumers during a session are file reads and command outputs. A single file read can dump 2,000-5,000 tokens into context. A test suite output might add 10,000+. Ten file reads and a few debugging loops can consume 50K-80K tokens within 15-30 minutes of active work.
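As a rough sketch, the table and the per-read costs above can be combined to estimate how many file reads fit before the warning. All constants here are approximations drawn from the figures in this section; the ~80% trigger fraction and the per-read cost are assumptions, not exact values:

```python
# Rough estimate of how many file reads fit before the auto-compact warning.
# Every constant is an approximation taken from the table and prose above.

CONTEXT_WINDOW = 200_000
SYSTEM_PROMPT = 20_000        # system prompt + built-in tools
MCP_SCHEMAS = 10_000          # typical 2-3 MCP servers
CLAUDE_MD = 1_000
COMPACT_BUFFER = 33_000       # reserved, cannot be used

TRIGGER_FRACTION = 0.80       # warning appears around 75-80% usage
TOKENS_PER_FILE_READ = 3_500  # middle of the 2,000-5,000 range

fixed = SYSTEM_PROMPT + MCP_SCHEMAS + CLAUDE_MD + COMPACT_BUFFER
trigger_at = int(CONTEXT_WINDOW * TRIGGER_FRACTION)
budget_for_work = trigger_at - fixed

reads_until_warning = budget_for_work // TOKENS_PER_FILE_READ
print(f"Working budget before warning: ~{budget_for_work:,} tokens")
print(f"File reads until warning:      ~{reads_until_warning}")
```

With these assumed numbers, roughly 27 file reads fit before the warning, which lines up with the "warning after 20-30 minutes of active work" figure for a typical setup.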
Cognition (the team behind Devin) measured that coding agents spend 60% of their time searching for code: reading entire files to find specific functions, scanning grep results, navigating dependency chains. Each search dumps full files into context whether or not they contain what the agent needs. This is the primary driver of context exhaustion.
What Happens When Auto-Compact Triggers
Auto-compact runs three steps in sequence:
- Old tool outputs cleared. File read results, grep outputs, and bash command outputs from earlier in the conversation are removed or truncated. These are the largest token consumers.
- Conversation summarized. The full chat history gets compressed into a structured summary: what was completed, what's in progress, which files were touched.
- Session continues from summary. The agent picks up from the compressed state with a fresh token budget. CLAUDE.md files are re-injected from disk (they always survive compaction).
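The three steps above can be sketched as pseudocode. The data structures and the summary shape here are illustrative assumptions, not Claude Code's actual implementation:

```python
# Minimal sketch of the auto-compact sequence described above.
# The message format and summary contents are assumptions for illustration.

def auto_compact(history, claude_md):
    # 1. Clear old tool outputs (the largest token consumers).
    kept = [m for m in history if m["type"] != "tool_output"]

    # 2. Summarize the remaining conversation into a short structured note.
    summary = {
        "type": "summary",
        "text": "completed work, in-progress work, files touched",
        "source_messages": len(kept),
    }

    # 3. Continue from the summary; CLAUDE.md is re-injected from disk.
    return [{"type": "claude_md", "text": claude_md}, summary]

history = [
    {"type": "user", "text": "fix the auth bug"},
    {"type": "tool_output", "text": "...5,000 tokens of file contents..."},
    {"type": "assistant", "text": "patched the middleware"},
]
new_context = auto_compact(history, claude_md="project conventions")
print(len(new_context))  # the fresh context: re-injected CLAUDE.md + summary
```

Note that everything except the re-injected CLAUDE.md and the summary is gone from the new context, which is exactly why detail loss is the cost of compaction.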
Your files on disk are safe
Auto-compact does not modify files on disk. All code changes, file writes, and git commits are preserved. Only the conversation history in memory is affected. The risk is not data loss but amnesia: the agent forgets what it changed and may make contradictory decisions.
The session continues after auto-compact, but the agent is working from a summary that compresses 100K+ tokens into roughly 5K tokens. Specific error messages, file paths, line numbers, decision rationale, and architectural context are reduced to brief descriptions. A debugging session where the agent narrowed a bug to a specific function call becomes "debugged authentication issue in middleware."
How Much Context Do You Have Before It Triggers?
The exact trigger point depends on your setup, primarily how many MCP servers are configured. Here are realistic budgets for three common configurations:
Minimal setup (no MCP)
~140K usable tokens. System prompt takes ~20K, buffer takes ~33K, leaving the most room for work. You can read 30-50 files before the warning appears.
Typical setup (2-3 MCP servers)
~110-120K usable tokens. MCP tool schemas add 5K-15K. This is the most common configuration. Expect the warning after 20-30 minutes of active work.
Heavy setup (5+ MCP servers)
~80-100K usable tokens. MCP overhead can consume 30K-50K tokens. The warning appears much sooner, sometimes within 10-15 minutes of active debugging.
Check your exact budget
Run /context in any Claude Code session to see the exact token breakdown: system prompt, tool definitions, MCP overhead, conversation messages, auto-compact buffer, and free space. You can also configure a custom status line to display context percentage continuously as you work.
What Gets Lost During Auto-Compact
| Information type | Before compaction | After compaction |
|---|---|---|
| File contents | Full source code from every file read | Brief mention: 'read auth.ts' |
| Error messages | Full stack traces with line numbers | 'Encountered TypeError in auth flow' |
| Grep results | Every matching line with context | 'Searched for token validation usage' |
| Decision reasoning | Full analysis of why approach A over B | 'Chose approach A for auth' |
| Code edits made | Exact lines changed, before/after | 'Modified auth middleware' |
| CLAUDE.md content | Full content | Full content (survives compaction) |
| Task list | Full task breakdown | Persists (stored separately) |
The pattern that causes the most problems: the agent modified several files to implement a feature, then auto-compact fires. The summary says "implemented rate limiting in auth middleware" but does not record which files were changed, what rate limit values were set, or what edge cases were handled. The agent may then re-edit the same files with different logic, creating conflicts with its own earlier work.
What survives compaction
CLAUDE.md files always survive. They are re-loaded from disk after every compaction. Task lists persist because they are stored separately. Files on disk are not affected. If you put critical instructions in CLAUDE.md, the agent will have them after compaction. If you relied on the conversation to carry that context, it will be lost.
Manual /compact vs Auto-Compact
Claude Code offers two ways to compact: manual (/compact) and automatic. They use the same underlying mechanism but produce very different results.
| Dimension | Manual /compact | Auto-compact |
|---|---|---|
| When it runs | You choose the timing | Fires at ~75-80% context usage |
| Custom instructions | Yes: /compact preserve file paths and test results | No: uses generic summarization |
| Task awareness | You compact at logical breakpoints | Fires mid-task with no state awareness |
| Summary quality | Higher: you guide what to preserve | Lower: generic, loses more detail |
| Can be prevented | You choose not to run it | Cannot be disabled |
| CLAUDE.md behavior | Re-loaded from disk | Re-loaded from disk |
The best practice: run /compact manually at natural breakpoints in your work. After finishing a feature, after fixing a bug, after completing a research phase. Use specific preservation instructions:
Manual compact with custom instructions
# After finishing a feature
/compact preserve all modified file paths, the test results, and the remaining TODO items
# After a debugging session
/compact preserve the root cause analysis, the fix applied to src/auth/middleware.ts, and the test that validates it
# After code review
/compact preserve the list of files reviewed, issues found, and fixes applied

You can also add default compaction instructions to your CLAUDE.md file so auto-compact produces better summaries when it does fire:
CLAUDE.md compact instructions
# Compact instructions
When compacting, always preserve:
- All file paths that were modified
- Current test status (passing/failing)
- The specific error messages being debugged
- Which approach was chosen and why

How to Delay Auto-Compact
You cannot disable auto-compact, but you can significantly delay it. The strategies fall into two categories: reducing token waste per operation, and structuring your workflow to keep context clean.
1. Break work into smaller sessions
One task per session. When you finish implementing a feature, run /clear and start fresh. The context from task A is noise for task B. A clean session gives you the full usable budget instead of a polluted window with stale context from completed work.
2. Use subagents for large-output tasks
Each subagent gets its own isolated 200K context window. Delegate tasks that produce verbose outputs (running test suites, searching large codebases, processing log files) to subagents. Only the relevant result returns to your main session. This keeps your main context clean while still getting the full output.
3. Compact manually at logical breakpoints
After finishing a feature or fixing a bug, run /compact with specific instructions about what to preserve. Manual compaction at clean breakpoints produces summaries that are 3-5x more useful than auto-compact summaries fired mid-task.
4. Reduce MCP server overhead
Each MCP server adds tool schemas to your context on every request. Run /mcp to see per-server costs, and disable servers you are not actively using. With ENABLE_TOOL_SEARCH=auto:5, Claude Code defers tool definitions until they are actually needed, keeping idle MCP overhead near zero.
5. Put persistent context in CLAUDE.md
CLAUDE.md files survive every compaction cycle. Put project architecture, coding conventions, key file paths, and workflow rules there. The agent never needs to re-discover this information after compaction, saving thousands of tokens per cycle. Keep it under 200 lines for best adherence.
6. Use the /rewind summarize feature
Press Esc twice to open the rewind menu, then select "Summarize from here." This lets you compress specific portions of the conversation (like a verbose debugging session) while keeping earlier context intact. More targeted than /compact, which summarizes everything.
How FlashCompact Prevents Context Waste
The strategies above help manage context after it fills up. FlashCompact addresses the root cause: it reduces how many tokens each operation consumes in the first place.
The two biggest context consumers in a Claude Code session are file reads (dumping entire files to find specific code) and file writes (rewriting entire files to change a few lines). FlashCompact attacks both:
WarpGrep: Targeted Search
Returns only the relevant code snippets instead of entire files. One semantic search call replaces 5-10 sequential file reads. 0.73 F1 accuracy in 3.8 steps on SWE-Bench. Saves 5-10x tokens per search operation.
Fast Apply: Compact Diffs
Outputs only changed lines instead of rewriting the entire file. A 3-line edit in a 200-line file produces ~20 tokens of diff instead of ~2,000 tokens of full content. 10,500 tok/s throughput. ~90% fewer output tokens per edit.
Morph Compact: Verbatim Cleanup
Compresses remaining conversation noise without hallucination. 3,300+ tok/s processing speed. Zero hallucination rate because it operates on verbatim content, not summaries. Cleans up whatever WarpGrep and Fast Apply don't prevent.
| Operation | Default approach | With FlashCompact | Savings |
|---|---|---|---|
| Find a function | Read 5-8 files (10K-40K tokens) | 1 WarpGrep search (500-2K tokens) | 5-20x |
| Edit 3 lines in a file | Rewrite full file (2K+ tokens) | Compact diff (~20 tokens) | ~100x |
| Trace a dependency chain | Grep + read results (5K-15K tokens) | Scoped search (1K-3K tokens) | 3-5x |
| Refactor 15 files | 30K tokens of file rewrites | ~3K tokens of diffs | 10x |
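The "edit 3 lines" row can be sanity-checked with a back-of-the-envelope estimate. The ~4 characters-per-token heuristic and the average line length are assumptions, so this crude sketch only aims for the same order of magnitude as the table, not its exact figures:

```python
# Back-of-the-envelope comparison: full-file rewrite vs. compact diff.
# CHARS_PER_TOKEN and AVG_LINE_CHARS are rough assumptions; real tokenizers
# and real diffs will vary.

CHARS_PER_TOKEN = 4   # common heuristic for source code
AVG_LINE_CHARS = 40   # assumed average source line length

def est_tokens(num_lines: int) -> int:
    """Crude token estimate for num_lines of typical source code."""
    return (num_lines * AVG_LINE_CHARS) // CHARS_PER_TOKEN

file_lines = 200
changed_lines = 3

full_rewrite = est_tokens(file_lines)     # rewrite the whole file back
compact_diff = est_tokens(changed_lines)  # emit only the changed lines

print(f"Full rewrite: ~{full_rewrite:,} tokens")
print(f"Compact diff: ~{compact_diff} tokens")
print(f"Savings:      ~{full_rewrite // compact_diff}x")
```

Under these assumptions the rewrite costs ~2,000 tokens and the diff a few dozen, a savings in the tens to hundreds depending on line length and diff format.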
The combined effect: sessions run 3-4x longer before the "context left until auto-compact" warning appears. A session that would compact after 20 minutes of active work runs 60-80 minutes instead. Fewer compaction cycles means less information loss and fewer cases where the agent forgets its earlier work.
State-of-the-art on SWE-Bench Pro
FlashCompact tools achieve state-of-the-art results on SWE-Bench Pro, the benchmark for real-world software engineering tasks. The context efficiency gains translate directly to better task completion: agents that retain more working memory make fewer mistakes and complete tasks in fewer steps.
Frequently Asked Questions
What does "context left until auto-compact" mean?
Claude Code's 200K token context window is nearly full. When remaining space drops below roughly 20-25%, Claude Code automatically summarizes the conversation to free tokens. The warning tells you the agent is about to lose detailed memory of file reads, tool outputs, error messages, and conversation history. Run /compact manually with specific preservation instructions before it fires automatically.
Can I disable auto-compact?
No. Auto-compact is a built-in safety mechanism that prevents the context window from overflowing. You can delay it by reducing token waste and by running /compact manually at better timing, but the automatic trigger cannot be turned off.
What information gets lost during auto-compact?
Specific file contents, error messages, stack traces, grep results, debugging steps, and detailed reasoning all get compressed to brief descriptions. A 100K token conversation compresses to roughly 5K tokens. CLAUDE.md files and task lists survive compaction. Files on disk are not affected.
Does auto-compact delete my code changes?
No. Auto-compact only affects the conversation history in memory. All file writes, code edits, and git commits are preserved on disk. The risk is amnesia, not data loss: the agent forgets what it changed and may make contradictory edits because the summary lost those details.
What is the difference between /compact and auto-compact?
/compact lets you choose when to compact and what to preserve. Auto-compact fires automatically based on token count with no awareness of task state. Manual compaction at logical breakpoints (after finishing a feature, after fixing a bug) produces much better summaries. You can also add "Compact instructions" to your CLAUDE.md to improve auto-compact summaries.
How do I check how much context I have left?
Run /context to see the exact token breakdown. You can also configure a custom status line to display context percentage continuously as you work. The /cost command shows cumulative token usage for the session.
How does FlashCompact help?
FlashCompact reduces context consumption at the source. WarpGrep returns only relevant code snippets instead of entire files (5-10x fewer tokens per search). Fast Apply uses compact diffs instead of full file rewrites (~90% fewer output tokens per edit). Together, they extend effective session length by 3-4x.
How long can I work before auto-compact fires?
Depends on your setup and how you work. With a minimal configuration (no MCP servers), aggressive debugging sessions typically trigger compaction in 20-30 minutes. With 5+ MCP servers, it can be as fast as 10-15 minutes. Using FlashCompact tools extends this to 60-80+ minutes by reducing per-operation token consumption.
Related Guides
Stop the Context Countdown
FlashCompact tools reduce context waste from both reads and writes. The 'context left until auto-compact' warning appears 3-4x less often. No configuration changes, no workflow disruption.