Claude Code Token Limit: What Counts, What Gets Cut, and How to Stay Under

Claude Code has a 200K token context window, but system prompts, tool definitions, and file reads consume 25-35K before you type anything. This guide breaks down exact token costs per action, what triggers auto-compact, and how to reduce consumption so your agent stays effective longer.

March 13, 2026 · 1 min read

Claude Code's 200K token window sounds large until you see what fills it. System prompts, tool schemas, MCP definitions, and CLAUDE.md files consume 25,000-35,000 tokens before your first message. A single file read can cost 8,000 tokens. Three file reads, a few tool calls, and some conversation later, you're at 50% capacity. This guide breaks down the exact token cost of every action, explains what happens when you hit the limit, and covers five strategies to stay under it.

- 200K total context window (tokens)
- ~165K usable after system overhead
- 5-8K tokens per 500-line file read
- 64-75% auto-compact trigger threshold

Claude Code Token Limits by Model

Every model available in Claude Code shares the same 200K token context window. The system prompt and tool definitions are also roughly constant across models, so the practical token budget for your conversation and tool outputs does not change with model choice. What differs between models is speed and per-token cost, not capacity.

| Model | Context Window | System Overhead | Effective User Budget |
| --- | --- | --- | --- |
| Claude 4 Opus | 200,000 | ~25,000-35,000 | ~165,000-175,000 |
| Claude 4 Sonnet | 200,000 | ~25,000-35,000 | ~165,000-175,000 |
| Haiku 4 | 200,000 | ~25,000-35,000 | ~165,000-175,000 |

System overhead varies depending on how many MCP servers you have enabled and the size of your CLAUDE.md files. A minimal setup with no MCP servers uses roughly 20,000 tokens of overhead. A setup with several MCP servers can consume 50,000+ tokens before your first message.

Check your actual overhead

Run /context in any Claude Code session to see the exact token breakdown. It shows every component: system prompt, system tools, MCP tools, memory files, messages, and free space. The numbers vary per session based on your configuration.

What Counts as Tokens

Every piece of information in a Claude Code session consumes tokens. Some sources are obvious (your messages, file contents). Others are invisible (tool call schemas, MCP server definitions). Here is what occupies your 200K budget and roughly what each component costs.

System Prompt (~2,600 tokens)

Claude Code's base instructions. Always present, cannot be reduced. Tells the model how to behave as a coding agent, what tools are available, and how to format responses.

Tool Definitions (~17,600 tokens)

Schemas for Read, Write, Edit, Bash, Grep, Glob, and other built-in tools. Each tool's parameter types, descriptions, and usage rules load into context on every request.

CLAUDE.md Files (~300-2,000 tokens)

Your project instructions, coding conventions, and persistent rules. Loaded at session start and preserved through compaction. A 200-line CLAUDE.md costs roughly 1,500-2,000 tokens.

MCP Server Schemas (~900-50,000 tokens)

Each MCP server loads its full tool schema into context, even when unused. A single server with 20 tools can cost 5,000-10,000 tokens. Multiple servers compound quickly.

File Contents (varies widely)

A 100-line file costs roughly 1,000-1,600 tokens. A 500-line file costs 5,000-8,000. Code with long lines costs more than sparse configs. Every Read tool call adds to the running total.
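You can ballpark these costs yourself before handing a file to the agent. The sketch below uses the common rule of thumb of roughly 4 characters per token for English-like text and code; exact counts require the model's actual tokenizer, so treat the result as an estimate only.

```python
# Rough token estimator for file reads.
# Assumption: ~4 characters per token, a common heuristic for code
# and English text. Real counts come from the model's tokenizer.

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Estimate the token cost of loading `text` into context."""
    return int(len(text) / chars_per_token)

def estimate_file_tokens(path: str) -> int:
    """Estimate the cost of a full Read of the file at `path`."""
    with open(path, encoding="utf-8") as f:
        return estimate_tokens(f.read())

# A synthetic 500-line file averaging 50 characters per line:
source = ("x" * 49 + "\n") * 500
print(estimate_tokens(source))  # 6250 -- inside the 5-8K range above
```

Dense code with long lines pushes toward the top of the range; sparse configs with short lines land near the bottom.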

Screenshots (~1,000 tokens each)

Claude Code encodes screenshots as image tokens. Each screenshot costs roughly 1,000 tokens regardless of content. Multiple screenshots in a debugging session add up fast.

Conversation History (cumulative)

Every message you send and every response Claude generates stays in context. Your one-word confirmations, Claude's explanations, error messages, all of it accumulates turn by turn.

Tool Call Overhead (~50-200 tokens each)

Each tool invocation adds overhead beyond the tool output itself: the function call structure, parameter names, and return formatting. In a session with 50+ tool calls, this adds 2,500-10,000 tokens.

The invisible costs add up

Tool definitions and MCP schemas are the most overlooked token costs. They consume 8-30% of your context window before you do anything. Run /context to see how much of your budget is gone before your first prompt.

Why You Hit the Limit Faster Than Expected

Developers regularly report surprise when auto-compact triggers after what feels like a short session. The math explains why. Consider a typical 20-minute debugging session:

| Action | Token Cost | Running Total |
| --- | --- | --- |
| Session starts (system overhead) | ~30,000 | 30,000 |
| Read 3 source files (~400 lines each) | ~18,000 | 48,000 |
| Run 2 grep searches | ~3,000 | 51,000 |
| Run test suite (bash output) | ~4,000 | 55,000 |
| 5 back-and-forth messages | ~8,000 | 63,000 |
| Edit 2 files (tool calls + diffs) | ~5,000 | 68,000 |
| Re-read edited files to verify | ~12,000 | 80,000 |
| Run tests again | ~4,000 | 84,000 |
| Read error log + 2 more files | ~14,000 | 98,000 |
| 3 more messages discussing fix | ~5,000 | 103,000 |

103,000 tokens consumed in under 20 minutes of work. That is 51.5% of the 200K window. One more debugging cycle pushes past the auto-compact threshold. The session you thought had plenty of room is already half gone.
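The running totals above reduce to a short budget script. This sketch replays the same session and flags the point where the total would enter the auto-compact range (64% of 200K as the lower bound); the costs are the illustrative estimates from the table, not measured values.

```python
# Sketch: replay the debugging session above and flag when the
# running total enters the auto-compact trigger range.
# Costs are the rough estimates from the table, not measurements.

CONTEXT_WINDOW = 200_000
COMPACT_THRESHOLD = 0.64  # lower bound of the 64-75% trigger range

actions = [
    ("Session starts (system overhead)", 30_000),
    ("Read 3 source files", 18_000),
    ("Run 2 grep searches", 3_000),
    ("Run test suite", 4_000),
    ("5 back-and-forth messages", 8_000),
    ("Edit 2 files", 5_000),
    ("Re-read edited files", 12_000),
    ("Run tests again", 4_000),
    ("Read error log + 2 more files", 14_000),
    ("3 more messages", 5_000),
]

total = 0
for name, cost in actions:
    total += cost
    pct = total / CONTEXT_WINDOW
    flag = "  <-- auto-compact range" if pct >= COMPACT_THRESHOLD else ""
    print(f"{name:35s} {total:>7,} ({pct:.0%}){flag}")
```

The session ends at 103,000 tokens (52%), still under the threshold, but one more ~25K debugging cycle crosses into the trigger range.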

File Reads Are the Biggest Cost

The Read tool is the single largest token consumer in most sessions. Every file read loads the full file contents into context. If the agent reads the same file three times across a session (initial read, after edit, after second edit), that file's tokens are counted three times. A 500-line file read three times costs 15,000-24,000 tokens.

Bash Outputs Are Unpredictable

Running npm install, bun test, or build commands can produce hundreds of lines of output. A test runner that dumps a full stack trace on failure can cost 3,000-5,000 tokens per run, and build logs often run 2,000-4,000 tokens. These costs are easy to overlook because the output scrolls past in your terminal, but the model carries every line of it in context.

Each Turn Adds Overhead

Short confirmations like "yes" or "looks good" cost ~10 tokens each, but the model's response explaining what it did costs 200-500. Over 20 turns, conversational overhead alone accounts for 4,000-10,000 tokens. Giving the agent clear, complete instructions in fewer messages reduces this waste.

What Happens When You Hit the Token Limit

Claude Code does not crash when context fills up. Instead, it runs auto-compaction, a process that summarizes the conversation to free space. This keeps sessions running indefinitely but introduces trade-offs.

The Auto-Compact Process

  1. Tool outputs cleared. Old file reads, grep results, and bash outputs are removed or truncated. These are the largest and least essential tokens in a long session.
  2. Conversation summarized. The full history gets condensed into a structured summary: what was completed, what is in progress, what files were modified.
  3. Session restarts with summary. The compressed summary becomes the new baseline. The agent continues from there with a fresh context budget.

What Gets Lost

Summaries are lossy. Auto-compact may drop specific error messages, exact file paths, variable names referenced three exchanges ago, or the reasoning behind a design decision. The agent keeps working, but it may re-read files it already read, repeat approaches it already tried, or lose track of multi-step plans.

Compaction timing matters

Auto-compact triggers based on token count, not task state. If it fires mid-debugging, the summary may drop the exact error message the agent needs for the next step. Use /compact manually at natural breakpoints (after fixing a bug, after completing a feature) for cleaner summaries. You can add instructions: /compact Preserve all file paths and error codes.

- 64-75% capacity at which compact triggers
- ~128K token count when compact fires
- ~40K context remaining after compaction
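These figures follow from simple arithmetic on the 200K window. A quick sanity check, using the trigger range and post-compaction estimate quoted in this guide:

```python
# Auto-compact arithmetic for a 200K window, using the
# trigger range and post-compaction estimate quoted above.

window = 200_000
trigger_low, trigger_high = 0.64, 0.75  # observed trigger range
after_compact = 40_000                  # rough post-compaction context

fires_at = (int(window * trigger_low), int(window * trigger_high))
freed = fires_at[0] - after_compact

print(fires_at)  # (128000, 150000): compact fires somewhere in this range
print(freed)     # 88000 tokens reclaimed at the low end of the range
```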

How to Reduce Token Usage

1. Keep CLAUDE.md Concise

CLAUDE.md loads into context on every request and survives every compaction cycle. That makes it valuable for persistent instructions, but also means every extra line costs tokens throughout the entire session. Keep it under 200 lines. Write for the model, not for humans: concise, structured, no prose explanations where a bullet point works. See our Claude Code best practices guide for examples.

2. Use Targeted File Reads

Instead of reading entire files, specify line ranges. "Read lines 40-90 of src/api/handler.ts" uses roughly 500-800 tokens instead of 5,000-8,000 for the full file. This matters most in debugging loops where the agent re-reads the same file multiple times. Each targeted read saves thousands of tokens over a full-file read.
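The savings are easy to quantify with a rough ~4 characters-per-token heuristic. This sketch builds a synthetic 500-line file and compares the estimated cost of a full read against a 51-line targeted read; the numbers are illustrative, not tokenizer-exact.

```python
# Sketch: estimated cost of a targeted read vs a full-file read,
# using a rough ~4 characters-per-token heuristic. Illustrative only.

def estimate_tokens(text: str) -> int:
    return len(text) // 4

# A synthetic 500-line source file.
lines = [f"line {i}: some representative code........." for i in range(500)]
full = "\n".join(lines)
target = "\n".join(lines[39:90])  # lines 40-90, as in the example above

print(estimate_tokens(full), estimate_tokens(target))
# roughly 5,500 vs 550 estimated tokens: about a 10x saving per read
```

In a debugging loop where the agent re-reads the same file after every edit, that 10x saving applies on every pass.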

3. Use Subagents for Parallel Work

Each subagent (via the Task tool) gets its own isolated 200K context window. Delegating file searches, test runs, or documentation lookups to subagents keeps the verbose output contained. Only the relevant summary returns to your main conversation. Three parallel subagents give you 600K tokens of effective context without polluting the main session.

4. Break Tasks Into Smaller Sessions

Run /clear between unrelated tasks. The context from implementing a feature is pure noise when you start debugging something else. Starting fresh gives the agent a clean 165K+ tokens instead of a polluted 80K. Use /compact within a task at logical breakpoints, and /clear between tasks.

5. Disable Unused MCP Servers

Each MCP server loads its full tool schema on every request, even when unused. Run /context to see which servers consume tokens. Disabling an unused server with 20 tools frees 5,000-10,000 tokens instantly. See the context window guide for a full breakdown of MCP token costs.

How Morph Reduces Token Consumption

Two Morph products directly address token consumption in coding agents. Both target the root cause: tool outputs (file reads, search results, bash outputs) account for 60-80% of tokens consumed in a typical session.

WarpGrep: Semantic Search Instead of Full-File Reads

Cognition (the team behind Devin) measured that coding agents spend 60% of their time searching for context. Most of that search involves reading entire files to find a few relevant lines. WarpGrep replaces this pattern with semantic search that returns only relevant snippets.

Instead of reading a 500-line file (5,000-8,000 tokens) to find a 20-line function, WarpGrep returns just the matching snippet (200-400 tokens). Over a session with 15-20 file reads, this saves 50,000-100,000 tokens. That is the difference between hitting auto-compact in 20 minutes and completing the task in a single session.

- 60% agent time spent searching (Cognition)
- 90% token savings per search vs full read
- 8 parallel tool calls per turn

Fast Apply: Compact Diffs Over Full Rewrites

When a coding agent edits a file, it can either rewrite the entire file (sending all tokens through the output) or send a compact diff (only the changed lines). Morph Fast Apply processes compact diffs at 10,500 tok/s, using a fraction of the output tokens compared to full file rewrites. A 500-line file edit that changes 10 lines costs ~200 output tokens as a diff versus ~8,000 as a full rewrite.
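The diff-versus-rewrite gap is easy to demonstrate. The sketch below uses Python's difflib as a stand-in (Fast Apply's own edit format is not specified here) and the same rough ~4 characters-per-token heuristic: a 10-line change to a 500-line file produces a diff that is a small fraction of the full file's token count.

```python
# Sketch: output-token estimate for a compact diff vs a full-file
# rewrite. Uses a unified diff as a stand-in for Fast Apply's edit
# format, and a rough ~4 chars-per-token heuristic. Illustrative only.

import difflib

def estimate_tokens(text: str) -> int:
    return len(text) // 4

# A synthetic 500-line file, then change 10 of its lines.
before = [f"def handler_{i}(): return {i}\n" for i in range(500)]
after = before.copy()
for i in range(200, 210):
    after[i] = f"def handler_{i}(): return {i} * 2\n"

diff = "".join(difflib.unified_diff(before, after))
full = "".join(after)

print(estimate_tokens(diff), estimate_tokens(full))
# the diff costs a small fraction of the full rewrite
```

The exact ratio depends on line length and how scattered the edits are, but contiguous small edits routinely come in at well under a tenth of the full-file cost.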

Token savings compound across a session

A coding session with 20 file reads and 10 edits can consume 150,000+ tokens with full reads and full rewrites. WarpGrep + Fast Apply reduce that to 30,000-40,000 tokens for the same work. That is a 3-5x improvement in effective context budget. See the Claude Code pricing guide for cost implications.

Frequently Asked Questions

What is the token limit for Claude Code?

200,000 tokens across all models (Claude 4 Opus, Claude 4 Sonnet, Haiku 4). After system prompts (~2,600 tokens), tool definitions (~17,600 tokens), and MCP schemas (900-50,000 tokens), the practical limit for user content is 165,000-175,000 tokens in a minimal setup. With several MCP servers, that drops to 120,000-130,000.

What triggers auto-compact in Claude Code?

Auto-compact triggers when context reaches 64-75% capacity (roughly 128,000-150,000 tokens used). Claude Code summarizes older messages, clears old tool outputs, and restarts with compressed state. A completion buffer allows the current task to finish before compaction. You can also run /compact manually at any time.

How many tokens does reading a file cost in Claude Code?

A 100-line file costs roughly 1,000-1,600 tokens. A 500-line file costs 5,000-8,000 tokens. Dense code with long lines costs more than sparse configurations. Screenshots cost roughly 1,000 tokens each. You can reduce file read costs by specifying line ranges instead of reading entire files.

How can I reduce token usage in Claude Code?

Five strategies: (1) Read specific line ranges instead of entire files. (2) Keep CLAUDE.md under 200 lines. (3) Use subagents for tasks producing large outputs, giving each its own 200K context. (4) Run /compact manually at logical breakpoints instead of waiting for auto-compact. (5) Disable unused MCP servers that load tool schemas on every request. For deeper coverage, see our best practices guide.

Stretch Your Claude Code Token Budget

WarpGrep cuts file search tokens by 90%. Fast Apply uses compact diffs instead of full rewrites at 10,500 tok/s. Together they extend your effective session length 3-5x.