Limits by Plan
Claude Code is available on the Pro ($20/month), Max 5x ($100/month), and Max 20x ($200/month) plans; there is no free-tier access. All plans share a single usage pool between Claude's web interface and Claude Code.
| Metric | Pro ($20/mo) | Max 5x ($100/mo) | Max 20x ($200/mo) |
|---|---|---|---|
| Prompts per 5-hour window | ~45 | ~225 | ~800 |
| Weekly active hours (Sonnet) | 40–80 hrs | 480 hrs | 480 hrs |
| Weekly active hours (Opus) | — | 40 hrs | 40 hrs |
| Model access | Sonnet only | Sonnet + Opus | Sonnet + Opus |
| Shared with Claude web | Yes | Yes | Yes |
| Extra usage available | Yes | Yes | Yes |
Holiday Limits Incident
In December 2025, Anthropic doubled everyone's limits as a holiday promotion. When limits returned to normal on January 1, 2026, developers reported what felt like a 60% reduction. The Register covered the resulting backlash. Lesson: don't calibrate your workflow to temporary boosts.
"Active hours" means time Claude spends processing tokens, not wall-clock time. A 30-second thinking response counts as 30 seconds of active time. Idle time between prompts doesn't count. This is why limits feel different depending on task complexity.
What Counts Toward Usage
Token consumption in Claude Code is not just prompt + response. Every interaction includes hidden overhead that most developers don't account for.
Input Tokens (You Pay For)
- Your prompt text
- Conversation history (grows each turn)
- System prompts and CLAUDE.md files
- Tool definitions for all enabled MCP servers
- File contents from reads and searches
- Previous tool call results still in context
Output Tokens (You Pay For)
- The model's response text
- Extended thinking tokens (often 5-10x the visible response)
- Tool call arguments
- Code generation output

Each of these counts against your quota.
The biggest surprise for most users: conversation history accumulates. By turn 15 of a conversation, you might be sending 100,000+ tokens of history with every single prompt. This is why the first few prompts in a session feel fast and cheap, but limits arrive suddenly mid-conversation.
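The compounding effect above can be sketched with simple arithmetic. The per-turn token figure below is an illustrative assumption, not a measured value:

```python
# Rough sketch of how conversation history compounds token costs.
# TOKENS_PER_TURN is an assumed average (prompt + response + tool results).

TOKENS_PER_TURN = 7_000

def input_tokens_at_turn(turn: int, per_turn: int = TOKENS_PER_TURN) -> int:
    """Input tokens sent at a given turn: the new prompt plus all prior history."""
    return per_turn * turn

def cumulative_input_tokens(turns: int, per_turn: int = TOKENS_PER_TURN) -> int:
    """Total input tokens billed across the whole conversation so far."""
    return sum(input_tokens_at_turn(t, per_turn) for t in range(1, turns + 1))

# By turn 15, each prompt carries ~100k tokens of history, and the
# session total grows quadratically, not linearly.
print(input_tokens_at_turn(15))      # 105000
print(cumulative_input_tokens(15))   # 840000
```

The quadratic growth is the key point: turn 15 alone costs more than the first seven turns combined.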
Extended Thinking Multiplier
Extended thinking (ultrathink) mode can consume 5x the tokens of a normal response. A single complex reasoning prompt with extended thinking can use as many tokens as 5 regular prompts. Use it selectively for architectural decisions, not routine code edits.
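The tradeoff is easiest to see against a fixed output budget. The token figures below are assumptions chosen for the arithmetic; only the 5x multiplier comes from the text above:

```python
# Illustrative extended-thinking tradeoff. NORMAL_OUTPUT and the budget
# are assumed values; the 5x multiplier is from the text above.

NORMAL_OUTPUT = 2_000        # assumed tokens for a typical response
ULTRATHINK_MULTIPLIER = 5

def prompts_that_fit(output_budget: int, per_prompt: int) -> int:
    """How many prompts a fixed output-token budget covers."""
    return output_budget // per_prompt

budget = 500_000  # hypothetical output budget for one 5-hour window
print(prompts_that_fit(budget, NORMAL_OUTPUT))                          # 250
print(prompts_that_fit(budget, NORMAL_OUTPUT * ULTRATHINK_MULTIPLIER))  # 50
```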
Extra Usage Billing
When you hit your plan limit, Claude Code stops responding until the 5-hour window resets. Extra usage lets you keep working by switching to pay-per-token billing at standard API rates.
How to Enable
- Go to Settings > Usage in Claude.ai
- Enable extra usage and add a payment method
- Set a daily spending cap (or choose unlimited)
- Optionally enable auto-reload to prepay credits
Extra usage charges appear as a separate line item from your subscription. At API rates, a heavy Claude Code session can cost $5-15 per hour depending on model choice and context size. Opus is roughly 5x more expensive than Sonnet per token.
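A back-of-envelope estimate shows where the $5-15/hour figure comes from. The Sonnet rates below match the commonly cited API prices, but treat them and the hourly token volumes as assumptions:

```python
# Back-of-envelope cost of one extra-usage hour at assumed API rates.

SONNET_INPUT_PER_MTOK = 3.00     # USD per million input tokens (assumed)
SONNET_OUTPUT_PER_MTOK = 15.00   # USD per million output tokens (assumed)

def hourly_cost(input_tokens: int, output_tokens: int,
                in_rate: float = SONNET_INPUT_PER_MTOK,
                out_rate: float = SONNET_OUTPUT_PER_MTOK) -> float:
    """Cost in USD for one hour's token volume."""
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

# A heavy session: large contexts resent every turn dominate the bill.
print(round(hourly_cost(2_500_000, 150_000), 2))   # 9.75
```

Note that input tokens, not output, dominate the cost once conversation history grows large.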
Prompt Caching Bug (February 2026)
On February 27, 2026, Anthropic reset weekly limits for all Claude Code users after discovering a prompt caching bug that was consuming tokens at 2-3x normal rate. If your extra usage charges seemed unusually high around that date, check if a credit was applied.
How to Check Your Usage
In Claude Code
Run /status in any Claude Code session to see your current usage against plan limits. The statusline (bottom of the terminal) shows context usage in real-time if you have it enabled.
On Claude.ai
Go to Settings > Usage to see a dashboard with your current period's consumption, remaining quota, and extra usage charges. The dashboard shows usage across both Claude web and Claude Code combined.
Key Commands
| Command | What It Shows |
|---|---|
| /status | Current usage vs plan limits, time until reset |
| /compact | Compresses conversation history to free up context space |
| /clear | Starts a fresh conversation (doesn't reset usage limits) |
| Statusline | Real-time context percentage in terminal footer |
Note: /clear starts a new conversation but does not reset your usage quota. Your 5-hour window and weekly limits are account-level, not per-conversation.
Why You Hit Limits Faster Than Expected
Most developers assume their prompts and Claude's responses are the primary token consumers. In practice, hidden overhead accounts for 60-80% of total token usage.
Codebase Search Overhead
Claude Code's built-in search reads files one by one, injecting each into context. Cognition measured this at 60% of agent time. A single "find the auth handler" query might read 15-20 files before finding the right one. Each file's contents stay in context for the rest of the conversation.
Context Accumulation
Every tool call result, file read, and search output stays in the conversation history. By turn 10, you're sending 80,000+ tokens of history with each prompt. The model re-reads all of it every turn. This is why limits feel fine for 30 minutes, then suddenly hit.
Extended Thinking Costs
Extended thinking generates internal reasoning tokens that count against your quota. A complex architectural question might generate 10,000+ thinking tokens that you never see. With ultrathink mode, this can reach 50,000+ tokens per response.
MCP Server Definitions
Each MCP server adds its tool definitions to every prompt. If you have 5 MCP servers with 10 tools each, that's 50 tool definitions injected into every single interaction. This baseline cost is constant regardless of what you're actually doing.
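A rough sketch of that fixed overhead, assuming an average size per tool definition (real schemas vary widely):

```python
# Sketch of the fixed per-prompt overhead from MCP tool definitions.
# TOKENS_PER_TOOL_DEF is an assumed average; real JSON schemas vary.

TOKENS_PER_TOOL_DEF = 300

def mcp_overhead(servers: int, tools_per_server: int,
                 per_def: int = TOKENS_PER_TOOL_DEF) -> int:
    """Tokens injected into every prompt just to describe available tools."""
    return servers * tools_per_server * per_def

per_prompt = mcp_overhead(5, 10)   # 50 tool definitions
print(per_prompt)                  # 15000
print(per_prompt * 45)             # 675000 across a Pro window's ~45 prompts
```

Disabling MCP servers you aren't using for the current task removes this baseline entirely.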
The February 2026 prompt caching bug made this worse temporarily. Anthropic's caching normally prevents re-processing identical system prompts and tool definitions. When caching failed, every prompt was charged the full overhead cost.
How to Reduce Token Consumption
Conversation Hygiene
- Start fresh conversations for unrelated tasks. A conversation about auth shouldn't carry context from a CSS debugging session.
- Use /compact when context grows large. This compresses history, preserving key information while reducing token count.
- Batch related instructions into single prompts. "Fix the auth bug AND update the test" uses fewer total tokens than two separate prompts because history isn't duplicated.
- Keep CLAUDE.md files lean. Every word in your CLAUDE.md is injected into every prompt. Remove outdated sections.
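The CLAUDE.md point is worth quantifying. Using the common rule of thumb of ~1.3 tokens per English word (an approximation, not an exact tokenizer count):

```python
# Why a lean CLAUDE.md matters: its tokens are re-sent on every prompt.
# The 1.3 tokens-per-word ratio is a rough rule of thumb, not exact.

def claude_md_overhead(words: int, prompts: int,
                       tokens_per_word: float = 1.3) -> int:
    """Approximate total tokens a CLAUDE.md file costs across a session."""
    return int(words * tokens_per_word * prompts)

# A 1,500-word CLAUDE.md over a 45-prompt window:
print(claude_md_overhead(1_500, 45))   # 87750
```

Trimming that file to 500 words would cut the same session's overhead by two thirds.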
Model Selection
- Use Sonnet for routine tasks. Code edits, file creation, test writing. Sonnet is fast and cheap.
- Reserve Opus for architecture. System design, complex refactors, multi-file coordination. Opus costs roughly 5x more per token.
- Disable extended thinking for simple tasks. Don't use ultrathink to rename a variable.
Offload Search to WarpGrep
Codebase search is the largest hidden token cost. Claude Code's default behavior is to grep files and read them one at a time, injecting each into context. A typical search reads 10-20 files before finding the right code.
WarpGrep replaces this with a trained search model that finds the right code in 3.8 steps on average. It runs as an MCP server, so Claude Code calls it instead of doing its own file-by-file search. The search tokens are consumed by WarpGrep's smaller model, not by Claude.
If 60% of your token consumption comes from search, and WarpGrep reduces search steps by 40-50%, that translates directly to more prompts before hitting your limit. On a Pro plan, that could be the difference between running out at 45 minutes versus lasting the full 5-hour window.
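The arithmetic behind that claim can be made explicit. Assuming search accounts for a given share of Claude's own token use, offloading some fraction of it stretches the remaining quota by 1 / (1 - saved):

```python
# Hedged arithmetic: how offloading search stretches usage limits.
# The 60% search share is the figure cited above; the offload
# fractions are assumptions.

def quota_multiplier(search_share: float, offload_fraction: float) -> float:
    """How much longer limits last when part of search is offloaded."""
    saved = search_share * offload_fraction
    return 1 / (1 - saved)

# Fully offloading a 60% search share stretches limits ~2.5x;
# offloading half of it still yields ~1.4x.
print(round(quota_multiplier(0.60, 1.0), 2))   # 2.5
print(round(quota_multiplier(0.60, 0.5), 2))   # 1.43
```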
Faster Code Edits with Fast Apply
Morph Fast Apply processes code edits at 10,500 tokens/second. Faster edits mean less time spent generating output tokens, which means less active time counted against your weekly hours quota.
Frequently Asked Questions
How many prompts can I send on the Pro plan?
Approximately 45 prompts per 5-hour rolling window, with 40-80 Sonnet active hours per week. The exact number varies by prompt complexity, context size, and whether extended thinking is enabled. Simple prompts cost fewer tokens than complex multi-file operations.
What is extra usage and how much does it cost?
Extra usage lets you keep using Claude Code after hitting your plan limits by paying API rates per token. Enable it in Settings > Usage on Claude.ai. You set a daily spending cap (max $2,000/day). Sonnet costs roughly $3 per million input tokens and $15 per million output tokens. A heavy coding session with extra usage typically costs $5-15/hour.
Do Claude Code and Claude web share limits?
Yes. Pro and Max plans have a single usage pool. Messages sent in Claude.ai count against the same limits as Claude Code prompts. If you use Claude web heavily during the day, you'll have fewer Claude Code prompts available.
Why did my limits seem to drop in January 2026?
Anthropic ran a holiday promotion in December 2025 that doubled everyone's limits. When limits returned to normal on January 1, the perceived drop felt like a 60% reduction. Separately, a prompt caching bug in February 2026 caused tokens to be consumed at 2-3x normal rate. Anthropic reset weekly limits for all users after fixing it.
How can I make my limits last longer?
Start new conversations for unrelated tasks. Use /compact to compress growing context. Batch related instructions into single prompts. Use Sonnet instead of Opus for routine work. Offload codebase search to WarpGrep so Claude doesn't burn tokens reading files one by one. Keep CLAUDE.md files concise.
Related Articles
Stop Burning Tokens on Codebase Search
WarpGrep finds the right code in 3.8 steps, not 15. Plug it into Claude Code as an MCP server and make your usage limits last 2-3x longer.