Every Claude rate limit across API tiers, consumer plans, and Claude Code, in one place. Updated for March 2026, including the Max plan changes and token bucket mechanics.
How Token Bucket Rate Limiting Works
Anthropic uses a token bucket algorithm, not fixed-window resets. Your bucket has a maximum capacity equal to your per-minute limit. Tokens flow in at a constant rate. Each API request removes tokens. If the bucket is empty, you get a 429 rate_limit_error.
The practical effect: short bursts above your sustained rate are fine, as long as you have accumulated capacity. You do not need to wait for a reset window. Capacity replenishes continuously, at a rate proportional to your tier limit.
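The mechanics above can be sketched in a few lines. This is an illustrative model of a token bucket, not Anthropic's actual implementation; the capacity and refill numbers are taken from the tier tables below.

```python
import time

class TokenBucket:
    """Minimal token bucket: capacity equals the per-minute limit,
    and capacity refills continuously rather than at fixed resets."""

    def __init__(self, per_minute_limit: int):
        self.capacity = per_minute_limit
        self.tokens = float(per_minute_limit)        # start full
        self.refill_rate = per_minute_limit / 60.0   # tokens per second
        self.last = time.monotonic()

    def _refill(self) -> None:
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now

    def try_consume(self, cost: int) -> bool:
        """Spend `cost` tokens if available; False models a would-be 429."""
        self._refill()
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False

bucket = TokenBucket(per_minute_limit=30_000)  # e.g. Tier 1 ITPM for Sonnet
assert bucket.try_consume(25_000)       # burst within accumulated capacity is fine
assert not bucket.try_consume(10_000)   # bucket nearly empty: throttled
```

Note that a full bucket allows a one-off burst up to the entire per-minute limit at once, which a fixed-window counter would also allow, but here capacity starts flowing back immediately instead of at the top of the next minute.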
The system tracks three independent dimensions. Hitting any one triggers throttling:
RPM
Requests per minute. Total API calls regardless of size. The bluntest constraint.
ITPM
Input tokens per minute. Your prompts, system messages, and context. Only uncached tokens count.
OTPM
Output tokens per minute. Model-generated text. The hardest to predict in advance.
Key optimization
Cached input tokens do not count toward ITPM. With 80% cache hit rate, your effective ITPM is 5x the nominal limit. Enable prompt caching before upgrading tiers.
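The effective-throughput claim is simple arithmetic: if only uncached tokens count, throughput scales by 1/(1 − hit rate). A quick check, using the Tier 2 Sonnet figure from the table below:

```python
def effective_itpm(nominal_itpm: int, cache_hit_rate: float) -> int:
    """Only uncached tokens count toward ITPM, so effective
    throughput scales by 1 / (1 - hit_rate)."""
    return int(nominal_itpm / (1 - cache_hit_rate))

# Tier 2 Sonnet: 450,000 nominal ITPM at an 80% cache hit rate -> 5x
assert effective_itpm(450_000, cache_hit_rate=0.80) == 2_250_000
```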
API Rate Limits by Tier
Tier upgrades are based on cumulative spend, not monthly spend. Once you reach a tier, you stay there. Each model has independent limits within the same tier.
Claude Sonnet 4
| Tier | Spend Required | RPM | ITPM | OTPM |
|---|---|---|---|---|
| Free | $0 | ~5 | Low | Low |
| Tier 1 | $5 | 50 | 30,000 | 10,000 |
| Tier 2 | $40 | 1,000 | 450,000 | 90,000 |
| Tier 3 | $200 | 2,000 | 800,000 | 200,000 |
| Tier 4 | $400 | 4,000 | 2,000,000 | 400,000 |
Claude Opus 4
| Tier | Spend Required | RPM | ITPM | OTPM |
|---|---|---|---|---|
| Tier 1 | $5 | 50 | 30,000 | 10,000 |
| Tier 2 | $40 | 1,000 | 150,000 | 30,000 |
| Tier 3 | $200 | 2,000 | 400,000 | 80,000 |
| Tier 4 | $400 | 4,000 | 800,000 | 200,000 |
Claude Haiku 3.5
| Tier | Spend Required | RPM | ITPM | OTPM |
|---|---|---|---|---|
| Tier 1 | $5 | 50 | 30,000 | 10,000 |
| Tier 2 | $40 | 1,000 | 500,000 | 100,000 |
| Tier 3 | $200 | 2,000 | 1,000,000 | 200,000 |
| Tier 4 | $400 | 4,000 | 4,000,000 | 800,000 |
Model Context and Output Limits
Context window and max output tokens are per-request limits, independent of rate limits. They determine how much you can send and receive in a single API call.
| Model | Context Window | Max Output Tokens |
|---|---|---|
| Claude Opus 4 | 200K tokens | 16,384 tokens |
| Claude Sonnet 4 | 200K tokens | 16,384 tokens |
| Claude Haiku 3.5 | 200K tokens | 8,192 tokens |
| Enterprise (select) | 500K tokens | Varies |
The 200K context window fits roughly 150,000 words or 500 pages of text. In practice, most API calls use far less. Sending full context every turn is the primary cause of ITPM exhaustion in agentic workflows.
Consumer Plan Usage Caps
Consumer plans on claude.ai use rolling message windows, not fixed daily counts. The exact number of messages varies because not all messages cost the same.
| Plan | Price | Window | Approx. Messages | Notes |
|---|---|---|---|---|
| Free | $0 | 5-hour | ~15-40 per window | Varies by demand |
| Pro | $20/mo | 5-hour | ~45 per window | ~5x free tier |
| Max 5x | $100/mo | Weekly | 5x Pro capacity | Includes Claude Code |
| Max 20x | $200/mo | Weekly | 20x Pro capacity | Includes Claude Code |
| Team | $25-30/seat | 5-hour | Similar to Pro | Admin + SSO features |
| Enterprise | Custom | Custom | Custom | Dedicated capacity |
Why message costs vary
The 30th message in a long conversation can cost 5-10x the first message. Each turn reprocesses the full context. Opus 4 messages consume 3-5x more quota than Sonnet 4. Starting fresh conversations resets this accumulation.
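The accumulation is easy to model. The numbers below are illustrative assumptions (a ~2,000-token system prompt and ~500 tokens added per message), not measured values, but they show how a later message lands in the 5-10x range the text describes:

```python
def turn_input_tokens(turn: int,
                      system_tokens: int = 2_000,
                      msg_tokens: int = 500) -> int:
    """Input cost of turn N = system prompt + all prior messages + the new one.
    Every turn resends the full accumulated history."""
    return system_tokens + turn * msg_tokens

assert turn_input_tokens(1) == 2_500     # first message
assert turn_input_tokens(30) == 17_000   # 30th message: ~6.8x the first
```

Starting a fresh conversation drops `turn` back to 1, which is exactly why it resets the cost curve.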
Claude Code Limits
Claude Code operates under three constraints that are tracked independently but enforced simultaneously. The dashboard usage percentage reflects only one of these dimensions. Your API tier limits from above still apply as a separate ceiling.
In March 2026, Max 5x subscribers reported exhausting their rate limit in roughly 90 minutes during normal agentic workloads. Anthropic attributed this to a bug in limit enforcement and temporarily doubled off-peak usage limits through March 28 as mitigation.
Practical tips for Claude Code rate management:
Start fresh sessions
Long sessions accumulate context that inflates token consumption per turn. New sessions reset the context cost curve.
Use subagents for research
Delegate file exploration to subagents. They run in separate context windows and return only relevant results, saving 40%+ of input tokens.
Compact at 70%, not 90%
Proactive compaction before context fills keeps per-turn costs low. Waiting until 90% means several expensive turns before compaction triggers.
Choose models deliberately
Opus 4 consumes 3-5x more quota than Sonnet 4 per message. Use Sonnet for routine tasks and reserve Opus for complex reasoning.
Rate Limits vs Usage Limits
Anthropic enforces two separate ceilings. Confusing them is a common source of unexpected blocks.
Rate limits
Cap throughput per minute: RPM, ITPM, OTPM. You get a 429 error and can retry after a short wait. These are the limits discussed in most of this guide.
Usage limits
Cap total monthly spend. Free: $10/mo. Tier 1: $100/mo. Tier 2: $500/mo. Tier 3: $1,000/mo. Tier 4: $5,000/mo. Exceeding this blocks API access until the next billing cycle.
| Tier | Monthly Spend Cap | How to Increase |
|---|---|---|
| Free | $10 | Add payment method |
| Tier 1 | $100 | Spend $40 cumulative |
| Tier 2 | $500 | Spend $200 cumulative |
| Tier 3 | $1,000 | Spend $400 cumulative |
| Tier 4 | $5,000 | Contact Anthropic sales |
Handling 429 Errors
When you hit a rate limit, the API returns a 429 status with type: "rate_limit_error" and a retry-after header indicating seconds to wait.
Response headers to monitor
Every API response includes rate limit headers, even successful ones:
- x-ratelimit-limit: Maximum allowed for this dimension
- x-ratelimit-remaining: Current remaining capacity
- x-ratelimit-reset: When the limit fully replenishes
- retry-after: Seconds to wait (only on 429 responses)
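A small helper can pull these signals out of each response. The header names below follow this guide; treat the exact spelling as an assumption and check your SDK's response object against the current API reference.

```python
def parse_rate_limit_headers(headers: dict) -> dict:
    """Extract throttling signals from an API response's headers.
    Header names are assumptions taken from this guide."""
    return {
        "limit": int(headers.get("x-ratelimit-limit", 0)),
        "remaining": int(headers.get("x-ratelimit-remaining", 0)),
        "reset": headers.get("x-ratelimit-reset"),
        # retry-after appears only on 429 responses
        "retry_after": (int(headers["retry-after"])
                        if "retry-after" in headers else None),
    }

info = parse_rate_limit_headers({
    "x-ratelimit-limit": "50",
    "x-ratelimit-remaining": "3",
    "retry-after": "12",
})
assert info["remaining"] == 3 and info["retry_after"] == 12
```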
Retry strategy
Implement exponential backoff with jitter. Start at 1 second, double each retry, add random jitter of 0-500ms. Do not retry in tight loops. The official Anthropic SDKs (Python and TypeScript) handle retries automatically with sensible defaults.
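If you are not using an official SDK, the strategy above is a few lines to hand-roll. This is a sketch; `RateLimitError` is a stand-in for whatever 429 exception your HTTP client or SDK raises.

```python
import random
import time

class RateLimitError(Exception):
    """Stand-in for the SDK's 429 exception type."""

def call_with_retry(request_fn, max_retries: int = 5, base: float = 1.0):
    """Exponential backoff with jitter: 1s, 2s, 4s, ... plus up to
    base/2 (i.e. 500ms at the default base) of random noise."""
    for attempt in range(max_retries):
        try:
            return request_fn()
        except RateLimitError:
            # Sleep base * 2^attempt, plus jitter so clients don't
            # retry in lockstep (the "thundering herd" problem).
            time.sleep(base * (2 ** attempt) + random.uniform(0, base / 2))
    return request_fn()  # final attempt; let the error propagate
```

The jitter matters: without it, every client that hit the limit at the same moment retries at the same moment, producing a fresh burst of 429s.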
Production tip
Build adaptive throttling using x-ratelimit-remaining. When remaining capacity drops below 20%, slow your request rate proactively rather than waiting for 429 errors.
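One way to implement that tip is to compute a pre-request delay from the remaining/limit ratio. The 20% threshold and the delay curve below are illustrative choices, not a prescribed formula:

```python
def adaptive_delay(remaining: int, limit: int,
                   base_delay: float = 0.0, max_delay: float = 2.0) -> float:
    """Return extra seconds to wait before the next request.
    Full speed above 20% headroom; ramp toward max_delay as it shrinks."""
    if limit <= 0:
        return base_delay
    headroom = remaining / limit
    if headroom >= 0.20:
        return base_delay
    # Scale the delay up linearly as headroom falls from 20% toward 0%.
    return base_delay + max_delay * (1 - headroom / 0.20)

assert adaptive_delay(remaining=30, limit=50) == 0.0  # 60% left: full speed
assert adaptive_delay(remaining=5, limit=50) > 0.0    # 10% left: throttle
```

Feed it the `x-ratelimit-remaining` and `x-ratelimit-limit` values from the previous response and sleep for the returned duration before sending the next request.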
Staying Under Limits in Production
Most applications never need Tier 4. The right combination of caching, context management, and model selection keeps production workloads within Tier 2 limits.
Enable prompt caching
Cached input tokens are free against ITPM. With 80% cache hit rate, your effective limit is 5x nominal. This alone can eliminate the need to upgrade tiers.
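Enabling caching is a matter of marking stable content blocks as cacheable in the request body. The sketch below shows the general shape of a Messages API payload with a cached system prompt; the model id is a placeholder from this guide, and the `cache_control` field placement should be verified against Anthropic's current prompt caching reference.

```python
# Sketch of a Messages API request body with prompt caching enabled
# on the system prompt. Field names follow Anthropic's prompt caching
# docs as of this writing; verify against the current API reference.
request = {
    "model": "claude-sonnet-4",   # placeholder model id used in this guide
    "max_tokens": 1024,
    "system": [
        {
            "type": "text",
            "text": "You are a code-review assistant for our monorepo.",
            "cache_control": {"type": "ephemeral"},  # mark block cacheable
        }
    ],
    "messages": [{"role": "user", "content": "Review this diff."}],
}

assert request["system"][0]["cache_control"]["type"] == "ephemeral"
```

Put the large, stable content (system prompt, reference documents, tool definitions) in the cached blocks and keep only the per-turn material uncached.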
Send less context
Most ITPM exhaustion comes from sending entire files when only a few functions are relevant. Send targeted context, not full repositories.
Use smaller models
Haiku 3.5 has 2x the ITPM of Sonnet at the same tier. Route classification, extraction, and simple tasks to Haiku. Reserve Sonnet and Opus for reasoning.
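A routing layer can be as simple as a lookup table. The model ids below are placeholders matching this guide's naming, and the task categories are examples; the point is the default-to-cheapest pattern:

```python
# Route simple task types to Haiku; fall back to Sonnet for everything
# else. Model ids are placeholders matching this guide's naming.
ROUTES = {
    "classification": "claude-haiku-3.5",
    "extraction": "claude-haiku-3.5",
    "reasoning": "claude-opus-4",
}

def pick_model(task_type: str) -> str:
    """Cheapest adequate model wins; Sonnet is the general-purpose default."""
    return ROUTES.get(task_type, "claude-sonnet-4")

assert pick_model("classification") == "claude-haiku-3.5"
assert pick_model("code-review") == "claude-sonnet-4"
```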
For agentic coding workflows specifically, the biggest optimization is reducing tokens per action. WarpGrep runs code searches in separate context windows, returning only relevant line ranges instead of full files. Morph Fast Apply compacts code edits so agents send diffs rather than complete file rewrites. When an agent uses 60% fewer tokens per action, it effectively gets 2.5x the rate limit headroom without changing tiers.
Frequently Asked Questions
What are Claude's API rate limits?
Claude API rate limits vary by tier. Tier 1 (after $5 cumulative spend) provides 50 RPM and 30,000 input tokens per minute for Sonnet. Tier 2 ($40 spend) jumps to 1,000 RPM and 450,000 ITPM. Tier 3 ($200) provides 2,000 RPM and 800,000 ITPM. Tier 4 ($400) reaches 4,000 RPM and 2,000,000 ITPM. Each model has independent limits.
How does Claude's token bucket algorithm work?
Anthropic uses a token bucket that replenishes continuously rather than resetting at fixed intervals. Your bucket capacity equals your per-minute limit. Tokens flow in at a constant rate, and each request removes tokens. Short bursts above sustained rate are allowed if you have accumulated capacity. When the bucket empties, you receive a 429 error.
Do cached tokens count against Claude rate limits?
No. With prompt caching, only uncached input tokens count toward ITPM. An application with 80% cache hit rate effectively gets 5x the nominal throughput. This is the single biggest optimization for high-volume API usage.
What are Claude Pro and Max plan usage limits?
Pro ($20/month) allows approximately 45 messages per 5-hour rolling window. Max 5x ($100/month) provides 5x Pro limits on weekly rolling windows. Max 20x ($200/month) provides 20x Pro limits. Message cost varies: the 30th message in a conversation can cost 5-10x the first due to full context reprocessing.
How do I fix Claude 429 rate limit errors?
Check the retry-after header for seconds to wait. Implement exponential backoff with jitter starting at 1 second. Long-term fixes: enable prompt caching, reduce context size, choose smaller models for simple tasks, and upgrade your API tier if needed.
What are Claude Code rate limits?
Claude Code operates under three constraints that are tracked independently but enforced simultaneously; your dashboard percentage reflects only one dimension, and API tier limits still apply. In March 2026, Max 5x users reported limit exhaustion in roughly 90 minutes, which Anthropic attributed to a bug and temporarily mitigated by doubling off-peak limits.
What is the difference between rate limits and usage limits?
Rate limits cap throughput per minute (RPM, ITPM, OTPM) and result in temporary 429 errors. Usage limits cap total monthly spend ($10 for Free through $5,000 for Tier 4) and block API access entirely until the next billing cycle. They are enforced independently.
Reduce Token Consumption, Stretch Your Rate Limits
Morph Fast Apply and WarpGrep reduce the tokens coding agents consume per action by up to 60%, effectively multiplying your rate limit headroom.