Claude Rate Limits: Every Tier, Every Model, Every Plan (2026)


March 27, 2026 · 5 min read

Every Claude rate limit across API tiers, consumer plans, and Claude Code, in one place. Updated for March 2026, including the Max plan changes and token bucket mechanics.

Key numbers at a glance:

  • 4,000 — max RPM (Tier 4)
  • 2M — max ITPM (Tier 4 Sonnet)
  • 5x — effective throughput with caching
  • 429 — rate limit error code

How Token Bucket Rate Limiting Works

Anthropic uses a token bucket algorithm, not fixed-window resets. Your bucket has a maximum capacity equal to your per-minute limit. Tokens flow in at a constant rate. Each API request removes tokens. If the bucket is empty, you get a 429 rate_limit_error.

The practical effect: short bursts above your sustained rate are fine, as long as you have accumulated capacity. You do not need to wait for a reset window. Capacity replenishes every second, proportional to your tier limit.
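The mechanics above can be sketched in a few lines. This is an illustrative model only, not Anthropic's actual implementation; the class name and refill granularity are assumptions.

```python
import time

class TokenBucket:
    """Illustrative token bucket: capacity equals the per-minute limit, continuous refill."""

    def __init__(self, per_minute_limit: int):
        self.capacity = per_minute_limit
        self.tokens = float(per_minute_limit)       # bucket starts full
        self.refill_rate = per_minute_limit / 60.0  # tokens per second
        self.last = time.monotonic()

    def try_consume(self, cost: float) -> bool:
        now = time.monotonic()
        # Continuous replenishment, capped at capacity -- no fixed reset window
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # empty bucket: this is where the API returns 429
```

Note the burst behavior this implies: a full bucket lets you spend the entire per-minute allowance at once, after which capacity recovers at one sixtieth of the limit per second.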

The system tracks three independent dimensions. Hitting any one triggers throttling:

RPM

Requests per minute. Total API calls regardless of size. The bluntest constraint.

ITPM

Input tokens per minute. Your prompts, system messages, and context. Only uncached tokens count.

OTPM

Output tokens per minute. Model-generated text. The hardest to predict in advance.

Key optimization

Cached input tokens do not count toward ITPM. With 80% cache hit rate, your effective ITPM is 5x the nominal limit. Enable prompt caching before upgrading tiers.

API Rate Limits by Tier

Tier upgrades are based on cumulative spend, not monthly spend. Once you reach a tier, you stay there. Each model has independent limits within the same tier.

Claude Sonnet 4

| Tier | Spend Required | RPM | ITPM | OTPM |
|---|---|---|---|---|
| Free | $0 | ~5 | Low | Low |
| Tier 1 | $5 | 50 | 30,000 | 10,000 |
| Tier 2 | $40 | 1,000 | 450,000 | 90,000 |
| Tier 3 | $200 | 2,000 | 800,000 | 200,000 |
| Tier 4 | $400 | 4,000 | 2,000,000 | 400,000 |

Claude Opus 4

| Tier | Spend Required | RPM | ITPM | OTPM |
|---|---|---|---|---|
| Tier 1 | $5 | 50 | 30,000 | 10,000 |
| Tier 2 | $40 | 1,000 | 150,000 | 30,000 |
| Tier 3 | $200 | 2,000 | 400,000 | 80,000 |
| Tier 4 | $400 | 4,000 | 800,000 | 200,000 |

Claude Haiku 3.5

| Tier | Spend Required | RPM | ITPM | OTPM |
|---|---|---|---|---|
| Tier 1 | $5 | 50 | 30,000 | 10,000 |
| Tier 2 | $40 | 1,000 | 500,000 | 100,000 |
| Tier 3 | $200 | 2,000 | 1,000,000 | 200,000 |
| Tier 4 | $400 | 4,000 | 4,000,000 | 800,000 |
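Because tiers key off cumulative lifetime spend, tier lookup reduces to a monotone threshold check. A sketch using the thresholds from the tables above (the function itself is illustrative, not an official API):

```python
# Cumulative-spend thresholds from the tier tables above: (dollars, tier)
TIER_THRESHOLDS = [(5, 1), (40, 2), (200, 3), (400, 4)]

def current_tier(cumulative_spend: float) -> int:
    """Tier is monotone in lifetime spend: once a threshold is crossed, it never resets."""
    tier = 0  # free tier
    for threshold, tier_number in TIER_THRESHOLDS:
        if cumulative_spend >= threshold:
            tier = tier_number
    return tier

print(current_tier(250))  # 3 -- past the $200 threshold, short of $400
```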

Model Context and Output Limits

Context window and max output tokens are per-request limits, independent of rate limits. They determine how much you can send and receive in a single API call.

| Model | Context Window | Max Output Tokens |
|---|---|---|
| Claude Opus 4 | 200K tokens | 16,384 tokens |
| Claude Sonnet 4 | 200K tokens | 16,384 tokens |
| Claude Haiku 3.5 | 200K tokens | 8,192 tokens |
| Enterprise (select) | 500K tokens | Varies |

The 200K context window fits roughly 150,000 words or 500 pages of text. In practice, most API calls use far less. Sending full context every turn is the primary cause of ITPM exhaustion in agentic workflows.

Consumer Plan Usage Caps

Consumer plans on claude.ai use rolling message windows, not fixed daily counts. The exact number of messages varies because not all messages cost the same.

| Plan | Price | Window | Approx. Messages | Notes |
|---|---|---|---|---|
| Free | $0 | 5-hour | ~15-40 per window | Varies by demand |
| Pro | $20/mo | 5-hour | ~45 per window | ~5x free tier |
| Max 5x | $100/mo | Weekly | 5x Pro capacity | Includes Claude Code |
| Max 20x | $200/mo | Weekly | 20x Pro capacity | Includes Claude Code |
| Team | $25-30/seat | 5-hour | Similar to Pro | Admin + SSO features |
| Enterprise | Custom | Custom | Custom | Dedicated capacity |
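Unlike the API's token bucket, these caps behave like a rolling window: each message occupies the window for its full duration, and capacity frees up continuously as old messages age out. A minimal sketch, with the simplifying assumption that every message costs one slot (real accounting weights messages, as described below):

```python
from collections import deque

class RollingWindow:
    """Sketch of a rolling usage window; claude.ai's real accounting weights messages."""

    def __init__(self, message_limit: int, window_seconds: int = 5 * 3600):
        self.limit = message_limit
        self.window = window_seconds
        self.sent = deque()  # timestamps of messages still inside the window

    def allow(self, now: float) -> bool:
        # Expire messages older than the window -- capacity frees up continuously
        while self.sent and now - self.sent[0] >= self.window:
            self.sent.popleft()
        if len(self.sent) < self.limit:
            self.sent.append(now)
            return True
        return False
```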

Why message costs vary

The 30th message in a long conversation can cost 5-10x the first message. Each turn reprocesses the full context. Opus 4 messages consume 3-5x more quota than Sonnet 4. Starting fresh conversations resets this accumulation.

Claude Code Limits

Claude Code operates under three independent, overlapping constraints. The dashboard usage percentage reflects only one of these dimensions. Your API tier limits from above still apply as a separate ceiling.

In March 2026, Max 5x subscribers reported exhausting their rate limit in roughly 90 minutes during normal agentic workloads. Anthropic attributed this to a bug in limit enforcement and temporarily doubled off-peak usage limits through March 28 as mitigation.

Practical tips for Claude Code rate management:

Start fresh sessions

Long sessions accumulate context that inflates token consumption per turn. New sessions reset the context cost curve.

Use subagents for research

Delegate file exploration to subagents. They run in separate context windows and return only relevant results, saving 40%+ of input tokens.

Compact at 70%, not 90%

Proactive compaction before context fills keeps per-turn costs low. Waiting until 90% means several expensive turns before compaction triggers.

Choose models deliberately

Opus 4 consumes 3-5x more quota than Sonnet 4 per message. Use Sonnet for routine tasks and reserve Opus for complex reasoning.

Rate Limits vs Usage Limits

Anthropic enforces two separate ceilings. Confusing them is a common source of unexpected blocks.

Rate limits

Cap throughput per minute: RPM, ITPM, OTPM. You get a 429 error and can retry after a short wait. These are the limits discussed in most of this guide.

Usage limits

Cap total monthly spend. Free: $10/mo. Tier 1: $100/mo. Tier 2: $500/mo. Tier 3: $1,000/mo. Tier 4: $5,000/mo. Exceeding this blocks API access until the next billing cycle.

| Tier | Monthly Spend Cap | How to Increase |
|---|---|---|
| Free | $10 | Add payment method |
| Tier 1 | $100 | Spend $40 cumulative |
| Tier 2 | $500 | Spend $200 cumulative |
| Tier 3 | $1,000 | Spend $400 cumulative |
| Tier 4 | $5,000 | Contact Anthropic sales |

Handling 429 Errors

When you hit a rate limit, the API returns a 429 status with type: "rate_limit_error" and a retry-after header indicating seconds to wait.

Response headers to monitor

Every API response includes rate limit headers, even successful ones:

  • anthropic-ratelimit-requests-limit / -remaining / -reset: Request-dimension capacity, current headroom, and when it fully replenishes
  • anthropic-ratelimit-input-tokens-* and anthropic-ratelimit-output-tokens-*: The same limit/remaining/reset trio for the ITPM and OTPM dimensions
  • retry-after: Seconds to wait (only on 429 responses)

Retry strategy

Implement exponential backoff with jitter. Start at 1 second, double each retry, add random jitter of 0-500ms. Do not retry in tight loops. The official Anthropic SDKs (Python and TypeScript) handle retries automatically with sensible defaults.
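The recipe above as a generic wrapper. The function and parameter names are illustrative, and in production the SDKs' built-in retries usually make hand-rolling this unnecessary:

```python
import random
import time

def call_with_backoff(make_request, is_rate_limited, retry_after_hint=None, max_retries=5):
    """Exponential backoff with jitter: 1s base, doubling each retry, 0-500ms jitter.

    make_request: zero-arg callable performing the API call
    is_rate_limited: predicate -- does this exception represent a 429?
    retry_after_hint: optional fn extracting retry-after seconds from the exception
    """
    delay = 1.0
    last_err = None
    for _ in range(max_retries):
        try:
            return make_request()
        except Exception as err:
            if not is_rate_limited(err):
                raise  # non-429 errors should not be retried blindly
            last_err = err
            hint = retry_after_hint(err) if retry_after_hint else None
            # Prefer the server's retry-after hint over our own schedule
            time.sleep((hint if hint is not None else delay) + random.uniform(0, 0.5))
            delay *= 2
    raise last_err
```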

Production tip

Build adaptive throttling from the remaining-capacity response headers. When remaining capacity drops below 20% of the limit, slow your request rate proactively rather than waiting for 429 errors.
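A sketch of that adaptive check, assuming you have already parsed the limit and remaining values out of the response headers (the threshold and delay values here are arbitrary starting points, not recommendations from Anthropic):

```python
def adaptive_delay(remaining: int, limit: int,
                   threshold: float = 0.20, slow_delay: float = 1.0) -> float:
    """Return extra per-request delay: throttle once remaining capacity is under 20%."""
    if limit <= 0 or remaining / limit >= threshold:
        return 0.0
    return slow_delay

print(adaptive_delay(remaining=150, limit=1000))  # 1.0 -- only 15% left, slow down
print(adaptive_delay(remaining=500, limit=1000))  # 0.0 -- plenty of headroom
```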

Staying Under Limits in Production

Most applications never need Tier 4. The right combination of caching, context management, and model selection keeps production workloads within Tier 2 limits.

Enable prompt caching

Cached input tokens are free against ITPM. With 80% cache hit rate, your effective limit is 5x nominal. This alone can eliminate the need to upgrade tiers.

Send less context

Most ITPM exhaustion comes from sending entire files when only a few functions are relevant. Send targeted context, not full repositories.

Use smaller models

Haiku 3.5 has 2x the ITPM of Sonnet at the same tier. Route classification, extraction, and simple tasks to Haiku. Reserve Sonnet and Opus for reasoning.
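A minimal routing table in that spirit. The task labels, mapping, and short model names are all illustrative; substitute the current model ids from Anthropic's docs:

```python
# Placeholder model names -- substitute current model ids from Anthropic's docs
ROUTES = {
    "classification": "haiku",
    "extraction": "haiku",
    "summarization": "sonnet",
    "complex_reasoning": "opus",
}

def pick_model(task_type: str) -> str:
    """Default to Sonnet; escalate to Opus only when the task demands it."""
    return ROUTES.get(task_type, "sonnet")

print(pick_model("classification"))  # haiku
print(pick_model("code_review"))     # sonnet -- unlisted tasks fall through
```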

For agentic coding workflows specifically, the biggest optimization is reducing tokens per action. WarpGrep runs code searches in separate context windows, returning only relevant line ranges instead of full files. Morph Fast Apply compacts code edits so agents send diffs rather than complete file rewrites. When an agent uses 60% fewer tokens per action, it effectively gets 2.5x the rate limit headroom without changing tiers.

Frequently Asked Questions

What are Claude's API rate limits?

Claude API rate limits vary by tier. Tier 1 (after $5 cumulative spend) provides 50 RPM and 30,000 input tokens per minute for Sonnet. Tier 2 ($40 spend) jumps to 1,000 RPM and 450,000 ITPM. Tier 3 ($200) provides 2,000 RPM and 800,000 ITPM. Tier 4 ($400) reaches 4,000 RPM and 2,000,000 ITPM. Each model has independent limits.

How does Claude's token bucket algorithm work?

Anthropic uses a token bucket that replenishes continuously rather than resetting at fixed intervals. Your bucket capacity equals your per-minute limit. Tokens flow in at a constant rate, and each request removes tokens. Short bursts above sustained rate are allowed if you have accumulated capacity. When the bucket empties, you receive a 429 error.

Do cached tokens count against Claude rate limits?

No. With prompt caching, only uncached input tokens count toward ITPM. An application with 80% cache hit rate effectively gets 5x the nominal throughput. This is the single biggest optimization for high-volume API usage.

What are Claude Pro and Max plan usage limits?

Pro ($20/month) allows approximately 45 messages per 5-hour rolling window. Max 5x ($100/month) provides 5x Pro limits on weekly rolling windows. Max 20x ($200/month) provides 20x Pro limits. Message cost varies: the 30th message in a conversation can cost 5-10x the first due to full context reprocessing.

How do I fix Claude 429 rate limit errors?

Check the retry-after header for seconds to wait. Implement exponential backoff with jitter starting at 1 second. Long-term fixes: enable prompt caching, reduce context size, choose smaller models for simple tasks, and upgrade your API tier if needed.

What are Claude Code rate limits?

Claude Code operates under three independent, overlapping constraints. Your dashboard percentage reflects only one dimension. API tier limits still apply. In March 2026, Max 5x users reported limit exhaustion in roughly 90 minutes, which Anthropic attributed to a bug and temporarily mitigated by doubling off-peak limits.

What is the difference between rate limits and usage limits?

Rate limits cap throughput per minute (RPM, ITPM, OTPM) and result in temporary 429 errors. Usage limits cap total monthly spend ($10 for Free through $5,000 for Tier 4) and block API access entirely until the next billing cycle. They are enforced independently.

Reduce Token Consumption, Stretch Your Rate Limits

Morph Fast Apply and WarpGrep reduce the tokens coding agents consume per action by up to 60%, effectively multiplying your rate limit headroom.