Diagnose the Problem
Before fixing anything, figure out which kind of slow you're dealing with. They look similar but have different root causes.
| Symptom | Likely Cause | Quick Fix |
|---|---|---|
| Responses get slower each turn | Context bloat | /compact or start fresh conversation |
| Long pause before response starts | Codebase search grinding through files | Use WarpGrep MCP server |
| Responses slow after heavy usage | Rate limiting / throttling | Check /status, wait for window reset |
| Agent stuck mid-task, no output | Hung process or infinite loop | Press Escape, try /compact |
| Everything slow, not just your session | Anthropic server issues | Check status.anthropic.com |
Run /status first. If it shows you're near your usage limit, that's the answer. If usage is fine, check the statusline at the bottom of your terminal for context percentage. Above 70% means context bloat.
Context Bloat: The Progressive Slowdown
Claude re-reads the entire conversation history with every prompt. This is not optional. The model has no persistent memory between turns. Everything from turn 1 gets re-processed at turn 20.
What accumulates in context: your prompts, Claude's responses, every file Claude read, every tool call result, every search output. A single file read can add 5,000-10,000 tokens. By turn 15, you're carrying more context about past operations than about the current task.
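A rough sketch of why this compounds (the per-turn numbers are illustrative assumptions, not measured values): because every turn re-sends the full history, the *total* tokens the model processes over a conversation grows quadratically with turn count.

```python
# Sketch of context growth across turns. `read_cost` uses the midpoint
# of the 5,000-10,000 tokens-per-file-read figure cited above.
def context_after_turns(turns, prompt=200, response=800,
                        file_reads=1, read_cost=7000):
    """Tokens carried in history after `turns` turns."""
    per_turn = prompt + response + file_reads * read_cost
    return per_turn * turns

def total_reprocessed(turns, **kw):
    """Tokens the model processes across the whole conversation:
    turn t re-reads everything from turns 1..t-1, so the sum
    grows quadratically even though each turn adds a fixed amount."""
    return sum(context_after_turns(t, **kw) for t in range(1, turns + 1))
```

With these assumed numbers, turn 20 carries 160,000 tokens of history, and the conversation as a whole has re-processed far more than that.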
Fixes
- /compact compresses the conversation history, keeping key information while discarding verbose tool outputs and intermediate results. Use it when context hits 60-70%.
- /clear starts a completely fresh conversation. Use it when switching to an unrelated task. Context from a CSS debugging session has no business in your auth refactor.
- Lean CLAUDE.md files. Your CLAUDE.md is injected into every prompt. If it's 2,000 words of outdated instructions, that's 3,000+ tokens of overhead before you even type.
- Batch related instructions. "Fix the bug AND update the test" as one prompt uses fewer total tokens than two separate prompts, because history isn't duplicated.
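The batching arithmetic works out like this (illustrative token counts, assuming history is re-sent with every prompt as described above):

```python
# Why batching helps: every prompt re-sends the whole history, so two
# separate prompts pay the history cost twice.
HISTORY = 50_000  # tokens already in the conversation (assumed)
TASK = 300        # tokens per instruction, e.g. "fix the bug" (assumed)

two_prompts = (HISTORY + TASK) + (HISTORY + TASK + TASK)  # 2nd prompt re-sends the 1st exchange too
one_prompt = HISTORY + 2 * TASK                           # both instructions in a single prompt

savings = two_prompts - one_prompt  # roughly one extra copy of the history
```

The saving is about one full copy of the history, which dominates once a conversation has a few file reads in it.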
Search Overhead: The Hidden Time Sink
When you ask Claude Code to modify existing code, it first needs to find that code. The built-in approach is sequential file reading: grep for a pattern, read a file, check if it's the right one, repeat. Cognition measured this at 60% of total agent execution time.
The Search Tax
On a project with 500+ files, a single "find the authentication middleware" request can trigger 15-20 file reads. Each read adds 3-10 seconds and 5,000+ tokens to context. Total: 45-200 seconds and 75,000-200,000 tokens just to locate the code, before any actual work begins.
This compounds with context bloat. Every file Claude reads stays in the conversation. After two search-heavy operations, context is bloated with file contents that are no longer relevant.
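The search-tax figures above multiply out as follows (a back-of-envelope check, not a benchmark):

```python
# 15-20 file reads, each costing 3-10 seconds and 5,000-10,000 tokens,
# per the numbers in this section.
reads_low, reads_high = 15, 20
secs_low, secs_high = 3, 10
tokens_low, tokens_high = 5_000, 10_000

time_range = (reads_low * secs_low, reads_high * secs_high)       # seconds
token_range = (reads_low * tokens_low, reads_high * tokens_high)  # tokens
```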
Fix: Offload Search to WarpGrep
WarpGrep is an MCP server that replaces Claude Code's file-by-file search with a trained search model. It finds the right code in 3.8 steps on average, compared to 10-20 with native search.
Because WarpGrep runs as a separate model, the search tokens don't count against your Claude usage. Less context consumed means faster responses and fewer rate limit issues.
Rate Limiting and Throttling
Anthropic applies two kinds of limits: hard cutoffs (you get an explicit "usage limit reached" message) and soft throttling (responses get progressively slower as you approach the limit). The throttling is harder to detect because there's no error message.
How to Check
- Run /status to see current usage vs. plan limits
- If you're above 80% of your 5-hour window, throttling is likely
- Check if Claude web usage is eating into your pool (they share limits)
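Because soft throttling produces no error message, one way to spot it is to compare recent response latencies against your session's baseline. This is a heuristic sketch, not an official signal; the 2x threshold and 3-response window are assumptions.

```python
def looks_throttled(latencies_s, factor=2.0, window=3):
    """True if the average of the last `window` response latencies is
    more than `factor` times the average of the earlier ones."""
    if len(latencies_s) < window + 1:
        return False  # not enough data for a baseline
    baseline = sum(latencies_s[:-window]) / len(latencies_s[:-window])
    recent = sum(latencies_s[-window:]) / window
    return recent > factor * baseline
```

If your early responses took ~3 seconds and the last few take ~9, that pattern is consistent with throttling rather than context bloat, which grows gradually instead of jumping.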
Plan Limits
| Plan | 5-Hour Window | Weekly Active Hours |
|---|---|---|
| Pro ($20/mo) | ~45 prompts | 40-80 Sonnet hrs |
| Max 5x ($100/mo) | ~225 prompts | 480 Sonnet / 40 Opus hrs |
| Max 20x ($200/mo) | ~800 prompts | 480 Sonnet / 40 Opus hrs |
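You can turn the table into a quick burn-rate check. A sketch using the approximate per-window prompt counts above and the 80%-of-window rule of thumb from the previous section:

```python
# Approximate prompts per 5-hour window, from the plan table above.
WINDOW_PROMPTS = {"pro": 45, "max5x": 225, "max20x": 800}

def usage_fraction(prompts_sent, plan):
    """Fraction of the 5-hour window consumed so far."""
    return prompts_sent / WINDOW_PROMPTS[plan]

def likely_throttled(prompts_sent, plan):
    """Above ~80% of the window, soft throttling becomes likely."""
    return usage_fraction(prompts_sent, plan) >= 0.8
```

On Pro, that puts the throttling zone at roughly 36 prompts into a window; on Max 5x, around 180.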
Fixes
- Enable extra usage in Settings > Usage to continue at API rates after hitting limits
- Switch to Sonnet for routine tasks. Opus consumes roughly 5x more active time per prompt.
- Reduce token waste. Fewer search tokens = slower quota drain. WarpGrep helps here.
- Avoid peak hours. Server-side throttling is worse during US business hours (9am-6pm PT).
Stuck Mid-Task: Agent Not Responding
This is different from slow responses: the agent produces no output at all. Common scenarios:
Infinite Tool Loops
Claude calls the same tool repeatedly without making progress. Often happens during search when the agent can't find the right file and keeps trying variations of the same grep pattern. Press Escape to interrupt.
Extended Thinking Timeout
With extended thinking enabled, Claude can spend 60+ seconds reasoning before producing any visible output. This looks like a hang but isn't. Wait longer, or disable extended thinking for simpler tasks.
MCP Server Hang
A broken or unresponsive MCP server can block Claude Code entirely. If you recently added a new MCP server and things stopped working, disable it in settings and test.
Network Issues
Claude Code requires a persistent connection to Anthropic's API. Flaky WiFi, VPN timeouts, or corporate firewalls can cause silent failures. Check your network if other causes are ruled out.
Recovery Steps
- Press Escape to interrupt the current operation
- Run /compact to reduce context size
- If still stuck, run /clear and re-state your task
- As a last resort, close and reopen Claude Code
Server-Side Slowdowns
Sometimes the problem is on Anthropic's end. On March 11, 2026, Claude experienced a major outage affecting logins and response times. In February 2026, a prompt caching bug caused tokens to be consumed at 2-3x the normal rate.
How to Tell
- Check status.anthropic.com for active incidents
- Search Twitter/X for "claude down" to see if others are affected
- If a fresh conversation with a one-line prompt is slow, the problem is server-side
During outages, having an extra usage budget set up won't help. Your only options are waiting for the incident to resolve or switching to a different tool temporarily.
Frequently Asked Questions
Why is Claude Code so slow?
Three causes: context bloat (conversation history grows with each turn), search overhead (reading files one by one to find code), or rate limiting (approaching your plan's usage ceiling). Check /status and your context percentage to determine which.
How do I fix Claude Code when it gets stuck?
Press Escape to interrupt, then run /compact to compress history. If still stuck, use /clear for a fresh start. Check that MCP servers aren't hanging by disabling recently added ones. For search-related hangs, WarpGrep replaces sequential file reading with semantic search that resolves in 3.8 steps.
Does Claude Code get slower the longer you use it?
Yes, within a single conversation. Each turn adds history that Claude re-processes with every prompt. A fresh conversation responds in ~3 seconds. By turn 20, the same prompt can take 15+ seconds. Start new conversations for unrelated tasks and use /compact regularly.
Is Claude Code down or is it just me?
Check status.anthropic.com for active incidents. If the status page is clear, run a fresh one-line prompt in a new conversation. If that's fast, the problem is local (context bloat or rate limiting). If it's slow too, you may be hitting server-side throttling during peak hours.
Related Articles
Cut 60% of the Search Time That Slows Claude Code Down
WarpGrep finds code in 3.8 steps instead of 15. Less search = less context bloat = faster responses. Installs as an MCP server in under a minute.