Most Cursor users have seen the Max Mode toggle but aren't sure what it actually does. The short version: it sends more of your codebase to the model on each request. More context means better results on complex tasks, but each query costs significantly more credits. This guide covers what Max Mode changes, when it helps, and when you're better off without it.
What Is Cursor Max Mode?
Normal mode in Cursor truncates the context sent to the model. Even if you're using Claude with a 200K token context window, normal mode only sends roughly 10,000-15,000 tokens of your codebase per request. It selects the open file, directly imported files, and a compressed summary of the broader project. This keeps requests fast and cheap.
Max Mode removes that truncation. It sends the full context window's worth of your codebase to the model: more files, more surrounding code, and longer conversation history. It also raises the tool call limit from 25 to 200 per interaction, so the agent can take more autonomous steps before stopping.
The result is a model with a significantly better understanding of your project. It sees more files, more types, more imports, more tests. For tasks that span multiple components, this extra visibility translates directly to better code.
How to Enable Max Mode
Toggle Max Mode in the model dropdown at the top of the chat panel. Select any model, then check the “Max Mode” option. You can also switch it on per-conversation. Max Mode is available on all paid Cursor plans (Pro, Pro+, Ultra, Teams, Enterprise).
Max Mode vs Normal Mode
The differences come down to three things: how much context the model sees, how many steps it can take, and how much each query costs.
| Dimension | Normal Mode | Max Mode |
|---|---|---|
| Context window | ~10K-15K tokens (truncated) | Full model window (128K-200K+) |
| Tool calls per interaction | 25 | 200 |
| Files in context | Open file + direct imports + summary | Many files across the project |
| Conversation history | Summarized/truncated after ~5 turns | Full history retained longer |
| Response speed | Faster (less to process) | Slower (more tokens in, more out) |
| Cost per query | Included in credit pool | Per-token API pricing + 20% margin |
| Best for | Single-file edits, quick questions | Multi-file refactors, architecture, complex debugging |
For most day-to-day coding, normal mode is sufficient. Cursor's context selection algorithm is good at pulling in the most relevant files. Max Mode becomes valuable when the relevant context is spread across many files that normal mode wouldn't include.
When to Use Max Mode
Max Mode isn't a “better mode” you should always have on. It's a tool for specific situations where normal context isn't enough.
Good Use Cases
Cross-file refactoring
Renaming a core interface used in 15 files, restructuring a module boundary, or migrating from one pattern to another across the codebase. The model needs to see all usage sites to make correct changes.
Multi-component debugging
Tracing a bug that crosses API routes, middleware, database queries, and frontend components. Normal mode might only see 2-3 of these layers. Max Mode sees all of them.
Architecture questions
Asking “how does auth flow through this app?” or “what would break if I changed the User schema?” requires understanding the full dependency graph, not just the file you have open.
Generating code with many dependencies
Building a new feature that imports from 8 different modules. The model needs to see the actual types and function signatures, not guess at them. Max Mode reduces hallucinated imports.
When Normal Mode Is Better
Single-file edits
Fixing a bug in one function, adding a new method to a class, writing a unit test for code that's self-contained. Normal mode has all the context it needs.
Quick questions
“What does this regex do?” or “Is there a simpler way to write this loop?” These don't need 200K tokens of codebase context.
Code completions
Tab completions and inline suggestions work the same regardless of mode. Max Mode doesn't improve autocomplete.
Iterative small changes
When you're making a series of small, focused edits in the same file, normal mode is faster and costs nothing extra.
Cost Impact
Max Mode uses per-token API pricing with a 20% margin. Your Cursor subscription includes a credit pool for normal mode requests. Max Mode draws from that same pool but at a much higher rate.
| Model | Normal Mode Cost | Max Mode Cost | Multiplier |
|---|---|---|---|
| Claude Sonnet 4 | ~1 credit | ~5 credits | ~5x |
| Claude Opus 4 | ~2 credits | ~20-40 credits | ~10-20x |
| GPT-5 | ~1 credit | ~3-5 credits | ~3-5x |
| Gemini 3 Pro | ~1 credit | ~2-4 credits | ~2-4x |
On the Pro plan ($20/month), your credit pool supports roughly 225 normal Claude Sonnet requests. In Max Mode with Sonnet, that same pool covers about 45 requests. With Opus in Max Mode, you might get 5-10 requests before your credits are gone.
Credit Pool Math
A developer doing 10 Max Mode queries per day with Claude Sonnet 4 uses roughly 50 credits/day. On the $20/month Pro plan, that exhausts the entire pool in about 4-5 days. Pro+ ($60/month, 3x credits) extends this to around 13 days. Ultra ($200/month, 20x credits) can sustain heavy Max Mode use all month. If you use Max Mode regularly, Pro+ is the minimum viable plan. See Cursor Model Pricing for full breakdowns.
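The credit math above is easy to sanity-check yourself. This sketch uses the approximate per-request credit figures quoted in this guide, not official Cursor pricing, so treat the numbers as illustrative:

```python
# Back-of-envelope credit math for Max Mode usage.
# Credit figures are the rough estimates from this guide, not official pricing.

PRO_POOL = 225          # ~225 Sonnet-sized credits on the $20/mo Pro plan
SONNET_MAX_COST = 5     # ~5 credits per Max Mode request with Claude Sonnet 4

def days_until_exhausted(pool: float, queries_per_day: int, cost_per_query: float) -> float:
    """How many days a credit pool lasts at a given daily Max Mode usage."""
    return pool / (queries_per_day * cost_per_query)

# 10 Max Mode Sonnet queries/day on Pro: 225 / 50 = 4.5 days
print(days_until_exhausted(PRO_POOL, 10, SONNET_MAX_COST))        # → 4.5

# Pro+ has 3x the credit pool: 675 / 50 = 13.5 days
print(days_until_exhausted(PRO_POOL * 3, 10, SONNET_MAX_COST))    # → 13.5
```

Swap in the Opus figures (~20-40 credits per Max Mode request) and the Pro pool lasts less than a day at the same usage rate, which is why heavy Max Mode users end up on Pro+ or Ultra.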
Common Problems
Quota exhaustion
The most common complaint. Users enable Max Mode for one complex task, forget to turn it off, and burn through their entire credit pool on routine edits that didn't need it. Max Mode doesn't auto-disable.
Slower responses
More context means more tokens to process. Max Mode requests with Claude take 2-4x longer than normal mode. With Opus, response times can exceed 30 seconds. For iterative workflows where you're waiting on each response, this adds up.
Diminishing returns
More context doesn't always help. LLMs are worse at finding relevant information buried in the middle of long contexts (the “lost in the middle” problem). Dumping 200K tokens of code can actually cause the model to miss the important file.
Context confusion
When the model sees 50 files instead of 5, it sometimes imports from the wrong module, confuses similarly-named functions, or applies patterns from one part of the codebase to another where they don't fit.
The Toggle Trap
Max Mode stays on until you explicitly turn it off. There is no auto-disable after a session ends or after your credits drop below a threshold. If you enable Max Mode for a complex refactor at 2pm, every request for the rest of the day uses Max Mode pricing unless you remember to switch back. Check the model dropdown before starting any new conversation.
Alternatives to Max Mode
Max Mode solves a real problem: the model doesn't have enough context to do its job well. But there are other ways to get relevant context into the model without paying the full Max Mode tax.
@file and @folder mentions
Type @filename in the chat to pull specific files into context. This is surgical: you choose exactly which files the model sees. For a cross-file refactor, mentioning the 5 relevant files costs far less than Max Mode loading 50.
.cursor/rules/ files
Persistent instructions that load automatically based on glob patterns. Put architectural constraints, naming conventions, and module boundaries here. The model gets project knowledge without needing to read every file.
Semantic code search (WarpGrep)
Instead of dumping your entire codebase into the context window, use a search tool that finds and returns only the relevant code. WarpGrep runs as an MCP server and lets the agent search semantically, pulling in 10-20 relevant snippets instead of 200K tokens of everything.
Context compaction
Morph Compact summarizes long contexts into dense, information-preserving representations. When context is growing but you don't want to start a new session, compaction preserves the important information while freeing space. Works with Cursor and Claude Code.
The pattern that works best for most developers: stay in normal mode, use @-mentions to pull in specific files, keep .cursor/rules/ files for persistent project knowledge, and reach for Max Mode only when a task genuinely needs the model to see many files at once.
WarpGrep as a Max Mode Alternative
WarpGrep is a semantic code search MCP server that gives AI agents targeted codebase context. Instead of Max Mode's approach of “send everything,” WarpGrep lets the agent search for exactly what it needs: function definitions, type declarations, usage examples. On SWE-bench, agents using WarpGrep improved by ~4% while using 5.5x fewer tokens. It works inside Cursor, Claude Code, and any tool supporting MCP.
Morph Compact takes a different approach: instead of selecting better context upfront, it compresses existing context so you can fit more into the same window. These strategies complement each other. Use WarpGrep to find the right context, use Compact to fit more of it in.
Frequently Asked Questions
What is Cursor Max Mode?
Max Mode removes the context truncation that normal mode applies. Instead of sending ~10K-15K tokens of your codebase per request, Max Mode sends the full context window of the underlying model (up to 200K tokens for Claude). It also raises the tool call limit from 25 to 200 per interaction. It uses per-token API pricing with a 20% margin.
How much does Max Mode cost?
It depends on the model. Claude Sonnet 4 in Max Mode costs roughly 5x a normal request. Claude Opus 4 costs 10-20x. On the $20/month Pro plan, heavy Max Mode usage can exhaust your credit pool in 2-4 days. Pro+ ($60/month) or Ultra ($200/month) are better fits for regular Max Mode use.
Should I leave Max Mode on all the time?
No. Max Mode is useful for complex, multi-file tasks. For single-file edits, quick questions, and code completions, normal mode is faster and consumes far fewer credits. The most common complaint about Max Mode is accidental quota exhaustion from forgetting to turn it off.
Does Max Mode make responses better?
For tasks that need broad codebase awareness, yes. The model makes fewer hallucinated imports, understands type relationships better, and accounts for side effects across files. For simple tasks, the difference is negligible, and the extra context can actually introduce confusion (the “lost in the middle” problem with LLMs).
Get Better Context Without the Max Mode Cost
WarpGrep gives AI coding agents targeted codebase context through semantic search. Instead of sending your entire codebase to the model, it finds and returns only the relevant code. Works as an MCP server inside Cursor, Claude Code, and any compatible tool.