Cursor's Context Window Limits
Cursor does not have a single context window. The effective limit depends on which model you're using, which mode you're in, and how much of the window Cursor consumes internally for system prompts, codebase indexing, and file management.
| Mode | Typical Model | Advertised Window | Effective User Tokens |
|---|---|---|---|
| Tab (autocomplete) | Cursor-small | ~8K | ~2K-4K |
| Chat | Claude Sonnet 4.6 | 200K | ~40K-60K |
| Composer | Claude Sonnet 4.6 | 200K | ~30K-50K |
| Agent | Claude Sonnet 4.6 / GPT-4.1 | 200K-1M | ~40K-80K |
The gap between advertised and effective matters. When you select Claude Sonnet 4.6 in Cursor, you're not getting 200K tokens for your code. Cursor uses tokens for its system prompt, codebase index results, conversation history, and file contents it automatically includes. What remains for your actual request varies, but it's consistently a fraction of the advertised window, often less than a third.
Why Context Windows Matter for Coding Agents
A context window is the total number of tokens a model can process in a single call. For coding agents, this is the hard ceiling on how much code, conversation history, and tool output the model can see at once.
A 10,000-line TypeScript project runs roughly 200K tokens. If your effective window is 50K tokens, the model sees 25% of your codebase per request. It will miss imports from files it can't see. It will forget decisions from earlier in the conversation. It will produce edits that break dependencies it doesn't know exist.
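The arithmetic above can be sketched directly. The figures are illustrative, taken from this article's own estimate of roughly 20 tokens per line of TypeScript:

```python
# Rough token-budget arithmetic for a coding agent.
# Assumption (from the article's estimate): ~20 tokens per line of code.
TOKENS_PER_LINE = 20

def visible_fraction(project_lines: int, effective_window_tokens: int) -> float:
    """Fraction of the codebase the model can see in one request."""
    project_tokens = project_lines * TOKENS_PER_LINE
    return min(1.0, effective_window_tokens / project_tokens)

# A 10,000-line project against a 50K-token effective window:
print(visible_fraction(10_000, 50_000))  # 0.25 -> the model sees 25%
```

Everything outside that 25% is invisible to the model on that request, which is exactly where broken imports and missed dependencies come from.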
Context Rot: Performance Degrades Before You Hit the Limit
Bigger windows don't solve the problem. Chroma's context rot research tested 18 frontier models including Claude Opus 4, GPT-4.1, and Gemini 2.5. Every model got worse as context grew. The degradation starts early and compounds.
Lost in the Middle
Liu et al. at Stanford found LLMs perform 30%+ worse when relevant information sits in the middle of the context rather than at the beginning or end. Performance follows a U-shaped curve. In a long Cursor session, your most recent edits and your initial instructions get attention. Everything in between gets degraded.
Attention Dilution
Transformer self-attention scales quadratically with sequence length. At 10K tokens, the model tracks 100 million pairwise relationships. At 100K tokens, that's 10 billion. More tokens in the window means less attention per token. The model simply can't attend to relevant code as effectively when the context is packed full.
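The quadratic growth is easy to verify. With full self-attention, every token attends to every other token, so the number of pairwise relationships is the square of the context length:

```python
# Pairwise attention relationships grow quadratically with context length.
def attention_pairs(tokens: int) -> int:
    # Full self-attention: every token attends to every token, including itself.
    return tokens * tokens

print(attention_pairs(10_000))   # 100000000      -> 100 million pairs
print(attention_pairs(100_000))  # 10000000000    -> 10 billion pairs
```

A 10x longer context means 100x more relationships competing for the same attention budget.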
The 35-Minute Wall
Cognition measured that agent success rates decrease after 35 minutes of continuous operation. Doubling task duration quadruples the failure rate. This isn't a model capability problem. The models are smart enough. The context just fills up with noise faster than the agent can use it.
How Cursor Handles Context
Cursor uses several mechanisms to manage what goes into the model's context window. Understanding these helps explain both the strengths and the limits.
Codebase Indexing
Cursor indexes your project locally using embeddings. When you ask a question or request an edit, it retrieves relevant code snippets via similarity search (RAG). This means the model gets context about files you didn't explicitly open. The tradeoff: retrieved chunks consume window tokens, and the retrieval isn't always accurate for cross-file type dependencies or implicit contracts.
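The retrieval mechanism can be sketched in miniature. Cursor's actual chunking, embedding model, and index are proprietary; this toy version uses bag-of-words vectors and cosine similarity purely to show how embedding-based snippet retrieval ranks files against a query:

```python
# Toy sketch of embedding-based snippet retrieval (RAG). Cursor's real
# implementation is proprietary; this only demonstrates the mechanism.
import math
from collections import Counter

def embed(text: str) -> Counter:
    # Stand-in "embedding": a bag-of-words term-frequency vector.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, chunks: dict[str, str], k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(chunks, key=lambda name: cosine(q, embed(chunks[name])), reverse=True)
    return ranked[:k]

# Hypothetical file snippets for illustration:
chunks = {
    "auth.ts": "export function verifyJwt(token) { /* jwt auth middleware */ }",
    "db.ts": "export function connect(url) { /* postgres connection pool */ }",
    "routes.ts": "router.post('/login', jwt auth handler)",
}
print(retrieve("fix the jwt auth middleware", chunks))
```

Note the failure mode the article describes: lexical or embedding similarity finds files that mention "auth", but it has no notion of type dependencies, so a file that merely imports a relevant interface can rank below one that happens to share vocabulary.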
@-Mentions
You can explicitly include files, folders, or docs in your prompt with @file, @folder, or @docs. This gives you direct control over what the model sees, bypassing the automatic retrieval. Useful when you know exactly which files matter. Less useful when the relevant code spans 15 files across 4 directories.
.cursorrules and Persistent Instructions
The .cursorrules file (or .cursor/rules directory) lets you define project-level instructions that persist across every prompt. Coding conventions, architectural patterns, dependency preferences. These tokens are always present in the context, which means they reduce the space available for code, but they prevent you from repeating the same instructions every time.
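A minimal .cursorrules file might look like the following. The conventions shown are hypothetical examples for illustration, not recommendations from Cursor:

```
# Project conventions (injected into every prompt)
- Use TypeScript strict mode; never use `any`.
- Prefer named exports over default exports.
- API handlers live in src/routes/; shared logic in src/lib/.
- Validate request bodies with Zod, not hand-rolled checks.
```

Every line here costs window tokens on every single request, so keep the file short and high-signal.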
Composer and Agent Mode Context
Cursor's Composer and Agent modes operate across multiple files, which increases context pressure. The agent reads files, runs commands, and accumulates tool outputs, all consuming the same finite window. A multi-step refactor that touches 8 files will fill the context faster than a single-file edit. By the third or fourth step, the agent is operating with degraded awareness of what it did in step one.
Workarounds Within Cursor
These won't eliminate the context window constraint, but they help you get more out of the space you have.
Keep Files Small
Files under 300 lines fit more cleanly in context. Split large files into focused modules. A 2,000-line utils.ts wastes tokens on the 1,800 lines the model doesn't need for your current edit.
Use @-Mentions Deliberately
Instead of letting Cursor auto-retrieve, explicitly @mention the 2-3 files that matter. This gives you control over context allocation and prevents the window from filling with tangentially related code.
Write Better Prompts
Specific prompts produce shorter, more targeted context retrieval. 'Refactor the auth middleware in src/middleware/auth.ts to use JWT instead of session tokens' retrieves less noise than 'fix the auth system'.
Reset Conversations Frequently
Long conversations accumulate stale context. Starting a new chat for each distinct task gives the model a clean window. You lose conversation continuity, but you gain context quality.
These are all forms of manual context engineering. They work, but they shift cognitive load onto you. Every decision about which files to include, when to reset, and how to phrase your prompt is a decision the tool could be making automatically.
The Compaction Approach
The workarounds above manage context by excluding information. Compaction takes a different approach: keep everything, but make it smaller.
Morph Compact runs at 33,000 tok/s on a custom inference engine. It reads your context, removes filler tokens (boilerplate, redundant comments, verbose formatting, repeated patterns), and outputs a compressed version where every surviving sentence is verbatim from the original. No paraphrasing. No summarization. No information rewriting.
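The "delete, never paraphrase" rule can be illustrated with a toy line-level filter. Morph Compact's actual classifier and inference engine are far more sophisticated and proprietary; this sketch only demonstrates the invariant that surviving text is byte-for-byte identical to the original:

```python
# Toy illustration of verbatim compaction: drop low-information lines,
# but never rewrite what survives. Not Morph Compact's actual algorithm.
def compact(source: str) -> str:
    kept = []
    for line in source.splitlines():
        stripped = line.strip()
        if not stripped:
            continue  # blank line: pure filler
        if stripped.startswith(("//", "#")) and "TODO" not in stripped:
            continue  # decorative comment: filler (keep TODOs, they carry intent)
        kept.append(line)  # survivors are verbatim, character for character
    return "\n".join(kept)

code = """\
// utility helpers

// adds two numbers
function add(a, b) {
  return a + b;  // TODO: overflow check
}
"""
print(compact(code))
```

The output contains only lines that appeared in the input, unchanged, which is what lets a downstream model still quote exact signatures and error strings.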
What This Means for Cursor Users
A 50K effective context window becomes 100K-150K of actual code content after compaction. A multi-file refactor that previously lost coherence after 4 files can now track 8-12 files with the same context budget. The model sees more of your codebase per request without any change to the underlying window size.
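The capacity math behind that claim: if compaction shrinks text by a given ratio, the window holds the original content divided by what remains. These numbers use the article's stated 50-70% compression range:

```python
# Capacity after compaction. Compacted size = original * (1 - ratio),
# so original content that fits = window / (1 - ratio).
def content_capacity(window_tokens: int, compression_ratio: float) -> int:
    return int(window_tokens / (1 - compression_ratio))

print(content_capacity(50_000, 0.5))   # 100000 -> 2x at 50% compression
print(content_capacity(50_000, 0.66))  # ~147K  -> roughly 3x at 66% compression
```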
Compaction vs Summarization
Summarization rewrites your context in the model's own words. Factory's evaluation scored summarization 3.4-3.7/5 on accuracy. Compaction deletes tokens but never changes surviving text. The difference matters when the model needs to reference exact function signatures, variable names, or error messages from earlier in the session.
Proactive vs Reactive Compaction
Most tools compact reactively, triggering when context hits 95% capacity. By that point, performance has already degraded. The model has been operating with a bloated context for the entire session, making worse decisions at each step. Proactive compaction runs continuously, keeping the context lean throughout. There's no quality cliff because the context never reaches the cliff.
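The two trigger policies can be contrasted as a sketch. The numbers are illustrative, not taken from any specific tool:

```python
# Reactive vs proactive compaction triggers, sketched as two policies.
# WINDOW and the thresholds are illustrative assumptions.
WINDOW = 100_000

def reactive_should_compact(used_tokens: int) -> bool:
    # Fires only at the 95% cliff; the session runs bloated until then.
    return used_tokens >= 0.95 * WINDOW

def proactive_should_compact(used_tokens: int, last_compacted_at: int) -> bool:
    # Fires continuously, every ~10K new tokens, keeping context lean.
    return used_tokens - last_compacted_at >= 10_000

print(reactive_should_compact(80_000))           # False: 80% full, no relief yet
print(proactive_should_compact(80_000, 65_000))  # True: trims before degradation
```

The reactive policy never helps in the region where context rot does most of its damage; the proactive one never lets the session reach that region.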
How Other Tools Handle Context
Claude Code: Auto-Compact
Claude Code runs in the terminal with full file system access. When context reaches 95% capacity, auto-compact triggers and compresses the conversation history. This extends sessions but the 95% trigger point means performance has already degraded by the time compaction fires. The recent context awareness feature in Sonnet 4.6 gives the model a live token budget counter.
Aider: Tree-Sitter Repo Map
Aider takes a prevention-first approach. Instead of sending full file contents, it uses tree-sitter to parse your codebase into a structural map of functions, classes, and imports. Only the map goes into context. When the model needs to edit a specific file, Aider sends that file. This keeps baseline context small but requires accurate map-to-file retrieval.
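The repo-map idea can be sketched with a parser. Aider itself uses tree-sitter to cover many languages; as a self-contained stand-in, this sketch uses Python's built-in ast module to extract one file's structural skeleton, signatures without bodies:

```python
# Sketch of a structural repo map (in the spirit of Aider's tree-sitter map,
# but using Python's ast module as a stand-in parser).
import ast

def repo_map(source: str) -> list[str]:
    tree = ast.parse(source)
    entries = []
    for node in ast.walk(tree):
        if isinstance(node, ast.FunctionDef):
            args = ", ".join(a.arg for a in node.args.args)
            entries.append(f"def {node.name}({args})")
        elif isinstance(node, ast.ClassDef):
            entries.append(f"class {node.name}")
    return entries

# Hypothetical file contents for illustration:
source = """
class UserStore:
    def get(self, user_id):
        return self.db.fetch(user_id)

def hash_password(password, salt):
    return password + salt
"""
print(repo_map(source))
```

The map costs a handful of tokens per symbol instead of the full file, which is what keeps Aider's baseline context small; the tradeoff is that the model must correctly guess which file to request when it needs the body.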
Windsurf: Cascade Memory
Windsurf's Cascade system uses a memory layer that persists across conversations. It stores project context, prior decisions, and file relationships outside the model's context window. This reduces per-session context pressure but adds latency for memory retrieval and can introduce stale information if the codebase changes between sessions.
Copilot: Snippet Retrieval
GitHub Copilot uses a proprietary retrieval system that sends relevant code snippets to the model. The context window is relatively small (optimized for fast completions) but the retrieval is tightly integrated with the VS Code editor. Good for line-level completions, limited for multi-file reasoning.
Context Approach Comparison
| Approach | Cursor | Claude Code | Aider | Morph Compact |
|---|---|---|---|---|
| Context strategy | RAG + @-mentions | Full files + auto-compact | Repo map + selective files | Verbatim compaction |
| Effective window | ~50K tokens | ~180K tokens | ~120K tokens | 2-3x any tool's window |
| Compaction method | None built-in | 95% trigger, conversation compress | Tree-sitter structural map | Proactive, 33K tok/s, verbatim |
| Multi-file coherence | Degrades after 4-5 files | Good until auto-compact triggers | Good for structure, limited for content | Maintains across 8-12+ files |
| Manual context management | High (@-mentions, resets) | Medium (file selection) | Low (automatic repo map) | None (automatic) |
| Long session support | Degrades, manual resets needed | Auto-compact extends sessions | Stable for repo-map tasks | Continuous, no degradation cliff |
Frequently Asked Questions
What is Cursor's context window size?
It depends on the model and mode. Claude Sonnet 4.6 in Cursor has a 200K advertised window, but Cursor's internal token usage reduces the effective space to roughly 40K-60K usable tokens. Tab completion uses around 2K-4K. Agent mode with GPT-4.1 can reach 80K effective tokens. The gap between advertised and usable is typically 60-80%.
Why does Cursor lose context during long sessions?
Two compounding factors. The context window fills with accumulated conversation history, file contents, and tool outputs. Once full, older context gets dropped. Simultaneously, context rot degrades model performance at every length increment, meaning the model gets worse at using context even before the window is full.
How do I increase Cursor's context window?
You can't increase the raw window. You can use it more efficiently: @mention specific files instead of letting Cursor auto-retrieve, keep files under 300 lines, write specific prompts, and reset conversations between tasks. For a larger improvement, external compaction like Morph Compact gives you 2-3x more content in the same window by removing filler tokens.
What is context compaction?
Compaction removes unnecessary tokens (boilerplate, redundant comments, verbose formatting) while keeping every surviving sentence verbatim. Unlike summarization, it never rewrites or paraphrases. Morph Compact runs at 33,000 tok/s and achieves 50-70% compression. A 60K token context becomes 18K-30K tokens with the same information content.
Does Cursor support 1 million token context?
Cursor can use models with 1M windows (GPT-4.1, Gemini 2.5 Pro). The effective user tokens are still limited by Cursor's internal overhead. And context rot research shows performance degrades well before any window limit. A 1M window doesn't mean 1M tokens of useful context.
How does Cursor's codebase indexing work?
Cursor indexes your project using embeddings and retrieves relevant code snippets via similarity search when you make a request. This is RAG (retrieval-augmented generation) applied to your codebase. It helps find relevant code without manual file selection, but retrieved chunks consume context window tokens and retrieval accuracy drops for cross-file type relationships.
Is Cursor or Claude Code better for large codebases?
Claude Code uses auto-compact to compress context during long sessions and operates directly on the file system. For projects over 50K lines, this typically maintains better cross-file coherence than Cursor's RAG approach. Cursor's visual interface and inline editing are faster for focused, smaller edits. The best setup for large codebases is often Cursor for editing plus external compaction for context management.
What is the lost-in-the-middle effect?
A finding from Liu et al. at Stanford: LLMs perform 30%+ worse when relevant information is in the middle of the context window rather than at the beginning or end. In a Cursor session, this means your initial instructions and most recent edits get the most attention. Code context retrieved via indexing that lands in the middle of the prompt is disproportionately likely to be ignored by the model.
Related Articles
Get 10x More Code in Cursor's Context Window
Morph Compact compresses context 50-70% at 33,000 tok/s. Every surviving sentence stays verbatim. Your agents run longer, see more code, and make fewer mistakes.