Quick Comparison
Both SWE-grep and WarpGrep are RL-trained subagents that handle code search so the main coding model can focus on reasoning. They share the same design insight: context pollution kills agent performance, and a dedicated search subagent fixes it.
| Aspect | SWE-grep | WarpGrep v2 |
|---|---|---|
| Built by | Cognition (Windsurf/Devin) | Morph |
| Training method | Multi-turn RL with policy gradient | RL for parallel tool formats |
| Parallel tool calls | 8 per turn | Up to 36 per search |
| Max turns | 4 | 4 |
| Inference speed | 2,800+ tok/s (mini, Cerebras) | Sub-6s per search |
| Availability | Windsurf only | MCP server (any agent) |
| SWE-Bench Pro impact | Not published | +2.1 to +3.7 points |
| Pricing | Included in Windsurf plan | $0.80/1M tokens (in + out) |
| Models | SWE-grep, SWE-grep-mini | WarpGrep v2 (single model) |
| Tool set | grep, read, glob | grep, read, glob + file ops |
What Is SWE-grep
Cognition built SWE-grep after measuring that coding agents spend over 60% of their first turn retrieving context. The model was trained with multi-turn reinforcement learning using a custom policy gradient approach. The reward function is an average of weighted F1 scores over file retrieval and line retrieval tasks, with precision weighted higher than recall. The reasoning: polluting the main agent's context is worse than missing a file, because the agent can always search again.
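A reward of this shape can be sketched as an F-beta score with beta below 1, which weights precision above recall. This is an illustrative assumption, not Cognition's published formula; the function names and the 50/50 averaging of file-level and line-level scores are mine:

```python
def fbeta(precision: float, recall: float, beta: float = 0.5) -> float:
    """F-beta score; beta < 1 weights precision higher than recall."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)

def retrieval_reward(pred_files, gold_files, pred_lines, gold_lines):
    """Average precision-weighted F score over file- and line-level retrieval.

    A sketch of the reward described above: missing a file lowers recall,
    but returning irrelevant files lowers precision, which costs more.
    """
    def prf(pred, gold):
        pred, gold = set(pred), set(gold)
        if not pred or not gold:
            return 0.0
        hits = len(pred & gold)
        return fbeta(hits / len(pred), hits / len(gold))

    return 0.5 * (prf(pred_files, gold_files) + prf(pred_lines, gold_lines))
```

With beta = 0.5, a search that returns one correct file scores higher than one that returns the correct file buried among wrong ones, which is exactly the asymmetry the training rewards.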
SWE-grep runs 8 parallel tool calls per turn for a maximum of 4 turns. The parallel behavior emerged during training without explicit incentivization. The tools are restricted to grep, read, and glob for cross-platform compatibility.
SWE-grep-mini is a distilled variant with additional RL training. It serves at 2,800+ tokens per second on Cerebras hardware, 20x faster than Claude Haiku 4.5 at 140 tok/s. The full SWE-grep model runs at 650+ tok/s, 4.5x faster than Haiku 4.5.
Fast Context Subagent
The subagent activates automatically in Windsurf Cascade when a query requires code search, and can be triggered manually with Cmd+Enter (Mac) or Ctrl+Enter (Windows/Linux).
Precision Over Recall
The RL reward weights precision higher because context pollution is worse than a missing file. The agent can always search again in the next turn.
Cerebras Inference
Deployed on Cerebras WSE hardware for low-latency inference. SWE-grep-mini hits 2,800+ tok/s, enabling sub-second response times per tool call.
Two Model Variants
| Aspect | SWE-grep | SWE-grep-mini |
|---|---|---|
| Speed | 650+ tok/s | 2,800+ tok/s |
| Optimized for | Complex retrieval tasks | Maximum speed |
| Training | Multi-turn RL | Distillation from SWE-grep + additional RL |
| vs Haiku 4.5 (140 tok/s) | 4.5x faster | 20x faster |
What Is WarpGrep
WarpGrep v2 is an RL-trained code search subagent built by Morph. It separates search from reasoning: the main coding model delegates search to WarpGrep, which runs in its own context window, issues parallel tool calls, and returns only the relevant file spans. The main model never sees the search noise.
WarpGrep completes most searches in under 6 seconds, executing up to 36 grep/read tool calls per search. It finds relevant code in an average of 3.8 steps and returns precise (file, [start_line, end_line]) spans rather than entire files. This keeps the main agent's context clean and its token budget intact.
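A span of this shape can be modeled as a small record type. This is a generic sketch of the idea, not WarpGrep's actual wire format; the class and field names are assumptions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CodeSpan:
    """A (file, [start_line, end_line]) span, as a search subagent might return."""
    file: str
    start_line: int  # 1-indexed, inclusive
    end_line: int    # 1-indexed, inclusive

    def read(self) -> str:
        """Load only the spanned lines, never the whole file."""
        with open(self.file) as f:
            lines = f.readlines()
        return "".join(lines[self.start_line - 1 : self.end_line])
```

Reading only the span is the point: a 5,000-line file contributes a few dozen lines to the main model's context instead of thousands.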
SWE-Bench Pro Results
WarpGrep v2 lifts every model it is paired with on SWE-Bench Pro:

| Model | Baseline | With WarpGrep v2 |
|---|---|---|
| Opus 4.6 | 55.4% | 57.5% |
| GPT-5.3-Codex | 56.0% | 59.1% |
| MiniMax 2.5 | 55.4% | 57.6% |
Beyond accuracy, WarpGrep v2 reduces input tokens by 17%, output tokens by 13%, and cuts Opus 4.6 per-task cost from $3.06 to $2.51, a 15.6% reduction. Wall-clock time drops 28% on production repositories.
Architecture Comparison
Both tools follow the same core pattern: a small, fast model handles search so the large, expensive model handles reasoning. The architectural differences are in parallelism, tool sets, and how they integrate with the coding agent.
Search Loop
SWE-grep runs 8 parallel calls per turn across 4 turns, for a theoretical maximum of 32 tool calls per search. The tool set is restricted to grep, read, and glob. Cognition designed the tool set for cross-platform compatibility, ensuring the same behavior on macOS, Linux, and Windows.
WarpGrep issues up to 36 tool calls in under 5 seconds across 4 turns. The tool set includes grep, read, glob, and additional file operations. WarpGrep returns (file, [start_line, end_line]) spans, so the main agent gets precisely scoped code rather than full files.
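The turn-based loop both tools use can be sketched as follows. This is a generic illustration, not either vendor's implementation: each turn fans out a batch of tool calls concurrently, and the loop ends at the turn budget or when the model decides it has enough:

```python
from concurrent.futures import ThreadPoolExecutor

def search_loop(plan_turn, run_tool, max_turns=4, max_parallel=8):
    """Run up to max_turns rounds of parallel tool calls.

    plan_turn(history) -> list of tool calls for this turn ([] = done)
    run_tool(call)     -> result of one grep/read/glob call
    """
    history = []
    for _ in range(max_turns):
        calls = plan_turn(history)[:max_parallel]
        if not calls:
            break
        # Fan the whole batch out concurrently: the turn costs one
        # round-trip of latency, not eight sequential ones.
        with ThreadPoolExecutor(max_workers=max_parallel) as pool:
            results = list(pool.map(run_tool, calls))
        history.extend(zip(calls, results))
    return history
```

With `max_turns=4` and `max_parallel=8` this matches SWE-grep's 32-call ceiling; the parallelism is why both tools finish in seconds rather than minutes.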
Context Pollution Prevention
Both tools exist because of the same insight: when a frontier model like Claude Opus searches a codebase itself, every file it reads goes into its context window. After 10-15 searches, the context is full of irrelevant code, and the model starts hallucinating file paths and misattributing functions. A dedicated subagent with its own context window absorbs that noise and returns only the signal.
Cognition frames this as "preserving the context budget and intelligence for the main agent"; Morph calls it "separating reasoning from search." Same idea, different phrasing.
RL Training
SWE-grep uses multi-turn RL with per-sequence importance sampling, leave-one-out baseline variance reduction, and trajectory masking for overlong sequences. The reward is weighted F1 over file and line retrieval tasks. The parallel tool-calling behavior emerged naturally during training.
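The leave-one-out baseline mentioned above is a standard variance-reduction trick in policy gradient methods: each trajectory's advantage is its reward minus the mean reward of the other trajectories in the same group. A minimal sketch of that computation (the rest of the training loop is omitted):

```python
def leave_one_out_advantages(rewards):
    """Advantage per trajectory: its reward minus the mean of all the others.

    Using the other trajectories as the baseline keeps the estimate unbiased
    (a trajectory's own reward never appears in its baseline) while reducing
    gradient variance. Requires at least two trajectories per group.
    """
    n = len(rewards)
    total = sum(rewards)
    return [r - (total - r) / (n - 1) for r in rewards]
```

A useful property: the advantages in each group always sum to zero, so above-average searches are reinforced exactly as much as below-average ones are discouraged.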
WarpGrep was trained with RL for parallel tool formats, optimizing for both retrieval accuracy and search efficiency. Morph describes the approach as combining "logical breadth + logical depth" to balance exploration width with targeted drilling.
Benchmark Results
Direct comparison is difficult because the two tools publish different metrics. SWE-grep reports tokens per second and weighted F1 on retrieval tasks. WarpGrep reports SWE-Bench Pro lift, cost reduction, and time savings. Neither publishes the other's preferred metric.
| Metric | SWE-grep | WarpGrep v2 |
|---|---|---|
| SWE-Bench Pro impact | Not published | +2.1 to +3.7 points across 3 models |
| Inference speed (tok/s) | 2,800+ (mini), 650+ (full) | Not published |
| Search latency claim | 20x faster than Haiku 4.5 | <6s per search |
| Token reduction | Not published | 17% fewer input tokens |
| Cost reduction | Not published | 15.6% (Opus 4.6) |
| Time reduction | Not published | 28% (production repos) |
| Retrieval F1 | Matches frontier models (exact score unpublished) | 0.73 F1 in 3.8 steps |
Benchmark transparency gap
Hacker News commenters flagged that Cognition has not released the benchmark code or dataset for SWE-grep, making independent verification difficult. One commenter wrote: "please release the benchmark or the benchmark code. Like this is just 'trust me bro.'" WarpGrep publishes SWE-Bench Pro traces through the SEAL leaderboard, which uses Scale AI's standardized scaffolding.
Availability and Integration
This is the sharpest difference. SWE-grep is locked to Windsurf. WarpGrep works everywhere MCP does.
| Integration | SWE-grep | WarpGrep v2 |
|---|---|---|
| Windsurf | Built-in (Fast Context) | Via MCP |
| Claude Code | Not available | MCP server |
| Cursor | Not available | MCP server |
| Codex CLI | Not available | MCP server |
| VS Code Copilot | Not available | MCP server |
| Custom agents (SDK) | Not available | TypeScript SDK + raw API |
| GitHub search | Not available | Public repos without cloning |
Cognition deploys SWE-grep and SWE-grep-mini across DeepWiki, Devin, and Windsurf Tab internally. Developers on Hacker News have requested an MCP server or API release, but Cognition has not announced a timeline.
WarpGrep ships as an MCP server installable in one command. It also has a TypeScript SDK for programmatic integration and a raw API protocol for non-Node environments. GitHub search mode can query public repositories without cloning them locally.
Pricing
| Aspect | SWE-grep | WarpGrep v2 |
|---|---|---|
| Pricing model | Bundled with Windsurf subscription | Per-token ($0.80/1M input + output) |
| Free tier | Yes (Windsurf free plan includes Fast Context) | Yes |
| Windsurf Pro ($15/mo) | Included | Separate cost |
| Standalone access | No | Yes (MCP server, SDK, API) |
| Net cost impact | No marginal cost (within Windsurf) | Saves 15.6% on main model cost (offsets own token cost) |
SWE-grep has a simpler cost story: if you use Windsurf, you already have it. WarpGrep charges per token but saves more than it costs. On Opus 4.6 tasks, WarpGrep reduces per-task cost from $3.06 to $2.51. The $0.55 saving covers WarpGrep's own token cost and leaves a net reduction.
When to Use Each
Use SWE-grep when...
You already use Windsurf as your primary IDE. Fast Context activates automatically, no configuration needed. On Cerebras hardware, SWE-grep-mini's 2,800 tok/s inference is difficult to beat on raw speed. If Windsurf is your daily driver, SWE-grep is the zero-friction option.
Use WarpGrep when...
You use Claude Code, Cursor, Codex, or any other MCP-compatible agent. WarpGrep works across all of them with one installation. If you run benchmarks or care about reproducible SWE-Bench Pro scores, WarpGrep publishes its traces on the SEAL leaderboard. If you build custom agent systems, the TypeScript SDK and raw API give you programmatic control.
| Situation | Best Choice | Why |
|---|---|---|
| Windsurf-only workflow | SWE-grep | Built-in, no setup, no extra cost |
| Claude Code user | WarpGrep | MCP server, direct integration |
| Cursor user | WarpGrep | MCP server, works in Cursor agent mode |
| Multi-agent system builder | WarpGrep | SDK + API for programmatic integration |
| Maximum raw search speed | SWE-grep-mini | 2,800 tok/s on Cerebras is the fastest published number |
| Verified benchmark impact | WarpGrep | Published SWE-Bench Pro traces on SEAL leaderboard |
| Budget-constrained | Depends | SWE-grep is free in Windsurf free tier; WarpGrep saves 15.6% on main model cost |
Frequently Asked Questions
What is SWE-grep?
An RL-trained code retrieval model built by Cognition. It runs 8 parallel tool calls per turn for up to 4 turns, with SWE-grep-mini serving at 2,800+ tok/s on Cerebras. It powers Windsurf's Fast Context subagent and is currently only available within Windsurf.
What is WarpGrep?
An RL-trained code search subagent built by Morph. It runs up to 36 tool calls in under 5 seconds, lifts every model by 2.1-3.7 points on SWE-Bench Pro, and cuts cost by 15.6%. Ships as an MCP server for Claude Code, Cursor, Codex, and any MCP-compatible agent.
Can I use SWE-grep outside Windsurf?
Not currently. SWE-grep is available to Windsurf individual users (free and paid). Cognition has not released an API or MCP server. Developers have requested this on Hacker News, but no timeline has been announced.
How does WarpGrep improve SWE-Bench Pro scores?
By separating search from reasoning. The main model delegates code search to WarpGrep, which runs in its own context window. The main model receives only relevant file spans, keeping its context clean. Opus 4.6 goes from 55.4% to 57.5% (+2.1). GPT-5.3-Codex goes from 56.0% to 59.1% (+3.1). MiniMax 2.5 goes from 55.4% to 57.6% (+3.7).
Which is faster?
SWE-grep-mini publishes 2,800+ tok/s on Cerebras. WarpGrep publishes sub-6-second end-to-end search time. They measure different things, so direct comparison requires testing both on the same codebase. Both are significantly faster than using a frontier model for search.
Try WarpGrep in Your Agent
Install WarpGrep as an MCP server in Claude Code, Cursor, or Codex. One command, sub-6-second searches, 2.1-3.7 point lift on SWE-Bench Pro. Works with any MCP-compatible agent.
Sources
- Introducing SWE-grep and SWE-grep-mini (Cognition)
- Fast Context Documentation (Windsurf)
- Case Study: Cognition x Cerebras (Cerebras)
- SWE-grep HN Discussion (Hacker News)
- WarpGrep v2 Launch (Y Combinator)
- WarpGrep: Fast, Parallel Code Retrieval with RL (Morph)
- WarpGrep Documentation (Morph)
- SWE-Bench Pro Public Leaderboard (SEAL / Scale AI)
- Cognition, Windsurf Launch SWE-grep Duo (TestingCatalog)