SWE-grep vs WarpGrep: Two RL-Trained Search Subagents, Different Tradeoffs (2026)

SWE-grep runs 8 parallel tool calls per turn, with its mini variant serving at 2,800 tok/s on Cerebras, but it only works inside Windsurf. WarpGrep v2 lifts every model by 2.1-3.7 points on SWE-Bench Pro and works as an MCP server in any agent. We compared architecture, benchmarks, availability, and cost.

March 9, 2026

Quick Comparison

Both SWE-grep and WarpGrep are RL-trained subagents that handle code search so the main coding model can focus on reasoning. They share the same design insight: context pollution kills agent performance, and a dedicated search subagent fixes it.

- 8: SWE-grep parallel tool calls per turn
- 36: WarpGrep tool calls in under 5 seconds
- 2,800: SWE-grep-mini tokens per second
- +3.7: WarpGrep max SWE-Bench Pro lift

| Aspect | SWE-grep | WarpGrep v2 |
| --- | --- | --- |
| Built by | Cognition (Windsurf/Devin) | Morph |
| Training method | Multi-turn RL with policy gradient | RL for parallel tool formats |
| Parallel calls per turn | 8 | Up to 36 in <5s |
| Max turns | 4 | 4 |
| Inference speed | 2,800+ tok/s (mini, Cerebras) | Sub-6s per search |
| Availability | Windsurf only | MCP server (any agent) |
| SWE-Bench Pro impact | Not published | +2.1 to +3.7 points |
| Pricing | Included in Windsurf plan | $0.80/1M tokens (in + out) |
| Models | SWE-grep, SWE-grep-mini | WarpGrep v2 (single model) |
| Tool set | grep, read, glob | grep, read, glob + file ops |

What Is SWE-grep

Cognition built SWE-grep after measuring that coding agents spend over 60% of their first turn retrieving context. The model was trained with multi-turn reinforcement learning using a custom policy gradient approach. The reward function is an average of weighted F1 scores over file retrieval and line retrieval tasks, with precision weighted higher than recall. The reasoning: polluting the main agent's context is worse than missing a file, because the agent can always search again.
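
Cognition has not published the exact weighting, but a precision-weighted F1 is conventionally an F-beta score with beta < 1. A minimal sketch of what such a reward could look like; the beta value and the 50/50 file/line averaging are assumptions for illustration, not published details:

```python
def f_beta(predicted: set, relevant: set, beta: float = 0.5) -> float:
    """F-beta over retrieved items; beta < 1 weights precision above recall."""
    if not predicted or not relevant:
        return 0.0
    tp = len(predicted & relevant)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(relevant)
    return (1 + beta**2) * precision * recall / (beta**2 * precision + recall)


def retrieval_reward(pred_files, gold_files, pred_lines, gold_lines) -> float:
    # Average of the file-level and line-level scores (assumed equal weighting).
    return 0.5 * f_beta(pred_files, gold_files) + 0.5 * f_beta(pred_lines, gold_lines)
```

With beta = 0.5, precision counts roughly four times as much as recall, which matches the stated preference for searching again over polluting context.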

SWE-grep runs 8 parallel tool calls per turn for a maximum of 4 turns. The parallel behavior emerged during training without explicit incentivization. The tools are restricted to grep, read, and glob for cross-platform compatibility.
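
Mechanically, the loop is a bounded breadth-first search: each turn the model proposes a batch of calls, the batch executes concurrently, and the results inform the next turn. A minimal asyncio sketch; `propose_calls` and `run_tool` are hypothetical stand-ins for the model and the grep/read/glob executors:

```python
import asyncio

MAX_TURNS = 4
CALLS_PER_TURN = 8

async def search(query, propose_calls, run_tool):
    """Up to 4 turns of 8 concurrent tool calls each (at most 32 total)."""
    observations = []
    for _ in range(MAX_TURNS):
        calls = propose_calls(query, observations)[:CALLS_PER_TURN]
        if not calls:  # the model decides it has gathered enough context
            break
        results = await asyncio.gather(*(run_tool(c) for c in calls))
        observations.extend(results)
    return observations
```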

SWE-grep-mini is a distilled variant with additional RL training. It serves at 2,800+ tokens per second on Cerebras hardware, 20x faster than Claude Haiku 4.5 at 140 tok/s. The full SWE-grep model runs at 650+ tok/s, 4.5x faster than Haiku 4.5.

Fast Context Subagent

Fast Context activates automatically in Windsurf Cascade when a query requires code search, and can be triggered manually with Cmd+Enter (Mac) or Ctrl+Enter (Windows/Linux).

Precision Over Recall

The RL reward weights precision higher because context pollution is worse than a missing file. The agent can always search again in the next turn.

Cerebras Inference

Deployed on Cerebras WSE hardware for low-latency inference. SWE-grep-mini hits 2,800+ tok/s, enabling sub-second response times per tool call.

Two Model Variants

| Aspect | SWE-grep | SWE-grep-mini |
| --- | --- | --- |
| Speed | 650+ tok/s | 2,800+ tok/s |
| Optimized for | Complex retrieval tasks | Maximum speed |
| Training | Multi-turn RL | Distillation from SWE-grep + additional RL |
| vs Haiku 4.5 (140 tok/s) | 4.5x faster | 20x faster |

What Is WarpGrep

WarpGrep v2 is an RL-trained code search subagent built by Morph. It separates search from reasoning: the main coding model delegates search to WarpGrep, which runs in its own context window, issues parallel tool calls, and returns only the relevant file spans. The main model never sees the search noise.

WarpGrep completes most searches in under 6 seconds, executing up to 36 grep/read tool calls per search. It finds relevant code in an average of 3.8 steps and returns precise (file, [start_line, end_line]) spans rather than entire files. This keeps the main agent's context clean and its token budget intact.
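
As an illustration, such a span is just a file path plus a line range. The dataclass below and its 1-indexed, inclusive convention are assumptions for the sketch, not Morph's published schema:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    file: str
    start_line: int  # assumed 1-indexed, inclusive
    end_line: int

def read_span(span: Span) -> str:
    """Return only the referenced lines, so the caller never loads the whole file."""
    with open(span.file) as f:
        lines = f.readlines()
    return "".join(lines[span.start_line - 1 : span.end_line])
```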

SWE-Bench Pro Results

WarpGrep v2 lifts every model it is paired with on SWE-Bench Pro:

SWE-Bench Pro Scores: Baseline vs With WarpGrep v2

| Model | Baseline | With WarpGrep v2 |
| --- | --- | --- |
| Opus 4.6 | 55.4% | 57.5% |
| GPT-5.3 Codex | 56.0% | 59.1% |
| MiniMax 2.5 | 55.4% | 57.6% |

Beyond accuracy, WarpGrep v2 reduces input tokens by 17% and output tokens by 13%, and cuts Opus 4.6 per-task cost from $3.06 to $2.51; net of WarpGrep's own token cost, that works out to the reported 15.6% reduction. Wall-clock time drops 28% on production repositories.

- +2.1 to +3.7: SWE-Bench Pro lift
- 15.6%: cost reduction (Opus)
- 28%: time reduction
- 17%: fewer input tokens

Architecture Comparison

Both tools follow the same core pattern: a small, fast model handles search so the large, expensive model handles reasoning. The architectural differences are in parallelism, tool sets, and how they integrate with the coding agent.

Search Loop

SWE-grep runs 8 parallel calls per turn across 4 turns, for a theoretical maximum of 32 tool calls per search. The tool set is restricted to grep, read, and glob. Cognition designed the tool set for cross-platform compatibility, ensuring the same behavior on macOS, Linux, and Windows.

WarpGrep issues up to 36 tool calls in under 5 seconds across 4 turns. The tool set includes grep, read, glob, and additional file operations. WarpGrep returns (file, [start_line, end_line]) spans, so the main agent gets precisely scoped code rather than full files.

Context Pollution Prevention

Both tools exist because of the same insight: when a frontier model like Claude Opus searches a codebase itself, every file it reads goes into its context window. After 10-15 searches, the context is full of irrelevant code, and the model starts hallucinating file paths and misattributing functions. A dedicated subagent with its own context window absorbs that noise and returns only the signal.
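
The delegation boundary is easy to state in code. A sketch under stated assumptions: `subagent.search` runs its own tool loop in a separate context window and returns spans like the `Span` objects above, and `main_model.complete` is a hypothetical completion call, not either vendor's actual API:

```python
def solve(task, main_model, subagent):
    # The subagent's context window absorbs all the grep/read noise...
    spans = subagent.search(task)
    # ...and only the trimmed evidence crosses into the main model's window.
    evidence = "\n\n".join(
        f"{s.file}:{s.start_line}-{s.end_line}\n{read_span(s)}" for s in spans
    )
    return main_model.complete(prompt=task, context=evidence)
```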

SWE-grep calls this "preserving the context budget and intelligence for the main agent." WarpGrep calls it "separating reasoning from search." Same idea, different phrasing.

RL Training

SWE-grep uses multi-turn RL with per-sequence importance sampling, leave-one-out baseline variance reduction, and trajectory masking for overlong sequences. The reward is weighted F1 over file and line retrieval tasks. The parallel tool-calling behavior emerged naturally during training.
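
Of those pieces, the leave-one-out baseline is the most concrete: each sampled trajectory's advantage is its reward minus the mean reward of the other trajectories sampled for the same task, which cuts gradient variance without training a value function. A minimal sketch:

```python
import numpy as np

def loo_advantages(rewards: np.ndarray) -> np.ndarray:
    """Leave-one-out advantages for one group of sampled trajectories."""
    n = len(rewards)
    assert n > 1, "need at least two samples per task"
    baselines = (rewards.sum() - rewards) / (n - 1)  # mean of the *other* rewards
    return rewards - baselines
```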

WarpGrep was trained with RL for parallel tool formats, optimizing for both retrieval accuracy and search efficiency. Morph describes the approach as combining "logical breadth + logical depth" to balance exploration width with targeted drilling.

Benchmark Results

Direct comparison is difficult because the two tools publish different metrics. SWE-grep reports tokens per second and weighted F1 on retrieval tasks. WarpGrep reports SWE-Bench Pro lift, cost reduction, and time savings. Neither publishes the other's preferred metric.

| Metric | SWE-grep | WarpGrep v2 |
| --- | --- | --- |
| SWE-Bench Pro impact | Not published | +2.1 to +3.7 points across 3 models |
| Inference speed (tok/s) | 2,800+ (mini), 650+ (full) | Not published |
| Search latency claim | 20x faster than Haiku 4.5 | <6s per search |
| Token reduction | Not published | 17% fewer input tokens |
| Cost reduction | Not published | 15.6% (Opus 4.6) |
| Time reduction | Not published | 28% (production repos) |
| Retrieval F1 | Matches frontier models (exact score unpublished) | 0.73 F1 in 3.8 steps |

Benchmark transparency gap

Hacker News commenters flagged that Cognition has not released the benchmark code or dataset for SWE-grep, making independent verification difficult. One commenter wrote: "please release the benchmark or the benchmark code. Like this is just 'trust me bro.'" WarpGrep publishes SWE-Bench Pro traces through the SEAL leaderboard, which uses Scale AI's standardized scaffolding.

Availability and Integration

This is the sharpest difference. SWE-grep is locked to Windsurf. WarpGrep works everywhere MCP does.

| Integration | SWE-grep | WarpGrep v2 |
| --- | --- | --- |
| Windsurf | Built-in (Fast Context) | Via MCP |
| Claude Code | Not available | MCP server |
| Cursor | Not available | MCP server |
| Codex CLI | Not available | MCP server |
| VS Code Copilot | Not available | MCP server |
| Custom agents (SDK) | Not available | TypeScript SDK + raw API |
| GitHub search | Not available | Public repos without cloning |

Cognition deploys SWE-grep and SWE-grep-mini across DeepWiki, Devin, and Windsurf Tab internally. Developers on Hacker News have requested an MCP server or API release, but Cognition has not announced a timeline.

WarpGrep ships as an MCP server installable in one command. It also has a TypeScript SDK for programmatic integration and a raw API protocol for non-Node environments. GitHub search mode can query public repositories without cloning them locally.

Pricing

| Aspect | SWE-grep | WarpGrep v2 |
| --- | --- | --- |
| Pricing model | Bundled with Windsurf subscription | Per-token ($0.80/1M, input + output) |
| Free tier | Yes (Windsurf free plan includes Fast Context) | Free tier available |
| Windsurf Pro ($15/mo) | Included | Separate cost |
| Standalone access | No | Yes (MCP server, SDK, API) |
| Net cost impact | No marginal cost (within Windsurf) | Saves 15.6% net on main-model cost (offsets own token cost) |

SWE-grep has a simpler cost story: if you use Windsurf, you already have it. WarpGrep charges per token but saves more than it costs. On Opus 4.6 tasks, WarpGrep reduces per-task cost from $3.06 to $2.51. The $0.55 saving covers WarpGrep's own token cost and leaves a net reduction.
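
The arithmetic behind the net figure is straightforward to check. The per-task token volume below is a hypothetical back-solved value, not a published number; it simply shows how a $0.55 gross saving (about 18%) shrinks to a 15.6% net reduction once WarpGrep's own usage is billed:

```python
opus_baseline = 3.06                    # $/task, published
opus_with_warpgrep = 2.51               # $/task, published
gross_saving = opus_baseline - opus_with_warpgrep      # $0.55, ~18%

rate = 0.80 / 1_000_000                 # $/token, published WarpGrep pricing
warpgrep_tokens = 92_000                # hypothetical per-task volume (back-solved)
net_saving = gross_saving - warpgrep_tokens * rate
print(f"net: ${net_saving:.3f}/task = {net_saving / opus_baseline:.1%}")  # ~15.6%
```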

When to Use Each

Use SWE-grep when...

You already use Windsurf as your primary IDE. Fast Context activates automatically with no configuration needed. On Cerebras hardware, SWE-grep-mini's 2,800 tok/s inference is difficult to beat on raw speed. If Windsurf is your daily driver, SWE-grep is the zero-friction option.

Use WarpGrep when...

You use Claude Code, Cursor, Codex, or any other MCP-compatible agent. WarpGrep works across all of them with one installation. If you run benchmarks or care about reproducible SWE-Bench Pro scores, WarpGrep publishes its traces on the SEAL leaderboard. If you build custom agent systems, the TypeScript SDK and raw API give you programmatic control.

| Situation | Best choice | Why |
| --- | --- | --- |
| Windsurf-only workflow | SWE-grep | Built-in, no setup, no extra cost |
| Claude Code user | WarpGrep | MCP server, direct integration |
| Cursor user | WarpGrep | MCP server, works in Cursor agent mode |
| Multi-agent system builder | WarpGrep | SDK + API for programmatic integration |
| Maximum raw search speed | SWE-grep-mini | 2,800 tok/s on Cerebras is the fastest published number |
| Verified benchmark impact | WarpGrep | Published SWE-Bench Pro traces on SEAL leaderboard |
| Budget-constrained | Depends | SWE-grep is free in Windsurf free tier; WarpGrep saves 15.6% net on main-model cost |

Frequently Asked Questions

What is SWE-grep?

An RL-trained code retrieval model built by Cognition. It runs 8 parallel tool calls per turn for up to 4 turns, with SWE-grep-mini serving at 2,800+ tok/s on Cerebras. It powers Windsurf's Fast Context subagent and is currently only available within Windsurf.

What is WarpGrep?

An RL-trained code search subagent built by Morph. It runs up to 36 tool calls in under 5 seconds, lifts every model by 2.1-3.7 points on SWE-Bench Pro, and cuts cost by 15.6%. Ships as an MCP server for Claude Code, Cursor, Codex, and any MCP-compatible agent.

Can I use SWE-grep outside Windsurf?

Not currently. SWE-grep is available to Windsurf individual users (free and paid). Cognition has not released an API or MCP server. Developers have requested this on Hacker News, but no timeline has been announced.

How does WarpGrep improve SWE-Bench Pro scores?

By separating search from reasoning. The main model delegates code search to WarpGrep, which runs in its own context window. The main model receives only relevant file spans, keeping its context clean. Opus 4.6 goes from 55.4% to 57.5%, GPT-5.3 Codex from 56.0% to 59.1%, and MiniMax 2.5 from 55.4% to 57.6%.

Which is faster?

SWE-grep-mini publishes 2,800+ tok/s on Cerebras. WarpGrep publishes sub-6-second end-to-end search time. They measure different things, so direct comparison requires testing both on the same codebase. Both are significantly faster than using a frontier model for search.

Try WarpGrep in Your Agent

Install WarpGrep as an MCP server in Claude Code, Cursor, or Codex. One command, sub-6-second searches, 2.1-3.7 point lift on SWE-Bench Pro. Works with any MCP-compatible agent.
