Warp Grep Benchmarks

Fast agentic code search performance on real-world repositories

Agent Capabilities Improvement

We ran the official SWE-bench evaluation with and without Warp Grep as the code search tool. All runs used Claude 4.5 Opus (20251101) as the base model.

The agent using Warp Grep consumed 39% fewer input tokens, required 26% fewer reasoning turns, and solved 10% more tasks—demonstrating that better search directly improves agent effectiveness.

Without Warp Grep → With Warp Grep:

Input Tokens: 14K → 9K (39% fewer)
Agent Turns: 35.0 → 26.0 (26% fewer)
Tasks Solved: 74.4% → 81.9% (10% more)

F1 Score Comparison

F1 score measures the balance between precision (the fraction of returned results that are relevant) and recall (the fraction of relevant results that are returned). Each system operated within its native harness and was given a maximum of 15 steps.
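
For reference, here is a minimal sketch of how precision, recall, and F1 relate, treating the returned and ground-truth results as sets of file paths. The set-of-paths framing and the function below are illustrative, not the benchmark's actual scoring code.

```python
def f1_score(returned: set[str], relevant: set[str]) -> float:
    """Harmonic mean of precision and recall over result sets."""
    if not returned or not relevant:
        return 0.0
    true_positives = len(returned & relevant)
    precision = true_positives / len(returned)  # share of returned results that are relevant
    recall = true_positives / len(relevant)     # share of relevant results that were found
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: 3 of the 4 returned files are relevant, out of 5 relevant files total.
print(f1_score({"a.py", "b.py", "c.py", "d.py"},
               {"a.py", "b.py", "c.py", "e.py", "f.py"}))
# precision = 0.75, recall = 0.60, F1 ≈ 0.67
```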

On surface-level search queries, agentic search and semantic search perform roughly on par: both find obvious symbols and patterns effectively, though semantic search completes in just 5s on average.

However, on queries requiring deeper logic (bug tracing, following code paths, understanding control flow), agentic search delivers 2x–6x the performance of single-query semantic search while maintaining comparable return times.
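
To make the distinction concrete, here is a hypothetical sketch contrasting a single-shot query with a multi-step, grep-driven search that follows the call path it uncovers. The Repo class, file contents, and scripted query sequence are all illustrative; in a real agentic search a model chooses each follow-up query.

```python
import re

class Repo:
    """Toy in-memory repository so the sketch runs end to end (illustrative only)."""
    def __init__(self, files: dict[str, str]):
        self.files = files

    def grep(self, pattern: str) -> list[tuple[str, str]]:
        """Return (path, line) pairs whose line matches the regex."""
        return [(path, line)
                for path, text in self.files.items()
                for line in text.splitlines()
                if re.search(pattern, line)]

def multi_step_search(repo: Repo, queries: list[str], max_steps: int = 15) -> list[tuple[str, str]]:
    """Agentic-style search: run a query, read the results, issue a refined query.
    The refinement here is scripted; in practice a model picks each next query."""
    findings: list[tuple[str, str]] = []
    for query in queries[:max_steps]:
        findings.extend(repo.grep(query))
    return findings

repo = Repo({
    "billing.py": "def charge(user):\n    return apply_discount(user.total)\n",
    "discounts.py": "def apply_discount(total):\n    return total * 0.9\n",
})

# A single-shot query only surfaces the symptom site...
print(repo.grep(r"charge"))
# ...while a second step follows the call it revealed into discounts.py.
print(multi_step_search(repo, [r"charge", r"apply_discount"]))
```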

Average Steps to Complete
Warp Grep: 3.8
SWE Grep: 3.7
Claude Haiku: 12.4
Gemini Flash: 10.8
GLM 4.5: 14.5
mgrep: 1

Warp Grep achieves 73.0% F1 in just 3.8 steps, 3x fewer than comparable agentic approaches.

Build the best coding agents today

Join 500+ teams using Morph to reduce token costs and apply edits at lightning speed.

40k tok/s of prefill
The fastest way to find relevant context
Better • Faster • Cheaper Context Collection