Warp Grep Benchmarks
Fast agentic code search performance on real-world repositories
Agent Capability Improvements
We ran the official SWE-bench evaluation with and without Warp Grep as the code search tool. All runs used Claude Opus 4.5 (20251101) as the base model.
The agent using Warp Grep consumed 39% fewer input tokens, required 26% fewer reasoning turns, and solved 10% more tasks, demonstrating that better search directly improves agent effectiveness.
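For concreteness, here is a minimal sketch of how the paired comparison could be computed. The `RunSummary` fields and the two totals below are hypothetical placeholders; only the final percentage deltas are reported above.

```python
from dataclasses import dataclass

@dataclass
class RunSummary:
    input_tokens: int      # total input tokens consumed across all tasks
    reasoning_turns: int   # total agent reasoning turns
    tasks_solved: int      # SWE-bench tasks resolved

def pct_change(baseline: float, treatment: float) -> float:
    """Percentage change of the treatment run relative to the baseline."""
    return (treatment - baseline) / baseline * 100

# Illustrative numbers chosen to reproduce the reported deltas.
baseline = RunSummary(input_tokens=1_000_000, reasoning_turns=500, tasks_solved=100)
with_warp = RunSummary(input_tokens=610_000, reasoning_turns=370, tasks_solved=110)

print(f"input tokens:    {pct_change(baseline.input_tokens, with_warp.input_tokens):+.0f}%")      # -39%
print(f"reasoning turns: {pct_change(baseline.reasoning_turns, with_warp.reasoning_turns):+.0f}%") # -26%
print(f"tasks solved:    {pct_change(baseline.tasks_solved, with_warp.tasks_solved):+.0f}%")       # +10%
```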
F1 Score Comparison
F1 score balances precision (the fraction of returned results that are relevant) and recall (the fraction of relevant results that are returned). Each system operated within its native harness and was given a maximum of 15 steps.
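As a reference point, here is a minimal sketch of the F1 computation for a single query, assuming set-based relevance judgments over file paths; the benchmark's exact scoring details are not specified here.

```python
def f1_score(retrieved: set[str], relevant: set[str]) -> float:
    """Harmonic mean of precision and recall for one query."""
    if not retrieved or not relevant:
        return 0.0
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved)  # fraction of returned results that are relevant
    recall = hits / len(relevant)      # fraction of relevant results that were returned
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: 3 of 4 returned files are relevant, out of 5 relevant files total.
print(f1_score({"a.py", "b.py", "c.py", "d.py"},
               {"a.py", "b.py", "c.py", "e.py", "f.py"}))  # ~0.667
```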
On surface-level search queries, agentic search and semantic search perform roughly at par: both find obvious symbols and patterns effectively, though semantic search completes in just 5s on average.
However, on queries requiring deeper logic (bug tracing, following code paths, understanding control flow), agentic search delivers 2x–6x the performance of single-query semantic search while maintaining comparable return times.
Warp Grep achieves 73.0% F1 in just 3.8 steps, 3x fewer than comparable agentic approaches.
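To make the step-count contrast concrete, here is a hedged sketch of the loop structure being measured: semantic search answers with one query, while agentic search iterates grep-style queries until the hit set stabilizes. The toy corpus, `toy_grep`, and the refinement rule are all illustrative assumptions, not Warp Grep's actual implementation.

```python
import re

# Toy corpus standing in for a repository; all names here are illustrative.
CORPUS = {
    "auth.py": "def verify_token(token): ...",
    "routes.py": "user = verify_token(request.headers['X-Token'])",
    "tests/test_auth.py": "assert verify_token(expired_token) is None",
}

def toy_grep(pattern: str) -> list[str]:
    """One grep step: files whose contents match the regex pattern."""
    return sorted(p for p, text in CORPUS.items() if re.search(pattern, text))

def refine(hits: list[str]) -> str | None:
    """Hypothetical refinement: pivot to a symbol defined in the current hits."""
    for path in hits:
        if m := re.search(r"def (\w+)", CORPUS[path]):
            return m.group(1)
    return None

def agentic_search(seed: str, max_steps: int = 15) -> tuple[list[str], int]:
    """Re-query until the hit set stops changing (the benchmark caps runs at
    15 steps; Warp Grep averaged 3.8). Returns the hits and the steps used."""
    pattern, results, steps = seed, [], 0
    for steps in range(1, max_steps + 1):
        hits = toy_grep(pattern)
        if hits == results:  # converged: another step adds nothing new
            break
        results, pattern = hits, (refine(hits) or pattern)
    return results, steps

print(agentic_search("token"))
# (['auth.py', 'routes.py', 'tests/test_auth.py'], 2) on this toy corpus
```

The design point the sketch illustrates is that step count, not per-step latency, dominates cost: a search tool that converges in fewer refinement steps saves both tokens and reasoning turns.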
Build the best coding agents today
Join 500+ teams using Morph to reduce token costs and apply edits at lightning speed.