When you type @web in Cursor, the search goes to Exa. Not Google. Not Bing. A search engine built from scratch for machines instead of humans. That design choice explains most of what makes Exa different.
What Exa Is
Exa is a web search API designed for AI agents and LLMs. The core difference from Google, Bing, or any traditional search engine: Exa uses neural embeddings to understand query meaning, not keyword matching.
The company was founded in 2021 by Will Bryk and Jeff Wang, two Harvard roommates who watched GPT-3 launch and saw a gap. GPT-3 could understand natural language at a level that Google's keyword algorithms could not match. But search, the primary way software retrieves information, was still built on term frequency and link graphs. Bryk and Wang started Metaphor Systems (YC W22), rebranded to Exa in January 2024, and raised $85M at a $700M valuation in September 2025.
Exa now indexes tens of billions of web pages with minute-level refresh rates. The infrastructure is built in Rust, running a custom vector database with Matryoshka embeddings, document clustering, binary compression, and assembly-level SIMD optimizations. The embedding model was trained for over a month on a 144-GPU H200 cluster.
The business model is per-query pricing instead of ads. As Bryk puts it: "Our incentive is to make people like the search as much as possible. Google's incentive is making them click on ads."
How Embeddings Search Differs from Keywords
Traditional search engines match words. You type "best smartphone 2026," and they return pages containing those exact terms, ranked by link authority and engagement signals. If a page about "top mobile devices" never uses the word "smartphone," keyword search misses it.
Exa converts both the query and every indexed page into high-dimensional vectors (embeddings) that represent semantic meaning. Finding relevant results means finding vectors that are close in this mathematical space. "Best smartphone 2026" and "top mobile devices this year" produce nearly identical vectors despite sharing zero keywords.
How it works under the hood
When you submit a query, Exa encodes it into a vector, typically hundreds or thousands of numbers. Every web page in the index has already been encoded the same way. Search becomes a nearest-neighbor lookup: find the page vectors closest to the query vector. The embedding model captures relationships like synonymy, topic similarity, and conceptual proximity that keyword matching cannot.
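The nearest-neighbor lookup described above can be sketched in a few lines. This is a toy model, not Exa's pipeline: the vectors, page names, and three-dimensional space are invented for illustration, and cosine similarity stands in for whatever metric the production system uses.

```typescript
// Toy sketch of embedding-based retrieval. Real embeddings have hundreds or
// thousands of dimensions; these 3-number vectors are made up for illustration.
type Doc = { url: string; vec: number[] };

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

function cosine(a: number[], b: number[]): number {
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

// Pretend an embedding model produced these: similar meaning => nearby vectors.
const index: Doc[] = [
  { url: "top-mobile-devices", vec: [0.9, 0.1, 0.0] },
  { url: "best-smartphones", vec: [0.85, 0.15, 0.05] },
  { url: "sourdough-recipes", vec: [0.0, 0.1, 0.95] },
];

// Embedding of "best smartphone 2026" -- shares zero keywords with result #1.
const query = [0.88, 0.12, 0.02];

// Nearest-neighbor lookup: rank pages by similarity to the query vector.
const ranked = [...index].sort((a, b) => cosine(query, b.vec) - cosine(query, a.vec));
console.log(ranked.map((d) => d.url));
// → ["top-mobile-devices", "best-smartphones", "sourdough-recipes"]
```

Note that the top result never uses the query's words at all; it wins purely on vector proximity, which is the whole point.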
This is particularly powerful for AI agents. An agent constructing a query like "companies in San Francisco using assembly language for embedded systems" has no single keyword to match. Exa's embeddings encode the full intent and return pages that match semantically.
The tradeoff: embeddings can falter on highly specific technical queries where exact string matching is what you need. Searching for a specific function name or error code is better served by keyword search. Exa's auto mode intelligently combines neural and keyword methods to handle both cases.
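One way to picture a combined neural-plus-keyword mode is a blended score: an exact-term overlap component catches function names and error codes, while a vector component catches paraphrases. Everything here is a made-up sketch (the weighting, the overlap formula); Exa's actual auto mode is proprietary.

```typescript
// Toy hybrid ranking: blend keyword overlap with a neural similarity score.
// The 50/50 weighting and the overlap formula are invented for illustration.
function keywordScore(query: string, doc: string): number {
  const qTerms = new Set(query.toLowerCase().split(/\s+/));
  const dTerms = doc.toLowerCase().split(/\s+/);
  const hits = dTerms.filter((t) => qTerms.has(t)).length;
  return qTerms.size ? Math.min(hits / qTerms.size, 1) : 0;
}

// `neural` would come from embedding similarity (see the cosine example above
// in spirit; here it is just a number passed in).
function hybridScore(keyword: number, neural: number, alpha = 0.5): number {
  return alpha * keyword + (1 - alpha) * neural;
}

// An exact error-code match scores high on keywords even if embeddings shrug:
console.log(hybridScore(keywordScore("ERR_CONN_RESET fix", "how to fix ERR_CONN_RESET"), 0.2));
```

The exact-match document still ranks well despite a weak neural score, which is why a blended mode degrades more gracefully than pure embeddings on string-shaped queries.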
Search Types and Latency
Exa exposes multiple search types tuned for different speed-quality tradeoffs. This matters because a chatbot needs sub-200ms results while a research agent can afford to spend a minute finding the best answer.
| Type | Latency | Best For | How It Works |
|---|---|---|---|
| instant | ~200ms | Real-time chat, voice assistants | Lowest-latency search optimized for interactive use |
| fast | ~450ms | Speed-sensitive apps with quality needs | Streamlined neural search with minimal quality loss |
| auto (default) | ~1s | General-purpose use | Intelligently combines neural and keyword methods |
| deep-lite | 2-10s | Quick synthesis tasks | Lightweight multi-step search with synthesized output |
| deep | 5-60s | Research agents, complex queries | Multi-step agentic search with structured outputs and grounding |
| deep-reasoning | 10-60s | Deep research, analytical work | Higher-reasoning synthesis for demanding queries |
The deep family is where Exa gets interesting for agent builders. Deep search doesn't just embed and retrieve. It agentically searches, processes results, and searches again until it finds high-quality information. The P50 latency is 3.5 seconds, but it can run longer for complex queries. This is Exa doing the multi-turn reasoning loop internally, so your agent doesn't have to.
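The search-evaluate-refine loop that deep search runs internally looks roughly like this. The quality threshold, refinement step, and turn limit below are all hypothetical; Exa's real pipeline is a black box behind the API.

```typescript
// Illustrative agentic search loop: search, judge result quality, refine the
// query, and retry until results are good enough or turns run out.
// Scores, thresholds, and the refinement strategy are invented for this sketch.
type Result = { url: string; score: number };
type SearchFn = (query: string) => Result[];

function deepSearchLoop(search: SearchFn, query: string, maxTurns = 3): Result[] {
  let results: Result[] = [];
  for (let turn = 0; turn < maxTurns; turn++) {
    results = search(query);
    const best = Math.max(...results.map((r) => r.score), 0);
    if (best >= 0.8) break;               // high-quality hit found: stop early
    query = `${query} (refined ${turn + 1})`; // otherwise reformulate and retry
  }
  return results;
}
```

The value for agent builders is that this loop runs server-side: your agent makes one API call and pays one latency budget instead of orchestrating the retries itself.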
API Usage and Code Examples
The Exa SDK is available for TypeScript/JavaScript and Python. Install with `npm install exa-js` or `pip install exa_py`.
Basic search with content extraction (TypeScript)
```typescript
import Exa from "exa-js";

const exa = new Exa(process.env.EXA_API_KEY);

// Semantic search with highlights
const results = await exa.searchAndContents(
  "companies using LLMs for contract review",
  {
    type: "auto",
    numResults: 10,
    highlights: { maxCharacters: 4000 },
  }
);

for (const result of results.results) {
  console.log(result.title, result.url);
  console.log(result.highlights.join("\n"));
}
```

Deep search with structured output
```typescript
// Deep search with JSON schema output
const research = await exa.search(
  "What are the leading open-source LLM inference engines?",
  {
    type: "deep",
    numResults: 10,
    contents: {
      text: true,
      summary: { query: "Key features and performance" },
    },
  }
);

// Each result includes text, summary, and grounding citations
for (const r of research.results) {
  console.log(r.title);
  console.log(r.summary);
}
```

Find similar pages
```typescript
// Find pages similar to a given URL
const similar = await exa.findSimilarAndContents(
  "https://arxiv.org/abs/2401.14196",
  {
    numResults: 10,
    highlights: true,
    excludeDomains: ["arxiv.org"],
  }
);
```

Category-specific search
```typescript
// Search Exa's specialized indexes
const people = await exa.searchAndContents(
  "machine learning engineers at fintech startups",
  {
    category: "people",
    numResults: 20,
  }
);

const companies = await exa.searchAndContents(
  "Series A AI infrastructure companies",
  {
    category: "company",
    numResults: 15,
  }
);

const papers = await exa.searchAndContents(
  "retrieval augmented generation improvements 2025",
  {
    category: "research paper",
    numResults: 10,
    contents: { text: { maxCharacters: 2000 } },
  }
);
```

Key API Features
Contents Extraction
Return clean text, highlights, or AI-generated summaries inline with search results. Highlights are 10x more token-efficient than full text. Summaries can follow a JSON schema for structured extraction.
Domain Filtering
includeDomains and excludeDomains (up to 1,200 each) let you scope searches to specific sites. Date filters (startPublishedDate, endPublishedDate) constrain by publication time.
Specialized Indexes
Category parameter targets curated indexes: 1B+ LinkedIn profiles (people), 50M+ companies, 100M+ research papers, plus news, financial reports, and personal sites.
Live Crawl
maxAgeHours controls cache freshness. Set to 0 for always-fresh live crawling. subpages parameter crawls child pages for deeper extraction from a single result.
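The features above compose in a single request. The parameter names below come from the descriptions in this section, but the exact nesting (for instance, whether maxAgeHours and subpages sit under contents) is an assumption for this sketch; check the Exa API reference before relying on it.

```typescript
// Illustrative request combining content extraction, domain/date filtering,
// and live-crawl controls. Nesting of maxAgeHours/subpages is an assumption.
const options = {
  type: "auto",
  numResults: 10,
  includeDomains: ["nature.com", "sciencedaily.com"], // scope to these sites
  startPublishedDate: "2025-06-01",                    // ISO publication window
  endPublishedDate: "2025-12-31",
  contents: {
    highlights: true,                     // token-efficient snippets, not full text
    summary: { query: "main findings" },  // AI summary focused on a sub-question
    maxAgeHours: 0,                       // 0 = always live-crawl, never cache
    subpages: 2,                          // also crawl up to 2 child pages
  },
};
// const results = await exa.searchAndContents("CRISPR delivery advances", options);
```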
Vercel AI SDK integration
Exa ships an official Vercel AI SDK provider (@exalabs/ai-sdk). You can add web search as a tool alongside your language model in a few lines, with results automatically formatted for the AI SDK's tool-call interface.
Exa vs Tavily vs Serper
Three APIs dominate web search for AI agents. Each makes different tradeoffs.
| Dimension | Exa | Tavily | Serper |
|---|---|---|---|
| Search method | Neural embeddings over proprietary index | Aggregates + synthesizes from multiple sources | Proxies Google Search results |
| Agent benchmark score | 8.7 / 10 | 8.6 / 10 | 8.0 / 10 |
| Best at | Semantic/conceptual research, entity search | Agent-native workflows, pre-synthesized content | Current events, Google-familiar results |
| Weakest at | Highly specific technical string queries | Result ordering consistency between versions | No semantic mode, purely keyword-based |
| Content extraction | Built-in: text, highlights, summaries, JSON schema | Built-in: raw content, synthesized answers | Snippets only, separate scraping needed |
| Specialized indexes | People, companies, research papers, news | No | News, images, scholar (via Google) |
| Free tier | 1,000 requests/month | 1,000 requests/month | 2,500 requests/month |
| Paid pricing | $7 / 1K searches (with contents) | $5 / 1K searches | $1.65 / 1K searches (Google results) |
| Latency (standard) | ~1s (auto) | ~1-2s | ~0.5s |
When Each Wins
Choose Exa when your agent needs to find semantically related content across topic clusters, search for specific people or companies, or process paragraph-length queries that would break keyword search. Exa's embeddings find content that keyword engines miss entirely.
Choose Tavily when you want content pre-processed for LLM consumption without building extraction logic. Tavily's search_depth parameter gives you a genuine latency-quality lever, and it aggregates from up to 20 sources per query.
Choose Serper when freshness matters most, when you want Google's result quality at lower cost, or when your queries are keyword-shaped. Serper's structured JSON includes knowledge graphs and answer boxes that other APIs lack.
Who Uses Exa
Exa serves thousands of companies. The most notable integration is Cursor, where Exa powers the @web feature. When you use @web in Cursor's chat, a separate model determines the search query from your message and conversation history, then sends it to Exa. The results come back as structured context for the coding model.
This is documented in Cursor's official docs: "With @Web, Cursor searches the web using exa.ai to find up-to-date information and add it as context."
Cursor
Powers @web for in-editor web search. A language model determines the search query from context, Exa returns structured results, and the coding model uses them for answers.
Notion AI
Uses Exa for news search within Notion's AI features. The semantic search helps surface relevant news without requiring exact keyword matches.
Financial Services
Top private equity and consulting firms use Exa for financial data retrieval. The company search index and domain filtering make it useful for due diligence and market mapping.
Flatfile
Reported 15-20x faster market mapping at lower costs compared to conventional list vendors. Exa's company and people indexes replaced manual research workflows.
MCP Integrations
Exa ships an official MCP server that connects Claude Desktop, Claude Code, VS Code, Windsurf, Gemini CLI, and other AI assistants to web search and code search tools.
Websets (Lead Gen)
Exa's Websets product uses the same search infrastructure for B2B lead generation. AI agents verify every result against specified criteria, reducing false positives.
Pricing
Exa simplified its pricing in March 2026, bundling content extraction into base search costs for most users.
| Endpoint | Price per 1,000 Requests | Notes |
|---|---|---|
| Search (with contents) | $7 | Includes up to 10 results with text and highlights |
| Deep search | $12 | Multi-step agentic search with grounding |
| Deep-reasoning | $15 | Higher synthesis quality for analytical queries |
| Answer endpoint | $5 | Direct answers with citations |
| Contents (standalone) | $1 per 1K pages | Extract text/highlights from known URLs |
| Additional results (>10) | $1 per 1K results | Applies to search, deep, and answer |
| AI summaries | $1 per 1K pages | Available across all endpoints |
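The table makes back-of-envelope budgeting straightforward: each rate is per 1,000 requests. A quick calculator (the endpoint keys below are just labels for this sketch, not API identifiers):

```typescript
// Monthly cost estimate from the per-1,000-request rates in the table above.
const RATES_PER_1K: Record<string, number> = {
  search: 7,          // search with contents
  deep: 12,           // deep search
  deepReasoning: 15,  // deep-reasoning
  answer: 5,          // answer endpoint
};

function monthlyCost(requests: Record<string, number>): number {
  let total = 0;
  for (const [endpoint, count] of Object.entries(requests)) {
    total += (count / 1000) * RATES_PER_1K[endpoint];
  }
  return total;
}

// 50k standard searches + 5k deep searches per month:
console.log(monthlyCost({ search: 50_000, deep: 5_000 })); // → 410
```

So a moderately busy agent doing 50k searches and 5k deep searches runs about $410/month before add-ons like extra results or summaries.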
Free tier and grants
Exa provides 1,000 free requests per month on the free plan. Startups and education projects can apply for $1,000 in free credits. Enterprise plans include custom rate limits, dedicated support, and custom indexing.
When to Use Exa (and When Not To)
Use Exa when:
- Your agent needs semantic understanding of queries. "Companies in SF using Rust for systems programming" is a natural Exa query. No keyword engine handles this well.
- You need structured content extraction built into search. Exa returns clean text, highlights, and summaries inline. No separate scraping step.
- You're searching for people or companies. The specialized indexes (1B+ LinkedIn profiles, 50M+ companies) are stronger than general web search for entity lookup.
- You want findSimilar functionality. Give Exa a URL and it returns semantically similar pages, useful for competitive analysis and research expansion.
- Your agent sends long, natural-language queries. Exa handles paragraph-length queries natively. Google-based APIs degrade on anything beyond a few keywords.
Don't use Exa when:
- You need exact string matching. Searching for a specific error message or function name is better served by keyword search or grep.
- Freshness is the primary concern and you trust Google's index. Serper gives you Google's results at lower cost with better freshness signals.
- You're searching within a codebase, not the web. Web search APIs, including Exa, are the wrong tool for repository-level code retrieval. That requires purpose-built code search.
Web search vs code search
Exa solves web retrieval for agents. Code retrieval is a different problem with different tools. Agentic code search uses multi-turn reasoning with parallel tool calls (grep, file reads, directory listing) to follow import chains and call graphs. Models like WarpGrep are RL-trained for this: 8 parallel tool calls per turn, 3-4 turns to find the right code, 0.73 F1 on retrieval benchmarks. Web search for web context, code search for code context.
Frequently Asked Questions
What is Exa Search API?
Exa Search API is a web search engine built for AI agents and LLMs. Instead of keyword matching like Google, Exa uses neural embeddings to understand the semantic meaning of queries and return structured results. It powers Cursor's @web feature and is used by companies like Notion AI for news search. The API offers multiple search types from sub-200ms instant search to 60-second deep research, with built-in content extraction that returns clean text instead of raw HTML.
How does Exa compare to Tavily?
Exa and Tavily are both built for AI agents but take different approaches. Exa uses neural embeddings over a proprietary index of tens of billions of pages, excelling at semantic and conceptual retrieval. Tavily aggregates and synthesizes content from multiple sources, optimized for agent consumption patterns with features like search_depth for latency-quality tradeoffs. Exa scores 8.7 in agent benchmarks versus Tavily's 8.6. Choose Exa for semantic research across topic clusters. Choose Tavily when you want pre-synthesized content with built-in extraction.
How does Exa compare to Serper?
Serper proxies Google Search results and returns structured JSON. It excels at freshness and familiar result patterns because it uses Google's index. Exa uses its own neural index with embeddings-based retrieval, which finds semantically related content that keyword matching misses. Serper is better for current events and when you want Google-equivalent results. Exa is better for complex, multi-condition queries and finding content that doesn't share exact keywords with your query.
What does Exa Search API cost?
Exa offers 1,000 free requests per month. Search with contents is $7 per 1,000 requests (includes up to 10 results with text and highlights). Deep search is $12 per 1,000. Deep-reasoning is $15 per 1,000. The contents endpoint is $1 per 1,000 pages. AI summaries are $1 per 1,000 pages. Enterprise plans with custom rate limits and indexing are available. Startups and education projects can apply for $1,000 in free credits.
Does Cursor use Exa for @web?
Yes. Cursor's official documentation states: "With @Web, Cursor searches the web using exa.ai to find up-to-date information and add it as context." A separate language model examines your message, conversation history, and current file to determine the search query, which is then sent to Exa/SerpApi.
What search types does Exa support?
Exa supports: auto (default, ~1s), instant (sub-200ms for real-time chat), fast (~450ms), deep-lite (2-10s lightweight synthesis), deep (5-60s multi-step agentic search), deep-reasoning (10-60s higher synthesis), and deep-max (highest-effort mode). The auto type intelligently combines neural and keyword methods based on query characteristics.
WarpGrep: Code Search for AI Agents
Exa handles web search. WarpGrep handles code search. An RL-trained subagent that finds the right code in under 4 seconds with 8 parallel tool calls per turn. 0.73 F1 on retrieval benchmarks. #1 on SWE-Bench Pro when paired with frontier models.