When you type @web in Cursor, the search goes to Exa. Not Google. Not Bing. A search engine built from scratch for machines instead of humans. That design choice explains most of what makes Exa different.
What Exa Is
Exa is a web search API designed for AI agents and LLMs. The core difference from Google, Bing, or any traditional search engine: Exa uses neural embeddings to understand query meaning, not keyword matching.
The company was founded in 2021 by Will Bryk and Jeff Wang, two Harvard roommates who watched GPT-3 launch and saw a gap. GPT-3 could understand natural language at a level that Google's keyword algorithms could not match. But search, the primary way software retrieves information, was still built on term frequency and link graphs. Bryk and Wang started Metaphor Systems (YC W22), rebranded to Exa in January 2024, and raised $85M at a $700M valuation in September 2025.
Exa now indexes tens of billions of web pages with minute-level refresh rates. The infrastructure is built in Rust, running a custom vector database with Matryoshka embeddings, document clustering, binary compression, and assembly-level SIMD optimizations. The embedding model was trained for over a month on a 144-GPU H200 cluster.
The business model is per-query pricing instead of ads. As Bryk puts it: "Our incentive is to make people like the search as much as possible. Google's incentive is making them click on ads."
How Embeddings Search Differs from Keywords
Traditional search engines match words. You type "best smartphone 2026," and they return pages containing those exact terms, ranked by link authority and engagement signals. If a page about "top mobile devices" never uses the word "smartphone," keyword search misses it.
Exa converts both the query and every indexed page into high-dimensional vectors (embeddings) that represent semantic meaning. Finding relevant results means finding vectors that are close in this mathematical space. "Best smartphone 2026" and "top mobile devices this year" produce nearly identical vectors despite sharing zero keywords.
How it works under the hood
When you submit a query, Exa encodes it into a vector, typically hundreds or thousands of numbers. Every web page in the index has already been encoded the same way. Search becomes a nearest-neighbor lookup: find the page vectors closest to the query vector. The embedding model captures relationships like synonymy, topic similarity, and conceptual proximity that keyword matching cannot.
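The nearest-neighbor lookup described above can be sketched in a few lines. This is a toy model, not Exa's pipeline: the vectors, page names, and three-dimensional space are invented for illustration, and cosine similarity stands in for whatever metric the production system uses.

```typescript
// Toy sketch of embedding-based retrieval. Real embeddings have hundreds or
// thousands of dimensions; these 3-number vectors are made up for illustration.
type Doc = { url: string; vec: number[] };

function dot(a: number[], b: number[]): number {
  return a.reduce((sum, x, i) => sum + x * b[i], 0);
}

function cosine(a: number[], b: number[]): number {
  return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
}

// Pretend an embedding model produced these: similar meaning => nearby vectors.
const index: Doc[] = [
  { url: "top-mobile-devices", vec: [0.9, 0.1, 0.0] },
  { url: "best-smartphones", vec: [0.85, 0.15, 0.05] },
  { url: "sourdough-recipes", vec: [0.0, 0.1, 0.95] },
];

// Embedding of "best smartphone 2026" -- shares zero keywords with result #1.
const query = [0.88, 0.12, 0.02];

// Nearest-neighbor lookup: rank pages by similarity to the query vector.
const ranked = [...index].sort((a, b) => cosine(query, b.vec) - cosine(query, a.vec));
console.log(ranked.map((d) => d.url));
// → ["top-mobile-devices", "best-smartphones", "sourdough-recipes"]
```

Note that the top result never uses the query's words at all; it wins purely on vector proximity, which is the whole point.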
This is particularly powerful for AI agents. An agent constructing a query like "companies in San Francisco using assembly language for embedded systems" has no single keyword to match. Exa's embeddings encode the full intent and return pages that match semantically.
The tradeoff: embeddings can falter on highly specific technical queries where exact string matching is what you need. Searching for a specific function name or error code is better served by keyword search. Exa's auto mode intelligently combines neural and keyword methods to handle both cases.
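One way to picture a combined neural-plus-keyword mode is a blended score: an exact-term overlap component catches function names and error codes, while a vector component catches paraphrases. Everything here is a made-up sketch (the weighting, the overlap formula); Exa's actual auto mode is proprietary.

```typescript
// Toy hybrid ranking: blend keyword overlap with a neural similarity score.
// The 50/50 weighting and the overlap formula are invented for illustration.
function keywordScore(query: string, doc: string): number {
  const qTerms = new Set(query.toLowerCase().split(/\s+/));
  const dTerms = doc.toLowerCase().split(/\s+/);
  const hits = dTerms.filter((t) => qTerms.has(t)).length;
  return qTerms.size ? Math.min(hits / qTerms.size, 1) : 0;
}

// `neural` would come from embedding similarity (see the cosine example above
// in spirit; here it is just a number passed in).
function hybridScore(keyword: number, neural: number, alpha = 0.5): number {
  return alpha * keyword + (1 - alpha) * neural;
}

// An exact error-code match scores high on keywords even if embeddings shrug:
console.log(hybridScore(keywordScore("ERR_CONN_RESET fix", "how to fix ERR_CONN_RESET"), 0.2));
```

The exact-match document still ranks well despite a weak neural score, which is why a blended mode degrades more gracefully than pure embeddings on string-shaped queries.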
Search Types and Latency
Exa exposes multiple search types tuned for different speed-quality tradeoffs. This matters because a chatbot needs sub-200ms results while a research agent can afford to spend a minute finding the best answer.
| Type | Latency | Best For | How It Works |
|---|---|---|---|
| instant | ~200ms | Real-time chat, voice assistants | Lowest-latency search optimized for interactive use |
| fast | ~450ms | Speed-sensitive apps with quality needs | Streamlined neural search with minimal quality loss |
| auto (default) | ~1s | General-purpose use | Intelligently combines neural and keyword methods |
| deep-lite | 2-10s | Quick synthesis tasks | Lightweight multi-step search with synthesized output |
| deep | 5-60s | Research agents, complex queries | Multi-step agentic search with structured outputs and grounding |
| deep-reasoning | 10-60s | Deep research, analytical work | Higher-reasoning synthesis for demanding queries |
The deep family is where Exa gets interesting for agent builders. Deep search doesn't just embed and retrieve. It agentically searches, processes results, and searches again until it finds high-quality information. The P50 latency is 3.5 seconds, but it can run longer for complex queries. This is Exa doing the multi-turn reasoning loop internally, so your agent doesn't have to.
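The search-evaluate-refine loop that deep search runs internally looks roughly like this. The quality threshold, refinement step, and turn limit below are all hypothetical; Exa's real pipeline is a black box behind the API.

```typescript
// Illustrative agentic search loop: search, judge result quality, refine the
// query, and retry until results are good enough or turns run out.
// Scores, thresholds, and the refinement strategy are invented for this sketch.
type Result = { url: string; score: number };
type SearchFn = (query: string) => Result[];

function deepSearchLoop(search: SearchFn, query: string, maxTurns = 3): Result[] {
  let results: Result[] = [];
  for (let turn = 0; turn < maxTurns; turn++) {
    results = search(query);
    const best = Math.max(...results.map((r) => r.score), 0);
    if (best >= 0.8) break;               // high-quality hit found: stop early
    query = `${query} (refined ${turn + 1})`; // otherwise reformulate and retry
  }
  return results;
}
```

The value for agent builders is that this loop runs server-side: your agent makes one API call and pays one latency budget instead of orchestrating the retries itself.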
API Usage and Code Examples
The Exa SDK is available for TypeScript/JavaScript and Python. Install with `npm install exa-js` or `pip install exa_py`.
Basic search with content extraction (TypeScript)
```typescript
import Exa from "exa-js";

const exa = new Exa(process.env.EXA_API_KEY);

// Semantic search with highlights
const results = await exa.searchAndContents(
  "companies using LLMs for contract review",
  {
    type: "auto",
    numResults: 10,
    highlights: { maxCharacters: 4000 },
  }
);

for (const result of results.results) {
  console.log(result.title, result.url);
  console.log(result.highlights.join("\n"));
}
```

Deep search with structured output
```typescript
// Deep search with JSON schema output
const research = await exa.search(
  "What are the leading open-source LLM inference engines?",
  {
    type: "deep",
    numResults: 10,
    contents: {
      text: true,
      summary: { query: "Key features and performance" },
    },
  }
);

// Each result includes text, summary, and grounding citations
for (const r of research.results) {
  console.log(r.title);
  console.log(r.summary);
}
```

Find similar pages
```typescript
// Find pages similar to a given URL
const similar = await exa.findSimilarAndContents(
  "https://arxiv.org/abs/2401.14196",
  {
    numResults: 10,
    highlights: true,
    excludeDomains: ["arxiv.org"],
  }
);
```

Category-specific search
```typescript
// Search Exa's specialized indexes
const people = await exa.searchAndContents(
  "machine learning engineers at fintech startups",
  {
    category: "people",
    numResults: 20,
  }
);

const companies = await exa.searchAndContents(
  "Series A AI infrastructure companies",
  {
    category: "company",
    numResults: 15,
  }
);

const papers = await exa.searchAndContents(
  "retrieval augmented generation improvements 2025",
  {
    category: "research paper",
    numResults: 10,
    contents: { text: { maxCharacters: 2000 } },
  }
);
```

Key API Features
Contents Extraction
Return clean text, highlights, or AI-generated summaries inline with search results. Highlights are 10x more token-efficient than full text. Summaries can follow a JSON schema for structured extraction.
Domain Filtering
includeDomains and excludeDomains (up to 1,200 each) let you scope searches to specific sites. Date filters (startPublishedDate, endPublishedDate) constrain by publication time.
Specialized Indexes
Category parameter targets curated indexes: 1B+ LinkedIn profiles (people), 50M+ companies, 100M+ research papers, plus news, financial reports, and personal sites.
Live Crawl
maxAgeHours controls cache freshness. Set to 0 for always-fresh live crawling. subpages parameter crawls child pages for deeper extraction from a single result.
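The features above compose in a single request. The parameter names below come from the descriptions in this section, but the exact nesting (for instance, whether maxAgeHours and subpages sit under contents) is an assumption for this sketch; check the Exa API reference before relying on it.

```typescript
// Illustrative request combining content extraction, domain/date filtering,
// and live-crawl controls. Nesting of maxAgeHours/subpages is an assumption.
const options = {
  type: "auto",
  numResults: 10,
  includeDomains: ["nature.com", "sciencedaily.com"], // scope to these sites
  startPublishedDate: "2025-06-01",                    // ISO publication window
  endPublishedDate: "2025-12-31",
  contents: {
    highlights: true,                     // token-efficient snippets, not full text
    summary: { query: "main findings" },  // AI summary focused on a sub-question
    maxAgeHours: 0,                       // 0 = always live-crawl, never cache
    subpages: 2,                          // also crawl up to 2 child pages
  },
};
// const results = await exa.searchAndContents("CRISPR delivery advances", options);
```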
Vercel AI SDK integration
Exa ships an official Vercel AI SDK provider (@exalabs/ai-sdk). You can add web search as a tool alongside your language model in a few lines, with results automatically formatted for the AI SDK's tool-call interface.
Exa vs Tavily vs Serper
Three APIs dominate web search for AI agents. Each makes different tradeoffs.
| Dimension | Exa | Tavily | Serper |
|---|---|---|---|
| Search method | Neural embeddings over proprietary index | Aggregates + synthesizes from multiple sources | Proxies Google Search results |
| Agent benchmark score | 8.7 / 10 | 8.6 / 10 | 8.0 / 10 |
| Best at | Semantic/conceptual research, entity search | Agent-native workflows, pre-synthesized content | Current events, Google-familiar results |
| Weakest at | Highly specific technical string queries | Result ordering consistency between versions | No semantic mode, purely keyword-based |
| Content extraction | Built-in: text, highlights, summaries, JSON schema | Built-in: raw content, synthesized answers | Snippets only, separate scraping needed |
| Specialized indexes | People, companies, research papers, news | No | News, images, scholar (via Google) |
| Free tier | 1,000 requests/month | 1,000 requests/month | 2,500 requests/month |
| Paid pricing | $7 / 1K searches (with contents) | $5 / 1K searches | $1.65 / 1K searches (Google results) |
| Latency (standard) | ~1s (auto) | ~1-2s | ~0.5s |
When Each Wins
Choose Exa when your agent needs to find semantically related content across topic clusters, search for specific people or companies, or process paragraph-length queries that would break keyword search. Exa's embeddings find content that keyword engines miss entirely.
Choose Tavily when you want content pre-processed for LLM consumption without building extraction logic. Tavily's search_depth parameter gives you a genuine latency-quality lever, and it aggregates from up to 20 sources per query.
Choose Serper when freshness matters most, when you want Google's result quality at lower cost, or when your queries are keyword-shaped. Serper's structured JSON includes knowledge graphs and answer boxes that other APIs lack.
Who Uses Exa
Exa serves thousands of companies. The most notable integration is Cursor, where Exa powers the @web feature. When you use @web in Cursor's chat, a separate model determines the search query from your message and conversation history, then sends it to Exa. The results come back as structured context for the coding model.
This is documented in Cursor's official docs: "With @Web, Cursor searches the web using exa.ai to find up-to-date information and add it as context."
Cursor
Powers @web for in-editor web search. A language model determines the search query from context, Exa returns structured results, and the coding model uses them for answers.
Notion AI
Uses Exa for news search within Notion's AI features. The semantic search helps surface relevant news without requiring exact keyword matches.
Financial Services
Top private equity and consulting firms use Exa for financial data retrieval. The company search index and domain filtering make it useful for due diligence and market mapping.
Flatfile
Reported 15-20x faster market mapping at lower costs compared to conventional list vendors. Exa's company and people indexes replaced manual research workflows.
MCP Integrations
Exa ships an official MCP server that connects Claude Desktop, Claude Code, VS Code, Windsurf, Gemini CLI, and other AI assistants to web search and code search tools.
Websets (Lead Gen)
Exa's Websets product uses the same search infrastructure for B2B lead generation. AI agents verify every result against specified criteria, reducing false positives.
Pricing
Exa simplified its pricing in March 2026, bundling content extraction into base search costs for most users.
| Endpoint | Price per 1,000 Requests | Notes |
|---|---|---|
| Search (with contents) | $7 | Includes up to 10 results with text and highlights |
| Deep search | $12 | Multi-step agentic search with grounding |
| Deep-reasoning | $15 | Higher synthesis quality for analytical queries |
| Answer endpoint | $5 | Direct answers with citations |
| Contents (standalone) | $1 per 1K pages | Extract text/highlights from known URLs |
| Additional results (>10) | $1 per 1K results | Applies to search, deep, and answer |
| AI summaries | $1 per 1K pages | Available across all endpoints |
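The table makes back-of-envelope budgeting straightforward: each rate is per 1,000 requests. A quick calculator (the endpoint keys below are just labels for this sketch, not API identifiers):

```typescript
// Monthly cost estimate from the per-1,000-request rates in the table above.
const RATES_PER_1K: Record<string, number> = {
  search: 7,          // search with contents
  deep: 12,           // deep search
  deepReasoning: 15,  // deep-reasoning
  answer: 5,          // answer endpoint
};

function monthlyCost(requests: Record<string, number>): number {
  let total = 0;
  for (const [endpoint, count] of Object.entries(requests)) {
    total += (count / 1000) * RATES_PER_1K[endpoint];
  }
  return total;
}

// 50k standard searches + 5k deep searches per month:
console.log(monthlyCost({ search: 50_000, deep: 5_000 })); // → 410
```

So a moderately busy agent doing 50k searches and 5k deep searches runs about $410/month before add-ons like extra results or summaries.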
Free tier and grants
Exa provides 1,000 free requests per month on the free plan. Startups and education projects can apply for $1,000 in free credits. Enterprise plans include custom rate limits, dedicated support, and custom indexing.
When to Use Exa (and When Not To)
Use Exa when:
- Your agent needs semantic understanding of queries. "Companies in SF using Rust for systems programming" is a natural Exa query. No keyword engine handles this well.
- You need structured content extraction built into search. Exa returns clean text, highlights, and summaries inline. No separate scraping step.
- You're searching for people or companies. The specialized indexes (1B+ LinkedIn profiles, 50M+ companies) are stronger than general web search for entity lookup.
- You want findSimilar functionality. Give Exa a URL and it returns semantically similar pages, useful for competitive analysis and research expansion.
- Your agent sends long, natural-language queries. Exa handles paragraph-length queries natively. Google-based APIs degrade on anything beyond a few keywords.
Don't use Exa when:
- You need exact string matching. Searching for a specific error message or function name is better served by keyword search or grep.
- Freshness is the primary concern and you trust Google's index. Serper gives you Google's results at lower cost with better freshness signals.
- You're searching within a codebase, not the web. Web search APIs, including Exa, are the wrong tool for repository-level code retrieval. That requires purpose-built code search.
Web search vs code search
Exa solves web retrieval for agents. Code retrieval is a different problem with different tools. Agentic code search uses multi-turn reasoning with parallel tool calls (grep, file reads, directory listing) to follow import chains and call graphs. Models like WarpGrep are RL-trained for this: 8 parallel tool calls per turn, 3-4 turns to find the right code, 0.73 F1 on retrieval benchmarks. Web search for web context, code search for code context.
Frequently Asked Questions
What is Exa Search API?
Exa Search API is a web search engine built for AI agents and LLMs. Instead of keyword matching like Google, Exa uses neural embeddings to understand the semantic meaning of queries and return structured results. It powers Cursor's @web feature and is used by companies like Notion AI for news search. The API offers multiple search types from sub-200ms instant search to 60-second deep research, with built-in content extraction that returns clean text instead of raw HTML.
How does Exa compare to Tavily?
Exa and Tavily are both built for AI agents but take different approaches. Exa uses neural embeddings over a proprietary index of tens of billions of pages, excelling at semantic and conceptual retrieval. Tavily aggregates and synthesizes content from multiple sources, optimized for agent consumption patterns with features like search_depth for latency-quality tradeoffs. Exa scores 8.7 in agent benchmarks versus Tavily's 8.6. Choose Exa for semantic research across topic clusters. Choose Tavily when you want pre-synthesized content with built-in extraction.
How does Exa compare to Serper?
Serper proxies Google Search results and returns structured JSON. It excels at freshness and familiar result patterns because it uses Google's index. Exa uses its own neural index with embeddings-based retrieval, which finds semantically related content that keyword matching misses. Serper is better for current events and when you want Google-equivalent results. Exa is better for complex, multi-condition queries and finding content that doesn't share exact keywords with your query.
What does Exa Search API cost?
Exa offers 1,000 free requests per month. Search with contents is $7 per 1,000 requests (includes up to 10 results with text and highlights). Deep search is $12 per 1,000. Deep-reasoning is $15 per 1,000. The contents endpoint is $1 per 1,000 pages. AI summaries are $1 per 1,000 pages. Enterprise plans with custom rate limits and indexing are available. Startups and education projects can apply for $1,000 in free credits.
Does Cursor use Exa for @web?
Yes. Cursor's official documentation states: "With @Web, Cursor searches the web using exa.ai to find up-to-date information and add it as context." A separate language model examines your message, conversation history, and current file to determine the search query, which is then sent to Exa/SerpApi.
What search types does Exa support?
Exa supports: auto (default, ~1s), instant (sub-200ms for real-time chat), fast (~450ms), deep-lite (2-10s lightweight synthesis), deep (5-60s multi-step agentic search), deep-reasoning (10-60s higher synthesis), and deep-max (highest-effort mode). The auto type intelligently combines neural and keyword methods based on query characteristics.
WarpGrep: Code Search for AI Agents
Exa handles web search. WarpGrep handles code search. An RL-trained subagent that finds the right code in under 4 seconds with 8 parallel tool calls per turn. 0.73 F1 on retrieval benchmarks. #1 on SWE-Bench Pro when paired with frontier models.