Quick Verdict: Turbopuffer vs Pinecone
Bottom Line
Turbopuffer is built on object storage. Pinecone is built on SSDs. If most of your vectors are cold (accessed infrequently), Turbopuffer costs 10-100x less. If you need guaranteed sub-10ms latency on every query regardless of access pattern, Pinecone is the safer bet. Cursor, Notion, and Linear chose Turbopuffer. Most enterprise RAG teams still default to Pinecone.
Architecture: Object Storage vs SSD-First
This is the fundamental difference. Everything else (pricing, latency, multi-tenancy) follows from this architectural choice.
Turbopuffer: S3 as Source of Truth
All vectors live on object storage (S3, GCS, or Azure Blob) by default. Hot data gets cached on NVMe SSDs and RAM automatically based on access patterns. Cold queries hit object storage at ~300-500ms. Warm queries hit cache at sub-10ms. The SPFresh centroid-based index minimizes roundtrips to storage by first identifying which clusters to fetch, then downloading only those clusters.
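You can see the two tiers directly by timing back-to-back queries against the same namespace. A minimal sketch, assuming the REST endpoint path, auth header, and request fields roughly follow Turbopuffer's documented API shape; treat all of them as illustrative:

```python
import os
import time
import requests

# Illustrative only: endpoint path, auth header, and field names are
# assumptions modeled on Turbopuffer's documented REST API shape.
API = "https://api.turbopuffer.com/v1/namespaces/my-codebase/query"
HEADERS = {"Authorization": f"Bearer {os.environ['TURBOPUFFER_API_KEY']}"}

def timed_query(vector):
    start = time.perf_counter()
    resp = requests.post(API, headers=HEADERS,
                         json={"vector": vector, "top_k": 10,
                               "distance_metric": "cosine_distance"})
    resp.raise_for_status()
    return (time.perf_counter() - start) * 1000, resp.json()

query_vec = [0.1] * 1536  # stand-in embedding

cold_ms, _ = timed_query(query_vec)  # first hit: fetched from object storage
warm_ms, _ = timed_query(query_vec)  # second hit: served from NVMe/RAM cache
print(f"cold: {cold_ms:.0f}ms, warm: {warm_ms:.0f}ms")  # e.g. ~300-500ms vs <10ms
```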
Pinecone: SSD-First Serverless
Pinecone Serverless stores vectors on SSDs with a compute layer that scales independently. Queries hit pre-indexed data on fast storage, so latency stays consistently low regardless of access patterns. The tradeoff is cost: replicated SSD storage runs roughly $0.60/GB to operate, and Pinecone charges $0.33/GB. For hot workloads this is fine. For cold workloads it is 16x more expensive than object storage.
The Pufferfish Effect
Turbopuffer calls their caching model the "pufferfish effect." Data inflates from object storage (~$20/TB/month) to NVMe SSD (~$100/TB/month) to RAM as it gets accessed more frequently. When access drops, data deflates back to cheaper tiers. You pay for the tier the data actually lives on, not the most expensive tier. For workloads where 90% of data is cold (most code search indexes, most multi-tenant RAG systems), this saves an order of magnitude.
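The blended-cost math falls out of the per-TB rates quoted above. A quick worked example in Python, assuming a 90/10 cold/hot split on a 10 TB dataset (the split is an assumption, not a Turbopuffer figure):

```python
# Blended storage cost under the tiered ("pufferfish") model, using the
# per-TB rates quoted above. The 90% cold / 10% hot split is an assumption.
OBJECT_STORAGE = 20.0   # $/TB/month (S3 tier)
NVME_SSD = 100.0        # $/TB/month (hot cache tier)

total_tb = 10.0
hot_fraction = 0.10

blended = (total_tb * (1 - hot_fraction) * OBJECT_STORAGE
           + total_tb * hot_fraction * NVME_SSD)
all_ssd = total_tb * NVME_SSD

print(f"tiered: ${blended:.0f}/mo vs all-SSD: ${all_ssd:.0f}/mo")
# tiered: $280/mo vs all-SSD: $1000/mo on storage alone;
# the gap widens as the cold fraction grows.
```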
Write Path Differences
Turbopuffer writes go to a write-ahead log on object storage first, then get asynchronously indexed. Write latency is ~285ms p50. Throughput is high (10,000+ vectors/sec), but each namespace allows one WAL entry per second, with concurrent writes batched together. Pinecone writes are faster at the individual operation level and index in near-real-time. If your use case requires rapid, low-latency writes with immediate searchability, Pinecone has the edge.
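Because a namespace commits roughly one WAL entry per second, the practical pattern is to batch aggressively on the client rather than issue thousands of single-row writes. A minimal sketch, assuming a column-style `upsert(ids=..., vectors=..., attributes=...)` signature; the exact shape is illustrative:

```python
import itertools

def batched(iterable, size):
    # Yield successive fixed-size chunks from any iterable.
    it = iter(iterable)
    while chunk := list(itertools.islice(it, size)):
        yield chunk

def upsert_all(namespace, docs, batch_size=1000):
    # One large upsert per batch amortizes the per-second WAL commit
    # far better than thousands of individual writes.
    for chunk in batched(docs, batch_size):
        namespace.upsert(
            ids=[d["id"] for d in chunk],
            vectors=[d["vector"] for d in chunk],
            attributes={"path": [d["path"] for d in chunk]},  # assumed shape
        )
```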
How Cursor Uses Turbopuffer for Code Search
Cursor is the highest-profile Turbopuffer customer and the clearest illustration of why object storage matters for code search.
Cursor indexes millions of developer codebases. Each codebase gets chunked using tree-sitter for AST-aware splitting, embedded, and stored as vectors in Turbopuffer. When you ask Cursor a question about your code, it computes an embedding for your query, sends it to Turbopuffer for nearest-neighbor search, and retrieves the most semantically relevant code chunks.
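The overall shape of such a pipeline is short. In the sketch below, `chunk_file()` is a naive stand-in for tree-sitter's AST-aware splitting, `embed()` is a placeholder for whatever embedding model you run, and the namespace API mirrors the batched `upsert_all()` sketched above; all are illustrative, not Cursor's actual code:

```python
from pathlib import Path

def chunk_file(path: Path, size: int = 2000):
    # Naive fixed-size chunking; a real pipeline would split on AST nodes.
    text = path.read_text(errors="ignore")
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text: str) -> list[float]:
    raise NotImplementedError("call your embedding model here, e.g. 1536-dim")

def index_codebase(namespace, repo_root: str):
    docs = []
    for path in Path(repo_root).rglob("*.py"):
        for i, chunk in enumerate(chunk_file(path)):
            docs.append({"id": f"{path}:{i}", "vector": embed(chunk),
                         "path": str(path)})
    upsert_all(namespace, docs)  # batched write, as sketched above

def search(namespace, question: str, k: int = 8):
    # Nearest-neighbor lookup over the indexed chunks.
    return namespace.query(vector=embed(question), top_k=k,
                           include_attributes=["path"])
```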
The economics made the migration inevitable. Most developers open Cursor, work on their project for a few hours, then close it. Their codebase embeddings sit untouched until the next session. With an SSD-first database, you pay peak storage rates 24/7 for vectors that get queried a few hours a day. With Turbopuffer, cold embeddings live on S3 at $0.02/GB. When a developer opens their project, the first query takes ~300ms as vectors load from object storage into cache. Subsequent queries in that session hit cache at sub-10ms.
The Cursor Migration
When Cursor emailed Turbopuffer about their vector DB costs, Simon Eskildsen flew to San Francisco, and the team migrated the entire workload within a week, cutting costs by 95%. During the proof of concept, Turbopuffer's CTO pulled 24-hour coding sessions, at one point eliminating 300ms of query latency in three hours. Cursor now stores billions of vectors across millions of codebases on Turbopuffer.
Why This Matters for Code Search
Code embeddings have a natural hot/cold pattern. Active projects get queried constantly. Abandoned repos, old branches, and rarely-opened projects sit cold. A vector database that charges the same rate for hot and cold data penalizes this access pattern. Turbopuffer's tiered storage matches how developers actually work: burst access during coding sessions, idle the rest of the time.
Pricing Comparison
| Dimension | Turbopuffer | Pinecone Serverless |
|---|---|---|
| Minimum monthly spend | $64 (Launch) | $0 (Starter) / $50 (Standard) |
| Storage cost | ~$0.02/GB (object storage) | $0.33/GB |
| Read pricing | Per GB queried + returned (volume discounts) | $8.25/1M read units (Standard) |
| Write pricing | Per GB written (batch discounts up to 50%) | $2/1M write units |
| Free tier | No | Yes (2GB storage, limited RU/WU) |
| Enterprise minimum | $4,096/month | $500/month |
| Cost at 100M vectors (1536-dim) | ~$90-220/month | ~$500-2,000/month |
| Cost at 1B vectors (1536-dim) | ~$500-2,000/month | ~$5,000-20,000/month |
The gap widens with scale. At small volumes (under 1M vectors), Pinecone's free Starter tier and $50 Standard minimum beat Turbopuffer's $64 minimum. At 100M+ vectors, Turbopuffer's object storage economics dominate. The exact savings depend on your hot/cold ratio: the more cold data you have, the bigger the gap.
Pinecone's pricing is also harder to predict. Read unit costs depend on namespace size (1 RU per GB of namespace queried), write unit costs depend on request size (1 WU per KB), and the actual dollar-per-unit rate varies by cloud provider and region. Turbopuffer bills on logical bytes, which is easier to forecast.
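The raw storage math behind the table is easy to check. At float32 precision, each dimension costs 4 bytes; query and write charges are excluded here, which is why the table's totals run higher:

```python
# Storage-only comparison at 100M vectors, 1536 dimensions, float32.
vectors = 100_000_000
dims = 1536
gb = vectors * dims * 4 / 1e9          # ~614 GB of raw vector data

turbopuffer = gb * 0.02                # object storage rate
pinecone = gb * 0.33                   # serverless storage rate

print(f"{gb:.0f} GB -> storage only: ${turbopuffer:.0f}/mo vs ${pinecone:.0f}/mo")
# 614 GB -> storage only: $12/mo vs $203/mo (~16x)
```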
Multi-Tenancy: Where Turbopuffer Pulls Ahead
If you are building a multi-tenant application (one namespace per customer, per codebase, or per document collection), namespace limits matter.
Turbopuffer: No Namespace Limits
Turbopuffer has no enforced namespace limits. Each namespace is isolated on object storage, so adding tenants does not degrade performance for existing tenants. Cursor runs millions of namespaces (one per codebase) without hitting caps. This is a direct benefit of the object-storage-first architecture: creating a new namespace is just creating a new S3 prefix.
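The per-tenant pattern is correspondingly simple: route every write and query through a namespace derived from the tenant ID. A sketch, assuming a namespace is created implicitly on first write (consistent with "a new S3 prefix" above); the client call itself is hypothetical:

```python
def namespace_name(tenant_id: str) -> str:
    # One namespace per tenant; also works per-repo or per-collection.
    return f"tenant-{tenant_id}"

def upsert_for_tenant(client, tenant_id: str, docs: list[dict]):
    # No provisioning step: writing to a new name creates the namespace.
    ns = client.namespace(namespace_name(tenant_id))  # hypothetical client API
    ns.upsert(docs)
```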
Pinecone: 10K-100K Namespaces, 20 Indexes
Pinecone supports up to 10,000 namespaces per index on Standard plans (100,000 on Enterprise), with a limit of 20 indexes. For most applications this is plenty. But if you need millions of isolated tenants (like Cursor's per-codebase model), you either hit the cap or resort to workarounds like packing multiple tenants into shared namespaces with metadata filters.
This is the second reason Cursor chose Turbopuffer, beyond cost. Millions of codebases means millions of namespaces. Pinecone's 10K namespace limit on Standard (100K on Enterprise) would have required architectural compromises.
Feature Comparison
| Feature | Turbopuffer | Pinecone |
|---|---|---|
| Vector search (ANN) | SPFresh centroid-based | Proprietary (optimized ANN) |
| Hybrid search (BM25 + vector) | Yes, native BM25 | No native BM25 (keyword via sparse vectors) |
| Metadata filtering | Yes | Yes |
| Built-in embeddings | No (BYOE) | Yes (integrated embedding models) |
| Built-in reranking | No | Yes |
| RAG pipeline (Assistant) | No | Yes (Pinecone Assistant) |
| Consistency model | Strong consistency (default), eventual optional | Eventual consistency |
| Update semantics | Primary key upserts (LSM engine) | ID-based upserts |
| SOC 2 Type II | Yes | Yes |
| HIPAA | Yes (Scale plan BAA) | Yes (Enterprise) |
| Regions | AWS, GCP, Azure | AWS, GCP, Azure |
| Self-host / open source | No (managed only) | No (managed only) |
Pinecone has more features out of the box: built-in embeddings, reranking, and a full RAG pipeline via Pinecone Assistant. If you want a batteries-included solution where you upload documents and get answers, Pinecone offers that. Turbopuffer is a lower-level primitive: it stores vectors and searches them, and it does both very cheaply at scale. You bring your own embedding model, your own reranker, your own RAG orchestration.
Turbopuffer's native BM25 support is notable. Hybrid search (combining keyword matching with vector similarity) consistently outperforms pure vector search for retrieval quality. With Pinecone, you need to encode keywords as sparse vectors. With Turbopuffer, BM25 is a first-class query type alongside vector search.
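If you fuse the two result lists yourself, reciprocal rank fusion (RRF) is the standard technique. This is generic retrieval code, not a Turbopuffer API: run a BM25 query and a vector query against the same namespace, then merge client-side:

```python
def rrf(result_lists, k=60):
    # Reciprocal rank fusion: each list contributes 1/(k + rank + 1)
    # per document, rewarding docs that rank well in multiple lists.
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc3", "doc1", "doc7"]    # keyword ranking
vector_hits = ["doc1", "doc4", "doc3"]  # semantic ranking
print(rrf([bm25_hits, vector_hits]))    # ['doc1', 'doc3', 'doc4', 'doc7']
```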
When Turbopuffer Wins
Billions of Vectors, Mostly Cold
Code search (like Cursor), document archives, multi-tenant SaaS with thousands of customers. Any workload where most data is accessed infrequently. Object storage costs $0.02/GB. SSDs cost 16x more. The math is straightforward.
Unlimited Multi-Tenancy
No namespace limits. Each tenant gets full isolation without performance degradation. If you need one namespace per user, per repo, or per document collection, and the count is in the hundreds of thousands or millions, Turbopuffer is the only managed option that does not cap you.
Hybrid Search (BM25 + Vector)
Native BM25 alongside vector search. No need to encode keywords as sparse vectors or run a separate search engine. For RAG applications, hybrid search retrieves 10-30% more relevant results than vector-only search, especially for code and technical content where exact keyword matches matter.
Predictable Billing
Billed on logical bytes (data you control), not physical storage or opaque unit pricing. Turbopuffer's pricing is easier to model and forecast than Pinecone's RU/WU system where costs vary by namespace size, request size, region, and provider.
When Pinecone Wins
Consistent Low Latency
Every query hits SSD-backed storage. No cold starts, no 300ms first-query penalty. If your application cannot tolerate variable latency (real-time chat, recommendation feeds, search-as-you-type), Pinecone's architecture delivers more predictable performance.
Getting Started Fast
Free Starter tier. Built-in embeddings, reranking, and RAG pipeline (Pinecone Assistant). Upload documents, get answers. No need to choose an embedding model, build chunking logic, or orchestrate a retrieval pipeline. For prototyping or small-scale production, Pinecone removes more friction.
Small-Scale Workloads
Under 10M vectors, Pinecone's $0 Starter or $50 Standard plan is cheaper than Turbopuffer's $64 minimum. The cost advantage of object storage only kicks in at scale. If your dataset fits in a few gigabytes, storage costs are noise either way.
Enterprise Ecosystem
Pinecone has been in market since 2019. More integrations (LangChain, LlamaIndex, Haystack), more tutorials, more Stack Overflow answers. The enterprise sales motion is mature. If your procurement team needs a vendor with a long track record and extensive documentation, Pinecone has it.
Or Skip the Vector DB Entirely
Both Turbopuffer and Pinecone solve the same problem: store embeddings, search them fast. But if your goal is code search specifically, a vector database is one component of a multi-step pipeline. You need to chunk code (tree-sitter, line-based, or function-level), choose and run an embedding model, manage index updates as code changes, handle stale embeddings, tune retrieval parameters, and build the query layer. Cursor built all of this. Most teams do not need to.
WarpGrep provides semantic code search as a managed API. It handles chunking, embedding, indexing, and retrieval. You send a natural language query and a repository, and get back the relevant code. No vector database to manage, no embedding pipeline to build, no stale index to debug.
This is not a replacement for Turbopuffer or Pinecone in general. If you are building a RAG pipeline over documents, a recommendation engine, or a multi-modal search system, you need a vector database. But if your specific use case is "help my coding agent find the right code," WarpGrep solves the problem at a higher level of abstraction.
The Build vs Buy Calculation
Cursor built their own code search pipeline on Turbopuffer because they have the engineering team, the scale to justify it, and code search is their core product. For teams using coding agents (Claude Code, Aider, Cline, OpenCode) who want better code context without building infrastructure, WarpGrep provides the same capability as a single API call.
Frequently Asked Questions
Why did Cursor switch from Pinecone to Turbopuffer?
Cursor indexes millions of codebases, most of which are accessed infrequently. Turbopuffer's object-storage-first architecture stores cold vectors at $0.02/GB instead of SSD rates. The migration cut Cursor's vector database costs by 95%. Turbopuffer's unlimited namespaces (one per codebase) also eliminated the namespace cap issue they would have hit with Pinecone.
Is Turbopuffer cheaper than Pinecone?
At scale, significantly. Storage alone is $0.02/GB vs $0.33/GB. But Turbopuffer's minimum is $64/month compared to Pinecone's free Starter tier. For small workloads (under 10M vectors, light query volume), Pinecone is cheaper or free. The crossover point depends on data volume and access patterns, but Turbopuffer typically becomes cheaper once your dataset reaches tens of gigabytes of vector data.
What is Turbopuffer's cold query latency?
Around 300-500ms when data needs to be fetched from object storage. After the first query, subsequent queries to the same namespace hit NVMe/RAM cache at sub-10ms. For search and RAG applications where the user is already waiting for an LLM response (which takes 1-5 seconds), 300ms of added latency on the first retrieval is rarely noticeable.
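Where even the first query matters, one way to hide the cold start is to fire a throwaway query in the background when a user session opens, so the namespace is already warm in NVMe/RAM cache before the first real search. A sketch reusing the illustrative `timed_query()` helper from the architecture section:

```python
import threading

def warm_namespace(probe_vector):
    # Fire-and-forget: the user never waits on the cold object-storage fetch.
    threading.Thread(target=timed_query, args=(probe_vector,),
                     daemon=True).start()

warm_namespace([0.0] * 1536)  # call on session open
```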
Does Turbopuffer have a free tier?
No. The minimum plan (Launch) costs $64/month. Pinecone's Starter tier is free with 2GB storage. If you are prototyping or running a small side project, Pinecone Starter, the Qdrant free tier, or self-hosted pgvector are more cost-effective starting points.
Can I use Turbopuffer for code search?
Yes. Cursor uses Turbopuffer for exactly this. You will need to build the embedding pipeline yourself: chunking code, generating embeddings with a model like OpenAI text-embedding-3-small, and managing index freshness. Alternatively, WarpGrep provides semantic code search as a managed API without requiring a vector database.
Who else uses Turbopuffer besides Cursor?
Notion (for Notion AI search), Linear, Superhuman, Anthropic, Atlassian, Suno, and Telus. The company grew revenue 10x in 2025, manages trillions of vectors and tens of petabytes, and has ARR in the tens of millions. It was founded by Simon Eskildsen (ex-Shopify infrastructure lead) and Justine Li, with 17 employees as of early 2026.
Skip the Embedding Pipeline. Search Code Directly.
WarpGrep provides semantic code search as a managed API. No vector database, no embedding model, no chunking logic. Send a query, get the relevant code. Works with Claude Code, Cursor, Aider, and any MCP-compatible tool.