Quick Verdict: Turbopuffer vs Pinecone
Bottom Line
Turbopuffer is built on object storage. Pinecone is built on SSDs. If most of your vectors are cold (accessed infrequently), Turbopuffer costs 10-100x less. If you need guaranteed sub-10ms latency on every query regardless of access pattern, Pinecone is the safer bet. Cursor, Notion, and Linear chose Turbopuffer. Most enterprise RAG teams still default to Pinecone.
Architecture: Object Storage vs SSD-First
This is the fundamental difference. Everything else (pricing, latency, multi-tenancy) follows from this architectural choice.
Turbopuffer: S3 as Source of Truth
All vectors live on object storage (S3, GCS, or Azure Blob) by default. Hot data gets cached on NVMe SSDs and RAM automatically based on access patterns. Cold queries hit object storage at ~300-500ms. Warm queries hit cache at sub-10ms. The SPFresh centroid-based index minimizes roundtrips to storage by first identifying which clusters to fetch, then downloading only those clusters.
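You can see the two tiers directly by timing back-to-back queries against the same namespace. A minimal sketch, assuming the REST endpoint path, auth header, and request fields roughly follow Turbopuffer's documented API shape; treat all of them as illustrative:

```python
import os
import time
import requests

# Illustrative only: endpoint path, auth header, and field names are
# assumptions modeled on Turbopuffer's documented REST API shape.
API = "https://api.turbopuffer.com/v1/namespaces/my-codebase/query"
HEADERS = {"Authorization": f"Bearer {os.environ['TURBOPUFFER_API_KEY']}"}

def timed_query(vector):
    start = time.perf_counter()
    resp = requests.post(API, headers=HEADERS,
                         json={"vector": vector, "top_k": 10,
                               "distance_metric": "cosine_distance"})
    resp.raise_for_status()
    return (time.perf_counter() - start) * 1000, resp.json()

query_vec = [0.1] * 1536  # stand-in embedding

cold_ms, _ = timed_query(query_vec)  # first hit: fetched from object storage
warm_ms, _ = timed_query(query_vec)  # second hit: served from NVMe/RAM cache
print(f"cold: {cold_ms:.0f}ms, warm: {warm_ms:.0f}ms")  # e.g. ~300-500ms vs <10ms
```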
Pinecone: SSD-First Serverless
Pinecone Serverless stores vectors on SSDs with a compute layer that scales independently. Queries hit pre-indexed data on fast storage, so latency stays consistently low regardless of access patterns. The tradeoff is cost: replicated SSD storage runs roughly $0.60/GB to operate, and Pinecone charges $0.33/GB. For hot workloads this is fine. For cold workloads it is 16x more expensive than object storage.
The Pufferfish Effect
Turbopuffer calls their caching model the "pufferfish effect." Data inflates from object storage (~$20/TB/month) to NVMe SSD (~$100/TB/month) to RAM as it gets accessed more frequently. When access drops, data deflates back to cheaper tiers. You pay for the tier the data actually lives on, not the most expensive tier. For workloads where 90% of data is cold (most code search indexes, most multi-tenant RAG systems), this saves an order of magnitude.
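The blended-cost math falls out of the per-TB rates quoted above. A quick worked example in Python, assuming a 90/10 cold/hot split on a 10 TB dataset (the split is an assumption, not a Turbopuffer figure):

```python
# Blended storage cost under the tiered ("pufferfish") model, using the
# per-TB rates quoted above. The 90% cold / 10% hot split is an assumption.
OBJECT_STORAGE = 20.0   # $/TB/month (S3 tier)
NVME_SSD = 100.0        # $/TB/month (hot cache tier)

total_tb = 10.0
hot_fraction = 0.10

blended = (total_tb * (1 - hot_fraction) * OBJECT_STORAGE
           + total_tb * hot_fraction * NVME_SSD)
all_ssd = total_tb * NVME_SSD

print(f"tiered: ${blended:.0f}/mo vs all-SSD: ${all_ssd:.0f}/mo")
# tiered: $280/mo vs all-SSD: $1000/mo on storage alone;
# the gap widens as the cold fraction grows.
```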
Write Path Differences
Turbopuffer writes go to a write-ahead log on object storage first, then get asynchronously indexed. Write latency is ~285ms p50. Throughput is high (10,000+ vectors/sec), but each namespace allows one WAL entry per second, with concurrent writes batched together. Pinecone writes are faster at the individual operation level and index in near-real-time. If your use case requires rapid, low-latency writes with immediate searchability, Pinecone has the edge.
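Because a namespace commits roughly one WAL entry per second, the practical pattern is to batch aggressively on the client rather than issue thousands of single-row writes. A minimal sketch, assuming a column-style `upsert(ids=..., vectors=..., attributes=...)` signature; the exact shape is illustrative:

```python
import itertools

def batched(iterable, size):
    # Yield successive fixed-size chunks from any iterable.
    it = iter(iterable)
    while chunk := list(itertools.islice(it, size)):
        yield chunk

def upsert_all(namespace, docs, batch_size=1000):
    # One large upsert per batch amortizes the per-second WAL commit
    # far better than thousands of individual writes.
    for chunk in batched(docs, batch_size):
        namespace.upsert(
            ids=[d["id"] for d in chunk],
            vectors=[d["vector"] for d in chunk],
            attributes={"path": [d["path"] for d in chunk]},  # assumed shape
        )
```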
How Cursor Uses Turbopuffer for Code Search
Cursor is the highest-profile Turbopuffer customer and the clearest illustration of why object storage matters for code search.
Cursor indexes millions of developer codebases. Each codebase gets chunked using tree-sitter for AST-aware splitting, embedded, and stored as vectors in Turbopuffer. When you ask Cursor a question about your code, it computes an embedding for your query, sends it to Turbopuffer for nearest-neighbor search, and retrieves the most semantically relevant code chunks.
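The overall shape of such a pipeline is short. In the sketch below, `chunk_file()` is a naive stand-in for tree-sitter's AST-aware splitting, `embed()` is a placeholder for whatever embedding model you run, and the namespace API mirrors the batched `upsert_all()` sketched above; all are illustrative, not Cursor's actual code:

```python
from pathlib import Path

def chunk_file(path: Path, size: int = 2000):
    # Naive fixed-size chunking; a real pipeline would split on AST nodes.
    text = path.read_text(errors="ignore")
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text: str) -> list[float]:
    raise NotImplementedError("call your embedding model here, e.g. 1536-dim")

def index_codebase(namespace, repo_root: str):
    docs = []
    for path in Path(repo_root).rglob("*.py"):
        for i, chunk in enumerate(chunk_file(path)):
            docs.append({"id": f"{path}:{i}", "vector": embed(chunk),
                         "path": str(path)})
    upsert_all(namespace, docs)  # batched write, as sketched above

def search(namespace, question: str, k: int = 8):
    # Nearest-neighbor lookup over the indexed chunks.
    return namespace.query(vector=embed(question), top_k=k,
                           include_attributes=["path"])
```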
The economics made the migration inevitable. Most developers open Cursor, work on their project for a few hours, then close it. Their codebase embeddings sit untouched until the next session. With an SSD-first database, you pay peak storage rates 24/7 for vectors that get queried a few hours a day. With Turbopuffer, cold embeddings live on S3 at $0.02/GB. When a developer opens their project, the first query takes ~300ms as vectors load from object storage into cache. Subsequent queries in that session hit cache at sub-10ms.
The Cursor Migration
When Cursor emailed Turbopuffer about their vector DB costs, Simon Eskildsen flew to San Francisco, and the team migrated the entire workload within a week, cutting costs by 95%. During the proof of concept, Turbopuffer's CTO pulled 24-hour coding sessions, at one point eliminating 300ms of query latency in three hours. Cursor now stores billions of vectors across millions of codebases on Turbopuffer.
Why This Matters for Code Search
Code embeddings have a natural hot/cold pattern. Active projects get queried constantly. Abandoned repos, old branches, and rarely-opened projects sit cold. A vector database that charges the same rate for hot and cold data penalizes this access pattern. Turbopuffer's tiered storage matches how developers actually work: burst access during coding sessions, idle the rest of the time.
Pricing Comparison
| Dimension | Turbopuffer | Pinecone Serverless |
|---|---|---|
| Minimum monthly spend | $64 (Launch) | $0 (Starter) / $50 (Standard) |
| Storage cost | ~$0.02/GB (object storage) | $0.33/GB |
| Read pricing | Per GB queried + returned (volume discounts) | $8.25/1M read units (Standard) |
| Write pricing | Per GB written (batch discounts up to 50%) | $2/1M write units |
| Free tier | No | Yes (2GB storage, limited RU/WU) |
| Enterprise minimum | $4,096/month | $500/month |
| Cost at 100M vectors (1536-dim) | ~$90-220/month | ~$500-2,000/month |
| Cost at 1B vectors (1536-dim) | ~$500-2,000/month | ~$5,000-20,000/month |
The gap widens with scale. At small volumes (under 1M vectors), Pinecone's free Starter tier and $50 Standard minimum beat Turbopuffer's $64 minimum. At 100M+ vectors, Turbopuffer's object storage economics dominate. The exact savings depend on your hot/cold ratio: the more cold data you have, the bigger the gap.
Pinecone's pricing is also harder to predict. Read unit costs depend on namespace size (1 RU per GB of namespace queried), write unit costs depend on request size (1 WU per KB), and the actual dollar-per-unit rate varies by cloud provider and region. Turbopuffer bills on logical bytes, which is easier to forecast.
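The raw storage math behind the table is easy to check. At float32 precision, each dimension costs 4 bytes; query and write charges are excluded here, which is why the table's totals run higher:

```python
# Storage-only comparison at 100M vectors, 1536 dimensions, float32.
vectors = 100_000_000
dims = 1536
gb = vectors * dims * 4 / 1e9          # ~614 GB of raw vector data

turbopuffer = gb * 0.02                # object storage rate
pinecone = gb * 0.33                   # serverless storage rate

print(f"{gb:.0f} GB -> storage only: ${turbopuffer:.0f}/mo vs ${pinecone:.0f}/mo")
# 614 GB -> storage only: $12/mo vs $203/mo (~16x)
```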
Multi-Tenancy: Where Turbopuffer Pulls Ahead
If you are building a multi-tenant application (one namespace per customer, per codebase, or per document collection), namespace limits matter.
Turbopuffer: No Namespace Limits
Turbopuffer has no enforced namespace limits. Each namespace is isolated on object storage, so adding tenants does not degrade performance for existing tenants. Cursor runs millions of namespaces (one per codebase) without hitting caps. This is a direct benefit of the object-storage-first architecture: creating a new namespace is just creating a new S3 prefix.
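The per-tenant pattern is correspondingly simple: route every write and query through a namespace derived from the tenant ID. A sketch, assuming a namespace is created implicitly on first write (consistent with "a new S3 prefix" above); the client call itself is hypothetical:

```python
def namespace_name(tenant_id: str) -> str:
    # One namespace per tenant; also works per-repo or per-collection.
    return f"tenant-{tenant_id}"

def upsert_for_tenant(client, tenant_id: str, docs: list[dict]):
    # No provisioning step: writing to a new name creates the namespace.
    ns = client.namespace(namespace_name(tenant_id))  # hypothetical client API
    ns.upsert(docs)
```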
Pinecone: 10K-100K Namespaces, 20 Indexes
Pinecone supports up to 10,000 namespaces per index on Standard plans (100,000 on Enterprise), with a limit of 20 indexes. For most applications this is plenty. But if you need millions of isolated tenants (like Cursor's per-codebase model), you either hit the cap or resort to workarounds like packing multiple tenants into shared namespaces with metadata filters.
This is the second reason Cursor chose Turbopuffer, beyond cost. Millions of codebases means millions of namespaces. Pinecone's 10K namespace limit on Standard (100K on Enterprise) would have required architectural compromises.
Feature Comparison
| Feature | Turbopuffer | Pinecone |
|---|---|---|
| Vector search (ANN) | SPFresh centroid-based | Proprietary (optimized ANN) |
| Hybrid search (BM25 + vector) | Yes, native BM25 | No native BM25 (keyword via sparse vectors) |
| Metadata filtering | Yes | Yes |
| Built-in embeddings | No (BYOE) | Yes (integrated embedding models) |
| Built-in reranking | No | Yes |
| RAG pipeline (Assistant) | No | Yes (Pinecone Assistant) |
| Consistency model | Strong consistency (default), eventual optional | Eventual consistency |
| Update semantics | Primary key upserts (LSM engine) | ID-based upserts |
| SOC 2 Type II | Yes | Yes |
| HIPAA | Yes (Scale plan BAA) | Yes (Enterprise) |
| Regions | AWS, GCP, Azure | AWS, GCP, Azure |
| Self-host / open source | No (managed only) | No (managed only) |
Pinecone has more features out of the box: built-in embeddings, reranking, and a full RAG pipeline via Pinecone Assistant. If you want a batteries-included solution where you upload documents and get answers, Pinecone offers that. Turbopuffer is a lower-level primitive: it stores vectors and searches them, and it does both very cheaply at scale. You bring your own embedding model, your own reranker, your own RAG orchestration.
Turbopuffer's native BM25 support is notable. Hybrid search (combining keyword matching with vector similarity) consistently outperforms pure vector search for retrieval quality. With Pinecone, you need to encode keywords as sparse vectors. With Turbopuffer, BM25 is a first-class query type alongside vector search.
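If you fuse the two result lists yourself, reciprocal rank fusion (RRF) is the standard technique. This is generic retrieval code, not a Turbopuffer API: run a BM25 query and a vector query against the same namespace, then merge client-side:

```python
def rrf(result_lists, k=60):
    # Reciprocal rank fusion: each list contributes 1/(k + rank + 1)
    # per document, rewarding docs that rank well in multiple lists.
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["doc3", "doc1", "doc7"]    # keyword ranking
vector_hits = ["doc1", "doc4", "doc3"]  # semantic ranking
print(rrf([bm25_hits, vector_hits]))    # ['doc1', 'doc3', 'doc4', 'doc7']
```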
When Turbopuffer Wins
Billions of Vectors, Mostly Cold
Code search (like Cursor), document archives, multi-tenant SaaS with thousands of customers. Any workload where most data is accessed infrequently. Object storage costs $0.02/GB. SSDs cost 16x more. The math is straightforward.
Unlimited Multi-Tenancy
No namespace limits. Each tenant gets full isolation without performance degradation. If you need one namespace per user, per repo, or per document collection, and the count is in the hundreds of thousands or millions, Turbopuffer is the only managed option that does not cap you.
Hybrid Search (BM25 + Vector)
Native BM25 alongside vector search. No need to encode keywords as sparse vectors or run a separate search engine. For RAG applications, hybrid search retrieves 10-30% more relevant results than vector-only search, especially for code and technical content where exact keyword matches matter.
Predictable Billing
Billed on logical bytes (data you control), not physical storage or opaque unit pricing. Turbopuffer's pricing is easier to model and forecast than Pinecone's RU/WU system where costs vary by namespace size, request size, region, and provider.
When Pinecone Wins
Consistent Low Latency
Every query hits SSD-backed storage. No cold starts, no 300ms first-query penalty. If your application cannot tolerate variable latency (real-time chat, recommendation feeds, search-as-you-type), Pinecone's architecture delivers more predictable performance.
Getting Started Fast
Free Starter tier. Built-in embeddings, reranking, and RAG pipeline (Pinecone Assistant). Upload documents, get answers. No need to choose an embedding model, build chunking logic, or orchestrate a retrieval pipeline. For prototyping or small-scale production, Pinecone removes more friction.
Small-Scale Workloads
Under 10M vectors, Pinecone's $0 Starter or $50 Standard plan is cheaper than Turbopuffer's $64 minimum. The cost advantage of object storage only kicks in at scale. If your dataset fits in a few gigabytes, storage costs are noise either way.
Enterprise Ecosystem
Pinecone has been in market since 2019. More integrations (LangChain, LlamaIndex, Haystack), more tutorials, more Stack Overflow answers. The enterprise sales motion is mature. If your procurement team needs a vendor with a long track record and extensive documentation, Pinecone has it.
Or Skip the Vector DB Entirely
Both Turbopuffer and Pinecone solve the same problem: store embeddings, search them fast. But if your goal is code search specifically, a vector database is one component of a multi-step pipeline. You need to chunk code (tree-sitter, line-based, or function-level), choose and run an embedding model, manage index updates as code changes, handle stale embeddings, tune retrieval parameters, and build the query layer. Cursor built all of this. Most teams do not need to.
WarpGrep provides semantic code search as a managed API. It handles chunking, embedding, indexing, and retrieval. You send a natural language query and a repository, and get back the relevant code. No vector database to manage, no embedding pipeline to build, no stale index to debug.
This is not a replacement for Turbopuffer or Pinecone in general. If you are building a RAG pipeline over documents, a recommendation engine, or a multi-modal search system, you need a vector database. But if your specific use case is "help my coding agent find the right code," WarpGrep solves the problem at a higher level of abstraction.
The Build vs Buy Calculation
Cursor built their own code search pipeline on Turbopuffer because they have the engineering team, the scale to justify it, and code search is their core product. For teams using coding agents (Claude Code, Aider, Cline, OpenCode) who want better code context without building infrastructure, WarpGrep provides the same capability as a single API call.
Frequently Asked Questions
Why did Cursor switch from Pinecone to Turbopuffer?
Cursor indexes millions of codebases, most of which are accessed infrequently. Turbopuffer's object-storage-first architecture stores cold vectors at $0.02/GB instead of SSD rates. The migration cut Cursor's vector database costs by 95%. Turbopuffer's unlimited namespaces (one per codebase) also eliminated the namespace cap issue they would have hit with Pinecone.
Is Turbopuffer cheaper than Pinecone?
At scale, significantly. Storage alone is $0.02/GB vs $0.33/GB. But Turbopuffer's minimum is $64/month compared to Pinecone's free Starter tier. For small workloads (under 10M vectors, light query volume), Pinecone is cheaper or free. The crossover point depends on data volume and access patterns, but Turbopuffer typically becomes cheaper once your dataset reaches tens of gigabytes of vector data.
What is Turbopuffer's cold query latency?
Around 300-500ms when data needs to be fetched from object storage. After the first query, subsequent queries to the same namespace hit NVMe/RAM cache at sub-10ms. For search and RAG applications where the user is already waiting for an LLM response (which takes 1-5 seconds), 300ms of added latency on the first retrieval is rarely noticeable.
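Where even the first query matters, one way to hide the cold start is to fire a throwaway query in the background when a user session opens, so the namespace is already warm in NVMe/RAM cache before the first real search. A sketch reusing the illustrative `timed_query()` helper from the architecture section:

```python
import threading

def warm_namespace(probe_vector):
    # Fire-and-forget: the user never waits on the cold object-storage fetch.
    threading.Thread(target=timed_query, args=(probe_vector,),
                     daemon=True).start()

warm_namespace([0.0] * 1536)  # call on session open
```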
Does Turbopuffer have a free tier?
No. The minimum plan (Launch) costs $64/month. Pinecone's Starter tier is free with 2GB storage. If you are prototyping or running a small side project, Pinecone Starter, the Qdrant free tier, or self-hosted pgvector are more cost-effective starting points.
Can I use Turbopuffer for code search?
Yes. Cursor uses Turbopuffer for exactly this. You will need to build the embedding pipeline yourself: chunking code, generating embeddings with a model like OpenAI text-embedding-3-small, and managing index freshness. Alternatively, WarpGrep provides semantic code search as a managed API without requiring a vector database.
Who else uses Turbopuffer besides Cursor?
Notion (for Notion AI search), Linear, Superhuman, Anthropic, Atlassian, Suno, and Telus. The company grew revenue 10x in 2025, manages trillions of vectors and tens of petabytes, and has ARR in the tens of millions. It was founded by Simon Eskildsen (ex-Shopify infrastructure lead) and Justine Li, with 17 employees as of early 2026.
Skip the Embedding Pipeline. Search Code Directly.
WarpGrep provides semantic code search as a managed API. No vector database, no embedding model, no chunking logic. Send a query, get the relevant code. Works with Claude Code, Cursor, Aider, and any MCP-compatible tool.