For two years the vector database debate was a drag race. Whose recall held up at high QPS, whose p99 stayed under fifty milliseconds, whose HNSW graph traversed fastest with everything resident in memory. It was a fine question right up until it became the wrong one.
The thing that changed is architectural, and it's the spine of every comparison worth having in 2026: a new class of vector database decouples storage from compute by putting the index on cheap object storage — S3, GCS, R2 — instead of keeping every vector hot in RAM or on local SSD. That single move re-prices the entire category. And it re-prices it most violently for the workload almost nobody designed for: the long tail of cold, idle, mostly-untouched indexes.
The old model was "keep it all in memory," and it was ruinous for idle data
Classic vector search — Pinecone's old pod model, a self-run Qdrant or HNSW index — assumes your vectors live in RAM, or at least on fast local disk, because that's where approximate-nearest-neighbor graphs want to be. That's wonderful for one big index you hammer all day. It is financially absurd for the case that actually dominates modern AI products: multi-tenant apps with millions of small indexes, one per user or workspace, the overwhelming majority of which are queried approximately never.
Think of the shape literally. Every Notion workspace. Every Cursor codebase. Every Linear team's issue history. You don't have an index; you have a million of them, and at any given second maybe a thousand are warm. Paying to hold all million in memory is paying rent on a city of empty apartments.
Object storage breaks the lease. turbopuffer's architecture docs put cold storage at roughly two cents per gigabyte against something like sixty cents per gigabyte for redundant SSD — and the company says customers have cut costs up to 95% on the switch. Cursor, in turbopuffer's own customer writeup, reports exactly that 95% drop after moving code retrieval over in late 2023; Notion migrated in 2024 to scale past ten billion vectors. Treat the percentages as vendor-supplied — they're on the vendor's own customer pages — but the direction is structural, not spin.
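To make the gap concrete, here is a back-of-the-envelope sketch using the two per-gigabyte figures quoted above. The fleet size and the per-index size are hypothetical, the prices are assumed to be per gigabyte per month (the billing period is not stated above), and the calculation covers storage only: no compute, request, or cache costs.

```python
# Per-GB prices quoted above from turbopuffer's docs (vendor figures).
# Assumption: both are per GB per month.
OBJECT_STORAGE_USD_PER_GB = 0.02
REDUNDANT_SSD_USD_PER_GB = 0.60


def monthly_storage_cost(num_indexes: int, gb_per_index: float, usd_per_gb: float) -> float:
    """Storage-only cost for a fleet of equally sized indexes."""
    return num_indexes * gb_per_index * usd_per_gb


# Hypothetical fleet: one million tenants, 50 MB of vectors each.
NUM_INDEXES = 1_000_000
GB_PER_INDEX = 0.05

ssd = monthly_storage_cost(NUM_INDEXES, GB_PER_INDEX, REDUNDANT_SSD_USD_PER_GB)
s3 = monthly_storage_cost(NUM_INDEXES, GB_PER_INDEX, OBJECT_STORAGE_USD_PER_GB)
savings = 1 - s3 / ssd  # fraction of the SSD bill that disappears

print(f"SSD: ${ssd:,.0f}  object storage: ${s3:,.0f}  saved: {savings:.1%}")
```

On these assumptions the saving is roughly 97% on storage alone, whatever the fleet size, because it depends only on the ratio of the two prices. That puts the vendor's 95% claim in a plausible range, but it does not verify it.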
How you hide the latency you just signed up for
There's no free lunch. Reading an index off object storage instead of RAM adds real latency — hundreds of milliseconds, not microseconds. The entire engineering game is hiding that on the reads that matter while letting the idle tail stay slow-and-cheap.
Everyone solves it the same way in outline — a caching pyramid — and the differences are in the details.
- turbopuffer inflates data upward as it heats, promoting it from object storage to NVMe SSD and then to RAM. Its docs describe a cold first query to a million-document namespace at around 874ms p50, collapsing to roughly 14ms once cached. It leans on a centroid-based index (SPFresh) precisely because centroids minimize the round trips and write amplification that graph indexes like HNSW inflict on object storage.
- Pinecone serverless makes blob storage the source of truth and organizes each namespace's records into immutable files it calls slabs, served by a stateless pool of query nodes. Its answer to the "but my fresh writes" problem is a freshness layer that tails the write log and builds a small index over not-yet-compacted data, so edits are searchable in seconds. Pinecone claims up to 50x cost reduction versus the old pod model — again, a vendor figure, framed as such, in its serverless architecture post.
- Cloudflare Vectorize runs the same pattern at the edge. Per Cloudflare's engineering blog, a Rust query service reads index data from R2 — Cloudflare's object storage — through Cloudflare's Cache, fragmenting the index so a query fetches as little as possible. It's bound to Workers, so compute lands close to the user.
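The caching pyramid these three share can be sketched as a read-through cache with promotion. This is a toy model, not any vendor's implementation: the class name `TieredReader`, the slot counts, and the latencies are all made up for illustration, and a plain dict stands in for the object store.

```python
from collections import OrderedDict

# Illustrative latencies in milliseconds; not measurements from any vendor.
TIER_LATENCY_MS = {"ram": 1, "ssd": 10, "object_storage": 500}


class TieredReader:
    """Read-through cache: RAM over SSD over object storage, LRU eviction per tier."""

    def __init__(self, object_store: dict, ram_slots: int, ssd_slots: int):
        self.object_store = object_store  # stand-in for S3 / GCS / R2
        self.ram = OrderedDict()
        self.ssd = OrderedDict()
        self.ram_slots = ram_slots
        self.ssd_slots = ssd_slots

    @staticmethod
    def _put(tier: OrderedDict, capacity: int, key, value) -> None:
        tier[key] = value
        tier.move_to_end(key)
        while len(tier) > capacity:
            tier.popitem(last=False)  # evict the least recently used entry

    def read(self, key):
        """Return (value, tier_that_served_it, simulated_latency_ms)."""
        if key in self.ram:
            self.ram.move_to_end(key)
            return self.ram[key], "ram", TIER_LATENCY_MS["ram"]
        if key in self.ssd:
            value = self.ssd[key]
            self.ssd.move_to_end(key)
            self._put(self.ram, self.ram_slots, key, value)  # promote to RAM
            return value, "ssd", TIER_LATENCY_MS["ssd"]
        value = self.object_store[key]  # cold read from the source of truth
        self._put(self.ssd, self.ssd_slots, key, value)  # promote to SSD
        return value, "object_storage", TIER_LATENCY_MS["object_storage"]


store = {f"tenant-{i}": f"index-bytes-{i}" for i in range(5)}
reader = TieredReader(store, ram_slots=1, ssd_slots=2)
served_by = [reader.read("tenant-0")[1] for _ in range(3)]
print(served_by)
```

Three consecutive reads of the same tenant are served from object storage, then SSD, then RAM. A tenant nobody queries never leaves the cheapest tier, and that is the entire economic argument.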
The real question was never "which is fastest." It's whether your data is one hot haystack or a million cold needles — and those want opposite machines.
So which one — and the question that actually decides it
Forget the leaderboard. Ask what your workload looks like.
One big, always-hot index serving constant high-QPS traffic — a single product search corpus, a shared knowledge base everyone hits? The object-storage tax buys you little, because nothing is ever cold. This is still classic territory; a RAM-resident system (or Pinecone's pod-style indexes, or a tuned Qdrant) is a defensible choice, and the trade-offs between pgvector, Pinecone and Qdrant are mostly about ops posture there, not storage tiering.
A million cold namespaces (per-user, per-workspace, per-repo, mostly idle)? This is the sweet spot the object-storage architectures were built for, and the cost gap isn't marginal, it's an order of magnitude. turbopuffer treats namespace-per-tenant as the whole point. Pinecone's multitenancy guide recommends one namespace per tenant and is refreshingly explicit that querying a 1GB namespace costs ~1 read unit (RU) while metadata-filtering one tenant out of a shared 100GB namespace costs ~100 RUs, because the filter still scans everything. Vectorize supports tens of thousands of namespaces per index and puts compute at the edge.
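The namespace arithmetic is worth spelling out. The sketch below assumes read cost scales linearly with the size of the namespace scanned, at about 1 RU per GB. That rate is inferred from the two data points quoted above, not taken from a published pricing formula, and the tenant count and sizes are hypothetical.

```python
# Assumption: about 1 RU per GB of namespace scanned, inferred from the two
# data points above (1 GB ~ 1 RU, 100 GB ~ 100 RUs).
ASSUMED_RU_PER_GB = 1.0


def query_cost_ru(namespace_size_gb: float) -> float:
    """Estimated read units for one query against a namespace of this size."""
    return namespace_size_gb * ASSUMED_RU_PER_GB


tenants = 100
gb_per_tenant = 1.0

# Layout A: one namespace per tenant; a query touches only that tenant's data.
per_tenant_namespace = query_cost_ru(gb_per_tenant)

# Layout B: all tenants in one namespace, isolated by a metadata filter;
# the query is billed against the whole namespace.
shared_namespace = query_cost_ru(tenants * gb_per_tenant)

print(per_tenant_namespace, shared_namespace, shared_namespace / per_tenant_namespace)
```

Under this assumption the penalty for the shared layout grows with the number of tenants, so the per-tenant layout wins by a wider margin as the product grows.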
The deploy axis breaks the tie when cost and shape don't. Vectorize only makes sense if you're already on Cloudflare Workers — it's a platform play, not a standalone database. Pinecone is the fully-managed, no-knobs option. turbopuffer sits in between: managed, but willing to run against your own object storage, which matters if your data has a home it can't leave.
All three are proprietary managed services, so "openness" isn't really the axis. If you want a self-hostable engine with the same storage-compute split, that's a different shelf. It's also worth weighing how the underlying ANN index types differ, since centroid indexes and graph indexes behave very differently once the data lives on S3.
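To see why centroid indexes suit object storage, here is a minimal IVF-style sketch that counts object-store GETs per query. It is a generic illustration, not SPFresh and not any vendor's index: the clustering is deliberately crude (sampled centroids, no k-means refinement), and `CountingStore`, `build_index`, and `search` are hypothetical names.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


class CountingStore:
    """Dict-backed stand-in for object storage that counts GET requests."""

    def __init__(self):
        self.objects = {}
        self.gets = 0

    def put(self, key, value):
        self.objects[key] = value

    def get(self, key):
        self.gets += 1
        return self.objects[key]


def build_index(store, vectors, num_clusters, seed=0):
    """Crude clustering: sample centroids, assign each vector to its nearest one."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, num_clusters)
    clusters = [[] for _ in centroids]
    for vid, vec in enumerate(vectors):
        nearest = min(range(num_clusters), key=lambda c: sq_dist(vec, centroids[c]))
        clusters[nearest].append((vid, vec))
    store.put("centroids", centroids)  # one small object
    for c, members in enumerate(clusters):
        store.put(f"cluster/{c}", members)  # one object per cluster


def search(store, query, top_k, nprobe):
    """Fetch the centroid list, then only the nprobe closest clusters."""
    centroids = store.get("centroids")
    order = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    candidates = []
    for c in order[:nprobe]:  # these fetches do not depend on one another
        candidates.extend(store.get(f"cluster/{c}"))
    candidates.sort(key=lambda item: sq_dist(query, item[1]))
    return [vid for vid, _ in candidates[:top_k]]


rng = random.Random(42)
vectors = [[rng.random() for _ in range(8)] for _ in range(2000)]
store = CountingStore()
build_index(store, vectors, num_clusters=40)
hits = search(store, vectors[7], top_k=5, nprobe=3)
print(store.gets, hits[0])
```

The query costs `1 + nprobe` GETs, and the cluster fetches are independent of each other, so a real system could issue them in parallel. A graph traversal, by contrast, learns which node to read next only after reading the current one, which turns every hop into a sequential round trip when the graph lives on object storage.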
But the framing holds regardless of vendor. The serverless vector database isn't faster. It's cheaper at rest, and it's reorganized the whole category around the workload that classic designs quietly bankrupted you on. Match the machine to the shape of your data, and the rest is tuning.



