For two years the vector database debate was a drag race. Whose recall held up at high QPS, whose p99 stayed under fifty milliseconds, whose HNSW graph traversed fastest with everything resident in memory. It was a fine question right up until it became the wrong one.

The thing that changed is architectural, and it's the spine of every comparison worth having in 2026: a new class of vector database decouples storage from compute by putting the index on cheap object storage — S3, GCS, R2 — instead of keeping every vector hot in RAM or on local SSD. That single move re-prices the entire category. And it re-prices it most violently for the workload almost nobody designed for: the long tail of cold, idle, mostly-untouched indexes.

The old model was "keep it all in memory," and it was ruinous for idle data

Classic vector search — Pinecone's old pod model, a self-run Qdrant or HNSW index — assumes your vectors live in RAM, or at least on fast local disk, because that's where approximate-nearest-neighbor graphs want to be. That's wonderful for one big index you hammer all day. It is financially absurd for the case that actually dominates modern AI products: multi-tenant apps with millions of small indexes, one per user or workspace, the overwhelming majority of which are queried approximately never.

Think of the shape literally. Every Notion workspace. Every Cursor codebase. Every Linear team's issue history. You don't have an index; you have a million of them, and at any given second maybe a thousand are warm. Paying to hold all million in memory is paying rent on a city of empty apartments.

Object storage breaks the lease. turbopuffer's architecture docs put cold storage at roughly two cents per gigabyte against something like sixty cents per gigabyte for redundant SSD, a gap of about thirty to one, and the company says customers have cut costs by up to 95% on the switch. Cursor, in turbopuffer's own customer writeup, reports exactly that 95% drop after moving code retrieval over in late 2023; Notion migrated in 2024 to scale past ten billion vectors. Treat the percentages as vendor-supplied (they're on the vendor's own customer pages), but the direction is structural, not spin.
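The arithmetic behind that gap can be sketched in a few lines. The two per-gigabyte prices are the figures quoted above, assumed here to be monthly rates; the fleet of one million indexes echoes the earlier example, and the 50 MB average index size is purely hypothetical. This covers storage at rest only, not query compute or cache.

```python
# Prices quoted above; treating them as USD per GB per month is an assumption.
OBJECT_STORAGE_USD_PER_GB = 0.02  # "roughly two cents per gigabyte"
REDUNDANT_SSD_USD_PER_GB = 0.60   # "something like sixty cents per gigabyte"


def monthly_storage_cost(total_gb: float, usd_per_gb: float) -> float:
    """Cost of keeping total_gb at rest for one month at a flat per-GB price."""
    return total_gb * usd_per_gb


def storage_saving_fraction(cold_price: float, hot_price: float) -> float:
    """Fraction of the storage bill removed by moving from hot_price to cold_price."""
    return 1.0 - cold_price / hot_price


# Hypothetical fleet: one million tenant indexes averaging 50 MB each.
tenant_count = 1_000_000
avg_index_gb = 0.05
total_gb = tenant_count * avg_index_gb

ssd_cost = monthly_storage_cost(total_gb, REDUNDANT_SSD_USD_PER_GB)
object_cost = monthly_storage_cost(total_gb, OBJECT_STORAGE_USD_PER_GB)
saving = storage_saving_fraction(OBJECT_STORAGE_USD_PER_GB, REDUNDANT_SSD_USD_PER_GB)

print(f"data at rest:   {total_gb:,.0f} GB")
print(f"redundant SSD:  ${ssd_cost:,.0f} per month")
print(f"object storage: ${object_cost:,.0f} per month")
print(f"storage saving: {saving:.1%}")
```

The saving on raw storage comes out slightly above the 95% the vendor quotes, which is plausible if the real bill also carries cache and compute that this sketch ignores.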

How you hide the latency you just signed up for

There's no free lunch. Reading an index off object storage instead of RAM adds real latency — hundreds of milliseconds, not microseconds. The entire engineering game is hiding that on the reads that matter while letting the idle tail stay slow-and-cheap.

Everyone solves it the same way in outline — a caching pyramid — and the differences are in the details.
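In outline, a caching pyramid is a read-through lookup across progressively slower tiers. The sketch below is a minimal illustration, not any vendor's implementation: the tier names, capacities, the `LRUTier` and `TieredReader` classes, and the dictionary standing in for object storage are all hypothetical.

```python
from collections import OrderedDict
from typing import Callable, List, Optional, Tuple


class LRUTier:
    """A bounded cache tier that evicts its least recently used entry."""

    def __init__(self, name: str, capacity: int) -> None:
        self.name = name
        self.capacity = capacity
        self._items = OrderedDict()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key: str, value: bytes) -> None:
        self._items[key] = value
        self._items.move_to_end(key)
        while len(self._items) > self.capacity:
            self._items.popitem(last=False)  # drop the least recently used


class TieredReader:
    """Checks tiers fastest-first and falls back to object storage on a full miss."""

    def __init__(self, tiers: List[LRUTier], fetch: Callable[[str], bytes]) -> None:
        self.tiers = tiers  # ordered fastest to slowest
        self.fetch = fetch  # stand-in for a GET against S3, GCS or R2

    def read(self, key: str) -> Tuple[bytes, str]:
        """Return the value and the name of the tier that served it."""
        for depth, tier in enumerate(self.tiers):
            value = tier.get(key)
            if value is not None:
                for faster in self.tiers[:depth]:
                    faster.put(key, value)  # promote toward the top
                return value, tier.name
        value = self.fetch(key)  # cold read: the slow, cheap path
        for tier in self.tiers:
            tier.put(key, value)  # warm every tier for the next query
        return value, "object_storage"


# A plain dict plays the role of the bucket in this sketch.
bucket = {"tenant-42/index": b"index bytes"}
reader = TieredReader(
    tiers=[LRUTier("ram", capacity=2), LRUTier("ssd", capacity=8)],
    fetch=bucket.__getitem__,
)
first = reader.read("tenant-42/index")   # cold: falls through to the bucket
second = reader.read("tenant-42/index")  # warm: served from the top tier
print(first[1], second[1])
```

The first query for an idle tenant pays the object-storage round trip; repeat queries are served from the faster tiers until eviction pushes the tenant back down. Real systems differ in what they cache (whole indexes, centroids, individual blocks) and in how they evict, which is where the designs diverge.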

The real question was never "which is fastest." It's whether your data is one hot haystack or a million cold needles — and those want opposite machines.

So which one — and the question that actually decides it

Forget the leaderboard. Ask what your workload looks like.

One big, always-hot index serving constant high-QPS traffic — a single product search corpus, a shared knowledge base everyone hits? The object-storage tax buys you little, because nothing is ever cold. This is still classic territory; a RAM-resident system (or Pinecone's pod-style indexes, or a tuned Qdrant) is a defensible choice, and the trade-offs between pgvector, Pinecone and Qdrant are mostly about ops posture there, not storage tiering.

A million cold namespaces (per-user, per-workspace, per-repo, mostly idle)? This is the sweet spot the object-storage architectures were built for, and the cost gap isn't marginal; it's an order of magnitude. turbopuffer treats namespace-per-tenant as the whole point. Pinecone's multitenancy guide recommends one namespace per tenant and is refreshingly explicit that querying a 1GB namespace costs ~1 RU (read unit) while metadata-filtering one tenant out of a shared 100GB namespace costs ~100 RUs, because the filter still scans everything. Cloudflare's Vectorize supports tens of thousands of namespaces per index and puts compute at the edge.
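The read-unit comparison is easy to reproduce as a back-of-envelope model. The sketch below assumes cost scales linearly at about 1 RU per gigabyte of namespace scanned, which is simply a straight line through the two figures quoted above; it is not Pinecone's actual billing formula, and the function names are made up for illustration.

```python
# Assumption: a linear fit to the two quoted data points (1 GB ~ 1 RU, 100 GB ~ 100 RUs).
RU_PER_GB_SCANNED = 1.0


def estimated_read_units(namespace_gb: float) -> float:
    """Rough per-query cost when the whole namespace has to be scanned."""
    return namespace_gb * RU_PER_GB_SCANNED


def compare_layouts(tenant_count: int, gb_per_tenant: float) -> dict:
    """Per-query cost of one namespace per tenant versus one shared, filtered namespace."""
    per_tenant = estimated_read_units(gb_per_tenant)
    shared = estimated_read_units(tenant_count * gb_per_tenant)
    return {
        "namespace_per_tenant": per_tenant,
        "shared_with_filter": shared,
        "ratio": shared / per_tenant,
    }


# The case quoted above: 100 tenants of 1 GB each.
layout = compare_layouts(tenant_count=100, gb_per_tenant=1.0)
for name, value in layout.items():
    print(f"{name}: {value:g}")
```

Under this model the penalty for the shared layout grows with the number of tenants, since every query pays for everyone else's data.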

The deploy axis breaks the tie when cost and shape don't. Vectorize only makes sense if you're already on Cloudflare Workers — it's a platform play, not a standalone database. Pinecone is the fully-managed, no-knobs option. turbopuffer sits in between: managed, but willing to run against your own object storage, which matters if your data has a home it can't leave.


All three are proprietary managed services, so "openness" isn't really the axis. If you want a self-hostable engine with the same storage-compute split, that's a different shelf. It's also worth weighing how the underlying ANN index types differ, since centroid indexes and graph indexes behave very differently once the data lives on S3.

But the framing holds regardless of vendor. The serverless vector database isn't faster. It's cheaper at rest, and it's reorganized the whole category around the workload that classic designs quietly bankrupted you on. Match the machine to the shape of your data, and the rest is tuning.