The question gets asked backwards. Teams sit down to choose a vector database and start by collecting p99 latency numbers and recall-at-10 curves, as if the decision turns on which engine answers a nearest-neighbor query a few milliseconds faster. It almost never does. pgvector, Pinecone, and Qdrant — the three you'll actually shortlist in 2026 — all clear the bar that matters for nearly any agent you're likely to build: sub-30ms p95 at high recall, into the tens of millions of vectors. Benchmark-shopping among them is optimizing the axis that's already decided.

The axis that isn't decided is where the operating cost lives. Every vector store imposes a tax — in money, in engineering time, or in architectural complexity — and the three options here are honest about charging it in three different currencies. That's the comparison worth making, because unlike latency, it doesn't have a single right answer. It has a crossover.

Three currencies for the same bill

pgvector charges in complexity avoided. It's a Postgres extension, not a separate service: you CREATE EXTENSION vector, add a column, and your embeddings live in the same database as the rows they describe. The non-obvious payoff isn't speed — it's the sync layer you never build. A standalone vector store means writing and operating a pipeline that keeps it consistent with your source-of-truth database; pgvector deletes that entire class of bug because there's only one database. With the HNSW index and the pgvectorscale extension, its throughput is competitive into the tens of millions of vectors. Its ceiling is operational, not algorithmic: vector search now competes for the same CPU and memory as your transactional load.

Pinecone charges in money to skip the ops. It's fully managed and closed-source — you push vectors in and query them out, with no index to tune, no node to scale, no failover to design. By most accounts it holds the dominant share of the managed segment, and the reason is exactly that surface: a team with no appetite for running stateful infrastructure can be in production in an afternoon. The serverless tier trades some latency for not thinking about capacity; the pod-based tier buys it back for noticeably more money. Either way you're renting the absence of an operations burden.

Qdrant charges in ops work for a cheap marginal query. Written in Rust and built only for vector search, it posts strong numbers in both its own and third-party benchmarks, and crucially it's open-source, so you can self-host. That's where its economics get interesting: once you're running it yourself, each additional query is nearly free, where a managed competitor keeps metering you. It also handles filtered search ("nearest neighbors where tenant = X and date > Y") particularly well, and that is the query shape real agents actually issue. The cost is the obvious one: someone on your team now operates a database.
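If "filtered search" sounds abstract, this brute-force sketch pins down what the query means. It is deliberately not Qdrant's client API and not how any engine implements it internally: a real index applies the filter during graph traversal rather than scanning every point. The Point fields (tenant, created) and the function names are hypothetical, chosen to mirror the example above.

```python
import math
from dataclasses import dataclass
from datetime import date


@dataclass
class Point:
    # Hypothetical record: a vector plus the metadata the filter reads.
    id: str
    vector: list[float]
    tenant: str
    created: date


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(points, query, tenant, after, k=10):
    """Nearest neighbors where tenant == `tenant` and created > `after`."""
    # Apply the metadata filter first, then rank only the survivors.
    candidates = [p for p in points if p.tenant == tenant and p.created > after]
    candidates.sort(key=lambda p: cosine_similarity(p.vector, query), reverse=True)
    return candidates[:k]
```

The filter is part of the query, not a post-processing step. Filter after retrieving the top k instead and a selective tenant can leave you with far fewer than k results, which is why engines that handle this shape natively matter for multi-tenant agents.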

The crossover nobody benchmarks

They're all fast enough. The decision is who pays to turn "fast enough" into production: your wallet, your on-call rotation, or your architecture diagram.

Here's the part the latency charts hide. Because Qdrant's marginal cost trends toward zero while a managed service charges per query forever, there's a volume where self-hosting flips from more expensive to dramatically cheaper. Published cost analyses put that crossover around 60–80 million queries per month: above it, self-hosted Qdrant runs roughly 3–10x cheaper than the managed alternatives; below it, the engineering time to run, monitor, and back it up costs more than you'd save, so a managed option wins on the all-in bill.
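The arithmetic behind a crossover like this is simple enough to sketch. The model below treats managed cost as purely per-query and self-hosted cost as a fixed monthly bill (infrastructure plus the engineering time to run it) with a small or zero marginal cost. The function names and both dollar figures are placeholders invented for illustration, not quoted prices from any vendor. Plug in your own.

```python
def monthly_cost_managed(queries, price_per_million):
    """Managed service: you pay for every query, forever."""
    return queries / 1_000_000 * price_per_million


def monthly_cost_self_hosted(queries, fixed_monthly, marginal_per_million=0.0):
    """Self-hosted: a fixed bill (servers plus people) and a near-zero marginal cost."""
    return fixed_monthly + queries / 1_000_000 * marginal_per_million


def crossover_queries(price_per_million, fixed_monthly, marginal_per_million=0.0):
    """Monthly volume at which the two bills are equal.

    Returns None when self-hosting never catches up (no per-query saving).
    """
    saving_per_million = price_per_million - marginal_per_million
    if saving_per_million <= 0:
        return None
    return fixed_monthly / saving_per_million * 1_000_000


# Placeholder inputs, NOT real prices: chosen only to show the calculation.
PRICE_PER_MILLION = 100.0  # hypothetical managed price, dollars per million queries
FIXED_MONTHLY = 7_000.0    # hypothetical servers + on-call time, dollars per month
```

With those placeholder inputs, crossover_queries(PRICE_PER_MILLION, FIXED_MONTHLY) works out to 70 million queries per month. The number itself matters less than its sensitivity: the fixed term is dominated by engineering time, so a team that undercounts the hours spent on monitoring and backups will place its crossover far too low.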

That single number reframes the whole choice. Below the crossover, you are mostly buying back human time, and the ranking is about your team: pgvector if you already run Postgres and want one less service; Pinecone if you'd rather expense the problem than staff it. Above the crossover, the math starts paying for the operational headcount, and a self-hostable engine like Qdrant becomes the adult decision. Most teams badly overestimate where they sit on this line — they provision for the traffic they dream about, not the traffic they have.

How to actually decide

Start from your team and your volume, not a leaderboard.
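One way to write that rule down is as a function of two inputs: the traffic you have today and whether Postgres is already in your stack. This is a sketch of the reasoning in this article, not a definitive selector. The function name, the return labels, and the 70-million default (the midpoint of the 60–80 million range cited earlier) are all assumptions.

```python
def recommend(monthly_queries, runs_postgres, crossover=70_000_000):
    """Pick the currency to pay in, given current (not projected) traffic.

    `crossover` defaults to the midpoint of the 60-80 million queries/month
    range discussed above; it is an assumption, so compute your own.
    """
    if monthly_queries >= crossover:
        # Above the line, per-query savings can fund the ops headcount.
        return "qdrant (self-hosted)"
    if runs_postgres:
        # Below the line with Postgres already in place: one less service.
        return "pgvector"
    # Below the line without Postgres: expense the problem instead of staffing it.
    return "pinecone (managed)"
```

Pass in measured traffic, not the forecast from the pitch deck. A team at 5 million queries a month that already runs Postgres gets "pgvector" back, and that answer holds until the volume actually arrives.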

The deeper point is that "which vector database" is the wrong granularity of question. It's really "how much of this do I want to operate, and at what volume" — the same question that sits under picking an inference engine or an LLM gateway. The engines converged on performance a while ago. They diverged on who carries the weight. Choose the currency you'd rather pay in, and the database mostly chooses itself.