Is pgvector fast enough for production RAG?

Yes, up to a point. With the HNSW index (pgvector 0.5+) and pgvectorscale, published benchmarks show competitive throughput into the tens of millions of vectors. Its real ceiling is operational rather than algorithmic: it shares CPU and memory with your transactional workload, so very large or write-heavy vector loads eventually want a dedicated store.

When does self-hosting Qdrant beat managed Pinecone on cost?

Cost analyses put the crossover around 60–80M queries/month: above it, self-hosted Qdrant runs roughly 3–10x cheaper; below it, the engineering time to operate it usually outweighs the savings, so a managed option is cheaper all-in.

Do I need a dedicated vector database at all?

Often no. If you already run PostgreSQL and have under ~10M vectors, pgvector adds vector search with no new service and no sync layer — frequently the right default. Reach for a dedicated store when scale, heavy metadata filtering, or write volume strain the shared database.

pgvector vs Pinecone vs Qdrant: Picking a Vector Database in 2026

The question gets asked backwards. Teams sit down to choose a vector database and start by collecting p99 latency numbers and recall-at-10 curves, as if the decision turns on which engine answers a nearest-neighbor query a few milliseconds faster. It almost never does. pgvector, Pinecone, and Qdrant — the three you'll actually shortlist in 2026 — all clear the bar that matters for nearly any agent you're likely to build: sub-30ms p95 at high recall, into the tens of millions of vectors. Benchmark-shopping among them is optimizing the axis that's already decided.

The axis that isn't decided is where the operating cost lives. Every vector store imposes a tax — in money, in engineering time, or in architectural complexity — and the three options here are honest about charging it in three different currencies. That's the comparison worth making, because unlike latency, it doesn't have a single right answer. It has a crossover.

Three currencies for the same bill

pgvector charges in complexity avoided. It's a Postgres extension, not a separate service: you CREATE EXTENSION vector, add a column, and your embeddings live in the same database as the rows they describe. The non-obvious payoff isn't speed — it's the sync layer you never build. A standalone vector store means writing and operating a pipeline that keeps it consistent with your source-of-truth database; pgvector deletes that entire class of bug because there's only one database. With the HNSW index and the pgvectorscale extension, its throughput is competitive into the tens of millions of vectors. Its ceiling is operational, not algorithmic: vector search now competes for the same CPU and memory as your transactional load.
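To make "add a column" concrete, here is a minimal sketch of the setup and a nearest-neighbor query, written as SQL strings driven from Python. The documents table, its id and title columns, the index name, and the 1536 dimension are all assumptions for illustration; the SQL uses pgvector's vector type, hnsw index method, and <=> cosine-distance operator. Any DB-API cursor (psycopg, for example) should work with the helper.

```python
# Sketch only: table, columns, index name, and dimension are assumed, not from the article.
EMBEDDING_DIM = 1536  # assumption; must match the embedding model you use

SETUP_STATEMENTS = [
    # Enable the extension once per database.
    "CREATE EXTENSION IF NOT EXISTS vector",
    # Store embeddings next to the rows they describe.
    f"ALTER TABLE documents ADD COLUMN IF NOT EXISTS embedding vector({EMBEDDING_DIM})",
    # HNSW index (pgvector 0.5+) using cosine distance.
    "CREATE INDEX IF NOT EXISTS documents_embedding_hnsw "
    "ON documents USING hnsw (embedding vector_cosine_ops)",
]

# <=> is pgvector's cosine-distance operator; smaller means closer.
NEAREST_SQL = (
    "SELECT id, title FROM documents "
    "ORDER BY embedding <=> %s::vector "
    "LIMIT %s"
)


def to_vector_literal(values):
    """Format a sequence of numbers in pgvector's text form, e.g. '[0.1,0.2]'."""
    return "[" + ",".join(str(float(v)) for v in values) + "]"


def nearest(cursor, query_embedding, k=10):
    """Run the k-nearest-neighbor query on a DB-API cursor and return the rows."""
    cursor.execute(NEAREST_SQL, (to_vector_literal(query_embedding), k))
    return cursor.fetchall()
```

Because the query is ordinary SQL, it can join or filter against the rest of the schema in the same statement, which is the practical meaning of "no sync layer".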

Pinecone charges in money to skip the ops. It's fully managed and closed-source — you push vectors in and query them out, with no index to tune, no node to scale, no failover to design. By most accounts it holds the dominant share of the managed segment, and the reason is exactly that surface: a team with no appetite for running stateful infrastructure can be in production in an afternoon. The serverless tier trades some latency for not thinking about capacity; the pod-based tier buys it back for noticeably more money. Either way you're renting the absence of an operations burden.

Qdrant charges in ops work for a cheap marginal query. Written in Rust and built only for vector search, it posts strong numbers on both its own benchmarks and third-party ones. Crucially, it's open-source, so you can self-host. That's where its economics get interesting: once you're running it yourself, each additional query is nearly free, where a managed competitor keeps metering you. It also handles filtered search ("nearest neighbors where tenant = X and date > Y") particularly well, which is the query shape real agents actually issue. The cost is the obvious one: someone on your team now operates a database.
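The filtered query shape is easier to see written out. The sketch below builds a request body in the style of Qdrant's documented filter syntax (a must list of match and range conditions); it only constructs and prints the JSON, it does not call a server. The payload field names tenant and created_at, the tenant value, and the epoch timestamp are hypothetical, and the exact endpoint and field names should be checked against the current Qdrant API reference before use.

```python
import json


def filtered_search_body(query_vector, tenant, created_after, limit=10):
    """Build a search body: nearest neighbors restricted by payload conditions.

    Field names 'tenant' and 'created_at' are assumed payload keys, not a fixed schema.
    """
    return {
        "vector": list(query_vector),
        "limit": limit,
        "filter": {
            # Every condition under "must" has to hold for a point to be returned.
            "must": [
                {"key": "tenant", "match": {"value": tenant}},
                {"key": "created_at", "range": {"gt": created_after}},
            ]
        },
    }


# Hypothetical values: a 3-dimensional toy vector and an epoch-seconds cutoff
# (1735689600 is 2025-01-01 00:00:00 UTC).
body = filtered_search_body([0.1, 0.2, 0.3], tenant="acme", created_after=1735689600)
print(json.dumps(body, indent=2))
```

The point of the shape is that the filter travels with the vector query, so the engine can apply it during the search rather than discarding non-matching neighbors afterwards.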

The crossover nobody benchmarks

They're all fast enough. The decision is who pays to turn "fast enough" into production: your wallet, your on-call rotation, or your architecture diagram.

Here's the part the latency charts hide. Because Qdrant's marginal cost trends toward zero while a managed service charges per query forever, there's a volume where self-hosting flips from more expensive to dramatically cheaper. Published cost analyses put that crossover around 60–80 million queries per month: above it, self-hosted Qdrant runs roughly 3–10x cheaper than the managed alternatives; below it, the engineering time to run, monitor, and back it up costs more than you'd save, so a managed option wins on the all-in bill.

That single number reframes the whole choice. Below the crossover, you are mostly buying back human time, and the ranking is about your team: pgvector if you already run Postgres and want one less service; Pinecone if you'd rather expense the problem than staff it. Above the crossover, the savings start paying for the operational headcount, and a self-hostable engine like Qdrant becomes the adult decision. Most teams badly overestimate where they sit on this line: they provision for the traffic they dream about, not the traffic they have.
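The crossover is just a break-even calculation, and it is worth running with your own numbers rather than trusting anyone's headline figure. A minimal sketch follows, assuming a simple model: self-hosting has a fixed monthly cost (infrastructure plus the engineering time to run it) and a small marginal cost, while the managed service has no fixed cost and a flat per-query price. The dollar figures are placeholders chosen only so the result lands inside the 60–80M band cited above; they are not vendor pricing.

```python
def break_even_queries_per_month(
    self_hosted_fixed_usd,
    managed_usd_per_million,
    self_hosted_usd_per_million=0.0,
):
    """Monthly query volume at which self-hosting and managed cost the same."""
    saving_per_million = managed_usd_per_million - self_hosted_usd_per_million
    if saving_per_million <= 0:
        raise ValueError("self-hosting never breaks even at these prices")
    return self_hosted_fixed_usd / saving_per_million * 1_000_000


def monthly_cost(queries, fixed_usd, usd_per_million):
    """All-in monthly bill under the simple fixed-plus-marginal model."""
    return fixed_usd + queries / 1_000_000 * usd_per_million


# Placeholder inputs (assumptions, not real prices):
# $7,000/month to run it yourself, $100 per million managed queries.
crossover = break_even_queries_per_month(7_000, 100)
print(f"break-even: {crossover:,.0f} queries/month")  # break-even: 70,000,000 queries/month
```

Most of the uncertainty sits in the fixed term: how you price the engineering time moves the crossover far more than any per-query rate does.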

How to actually decide

Start from your team and your volume, not a leaderboard:

Already on PostgreSQL, under ~10M vectors? Default to pgvector. The right number of new services to operate is often zero, and removing the sync layer removes a real source of stale-data bugs.
No ops capacity, want production this week? Pinecone. You're paying to make the infrastructure someone else's problem, and for many teams that's the cheapest line item there is.
High query volume, heavy metadata filtering, or a need to keep data in your own environment? Qdrant, self-hosted — especially once you're plausibly north of the cost crossover.
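The three rules above can be sketched as a small function. This is one possible reading, not a definitive procedure: the Workload fields and the first-match-wins ordering are assumptions, the 10M threshold comes from the first rule, and the 60M figure is the low end of the crossover band cited earlier. The final fallback (managed, when nothing else applies) leans on the earlier claim that a managed option wins the all-in bill below the crossover.

```python
from dataclasses import dataclass

PGVECTOR_VECTOR_LIMIT = 10_000_000   # "under ~10M vectors"
COST_CROSSOVER_QPM = 60_000_000      # low end of the cited 60-80M queries/month band


@dataclass
class Workload:
    """Hypothetical description of a team and its traffic."""
    runs_postgres: bool
    vector_count: int
    queries_per_month: int
    has_ops_capacity: bool
    heavy_filtering: bool = False
    must_self_host: bool = False


def suggest_store(w: Workload) -> str:
    """Apply the article's rules in list order; the first match wins (an assumption)."""
    if w.runs_postgres and w.vector_count < PGVECTOR_VECTOR_LIMIT:
        return "pgvector"
    if not w.has_ops_capacity:
        return "Pinecone"
    if (
        w.heavy_filtering
        or w.must_self_host
        or w.queries_per_month >= COST_CROSSOVER_QPM
    ):
        return "Qdrant (self-hosted)"
    # Below the crossover with no special requirements: managed wins the all-in bill.
    return "Pinecone"


print(suggest_store(Workload(True, 2_000_000, 5_000_000, has_ops_capacity=False)))
```

Note what the inputs are: team shape and traffic, with no latency figure anywhere in the signature.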

The deeper point is that "which vector database" is the wrong granularity of question. It's really "how much of this do I want to operate, and at what volume" — the same question that sits under picking an inference engine or an LLM gateway. The engines converged on performance a while ago. They diverged on who carries the weight. Choose the currency you'd rather pay in, and the database mostly chooses itself.

Priya Sundaram
