Every vector-database comparison opens with a benchmark chart, and almost every one of them is answering a question you will not actually face. Below roughly ten million vectors, pgvector, Qdrant, Pinecone, Weaviate, and Milvus all return queries in single-digit to low-double-digit milliseconds. The differences the charts fight over are real and mostly irrelevant: at the scale where most agent systems live, raw approximate-nearest-neighbor speed is not the constraint. Something else is.

The something else is operational, and it comes down to one question. Do your vectors need to live alongside your relational data, or do you need a specialized store to scale past where a single Postgres box is comfortable? Answer that honestly and the field sorts itself.

The case for vectors that live where your data already does

pgvector is not a database. It is a Postgres extension that adds a vector column type, IVFFlat and HNSW indexes, and the halfvec half-precision type, which doubles the dimensions you can index from 2,000 to 4,000. That framing is the entire pitch: your embeddings live in the same Postgres that already holds your users, documents, and orders.

The payoff is transactional consistency. When a document changes, you can update the row and its embedding in a single ACID transaction, with foreign keys and joins tying the vector to the rest of your schema. There is no second system to provision, no sync job to keep two stores agreeing, no eventual-consistency window where the embedding describes a document that no longer exists. For a large class of agent memory and retrieval workloads, that operational simplicity is worth more than any latency percentile.
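To make the "one transaction" claim concrete, here is a minimal sketch of the write pattern. It uses Python's built-in sqlite3 purely as a stand-in so it runs without a server; with pgvector the embedding column would be a vector type and the same shape would go through a Postgres driver. The table names, the `fake_embed` helper, and the JSON encoding of the embedding are all assumptions made for illustration.

```python
import json
import sqlite3

# Stand-in only: sqlite3 ships with Python, so this runs anywhere.
# With pgvector, `embedding` would be a vector column in Postgres.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE doc_embeddings ("
    " doc_id INTEGER PRIMARY KEY REFERENCES documents(id) ON DELETE CASCADE,"
    " embedding TEXT NOT NULL)"  # JSON-encoded list of floats (illustrative)
)


def fake_embed(text):
    """Hypothetical embedder; a real system would call an embedding model."""
    return [float(len(text)), float(sum(map(ord, text)) % 97)]


def upsert_document(doc_id, body, embed=fake_embed):
    """Write the row and its embedding together, or write neither."""
    with conn:  # commits on success, rolls back if anything inside raises
        conn.execute(
            "INSERT INTO documents (id, body) VALUES (?, ?) "
            "ON CONFLICT(id) DO UPDATE SET body = excluded.body",
            (doc_id, body),
        )
        vector = embed(body)  # if this raises, the body change is undone too
        conn.execute(
            "INSERT INTO doc_embeddings (doc_id, embedding) VALUES (?, ?) "
            "ON CONFLICT(doc_id) DO UPDATE SET embedding = excluded.embedding",
            (doc_id, json.dumps(vector)),
        )


def read_document(doc_id):
    """Join the document to its embedding, as a single-store schema allows."""
    row = conn.execute(
        "SELECT d.body, e.embedding FROM documents d "
        "JOIN doc_embeddings e ON e.doc_id = d.id WHERE d.id = ?",
        (doc_id,),
    ).fetchone()
    return (row[0], json.loads(row[1])) if row else None
```

If the embedding step fails partway through, the document text rolls back with it, so a reader never sees new text paired with an old vector. That is the window a two-store setup has to close with a sync job.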

For a long time the standard rebuttal was that pgvector falls apart on filtered queries — ask for "the nearest vectors where tenant_id = X" and the index would return too few rows after filtering. Version 0.8 addressed exactly this with iterative index scans, which keep pulling candidates until the filter is satisfied instead of giving up early. It is a quiet release note with outsized consequences for anyone running multi-tenant RAG.
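The difference between a single pass and an iterative scan is easy to see in a toy model. The sketch below is not pgvector's implementation: it assumes the index hands back candidates nearest-first in fixed-size batches, and the function name, the `batch_size` parameter, and the `max_scan` cap are hypothetical. In pgvector itself the behavior is controlled by settings such as `hnsw.iterative_scan` and `hnsw.max_scan_tuples`.

```python
def filtered_top_k(ranked_rows, matches, k, batch_size, iterative, max_scan=20_000):
    """Toy model of a filtered nearest-neighbor query.

    ranked_rows: rows already sorted nearest-first, standing in for the
                 order in which an index scan would emit candidates.
    matches:     predicate playing the role of the WHERE clause.
    batch_size:  candidates produced by one pass of the index (must be > 0).
    iterative:   False = take one batch and stop; True = keep scanning.
    max_scan:    cap on candidates examined (hypothetical safety limit).
    """
    limit = min(len(ranked_rows), max_scan)
    results, scanned = [], 0
    while scanned < limit:
        batch = ranked_rows[scanned:min(scanned + batch_size, limit)]
        scanned += len(batch)
        results.extend(row for row in batch if matches(row))
        if len(results) >= k or not iterative:
            break
    return results[:k], scanned


# 1,000 rows spread over 50 tenants; the row id doubles as distance rank.
rows = [{"id": i, "tenant_id": i % 50} for i in range(1000)]


def is_tenant_7(row):
    return row["tenant_id"] == 7


one_pass, scanned_once = filtered_top_k(
    rows, is_tenant_7, k=5, batch_size=40, iterative=False
)
iterated, scanned_iter = filtered_top_k(
    rows, is_tenant_7, k=5, batch_size=40, iterative=True
)
print(len(one_pass), scanned_once)   # 1 40
print(len(iterated), scanned_iter)   # 5 240
```

With a filter that keeps one row in fifty, the single pass asks for five neighbors and gets one. The iterative version scans six batches and returns all five, which is the trade the release makes: more candidates examined in exchange for a complete result.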

When you outgrow Postgres — and why

Teams do leave pgvector, but rarely for the reason the benchmarks imply. They leave because filtered queries get awkward, because they need hybrid search — dense vectors fused with keyword/BM25 ranking — that Postgres only approximates through its full-text features, and because HNSW's graph wants to live in RAM, which turns "more vectors" into "more memory" and eventually "more money." Somewhere around ten million vectors, those pressures start to outweigh the simplicity, though the exact number depends entirely on your filtering and update patterns.
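"Fused" is doing a lot of work in that sentence, so here is one common way to do it: reciprocal rank fusion, which merges a dense ranking and a keyword ranking using only each document's position in the two lists. This is a sketch of the general technique, not any particular engine's hybrid search; the function name and document ids are illustrative, and `k=60` is a conventional default, not a tuned value.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document ids into one.

    Each appearance of a document contributes 1 / (k + rank), with rank
    starting at 1, so documents ranked well by several retrievers rise
    to the top. Ties are broken by id to keep the output deterministic.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


dense_hits = ["a", "b", "c", "d"]   # nearest-first from a vector index
keyword_hits = ["c", "a", "e"]      # best-first from a BM25-style ranker
fused = reciprocal_rank_fusion([dense_hits, keyword_hits])
print(fused)  # ['a', 'c', 'b', 'e', 'd']
```

Because it works on ranks and ignores raw scores, this fusion needs no calibration between cosine distances and keyword scores. The cost in Postgres is that you run and merge the two queries yourself, where a dedicated engine can expose the combination as a single call.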

This is where dedicated engines earn their operational cost. Qdrant, written in Rust, leads with rich payload filtering applied before the similarity computation — so you do not waste distance math on rows you were going to discard — plus scalar and product quantization that can cut memory several-fold. It raised a $50M Series B in early 2026 and ships an official MCP server, a tell about who it is courting.

qdrant/qdrant (Rust, ★ 32k): high-performance open-source vector search engine, built around pre-filtering on JSON payloads and aggressive quantization to keep large corpora affordable.
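The "several-fold" memory claim for quantization is mostly arithmetic, and a small sketch shows where it comes from. The code below is a simplified per-vector scalar quantizer, not Qdrant's implementation, and the 1,536-dimension figure is an assumed embedding size chosen only to make the numbers concrete.

```python
def quantize(vector):
    """Map each float to a one-byte code using the vector's own min and max."""
    lo, hi = min(vector), max(vector)
    scale = (hi - lo) / 255 or 1.0  # constant vectors would otherwise divide by zero
    codes = bytes(min(255, max(0, round((x - lo) / scale))) for x in vector)
    return codes, lo, scale


def dequantize(codes, lo, scale):
    """Recover approximate floats; each value is off by at most scale / 2."""
    return [lo + code * scale for code in codes]


def raw_vector_bytes(n_vectors, dims, bytes_per_dim):
    """Storage for the vectors alone; HNSW graph links come on top of this."""
    return n_vectors * dims * bytes_per_dim


vector = [-1.0, -0.5, 0.0, 0.25, 1.0]
codes, lo, scale = quantize(vector)
restored = dequantize(codes, lo, scale)
worst_error = max(abs(a - b) for a, b in zip(vector, restored))

float32_total = raw_vector_bytes(10_000_000, 1536, 4)  # 61,440,000,000 bytes
int8_total = raw_vector_bytes(10_000_000, 1536, 1)     # 15,360,000,000 bytes
print(float32_total / int8_total)  # 4.0
```

Ten million float32 vectors at that size come to roughly 61 GB before the graph is counted, and one byte per dimension brings that to about 15 GB. The price is the rounding error captured in `worst_error`, which is why engines that quantize typically offer a rescoring step against the original vectors.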

Milvus is the heavy-scale answer: distributed, cloud-native, supporting many index types including DiskANN and GPU acceleration, built for billions of vectors when you have the operational appetite to run it.

Distributed, cloud-native vector database for scaling approximate-nearest-neighbor search to billions of vectors across many index types.

Pinecone is the option that removes operations entirely. Its serverless model decouples storage from compute and bills on read units, write units, and storage rather than provisioned pods, with native sparse/hybrid search and hosted embedding and reranking layered on. You pay a margin to never think about an index again.

The honest decision

The crossover from pgvector to a dedicated store is operational, not algorithmic. You do not switch because queries got slow. You switch because filtered queries broke, hybrid search was missing, or the RAM bill got loud.

So choose by the shape of your problem, not the leaderboard. If you already run Postgres, sit under roughly ten million vectors, and need your embeddings consistent with your relational data, pgvector is not a compromise — it is the correct default, and pgvectorscale's disk-resident StreamingDiskANN index will stretch that ceiling further while keeping memory bounded. Reach for Qdrant or Milvus when you need heavy pre-filtered or hybrid search, aggressive quantization, or distributed scale; reach for Pinecone when you would rather buy the operations than run them.

And treat every vendor's self-published benchmark as marketing. When you genuinely need to compare raw recall and speed, the neutral reference is ann-benchmarks — not the chart on the homepage of the database trying to sell you.