Pick any "best vector database for AI agents" roundup and you will find the same shape: a table of recall numbers, a p99 latency column, a GitHub-star tiebreaker. It is a useful table. It is also answering a question your multi-agent system will never ask, because every row in it was produced by firing one query at a time against a frozen index — and a multi-agent system is, almost by definition, the opposite workload.

That gap isn't academic. It is the difference between the database that wins the chart and the database that survives your production run.

The ranking flips between one query and many

Start with a benchmark rigorous enough to show the seam. Tiger Data's pgvector-vs-Qdrant comparison ran both engines over 50 million 768-dimension vectors. On single-query tail latency at 99% recall, Qdrant won cleanly: 38.71 ms p99 versus 74.60 ms for Postgres with pgvector and pgvectorscale — a 48% edge. If that column is your decision, you buy Qdrant.

But the same test measured throughput under concurrent clients, and the verdict inverted: Postgres with pgvectorscale sustained 471.57 queries per second to Qdrant's 41.47, roughly 11.4 times the throughput. Same two engines. Same hardware. Opposite winner. The only variable that changed was whether the benchmark sent one request at a time or saturated the box with many.
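As a quick sanity check on the two headline figures, the arithmetic can be reproduced in a few lines. This is only a worked calculation on the numbers quoted above; the helper names `relative_reduction` and `ratio` are illustrative, not part of any benchmark tooling.

```python
def relative_reduction(baseline: float, improved: float) -> float:
    """Fraction by which `improved` is lower than `baseline`."""
    return (baseline - improved) / baseline


def ratio(a: float, b: float) -> float:
    """How many times larger `a` is than `b`."""
    return a / b


# Figures quoted in the text from the pgvector-vs-Qdrant comparison.
qdrant_p99_ms, postgres_p99_ms = 38.71, 74.60
postgres_qps, qdrant_qps = 471.57, 41.47

latency_edge = relative_reduction(postgres_p99_ms, qdrant_p99_ms)
throughput_ratio = ratio(postgres_qps, qdrant_qps)

print(f"Qdrant p99 is {latency_edge:.0%} lower")             # 48% lower
print(f"Postgres sustains {throughput_ratio:.1f}x the QPS")  # 11.4x
```

The two numbers describe different axes (time per request versus requests per second), which is why both can be true of the same pair of engines.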

The honest reading is that some of that gap is configuration: pgvectorscale ran with parallel query execution on, while Qdrant's per-query numbers were taken with batch mode off for a fair latency read. But that is precisely the point, not a footnote to it. Throughput under concurrency is a property of architecture and configuration, not a fixed rank you can copy off a leaderboard. And a multi-agent system lives entirely on the concurrent side of that seam.

The database that wins the single-query chart and the database that wins under concurrent load can be the same two engines, ranked in opposite order.
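One way to see which side of the seam a given setup lands on is to measure it with more than one client. The sketch below is a minimal load harness, not the methodology of the benchmark cited above: `fake_search` is a hypothetical stand-in that you would replace with a real client call, and its lock is an assumption that models a backend serializing part of each query.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import quantiles


def fake_search(lock: threading.Lock, service_s: float = 0.002) -> None:
    """Hypothetical stand-in for a vector query; swap in a real client call.
    The lock models a backend that serializes part of each query."""
    with lock:
        time.sleep(service_s)


def run_load(clients: int, queries_per_client: int, search) -> dict:
    """Fire `search` from `clients` threads and report QPS and p99 latency."""
    latencies = []
    latencies_lock = threading.Lock()

    def worker() -> None:
        local = []
        for _ in range(queries_per_client):
            t0 = time.perf_counter()
            search()
            local.append(time.perf_counter() - t0)
        with latencies_lock:
            latencies.extend(local)

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=clients) as pool:
        for future in [pool.submit(worker) for _ in range(clients)]:
            future.result()  # re-raise any worker exception
    elapsed = time.perf_counter() - start

    p99_s = quantiles(latencies, n=100)[98]  # 99th percentile cut point
    return {
        "clients": clients,
        "qps": len(latencies) / elapsed,
        "p99_ms": p99_s * 1000,
    }


backend_lock = threading.Lock()
for n_clients in (1, 4):
    print(run_load(n_clients, 10, lambda: fake_search(backend_lock)))
```

Running the same harness at one client and at the concurrency your agents will actually generate gives two rows per engine, which is the comparison a single-query leaderboard leaves out.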

Agents don't just read — they write while others read

Here is the part no static benchmark captures at all. A multi-agent system doesn't just issue concurrent reads. Agents persist: they write new memories, episodic traces, tool results, and summaries back into the store while other agents are querying it. The index is never frozen. It is being mutated under load, continuously.

That workload has a measured cost, and it lands where it hurts most. The recent study bluntly titled "When More Cores Hurts: The Vector Database Scaling Paradox" ran insertion and querying concurrently across engines and found that throughput dropped by between 23.44% (Milvus) and 50.57% (Qdrant). The tail was worse: P99 latency rose by roughly 280% to 345% depending on the dataset.

The mechanism is worth internalizing because it generalizes across vendors. Newly inserted vectors don't join the HNSW graph for free — in Milvus and Qdrant they land in unindexed segments that queries must scan exhaustively until a background build folds them in. Weaviate mutates the live index directly, trading the exhaustive scan for contention on the graph itself. Either way, index maintenance and query serving fight over the same CPU and the same locks. Under a write-heavy agent swarm, your p99 doesn't drift — it steps.
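The step behavior described above can be illustrated with a toy cost model. This is a deliberately simplified sketch, not how Milvus or Qdrant account for work internally: `ToySegmentStore`, the `build_threshold` value, and the constant of 64 comparisons per graph level are all assumptions chosen for illustration, and graph search cost is assumed to grow roughly logarithmically with index size.

```python
import math
from dataclasses import dataclass


@dataclass
class ToySegmentStore:
    indexed: int                    # vectors reachable through the graph index
    unindexed: int = 0              # vectors waiting in a flat, brute-force segment
    build_threshold: int = 50_000   # hypothetical size that triggers a background build

    def insert(self, n: int) -> None:
        """New vectors land in the flat segment until a build folds them in."""
        self.unindexed += n
        if self.unindexed >= self.build_threshold:
            self.indexed += self.unindexed
            self.unindexed = 0

    def query_cost(self) -> float:
        """Distance computations per query under this toy model."""
        # Assumed: graph search cost grows about logarithmically with index size.
        graph_cost = 64 * math.log2(max(self.indexed, 2))
        # The flat segment is scanned exhaustively: one comparison per vector.
        return graph_cost + self.unindexed


store = ToySegmentStore(indexed=1_000_000)
baseline = store.query_cost()
for batch in range(1, 6):
    store.insert(10_000)
    relative = store.query_cost() / baseline
    print(f"batch {batch}: unindexed={store.unindexed:>6}  cost={relative:.1f}x baseline")
```

In this model the per-query cost climbs linearly with every write batch and then drops back once the build folds the segment in, which is the sawtooth a frozen-index benchmark never exercises. It ignores the CPU and lock contention of the build itself, so a real system would likely look worse, not better.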

For a single chatbot, a 300% tail spike is an annoyance. For a multi-agent pipeline where step N waits on step N−1's retrieval, tail latency is completion time, and it compounds across the graph.
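The compounding can be made concrete with a small calculation. Assuming each retrieval's latency is independent (a simplification; under shared load, slow queries tend to cluster), the chance that a chain of sequential steps hits at least one tail event is one minus the chance that every step avoids it. The function name `prob_hit_tail` is illustrative.

```python
def prob_hit_tail(steps: int, quantile: float = 0.99) -> float:
    """Probability that at least one of `steps` sequential retrievals
    lands beyond the given latency quantile, assuming independence."""
    return 1.0 - quantile ** steps


for steps in (1, 5, 10, 50):
    print(f"{steps:>3} sequential retrievals: "
          f"{prob_hit_tail(steps):.1%} chance of at least one p99-or-worse query")
```

A p99 event that one chatbot request sees 1% of the time becomes routine for a pipeline with dozens of dependent retrievals, which is why the tail, not the median, sets completion time.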

What actually decides the choice

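So stop reading the recall column as if it settles anything, and evaluate the axes that a multi-agent workload actually stresses: throughput under concurrent clients, query behavior while writes are in flight, tail latency rather than the median, and tenant isolation.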

None of this makes the leaderboards wrong. It makes them answers to a narrower question than the one you are actually asking. "Best vector database" is a single-client, frozen-index question. "Best vector database for a multi-agent system" is a concurrent-read-plus-write, tail-latency, tenant-isolation question — and the two need not have the same answer. Measure the workload you'll run, not the one that photographs well.