There is a sentence that has launched a thousand over-engineered systems: "You have embeddings, so you need a vector database." It sounds like a syllogism. It is a marketing slogan wearing a lab coat. The feature a vector database actually sells you — approximate nearest-neighbor search — is a solution to a problem of scale, and most applications adopt it long before they have the scale that makes it worth its cost.
Let's be precise about what you're buying and what you're paying.
What the index is actually for#
The naive way to find the nearest vectors to a query is to compute the distance to every stored vector, sort, and return the top k. This is exact brute-force search — "flat," in FAISS terms. It has one flaw and several underrated virtues. The flaw: its cost grows linearly with the number of vectors, so at ten million rows it becomes too slow for a low-latency path. The virtues are the part nobody puts on the slide: it returns the true nearest neighbors with 100% recall, it needs no index to build, it stores nothing but the raw vectors, and a vector inserted a millisecond ago is instantly searchable.
An approximate index — HNSW, IVF — exists to defeat that linear cost. It prebuilds a structure that lets a query visit only a fraction of the space. In exchange for speed at scale, you accept three things people rarely price in:
- Imperfect recall. ANN means approximate. It can miss true neighbors, so recall stops being 100% and becomes a number you have to measure — and, as HNSW at scale shows, one that quietly drops as your data grows.
- A memory premium. An HNSW graph costs roughly 30–50% more memory than the raw vectors, just for the graph links.
- Operational drag. Indexes take real time to build, degrade as you insert, sometimes need rebuilds, and hand you tuning knobs (
ef_search,nprobe) you now babysit forever.
That is the trade. You give up exactness and simplicity to survive a query volume you might not have.
Where the crossover actually sits#
Here is the number folklore gets wrong. The crossover — the vector count where an ANN index starts beating brute force — is higher than "you have embeddings." For collections up to roughly 10,000–100,000 vectors, exact search is typically as fast or faster than an approximate index, because at that size the index-building cost never gets amortized and a tight linear scan over contiguous memory is brutally efficient on modern hardware. FAISS says it plainly: if you're doing only a handful of thousands of searches, use Flat, because the index build won't pay for itself. And IndexFlat is the only index that guarantees exact results — every ANN benchmark on earth reports recall relative to it.
Approximate search is not the default and exact search the fallback. It's the reverse: exact is the baseline, and approximation is the tax you pay once — and only once — brute force can't meet your latency budget.
The part the slogan hides#
The deepest reason to delay the vector database isn't performance — it's that premature approximation creates work. Adopt HNSW at 40,000 vectors and you've bought a recall problem you didn't have: now retrieval sometimes misses the obviously-correct chunk, your RAG answers get mysteriously vaguer, and you spend a sprint tuning ef_search and re-running evals to claw back the exactness you started with for free. You paid engineering time to make your results worse and your system more complex, in anticipation of a scale that, for many internal tools and B2B apps, simply never arrives.
None of this means vector databases are a mistake. At millions of vectors and real QPS, ANN isn't optional and the crossover is unambiguous. And a real database gives you filtering, persistence, replication, and hybrid search — genuine reasons to adopt one. But note that those are database features, not approximate-search features. Postgres with pgvector will do exact search — no HNSW, no IVF — with all your metadata, joins, and transactions intact, right up until scale forces you to add the index. You can have the database without paying the approximation tax before you owe it.
The rule#
Start exact. Measure your real p95 latency at your real vector count and query rate. Add an ANN index — and, if you need it, the database around it — the day the measurement says brute force can't keep up, and not one commit before. "You have embeddings, so you need a vector database" is how you end up tuning a recall knob to recover accuracy you were handed for nothing. The most sophisticated retrieval system is often the one that refused to get sophisticated until the numbers made it.



