Most teams tune their retrieval pipeline in the wrong order. They swap embedding models, rewrite chunking, add a reranker, and bolt on query expansion — all while the single cheapest recall lever they own sits at its factory default, untouched. That lever is ef_search, and understanding why it matters more than the other two HNSW knobs is the difference between a vector index you operate and one that quietly betrays you.
HNSW — Hierarchical Navigable Small World — is the graph index under almost every vector database you'll reach for: pgvector, Qdrant, Weaviate, Milvus, Lucene. It builds a layered graph where a sparse top layer of long-range links lets a search hop quickly across the space, then hands off to denser lower layers for the fine-grained nearest-neighbor hunt. It has exactly three parameters worth knowing. The trick to tuning it is not memorizing what each does — it's knowing when each one is frozen.
Two knobs are baked in; one is free
Here is the asymmetry that should organize your entire mental model:
- M, the number of connections each node keeps, is set at build time. pgvector defaults it to 16. Higher M makes a denser graph with better recall and faster convergence, at the cost of more memory (the graph already carries a 30–50% memory tax over your raw vectors, and M is why) and slower builds. It's especially load-bearing for high-dimensional embeddings, where a sparse graph strands true neighbors.
- ef_construction, the number of candidates the builder evaluates while inserting each node, is also set at build time (pgvector default 64). Raising it produces a more accurate graph. Crucially, its only cost is a slower build; it does not slow your queries at all. So if your build window has slack, a higher ef_construction (200 is a common production value) is nearly free recall.
- ef_search, the number of candidates a query explores, is set per request (pgvector default 40). This is the one you can SET on the very next query, with no rebuild, no downtime, no migration.
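To make the split concrete, here is a minimal Python sketch that only assembles the SQL pgvector expects for each kind of knob. The table name `items`, the column name `embedding`, and both helper functions are hypothetical, and the cosine operator class is an assumption about your distance metric.

```python
def build_index_sql(table: str, column: str, m: int = 16, ef_construction: int = 64) -> str:
    """Build-time knobs: changing either one later means rebuilding the index."""
    # Identifiers are interpolated directly, so pass only trusted names here.
    return (
        f"CREATE INDEX ON {table} USING hnsw ({column} vector_cosine_ops) "
        f"WITH (m = {m}, ef_construction = {ef_construction});"
    )


def set_ef_search_sql(ef_search: int = 40) -> str:
    """Query-time knob: affects later queries in the session, no rebuild needed."""
    return f"SET hnsw.ef_search = {int(ef_search)};"


# One statement you run rarely, one you can run before any query.
print(build_index_sql("items", "embedding", m=16, ef_construction=200))
print(set_ef_search_sql(200))
```

The first statement is the expensive, occasional one; the second is cheap enough to issue whenever the recall target changes.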
That last point is the whole argument. M and ef_construction are decisions you make once and then live with; changing them means rebuilding the index. ef_search is a dial on the front of the machine. Raise it and recall climbs while latency rises a little; lower it and queries get faster while recall falls. You can tune it per query, per tenant, per endpoint — cheap retrieval for autocomplete, expensive high-recall retrieval for the legal-search path — all against the same index.
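A per-path dial might look something like the sketch below. The path names and the budgets are placeholders rather than recommendations, and the helpers are hypothetical. It leans on PostgreSQL's `SET LOCAL`, which scopes a setting to the current transaction, so one endpoint's expensive budget does not leak into the next query on a pooled connection.

```python
# Hypothetical query paths and candidate budgets; tune these against your own recall evals.
EF_SEARCH_BY_PATH = {
    "autocomplete": 40,    # latency-sensitive, approximate is fine
    "default": 100,
    "legal_search": 400,   # recall-sensitive, latency is secondary
}


def ef_search_for(path: str) -> int:
    """Look up the budget for a query path, falling back to the default."""
    return EF_SEARCH_BY_PATH.get(path, EF_SEARCH_BY_PATH["default"])


def scoped_ef_search_sql(path: str) -> str:
    """Statement to run inside the same transaction as the vector query."""
    return f"SET LOCAL hnsw.ef_search = {ef_search_for(path)};"
```

In use, the application would open a transaction, run `scoped_ef_search_sql("legal_search")`, run the nearest-neighbor query, and commit; the index itself is never touched.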
If you tune HNSW in one place, tune ef_search. It is the only knob you can change after the index exists, and it is the one shipping at a conservative default in front of you right now.
The default is set for speed, not for you
pgvector ships ef_search at 40. That is a reasonable demo default and a poor production one. Forty candidates is a narrow beam; for anything past a few hundred thousand vectors it routinely leaves true top-k neighbors unvisited. The failure mode is insidious because it doesn't error — retrieval just returns plausible results that are quietly missing the best match, and your RAG answers get vaguer for reasons no trace will explain. Before you conclude your embeddings are bad, set ef_search to 200 and re-run the eval. Half the time the "embedding problem" is a beam-width problem.
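Re-running the eval at two beam widths can be as small as the sketch below. It assumes a hypothetical `search(query, k, ef_search)` callable that wraps your index and returns row ids, and it uses Euclidean distance for the brute-force ground truth; swap in whatever metric your index actually uses. Pure Python keeps it self-contained, which is fine for a sample of queries but too slow for a full collection.

```python
import heapq
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]
# Hypothetical signature: search(query, k, ef_search) -> ids of the rows returned.
SearchFn = Callable[[Vector, int, int], List[int]]


def exact_top_k(query: Vector, vectors: List[Vector], k: int) -> List[int]:
    """Brute-force ground truth: indices of the k nearest vectors (Euclidean)."""
    return heapq.nsmallest(k, range(len(vectors)), key=lambda i: math.dist(query, vectors[i]))


def recall_at_k(approx_ids: Sequence[int], exact_ids: Sequence[int]) -> float:
    """Fraction of the true top-k that the approximate search returned."""
    if not exact_ids:
        return 1.0
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


def sweep_ef_search(search: SearchFn, queries: List[Vector], vectors: List[Vector],
                    k: int, ef_values: Sequence[int]) -> Dict[int, float]:
    """Mean recall@k for each ef_search value, over the same query sample."""
    if not queries:
        raise ValueError("need at least one query to measure recall")
    truth = [exact_top_k(q, vectors, k) for q in queries]  # computed once, reused per ef
    results = {}
    for ef in ef_values:
        scores = [recall_at_k(search(q, k, ef), t) for q, t in zip(queries, truth)]
        results[ef] = sum(scores) / len(scores)
    return results
```

Calling `sweep_ef_search(search, sample_queries, vectors, 10, [40, 200])` gives one recall number per setting; if the gap between them is large, the beam width is a likelier culprit than the embeddings.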
The knob you'll forget: recall is not constant
Now the part almost nobody accounts for. It is tempting to think of recall as a property of your configuration — "we're at 97% recall" — as if it were fixed. It isn't. HNSW recall is a function of collection size, and it drifts down as you grow.
The reason is structural: a fixed ef_search explores a fixed number of candidates, but as the graph accumulates millions of nodes, the true nearest neighbors sit behind more hops and are easier to miss within that fixed budget. So an index tuned to 98% recall at 100k vectors can, with byte-for-byte identical parameters, slip below 90% at 10M — and nobody gets paged, because recall has no exception. The system that passed its launch eval slowly rots as it succeeds and fills up.
This is exactly why the build-time/query-time split matters operationally. If recall were a build-time property, scaling would force painful periodic rebuilds. Because the corrective lever is ef_search, you can treat recall as something you monitor and re-tune — measure recall against an exact brute-force baseline on a sample as the collection grows, and step ef_search up when it sags. Reach for a higher M rebuild only when even a generous ef_search can't recover the recall you need.
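The monitor-and-step-up loop could be sketched roughly as follows. `measure_recall` is a hypothetical callable that runs your sampled eval at a given ef_search and returns mean recall, and the ladder of values is an arbitrary assumption, not a recommended schedule.

```python
from typing import Callable, Optional, Sequence


def next_ef_search(measure_recall: Callable[[int], float], target: float,
                   ladder: Sequence[int] = (40, 80, 120, 200, 400, 800)) -> Optional[int]:
    """Return the smallest ef_search on the ladder that meets the recall target.

    Returns None when even the largest budget falls short, which is the
    signal to consider rebuilding the index with a higher M.
    """
    for ef in ladder:
        if measure_recall(ef) >= target:
            return ef
    return None
```

Run on a schedule as the collection grows, this turns recall drift into a config change; a `None` result is the rare case that justifies paying for a rebuild.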
The playbook
Concretely, in order:
- Set ef_construction high at build (say 200) — it costs only build time, and you rebuild rarely.
- Pick M for your data and memory — 16 is a fine start; go higher for high-dimensional vectors, lower if memory is tight.
- Tune ef_search against a recall target, per query path, and leave headroom.
- Re-measure recall as the collection grows, and raise ef_search before your users feel it.
Two of your three knobs are decisions. One is a dial. Most teams spend their effort re-litigating the decisions and never touch the dial — which is backwards, because the dial is the only part that was ever going to move.



