A user pastes ERR_TLS_CERT_ALTNAME_INVALID into your support bot. Your pure-vector RAG, the one that demoed beautifully on "how do I rotate a certificate," returns three calm, well-written passages about TLS handshakes, none of which mention that error code. The page that actually documents it — the one with the literal string in a fenced code block — sits at rank 14, below a paragraph about certificate alternative names that the embedding model decided was "close enough."

That is not a tuning problem. That is what semantic search is.

Embeddings smear the literal into the vicinity

Dense embedding models are trained to generalize across language — to put "car" near "automobile" and "refund" near "money back." That generalization is the whole point, and it is also exactly what kills exact-match retrieval. A rare token — an error code, a SKU, a function name, an acronym, a part number — is something the model has barely seen. So the query vector lands in a generic neighborhood, and cosine similarity happily returns the nearest plausible thing.

The canonical failure: dense retrieval sees PROD-SKU-7842X and confidently returns PROD-SKU-7842Y. Wrong answer, high score. Identifiers like 0x80070005, INV-2024-00847, or ENOMEM carry almost no semantic signal; the model has no principled way to keep 7842X and 7842Y apart, because in meaning-space they are the same thing.

BM25 — the lexical workhorse from the 1990s that still anchors every serious search stack — does not have this problem, because it does not think. It scores query terms against an inverted index of exact tokens: term frequency with diminishing returns, inverse document frequency so rare terms count more, length normalization. It either finds ERR_TLS_CERT_ALTNAME_INVALID or it doesn't. There is no "close enough."
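To make those three ingredients concrete, here is a minimal sketch of Okapi BM25 scoring. It assumes a naive whitespace tokenizer, the common defaults k1=1.2 and b=0.75, and the non-negative IDF variant used by Lucene; the function name and the toy documents are illustrative, not taken from any particular library.

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.2, b=0.75):
    """Score every tokenized document against the query with Okapi BM25."""
    n_docs = len(docs_tokens)
    avg_len = sum(len(doc) for doc in docs_tokens) / n_docs
    # Document frequency: in how many documents does each term appear?
    df = Counter(term for doc in docs_tokens for term in set(doc))
    scores = []
    for doc in docs_tokens:
        tf = Counter(doc)
        score = 0.0
        for term in query_tokens:
            if term not in tf:
                continue  # exact token match or no contribution at all
            # Inverse document frequency: rare terms count more.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term frequency saturates via k1; b penalizes longer-than-average docs.
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

# Toy corpus (assumption): whitespace-split, case-sensitive tokens.
docs = [
    "the tls handshake negotiates a cipher suite".split(),
    "certificate alternative names list the hosts a certificate covers".split(),
    "ERR_TLS_CERT_ALTNAME_INVALID means the hostname is not in the certificate".split(),
]
scores = bm25_scores(["ERR_TLS_CERT_ALTNAME_INVALID"], docs)
```

Only the third document scores above zero; the two that merely discuss TLS and alternative names get nothing. A production analyzer may lowercase or otherwise normalize tokens, so the exact-match behavior depends on indexing and querying with the same analysis.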

Lexical search fails loudly on paraphrase; semantic search fails silently on identifiers. The silent failures are the ones that reach production.

The flip side is just as real. Ask BM25 "my site won't load after I changed the cert" and it flails — no shared tokens with the doc titled Resolving certificate validation errors. Vectors nail that. The two methods fail in opposite directions, which is the entire argument for running both.

Hybrid is the default — and that's where the work starts

"Run both and combine" sounds trivial until you try to combine. You have two ranked lists. The vector side hands you cosine similarities clustered in a narrow band — 0.78, 0.81, 0.83. The BM25 side hands you unbounded scores — 4.2, 11.7, 28.0 — whose magnitude depends on corpus statistics and query length. You cannot add 0.81 and 11.7 and expect the sum to mean anything. The scales aren't just different; they're incomparable, and any fixed weighting you pick is an arbitrary scaling decision wearing a lab coat.

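There are two honest ways out: normalize each list's scores onto a common scale and weight them, or discard the scores entirely and fuse on rank alone.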
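To make the scale mismatch concrete, here is a small sketch using the example numbers above. It shows that a naive sum just reproduces the BM25 ordering, and that rescaling each list to [0, 1] and weighting it makes the outcome hinge on the weight. The document ids, the min-max rescaling, and the alpha weight are all assumptions chosen for illustration.

```python
def min_max(scores):
    """Rescale a {doc_id: score} map to [0, 1] within this one result list."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def ranking(scores):
    """Doc ids ordered best first."""
    return sorted(scores, key=scores.get, reverse=True)

def weighted(cosine, bm25, alpha):
    """alpha weights the vector side, (1 - alpha) the keyword side."""
    c, k = min_max(cosine), min_max(bm25)
    return {doc: alpha * c[doc] + (1 - alpha) * k[doc] for doc in cosine}

# Hypothetical scores for three documents the two retrievers disagree on.
cosine = {"doc_a": 0.83, "doc_b": 0.81, "doc_c": 0.78}
bm25 = {"doc_c": 28.0, "doc_b": 11.7, "doc_a": 4.2}

# Adding raw scores: the BM25 magnitudes swamp the cosine band entirely.
naive = {doc: cosine[doc] + bm25[doc] for doc in cosine}
naive_order = ranking(naive)

# Normalized and weighted: the winner flips depending on alpha.
vector_heavy = ranking(weighted(cosine, bm25, alpha=0.7))
keyword_heavy = ranking(weighted(cosine, bm25, alpha=0.3))
```

Here naive_order is identical to the BM25 ranking, while vector_heavy puts doc_a first and keyword_heavy puts doc_c first. Nothing in the data says which alpha is right, which is the appeal of a method that never looks at the scores.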

Why RRF won

Reciprocal Rank Fusion (RRF), from Cormack, Clarke, and Büttcher's 2009 SIGIR paper, is almost insultingly simple. For each document, sum across the lists it appears in:

score(d) = Σ_i 1 / (k + rank_i(d))

rank_i(d) is the document's position in list i; k is a constant that defaults to 60 — the value from the original paper that has survived nearly two decades of benchmarks. That's it. A rank of 1 contributes 1/61; a rank of 14 contributes 1/74. A document ranked decently in both lists beats one ranked #1 in a single list and absent from the other. RRF rewards agreement, not the loudness of any single retriever's vote.
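The formula translates almost line for line into code. The sketch below assumes each retriever returns a list of document ids ordered best first, with ranks starting at 1; the function name and the ids are made up for the example.

```python
from collections import defaultdict

def rrf(ranked_lists, k=60):
    """Fuse ranked lists of doc ids (best first) with Reciprocal Rank Fusion."""
    scores = defaultdict(float)
    for ranked in ranked_lists:
        for rank, doc_id in enumerate(ranked, start=1):
            # A document absent from a list simply contributes nothing for it.
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical results: the vector side buries the exact match at rank 3,
# the keyword side puts it first and never sees the handshake page.
vector_hits = ["tls-handshake", "cert-rotation", "altname-error"]
bm25_hits = ["altname-error", "cert-rotation"]

fused = rrf([vector_hits, bm25_hits])
```

In this toy case altname-error scores 1/63 + 1/61 and comes out on top, cert-rotation scores 2/62 and lands just behind it, and tls-handshake, ranked first by the vector side but missing from the keyword list, finishes last with 1/61.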

The reason it won production is the reason it looks too simple: by using ranks, it never has to reconcile BM25's unbounded scores with cosine's narrow band. There is nothing to normalize and nothing to tune. Elasticsearch ships it as a first-class rrf retriever with rank_constant defaulting to 60; Weaviate offers it as rankedFusion alongside an alpha knob that slides between pure-keyword and pure-vector; Qdrant exposes it in its Query API; Pinecone, OpenSearch, and pgvector-plus-ParadeDB all give you a hybrid path. Native hybrid is now table stakes for a vector database, not a differentiator.
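As a rough idea of what the native path looks like, here is a sketch of a search request body for Elasticsearch's rrf retriever, built as a plain Python dict. The field names body and embedding are hypothetical, the window sizes are arbitrary, and the exact shape should be checked against the retriever documentation for your Elasticsearch version before relying on it.

```python
def rrf_search_body(query_text, query_vector, window=50, rank_constant=60):
    """Build a request body that fuses a keyword and a kNN retriever with RRF."""
    return {
        "retriever": {
            "rrf": {
                "retrievers": [
                    # Lexical side: a standard match query on a text field
                    # ("body" is an assumed field name).
                    {"standard": {"query": {"match": {"body": query_text}}}},
                    # Dense side: kNN over a vector field
                    # ("embedding" is an assumed field name).
                    {
                        "knn": {
                            "field": "embedding",
                            "query_vector": query_vector,
                            "k": window,
                            "num_candidates": window * 2,
                        }
                    },
                ],
                # How many results from each retriever take part in fusion.
                "rank_window_size": window,
                # The k in 1 / (k + rank).
                "rank_constant": rank_constant,
            }
        }
    }

body = rrf_search_body("ERR_TLS_CERT_ALTNAME_INVALID", [0.12, -0.03, 0.44])
```

The point of the sketch is how little fusion-specific configuration there is: two retrievers, a window, and one constant.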


Hybrid is not free

Two honest costs, since comparison tables rarely mention them.

First, you now maintain two indexes — a dense vector index and an inverted index — over the same corpus. More storage, more to keep in sync at ingest, two retrieval calls per query instead of one. Your chunking now has to serve both masters: chunks small enough for clean embeddings but with enough literal tokens left intact for BM25 to grip. Contextual retrieval — prepending a short context blurb to each chunk before indexing — happens to help both sides at once, which is part of why it caught on.

Second, fusion gives you a merged candidate set, not a final answer. The standard production shape is hybrid retrieve → fuse → rerank, where a cross-encoder re-scores the top fused candidates with full query-document attention. If you're assembling this pipeline, the reranker is the usual final stage, and it's where a lot of the real relevance lift lives.
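The retrieve, fuse, rerank shape can be sketched with pluggable callables. Everything here is a stand-in: the two toy retrievers return hard-coded or substring-based results, and the toy reranker is a literal-match check in place of a real cross-encoder. The function and parameter names are assumptions for the example.

```python
def hybrid_search(query, retrievers, rerank_score, k=60, fuse_top=20, final_top=5):
    """Retrieve from every retriever, fuse with RRF, then rerank the top candidates."""
    fused = {}
    for retrieve in retrievers:
        for rank, doc_id in enumerate(retrieve(query), start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Only the best fused candidates go to the (expensive) reranker.
    candidates = sorted(fused, key=fused.get, reverse=True)[:fuse_top]
    reranked = sorted(candidates, key=lambda d: rerank_score(query, d), reverse=True)
    return reranked[:final_top]

# Toy corpus and stand-in components (all hypothetical).
corpus = {
    "handshake": "how the tls handshake works",
    "rotation": "how to rotate a certificate",
    "altname": "ERR_TLS_CERT_ALTNAME_INVALID hostname not listed in certificate",
}

def toy_vector(query):
    # Pretend the embedding model buried the exact match in third place.
    return ["handshake", "rotation", "altname"]

def toy_bm25(query):
    # Exact token match only.
    return [doc_id for doc_id, text in corpus.items() if query in text.split()]

def toy_reranker(query, doc_id):
    # Stand-in for a cross-encoder that scores the query and document together.
    return float(query in corpus[doc_id])

results = hybrid_search("ERR_TLS_CERT_ALTNAME_INVALID", [toy_vector, toy_bm25], toy_reranker)
```

Keeping the retrievers and the reranker as plain callables means either side can be swapped without touching the fusion step, and the fuse_top cutoff is where you trade reranker latency against recall.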

The verdict

Default to hybrid. For nearly all real corpora — docs, code, support tickets, catalogs — queries are a mix of paraphrase and literal, and you cannot predict which a given user will type. Hybrid + RRF + a reranker is the boring, correct baseline, and RRF means you get it with essentially zero fusion tuning.

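Skip hybrid only when your traffic is genuinely one-shaped: all exact identifiers, where BM25 alone will do, or all natural-language paraphrase, where vectors alone will do.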

Everything between those poles is where pure vector search fails silently, ranking a plausible near-neighbor above the one document the user actually needed. That document is sitting at rank 14, with the literal string right there in it. Hybrid is how you stop shipping that.