Your RAG pipeline returns the wrong chunks, so you do the obvious thing: you re-chunk the corpus, swap to a bigger embedding model, bolt on a reranker. Sometimes that works. Often it doesn't, because the failure was never in the index. It was in the question.
A user query is a bad search key more often than anyone admits. It's three words long. It uses vocabulary the documents never use. It says "how much is that one?" with no antecedent. It asks one question that secretly contains three. None of that is fixed by a better index — you're handing a great retriever a broken instruction. Three well-cited techniques attack this, and they get listed side by side as if you pick one off a menu. You don't. They fix different breakages of the same thing, and understanding which breakage you actually have is the entire decision.
HyDE: search with a fake answer
HyDE — Hypothetical Document Embeddings, Gao et al., 2022 — starts from a quietly profound observation: a question and its answer don't live in the same neighborhood of embedding space. "What causes auroras?" and the paragraph explaining solar wind hitting the magnetosphere are semantically related but lexically and structurally different, so a dense retriever matching the question's vector against answer paragraphs is fighting an asymmetry.
HyDE's fix is almost cheeky: ask an LLM to write a hypothetical answer first, then embed that and search with it. You're now matching an answer against answers. The hypothetical is "unreal and may contain false details," in the paper's words, but the paper's argument is that the encoder's dense bottleneck filters out the invented specifics and keeps the relevance pattern. The method is zero-shot, with no labels and no fine-tuning, and it significantly outperformed the unsupervised dense retriever Contriever on the TREC deep-learning tracks.
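The flow above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `embed` is a toy bag-of-words stand-in for a real dense encoder, and `fake_llm` is a hypothetical placeholder for whatever generator you call. The only point is the order of operations: generate, embed the draft, search with the draft.

```python
import math
import re
from collections import Counter
from typing import Callable


def embed(text: str) -> Counter:
    """Toy stand-in for a dense encoder: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(probe_text: str, docs: list[str], top_k: int = 1) -> list[str]:
    """Rank documents by similarity to whatever text is used as the probe."""
    probe = embed(probe_text)
    ranked = sorted(docs, key=lambda d: cosine(probe, embed(d)), reverse=True)
    return ranked[:top_k]


def hyde_search(query: str, docs: list[str],
                generate: Callable[[str], str], top_k: int = 1) -> list[str]:
    # Step 1: draft a hypothetical answer. It may contain false details.
    hypothetical = generate(f"Write a short passage that answers: {query}")
    # Step 2: search with the draft instead of the question.
    return search(hypothetical, docs, top_k)


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; returns a canned draft."""
    return ("Auroras occur when charged particles from the solar wind "
            "hit the magnetosphere.")


docs = [
    "Solar wind particles striking the magnetosphere excite atmospheric "
    "gases, producing light.",
    "What causes tides? A common question about the ocean and the moon.",
]

raw_hit = search("What causes auroras?", docs)
hyde_hit = hyde_search("What causes auroras?", docs, fake_llm)
```

With this toy encoder the raw question lands on the tides document, because it shares the words "what causes" with it, while the hypothetical answer lands on the solar-wind document. A real encoder is far less literal, but the asymmetry is the same in kind.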
The catch is structural: HyDE is only as good as the generator's grasp of the topic. On a general-knowledge query it writes a plausible answer-shaped document and retrieval improves. On a niche internal corpus the model has never seen — your company's part numbers, a brand-new API — it confidently invents a hypothetical that points retrieval away from the right documents. HyDE is the highest-variance bet here: it can actively poison retrieval, where the others mostly just cost tokens.
Multi-Query and RAG-Fusion: ask the same thing several ways
If the query is vague rather than mismatched, the fix is breadth. LangChain's MultiQueryRetriever prompts an LLM for several rephrasings (three by default), retrieves for each, and returns the deduplicated union. Each phrasing catches documents the others miss, so recall climbs on ambiguous, underspecified questions.
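A rough sketch of that flat-union behavior follows. It is not LangChain's code: `rephrase` and `retrieve` are hypothetical callables standing in for the LLM prompt and the vector store, and the tiny `index` is made up for illustration.

```python
from typing import Callable


def multi_query_union(
    question: str,
    rephrase: Callable[[str], list[str]],
    retrieve: Callable[[str], list[str]],
) -> list[str]:
    """Retrieve for every phrasing; return the deduplicated union in first-seen order."""
    seen: set[str] = set()
    merged: list[str] = []
    for variant in rephrase(question):
        for doc_id in retrieve(variant):
            if doc_id not in seen:  # drop documents another phrasing already found
                seen.add(doc_id)
                merged.append(doc_id)
    return merged


# Hypothetical stubs: three rephrasings and a lookup-table "retriever".
index = {
    "phrasing 1": ["doc_a", "doc_b"],
    "phrasing 2": ["doc_b", "doc_c"],
    "phrasing 3": ["doc_a", "doc_d"],
}
union = multi_query_union(
    "original question",
    rephrase=lambda q: list(index),
    retrieve=lambda variant: index[variant],
)
```

Note what the union throws away: `doc_b` was found by two phrasings and `doc_d` by one, but the output treats them the same.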
RAG-Fusion (Rackauckas, 2024) adds one step that matters: instead of a flat union, it merges the separate ranked lists with Reciprocal Rank Fusion, the 2009 IR technique that scores each document by 1/(k + rank) summed across lists, where rank is the document's position in a list and k is a smoothing constant (60 in the original RRF paper). That rewards documents that rank well across multiple phrasings rather than spiking on one. The fusion step is the whole difference, and it's the most common thing people get wrong: MultiQueryRetriever does not use RRF; RAG-Fusion does. The honest cost is that more phrasings mean more retrieval calls, and a badly generated variant can drift off-topic and drag noise into the merge.
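The 1/(k + rank) scoring is short enough to write out. The sketch below assumes 1-based ranks and k = 60, the constant used in the original RRF paper; the document IDs and the three rankings are invented for illustration.

```python
from collections import defaultdict


def reciprocal_rank_fusion(
    ranked_lists: list[list[str]], k: int = 60
) -> list[tuple[str, float]]:
    """Fuse ranked lists: each document scores the sum of 1 / (k + rank) over the lists it appears in."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):  # ranks start at 1
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# One ranked list per query phrasing (made-up IDs).
rankings = [
    ["a", "b", "c"],
    ["b", "a", "d"],
    ["e", "b", "a"],
]
fused = reciprocal_rank_fusion(rankings)
top_three = [doc_id for doc_id, _ in fused[:3]]
```

Here `e` is ranked first by one phrasing but appears nowhere else, so it ends up below `b` and `a`, which every phrasing returned. That is the behavior a flat union cannot express.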
HyDE bets the query is phrased wrong. Multi-Query bets it's phrased too narrowly. Rewriting bets it's phrased incompletely. Same pipeline stage, three different diagnoses.
Query Rewriting and decomposition: clean it up, or break it apart
The third family edits the query directly. Query rewriting was formalized by Ma et al. (2023) as the Rewrite-Retrieve-Read order, which puts a rewrite step before retrieval; they also trained a small rewriter with reinforcement learning to adapt messy queries to a frozen retriever. In practice the most valuable rewrite is the boring one: conversational contextualization, turning "how much is that one?" into "how much is the Pro plan per month?" by resolving the pronoun against chat history. Without it, a follow-up question retrieves nothing useful because the antecedent lives three turns back.
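Conversational contextualization is mostly a prompt plus a guard or two. The sketch below is one way to structure it, not the method from Ma et al.: the prompt wording, the `contextualize` helper, and the `fake_llm` stub are all assumptions made for illustration.

```python
from typing import Callable

# Hypothetical prompt wording; tune it for your own model.
REWRITE_PROMPT = (
    "Rewrite the final user message as a standalone search query. "
    "Resolve pronouns and references using the conversation. "
    "Return only the rewritten query.\n\n"
    "Conversation:\n{history}\n\n"
    "Final user message: {question}\n\n"
    "Standalone query:"
)


def contextualize(
    question: str,
    history: list[tuple[str, str]],
    llm: Callable[[str], str],
) -> str:
    """Rewrite a follow-up question so it can be searched without the chat history."""
    if not history:
        return question  # nothing to resolve against, so skip the LLM call
    transcript = "\n".join(f"{role}: {text}" for role, text in history)
    prompt = REWRITE_PROMPT.format(history=transcript, question=question)
    rewritten = llm(prompt).strip()
    return rewritten or question  # fall back to the original on an empty reply


def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for an LLM call; returns a canned rewrite."""
    return "  how much is the Pro plan per month?\n"


history = [
    ("user", "What plans do you offer?"),
    ("assistant", "We offer Free, Pro, and Enterprise."),
    ("user", "Tell me about Pro."),
]
standalone = contextualize("how much is that one?", history, fake_llm)
```

The early return matters for cost: the first turn of a conversation has no antecedent to resolve, so it should not pay for a rewrite call.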
Decomposition is its sharper cousin, and worth keeping distinct: rewriting reformulates one query into a better one; decomposition splits one query into many sub-questions. "Does our refund policy cover EU customers differently than US ones?" is two retrievals, not one — pull each policy separately, then let the model synthesize. This is the move that makes multi-hop questions tractable, and it's why decomposition shows up wherever an agent has to reason across several documents rather than answer from one.
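In code, the difference from rewriting is simply one query in, several retrievals out. The sketch below assumes hypothetical `decompose` and `retrieve` callables (an LLM prompt and a retriever in practice) and keeps the results grouped by sub-question so the synthesis step can see which evidence answers which part.

```python
from typing import Callable


def decompose_and_retrieve(
    question: str,
    decompose: Callable[[str], list[str]],
    retrieve: Callable[[str], list[str]],
) -> dict[str, list[str]]:
    """Run one retrieval per sub-question and keep results grouped by sub-question."""
    # If the decomposer returns nothing, fall back to the original question.
    sub_questions = decompose(question) or [question]
    return {sub: retrieve(sub) for sub in sub_questions}


# Hypothetical stubs standing in for an LLM decomposer and a retriever.
def fake_decompose(question: str) -> list[str]:
    return [
        "What is the refund policy for EU customers?",
        "What is the refund policy for US customers?",
    ]


def fake_retrieve(sub_question: str) -> list[str]:
    return ["eu_refund_policy.md"] if "EU" in sub_question else ["us_refund_policy.md"]


evidence = decompose_and_retrieve(
    "Does our refund policy cover EU customers differently than US ones?",
    fake_decompose,
    fake_retrieve,
)
```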
The decision is a diagnosis, not a ranking
Here's the reframe that should govern all of this. These three are not competing answers to "how do I make RAG better." They are one family, query-side fixes, and they are orthogonal to fixing the index (chunking, embeddings), the ordering (rerankers), or generation-time checks like Self-RAG or CRAG. Every one of them adds at least one LLM call before you retrieve, so every one trades latency for recall.
That means the question is never "which is best." It's: is my retrieval failing because of the query at all? If your chunks are garbage or your embedding model is wrong for the domain, query transformation just stacks a model call on top of a broken index and makes the latency worse. So measure which stage is actually failing first. Then, if it's the query: HyDE for a vocabulary gap, Multi-Query or RAG-Fusion for vagueness, rewriting for conversational mess, decomposition for multi-hop. The cheapest query transformation is the one you didn't need because the problem was somewhere else.
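The mapping above is small enough to write down as a lookup. The sketch below only encodes the pairing stated in the text; the `QueryFailure` names are made up for illustration, and deciding which failure you actually have is the measurement step that no lookup can do for you.

```python
from enum import Enum
from typing import Optional


class QueryFailure(Enum):
    """Hypothetical labels for the diagnosed failure."""
    VOCABULARY_GAP = "vocabulary gap"
    VAGUE = "vague or underspecified"
    CONVERSATIONAL = "unresolved conversational reference"
    MULTI_HOP = "multi-hop question"
    NOT_THE_QUERY = "index, ranking, or generation problem"


# Diagnosis -> query-side technique. None means: fix another stage instead.
TECHNIQUE: dict[QueryFailure, Optional[str]] = {
    QueryFailure.VOCABULARY_GAP: "HyDE",
    QueryFailure.VAGUE: "Multi-Query or RAG-Fusion",
    QueryFailure.CONVERSATIONAL: "query rewriting",
    QueryFailure.MULTI_HOP: "decomposition",
    QueryFailure.NOT_THE_QUERY: None,
}


def choose_transformation(failure: QueryFailure) -> Optional[str]:
    """Return the query transformation for a diagnosed failure, or None if none applies."""
    return TECHNIQUE[failure]
```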



