Every retrieval-augmented system rests on one quiet assumption, and it is usually wrong: that whatever the retriever hands back is worth answering from. Naive RAG takes the top-k passages and conditions the model on them unconditionally. No step in the pipeline is allowed to say "this context is junk, don't use it." So when retrieval misses, the model doesn't fail loudly. It hallucinates fluently on top of bad evidence, which is worse. This is the same gap that pushes teams toward agentic RAG, where the model drives retrieval, but Self-RAG and CRAG go after it without handing the whole loop to an agent.

Two well-cited methods address this, and they get filed under the same "advanced RAG" heading as if they were competing for the same job. They aren't. Self-RAG and Corrective RAG (CRAG) intervene at different points, fix different failures, and (this is the part that should actually drive your decision) make opposite bets about where the judgment should live.

Self-RAG: teach the model to doubt itself

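Self-RAG (Asai et al., 2023) moves the judgment inside the model. It fine-tunes the language model to emit special reflection tokens interleaved with its normal output, so that critiquing becomes part of generation rather than a step bolted on around it. There are four: Retrieve (should the model fetch passages at this point?), ISREL (is a retrieved passage relevant to the query?), ISSUP (is the generated segment supported by that passage?), and ISUSE (is the response useful overall?).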

The effect is a model that retrieves on demand and grades its own work segment by segment, even down-weighting a generation branch when ISSUP says the claim isn't backed. The intelligence is in the weights. That is its strength and its catch: you get adaptive, low-overhead self-criticism at inference time, but only after you have fine-tuned a model to do it — and you are then locked to that model.
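
To make the down-weighting concrete, here is a minimal sketch of how candidate segments might be ranked once a Self-RAG-style model has produced probabilities for its critique tokens (ISREL, ISSUP, ISUSE). It is loosely modeled on the idea of adding a weighted critique score to the segment's log-probability. The `Candidate` fields, the weights, and the function names are assumptions for illustration, not the paper's code.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Candidate:
    text: str
    lm_logprob: float   # log-probability of the segment under the generator
    p_relevant: float   # probability on the "relevant" ISREL token
    p_supported: float  # probability on the "supported" ISSUP token
    p_useful: float     # ISUSE rating rescaled to [0, 1]


def segment_score(c: Candidate, w_rel: float = 1.0,
                  w_sup: float = 1.0, w_use: float = 0.5) -> float:
    """Generator log-probability plus a weighted sum of critique scores.

    The weights are hypothetical knobs: raising w_sup penalizes
    unsupported claims more heavily.
    """
    critique = (w_rel * c.p_relevant
                + w_sup * c.p_supported
                + w_use * c.p_useful)
    return c.lm_logprob + critique


def pick_best(candidates: Iterable[Candidate], **weights: float) -> Candidate:
    """Return the candidate segment with the highest combined score."""
    return max(candidates, key=lambda c: segment_score(c, **weights))


fluent = Candidate("fluent but unsupported", -1.0, 0.9, 0.1, 0.5)
grounded = Candidate("slightly less fluent, grounded", -1.2, 0.9, 0.9, 0.5)
best = pick_best([fluent, grounded])
```

With the default weights the grounded segment wins even though the generator alone prefers the fluent one. Setting `w_sup=0.0` flips the choice, which is the sense in which the support signal steers generation.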

CRAG: judge the evidence before the model sees it

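CRAG (Yan et al., 2024) makes the opposite bet: leave the LLM completely untouched and put the judgment outside it. It adds a lightweight retrieval evaluator, a small, fast classifier that scores the retrieved documents for a query and returns a confidence, which maps to three actions. Correct: at least one document is clearly relevant, so the retrieved text is refined down to the parts that matter. Incorrect: nothing retrieved is relevant, so the documents are discarded and the system falls back to web search. Ambiguous: the evaluator can't tell, so the refined documents and the web results are combined.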

Crucially, all of this happens before the generator runs, and none of it touches the generator's weights. CRAG is plug-and-play and model-agnostic — it wraps any black-box LLM you're calling over an API.
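
The routing step can be sketched in a few lines. The version below assumes evaluator scores in [0, 1] and uses the Correct / Incorrect / Ambiguous action names from the CRAG paper. The threshold values, the `web_search` callable, and the simple score filter standing in for the paper's knowledge-refinement step are all assumptions made for illustration.

```python
from enum import Enum
from typing import Callable, List, Sequence


class Action(Enum):
    CORRECT = "correct"
    INCORRECT = "incorrect"
    AMBIGUOUS = "ambiguous"


def choose_action(scores: Sequence[float],
                  upper: float = 0.6, lower: float = 0.2) -> Action:
    """Map evaluator confidences to an action (thresholds are hypothetical)."""
    if not scores or max(scores) < lower:
        return Action.INCORRECT   # nothing retrieved looks relevant
    if max(scores) >= upper:
        return Action.CORRECT     # at least one document is clearly relevant
    return Action.AMBIGUOUS       # evaluator is unsure


def build_context(query: str, docs: Sequence[str], scores: Sequence[float],
                  web_search: Callable[[str], List[str]],
                  upper: float = 0.6, lower: float = 0.2) -> List[str]:
    """Assemble the context handed to the (untouched) generator."""
    action = choose_action(scores, upper, lower)
    # Simplified stand-in for refinement: drop documents scored below `lower`.
    kept = [d for d, s in zip(docs, scores) if s >= lower]
    if action is Action.CORRECT:
        return kept
    if action is Action.INCORRECT:
        return web_search(query)
    return kept + web_search(query)


def fake_search(q: str) -> List[str]:
    # Placeholder for a real search client.
    return ["web:" + q]


context = build_context("q", ["a", "b"], [0.4, 0.1], fake_search)
```

Nothing here touches the generator: the function returns a list of strings, and whichever model you call receives that list as its context.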

Self-RAG retrains the reader to be skeptical of its sources. CRAG hires an editor to vet the sources before the reader ever opens them. Different fix, different place, different cost.

The decision is build-vs-bolt-on, not better-vs-worse

Lined up honestly, the "vs" dissolves. They fix different failures: Self-RAG improves how the model reasons over evidence; CRAG improves the evidence itself. Self-RAG can decide whether to retrieve; CRAG can decide what to do when retrieval was bad. In a serious system you might run both — CRAG cleans and, if necessary, replaces the context; Self-RAG reasons carefully over whatever survives.

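So the real axis isn't quality. It's a question about your constraints: do you control the model's weights? If you can fine-tune and are willing to stay on that model, Self-RAG is on the table. If you are calling a black-box LLM over an API, CRAG is the one you can actually deploy.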

Before you reach for either

One caution the papers won't give you. Both methods earn their keep only when retrieval quality genuinely varies and the cost of a confident wrong answer is high — medical, legal, support systems where a fluent hallucination is a real liability. If your corpus is clean and your retriever is already strong, you are reaching for a second model in the loop to solve a problem a reranker and a similarity threshold would have handled for a fraction of the latency. Add the self-checking machinery when you've measured that retrieval is the thing failing you. Not before — the most expensive correction step is the one guarding a pipeline that was already retrieving fine.
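
For comparison, the cheap baseline mentioned above fits in a few lines. This sketch assumes passages arrive as (text, embedding) pairs and gates them on cosine similarity. The `gate` function, its threshold, and its `k` are hypothetical and would need tuning against your own data.

```python
import math
from typing import List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def gate(query_vec: Sequence[float],
         passages: Sequence[Tuple[str, Sequence[float]]],
         threshold: float = 0.75, k: int = 3) -> List[str]:
    """Keep at most k passages whose similarity clears the threshold.

    An empty result is the signal to abstain or fall back rather than
    answer from weak evidence.
    """
    scored = sorted(((cosine(query_vec, vec), text) for text, vec in passages),
                    reverse=True)
    return [text for score, text in scored[:k] if score >= threshold]


passages = [("a", [1.0, 0.0]), ("b", [0.0, 1.0]), ("c", [1.0, 1.0])]
kept = gate([1.0, 0.0], passages)
```

If a gate like this already filters out the bad retrievals in your evaluation set, the heavier machinery has little left to fix.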