Every Fine-Tuning & Training comparison and buyer's guide for building AI agents: 9 pieces and counting. Each is a head-to-head or a "best X for Y" roundup with a source-backed verdict.
GRPO didn't win on optimization theory. It won by removing a policy-sized value network from the training loop — and the memory it saved is what put RL post-training within reach of a single node.
Multi-LoRA serving turns "one GPU per model" into "one GPU per base model, amortized across hundreds of tenants." Here are the tools that do it, and the kernel trick that makes it work.
You quantized the weights to 4-bit and thought memory was solved. At long context the KV cache dwarfs the weights — and it needs a different kind of quantization to shrink safely.
The three formats aren't competing for the same job — one buys you faster math, one buys you smaller weights, and one is the fallback for hardware that can't do the first. Know which bottleneck you're paying down.
GRPO is now a commodity: all three ship it. What actually sorts them is who owns the distributed orchestration, and how you keep one starving inference engine fed.
The three ways to align a model on preference data aren't a quality ladder — they're a pipeline being dismantled one component at a time. The thing each method removes tells you what it costs.
The three options differ by orders of magnitude in GPU memory — but the part that actually decides your result isn't the rank, and it isn't the quantization.
Three open-source fine-tuning frameworks that look like rivals but are actually three different bets on which part of training is your real bottleneck.
The format you pick is downstream of where you run the model — and in 2025 the tooling quietly consolidated under your feet. A field guide to the three that matter and the libraries that survived.