The moment your agent talks to a second model provider, you have a gateway problem whether you named it or not. One vendor for the cheap classifier, another for the hard reasoning step, a local model for the privacy-sensitive path, a fallback for when the primary 529s — and suddenly every call needs a place to decide where it goes, in what format, retried how many times, logged where. You can scatter that logic across your codebase, or you can put a gateway in the middle.
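
To make "a place to decide" concrete, here is a minimal sketch of what that decision looks like once it is pulled out of call sites and into one table. Everything in it is hypothetical: the `Route` fields, task names, provider labels, and log sinks are placeholders chosen to mirror the four questions above, not the config format of any gateway discussed here.

```python
from dataclasses import dataclass

# Hypothetical routing policy: each field answers one question a gateway must settle
# per call (where it goes, in what wire format, how many retries, where it is logged).
@dataclass(frozen=True)
class Route:
    provider: str      # placeholder provider label
    wire_format: str   # request shape the provider expects
    max_retries: int
    log_sink: str      # placeholder log destination

# Illustrative table only; task names and providers are made up for this sketch.
ROUTES = {
    "classify": Route("cheap-vendor", "openai", max_retries=1, log_sink="metrics"),
    "reason": Route("frontier-vendor", "anthropic", max_retries=3, log_sink="traces"),
    "private": Route("local-model", "openai", max_retries=0, log_sink="local-only"),
}

def route_for(task: str) -> Route:
    """Return the policy for a task, failing loudly on tasks with no route."""
    try:
        return ROUTES[task]
    except KeyError:
        raise ValueError(f"no route configured for task {task!r}") from None
```

Whether this table lives in your code, in a proxy's config file, or in a separate service is exactly the choice the rest of this piece is about.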

The three most-installed open-source options have quietly stopped being the same kind of thing. A year ago "LLM gateway" meant a translation layer: take an OpenAI-shaped request, reshape it for Anthropic or Gemini, hand it back. That problem is solved. What these projects now disagree about is how much of your stack the gateway should absorb — and the clearest signal of where each one wants to sit is the language it's written in.
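
For a sense of what that solved translation layer does, here is a deliberately simplified sketch that reshapes a text-only OpenAI-style chat request into the shape of Anthropic's Messages API, where the system prompt is a top-level `system` field and `max_tokens` is required. The function name and the default token cap are assumptions for illustration; real gateways also translate tool calls, images, streaming, and the response back into OpenAI shape, none of which this handles.

```python
from typing import Any

# Assumption: an arbitrary default, since Anthropic's Messages API requires max_tokens
# while OpenAI-style requests often omit it.
DEFAULT_MAX_TOKENS = 1024

def openai_to_anthropic(req: dict[str, Any]) -> dict[str, Any]:
    """Reshape a text-only OpenAI-style chat request into Anthropic Messages shape."""
    system_parts: list[str] = []
    messages: list[dict[str, Any]] = []
    for msg in req["messages"]:
        role = msg["role"]
        if role == "system":
            # Anthropic takes the system prompt as a top-level field, not as a message.
            system_parts.append(msg["content"])
        elif role in ("user", "assistant"):
            messages.append({"role": role, "content": msg["content"]})
        else:
            # Tool and function roles need real translation; this sketch refuses them.
            raise ValueError(f"role {role!r} is not handled in this sketch")
    out: dict[str, Any] = {
        "model": req["model"],
        "messages": messages,
        "max_tokens": req.get("max_tokens", DEFAULT_MAX_TOKENS),
    }
    if system_parts:
        out["system"] = "\n\n".join(system_parts)
    return out
```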

The Python one that lives in your app

A Python SDK and proxy server that exposes 100+ providers behind one OpenAI-format API, with cost tracking, budgets, virtual keys, and load balancing
★ 51k · Python · BerriAI/litellm

LiteLLM is the default because it meets you where most agent code already is. Import it as a library and a single completion() call reaches any of 100+ providers using the OpenAI request shape you already know; or run the same thing as a proxy server with a dashboard, per-key budgets, and spend tracking. It is by a wide margin the most-starred project here, and it got there by being the path of least resistance for the Python developer who just wants model="claude-opus-4-8" and model="gpt-4o" to work through one call site.
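
Per-key budgets are easier to reason about once you see the bookkeeping they imply. The sketch below is a hypothetical ledger, not LiteLLM's virtual-key implementation or API: the `VirtualKey` class, the model names, and the per-million-token prices in `PRICES` are all invented for illustration. It checks the budget before forwarding a call and records the actual cost after the provider reports token usage.

```python
from dataclasses import dataclass

# Hypothetical (input, output) prices in USD per million tokens; real prices differ.
PRICES = {"model-a": (3.00, 15.00), "model-b": (0.15, 0.60)}

class BudgetExceeded(Exception):
    pass

@dataclass
class VirtualKey:
    name: str
    budget_usd: float
    spent_usd: float = 0.0

    def check(self) -> None:
        """Pre-flight: refuse to forward new calls once the key's budget is used up."""
        if self.spent_usd >= self.budget_usd:
            raise BudgetExceeded(f"key {self.name!r} has spent its ${self.budget_usd:.2f} budget")

    def record(self, model: str, input_tokens: int, output_tokens: int) -> float:
        """Post-call: price the reported token usage and add it to the key's spend."""
        in_price, out_price = PRICES[model]
        cost = (input_tokens * in_price + output_tokens * out_price) / 1_000_000
        self.spent_usd += cost
        return cost
```

Because the cost is only known after the response arrives, a single call can push a key slightly past its limit; the pre-flight check then blocks the next one.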

The Python-ness is the tell. LiteLLM wants to live inside your application process, close to the agent loop and in the same language as your orchestration. That is exactly why it's frictionless to adopt, and also why, once it sits in the hot path of a high-volume service, the proxy's per-request overhead and Python's GIL become things you eventually have to think about. It is the gateway you reach for when the gateway should be a library, not a separate piece of infrastructure.

The TypeScript one that lives at the edge

A fast, edge-deployable gateway that routes to 1,600+ models with automatic retries, fallbacks, load balancing, and built-in guardrails
★ 12.1k · TypeScript · Portkey-AI/gateway

Portkey's gateway makes the opposite bet: the gateway is not part of your app, it's a thing your app calls through. Written to deploy on edge runtimes and start routing in under two minutes, it treats reliability primitives — retries, fallbacks across providers, load balancing, and a library of guardrails — as the core product rather than add-ons. Point your traffic at it and the failover logic you'd otherwise hand-roll in every service becomes configuration.
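
Here is roughly the hand-rolled failover logic that turns into configuration. This is a minimal sketch of retries with exponential backoff followed by fallback to the next provider; it is not Portkey's config format or implementation, and `ProviderError`, `call_with_fallbacks`, and the stand-in provider callables are hypothetical names. The `sleep` parameter is injectable so the backoff can be observed without waiting.

```python
import time
from typing import Callable, Sequence

class ProviderError(Exception):
    """Stand-in for a retryable upstream failure such as a 429 or 529."""

def call_with_fallbacks(
    providers: Sequence[tuple[str, Callable[[str], str]]],
    prompt: str,
    retries_per_provider: int = 2,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> tuple[str, str]:
    """Try each provider in order, retrying with exponential backoff; return (provider, reply)."""
    errors: list[str] = []
    for name, call in providers:
        for attempt in range(retries_per_provider + 1):
            try:
                return name, call(prompt)
            except ProviderError as exc:
                errors.append(f"{name} attempt {attempt + 1}: {exc}")
                # Back off before retrying the same provider; move on after the last attempt.
                if attempt < retries_per_provider:
                    sleep(base_delay * 2 ** attempt)
    raise RuntimeError("all providers failed: " + "; ".join(errors))
```

Multiply this by every service and every language in your stack and the appeal of declaring it once, in front of all of them, is clear.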

TypeScript and "edge-deployable" together describe the intent: this wants to sit in front of everything, including non-Python clients, as a lightweight network hop with no language allegiance. The cost is the second moving part — you now operate a gateway as a service, not a dependency you pip install. The benefit is that browser apps, Go services, and your Python agent all get the same routing, the same guardrails, and the same fallback behavior without each reimplementing it.

The Rust one that wants to be your data plane

A Rust LLMOps data plane that unifies the gateway with observability, evaluations, experimentation, and optimization, storing every inference in your own database

TensorZero is the most ambitious reading of what a gateway is for. The routing layer is written in Rust and adds sub-millisecond latency, but routing is the least of it: every inference and every piece of downstream feedback is written to your database, which turns the gateway into the place where observability, A/B experiments, evaluations, and prompt/model optimization all happen. The argument is that the gateway already sees every request and every outcome, so it is the natural home for the feedback loop that improves them, rather than bolting a separate observability vendor and a separate eval harness onto the side.
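
The feedback-loop argument rests on a simple data shape: if each inference is stored with the variant that produced it, and feedback is later attached by inference ID, an experiment's results become a join. The sketch below illustrates that idea with an in-memory SQLite database. It is not TensorZero's schema, storage backend, or API; the table names, columns, and helper functions are hypothetical.

```python
import random
import sqlite3
import uuid

# Hypothetical schema: one row per inference, one row per piece of downstream feedback.
SCHEMA = """
CREATE TABLE inferences (id TEXT PRIMARY KEY, variant TEXT, prompt TEXT, output TEXT);
CREATE TABLE feedback (inference_id TEXT REFERENCES inferences(id), success INTEGER);
"""

def make_db() -> sqlite3.Connection:
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    return db

def pick_variant(variants: list[str], rng: random.Random) -> str:
    # Uniform random assignment is the simplest A/B split; weighted or bandit policies also work.
    return rng.choice(variants)

def log_inference(db: sqlite3.Connection, variant: str, prompt: str, output: str) -> str:
    inference_id = str(uuid.uuid4())
    db.execute("INSERT INTO inferences VALUES (?, ?, ?, ?)", (inference_id, variant, prompt, output))
    return inference_id

def log_feedback(db: sqlite3.Connection, inference_id: str, success: bool) -> None:
    # Feedback can arrive much later (a user rating, a passing test) and is joined by ID.
    db.execute("INSERT INTO feedback VALUES (?, ?)", (inference_id, int(success)))

def success_rate_by_variant(db: sqlite3.Connection) -> dict[str, float]:
    rows = db.execute(
        """SELECT i.variant, AVG(f.success)
           FROM inferences i JOIN feedback f ON f.inference_id = i.id
           GROUP BY i.variant"""
    ).fetchall()
    return {variant: rate for variant, rate in rows}
```

The point of the design is that no second system has to reconstruct which prompt or model produced which outcome; the gateway wrote that down at request time.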

The Rust is not incidental. This is built to be a standalone, high-throughput data plane you run as infrastructure, the way you'd run a database or a message queue — not a library, not a quick edge function. That is heavier to stand up than the other two, and it only pays off if you actually want the experimentation and optimization machinery. If you do, you stop wiring four tools together. If you don't, you're running a lot of platform to normalize an API shape.

What the split is really about

All three will let your agent call ten providers through one interface. The real question is how much you want the gateway to own. LiteLLM keeps it a library in your Python app and gets out of the way. Portkey makes it a language-agnostic reliability layer at the edge. TensorZero makes it the data plane where your whole feedback loop converges.

Pick by where you want that seam to sit, not by counting providers supported; they all support more than you'll use. The gateway is the one component every multi-model agent has, and it's either the cheapest place to add routing, logging, and failover, or the most expensive place to over-build. The language each project chose is the honest signal of which it's trying to be.