For two years an "AI gateway" meant something modest: a proxy that gave you one OpenAI-compatible endpoint across many providers, tracked spend, and load-balanced keys. Useful plumbing. In 2026 the job changed. The gateway is now where an agent's policy lives — its budget, its allowed tools, its guardrails, its trace — because that's the one chokepoint every model call and tool call already passes through. And once you put the control plane on the hot path of an agent, its performance stops being a footnote.
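To make "the chokepoint holds the policy" concrete, here is a minimal Python sketch of the check a gateway might run on every model and tool call before forwarding it. `AgentPolicy`, `check_call`, and `PolicyViolation` are hypothetical names. The substring match is a stand-in for a real guardrail, and cost estimates are assumed to arrive with each request. No specific gateway's API is implied.

```python
from dataclasses import dataclass, field


class PolicyViolation(Exception):
    """Raised when the gateway refuses to forward a call."""


@dataclass
class AgentPolicy:
    # Hypothetical per-agent policy record; field names are illustrative only.
    budget_usd: float
    allowed_tools: set[str]
    blocked_terms: set[str] = field(default_factory=set)  # stand-in for a real guardrail
    spent_usd: float = 0.0
    trace: list[dict] = field(default_factory=list)


def check_call(policy: AgentPolicy, kind: str, name: str, payload: str, est_cost_usd: float) -> None:
    """Apply tool allow-list, budget, and guardrail checks to one model or tool call."""
    if kind == "tool" and name not in policy.allowed_tools:
        decision = "deny:tool"
    elif policy.spent_usd + est_cost_usd > policy.budget_usd:
        decision = "deny:budget"
    elif any(term in payload for term in policy.blocked_terms):
        decision = "deny:guardrail"
    else:
        decision = "allow"
    # Every decision, allowed or denied, is appended to the trace.
    policy.trace.append({"kind": kind, "name": name, "decision": decision})
    if decision != "allow":
        raise PolicyViolation(decision)
    policy.spent_usd += est_cost_usd  # only forwarded calls count against the budget


policy = AgentPolicy(budget_usd=1.00, allowed_tools={"search"})
check_call(policy, "model", "planner", "plan the task", est_cost_usd=0.40)
check_call(policy, "tool", "search", "gateway benchmarks", est_cost_usd=0.0)
try:
    check_call(policy, "tool", "shell", "rm -rf /tmp/x", est_cost_usd=0.0)
except PolicyViolation as denied:
    print(f"blocked: {denied}")
```

Because allowed and denied calls both land in `trace`, the same hook doubles as the audit log. That is the practical argument for putting policy at the chokepoint rather than scattering it across agent code.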

Here's the physics the newcomers are built around. A chatbot makes roughly one model call per user turn. An agent makes hundreds to thousands of tool-call round-trips to finish a single task. Whatever overhead your gateway adds per request gets multiplied by that fan-out, so a proxy tax that's invisible behind a chat box becomes the dominant latency line in an agent loop. That single fact is why the challenger wave is written in Go and Rust, and why the performance-first challengers all benchmark against the same Python incumbent.
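The arithmetic is just multiplication, but it helps to see it with numbers. Here is a back-of-the-envelope sketch. The 10 ms and 0.1 ms per-call figures and the 1,000-call task are illustrative assumptions, not measurements of any product. It also assumes calls run sequentially; parallel tool calls would overlap some of the cost.

```python
def accumulated_overhead_ms(calls_per_task: int, overhead_ms_per_call: float) -> float:
    """Total proxy time for one task, assuming calls run one after another.

    In a sequential agent loop each round-trip pays the gateway's per-request
    tax once, so the total grows linearly with fan-out.
    """
    return calls_per_task * overhead_ms_per_call


# Illustrative inputs only, not measurements of any product.
for label, calls in [("chat turn", 1), ("agent task", 1_000)]:
    for overhead_ms in (10.0, 0.1):
        total = accumulated_overhead_ms(calls, overhead_ms)
        print(f"{label}: {calls} calls x {overhead_ms} ms = {total:,.1f} ms")
```

With those assumed inputs, a 10 ms tax costs a chat turn 10 ms but costs a 1,000-call task 10 seconds. At 0.1 ms per call, the order of magnitude Bifrost advertises, the same task pays about 100 ms.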

The incumbents: application-layer breadth

The reason to start here is coverage and speed of adoption. These wrap the most providers and give you app-layer features (keys, budgets, fallbacks, caching) with the least ceremony.

Python SDK + proxy for 100+ LLM APIs in OpenAI format, with cost tracking, guardrails, and load balancing
★ 52.5k · Python · BerriAI/litellm
Fast AI gateway routing to 1,600+ LLMs with 50+ integrated guardrails and MCP support
★ 12.3k · TypeScript · Portkey-AI/gateway
Self-hosted, single-binary OpenAI-compatible gateway for multi-provider key management, quotas, and redistribution
★ 35.5k · JavaScript · songquanpeng/one-api

LiteLLM is the default and the one everyone else measures against. Its README now literally describes the proxy as an "AI Gateway," and its topics have quietly grown to include mcp-gateway, rust, and rust-ai. That last detail is the tell: the Python incumbent appears to be bringing Rust into its own hot path, a sign that it feels the same pressure the challengers are exploiting.
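The app-layer features this camp shares (keys, budgets, fallbacks, caching) reduce to a small amount of routing logic. The Python sketch below is a toy. Its names (`MiniGateway`, `Provider`, `ProviderError`) are hypothetical, and plain callables stand in for real provider SDKs. It shows one entry point applying a per-key budget, an ordered fallback list, and an exact-match cache. It is not how LiteLLM, Portkey, or one-api are actually implemented.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical provider adapter: takes a prompt, returns text, raises ProviderError on failure.
Provider = Callable[[str], str]


class ProviderError(Exception):
    pass


class MiniGateway:
    """Toy model of app-layer gateway features: one entry point, per-key budgets,
    ordered fallbacks, and an exact-match cache. Not LiteLLM's or Portkey's API."""

    def __init__(self, routes: Dict[str, List[Tuple[str, Provider, float]]], key_budgets: Dict[str, float]):
        self.routes = routes                  # model alias -> [(provider name, adapter, USD per call)]
        self.key_budgets = dict(key_budgets)  # virtual key -> remaining USD
        self.cache: Dict[Tuple[str, str], str] = {}

    def complete(self, api_key: str, model: str, prompt: str) -> str:
        cached = self.cache.get((model, prompt))
        if cached is not None:
            return cached  # cache hit: no provider call, no spend
        errors = []
        for name, call, cost in self.routes[model]:
            if self.key_budgets.get(api_key, 0.0) < cost:
                raise PermissionError(f"budget exhausted for key {api_key!r}")
            try:
                text = call(prompt)
            except ProviderError as exc:
                errors.append(f"{name}: {exc}")
                continue  # try the next provider in the fallback order
            self.key_budgets[api_key] -= cost  # charge only the provider that answered
            self.cache[(model, prompt)] = text
            return text
        raise ProviderError("all providers failed: " + "; ".join(errors))


def rate_limited(prompt: str) -> str:
    raise ProviderError("429 rate limited")


def backup(prompt: str) -> str:
    return f"echo: {prompt}"


gw = MiniGateway({"default": [("primary", rate_limited, 0.01), ("backup", backup, 0.02)]}, {"team-a": 1.0})
print(gw.complete("team-a", "default", "hi"))  # primary fails, backup answers
```

The caller only ever sees one `complete` entry point and one model alias. Which provider actually answered, and what it cost, stays inside the gateway. That is the ergonomic win this camp sells.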

The infrastructure layer: Envoy lineage and Go/Rust hot paths

The second camp doesn't think of itself as an SDK. It thinks of itself as infrastructure — a data plane that belongs in your service mesh, with cluster-grade traffic management and overhead measured in microseconds.

Envoy/Istio-based AI-native API gateway with LLM plugins and cloud-native traffic management
CNCF, Kubernetes-native gateway built on Envoy Gateway for unified access to GenAI services
High-performance Go AI gateway with an adaptive load balancer, cluster mode, and guardrails; markets <100µs overhead at 5k RPS
★ 6.25k · Go · maximhq/bifrost
Rust AI-native proxy and data plane for agentic apps: smart LLM routing, guardrails, and observability (formerly archgw)
★ 6.6k · Rust · katanemo/plano

Bifrost's positioning — "50x faster than LiteLLM" — is marketing, but it's marketing aimed at a real seam: it's only a compelling claim because agent fan-out is what makes gateway overhead matter in the first place. Higress and Envoy AI Gateway come at it from the opposite direction — not "replace LiteLLM" but "the AI gateway is just another Envoy filter in the mesh you already run." plano (the rename of Katanemo's archgw) is the Rust data-plane bet, purpose-built for agent routing rather than retrofitted from a chat proxy.
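Bifrost's card mentions an "adaptive load balancer" without saying how it adapts. One common approach is to route each request to the backend with the lowest exponentially weighted moving average (EWMA) of recent latency and to count errors as very slow responses. The sketch below illustrates that generic technique; it is an assumption, not a description of Bifrost's algorithm. `EwmaBalancer`, `alpha`, and `error_penalty_ms` are hypothetical names.

```python
from typing import Dict, List, Optional


class EwmaBalancer:
    """Route to the backend with the lowest EWMA latency; errors count as slow responses."""

    def __init__(self, backends: List[str], alpha: float = 0.3, error_penalty_ms: float = 1_000.0):
        self.alpha = alpha                        # weight of the newest sample (0 < alpha <= 1)
        self.error_penalty_ms = error_penalty_ms  # added to the latency of a failed call
        self.ewma_ms: Dict[str, Optional[float]] = {b: None for b in backends}

    def pick(self) -> str:
        # Probe every backend once before trusting the averages.
        for backend, score in self.ewma_ms.items():
            if score is None:
                return backend
        return min(self.ewma_ms, key=lambda b: self.ewma_ms[b])

    def record(self, backend: str, latency_ms: float, ok: bool = True) -> None:
        sample = latency_ms if ok else latency_ms + self.error_penalty_ms
        prev = self.ewma_ms[backend]
        # The first sample seeds the average; later ones are blended in with weight alpha.
        self.ewma_ms[backend] = sample if prev is None else self.alpha * sample + (1 - self.alpha) * prev


lb = EwmaBalancer(["provider-a", "provider-b"])
lb.record(lb.pick(), 50.0)               # provider-a probed
lb.record(lb.pick(), 20.0)               # provider-b probed
lb.record("provider-b", 20.0, ok=False)  # a failure pushes provider-b's average up
print(lb.pick())  # -> provider-a
```

`alpha` trades responsiveness for stability. A higher value reacts faster to a degrading provider but also chases noise. Real balancers typically add periodic probing so that a backend that was slow once is not starved forever; this sketch omits that step.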

The choice stopped being "which gateway wraps the most providers" and became "which runtime and control-plane model do I want on the hot path of every tool call."

The cautionary case in the same category

It's worth naming what doesn't survive this shift. TensorZero was a Rust LLMOps stack with a gateway at its heart (~11.7k stars, real production usage), and in June 2026 its founders archived the repo and wound the company down with most of the seed unspent. The lesson that overlaps directly with this roundup: a gateway has to commit to being either the fast infrastructure layer or the deeply integrated application layer. The undifferentiated neutral middle, feature parity with what the labs now bundle for free, is exactly the position getting squeezed out.

How to actually pick

Start with your scale and your mesh. If you're a team wiring up multi-provider access and want it working this afternoon, LiteLLM or Portkey (or one-api for pure key management) is the pragmatic call — breadth and ergonomics win at moderate scale, and you can move later. If your gateway must be real infrastructure inside an existing Kubernetes/Envoy deployment, Higress or Envoy AI Gateway meet you where your ops already are. And if you're running high-fan-out agents where the proxy's per-request cost is measurable in your traces, that's the case for a performance-first data plane like Bifrost or plano — the whole reason they exist. Match the gateway to the workload, not the star count. (For the narrower feature bake-off, see LiteLLM vs Portkey vs TensorZero and, for MCP-specific routing, the MCP-gateway comparison.)
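Here is one rough way to tell whether the proxy's per-request cost is "measurable in your traces." If your tracing records a gateway span with a nested upstream span for each call, the difference between the two is the gateway's own time. The Python sketch below assumes exactly that span layout and sequential calls. The `CallSpan` fields are hypothetical, not any tracing library's schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CallSpan:
    # Hypothetical trace fields: the gateway's span for one round-trip and the
    # nested child span covering the provider or tool call it forwarded.
    gateway_ms: float
    upstream_ms: float


def proxy_share(spans: List[CallSpan], task_wall_ms: float) -> float:
    """Fraction of a task's wall-clock spent inside the gateway itself.

    Assumes calls are sequential, so per-call overheads add up directly.
    """
    overhead_ms = sum(s.gateway_ms - s.upstream_ms for s in spans)
    return overhead_ms / task_wall_ms


# Illustrative: 500 round-trips, each 12 ms at the gateway around 4 ms of upstream work.
spans = [CallSpan(gateway_ms=12.0, upstream_ms=4.0) for _ in range(500)]
print(f"{proxy_share(spans, task_wall_ms=20_000.0):.0%} of the task was proxy overhead")
```

A share that rounds to zero says breadth and ergonomics can keep winning. A share in the double digits is the workload the performance-first data planes are built for. Where you draw the line between those is a judgment call against your own latency budget.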