Every write-up of OpenAI's new chip leads with the same number: Jalapeño, unveiled with Broadcom on June 24, claims roughly 50% lower cost per inference token than current Nvidia GPUs, at performance comparable to Blackwell and better performance-per-watt. It is a clean, quotable, Nvidia-rattling figure. It is also self-reported, measured on OpenAI's own chosen workloads, with no disclosed baseline and no independent test behind it. So set the number aside for a moment. The interesting thing about Jalapeño is not that it might be cheaper. It's why it can be — and what OpenAI had to give up to get there.
What the chip actually is
Physically, Jalapeño is unglamorous in a way that tells you its whole philosophy. It's a single reticle-sized ASIC on TSMC's 3nm node, with eight HBM stacks arranged on a silicon interposer around the compute die — 2.5D packaging whose only job is to shorten the distance between memory and math, because that distance is where most inference latency and energy quietly disappear. Inside, it's a systolic array: a grid of processing elements that hand data cell-to-cell in lockstep, which is close to the ideal shape for the dense matrix multiplications that dominate serving a transformer.
That's not a general-purpose accelerator. A systolic array is good at exactly one thing and correspondingly bad at the irregular, branch-heavy, variable-shape compute that shows up in training. Which is the point, and also the tell: this is an inference-only chip. OpenAI's frontier-model training stays on Nvidia GPUs, and will for a long time, because training is still the part of the workload that keeps changing. Jalapeño is aimed squarely at the part that doesn't.
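To make "hand data cell-to-cell in lockstep" concrete, here is a minimal Python sketch of a systolic array computing a matrix product. It is a teaching toy under stated assumptions: the function name `systolic_matmul` and the output-stationary dataflow (each cell keeps one output element while inputs stream past) are chosen for illustration, and nothing here is drawn from Jalapeño's actual design.

```python
def systolic_matmul(A, B):
    """Simulate an output-stationary systolic array computing A @ B.

    Hypothetical illustration only. Cell (i, j) accumulates C[i][j].
    Values of A enter at the left edge and move one cell right per cycle;
    values of B enter at the top edge and move one cell down per cycle.
    """
    n, depth, m = len(A), len(B), len(B[0])
    a_reg = [[0] * m for _ in range(n)]  # A value currently held by each cell
    b_reg = [[0] * m for _ in range(n)]  # B value currently held by each cell
    acc = [[0] * m for _ in range(n)]    # running sum kept inside each cell

    # The last cell (n-1, m-1) finishes its final multiply at cycle n+m+depth-3.
    for t in range(n + m + depth - 2):
        # Shift A right. Row i is fed with a delay of i cycles (the "skew").
        for i in range(n):
            for j in range(m - 1, 0, -1):
                a_reg[i][j] = a_reg[i][j - 1]
            k = t - i
            a_reg[i][0] = A[i][k] if 0 <= k < depth else 0
        # Shift B down. Column j is fed with a delay of j cycles.
        for j in range(m):
            for i in range(n - 1, 0, -1):
                b_reg[i][j] = b_reg[i - 1][j]
            k = t - j
            b_reg[0][j] = B[k][j] if 0 <= k < depth else 0
        # Every cell performs the same multiply-accumulate, with no branching.
        for i in range(n):
            for j in range(m):
                acc[i][j] += a_reg[i][j] * b_reg[i][j]
    return acc


print(systolic_matmul([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```

Every cell does the identical operation on every cycle and only ever talks to its neighbors. That is why the layout is cheap in wiring and energy for dense matrix math, and why work that does not tile neatly onto the grid leaves cells idle.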
An ASIC is a bet that your workload has stopped moving
Here's the framing the cost headline buries. A GPU sells you flexibility — one chip that runs whatever model architecture you throw at it next quarter. An ASIC sells you efficiency — one chip that runs today's serving pattern extraordinarily well and can't be repurposed when that pattern shifts. Choosing an ASIC over a GPU is therefore not mainly a performance decision. It's a bet that your workload is stable enough to freeze into silicon 18 to 24 months before it runs.
Most companies can't make that bet honestly, because they don't control their own architecture. If you're serving someone else's open-weights model, or your serving loop still churns every few months, a general-purpose GPU is insurance — you're paying a premium in silicon area, power, and price for the option to change your mind. That premium is rational right up until you stop needing it.
A GPU is insurance against your own roadmap. An ASIC is the decision to stop paying the premium — and OpenAI can cancel the policy early because it writes the roadmap.
This is the actual reason OpenAI, and not you, gets to build Jalapeño first. The chip was designed "from scratch around its deep understanding of LLM fundamentals, informed by its roadmap of models, kernels, serving systems, and product needs." Read that as: OpenAI doesn't have to predict where its inference workload is going. It gets to decide. The systolic layout and the memory topology are a cast taken directly from OpenAI's own serving loop. Co-design is what shrinks the risk window on the ASIC bet from "guess the industry" to "match yourself" — and it's also, plausibly, where a real chunk of that 50% comes from. The saving isn't a process-node miracle; it's the co-design dividend of never building capability you won't use.
The dependency doesn't vanish; it moves inside
The tidy narrative is "OpenAI escapes Nvidia," and on that telling OpenAI joins a familiar list: Google has TPUs, Amazon has Trainium and Inferentia, Apple has its own silicon. But there's a difference worth naming. Those chips grew out of companies that already ran infrastructure businesses; Jalapeño is a first chip from a company that has never shipped silicon, taped out in a claimed-record nine months with OpenAI's own models helping optimize the design. Ambitious, and unproven in production at fleet scale.
More importantly, vertical integration doesn't delete a dependency — it relocates it. Buying Nvidia is an external supplier dependency: you're exposed to someone else's roadmap, pricing, and allocation. Building Jalapeño converts that into an internal coupling dependency: the chip is only as valuable as OpenAI's discipline in keeping its next model architecture from drifting away from the fixed-function silicon it already committed to. If a future model wants an attention variant or a sparsity pattern the systolic array serves poorly, the cost advantage erodes — and unlike a GPU swap, you can't just buy your way out next quarter. That's the wager. It's a good one for a lab that controls both ends and has enough steady inference volume to amortize the part. It would be a terrible one for almost anyone who doesn't.
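The erosion described above can be put into rough arithmetic. The sketch below assumes, purely for illustration, that the claimed 50% figure holds when the workload fits the silicon perfectly, that cost per token scales inversely with the fraction of peak throughput a workload actually achieves, and that the GPU baseline is unaffected by the same drift. The names `relative_cost` and `fit` are hypothetical, and none of the fit values are measurements.

```python
def relative_cost(claimed_ratio: float, fit: float) -> float:
    """ASIC cost per token as a multiple of the GPU's cost per token.

    claimed_ratio: ASIC/GPU cost ratio at perfect fit (0.5 for a "50% lower" claim).
    fit: assumed fraction of peak throughput the current model achieves on the
         fixed-function chip, in (0, 1]. A value of 1.0 means no architectural drift.
    """
    if not 0.0 < fit <= 1.0:
        raise ValueError("fit must be in (0, 1]")
    # Same chip and same power bill, fewer tokens out: cost per token rises as 1 / fit.
    return claimed_ratio / fit


# Illustrative fit values only.
for fit in (1.0, 0.8, 0.6, 0.5, 0.4):
    print(f"fit={fit:.1f}  ASIC cost vs GPU: {relative_cost(0.5, fit):.2f}x")
```

Under these assumptions the break-even point equals the claimed ratio itself: a chip that is half the cost at perfect fit loses its entire advantage once a new architecture runs at half of peak. Real drift would be messier, but the shape of the risk is the same.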
So the number to remember isn't 50%. It's the structure: own the model, own the chip, own the demand — and accept that you now have to keep all three pointed the same direction. Jalapeño isn't Nvidia's obituary. It's a demonstration that at sufficient scale, inference has become predictable enough to hard-wire — and that the moat is no longer the chip. It's the coupling.



