For three years the open-weight pitch came with an asterisk. The models were free to download and a generation behind, and the honest advice was that if the work mattered you paid OpenAI or Anthropic. In mid-June, Z.ai shipped GLM-5.2 and the asterisk got a lot smaller — not because an open model topped a general leaderboard, but because it pulled even on the one axis where the bill hurts most.

GLM-5.2 scores 74.4% on FrontierSWE, the long-horizon software-engineering benchmark. Claude Opus 4.8 sits at 75.1%. GPT-5.5 sits at 72.6%. So an MIT-licensed model you can download is now wedged between the two flagship closed models on agentic coding, beating one and trailing the other by under a point. That is the sentence everyone screenshotted. It is also the least useful number in the release.

The benchmark gap is noise; the price gap is structural

A 0.7-point difference on a benchmark with a handful of points of run-to-run variance is, operationally, zero. If GLM-5.2 and Opus traded places next month nobody's production agent would notice. Treat the FrontierSWE line as "all three are in the same class" and move on, because the number sitting next to it is not noisy at all.
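One way to see why a sub-point gap is hard to take seriously is to look at sampling error alone, which is a related but separate source of noise from the run-to-run variance described above. The sketch below treats each score as a pass rate over a fixed task set and computes the binomial standard error. The task count `N_TASKS` is a hypothetical placeholder, not the real size of FrontierSWE, so the output shows only the order of magnitude.

```python
import math

def score_standard_error(score_pct: float, n_tasks: int) -> float:
    """Binomial standard error of a pass rate, in percentage points."""
    p = score_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n_tasks)

# Scores quoted in the article; N_TASKS is a hypothetical benchmark size.
N_TASKS = 300
glm, opus = 74.4, 75.1

gap = opus - glm
# Standard error of the difference between two independent pass rates.
se_gap = math.hypot(score_standard_error(glm, N_TASKS),
                    score_standard_error(opus, N_TASKS))
print(f"gap = {gap:.1f} points, standard error of gap = {se_gap:.1f} points")
```

Under that assumption the standard error of the gap is several times larger than the gap itself. A larger task set would shrink it, but the set would need to be far bigger before 0.7 points stood out.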

GLM-5.2's standalone API is $1.40 per million input tokens and $4.40 per million output. GPT-5.5 is $5 and $30. Claude Opus 4.8 is $5 and $25. At an even input/output mix that is roughly one-sixth the blended cost of GPT-5.5, narrowing to about one-quarter for input-heavy agent workloads, and it is not a launch-week promotion. It's the structural consequence of a sparse mixture-of-experts model that activates about 40B of its 753B parameters per token, plus the margin compression that open weights force on anyone reselling them.

The benchmark gap to Opus is under a point. The price gap is between roughly four and six to one, depending on the token mix. Only one of those numbers changes what you'll actually run.
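How large the price gap looks depends on how many of your tokens are input versus output. The sketch below uses only the list prices quoted above. The input shares in the loop are hypothetical mixes chosen for illustration, and `blended_price` and `cost_ratio` are helper names invented for this example.

```python
# Per-million-token list prices quoted in the article (USD): (input, output).
PRICES = {
    "GLM-5.2": (1.40, 4.40),
    "GPT-5.5": (5.00, 30.00),
    "Claude Opus 4.8": (5.00, 25.00),
}

def blended_price(model: str, input_share: float) -> float:
    """Average USD per million tokens when `input_share` of tokens are input."""
    price_in, price_out = PRICES[model]
    return input_share * price_in + (1.0 - input_share) * price_out

def cost_ratio(model: str, input_share: float, baseline: str = "GLM-5.2") -> float:
    """How many times more expensive `model` is than `baseline` at this mix."""
    return blended_price(model, input_share) / blended_price(baseline, input_share)

for share in (0.50, 0.75, 0.95):  # hypothetical input shares
    for model in ("GPT-5.5", "Claude Opus 4.8"):
        print(f"input share {share:.0%}: {model} costs "
              f"{cost_ratio(model, share):.1f}x GLM-5.2")
```

The ratio is largest for output-heavy traffic, where the closed models' output prices dominate, and shrinks toward the plain input-price ratio of about 3.6x as input takes over.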

Why this lands on coding specifically

A price cut of that size would be nice anywhere. It is decisive in agentic coding because coding is the most token-hungry thing an LLM does. A single chat turn pays for one model call. A coding agent working a real ticket re-reads the repository, ingests file after file, swallows the stdout of every test run and every failed build, and re-reads its own prior reasoning on each step. That adds up to hundreds of model calls where the input context balloons toward the limit and stays there. Input tokens, not output, dominate that bill, and they compound across the loop.
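The compounding can be made concrete with a toy model of an agent loop. It assumes every call re-sends the full accumulated context, capped at the context limit. The step count and token sizes are hypothetical, and `agent_loop_tokens` is an illustrative helper, not any real harness's accounting.

```python
def agent_loop_tokens(steps: int, base_context: int,
                      added_per_step: int, output_per_step: int,
                      context_limit: int) -> tuple[int, int]:
    """Return (total_input_tokens, total_output_tokens) for a simple agent loop.

    Each call re-sends everything accumulated so far, capped at the context limit.
    """
    total_in = total_out = 0
    context = base_context
    for _ in range(steps):
        total_in += min(context, context_limit)
        total_out += output_per_step
        # The model's own output plus new tool results join the next prompt.
        context += output_per_step + added_per_step
    return total_in, total_out

# All workload numbers here are hypothetical, chosen only for illustration.
tokens_in, tokens_out = agent_loop_tokens(
    steps=200, base_context=30_000, added_per_step=4_000,
    output_per_step=500, context_limit=1_000_000)
print(f"input: {tokens_in:,}  output: {tokens_out:,}  "
      f"input share: {tokens_in / (tokens_in + tokens_out):.1%}")
```

Because the context grows linearly with each step, total input grows roughly with the square of the step count while output grows only linearly. That is why the input price ends up setting the bill.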

This is exactly where a cheaper input token stops being a rounding error and starts being the line item. It's also why two of GLM-5.2's quieter specs matter more than its SWE score: the 1M-token context, which lets the agent hold a large codebase without constant re-retrieval, and the $0.26-per-million cached-input rate, which is the real lever for a loop that re-sends the same system prompt and repo snapshot thousands of times. If you've ever watched agent token costs and noticed that prompt caching is most of your savings, you already know that the cached-input price tag tells you more about your monthly bill than the headline benchmark ever will.
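A rough sketch of what the cached rate does to the input bill, using the two GLM-5.2 prices quoted above. The call count, prefix size, and hit rate are hypothetical. The model also assumes there is no separate cache-write fee, which real providers may charge, so treat the output as an upper bound on the savings.

```python
PRICE_INPUT = 1.40         # USD per million uncached input tokens (from the article)
PRICE_CACHED_INPUT = 0.26  # USD per million cached input tokens (from the article)

def input_cost(calls: int, prefix_tokens: int, fresh_tokens: int,
               cache_hit_rate: float) -> float:
    """USD spent on input across `calls` requests sharing a common prefix.

    `cache_hit_rate` is the fraction of prefix tokens billed at the cached rate.
    """
    prefix_total = calls * prefix_tokens
    cached = prefix_total * cache_hit_rate
    uncached = prefix_total - cached + calls * fresh_tokens
    return (cached * PRICE_CACHED_INPUT + uncached * PRICE_INPUT) / 1_000_000

# Hypothetical loop: 1,000 calls, 200k-token shared prefix, 5k fresh tokens each.
no_cache = input_cost(1_000, 200_000, 5_000, cache_hit_rate=0.0)
with_cache = input_cost(1_000, 200_000, 5_000, cache_hit_rate=0.95)
print(f"no cache: ${no_cache:,.2f}  with cache: ${with_cache:,.2f}")
```

With these made-up numbers, caching cuts the input spend to about a quarter of the uncached figure, a bigger swing than the gap between any two of the list prices.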

What it doesn't change

Two cautions, because "open model wins" is a genre with a bad accuracy record. First, these are largely vendor-reported figures on benchmarks the vendor chose to highlight; independent re-runs from outfits like Artificial Analysis routinely come in below launch-day numbers, and you should validate on your own repository before you migrate anything. Coding ability is wildly task-dependent, and a model that ties on FrontierSWE can still lose badly on your stack.

Second, "open weight" is not "free." At 40B active and 753B total, self-hosting GLM-5.2 wants something like an 8×H200 node even with 8-bit weights, because every expert has to sit in memory whether or not a given token uses it. That is real capital, and it only pencils out against the API at high, steady volume. For almost everyone the move isn't to rack GPUs; it's to point the same coding-agent harness at a $1.40 endpoint instead of a $5 one and watch the input side of the bill fall by roughly 3.5x.
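A back-of-the-envelope sizing check shows why the hardware bar is high. It counts weights only, ignoring KV cache and activations, and assumes 141 GB of HBM per H200. Sparse activation reduces compute per token, not the memory needed to hold all 753B parameters.

```python
TOTAL_PARAMS = 753e9   # total parameters, from the article
H200_HBM_GB = 141      # HBM capacity of one NVIDIA H200, in GB
GPUS_PER_NODE = 8

def weights_gb(params: float, bits_per_param: int) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return params * bits_per_param / 8 / 1e9

node_gb = H200_HBM_GB * GPUS_PER_NODE
for bits in (16, 8, 4):
    need = weights_gb(TOTAL_PARAMS, bits)
    verdict = "fits" if need < node_gb else "does not fit"
    print(f"{bits}-bit weights: {need:,.0f} GB vs {node_gb:,} GB node -> {verdict}")
```

Real deployments also need headroom for the KV cache, which at long contexts can be substantial, so a verdict of "fits" here is necessary rather than sufficient.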

The story of the last three years was that you paid closed-model prices for closed-model quality and the open option was for hobbyists and the cost-desperate. GLM-5.2 is the first time, on the workload that burns the most tokens, that the open option gives up nothing measurable on quality and still wins on price. The leaderboard will keep churning. The thing that actually changed is that the cheap column and the good column are now, for coding agents, the same column.