Anthropic shipped Claude Sonnet 5 on June 30 with a headline that reads like a gift: near-Opus agentic performance at Sonnet money. The rate card confirms it. Introductory pricing is $2 / $10 per million input/output tokens through August 31; on September 1 it settles to $3 / $15 — the exact standard rate Sonnet 4.6 already charged. Same card, much better model. Swap and save.
Except the rate card is measuring the wrong thing.
The meter changed, not just the price
Sonnet 5 ships with a new tokenizer. That sounds like plumbing, but it's the one change that quietly rewrites your bill. Multiple production teardowns report that the new tokenizer emits roughly 1.0x to 1.35x as many tokens for the same text as 4.6 did, and the inflation isn't uniform. It concentrates in code, structured data like JSON and tables, and non-English text.
Look at that list again. Code, structured payloads, tool schemas, non-English strings. That's not an edge case for an agent — that is the agent's workload. A coding loop or an extraction pipeline lives in exactly the token classes the new tokenizer inflates most.
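One way to check how much of this applies to your own traffic is to count the same sample texts under both tokenizers and compare totals per content class. The sketch below is a minimal illustration, not a vendor API: `count_old` and `count_new` are hypothetical callables that you would back with whatever token counting each model version exposes, and the class labels are your own.

```python
from collections import defaultdict
from typing import Callable, Iterable


def inflation_by_class(
    samples: Iterable[tuple[str, str]],
    count_old: Callable[[str], int],
    count_new: Callable[[str], int],
) -> dict[str, float]:
    """Return the new/old token ratio for each content class.

    `samples` yields (content_class, text) pairs, e.g. ("code", "...").
    `count_old` and `count_new` are hypothetical token counters for the
    old and new model; plug in real counters for your environment.
    """
    old_totals: dict[str, int] = defaultdict(int)
    new_totals: dict[str, int] = defaultdict(int)
    for content_class, text in samples:
        old_totals[content_class] += count_old(text)
        new_totals[content_class] += count_new(text)
    # Skip classes with zero old tokens to avoid dividing by zero.
    return {
        c: new_totals[c] / old_totals[c]
        for c in old_totals
        if old_totals[c] > 0
    }


if __name__ == "__main__":
    # Toy stand-in counters, for demonstration only. They are NOT real tokenizers.
    def toy_old(text: str) -> int:
        return len(text.split())

    def toy_new(text: str) -> int:
        return len(text.split()) + text.count("{")

    demo = [("json", '{ "a": 1 }'), ("prose", "plain words only")]
    print(inflation_by_class(demo, toy_old, toy_new))
```

A ratio near 1.0 for a class means the tokenizer change barely touches it; ratios well above 1.0 mark the parts of your workload where the bill will move.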
Billing is per token. If the tokenizer splits the same job into more tokens, your cost rises even though your workload didn't change a byte.
This is the part the rate card can't show you. "$3 per million tokens" is only a stable price if a million tokens means the same amount of work across model versions. Change the tokenizer and that assumption breaks: the unit got smaller, so buying the same task now costs more units at the same price per unit. The number on the card held still while the thing it counts shrank.
Why the launch still feels cost-neutral
Here's the sleight of hand, and it's worth naming because it's genuinely clever pricing rather than an accident. The introductory $2 / $10 sits a third below the standard $3 / $15, and that discount roughly cancels the token inflation: even at the reported high end, two-thirds of the rate times 1.35x the tokens comes to about 0.9x the old bill. So today, migrating a real workload from Sonnet 4.6 to Sonnet 5 lands close to cost-neutral, and you get a much stronger model for about what you were paying. That's a great deal, and it's real.
The trap is the calendar. On September 1 two things happen at once: the per-token rate rises ~50% (from $2/$10 to $3/$15), and the tokenizer keeps inflating your token count exactly as it does now. Stack a rate increase on top of a count increase and the effective cost per task is reported to land 20–35% above where Sonnet 4.6 sat — while the rate card still prints the same $3/$15 it always did. Nobody sends you a price-increase email, because on paper there wasn't one.
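The arithmetic behind both dates can be sketched in a few lines. This is a simplified model, not a billing formula from the vendor: it assumes the tokenizer inflates input and output tokens by the same factor, and the per-task token counts are hypothetical placeholders. Only the rates come from the rate card quoted above.

```python
# Per-million-token rates (input, output) as quoted in the article.
SONNET_46 = (3.00, 15.00)
SONNET_5_INTRO = (2.00, 10.00)      # through August 31
SONNET_5_STANDARD = (3.00, 15.00)   # from September 1


def task_cost(input_tokens, output_tokens, rates, inflation=1.0):
    """Dollar cost of one task.

    `inflation` scales both token counts; applying one factor to input
    and output alike is a simplifying assumption.
    """
    in_rate, out_rate = rates
    billed_in = input_tokens * inflation
    billed_out = output_tokens * inflation
    return (billed_in * in_rate + billed_out * out_rate) / 1_000_000


def relative_cost(rates, inflation, input_tokens=40_000, output_tokens=4_000):
    """Cost on the new model relative to Sonnet 4.6 for the same task.

    The default token counts are hypothetical. Because both rates scale
    by the same ratio, the result does not depend on the input/output mix.
    """
    new = task_cost(input_tokens, output_tokens, rates, inflation)
    old = task_cost(input_tokens, output_tokens, SONNET_46)
    return new / old


if __name__ == "__main__":
    for inflation in (1.0, 1.2, 1.35):
        intro = relative_cost(SONNET_5_INTRO, inflation)
        standard = relative_cost(SONNET_5_STANDARD, inflation)
        print(f"inflation {inflation:.2f}x: intro {intro:.2f}x, standard {standard:.2f}x")
```

Under this model, 1.35x inflation gives about 0.90x the Sonnet 4.6 cost at introductory rates and 1.35x at standard rates. The reported 20 to 35% increase corresponds to inflation factors of 1.20 to 1.35.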
The number that lies, and the one that doesn't
The capability case for Sonnet 5 is strong and separate from all this. It reportedly scores 63.2% on SWE-bench Pro against Opus 4.8's 69.2%, and actually beats Opus on Terminal-Bench 2.1 (80.4% vs 74.6%), at a fraction of flagship price — the tier-inversion we covered in Sonnet 5 vs Opus 4.8 for agents and, one generation over, in Gemini 3 Flash vs Pro. For most agent loops it's the right default. This piece is not an argument to use it less.
It's an argument to stop trusting $/million tokens as a cost forecast. The moment a vendor changes tokenization, the per-token rate stops being comparable across versions — it's quoting you a price in a unit that just changed size. The only figure that survives a tokenizer swap is cost per completed task, measured on your own traffic — the discipline we make the case for in cost-aware agent evaluation: run a representative sample of real jobs through both models, then divide total spend by the tasks that actually succeeded. That one number folds in tokenizer inflation, retries, and reasoning overhead, and it's the only one your finance team can act on.
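As a rough sketch of that metric, assuming you already log billed token counts and a success flag per task: the `TaskRun` record and its field names are hypothetical, and the rates are passed in so the same function works for either model.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class TaskRun:
    """One attempted task from a sample of real traffic (hypothetical schema)."""
    input_tokens: int    # all billed input tokens, summed over retries
    output_tokens: int   # all billed output tokens, summed over retries
    succeeded: bool      # did the task meet your own acceptance check?


def cost_per_completed_task(
    runs: Iterable[TaskRun],
    input_rate: float,
    output_rate: float,
) -> float:
    """Total spend divided by the number of tasks that succeeded.

    Rates are dollars per million tokens. Failed tasks still add to
    spend but not to the denominator, so failures raise the figure.
    """
    runs = list(runs)
    total_spend = sum(
        r.input_tokens * input_rate + r.output_tokens * output_rate
        for r in runs
    ) / 1_000_000
    completed = sum(1 for r in runs if r.succeeded)
    if completed == 0:
        raise ValueError("no successful tasks in the sample")
    return total_spend / completed


if __name__ == "__main__":
    sample = [
        TaskRun(input_tokens=50_000, output_tokens=5_000, succeeded=True),
        TaskRun(input_tokens=80_000, output_tokens=9_000, succeeded=False),
        TaskRun(input_tokens=45_000, output_tokens=4_000, succeeded=True),
    ]
    print(f"${cost_per_completed_task(sample, 3.00, 15.00):.4f} per completed task")
```

Run the same sample of jobs through both models, compute this figure for each with that model's rates, and compare the two results rather than the rate cards.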
The rate card is a marketing surface. Your task-cost dashboard is the meter. After September 1, they will disagree — and the meter will be right.



