For eighteen days in June, the most capable model Anthropic had ever shipped simply did not exist for most of the planet. On June 12 the US government issued an export-control directive suspending all foreign-national access to Claude Fable 5 and its research-grade sibling Mythos 5 — inside the United States and out, employees included — and Anthropic complied by turning them off. On June 30 the controls came off. On July 1 Fable 5 came back. And tucked into the return was something more durable than any single model: a proposal, co-signed by Anthropic, Amazon, Microsoft, Google, and dozens of other partners, for how the industry should score a jailbreak.
That second thing is the story. The outage will be forgotten by August. The rubric is trying to make sure the next one doesn't happen the same way.
What actually broke
The trigger was narrow and specific. Amazon researchers, testing Fable under one of Anthropic's own partner programs, found a way to bypass its safeguards: prompt the model to comb a codebase for software vulnerabilities, and in at least one case, get it to emit proof-of-concept code demonstrating how a flaw could be exploited. Amazon escalated. The finding reached the White House. The government, citing national security — and, per multiple accounts, a suspicion that a China-linked group had already touched the Mythos weights — pulled the export lever.
Here is the part worth slowing down on. Anthropic did not think this was a serious jailbreak. Its public position was that the bypass is narrow, non-universal, amounts to little more than asking a capable model to read code and point at flaws, and produces nothing you couldn't also coax out of other public frontier models — OpenAI's GPT-5.5 among them. The administration, through AI adviser David Sacks, saw a model that would write an operable cyber weapon on request and found it "difficult to fathom" how that could be called anything but serious. Sacks said the export control was issued reluctantly, after Anthropic declined to patch or de-deploy on request, and that the ball was in Anthropic's court.
The two sides weren't disagreeing about the exploit. They were disagreeing about the adjective. And there was no scale to settle it.
Strip out the personalities and what you have is a vocabulary failure. One party said "not serious." The other said "cyber weapon." Both were describing the same technique. Neither had a shared instrument to convert the technique into a number both could argue over — so the disagreement didn't get adjudicated on its merits. It got adjudicated by the bluntest tool in the drawer, an export-control order, under political timelines rather than technical ones. A model vanished for eighteen days because "how bad is it" had no agreed answer.
The rubric is CVSS for jailbreaks
So Anthropic and its Project Glasswing partners — the 45-plus-organization consortium, AWS and Google and Microsoft among them, originally assembled to point frontier models at open-source security bugs — put forward a four-axis severity framework. A jailbreak gets scored on:
- Capability gain — how far it takes an attacker beyond the tools they already have.
- Breadth — how many different attacks the same trick unlocks.
- Ease of weaponization — how much skill and effort it takes to turn the output into a real attack.
- Discoverability — how easy the technique is to find or copy independently.
If that shape looks familiar, it should. This is CVSS for model exploits — the same move the vulnerability world made two decades ago, when "this bug is bad" gave way to a vector string that a vendor, a customer, and a regulator could all read the same way. The value of CVSS was never that it was correct in some cosmic sense; reasonable people still argue about individual scores. Its value was liquidity: it turned a thousand bespoke arguments into one argument conducted in shared units.
That is what the jailbreak rubric is reaching for, and it is why calling it a "safety" measure slightly misses the point. It does not make Fable 5 safer. It makes the disagreement about Fable 5 legible. Run the actual episode through the four axes and you can see exactly where it would have fractured — the table below is that reading. Breadth: low, by Anthropic's account. Capability gain: contested, because reproducibility on other models is doing real work in the argument. Ease of weaponization: high, because someone produced running exploit code, and that is the fact the government could not get past. Discoverability: effectively maxed, because the technique was already out. Two of four axes point at "manageable"; two point at "act now." A shared rubric doesn't magically resolve that split — but it forces both sides to name which axis they're fighting about, instead of trading adjectives.
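To make the "name the axis" point concrete, here is a minimal sketch of how a four-axis score could be carried around as a CVSS-style vector string. Everything in it is an assumption made for illustration: the `JailbreakScore` type, the three-level scale, the `CG/B/EW/D` abbreviations, and both example readings are hypothetical, and the proposal as described here does not specify a scale or a string format.

```python
from dataclasses import dataclass, fields
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical three-step scale; the proposal does not define one."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3


# Hypothetical abbreviations for the vector string, in the spirit of CVSS.
ABBREVIATIONS = {
    "capability_gain": "CG",
    "breadth": "B",
    "ease_of_weaponization": "EW",
    "discoverability": "D",
}


@dataclass(frozen=True)
class JailbreakScore:
    capability_gain: Level
    breadth: Level
    ease_of_weaponization: Level
    discoverability: Level

    def vector(self) -> str:
        """Render the score as a compact string, one letter per axis level."""
        return "/".join(
            f"{ABBREVIATIONS[f.name]}:{getattr(self, f.name).name[0]}"
            for f in fields(self)
        )


def disputed_axes(a: JailbreakScore, b: JailbreakScore) -> list[str]:
    """Return the axes on which two scorers disagree."""
    return [f.name for f in fields(a) if getattr(a, f.name) != getattr(b, f.name)]


# Illustrative readings only, loosely following the episode as described above.
lab_reading = JailbreakScore(Level.LOW, Level.LOW, Level.HIGH, Level.HIGH)
government_reading = JailbreakScore(Level.HIGH, Level.LOW, Level.HIGH, Level.HIGH)

print(lab_reading.vector())         # CG:L/B:L/EW:H/D:H
print(government_reading.vector())  # CG:H/B:L/EW:H/D:H
print(disputed_axes(lab_reading, government_reading))  # ['capability_gain']
```

The useful output is the last line. Two parties can hold different overall verdicts while agreeing on three of the four axes, and a shared vector makes the one real dispute visible.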
What Anthropic actually gave up
The rubric came bundled with terms, and the terms are the more honest signal of where control now lives. To bring Fable 5 back, Anthropic layered on defense in depth: a new classifier that blocks the Amazon technique in over 99% of cases and quietly reroutes flagged requests to Opus 4.8; a deliberately wider safety margin that will refuse some benign requests as the cost of catching the bad ones; pre-release government access and evaluation; rapid safeguard information-sharing; round-the-clock monitoring of jailbreak submissions; and a public HackerOne channel. Fable 5 returned throttled to half its usual weekly usage limits through July 7. Mythos 5, which runs without those classifiers, did not return to the open at all; it stays walled off to approved US organizations.
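The trade-off in that list, a wider margin that catches more bad requests at the cost of some benign ones, is easy to see in a toy router. This is a sketch under stated assumptions, not a description of Anthropic's classifier: the risk score, the thresholds, the `route` function, and the model identifier strings are all hypothetical.

```python
from dataclasses import dataclass

PRIMARY_MODEL = "fable-5"    # hypothetical identifier
FALLBACK_MODEL = "opus-4.8"  # hypothetical identifier


@dataclass(frozen=True)
class RoutingDecision:
    model: str
    flagged: bool


def route(risk_score: float, threshold: float) -> RoutingDecision:
    """Send a request to the fallback model when its risk score meets the threshold.

    risk_score is assumed to come from some upstream classifier and to lie in [0, 1].
    """
    if not 0.0 <= risk_score <= 1.0:
        raise ValueError("risk_score must be between 0 and 1")
    flagged = risk_score >= threshold
    return RoutingDecision(FALLBACK_MODEL if flagged else PRIMARY_MODEL, flagged)


# A borderline but benign request, scored 0.4 by the assumed classifier.
borderline = 0.4

print(route(borderline, threshold=0.5).model)  # fable-5
print(route(borderline, threshold=0.3).model)  # opus-4.8
```

Lowering the threshold is what "a wider safety margin" means in this toy version: the same borderline request that passed at 0.5 is rerouted at 0.3.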
Read that list again and notice what's carrying the weight. It isn't the safety training baked into the model. It's the perimeter around who is allowed to hold the model, and under what monitoring. Anthropic has been drifting toward consortium access as the real control for months; the Fable episode makes it explicit. The capable weights sit behind a membership gate, and the gate — not the model's refusals — is what the government is negotiating over. This is the same logic, pointed inward, as the chip-location mandates working their way through Congress: when the artifact is too dangerous to trust on its own recognizance, control migrates to the channel around it.
Why a developer should care about a policy document
Because model availability is now a dependency, and dependencies fail. If your agents call Fable 5 — or Opus 4.8, or any frontier model that could plausibly write exploit code on a bad day — the Fable episode is your outage postmortem written in advance. A single contested safety finding, escalated with no shared way to price it, took a production-grade model off the board globally for the better part of three weeks. That is not a reward-hacking benchmark result you can note and route around. It's your provider going dark.
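Treating the model as a dependency that can fail looks roughly like this in practice. The sketch below assumes a hypothetical `ModelUnavailable` exception and stand-in provider functions; a real client library will have its own error types and call signatures, which you would substitute.

```python
from typing import Callable, Sequence


class ModelUnavailable(Exception):
    """Hypothetical error raised when a provider cannot serve a model."""


Provider = tuple[str, Callable[[str], str]]


def complete_with_fallback(prompt: str, providers: Sequence[Provider]) -> tuple[str, str]:
    """Try each provider in order and return (provider_name, completion).

    Raises ModelUnavailable, listing every failure, if all providers fail.
    """
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ModelUnavailable as exc:
            errors.append(f"{name}: {exc}")
    raise ModelUnavailable("; ".join(errors))


# Stand-in providers for illustration; neither calls a real API.
def fable_stub(prompt: str) -> str:
    raise ModelUnavailable("access suspended")


def backup_stub(prompt: str) -> str:
    return f"echo: {prompt}"


name, text = complete_with_fallback(
    "summarize this diff",
    [("fable-5", fable_stub), ("backup-model", backup_stub)],
)
print(name, "|", text)  # backup-model | echo: summarize this diff
```

The ordering of the provider list is the policy decision. The code only guarantees that an outage at the top of the list degrades the service instead of stopping it.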
A severity standard is, unglamorously, the thing standing between you and the next version of that. If the four axes hold — if the next Amazon-grade finding gets scored instead of shouted about — the disagreement gets resolved in technical units on a technical clock, and maybe the model stays up while the argument runs. If they don't hold, we do this again the next time a lab and a government reach for different adjectives. The rubric is not a safety win. It's an attempt to make the governance of these models behave less like a coin flip. For anyone building on top of them, that is the more valuable of the two.



