You ask a model for a JSON object and it returns one wrapped in an apology, or with a trailing comma, or with the field named user_name when your parser wants username. So you reach for "structured output" — and immediately hit a confusion that costs teams real production incidents: the phrase bundles together three completely different guarantees, and most of the tooling only delivers one of them.

The three levels are worth naming, because the entire decision falls out of them:

Here is the thesis, and it's the thing to hold onto: the strong methods give you the first two for free, by construction — and not one of them, ever, gives you the third.

What each method actually promises

JSON mode is the weakest and the most misunderstood. OpenAI's response_format: {type: "json_object"} forces output that parses as a JSON object — and nothing more. It does not know your schema. It will happily return a valid object with the wrong fields, the wrong nesting, or (its classic footgun, if you don't say "JSON" in the prompt) an unbounded stream of whitespace until it hits the token limit. JSON mode buys syntax. That's the whole contract.

Function / tool calling was, for a long time, the real way to get structured data — especially from Claude, which historically had no JSON mode at all and whose recommended pattern was to define a tool with an input_schema and force the model to call it, making the tool's arguments your object. This adds schema shape: you describe the fields and types, and the model fills them in. But a tool schema describes structure, not truth — the model can still pass a wrong-but-valid enum or a hallucinated number into a perfectly well-formed argument.

Strict Structured Outputs is the current state of the art on the closed APIs. OpenAI shipped it in August 2024 (response_format: {type: "json_schema", strict: true}); Anthropic and Google now offer equivalents. The promise is hard schema conformance, and OpenAI's own benchmark frames the jump: a model with Structured Outputs followed a complex schema ~100% of the time, versus under 40% without. That is not a better prompt. It's a different mechanism.

They all do the same trick

The mechanism is the part worth understanding, because it explains both the power and the limits. Strict Structured Outputs, Claude's structured outputs, and the open-source constrained decoders all do the identical thing: they compile your schema into a grammar (a context-free grammar or finite-state machine), and then, at every single decoding step, they mask the model's next-token distribution so that any token which would violate the grammar gets probability zero. The model literally cannot emit a non-conforming token. That's why the guarantee is "by construction" rather than "by training" — it's enforced at sampling time, not hoped for in the weights.

A grammar can stop the model from writing an invalid field. It cannot stop the model from writing the wrong value into a valid one. Masking is a formatting tool, not a reasoning tool.

Which is exactly why semantics stay out of reach. A constraint that allows the enum ["refund", "cancel", "escalate"] cannot stop the model picking the wrong one of the three. A {"price": number} constraint will faithfully emit a hallucinated number. Teams burn days on this: they see "100% schema-valid" on the dashboard and read it as "100% correct," when all the grammar ever promised was that the JSON would parse and fit the shape.

Closed mode vs open grammar

There's a practical fork that decides your stack. The closed APIs hand you a curated version of this power: OpenAI, Anthropic, and Gemini all accept only a subset of JSON Schema — OpenAI drops keywords like default, Gemini restricts you to an OpenAPI-style subset, and the first request with a new schema pays a grammar-compilation latency cost. You cannot hand any of them an arbitrary regex or a custom grammar.

If your constraint isn't expressible as plain JSON Schema — you need provably valid SQL, a phone-number regex, a forced choice among a fixed list, or a domain-specific language — you have to self-host. vLLM exposes guided_json, guided_regex, guided_choice, and guided_grammar, backed by XGrammar, Outlines, or llguidance. Same logit-masking mechanism the big APIs use internally; the open stack just lets you hold the grammar yourself. The decision rule is clean: closed API for JSON-shaped data, self-hosted constrained decoding for anything that needs a richer grammar than JSON Schema can express.

The reasoning debate, and the move that settles it

One live controversy deserves an honest airing. The paper "Let Me Speak Freely?" (EMNLP 2024) reported that format restrictions degrade reasoning — worst under JSON mode — on tasks like GSM8K. The structured-generation crowd at dottxt fired back in "Say What You Mean," arguing the result was an artifact: the study conflated true constrained decoding with API JSON-mode and used a weaker prompt for the structured case; with matched prompts, they found no degradation.

The synthesis both camps can live with: the masking mechanism doesn't erase reasoning ability, but forcing the answer into a rigid shape too early does — if the schema leaves no room to think before the final object, you've amputated the model's scratchpad. So the move that settles it in practice is the one experienced teams already use: let the model reason in free text first, then constrain only the final extracted object (or give the schema an explicit reasoning field to think in). Constrain the output, not the thought. Do that, and structured decoding stops being a tax on intelligence and goes back to being what it should be — a formatting guarantee you no longer have to pray for.