There is a default shape to a multi-agent system in 2026, and you have almost certainly reached for it. You give one agent a supervisor's job. It reads the running conversation, decides which specialist to call next, reads what comes back, decides again. Every major framework has a name for it — LangGraph calls it a supervisor, Anthropic calls it the lead agent, others call it the orchestrator — and every one of them points you at it first. It is intuitive: the system is made of intelligence, so of course the thing coordinating it should be intelligent too.

Microsoft just open-sourced a tool built on the opposite instinct, and the instinct is worth taking seriously even if you never touch the tool.

The dumb router

Conductor is a small MIT-licensed CLI, released in May, that runs multi-agent workflows defined in plain YAML. The agents inside it are ordinary LLMs — it speaks to GitHub Copilot, Claude, the Claude Agent SDK, and experimentally to NousResearch's Hermes. What's different is the layer above them. Conductor's own README states the whole thesis in one line:

Routing uses Jinja2 templates and expression evaluation. First matching condition wins. No LLM in the orchestration loop, no tokens spent deciding what runs next.

That is the entire argument. The agents are smart; the wiring is stupid, on purpose. A workflow is a list of steps — agent steps that call a model, script steps that run a shell command and branch on its exit code or parsed JSON output, set steps that compute a value, terminate steps that stop. Which step runs next is decided by evaluating conditions against the workflow's state, top to bottom, first match wins. It is the routing logic of a Makefile, not a mind.
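Here is a minimal sketch of first-match-wins routing, written in Python rather than Conductor's YAML. The `Route` type, the `next_step` function, and the state keys are hypothetical names chosen for illustration, and plain Python predicates stand in for the Jinja2 expressions the README describes. It shows the idea only, not Conductor's schema or implementation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional

State = dict[str, Any]


@dataclass(frozen=True)
class Route:
    """One routing rule: if `when(state)` is true, go to step `to`."""
    when: Callable[[State], bool]
    to: str


def next_step(routes: list[Route], state: State) -> Optional[str]:
    """Evaluate routes top to bottom and return the first match, or None."""
    for route in routes:
        if route.when(state):
            return route.to
    return None


# Hypothetical routes for a step that just ran a test script.
after_tests = [
    Route(when=lambda s: s["exit_code"] == 0, to="done"),
    Route(when=lambda s: s["attempts"] < 3, to="fix_agent"),
    Route(when=lambda s: True, to="give_up"),  # catch-all, must come last
]

print(next_step(after_tests, {"exit_code": 1, "attempts": 1}))  # fix_agent
print(next_step(after_tests, {"exit_code": 1, "attempts": 3}))  # give_up
```

Order carries the meaning here: moving the catch-all to the top would shadow every other rule, which is exactly the kind of mistake a diff makes visible.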

What the LLM router actually costs you

The reflex objection is that this is just a workflow engine wearing an agent costume — and for genuinely open-ended tasks, that objection is correct (more on that below). But for the large middle of real production systems, the deterministic router isn't a downgrade. It's a fix for two costs the LLM supervisor quietly imposes.

The first is money, and it's larger than it looks. Anthropic's own writeup of its multi-agent research system reports that agents use about four times the tokens of a normal chat interaction, and multi-agent systems about fifteen times. An LLM supervisor adds to that bill on every routing decision, because it has to read the running conversation each time, and it does so to answer a question ("who goes next?") that in a structured workflow has a known answer. You are renting a reasoning model to read a routing table you could have written down.
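To make the per-hop cost concrete, here is a back-of-the-envelope sketch. Every figure in it is invented for illustration, not measured. It assumes the supervisor re-reads the whole running conversation before each routing decision and that each step appends a fixed number of tokens to that conversation.

```python
def supervisor_routing_tokens(hops: int, initial_context: int,
                              tokens_added_per_step: int,
                              decision_tokens: int) -> int:
    """Tokens spent purely on routing when an LLM re-reads the full
    conversation before each decision (an assumed cost model)."""
    total = 0
    context = initial_context
    for _ in range(hops):
        total += context + decision_tokens  # read everything, emit a choice
        context += tokens_added_per_step    # the chosen step's output is appended
    return total


# Illustrative numbers only.
llm_router = supervisor_routing_tokens(
    hops=6, initial_context=2_000, tokens_added_per_step=1_500, decision_tokens=50
)
deterministic_router = 0  # evaluating a condition costs no model tokens

print(llm_router)  # 34800
```

Under this model the routing bill grows roughly with the square of the number of hops, because each decision re-reads everything the earlier steps produced.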

The second cost is the one that should worry you more, because it compounds. In that same system, Anthropic found that about 80% of the performance variance on their BrowseComp evaluation was explained by token usage alone. Tokens are the scarce resource that buys quality. Every token the supervisor spends deciding what runs next is a token it did not spend on the work. The LLM router isn't just an expensive coordinator — it's actively competing with your agents for the budget that makes them good.

And then there's reproducibility, which barely survives contact with an LLM in the control path. A deterministic router takes the same inputs to the same path every time; you can diff it, version it, and reason about it. A model-driven router samples. Run the same task twice and it may route differently, loop, or mis-route in a way that produces a plausible-looking wrong answer — the exact failure mode that makes these systems so hard to test. Conductor's routing lives in a YAML file in your repo. The supervisor's routing lives in a prompt, a model version, and a temperature — three things you don't fully control.
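The testing difference can be sketched with a toy simulation. The routing table and step names are hypothetical, and `sampled_route` is a random stand-in for a model-driven router, not a real LLM call. The point is only what kind of assertion each router lets you write.

```python
import random

# Hypothetical routing table: classifier label -> next step.
ROUTES = {"bug": "fix_agent", "question": "answer_agent"}
FALLBACK = "human_review"


def deterministic_route(label: str) -> str:
    """Same label in, same step out, every time."""
    return ROUTES.get(label, FALLBACK)


def sampled_route(label: str, rng: random.Random, noise: float) -> str:
    """Toy stand-in for a model-driven router, not a real LLM call.
    With probability `noise` it picks any known step at random."""
    if rng.random() < noise:
        return rng.choice(sorted({*ROUTES.values(), FALLBACK}))
    return deterministic_route(label)


# The deterministic path can be pinned in an ordinary regression test.
assert all(deterministic_route("bug") == "fix_agent" for _ in range(100))

# The sampled router may take different paths on identical input, so the
# same check can only be stated as a rate, not as an exact trace.
rng = random.Random()
hits = sum(sampled_route("bug", rng, noise=0.2) == "fix_agent" for _ in range(1000))
print(f"{hits}/1000 runs took the expected route")
```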

The inversion

Here is the non-obvious part. The industry spent 2025 trying to make the orchestrator smarter — better lead-agent prompts, better handoff protocols, better supervisor-vs-swarm topologies. Conductor's bet is that for a huge class of systems the correct move is to make the orchestrator dumber, and spend the intelligence you save inside the agents where it produces answers instead of routing decisions.

That reframes the whole agent-versus-workflow argument. That debate is usually posed as a binary — hand control to the model, or hard-code the pipeline. Deterministic orchestration refuses the binary: keep the model's intelligence inside each step, take it out of the wiring. You get agent-grade work with workflow-grade cost, reproducibility, and auditability. It's the same move CrewAI's Flows make with code-defined @start/@listen edges — the difference is only whether your deterministic routing lives in a data file or in Python.

Where the honesty is

The thesis has a boundary, and it's task structure. The case for the LLM orchestrator is real and it's Anthropic's: their multi-agent system, with Claude Opus 4 as the lead agent and Claude Sonnet 4 subagents, outperformed single-agent Claude Opus 4 by 90.2% on an internal research eval. It did so precisely because research is open-ended: you cannot enumerate in advance which sub-question to chase next, so a model reading intermediate results genuinely routes better than any condition you could write. When you can't write the branches down, a reasoning router earns its tokens.

But most production workflows are not open-ended research. They are: classify this, then depending on the class do one of four known things, then check the result and either finish or retry. That structure is knowable at author time. Routing it with an LLM doesn't buy flexibility you'll use — it buys a token bill that scales with every hop and a system you can't reproduce, in exchange for handling branches that don't exist.
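That shape (classify, dispatch to one of four known handlers, check, then finish or retry) fits in a few lines of ordinary control flow. The sketch below uses stub functions in place of model calls, and every name in it (`classify`, `HANDLERS`, `check`, `run`, the four labels) is a hypothetical placeholder. In a real system the stubs would be agents, while the wiring around them would stay this simple.

```python
from typing import Callable


def classify(ticket: str) -> str:
    """Stub classifier; a real one would call a model."""
    text = ticket.lower()
    if "refund" in text:
        return "billing"
    if "crash" in text:
        return "bug"
    if "how do i" in text:
        return "howto"
    return "other"


# Four known branches, written down at author time.
HANDLERS: dict[str, Callable[[str], str]] = {
    "billing": lambda t: "billing reply",
    "bug": lambda t: "bug triage reply",
    "howto": lambda t: "howto reply",
    "other": lambda t: "",  # simulates a handler whose output fails the check
}


def check(reply: str) -> bool:
    """Stand-in for a validation step such as a script exit code."""
    return bool(reply)


def run(ticket: str, max_attempts: int = 3) -> tuple[str, list[str]]:
    """Classify, dispatch to one handler, then finish or retry."""
    trace = ["classify"]
    label = classify(ticket)
    for attempt in range(1, max_attempts + 1):
        trace.append(f"{label}#{attempt}")
        reply = HANDLERS[label](ticket)
        trace.append("check")
        if check(reply):
            return reply, trace + ["finish"]
    return "escalated to a human", trace + ["escalate"]


reply, trace = run("The app will crash on login")
print(reply, trace)
```

The returned trace is the audit log: it records which branch ran and how many attempts it took, without any model being asked what to do next.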

The design question was never "agent or workflow." It's narrower and more useful than that: which layer needs to be smart? For the work itself, the answer is obviously the model. For deciding what runs next, the honest answer — far more often than the default admits — is nothing at all.