Every argument for code-execution agents ends in the same uncomfortable place. You explain that letting the model write a program that calls your tools — instead of emitting one tool call per turn and piping each result back through the context window — is dramatically cheaper. Anthropic put one workflow's cost at ~150,000 tokens down to ~2,000; Cloudflare reported similar cuts. Then someone asks the only question that matters in production: and where does that model-written code run? The honest answer has always been a sentence that undoes half the enthusiasm — "in a sandbox you now have to operate, secure, and pay for."

Microsoft Agent Framework's CodeAct, one of the headline features from BUILD 2026, is interesting less for the pattern than for what it does to that sentence.

The pattern, briefly#

MAF reached 1.0 GA on April 2, 2026, folding AutoGen and Semantic Kernel into one supported runtime. CodeAct is its take on code execution. Instead of the model choosing a tool, waiting, and choosing the next, it writes one short Python program that calls your tools through a call_tool() interface, runs it once, and returns a consolidated result. A five-step plan that used to be five model turns — each one re-reading the whole transcript to decide the next move — becomes a single turn that emits a single program.

Microsoft's numbers on representative multi-tool workloads: about 52% lower end-to-end latency and 64% fewer tokens. Neither figure is surprising once you see the mechanism. The tokens vanish because a 4,000-row intermediate result gets filtered inside the program instead of being narrated back to the model. The latency drops because you've deleted round-trips, and model round-trips are the slow part of most agent loops. As Microsoft puts it, modern agents are often bottlenecked not by model quality but by orchestration overhead.

The protocol was never the bottleneck and the model usually isn't either. For multi-tool agents, the tax is the round-trip — and code execution deletes it.

The part that's actually new#

None of that is unique to Microsoft; the code-execution idea has been circulating for months. What CodeAct ships alongside it is the answer to where the code runs, and the answer is the real news: a fresh Hyperlight micro-VM per call.

Hyperlight is Microsoft's micro-VM runtime, open-sourced in 2024, built to run untrusted code in a hardware-isolated guest without a full OS underneath. Its headline property is startup time: a new micro-VM boots in 1–2 milliseconds, against >120ms for a conventional VM. CodeAct's alpha agent-framework-hyperlight package uses this to run each execute_code call in its own fresh guest — its own memory, no host filesystem beyond what you mount, no network beyond the domains you allow.

Sit with the number. Two milliseconds is faster than most of the tool calls the code will make. It is far below the noise floor of a single model turn. That means the historical objection to code execution — you'd be crazy to run model-written code without strong isolation, and strong isolation is slow and expensive — quietly stops being true. You can afford a brand-new, hardware-isolated VM for every single tool-calling program, and it costs you nothing you'd notice. The sandbox was the hard part of this pattern. Hyperlight makes it the cheap part.

That reframes the gVisor-vs-Firecracker-vs-Kata conversation for agents. Those weigh isolation strength against startup cost because you're keeping sandboxes warm and reusing them. At a 1–2ms cold start you don't reuse anything — you throw the VM away after each call and boot a new one, which is stronger isolation than a pooled sandbox, not weaker. Disposability becomes the security model. (It's the same micro-VM-vs-Wasm-vs-isolate trade, resolved by making the micro-VM as cheap as the isolate.)

What moves when the wall moves#

Here's the twist worth internalizing before you flip CodeAct on: the cost didn't disappear, it relocated. A fresh VM per call means no state survives between calls unless you explicitly mount it. The convenience of one-tool-call-per-turn was that the conversation was your state — every result sat in the transcript, available to the next step for free. CodeAct takes that away. Intermediate data lives inside a program that ran and vanished. If step three needs what step one produced, that has to happen within the same program, or be written to something you deliberately persisted.

So "let the model write code that calls your tools" turns into a genuine design question rather than a free efficiency win. What do you mount? What network egress do you allow? Which workflows are actually a single program, and which are inherently interactive and shouldn't be collapsed? The rule of thumb Microsoft offers — CodeAct pays off at three or more tools per task, especially when data flows between steps — is really a statement about where that design cost is worth paying. Below it, a plain tool call is simpler, and you keep your state in the conversation where it was convenient all along.

The broader signal is that code execution is graduating from a clever token-saving trick into default agent plumbing, and it's doing so because the infrastructure caught up to the idea. For two years the pattern's ceiling was the sandbox. Someone just raised it by two orders of magnitude — and handed everyone a new decision to make about what their agents are allowed to touch.