Your Container Is Not A Sandbox

For years, "run it in a container" was the answer to every question about untrusted code, and most of the time it was a lie we got away with. A Linux container is a process with some namespaces and cgroups bolted on. It shares the host kernel. A kernel has a few hundred syscalls, any one of which might have a bug, and a determined process inside a container is one such bug away from the host. We tolerated this because the code inside the container was usually ours — buggy, maybe, but not adversarial.
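
To see how thin that wrapping is, the sketch below uses only the Python standard library and assumes Linux. It prints the handles that make a process "containerized": its namespace IDs from /proc/self/ns, its cgroup membership, and the kernel release. Run it on the host and then inside a container. Most of the namespace IDs should differ; Docker, for instance, does not create a user namespace by default, so that one usually matches. The kernel release will be identical, because it is the same kernel. The function name is ours and does not come from any library.

```python
import os
import platform


def describe_process_isolation(pid: str = "self") -> dict:
    """Collect the kernel release plus the namespace and cgroup handles of a process.

    Linux-only in practice: it reads /proc/<pid>/ns and /proc/<pid>/cgroup and
    returns empty collections on systems without them.
    """
    info = {"kernel_release": platform.release(), "namespaces": {}, "cgroups": []}

    ns_dir = f"/proc/{pid}/ns"
    if os.path.isdir(ns_dir):
        for name in sorted(os.listdir(ns_dir)):
            try:
                # Each entry is a symlink such as 'pid:[4026531836]'; the number
                # identifies which namespace instance the process belongs to.
                info["namespaces"][name] = os.readlink(os.path.join(ns_dir, name))
            except OSError:
                continue  # some entries need extra privileges to read

    try:
        with open(f"/proc/{pid}/cgroup") as fh:
            # Lines have the form 'hierarchy-ID:controllers:path', e.g. '0::/' on cgroup v2.
            info["cgroups"] = [line.strip() for line in fh if line.strip()]
    except OSError:
        pass  # no cgroup file outside Linux

    return info


if __name__ == "__main__":
    report = describe_process_isolation()
    print("kernel release:", report["kernel_release"])
    for name, handle in report["namespaces"].items():
        print(f"  {name:<18} {handle}")
    for line in report["cgroups"]:
        print("  cgroup:", line)
```

Nothing in the output is a wall. It is a set of labels the shared kernel consults when it decides what the process may see.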

Agents changed the threat model. When a model writes the code it then executes, that code is, by definition, untrusted at the moment it runs. You did not review it, and you cannot review it at the rate an agent generates it. The failure mode is no longer a crash. It is an agent, prompt-injected into running the equivalent of rm -rf with a credential it found in the environment. The shared kernel, an acceptable risk for your own services, becomes the single thin wall holding back a rising tide of code nobody vetted.
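
Isolation aside, one cheap step against the "credential it found in the environment" scenario is to stop handing the environment over at all. The sketch below runs model-written Python in a child process that inherits only an allowlisted set of variables. The allowlist and the function names are hypothetical. This reduces what leaks; it does not isolate anything, since the child still shares the filesystem, network, and kernel.

```python
import os
import subprocess
import sys

# Hypothetical allowlist: the only variables generated code is allowed to inherit.
SAFE_ENV_KEYS = ("PATH", "LANG", "LC_ALL", "TZ")


def scrubbed_env(source=None, allow=SAFE_ENV_KEYS) -> dict:
    """Copy only allowlisted variables so API keys and tokens are not inherited."""
    source = os.environ if source is None else source
    return {key: source[key] for key in allow if key in source}


def run_generated_code(code: str, timeout: float = 5.0) -> subprocess.CompletedProcess:
    """Run model-written Python in a child process that cannot see the parent's secrets.

    This narrows what a prompt-injected script can read from its environment.
    It is NOT a sandbox: the child still shares the filesystem, network, and kernel.
    """
    return subprocess.run(
        [sys.executable, "-I", "-c", code],  # -I: ignore PYTHON* variables and user site-packages
        env=scrubbed_env(),                  # replaces, rather than extends, the parent environment
        capture_output=True,
        text=True,
        timeout=timeout,                     # raises subprocess.TimeoutExpired if the code hangs
    )


if __name__ == "__main__":
    os.environ["DEMO_API_TOKEN"] = "not-a-real-secret"  # stand-in for a leaked credential
    result = run_generated_code("import os; print('DEMO_API_TOKEN' in os.environ)")
    print(result.stdout.strip())  # prints False
```

An allowlist is the right default here. A denylist fails open: the first secret with a name you did not anticipate goes straight through.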

So the industry did the honest thing: it started moving the security boundary down a layer. The repos below are best read as a single descent — from a process pretending to be isolated, to a userspace kernel, to a real virtual machine, to a managed cloud that hides the whole apparatus. Each step buys a stronger guarantee and charges for it in latency, complexity, or your own infrastructure.

The Boundary Most People Already Have

Start where most teams actually are: Docker, and a library that makes it usable for code execution.

▟ vndee/llm-sandbox

A lightweight Python library that runs LLM-generated code in isolated containers across Docker, Podman, or Kubernetes backends, with resource limits and language support beyond Python.

★ 1k · Python · vndee/llm-sandbox

This is the pragmatic floor. It does not invent a new isolation model; it wraps the one you have and adds the ergonomics — timeouts, memory caps, network isolation, automatic capture of plots — that you would otherwise hand-roll badly. For internal tooling where the blast radius is small, it is the right amount of engineering. Just be clear-eyed about what it is: a container, with all the kernel-sharing that implies. It raises the wall; it does not move it.
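
To see what llm-sandbox spares you, here is a hand-rolled version of those limits. It is not llm-sandbox's API. It simply builds a command line from standard `docker run` flags. The image tag, the limit values, and the function names are assumptions. Each flag tightens the container, but none of them changes the fact that the process still makes its syscalls against the host kernel.

```python
import subprocess
import uuid


def docker_run_argv(code: str, name: str, image: str = "python:3.12-slim",
                    memory: str = "256m", cpus: str = "0.5", pids_limit: int = 64) -> list:
    """Build a `docker run` command line with the limits a sandbox library typically sets."""
    return [
        "docker", "run",
        "--rm",                                  # remove the container when it exits
        "--name", name,                          # fixed name so we can kill it on timeout
        "--network", "none",                     # no network beyond loopback
        "--memory", memory,                      # hard memory cap
        "--cpus", cpus,                          # CPU quota
        "--pids-limit", str(pids_limit),         # blunt fork bombs
        "--read-only",                           # read-only root filesystem
        "--cap-drop", "ALL",                     # drop every Linux capability
        "--security-opt", "no-new-privileges",   # block setuid privilege escalation
        image, "python", "-c", code,
    ]


def run_in_container(code: str, timeout: float = 10.0) -> subprocess.CompletedProcess:
    """Run code in a throwaway container; docker run has no wall-clock flag, so we enforce one."""
    name = f"sandbox-{uuid.uuid4().hex[:12]}"
    try:
        return subprocess.run(docker_run_argv(code, name), capture_output=True,
                              text=True, timeout=timeout)
    except subprocess.TimeoutExpired:
        # Killing the docker CLI does not reliably stop the container, so kill it by name.
        subprocess.run(["docker", "kill", name], capture_output=True)
        raise
```

The timeout handling is the sort of detail that is easy to get wrong when you hand-roll this. A hung container outlives the client that launched it unless something stops it explicitly.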

Reimplementing The Kernel To Avoid Trusting It

Google's answer to the shared-kernel problem is to stop sharing the kernel.

▟ google/gvisor

An application kernel for containers that intercepts syscalls in userspace, giving VM-like isolation with container-like overhead via its OCI runtime, runsc.

★ 19k · Go · google/gvisor

gVisor is the cleverest position in the whole stack. Instead of letting your container talk to the host kernel, it puts a second kernel, written in Go and running in userspace, between the two. The untrusted process makes its syscalls to gVisor, which handles them itself and only touches the real kernel through a tiny, locked-down surface. You get most of a VM's isolation without paying for a VM's boot time. The cost is paid in compatibility and performance corners: some workloads run slower (syscall- and I/O-heavy ones pay the most), and some syscalls behave subtly differently. Google runs it under its own serverless products, including App Engine and Cloud Run's first-generation environment, so you may well have used it without knowing.
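
Because runsc is an OCI runtime, adopting gVisor with Docker is mostly configuration. The gVisor docs describe registering runsc under "runtimes" in /etc/docker/daemon.json (the `runsc install` command can write this entry for you), reloading the daemon, and then selecting it per container with `--runtime=runsc`. The sketch below only builds that config and command line; it installs nothing. The binary path is an assumption, and the function names are ours.

```python
import json
from typing import Optional


def gvisor_daemon_config(existing: Optional[dict] = None,
                         runsc_path: str = "/usr/local/bin/runsc") -> dict:
    """Return a Docker daemon.json dict with a 'runsc' runtime registered.

    Existing keys (other runtimes, log settings) are preserved.
    """
    config = dict(existing or {})
    runtimes = dict(config.get("runtimes", {}))
    runtimes["runsc"] = {"path": runsc_path}
    config["runtimes"] = runtimes
    return config


def with_gvisor(docker_argv: list) -> list:
    """Switch a `docker run ...` command line from the default runtime to gVisor."""
    if docker_argv[:2] != ["docker", "run"]:
        raise ValueError("expected a command line starting with 'docker run'")
    return docker_argv[:2] + ["--runtime=runsc"] + docker_argv[2:]


if __name__ == "__main__":
    current = {"log-level": "warn"}  # stand-in for the contents of /etc/docker/daemon.json
    print(json.dumps(gvisor_daemon_config(current), indent=2))
    print(" ".join(with_gvisor(["docker", "run", "--rm", "alpine", "dmesg"])))
```

The appeal is that nothing else in your container workflow changes: the same images, the same CLI, one extra flag. The behavioral differences mentioned above only show up when you run real workloads under the new runtime.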

The Real Boundary: A Virtual Machine

If you want the boundary that actually holds — a hardware-enforced one — you want a microVM.

▟ firecracker-microvm/firecracker

Amazon's minimal VMM in Rust, built to boot stripped-down microVMs in roughly a hundred milliseconds — the isolation engine under Lambda and Fargate.

★ 35k · Rust · firecracker-microvm/firecracker

Firecracker is the load-bearing primitive of this entire category, even when you can't see it. It strips a virtual machine down to almost nothing — no BIOS, no PCI, a handful of devices — so the thing boots in around a hundred milliseconds and costs a few megabytes of overhead, while still giving you a genuine guest kernel behind a hardware virtualization boundary. This is the breakthrough that made "spin up a fresh VM per request" economically sane. Most of the cloud sandbox products you'll meet are, underneath the branding, an orchestration layer wrapped around Firecracker or something just like it.
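
How little there is to configure shows in the API. A Firecracker process is driven through a small REST API on a Unix socket (started with, for example, `firecracker --api-sock /tmp/firecracker.socket`). The sketch below builds the minimal boot sequence from Firecracker's getting-started guide: set the kernel, attach a root drive, size the machine, start. The `pci=off` boot argument reflects the stripped device model. The kernel and rootfs paths are hypothetical, and the vCPU and memory values are assumptions. Check the field names against the API spec for your Firecracker version.

```python
import http.client
import json
import socket


def boot_plan(kernel_path: str, rootfs_path: str, vcpus: int = 1, mem_mib: int = 128) -> list:
    """Return the (method, path, body) calls that configure and start one microVM."""
    return [
        ("PUT", "/boot-source", {
            "kernel_image_path": kernel_path,
            "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
        }),
        ("PUT", "/drives/rootfs", {
            "drive_id": "rootfs",
            "path_on_host": rootfs_path,
            "is_root_device": True,
            "is_read_only": False,
        }),
        ("PUT", "/machine-config", {"vcpu_count": vcpus, "mem_size_mib": mem_mib}),
        ("PUT", "/actions", {"action_type": "InstanceStart"}),
    ]


class UnixHTTPConnection(http.client.HTTPConnection):
    """HTTPConnection that talks to a Unix domain socket instead of TCP."""

    def __init__(self, socket_path: str, timeout: float = 5.0):
        super().__init__("localhost", timeout=timeout)
        self.socket_path = socket_path

    def connect(self):
        self.sock = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        self.sock.settimeout(self.timeout)
        self.sock.connect(self.socket_path)


def apply_plan(socket_path: str, plan: list) -> None:
    """Send each call in order; stop at the first non-2xx response."""
    for method, path, body in plan:
        conn = UnixHTTPConnection(socket_path)
        try:
            conn.request(method, path, body=json.dumps(body),
                         headers={"Content-Type": "application/json", "Accept": "application/json"})
            resp = conn.getresponse()
            detail = resp.read().decode()
            if resp.status >= 300:
                raise RuntimeError(f"{method} {path} -> {resp.status}: {detail}")
        finally:
            conn.close()


if __name__ == "__main__":
    plan = boot_plan("./vmlinux", "./rootfs.ext4")  # hypothetical artifact paths
    for method, path, body in plan:
        print(method, path, json.dumps(body))
    # apply_plan("/tmp/firecracker.socket", plan)  # needs a running firecracker process
```

Four requests and a guest is booting. That is why orchestration layers can afford to treat a VM as a per-request object.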

▟ microsandbox/microsandbox

A self-hosted runtime for launching microVMs in under a second to run untrusted workloads, with built-in MCP support so agents can spin up their own sandboxes.

★ 7k · Rust · microsandbox/microsandbox

Microsandbox is the interesting newcomer because it aims that VM-grade isolation directly at the agent use case and keeps it local. It runs standard OCI images inside microVMs, boots fast enough to feel interactive, and — the telling detail — ships an MCP server, so an agent can request its own hardware-isolated sandbox as a tool call. That is the whole thesis of this piece compressed into one feature: the entity that needs the sandbox is now the thing asking for it.
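
"Request a sandbox as a tool call" has a concrete wire format. MCP messages are JSON-RPC 2.0: a client invokes a server-side tool by sending `tools/call` with the tool's name and arguments, and it reads back a `content` list, with `isError` flagging tool-level failures. The sketch below builds such a request and parses a response. The tool name `sandbox_run_code` and its argument keys are hypothetical placeholders rather than microsandbox's actual tool schema; a real client would discover them with `tools/list`. The transport (stdio or HTTP) is omitted.

```python
import itertools
import json

_request_ids = itertools.count(1)  # JSON-RPC ids only need to be unique per session


def tools_call_request(tool_name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


def text_from_response(response: dict) -> str:
    """Extract the text parts of a `tools/call` result, raising on protocol or tool errors."""
    if "error" in response:  # JSON-RPC level failure, e.g. unknown tool
        raise RuntimeError(response["error"].get("message", "JSON-RPC error"))
    result = response["result"]
    texts = [part["text"] for part in result.get("content", []) if part.get("type") == "text"]
    if result.get("isError"):  # the tool ran but reported a failure
        raise RuntimeError("\n".join(texts) or "tool reported an error")
    return "\n".join(texts)


if __name__ == "__main__":
    # Hypothetical tool name and arguments; discover the real ones via `tools/list`.
    request = tools_call_request("sandbox_run_code", {"language": "python", "code": "print(2 + 2)"})
    print(json.dumps(request))
```

The point is the direction of the call. The agent emits this message itself, and the hardware-isolated environment is what answers.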

A container shares the kernel; a microVM does not. Everything else in this category is a negotiation over who pays for that difference, and when.

Letting Someone Else Run The VMs

The descent ends where most teams will actually land: paying a service to operate the isolation layer so you never touch a VMM.

▟ e2b-dev/E2B

Open-source infrastructure for running AI-generated code in secure cloud sandboxes, with Python and TypeScript SDKs aimed squarely at code-interpreter workloads.

★ 13k · TypeScript · e2b-dev/E2B

▟ daytonaio/daytona

A secure, elastic runtime for AI-generated code and agent workflows, spinning up isolated sandboxes (each with a dedicated kernel, filesystem, and network) in under a hundred milliseconds.

★ 72k · TypeScript · daytonaio/daytona

E2B and Daytona are competing answers to the same admission: running microVM fleets safely, at scale, with fast cold starts and clean teardown, is a real operations problem, and most teams should not own it. You get an SDK, a sandbox spins up in well under a second, your agent runs its code, you read back the result, and the environment evaporates. The isolation guarantee is roughly the VM-grade one from the layer above (E2B, for one, runs each sandbox as a Firecracker microVM); you are simply renting it. What you trade away is sovereignty: your agents' code now executes on someone else's metal, which is a fine trade for a startup and a hard conversation for a bank.
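
"The environment evaporates" only holds if teardown runs on every path, including when the agent's code throws. Whatever the provider, the pattern is a context manager around create and destroy. The sketch below uses a hypothetical `SandboxBackend` interface and an in-memory fake so the lifecycle can be exercised offline. It is not E2B's or Daytona's SDK, whose actual create and kill calls are in their own docs.

```python
from contextlib import contextmanager
from typing import Callable, Iterator, Protocol


class SandboxBackend(Protocol):
    """Hypothetical provider interface; real SDKs name and shape these calls differently."""

    def create(self) -> str: ...
    def run(self, sandbox_id: str, code: str) -> str: ...
    def destroy(self, sandbox_id: str) -> None: ...


@contextmanager
def ephemeral_sandbox(backend: SandboxBackend) -> Iterator[Callable[[str], str]]:
    """Create a sandbox, hand the caller a `run(code)` function, and always tear it down."""
    sandbox_id = backend.create()
    try:
        yield lambda code: backend.run(sandbox_id, code)
    finally:
        backend.destroy(sandbox_id)  # runs even if the agent's code or our own logic raised


class FakeBackend:
    """In-memory stand-in so the lifecycle can be exercised without a cloud account."""

    def __init__(self) -> None:
        self.live = set()
        self._counter = 0

    def create(self) -> str:
        self._counter += 1
        sandbox_id = f"sbx-{self._counter}"
        self.live.add(sandbox_id)
        return sandbox_id

    def run(self, sandbox_id: str, code: str) -> str:
        if sandbox_id not in self.live:
            raise KeyError(f"{sandbox_id} has already been destroyed")
        return f"{sandbox_id} ran {len(code)} characters of code"

    def destroy(self, sandbox_id: str) -> None:
        self.live.discard(sandbox_id)


if __name__ == "__main__":
    backend = FakeBackend()
    with ephemeral_sandbox(backend) as run:
        print(run("print('hello from the sandbox')"))
    print("live sandboxes after exit:", backend.live)
```

Leaked sandboxes are both a cost problem and a security one. A VM left running with an agent's half-finished state in it is exactly the kind of residue the rented model is supposed to eliminate.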

Where The Wall Goes

Read the stack top to bottom and the argument is plain. The security boundary has been quietly migrating downward — from the process, to a userspace kernel, to the hypervisor — and the agent era is what finally forced the move into the open, because agents are the first "users" who write and run code faster than anyone can audit it. Pick your layer by your blast radius, not by fashion. Internal scripts can live in a hardened container. Anything touching real secrets or running genuinely adversarial output wants a VM boundary, whether you operate it or rent it. The one position no longer defensible is the old reflex — "just put it in a container" — said as if a container were ever a wall, rather than a fence we agreed to respect.
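
The rule of thumb above is small enough to write down. The sketch below is one possible encoding of it, not a standard. The field names are hypothetical, and "must stay on own hardware" stands in for the sovereignty question raised by renting sandboxes. Real decisions have more inputs, such as compliance and latency budgets, so treat this as a checklist rather than a policy engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of what a sandboxed workload can reach and run."""
    touches_real_secrets: bool
    runs_adversarial_output: bool
    must_stay_on_own_hardware: bool = False


def pick_boundary(w: Workload) -> str:
    """Choose the isolation layer by blast radius, not by fashion."""
    if not (w.touches_real_secrets or w.runs_adversarial_output):
        return "hardened container"       # small blast radius: internal scripts
    if w.must_stay_on_own_hardware:
        return "self-hosted microVM"      # VM boundary you operate yourself
    return "managed microVM sandbox"      # VM boundary you rent


if __name__ == "__main__":
    for workload in (
        Workload(touches_real_secrets=False, runs_adversarial_output=False),
        Workload(touches_real_secrets=True, runs_adversarial_output=True),
        Workload(touches_real_secrets=True, runs_adversarial_output=True,
                 must_stay_on_own_hardware=True),
    ):
        print(workload, "->", pick_boundary(workload))
```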


Dex Mareno

Dispatches from the machines, in your inbox
