Open the hood of almost any coding agent and the first thing you find is a wall of text. Claude Code greets the model with a system prompt of roughly 10,000 tokens; OpenCode is in the same range, Cline around 7,000. The prompt sets the persona, enumerates the tools, lays down the guardrails, and explains — at length — how the agent is supposed to think. The industry's working assumption has been that this is where quality comes from: a better agent is a more carefully engineered harness wrapped around the model.

Pi is a bet that the assumption is backwards. Its entire system prompt fits in under 1,000 tokens. It ships four tools — read, write, edit, bash — and nothing else. Built by Mario Zechner (@badlogic) as the agent at the core of the open-source OpenClaw assistant, it's become the daily driver of Armin Ronacher, who created Flask and Jinja2 and has written the clearest case for why less scaffolding wins.

The premise: the model already knows this job#

The argument is one sentence long, and everything else follows from it. Frontier models are already reinforcement-learned to behave as coding agents. They have seen the loop — read a file, run a command, read the error, edit, run again — millions of times in post-training. So a 10,000-token prompt that carefully explains how to be a coding agent is, in large part, re-teaching the model something it already does natively.

That would be merely wasteful if tokens were free. They aren't. Context is the one genuinely scarce resource in an agent, and it's zero-sum: every token you spend telling the model how to behave is a token you can't spend on the actual codebase, the actual error, the actual task.

Every instruction token is a task token you don't get back. The scarce resource was never instruction. It was context.

Framed that way, the heavy harness isn't a safety margin — it's a standing tax on every request, paid before any work begins.

Lazy skills: keep the menu, drop the recipes#

The obvious objection is that four tools and a haiku-length prompt can't possibly cover real work. Pi's answer is a loading trick it calls lazy skills. Every capability the agent has keeps only a one-line description in context on every turn — the menu. The full instructions and tool schemas — the recipes — load only when a skill is actually invoked.

So the agent always knows what it could do without carrying the cost of knowing how to do all of it simultaneously. The menu stays in context permanently and costs almost nothing; the recipe enters context only for the one dish being cooked, and leaves when it's done. Capability stops scaling with baseline context cost. That's the move that lets a sub-1,000-token prompt survive contact with a real repository.

And when a capability doesn't exist yet, Pi doesn't send you to a plugin marketplace. It writes the tool. Because the four primitives include write and bash, the agent can author a TypeScript extension at runtime — registering a new tool, hooking a lifecycle event, adding a command — and then use it. The extension system is how a minimal agent stays unbounded: its ceiling isn't the tools it shipped with, it's the tools it can build on demand.

Why this is a bet and not a law#

It would be easy to read this as "minimal always beats maximal," and that's not the claim. Pi's design is downstream of one specific condition: the model has to be strong enough to run agentically with almost no scaffolding. Remove the harness from a model that isn't, and you don't get elegance — you get a capable engine with no steering.

This is the real tension in the whole approach. On the strongest current models, the heavy system prompt is largely redundant and its token cost is pure overhead, so stripping it is free performance. On weaker or older models, that same prompt is doing load-bearing reliability work — the guardrails and worked examples are what keep the thing on the rails — and removing them removes the reliability with them. Pi is a bet that model capability has crossed the line where scaffolding flips from asset to tax, and that the line will keep moving in its favor as models improve.

If that bet is right, most of what agent frameworks have spent two years building — elaborate prompts, curated toolsets, careful orchestration layers — is depreciating scaffolding, valuable mainly for the models that need propping up. It's the strong form of the industry's slow shift from framework to harness, and a pointed rebuttal to the maximalist school of harness engineering: Pi's answer to "what belongs in the harness" is almost nothing. The durable design isn't the biggest harness. It's the smallest one the model can carry, with the room it saves handed back to the work.