Run the same browser task two ways and the meter tells two very different stories. Through Playwright MCP, a roughly ten-step job (log in, navigate, read a table, click through) costs on the order of 114,000 tokens. Through the Playwright command line, the identical task lands around 27,000. That roughly four-to-one gap isn't one blogger's outlier; it shows up in independent benchmarks across early 2026, and some early-adopter reports put the ratio as high as 10x on longer runs.
The obvious reaction is to treat MCP as the wasteful option and switch. That reaction is half right, and as a blanket rule it's wrong. It's worth understanding where the tokens actually go, because the cheap path buys its savings by taking something away.
Where the 114K goes
An MCP browser server's job is to make the page legible to a language model. So after every action it hands the model a fresh view of the world: the page's accessibility tree, console output, and, in some configurations, screenshot bytes. Do that on every step of a ten-step task and you pay for the entire page ten times over. Layer on the tool definitions, reported at around 13,700 tokens and re-sent on every request, and the context window fills before the agent has done anything clever.
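To see why per-step snapshots dominate the bill, here is a back-of-the-envelope model in Python. It is a sketch under stated assumptions: the function names and every token count you pass in are hypothetical, it ignores output tokens and prompt caching, and it is not calibrated to reproduce the 114K figure. It contrasts a transcript that keeps every earlier snapshot, one that carries only the latest, and a script run where the model sees only printed output.

```python
def mcp_input_tokens(steps, tool_defs, snapshot, keep_history=True):
    """Input tokens for a run where every browser action returns a page snapshot.

    Each request re-sends the tool definitions plus the snapshots still in the
    transcript: all earlier ones if keep_history, else only the most recent.
    Output tokens and prompt caching are deliberately ignored (assumption).
    """
    total = 0
    for step in range(steps):
        # Snapshots produced by earlier steps that this request still carries.
        carried = step if keep_history else min(step, 1)
        total += tool_defs + carried * snapshot
    return total


def script_input_tokens(requests, tool_defs, log_output):
    """Input tokens when the model writes a script and reads only what it prints.

    Crude overestimate: every request is charged the full printed output.
    """
    return requests * (tool_defs + log_output)
```

With hypothetical inputs of 1,000 definition tokens and 5,000-token snapshots over ten steps, `mcp_input_tokens` returns 235,000 with full history and 55,000 with only the latest snapshot, while `script_input_tokens(2, 1000, 200)` returns 2,400. The specific figures mean little; the shape matters: snapshot cost scales with step count, and with full history it grows faster than linearly.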
This isn't a Playwright quirk. It's the MCP context-bloat problem in its loudest form. Anthropic's own engineering writeup, "Code execution with MCP," documents the general case: an agent that pours tool definitions and intermediate results through the model can burn 150,000 tokens where an agent that writes code to call those same tools uses 2,000, a 98.7% cut. Browsers produce some of the largest intermediate results an agent will ever handle, so an effect that's a footnote elsewhere becomes the whole bill here.
The CLI — and the code-execution pattern behind it — sidesteps this by inverting who holds the page. The model writes a script; the script drives the browser; the page state stays in the execution environment. The model sees only what the script chooses to log or return. No per-step snapshot, no re-sent tree, no accumulating context. Hence 27K.
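A minimal sketch of that inversion, using Playwright's Python sync API. The URL paths, the selectors (`#username`, `#password`, `table tbody tr`), and the helper names are hypothetical placeholders for whatever your site uses. The point is the single `print` in `main`: the page, its DOM, and its console never leave the script's process, and the model receives one short JSON string. Playwright is imported inside `main` so the helpers can be exercised against a stand-in page object without a browser installed.

```python
import json


def summarize_rows(rows, max_rows=3):
    """Reduce a scraped table to the few facts the model actually needs."""
    return {"row_count": len(rows), "sample": rows[:max_rows]}


def read_report(page, base_url, user, password):
    # All page state lives here, in the script's process. Paths and selectors
    # below are hypothetical; only the returned JSON string goes to the model.
    page.goto(f"{base_url}/login")
    page.locator("#username").fill(user)
    page.locator("#password").fill(password)
    page.locator("button[type=submit]").click()
    page.wait_for_load_state()
    page.goto(f"{base_url}/reports")
    rows = page.locator("table tbody tr").all_inner_texts()
    return json.dumps(summarize_rows(rows))


def main(base_url, user, password):
    # Imported lazily so the helpers above can be tested without a browser.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        try:
            page = browser.new_page()
            # The only thing the model sees is this one printed line.
            print(read_report(page, base_url, user, password))
        finally:
            browser.close()
```

With Playwright and a browser installed (`pip install playwright`, then `playwright install chromium`), calling `main("https://app.example.com", user, password)` prints a single line of the form `{"row_count": ..., "sample": [...]}`. Everything the model would have re-read as snapshots stays on the other side of that line.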
The CLI isn't cheaper because it's smarter. It's cheaper because it stops showing the model the page.
The cheap path is a blind path
Here's the part the token counter doesn't print. When you strip the per-step snapshot, you don't just remove cost — you remove the agent's eyes. A model driving through a script can't notice the cookie banner that appeared, the A/B-test variant that moved the button, the redirect that dropped it on a different page. It follows the script it wrote against the page it expected, and when reality diverges it fails without knowing why.
MCP's expensive snapshot is exactly what a general model uses to adapt. It sees the modal, so it dismisses it. It sees the layout changed, so it re-plans. The tokens you're spending are the raw material of recovery. This is the same tension underneath the DOM-tree-versus-pixels debate and the broader computer-use-versus-browser-automation question: how much the agent gets to look is inseparable from how much it costs.
So neither option is simply better than the other. Each has its own failure mode:
- MCP: expensive, but self-correcting. It sees the surprise and handles it.
- CLI / code execution: cheap, but brittle. It runs blind and needs you to have anticipated the surprise.
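On the script path, "anticipating the surprise" means coding each one by hand. Here is a sketch of two such guards, assuming a Playwright-style `page` object: one dismisses a banner you already know about, the other refuses to continue after an unexpected redirect. The `UnexpectedPage` exception and the selector and path you pass in are hypothetical. Note what this cannot do: a modal you never named still blocks the run, which is exactly the brittleness described above.

```python
from urllib.parse import urlparse


class UnexpectedPage(RuntimeError):
    """Hypothetical error: the script landed somewhere its author did not plan for."""


def dismiss_if_present(page, selector):
    # Handles only the banner you thought to name; an unanticipated modal
    # still blocks the script.
    banner = page.locator(selector)
    if banner.count() > 0 and banner.first.is_visible():
        banner.first.click()
        return True
    return False


def require_path(current_url, expected_path):
    # Fail loudly on a redirect instead of scraping the wrong page.
    actual = urlparse(current_url).path
    if actual != expected_path:
        raise UnexpectedPage(f"expected {expected_path!r}, landed on {actual!r}")
    return actual
```

In a script you might call `dismiss_if_present(page, "#cookie-accept")` after navigation and `require_path(page.url, "/reports")` before scraping; both arguments are illustrative. A clear exception at least tells you where reality diverged, even though the script still can't adapt on its own.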
Choose by recoverability, not by the smaller number
The real axis is how deterministic your task is.
If the flow is mapped (stable selectors, a login you've automated a hundred times, high-volume repeated runs where 4x compounds into a real invoice), use the CLI or have the model write a Playwright script and call it from code. You already know what the page looks like; paying to show it to the model again on every step is pure overhead. This is the same instinct as loading tools on demand instead of all at once: don't put in context what the task doesn't need.
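"Call it from code" can be as plain as running the saved script as a subprocess and passing only a bounded slice of its output back to the model. A sketch, assuming the model-written script is a Python file; `run_browser_script`, its result dict, and the 2,000-character cap are hypothetical choices, not part of any Playwright tooling.

```python
import subprocess
import sys


def run_browser_script(path, timeout=120, max_chars=2000):
    """Run a saved automation script and return only a bounded slice of its output."""
    try:
        proc = subprocess.run(
            [sys.executable, path],
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return {"ok": False, "error": f"timed out after {timeout}s"}
    if proc.returncode != 0:
        # The tail of stderr usually holds the exception; the rest is noise.
        return {"ok": False, "error": proc.stderr[-max_chars:]}
    # Success: hand back at most max_chars of what the script chose to print.
    return {"ok": True, "output": proc.stdout[:max_chars]}
```

Failures come back as a short stderr tail rather than a page dump, which keeps the cheap path cheap even when it breaks. The trade-off is the one this section is about: the model learns only what the script reported.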
If the flow is exploratory (an unfamiliar site, a UI still in flux, anything adversarial or changing), pay for MCP's per-step vision. You're not overspending; you're buying back the reliability that a comparison like browser-use vs Stagehand vs Playwright MCP ultimately turns on. A blind agent on an unknown page is cheap right up until it's wrong.
The 114K-versus-27K number is seductive because it's a single figure you can optimize. But it's downstream of a design decision, not the decision itself. The token bill is really a measurement of how much the agent is allowed to see — and the right amount of sight is a property of your task, not of your budget. Pick the transport that gives the model exactly as much of the page as the job requires. Then the token count takes care of itself.