The demo that should have ended the debate took about thirty seconds. In August 2025, researchers at Brave hid a snippet of text inside an ordinary-looking Reddit comment. When a user asked Perplexity's Comet browser to summarize the page, the browser read the hidden text, obeyed it, and — using the user's own logged-in session — opened Gmail, located a one-time passcode email, and posted the code back as a reply to the Reddit thread. There was no malware, no exploit chain, no memory-corruption trick. The payload was English, and the browser did what it was told.

Nearly a year later, the pattern has repeated across every agentic browser on the market, and the companies building them have stopped promising a fix.

The attack keeps working because the input surface has no edge#

Comet was not a one-off. Between late August 2025, LayerX Security disclosed a technique they named CometJacking: a single malicious URL, with the right query parameters, could steer the browser's agent into exfiltrating email, calendar entries, and data from connected services. Perplexity initially triaged the report as "Not Applicable." By October, Brave was back with a stranger variant — "unseeable" prompt injections, where the instruction is faint, near-invisible text baked into an image. A human glancing at the screenshot sees nothing; the model, which OCRs the pixels, reads a command and runs it.

That last one matters more than it looks. It means the injection surface is not "the visible text on a page." It is anything the model can perceive — rendered text, alt attributes, image pixels, PDF layers. You cannot enumerate it, so you cannot filter it.

OpenAI's ChatGPT Atlas, which launched on October 21, 2025, was injectable within days: security researchers showed that a few words dropped into a Google Doc could redirect the agent's behavior. To its credit, OpenAI did not wave this away. In December it published a candid post explaining that it hardens Atlas with reinforcement-learning red-teaming, requires user confirmation before the agent sends messages or makes payments, and advises people to give agents narrow, specific instructions rather than handing over an inbox. In the same breath, it said the quiet part: prompt injection, "much like scams and social engineering on the web, is unlikely to ever be fully solved."

The bug is a trust boundary that was never drawn#

Every one of these exploits has the same shape, and it is not a shape you patch.

A traditional browser keeps two things apart that an AI browser fuses together. On one side is your authenticated session — the cookies, tokens, and logins that let you read your Gmail and move money. On the other is untrusted content — the arbitrary, attacker-controllable text of whatever page you happen to be looking at. In a normal browser these never mix: a Reddit comment cannot reach into your Gmail tab, because the same-origin policy is a hard wall.

An agentic browser tears that wall down on purpose. Its entire value proposition is that one system reads the untrusted page and holds your credentials and can take actions across your logged-in sites. The moment it pours page content and user intent into a single context window, it becomes a textbook confused deputy: a privileged actor taking instructions it cannot authenticate, from a party it cannot trust, using authority that belongs to someone else.

The vulnerability is not a rough edge on the feature. It is the feature.

That is why the language model cannot be trained out of it. You can raise the cost of an attack — better classifiers, RL against known payloads, confirmation gates — and every vendor should. But you are hardening a component whose job is to obey natural language against an attacker whose payload is natural language. As Brave and OpenAI both concede, this is the phishing problem, not the buffer-overflow problem. There is no version number where it goes to zero. It is the same three-ingredient recipe behind every agent data leak — the pattern Simon Willison named the lethal trifecta, now wearing a browser chrome.

Read the mitigations as a confession#

Here is the tell. Look at what the industry actually shipped once the demos landed: confirmation prompts before consequential actions, outbound-domain allowlists, "agent mode" toggles you turn on deliberately, and the standing advice not to give the agent access to your inbox. Every single one of these makes the browser less agentic. They work by subtracting exactly the autonomy that was supposed to be the point.

That is the honest end state, and it is worth saying plainly to anyone building in this space. If you are shipping agentic browsing, do not treat page content as data your model can safely "read." Treat it as hostile input, always. Keep the credentialed session and the untrusted content in separate trust boundaries so an instruction from a web page structurally cannot reach a privileged action. Allowlist egress so a leaked secret has nowhere to go. And keep a human on the trigger for anything that sends, buys, or changes an account — not as a courtesy, but because it is the only part of the design that an injected sentence cannot overrule.

The AI browser was sold as the assistant that finally acts on your behalf. The last year taught us the uncomfortable corollary: an assistant that can act on your behalf can be talked into acting against you, and the talking is the easy part.