The security library, read in order: the threat models (OWASP for LLMs and MCP), the attacks (prompt injection, its escalation to remote code execution, tool poisoning, SSRF), the isolation that contains them (sandboxes and microVMs), the identity and secrets an agent carries, and the defensive and testing tooling that hardens it.
OWASP's Top 10 for LLMs reads like a model-safety checklist. Read it again: most of the ten are not the model misbehaving; they are your architecture trusting the model too much. Agents make exactly those entries worse.
OWASP now has a third Top 10 — one scoped to a single protocol. The surprise isn't a new class of AI attack; it's that connecting an agent to MCP servers re-exposes 2010-era web and supply-chain bugs through a channel that auto-executes them.
You cannot patch prompt injection out of a model. The defenses that actually hold treat it as an architecture problem — and start by taking away what a hijacked agent could do.
A classifier that blocks 98% of injections sounds like a fix. Against an attacker who can retry, a nonzero bypass rate isn't a wall — it's a toll. The defenses with real guarantees don't detect the bad instruction at all; they cap what any instruction is allowed to cause.
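The toll can be put in numbers. This is a minimal sketch, assuming every attempt is independent and the classifier blocks a fixed 98% of them; both are simplifications, and the function names are illustrative only.

```python
import math


def bypass_probability(block_rate: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent tries gets through."""
    return 1.0 - block_rate ** attempts


def attempts_for(block_rate: float, target: float) -> int:
    """Smallest number of tries whose bypass probability reaches `target`."""
    return math.ceil(math.log(1.0 - target) / math.log(block_rate))
```

Under those assumptions, `attempts_for(0.98, 0.5)` is 35 and `bypass_probability(0.98, 100)` is about 0.87: even odds after a few dozen retries, and a likely bypass after a hundred.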
Jailbreaks and prompt injection get used as synonyms, and that confusion is why teams 'add a guardrail' and stay wide open. A jailbreak attacks the model's policy; prompt injection attacks your application's trust boundary.
Three critical 2026 CVEs — in ModelScope's MS-Agent, Microsoft's Semantic Kernel, and Cursor — share one root cause. The agent filtered the command it was about to run. It never controlled the ground that command would run on.
An autonomous agent found 21 genuine zero-days in FFmpeg for about $1,000. The same technology just made curl kill its bug bounty. Discovery got cheap; disposition didn't.
The worst MCP attacks aren't bugs in a server's code — they're features of a trust model that drops every tool's description into one undifferentiated context. Here's the threat map, and the defenses that actually hold.
The most common serious flaw in MCP servers isn't prompt injection. It's SSRF — the boring, pre-AI bug that sank Capital One — and we just installed it by the thousand.
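For a sense of what the missing check looks like, here is a hedged sketch of a pre-fetch guard for a tool that fetches caller-supplied URLs. It is not taken from any MCP SDK. The names `is_internal`, `resolve`, and `is_safe_url` are hypothetical, and the list of blocked ranges is an assumption about what "internal" means for your network.

```python
import ipaddress
import socket
from urllib.parse import urlsplit


def is_internal(addr: str) -> bool:
    """True for addresses a server-side fetch should never reach."""
    ip = ipaddress.ip_address(addr)
    # Unwrap IPv4-mapped IPv6 (::ffff:127.0.0.1) so it is judged as IPv4.
    if ip.version == 6 and ip.ipv4_mapped is not None:
        ip = ip.ipv4_mapped
    return (ip.is_private or ip.is_loopback or ip.is_link_local
            or ip.is_reserved or ip.is_multicast or ip.is_unspecified)


def resolve(host: str) -> list[str]:
    """Every address the host resolves to (IP literals map to themselves)."""
    infos = socket.getaddrinfo(host, None, proto=socket.IPPROTO_TCP)
    return [info[4][0] for info in infos]


def is_safe_url(url: str, resolver=resolve) -> bool:
    """Allow only http(s) URLs whose host resolves to public addresses."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.hostname:
        return False
    try:
        addrs = resolver(parts.hostname)
        # One internal address among many is enough to refuse.
        return bool(addrs) and not any(is_internal(a) for a in addrs)
    except (OSError, ValueError):
        return False
```

With this guard a request for the cloud metadata address `http://169.254.169.254/` is refused before any connection is made. A check like this is only part of the defense: if the fetch resolves the hostname again afterwards, DNS rebinding can swap in an internal address, so the request should connect to the address that was vetted.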
Agents that write their own code forced an old infrastructure question back into the open — where, exactly, does the security boundary live, and what does it cost to drop it a layer lower?
Three ways to keep an agent's untrusted code off your host kernel — and why the right choice is a triangle of compatibility, cold-start speed, and operational weight, not a security ranking.
The choice isn't speed versus security. It's whether the model is writing code that orchestrates your tools or code that needs the whole operating system — and that picks the security model for you.
For a normal service the threat is a static key leaked to a repo. For an agent the sharper threat is the agent itself being talked into reading its own environment and handing the key to an attacker.
An agent needs two identities at once — proof it is itself, and proof of whose authority it's borrowing right now — and the dangerous failures all live at the seam between them.
The hard part of remote MCP auth was never the login. It's proving a token was minted for *your* server and no one else's — the audience claim that turns a friendly proxy back into a locked door.
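The audience check itself is small. The sketch below assumes the token's signature and expiry have already been verified by a JWT library and that `claims` is the decoded payload. The server identifier `https://mcp.example.com` is a placeholder, and `audience_matches` is a hypothetical helper, not part of any MCP SDK.

```python
def audience_matches(claims: dict, expected: str) -> bool:
    """Accept a token only if it was minted for this server.

    The JWT `aud` claim may be a single string or a list of strings,
    so both shapes are handled. A missing claim is a rejection.
    """
    aud = claims.get("aud")
    if isinstance(aud, str):
        return aud == expected
    if isinstance(aud, (list, tuple)):
        return expected in aud
    return False


# Hypothetical identifier for this server; an assumption for illustration.
THIS_SERVER = "https://mcp.example.com"
```

A token carrying `{"aud": "https://other.example.com"}` is valid in every other respect and is still refused here. That refusal is what stops a server from accepting, or forwarding, a token meant for someone else.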
Between two spec revisions in 2025, MCP servers quietly stopped being their own authorization servers. The one parameter that change forces your client to send is the whole security story.
For 25 years the web tried to detect bots by behavior and kept losing. Web Bot Auth gives up on detection and asks the bot to sign its name instead — and the big agent makers have already started doing it.
Three open-source tools promise to catch prompt injection before it reaches your agent. Their GitHub status pages tell you more about whether detection works than any benchmark does.
They get filed together as "LLM guardrails," but they guard three different things — format, flow, and content. Picking by stars gets you a tool that protects the wrong layer.
Three open-source tools dominate LLM red teaming — but they aren't rivals. One scans a model, one is a framework for building attacks, one is a CI gate. Pick by layer.
Three ways to scrub names, card numbers, and patient IDs out of a prompt before it reaches a model provider. The hard part isn't detection — it's whether you can ever put the data back.
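To make "putting the data back" concrete, here is a toy sketch of reversible redaction. It assumes detection is already solved and stands in a deliberately naive email regex for it. The placeholder format and the names `redact` and `restore` are illustrative, not any particular tool's API.

```python
import re

# Naive stand-in for a real detector; an assumption for illustration only.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def redact(text: str) -> tuple[str, dict[str, str]]:
    """Replace each distinct email with a stable placeholder.

    Returns the scrubbed text plus a vault mapping placeholder -> original.
    The vault stays on your side and is never sent to the provider.
    """
    vault: dict[str, str] = {}
    seen: dict[str, str] = {}

    def swap(match: re.Match) -> str:
        value = match.group(0)
        if value not in seen:
            token = f"<EMAIL_{len(seen) + 1}>"
            seen[value] = token
            vault[token] = value
        return seen[value]

    return EMAIL.sub(swap, text), vault


def restore(text: str, vault: dict[str, str]) -> str:
    """Put the original values back into a response."""
    for token, value in vault.items():
        text = text.replace(token, value)
    return text
```

Restoration only works if the model echoes the placeholders unchanged and the vault survives the round trip. One-way masking gives up both, which is the trade the teaser points at.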