Most of the tooling built to secure AI agents in the last two years watches the door. Prompt-injection classifiers read each message. Guardrail libraries sit in the request path and block the unsafe ones. Runtime monitors log what the agent did and page you when it does something weird. All of it assumes the agent itself is roughly what you think it is, and that the danger arrives as input.

Exabeam's new open-source tool, Praxen, released June 23 under Apache-2.0, starts from the opposite assumption: that the danger is baked into the agent before a single request arrives, and that nobody has actually checked. It calls the idea Agent Behavior Verification — evaluating an agent as a complete system rather than probing it one prompt at a time — and the method is less like a firewall than like a code review that ends in a signed statement of scope.

Declared intent versus the evidence

Praxen runs a comparison. On one side is what it calls a Worker Remit: a markdown document that states the agent's mission, its authorized tools, the channels and counterparties it may talk to, and the actions it must never take. On the other side is the evidence — the source code, the deployment state, the config, the behavioral logs, whatever points at how the agent is actually built. Praxen reads both and reports where the implementation diverges from the declaration.

The interesting move is the Worker Remit itself. You either write it or have Praxen draft one for you, and in doing so you are forced to answer a question most teams have quietly skipped: what is this agent allowed to do? Not what it does — what it is permitted to do. The absence of a remit is not a blank cell in the report; it is the finding. An agent whose authorized scope was never written down cannot be verified against anything, and that gap is exactly where capability drift hides.
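To make the comparison concrete, here is a minimal sketch of the remit-versus-evidence check in Python. It is not Praxen's implementation (Praxen delegates the reading to a coding agent); the `Remit` class, the `diverge` function, and the finding strings are all hypothetical names chosen for illustration, and the "evidence" is assumed to have already been reduced to plain sets of tool and action names.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Remit:
    """Hypothetical, heavily simplified stand-in for a Worker Remit."""
    mission: str
    authorized_tools: set[str] = field(default_factory=set)
    forbidden_actions: set[str] = field(default_factory=set)


def diverge(
    remit: Optional[Remit],
    observed_tools: set[str],
    observed_actions: set[str],
) -> list[str]:
    """Return findings where the observed agent departs from its declared remit."""
    # A missing remit is itself the finding: there is nothing to verify against.
    if remit is None:
        return ["no remit: authorized scope was never declared"]

    findings = []
    # Tools present in the implementation but absent from the declaration.
    for tool in sorted(observed_tools - remit.authorized_tools):
        findings.append(f"capability drift: tool '{tool}' is not authorized")
    # Actions the remit forbids that the implementation can still perform.
    for action in sorted(observed_actions & remit.forbidden_actions):
        findings.append(f"policy divergence: forbidden action '{action}' is implemented")
    return findings
```

Calling `diverge(None, ...)` returns a single finding rather than an empty list, which mirrors the point above: without a declared scope, a clean report would be meaningless.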

The named patterns Praxen looks for read like a catalog of the ways an agent quietly outgrows its brief: policy-implementation divergence, credential exposure, configuration gaps like undetected loops and missing rate limits, capability drift into unauthorized tools or destinations, unpinned dependencies, and — a nice one — "secondary prompt discovery," where an identity or instructions file gets treated as a system prompt nobody reviewed. Findings are tagged against the OWASP Top 10 for LLM Applications 2025, the OWASP Top 10 for Agentic AI Applications 2026, and OWASP's secure-MCP guide when an MCP config is present, then rolled up into a RAISE maturity score: six categories, zero to five.
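One of these patterns, unpinned dependencies, is simple enough to illustrate with a deterministic check. The sketch below is an assumption-laden stand-in, not how Praxen detects it: it assumes a pip-style requirements file, treats only `==` specifiers as pinned, and uses a hypothetical helper named `unpinned`.

```python
import re

# A requirement counts as "pinned" here only if it uses an exact == specifier.
# This is a deliberately strict, simplified rule chosen for illustration.
PINNED = re.compile(r"^[A-Za-z0-9._\-\[\],]+\s*==\s*[^\s;]+")


def unpinned(requirements_text: str) -> list[str]:
    """Return requirement lines that do not pin an exact version."""
    flagged = []
    for raw in requirements_text.splitlines():
        # Drop trailing comments and surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        # Skip blank lines and pip options such as "-r other.txt".
        if not line or line.startswith("-"):
            continue
        if not PINNED.match(line):
            flagged.append(line)
    return flagged
```

For example, given the lines `requests==2.32.3`, `flask>=2.0`, and `numpy`, the helper flags the last two.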


An agent auditing an agent

Here is the part that should make you sit up. Praxen does not ship as a Python library you import. It ships as a plugin for a coding agent — Claude Code or OpenAI Codex. You install it (claude plugin install praxen@open-agent-ai-security), point it at a directory, and tell your coding agent in one sentence to run the analysis. The thing doing the verification is itself a frontier-model agent, reading another agent's code and rendering judgment.

That is either elegant or unnerving depending on your mood, and Praxen is honest enough to print the consequence in its own README: model tier affects RAISE scores, and you should compare only within the same tier. Run the audit with Sonnet and you get one grade; run it with a bigger model and the grade moves. The maturity score is not a measurement in the way a unit-test count is. It is a reading — an informed, OWASP-grounded, reproducible-ish reading, but a reading produced by a language model's judgment about another language model's plumbing.

This is the honest tension at the center of the whole approach, and it's why the framing matters. A Praxen score is a within-team trend line, not a cross-vendor benchmark. Treat it as "did our posture improve since last sprint, graded by the same judge," and it is genuinely useful. Treat it as "our agent scored 4.2, theirs scored 3.8," and you are comparing two different graders' handwriting.
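The "same judge" rule can be enforced mechanically when tracking scores over time. The following is a rough sketch under stated assumptions: the `Audit` record, the `score_delta` function, and the category names in the usage note are hypothetical, and nothing here reflects Praxen's actual report format.

```python
from dataclasses import dataclass


@dataclass
class Audit:
    """Hypothetical record of one audit run."""
    label: str              # e.g. a sprint or release name
    model_tier: str         # the grading model tier used for this run
    scores: dict[str, int]  # category name -> maturity score, 0 to 5


def score_delta(before: Audit, after: Audit) -> dict[str, int]:
    """Per-category change between two audits graded by the same model tier."""
    # Refuse to compare runs graded by different tiers: the grader changed,
    # so any difference in score would not be attributable to the agent.
    if before.model_tier != after.model_tier:
        raise ValueError(
            f"cannot compare '{before.label}' ({before.model_tier}) "
            f"with '{after.label}' ({after.model_tier}): different graders"
        )
    if before.scores.keys() != after.scores.keys():
        raise ValueError("audits cover different categories")
    return {cat: after.scores[cat] - before.scores[cat] for cat in before.scores}
```

With two runs on the same tier, `score_delta` returns something like `{"category_a": 1, "category_b": 0}`; with runs on different tiers it raises instead of producing a misleading number.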

Where it fits

None of this replaces the guardrail in your request path or the monitor on your logs, and it isn't the same thing as treating the agent as an insider threat at runtime. Those catch the live attack; Praxen never sees a live request. What it catches is the structural stuff that runtime tooling is architecturally blind to: the tool your agent was granted and everyone forgot it had, the dependency nobody pinned, the loop with no ceiling. It catches these before deploy, as an audit, with the findings mapped to a standard your auditors already recognize.

The repo ships three worked examples, including a teardown of the real Salesforce Help Agent Accelerator, so you can read what a finished report looks like before pointing it at your own stack. That transparency is the right instinct. Agent Behavior Verification is a young discipline with an obvious soft spot — the grader is a model — but it is asking the question the guardrail crowd skipped: not is this input safe, but is this agent the agent you said it was? For most production agents, nobody has ever written down the answer.