In June 2026, a small security startup called depthfirst pointed an autonomous AI agent at FFmpeg — the media library quietly decoding video inside your browser, your phone, and half the servers on the internet — and let it read. It scanned roughly 1.5 million lines of C. It came back with 21 zero-day vulnerabilities, each with a working proof-of-concept input. One of them, a stack overflow in a service-description-table parser, had been sitting in the code since 2003. The agent found in an afternoon what twenty-three years of human eyes had missed.
The number that should stop you is not 21. It is the price tag: about $1,000. That is what it cost to find every one of them.
Discovery just became a line item
For thirty years, finding a serious vulnerability was the expensive, scarce, prestigious part of security. It took a skilled researcher weeks of staring, fuzzing, and reverse-engineering, and the payoff — a single named CVE — was worth a conference talk. That economy is over. depthfirst's run is not a lab curiosity; it is the third act of a trend. Google's Big Sleep has reported twenty-plus real bugs in widely-used software, including a critical SQLite flaw (CVE-2025-6965) that threat actors already held and were about to fire — the first time an AI agent cut off a live exploit before it landed. Anthropic's Mythos pulled a sixteen-year-old H.264 bug out of the same FFmpeg codebase for around $10,000.
When the cost of discovery falls this far, discovery stops being the thing you ration. It becomes something you run continuously, cheaply, against everything. The 2026 CVE forecast has crept toward 66,000; June's Patch Tuesday set a record at 198 fixes, one of them a zero-day reported by a coding model. The pipe is filling faster than it ever has.
The bottleneck moved. It did not disappear.
Here is the part the headlines miss. A vulnerability is not handled when it is found. It is handled when a human confirms it is real, writes a fix that doesn't break something else, gets that fix reviewed, and ships it to every downstream project that embeds the code. AI compressed the first step by three orders of magnitude and did nothing for the rest. Confirmation, patching, and distribution are exactly as slow, as human, and as underfunded as they were in 2015.
So when depthfirst dropped 21 reports on FFmpeg's volunteer maintainers, it didn't hand them a win. It handed them 21 simultaneous emergencies to reproduce, patch, and coordinate — while every application that ships FFmpeg waits for a build. The scarce resource was never the finding. It was the person on the other end who has to do something about it.
The cost of finding a zero-day fell to a thousand dollars. The cost of disposing of one is still a favor a volunteer does on a weekend.
The same capability, pointed the other way
Now watch that identical technology run in reverse. In early 2026, curl — one of the most-deployed pieces of software on Earth — ended its bug bounty program. Not because it ran out of bugs, but because it drowned in reports of bugs that weren't there. Maintainer Daniel Stenberg's summary was blunt: the share of accurate submissions had collapsed to roughly one in twenty or thirty, and in years of AI-generated reports, not one had found a genuine vulnerability. He called it AI DDoSing open source.
What makes AI slop insidious is precisely what makes AI discovery powerful: fluency. A slop report uses the right terminology, references real functions and code paths, and describes a plausible attack. It looks exactly like the depthfirst report — until you try to reproduce it and there is nothing underneath. And because it looks real, a maintainer cannot ignore it. Every fake still costs the same hours of human triage as a genuine find. The fakes tax the exact pipeline that the real finds are already overloading.
That is the whole trap. Both the signal and the noise now arrive at machine speed, wearing the same clothes, at the same human desk.
What actually has to change
The tell in the FFmpeg story is easy to miss: depthfirst shipped a working proof-of-concept for every one of its 21 findings. That is the line between a discovery and a slop report: not the prose, the repro. The near-term fix for the flood is not smarter finders; it is making disposition cheaper and making unverified reports worthless. That means a hard requirement for a machine-runnable reproduction before any report enters a human's queue, triage tooling that confirms crashes automatically, and, the unglamorous part, actually paying the maintainers who sit at the chokepoint, because those maintainers now spend their time warning submitters about AI false positives on FFmpeg's own security page instead of writing patches.
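To make the "repro before queue" rule concrete, here is a minimal sketch of an automated gate, not a description of any tooling FFmpeg or depthfirst actually uses. It assumes a POSIX host, a target binary that takes the proof-of-concept file as its last argument, and a build that may have AddressSanitizer enabled. The names `run_repro`, `admit_to_human_queue`, and the `poc_path` report field are hypothetical.

```python
import signal
import subprocess
from dataclasses import dataclass
from typing import Sequence


@dataclass
class TriageResult:
    reproduced: bool
    reason: str


def run_repro(target_cmd: Sequence[str], poc_path: str,
              timeout_s: float = 10.0) -> TriageResult:
    """Run the target on the submitted PoC and report whether it crashed.

    Assumption: the target accepts the PoC path as its final argument.
    """
    try:
        proc = subprocess.run([*target_cmd, poc_path],
                              capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # A hang may be a real bug, but it is not a confirmed crash.
        return TriageResult(False, "timeout")

    # On POSIX, a negative return code means the process died from a signal.
    if proc.returncode < 0:
        try:
            name = signal.Signals(-proc.returncode).name
        except ValueError:
            name = f"signal {-proc.returncode}"
        return TriageResult(True, f"killed by {name}")

    # Sanitizer builds usually exit normally but print a report to stderr.
    if b"AddressSanitizer" in proc.stderr:
        return TriageResult(True, "sanitizer report on stderr")

    return TriageResult(False, f"clean exit (code {proc.returncode})")


def admit_to_human_queue(report: dict, target_cmd: Sequence[str]) -> bool:
    """Admit a report only if it ships a PoC that actually reproduces."""
    poc_path = report.get("poc_path")  # hypothetical report field
    if not poc_path:
        return False  # prose without a repro never reaches a maintainer
    return run_repro(target_cmd, poc_path).reproduced
```

A gate like this would have to run inside a sandbox, since it executes untrusted input by design. A crash also only proves a bug exists, not that it is exploitable, so the gate filters out slop without replacing human judgment on severity.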
And there is an asymmetry no defensive framing can wish away: the attacker gets the same $1,000 agent. Cheap discovery is neutral, and a found flaw is only one step from a weaponized one, the same short hop from prompt injection to remote code execution that keeps turning agent conveniences into exploits. Cheap discovery becomes a defensive advantage only if the found bug gets triaged and patched before the identical bug reaches someone who wants to use it. When finding is symmetric, the entire contest moves downstream, to the side humans still own and have not yet learned to staff at machine speed.
The $1,000 zero-day is a gift and a threat wearing the same face. Which one it turns out to be is decided long after the agent finishes reading.



