
AI vulnerability reporting fails developers

Google and others are inundating OSS maintainers with AI-driven reporting. Are AI-enabled fixes the answer? 

Paul Roberts, Director of Content and Editorial at ReversingLabs

Last week, the X account for FFmpeg, the leading open-source multimedia framework, posted a brief — but telling — observation: “AI generated bug reports without patches are a new challenge for projects built almost entirely by volunteers like FFmpeg.”

What prompted that post: a massive reordering of a decades-old security landscape, driven by artificial intelligence. Google, OpenAI, Meta, and other companies have been aggressively pursuing efforts to automate the discovery of software flaws. In 2024, for example, Google’s Project Zero team introduced Project Naptime, an effort to develop a framework for LLM-assisted software security testing (“fuzzing”) and vulnerability discovery.

Here’s what you need to know about how AI vulnerability reporting is failing OSS maintainers — and what can be done about it.

See webinar: How Finding Software Tampering Differs From Vulnerabilities

How AI is transforming vulnerability hunting

Throughout 2024, Google announced a series of significant vulnerabilities discovered with AI and passed them along to open-source project maintainers. They included a vulnerability in the critical OpenSSL library (CVE-2024-9143) and an exploitable stack buffer underflow in the SQLite database engine, another widely used piece of open-source software.

As ZDNet reported, OpenAI unveiled Aardvark last month, describing it as “a new agentic security researcher.” Powered by OpenAI’s GPT-5, Aardvark monitors commits and changes to codebases, identifying vulnerabilities and assessing how they could be exploited. It can also propose code fixes for human developers to implement. 

One clear advantage of such tools is that they do not simply automate traditional program analysis techniques such as code fuzzing or software composition analysis (SCA). Rather, they lean into ML-powered reasoning to spot flaws that human security researchers would be unlikely to detect. 

As Google noted, the OpenSSL flaw (CVE-2024-9143) discovered by its AI “has likely been present for two decades and wouldn’t have been discoverable with existing fuzz targets written by humans.” AI excels at analyzing the many different code paths and states that can yield different flavors of vulnerabilities, Google said.
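
To make the “fuzz targets” in that quote concrete, here is a minimal sketch of what a human-written harness typically looks like. It uses Google’s open-source Atheris fuzzer for Python purely for illustration; the parse_header() routine and its planted bug are made up, and real targets for C libraries such as OpenSSL follow the same pattern with libFuzzer.

```python
# Minimal fuzz-target sketch (hypothetical). A coverage-guided fuzzer
# repeatedly calls test_one_input() with mutated byte strings and flags
# any uncaught exception (or, for native code, a sanitizer report).
import sys

import atheris


def parse_header(data: bytes) -> None:
    """Made-up stand-in for a library routine under test."""
    if len(data) >= 4 and data[:2] == b"\x7fE":
        declared_len = data[2]
        # Planted bug: trusts the declared length without checking it
        # against the actual buffer size, so short inputs blow up.
        _ = data[4 + declared_len]  # IndexError on crafted inputs


def test_one_input(data: bytes) -> None:
    try:
        parse_header(data)
    except ValueError:
        pass  # expected, well-handled errors are not findings


if __name__ == "__main__":
    atheris.instrument_all()                 # enable coverage feedback
    atheris.Setup(sys.argv, test_one_input)
    atheris.Fuzz()
```

The limitation Google is pointing to is built into this approach: a harness like this only exercises the code paths its author thought to wire up, whereas the AI-generated targets behind the OpenSSL find reached states that no existing human-written harness covered.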

What that means, experts agree, is that developers and open-source maintainers should get ready for a flood of new code vulnerability reports created by AI-powered vulnerability analysis tools. And that’s on top of the more than 200,000 (mostly) human-discovered CVEs registered over the past decade.

Embarrassment of glitches

That’s good news, right? After all, more transparency into commercial and open source code will shine a light on exploitable flaws. As the FFmpeg X post shows, however, a flood of AI-powered vulnerability reports is already overwhelming the small population of mostly unpaid maintainers of open source code responsible for assessing discovered flaws and implementing code fixes. So the news isn’t entirely good.  

Katie Moussouris, founder and CEO of Luta Security, wrote about the problem in a recent post on LinkedIn. 

AI’s real value to the security community lies in its ability not only to automate vulnerability discovery, but also to speed up the formulation and testing of code fixes, Moussouris wrote. “Humans will still have to verify for a while to ensure reliability,” she said.

Moussouris is not the only security expert calling out the AI-induced bottleneck. In a recent ConversingLabs podcast, Bugcrowd founder Casey John Ellis said the biggest challenge AI poses to the open-source community is not its ability to find bugs and propose fixes, but the burden on maintainers of triaging and testing those fixes.

The hard part is on the shoulders of the maintainer trying to figure out, like, ‘How do I test this code?’ If they’re getting jackhammered with this stuff from everyone with this type of tool across the internet, then you’re gonna end up with a triage issue on their end.

Casey John Ellis

The right tools — in the wrong hands

Even as it overwhelms open-source maintainers with new vulnerability reports, AI is empowering the bad guys to delve into code looking for weaknesses, Ellis said. 

“I think what [AI] is doing is reducing the ‘You must be this tall to ride’ requirement for attackers,” Ellis said. “It’s gotten people to a place where they can get to impact — they can get to vuln discovery or even exploitation — without having to have gone through the 10 years of just being immersed in compute or the CS degree. … And this is bug-bounty hunters. It’s vuln researchers. It’s the bad guys.”

Can AI also be used to fix the flaws?

With more weight on an already strained vulnerability ecosystem, developers and development organizations need new tools and resources to scale vulnerability discovery, patch development, testing, and deployment.

Here again, AI can help. In September, for example, academic researchers unveiled CVE-Genie (PDF), an AI-enabled framework for automating the work of validating software vulnerabilities. And in October, Google unveiled CodeMender, an experimental platform that leverages Google’s Gemini Deep Think models to power an autonomous agent capable of debugging and fixing complex vulnerabilities. In just the last six months, CodeMender has pushed 72 security fixes to open-source projects, Google said. 
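
What “automating the work of validating software vulnerabilities” could mean in practice is easier to see with a simplified sketch: before an AI-generated report ever reaches a maintainer, a gate rebuilds the project with sanitizers and tries to reproduce the crash using the proof-of-concept input attached to the report. This is a hypothetical illustration, not how CVE-Genie or CodeMender actually works; the binary path, PoC file, and escalation messages are placeholders.

```python
"""Hypothetical reproduce-before-triage gate for AI-generated bug reports.

Assumes the report ships a proof-of-concept input file and names a target
binary built with AddressSanitizer. Placeholder paths throughout.
"""
import subprocess
from pathlib import Path


def reproduces(target_binary: Path, poc_input: Path, timeout_s: int = 30) -> bool:
    """Run the sanitizer-instrumented binary on the PoC and check for a crash."""
    try:
        result = subprocess.run(
            [str(target_binary), str(poc_input)],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return False  # hangs go to a separate, lower-priority queue

    sanitizer_fired = "ERROR: AddressSanitizer" in result.stderr
    killed_by_signal = result.returncode < 0  # e.g., SIGSEGV on POSIX
    return sanitizer_fired or killed_by_signal


if __name__ == "__main__":
    poc = Path("reports/incoming/poc-1234.bin")    # placeholder path
    asan_build = Path("build/asan/target_binary")  # placeholder path
    if reproduces(asan_build, poc):
        print("Reproduced: escalate to a maintainer with the crash log.")
    else:
        print("Did not reproduce: hold for a human spot-check; don't page anyone.")
```

The point of a gate like this is to spend machine time, rather than maintainer time, separating reproducible findings from the noise Ellis describes.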

Tools like CodeMender and CVE-Genie, once deployed, will allow software developers and maintainers to focus on “building good software,” Google claims. But experts warn that doing so demands resources — both financial and human — that are lacking.

Jasmine Noel, a senior product marketing manager at ReversingLabs, said that regardless of how many or how quickly new vulnerabilities are found, the pushback from overworked and stressed developers will be the same.

Rather than using AI to strap a rocket to the current vulnerability discovery process, tech firms should use it to “flip the switch,” Noel said. That would help developers and security teams focus their energy and attention on the flaws in their own code and environments that are actually being weaponized with malware, and ferret out malicious or suspicious behavior and code changes before deployment.

While the exact impact of AI-powered tooling remains to be seen, one thing is for sure: Continuing with the same approach to vulnerability management isn’t going to produce different results.

Jasmine Noel