Last week, the X account for FFmpeg, the leading open-source multimedia framework, posted a brief — but telling — observation: “AI generated bug reports without patches are a new challenge for projects built almost entirely by volunteers like FFmpeg.”
What prompted that post is a massive reordering of a decades-old security landscape, driven by artificial intelligence. Google, OpenAI, Meta, and other companies have been aggressively pursuing efforts to automate the discovery of software flaws. In 2024, for example, Google’s Project Zero team introduced Project Naptime, a framework for LLM-assisted vulnerability research that has since evolved into the company’s Big Sleep agent.
Here’s what you need to know about how AI vulnerability reporting is failing OSS maintainers — and what can be done about it.
Join webinar: Empowering Maintainers to Thrive, Not Just Survive
Through 2024, Google announced a string of significant vulnerabilities discovered with AI and passed them along to open-source project maintainers. They included a flaw in the critical OpenSSL library (CVE-2024-9143) and an exploitable stack buffer underflow in the SQLite database engine, another widely used piece of open-source software.
As ZDNet reported, OpenAI unveiled Aardvark last month, describing it as “a new agentic security researcher.” Powered by OpenAI’s GPT-5, Aardvark monitors commits and changes to codebases, identifying vulnerabilities and assessing how they could be exploited. It can also propose code fixes for human developers to implement.
One clear advantage of such tools is that they do not simply automate traditional program analysis techniques such as code fuzzing or software composition analysis (SCA). Rather, they lean into ML-powered reasoning to spot flaws that human security researchers would be unlikely to detect.
As Google noted, the OpenSSL flaw (CVE-2024-9143) discovered by its AI “has likely been present for two decades and wouldn’t have been discoverable with existing fuzz targets written by humans.” AI excels at analyzing the many different code paths and states that can yield different flavors of vulnerabilities, Google said.
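For context on the “fuzz targets” Google refers to: a fuzz target is a small, human-written harness that feeds fuzzer-generated input into one specific entry point of a library. The minimal libFuzzer-style sketch below, in C, uses a hypothetical parse_record() function as a stand-in for a real library API; it illustrates the limitation Google describes, since the fuzzer can only exercise code paths reachable from the entry points a human thought to wire up.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parser under test: a stand-in for a real library
 * entry point, such as an OpenSSL or SQLite API. */
extern int parse_record(const uint8_t *buf, size_t len);

/* libFuzzer entry point. The fuzzing engine calls this repeatedly
 * with mutated inputs and watches for crashes and sanitizer reports.
 * Coverage is bounded by what parse_record() can reach from here. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    parse_record(data, size);
    return 0;  /* libFuzzer targets should always return 0 */
}
```

Built with Clang’s -fsanitize=fuzzer,address, a harness like this will shake out memory errors on the code paths it reaches; flaws lurking in states that no human-written harness exercises, such as the two-decade-old OpenSSL bug, go unnoticed.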
What that means, experts agree, is that developers and open-source maintainers should brace for a flood of new vulnerability reports generated by AI-powered analysis tools. And that’s on top of the more than 200,000 (mostly) human-discovered CVEs registered over the past decade.
That’s good news, right? After all, more transparency into commercial and open-source code will shine a light on exploitable flaws. As the FFmpeg X post shows, however, a flood of AI-powered vulnerability reports is already overwhelming the small population of mostly unpaid maintainers of open source code responsible for assessing discovered flaws and implementing code fixes.
In a post on X on October 31, FFmpeg called out a patch it issued for a heap buffer overflow discovered by Google’s Big Sleep in an obscure piece of code used to render the 30-year-old game Rebel Assault: "We take security very seriously but at the same time is it really fair that trillion dollar corporations run AI to find security issues on people's hobby code? Then expect volunteers to fix."
FFmpeg posted on November 3:
The core of the debate is Google should send patches. Billions of dollars of AI infrastructure and highly paid security engineers used to pressure volunteers into fixing issues for free.
The FFmpeg posts have spurred a debate among cybersecurity luminaries.
Security researcher Robert Graham, for one, expressed sympathy for FFmpeg's situation, saying the volunteer organization is "justifiably upset," but argued that attackers have many of the same capabilities as Google and will use them to find and exploit software flaws. Google, therefore, should not be discouraged from doing what it can to find and fix those flaws before they fall into the hands of malicious actors.
Robert Graham: "They are just finding them shortly before hackers do, being only six months to a year ahead of hackers figuring out the same AI tricks."
Others disagree. Katie Moussouris, founder and CEO of Luta Security, wrote about the problem in a recent post on LinkedIn, arguing that AI’s real value to the security community lies not only in automating vulnerability discovery, but also in speeding up the formulation and testing of code fixes. “Humans will still have to verify for a while to ensure reliability,” she wrote.
Moussouris is not the only security expert calling out the AI-induced bottleneck. In a recent ConversingLabs podcast, Bugcrowd founder Casey John Ellis said the biggest challenge AI poses to the open-source community is not its ability to find bugs and propose fixes for them, but getting maintainers to triage and test those fixes.
Casey John Ellis: “The hard part is on the shoulders of the maintainer trying to figure out, like, ‘How do I test this code?’ If they’re getting jackhammered with this stuff from everyone with this type of tool across the internet, then you’re gonna end up with a triage issue on their end.”
Even as it overwhelms open-source maintainers with new vulnerability reports, AI is empowering the bad guys to delve into code looking for weaknesses, Ellis said.
“I think what [AI] is doing is reducing the ‘You must be this tall to ride’ requirement for attackers,” Ellis said. “It’s gotten people to a place where they can get to impact — they can get to vuln discovery or even exploitation — without having to have gone through the 10 years of just being immersed in compute or the CS degree. … And this is bug-bounty hunters. It’s vuln researchers. It’s the bad guys.”
With more weight on an already strained vulnerability ecosystem, developers and development organizations need new tools and resources to scale vulnerability discovery, patch development, testing, and deployment.
Here again, AI can help. In September, for example, academic researchers unveiled CVE-Genie (PDF), an AI-enabled framework for automating the work of validating software vulnerabilities. And in October, Google unveiled CodeMender, an experimental platform that leverages Google’s Gemini Deep Think models to power an autonomous agent capable of debugging and fixing complex vulnerabilities. In just the last six months, CodeMender has pushed 72 security fixes to open-source projects, Google said.
Tools like these, once deployed, will allow software developers and maintainers to focus on “building good software,” Google claims. But experts warn that doing so demands resources — both financial and human — that are lacking.
Jasmine Noel, a senior product marketing manager at ReversingLabs, said that regardless of how many or how quickly new vulnerabilities are found, the pushback from overworked and stressed developers will be the same.
Rather than using AI to strap a rocket to the current vulnerability discovery process, tech firms should use it to “flip the switch,” Noel said, helping developers and security teams focus their energy on the flaws in their own code and environments that are actually being weaponized with malware, and on ferreting out malicious or suspicious behavior and code changes before deployment.
Jasmine Noel: “While the exact impact of AI-powered tooling remains to be seen, one thing is for sure: Continuing with the same approach to vulnerability management isn’t going to produce different results.”
Explore RL's Spectra suite: Spectra Assure for software supply chain security, Spectra Detect for scalable file analysis, Spectra Analyze for malware analysis and threat hunting, and Spectra Intelligence for reputation data and intelligence.
Get your 14-day free trial of Spectra Assure