CVE-Genie raises the stakes in the vulnerability management race

While security defenders welcomed the new vulnerability-validation tool, others stress it can be just as useful for would-be attackers.

An AI-enabled framework that automates the process of reproducing software vulnerabilities at scale could transform how security researchers validate flaws and test defenses. On the other hand, it also heightens the risk that attackers could leverage it to speed up exploit development.

The framework, dubbed CVE-Genie, was developed by researchers from Boston University, UC Santa Barbara, Arizona State University, and the University of New South Wales in Sydney as a way to test and evaluate fuzzers, scanners, patch validation systems, and other security tools.

Here’s what defenders need to know about CVE-Genie.

Multi-agent framework marks ‘leap forward’

The system uses multiple AI agents to gather relevant code, advisories, patches, and other Common Vulnerabilities and Exposures (CVE) data, rebuild the vulnerable environment, generate proof-of-concept exploits, and verify that vulnerabilities can actually be triggered as described.

In tests on 841 vulnerabilities disclosed between June 2024 and May 2025, CVE-Genie successfully reproduced 428, or 51% of them, across 22 programming languages and 267 projects, at a cost of a mere $2.77 per CVE. 
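
The reported figures can be sanity-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted above. The article does not say whether the $2.77 average covers every attempted CVE or only the reproduced ones, so the total is computed under the assumption that it applies to all 841 attempts; the variable names are illustrative.

```python
# Figures quoted in the article.
attempted = 841        # CVEs the framework was tested on
reproduced = 428       # CVEs it successfully reproduced
cost_per_cve = 2.77    # reported average cost in US dollars

# Share of attempted CVEs that were reproduced.
success_rate = reproduced / attempted

# ASSUMPTION: the average cost applies to every attempted CVE.
# The article does not specify the denominator.
total_cost = attempted * cost_per_cve

print(f"Success rate: {success_rate:.1%}")
print(f"Estimated total cost under the stated assumption: ${total_cost:,.2f}")
```

Under that assumption, the success rate rounds to the 51% reported, and the full evaluation comes to a few thousand dollars.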

CVE-Genie marks a major leap in automated vulnerability reproduction, said Frankie Sclafani, director of cybersecurity enablement at Deepwatch. The framework’s end-to-end automation, verifiable exploit generation, and broad generalization could redefine industry benchmarks, he predicted.

By transforming vulnerability management from theoretical to empirical, based on provable exploitability, it promises to revolutionize security operations with faster, more data-driven prioritization and remediation.

Frankie Sclafani

New tool is EAGER to impress

The researchers designed CVE-Genie around five principles called EAGER: generating exploits from advisories, assessing exploit effectiveness, generalizing across vulnerability types, enabling end-to-end automation, and rebuilding vulnerable environments.

Structurally, CVE-Genie is composed of a processor, a builder, an exploiter, and a capture-the-flag verifier. The processor gathers all CVE data needed for reproducing a vulnerability, including source code, patches, advisories, and vulnerability description. The builder sets up the vulnerable environment, ensuring correct version and dependencies; the exploiter is the component that produces or reproduces a working exploit in the environment, and the CTF verifier verifies the exploit for correctness.
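
The four-stage structure described above can be pictured as a simple pipeline in which each component enriches a shared record and hands it to the next. The following is a minimal sketch of that shape only, not CVE-Genie's actual code: the `CveContext` class, its field names, the placeholder values, and the `reproduce` function are all hypothetical, and each stage is a stub standing in for work that the real framework delegates to AI agents.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class CveContext:
    """Hypothetical record passed from stage to stage for one CVE."""
    cve_id: str
    advisory: str = ""
    source_ref: str = ""
    environment_ready: bool = False
    exploit: Optional[str] = None
    verified: bool = False
    log: List[str] = field(default_factory=list)


def processor(ctx: CveContext) -> CveContext:
    # Stub: gather source code, patches, advisories, and the description.
    ctx.advisory = f"advisory text for {ctx.cve_id}"
    ctx.source_ref = "vulnerable-version-tag"  # placeholder value
    ctx.log.append("processor")
    return ctx


def builder(ctx: CveContext) -> CveContext:
    # Stub: set up the vulnerable version with the right dependencies.
    ctx.environment_ready = bool(ctx.source_ref)
    ctx.log.append("builder")
    return ctx


def exploiter(ctx: CveContext) -> CveContext:
    # Stub: produce a proof-of-concept only if the environment was built.
    if ctx.environment_ready:
        ctx.exploit = "placeholder-poc"
    ctx.log.append("exploiter")
    return ctx


def ctf_verifier(ctx: CveContext) -> CveContext:
    # Stub: a real verifier would run the exploit and check a flag condition.
    ctx.verified = ctx.exploit is not None
    ctx.log.append("ctf_verifier")
    return ctx


STAGES: List[Callable[[CveContext], CveContext]] = [
    processor, builder, exploiter, ctf_verifier,
]


def reproduce(cve_id: str) -> CveContext:
    """Run the four stages in order and return the final record."""
    ctx = CveContext(cve_id)
    for stage in STAGES:
        ctx = stage(ctx)
    return ctx


result = reproduce("CVE-0000-0000")  # placeholder identifier, not a real CVE
print(result.log, result.verified)
```

Keeping the stages as separate functions over one record mirrors the division of labor in the description: a failure in the builder, for example, leaves `exploit` unset, so the verifier reports the CVE as not reproduced.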

In tests, the researchers found that CVE-Genie worked best at reproducing web vulnerabilities such as XSS, CSRF, path traversal, and SQL injection in languages including Python, JavaScript, PHP, Ruby, and TypeScript. The lead author of the paper, Saad Ullah, a research student at Boston University, told RL Blog that CVE-Genie was especially effective with projects that had clear setup instructions and working proof-of-concept (PoC) exploits.

Conversely, the bugs that CVE-Genie had a harder time reproducing were memory safety bugs, UI-dependent flaws, and concurrency issues. Often, failures also had to do with poor documentation and complex build processes, Ullah said.

A demonstration of feasibility

Security experts say CVE-Genie demonstrates that automated CVE reproduction is feasible at scale when good-quality CVE data is available, especially for interpreted-language and web-style bugs and somewhat less so for complex and compiled projects.

The PoCs it produces can facilitate realistic threat emulation exercises that test an organization’s defensive posture against confirmed real-world threats, Deepwatch’s Sclafani said.

CVE-Genie is an indispensable tool for penetration testers and red team operations due to its capacity for generating verifiable exploits. This exploit simulation offers a more authentic and rigorous evaluation of a security team’s ability to detect, respond to, and remediate live attacks.

Frankie Sclafani

Mayuresh Dani, manager of security research at Qualys, said the ability to reliably reproduce vulnerabilities is crucial to effective security research because it transforms theoretical security flaws into code that can be verified and quantified. The process enables researchers to validate vulnerability claims, develop accurate patches, and create effective defensive measures, Dani said.

These capabilities come at a critical moment; attackers are shrinking the window between when a vulnerability is disclosed and when it is exploited. In 2022, adversaries needed about 32 days on average to launch an attack against a freshly disclosed CVE. By 2023, the lag had dropped to just five days, Google’s Mandiant security team found. Another, more recent study, by SonicWall, showed that when an exploit becomes available for a new CVE, threat actors begin leveraging it in attacks in less than 48 hours.

This has created an urgency for rapid vulnerability assessment and response. If this process is automated or augmented with automated processes, it can reduce complexities, improve response times and in some cases achieve improved results.

Mayuresh Dani

Automation could compound the security challenge

But as useful as CVE-Genie might be for defenders and security researchers, it is an equally handy tool for attackers. And that could be a huge problem.

Open sourcing exploit generation tools will introduce significant security considerations that extend beyond their intended research applications and usage, Dani warned. An automated exploit-generation framework such as CVE-Genie is going to significantly lower the barriers to entry for malicious activities by reducing the technical expertise required to launch sophisticated attacks. “Such tools, [like] a sword, are made for only one purpose. It will only be as good or evil as the one who wields it,” Dani said.

Jeff Williams, CTO and founder of Contrast Security, said CVE-Genie could very likely reduce the time between vulnerability disclosure and weaponization. Currently, attackers only exploit a relatively small percentage of all disclosed vulnerabilities. In fact, less than 1% of all CVEs are ever exploited in the wild, Williams said. 

Imagine the carnage if that number goes up to 50%. No security team anywhere will be able to handle the onslaught.  And no development team will be able to remediate fast enough.

Jeff Williams

It’s also important to note that risk doesn’t come from CVEs alone. Vulnerabilities in custom code often present an even bigger threat, and it’s a certainty that attackers are using AI to explore that space as well, Williams said.

Faster exploits mean defenders need to act quickly

Boston University’s Ullah conceded that tools such as CVE-Genie will likely lead to faster exploitation of vulnerabilities. But the takeaway for defenders is that they need to be more agile than attackers, he said.

Defenders will have to be faster than the bad guys. Defenders need to use this tool to make sure they find those vulnerabilities in their own systems and patch them before the bad guys can do it.

Saad Ullah

Ullah said frameworks such as CVE-Genie underscore the need for organizations to reach the highest maturity levels in their vulnerability management programs. As attackers gain the ability to weaponize flaws ever more quickly, defenders that rely on manual patch cycles or ad hoc processes will find themselves outpaced. Intelligence-driven patching and continuous validation of defenses are going to become increasingly essential, he said.

Also, while tools such as CVE-Genie highlight how quickly vulnerabilities can be turned into working exploits, they also underscore the fact that vulnerability remediation alone is no longer enough to manage software risk. The growing number of vulnerability disclosures, the constantly shrinking exploit windows, and the frequent lack of context and actionable information around CVEs have made it increasingly hard for organizations to remain on top of software threats.

Casey Ellis, founder of Bugcrowd, said that demonstrating CVE exploitability can be useful in the context of reachability analysis or assessment of true, exploitable risk. It can also help defenders separate theoretical vulnerabilities from the ones that are truly exposed and exploitable in their environment. 

But for the better part, a solid asset inventory, configuration management database and/or SBOM is going to be more useful for the task of creating a patch burn-down list. I honestly see [CVE-Genie] as being most powerful as a true offensive capability tool, and potentially for the validation of CVEs in COTS products or software systems where an SBOM isn’t available.

Casey Ellis