OWASP tackles AI risk in bold new push

The Open Worldwide Application Security Project now includes an Agentic Top 10, an AI testing guide, and an AI vulnerability scoring tool.

John P. Mello Jr., Freelance technology writer.

As 2025 comes to a close, the Open Worldwide Application Security Project Foundation has ramped up its efforts to provide security teams with tools to secure AI deployments within their organizations, releasing a top 10 list of security risks for agentic AI, an AI testing guide, and a web-based, dynamic AI vulnerability assessment tool.

The OWASP GenAI Security Project said the Top 10 for Agentic Applications 2026 is a globally peer-reviewed framework that identifies the most critical security risks facing autonomous and agentic AI systems. Scott Clinton, co-chair and co-founder of the AI security project, said, “The top 10 is a culmination of hundreds of people’s contributions to get us to the state where we can realistically and confidently give a set of top 10 risks and mitigation recommendations.”

What’s great about the list is it ties back to all of the other research and practical guidance we’ve given over the last year. So it’s really not just giving people a list and set of mitigations, like a typical top 10 list starts out with. We already have a set of goals and a set of tools that they can start to use immediately to start to build this.
Scott Clinton

Here’s what you need to know about the new AI risk initiatives from OWASP — and why they matter.

See webinar: AI Redefines Software Risk: Develop a New Playbook

What’s inside the new Top 10 for Agentic Applications?

AS01 — Agent Goal Hijack: Attackers manipulate an agent’s natural-language input to alter its goals, exfiltrate data, manipulate outputs, or hijack workflows. “LLMs [large language models] can’t reliably distinguish instructions from the related content or context,” Clinton said. “So as we start to see more agents interacting with each other to drive automation, this opens up some big challenges for organizations as they start to roll these out. That’s why it’s a top risk.”

AS02 — Tool Misuse & Exploitation: Agents misuse legitimate tools via prompt manipulation or privilege control, leading to unsafe operations or data leaks. “Tools can be misused in ways that weren’t intended by your data workflow,” Clinton said. “Threat actors can use them to extract and exfiltrate data. They can use them to hijack processes to get access to identities.”

AS03 — Identity & Privilege Abuse: Weak scoping and dynamic delegation allow privilege escalation and impersonation through cached credentials or indirect commands. “What we’ve done is look at identity and privilege abuse in the context of agentic AI,” Clinton said. “For example, there is no discoverability for agents, no way to say that this is an authorized agent. So we came up with a standard that’s being adopted by companies such as GoDaddy from an agent discovery perspective.”

AS04 — Agentic Supply Chain Vulnerabilities: Poisoned or impersonated tools, prompts, models, or external connections compromise agents via unverified dependencies. “If you look around, many companies are supplying you with agents for you to integrate into your applications,” Clinton said. “So now as you think about supply chain security, you have a whole new category of things that you have to worry about, not just libraries."”

AS05 — Unexpected Code Execution (RCE): Unsafe code generation, deserialization, or shell execution can be triggered by crafted prompts or poisoned inputs. “These agents have the ability to generate automation scripts,” Clinton said. “That’s a big change when you think about threat vectors. Now you’ve got folks coming in as threat actors who can drive agents to generate code that can help them infiltrate and exfiltrate the information they’re looking for.”

AS06 — Memory & Context Injection: Adversaries poison RAG stores or context windows to plant false knowledge, bias logic, or trigger risky behaviors. “With agentic AI, we’re looking at different concepts of memory,” Clinton said. “We’re looking at memory within the agent and memory across multiple agents. The idea of contextual memory is something that is a big risk that we are working to find the best practices to help address.”

AS07 — Insecure Inter-Agent Communication: Lack of encryption or authentication enables impersonation, message tampering, or multi-agent exploits. “Communication between different agents is really critical because it can be altered in a way that benefits a threat actor,” Clinton said. “So a person deploying or developing these applications wants to ensure communication between agents is done securely.”

AS08 — Cascading Failures: A single fault or attack propagates across agents via shared memory, tools, or communication channels. “Because we’re giving agents more agency to do things autonomously, that accelerates cascading failures,” Clinton said. “So where we had cascading failures before that may be more controlled because they’re happening through well-understood implementations and architectures, now we’re talking about agents who may be interacting in a way that we don’t have full visibility into.”

AS09 — Human-Agent Trust Exploitation: Attackers exploit user over-trust through deception, impersonation, or fake explanations to gain unsafe approvals. “If you think about a chain of approvals, typically there’s a person in that approval cycle,” Clinton said. “When you’re putting agents in place of that human, that opens up potentially unsafe situations where you can end up with agents approving fraudulent agreements.”

AS10 — Rogue Agents: Malicious agents evolve self-intelligence, replicate, or hijack ecosystems, causing autonomous failures or attacks. “If a bad actor can tell an agent whatever they want them to do, then that’s a big issue,” Clinton said. “I don’t expect us to see that in the near term, but I think, longer term, that becomes more of a risk.”

OWASP’s trifecta on AI risk

In addition to the agentic AI Top 10, OWASP has released a 250-page AI testing guide. “Artificial intelligence has shifted from an innovative technology to a critical component of modern digital infrastructure,” OWASP said.

AI systems now support high-stakes decisions in health care, finance, mobility, public services, and enterprise automation, OWASP said. “As these systems grow in reach and autonomy, organizations need a standardized and repeatable way to verify that AI behaves safely as intended.”

The OWASP AI Testing Guide fills this gap by establishing a practical standard for trustworthiness testing of AI systems, offering a unified, technology-agnostic methodology that evaluates not only security threats but the broader trustworthiness properties required by responsible and regulatory-aligned AI deployments.
OWASP

Jeff Williams, a former chair of the OWASP board and CTO and co-founder of Contrast Security, said OWASP is leading the charge to help organizations take advantage of AI technologies without accidentally becoming wide-open targets. “In the early days of OWASP, we created several very useful projects to help organizations adopt web technologies. All those projects are still thriving today and have spawned numerous child projects,” he said.

OWASP is following the same pattern for AI technologies. The testing guide is important because it makes it possible to test these systems at scale. By bringing experts together to make this knowledge accessible and public, we can rapidly accelerate the state of the art — and it’s never been more critical.
Jeff Williams

OWASP’s AI Testing Guide: A foundation to build on

Frank Balonis, CISO and senior vice president of operations at Kiteworks, said OWASP’s AI Testing Guide is a step in the right direction. “The 32 standardized test cases give teams something concrete to work with,” he said. “The regulatory alignment with the EU AI Act and NIST AI RMF is genuinely useful for board conversations and customer questionnaires” — as a starting point.

In all, it’s a foundation worth building on. [However], any CISO treating this as a checkbox exercise rather than a starting point for continuous adaptation is setting themselves up for a painful lesson.
Frank Balonis

Rosario Mastrogiacomo, chief strategy officer at Sphere Technology Solutions, praised the guide for its structured method for evaluating systems that do not behave like traditional software.

AI systems make context-dependent decisions, learn from interactions, and can take actions that aren’t explicitly coded. Until now, most teams have lacked a consistent approach for testing these behaviors. The guide brings needed discipline to an environment where AI is being adopted rapidly and often without a clear understanding of how it might fail, be manipulated, or create unintended consequences.
Rosario Mastrogiacomo

However, Larry Maccherone, founder and CTO of Transformation.dev, said agentic AI doesn’t need another laundry list of things to test for. “It needs one commandment: An agent must never have more access than the human it represents. Every action must be attributable to both, and the underlying non-AI access control model must be vulnerability-free,” he said.

The guide is important mostly because it exposes a problem. We’re drowning in exhaustive laundry lists that no real security team can possibly execute. Security teams don’t need another encyclopedia of threats. They need a short list of the few things that actually matter.
Larry Maccherone

How AIVSS targets agentic AI risk

AI agents are the next big thing coming from AI, for a variety of reasons. For security teams looking to get a handle on the risk level of AI agents used by their organizations, the foundation has developed the OWASP AI Vulnerability Scoring System (AIVSS).

Contrast’s Williams said the AIVSS assessment tool is an extension to the widely used CVSS standard and tool. “Like CVSS, the AIVSS allows people to analyze AI risks across a set of base factors,” he said.

The AIVSS adds a new set of 10 factors specifically designed to help understand the risk associated with AI. For example, there’s a factor that covers whether the AI has the capability for self-modification. “When you set all the factors, you get a contextual score from 0 to 10, where 0 is not a risk and 10 is ‘Absolutely drop everything critical.’ This allows teams to prioritize their work in AI security and focus on what really matters,” Williams said.

Kiteworks’ Balonis said OWASP’s AIVSS is a good starting point for solving a real problem because security teams have lacked any consistent methodology for measuring AI risk exposure across their portfolios. “A common scoring framework will help organizations prioritize vulnerabilities across different AI implementations,” he said, “though any standardized approach can strain when it hits the complexity of actual enterprise environments with their unique architectures and use cases.”

The agentic AI focus is particularly relevant given how many organizations are rushing autonomous systems into production without fully grasping how tool execution and multi-agent chains expand their attack surface. Quantifying those risks accurately is harder than any scoring calculator can fully capture, but having a shared language and baseline metrics beats the alternative of every organization inventing their own approach from scratch.
Frank Balonis

However, Transformation.dev’s Maccherone, said he doubts that security teams will be flocking to use the tool because it has inherent problems. “Real teams don’t have the resources to score 200 tiny risks,” he said.

Ironically, the scoring may obscure the fact that almost all the meaningful AI risk comes from one class of failure — misaligned identity and overly broad access. You don’t need a score to tell you the truth. If your agents can do more than the user they represent, you’ve already lost.
Larry Maccherone

Maccherone said one of the problems with vulnerability scoring systems is that they optimize for coverage over impact. “They treat all risks as peers when, in practice, a few dominate everything else. AIVSS risks turning AI security into theater, meticulously scoring dozens of edge cases while the real danger gets 1/100th of the focus it needs.”

Sphere Technology’s Mastrogiacomo said he believes the OWASP guide and AVISS tool will help standardize AI security assessments.

AI security today is fragmented, with each organization — and sometimes each team — using its own definitions and methods for evaluating risk. The guide and tool establish a common baseline for what should be tested, how it should be tested, and how risks should be interpreted. This shared framework makes discussions between engineers, security teams, auditors, and regulators more coherent.
Rosario Mastrogiacomo

Keep learning

Get up to speed on the state of software security with RL's Software Supply Chain Security Report 2026. Plus: See the the webinar to discussing the findings.
Learn why binary analysis is a must-have in the Gartner® CISO Playbook for Commercial Software Supply Chain Security.
Take action on securing AI/ML with our report: AI Is the Supply Chain. Plus: See RL's research on nullifAI and watch how RL discovered the novel threat.
Get the report: Go Beyond the SBOM. Plus: See the CycloneDX xBOM webinar.

Explore RL's Spectra suite: Spectra Assure for software supply chain security, Spectra Detect for scalable file analysis, Spectra Analyze for malware analysis and threat hunting, and Spectra Intelligence for reputation data and intelligence.

Tags:AppSec & Supply Chain Security