Spectra Assure Free Trial
Get your 14-day free trial of Spectra Assure
Get Free TrialMore about Spectra Assure Free Trial
An earlier RL Blog post about the OpenClaw debacle looked at the effects that such AI agent skills — and skills repositories such as ClawHub — can have on the software supply chain. The conclusion: The agentic skills marketplace is a source of risks much the same as npm and the Python Package Index (PyPI) are. Also, the fundamentals of zero trust, provenance vetting, and dependency management still apply.
But OpenClaw and other AI agents also bring novel application security risks that can’t be managed with existing security playbooks and new dangers that security teams haven’t encountered before.
And the problems aren’t all from scrappy open-source projects. This month, Microsoft confirmed that a bug caused its Copilot AI assistant to summarize confidential emails even if data loss-prevention policies explicitly designed to restrict access by automated tools had been applied.
With AI agents expanding attack surfaces and triggering brand-new control problems, AppSec must rethink some basic assumptions about how software behaves so that their threat models and governance remain meaningful.
Here’s why it will be difficult to manage AI agent risk with legacy AppSec tooling — and what you can do until standards and frameworks catch up.
[ See webinar: Trust But Verify: Secure the AI You Build, Buy and Deploy ]
OpenClaw has become the poster child of the risks popping up on the agentic AI landscape, but it is just the most recent example. It’s gotten a lot of attention because it neatly fits the narrative that industry watchers have been telling for months — and because AI agents are being deployed rapidly.
A February 2026 report from the Cloud Security Alliance (CSA) on autonomous AI agents found that, although 40% of organizations already have agents in production, only 18% are highly confident their identity and access management systems can handle them. A recent survey from NeuralTrust found that, while 73% of CISOs are very or critically concerned about AI agent risks, only 30% have mature safeguards in place. And earlier research from the CSA showed that 34% of organizations with AI workloads had already experienced an AI-related breach.
OpenClaw demonstrated that, without the right controls, AI agents don’t respect permissions boundaries. And they execute instructions in unpredictable ways that are nothing like traditional software’s actions, said the security researcher Jamieson O’Reilly, who discovered in OpenClaw hundreds of exposed control servers leaking credentials and backdoors in the top downloaded skill and in many other skills available on the platform. As O’Reilly put it, a notes app shouldn’t be able to delete photos, and a calendar app shouldn’t be able to read someone’s bank statements.
“We’ve had those boundaries for years. Agentic AI blows the walls away. It’s got access to everything, everywhere, at all times.”
—Jamieson O’Reilly
This is particularly dangerous considering the autonomous nature of agentic AI. Traditional software doesn’t act without a predefined trigger or an action taken by a user. Not so with an agent, said Dhaval Shah, senior director of product management at ReversingLabs (RL).
“An agent doesn’t wait for a human to click a button; it proactively executes the malicious intent across any APIs it has access to. We are still figuring out how to build behavioral guardrails that don’t stifle the utility of the agent.”
—Dhaval Shah
Traditional software is deterministic: Give it the same input and you’ll get the same output. This makes it fairly simple to trace execution paths and predict behavior. But AI agents interpret natural language, and that interpretation can vary based on context, phrasing, and model state.
And when a nondeterministic, unpredictable system that acts on its own has access that traditional permission models can’t constrain, you have a big problem, said Graham Neray, co-founder and CEO of the security firm Oso.
“When deterministic code calls APIs, we have decent permissions systems. When humans predictably use tools, we have decent permissions systems. But when autonomous and nondeterministic systems that make decisions based on unstructured inputs call APIs — we’re still figuring that out.”
—Graham Neray
The OWASP Top 10 for Agentic Applications identifies excessive agency — agents granted more permissions than necessary or lacking proper human-in-the-loop controls — as a top risk category. The CSA’s research suggests the problem is widespread: Most organizations still rely on static API keys, username/password combinations, and shared service accounts to authenticate their agents — even though those same credentialing patterns caused problems a decade ago with service accounts and automation scripts.
But even when organizations configure restrictions, there’s no guarantee agents will respect them. The Copilot bug makes that point — and the agent was summarizing those confidential emails undetected for nearly a month.
Organizations need to double down on the principles of zero trust, say many experts, including Alessandro Pignati, an AI security researcher at NeuralTrust and a contributor to the OWASP GenAI Top 10. “Give your AI agent the absolute minimum permissions it needs to do its job. If it only needs to read from one database, don’t give it write access to your entire system,” he wrote recently about OpenClaw. He advises running AI agents in isolated, controlled environments.
“If an [isolated] agent is compromised, the damage is contained within the sandbox and can’t spread to your network.”
—Alessandro Pignati
And maintaining strict boundaries requires behavioral monitoring and controls, said RL’s Shah.
“Don’t focus solely on the agent’s brain; focus on its ‘hands.’ What APIs can it reach? What data can it read? Agent permissions must be heavily restricted and continuously monitored for anomalous behavior.”
—Dhaval Shah
One risk identified in recent research is that agentic AI can give attackers a shortcut for maintaining persistence in systems: They don’t need to maintain a foothold because the agent does it for them. All it takes is poisoning the agent’s memory or context once, Oso’s Neray said, and the corruption persists across sessions, influencing every future interaction.
“One bad input today can become an exploit chain next week. It’s like SQL injection, but instead of code [that] you inject into a database query, you inject goals into an AI’s task list.”
—Graham Neray
Zenity researchers demonstrated how this plays out. OpenClaw’s persistent context (stored in a file called SOUL.md) can be modified and reinforced using scheduled tasks, creating a long-lived listener for attacker-controlled instructions. The backdoor persists even after the original entry point is closed. From there, the compromise can be escalated by using the agent itself to deploy a traditional command-and-control implant on the host — transitioning from agent-level manipulation to complete system-level compromise.
O’Reilly described a variant he calls “reverse prompt injection,” which involves planting fake memories in an agent’s context. The idea is that an attacker can compromise an agent’s API key and then post as that agent, and the agent presumably will trust those posts because they appear to come from itself.
“We don’t know what effects it could have on agents. But I think it probably would trust it a little bit more if it thought that it posted it.”
—Jamieson O’Reilly
Security teams should be thinking about context validation the same way they think about input validation. Memory and context aren’t just features; they’re attack vectors.
Another risk arising with agentic AI is that traditional scanning and detection tools are simply unprepared to look for AI agents, let alone the weaknesses within them, wrote Christopher Ijams recently in his Substack ToxSec, which focuses on AI security issues.
“The traditional security stack wasn’t built for this. Firewalls don’t stop prompt injection. EDR doesn’t flag malicious skills. SIEM doesn’t correlate agent-to-agent communication patterns.”
—Christopher Ijams
And simply enumerating where and when agents are operating in an environment is beyond a lot of organizations. The CSA reports that only 21% maintain a real-time registry or inventory of their agents, and another 32% plan to build one within the next year. But the rest, accounting for nearly half, either rely on outdated records or have no registry at all.
RL’s Shah said the first step toward governing agent risks is discovery and inventory.
“Shadow AI is the new shadow IT, but with the potential for much faster, automated damage. You cannot secure what you cannot see.”
—Dhaval Shah
The goal isn’t necessarily to find the agentic behavior to block and ban the agents, he said. But discovery is crucial to bring the agents into the light and start analyzing and controlling the security of their artifacts, permissions, and behavior.
The detection gap is wider for companies that rely on the legacy AppSec tools such as dynamic and static application security testing (DAST and SAST) and software composition analysis (SCA), which are built to look for vulnerable patterns in code and dependencies and can’t examine the natural-language instructions that drive agentic behavior for risks or see how that behavior plays out. That’s why Shah says organizations must find ways to look for the hallmarks of agentic behavior such as continuous, high-volume API calls to the providers of the large language models or unrecognized traffic to registries such as ClawHub.
The recent ToxicSkills study from Snyk shows just how much is slipping through. Researchers found prompt injection vulnerabilities in 36% of ClawHub skills and identified 1,467 malicious payloads across the ecosystem, none of which were caught by traditional scanners. Detecting them required specialized analysis of natural-language instructions, but looking for malicious instructions is just one piece of the puzzle.
That’s why deep artifact inspection, grounded in runtime context, is so important, Shah said.
“The detection gap shrinks when you look at the complete, holistic software package rather than just isolated components. You can’t just look at the plaintext instructions; you have to look at the entire compiled deployment package. What underlying Python scripts is the agent calling? Are there hidden binaries or hardcoded secrets embedded in the deployment artifact?”
—Dhaval Shah
Oso’s Neray added that better detection at runtime should help in enforcing controls. He recommends putting a control layer between agents and the tools they touch that enforces authorization on every action regardless of what the model thinks it should do.
“Log every tool call with full context: user, requested action, resource, permission evaluated, outcome. Detect anomalies like rate spikes, unusual tool sequences, unusually broad data reads. And most important, have a way to stop it immediately. Not ‘Stop it after we investigate.’ Stop it now. Throttle, downgrade to read-only, or quarantine. You can always turn it back on.”
—Graham Neray
The regulatory landscape has to catch up, but that work has begun. The National Institute of Standards and Technology recently announced its AI Agent Standards Initiative, aimed at ensuring that agents “can function securely on behalf of its users and can interoperate smoothly across the digital ecosystem.” The agency’s Center for AI Standards and Innovation is actively soliciting input on securing AI agent systems, with comments due by March 9.
But standards take time to develop and longer to adopt, and the risks are already here. Security leaders have four suggestions on how to deal with agentic risk in the meantime:
1. Use threat modeling frameworks such as RAK: Many advocate that organizations use the Root, Agency, Keys framework to evaluate agent risk. Described in depth by Manveer Chawla, co-founder of Zenith AI, and visualized in a viral slide deck shared by Eduardo Ordax, AI lead at Amazon Web Services, RAK breaks down agent risk into three types. Root represents host-level compromise risk. Agency represents unintended autonomous execution. Keys represents credential exposure. Alone, each type of risk is a problem; all three together could be devastating. Some of the most immediate countermeasures for each factor include containerization for Root, controlling flags and actions for Agency, and brokering authentication for Keys.
2. Demand skills provenance and transparency: Treat the skills ecosystem as you would any unvetted open-source dependency. Shah recommends asking pointed questions of any registry or vendor: Are they performing deep binary and artifact analysis on the components they host? How are they handling secrets? Are they isolating, encrypting, and rotating credentials? And most critically, can they prove who wrote a skill, what it contains, and that it hasn’t been tampered with since publication? “We should be demanding comprehensive SBOMs and AI-BOMs as a prerequisite for procurement,” he said.
3. Inspect natural language artifacts before deployment: OpenClaw showed us that malicious instructions embedded in natural-language skill files are a big risk that’s likely to become prevalent. While traditional scanners can’t see them, a growing set of open-source tools can. Snyk’s mcp-scan and Cisco’s AI Skill Scanner both analyze agent skills for prompt injection vulnerabilities and credential exposure. And ClawShield offers configuration auditing and hardening. None is a silver bullet, but they’re a starting point for examining what legacy AppSec tools can’t see.
4. Assume agent compromise and plan accordingly: The autonomy and nondeterministic nature of agents make it tough to deem them safe and let them operate. An agent that’s running safely today might not be tomorrow. In a recent OpenClaw security guide, Semgrep’s security team distilled this to a core principle: “You cannot secure the reasoning layer; you must sandbox the execution layer.” Design systems so that a compromised agent can’t spread risk across the entire business. This starts by leaning on sandboxing methods, minimizing stored credentials, and segmenting agent environments from production systems.
There’s still a lot we don’t know about securing agentic systems. Why legacy tooling is immature, the threat models are evolving, and new attack surfaces keep popping up every day. But security fundamentals such as least privilege, segmentation, and monitoring all still apply. It’s just a matter of finding out how to do so in ways that account for the idiosyncrasies of AI architecture.
As RL's Dhaval Shah noted in the first report, it's important to remember that while AI agents are introducing new risks, software supply chain security fundamentals apply.
"The foundational concepts of trust, provenance, and dependency risk are identical.”
—Dhaval Shah
Explore RL's Spectra suite: Spectra Assure for software supply chain security, Spectra Detect for scalable file analysis, Spectra Analyze for malware analysis and threat hunting, and Spectra Intelligence for reputation data and intelligence.
Get your 14-day free trial of Spectra Assure
Get Free TrialMore about Spectra Assure Free Trial