Organizations rushing to deploy AI agents across their workflows need to recognize that the underlying agent stack, which governs identity, access, persistence, security, and operational control, is more consequential than the particular model they choose.
Addy Osmani, an engineering leader and director for Google Cloud AI, wrote in a recent Substack post that in many environments, agents are being granted broad autonomy without the infrastructure needed to properly control, monitor, or audit their behavior. As those systems become more deeply embedded in enterprise operations, weaknesses in the underlying stack can metastasize from an engineering concern into a broader security, governance, and operational risk.
“Right now, the industry is living with what I’d call excessive agency: autonomous systems given broad permissions to get things done, then left to discover — at runtime, in production — that a schema drifted, an API changed, or a downstream service started returning PII it wasn’t supposed to. This is not a failure of the people building agents. It is a failure of the stack they’re building on.”
—Addy Osmani
Here's why agentic AI infrastructure security is critical to managing risk — and four essential requirements.
[ See webinar: AI Redefines Software Risk: Develop a New Playbook ]
Osmani’s concern reflects the current moment. A growing number of organizations are feverishly deploying AI agents, automating everything from triaging support tickets to generating code, responding to customer inquiries, updating account records, and handling procurement workflows and invoice processing.
A recent survey by Salesforce found that 82% of IT leaders are either using AI agents at their organizations or planning to deploy them within the next two years. Many are already running pilots across HR, IT, finance, and customer service workflows. Similarly, McKinsey’s 2025 State of AI report found that 62% of organizations are either piloting or using AI agents to automate key aspects of their business and operations.
McKinsey reported that organizations are most commonly using agents to manage service desks and conduct deep research. It added that organizations in the technology, media, telecommunications, and health care sectors are using agents more than those in other sectors.
Osmani said effective AI agent systems require a fundamental rethink of the stack on which they are deployed, and he outlined four requirements — themes that other security experts echo.
Ensar Seker, CISO at SOCRadar, agrees with Osmani’s assessment. “The biggest risk with AI agents is not only the model; it is the architecture around the model,” he said.
“Many organizations are building agents on top of fragile plumbing: shared service accounts, custom session logic, weak audit trails, and inconsistent access controls.”
—Ensar Seker
They are creating a security problem before the agent even makes its first decision, Seker said.
With AI agent deployments, Seker said, the core failure point is usually identity and permissions design. A common mistake is to treat agents like scripts or automations when they actually behave more like digital workers. If your agents don’t have clear identities, scoped permissions, logging, and revocation paths, he said, you can lose control over what the agent did, why it did it, and which data it touched.
David Brumley, chief AI and science officer at Bugcrowd, said that giving every agent its own identity ensures better auditability, accountability, and access control.
“You would never ask people to share one password and one account, but that is effectively what a lot of organizations are asking agents to do today.”
—David Brumley
Agent identities give organizations a way to scope permissions, trace actions, and revoke access when needed, he said.
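The identity pattern Brumley and Seker describe can be illustrated with a minimal sketch. This is a hypothetical, in-memory registry, not any vendor's API: each agent gets its own identity with scoped permissions, every authorization decision is appended to an audit trail, and access can be revoked without touching shared credentials.

```python
import uuid
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class AgentIdentity:
    """A distinct, revocable identity for a single agent."""
    name: str
    scopes: set          # permissions this agent may exercise, e.g. "tickets:read"
    agent_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    revoked: bool = False


class AgentRegistry:
    """Issues per-agent identities and records every authorization decision."""

    def __init__(self):
        self._agents = {}
        self.audit_log = []  # (timestamp, agent_id, action, allowed)

    def register(self, name, scopes):
        agent = AgentIdentity(name=name, scopes=set(scopes))
        self._agents[agent.agent_id] = agent
        return agent

    def authorize(self, agent_id, action):
        agent = self._agents.get(agent_id)
        allowed = agent is not None and not agent.revoked and action in agent.scopes
        # Every decision, allowed or denied, lands in the audit trail.
        self.audit_log.append((datetime.now(timezone.utc), agent_id, action, allowed))
        return allowed

    def revoke(self, agent_id):
        """Revocation path: the agent keeps its history but loses all access."""
        self._agents[agent_id].revoked = True


registry = AgentRegistry()
triage = registry.register("ticket-triage", {"tickets:read", "tickets:update"})
registry.authorize(triage.agent_id, "tickets:read")   # allowed: in scope
registry.authorize(triage.agent_id, "invoices:pay")   # denied: out of scope, and logged
registry.revoke(triage.agent_id)
```

Because each agent has its own identity rather than a shared service account, the audit log answers exactly the questions Seker raises: what the agent did, why, and which data it touched.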
In another echo of Osmani’s requirements, Eric Hulse, director of research at Command Zero, said context is more important than most organizations realize. Many agents have access to only a narrow slice of enterprise information at any given moment, but real enterprise work requires reasoning across CRM platforms, ERP systems, ticketing tools, project plans, transcripts, analytics systems, data warehouses, and other systems. “If an agent investigates an alert with access to endpoint telemetry but no visibility into network traffic, cloud logs, or identity context, it’s not going to say, ‘I don’t have enough information.’ It’s going to synthesize a verdict from what it has,” Hulse said.
“A confident wrong answer looks identical to a confident right answer.”
—Eric Hulse
Bugcrowd’s Brumley stressed: “Agents often don’t know what they don’t know.”
Real investigation means following a thread wherever it goes, Hulse said: from an endpoint artifact into the identity layer, into cloud API logs, into network egress. “That requires unified context at the platform level. Not stitched-together API calls at runtime that happen to return data in the right order on a good day.”
But organizations need to be cautious about how they enable context for agents, cautioned Brumley. More context is not always better or safer, he said. The best approach is to give agents limited, purpose-built access to what they need for a specific task — bounded context, not universal context.
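Brumley's "bounded context, not universal context" idea can be sketched as a per-task allowlist sitting between the agent and enterprise data sources. The source names and fields below are purely illustrative:

```python
# Hypothetical enterprise data sources (illustrative values only).
SOURCES = {
    "crm": {"customer_name": "Acme Co", "email": "ops@acme.example", "ssn": "123-45-6789"},
    "tickets": {"ticket_id": "T-1001", "status": "open"},
}

# Per-task allowlist: which (source, field) pairs a given task may see.
TASK_CONTEXT = {
    "triage_ticket": {
        ("tickets", "ticket_id"),
        ("tickets", "status"),
        ("crm", "customer_name"),
    },
}


def bounded_context(task):
    """Return only the fields this task is entitled to; everything else stays invisible."""
    allowed = TASK_CONTEXT.get(task, set())
    return {
        f"{src}.{col}": SOURCES[src][col]
        for (src, col) in allowed
        if col in SOURCES.get(src, {})
    }


ctx = bounded_context("triage_ticket")
# The sensitive "crm.ssn" field is never exposed to the triage task.
```

The agent gets what it needs for the specific task and nothing more, so a compromised or confused agent cannot exfiltrate data it was never handed.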
Osmani said agents also need to be able to maintain continuity. Most AI agents today can handle short, contained tasks but aren’t up to what genuine enterprise deployment demands. Procurement cycles, compliance audits, incident investigations, and other real business workflows can span days, weeks, or even months, and having agents that simply stop when they hit a time or processing limit forces humans to pick up where the agent left off.
What enterprises need are agents that can maintain progress across interruptions, hand off work without losing context, and leave a clear, auditable trail, while still being intelligent enough to know when to pause and ask for human guidance.
“Enterprise-grade autonomy requires durable, cloud-native execution with a much higher floor than ‘The session stayed up.’”
—Addy Osmani
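One common way to get the durability Osmani describes is checkpointing: persist workflow state after every completed step so an interrupted agent resumes where it left off instead of restarting. This is a minimal sketch with a hypothetical three-step procurement workflow; production systems would use a durable execution engine rather than a JSON file:

```python
import json
import os
import tempfile

# Illustrative checkpoint location and workflow steps.
CHECKPOINT = os.path.join(tempfile.gettempdir(), "procurement_run.json")
STEPS = ["collect_quotes", "compare_vendors", "draft_purchase_order"]


def load_state():
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT) as f:
            return json.load(f)
    return {"completed": [], "results": {}}


def save_state(state):
    with open(CHECKPOINT, "w") as f:
        json.dump(state, f)


def run_step(step):
    """Stand-in for real agent work (calling tools, querying systems)."""
    return f"{step}:done"


def run_workflow():
    state = load_state()
    for step in STEPS:
        if step in state["completed"]:
            continue  # resume: skip work that already finished before an interruption
        state["results"][step] = run_step(step)
        state["completed"].append(step)
        save_state(state)  # state is durable after every step, not just at the end
    return state


# Start fresh for this demo, then run the workflow.
if os.path.exists(CHECKPOINT):
    os.remove(CHECKPOINT)
state = run_workflow()
```

Because every step is recorded, a crash or timeout mid-run leaves an auditable trail, and calling `run_workflow()` again picks up from the last checkpoint rather than redoing finished work.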
When architecting an agent stack, start with a clear idea of what you’re building for, Hulse said. Most agentic security architectures give alerts and resolutions the one-for-one treatment: one alert in, one verdict out. That’s the wrong model. “Security work is investigation-shaped,” Hulse said.
“You start with a signal, follow threads, accumulate evidence across multiple systems and time windows, and arrive at a conclusion that might span several alerts that, individually, looked unrelated. If your architecture forces every alert to be a self-contained task, you’ve inherited every limitation of the triage layer and put an AI label on it.”
—Eric Hulse
Before building in autonomy, focus on identity and context, Hulse said, because without them, governance debt will inevitably compound. “You will hit an incident that exposes the shared credentials, the missing audit trail, the agent that had way too much access for what it was supposed to do.”
And before planning to scale, measure accuracy, not throughput. That means doing things like tracking false-positive and false-negative rates by alert type, data source, and time of day, Hulse said.
“The organizations that do this end up with agents that get measurably better over time because the feedback loop is short and honest. The ones chasing closure rates end up with fast agents that are confidently wrong at scale — worse than the manual process they replaced.”
—Eric Hulse
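Tracking accuracy the way Hulse suggests can be as simple as recording each agent verdict against later human-confirmed ground truth, bucketed by alert type. This is an illustrative sketch, not any particular product's telemetry:

```python
from collections import defaultdict


class VerdictTracker:
    """Track agent verdicts against human-confirmed ground truth, per alert type."""

    def __init__(self):
        self.counts = defaultdict(lambda: {"tp": 0, "fp": 0, "tn": 0, "fn": 0})

    def record(self, alert_type, predicted_malicious, actually_malicious):
        key = {
            (True, True): "tp",    # agent flagged it, and it was real
            (True, False): "fp",   # agent flagged it, but it was benign
            (False, False): "tn",  # agent cleared it, correctly
            (False, True): "fn",   # agent cleared it, but it was real
        }[(predicted_malicious, actually_malicious)]
        self.counts[alert_type][key] += 1

    def false_positive_rate(self, alert_type):
        c = self.counts[alert_type]
        denom = c["fp"] + c["tn"]
        return c["fp"] / denom if denom else 0.0

    def false_negative_rate(self, alert_type):
        c = self.counts[alert_type]
        denom = c["fn"] + c["tp"]
        return c["fn"] / denom if denom else 0.0


tracker = VerdictTracker()
tracker.record("phishing", predicted_malicious=True, actually_malicious=True)
tracker.record("phishing", predicted_malicious=True, actually_malicious=False)
tracker.record("phishing", predicted_malicious=False, actually_malicious=False)
```

Slicing the same counters by data source or time of day, as Hulse recommends, keeps the feedback loop honest: a rising false-negative rate for one alert type is visible long before it becomes an incident.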