Anthropic’s recent launch of an integrated security review feature in Claude Code takes aim at growing concerns over AI-driven software development risks. What’s unclear is whether the new features — and similar capabilities emerging across the AI coding landscape — will genuinely reduce vulnerabilities, or give developers a potentially dangerous false sense of security.
According to Anthropic, the new, integrated /security-review command in Claude Code lets developers run ad-hoc security analysis from their terminals before committing code. The command checks for multiple common security vulnerabilities in code that Claude might have generated, including SQL injection risks, cross-site scripting errors, authentication and authorization flaws, and insecure data handling. Developers can then ask Claude Code to fix any identified errors.
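To make those vulnerability classes concrete, the sketch below shows the kind of SQL injection flaw such a review is designed to flag, along with the conventional parameterized-query fix. This is a hypothetical illustration (the function names and schema are invented), not output from Claude Code.

```python
import sqlite3

def find_user_unsafe(conn: sqlite3.Connection, username: str):
    # Vulnerable: user input is interpolated directly into the SQL string,
    # so an input like "' OR '1'='1" changes the meaning of the query.
    query = f"SELECT id, email FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchall()

def find_user_safe(conn: sqlite3.Connection, username: str):
    # Conventional fix: a parameterized query keeps user input as data, not SQL.
    query = "SELECT id, email FROM users WHERE username = ?"
    return conn.execute(query, (username,)).fetchall()
```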
A separate GitHub Action for Claude Code automatically runs security analysis on every pull request (PR), scanning code changes for vulnerabilities and posting inline comments with specific concerns and fix recommendations. Organizations can apply customized rules to filter out known issues and false positives.
Security experts perceive the emerging code security checking features in AI assistants as a step in the right direction, but nowhere near enough on their own.
The command and the GitHub Action work together as complementary capabilities. The /security-review command lets developers run quick security checks during development, while the Action runs automatically on every pull request in the GitHub CI/CD pipeline. The command is geared toward individual developers; the Action is more of a team-wide capability.
The new features have already proved their worth in improving product security at Anthropic, according to the company. In one instance, the GitHub Action identified and helped fix a remote code execution vulnerability in a new feature for an internal tool before the pull request was merged. In another, it flagged a newly built proxy system as vulnerable to Server-Side Request Forgery (SSRF) attacks. Anthropic said:
As developers increasingly rely on AI to ship faster and build more complex systems, ensuring code security becomes even more critical. These new features let you integrate security reviews into your existing workflows, helping you catch vulnerabilities before they reach production.
Anthropic is not alone in trying to help developers protect against common vulnerabilities when using AI-enabled coding tools to develop software — a major and growing concern. Other vendors offer a varying range of roughly similar capabilities. Microsoft's GitHub Copilot, for instance, offers code scanning with automatic fixes for identified issues, a command-line static analysis capability, and a dependency review feature. Amazon's CodeWhisperer similarly helps developers check code for common vulnerabilities, policy violations, and other issues, and Tabnine's emerging code review agent offers a set of predefined rules that developers can use to check the security of their code.
Such capabilities are crucial at a time when a growing number of developers and organizations have begun using AI coding tools to develop software. An astonishing 94% of organizations in a recent OpsLevel survey reported at least some use of AI coding assistants such as GitHub Copilot, Claude, ChatGPT, and Cursor. While many see such tools as boosting productivity, there's also mounting concern about risks tied to their unchecked use. A study by researchers at Stanford University showed that developers who used an AI assistant "wrote significantly less secure code than those without access to an assistant," while also tending to be "overconfident about security flaws in their code."
One big issue is that the models are only as good as the data they are trained on, said Chad Cragle, CISO at Deepwatch. They cannot fully replicate the context awareness of an experienced engineer, he said.
"AI tools are effective at identifying obvious issues, such as missing input validation, insecure function calls, or outdated dependencies," he said. "However, they struggle with vulnerabilities that require a deep understanding of architecture, complex business logic, or threat modeling. Challenges include privilege escalation pathways, subtle race conditions, and multi-step exploit chains."
Expect to see the security performance of these tools decline as complexity increases, Cragle said. "AI assistants can handle multi-file structures, but truly understanding cross-module interactions, dependency chains, and subtle security regressions remains inconsistent. This is where traditional SAST/DAST tools and human review still have the edge — at least for now."
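To illustrate the kind of cross-module gap Cragle describes, here is a hypothetical sketch (module and function names invented) in which an authorization check exists on one call path but not on another that reaches the same privileged operation. No single line matches an obvious insecure-code pattern, which is why such flaws are hard for pattern-level scanning to catch.

```python
# admin.py (hypothetical): the privileged operation itself looks innocuous.
def delete_account(user_id: int) -> None:
    # No authorization logic here; callers are trusted to have checked.
    print(f"deleting account {user_id}")

# web.py (hypothetical): one route checks the caller's role, another does not.
def handle_admin_delete(request: dict) -> None:
    if request.get("role") != "admin":
        raise PermissionError("admins only")
    delete_account(request["target_id"])

def handle_bulk_cleanup(request: dict) -> None:
    # Privilege-escalation path: this route reaches delete_account() with no
    # role check. Neither function contains a classically "insecure" call,
    # so spotting the flaw requires reasoning about cross-module call paths.
    for target_id in request.get("target_ids", []):
        delete_account(target_id)
```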
Because AI assistants merely act on their training data, the vulnerabilities they catch are almost always going to be known issues. It's important to keep in mind that AI coding tools don't actually do any thinking on their own and are trained on things that have already happened, noted Erich Kron, a security awareness advocate at KnowBe4.
That means when it comes to vulnerabilities, they are likely stuck in the past, identifying known vulnerabilities, but not really able to deal with new ones.
"They are most likely to miss new or novel vulnerabilities that were not part of their training or were discovered after their most recent training," Kron said.
Also, current security code review tools primarily rely on a single machine learning (ML) approach: LLM-based generative AI. While useful, this method lacks the complex reasoning required to fully understand the decisions it produces. Ideally, security code review would leverage a variety of ML techniques, each designed to analyze code from different security perspectives, added Nicole Carignan, SVP of security and AI strategy and field CISO at Darktrace.
Some techniques can be augmented with automated checks to ensure production-ready fixes, but these remain limited in scope. Generative AI-assisted coding tools, for instance, struggle to detect novel vulnerabilities or identify complex, interconnected issues across multiple packages, Carignan said.
Another concern is that the fixes that AI coding tools suggest could themselves introduce additional security problems that are not being reviewed, Carignan said. "This could cause an infinite loop of security review, generated fixes, review, fixes, and so on. With automated pull requests, this could also be an area of vulnerability propagation."
One promising way forward is the use of agentic AI systems that actively test code to discover new vulnerabilities. While research and experimentation over the past year show that these systems are not yet highly accurate, they could serve as an additional layer of security testing in the future.
"Ultimately, code review and code testing or vulnerability identification are just layers in a more cohesive security plan to achieve secure code," Carignan said. "But no one layer should be used in isolation or solely depended on."
Time and cost-effectiveness are other issues. Jeff Williams, co-founder and CTO at Contrast Security, said Claude Code took more than 15 minutes to analyze a simple test affecting only about a dozen lines of code. While the tool did find some vulnerabilities, it missed quite a few as well, and running the test just three times cost some $4.63, which he described as very expensive.
"While it is theoretically possible to do data flow analysis — required for the most critical vulnerabilities — in an LLM, it’s a slow, non-deterministic, and very expensive way to do it," Williams said. It's unclear how big of a code change Claude Code can handle presently, but there's a high probability that the time required would expand exponentially, he said.
There's also the very real issue of the security features in AI coding assistants lulling developers and development teams into a false sense of security, especially because AI-generated suggestions can seem authoritative even when subtly flawed, Cragle said. "There's a real risk of 'check-the-box' complacency if teams assume AI-reviewed code is inherently secure. Without human spot checks and peer review, that false confidence can be dangerous," he said.
"The safest approach is to treat AI as an advanced code suggester; it can accelerate development and highlight issues, but a human must review security-critical logic," Cragle said.