Dev & DevSecOps | September 17, 2025

How AI coding can learn to do secure software

If you train ML models, they can learn to write more secure code. But the quality of the training data is only as good as your AppSec tooling.

John P. Mello Jr., freelance technology writer

As if human developers creating insecure code weren’t bad enough, now we have machine learning (ML) models doing it. Using large language models (LLMs) to write code is now a standard procedure in software development — despite the risk of creating flawed applications. 

Using a benchmark it created for evaluating how well LLMs can create secure and correct backend code, the open-source BaxBench project recently found that 62% of LLM-generated solutions are either incorrect or contain a security vulnerability, and that roughly half of the functionally correct solutions are still insecure.

Rosario Mastrogiacomo, chief strategy officer for Sphere Technology Solutions, said LLMs can serve as a second set of eyes by reviewing their own output — and that, in turn, trains the model.

When given secure coding checklists or asked to critique code against known vulnerabilities, they can identify weaknesses and suggest fixes. This self-audit loop — generation followed by red-teaming with a new prompt — can meaningfully reduce risks, especially when paired with traditional static analysis tools.

Rosario Mastrogiacomo
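
A minimal sketch of that self-audit loop, in Python, might look like the following. The llm_complete() helper and the checklist contents are placeholders for whatever LLM client and secure-coding standards a team actually uses; nothing here is from Sphere Technology’s tooling.

```python
# Sketch of a generate -> critique -> repair loop. llm_complete() is a
# placeholder for whatever LLM API your team uses (hypothetical helper).

def llm_complete(prompt: str) -> str:
    """Stand-in for a call to your LLM provider's API."""
    raise NotImplementedError("wire this to your LLM client")

SECURE_CODING_CHECKLIST = """\
- Parameterize all database queries
- Validate and encode all user-supplied input
- Never hardcode credentials or secrets
- Enforce authorization checks on every endpoint
"""

def generate_with_self_audit(task: str, rounds: int = 2) -> str:
    """Generate code, then red-team it with a fresh prompt and repair it."""
    code = llm_complete(f"Write the code for this feature:\n{task}")
    for _ in range(rounds):
        critique = llm_complete(
            "Act as a security reviewer. Critique the code below against this "
            f"checklist and list concrete weaknesses:\n{SECURE_CODING_CHECKLIST}\n"
            f"Code:\n{code}"
        )
        code = llm_complete(
            f"Revise the code to address these findings:\n{critique}\n\nCode:\n{code}"
        )
    return code
```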

Some prominent AI coding tools now feed results back into the LLMs so that the models can learn from their mistakes — and researchers say the potential is there for improvement. Here’s what you need to know.

Why do AI coding tools spread security flaws?

Kurt Seifried, chief innovation officer at the not-for-profit Cloud Security Alliance (CSA), said that software teams have been trying to figure out how to write code securely for the past 50 years, but they haven’t found a reliable way yet. While tools such as memory-safe languages can eliminate entire classes of vulnerabilities, things like logic bugs and poor architecture can still cause problems, he said.

It doesn’t matter how high the quality of steel and concrete is if you build the wrong type of bridge for a given place. LLMs are trained on data from humans, and even if the LLMs wrote perfectly bug-free code with no known security flaws, there’s a good chance that a new class or type of flaw will be discovered in a year or two or three that the code is vulnerable to.

Kurt Seifried

Elango Senthilnathan, a solutions architect at Qwiet AI, said LLMs lack a deep understanding of security principles and often prioritize code that looks correct or common over what’s safe. They can miss context-specific risks, such as script injection or access-control flaws.

Roger Grimes, a defense evangelist at KnowBe4, said LLMs are basing the majority of the code they create on what people have written in the past, and that human-created code is full of security vulnerabilities.

[The] creators of these code LLM models don’t prioritize doing secure coding over insecure coding. LLMs have no concept of that. They just code. So, if confronted with doing something in a secure or nonsecure way, the LLM isn’t weighted to do the secure [method].

Roger Grimes

Dwayne McDaniel, a developer advocate with GitGuardian, said LLMs are subject to the garbage-in/garbage-out problem. “That’s why they often suggest using hardcoded credentials or using vulnerable versions of a dependency,” McDaniel said.

At the same time, security threats are constantly evolving. Code that once might have been considered safe might suddenly have a zero-day discovered that affects the whole of the internet. LLMs are going to always struggle with how fast things change.

Dwayne McDaniel

How ‘explained feedback’ can train AI to improve code security

One way to mitigate the security risks associated with LLMs is to have the models review the code they generate. Anthropic’s Claude Code and Microsoft’s GitHub Copilot are building out their tools to help developers protect code from security flaws.

A team of researchers at George Mason University found that a structured feedback loop is an effective way of addressing security flaws in AI code. “Our study revealed that while LLMs are inherently prone to generating insecure code, their security performance can be significantly improved through the incorporation of self-generated vulnerability hints and explained, contextualized feedback,” the researchers wrote.

The effectiveness of self-generated “hints” is contingent upon their relevance and preciseness, the researchers said. “Contextualized feedback can result in lower vulnerability rates in code repair tasks. Moreover, our comparative analysis across a diverse set of models highlighted that more advanced models tend to benefit more from the provided hints and feedback.”

The researchers emphasized the importance of “explained feedback,” where the output of a static analysis tool, CodeQL, is supplemented with contextual information and fed into the LLM so it can provide suggestions for code repair.
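
In practice, explained feedback can be assembled by pairing each static-analysis finding with the rule that fired and the surrounding source before asking the model for a repair. The sketch below parses CodeQL’s standard SARIF output; the prompt wording and the llm_complete() helper are illustrative, not the researchers’ exact pipeline.

```python
import json
from pathlib import Path

def llm_complete(prompt: str) -> str:
    """Stand-in for your LLM client (hypothetical)."""
    raise NotImplementedError

def build_repair_prompts(sarif_path: str) -> list[str]:
    """Turn CodeQL SARIF findings into contextualized repair prompts.
    Assumes source paths in the SARIF are relative to the working directory."""
    sarif = json.loads(Path(sarif_path).read_text())
    prompts = []
    for run in sarif.get("runs", []):
        for result in run.get("results", []):
            location = result["locations"][0]["physicalLocation"]
            src_file = location["artifactLocation"]["uri"]
            line = location["region"]["startLine"]
            context = Path(src_file).read_text().splitlines()[max(0, line - 6):line + 5]
            prompts.append(
                f"CodeQL rule {result.get('ruleId')} reported: {result['message']['text']}\n"
                f"Surrounding code from {src_file}, line {line}:\n" + "\n".join(context) +
                "\nExplain why this is vulnerable and propose a minimal, secure fix."
            )
    return prompts
```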

Abhay Bhargav, CEO and co-founder of Security Review AI, said it’s a good idea to add specific security guidance to the prompts given to LLMs to reduce flaws in the code they produce.

For example, if your AI-assisted IDE is tasked with building a feature for a product, adding specific instructions on how the feature has to be secured from an authentication, authorization, input validation, and cryptographic perspective would be a better idea than just relying on vague statements like ‘build it securely.’

Abhay Bhargav

LLMs are heavily context-driven, Bhargav said. “Adding specific context makes a significant difference.” 
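
That specificity translates directly into how prompts get built. The template below is a hypothetical example of spelling out authentication, authorization, input-validation, and cryptography requirements rather than asking the model to “build it securely.”

```python
# Hypothetical prompt template: explicit security requirements instead of
# a vague "build it securely" instruction.

SECURITY_REQUIREMENTS = {
    "authentication": "Require a verified session token on every request; reject anonymous calls.",
    "authorization": "Confirm the caller owns the resource before reading or modifying it.",
    "input validation": "Validate request bodies against a schema and reject unexpected fields.",
    "cryptography": "Hash passwords with a memory-hard KDF; never implement custom crypto.",
}

def build_feature_prompt(feature_description: str) -> str:
    requirements = "\n".join(f"- {area}: {rule}" for area, rule in SECURITY_REQUIREMENTS.items())
    return (
        f"Implement the following feature:\n{feature_description}\n\n"
        f"The implementation must satisfy these security requirements:\n{requirements}\n"
        "Note which requirement each part of the code addresses."
    )

print(build_feature_prompt("Password reset endpoint for the user service"))
```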

Leveraging application security tool data is key

Mark Sherman, a technical director of Cybersecurity Foundations in the CERT division of Carnegie Mellon University’s Software Engineering Institute (SEI), said security teams should use the full collection of tools available to them, since LLM-created code is like any other code. “LLMs can augment conventional source code analysis [SCA] tools to investigate output of those tools, and conversely, the output of an LLM can be run through an SCA tool,” Sherman said.

In the most extreme case, one can alternate between an LLM and an SCA, having each one inspect and correct the other until one converges on code without any further diagnostics.

Mark Sherman
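
Sherman’s alternate-until-convergence idea can be sketched as a loop that stops when the analyzer reports no further diagnostics, with a bounded number of rounds since convergence isn’t guaranteed. The analyzer command and llm_complete() helper below are placeholders, not a specific product.

```python
import subprocess
import tempfile

def llm_complete(prompt: str) -> str:
    """Stand-in for your LLM client (hypothetical)."""
    raise NotImplementedError

def run_analyzer(code: str) -> str:
    """Write the code to a temp file and run a source-code analyzer on it.
    'your-sca-tool' is a placeholder command, not a real CLI."""
    with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as tmp:
        tmp.write(code)
        path = tmp.name
    result = subprocess.run(["your-sca-tool", path], capture_output=True, text=True)
    return result.stdout.strip()

def alternate_until_clean(code: str, max_rounds: int = 5) -> str:
    """Alternate between the analyzer and the LLM until there are no diagnostics."""
    for _ in range(max_rounds):
        diagnostics = run_analyzer(code)
        if not diagnostics:
            break  # the analyzer is satisfied; stop alternating
        code = llm_complete(
            f"The analyzer reported:\n{diagnostics}\n\n"
            f"Rewrite this code to resolve every finding without changing its behavior:\n{code}"
        )
    return code
```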

SEI software security engineer Will Klieber said the output of application security (AppSec) tools can be fed to LLMs, and often the LLM can use a tool’s output to identify and fix a security flaw. 

But he warned that static-analysis tools sometimes report false alarms — and the LLM itself “can be used to adjudicate this.”
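
That adjudication step can sit in front of any automated repair: ask the model whether a finding is real before acting on it, and keep a human in the loop for disputed cases. A small, hypothetical sketch:

```python
def llm_complete(prompt: str) -> str:
    """Stand-in for your LLM client (hypothetical)."""
    raise NotImplementedError

def adjudicate(finding: str, code_snippet: str) -> bool:
    """Ask the model whether a static-analysis finding is a real issue.
    The verdict is advisory; disputed cases still deserve human review."""
    verdict = llm_complete(
        "A static-analysis tool reported the finding below. Reply with exactly "
        "'TRUE POSITIVE' or 'FALSE POSITIVE', then one sentence of reasoning.\n\n"
        f"Finding: {finding}\n\nCode:\n{code_snippet}"
    )
    return verdict.strip().upper().startswith("TRUE POSITIVE")
```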

Iftach Ian Amit, founder and CEO of Gomboc.AI, said security and engineering teams need to use accurate and repeatable tools to assure security and reliability. That means that deterministic AI — which is rule-based and produces the same output for a given input, without randomness or surprises — must be part of the tool chain to complement and provide the right alignment for generative AI tools.

While generative AI can produce 10 times the code, it also generates 10 times the bugs, and without a scalable deterministic AI to address such bugs, any efficiency gains will be nullified.

Iftach Ian Amit
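
One way to read that advice is to put deterministic, rule-based checks alongside the generative step, so the same input always produces the same verdict. The rules below are a toy illustration of that property, not Gomboc.AI’s product.

```python
import re

# Toy deterministic rules: the same input always produces the same findings.
RULES = [
    ("hardcoded credential", re.compile(r"(password|api_key|secret)\s*=\s*['\"][^'\"]+['\"]", re.I)),
    ("shell injection risk", re.compile(r"subprocess\.(run|Popen)\(.*shell\s*=\s*True")),
    ("weak hash algorithm", re.compile(r"hashlib\.(md5|sha1)\(")),
]

def deterministic_scan(code: str) -> list[str]:
    """Flag generated code against fixed rules; no randomness, no surprises."""
    findings = []
    for lineno, line in enumerate(code.splitlines(), start=1):
        for name, pattern in RULES:
            if pattern.search(line):
                findings.append(f"line {lineno}: {name}")
    return findings

generated = 'db_password = "hunter2"\n'
print(deterministic_scan(generated))  # ['line 1: hardcoded credential']
```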

The CSA’s Seifried recommended using multiple LLMs to secure code. “You can have one model write the code and a second, different model, audit and validate it,” he said. “Hopefully, this reduces the chance of an unknown security flaw slipping through because there is a lesser chance that both models will ignore it or don't know about it. Simply put, most everything we do with humans to try and reduce security fails in code is a good starting point with LLMs.”
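
Seifried’s suggestion maps onto a simple writer/auditor split: one model generates, a different model reviews, and the code only moves forward when the auditor signs off. The model names and helper below are placeholders.

```python
def call_model(model: str, prompt: str) -> str:
    """Stand-in for a call to the named model's API (hypothetical)."""
    raise NotImplementedError

def write_and_audit(task: str, writer: str = "model-a", auditor: str = "model-b") -> tuple[str, str]:
    """One model writes the code; a different model audits it."""
    code = call_model(writer, f"Write secure, production-quality code for:\n{task}")
    review = call_model(
        auditor,
        "You did not write this code. Audit it for security flaws and reply "
        f"'APPROVED' or list every issue found:\n\n{code}",
    )
    return code, review
```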

Melody (MJ) Kaufmann, an author and instructor at O’Reilly Media, said security teams can harden LLM-created code by embedding checks directly into the CI/CD pipeline, from static scans and infrastructure-as-code misconfiguration checks to secrets detection, so that insecure code is flagged before deployment.

 Some organizations are even experimenting with AI reviewers to augment human review.

Melody (MJ) Kaufmann
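
Those pipeline checks are usually wired up as separate CI jobs, but the idea can be condensed into a single gate script that fails the build when any scanner reports findings. The commands below are placeholders; substitute the SAST, IaC, and secrets scanners your team actually runs.

```python
import subprocess
import sys

# Placeholder commands: swap in the SAST, IaC, and secrets scanners you actually run.
CHECKS = {
    "static analysis": ["your-sast-tool", "--fail-on-findings", "."],
    "IaC misconfiguration": ["your-iac-scanner", "scan", "infrastructure/"],
    "secrets detection": ["your-secrets-scanner", "scan", "."],
}

def main() -> int:
    failed = [name for name, cmd in CHECKS.items() if subprocess.run(cmd).returncode != 0]
    if failed:
        print(f"Blocking the merge; failed checks: {', '.join(failed)}")
        return 1
    print("All security gates passed.")
    return 0

if __name__ == "__main__":
    sys.exit(main())
```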

Feedback is only as good as your AppSec tools

By conducting threat modeling and secure design reviews, security teams can also mitigate the risks associated with AI-generated code — and help train the models. “This will help define secure features that will, in turn, necessitate the AI-assisted integrated development environments [IDEs] to generate more secure code,” Security Review AI’s Bhargav said. 

It also provides clearer security specifications that can be leveraged by AI-assisted IDEs to build more secure features consistent with the existing application architecture, Bhargav said.

Security expectations should also be embedded into the developer workflow. “That means providing pre-approved prompt templates, secure coding playbooks, and automated checks that flag risky constructs immediately,” said Sphere Technology’s Mastrogiacomo.

Security teams should act as enablers — giving developers clear guardrails and rapid feedback — rather than slowing them down with after-the-fact reviews.

Rosario Mastrogiacomo

Security teams need to understand how their developers are working and give them tools that can help them avoid checking in insecure code, said GitGuardian’s McDaniel. “Fortunately, the tooling market is fairly mature and new tools are emerging constantly,” McDaniel said.

One key way to strengthen the feedback that goes into AI coding tools is to use advanced security practices such as complex binary analysis and reproducible builds. Saša Zdjelar, chief trust officer at ReversingLabs (RL), said the use of binary analysis and reproducible builds is a significant step forward over traditional static application security testing (SAST) and dynamic application security testing (DAST).

AppSec practices such as SAST and DAST typically only apply to a small subset of internally developed systems and applications at many organizations.

Saša Zdjelar
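
The reproducible-builds half of that argument is straightforward to check: build the same source in two independent environments and compare artifact digests, since any mismatch means the build isn’t reproducible and the binary warrants a closer look. A minimal sketch with illustrative file paths:

```python
import hashlib
from pathlib import Path

def digest(path: str) -> str:
    """SHA-256 of a build artifact."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

# Illustrative paths: the same source built in two independent environments.
first = digest("build-env-a/app.bin")
second = digest("build-env-b/app.bin")

if first == second:
    print(f"Build is reproducible: {first}")
else:
    print("Digest mismatch: the build is not reproducible; inspect the binaries.")
```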

In addition, CycloneDX’s machine learning bill of materials (ML-BOM) can provide rich feedback for training ML models. Dhaval Shah, senior director of product management at RL, wrote recently that developers and security teams need comprehensive visibility into their entire AI supply chain, and the ML-BOM is key to that.

[An] ML-BOM identifies potentially malicious open-source models before they can be integrated into your products, giving your development team confidence that the ML components they’re using are safe.

Dhaval Shah
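
Because a CycloneDX ML-BOM is just a structured inventory, consuming it can start small: parse the document and flag any machine-learning component that isn’t on an approved list. The sketch below reads CycloneDX JSON using its components list and the machine-learning-model component type; the allowlist and file path are hypothetical.

```python
import json
from pathlib import Path

# Models your organization has already vetted (hypothetical allowlist).
APPROVED_MODELS = {"example-org/vetted-classifier"}

def flag_unapproved_models(mlbom_path: str) -> list[str]:
    """Return ML model components in a CycloneDX ML-BOM that are not on the allowlist."""
    bom = json.loads(Path(mlbom_path).read_text())
    flagged = []
    for component in bom.get("components", []):
        if component.get("type") == "machine-learning-model":
            name = component.get("name", "<unnamed>")
            if name not in APPROVED_MODELS:
                flagged.append(name)
    return flagged

if __name__ == "__main__":
    print(flag_unapproved_models("ml-bom.json"))  # path is illustrative
```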

Security Review AI’s Bhargav said LLMs are our best hope for writing secure code over the long term. “[But] it’s essential to understand that LLMs are only as good as the context they are given, and the more specific context in terms of security that you want to provide, the more the LLMs are able to generate more secure code,” he said.

LLMs are the best hope that we have to solve secure coding issues over the long term — but they have to be prompted and provided context in the right way.

Abhay Bhargav

Keep learning

  • Get up to speed on the state of software security with RL's Software Supply Chain Security Report 2026. Plus: See the webinar discussing the findings.
  • Learn why binary analysis is a must-have in the Gartner® CISO Playbook for Commercial Software Supply Chain Security.
  • Take action on securing AI/ML with our report: AI Is the Supply Chain. Plus: See RL's research on nullifAI and watch how RL discovered the novel threat.
  • Get the report: Go Beyond the SBOM. Plus: See the CycloneDX xBOM webinar.

Explore RL's Spectra suite: Spectra Assure for software supply chain security, Spectra Detect for scalable file analysis, Spectra Analyze for malware analysis and threat hunting, and Spectra Intelligence for reputation data and intelligence.

Tags: Dev & DevSecOps
