RL Blog

Dev & DevSecOps | February 28, 2024

Secure your AI development tools: 4 key questions to ask

When using AI tools including GitHub Copilot, your security team must be aware of — and protect against — certain risks. Here are the top considerations.

By Jaikumar Vijayan, Freelance technology journalist

Microsoft's soon-to-be-released GitHub Copilot Enterprise option will give organizations an enterprise-grade subscription plan for its AI-powered code-completion tool, which helps developers write code faster.

The option will give administrators a single point of visibility and management over Copilot use in the enterprise and will include security and privacy features to protect enterprise code from potential threats and compromise.

The enterprise plan gives organizations that have a low appetite for risk one of the best options yet to harness the productivity benefits of an AI pair programmer such as Copilot while mitigating some of the perceived risks associated with the technology.

When using AI tools such as Copilot, organizations need to be cognizant of a number of security and legal risks — and protect against them. This includes AI-powered code auto-completion, code-generation, and code-optimization tools. Here are four important questions to ask.

Learn more in our report: The Buyer’s Guide to Software Supply Chain Security
See Webinar: Why you need to upgrade your AppSec tools for the new era

1. Do components of your code belong to someone else?

One of the biggest concerns associated with automatic code-generation and code-completion tools is the potential for copyright violations and licensing complications. Copilot and other similar AI-based tools often use public and private databases for training data. The black-box nature of these technologies means organizations have no visibility into whether the code snippets and partial code that these tools suggest might include use of copyrighted material and intellectual property.

Microsoft has acknowledged the concerns that some of its customers have expressed about the potential IP and copyright issues associated with the use of Copilot: "Microsoft is bullish on the benefits of AI, but, as with any powerful technology, we’re clear-eyed about the challenges and risks associated with it, including protecting creative works."

The company's enterprise versions of Copilot explicitly block outputs that match public code as one measure to reduce copyright-related risks. To further ease some of these concerns, the company announced last September that it would offer legal protections from third-party copyright infringement claims to Copilot customers. Under Microsoft's Copilot Copyright Commitment provision, Microsoft will defend customers — and even pay for any adverse judgments or settlements — resulting from any lawsuit that a third party might file against them over copyright infringement claims.

However, to be eligible for the protection, organizations need to ensure that they implement specific guardrails and content filters that Microsoft has built into Copilot when using the technology. The goal in integrating the filters and other mitigation technologies is to reduce the likelihood of Copilot returning infringing content, Microsoft said:

These build on and complement our work to protect digital safety, security, and privacy, based on a broad range of guardrails such as classifiers, meta-prompts, content filtering, and operational monitoring and abuse detection.

Microsoft

FOSSA, developer of the eponymous software platform for managing open-source license compliance, also recommends that organizations scan their AI-generated code for potentially copyrighted or licensed code and tag all AI-generated code. In addition, FOSSA recommends that organizations enable GitHub Copilot's optional duplicate code-detection feature to further reduce the risk of license compliance issues.
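FOSSA's tagging recommendation can be partially automated. The Python sketch below assumes a hypothetical in-file marker convention (the `ai-generated` tag is an illustration, not a FOSSA or Copilot feature) and walks a repository to list the files that should be routed to license review:

```python
import os

# Hypothetical tag convention for marking AI-generated files; adjust to taste.
AI_MARKER = "ai-generated"

def find_tagged_files(root: str) -> list[str]:
    """Return paths of Python source files carrying the AI-generated tag."""
    tagged = []
    for dirpath, _, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".py"):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="ignore") as f:
                head = f.read(2048)  # expect the tag near the top of the file
            if AI_MARKER in head:
                tagged.append(path)
    return sorted(tagged)
```

Files on the resulting list can then be fed to a license scanner or flagged for duplicate-code review.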

John Bambenek, president at Bambenek Consulting, said organizations should be very concerned because “the AI made it” will likely not be a defense against a copyright claim.

Organizations are making money with this code, which means there is a strict level of scrutiny on the organization against using copyrighted code, even if they had no idea it was copyrighted in the first place.

John Bambenek

2. Is your AI tool introducing vulnerabilities?

Copilot can inadvertently suggest code snippets that contain security weaknesses that developers then introduce into their software projects without review. Past research has shown that GitHub Copilot and other AI-based code-completion and code-generation assistants such as AWS CodeWhisperer that are trained on open source code and libraries with known vulnerabilities in them can often reproduce output that contains these same vulnerabilities.

Eric Schwake, director of cybersecurity strategy at Salt Security, said that while AI tools including Copilot can help to dramatically decrease the time needed for software development, they can also introduce security concerns that need to be accounted for. He said that it's imperative that organizations that use tools such as Copilot do due diligence when utilizing code or code suggestions.

Because of the black-box nature of these tools, organizations need strategies in place to ensure the AI is providing code that is both secure and compliant with their internal regulations.

Eric Schwake

One new study from Snyk found that if an organization uses Copilot in a project with existing vulnerabilities, the AI-based tool will amplify those vulnerabilities through its suggestions. At the same time, when Copilot is used in a project without existing security issues, the code it generates is also mostly vulnerability-free.

An older, 2021 analysis of the security of GitHub Copilot's code contributions by researchers at New York University and the University of Calgary found that 40% of 1,689 programs that Copilot helped produce for the study contained vulnerabilities. The researchers found that while Copilot could significantly increase the productivity of software developers, it also heightened security risks.

"Put simply, when Copilot suggests code, it may inadvertently replicate existing security vulnerabilities and bad practices present in the neighbor files," Snyk said. "This can lead to insecure coding practices and open the door to a range of security vulnerabilities."

Mitigating these risks means having mechanisms in place to scan the output from AI code-prompting and code-generation tools for vulnerabilities and to have code-review policies that require approval of all auto-generated code before deployment.
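As a minimal illustration of such a gate, this Python sketch checks an AI-generated snippet against a small deny-list before it can be auto-approved. The patterns are illustrative assumptions only; a real pipeline would run a full SAST scanner and enforce human code review on top:

```python
import re

# Illustrative deny-list only; real pipelines should run a dedicated SAST tool.
RISKY_PATTERNS = {
    "eval() on dynamic input": re.compile(r"\beval\s*\("),
    "subprocess with shell=True": re.compile(r"subprocess\.\w+\([^)]*shell\s*=\s*True"),
    "predictable temp file path": re.compile(r"(['\"])/tmp/[^'\"]+\1"),
}

def review_suggestion(code: str) -> list[str]:
    """Return findings that should block auto-approval of an AI suggestion."""
    return [label for label, pattern in RISKY_PATTERNS.items() if pattern.search(code)]
```

A nonempty result routes the suggestion to a human reviewer instead of letting it merge automatically.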

While Copilot has evolved considerably since the 2021 study, researchers believe organizations still need to pair the tool with controls for comprehensive vetting. Matt Rose, field CISO at ReversingLabs, said that while traditional application security testing (AST) tools can test for vulnerabilities, and software composition analysis (SCA) tools can help validate open-source licensing, what organizations really need is a tool that provides visibility into the entire software packages they develop or use.

Rose wrote recently that complex binary analysis is the right tool for dealing with today's increasingly complex software.

An evolution of application security (AppSec) is under way, and a key to it is complex binary analysis, which is like a final exam for your software package before release. Complex binary analysis allows your team to review the software in final form so that you can trust all of the software your organization produces and consumes.

Matt Rose

3. Are your secrets ending up in the training data?

AI writing assistants, code completion tools and chatbots have a tendency to store large chunks of training data and spew out the data verbatim with the appropriate prompts. A study that researchers at Google's DeepMind AI research lab conducted in collaboration with peers at Cornell University, UC Berkeley, and three other universities showed how an adversary could extract training data from ChatGPT simply by prompting it to incessantly repeat specific words such as "poem," "make," "send," and "company."

This can become a significant issue with AI-based coding assistants because the training data can sometimes contain copyrighted code and hard-coded secrets such as access tokens, OAuth IDs, and API keys. Though such data is supposed to remain private, it can often end up in codebases on public repositories such as GitHub, which tools such as Copilot then use as training data. In 2022, developers inadvertently committed codebases to GitHub that in total contained over 3 million hard-coded secrets.

Just as with ChatGPT, research has shown that attackers can extract these secrets in AI-based code assistants using the appropriate prompts. To illustrate the extent of the problem, researchers at the Chinese University of Hong Kong and Sun Yat-sen University in China ran a tool they developed called Hard-coded Credential Revealer against Copilot and CodeWhisperer and extracted 2,702 hard-coded credentials from Copilot and 129 secrets from CodeWhisperer.

Salt Security's Schwake said awareness was key.

As with most things, when it comes to security, it’s important that there are awareness programs in place for DevOps that explain the risks associated with Copilot and how it could potentially interact with sensitive data. Ensuring secure coding practices across the organization is especially important to limit the risk of data loss.

Eric Schwake

FOSSA also recommends that organizations ensure they have opted out of allowing Copilot to use any of their prompts or code snippets for training purposes.

Philip George, executive technical assistant at Merlin Cyber, said sound secrets hygiene should be exercised when curating training data for Copilot. The goal should be to ensure that no hard-coded secrets exist within the training content or in the code repositories, he said.

Consider establishing a cryptographic bill of materials to track the proliferation of secrets across a given codebase and incorporate centralized credential management and Just-in-time (JIT) access for CI/CD development.

Philip George

There are mechanisms, usually based on pattern matching, to prevent secrets from getting into public repositories, Bambenek adds. "But [often] these mechanisms are ineffectively deployed, if they are deployed at all."
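The kind of pattern matching Bambenek describes can be sketched in a few lines of Python. The first two rules reflect real, documented token formats (GitHub classic personal access tokens and AWS access key IDs); the third is a generic heuristic of my own, and production scanners such as gitleaks or TruffleHog ship far larger rule sets:

```python
import re

SECRET_PATTERNS = {
    "GitHub personal access token": re.compile(r"ghp_[A-Za-z0-9]{36}"),
    "AWS access key ID": re.compile(r"AKIA[0-9A-Z]{16}"),
    # Generic heuristic; expect some false positives in real codebases.
    "generic API key assignment": re.compile(
        r"(?i)api[_-]?key\s*[:=]\s*['\"][A-Za-z0-9_\-]{16,}['\"]"
    ),
}

def scan_for_secrets(text: str) -> list[tuple[int, str]]:
    """Return (line_number, rule_name) pairs for likely hard-coded secrets."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                hits.append((lineno, name))
    return hits
```

Wired into a pre-commit hook, a check like this blocks the commit before the secret ever reaches a public repository, and before it can be swept into anyone's training set.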

4. Can someone manipulate your code completion tool?

One risk that organizations need to consider and protect against is the potential for adversaries to try and "poison" AI systems or to deliberately confuse them into malfunctioning.

The National Institute of Standards and Technology (NIST) earlier this year highlighted four attack classes that threat actors can use to trigger these outcomes:

  • Evasion attacks: attempts to alter an input to affect how the system responds to it
  • Poisoning attacks: the introduction of corrupt data into the training set
  • Privacy attacks: the extraction of training data for malicious purposes
  • Abuse attacks: the insertion of incorrect or false information into a source web page or document that the AI then consumes

NIST noted in its blog post:

Most of these attacks are fairly easy to mount and require minimum knowledge of the AI system and limited adversarial capabilities. Poisoning attacks, for example, can be mounted by controlling a few dozen training samples, which would be a very small percentage of the entire training set.

National Institute of Standards and Technology

Such attacks can happen in the context of AI-powered code assistants such as GitHub Copilot and CodeWhisperer, said Merlin Cyber's George. For starters, organizations should adopt regular static analysis scans of their codebase in conjunction with strict access control requirements for data repositories to mitigate this risk, he said.

DevOps teams that plan on using Copilot or similar tools should also consider adopting access control mechanisms to enforce the compartmentalization of datasets used for training large language models in the environment, George said. He recommended that organizations consider models such as the Bell-LaPadula model — often used by the military and government — to protect sensitive information when using AI-based assistants in the development environment.

The focus is to ensure both the confidentiality and integrity of data sources to maintain trust for AI pair/code-generation tools.

Philip George
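Bell-LaPadula reduces to two checks: a subject may not read data above its clearance ("no read up"), and may not write data to a level below its own ("no write down"). A minimal sketch, with hypothetical sensitivity levels for training datasets:

```python
from enum import IntEnum

class Level(IntEnum):
    """Hypothetical sensitivity levels for training datasets."""
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2

def can_read(subject: Level, dataset: Level) -> bool:
    """Simple security property: no read up."""
    return subject >= dataset

def can_write(subject: Level, dataset: Level) -> bool:
    """Star (*) property: no write down."""
    return subject <= dataset
```

Under these rules, a fine-tuning job cleared at INTERNAL can read PUBLIC data, but a process handling CONFIDENTIAL code cannot write its output into an INTERNAL or PUBLIC dataset, which keeps sensitive material from leaking downward into training corpora.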

AI giveth, and AI taketh away

AI tools such as Copilot can help to dramatically decrease the time needed for app development. But they can also introduce security concerns that need to be accounted for, Schwake said.

It’s important for organizations to set in place guardrails around the use of such technologies, the kind of data they have access to, and how, or whether, such data can be shared.

There is still uncertainty on whether AI-generated code can be copyrighted, so this needs to be considered if your organization utilizes code that Copilot has built.

Eric Schwake

Keep learning

  • Get up to speed on the state of software security with RL's Software Supply Chain Security Report 2026. Plus: See the webinar discussing the findings.
  • Learn why binary analysis is a must-have in the Gartner® CISO Playbook for Commercial Software Supply Chain Security.
  • Take action on securing AI/ML with our report: AI Is the Supply Chain. Plus: See RL's research on nullifAI and watch how RL discovered the novel threat.
  • Get the report: Go Beyond the SBOM. Plus: See the CycloneDX xBOM webinar.

Explore RL's Spectra suite: Spectra Assure for software supply chain security, Spectra Detect for scalable file analysis, Spectra Analyze for malware analysis and threat hunting, and Spectra Intelligence for reputation data and intelligence.

Tags: Dev & DevSecOps
