Indirect prompt injection attacks target common LLM data sources

Malicious instructions buried in LLM sources such as documents can poison ML models. Here's how it works — and how to protect your AI systems.

John P. Mello Jr., Freelance technology writer.

While the shortest distance between two points is a straight line, a straight-line attack on a large language model isn't always the most efficient — and least noisy — way to get the LLM to do bad things. That's why malicious actors have been turning to indirect prompt injection attacks on LLMs.

Indirect prompt injection attacks involve malicious instructions embedded within external content — documents, web pages, or emails — that an LLM processes. The model may interpret these instructions as valid user commands, leading to unintended actions such as data leaks or misinformation.

A team of researchers recently wrote that indirect prompt injection attacks are successful because LLMs lack the ability to distinguish between informational context and actionable instructions. In addition, LLMs lack awareness when executing instructions within external content. The research team wrote on arXivLabs about their approach to assessing the attack method, as well as techniques for protecting LLMs:

To address this critical yet under-explored issue, we introduce the first benchmark for indirect prompt injection attacks, named BIPIA, to assess the risk of such vulnerabilities. Using BIPIA, we evaluate existing LLMs and find them universally vulnerable.

Here's what you need to know about indirect prompt injection attacks — and what you can do to secure your AI systems against them.

Get White Paper: How the Rise of AI Will Impact Software Supply Chain Security

Indirect LLM attacks are challenging to defend against

Indirect prompt injection attacks are powerful because they exploit the LLM’s trust in external sources, including user-generated data, websites, and comments, bypassing any need for direct access to the system prompt or user interface, said Chris Acevedo, a principal consultant with the security firm Optiv.

Unlike traditional prompt injection, where an attacker tries to manipulate AI by feeding it crafted input directly, this technique hides malicious instructions inside content that the model reads, like a poisoned well disguised as clean water. This makes them stealthy and harder to trace, since the injection is hidden in data the LLM is simply reading, not in user input.
Chris Acevedo

Christopher Cullen, a vulnerability researcher in the CERT division of the Software Engineering Institute at Carnegie Mellon University, said indirect prompt injection attacks can be challenging for blue teams because they give a sufficiently positioned and competent attacker the capability to either control an underlying LLM system or prevent expected function of it.

[In] comparison to direct prompt injection, this attacker can be positioned in a way not immediately obvious to a blue team member. This gives that attacker control over the systems from a position that cannot be directly addressed by blue teams without changing the underlying way that their LLM draws data.
Christopher Cullen

Cullen explained that in an enterprise that uses an LLM trained on emails, for example, an attacker could provide enough emails with malicious content in them that they may alter the LLM system. "Blue team members may believe that their system is blocking the malicious emails, but if the LLM is accessing a malicious email to form a response to a user, the attacker can alter the expected behavior of the LLM," he said.

Stephen Kowski, field CTO at SlashNext, said the attacks can bypass security controls since they’re delivered through trusted content channels that the LLM is asked to analyze.

The attack payload activates only when the content is processed by the LLM, making detection particularly challenging without specialized AI security tools that can identify and block manipulated content before it reaches the model.
Stephen Kowski

Greg Anderson, co-founder and CEO of DefectDojo, said indirect prompt injection attacks are especially dangerous because they exploit the very foundation of how LLMs are built, by training on vast, uncurated datasets. "Unlike direct prompt injections, which target the model through cleverly crafted user inputs, indirect prompt injections poison the model’s knowledge base by inserting malicious content into the public data it learns from.

The challenge is that most LLMs prioritize scale, scraping as much data as possible without verifying the trustworthiness of the source. That creates a wide-open surface for manipulation.
Greg Anderson

Anderson cited one attack involving a group of Reddit users who successfully manipulated various LLMs so that they would not recommend their favorite restaurants and thus prevent crowds. "While relatively benign, and potentially even hilarious, this same technique can have devastating consequences on code generation when used to recommend intentionally malicious code," he said.

Understand the threat to software supply chains

Indirect prompt injection attacks pose a significant threat to the software supply chain because LLMs are increasingly integrated into development tools and workflows and so can inject malicious code or configurations into software projects, said Jason Dion, chief product officer and founder of the health care firm Akylade.

If an attacker can compromise the data sources used by an LLM by affecting the source code repositories or the LLM's documentation and training, then this can lead to compromises that might impact countless downstream users and connected systems.
Jason Dion

Erich Kron, a security awareness advocate at KnowBe4, said that with more and more people using AI coding tools, the risk of including potentially vulnerable or malicious code that was learned from malicious sources increases.

If bad actors create a number of GitHub repositories that all include a purposely created vulnerability in the code, and the LLM is told to learn from or use those as code sources, it is very possible that could include that same vulnerability in the code it produces for the LLM user, which may then include it in their product.
Erich Kron

Optiv's Acevedo noted that as more developers rely on LLMs to vet packages, review pull requests, and write code, the content these tools consume becomes an attack vector. A malicious actor could hide an indirect prompt injection in a package’s README or metadata, tricking the model into recommending or installing something unsafe, he said.

There have been demonstrations of package managers like PyPI or npm hosting packages whose documentation contains prompt injection payloads designed to influence AI-assisted tools.
Chris Acevedo

Steps to addressing the threat

The research team's analysis noted:

Our analysis identifies two key factors contributing to their success: LLMs' inability to distinguish between informational context and actionable instructions, and their lack of awareness in avoiding the execution of instructions within external content.

Based on these findings, the team proposes two novel defense mechanisms: boundary awareness and explicit reminders. "Extensive experiments demonstrate that our black-box defense provides substantial mitigation, while our white-box defense reduces the attack success rate to near-zero levels, all while preserving the output quality of LLMs," they wrote.

Acevedo said that because indirect prompt injection is happening now, "the more we rely on LLMs to interact with external data, the more doors we’re opening."

These attacks don’t require deep technical skill or zero-day exploits. They rely on something simpler: the model’s willingness to follow whatever text it sees, regardless of where it came from. In a world where AI is reading everything, we need to start asking, 'Who’s writing it?
Chris Aceve

While there is no silver bullet for mitigating indirect prompt injection attacks, Acevedo suggested the following steps to reduce risk in your organization immediately:

Sanitize content before it’s fed into an LLM.
Tell the model what is input and what is context and instruct it not to follow commands from external data.
Tag untrusted sources so models can treat them more cautiously.
Restrict what LLMs can do, especially if they’re allowed to take actions such as executing code or writing files.
Monitor outputs for weird behavior, and red-team your systems by simulating these attacks regularly.

Keep learning

Read the 2025 Gartner® Market Guide to Software Supply Chain Security. Plus: See RL's webinar for expert insights.
Get the white paper: Go Beyond the SBOM. Plus: See the webinar: Welcome CycloneDX xBOM.
Go big-picture on the software risk landscape with RL's 2025 Software Supply Chain Security Report. Plus: See our webinar for discussion about the findings.
Get up to speed on securing AI/ML with our white paper: AI Is the Supply Chain. Plus: See RL's research on nullifAI and replay our Webinar to learn how RL discovered the novel threat.
Learn how commercial software risk is under-addressed: Download the white paper — and see our related webinar for more insights.

Explore RL's Spectra suite: Spectra Assure for software supply chain security, Spectra Detect for scalable file analysis, Spectra Analyze for malware analysis and threat hunting, and Spectra Intelligence for reputation data and intelligence.

Tags:AppSec & Supply Chain Security