Published
- 3 min read
Prompt Injection via Third-Party Plugins: A Growing LLM Supply Chain Risk
Security researchers at multiple firms have documented a significant uptick in indirect prompt injection attacks targeting enterprise deployments of AI agents that consume third-party plugins, web browsing capabilities, and external data sources. Unlike direct prompt injection — where an attacker controls user input — indirect injection embeds adversarial instructions in external content that the LLM reads and acts upon autonomously.
What’s Happening
As organisations deploy LLM-based agents with tool-use capabilities — browsing the web, reading emails, querying databases, executing code — attackers have identified a high-value attack surface: the data the model ingests, not the prompt the user writes.
Observed attack patterns include:
- Malicious instructions hidden in web pages: pages designed to be retrieved by AI browsing agents contain hidden text (white-on-white, zero-font-size, or CSS
display:none) instructing the model to exfiltrate session data or send follow-up requests. - Poisoned document repositories: documents uploaded to RAG (retrieval-augmented generation) systems contain embedded directives that activate when retrieved by the model.
- Compromised plugin endpoints: third-party plugins registered in enterprise AI platforms return payloads designed to override system prompts or redirect agent actions.
Why It Matters
The consequences of successful indirect injection in an agentic context are severe. An agent with access to email, calendar, code execution, or internal APIs can be turned against the organisation by instructions in a web page or document it retrieves. Demonstrated impacts include:
- Exfiltration of conversation history and system prompts
- Sending emails on behalf of the user
- Creating OAuth tokens or API keys
- Executing code in sandboxed environments with lateral movement potential
Defensive Guidance
1. Treat model outputs as untrusted. Any action taken by an AI agent should go through the same authorisation and audit trail as a human-initiated action. Agents should not have implicit trust.
2. Implement privilege separation. Agents should operate with least-privilege tool access. Browsing capability should not co-exist with email send or code execution without explicit scoping.
3. Validate plugin provenance. Only allow plugins from vetted, internal registries. Monitor for unexpected additions to approved plugin lists.
4. Apply input sanitisation to retrieved content. Where feasible, pre-process external content before feeding it to the model — strip non-visible text and flag anomalous instruction-like patterns.
5. Log all agent actions. Full audit trails of tool calls, retrieved documents, and model outputs are essential for detection and forensics.
Industry Response
OWASP updated its LLM Top 10 in early 2026 to elevate indirect prompt injection (LLM01 adjacency) as a distinct risk. Several AI platform vendors have implemented content provenance tracking, but enforcement at the plugin layer remains inconsistent. No widely-adopted standard for plugin security vetting currently exists.