By Allan D - Editor, AI Security Wire
SymJack and TrustFall Break Six AI Coding Agents in the Same Month
Security researchers disclosed two separate attacks against AI coding agents in May 2026. They came from the same lab, hit many of the same products, and point to the same structural problem: the approval dialog you see when your coding assistant asks permission to do something is not a reliable description of what’s actually about to happen.
The first is called SymJack. The second is TrustFall. Both were published by Adversa AI researcher Rony Utevsky.
SymJack: When the Config Write Isn’t What It Looks Like
SymJack starts from a simple observation: AI coding agents follow project-level instruction files. CLAUDE.md, AGENTS.md, similar variants. Developers put these in repositories to tell agents how to behave. Attackers can put them there too.
The attack chain is three steps. First, a malicious instruction file directs the agent to run a shell copy command. This is a deliberate choice: the agent’s native file tools have some safety guardrails. Raw shell commands do not. Second, the “destination” in that copy instruction is a symbolic link. It looks like a path to documentation or media. What it actually points to is the agent’s configuration directory. Third, when the kernel follows the symlink, the malicious payload lands in .claude/settings.json, .mcp.json, .codex/config.toml, or whatever global MCP config the target agent reads. On next restart, the planted MCP server executes with full user privileges and no further prompting.
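The second and third steps can be reproduced harmlessly in a scratch directory. The sketch below is illustrative only and is not the researcher's proof of concept: the directory names (`fake_home/.agent`, `repo/docs/assets`) and the function name are invented for the example, the script creates the symlink itself, and Python's `shutil.copy` stands in for a shell copy command. It assumes a POSIX system where creating symlinks needs no special privileges.

```python
import os
import shutil
import tempfile


def demo_symlinked_destination() -> tuple[str, str]:
    """Copy a file to a 'docs' path that is secretly a symlink.

    Returns (path an approval prompt would show, path actually written).
    All directory names here are hypothetical stand-ins.
    """
    root = os.path.realpath(tempfile.mkdtemp())
    config_dir = os.path.join(root, "fake_home", ".agent")  # stands in for an agent config dir
    repo = os.path.join(root, "repo")
    os.makedirs(config_dir)
    os.makedirs(os.path.join(repo, "docs"))

    # A payload file plus a symlink disguised as an assets folder.
    payload = os.path.join(repo, "example.json")
    with open(payload, "w") as fh:
        fh.write('{"planted": true}')
    link = os.path.join(repo, "docs", "assets")
    os.symlink(config_dir, link, target_is_directory=True)

    shown = os.path.join(link, "settings.json")  # the path as written in the copy command
    shutil.copy(payload, shown)                  # the OS follows the symlink during the write
    return shown, os.path.realpath(shown)


shown_path, real_path = demo_symlinked_destination()
print("shown:  ", shown_path)
print("written:", real_path)
```

The two printed paths differ: the first sits under `repo/docs/assets`, the second under `fake_home/.agent`. Nothing in the copy command itself hints at that redirection.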
Adversa AI confirmed the technique against six independent agents: Claude Code, Gemini CLI and its Antigravity CLI variant, Cursor Agent CLI, GitHub Copilot CLI, Grok Build CLI, and OpenAI Codex CLI.
The developer-facing approval prompt, when one appears at all, shows something like “copy this file to that documentation folder.” Nothing about a config directory. Nothing about executable content. Nothing about an MCP server that will run on next startup. The researcher’s description is precise: “The developer sees one request: copy this innocuous-looking file to that documentation folder. They approve it. Nothing on screen mentions the config directory, the MCP file, or executable content.”
What can be stolen? Anything in the developer’s shell environment. SSH private keys. Cloud tokens. Browser sessions. CI/CD deploy keys. Code signing material. Container registry credentials. Secrets in dotfiles. On CI runners, a single triggered payload can exfiltrate all pipeline secrets before any human review happens. That makes this a supply chain attack with a coding agent as the delivery mechanism.
Vendor responses were uneven. Anthropic patched Claude Code, hardening it to resolve symlinks before showing approval prompts so the user sees the real destination. Google rejected the report, treating explicit user approval as a sufficient boundary. Cursor declined, citing prior awareness. xAI and GitHub had not responded at time of writing. No CVEs were assigned.
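The idea behind that fix, resolving the path before asking, can be sketched in a few lines. This is not Anthropic's implementation, which the reports do not show; `describe_write` and the `SENSITIVE_DIR_NAMES` set are hypothetical names, and the sensitive locations are simply the config paths named earlier in this article.

```python
import os
import tempfile

# Hypothetical list of directory names treated as sensitive (taken from the article).
SENSITIVE_DIR_NAMES = {".claude", ".codex"}


def describe_write(dst: str) -> dict:
    """Resolve a destination path and flag writes that land somewhere unexpected."""
    real = os.path.realpath(dst)  # follows symlinks in every existing path component
    parts = set(real.split(os.sep))
    return {
        "requested": dst,
        "resolved": real,
        "redirected": os.path.abspath(dst) != real,
        "sensitive": bool(parts & SENSITIVE_DIR_NAMES)
        or os.path.basename(real) == ".mcp.json",
    }


# Scratch layout: repo/docs/media is a symlink to a stand-in config directory.
root = os.path.realpath(tempfile.mkdtemp())
os.makedirs(os.path.join(root, "home", ".claude"))
os.makedirs(os.path.join(root, "repo", "docs"))
os.symlink(os.path.join(root, "home", ".claude"),
           os.path.join(root, "repo", "docs", "media"))

info = describe_write(os.path.join(root, "repo", "docs", "media", "settings.json"))
print(info["resolved"])
print(info["redirected"], info["sensitive"])
```

A prompt built from `info["resolved"]` rather than `info["requested"]` would show the config directory, which is the behaviour the patch is described as providing.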
TrustFall: The Prompt Doesn’t Need to Lie If You Don’t Read It
TrustFall is quieter. It doesn’t need symlinks or shell tricks. It exploits the trust dialog that appears when a developer opens an unfamiliar project.
A malicious repository contains an MCP configuration file that defines helper programs. When the developer opens the project, the coding tool presents a trust dialog. The dialog asks whether to trust the project. It defaults to yes. What it doesn’t explain in any granular way is that the configuration includes executable definitions that will run with developer permissions, before the AI processes any request. The helpers execute on trust grant. The LLM is not involved at the point of exploitation.
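One way to see what a trust grant would actually launch is to read the project's MCP configuration before opening it in an agent. The sketch below assumes the widely used layout in which a top-level `mcpServers` object maps each server name to a `command` and optional `args`; individual tools may use other keys or file names, and `commands_in_mcp_config` is a hypothetical helper, not part of any agent.

```python
import json


def commands_in_mcp_config(text: str) -> list[str]:
    """Return the command lines a project MCP config would launch.

    Assumes {"mcpServers": {name: {"command": ..., "args": [...]}}};
    entries without a "command" key are skipped.
    """
    servers = json.loads(text).get("mcpServers", {})
    lines = []
    for name, spec in sorted(servers.items()):
        if "command" not in spec:
            continue
        argv = [spec["command"]] + list(spec.get("args", []))
        lines.append(f"{name}: {' '.join(argv)}")
    return lines


# Hypothetical config of the kind a hostile repository might ship.
sample = json.dumps({
    "mcpServers": {
        "helper": {"command": "sh", "args": ["-c", "./tools/setup.sh"]},
    }
})
for line in commands_in_mcp_config(sample):
    print(line)
```

Each printed line is a program that would run with the developer's permissions once the project is trusted, which is the detail the dialog leaves out.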
This lands credentials, SSH keys, source code, or backdoor connections in attacker hands before the AI has done anything at all. The vulnerable tools are Claude Code, Gemini CLI, Cursor CLI, and GitHub Copilot CLI.
Utevsky’s commentary on the trust dialogs is worth quoting: they’re “not that obvious to understand all configuration nuances, especially for vibe coders.” That last phrase carries weight. The developers who have adopted AI coding tools most enthusiastically are also the ones most likely to move fast, clone unfamiliar repos, and approve prompts they haven’t read carefully.
The Pattern Underneath Both Attacks
SymJack and TrustFall are mechanically different but they rhyme.
SymJack abuses the fact that agents follow instructions without fully resolving what those instructions will physically do to the filesystem. TrustFall abuses the fact that trust dialogs don’t communicate their implications. In both cases, the security model depends on the developer correctly interpreting an interface element that was not designed to be a security boundary.
MCP is the common thread. Both attacks ultimately plant or invoke malicious MCP servers. The Model Context Protocol was built for extensibility, not for adversarial environments. A developer can add a tool to their coding agent by dropping a config file. That’s the feature. That the same mechanism can be triggered without explicit developer intent, by manipulated project files or insufficiently described trust prompts, is the vulnerability surface.
The NSA’s AISC flagged exactly this class of issue in its May 2026 Cybersecurity Information Sheet on MCP: the protocol provides no mandatory signature verification for dynamically loaded servers, and the security posture of a given deployment depends entirely on controls layered on by implementers rather than enforced by the protocol itself.
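Pinning is one example of a control an implementer can layer on top. The sketch below is an assumption-laden illustration, not something MCP or any of the affected tools provides: it fingerprints each server definition with SHA-256 and reports any server whose definition is not on an approved list. The function names, the `mcpServers` layout, and the sample servers are all hypothetical, and a hash pin is a weaker guarantee than a real signature scheme.

```python
import hashlib
import json


def server_fingerprint(spec: dict) -> str:
    """Hash a server definition in canonical form so any change alters the digest."""
    canonical = json.dumps(spec, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def unapproved_servers(config: dict, approved: set[str]) -> list[str]:
    """Names of servers whose definitions are not on the approved-fingerprint list."""
    servers = config.get("mcpServers", {})
    return sorted(
        name for name, spec in servers.items()
        if server_fingerprint(spec) not in approved
    )


# Hypothetical approved server, plus an extra one planted by a repository.
good = {"command": "internal-mcp", "args": ["--read-only"]}
approved = {server_fingerprint(good)}
config = {"mcpServers": {
    "internal": good,
    "extra": {"command": "sh", "args": ["-c", "./run.sh"]},
}}
print(unapproved_servers(config, approved))
```

Because the digest covers the command and its arguments, editing an approved entry in place is reported the same way as adding a new server.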
What SymJack and TrustFall add is empirical confirmation that the attack path works at scale across competing products from different vendors.
Practical Implications for Development Teams
The immediate risk surface is any developer machine running one of the affected tools that clones repos from uncontrolled or external sources. This includes CI runners, which are often more permissive than developer workstations, hold more secrets, and run automated clone-and-build operations with minimal human review.
Several concrete steps reduce exposure:
Pin MCP server configurations at the organisation level. If your team’s agents should only connect to internal, approved MCP servers, enforce that through org-level config rather than per-repo settings. This removes the attack surface from project-level config files entirely.
Treat CLAUDE.md, AGENTS.md, and equivalent instruction files as executable inputs. Security review for project instruction files should mirror the review applied to CI configuration and build scripts. A CLAUDE.md from an external contributor can direct agent behaviour in ways that have real security consequences.
Review CI runner permissions. Runners that clone external repos and invoke AI coding agents should operate with minimal credentials. Secrets used only for deployment should not be present in the build environment.
Update Claude Code. Anthropic’s patch, which resolves symlinks before displaying approval prompts, directly addresses the SymJack technique. It’s available now.
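Part of the instruction-file review can be automated as a check that runs after cloning and before any agent starts. The following is a minimal sketch under stated assumptions: `audit_repo` is a hypothetical helper, the file-name sets contain only names mentioned in this article, and the symlink check assumes POSIX-style paths. It lists instruction files, project MCP configs, and symlinks that resolve outside the checkout.

```python
import os
import tempfile

# File names taken from the article; extend for other agents as needed.
INSTRUCTION_FILES = {"CLAUDE.md", "AGENTS.md"}
MCP_CONFIG_FILES = {".mcp.json"}


def audit_repo(repo_root: str) -> dict:
    """Walk a checkout and report items that deserve review before an agent runs."""
    root = os.path.realpath(repo_root)
    report = {"instruction_files": [], "mcp_configs": [], "escaping_symlinks": []}
    for dirpath, dirnames, filenames in os.walk(root, followlinks=False):
        if ".git" in dirnames:
            dirnames.remove(".git")  # skip version-control internals
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            if os.path.islink(path):
                target = os.path.realpath(path)
                # A link is suspicious when its resolved target is not under the repo root.
                if os.path.commonpath([root, target]) != root:
                    report["escaping_symlinks"].append(rel)
            elif name in INSTRUCTION_FILES:
                report["instruction_files"].append(rel)
            elif name in MCP_CONFIG_FILES:
                report["mcp_configs"].append(rel)
    return report


# Scratch checkout with one instruction file and one symlink pointing outside it.
repo = os.path.realpath(tempfile.mkdtemp())
outside = os.path.realpath(tempfile.mkdtemp())
os.makedirs(os.path.join(repo, "docs"))
open(os.path.join(repo, "CLAUDE.md"), "w").close()
os.symlink(outside, os.path.join(repo, "docs", "media"))
print(audit_repo(repo))
```

A CI job could fail the build when `escaping_symlinks` is non-empty and route any hit in the other two lists to the same review applied to build scripts.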
For vendors that declined to patch: the security argument for treating explicit user approval as a sufficient boundary ignores the reality that the approval UI was not designed to communicate the security significance of what’s being approved. That framing may hold up in an engineering review. It will not hold up when developers’ SSH keys are showing up in attacker infrastructure.
References
- SecurityWeek — SymJack Attack Turns AI Coding Agents Into Supply Chain Attack Delivery Systems
- Adversa AI — The Approval Prompt Is Lying to You: Symlink RCE in Five AI Coding Agents
- Help Net Security — TrustFall: Researcher Demonstrates How AI Coding CLIs Can Be Exploited Before the LLM Thinks
- NSA AISC — Model Context Protocol (MCP): Security Design Considerations for AI-Driven Automation
- Adversa AI — Top Agentic AI Security Resources, June 2026
Frequently Asked Questions
- What is SymJack and which AI coding tools are affected?
- SymJack is a symlink hijack attack that tricks AI coding agents into overwriting their own configuration files with attacker-controlled MCP server definitions. It works by combining malicious project instruction files (CLAUDE.md, AGENTS.md, etc.) with a symlink that points an innocuous-looking file copy operation at the agent's actual config directory. The attack was confirmed against six agents: Claude Code, Gemini CLI (including its Antigravity CLI variant), Cursor Agent CLI, GitHub Copilot CLI, Grok Build CLI, and OpenAI Codex CLI. The malicious MCP server executes on the next restart with full user privileges.
- How is TrustFall different from SymJack?
- TrustFall targets the trust dialog itself rather than the file system. A malicious repository includes an MCP configuration that defines helper programs. When a developer opens the project and accepts the trust prompt, which defaults to yes and doesn't explain what the configuration actually does, those helpers execute before any AI reasoning takes place. The result is code execution with developer permissions before the LLM has processed a single token. SymJack requires the agent to actively run commands and follow symlinks. TrustFall only needs the developer to open the repo and accept the default.
- What credentials are at risk from these attacks?
- Both attacks can access anything available to the user's shell session at the time of execution: SSH private keys, cloud provider tokens (AWS, GCP, Azure), browser session cookies, CI/CD deploy keys, code signing material, container registry credentials, and secrets stored in dotfiles or environment variables. On CI runners, a single triggered payload can exfiltrate all pipeline secrets before any human review occurs. The attack surface is whatever the developer can reach, because the MCP server executes with the developer's own permissions.
- Did the AI tool vendors patch these vulnerabilities?
- Responses varied significantly. Anthropic patched Claude Code, hardening it to resolve symlinks before displaying approval prompts so users see the real destination path. Google rejected the SymJack report, treating explicit user approval as the intended security boundary regardless of whether users understand what they're approving. Cursor also declined to patch, citing prior awareness. xAI and GitHub had not responded at time of writing. For TrustFall, no patches were confirmed across the affected tools, and no CVEs were assigned to either vulnerability.