Skip to content
AI Security Wire

Published

- 4 min read

By

GuardFall: Ten of Eleven AI Coding Agents Fail Bash Shell Injection Tests

img of GuardFall: Ten of Eleven AI Coding Agents Fail Bash Shell Injection Tests

The security of an AI coding agent depends on what happens in the moment it decides a shell command is safe to run. Most agents make that decision by pattern-matching the raw command string the LLM emits. Bash, however, does not execute raw strings. It expands, unquotes, and rewrites them before anything runs. That gap is what the GuardFall research exploits.

On June 30, 2026, Adversa AI published a survey of 11 open-source AI coding agents covering around 548,000 combined GitHub stars. Ten of them fail at least some of five documented shell bypass classes. One agent, Continue, blocked all 21 test cases across every class. The other ten had gaps ranging from partial to complete.

Why Shell Guards Fail

The fundamental problem is not a missing check. It is that the check is applied to the wrong representation of the command.

A guard sees r''m -rf ~. It looks for rm, doesn’t find it, and passes the command. Bash sees rm -rf ~ after stripping the empty quotes. Two different inputs, two different outputs, one dangerous result.

Adversa categorised five classes of exploitable expansion behaviours:

Class A (Quote Removal): Adjacent empty quotes bash strips during processing. r''m becomes rm.

Class B (IFS Expansion): The Internal Field Separator variable. rm$IFS-rf$IFS/ expands to rm -rf / after parameter substitution.

Class C (Command Substitution): Computing binary names at runtime. $(echo rm) -rf / resolves before execution; the guard never sees rm in the raw string.

Class D (Base64 Piping): Chaining individually benign commands. echo <base64_payload> | base64 -d | sh looks like two innocuous operations that together become arbitrary code execution.

Class E (Alternative Flags): Sidestepping guarded binaries entirely. find / -delete and dd of=/dev/sda accomplish the same destruction most guards never anticipate because they are watching for rm, not for everything that can empty a filesystem.

The Scorecard

Four architectural failure patterns account for the ten failing agents.

Three agents (Hermes, opencode, Goose) use regex matching against raw LLM output. They fail all five bypass classes completely. Pattern matching on unexpanded strings is not a shell guard; it is a shell guard that only works against an attacker who doesn’t know Bash.

Two agents (Cline, Roo-Code) tokenise before matching, which closes part of the gap. Classes A and B get harder. But Classes C, D, and E remain open because tokenisation does not evaluate substitution expressions or account for alternative destructive command patterns.

Aider, Plandex, and Open Interpreter rely on human approval instead of static guards. When users enable auto-yes mode (documented and supported in all three), that approval disappears. In CI pipelines, auto-yes is common. The human review layer that the security model depends on is frequently absent by design.

OpenHands and SWE-agent sandbox by default, which is structurally better. Both also document local execution modes that disable container isolation entirely. Once a user opts into local mode, the container is gone.

Continue holds because it implements all five defensive components: tokenisation with shell-quote, variable expansion detection, recursive command substitution evaluation, pipe destination checking, and an explicit list of known destructive patterns. Adversa ran 21 bypass cases against it. Zero reached execution.

What an Attack Actually Looks Like

The Makefile scenario is the cleanest illustration. An attacker submits a pull request to an open-source repository: a few doc edits, a test fixture, and a Makefile with a clean target containing rm -rf "$$HOME/.aws/credentials".

The developer’s AI coding agent runs the test suite. The Makefile’s dependency graph runs clean first. AWS credentials are gone. On a developer machine with cloud infrastructure access, that is the start of something significantly worse.

READMEs, .mcp.json configurations, tool descriptions, and any file the agent ingests as context are all potential injection surfaces. The attacker controls repository content. The developer trusts the agent to handle it safely. Most agents, it turns out, cannot.

What Teams Should Do

Adversa’s recommendations are practical. Redirect $HOME during agent execution so secrets in dotfiles and credential directories are outside the agent’s reach even if a destructive command runs. Disable auto-yes flags, particularly in CI pipelines where auto-execution is most common and the blast radius is largest. Gate multi-line generated scripts on out-of-band review rather than inline approval prompts. Do not run agents against fork pull requests on CI runners.

The structural fix is implementing what Continue already does. Adversa’s paper provides the reference design. The distance between Continue’s guard architecture and what ten other widely-used agents currently deploy is significant, and none of them have released patches as of this writing.

Most developers pick AI coding agents based on capability benchmarks, integration support, or IDE compatibility. Shell guard architecture is not in any benchmark. That is the gap this research is trying to close.

References

Frequently Asked Questions

Which AI coding agents are vulnerable to GuardFall attacks?
Adversa AI tested 11 agents representing around 548,000 combined GitHub stars. Ten of them had at least partial exposure: NousResearch Hermes, sst opencode, block Goose, Cline, Roo-Code, Aider, Plandex, Open Interpreter, OpenHands (in local mode), and SWE-agent (in local mode). Only Continue blocked all five bypass classes in the research's test suite of 21 cases.
What are the five GuardFall bypass classes?
Class A uses quote removal (r''m becomes rm after bash strips empty quotes). Class B exploits the Internal Field Separator variable (rm$IFS-rf$IFS/ expands to rm -rf /). Class C uses command substitution to compute binary names at runtime ($(echo rm) -rf /). Class D chains benign commands via Base64 piping (echo <payload> | base64 -d | sh). Class E avoids guarded binaries entirely, using alternative destructive patterns like find / -delete or dd of=/dev/sda that most guards never account for.
Does GuardFall have CVEs assigned, and has it been patched?
No CVEs were assigned. Adversa AI describes GuardFall as a class of architectural problems rather than discrete patchable bugs. Only Continue implements the five-component guard design (tokenisation, variable expansion detection, command substitution evaluation, pipe destination checking, and an explicit destructive pattern list) that blocked all test cases. The other agents have not released patches as of this writing.