Published
- 4 min read
By Allan D - Editor, AI Security Wire
Gaslight: North Korean Backdoor Poisons AI Malware Analysis Tools
North Korean threat actors have built malware that doesn’t just try to hide from analysts. It actively tries to manipulate the AI tools analysts use to examine it.
SentinelLabs published their analysis on June 23, 2026 covering macOS.Gaslight, a Rust-based backdoor that embeds 38 fabricated system messages inside its binary. The messages are formatted to look like internal scaffolding from an LLM triage harness. When an automated malware analysis pipeline feeds the sample to a language model, the intent is to push the model into aborting, truncating, or generating a misleading report rather than a useful one.
It’s the first confirmed case of malware specifically designed to poison AI-assisted security analysis. Not to evade sandboxes. Not to detect virtual machines. To corrupt the triage report.
How the Evasion Works
The fake messages are stored as a 3.5 KB Markdown-fenced blob inside the binary, delimited with {{DATA}} tokens. They mimic the kind of system-role context a developer might inject into an LLM harness: token refresh failures, out-of-memory events, disk space warnings, Redis connection errors, SQL injection false positives, pipeline faults.
The goal is misdirection. If a security analyst’s AI triage platform processes malware samples by feeding their contents into a language model, Gaslight’s embedded payload tries to look like the infrastructure talking back. A poorly isolated pipeline might truncate analysis, flag the sample as benign based on misleading context, or generate a report full of noise that buries the actual threat.
SentinelLabs framed it clearly: threat actors are experimenting with anti-analysis techniques that specifically target AI-assisted security platforms. Gaslight is the first confirmed example in the wild.
What It Actually Does
The prompt injection is the novel part, but Gaslight is a functional backdoor with a capable stealer payload.
An embedded Python component (6.6 KB, base64-encoded) targets browser credentials across Chrome, Brave, Firefox, and Safari, copies the raw macOS keychain database, collects terminal command histories, running process snapshots, and full system profiler output. The malware stages its own CPython 3.10.18 runtime from astral-sh/python-build-standalone rather than relying on the system Python. That keeps the stealer operational even on hardened macOS environments and lets operators push additional collection modules dynamically.
C2 runs over the Telegram Bot API. Communications are AES-GCM encrypted with fresh nonces per message and certificate-pinned. The operator command set covers six verbs: help, id, shell, kill, upload, and stop. A small but deliberate OPSEC detail: the malware self-redacts its bot tokens in runtime output, substituting placeholders so the credentials don’t appear in captured logs.
Persistence is established via a LaunchAgent using the label com.apple.system.services.activity, squatting in Apple’s own namespace. An IOPMAssertion prevents system sleep, keeping the C2 polling loop running continuously.
North Korean Attribution
SentinelLabs attributes Gaslight to North Korean threat actors with high confidence. Apple XProtect rule MACOS_BONZAI_COBUCH flags the primary sample, and a sibling sample is caught by the AIRPINK rule. Both detection families are associated with DPRK operations in prior SentinelLabs research. The malware family fits an established pattern of DPRK-linked macOS backdoor development targeting security researchers, developers, and organisations with access to sensitive systems.
Why This Matters Beyond the Sample
Security teams building LLM-assisted malware triage pipelines, and there are many doing exactly that right now, need to treat this as a design prompt, not just an IOC to block.
The attack surface Gaslight targets is the growing assumption that raw sample contents can be safely passed to language models for analysis. They can’t. Sample data is adversarial input by definition, and Gaslight demonstrates that threat actors have noticed the gap and are actively exploiting it.
SentinelLabs’ defensive guidance is practical: isolate malware data from model prompts completely. Filter content before it reaches an LLM. Build pipelines that can’t be redirected by embedded text payloads, regardless of how those payloads are formatted.
For detection today: the SHA-256 hashes in the SentinelLabs report cover the main sample, a sibling BONZAI sample, the Python payload, and the bash installer. Apple XProtect already covers the broader DPRK family.
References
Frequently Asked Questions
- How do the fake system messages actually fool AI analysis tools?
- The 38 fabricated messages are formatted to look like legitimate LLM triage scaffolding, complete with system-role framing and plausible error scenarios: token expiration, out-of-memory events, disk exhaustion, pipeline failures, SQL injection alerts. When a security team feeds the malware sample to an AI triage harness, the model encounters these messages as part of the sample content and, depending on how the pipeline is built, may treat them as operational context rather than hostile data. The goal is to push the LLM into truncating, aborting, or misreporting its analysis.
- What does Gaslight steal, and how does it communicate with its operators?
- The embedded Python stealer harvests browser credentials from Chrome, Brave, Firefox, and Safari, plus macOS keychain copies, terminal command histories, running process lists, and full system profiler output. All of this goes back to operators via a Telegram Bot API C2, encrypted with AES-GCM and using certificate pinning. The malware self-redacts its own bot tokens in runtime output, a deliberate OPSEC measure to prevent credential exposure if the sample is caught and analysed.
- What should teams building LLM-assisted analysis pipelines do right now?
- SentinelLabs' core guidance is to treat malware sample contents as adversarial input at all times, never as instructions. That means isolating raw sample data from model prompts, adding content filtering before LLM processing, and building pipelines that can't be redirected by embedded text payloads. For detection, the SHA-256 hashes published in the SentinelLabs report are available now, and Apple XProtect rules MACOS_BONZAI_COBUCH and AIRPINK already flag the broader DPRK malware family this sample belongs to.