Published
- 12 min read
By Allan D - Editor, AI Security Wire
AI Agent Observability: Building Security Monitoring for Agentic Systems
Security teams that deployed their first AI agents six months ago are now discovering the same problem: when something goes wrong, they cannot tell what happened. The SIEM shows an API call spike at 2 AM. Was that the agent doing legitimate work, or an attacker who compromised it and ran data queries through the tool layer? There is no way to know, because the agent’s reasoning, its tool invocations, the documents it retrieved, and the identity it operated under were never logged in a form the SIEM can interpret.
This is not an oversight. It is a structural gap that three separate developments in June 2026 have made impossible to ignore: Google DeepMind’s AI Control Roadmap, the OWASP Top 10 for Agentic Applications, and the approaching EU AI Act Article 11 enforcement deadline.
Why Existing Logging Misses the Attack Surface
A standard enterprise application produces structured security events: authentication attempts, authorisation decisions, API calls with status codes, database queries with affected row counts. SIEM correlation rules are written for these events because the events map to discrete, deterministic actions.
AI agent activity does not have this structure. An agent processing a user request may perform dozens of actions — retrieving documents, calling APIs, generating intermediate reasoning steps, spawning sub-agents — none of which surface cleanly as conventional security events. What appears in your infrastructure logs is the downstream effect: an HTTP request to an internal API, a database query. The cause — why the agent made that call, what it was instructed to do, what context it was operating under — is invisible unless you explicitly capture it.
This creates three specific blind spots.
Prompt injection goes undetected. If an attacker embeds malicious instructions in a document the agent retrieves, those instructions become part of the agent’s context and drive its subsequent actions. Your SIEM sees the tool calls that result. It does not see the injected instruction that caused them. Without logging the full input context — including what the agent retrieved, not just what the user typed — you cannot identify injection after the fact.
Tool abuse looks like legitimate traffic. A compromised or manipulated agent querying your database through an MCP tool generates the same database log as a legitimate query. Distinguishing them requires correlating the query with the agent’s reasoning context. Without reasoning traces, every malicious tool call is indistinguishable from a benign one.
Multi-turn manipulation evades single-event detection. Attackers are increasingly using multi-turn techniques to lead AI agents toward prohibited actions across multiple interactions. Any individual turn may look normal. The pattern across turns reveals the manipulation. SIEM correlation rules operating on individual events will not catch this without a behavioural baseline to compare against.
The Four-Layer Telemetry Model
Comprehensive AI agent security logging requires capturing four layers of telemetry, each of which illuminates a different part of the attack surface.
Layer 1: Input context. This is the full prompt presented to the model at each turn — the system prompt, user message, any documents or data retrieved through RAG, and tool output injected back into context. Logging only the user’s typed message is insufficient; the attack surface is the entire context window, and the most dangerous content arrives through retrieval rather than direct user input.
Fields to capture: session ID, turn number, timestamp, user identity, system prompt hash (to detect tampering), retrieved document identifiers and their source, and a hash of the full context window.
Layer 2: Reasoning trace. Many agentic frameworks — LangChain, AutoGen, LangGraph, Claude’s extended thinking — expose the agent’s intermediate reasoning steps before it takes action. These traces are the earliest signal of manipulation: if an agent has been injected with malicious instructions, its reasoning will show intent before its actions execute.
DeepMind’s AI Control Roadmap explicitly identifies reasoning monitoring as a D3-level detection capability. For high-value or high-risk agent deployments, capturing and analysing reasoning traces before tool calls execute is the only way to catch manipulation in time to intervene.
Fields to capture: the full chain-of-thought or scratchpad content, detected policy violations or anomalous framing, whether the reasoning diverges from the agent’s stated task.
Layer 3: Tool invocation. Every tool call the agent makes should generate a structured security event equivalent to a privileged operation log. This includes the tool name and version, the full arguments passed, the identity the agent was operating under, the result returned, and the latency. Treat tool invocations the way you treat sudo commands: log everything, alert on anomalies.
The OWASP Agentic Top 10 identifies Tool Misuse (ASI02) as a top-three risk precisely because agents are granted tool access that far exceeds what any individual action requires. Logging at the tool layer is where you make that risk visible.
Fields to capture: tool name, arguments (redacted for sensitive values), calling agent identity, authorisation scope at time of call, result type and size, whether the tool call was within the expected scope for the current task.
Layer 4: Output and committed effects. What did the agent return, and what did it cause to happen in external systems? This layer closes the loop: correlating the agent’s output with the downstream state changes it triggered. An agent that sends an email, creates a file, or posts to a webhook has committed an irreversible action. Logging these effects separately from the tool calls creates an audit trail of what changed as a result of agent activity.
Fields to capture: output type, destination system, content hash (not content, for sensitive outputs), reversibility flag, any subsequent human review or override.
Establishing Behavioural Baselines
Static detection rules fail against AI agents because the same agent may legitimately call very different tools on different days depending on what it is asked to do. What matters is whether the agent’s behaviour is consistent with its stated task and its historical patterns.
Baseline metrics to establish per agent or agent type:
Tool call distribution. Over a rolling 30-day window, what tools does this agent typically call, in what proportions, and in what sequences? An agent that suddenly begins calling file-system tools when it only ever called database tools is an anomaly worth investigating.
Data access volume. How many records does this agent typically retrieve per session? Exfiltration via a manipulated agent looks like an elevated retrieval rate. A query that returns 100,000 rows where the baseline is 50-100 is a detection signal.
Token volume per task type. For a given class of task, how much context does the agent typically consume? Prompt injection attacks often increase token consumption because the injected content adds to the context length. Baseline token consumption per task type gives you a detection signal for injection without reading the content.
Inter-agent communication patterns. In multi-agent architectures, which agents communicate with which, and at what frequency? Anomalous cross-agent messaging — particularly a low-privilege agent initiating communication with a high-privilege agent outside established patterns — is a signal of cascading compromise, which OWASP ASI08 identifies as a critical risk.
Detection Rules That Work
Given the four telemetry layers and established baselines, these are the detection patterns most likely to catch real attacks.
Reasoning-action divergence. If the agent’s stated reasoning describes one task but the subsequent tool calls execute a different task, the agent has been manipulated between planning and execution. This requires comparing the reasoning trace with the actual tool invocations — a correlation that only works if you are capturing both layers.
Tool scope escalation. An agent calling a tool outside its assigned permission scope, or calling a privileged tool for the first time after many sessions without it, is a signal of either prompt injection or a misconfiguration being exploited. Alert on any first-time tool usage after a baseline period has been established.
Recursive agent spawning without authorisation. Malicious prompt injection sometimes causes an agent to attempt to spawn sub-agents or invoke orchestration endpoints. If your architecture does not expect an agent to spawn other agents, any attempt to do so is an immediate high-severity alert.
Elevated data retrieval volume. More than two standard deviations above baseline for record retrieval in a single session, particularly if the retrieved data spans multiple unrelated entities, is an exfiltration signal.
System prompt modification attempts. Prompt injection attacks sometimes attempt to overwrite or extend the system prompt through the user turn. Comparing the system prompt hash at the start of a session against the system prompt hash presented to the model mid-session detects this class of attack. The system prompt should be immutable within a session.
Anomalous output destinations. An agent that writes to an external endpoint, email address, or storage bucket outside its normal operating scope is either misconfigured or compromised. Maintain an allowlist of legitimate output destinations per agent and alert on any deviation.
The DeepMind Tiered Model as a Reference Architecture
The AI Control Roadmap published by Google DeepMind on 18 June 2026 provides the clearest framework available for thinking about AI agent monitoring at scale. The framework maps safeguards to two axes: Detection (D1-D4) based on the agent’s capability to evade monitoring, and Response (R1-R3) based on its capability to execute harmful actions.
At the lowest tier (D1/R1), basic monitoring of tool calls and output filtering is sufficient. As agent capability increases, the roadmap introduces a Supervisor Agent that monitors the primary agent’s reasoning in real time, cryptographic signing of agent actions to create a tamper-evident audit trail, and a kill switch with criteria for automatic termination of a session.
For most enterprise deployments today, D2/R2 is the appropriate target: capturing all four telemetry layers, maintaining behavioural baselines, correlating reasoning traces with tool invocations, and having a human-review queue for flagged sessions. The D3/D4 capabilities — autonomous monitoring agents, real-time cryptographic attestation — are emerging from research into production for the highest-risk deployments.
Where the SIEM Fits In
The SIEM is not the right place to run AI agent behavioural analysis. It is the right place to receive the results.
The architecture that works in practice: a purpose-built LLM observability layer — Langfuse, Arize Phoenix, Helicone, or a custom logging pipeline — captures all four telemetry layers and runs semantic analysis and behavioural baselines against them. When this layer produces a high-confidence alert (tool scope escalation, reasoning-action divergence, elevated data retrieval), it writes a structured security event to the SIEM using a standard schema — CEF, LEEF, or your vendor’s proprietary format.
The SIEM then handles what it is good at: correlation across time, enrichment with threat intelligence, alert routing, and case management. An agent that triggered a tool scope escalation alert at the same time a new external IP was granted access to the MCP endpoint is a SIEM-level correlation — combining the semantic AI signal with the network event.
PredictionGuard’s research on the SIEM gap in agentic AI governance describes this as a “semantic bridge” problem: the SIEM operates in deterministic event space, the AI agent operates in probabilistic semantic space, and building the bridge between them requires domain-specific preprocessing that generic SIEM vendors are not yet providing.
The EU AI Act Deadline
For organisations operating high-risk AI systems within the EU, Article 11 of the EU AI Act requires logging capabilities sufficient to enable post-incident investigation by August 2026. The regulation does not specify technical implementation but requires that logs capture enough context to reconstruct the agent’s decision-making for any incident.
In practice this means: session-level audit trails with the full input context, tool calls, and output; tamper-evident storage (hash chaining or equivalent); retention periods appropriate to the risk classification of the system; and access controls on the logs themselves, since prompt logs often contain sensitive user data.
Systems that rely on infrastructure logs — HTTP access logs, database query logs — without capturing the semantic layer will struggle to demonstrate compliance. The regulation’s requirement for “readily available information on the operation of the system” is interpreted broadly to include the agent’s reasoning context, not just its network-level behaviour.
Where to Start
If your organisation has deployed AI agents and has not yet addressed observability at the semantic layer, the priority order is:
Instrument tool calls first. This is the layer where the most consequential actions happen and where logging is most straightforward. Every tool call should generate a structured event with tool name, arguments, and agent identity. Get this into your SIEM immediately.
Add session-level input context logging. Capture what the agent was given to work with at the start of each session — system prompt hash, user request, any retrieved documents. Store these separately from your SIEM, accessible for forensic investigation but not creating noise in normal operations.
Establish baselines before writing detection rules. Run two to four weeks of instrumented production traffic before attempting to write anomaly-based rules. Without a baseline, every detection rule will be tuned to threshold values that may be completely wrong for your specific agent and workload.
Review the OWASP Top 10 for Agentic Applications 2026 as your threat model. The ten risk categories (ASI01-ASI10) map directly to detection requirements. Use it to prioritise which attack classes your monitoring infrastructure addresses first.
The fundamental principle is the same as endpoint detection: you cannot investigate what you did not record. AI agents executing tool calls against production systems are privileged processes. They warrant the same logging investment as privileged human access.
References
- Google DeepMind — AI Control Roadmap (PDF, June 2026)
- OWASP — Top 10 for Agentic Applications 2026
- PredictionGuard — AI Security Event Logging: The SIEM Gap in Agentic AI Governance
- Palo Alto Networks — OWASP Agentic AI Security
- Teleport — OWASP Top 10 for Agentic Applications: Key Takeaways
- eWeek — Google DeepMind Roadmap Sets Security Controls for AI Agents
- LoginRadius — Auditing and Logging AI Agent Activity: A Guide for Engineers
- Confident AI — Top LLM Observability Tools 2026
Frequently Asked Questions
- Why can't existing SIEM tools monitor AI agents without modification?
- SIEM platforms were built to correlate discrete, deterministic security events — failed logins, firewall drops, process executions. AI agent activity is probabilistic, context-dependent, and expressed through tool calls and reasoning steps rather than system calls. A standard SIEM rule that fires on 'unusual process execution' has no equivalent for 'agent reasoning drift toward data exfiltration'. You need a semantic layer — purpose-built LLM logging and behavioural analysis — feeding structured, enriched events into the SIEM for correlation.
- What are the four layers of AI agent telemetry security teams should capture?
- The four layers are: (1) Input context — the full prompt including injected context from RAG or tool outputs, not just the user turn; (2) Reasoning trace — the agent's intermediate planning steps, which reveal manipulation before it reaches action; (3) Tool invocation — every tool call with arguments, the identity the agent used, and the result returned; (4) Output and downstream effects — what the agent returned and what actions were committed in external systems as a consequence. Most logging setups only capture layer 1 and layer 4, leaving the attack surface in layers 2 and 3 invisible.
- What does the EU AI Act require for AI agent logging by August 2026?
- Article 11 of the EU AI Act requires high-risk AI systems to maintain 'readily available information on the operation of the system' including logs sufficient to enable post-incident investigation. For agentic systems interacting with databases, APIs, or users in consequential domains, this means capturing the full decision context — not just inputs and outputs but tool calls, retrieved documents, reasoning steps, and any human-in-the-loop decisions or overrides. Logs must be tamper-evident and retained for a period appropriate to the risk level.