Incident Report: LLM System Used to Exfiltrate Privileged Legal Documents • AI Security Wire

Incident Classification: Confirmed | Severity: Critical | Sector: Legal / Professional Services | Date Confirmed: May 2026

A mid-sized UK law firm specialising in M&A advisory has disclosed a significant data breach in which an attacker leveraged the firm’s internally deployed AI document assistant to systematically exfiltrate privileged client communications and transaction documents. The firm has notified the ICO and affected clients.

Legal privilege notice: The firm has stated it is treating this incident as a potential breach of legal professional privilege. Affected client matters include active M&A transactions, the details of which are subject to ongoing disclosure review.

Incident Summary

Field	Detail
Incident type	Unauthorised access + AI-assisted data exfiltration
Affected system	Internal RAG-based document assistant (self-hosted)
Data at risk	Privileged client communications, due diligence files, M&A transaction documents
Estimated documents accessed	4,000–6,000
Duration	Approximately 6 weeks
Detection method	Anomalous query patterns flagged by a fee-earner
Notified to	ICO, SRA (Solicitors Regulation Authority), affected clients

Timeline

Week −8: The firm deploys an internal AI document assistant built on a self-hosted RAG architecture, allowing fee-earners to query the firm’s document management system using natural language. The system indexes documents across all practice areas into a shared vector database.

Week 0: An attacker obtains access to the system using credentials belonging to a trainee solicitor, believed to have been phished via a targeted email. The trainee’s account has standard firm-wide access to the document assistant.

Weeks 1–6: The attacker uses the document assistant to query systematically for high-value documents. The trainee's own matter access in the DMS would have been limited, but the attacker does not need it: the vector database has no per-document access controls, so the assistant retrieves documents from any matter regardless of the querying user's permissions in the underlying DMS.

Representative queries used by the attacker (reconstructed from query logs):

"Show me all documents relating to the acquisition of [target company]"
"What is the agreed valuation for the Meridian transaction?"
"List all parties to the [client] financing round with their share percentages"
"What are the key risks identified in the due diligence for [deal code]?"

Week 5: A senior associate notices an unusual number of queries about her own matters originating from the trainee's account at times when the trainee is not in the office. She reports this to the IT security team.

Week 6: IT security reviews query logs and identifies the systematic, high-volume querying pattern. The attacker’s session is terminated, the trainee’s credentials are reset, and the document assistant is taken offline for investigation.

Root Cause Analysis

Misconfiguration 1: Flat Vector Database Access Controls

The central failure was the absence of document-level access controls in the vector database. The system was architected as follows:

All firm documents were chunked and embedded into a single shared Pinecone vector database
The AI retrieval layer authenticated users against Active Directory for login
However, no per-document or per-matter metadata filters were applied to vector search results
Any authenticated user could retrieve chunks from any document in the database

This is a well-known risk in enterprise RAG deployments. The underlying DMS (iManage) had correct per-matter access controls, but these were not replicated into the vector search layer.
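The missing control can be illustrated with a minimal sketch of permission-aware retrieval. This is not the firm's code and does not use the Pinecone API: the in-memory index, the `Chunk` type, the `matter_id` tag, and the `search` function are all hypothetical names chosen for illustration. The point it shows is that chunks are filtered by the user's permitted matters before ranking, so unauthorised text never reaches the LLM.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class Chunk:
    text: str
    matter_id: str                 # hypothetical tag copied from the DMS at index time
    embedding: tuple[float, ...]   # toy 2-d vectors stand in for real embeddings


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index, query_embedding, allowed_matters, top_k=3):
    """Rank only the chunks whose matter the querying user may access."""
    # Filter first, then rank: unauthorised chunks are never candidates.
    permitted = [c for c in index if c.matter_id in allowed_matters]
    ranked = sorted(
        permitted,
        key=lambda c: cosine(c.embedding, query_embedding),
        reverse=True,
    )
    return ranked[:top_k]


INDEX = [
    Chunk("Valuation memo", "M-001", (0.9, 0.1)),
    Chunk("Due diligence risk summary", "M-002", (0.8, 0.2)),
    Chunk("Trainee research note", "M-003", (0.1, 0.9)),
]

# Assumed to be resolved from the DMS for the logged-in user.
trainee_matters = {"M-003"}
results = search(INDEX, (0.9, 0.1), trainee_matters)
print([c.text for c in results])
```

In this sketch the trainee's query is closest to the valuation memo, yet only the trainee's own research note is returned. In the flat configuration described above, the equivalent of `allowed_matters` was every matter in the firm.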

Misconfiguration 2: Excessively Broad Document Indexing

The initial deployment indexed all documents accessible to the administrative service account used to build the index — including documents that individual fee-earners would not normally have access to. This compounded the flat-access issue.

Misconfiguration 3: No Query Rate Limiting or Anomaly Detection

The system had no query rate limiting and no monitoring for anomalous query patterns. The attacker made over 3,000 queries across 6 weeks without triggering any automated alert. The activity came to light only because a senior associate noticed it manually.
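As a rough illustration of the kind of control that was absent, the sketch below combines a per-user sliding-window rate limit with a simple out-of-hours alert. The class name `QueryMonitor`, the thresholds, and the assumed office hours are all hypothetical; a production deployment would tune these and feed alerts into existing security tooling.

```python
from collections import deque


class QueryMonitor:
    """Per-user sliding-window limiter with a simple off-hours alert."""

    def __init__(self, max_queries, window_seconds, work_hours=(8, 19)):
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self.work_hours = work_hours   # assumed office hours, 24h clock
        self._history = {}             # user -> deque of accepted timestamps
        self.alerts = []               # (user, timestamp, reason) tuples

    def allow(self, user, timestamp, hour_of_day):
        """Return True if the query may proceed; record alerts as a side effect."""
        window = self._history.setdefault(user, deque())
        # Drop timestamps that have aged out of the window.
        while window and timestamp - window[0] >= self.window_seconds:
            window.popleft()
        start, end = self.work_hours
        if not (start <= hour_of_day < end):
            self.alerts.append((user, timestamp, "off-hours query"))
        if len(window) >= self.max_queries:
            self.alerts.append((user, timestamp, "rate limit exceeded"))
            return False
        window.append(timestamp)
        return True


monitor = QueryMonitor(max_queries=3, window_seconds=60)
decisions = [monitor.allow("trainee", t, hour_of_day=23) for t in (0, 10, 20, 30, 70)]
print(decisions)
print(monitor.alerts)
```

With these toy settings the fourth query inside the 60-second window is refused, and every query is also flagged as out of hours. A rate limit alone would not necessarily have stopped a patient attacker spreading 3,000 queries over 6 weeks, which is why pattern-based signals such as time of day or queries about matters outside the user's normal work are worth alerting on as well.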

Contributing Factor: Single-Factor Authentication

The trainee’s account was protected only by a password. The firm had MFA deployed for external access but not for internal system logins, including the document assistant.

Technical Impact

The attacker’s queries returned document chunks that were re-assembled into coherent summaries by the LLM layer. For an attacker querying a well-indexed document collection, the AI assistant essentially served as an automated document review and summarisation tool — significantly reducing the manual effort required to extract intelligence from thousands of documents.

Key categories of information confirmed as accessed:

Transaction values and structures for 12 active M&A matters
Identities of buyers, sellers, and advisors on confidential transactions
Due diligence findings and identified risks
Client personal financial information in connection with private client matters

Remediation Actions

The firm has taken, or committed to, the following remediation steps:

Document assistant offline — the system will not return to service until remediated
Per-document ACL metadata — all documents will be re-indexed with ownership and access metadata tags; vector search queries will be filtered to only return chunks the querying user has permission to access
Index scope reduction — the index will be rebuilt using per-user access patterns, not a blanket admin service account
MFA — mandatory MFA rolled out to all internal system access within 30 days
Query anomaly monitoring — rate limiting and pattern-based alerting implemented before redeployment
Red team exercise — a third-party red team will test the remediated system before it returns to service
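The re-indexing step can be sketched as follows, assuming each document's matter and permitted groups can be read from the DMS at ingestion time. The function `chunk_with_acl`, the record layout, and the field names `matter_id` and `allowed_groups` are illustrative assumptions, not the firm's schema; the embedding step is omitted.

```python
def chunk_with_acl(doc_id, text, matter_id, allowed_groups, chunk_size=200):
    """Split text into fixed-size chunks, each carrying the document's ACL tags."""
    records = []
    for n, start in enumerate(range(0, len(text), chunk_size)):
        records.append({
            "id": f"{doc_id}#{n}",
            "text": text[start:start + chunk_size],
            "metadata": {
                "doc_id": doc_id,
                "matter_id": matter_id,
                # Sorted list so the tag is stable and serialisable.
                "allowed_groups": sorted(allowed_groups),
            },
        })
    return records


# Hypothetical document: 450 characters yields three chunks of 200, 200 and 50.
records = chunk_with_acl("DOC-17", "x" * 450, "M-001", {"corp-team-a"})
print([(r["id"], len(r["text"]), r["metadata"]["matter_id"]) for r in records])
```

Copying permissions into the index creates a second copy of the ACL that can drift from the DMS. Any design along these lines would need a way to re-sync or re-tag chunks when matter access changes.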

Recommendations for Law Firms Deploying AI Document Assistants

RAG access controls must mirror DMS permissions — the vector database layer must enforce the same per-matter access controls as the underlying DMS. This requires tagging every embedded chunk with matter/document access metadata and filtering at query time.
Scope the index to the querying user’s permissions — consider per-user or per-role indices rather than a single shared index, accepting the infrastructure cost.
Enable comprehensive query logging — every query and its results should be logged with the authenticated user’s identity, enabling forensic investigation and anomaly detection.
Apply MFA to all AI system access — there is no justification for weaker authentication on systems that can access privileged client data.
Conduct access control testing before deployment — test whether a user can retrieve documents they should not have access to before going live.
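A pre-deployment access control test can be as simple as running probe queries as each test user and checking that nothing comes back from a matter they are not entitled to see. The sketch below assumes a retrieval function with the shape `retrieve(user, query)` returning chunks that carry a `matter_id`; that interface, the toy corpus, and the permission map are hypothetical stand-ins for the real system.

```python
def find_leaks(retrieve, permissions, probe_queries):
    """Return (user, query, matter_id) for every chunk a user should not see."""
    leaks = []
    for user, allowed in permissions.items():
        for query in probe_queries:
            for chunk in retrieve(user, query):
                if chunk["matter_id"] not in allowed:
                    leaks.append((user, query, chunk["matter_id"]))
    return leaks


# Stand-in corpus and permissions for demonstration only.
CORPUS = [
    {"matter_id": "M-001", "text": "valuation for acquisition"},
    {"matter_id": "M-003", "text": "research note on valuation"},
]
PERMISSIONS = {"trainee": {"M-003"}, "partner": {"M-001", "M-003"}}


def flat_retrieve(user, query):
    # Mimics the flawed design: the result ignores who is asking.
    return [c for c in CORPUS if query in c["text"]]


def filtered_retrieve(user, query):
    # Applies the user's permitted matters to the same results.
    return [c for c in flat_retrieve(user, query) if c["matter_id"] in PERMISSIONS[user]]


print(find_leaks(flat_retrieve, PERMISSIONS, ["valuation"]))
print(find_leaks(filtered_retrieve, PERMISSIONS, ["valuation"]))
```

Here the flat retriever leaks matter M-001 to the trainee, while the filtered retriever produces an empty list. Run against a real deployment with low-privilege test accounts, a non-empty result from a check like this is a reason to block go-live.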