Incident Report: LLM System Used to Exfiltrate Privileged Legal Documents
Incident Classification: Confirmed | Severity: Critical | Sector: Legal / Professional Services | Date Confirmed: May 2026
A mid-sized UK law firm specialising in M&A advisory has disclosed a significant data breach in which an attacker leveraged the firm’s internally deployed AI document assistant to systematically exfiltrate privileged client communications and transaction documents. The firm has notified the ICO and affected clients.
Legal privilege notice: The firm has stated it is treating this incident as a potential breach of legal professional privilege. Affected client matters include active M&A transactions, the details of which are subject to ongoing disclosure review.
Incident Summary
| Field | Detail |
|---|---|
| Incident type | Unauthorised access + AI-assisted data exfiltration |
| Affected system | Internal RAG-based document assistant (self-hosted) |
| Data at risk | Privileged client communications, due diligence files, M&A transaction documents |
| Estimated documents accessed | 4,000–6,000 |
| Duration | Approximately 6 weeks |
| Detection method | Anomalous query patterns flagged by a fee-earner |
| Notified to | ICO, SRA (Solicitors Regulation Authority), affected clients |
Timeline
Week −8: The firm deploys an internal AI document assistant built on a self-hosted RAG architecture, allowing fee-earners to query the firm’s document management system using natural language. The system indexes documents across all practice areas into a shared vector database.
Week 0: An attacker obtains access to the system using credentials belonging to a trainee solicitor, believed to have been phished via a targeted email. The trainee’s account has standard firm-wide access to the document assistant.
Weeks 1–6: The attacker uses the document assistant to systematically query for high-value documents. The trainee’s own matter access in the DMS would have been limited, but the attacker does not need it: the vector database has no per-document access controls, so the assistant retrieves documents from any matter regardless of the querying user’s permissions in the underlying DMS.
Representative queries used by the attacker (reconstructed from query logs):
- “Show me all documents relating to the acquisition of [target company]”
- “What is the agreed valuation for the Meridian transaction?”
- “List all parties to the [client] financing round with their share percentages”
- “What are the key risks identified in the due diligence for [deal code]?”
Week 5: A senior associate notices an unusual number of queries about matters she is working on originating from the trainee’s account at times when the trainee was not in the office. She reports this to the IT security team.
Week 6: IT security reviews query logs and identifies the systematic, high-volume querying pattern. The attacker’s session is terminated, the trainee’s credentials are reset, and the document assistant is taken offline for investigation.
Root Cause Analysis
Misconfiguration 1: Flat Vector Database Access Controls
The central failure was the absence of document-level access controls in the vector database. The system was architected as follows:
- All firm documents were chunked and embedded into a single shared Pinecone vector database
- The AI retrieval layer authenticated users against Active Directory for login
- However, no per-document or per-matter metadata filters were applied to vector search results
- Any authenticated user could retrieve chunks from any document in the database
This is a well-known risk in enterprise RAG deployments. The underlying DMS (iManage) had correct per-matter access controls, but these were not replicated into the vector search layer.
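The missing control can be illustrated with a minimal in-memory sketch. It is not the firm's code and does not use the Pinecone client: the `Chunk` type, the `matter_id` tag, the `USER_MATTERS` mapping, and the two-dimensional embeddings are all hypothetical stand-ins. The point it shows is that the permission filter is applied before similarity ranking, so chunks from matters the user cannot see never reach the LLM layer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    chunk_id: str
    matter_id: str                 # hypothetical tag copied from the DMS at index time
    embedding: tuple[float, ...]   # toy embedding; real ones have many more dimensions
    text: str


def dot(a, b):
    """Dot product, used here as a stand-in similarity score."""
    return sum(x * y for x, y in zip(a, b))


def search(chunks, query_embedding, allowed_matters, top_k=3):
    """Return the top_k most similar chunks among those the user may see."""
    # Filter BEFORE ranking so unauthorised chunks can never appear in results.
    permitted = [c for c in chunks if c.matter_id in allowed_matters]
    ranked = sorted(
        permitted,
        key=lambda c: dot(c.embedding, query_embedding),
        reverse=True,
    )
    return ranked[:top_k]


# Illustrative data only.
INDEX = [
    Chunk("c1", "matter-A", (1.0, 0.0), "Valuation memo"),
    Chunk("c2", "matter-B", (0.9, 0.1), "Due diligence risks"),
    Chunk("c3", "matter-A", (0.0, 1.0), "Board minutes"),
]

# Hypothetical mapping; in practice this would be resolved from the DMS per request.
USER_MATTERS = {
    "trainee": {"matter-B"},
    "associate": {"matter-A", "matter-B"},
}


def search_as(user, query_embedding, top_k=3):
    """Search on behalf of a user; unknown users get no matters (deny by default)."""
    return search(INDEX, query_embedding, USER_MATTERS.get(user, set()), top_k)
```

With this data, `search_as("trainee", (1.0, 0.0))` returns only the `matter-B` chunk, even though a `matter-A` chunk scores higher. Hosted vector databases generally offer an equivalent metadata filter at query time; the exact syntax depends on the product and should be checked against its documentation.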
Misconfiguration 2: Excessively Broad Document Indexing
The initial deployment indexed all documents accessible to the administrative service account used to build the index — including documents that individual fee-earners would not normally have access to. This compounded the flat-access issue.
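One way to keep the index from inheriting the service account's broad view is to carry each document's DMS permissions onto every chunk at index time, and to refuse to index anything whose permissions are unknown. The sketch below assumes a hypothetical document record with `doc_id`, `matter_id`, `allowed_groups`, and `text` fields; it is not the iManage data model, and the fixed-size chunker is deliberately naive.

```python
def chunk_text(text, size=40):
    """Split text into fixed-size character chunks (a deliberately naive chunker)."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def build_index_records(documents, chunk_size=40):
    """Return one record per chunk, each carrying its parent document's ACL."""
    records = []
    for doc in documents:
        acl = doc.get("allowed_groups")
        if not acl:
            # Fail closed: a document with no known ACL is not indexed at all.
            continue
        for n, piece in enumerate(chunk_text(doc["text"], chunk_size)):
            records.append({
                "chunk_id": f'{doc["doc_id"]}#{n}',
                "matter_id": doc["matter_id"],
                "allowed_groups": sorted(acl),  # hypothetical ACL field
                "text": piece,
            })
    return records
```

Skipping documents without an ACL trades recall for safety: some legitimate documents will be missing from the assistant until their permissions are recorded, which is usually the preferable failure mode for privileged material.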
Misconfiguration 3: No Query Rate Limiting or Anomaly Detection
The system had no query rate limiting and no monitoring for anomalous query patterns. The attacker made over 3,000 queries across 6 weeks without triggering any automated alert. Only a human reviewer noticed the anomaly.
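A minimal sketch of the kind of check that was absent follows. The `QueryMonitor` class, its thresholds, and the working-hours window are illustrative assumptions, not values from the firm's remediation plan. It combines a per-user sliding-window rate limit with an off-hours flag, since more than 3,000 queries over 6 weeks averages only about 70 a day and a slow attacker may stay under a rate threshold alone.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


class QueryMonitor:
    """Per-user sliding-window rate check plus a simple off-hours flag."""

    def __init__(self, max_queries=20, window=timedelta(hours=1), work_hours=(7, 20)):
        self.max_queries = max_queries   # illustrative threshold
        self.window = window
        self.work_hours = work_hours     # [start, end) in local hours, illustrative
        self._history = defaultdict(deque)

    def check(self, user, when):
        """Record one query at time `when`; return a list of alert names."""
        history = self._history[user]
        history.append(when)
        # Drop timestamps that have fallen out of the sliding window.
        while history[0] <= when - self.window:
            history.popleft()

        alerts = []
        if len(history) > self.max_queries:
            alerts.append("rate_exceeded")
        start, end = self.work_hours
        if not (start <= when.hour < end):
            alerts.append("off_hours")
        return alerts
```

A caller would invoke `check` once per query and route any non-empty result to the security team. Real deployments would likely add further signals, such as the number of distinct matters touched per user.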
Contributing Factor: Single-Factor Authentication
The trainee’s account was protected only by a password. The firm had MFA deployed for external access but not for internal system logins, including the document assistant.
Technical Impact
The attacker’s queries returned document chunks that were re-assembled into coherent summaries by the LLM layer. For an attacker querying a well-indexed document collection, the AI assistant essentially served as an automated document review and summarisation tool — significantly reducing the manual effort required to extract intelligence from thousands of documents.
Key categories of information confirmed as accessed:
- Transaction values and structures for 12 active M&A matters
- Identities of buyers, sellers, and advisors on confidential transactions
- Due diligence findings and identified risks
- Client personal financial information in connection with private client matters
Remediation Actions
The firm has taken, or committed to, the following remediation steps:
- Document assistant offline — the system will not return to service until remediated
- Per-document ACL metadata — all documents will be re-indexed with ownership and access metadata tags; vector search queries will be filtered to only return chunks the querying user has permission to access
- Index scope reduction — the index will be rebuilt using per-user access patterns, not a blanket admin service account
- MFA — mandatory MFA rolled out to all internal system access within 30 days
- Query anomaly monitoring — rate limiting and pattern-based alerting implemented before redeployment
- Red team exercise — a third-party red team will test the remediated system before it returns to service
Recommendations for Law Firms Deploying AI Document Assistants
- RAG access controls must mirror DMS permissions — the vector database layer must enforce the same per-matter access controls as the underlying DMS. This requires tagging every embedded chunk with matter/document access metadata and filtering at query time.
- Scope the index to the querying user’s permissions — consider per-user or per-role indices rather than a single shared index, accepting the infrastructure cost.
- Enable comprehensive query logging — every query and its results should be logged with the authenticated user’s identity, enabling forensic investigation and anomaly detection.
- Apply MFA to all AI system access — there is no justification for weaker authentication on systems that can access privileged client data.
- Conduct access control testing before deployment — test whether a user can retrieve documents they should not have access to before going live.
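The logging recommendation above can be sketched as one JSON line per query, recording who asked, what was asked, and which chunks and matters came back. The field names and the `matters_touched` helper are hypothetical, and the chunk dictionaries are assumed to carry `chunk_id` and `matter_id` keys. Because query text may itself be privileged, such logs would need the same protection as the documents.

```python
import json
from datetime import datetime, timezone


def audit_record(user, query, chunks, when=None):
    """Build one JSON line describing a query and the chunks returned for it."""
    when = when or datetime.now(timezone.utc)
    return json.dumps(
        {
            "ts": when.isoformat(),
            "user": user,
            "query": query,
            "chunk_ids": [c["chunk_id"] for c in chunks],
            "matter_ids": sorted({c["matter_id"] for c in chunks}),
        },
        sort_keys=True,
    )


def matters_touched(log_lines, user):
    """Forensic helper: the set of matters a given user retrieved chunks from."""
    touched = set()
    for line in log_lines:
        record = json.loads(line)
        if record["user"] == user:
            touched.update(record["matter_ids"])
    return touched
```

Recording the returned matter identifiers, not just the query text, is what lets an investigator establish which client matters were actually exposed through a given account.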