How did the attacker exfiltrate documents without triggering access controls?

The attacker exploited a misconfiguration in the RAG system's vector database, which had no per-document access controls. Any authenticated user could retrieve chunks from any document, regardless of their permissions in the underlying document management system.

What is the most critical remediation step for RAG deployments handling privileged data?

The most critical step is ensuring the vector database enforces the same per-document or per-matter access controls as the underlying DMS. Every embedded chunk should be tagged with access metadata, and vector search queries must be filtered at query time to only return results the requesting user is authorised to see.

Why was the breach not detected by automated systems for six weeks?

The system had no query rate limiting and no anomaly detection for unusual query patterns. The attacker made over 3,000 queries across six weeks without triggering any automated alert. Detection only occurred when a human fee-earner noticed unusual query activity on her matters.

LLM Document Exfiltration at Law Firm

Incident Classification: Confirmed | Incident Type: Illustrative | Severity: Critical | Sector: Legal / Professional Services | Date Confirmed: May 2026

A mid-sized UK law firm specialising in M&A advisory has disclosed a data breach in which an attacker used the firm’s internally deployed AI document assistant to systematically extract privileged client communications and transaction documents. Six weeks passed before detection. The firm has notified the ICO, the SRA, and affected clients.

Legal privilege notice: The firm has stated it is treating this incident as a potential breach of legal professional privilege. Affected client matters include active M&A transactions, the details of which are subject to ongoing disclosure review.

Incident Summary

Field	Detail
Incident type	Unauthorised access + AI-assisted data exfiltration
Affected system	Internal RAG-based document assistant (self-hosted)
Data at risk	Privileged client communications, due diligence files, M&A transaction documents
Estimated documents accessed	4,000–6,000
Duration	Approximately 6 weeks
Detection method	Anomalous query patterns flagged by a fee-earner
Notified to	ICO, SRA (Solicitors Regulation Authority), affected clients

What Happened

Week −8: The firm deploys an internal AI document assistant on a self-hosted RAG architecture. Fee-earners can query the document management system using natural language. All documents across all practice areas are indexed into a single shared vector database.

Week 0: The attacker compromises credentials belonging to a trainee solicitor, believed to be via phishing. The account has standard firm-wide access to the document assistant.

Weeks 1–6: The attacker queries the assistant for high-value documents. The trainee’s own matter access would have been limited, but the RAG system’s vector database has no per-document access controls. Any authenticated user can retrieve chunks from any document, regardless of their permissions in the underlying DMS.

Reconstructed queries from the attacker’s session logs:

“Show me all documents relating to the acquisition of [target company]”
“What is the agreed valuation for the Meridian transaction?”
“List all parties to the [client] financing round with their share percentages”
“What are the key risks identified in the due diligence for [deal code]?”

Week 5: A senior associate notices unusual queries about matters she’s actively working on, originating from the trainee’s account at times when the trainee wasn’t in the office. She reports it to IT security.

Week 6: Query logs are reviewed and a systematic, high-volume querying pattern is identified. The attacker’s session is terminated, the trainee’s credentials are reset, and the document assistant is taken offline.

The system authenticated users against Active Directory. Login worked correctly. The problem was what happened after login: vector search queries returned results from any document in the index, with no filtering based on the querying user’s matter access.

iManage, the underlying DMS, had correct per-matter access controls. Those controls were never replicated into the vector search layer. Anybody who could log into the document assistant could retrieve content from any document in the database.

This is a well-documented risk in enterprise RAG deployments, and it was apparently not on anyone’s checklist when this system went live.

Compounding this: the initial index was built using an administrative service account, which had access to documents that most individual fee-earners would never normally see. So the index itself was broader than any single user’s permission scope, and then access to that over-broad index was granted to everyone.
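The missing control can be illustrated with a minimal sketch. It assumes each chunk is tagged at index time with a matter ID copied from the DMS, and that the caller supplies the set of matters the authenticated user may see. The `Chunk` class, the `search` function, the matter IDs, and the two-dimensional embeddings are all hypothetical stand-ins, not the firm's system or any particular vector database's API.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Tuple


@dataclass(frozen=True)
class Chunk:
    text: str
    matter_id: str                 # access metadata copied from the DMS at index time
    embedding: Tuple[float, ...]   # toy embedding; real ones have hundreds of dimensions


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(index, query_embedding, allowed_matters, k=3):
    """Return the top-k chunks drawn only from matters the user may see."""
    # Filter BEFORE ranking, so unauthorised chunks never reach the LLM layer.
    visible = [c for c in index if c.matter_id in allowed_matters]
    ranked = sorted(
        visible,
        key=lambda c: cosine(c.embedding, query_embedding),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical index spanning three matters.
INDEX = [
    Chunk("Valuation memo", "M-001", (1.0, 0.0)),
    Chunk("Due diligence risks", "M-002", (0.9, 0.1)),
    Chunk("Training handbook", "M-003", (0.0, 1.0)),
]

# A trainee with access to one matter asks a valuation-like question.
trainee_matters = {"M-003"}
results = search(INDEX, (1.0, 0.0), trainee_matters)
```

In this sketch the trainee's query is closest to the valuation memo, yet only the chunk from the trainee's own matter can be returned. Production vector stores generally expose metadata filters that serve the same purpose; the important design choice is that the filter is applied inside the retrieval step, not after the LLM has already seen the content.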

No Query Rate Limiting, No Anomaly Detection

3,000-plus queries over six weeks from a single account, accessing matters across the entire firm. No automated alert. The only detection signal was a human fee-earner noticing that queries about her matters were coming from someone else’s account at odd hours.

That’s a human catch, and it almost didn’t happen. If the attacker had been slightly less systematic, or if the associate hadn’t been paying attention, this could have run indefinitely.
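As a rough illustration of the kind of automated signal that was absent, the sketch below tracks two things per account: query volume in a sliding time window, and the number of distinct matters touched outside the user's own. The `QueryMonitor` class, its thresholds, and the alert labels are assumptions made for this example, not a description of any product or of what the firm later deployed.

```python
from collections import defaultdict, deque


class QueryMonitor:
    """Toy per-user monitor: sliding-window rate check plus cross-matter breadth check."""

    def __init__(self, max_queries, window_seconds, max_foreign_matters):
        self.max_queries = max_queries
        self.window_seconds = window_seconds
        self.max_foreign_matters = max_foreign_matters
        self._times = defaultdict(deque)   # user -> timestamps inside the window
        self._foreign = defaultdict(set)   # user -> matters hit outside their own

    def record(self, user, timestamp, matters_hit, own_matters):
        """Log one query and return a list of alert labels (empty if none)."""
        alerts = []

        # Rate check: drop timestamps older than the window, then count.
        times = self._times[user]
        times.append(timestamp)
        while times and timestamp - times[0] > self.window_seconds:
            times.popleft()
        if len(times) > self.max_queries:
            alerts.append("rate")

        # Breadth check: accumulate matters the user does not work on.
        self._foreign[user].update(set(matters_hit) - set(own_matters))
        if len(self._foreign[user]) > self.max_foreign_matters:
            alerts.append("breadth")

        return alerts


# Hypothetical thresholds, set very low so the example trips quickly.
monitor = QueryMonitor(max_queries=3, window_seconds=60, max_foreign_matters=1)
own = {"M-003"}

a1 = monitor.record("trainee", 0, {"M-003"}, own)
a2 = monitor.record("trainee", 10, {"M-001"}, own)
a3 = monitor.record("trainee", 20, {"M-002"}, own)
a4 = monitor.record("trainee", 30, {"M-003"}, own)
```

The breadth signal matters as much as the rate signal: an attacker who paces queries slowly can stay under a volume threshold, but an account that steadily reaches into matters it has no connection to is unusual at any speed.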

Single-Factor Authentication on a System Holding Privileged Data

The firm had MFA for external access. Internal system logins, including the document assistant, were password-only. A phished trainee credential was all it took.

The AI Made This Dramatically More Efficient for the Attacker

When the document assistant returned query results, the LLM layer synthesised them into coherent summaries. The attacker wasn’t getting raw document chunks; they were getting structured answers. For an attacker querying a well-indexed collection of M&A documents, the AI assistant functioned as an automated document review tool, compressing what would otherwise be hours of manual reading into seconds of natural language queries.

Confirmed categories of information accessed:

Transaction values and structures for 12 active M&A matters
Identities of buyers, sellers, and advisors on confidential transactions
Due diligence findings and identified risks
Client personal financial information from private client matters

Remediation

Document assistant taken offline; will not return to service until fully remediated
All documents re-indexed with ownership and access metadata; vector search queries now filtered at query time to the requesting user’s authorised matters only
Index rebuilt using per-user access patterns, not the admin service account
MFA mandated for all internal system access within 30 days
Rate limiting and query anomaly alerting implemented before redeployment
Third-party red team exercise before the system returns to service

The Takeaway for Law Firms Deploying RAG

The vector database is not a secondary concern. It’s the access control boundary for everything your AI system can retrieve. If it doesn’t enforce the same per-matter permissions as your DMS, you’ve built a privilege bypass into your own infrastructure, one that any authenticated user can exploit.

Scope the index correctly from the start. Per-matter or per-user indexing costs more to build and maintain. It’s still cheaper than this.

Log every query with the authenticated user’s identity. Rate-limit aggressively. Test access controls before deployment: specifically, verify that a user cannot retrieve documents they shouldn’t have access to, using test accounts at different permission levels. This is not a difficult test to run. It just has to be on the checklist.

And apply MFA to anything that can access privileged client data. There is no justification for treating internal system access as lower-risk than external access when the data behind it is this sensitive.
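The pre-deployment access test described above can be sketched as a small harness that runs probe queries as each test account and reports anything returned from outside that account's scope. Everything here is hypothetical: the `find_leaks` helper, the account names, the matter IDs, and the two stand-in retrieval functions, which only model a leaky and a filtered system.

```python
def find_leaks(retrieve, accounts, probe_queries):
    """Return (user, query, matter_id) for every result outside the user's scope."""
    leaks = []
    for user, allowed in accounts.items():
        for query in probe_queries:
            for matter_id in retrieve(user, query):
                if matter_id not in allowed:
                    leaks.append((user, query, matter_id))
    return leaks


# Hypothetical matters and test accounts at two permission levels.
DOCS = {
    "M-001": "Meridian valuation",
    "M-002": "Financing round",
    "M-003": "Training handbook",
}
ACCOUNTS = {
    "test_trainee": {"M-003"},
    "test_partner": {"M-001", "M-002", "M-003"},
}


def leaky_retrieve(user, query):
    """Stand-in for an unfiltered index: ignores who is asking."""
    return sorted(DOCS)


def filtered_retrieve(user, query):
    """Stand-in for a filtered index: returns only the user's matters."""
    return sorted(m for m in DOCS if m in ACCOUNTS[user])


PROBES = ["agreed valuation", "parties to the financing round"]

leaks_before = find_leaks(leaky_retrieve, ACCOUNTS, PROBES)
leaks_after = find_leaks(filtered_retrieve, ACCOUNTS, PROBES)
```

Against the leaky stand-in, the trainee account surfaces results from two matters it should never see on each probe; against the filtered stand-in, the leak list is empty. In a real deployment the `retrieve` argument would wrap the actual assistant endpoint, and a non-empty leak list would block the release.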

References

OWASP LLM Top 10: LLM vulnerabilities including insecure output handling and data exfiltration risks: https://owasp.org/www-project-top-10-for-large-language-model-applications/
MITRE ATLAS: Adversarial threat landscape for AI systems, including LLM-specific attack techniques: https://atlas.mitre.org/
NCSC: Guidance on AI security for organisations deploying AI in sensitive contexts: https://www.ncsc.gov.uk/collection/ai-security
NIST AI RMF: Risk management framework covering data governance and AI system integrity: https://airc.nist.gov/