We recently argued that an inflection point had been reached in cybersecurity: a point at which AI models become genuinely useful for cybersecurity operations, for both good and ill. That argument was based on a systematic evaluation showing that cyber capabilities had doubled in six months, and on our tracking of real-world cyberattacks to see how malicious actors were using AI. Although we predicted that these capabilities would continue to evolve, what has stood out to us is how rapidly, and at what scale, they have done so.
In mid-September 2025, we became aware of suspicious activity that later investigation revealed to be a highly sophisticated espionage campaign. The attackers exploited the “agentic” capabilities of AI to an unprecedented degree – using AI not only as an advisor, but to carry out the cyberattacks themselves.
The threat actor – which we assess with high confidence was a Chinese state-sponsored group – manipulated our Claude Code tool in an attempt to infiltrate roughly thirty global targets, and succeeded in some cases. The operation targeted large technology companies, financial institutions, chemical manufacturing companies, and government agencies. We believe this is the first documented case of a large-scale cyberattack carried out without substantial human intervention.
Upon learning of this activity, we immediately launched an investigation to understand its scope and nature. Over the following ten days, as we mapped the severity and full extent of the operation, we banned accounts as they were identified, notified affected entities as appropriate, and coordinated with the authorities as we gathered actionable intelligence.
This campaign has substantial implications for cybersecurity in the age of AI “agents” – systems that can run autonomously for long periods of time and complete complex tasks largely independently of human intervention. Agents are valuable for everyday work and productivity – but in the wrong hands, they can substantially increase the feasibility of large-scale cyberattacks.
The effectiveness of these attacks is only likely to increase. To keep pace with this rapidly growing threat, we have expanded our detection capabilities and developed better classifiers to flag malicious activity. We are constantly working on new ways to investigate and detect such large-scale distributed attacks.
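As a rough illustration of what such detection heuristics can look like, here is a minimal, hypothetical sketch of a rule-based scorer over API usage telemetry. The field names, thresholds, and weights are assumptions for the sake of illustration; they are not our production classifiers.

```python
# Hypothetical sketch: rule-based scoring of API usage telemetry to flag
# accounts whose agentic usage patterns warrant human review.
from dataclasses import dataclass

@dataclass
class UsageWindow:
    account_id: str
    requests_per_hour: int        # sustained request volume
    session_hours: float          # how long an agent ran unattended
    distinct_tools_invoked: int   # breadth of tools wired into the agent
    offensive_tool_mentions: int  # prompts referencing offensive tooling

def suspicion_score(w: UsageWindow) -> float:
    """Combine simple heuristics into a 0..1 score for analyst triage."""
    score = 0.0
    if w.requests_per_hour > 1_000:     # far beyond interactive human use
        score += 0.4
    if w.session_hours > 8:             # long-running autonomous sessions
        score += 0.2
    if w.distinct_tools_invoked > 10:   # unusually broad tool surface
        score += 0.2
    if w.offensive_tool_mentions > 5:   # repeated offensive-tool context
        score += 0.2
    return min(score, 1.0)

def flag_for_review(windows: list[UsageWindow], threshold: float = 0.6) -> list[str]:
    return [w.account_id for w in windows if suspicion_score(w) >= threshold]

# Example: a long-running, high-volume, tool-heavy session gets flagged.
print(flag_for_review([UsageWindow("acct-123", 5_000, 30.0, 14, 12)]))
```

Real classifiers are of course far richer than a handful of thresholds, but the overall structure – aggregate telemetry, score it, route the worst cases to human analysts – is the same.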
In the meantime, we are sharing this case publicly to help industry, government, and the broader research community strengthen their cybersecurity defenses. We will continue to publish reports like this regularly and to be transparent about the threats we find.
How the cyberattack worked
The attack relied on several features of AI models that did not exist, or existed only in far more nascent form, just a year earlier:
- Intelligence. Models’ general capabilities have increased to the point that they can follow complex instructions and understand context in ways that make highly sophisticated tasks possible. Moreover, many of their well-developed specific skills – in particular, software coding – are well suited to use in cyberattacks.
- Agency. Models can act as agents – that is, they can run in loops in which they take autonomous actions, chain tasks together, and make decisions with only minimal, occasional human input (see the sketch after this list).
- Tools. Models have access to a wide range of software tools, often via the open-standard Model Context Protocol. They can now search the web, retrieve data, and perform many other actions that were previously the sole province of human operators. In the case of cyberattacks, those tools can include password crackers, network scanners, and other security-related software.
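To make the “agency” and “tools” points concrete, here is a minimal sketch of the generic agent-loop pattern: the model is called repeatedly, and each reply either requests a tool call or declares the task finished. The function names, message format, and stubbed tools below are illustrative assumptions, not any particular vendor’s API.

```python
# Minimal, hypothetical sketch of an agent loop with tool dispatch.
# call_model() is a stand-in for a real model API; the tools are harmless stubs.

def search_web(query: str) -> str:
    return f"stub results for: {query}"         # placeholder tool

def read_file(path: str) -> str:
    return f"stub contents of: {path}"          # placeholder tool

TOOLS = {"search_web": search_web, "read_file": read_file}

def call_model(history: list[dict]) -> dict:
    # Stand-in for a real model call. A real reply would either request a tool
    # ({"type": "tool_call", "name": ..., "arguments": {...}}) or finish.
    return {"type": "final", "content": "done"}

def run_agent(task: str, max_steps: int = 20) -> str:
    history = [{"role": "user", "content": task}]
    for _ in range(max_steps):                  # autonomous loop, minimal human input
        reply = call_model(history)
        if reply["type"] == "tool_call":        # the model chains tasks via tools
            result = TOOLS[reply["name"]](**reply["arguments"])
            history.append({"role": "tool", "content": result})
        else:
            return reply["content"]             # the model decides it is finished
    return "step budget exhausted"

print(run_agent("summarize the latest release notes"))
```

The point of the sketch is the structure, not the stubs: once real tools are wired into such a loop, the model can keep acting on their outputs for many iterations without a human in between.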
The diagram below shows the different stages of the attack, each of which required all three of the developments above:

In the first phase, human operators selected the targets (for example, the technology company or government agency to be infiltrated). They then built an attack framework – a system designed to compromise a chosen target autonomously, with minimal human involvement. This framework used Claude Code as an automated tool to carry out the cyber operations.
At this point they had to convince Claude – which has been extensively trained to avoid harmful behaviors – to participate in the attack. They did this by jailbreaking it, effectively tricking it into bypassing its guardrails: they broke the attack down into small, seemingly innocent tasks that Claude would execute without being given the full context of their malicious intent. They also told Claude that it was working for a legitimate cybersecurity firm and was being used in defensive testing.
The attackers then launched the second phase of the attack, which used Claude Code to inspect the target organizations’ systems and infrastructure and locate the highest-value databases. Claude was able to complete this reconnaissance in less time than it would have taken a team of human hackers, and it then reported back to the human operators with a summary of its findings.
In the next phase, Claude identified and tested security vulnerabilities in the target organizations’ systems by researching and writing its own exploit code. Having done so, the framework was able to use Claude to harvest credentials (usernames and passwords) that granted it further access, and then to extract large amounts of private data, which it categorized by intelligence value. The highest-privilege accounts were identified, backdoors were created, and data was exfiltrated, all with minimal human supervision.
In the final phase, the attackers had Claude produce extensive documentation of the attack, including files of the stolen credentials and of the systems analyzed, which would assist the threat actor in planning the next stage of its cyber operations.
Overall, the threat actor was able to use AI to perform 80–90% of the campaign, with human intervention required only sporadically (perhaps at 4–6 critical decision points per hacking campaign). The sheer volume of work done by the AI would have taken a human team far longer. The AI made thousands of requests, often multiple per second – an attack speed that would simply be impossible for human hackers to match.
Claude didn’t always work perfectly: it sometimes hallucinated credentials or claimed to have extracted secret information that was in fact publicly available. This remains an obstacle to fully autonomous cyberattacks.
Cybersecurity implications
The barriers to conducting sophisticated cyberattacks have fallen significantly – and we predict they will continue to fall. With the right setup, threat actors can now use agentic AI systems for extended periods to do the work of entire teams of experienced hackers: analyzing target systems, producing exploit code, and scanning vast datasets of stolen information more efficiently than any human operator. Less experienced and less-resourced groups can now potentially carry out large-scale attacks of this nature.
This attack is also an outgrowth of the “vibe hacking” findings we reported this summer: in those operations, humans were still very much in the loop, directing the activity. Here, despite the massive scale of the attack, human involvement was minimal. And although we only have visibility into Claude usage, this case study likely reflects consistent patterns of behavior across frontier AI models, and it demonstrates how threat actors are adapting their operations to exploit today’s most advanced AI capabilities.
This raises an important question: if AI models can be misused for cyberattacks at this scale, why continue to develop and release them? The answer is that the very capabilities that allow Claude to be used in these attacks also make it crucial for cyber defense. When sophisticated cyberattacks inevitably occur, our goal is for Claude – into which we have built robust safeguards – to help cybersecurity professionals detect, disrupt, and prepare for future versions of such attacks. Indeed, our own threat intelligence team used Claude extensively to analyze the enormous amounts of data generated during this very investigation.
A fundamental change has occurred in cybersecurity. We recommend that security teams experiment with applying AI to defense in areas such as security operations center automation, threat detection, vulnerability assessment, and incident response – for example, along the lines of the triage sketch below. We also recommend that developers continue investing in safeguards on their AI platforms to prevent adversarial misuse. The techniques described above will undoubtedly be used by many more attackers, which makes industry threat sharing, improved detection methods, and stronger security controls all the more important.
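As one example of the defensive direction we recommend, here is a small, hypothetical sketch of model-assisted alert triage in a security operations center. The function summarize_with_model is a stand-in for whichever model API a team actually uses, and the alert fields are illustrative.

```python
# Hypothetical sketch: batching raw security alerts into a prompt and asking a
# model to rank them for a human analyst. Not a specific product's API.

def summarize_with_model(prompt: str) -> str:
    return "stub: model-ranked alert summary"   # replace with a real model call

def triage_alerts(alerts: list[dict]) -> str:
    alert_lines = "\n".join(
        f"- [{a['severity']}] {a['source']}: {a['message']}" for a in alerts
    )
    prompt = (
        "You are assisting a security operations analyst.\n"
        "Rank the following alerts by likely impact and suggest a first "
        "investigative step for the most serious ones:\n"
        f"{alert_lines}"
    )
    return summarize_with_model(prompt)

alerts = [
    {"severity": "high", "source": "EDR", "message": "unsigned binary spawned a shell"},
    {"severity": "low", "source": "IDS", "message": "port scan from an internal host"},
]
print(triage_alerts(alerts))
```

The analyst remains the decision-maker; the model simply compresses the flood of alerts into something a human can act on quickly.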
Read the full report.
