How Anthropic's AI Was Jailbroken To Become A Weapon

Chinese hackers automated 90% of their espionage campaign using Anthropic's Claude, breaching four of the 30 organizations they chose as targets.

"They broke down their attacks into small, seemingly innocent tasks that the cloud would execute without providing the full context of their malicious intent," Jacob Klein, head of threat intelligence at Anthropic, told VentureBeat.

AI models have reached an inflection point earlier than most experienced threat researchers anticipated, as evidenced by hackers being able to jailbreak a model and launch attacks without being detected. Cloaking prompts as part of a legitimate pen testing effort, with the aim of exfiltrating confidential data from 30 targeted organizations, demonstrates how powerful the models have become. Jailbreaking and then weaponizing a model against targets is no longer rocket science. It is now a democratized threat that any attacker or nation-state can use at will.

Klein told The Wall Street Journal, which broke the story, that "the hackers carried out their attacks with literally the click of a button." In one breach, "the hackers instructed Anthropic’s Claude AI tools to query the internal database and extract the data independently." Human operators intervened at only four to six decision points per campaign.

The architecture that made it possible

The sophistication of the attacks on 30 organizations is not found in the tools; it’s in the orchestration. The attackers used commodity pentesting software that anyone can download, and carefully broke down complex operations into innocent-looking tasks. Claude thought it was doing a security audit.

The social engineering was precise: Klein told the WSJ that the attackers posed as employees of cybersecurity firms performing authorized penetration testing.

Source: Anthropic

The architecture, detailed in Anthropic’s report, reveals MCP (Model Context Protocol) servers simultaneously directing multiple Claude sub-agents against the target infrastructure. The report explains how "the framework used Claude as an orchestration system that decomposed complex multi-stage attacks into separate technical tasks for Claude sub-agents, such as vulnerability scanning, credential verification, data extraction, and lateral movement, each of which appeared valid when evaluated in isolation."

This decomposition was critical. By presenting tasks without broader context, the attackers induced Claude into "executing individual components of the attack chain without access to the broader malicious context," according to the report.

The attack velocity reached several operations per second, sustained for hours without fatigue. Human involvement dropped to 10 to 20% of the effort. Traditional three-to-six-month campaigns were compressed to 24 to 48 hours. The report documents that "peak activity included thousands of requests, representing sustained request rates of several operations per second."
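To put the compression in perspective, the figures above can be turned into a rough ratio. The short sketch below is illustrative arithmetic only; it assumes a 30-day month, which is not a figure from the report, and the function name is made up for this example.

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # assumption: a month is approximated as 30 days


def compression_factor(months: float, hours: float) -> float:
    """Return how many times shorter an `hours`-long campaign is than a `months`-long one."""
    return months * DAYS_PER_MONTH * HOURS_PER_DAY / hours


# Least dramatic case: shortest traditional campaign vs. longest AI-driven one.
low = compression_factor(months=3, hours=48)
# Most dramatic case: longest traditional campaign vs. shortest AI-driven one.
high = compression_factor(months=6, hours=24)

print(f"Campaign time compressed roughly {low:.0f}x to {high:.0f}x")
```

Under that assumption, the timeline shrinks by a factor of roughly 45 to 180.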

Source: Anthropic

The six-phase attack progression documented in Anthropic’s report shows how AI autonomy increased at each stage. Phase 1: A human selects the target. Phase 2: Claude autonomously maps the entire network, discovering "internal services within the target network through systematic enumeration." Phase 3: Claude identifies and validates vulnerabilities, including SSRF flaws. Phase 4: Credential harvesting across networks. Phase 5: Data extraction and intelligence classification. Phase 6: Complete documentation for handoff.

"Claude was doing almost the entire red team’s work," Klein told VentureBeat. Reconnaissance, exploitation, lateral movement, data extraction, all taking place with minimal human direction between stages. Anthropix’s report notes that "The campaign demonstrated unprecedented integration and autonomy of artificial intelligence across the entire attack lifecycle, with cloud code supporting reconnaissance, vulnerability discovery, exploitation, lateral movement, credential harvesting, data analysis and intrusion operations autonomously at scale."

How weaponizing models flattens the cost curve for APT attacks

Traditional APT campaigns required what the report documents as "10-15 skilled operators," "custom malware development," and "months of preparation." GTG-1002 needed only Claude API access, open-source Model Context Protocol servers, and commodity pentesting tools.

"What surprised us was the efficiency," Klein told VentureBeat. "We are achieving nation-state capability with resources accessible to any medium-sized criminal group."

The report states: "Minimal reliance on proprietary tools or advanced exploit development demonstrates that cyber capabilities are increasingly driven by the orchestration of commodity resources rather than technological innovation."

Klein emphasized the autonomous execution capabilities in his discussion with VentureBeat. The report confirms that Claude independently "scanned target infrastructure, enumerated services and endpoints, mapped attack surfaces," then "identified SSRF vulnerabilities, researched exploitation techniques," and generated "custom payloads, developing exploit chains, validating exploitability through callback responses."

Against one technology company, the report documents Claude "independently querying databases and systems, extracting data, parsing results to identify proprietary information, and classifying findings based on intelligence value."

"The compression factor is what enterprises need to understand," Klein told VentureBeat. "The work that used to take months now takes days. Now the particular skill required requires basic motivational knowledge."

Lessons learned on key detection indicators

"The patterns were so different from human behavior, it was like watching a machine pretend to be a human," Klein told VentureBeat. report document "physically impossible request rates" with "Sustained request rates of multiple operations per second."

The report identifies three indicator categories:

Traffic patterns: "Request rates of multiple operations per second" with "substantial disparity between data input and text output."

Query decomposition: Tasks were divided into what Klein called "small, seemingly innocent tasks" – technical queries of five to 10 words that lack human browsing patterns. "Each query seemed valid in isolation," Klein explained to VentureBeat. "In aggregate, the attack pattern emerged."

Authentication behavior: The report describes "systematic credential collection on targeted networks" with Claude "independently determining which credentials granted access to which services, mapping privilege levels and access boundaries without human direction."
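The query decomposition and authentication indicators both depend on looking at a session in aggregate rather than one request at a time. The sketch below illustrates that idea under stated assumptions: the event tuple format, the stage labels (borrowed from the task types the report lists), and every threshold are hypothetical, and an upstream step is assumed to have already labeled each request with a stage.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical stage labels, taken from the task types named in the report.
ATTACK_STAGES = {
    "vulnerability_scanning",
    "credential_verification",
    "data_extraction",
    "lateral_movement",
}


@dataclass
class SessionProfile:
    stages: set = field(default_factory=set)
    services_by_credential: dict = field(default_factory=lambda: defaultdict(set))


def build_profiles(events):
    """Group events by session. Each event is (session_id, stage, credential_id, service)."""
    profiles = defaultdict(SessionProfile)
    for session_id, stage, credential_id, service in events:
        profile = profiles[session_id]
        if stage in ATTACK_STAGES:
            profile.stages.add(stage)
        if credential_id is not None:
            profile.services_by_credential[credential_id].add(service)
    return profiles


def is_suspicious(profile, min_stages=3, min_credentials=3, min_services=3):
    """Apply two aggregate checks; all thresholds are illustrative assumptions."""
    # Query decomposition: individually benign tasks that together span the attack chain.
    spans_chain = len(profile.stages) >= min_stages
    # Authentication behavior: several credentials tried against several distinct services.
    services = set().union(*profile.services_by_credential.values())
    maps_credentials = (
        len(profile.services_by_credential) >= min_credentials
        and len(services) >= min_services
    )
    return spans_chain or maps_credentials


events = [
    ("s1", "vulnerability_scanning", None, "web"),
    ("s1", "credential_verification", "cred-a", "db"),
    ("s1", "data_extraction", "cred-a", "db"),
    ("s2", "vulnerability_scanning", None, "web"),
]
profiles = build_profiles(events)
flagged = sorted(sid for sid, profile in profiles.items() if is_suspicious(profile))
print(flagged)
```

Session `s1` is flagged because its requests cover three stages of the chain, even though each request on its own resembles routine security work; `s2`, with a single scan, is not.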

"We’ve expanded detection capabilities to detect new threat patterns, including improvements to our cyber-focused classifiers." Klein told VentureBeat. is anthropological "Prototype Proactive Early Detection System for Autonomous Cyber Attacks."
