Anthropic’s browser agent got hijacked 31.5% of the time before safeguards engaged

Of all the frontier labs, Anthropic published the most prompt injection data this spring. Point a red-teamer at its latest model in the browser, and the attacker hijacked it 31.5% of the time before safeguards were applied. OpenAI, Google and Meta have not published numbers that security leaders can match against it. The figure looks like a liability. In this comparison, it is the opposite: it is solid ground.

Each of the four frontier labs shipped prompt injection disclosures, and no two matched. Anthropic put 244 pages and four agentic surfaces on the table on May 28. OpenAI reported one surface, connectors. Google moved the topic out of its model cards and into a separate safety framework. Meta did not ship any closed-model cards. The cross-vendor prompt injection disclosure grid below shows what each lab tested, what each measured, and how all four compare side by side.

A prompt injection hides a malicious instruction in something an agent reads: a web page, a document, or a tool result. An injected line can expose records or trigger actions that no one approved, and these cards are the only first-party evidence available to a buyer.

There is no industry standard for measuring any of this, and that is the crux of the problem. Carter Rees, vice president of AI at Reputation, told VentureBeat that prompt injection breaks the assumption every legacy tool was built on. "An innocuous phrase like ‘ignore previous instructions’ can carry as destructive a payload as a buffer overflow, yet it shares no resemblance with known malware signatures." With no shared signature to scan for, each lab created its own scale, and the results cannot be ranked against one another.

Adam Meyers, senior vice president of counter adversary operations at CrowdStrike, said the exposure is now the buyer’s to manage. "As you implement AI, it increases your attack surface, so now you have to be able to protect those AI models from adversarial misuse or data poisoning or injectables." CrowdStrike’s own frontline data shows that the threat side is not standing still. In its 2026 Financial Services Threat Landscape report, released in May, the company noted that adversaries used AI to shorten the time from initial access to impact, outpacing what legacy security responses can handle.

Anthropic measured four surfaces. The numbers vary by orders of magnitude depending on which one you read.

The Opus 4.8 card does what the others don’t: it breaks prompt injection out by surface, and the spread is the story.

Put the model in a coding environment, and an adaptive attacker using Gray Swan’s Shade tool succeeded in 7.03% of single attempts. Safeguards pulled that down to 2.09%.

Move the same kind of attack to the browser, the surface behind Claude in Chrome and Claude Cowork, and the floor gives way. Anthropic deployed professional red-teamers on 129 web environments held out from training and printed each result in Table 5.2.2.4.a on page 81 of the system card. Per-attempt is the share of all injection attempts that landed across the 129 environments, with 10 attempts in each. Per-scenario is the harder cut: the share of environments where at least one attempt landed.
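The difference between the two cuts is easy to see in a few lines of code. The sketch below is illustrative only: the function names and the toy data are assumptions, not anything taken from the system card, and it simply computes the share of all attempts that landed and the share of environments where at least one landed.

```python
from typing import List


def per_attempt_rate(results: List[List[bool]]) -> float:
    """Share of all individual injection attempts that succeeded."""
    total = sum(len(env) for env in results)
    hits = sum(sum(env) for env in results)
    return hits / total


def per_environment_rate(results: List[List[bool]]) -> float:
    """Share of environments where at least one attempt succeeded."""
    return sum(any(env) for env in results) / len(results)


# Toy data (hypothetical): 4 environments, 10 attempts each.
toy = [
    [True] + [False] * 9,               # one hit
    [False] * 10,                       # no hits
    [True, True, True] + [False] * 7,   # three hits
    [False] * 10,                       # no hits
]

print(per_attempt_rate(toy))       # 4 hits out of 40 attempts
print(per_environment_rate(toy))   # 2 of 4 environments breached
```

On this toy data the per-attempt rate is 10% while half the environments were breached at least once, which is why the second cut is the harder one to look good on.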

Read the per-attempt column without safeguards, and the raw rate drops with each generation, from 50.7% for Sonnet 4.6 to 31.5% for Opus 4.8. The lowest in the table, at 5.9%, belongs to Mythos Preview, which no one can buy yet. Turn on safeguards, and Opus 4.8 drops to 0.5%. Turn off thinking and it goes to zero in all 129 environments.

OpenAI measured one surface, with attacks it already knew about.

The GPT-5.5 card, published on April 23 and updated on April 24, handles prompt injection in one place, a single section on robustness to known attacks against connectors. OpenAI reports this as a robustness score where higher is better, the inverse of the attack success rate. GPT-5.5 came in at 0.963, down from 0.998 for GPT-5.4-thinking. That one figure is the entire disclosure.
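To see what the drop means in attack terms, the score can be flipped back into a failure rate. This is a rough sketch that assumes the robustness score is the simple complement of attack success (score = 1 - rate); the article describes it only as the inverse, so the exact definition on the card may differ, and the function name is hypothetical.

```python
def implied_attack_success(robustness_score: float) -> float:
    """Assumed conversion: attack success = 1 - robustness score."""
    if not 0.0 <= robustness_score <= 1.0:
        raise ValueError("robustness score must be between 0 and 1")
    return 1.0 - robustness_score


gpt_5_5 = implied_attack_success(0.963)   # GPT-5.5
gpt_5_4 = implied_attack_success(0.998)   # GPT-5.4-thinking

print(f"GPT-5.5 implied rate: {gpt_5_5:.1%}")
print(f"GPT-5.4-thinking implied rate: {gpt_5_4:.1%}")
print(f"Ratio: {gpt_5_5 / gpt_5_4:.1f}x")
```

Under that assumption, a change in the second decimal place of the score corresponds to a much larger relative change in how often known attacks get through. The converted figure still covers only known attacks on connectors.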

Anthropic tested four surfaces against an adaptive attacker that rewrites its approach based on what the model does, then ran a week-long bug bounty where red-teamers tried to break the model live. When coding results came back worse than Opus 4.7, the card said so.

Put 0.963 next to 31.5%, and they look like they belong on the same scoreboard. They do not. The first is a robustness score on one surface against known attacks. The second is a per-attempt attack success rate across 129 browser environments against an attacker optimizing in real time.

Google and Meta never put numbers on cards

Google’s Gemini 3 documentation keeps prompt injection under wraps, and the launch materials describe strong resistance with no numbers attached. The Frontier Safety Framework report covers red teaming, but only in its own capability domains, and prompt injection is not one of them. Neither the model card nor the framework pages carry a per-surface number that a buyer can take into a risk review.

Meta ships open weights with no closed-model cards. Its prompt injection defense sits in a separate stack, Purple Llama’s LlamaFirewall. A PromptGuard 2 classifier and an AlignmentCheck auditor, run against the public AgentDojo benchmark and its 97 tasks, reduce attack success from 17.6% with no defenses to 1.75%. Real numbers. But they grade the guardrails on a public benchmark, not a model on a deployment surface that a security team would recognize.
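One way to read the before-and-after figures quoted so far is as relative reductions. The helper below is a small sketch (the function name is an assumption) applied to the three raw-to-defended pairs cited in this article; it says how much each defense layer cut its own baseline, not how the vendors compare with each other.

```python
def relative_reduction(before_pct: float, after_pct: float) -> float:
    """Fraction of the baseline attack success removed by the defenses."""
    if before_pct <= 0:
        raise ValueError("baseline must be positive")
    return (before_pct - after_pct) / before_pct


# (raw %, defended %) pairs as quoted in this article.
pairs = {
    "Anthropic coding": (7.03, 2.09),
    "Anthropic browser": (31.5, 0.5),
    "Meta LlamaFirewall on AgentDojo": (17.6, 1.75),
}

for label, (before, after) in pairs.items():
    print(f"{label}: {relative_reduction(before, after):.1%} reduction")
```

Each pair was measured on a different benchmark with a different attacker, so the percentages describe the effect of each safeguard stack in its own test only.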

Cross-vendor prompt injection disclosure grid

The grid below works for any frontier model that security teams are weighing. Each row marks a place where the four labs diverge. Each divergence is where a quick comparison breaks down. The Anthropic figures come from the Opus 4.8 system card. Everything for the other three comes from each vendor’s published security documentation.

| Dimension | Anthropic, Opus 4.8 | OpenAI, GPT-5.5 | Google, Gemini 3.x | Meta, Llama stack |
| --- | --- | --- | --- | --- |
| Security document | System card, May 28, 2026, 244 pages | System card, April 23, 2026, updated April 24 | Model card and a separate Frontier Safety Framework report | No closed-model cards. Open weights and Purple Llama stack |
| Injection benchmark or dataset | ART, Shade tools from Gray Swan and UK AISI, plus an internal browser eval, 129 environments | Internal connectors assessment, known attacks | None for injection | AgentDojo, 97 tasks |
| Surfaces with an injection eval | Four. Tool use, coding, computer use, browser | One. Connectors | None published for injection | One. AgentDojo agent tasks |
| Multi-attempt increase shown | Yes. ART benchmarks at 1, 10, 100. Coding and computer use at 1 and 200 | No, one point | No | No |
| Headline metric and unit | Attack success rate. Browser, with thinking, 31.5% raw, 0.5% with safeguards | Robustness score, higher is better. 0.963, down from 0.998 for GPT-5.4-thinking | None published. Qualitative claim of increased resistance | Attack success rate on AgentDojo. 17.6% baseline to 1.75% combined |
| Live external bounty | Yes. One-week live injection bounty with external red-teamers | No injection bounty. Bio bounty only | None found | None found |
| Regression disclosed | Yes, clearly, with numbers | The number dropped from 0.998 to 0.963; this was not characterized as a regression | Claim of increased resistance, no numbers | Not applicable |

Five steps security teams need to take now

Anthropic tested four surfaces and printed each number. OpenAI tested one. Google did not publish a per-surface rate. Meta graded its guardrails, not its models. The four disclosures cannot be compared. These five steps are the starting point.

Inventory every agent you deploy or are scoping, and tag each by the surface it touches: browser, code, connectors, or desktop. Anthropic’s rate for Opus 4.8 with safeguards runs 2.09% on coding and 0.5% on the browser. A blended number doesn’t cover anyone. Pull up the vendor’s published rate for your specific surface. If the vendor has never published one, treat the surface as untested.
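A minimal sketch of that inventory step follows. The agent names and the structure are hypothetical, and the rate table holds only the two Opus 4.8 figures quoted in this article, so a missing entry here means "not in this table", which the step above says to treat as untested.

```python
from typing import Dict, Optional

# Safeguarded per-attempt rates (percent) quoted in this article for Opus 4.8.
PUBLISHED_RATES: Dict[str, float] = {"code": 2.09, "browser": 0.5}

# Hypothetical inventory: agent name -> surface it touches.
AGENTS: Dict[str, str] = {
    "pr-reviewer": "code",
    "travel-booker": "browser",
    "crm-sync": "connectors",
}


def rate_for(surface: str) -> Optional[float]:
    """Return the published rate for a surface, or None if there is none."""
    return PUBLISHED_RATES.get(surface)


def review(agents: Dict[str, str]) -> Dict[str, str]:
    """Tag each agent with its surface rate, or 'untested' if none exists."""
    report = {}
    for name, surface in agents.items():
        rate = rate_for(surface)
        report[name] = f"{rate}%" if rate is not None else "untested"
    return report


for agent, status in review(AGENTS).items():
    print(f"{agent} ({AGENTS[agent]}): {status}")
```

Keeping the lookup per surface, rather than averaging across surfaces, is the point: the connectors agent surfaces as a gap instead of inheriting a number measured somewhere else.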

Send the cross-vendor grid to each vendor under evaluation. The 0.963 connector score and the 31.5% browser rate were never on the same scale. Demand per-surface attack success rates, raw and with safeguards, with a named attack method. Empty cells are surfaces that have no first-party evidence.
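The empty-cell check can be kept as a small script alongside the vendor responses. The sketch below encodes only the "surfaces with an injection eval" row of the grid as reported in this article; the function name and the choice of surfaces to check are assumptions for illustration.

```python
from typing import Dict, List, Set, Tuple

# Surfaces each vendor published an injection eval for, per the grid above.
EVALUATED: Dict[str, Set[str]] = {
    "Anthropic": {"tool use", "coding", "computer use", "browser"},
    "OpenAI": {"connectors"},
    "Google": set(),
    "Meta": {"AgentDojo agent tasks"},
}


def empty_cells(
    evaluated: Dict[str, Set[str]], surfaces: List[str]
) -> List[Tuple[str, str]]:
    """List (vendor, surface) pairs with no first-party injection evidence."""
    return [
        (vendor, surface)
        for vendor, covered in evaluated.items()
        for surface in surfaces
        if surface not in covered
    ]


# Hypothetical: the surfaces your own agents touch.
my_surfaces = ["browser", "connectors"]

for vendor, surface in empty_cells(EVALUATED, my_surfaces):
    print(f"Ask {vendor} for a per-surface rate on: {surface}")
```

Each printed line is a question to put to the vendor in writing before the evaluation moves forward.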

Confirm in writing which number your integration gets. Anthropic’s 0.5% comes from Claude in Chrome and Cowork with the full safeguard stack. On the API, the model ships without those safeguards. Do not accept product numbers for an API deployment.

Add two questions to the RFP: whether the vendor tested with an adaptive attacker that rewrites the payload against the model, and whether someone outside the company tried to break it. Anthropic ran Gray Swan’s adaptive Shade tool and a week-long paid bounty. OpenAI tested known attacks on a single surface. Adversaries do not limit themselves to known payloads.

Run your own injection test before any agent goes live. Vendor numbers come from the vendor’s environment, with the vendor’s system prompts. Your stack has its own prompts, permissions, and data access. Set a pass threshold. Anything above it does not go live.
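A pass threshold can be enforced with a very small gate. The sketch below is one possible shape, not a standard: the function names, the 2% threshold and the trial counts are all hypothetical, and the retry estimate assumes attempts are independent, which real adaptive attackers will violate.

```python
def injection_gate(hijacked: int, attempts: int, max_rate: float) -> bool:
    """Return True if the measured hijack rate is at or below the threshold."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return hijacked / attempts <= max_rate


def at_least_once(per_attempt_rate: float, tries: int) -> float:
    """Chance that at least one of `tries` attempts lands.

    Assumes independent attempts, which is a simplification.
    """
    return 1.0 - (1.0 - per_attempt_rate) ** tries


# Hypothetical internal test: 200 injection attempts against your own stack.
MAX_RATE = 0.02  # hypothetical pass threshold, set by your own risk review

print(injection_gate(hijacked=3, attempts=200, max_rate=MAX_RATE))
print(injection_gate(hijacked=5, attempts=200, max_rate=MAX_RATE))
print(f"{at_least_once(3 / 200, tries=10):.1%} chance over 10 retries")
```

The retry estimate is there because an attacker rarely stops at one attempt, so a per-attempt rate that clears the gate can still add up to meaningful exposure.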

Bottom line. There is no standard for this yet. A vendor’s number tells you what that vendor chose to measure. Your own red team tells you what you’re exposed to.


