Anthropic’s Browser Agent Got Hijacked 31.5% Of The Time Before Safeguards Engaged

Of all the frontier labs, Anthropic published the most prompt injection data this spring. Point red-teamers at its latest model in the browser, and attackers hijacked it 31.5% of the time before safeguards were applied. OpenAI, Google and Meta have published nothing that security leaders can set against that number. The figure looks like a liability. In this comparison, it is the opposite. It is solid ground.

Each of the four frontier labs shipped prompt injection disclosures, and no two match. Anthropic put 244 pages and four agentic surfaces on the table on May 28. OpenAI reported one surface, connectors. Google moved the topic out of its model cards and into a separate safety framework. Meta shipped no closed-model card. The cross-vendor prompt injection disclosure grid below shows what each lab tested, what each measured, and where all four stand side by side.

A prompt injection hides a malicious instruction in something an agent reads: a web page, a document, or a tool result. One planted line can exfiltrate records or trigger actions that no one approved, and these cards are the only first-party evidence a buyer has.

There is no industry standard for measuring any of this, and that is the crux of the problem. Carter Rees, vice president of AI at Reputation, told VentureBeat that prompt injection breaks the assumption every legacy tool was built on. "An innocuous phrase like ‘ignore previous instructions’ can carry as destructive a payload as a buffer overflow, yet it shares no resemblance with known malware signatures." With no shared signature to scan for, each lab built its own scale, and the results cannot be ranked against each other.

Adam Meyers, senior vice president of counter adversary operations at CrowdStrike, said managing that exposure now falls to the buyer. "As you implement AI, it increases your attack surface, so now you have to be able to protect those AI models from adversarial misuse or data poisoning or injectables." CrowdStrike’s own frontline data shows that the threat side is not standing still. In its 2026 Financial Services Threat Landscape report, released in May, the company noted that adversaries are using AI to shorten the time from initial access to impact, outpacing legacy security responses.

Anthropic measured four surfaces. The numbers vary by orders of magnitude depending on which one you read.

The Opus 4.8 card does what the others don’t: it breaks prompt injection out by surface, and the spread is the story.

Put the model in a coding environment, and an adaptive attacker using Gray Swan’s Shade tool succeeded in 7.03% of single attempts. Safeguards pulled that down to 2.09%.

Move the same class of attack to the browser, the surface behind Claude in Chrome and Claude Cowork, and the floor gives way. Anthropic deployed professional red-teamers on 129 web environments held out from training and printed each result in Table 5.2.2.4.a on page 81 of the system card. Per-attempt is the share of all injection attempts that landed, across 129 environments with 10 attempts each. Per-scenario is the harsher cut, the share of environments where at least one attempt landed.
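The difference between the two cuts is easy to see in a few lines of Python. This is a minimal sketch, not Anthropic's evaluation code: the function names and the toy data are invented for illustration, and each inner list stands for the outcomes of the attempts in one environment.

```python
from typing import Sequence


def per_attempt_rate(results: Sequence[Sequence[bool]]) -> float:
    """Share of all individual injection attempts that succeeded."""
    total_attempts = sum(len(env) for env in results)
    total_hits = sum(sum(env) for env in results)
    return total_hits / total_attempts


def per_scenario_rate(results: Sequence[Sequence[bool]]) -> float:
    """Share of environments where at least one attempt succeeded."""
    return sum(any(env) for env in results) / len(results)


# Toy data (hypothetical): 4 environments, 10 attempts each.
toy_results = [
    [False] * 10,
    [True] + [False] * 9,
    [True, True] + [False] * 8,
    [False] * 10,
]

attempt_rate = per_attempt_rate(toy_results)    # 3 hits out of 40 attempts
scenario_rate = per_scenario_rate(toy_results)  # 2 of 4 environments breached
```

In the toy data, only 3 of 40 attempts land, yet half the environments are breached at least once, which is why the per-scenario figure is always the same or higher than the per-attempt figure.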

Read the per-attempt column without safeguards and with thinking, and the raw rate falls with each generation, from 50.7% for Sonnet 4.6 to 31.5% for Opus 4.8. The lowest figure in the table, 5.9%, belongs to Mythos Preview, which no one can buy yet. Turn on safeguards, and Opus 4.8 drops to 0.5%. Turn thinking off, and it goes to zero across all 129 environments.

OpenAI measured one surface, with attacks it already knew about.

The GPT-5.5 card, published on April 23 and updated on April 24, handles prompt injection in one place, a single section on robustness to known attacks against connectors. OpenAI reports this as a robustness score where higher is better, the inverse of an attack success rate. GPT-5.5 came in at 0.963, down from 0.998 for GPT-5.4-thinking. That one figure is the entire disclosure.
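To read a robustness score the way an attack success rate is read, it has to be flipped. The sketch below assumes the score is simply one minus the attack success rate on the same test set; the card's exact definition may differ, so treat the outputs as a rough reading rather than OpenAI's own figures. The function name is hypothetical.

```python
def implied_failure_rate(robustness_score: float) -> float:
    """Share of known attacks that got through, assuming score = 1 - ASR."""
    return 1.0 - robustness_score


gpt_5_5 = implied_failure_rate(0.963)           # about 3.7% under this assumption
gpt_5_4_thinking = implied_failure_rate(0.998)  # about 0.2% under this assumption

# How many times larger the implied failure rate is in the newer model.
increase_factor = gpt_5_5 / gpt_5_4_thinking
```

Under that assumption, a drop that looks small on the robustness scale corresponds to a failure rate roughly 18 times higher on the one surface tested.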

Anthropic tested four surfaces against an adaptive attacker that rewrites its approach based on what the model does, then ran a week-long bug bounty where red-teamers tried to break the model live. When coding results came back worse than Opus 4.7, the card said so.

Put 0.963 next to 31.5%, and they look like they belong on the same scoreboard. They do not. The first is a robustness score on one surface against known attacks. The second is a per-attempt attack success rate across 129 browser environments against an attacker optimizing in real time.

Google and Meta never put numbers on cards

Google’s Gemini 3 documentation keeps prompt injection under wraps, and the launch materials describe stronger resistance with no numbers attached. The Frontier Safety Framework report covers red teaming, but only in its capability domains, and prompt injection is not one of them. Neither the model card nor the framework report carries a per-surface number that a buyer can take into a risk review.

Meta ships open weights with no closed-model card. Its prompt injection defense sits in a separate stack, Purple Llama’s LlamaFirewall. A PromptGuard 2 classifier and an AlignmentCheck auditor, run against the public AgentDojo benchmark and its 97 tasks, cut attack success from 17.6% with no defenses to 1.75% combined. Real numbers. But they grade a guardrail on a public benchmark, not a model on a deployment surface that a security team would recognize.
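One way to put the before-and-after figures quoted in this article on a common footing is relative reduction, the share of successful attacks that a defense removes. The sketch below is plain arithmetic on the published percentages; the helper name is hypothetical, and a similar relative reduction does not make the underlying tests comparable.

```python
def relative_reduction(before_pct: float, after_pct: float) -> float:
    """Fraction of the original attack success rate removed by the defense."""
    return (before_pct - after_pct) / before_pct


# Figures as quoted in the article, in percent.
llamafirewall_agentdojo = relative_reduction(17.6, 1.75)  # Meta guardrail stack
opus_coding = relative_reduction(7.03, 2.09)              # Opus 4.8, coding
opus_browser = relative_reduction(31.5, 0.5)              # Opus 4.8, browser
```

On these numbers the guardrail stack removes about 90% of successful attacks on AgentDojo, while the Opus 4.8 safeguards remove about 70% on coding and about 98% in the browser.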

Cross-vendor prompt injection disclosure grid

The grid below works for any frontier model that security teams are weighing. Each row marks a place where the four labs diverge. Each divergence is where a quick comparison breaks down. The Anthropic figures come from the Opus 4.8 system card. Everything for the other three comes from each vendor’s published safety documentation.

Dimension	Anthropic, Opus 4.8	OpenAI, GPT-5.5	Google, Gemini 3.x	Meta, Llama stack
Safety document	System card, May 28, 2026, 244 pages	System card, April 23, 2026, updated April 24	Model card and a separate Frontier Safety Framework report	No closed-model card. Open weights and the Purple Llama stack
Injection benchmark or dataset	ART, Shade tools from Gray Swan and UK AISI, plus an internal browser eval, 129 environments	Internal connectors evaluation, known attacks	None for injection	AgentDojo, 97 tasks
Surfaces with an injection eval	Four. Tool use, coding, computer use, browser	One. Connectors	None published for injection	One. AgentDojo agent tasks
Multi-attempt scaling shown	Yes. ART benchmarks at 1, 10, 100 attempts. Coding and computer use at 1 and 200	No. One point	No	No
Headline metric and unit	Attack success rate. Browser, with thinking, 31.5% raw, 0.5% with safeguards	Robustness score, higher is better. 0.963, down from 0.998 for GPT-5.4-thinking	None published. Increased resistance claimed qualitatively	Attack success rate on AgentDojo. 17.6% baseline to 1.75% combined
Live external bounty	Yes. One-week live injection bounty with external red-teamers	No injection bounty. Bio bounty only	None found	None found
Regression disclosed	Yes, explicitly, with numbers	The number fell from 0.998 to 0.963. It was not framed as a regression	Claim of increased resistance, no numbers	Not applicable

Security teams now need to consider five factors

Anthropic tested four surfaces and printed each number. OpenAI tested one. Google did not publish a per-surface rate. Meta graded its guardrails, not its models. The four disclosures cannot be compared. These five steps are the place to start.

Inventory every agent you deploy or are scoping, and tag each by the surface it touches: browser, code, connectors, or desktop. Anthropic’s rate for Opus 4.8 with safeguards runs 2.09% on coding and 0.5% in the browser. A blended number covers none of them. Pull up the vendor’s published rate for your specific surface. If the vendor has never published one, treat the surface as untested.
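The inventory step can start as a lookup table. The sketch below is illustrative only: the agent names are hypothetical, and the only rates filled in are the two with-safeguards Opus 4.8 figures quoted in this article. Any surface without a published rate comes back as untested rather than being given a blended number.

```python
from typing import Dict, Union

# With-safeguards attack success rates (percent) quoted in the article.
PUBLISHED_RATES_PCT: Dict[str, float] = {
    "coding": 2.09,
    "browser": 0.5,
}

# Hypothetical agent inventory: agent name -> surface it touches.
AGENT_SURFACES: Dict[str, str] = {
    "research-agent": "browser",
    "ci-fixer": "coding",
    "invoice-bot": "connectors",
}


def surface_evidence(agents: Dict[str, str]) -> Dict[str, Union[float, str]]:
    """Map each agent to the vendor's published rate, or 'untested' if none exists."""
    return {
        name: PUBLISHED_RATES_PCT.get(surface, "untested")
        for name, surface in agents.items()
    }


report = surface_evidence(AGENT_SURFACES)
```

Here the connector-based agent is flagged as untested because no vendor rate for that surface appears in the table.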

Send the cross-vendor grid to every vendor under evaluation. The 0.963 connector score and the 31.5% browser rate were never on the same scale. Demand per-surface attack success rates, raw and with safeguards, with a named attack method. Empty cells are surfaces with no first-party evidence.

Confirm in writing which number your integration gets. Anthropic’s 0.5% comes from Claude in Chrome and Cowork with the full safeguard stack. On the API, the model ships without those safeguards. Do not accept product numbers for an API deployment.

Add two questions to the RFP: whether the vendor tested with an adaptive attacker that rewrites the payload against the model, and whether anyone outside the company tried to break it. Anthropic ran Gray Swan’s adaptive Shade tool and a week-long paid bounty. OpenAI tested known attacks on a single surface. Adversaries do not limit themselves to known payloads.

Run your own injection test before shipping any agent. Vendor numbers come from the vendor’s environment with the vendor’s system prompts. Your stack has its own prompts, permissions, and data access. Set a pass threshold. Anything above it does not go live.
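A pass threshold can be enforced as a simple release gate over your own test results. The following is a minimal sketch under stated assumptions: the class, the function, the 0.5% threshold, and the result counts are all hypothetical placeholders, and a real harness would also record the attack method used on each surface.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SurfaceResult:
    surface: str
    attempts: int   # injection attempts run against this surface
    successes: int  # attempts where the agent followed the injected instruction

    @property
    def attack_success_rate(self) -> float:
        return self.successes / self.attempts


def blocked_surfaces(results: List[SurfaceResult], max_asr: float) -> List[str]:
    """Return the surfaces whose measured rate exceeds the pass threshold."""
    return [r.surface for r in results if r.attack_success_rate > max_asr]


# Hypothetical in-house results and threshold, for illustration only.
in_house = [
    SurfaceResult("browser", attempts=1290, successes=12),
    SurfaceResult("connectors", attempts=500, successes=1),
]
blocked = blocked_surfaces(in_house, max_asr=0.005)
```

In this example the browser surface fails the gate and the connectors surface passes, so only the connector-based agent would be cleared to go live.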

Bottom line. There is no standard for this yet. A vendor’s number tells you what the vendor chose to measure. Your own red team tells you what you are exposed to.
