Researchers broke every AI defense they tested. Here are 7 questions to ask vendors.

HERO FOR ARTICLE
Security teams are buying AI protections that don’t work. Researchers from OpenAI, Anthropic, and Google DeepMind published findings in October 2025 that every CISO should pause in the middle of a procurement. his paper, "The attacker goes second: Strong adaptive attacks bypass protections against LLM jailbreak and quick injection," 12 published AI defenses were tested, most of which claimed near-zero attack success rates. The research team achieved bypass rates above 90% on most defenses. The implication for enterprises is clear: most AI security products are being tested against attackers who do not behave like real attackers.

The team tested signal-based, training-based, and filtering-based defenses under adaptive attack conditions. Everyone collapsed. Accelerated Protection achieved 95% to 99% attack success rates under adaptive attacks. Training-based methods performed no better, with bypass rates reaching 96% to 100%. The researchers devised a rigorous methodology to stress-test those claims. Their approach involved 14 authors and a $20,000 prize pool for successful attacks.

Why do WAFs fail at the inference layer?

Web application firewalls (WAF) are stateless; There are no AI attacks. This distinction explains why traditional security controls collapse in the face of modern rapid injection techniques.

The researchers used known jailbreak techniques in these rescues. Crescendo exploits the conversation context by breaking a malicious request into innocent-looking pieces spanning 10 conversation turns and building relationships until the model finally comes into compliance. Greedy Coordinate Gradient (GCG) is an automated attack that generates jailbreak suffixes through gradient-based optimization. These are not ideological attacks. They are published methods with working code. A stateless filter doesn’t catch any of this.

Each attack exploited a different blind spot – context loss, automation, or semantic ambiguity – but all succeeded for the same reason: the defense adopted stable behavior.

"A phrase as innocuous as ‘ignore previous instructions’ or a Base64-encoded payload can be as disastrous for an AI application as a buffer overflow was for traditional software," said Carter Rees, vice president of AI at Reputation. "The difference is that AI attacks work at the semantic layer, which signature-based detection cannot parse."

Why is AI deployment taking over security?

Today’s security failure would be worrying in itself, but the timing makes it dangerous.

Gartner estimates that 40% of enterprise applications will integrate AI agents by the end of 2026, up from less than 5% in 2025. The deployment curve is vertical. The safety curve is flat.

Adam Meyers, SVP of Counter Adversary Operations at CrowdStrike, measures the speed difference: "The fastest breakout time we saw was 51 seconds. Therefore, these opponents are becoming faster, and that is something that makes the defender’s job very difficult." The CrowdStrike 2025 Global Threat Report found that 79% of detections were malware-free, with adversaries using hands-on keyboard techniques that completely bypass traditional endpoint protection.

In September 2025, Anthropic disrupted the first documented AI-orchestrated cyber operation. In the attack the attackers executed thousands of requests, often several times per second, with human involvement falling to only 10 to 20% of the total effort. The traditional three to six month campaign has been shortened to 24 to 48 hours. According to the IBM 2025 Cost of Data Breach Report, 97% of organizations that suffered AI-related breaches lacked access controls.

Meyers explains the change in attacking tactics: "Threat actors have learned that trying to bring malware into the modern enterprise is like trying to sneak into an airport with a water bottle; You will probably be stopped by security. Instead of bringing a ‘water bottle’, they have to find a way to avoid detection. One way to do this is to not introduce malware at all."

Jerry Geisler, Walmart’s EVP and CISO, believes that agentic AI is increasing these risks. "The adoption of agentic AI introduces entirely new security threats that bypass traditional controls," Geisler previously told VentureBeat. "These risks include data intrusion, autonomous misuse of APIs, and covert cross-agent collusion, all of which can disrupt enterprise operations or violate regulatory orders."

Four attacker profiles already exploiting AI defense gaps

These failures are not imaginary. They are already being exploited in four different attacker profiles.

The authors of the paper make an important observation that defense mechanisms eventually appear in Internet-scale training data. Security through obscurity provides no security when models themselves learn how security works and adapt quickly.

Anthropometric testing against 200-attempt adaptive campaigns, while OpenAI reports single-attempt resistance, Highlighting how inconsistent industry testing standards remain. The authors of the research paper used both approaches. Every defense still fell.

Reis now maps four categories using an inference layer.

external enemy Implementing published attack research. Crescendo, GCG, ArtPrompt. They adapt their approach to the specific design of each defense, just as the researchers did.

Malicious B2B Customers Exploit legitimate API access to reverse-engineer proprietary training data or extract intellectual property through inference attacks. The research found reinforcement learning attacks to be particularly effective in black-box scenarios, requiring only 32 sessions of five rounds each.

Compromised API Consumers Leverage trusted credentials to exfiltrate sensitive output or poison downstream systems through manipulated responses. The paper found that output filtering failed just as badly as input filtering. Discovery-based attacks systematically generated adversarial triggers that escaped detection, meaning that bi-directional controls provided no additional protection when attackers adapted their techniques.

careless insider Remains the most common vector and most expensive. The IBM 2025 Cost of Data Breach Report found that shadow AI added $670,000 to the average breach cost.

"The most prevalent threat is often the careless insider," Reece said. "This ‘shadow AI’ phenomenon involves employees pasting sensitive proprietary code into public AI to increase efficiency. They see security as friction. Samsung engineers discovered this when submitting proprietary Semiconductor code to ChatGPT, which preserves user input for model training."

Why does stateless detection fail against conversational attacks?

Research points to specific architectural requirements.

  • Normalization before semantic analysis Encoding and defeating ambiguity

  • Reference tracking on turns To detect multi-stage attacks like Crescendo

  • bi-directional filtering To prevent data intrusion through output

Jamie Norton, CISO at the Australian Securities and Investments Commission and vice-chair of ISACA’s board of directors, understands the governance challenge: "As CISOs, we don’t want to get in the way of innovation, but we have to put guardrails around it so we don’t run into the woods and have our data leaked," Norton told CSO Online.

Seven questions to ask AI security vendors

Vendors will claim near-zero attack success rates, but research proves that these numbers decrease under adaptive pressure. Security leaders need answers to these questions before any procurement conversation begins Each directly reflects a failure documented in research.

  1. What is your bypass rate against adaptive attackers? Not against static test sets. Against attackers who know how the defense works and who have time to replicate it. Any vendor quoting near-zero rates without an adaptive testing methodology is selling a false sense of security.

  2. How does your solution detect multi-turn attacks? Crescendo spreads malicious requests across 10 turns that look benign separately. Stateless filters won’t catch any of this. If the seller says stateless, the conversation is over.

  3. How do you handle encoded payloads? ArtPrompt hides malicious instructions in ASCII art. Base64 and Unicode obfuscation completely overrides text-based filters. The normalization table is staked before the analysis. Signature matching alone means the product is blind.

  4. Does your solution filter the output as well as the input? Input-only control models cannot prevent data infiltration through reactions. Ask what happens when both layers face a coordinated attack.

  5. How do you track context during a conversation? Conversational AI requires stateful analysis. If the vendor can’t explain the implementation specifications, they don’t have them.

  6. How do you test against attackers who understand your defense mechanisms? Research shows that security fails when attackers adapt to specific security designs. Security through obscurity does not provide any security at the inference layer.

  7. What is your average time to update protection against new attack patterns? The attack methods are public. New variants emerge weekly. A defense that cannot adapt faster than the attackers will be permanently left behind.

bottom line

Research by OpenAI, Anthropic and Google DeepMind gives an uneasy verdict. The AI ​​security protecting enterprise deployments today was designed for attackers who don’t adapt. Real attackers adapt. Every enterprise running LLM in production should audit current controls against the attack methods documented in this research. The deployment curve is vertical, but the security curve is flat. That gap is where violations will occur.



<a href

Leave a Comment