Anthropic Walks Back Policy That Could Have ‘Sabotaged’ AI Researchers Using Claude

Anthropic retreat on a policy that would secretly limit competitors from using its new AI model, Cloud Fable 5, to develop other AI models. The company changed its stance after the move received significant backlash from the AI research community.

“We are changing Fable 5’s security measures for Frontier LLM development to make them visible,” Anthropic said in a statement to WIRED. “We made a bad compromise and we apologize for not getting the balance right.”

Anthropic earlier this week released Cloud Fable 5, a version of its latest AI model with additional safety guardrails designed to prevent abuse. Some of the security measures Anthropic decided were not surprising: The company said it would turn users asking questions about cybersecurity, biology or chemistry into less capable AI models to reduce the chances that someone using advanced AI would conduct a cyberattack or create a bioweapon.

But for researchers trying to use CloudFable 5 for frontier AI development, Anthropic outlined a different approach. The company would deliberately degrade the performance of models in ways that were invisible to the user. The move would effectively harm researchers trying to use the cloud to train competing AI models, which Anthropic explicitly prohibits in its terms of service.

Anthropic now says that’s changing course, and CloudFable 5’s safeguards for AI development will be visible to users. If the company suspects that a user is attempting to use the cloud to create a highly capable AI, it will alert them that it will, either denying the request or redirecting the user to a less capable model.

Anthropic reversed the policy after receiving sharp backlash from the AI research community. Anthropic has already taken steps to limit competitors using the cloud to build closed and open-source AI models, but critics say quietly reducing model performance for some users has gone too far. The cloud’s coding agent has become a favorite tool among developers, including those working on open-source AI research projects, and researchers told WIRED that the company’s latest policy could lead to a troubling future in which only a few major AI labs can conduct advanced AI research.

Dean Ball, a senior fellow at the Foundation for American Innovation and former White House adviser on AI, wrote in a post on He continued in another post that the “secret subversion” policy undermines Anthropic’s overall stance, as it limits AI researchers from collaborating on AI safety.

“It felt like Anthropic was saying to the public, ‘We don’t trust anyone else to do AI research. We’re the only ones who have to do AI research,'” says Will Brown, head of research at open-source AI startup Prime Intellect. “It feels a bit like they’re starting to drag the ladder behind them.”

Brown said the policy would also have left developers in the dark as to whether they were violating Anthropic’s rules, because the company would not alert them if its security measures were implemented. He said the sanctions could have wide-ranging consequences. For example, he pointed to the growing ecosystem of third-party evaluation firms that test Frontier models for safety, performance, and reliability — work that could be hampered if Anthropic secretly bugged its models.

<a href

Anthropic Walks Back Policy That Could Have ‘Sabotaged’ AI Researchers Using Claude

Like this:

Related

Leave a Comment Cancel reply

Share this:

Like this:

Related

Leave a Comment Cancel reply