Class of AI Models Hyped as Scarily Powerful Apparently Scared the Government Too Much and Now They’re Disabled

Dario Amodei Pentagon Anthropic V2

Anthropic was forced to “suddenly disable” two of its most prized Frontier AI models in response to a highly restrictive US government order, according to a statement posted on its website on Friday. “We recognize this is a misunderstanding and we are working to restore access as quickly as possible,” the statement said.

The government action in question is an “export control directive” stating that foreign nationals cannot use the model inside or outside the US, and it was motivated by an unspecified national security concern, according to Anthropic.

But national security concerns, and other safety and security fears, have been at the center of the rollout of these models, which arguably made such an incident foreseeable.

Instead of releasing its Cloud Mythos preview model to the public, in early April, Anthropic turned the creation of the model into a campaign raising awareness about the direct dangers of frontier AI models.

It released a system card explaining why the model would not be made publicly available, and detailing scary capabilities such as cheating and the ability to break out of control from a limited system. It was also reportedly capable of aiding the development of advanced weapons. For example, System Card has described it as “capable of significant cross-domain synthesis relevant to the development of destructive biological weapons”.

At the same time, the company launched Project Glasswing, a program that allowed a limited group of partners and organizations to sample the model to see what new horrors it could bring to the world of cybersecurity. Anthropic blog post about Project GlassWing states, “We formed Project GlassWing because of the capabilities we saw in a new frontier model trained by Anthropic, which we believe can reshape cybersecurity.”

Soon, despite the inherent silliness of the subject matter, the Mythos preview became a tabloid story. An article in the New York Post cites computer scientist Roman Yampolsky as predicting that AI could soon develop “hacking tools, biological weapons, chemical weapons,” as Mithos said. [and] Such new weapons that we cannot even imagine.” The phrase “weapon we can’t even imagine” also made it into the headlines.

British government officials and leaders of Britain’s finance sector struggled to formulate an action plan in the face of the perceived threat. According to The New York Times, the Trump administration’s “non-interventionist policy” toward AI changed after the Mythos announcement, and its mere existence helped lead to the development of a security-focused AI executive order. Trump had signed a similar order about a week ago.

Anyway, last week Anthropic released Cloud Fable 5 and Mythos 5. The company describes the Fable 5 as “a Mythos-class model that we’ve designed to be safe for general use,” but that its capabilities “exceed any model we typically make available.” Meanwhile, Mythos 5 received a very limited release as part of Project Glasswing.

Brian Merchant in Blood in the Machine described it this way:

After setting off a huge news cycle in the tech media with its April announcement that it had created an AI model, Mythos, that was so powerful, so dangerous that it threatened to take down the entire civilizational order — and it was diligently withholding the product from the public to protect us from it — the country’s now-#1 AI startup decided to put Mythos Finally for sale.

Hours after Merchant wrote those words, an export control directive was delivered to Anthropic, and the Fable 5 and Mythos 5 were made unavailable due to obvious national security concerns. It appears that Anthropic was ordered to revoke access only for users who are not US citizens, but it is understandable that Anthropic would be impractical to allow access to anyone anywhere in the world for fear of disobeying the order. Among many issues, non-US citizens work at Anthropic. It is obviously easier to pull the models altogether until the situation is resolved.

Interestingly, Anthropic’s statement regarding the Export Control Directive said that Anthropic had “worked with the US Government” along with the UK Government and “several private third-party organizations” in an effort to create a satisfactory set of safeguards for the models. Upon release, security measures were, in many ways, the most prominent feature of the media narrative surrounding the Fable 5. One of the stricter guardrails designed to silently punish users who misused the model was ill-conceived, prompting Anthropic to apologize.

But according to Anthropic, the government was alarmed after learning of a jailbreak for the Fable 5 that bypassed those all-important security measures:

“Our understanding is that the government believes it has become aware of a method of bypassing or ‘jailbreaking’. We reviewed the performance of this specific technique being used to identify a small number of previously known vulnerabilities. All of these vulnerabilities appear relatively simple, and we have found that other publicly available models are also able to find them without the need for bypass.

Anthropic absolutely correctly points out that when they released the Fable 5, the section of their blog post about the security of the model made it clear that some jailbreaks were still possible. This is “probably impossible.” completely Prevent universal jailbreak, but our goal is to make any remaining jailbreaks so slow and expensive that we can detect and stop them before they can be used on a large scale,” Anthropic wrote. Essentially, since it is not yet possible to make a model completely jailbreak-proof, Anthropic tried to make the jailbreak either expensive to produce, or too “narrow” to be a threat. Anthropic was also public about this fact. is that it retains more data than usual on users of Mythos-class models.

Still, it’s strange to see Anthropic now downplaying the perceived threats to its models, writing that these vulnerabilities are “minor,” “already known,” and “relatively simple,” while also pointing out that “other publicly available models are capable of finding them without even requiring bypass.”

Then, when Anthropic first publicized this class of models, it told the world that it had created a creation of unprecedented power with the potential to do real damage to the world. Two months later, the “Mythos-Class” model was a product for public consumption, available as a premium product “at no additional cost to users of Pro, Max, Team, and seat-based enterprise plans”, but only for a limited time. On June 23, Anthropic intended to “remove Fable 5 from those plans” and instead require a pay-as-you-go plan.

Anthropic claims that such government actions, if they become standard, “could halt all new model deployments for all marginal model providers.” And maybe this is true. When the release of a product is put on hold when the rollout of that product involves a precedent-setting piece of technology that is reportedly worthy of a global reassessment of cybersecurity, an overreaction to holes in that product’s security measures should probably come as no surprise, even if that overreaction is bad for business.



<a href

Leave a Comment