
For several weeks, a growing crowd of developers and AI power users claimed that Anthropic's flagship models were losing their edge. Users on GitHub described a perceived decline in which Claude seemed less capable of sustained reasoning, more prone to hallucinations, and increasingly wasteful with tokens.
Critics pointed to measurable changes in behavior and charged that the model had shifted from a "research-first" approach to a lazy "edit-first" style that could no longer be trusted with complex engineering.
While the company initially rejected claims that it was "nerfing" models to manage demand, mounting evidence from high-profile users and third-party benchmarks created a significant trust gap.
Today, Anthropic directly addressed these concerns by publishing a technical post-mortem, which identified three separate product-layer changes responsible for the reported quality issues.
"We take reports about degradation very seriously," reads Anthropic’s blog post on the matter. "We never intentionally degrade our models, and we were able to immediately confirm that our API and inference layer were unaffected."
Anthropic says it has resolved the issues by reverting the reasoning-effort change, rolling back the verbosity prompt, and fixing the caching bug in version v2.1.116.
Growing evidence of decline
The controversy gained momentum in early April 2026, fueled by detailed technical analyses from the developer community. Stella Lorenzo, a senior director in AMD's AI group, published on GitHub a detailed audit of 6,852 Claude Code session files and more than 234,000 tool calls, documenting performance degradation since they first began using the tool.
The findings showed that Claude's depth of reasoning declined sharply, leading to logical errors and an increased tendency to choose "the simplest solution" instead of the correct one.
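Audits like this one boil down to scanning local session logs and tallying events. The sketch below shows the general idea; the JSONL layout and the `type` field are illustrative assumptions, not Claude Code's actual on-disk format.

```python
import json
import tempfile
from pathlib import Path

def count_tool_calls(session_dir: Path) -> dict[str, int]:
    """Count tool-call events per session file (JSONL, one event per line).

    The directory layout and the "type" field are hypothetical stand-ins
    for whatever format the real session logs use.
    """
    counts = {}
    for path in sorted(session_dir.glob("*.jsonl")):
        events = [json.loads(line) for line in path.read_text().splitlines()]
        counts[path.name] = sum(1 for e in events if e.get("type") == "tool_use")
    return counts

# Build a tiny synthetic session log to demonstrate the scan.
tmp = Path(tempfile.mkdtemp())
(tmp / "session1.jsonl").write_text(
    '{"type": "tool_use", "name": "bash"}\n'
    '{"type": "text", "content": "done"}\n'
    '{"type": "tool_use", "name": "edit"}\n'
)
print(count_tool_calls(tmp))  # {'session1.jsonl': 2}
```

A real audit would additionally bucket sessions by date to expose degradation over time, but the counting core is the same.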
This anecdotal frustration was validated by third-party benchmarks. Bridgemind reported that Claude Opus 4.6's accuracy in its tests dropped from 83.3% to 68.3%, knocking its ranking from No. 2 down to No. 10.
Although some researchers argued that these specific benchmark comparisons were flawed due to inconsistent test harnesses, the claim that Claude had gotten "dumber" became a viral discussion topic. Users also reported that usage limits were being exhausted faster than expected, fueling suspicions that Anthropic was deliberately degrading performance to manage growing demand.
The root causes
In its post-mortem blog post, Anthropic explained that while the underlying model weights had not regressed, three specific changes to the "harness" around the models had inadvertently hurt their performance:
- Default reasoning effort: On March 4, Anthropic changed the default reasoning effort from `high` to `medium` for Claude Code to address UI latency issues. The change was meant to keep the interface from appearing "frozen" while the model thought, but it caused a significant drop in intelligence on complex tasks.
- A caching-logic bug: Shipped on March 26, a caching optimization meant to clear out stale "thinking" content from idle sessions contained a serious bug. Instead of clearing the history once after an hour of inactivity, it cleared it on every subsequent turn, causing the model to lose its "short-term memory" and become repetitive or forgetful.
- System-prompt verbosity limits: On April 16, Anthropic added instructions to the system prompt to keep text between tool calls under 25 words and final responses under 100 words. This effort to rein in Opus 4.7's verbosity backfired, producing a 3% drop on coding quality evaluations.
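The caching bug is easy to picture as a staleness check whose timestamp is never refreshed. The toy reconstruction below is purely illustrative (class and field names are invented, not Anthropic's code), but it reproduces the described failure mode: one forgotten timestamp update turns a one-time cleanup into a wipe on every turn.

```python
class SessionCache:
    """Toy reconstruction of the described caching bug; all names are hypothetical."""

    TTL = 3600  # clear stale 'thinking' history after an hour of inactivity

    def __init__(self, buggy: bool = False):
        self.history: list[str] = []
        self.last_active = 0.0
        self.buggy = buggy

    def on_turn(self, message: str, now: float) -> None:
        if now - self.last_active > self.TTL:
            self.history.clear()  # intended to happen once per idle period
        if not self.buggy:
            # Correct: refresh the activity timestamp so the staleness
            # check fires at most once per idle period.
            self.last_active = now
        # In the buggy variant last_active is never refreshed, so every
        # subsequent turn still looks "stale" and wipes memory again.
        self.history.append(message)

good, bad = SessionCache(), SessionCache(buggy=True)
for turn, now in [("turn 1", 4000.0), ("turn 2", 4010.0), ("turn 3", 4020.0)]:
    good.on_turn(turn, now)
    bad.on_turn(turn, now)
print(len(good.history), len(bad.history))  # 3 1
```

The correct cache keeps all three turns after the idle period ends; the buggy one retains only the latest turn, which matches the "repetitive or forgetful" behavior users reported.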
Impact and future safety measures
The quality issues extended beyond the Claude Code CLI, also affecting the Claude Agent SDK and other Claude clients, although the Claude API was not impacted.
Anthropic acknowledged that these changes made the models "less intelligent" and said this was not the experience users should expect.
To regain user trust and prevent future declines, Anthropic is implementing several operational changes:
- Internal dogfooding: A large portion of internal staff will be required to use the exact public build of Claude Code, so they experience the product the way users do.
- Expanded evaluation suite: The company will now run a comprehensive suite of per-model evaluations that "ablate" prompt changes, isolating the effect of instructions specific to each model.
- Stricter scoping: New tooling has been built to make prompt changes easier to audit, and model-specific changes will be strictly scoped to their intended targets.
- Subscriber compensation: In response to the token waste and performance friction caused by these bugs, Anthropic has reset usage limits for all customers effective April 23.
The company intends to use its new @ClaudeDevs account on X and GitHub threads to provide deeper reasoning behind future product decisions and maintain a more transparent dialogue with its developer base.