
Anthropic released Claude Sonnet 4.6 on Tuesday, a model that amounts to a seismic repricing event for the AI industry. It delivers near-frontier intelligence at mid-tier cost, and it lands amid an unprecedented corporate rush to deploy AI agents and automated coding tools.
It is an across-the-board upgrade in coding, computer use, long-context reasoning, agent planning, knowledge work, and design, with a 1M-token context window in beta. It is now the default model in claude.ai and Claude Cowork, and pricing holds steady at $3/$15 per million input/output tokens – the same as its predecessor, Sonnet 4.5.
That pricing detail is the headline that matters most. Anthropic's flagship Opus model costs $15/$75 per million tokens – five times the Sonnet price. Yet performance that previously required reaching for an Opus-class model – including real-world, economically valuable office tasks – is now available in Sonnet 4.6. For the thousands of enterprises now deploying AI agents that make millions of API calls every day, that math changes everything.
Why did the cost of running large-scale AI agents drop so dramatically?
To understand the significance of this release, you need to understand the moment it arrives in. The past year has been dominated by twin phenomena: "vibe coding" and agentic AI. Claude Code – Anthropic's developer-facing terminal tool – has become a cultural force in Silicon Valley, with engineers building entire applications through natural-language conversation. The New York Times chronicled its meteoric rise in January, and The Verge recently declared that Claude Code is having a real "moment." OpenAI, meanwhile, is pressing its own offensive with Codex desktop applications and faster inference chips.
The result is an industry where AI models are no longer evaluated in isolation. They are evaluated as the engines inside autonomous agents – systems that run for hours, call thousands of tools, write and execute code, navigate browsers, and interact with enterprise software. Every dollar of per-million-token pricing multiplies across thousands of calls. At scale, the difference between $15 and $3 per million input tokens is not incremental – it is transformative.
The benchmark table released by Anthropic paints a fascinating picture. On SWE-bench Verified, the industry-standard test for real-world software coding, Sonnet 4.6 scored 79.6% – roughly matching Opus 4.6's 80.8%. On agentic computer use (OSWorld-Verified), Sonnet 4.6 scored 72.5%, essentially the same as Opus 4.6's 72.7%. On office tasks (GDPval-AA Elo), Sonnet 4.6 actually scored 1633, higher than Opus 4.6's 1606. On agentic financial analysis, Sonnet 4.6 reached 63.3%, beating every model in the comparison, including Opus 4.6 at 60.1%.
These are not marginal differences. In many of the categories enterprises care about most, Sonnet 4.6 matches or outperforms models that cost five times as much to run. An enterprise running an AI agent that processes 10 million tokens per day was previously forced to choose between cheaper but weaker results or better results at rapidly escalating expense. Sonnet 4.6 largely eliminates that trade-off.
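To make that cost calculus concrete, here is a minimal back-of-the-envelope sketch. The $3/$15 and $15/$75 per-million-token prices come from the article; the 80/20 input/output split for a 10M-token/day agent is an illustrative assumption, not a measured workload.

```python
# Back-of-the-envelope daily cost for an agent processing 10M tokens/day.
# Prices ($ per million tokens) are from the article; the 8M-input /
# 2M-output split is an illustrative assumption.
SONNET = {"input": 3.00, "output": 15.00}
OPUS = {"input": 15.00, "output": 75.00}

def daily_cost(prices: dict, input_tokens: float, output_tokens: float) -> float:
    """Dollar cost of one day of traffic at the given per-million-token prices."""
    return (input_tokens / 1e6) * prices["input"] + (output_tokens / 1e6) * prices["output"]

sonnet_cost = daily_cost(SONNET, 8e6, 2e6)  # $24 input + $30 output = $54/day
opus_cost = daily_cost(OPUS, 8e6, 2e6)      # $120 input + $150 output = $270/day
print(f"Sonnet: ${sonnet_cost:.0f}/day, Opus: ${opus_cost:.0f}/day "
      f"({opus_cost / sonnet_cost:.0f}x)")  # → Sonnet: $54/day, Opus: $270/day (5x)
```

At $54 versus $270 per day per agent, the gap compounds quickly across a fleet of agents – which is why the unchanged Sonnet price is the real story of the release.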
In Claude Code, initial testing found that users preferred Sonnet 4.6 over Sonnet 4.5 about 70% of the time, and preferred it over November's frontier model, Opus 4.5, in 59% of cases. Testers found Sonnet 4.6 significantly less prone to over-engineering and "laziness," and significantly better at following instructions. They reported fewer false claims of success, fewer hallucinations, and more consistent follow-through on multi-step tasks.
How Claude’s computer use abilities went from ‘experimental’ to near-human in 16 months
One of the most dramatic stories in the release is Anthropic's progress in computer use – the ability of an AI to operate a computer the way a human does: clicking a mouse, typing on a keyboard, and navigating software that lacks modern APIs.
When Anthropic first introduced the capability in October 2024, the company acknowledged it was "still experimental – at times cumbersome and error-prone." The figures since then tell a remarkable story: on OSWorld, Claude Sonnet 3.5 scored 14.9% in October 2024. Sonnet 3.7 reached 28.0% in February 2025. Sonnet 4 reached 42.2% in June. Sonnet 4.5 rose to 61.4% in October. Now Sonnet 4.6 has reached 72.5% – almost a fivefold improvement in 16 months.
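The trajectory above can be sanity-checked in a few lines; the scores and release dates are those reported in this article.

```python
# OSWorld computer-use scores by Claude release, as reported above.
osworld = [
    ("Sonnet 3.5", "Oct 2024", 14.9),
    ("Sonnet 3.7", "Feb 2025", 28.0),
    ("Sonnet 4",   "Jun 2025", 42.2),
    ("Sonnet 4.5", "Oct 2025", 61.4),
    ("Sonnet 4.6", "now",      72.5),
]
first_score, latest_score = osworld[0][2], osworld[-1][2]
ratio = latest_score / first_score
print(f"{ratio:.1f}x improvement")  # → 4.9x improvement
```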
This matters because computer use is the capability that unlocks a broad set of enterprise applications for AI agents. Almost every organization runs legacy software – insurance portals, government databases, ERP systems, hospital scheduling tools – built before APIs existed. A model that can simply look at a screen and interact with it opens all of this up for automation, with no custom connectors required.
Pace CEO Jamie Cuffe said Sonnet 4.6 scored 94% on the company's complex insurance computer-use benchmark, the highest of any Claude model tested. "It reasons through failures and improves itself in ways we haven't seen before," Cuffe said in a statement sent to VentureBeat. Conway co-founder Will Harvey called it "a clear improvement over anything else we tested in our evaluation."
The release also addresses the security dimension of computer use. Anthropic noted that computer use creates prompt-injection risks – malicious actors hiding instructions on websites to hijack models – and said its evaluations show Sonnet 4.6 is a major improvement over Sonnet 4.5 at resisting such attacks. For enterprises deploying agents that browse the web and interact with external systems, this hardening is not optional.
Enterprise customers say the model narrows the gap between Sonnet and Opus pricing tiers
Customer feedback on the cost-performance dynamics has been unusually specific. Many early testers explicitly described Sonnet 4.6 as eliminating the need to step up to the more expensive Opus tier.
Caitlin Colegrove, CTO of Hex Technologies, said the company is moving most of its traffic to Sonnet 4.6, noting that with adaptive thinking and higher effort settings, "we see Opus-level performance on all but our toughest analytical tasks, with a more efficient and flexible profile. At Sonnet pricing, it's an easy call for our workload."
Ben Kuss, Box's CTO, said the model outperformed Sonnet 4.5 by 15 percentage points on logic-heavy Q&A over real enterprise documents. Michele Catasta, president of Replit, called the performance-to-cost ratio "extraordinary." Ryan Wiggins of Mercury put it more plainly: "Claude Sonnet 4.6 is faster, cheaper, and delivers better results on the first try. That combination of improvements was a surprise – we didn't expect to see it at this price point."
The coding improvements resonate particularly given Claude Code's dominance in the developer-tools market. David Locker, vice president of AI at CodeRabbit, said the model "punches well above its weight class for most real-world PRs." Factory AI's Leo Tchorakov said the team is "converting our Sonnet traffic to this model." Joe Binder, GitHub's vice president of product, said the model "already excels at complex code fixes, especially when required to search across large codebases."
Hercules founder and CEO Brendan Falk went further: "Claude Sonnet 4.6 is the best model we've seen so far. It has Opus 4.6-level accuracy, instruction following, and UI, all at a significantly lower cost."
A simulated business competition shows AI agents planning months ahead, not minutes
Buried in the technical details is a capability that hints at where autonomous AI agents are headed. Sonnet 4.6's 1M-token context window can hold entire codebases, long contracts, or dozens of research papers in a single request. Anthropic says the model reasons effectively across that full context – a claim the company demonstrated through an unusual evaluation.
The Vending-Bench Arena tests how well a model can run a simulated business over time, with different AI models competing for the biggest profit. Without human prompting, Sonnet 4.6 developed a novel strategy: it invested heavily in capacity for the first ten simulated months, spending significantly more than its competitors, then sharply scaled back to focus on profitability in the final stretch. The model ended its 365-day simulation with about $5,700, compared to about $2,100 for Sonnet 4.5.
This type of multi-month strategic planning, executed autonomously, represents a qualitatively different capability than answering questions or generating code snippets. This is the type of long-horizon reasoning that makes AI agents viable for real business operations — and helps explain why Anthropic is positioning Sonnet 4.6 not just as a chatbot upgrade, but as the engine for a new generation of autonomous systems.
Anthropic’s Sonnet 4.6 comes as the company expands into enterprise markets and defense
This release does not come in a vacuum. Anthropic is in the midst of the most consequential stretch in its history, and the competitive landscape is intensifying on every front.
On the day of this launch, TechCrunch reported that Indian IT giant Infosys has partnered with Anthropic to build enterprise-grade AI agents that integrate Claude models into Infosys' Topaz AI platform for banking, telecom, and manufacturing. Anthropic CEO Dario Amodei told TechCrunch there is "a big difference between AI models working in demos and AI models working in a regulated industry," and that Infosys helps bridge it. TechCrunch also reported that Anthropic has opened its first India office in Bengaluru; India now accounts for about 6% of global Claude usage, second only to the US. The company, which CNBC reports is valued at $183 billion, is rapidly expanding its enterprise footprint.
Meanwhile, Anthropic president Daniela Amodei told ABC News last week that AI will make humanities majors "more important than ever," arguing that critical-thinking skills will become more valuable as large language models master technical tasks. That is the kind of statement a company makes when it believes its technology will reshape entire categories of white-collar employment.
The competitive picture for Sonnet 4.6 is also noteworthy. The model outperforms Google's Gemini 3 Pro and OpenAI's GPT-5.2 on several benchmarks. GPT-5.2 trails on agentic computer use (38.2% vs. 72.5%), agentic search (74.7% for the non-Pro GPT-5.2 vs. Sonnet 4.6's 77.9%), and agentic financial analysis (59.0% vs. 63.3%). Gemini 3 Pro is competitive on visual reasoning and multilingual benchmarks, but falls behind in the agentic categories where enterprise investment is concentrating.
The broader story is not about any one model. It is about what happens when Opus-class intelligence becomes available for a few dollars per million tokens instead of a few tens of dollars. Companies that were cautiously running AI agents in small deployments now face a fundamentally different cost calculation. Agents that were too expensive to run consistently in January suddenly pencil out in February.
Claude Sonnet 4.6 is available now on all Claude plans, in Claude Cowork, Claude Code, the API, and all major cloud platforms. Anthropic has upgraded its free tier to Sonnet 4.6 by default. Developers can access it as claude-sonnet-4-6 through the Claude API.
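For developers, a minimal sketch of reaching the model through the Anthropic Python SDK looks like the following. The model ID is the one named above; the prompt and `max_tokens` value are illustrative, and the request is only sent if an API key is present in the environment.

```python
# Minimal sketch of calling Claude Sonnet 4.6 via the Anthropic Python SDK.
# Assumes the `anthropic` package is installed and ANTHROPIC_API_KEY is set;
# the prompt and max_tokens are illustrative values.
import os

MODEL_ID = "claude-sonnet-4-6"

def build_request(prompt: str, max_tokens: int = 1024) -> dict:
    """Assemble the keyword arguments for client.messages.create()."""
    return {
        "model": MODEL_ID,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

if __name__ == "__main__":
    kwargs = build_request("Summarize this pull request in three bullets.")
    if os.environ.get("ANTHROPIC_API_KEY"):
        import anthropic

        client = anthropic.Anthropic()  # reads the key from the environment
        response = client.messages.create(**kwargs)
        print(response.content[0].text)
    else:
        print(f"Dry run – would call {kwargs['model']}")
```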