Anthropic’s Claude Opus 4.5 is here: Cheaper AI, infinite chats, and coding skills that beat humans

Anthropic on Monday released its most capable artificial intelligence model to date, boasting cutting-edge performance on software engineering tasks while cutting prices by nearly two-thirds — a strategic move that sharpens the AI startup's competition with deep-pocketed rivals OpenAI and Google.

According to materials reviewed by VentureBeat, the new model, Claude Opus 4.5, scored higher on Anthropic's most challenging internal engineering assessment than any human job candidate in the company's history. The results underscore growing questions about the rapidly expanding capabilities of AI systems and how the technology will reshape white-collar work.

The Amazon-backed company is pricing Claude Opus 4.5 at $5 per million input tokens and $25 per million output tokens — a dramatic reduction from the $15 and $75 rates of its predecessor, Claude Opus 4.1, released earlier this year. The move makes frontier AI capabilities accessible to a broader group of developers and enterprises while pressuring competitors to match both performance and pricing.
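In concrete terms, the cut works out to roughly two-thirds off a typical API call. A quick back-of-the-envelope sketch at the published rates (the token counts in the example are illustrative, not from Anthropic):

```python
# Per-request cost at the published per-million-token rates.
PRICING = {
    "opus-4.1": {"input": 15.00, "output": 75.00},
    "opus-4.5": {"input": 5.00, "output": 25.00},
}

def request_cost(model, input_tokens, output_tokens):
    """Dollar cost of one API call at the listed rates."""
    r = PRICING[model]
    return (input_tokens * r["input"] + output_tokens * r["output"]) / 1_000_000

# Example call: 10,000 input tokens, 2,000 output tokens.
old = request_cost("opus-4.1", 10_000, 2_000)  # $0.30
new = request_cost("opus-4.5", 10_000, 2_000)  # $0.10
savings = 1 - new / old                        # roughly two-thirds cheaper
```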

"We want to make sure this really works for people who want to work with these models," Alex Albert, Anthropic’s Head of Developer Relations, said in an exclusive interview with VentureBeat. "That’s really what we’re focused on: How can we make Claude better able to help you do the things you don’t want to do in your job?"

The announcement comes as Anthropic races to maintain its position in an increasingly crowded field. OpenAI recently released GPT-5.1 and a specialized coding model, Codex Max, that can operate autonomously for extended periods. Google unveiled Gemini 3 just last week — a launch that, according to a recent report by The Information, raised concerns inside OpenAI about the search giant's progress.

Developers say Opus 4.5 demonstrates better judgment on real-world tasks

Anthropic's internal testing revealed what the company describes as a qualitative leap in Claude Opus 4.5's reasoning capabilities. According to company data, the model achieved 80.9% accuracy on SWE-Bench Verified, a benchmark measuring real-world software engineering tasks, outperforming OpenAI's GPT-5.1-Codex-Max (77.9%), Anthropic's own Sonnet 4.5 (77.2%), and Google's Gemini 3 Pro (76.2%). The result edges out OpenAI's state-of-the-art model, which was released just five days earlier.

But benchmarks tell only part of the story. Albert said internal testers have consistently reported that the model displays better judgment and intuition across a variety of tasks – a change he described as the model developing an understanding of what matters in real-world contexts.

"The model just gets it," Albert said. "It has developed the kind of intuition and judgment on so many real-world things that qualitatively seems like a huge leap forward from previous models."

He pointed to his own workflow as an example. Previously, Albert said, he would ask AI models to gather information but would hesitate to trust their synthesis or prioritization. With Opus 4.5, he is delegating more holistic work, connecting it to Slack and internal documents to produce coherent summaries that match his priorities.

Opus 4.5 beats all human candidates in company’s toughest engineering test

The model's performance on Anthropic's internal engineering assessment is a notable milestone. The take-home exam, given to prospective performance engineering candidates, evaluates technical ability and judgment under a prescribed two-hour time limit.

According to the company, using a technique called parallel test-time compute — which aggregates multiple attempts from the model and selects the best result — Opus 4.5 scored higher than any human candidate who has taken the test. Even without time limits, the model matches the performance of the best human candidate to date when used in Anthropic's coding environment, Claude Code.
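Parallel test-time compute, as described, amounts to a best-of-N strategy: sample several independent attempts and keep whichever scores highest. A minimal sketch, with a toy generator and scorer standing in for the model and for Anthropic's actual selection pipeline:

```python
import random
from concurrent.futures import ThreadPoolExecutor

def best_of_n(generate, score, n=8):
    """Parallel test-time compute: draw n independent attempts and
    return the candidate with the highest score."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        candidates = list(pool.map(lambda _: generate(), range(n)))
    return max(candidates, key=score)

# Toy stand-in: candidates are random integers and the "grader" prefers
# larger values; a real pipeline would sample model solutions and score
# them against tests or a verifier.
best = best_of_n(lambda: random.randint(0, 100), score=lambda x: x)
```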

The company acknowledged that the test does not measure other important professional skills such as collaboration, communication, or the instincts that develop over years of experience. Nevertheless, Anthropic said the results "raise questions about how AI will change engineering as a profession."

Albert stressed the significance of the result. "I think this is kind of a sign, perhaps, of how useful these models can actually be in the work context and for our jobs in the future," he said. "Of course, this was an engineering task, and I would say the models are relatively advanced in engineering compared to other areas, but I think it's a really important sign to pay attention to."

Dramatic efficiency improvements cut token usage by up to 76% on key benchmarks

Beyond raw performance, Anthropic is betting that efficiency improvements will differentiate Claude Opus 4.5 in the market. The company says the model uses dramatically fewer tokens – the units of text that AI systems process – to achieve the same or better results than its predecessors.

According to Anthropic, at medium effort, Opus 4.5 matches Sonnet 4.5's best score on SWE-Bench Verified while using 76% fewer output tokens. At the highest effort level, it exceeds Sonnet 4.5's performance by 4.3 percentage points while still using 48% fewer tokens.
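Those percentages translate directly into output-token spend. A back-of-the-envelope sketch at the new $25-per-million output rate, using an illustrative 100,000-token baseline rather than Anthropic's actual workloads:

```python
# What the reported token reductions mean in dollars at Opus 4.5's
# $25-per-million-output-token rate.
OUTPUT_RATE = 25.0  # $ per million output tokens

def output_cost(tokens):
    return tokens * OUTPUT_RATE / 1_000_000

baseline = 100_000                 # illustrative baseline output tokens
medium = baseline * (1 - 0.76)     # medium effort: 76% fewer -> 24,000
high = baseline * (1 - 0.48)       # highest effort: 48% fewer -> 52,000

medium_cost = output_cost(medium)  # $0.60, vs $2.50 at baseline
high_cost = output_cost(high)      # $1.30, vs $2.50 at baseline
```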

To give developers more control, Anthropic introduced an "effort parameter" that lets users adjust how much computational work the model applies to each task – balancing performance against latency and cost.

Enterprise customers were quick to corroborate the efficiency claims. "Opus 4.5 outperforms Sonnet 4.5 and the competition on our internal benchmarks, using fewer tokens to solve the same problems," said Michele Catasta, president of Replit, a cloud-based coding platform, in a statement to VentureBeat. "At scale, that efficiency compounds."

Mario Rodriguez, GitHub's chief product officer, said early testing shows Opus 4.5 "exceeds internal coding benchmarks while halving token usage, and is particularly suitable for tasks like code migration and code refactoring."

Early customers report AI agents that learn from experience and hone their skills

One of the most significant capabilities demonstrated by early customers involves what Anthropic calls "self-improving agents" – AI systems that can refine their own performance through iterative learning.

Japanese e-commerce and internet company Rakuten tested Claude Opus 4.5 on automating office tasks. "Our agents were able to autonomously refine their capabilities – achieving peak performance in 4 iterations while other models could not match that quality after 10," said Yusuke Kaji, general manager of AI for business at Rakuten.

Albert explained that the model is not updating its own weights – the fundamental parameters that define an AI system's behavior – but rather iteratively improving the tools and approaches it uses to solve problems. "It was iteratively refining the skill for a task, and seeing that, it was trying to adapt the skill to get better performance so that it could complete the task," he said.
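That distinction (weights frozen, skill iteratively refined) can be sketched as a simple outer loop. The evaluator and reviser below are hypothetical stand-ins for what would be model calls in a real agent:

```python
def refine_skill(skill, evaluate, revise, target, max_iters=10):
    """Iteratively improve a reusable "skill" (a prompt, script, or tool)
    using evaluation feedback; the model's weights never change."""
    history = []
    for i in range(max_iters):
        score = evaluate(skill)
        history.append((i, score))
        if score >= target:
            break
        skill = revise(skill, score)  # in practice: ask the model to rewrite the skill
    return skill, history

# Toy demo: each revision appends a refinement note that lifts the score,
# so the loop converges once the score reaches the target.
score_of = lambda s: 0.5 + 0.1 * s.count("[refined]")
skill, history = refine_skill(
    "extract invoice totals",
    evaluate=score_of,
    revise=lambda s, _: s + " [refined]",
    target=0.9,
)
```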

The capabilities extend beyond coding. Albert said Anthropic has seen significant improvements in creating professional documents, spreadsheets and presentations. "They're saying this model is the biggest leap they've seen between generations," Albert said of early testers. "So going from Sonnet 4.5 to Opus 4.5 is a bigger leap than between two back-to-back models in the past."

Financial modeling firm Fundamental Research Labs reported that "accuracy on our internal benchmarks improved by 20%, efficiency increased by 15%, and complex tasks that once seemed out of reach became achievable," according to co-founder Nico Christie.

New features target Excel users, Chrome workflows and eliminate chat length limits

Alongside the model release, Anthropic launched a suite of product updates for enterprise users. Claude for Excel became generally available to Max, Team, and Enterprise users, with new support for pivot tables, charts, and file uploads. The Chrome browser extension is now available to all Max users.

Perhaps most notably, Anthropic introduced "infinite chat" – a feature that eliminates context-window limitations by automatically summarizing earlier parts of a conversation as it grows longer. "Within Claude, within the product, you effectively get this kind of infinite context window because of compaction, as well as some of the memory things that we're doing," Albert explained.
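Conceptually, compaction trades the oldest turns for a summary whenever the conversation outgrows its context budget, while recent turns survive verbatim. A minimal sketch, with a placeholder summarizer where a model call would normally go (none of this reflects Anthropic's implementation):

```python
def compact(history, budget, summarize):
    """Keep a conversation under `budget` characters by folding the oldest
    turns into one summary while preserving recent turns verbatim."""
    total = lambda msgs: sum(len(m) for m in msgs)
    if total(history) <= budget:
        return history
    kept = []
    for msg in reversed(history):  # keep the recent turns that still fit
        if total(kept) + len(msg) > budget // 2:
            break
        kept.insert(0, msg)
    older = history[: len(history) - len(kept)]
    return ["[summary] " + summarize(older)] + kept

# Placeholder summarizer; in the real feature this would be a model call.
naive_summary = lambda msgs: f"{len(msgs)} earlier turns condensed"

history = [f"turn {i}: " + "x" * 50 for i in range(20)]
compacted = compact(history, budget=300, summarize=naive_summary)
```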

For developers, Anthropic released "programmatic tool calling," which allows Claude to write and execute code that invokes functions directly. Claude Code received an updated "planning mode," and a desktop app became available in research preview, enabling developers to run multiple AI agent sessions in parallel.
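The idea behind programmatic tool calling is that the model emits one program that calls many tools as ordinary functions, rather than a separate tool-call message (and a model round-trip) per step. A hedged sketch of that pattern; the tool registry, the sandbox, and the emitted program are all invented for illustration and do not reflect Anthropic's actual API:

```python
# Illustrative tool registry with canned data; in practice these would be
# real functions the model is permitted to call.
TOOLS = {
    "get_price": lambda sku: {"A1": 9.99, "B2": 4.50}[sku],
    "get_stock": lambda sku: {"A1": 3, "B2": 0}[sku],
}

def run_model_program(source):
    """Execute model-written code in a namespace exposing only the tool
    registry; only the final `result` goes back to the model."""
    scope = {"__builtins__": {}, "tools": TOOLS}
    exec(source, scope)  # heavily restricted for the sketch; not a real sandbox
    return scope["result"]

# A program the model might emit: it loops over tools directly, so the
# intermediate lookups never consume model context.
program = """
in_stock = [sku for sku in ("A1", "B2") if tools["get_stock"](sku) > 0]
total = 0.0
for s in in_stock:
    total = total + tools["get_price"](s)
result = {"in_stock": in_stock, "total": total}
"""
out = run_model_program(program)
```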

Market heats up as OpenAI races to match Google’s performance and pricing

Anthropic's annualized revenue reached approximately $2 billion in the first quarter of 2025, more than doubling from $1 billion in the prior period. The number of customers spending more than $100,000 annually increased eightfold year over year.

The rapid release of Opus 4.5 – just weeks after Haiku 4.5 in October and Sonnet 4.5 in September – reflects the dynamics of the broader industry. OpenAI released several GPT-5 variants throughout 2025, including a special Codex Max model in November that can operate autonomously for up to 24 hours. Google shipped Gemini 3 in mid-November after several months of development.

Albert attributed Anthropic's accelerated pace partly to the company using Claude to speed its own work. "We're seeing a lot of support and momentum from Claude itself, whether it's on the actual product-building side or the model research side," he said.

Pricing cuts for Opus 4.5 could put pressure on margins while potentially expanding the addressable market. "I’m hoping that a lot of startups will start incorporating it into their products and feature it prominently," Albert said.

Yet profitability remains elusive for leading AI labs as they invest heavily in computing infrastructure and research talent. The AI market is projected to top $1 trillion in revenue within a decade, but no single provider has established a dominant position – even as models reach the threshold where they can meaningfully automate complex knowledge work.

Michael Truell, CEO of Cursor, an AI-powered code editor, called Opus 4.5 "a marked improvement over previous Claude models inside Cursor, with better pricing and intelligence on difficult coding tasks." Scott Wu, CEO of AI coding startup Cognition, said the model delivers "strong results on our toughest evaluations and consistent performance through 30-minute autonomous coding sessions."

For enterprises and developers, the competition translates into rapidly improving capabilities at falling prices. But as AI performance on technical tasks approaches and sometimes exceeds expert human levels, the technology's impact on professional work becomes less theoretical.

When asked about the engineering test results and what they indicate about the trajectory of AI, Albert said simply: "I think that’s a really important sign to pay attention to."


