
Anthropic has today released Opus 4.5, its flagship frontier model, and it brings improvements in coding performance, as well as some user experience improvements that make it more competitive with OpenAI’s latest frontier model.
Perhaps the most prominent change for most users is that in the consumer app experiences (web, mobile, and desktop), Claude will be less likely to suddenly end lengthy conversations because they have gone on for too long. The memory improvements within a single conversation apply not only to Opus 4.5, but also to the existing Claude models in the apps.
Users who experienced sudden termination (despite having space left in their session and weekly usage budget) were running into a hard context window limit (200,000 tokens). Some large language model implementations simply start trimming earlier messages out of context when a conversation exceeds the window’s maximum, which leads to increasingly inconsistent conversations where the model starts forgetting things depending on how old they are. Claude instead ended the conversation outright.
Now, Claude will instead run a behind-the-scenes process that summarizes the key points from earlier parts of the conversation, attempting to discard what seems unnecessary while keeping what’s important.
Developers calling Anthropic’s API can take advantage of the same principles through context management and context compaction.
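The idea can be sketched in a few lines of client-side code: once a conversation’s estimated token count approaches the window limit, older turns are collapsed into a single summary message while recent turns are kept verbatim. This is a minimal illustration of the technique, not Anthropic’s API; the `summarize` helper, the 4-characters-per-token estimate, and the 80 percent threshold are all assumptions made for the example.

```python
# Illustrative sketch of context compaction. The summarize() helper,
# the token estimate, and the threshold are assumptions for this example,
# not part of Anthropic's API.

TOKEN_LIMIT = 200_000      # Claude's context window size
COMPACT_THRESHOLD = 0.8    # compact once ~80% of the window is used (assumed)

def estimate_tokens(messages):
    """Very rough token estimate: ~4 characters per token."""
    return sum(len(m["content"]) for m in messages) // 4

def summarize(messages):
    """Placeholder for a real summarization step (in practice, a model call).
    Here it just keeps the first sentence of each older message."""
    points = [m["content"].split(".")[0] for m in messages]
    return "Summary of earlier conversation: " + "; ".join(points)

def compact(messages, keep_recent=4):
    """If the conversation nears the window limit, replace older turns
    with a single summary message and keep only the recent turns."""
    if estimate_tokens(messages) < TOKEN_LIMIT * COMPACT_THRESHOLD:
        return messages  # still plenty of room; leave history untouched
    older, recent = messages[:-keep_recent], messages[-keep_recent:]
    summary = {"role": "user", "content": summarize(older)}
    return [summary] + recent
```

A short conversation passes through unchanged; only when the estimated usage crosses the threshold does the history get rewritten, which is why the model keeps recent details exact while older ones survive only in summarized form.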
Opus 4.5 performance
Opus 4.5 is the first model to surpass an accuracy score of 80 percent — specifically, 80.9 percent — on the SWE-bench Verified benchmark, beating out OpenAI’s recently released GPT-5.1-Codex-Max (77.9 percent) and Google’s Gemini 3 Pro (76.2 percent). The model performs particularly well on the agentic coding and agentic tool use benchmarks, but still lags behind GPT-5.1 in visual reasoning (MMMU).