
Cursor, the San Francisco-based AI coding platform from $29.3 billion startup Anysphere, has launched Composer 2, a new in-house coding model that is now available inside its agentic AI coding environment and that posts significantly better benchmark results than its previous in-house model.
It is also launching Composer 2 Fast, a pricier but faster version that will serve as the default experience for users.
Here are the cost details:
- Composer 2 Standard: $0.50/$2.50 per 1 million input/output tokens
- Composer 2 Fast: $1.50/$7.50 per 1 million input/output tokens
This is a big drop from Cursor’s previous in-house model, Composer 1.5, released in February, which cost $3.50 per million input tokens and $17.50 per million output tokens; Composer 2 is about 86% cheaper on both counts.
Composer 2 Fast is also roughly 57% cheaper than Composer 1.5.
There is also a discount for cache-read pricing, that is, for input tokens the model has recently processed and can serve from cache: $0.20 per million tokens for Composer 2 and $0.35 per million for Composer 2 Fast, compared with $0.35 per million for Composer 1.5.
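Those percentages can be sanity-checked directly from the published per-million-token rates. The Python sketch below uses only the prices quoted above; the token counts in the final cache-read example are hypothetical, chosen purely for illustration:

```python
# Published per-million-token prices (USD): (input, output, cache_read)
PRICES = {
    "composer_1_5":    (3.50, 17.50, 0.35),
    "composer_2":      (0.50,  2.50, 0.20),
    "composer_2_fast": (1.50,  7.50, 0.35),
}

def pct_cheaper(new: float, old: float) -> float:
    """Percent reduction going from the old price to the new one."""
    return (1 - new / old) * 100

# Composer 2 vs. Composer 1.5: ~86% cheaper on both input and output.
print(pct_cheaper(0.50, 3.50))    # 85.7...
print(pct_cheaper(2.50, 17.50))   # 85.7...

# Composer 2 Fast vs. Composer 1.5: ~57% cheaper on both.
print(pct_cheaper(1.50, 3.50))    # 57.1...
print(pct_cheaper(7.50, 17.50))   # 57.1...

def request_cost(model: str, fresh_in: int, cached_in: int, out: int) -> float:
    """USD cost of one request, billing cached input tokens at the cache-read rate."""
    inp, outp, cache = PRICES[model]
    return (fresh_in * inp + cached_in * cache + out * outp) / 1_000_000

# Hypothetical agent turn: 20k fresh input tokens, 80k cache hits, 4k output.
print(f"${request_cost('composer_2', 20_000, 80_000, 4_000):.4f}")  # $0.0360
```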
It also matters that this appears to be a Cursor-native release, not a widely distributed standalone model. In the company’s announcement and model documentation, Composer 2 is described as available in Cursor, tuned for Cursor’s agent workflow, and integrated with the product’s tool stack.
The materials provided do not imply separate availability through external model platforms or as general purpose APIs outside the Cursor environment.
Cursor is pitching not just better completions, but also longer-horizon coding
The deeper technical claim in this release is not just that Composer 2 scores higher than Composer 1.5. It is that, according to Cursor, the model is better suited for long-horizon agentic coding.
In its blog, Cursor says that the gain in quality comes from its first continuous pretraining run, which gave it a strong foundation for scaled reinforcement learning. From there, the company says it trained Composer 2 on long-horizon coding tasks and that the model can solve problems requiring hundreds of actions.
That framing is important because it addresses one of the biggest unresolved issues in AI coding. Many models are good at discrete code generation. Very few remain reliable in long workflows that involve reading the repository, deciding what to change, editing multiple files, running commands, interpreting failures, and continuing toward the goal.
Cursor’s documentation reinforces that this is the use case it cares about. It describes Composer 2 as an agentic model with a 200,000-token context window, tuned for tool use, file editing, and terminal operations inside Cursor.
It also notes training techniques like self-summarization for long-running tasks. For developers already using Cursor as their main environment, that tight tuning may mean more than a simple leaderboard claim.
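Cursor has not published how that self-summarization works, but the general pattern is well known from agent frameworks: when the transcript nears the context limit, fold older turns into a model-written summary and continue. Here is a minimal, purely illustrative Python sketch, where `llm`, `run_tool`, and `count_tokens` are hypothetical stand-ins and the 150,000-token compaction threshold is an assumption:

```python
# Illustrative sketch of context self-summarization in a long-running agent loop.
# This is NOT Cursor's published implementation; `llm`, `run_tool`, and
# `count_tokens` are hypothetical callables supplied by the caller.

CONTEXT_WINDOW = 200_000   # Composer 2's documented context window (tokens)
COMPACT_AT = 150_000       # assumed headroom before compaction kicks in

def agent_loop(task, llm, run_tool, count_tokens, keep_recent=4):
    history = [{"role": "user", "content": task}]
    while True:
        # When the transcript nears the context limit, fold older turns into
        # a model-written summary so the agent can keep working on the task.
        if count_tokens(history) > COMPACT_AT:
            summary = llm([{"role": "user", "content":
                            f"Summarize the work so far:\n{history[:-keep_recent]}"}])
            history = ([{"role": "system", "content": summary["content"]}]
                       + history[-keep_recent:])

        step = llm(history)  # assumed to return {"content", "tool", "args", "done"}
        history.append({"role": "assistant", "content": step["content"]})
        if step.get("done"):
            return step["content"]
        # Execute the requested tool (file edit, shell command, search, ...)
        result = run_tool(step["tool"], step.get("args", {}))
        history.append({"role": "tool", "content": result})
```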
Benchmark gains are substantial, even though GPT-5.4 is still ahead on a major chart
Cursor’s published results show clear improvements over prior Composer models. The company lists Composer 2 at 61.3 on CursorBench, 61.7 on Terminal-Bench 2.0, and 73.7 on SWE-Bench Multilingual.
That compares with Composer 1.5 at 44.2, 47.9, and 65.9, and Composer 1 at 38.0, 40.0, and 56.9.
The release is also more measured than some model launches, as Cursor is not claiming universal leadership.
On Terminal-Bench 2.0, which measures how well an AI agent functions in a command-line, terminal-style interface, GPT-5.4 is still ahead at 75.1, while Composer 2’s 61.7 puts it ahead of Opus 4.6 at 58.0, Opus 4.5 at 52.1, and Composer 1.5 at 47.9.
This makes Cursor’s pitch more practical and arguably more useful to buyers. The company isn’t saying Composer 2 is the best model at everything. Rather, the model has moved to a more competitive quality level while offering more attractive economics and stronger integration with the product developers are already using.
Cursor has also included a performance-versus-cost chart for its CursorBench benchmarking suite that appears designed to make a Pareto-style argument for Composer 2.
In that graphic, Composer 2 sits at a stronger performance-per-cost point than Composer 1.5 and compares favorably with the higher-cost GPT-5.4 and Opus 4.6 configurations Cursor shows. The company’s message is not only that Composer 2 scores higher than its predecessor, but that it may offer a more efficient cost-to-intelligence tradeoff for everyday coding work inside Cursor.
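As a back-of-the-envelope way to see that tradeoff (this is not Cursor’s methodology), one can blend the published input and output prices under an assumed traffic mix and divide by the CursorBench score. The 3:1 input-to-output token ratio below is an assumption, not a published figure:

```python
# Published numbers: (input $/M tokens, output $/M tokens, CursorBench score)
MODELS = {
    "composer_1_5": (3.50, 17.50, 44.2),
    "composer_2":   (0.50,  2.50, 61.3),
}

INPUT_SHARE = 0.75  # assumed 3:1 input:output token mix, not a published figure

for name, (inp, outp, score) in MODELS.items():
    blended = INPUT_SHARE * inp + (1 - INPUT_SHARE) * outp  # $ per 1M tokens
    print(f"{name}: ${blended:.2f}/M blended, "
          f"{blended / score:.3f} $/M per benchmark point")
# composer_1_5: $7.00/M blended, 0.158 $/M per benchmark point
# composer_2:   $1.00/M blended, 0.016 $/M per benchmark point
```

Under that assumed mix, Composer 2 delivers roughly ten times as many benchmark points per blended dollar as Composer 1.5, which is the shape of the argument the chart is making.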
Why the “locked to Cursor” point matters to buyers
For readers deciding whether to use Composer 2, the most important question may not be benchmark performance at all, but whether they want a model customized for Cursor’s own product experience.
That can be a strength. According to the documentation, Composer 2 can access Cursor’s agent tool stack, which includes semantic code search, file and folder search, file read, file editing, shell commands, browser controls, and web access.
If the goal is to accomplish real software tasks rather than produce impressive one-shot answers, then that kind of integration may be more valuable than raw model quality alone.
But it also limits the addressable audience. Teams looking for models they can deploy broadly across external tools and platforms should recognize that Cursor is presenting Composer 2 as a model for Cursor users, not as a generally available standalone foundation model.
The big picture: Cursor is making an operational argument
The significance of Composer 2 isn’t that Cursor has suddenly taken the top spot on every coding benchmark. It has not. The more important point is that Cursor is making an operational argument: its model is getting better, its price is low enough to encourage widespread use, and its fast tier is responsive enough that the company is comfortable making it the default despite the higher cost.
This combination may resonate with engineering teams that care less about abstract model reputation and more about whether an assistant can remain useful over long coding sessions without becoming excessively expensive.
Cursor’s broader pricing structure helps explain the competitive pressure surrounding this launch. On its current pricing page, Cursor offers a free Hobby tier, a Pro plan at $20 per month, Pro+ at $60 per month, and Ultra at $200 per month for individual users, with higher tiers granting more usage of models from OpenAI, Anthropic, and Google.
On the business side, Teams costs $40 per user per month, while Enterprise is custom-priced and adds pooled usage, centralized billing, usage analytics, privacy controls, SSO, audit logs, and granular admin controls. In other words, Cursor isn’t just charging for access to the coding model. It’s charging for a managed application layer that sits on top of multiple model providers while adding team features, administration, and workflow tooling.
That model is increasingly under pressure as first-party AI companies move deeper into coding. OpenAI and Anthropic are no longer just selling models through third-party products; they are also shipping their own coding interfaces, agents, and evaluation frameworks, such as Codex and Claude Code, raising the question of how much room is left for an intermediary platform.
Commenters on social media have also weighed in. Some of those posts describe frustration with Cursor’s pricing, context loss, or editor-centric experience, while praising Claude Code as a more direct and fully agentic way of working. Even if treated cautiously, that kind of social commentary points to the strategic problem Cursor faces: it must prove that its integrated platform, team controls, and now its own in-house models add enough value to justify sitting between developers and model makers’ increasingly capable coding products.
That makes Composer 2 strategically important for Cursor.
By offering a much cheaper in-house model than Composer 1.5, tuning it tightly to Cursor’s own tool stack, and making the faster version the default, the company is trying to show that it offers more than a wrapper around an external system.
The challenge is that as first-party coding products improve, developers and enterprise buyers may increasingly ask whether they want a separate AI coding platform, or whether model makers’ own tools are sufficient on their own.