Alibaba's proprietary Qwen3.7-Max can run for 35 hours autonomously and supports external harnesses like Anthropic's Claude Code

ChatGPT Image May 21 2026 07 29 11 PM
AI industry has fully entered "agent era," A paradigm where AI models do more than generate text – they now actively plan, execute, and course-correct complex tasks in days instead of seconds.

Thus, it’s perhaps not surprising to see Chinese e-commerce giant Alibaba’s renowned Kwen team of AI researchers release a model capable of performing autonomous agentic AI tasks over multiple days: That model comes in the form of Kwen3.7-Max, the company reported in a blog post. "~35 hours of continuous autonomous performance" – albeit, in a proprietary format, not an open source format as in prior Quen Team releases.

This is to be expected – this is what many analysts and industry experts feared after the departure of several key leaders from the Quon team earlier this year. But it makes economic sense for Alibaba, at least in the short term: training AI models, especially powerful ones like QW3.7-Max, is expensive, and giving them away essentially for free, as open source models are, doesn’t immediately help recoup any costs.

In that sense, Alibaba is aligning its efforts with US AI giants like OpenAI and Google by only offering the latest and greatest models through paid APIs and subscriptions or paid web plan bundles, and slightly less performing models through open source.

Still, the arrival of Qwen3.7-Max provides more optionality for enterprises and individual users, and more competition for US AI labs – hardly a bad thing for consumers at all budget levels. Still, the fact that the model is only accessible from Chinese-based endpoints means that its appeal may be limited to US and European enterprises seeking to maximize compliance and security when completing government contracts, or even attempting to comply with all relevant state, local and national data sovereignty regulations.

marathon ai era

To understand why Qwen3.7-Max is different from previous models, one needs to look at how it was trained and how it works in practice.

Language models generally degrade when they are forced to maintain the same chain of thought over thousands of conversational turns; They forget instructions, hallucinate, or simply get stuck in logical dilemmas. Qwen3.7-Max was specially designed "Versatile Agent Foundation" able to "long-horizon logic" To overcome this exact obstacle.

The clearest demonstration of this capability is an autonomous engineering work detailed by the Quon team. The model was given access to a separate server equipped with a T-Head ZW-M890 PPU – a hardware architecture that the model had never encountered during its training. Its function was to optimize the focus kernel.

During 35 continuous hours, the Qwen3.7-Max operated completely autonomously. It executed 1,158 separate tool calls, performed 432 kernel evaluations, diagnosed compilation failures, and iteratively improved the code to achieve a 10.0x geometric mean speedup.

By comparison, Chinese competing models like z.ai’s GLM-5.1 and Moonshot’s Kimi K2.6 finished at 7.3x and 5.0x speedups, respectively, often voluntarily ending their sessions when they failed to make progress. However, both are available open source.

This endurance is achieved through what Alibaba calls "environmental scaling". Just as early LLMs got smarter by incorporating more diverse text, QWEN3.7-Max was trained on a vast, scaled range of dynamic agentic environments.

It is able to simulate the one-year lifecycle of a startup "yc-bench" Navigating hundreds of decision-making rounds related to evaluation, personnel management, and contract screening. In this simulation, the model managed to generate $2.08 million in virtual revenue, which almost doubles the performance of the previous generation, Quen3.6-Plus.

Additionally, the model has built-in reward-hacking self-monitoring, autonomously detecting when it attempts to cheat the training environment and adding heuristic rules to correct its own behavior.

A brain for any loft

From a product perspective, Qwen3.7-Max is designed as a cognitive engine for modern software development and enterprise automation.

The model offers a massive 1-million-token context window and 64K maximum output limit, which provides excessive overhead for processing huge codebases or long technical documents.

One of its most compelling features is "Cross-harness normalization". Rather than being hardcoded to work best within a specific proprietary interface, Qwen3.7-Max is built to act as a drop-in intelligence layer for diverse agent frameworks. it Supports Anthropic API protocols natively, Allowing developers Plug it directly into existing tools like Cloud Code or OpenCloud.

Benchmark data provided by Alibaba indicates that this generalized approach has paid massive dividends.

On Apex Math Reasoning BenchmarkQueue3.7-max scored 44.5, which beats Cloud Opus-4.6 max’s score of 34.5. And DeepSeek V4-Pro Max’s 38.3. Also posted Leading scores on Humanities Last Exam (41.4) and the realistic coding agent benchmark MCP-Atlas (76.4).

This translates into tangible utility for end users. Through open source Model Context Protocol (MCP) integration, the model can serve as an autonomous office assistant, capable of reading university formatting specifications and automatically reformatting a messy Word document via command-line tools without human intervention.

There is a distinct cost to running this level of intelligence. Developers accessing the API through Alibaba Cloud Model Studio will pay $2.50 per 1 million input tokens and $7.50 per 1 million output tokens. The platform also includes explicit cache build and read pricing, as well as a $10 fee per 1,000 calls for integrated web searches, although the code interpreter tools remain free for a limited time.

Qwen3.7-Max occupies a strategic middle path in the current API economy. While it demands a notable premium over its aggressively priced domestic rivals – costing almost twice as much as the DeepSeek V4 Pro ($5.22) and Z.ai’s GLM-5.1 ($5.80) – it undercuts the Western range stalwarts enough that it regularly matches the benchmarks.

For context, running heavy agentive workflows through OpenAI’s GPT-5.4 or Anthropic’s Cloud Opus 4.7 will run developers $17.50 and $30.00 per million tokens, respectively. See VentureBeat’s pricing chart below:

VentureBeat Frontier AI Model API Pricing Snapshot

Sample

input

Production

total cost

Source

MIMO-V2.5 Flash

$0.10

$0.30

$0.40

xiaomi mimo

Minimax M2.7

$0.30

$1.20

$1.50

minimal maximum

Gemini 3.1 Flash-Lite

$0.25

$1.50

$1.75

Google

MIMO-V2.5

$0.40

$2.00

$2.40

xiaomi mimo

KM-K2.6

$0.95

$4.00

$4.95

moonshot/km

GLM-5

$1.00

$3.20

$4.20

Z.ai

Grok 4.3 (following reference)

$1.25

$2.50

$3.75

xai

DeepSeek V4 Pro

$1.74

$3.48

$5.22

deepseek

GLM-5.1

$1.40

$4.40

$5.80

Z.ai

cloud haiku 4.5

$1.00

$5.00

$6.00

anthropic

Grok 4.3 (High Reference)

$2.50

$5.00

$7.50

xai

Quen3.7-Max

$2.50

$7.50

$10.00

alibaba cloud

gemini 3.5 flash

$1.50

$9.00

$10.50

Google

Gemini 3.1 Pro Preview (≤200K)

$2.00

$12.00

$14.00

Google

GPT-5.4

$2.50

$15.00

$17.50

OpenAI

Gemini 3.1 Pro Preview (>200K)

$4.00

$18.00

$22.00

Google

cloud opus 4.7

$5.00

$25.00

$30.00

anthropic

GPT-5.5

$5.00

$30.00

$35.00

OpenAI

By placing the Qwen3.7-Max just below Google’s Gemini 3.5 flash ($10.50) but well above the budget-level model, Alibaba is signaling that this is not a commodity release; It’s a leading reasoning engine designed to power enterprise workloads away from Silicon Valley’s most expensive offerings.

Licensing remains proprietary

Despite all its technical brilliance, the most controversial aspect of Qwen3.7-Max is how it is distributed. This is how Quen is billing the release "proprietary model". It’s completely API only.

Historically, Alibaba’s Quan has been a hero to the open-source and local LLM communities. Previous iterations such as Qween 2.5 and Qween 3.6 had their weights publicly released. Open Weight allows developers, researchers, and enterprises to download the model, run it on their own hardware, and fine-tune it for highly specific or data-sensitive use cases without sending proprietary information to a third-party server.

By locking Qwen3.7-Max behind an API, Alibaba is moving toward the standard commercial playbook used by OpenAI (with GPT-4) and Anthropic (with the cloud). For enterprise users, this means using Qwen3.7-Max requires trusting Alibaba Cloud with their data streams and relying solely on Internet connectivity to run their agentic workflows. For the open-source community, this means losing access to what is currently one of the most capable models on the planet.

Community reactions divided between astonishment and disappointment

The reaction from the developer community has been intense, with a mixture of deep respect for the engineering achievement and disappointment at the licensing model.

Prominent AI commentator Sudo Su (@sudoingX) captured the prevailing sentiment on "Quen is unrealistic," He has written. "They just dropped 3.7 max and it’s beating opus 4.6 max on most of the benchmarks they run".

The technical metrics, especially the endurance of the model, have stunned many in the field. "The highest math number, 44.5 versus Opus 34.5, is not a small difference," Sudo su noted. "The 35 hours straight on kernel optimization work with 1000+ tool calls is the part I keep re-reading. This is actually an Agent Age event, no slide".

The speed of Alibaba’s iteration is also attracting attention. With Queue 3.6 released just last month, the jump to 3.7-max highlights the continued development momentum. As Sudo Su observed, "Nobody’s moving like this".

Yet, this appreciation is largely overshadowed by changes in the closed ecosystem. The reduction in model weight is seen as a blow to the localized AI movement, which relies on cutting-edge open models to push the boundaries of what can be done on consumer hardware or private enterprise clusters.

"One thing though, please open source this too," Sudo Su pleaded in his post. "3.6 Intensification improved the entire local LLM ecosystem. Max Tier going API will only close the door we are keeping open. finally give us weight".

Quen3.7-Max proves that the autonomous agent era is no longer a theoretical projection; It is a current reality that humans are able to carry out complex engineering tasks while they sleep. The only question now is whether this new frontier of AI will be a democratized resource you can download to your laptop, or an intelligence utility rented from the cloud. For now, with Qwen3.7-Max, it’s unquestionably the latter.



<a href

Leave a Comment