
Alibaba this week released Qwen3.7-Plus, the latest AI Large Language Model (LLM) in the globally beloved and rapidly expanding Qwen family, with greater multimodal capabilities and 60% lower cost than the text-only Qwen3.7-Max model released just a few weeks ago.
However, like its predecessor Qwen3.7-Plus is only available under a "Closed" Commercial licenses through proprietary application programming interfaces (APIs) and QuenChat.
This marks a major departure from Quoin’s strategy so far, which focused primarily on issuing powerful near cutting-edge open source models. Those enterprises and users who relied on the open source Qween model – including US giants like Airbnb – will undoubtedly be disappointed to see that Alibaba is closing in on its new releases.
Still, the model is worth a look because of its low cost and high performance on multimodal tasks like creating enterprise-grade visuals or analyzing video, imagery, and screenshots, which Qwen3.7-Max can’t do (it’s text-only). It’s one of the cheaper powerful AI models available now, coming in just above the limited-time discount price of Chinese rival’s new MiniMax-M3.
VentureBeat Frontier AI Model API Pricing Snapshot
| Sample |
input |
Production |
total cost |
Source |
|
MIMO-V2.5 Flash |
$0.10 |
$0.30 |
$0.40 |
xiaomi mimo |
|
deepseek-v4-flash |
$0.14 |
$0.28 |
$0.42 |
deepseek |
|
deepseek-v4-pro |
$0.435 |
$0.87 |
$1.305 |
deepseek |
|
minimax-m3 |
$0.30 |
$1.20 |
$1.50 |
minimal maximum |
|
Qwen3.7-plus |
$0.40 |
$1.60 |
$2.00 |
alibaba cloud |
|
Gemini 3.1 Flash-Lite |
$0.25 |
$1.50 |
$1.75 |
|
|
MIMO-V2.5 |
$0.40 |
$2.00 |
$2.40 |
xiaomi mimo |
|
Grok 4.3 following reference |
$1.25 |
$2.50 |
$3.75 |
xai |
|
GLM-5 |
$1.00 |
$3.20 |
$4.20 |
Z.ai |
|
KM-K2.6 |
$0.95 |
$4.00 |
$4.95 |
moonshot/km |
|
GLM-5.1 |
$1.40 |
$4.40 |
$5.80 |
Z.ai |
|
grok 4.3 high reference |
$2.50 |
$5.00 |
$7.50 |
xai |
|
Quen3.7-Max |
$2.50 |
$7.50 |
$10.00 |
alibaba cloud |
|
gemini 3.5 flash |
$1.50 |
$9.00 |
$10.50 |
|
|
Gemini 3.1 Pro Preview ≤200K |
$2.00 |
$12.00 |
$14.00 |
|
|
GPT-5.4 |
$2.50 |
$15.00 |
$17.50 |
OpenAI |
|
Gemini 3.1 Pro Preview >200K |
$4.00 |
$18.00 |
$22.00 |
|
|
cloud opus 4.8 |
$5.00 |
$25.00 |
$30.00 |
anthropic |
|
GPT-5.5 |
$5.00 |
$30.00 |
$35.00 |
OpenAI |
Maintaining continuity during complex device execution loops
For technical decision makers deploying autonomous agents, the primary hurdle has rarely been initial model intelligence. Instead, it is phase decay– The tendency of an agent framework to lose its analytical trajectory on multi-step, long-horizon tasks.
Qwen3.7-Plus addresses this architectural vulnerability through a combined approach of context management and logic state protection.
The model comes with a 1-million token reference window And allocates 256K tokens specifically for internal chain-idea processing. To contextualize this capability, imagine an automated cloud migration agent: it can ingest entire codebases, map dependencies, and spend thousands of tokens to silently evaluate edge cases before executing a single line of Bash script.
Importantly, the API exposes a parameter called ‘preserve_thinking.’ In Alibaba’s ecosystem, the capability serves as a standardized architectural bridge rather than a tiered perk. Alibaba introduced this feature during the previous Qwen 3.6 generation, integrating it into both the open-weight Qwen3.6-27B and the proprietary Max model.
At its core, parameters operate at the API and template level to maintain internal <think> Constantly blocks conversational turns.
This structural continuity solves a significant hurdle for developers engineering long-horizon functions. By keeping these internal logic loops intact, the feature prevents the model from dropping its context or unnecessarily recomputing its cached history in the middle of an operation.
When a model performs complex, multi-step agentic coding assignments, this retention allows the system to hold onto its original train of thought without losing the plot or forgetting the underlying logic of its previous actions.
Alibaba is not alone in recognizing this technical need, as the underlying concept now determines the architecture of almost all major artificial intelligence laboratories.
Anthropic deploys this exact capability under the alias "extended thinking" For its advanced models, including its latest Cloud Opus 4.8. This framework requires developers to feed unmodified thinking blocks directly into the API at subsequent turns to maintain an unbroken chain of logic.
OpenAI tackles the same challenge through an encrypted reasoning pass-back mechanism for models like GPT-5.5. Within the OpenAI ecosystem, developers must return specific logic items generated with previous function calls, ensuring that the model clearly remembers the logic behind its tool execution.
At the end, preserve_thinking simply represents Alibaba’s terminology for what has increasingly become unquestioned table stakes for modern multi-turn reasoning.
Benchmarks show a competitive, yet cutting-edge model
On raw efficiency metrics, this deep-thinking architecture translates to structural gains in multimodal and agentic benchmarks. However, it still lags behind many leading and previous generations of US-owned models, such as Anthropic’s Cloud Opus 4.6 and OpenAI’s GPT-5.4.
But Terminal Bench 2.0-TerminusWhich measures the ability of a model to safely and iteratively run real terminal-level code, Qwen3.7-Plus scored 70.3DeepSeek-v4-Pro outperforms Max (67.9) and Gemini-3.1 Pro (63.5).
On computer vision benchmarks that demand localized interface understanding, e.g. ScreenSpot Promodel hit 79.0Far outpacing old industry standouts like GPT-5.4 (xhigh) at 67.4 and cloud-opus-4.6 at 49.5. Agent evaluation metrics (selected benchmarks)
What should enterprises consider Qwen3.7-Plus?
For an enterprise architect, the main question when analyzing Qwen3.7-Plus is obvious: What does it replace in our current technology stack?
The model is designed as a direct replacement for premier frontier models (such as GPT-5-tier or cloud-max-tier models) within high-frequency developer workflow, robotic process automation (RPA), and data engineering pipelines.
Instead of deploying an expensive, general-purpose flagship model to handle repetitive system operations, technical teams can route these tasks to the Qwen3.7-Plus. It handles visual interface interpretation, command execution, and code generation simultaneously.
Alibaba has structured its API delivery to align with existing open-source and proprietary enterprise frameworks. Endpoints are fully OpenAI-compatible, meaning changing existing dependencies requires minimal infrastructure adjustments. For groups leveraging the Autonomous Terminal framework, integration is natively supported across multiple environments.
Engineers can run Qwen3.7-Plus directly through their local terminal setup by changing the base environment targets.
From a pure cost perspective, running an agent framework that constantly references large-scale code repositories or visual layout history can quickly become cost-prohibitive.
Alibaba solves this by exposing granular caching price points.
Standard input processing sits at $0.40 per million tokens, but if the agent is reading from an explicitly created cache (for example, a huge base repository or standard enterprise UI kit that remains static over hundreds of automated loops), the cost drops sharply to $0.04 per 1M tokens for subsequent reads.
This level makes high-frequency, multi-turn agent iterations economically practical at enterprise scale.
No one raises open source licenses or compliance questions for open source enterprises
When evaluating any model in the Kween ecosystem, the primary concern for legal and security teams is the licensing framework and operational limits of the data pipeline.
While previous iterations of the Qween family gained significant enterprise traction through fully open-source weighted availability under Apache 2.0 or customized open-use licenses, Qween 3.7-Plus is distributed strictly as a managed, commercial cloud API through Alibaba Cloud Model Studio. For enterprise risk management, this distinction has specific implications:
- no local load deployment:Organizations cannot download, sandbox, or locally host Qwen3.7-Plus weights within their fully air-gapped internal data centers. All data validation, visual processing, and execution calls must go through Alibaba Cloud’s international endpoints (for example, the Singapore example highlighted in the developer documentation).
-
Compliance and sovereignty: Because the model requires cloud-based inference, companies operating under strict sovereign data limitations (such as healthcare entities subject to local HIPAA/GDPR constraints or defense contractors) must explicitly evaluate whether external API routing complies with their specific data-residency obligations.
-
managed risk mitigation: In contrast, a managed API architecture removes the internal infrastructure burden of provisioning, optimizing, and maintaining a multi-GPU cluster (such as dedicated Nvidia H100 arrays) just to host an internal agent network.
Nevertheless, Qwen3.7-Plus provides high intelligence in all modalities at low cost
The initial reception from developer communities and tech venture capital highlights the changing economics of agent deployment.
Leading industry voice and Web3 venture capitalist @boxmining highlights the strategic cost benefits, saying:
"Being 40% cheaper than the Queue 3.7 Plus Max changes the conversation. If the output is close enough for most coding and strong enough for visual workflows, do you really need Max every day or only for heavy terminal-only jobs?"
This perspective aligns with the current trend to optimize enterprise operating budgets: moving away from raw, uncontrolled calculations toward targeted task automation. Also, specialized researchers within the ecosystem point out that this is not just incremental optimization of text generation.
Dunjie Lu, research intern at Alibaba Quan, commented:
"It shows clear advantages over Qwen3.6-Plus in computer-use capabilities, with strong generalization beyond typical desktop tasks into professional workflows such as data engineering and scientific research."
Ultimately, for enterprise buyers deciding on their next infrastructure roadmap, Qwen3.7-Plus presents a practical option. If your organization’s primary objective is to create flexible, visualization-enabled autonomous software loops that interact directly with developer environments and cloud consoles – without expending your estimation budget – model execution provides a compelling reason to shift away from more expensive Frontier options.
<a href