
Today, Chinese AI startup Z.ai (formerly Zhipu AI) announced the immediate release of GLM-5.2, a 753 billion parameter open-weight Large Language Model (LLM) specifically engineered to dominate. "long-horizon" Autonomous coding and engineering work.
Available immediately on Hugging Face, the Z.ai API, and over 20 third-party coding environments, this model boasts a highly stable 1-million-token reference window, along with enterprise subscription tiers starting at just $12.60 per month.
In excellent news for cost- and security-conscious businesses, z.ai has released the core weights of GLM-5.2 under an unrestricted MIT open-source license, allowing enterprises to freely download the model from Hugging Face, customize or fine-tune it to their liking, and run it locally or via virtual machines for only their computation and power costs.
It’s an increasingly attractive option for enterprises, as cutting-edge US proprietary models face an uncertain and potentially disrupted regulatory future, after the Trump administration’s export control directive last week barred foreign nationals from using Anthropic’s new Cloud Fable 5 models (in response to which the company took the models offline entirely). All user).
For enterprise technology decision makers, z.ai’s GLM-5.2 provides a highly capable path to hosting frontier-level AI locally, completely bypassing geographic fences and commercial limitations.
IndexShare reuses one indexer for each of the four sparse attention layers, reducing the computation required.
Under the hood, GLM-5.2 operates with 753 billion parameters and introduces a major architectural optimization called "indexshare".
In standard large language models, recomputing attention mechanisms in long documents is computationally prohibitive. IndexShare solves this by reusing the same indexer in each of the four sparse attention layers.
At a maximum 1-million-token reference length, this single innovation reduces per-token computation FLOPs by a massive 2.9 times.
The model also includes an advanced multi-token prediction (MTP) layer for speculative decoding, which increases the accepted token length during inference by 20%.
Additionally, Z.ai has implemented flexible, selectable "ways of thinking". Users can toggle the model’s reasoning effort between "max," Designed to push the limits of logical problem-solving, or "High," Which strikes a careful balance between high-level performance and latency-sensitive token efficiency.
State-of-the-art standards for an open model, and matching, even surpassing, proprietary leaders in some categories
On industry-standard third-party benchmark tests, GLM-5.2 performs above most open source flagship models, even DeepSeek v4, and scores close to or above its close-weighted rivals, OpenAI’s GPT-5.5 and Anthropic’s Cloud Opus 4.8.
The model particularly shines in agentic tool use and long-horizon software engineering tasks:
- SWE-Bench Pro: GLM-5.2 scored 62.1, decisively outperforming GPT-5.5 (58.6) and its predecessor GLM-5.1 (58.4).
-
FrontierSWE (Dominance): Designed to test long-horizon task completion, GLM-5.2 hit 74.4%, beating GPT-5.5 (72.6%) and finishing in a near tie with Cloud Opus 4.8 (75.1%).
-
MCP-Atlas: On this device-usage evaluation, GLM-5.2 achieved 77.0, beating GPT-5.5 (75.3) and slightly underperforming Cloud Opus 4.8 (77.8).
-
The Final Test of Humanity (Including Equipment): When equipped with external tools, GLM-5.2 reached a score of 54.7, ahead of GPT-5.5 (52.2) and well behind Cloud Opus 4.8 (57.9).
-
PostTrainBench and SWE-Marathon: On extended, multi-hour engineering workloads, GLM-5.2 consistently topped GPT-5.5, scoring 34.3% versus GPT-5.5’s 25.0% on PostTrainBench and 13.0% versus GPT-5.5’s 12.0% on SWE-Marathon.
While GLM-5.2 lags slightly behind Cloud Opus 4.8 and GPT-5.5 on raw Terminal-Bench 2.1 scores (81.0 vs. 85.0 and 84.0, respectively), it is well ahead of Google’s Gemini 3.1 Pro (74.0).
Beyond traditional coding metrics, GLM-5.2 achieved an impressive first place on the crowdsourced design task benchmark Design Arena, even surpassing the aforementioned state-of-the-art Cloud Fable 5 with an ELO score of 1360.
In addition, the impact of Z.ai’s new selection "ways of thinking" Clearly visible in the data: under "maximum" At the effort level, GLM-5.2 promotes peak intelligence, but uses approximately 85k output tokens per task. being switched on "High" The effort setting sacrifices only a few points in performance while effectively halving the required token output, providing an important optimization lever for latency-sensitive applications.
Available via coding plans and API
To operationalize the model, Z.ai launched the GLM coding scheme, which is aimed at developer workflows rather than simple chat interfaces.
This plan provides out-of-the-box support for third-party US and global agentive coding harnesses and tools including Cloud Code, OpenClave, Kline, Kilo Code, Crush, and Factory. Coding plan pricing tiers (when billed annually) are highly competitive:
- light: $12.60 per month ($151.20 per year starting in year 2), ready for light iteration on small repositories.
-
Pro: $50.40 per month for daily development on a medium-sized repository, offers 5x the usage allowance of the Lite plan.
-
Maximum: $112.00 per month for heavy workloads, offering 20x light usage and dedicated resources during peak hours.
For enterprise developers integrating the raw models into their own applications, Z.ai’s API pricing significantly undercuts its Western competitors while matching the exact rates of the previous GLM-5.1 generation.
GLM-5.2 API access The price is $1.40 per million input tokens and $4.40 per million output tokensWhich makes it a mid-priced model globally, but about
VentureBeat Frontier AI Model API Pricing Snapshot
Sorted by total cost (input + output) from least to most expensive. The price shown is standard pay-as-you-go pricing per 1 million tokens.
| Sample |
input |
Production |
total cost |
Source |
|
MIMO-V2.5 Flash |
$0.10 |
$0.30 |
$0.40 |
xiaomi mimo |
|
deepseek-v4-flash |
$0.14 |
$0.28 |
$0.42 |
deepseek |
|
deepseek-v4-pro |
$0.435 |
$0.87 |
$1.305 |
deepseek |
|
minimax-m3 |
$0.30 |
$1.20 |
$1.50 |
minimal maximum |
|
Gemini 3.1 Flash-Lite |
$0.25 |
$1.50 |
$1.75 |
|
|
Qwen3.7-plus |
$0.40 |
$1.60 |
$2.00 |
alibaba cloud |
|
MIMO-V2.5 |
$0.40 |
$2.00 |
$2.40 |
xiaomi mimo |
|
Grok 4.3 (following reference) |
$1.25 |
$2.50 |
$3.75 |
xai |
|
MiMo-V2.5 Pro(≤256K) |
$1.00 |
$3.00 |
$4.00 |
xiaomi mimo |
|
KM-K2.6 |
$0.95 |
$4.00 |
$4.95 |
moonshot/km |
|
GLM-5.2 |
$1.40 |
$4.40 |
$5.80 |
Z.ai |
|
Grok 4.3 (High Reference) |
$2.50 |
$5.00 |
$7.50 |
xai |
|
MiMo-V2.5 Pro (>256K) |
$2.00 |
$6.00 |
$8.00 |
xiaomi mimo |
|
Quen3.7-Max |
$2.50 |
$7.50 |
$10.00 |
alibaba cloud |
|
gemini 3.5 flash |
$1.50 |
$9.00 |
$10.50 |
|
|
Gemini 3.1 Pro Preview (≤200K) |
$2.00 |
$12.00 |
$14.00 |
|
|
GPT-5.4 |
$2.50 |
$15.00 |
$17.50 |
OpenAI |
|
Gemini 3.1 Pro Preview (>200K) |
$4.00 |
$18.00 |
$22.00 |
|
|
cloud opus 4.8 |
$5.00 |
$25.00 |
$30.00 |
anthropic |
|
GPT-5.5 |
$5.00 |
$30.00 |
$35.00 |
OpenAI |
|
Cloud Mythos 5 |
$10.00 |
$50.00 |
$60.00 |
anthropic |
To further optimize costs for long-context workloads, Z.ai offers a cached input rate of only $0.26 per million tokens, along with a limited-time offer for free cached input storage.
The clear distinction between open-source innovators and proprietary Western laboratories has not gone unnoticed by the developer community.
On X, prolific AI observer Lisan Al Gaib (@scaling01) argued this "Frontier Labs is flat out scamming you on API pricing".
The post notes that while large open models like the 744-billion-parameter GLM-5.2 charge $4.40 per million output tokens and DeepSeq-v4-Pro (1.6 trillion parameters) charges only $0.87, proprietary models demand hefty premiums: Anthropic’s Sonnet 4.6 and Opus 4.8 charge $15.00 and $25.00, respectively, while OpenAI’s GPT-5.5 charges. The cost for output is $30.00.
Highlighting how open-model developers are working profitably without relying on the latest "fancy blackwell chips," The commenter suggested that major proprietary laboratories are "Probably at 90%+ margin at this point".
The beauty of the unmodified MIT license for enterprise use
The most disruptive aspect of the GLM-5.2 release is its licensing. Z.ai released the weight model under the MIT open-source license, establishing it as a "pure open" System.
It is clearly written in the company’s technical documentation that this license guarantees "no regional limits" and allows "technological access without borders".
For enterprise technology leaders, the MIT license means that software can be used, modified, and commercialized without paying royalties or following restrictive regulations. "acceptable use" General governance policies for dual-use licenses.
This allows engineering teams to host frontier-level AI on their own sovereign infrastructure, completely eliminating vendor lock-in.
Warm welcome among AI developers and tool makers
Developer response to the release has been immediate and overwhelmingly positive.
The team behind Kilo Code confirmed the integration on day one, posting on X: "Runs in GLM-5.2 kg code on day one. Both the 1M reference window and maximum effort mode are live. Point your configuration at it and go!".
The open-source coding environment Kline IDE echoed this sentiment on X with economic benefits in mind: "GLM-5.2 is the first open-weight model to exceed 80% on the terminal bench, and outperforms all other open models available. It even outperforms the Gemini, making it a flagship level model for a fraction of the cost. Open Weights is back. This model is a game changer. Available online now!".
Similarly, rival open source coding desktop agent Agent AI also tested the model’s new capabilities on complex agentic workflows, with a focus on X: "Presented a real long-term task: Research 30 companies in 6 areas of the AI infrastructure stack, structure it in JSON, then create an interactive HTML report… Where 5.2 leads: -> Plans…".
<a href