z.ai's open source GLM-5 achieves record low hallucination rate and leverages new RL 'slime' technique

Chinese AI startup Zhipu AI, aka Z.ai, is back this week with a fascinating new frontier large language model: GLM-5.

The latest in Z.ai’s ongoing and increasingly impressive GLM series, it retains an open source MIT license – perfect for enterprise deployment – and, in one of many notable achievements, posts a record-low hallucination rate on the independent Artificial Analysis Intelligence Index v4.0.

With a score of -1 on the AA-Omniscience Index – a massive 35-point improvement over its predecessor – GLM-5 now leads the entire AI industry, including US rivals Google, OpenAI, and Anthropic, in knowledge reliability by knowing when to abstain rather than fabricate information.

Beyond its reasoning capabilities, GLM-5 is designed for high-utility knowledge tasks. It has native "agent mode" capabilities that allow it to convert raw inputs or source material directly into professional office documents, including ready-to-use .docx, .pdf, and .xlsx files.

Whether preparing detailed financial reports, high school sponsorship proposals, or complex spreadsheets, GLM-5 delivers results in real-world formats that integrate directly into enterprise workflows.

It is also priced at approximately $0.80 per million input tokens and $2.56 per million output tokens, roughly 6x cheaper on input than proprietary competitors like Claude Opus 4.6, making cutting-edge agent engineering more cost-effective than ever. Here’s what else enterprise decision makers should know about the model and its training.

Technology: Scaling for Agentic Efficiency

At the core of GLM-5 is a huge jump in raw parameters. The model scales from GLM-4.5’s 355B parameters to a staggering 744B parameters, with 40B active per token in its Mixture-of-Experts (MoE) architecture. This increase is supported by an additional 28.5T tokens of pre-training data.
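To make the "active parameters" idea concrete, here is a minimal top-k Mixture-of-Experts routing sketch in Python. It is purely illustrative: the expert count, dimensions, and top-k value are toy numbers, not GLM-5’s actual configuration.

```python
import numpy as np

def moe_forward(x, gate_w, experts, top_k=2):
    """Toy MoE layer: each token is routed to its top_k experts, so only a
    fraction of the layer's total parameters is active per token."""
    logits = x @ gate_w                                 # (tokens, n_experts) router scores
    top = np.argsort(logits, axis=-1)[:, -top_k:]       # chosen expert indices per token
    gate = np.take_along_axis(logits, top, axis=-1)
    gate = np.exp(gate - gate.max(-1, keepdims=True))
    gate /= gate.sum(-1, keepdims=True)                 # softmax over the chosen experts only
    out = np.zeros_like(x)
    for t in range(x.shape[0]):
        for j in range(top_k):
            out[t] += gate[t, j] * experts[top[t, j]](x[t])  # combine selected experts
    return out

# Toy usage: 8 experts, 2 active per token (GLM-5's real ratio is 744B total vs. 40B active).
rng = np.random.default_rng(0)
d, n_experts = 16, 8
experts = [(lambda v, W=rng.normal(size=(d, d)) * 0.1: v @ W) for _ in range(n_experts)]
gate_w = rng.normal(size=(d, n_experts))
tokens = rng.normal(size=(4, d))
print(moe_forward(tokens, gate_w, experts).shape)   # (4, 16)
```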

To address training inefficiencies at this scale, Z.ai developed "slime," a novel asynchronous reinforcement learning (RL) infrastructure.

Traditional RL often suffers from "long tail" bottlenecks, where the slowest trajectory in a batch holds up every other worker; Slime breaks this lockstep by allowing trajectories to be generated independently, enabling the rapid iteration necessary for complex agentic behavior.

By integrating system-level optimizations such as active partial rollout (APRIL), Slime addresses the generation bottlenecks that typically consume more than 90% of RL training time, significantly speeding up iteration cycles for complex agentic tasks.

The framework’s design focuses on a tripartite modular system: a high-performance training module powered by Megatron-LM, a rollout module using SGLang and custom routers for high-throughput data generation, and a centralized data buffer that manages rapid initialization and rollout storage.

By enabling adaptive verifiable environments and multi-turn compilation feedback loops, Slime provides the strong, high-throughput foundation needed to transition AI from simple chat interactions toward rigorous, long-horizon systems engineering.
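The sketch below is not Slime’s actual API, just a minimal Python illustration of the decoupling described above: rollout workers push trajectories into a shared buffer as they finish, and the trainer consumes whatever is ready, so one slow long-tail episode no longer stalls every other worker. Worker counts and timings are invented for the example.

```python
import queue
import random
import threading
import time

buffer = queue.Queue()          # central data buffer shared by rollout and training

def rollout_worker(worker_id):
    """Generate trajectories independently; slow episodes don't block other workers."""
    for episode in range(3):
        time.sleep(random.uniform(0.05, 0.4))   # simulate variable-length generation
        buffer.put({"worker": worker_id, "episode": episode, "reward": random.random()})

def trainer(total_expected):
    """Consume trajectories as they arrive instead of waiting for a full synchronous batch."""
    seen = 0
    while seen < total_expected:
        traj = buffer.get()                     # blocks only until *any* trajectory is ready
        seen += 1
        print(f"update {seen}: worker {traj['worker']} reward {traj['reward']:.2f}")

workers = [threading.Thread(target=rollout_worker, args=(i,)) for i in range(4)]
for w in workers:
    w.start()
trainer(total_expected=12)
for w in workers:
    w.join()
```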

To keep deployment manageable, GLM-5 integrates DeepSeek Sparse Attention (DSA), preserving its 200K-token context capacity while significantly reducing inference costs.
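DSA’s exact token-selection mechanism is not detailed here, so the snippet below is only a simplified sketch of the general idea behind sparse attention: each query attends to a small top-k subset of cached tokens rather than the full context. A production implementation would use a lightweight indexer instead of computing full scores as this toy version does.

```python
import numpy as np

def sparse_attention(q, k, v, top_k=64):
    """Attend to only the top_k most relevant cached tokens per query,
    cutting per-query cost from the full context length down to top_k."""
    scores = q @ k.T / np.sqrt(q.shape[-1])            # (n_q, n_kv) full scores (a real indexer avoids this)
    keep = np.argsort(scores, axis=-1)[:, -top_k:]     # strongest key indices per query
    out = np.empty_like(q)
    for i in range(q.shape[0]):
        s = scores[i, keep[i]]
        w = np.exp(s - s.max())
        w /= w.sum()                                   # softmax over the selected tokens only
        out[i] = w @ v[keep[i]]
    return out

# Toy usage: 4 queries over a 1,000-token cache, each attending to just 64 tokens.
rng = np.random.default_rng(1)
q = rng.normal(size=(4, 32))
k = rng.normal(size=(1000, 32))
v = rng.normal(size=(1000, 32))
print(sparse_attention(q, k, v).shape)   # (4, 32)
```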

End-to-end knowledge work

Z.ai is positioning GLM-5 as an "office" tool for the AGI era. Whereas previous models focused on snippets, GLM-5 is designed to deliver ready-to-use documents.

It can automatically convert documents ranging from financial reports to sponsorship proposals into formatted .docx, .pdf, and .xlsx files.

In practice, this means the model can decompose high-level goals into actionable subtasks and perform "agent engineering," where humans define the quality gates while the AI handles the execution.
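For teams that want to experiment, GLM-5 is listed on OpenRouter, which exposes an OpenAI-compatible API. The sketch below asks the model to decompose a reporting goal into subtasks; the model slug "z-ai/glm-5" and the prompts are assumptions to verify against the provider’s listing, and the native document-export features run through Z.ai’s own agent mode rather than a plain chat call like this one.

```python
# Minimal sketch against OpenRouter's OpenAI-compatible endpoint.
# The model slug "z-ai/glm-5" is an assumption; confirm it on openrouter.ai before use.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
)

response = client.chat.completions.create(
    model="z-ai/glm-5",  # hypothetical slug
    messages=[
        {"role": "system",
         "content": "Decompose the user's goal into numbered, verifiable subtasks."},
        {"role": "user",
         "content": "Prepare a Q4 financial report suitable for export as a formatted .docx."},
    ],
)
print(response.choices[0].message.content)
```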

High performance

According to benchmarking firm Artificial Analysis, GLM-5’s results make it the world’s most powerful open source model, surpassing Chinese rival Moonshot’s Kimi K2.5, released two weeks ago, and suggesting that Chinese AI companies have nearly caught up with their better-resourced proprietary Western rivals.

According to z.ai’s own materials shared today, GLM-5 is close to the state-of-the-art on several key benchmarks:

SWE-Bench Verified: GLM-5 achieved a score of 77.8, outperforming Gemini 3 Pro (76.2) and coming close to Claude Opus 4.6 (80.9).

Vending-Bench 2: In this simulation of running a small business, GLM-5 ranked #1 among open-source models with a final balance of $4,432.12.

Beyond performance, GLM-5 is aggressively undercutting the market. Live on OpenRouter as of February 11, 2026, it is priced at approximately $0.80–$1.00 per million input tokens and $2.56–$3.20 per million output tokens. That puts it in the mid-range among leading LLMs, but given its top-tier benchmark performance, it amounts to a steal.

| Model | Input (per 1M tokens) | Output (per 1M tokens) | Total cost (1M in + 1M out) | Source |
|---|---|---|---|---|
| Qwen3 Turbo | $0.05 | $0.20 | $0.25 | Alibaba Cloud |
| Grok 4.1 Fast (reasoning) | $0.20 | $0.50 | $0.70 | xAI |
| Grok 4.1 Fast (non-reasoning) | $0.20 | $0.50 | $0.70 | xAI |
| DeepSeek-Chat (V3.2-Exp) | $0.28 | $0.42 | $0.70 | DeepSeek |
| DeepSeek-Reasoner (V3.2-Exp) | $0.28 | $0.42 | $0.70 | DeepSeek |
| Gemini 3 Flash Preview | $0.50 | $3.00 | $3.50 | Google |
| Kimi K2.5 | $0.60 | $3.00 | $3.60 | Moonshot AI |
| GLM-5 | $1.00 | $3.20 | $4.20 | Z.ai |
| Ernie 5.0 | $0.85 | $3.40 | $4.25 | Baidu Qianfan |
| Claude Haiku 4.5 | $1.00 | $5.00 | $6.00 | Anthropic |
| Qwen3-Max (2026-01-23) | $1.20 | $6.00 | $7.20 | Alibaba Cloud |
| Gemini 3 Pro (≤200K) | $2.00 | $12.00 | $14.00 | Google |
| GPT-5.2 | $1.75 | $14.00 | $15.75 | OpenAI |
| Claude Sonnet 4.5 | $3.00 | $15.00 | $18.00 | Anthropic |
| Gemini 3 Pro (>200K) | $4.00 | $18.00 | $22.00 | Google |
| Claude Opus 4.6 | $5.00 | $25.00 | $30.00 | Anthropic |
| GPT-5.2 Pro | $21.00 | $168.00 | $189.00 | OpenAI |

That is about 6 times cheaper on input and about 10 times cheaper on output than Claude Opus 4.6 ($5/$25). The release also confirms rumors that Zhipu AI was behind "Pony Alpha," a stealth model that previously topped coding benchmarks on OpenRouter.
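Those multiples can be checked directly from the per-token prices quoted above, using the lower end of GLM-5’s range:

```python
glm5_in, glm5_out = 0.80, 2.56    # GLM-5, $ per 1M tokens (lower end of the quoted range)
opus_in, opus_out = 5.00, 25.00   # Claude Opus 4.6, $ per 1M tokens

print(f"input:  {opus_in / glm5_in:.1f}x cheaper")    # ~6.2x
print(f"output: {opus_out / glm5_out:.1f}x cheaper")  # ~9.8x
```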

However, despite the high benchmarks and low cost, not all early users are enthusiastic about the model, and its raw performance does not tell the whole story.

Lucas Peterson, co-founder of Andon Labs, an AI safety-focused startup that evaluates autonomous agents, commented on X: "Hours of reading reveal GLM-5 to be an incredibly effective model, but much less situationally aware. It achieves goals through aggressive tactics but does not reason about its own position or leverage. This is scary. This is how you get a paperclip maximizer."

"Paperclip Maximizer" Refers to a hypothetical situation described by Oxford philosopher Nick Bostrom in 2003, in which an AI or other autonomous creation accidentally leads to an apocalyptic scenario or human extinction by following a benign instruction – such as maximizing the number of paperclips produced – to extremes, redirecting all resources necessary for human (or other life) survival or making life impossible through its commitment to serving an otherwise seemingly benign purpose.

Should your enterprise adopt GLM-5?

Enterprises seeking to avoid vendor lock-in will find GLM-5’s MIT license and open-weight availability a significant strategic advantage. Unlike closed-source competitors, which keep intelligence behind proprietary walls, GLM-5 allows organizations to host their own frontier-level intelligence.

Adoption is not without friction. The sheer scale of GLM-5 – 744B parameters – requires a massive hardware floor that may be out of reach for smaller companies without significant cloud or on-premises GPU clusters.

Security leaders must also weigh the geopolitical implications of adopting a model from a China-based laboratory, especially in regulated industries where data residency and provenance are rigorously audited.

Furthermore, the shift toward more autonomous AI agents introduces new governance risks. As models move from "talking" to "working," they begin acting autonomously across apps and files. Without strong agent-specific permissions and human-in-the-loop quality gates established by enterprise data leaders, the risk of autonomous error grows sharply.
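What such a quality gate might look like in practice is sketched below; the tool names and approval flow are hypothetical, intended only to show the pattern of auto-approving read-only actions while routing state-changing ones to a human reviewer.

```python
# Illustrative guardrail for agentic tool calls: an allowlist plus a human
# approval step for anything that writes or sends data. Names are hypothetical.
APPROVED_TOOLS = {"read_file", "search_docs"}           # safe, read-only actions
NEEDS_REVIEW = {"write_file", "send_email", "run_sql"}  # actions that change state

def gate_tool_call(tool_name: str, args: dict, ask_human) -> bool:
    """Return True if the agent may execute this call."""
    if tool_name in APPROVED_TOOLS:
        return True
    if tool_name in NEEDS_REVIEW:
        return ask_human(f"Agent wants to call {tool_name}({args}). Approve?")
    return False                                        # deny anything unknown by default

# Example: auto-approve reads, route writes to a reviewer callback.
print(gate_tool_call("read_file", {"path": "report.xlsx"}, ask_human=lambda msg: False))   # True
print(gate_tool_call("send_email", {"to": "cfo@example.com"}, ask_human=lambda msg: False))  # False until approved
```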

Ultimately, GLM-5 is a "buy" for organizations that have moved beyond simple co-pilots and are ready to build truly autonomous office workflows.

It is for engineers who need to refactor a legacy backend or want a "self-healing" pipeline that never sleeps.

While Western laboratories continue to optimize for "thinking" and depth of reasoning, Z.ai is optimizing for performance and scale.

Enterprises that adopt GLM-5 today are not just purchasing a cheaper model; they are betting on a future where the most valuable AI is the one that can complete a project without being asked twice.


