Baidu unveils proprietary ERNIE 5 beating GPT-5 performance on charts, document understanding and more

Just hours after OpenAI updated its flagship foundation model GPT-5 to GPT-5.1, promising lower token usage overall and a more pleasant personality with more preset options, Chinese search giant Baidu unveiled its next-generation foundation model, ERNIE 5.0, with a suite of AI product upgrades and strategic international expansion.

The goal: to position itself as a global contender in the increasingly competitive enterprise AI market.

Announced at the company’s Baidu World 2025 event, ERNIE 5.0 is a proprietary, natively omni-modal model designed to jointly process and generate content across text, images, audio and video.

Unlike Baidu’s recently released ERNIE-4.5-VL-28B-A3B-Thinking, which is enterprise-friendly and open source under the permissive Apache 2.0 license, ERNIE 5.0 is a proprietary model. It is available only through Baidu’s ERNIE Bot website (I needed to manually select it from the model picker dropdown) and, for enterprise customers, through the Qianfan cloud platform’s application programming interface (API).

Along with the model launch, Baidu introduced major updates to its digital human platform, no-code tools, and general-purpose AI agents – all aimed at expanding its AI footprint beyond China.

The company also introduced ERNIE 5.0 Preview 1022, a variant optimized for text-intensive tasks, alongside the general preview model, which balances performance across all modalities.

Baidu emphasized that ERNIE 5.0 represents a change in the way it deploys intelligence at scale, with CEO Robin Li saying: “When you internalize AI, it becomes a core capability and transforms intelligence from a cost to a source of productivity.”

Where ERNIE 5.0 beats GPT-5 and Gemini 2.5 Pro

ERNIE 5.0 benchmark results show that Baidu has achieved parity or near-parity with the top Western foundation models across a broad spectrum of tasks.

In public benchmark slides shared during the Baidu World 2025 event, the ERNIE 5.0 preview outperformed or matched OpenAI’s GPT-5-High and Google’s Gemini 2.5 Pro in multimodal reasoning, document understanding, and image-based QA, while also demonstrating strong language modeling and code execution capabilities.

The company emphasized the model’s ability to handle joint inputs and outputs across modalities, rather than relying on post-hoc modality fusion, which it framed as a technical differentiator.

On visual tasks, ERNIE 5.0 achieved leading scores on OCRBench, DocVQA and ChartQA, three benchmarks that test document recognition, comprehension and structured data reasoning.

Baidu claims the model outperformed both GPT-5-High and Gemini 2.5 Pro on these document and chart-based benchmarks, areas it says are important for enterprise applications such as automated document processing and financial analysis.

In image generation, according to Baidu’s internal GenEval-based evaluation, ERNIE 5.0 matches or surpasses Google’s Veo3 in all categories, including semantic alignment and image quality. Baidu claimed that the model’s multimodal integration allows it to generate and interpret visual content with greater contextual awareness than models that rely on modality-specific encoders.

For audio and speech tasks, ERNIE 5.0 demonstrated competitive results on the MM-AU and TUT2017 audio understanding benchmarks, as well as on question answering from spoken language input. Its audio performance, although not as heavily emphasized as vision or text, suggests a broader capability footprint intended to support full-spectrum multimodal applications.

In language tasks, the model showed strong results on instruction following, factual question answering, and mathematical reasoning, all key areas that define the enterprise utility of large language models.

The Preview 1022 variant of ERNIE 5.0, tuned for text performance, showed even stronger language-specific results in early developer access. While Baidu does not claim blanket superiority in general language reasoning, its internal evaluation shows that ERNIE 5.0 Preview 1022 narrows the gap with top-tier English-language models and outperforms them in Chinese-language performance.

Although Baidu did not publicly release full benchmark details or raw scores, its performance positioning suggests a deliberate effort to frame ERNIE 5.0 not as a typical multimodal system but as a leading model competitive with the largest closed models in general-purpose reasoning.

Where Baidu claims to have a clear edge is in structured document understanding, visual chart reasoning, and the integration of multiple modalities into a single, native modeling architecture. Independent verification of these results is pending, but the breadth of claimed capabilities establishes ERNIE 5.0 as a serious option in the multimodal foundation model landscape.

Enterprise pricing strategy

ERNIE 5.0 sits at the premium end of Baidu’s model pricing structure. The company has released specific pricing for API use on its Qianfan platform, aligning costs with other top-tier offerings from Chinese competitors like Alibaba.

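| Model                 | Input cost (per 1K tokens) | Output cost (per 1K tokens) | Source  |
|-----------------------|----------------------------|-----------------------------|---------|
| ERNIE 5.0             | $0.00085 (¥0.006)          | $0.0034 (¥0.024)            | Qianfan |
| ERNIE 4.5 Turbo (ex.) | $0.00011 (¥0.0008)         | $0.00045 (¥0.0032)          | Qianfan |
| Qwen3 (Coder ex.)     | $0.00085 (¥0.006)          | $0.0034 (¥0.024)            | Qianfan |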

The cost gap between ERNIE 5.0 and older models such as ERNIE 4.5 Turbo underlines Baidu’s strategy of separating higher-volume, lower-cost models from higher-capability models designed for complex tasks and multimodal reasoning.
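As a quick arithmetic check on the Qianfan prices listed above, the short sketch below converts the per-1K-token yuan prices into per-1M-token prices and computes how many times more ERNIE 5.0 costs than ERNIE 4.5 Turbo. The figures are copied from the table; the dictionary layout and function names are illustrative only and are not part of any Baidu API.

```python
# Listed Qianfan prices in CNY per 1K tokens, copied from the table above.
PRICES_CNY_PER_1K = {
    "ERNIE 5.0": {"input": 0.006, "output": 0.024},
    "ERNIE 4.5 Turbo": {"input": 0.0008, "output": 0.0032},
}


def per_million(price_per_1k: float) -> float:
    """Convert a per-1K-token price into a per-1M-token price."""
    return price_per_1k * 1000


def price_multiple(model: str, baseline: str, direction: str) -> float:
    """How many times more `model` costs than `baseline` ('input' or 'output')."""
    return PRICES_CNY_PER_1K[model][direction] / PRICES_CNY_PER_1K[baseline][direction]


if __name__ == "__main__":
    for name, price in PRICES_CNY_PER_1K.items():
        print(
            f"{name}: ¥{per_million(price['input']):.2f} input / "
            f"¥{per_million(price['output']):.2f} output per 1M tokens"
        )
    for direction in ("input", "output"):
        multiple = price_multiple("ERNIE 5.0", "ERNIE 4.5 Turbo", direction)
        print(f"{direction} price multiple: {multiple:.1f}x")
```

On the listed yuan prices, ERNIE 5.0 works out to 7.5 times the cost of ERNIE 4.5 Turbo for both input and output tokens.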

Compared with US alternatives, ERNIE 5.0 remains mid-range in pricing:

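| Model                 | Input (per 1M tokens)         | Output (per 1M tokens)          | Source                   |
|-----------------------|-------------------------------|---------------------------------|--------------------------|
| GPT-5.1               | $1.25                         | $10.00                          | OpenAI                   |
| ERNIE 5.0             | $0.85                         | $3.40                           | Qianfan                  |
| ERNIE 4.5 Turbo (ex.) | $0.11                         | $0.45                           | Qianfan                  |
| Claude Opus 4.1       | $15.00                        | $75.00                          | Anthropic                |
| Gemini 2.5 Pro        | $1.25 (≤200k) / $2.50 (>200k) | $10.00 (≤200k) / $15.00 (>200k) | Google Vertex AI Pricing |
| Grok 4 (Grok-4-0709)  | $3.00                         | $15.00                          | xAI API                  |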
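To make the per-1M-token prices listed above more concrete, the sketch below estimates what a single workload would cost on each model. The prices are copied from the comparison; the workload size (10 million input tokens and 2 million output tokens) is a hypothetical example, Gemini 2.5 Pro is assumed to be billed at its ≤200k tier, and the function names are illustrative.

```python
# USD prices per 1M tokens as (input, output), copied from the comparison above.
# Assumption: Gemini 2.5 Pro is billed at its <=200k-token prompt tier.
PRICES_USD_PER_1M = {
    "GPT-5.1": (1.25, 10.00),
    "ERNIE 5.0": (0.85, 3.40),
    "ERNIE 4.5 Turbo": (0.11, 0.45),
    "Claude Opus 4.1": (15.00, 75.00),
    "Gemini 2.5 Pro": (1.25, 10.00),
    "Grok 4": (3.00, 15.00),
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a workload at the listed per-1M-token prices."""
    in_price, out_price = PRICES_USD_PER_1M[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000


def rank_by_cost(input_tokens: int, output_tokens: int) -> list[tuple[str, float]]:
    """Return (model, cost) pairs sorted from cheapest to most expensive."""
    costs = {
        model: workload_cost(model, input_tokens, output_tokens)
        for model in PRICES_USD_PER_1M
    }
    return sorted(costs.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical workload: 10M input tokens and 2M output tokens.
    for model, cost in rank_by_cost(10_000_000, 2_000_000):
        print(f"{model:<16} ${cost:,.2f}")
```

For that hypothetical workload, ERNIE 5.0 comes to about $15.30, against about $32.50 for GPT-5.1 and $300 for Claude Opus 4.1. Real bills depend on the input-to-output ratio, caching discounts and tiering, none of which this sketch models.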

Global expansion: products and platforms

With the model release, Baidu is expanding internationally:

  • GenFlow 3.0: Now with more than 20 million users, it is the company’s largest general-purpose AI agent and features advanced memory and multimodal task management.

  • Famou: A self-evolving agent capable of dynamically solving complex problems, now commercially available by invitation.

  • MeDo: The international version of Miaoda, Baidu’s no-code builder, is live globally via medo.dev.

  • Oreate: A productivity workspace with document, slide, image, video, and podcast support, reaching over 1.2M users worldwide.

Baidu’s Digital Human Platform, which has already launched in Brazil, is also part of the global push. According to company data, 83% of livestreamers used Baidu’s digital human technology during this year’s “Double 11” shopping event in China, driving a 91% increase in GMV.

Meanwhile, Baidu’s autonomous ride-hailing service Apollo Go has surpassed 17 million rides, operates a driverless fleet in 22 cities and claims the title of the world’s largest robotaxi network.

Open-source vision-language model attracts industry attention

Two days before the flagship ERNIE 5.0 event, Baidu also released an open-source multimodal model under the Apache 2.0 license: ERNIE-4.5-VL-28B-A3B-Thinking.

As my colleague Michael Nunez at VentureBeat reports, the model uses a mixture-of-experts (MoE) architecture for efficient inference, activating only 3 billion of its 28 billion total parameters.

Major technological innovations include:

  • “Thinking with Images”, which enables dynamic zoom-based visual analysis

  • Support for chart interpretation, document comprehension, visual grounding, and temporal awareness in video

  • The ability to run on a single 80GB GPU, making it accessible to medium-sized organizations

  • Full compatibility with Transformers, vLLM and Baidu’s FastDeploy toolkit
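The single-GPU claim above can be sanity-checked with rough arithmetic. The sketch below estimates weight memory for a 28-billion-parameter model. The 2-bytes-per-parameter figure assumes 16-bit weights, which is an assumption and not something Baidu states here. It also ignores activations, KV cache and framework overhead, so treat the result as a lower bound, not a deployment guide.

```python
TOTAL_PARAMS = 28e9   # total parameters, from the model description
ACTIVE_PARAMS = 3e9   # parameters activated during inference, from the model description
GPU_MEMORY_GB = 80    # single-GPU memory cited in the list above


def weight_memory_gb(num_params: float, bytes_per_param: float = 2.0) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes).

    bytes_per_param=2.0 assumes 16-bit weights (an assumption for illustration).
    """
    return num_params * bytes_per_param / 1e9


if __name__ == "__main__":
    weights_gb = weight_memory_gb(TOTAL_PARAMS)
    print(f"Weights at 16-bit precision: {weights_gb:.0f} GB")
    print(f"Headroom on an {GPU_MEMORY_GB} GB GPU: {GPU_MEMORY_GB - weights_gb:.0f} GB")
    print(f"Share of parameters active: {ACTIVE_PARAMS / TOTAL_PARAMS:.1%}")
```

Under that assumption the weights take roughly 56 GB, leaving about 24 GB of headroom on an 80GB card. In a mixture-of-experts model all experts typically need to be loaded even though only a fraction is active at a time, so the memory estimate uses the 28-billion total, not the 3-billion active figure.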

This release increases pressure on closed-source competitors. With Apache 2.0 licensing, ERNIE-4.5-VL-28B-A3B-Thinking becomes a viable base model for commercial applications without licensing restrictions, something few high-performing models in this class offer.

Community feedback and Baidu’s response

Following the launch of ERNIE 5.0, developer and AI evaluator Lisan Al Gaib (@scaling01) posted a mixed review on X. While initially impressed with the model’s benchmark performance, they reported a persistent problem in which ERNIE 5.0 would repeatedly invoke tools during SVG generation tasks, even when explicitly instructed not to do so.

“The ERNIE 5.0 benchmarks looked weird until I tested it… unfortunately this RL is braindamaged or has a serious issue with their chat platform/system prompts,” Lisan wrote.

Within hours, Baidu’s developer-focused support account, @ErnieforDevs, responded:

“Thanks for the feedback! This is a known bug – certain syntax can consistently trigger this. We’re working on a fix. You can try rewriting or changing the prompt to avoid this now.”

The quick turnaround reflects Baidu’s growing emphasis on developer communications, especially as it attracts international users through both proprietary and open-source offerings.

Outlook for Baidu and its ERNIE foundation LLM family

Baidu’s ERNIE 5.0 marks a strategic step up in the global foundation model race. With performance claims that put it on par with OpenAI and Google’s most advanced systems and a mix of premium pricing and open-access options, Baidu is signaling its ambition to become not just a domestic AI leader, but a trusted global infrastructure provider.

At a time when enterprise AI users are increasingly demanding multimodal performance, flexible licensing, and deployment efficiency, Baidu’s two-track approach—premium hosted APIs and open-source releases—could broaden its appeal across both the corporate and developer communities.

It remains to be seen how well the company’s performance claims hold up under third-party testing. But in a landscape shaped by rising costs, model complexity, and computational bottlenecks, ERNIE 5.0 and its supporting ecosystem provide Baidu with a competitive position in the next wave of AI deployment.


