Weibo's new open source AI model VibeThinker-1.5B outperforms DeepSeek-R1 on $7,800 post-training budget


Another day at the end of 2025, another impressive result from a Chinese company in open source artificial intelligence.

The AI division of Chinese social networking company Weibo recently released VibeThinker-1.5B, an open source 1.5-billion-parameter large language model (LLM) fine-tuned from rival Chinese tech firm Alibaba’s Qwen2.5-Math-1.5B.

It is now available for free download and use by researchers and enterprise developers – even for commercial purposes – under a permissive MIT license on Hugging Face, GitHub, and ModelScope, along with a technical report on the open access science publishing site arxiv.org.

And yet, despite its compact size, VibeThinker-1.5B achieves benchmark-topping reasoning performance on math and code tasks, rivaling or surpassing models hundreds of times its size, and even beats Chinese rival DeepSeek’s R1 – the 671-billion-parameter model that went viral earlier this year – on formal reasoning benchmarks.

It outperforms Mistral AI’s Magistral Medium and holds its own against Anthropic’s Claude Opus 4 and OpenAI’s GPT-OSS-20B-Medium, while requiring a fraction of the infrastructure and investment.

It does this despite being post-trained on a budget of only $7,800 in compute (3,900 GPU hours on Nvidia H800s) – tens or even hundreds of thousands of dollars less than is typically required to fine-tune models of similar or larger scale.

Remember that this is not the total cost of the model’s development, however: LLMs are trained in stages. First comes pre-training, when the model learns basic language structure and general knowledge by predicting the next word across massive amounts of text from the internet, books, and articles. This gives the model fluency, but not much ability to follow instructions or hold a conversation.

Post-training then uses much smaller, high-quality datasets – typically collections of example questions, prompts, and expert-written answers – to teach the model how to respond helpfully, reason through problems, and align with human expectations. Even so, the cost-effectiveness of Weibo’s post-training of VibeThinker-1.5B is remarkable and worth noting.
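As a rough illustration of what this post-training (SFT) stage involves mechanically, the sketch below runs a single supervised fine-tuning step on one prompt/answer pair, masking the prompt so that only the answer tokens contribute to the loss. It is a generic example, not Weibo’s actual pipeline; the base checkpoint name, the toy example, and the hyperparameters are assumptions.

```python
# Minimal single-step SFT sketch (illustrative only; not Weibo's actual pipeline).
# Assumes the base checkpoint "Qwen/Qwen2.5-Math-1.5B" is available via Hugging Face.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen2.5-Math-1.5B"  # base model cited in the article
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)

prompt = "Solve: what is 17 * 24?"        # hypothetical training example
answer = " 17 * 24 = 408."

# Tokenize prompt and answer separately so the prompt span can be masked out.
prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
answer_ids = tokenizer(answer, return_tensors="pt", add_special_tokens=False).input_ids

input_ids = torch.cat([prompt_ids, answer_ids], dim=1)
labels = input_ids.clone()
labels[:, : prompt_ids.shape[1]] = -100   # ignore prompt tokens in the loss

# One gradient step of next-token cross-entropy on the answer span.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)
loss = model(input_ids=input_ids, labels=labels).loss
loss.backward()
optimizer.step()
print(f"SFT loss: {loss.item():.4f}")
```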

The open-source release overturns assumptions about parameter scale, compute intensity, and minimum viable size for high-performance LLMs.

A different training approach: spectrum-to-signal

The performance of VibeThinker-1.5B comes not from scale but from the training framework behind it: the Spectrum-to-Signal Principle (SSP).

Instead of optimizing a model solely for single-answer accuracy (pass@1), the SSP framework splits supervised fine-tuning (SFT) and reinforcement learning (RL) into two separate stages with different goals:

  • SFT (“spectrum phase”): The model is trained to maximize diversity across possible correct answers, improving its pass@K score. This creates a wide range of plausible solution paths.

  • RL (“signal phase”): A second-stage reinforcement learning method, called MaxEnt-Guided Policy Optimization (MGPO), is used to identify and amplify the most correct paths from this diverse solution pool. MGPO prioritizes problems where the model is most uncertain, using entropy-based weighting to focus learning.

The authors argue that this separation allows smaller models to explore the reasoning space more effectively – achieving signal amplification without relying on massive parameter counts.
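The entropy-guided weighting idea can be illustrated with a short sketch: sample several answers per problem, estimate the model’s current pass rate, and upweight the problems whose pass rate sits near 50 percent, where uncertainty (and learning signal) is highest. The code below is a minimal, hypothetical illustration of that weighting step only, not the authors’ actual MGPO implementation; the function names and the choice of binary entropy as the weight are assumptions.

```python
# Illustrative sketch of entropy-guided problem weighting (not the exact MGPO algorithm).
import math

def pass_rate(rewards: list[int]) -> float:
    """Empirical probability of a correct answer across K sampled rollouts."""
    return sum(rewards) / len(rewards)

def binary_entropy(p: float) -> float:
    """Shannon entropy of a correct/incorrect outcome, maximal at p = 0.5."""
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def uncertainty_weight(rewards: list[int]) -> float:
    """Weight a problem by how uncertain the model currently is about it.

    Problems the model always solves (p=1) or never solves (p=0) get weight 0;
    problems near p=0.5 -- where the learning signal is richest -- get weight 1.
    """
    return binary_entropy(pass_rate(rewards))

# Example: rewards from 8 sampled rollouts per problem (1 = correct, 0 = incorrect).
problems = {
    "easy":   [1, 1, 1, 1, 1, 1, 1, 1],   # already mastered -> weight 0.0
    "hard":   [0, 0, 0, 0, 0, 0, 0, 0],   # no signal yet    -> weight 0.0
    "fringe": [1, 0, 1, 0, 0, 1, 0, 1],   # maximal uncertainty -> weight 1.0
}
for name, rewards in problems.items():
    print(f"{name}: weight = {uncertainty_weight(rewards):.2f}")
```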

VibeThinker-1.5B makes a compelling case that the industry’s reliance on parameter scaling as the only path to better reasoning performance may be outdated.

By adopting a diversity-first training pipeline, WeiboAI has shown that smaller, more accessible models can match and even outperform billion-dollar systems on reasoning-heavy tasks.

The low resource footprint is one of the most important aspects of VibeThinker-1.5B. At under $8,000, its post-training cost is 30 to 60 times lower than that of models like DeepSeek R1 and MiniMax-M1, which reportedly cost between $294,000 and $535,000 to train.

Performance across domains

Despite its small size, VibeThinker-1.5B delivers cross-domain reasoning that surpasses many larger open-source and commercial models:

| Model | AIME25 | LiveCodeBench v6 | GPQA-Diamond |
|---|---|---|---|
| VibeThinker-1.5B | 74.4 | 51.1 | 46.7 |
| GPT-OSS-20B-Medium | 72.1 | 54.9 | 66.0 |
| Claude Opus 4 | 69.2 | 56.6 | 79.6 |
| MiniMax-M1 (456B) | 74.6 | 62.3 | 69.2 |
| DeepSeek R1 (671B) | 70.0 | 65.9 | 71.5 |
| Kimi K2 (1.09T) | 49.5 | 53.7 | 75.1 |

VibeThinker was benchmarked against both reasoning-centric models (Magistral, Claude, OpenAI o3-mini) and non-reasoning LLMs (GPT-4.1, Kimi K2, DeepSeek V3). On structured reasoning benchmarks, the model consistently outperformed non-reasoning models regardless of size:

  • On AIME24 (mathematics), it beat Kimi K2 (1.09T) by more than 10 points (80.3 vs 69.6).

  • On LiveCodeBench v6, it outperformed Claude Opus 4 (51.1 vs 47.4).

  • On GPQA, it scored below GPT-4.1 and Claude, but still nearly tripled its base model’s score (from 16.4 to 46.7).

This supports the authors’ claim that size is not the only path to reasoning ability – with proper training design, smaller models can reach or even surpass the performance of much larger systems in targeted tasks.

In particular, it achieves parity with models hundreds of times larger on math and code, although it lags in general knowledge reasoning (GPQA), where larger models maintain the lead.

This suggests a potential specialization trade-off: while VibeThinker excels at structured reasoning tasks, it has less capacity for broad encyclopedic knowledge, a known limitation of smaller architectures.

Guidance for Enterprise Adoption

The release includes recommended inference settings (temperature = 0.6, top_p = 0.95, max_tokens = 40960).
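As a hedged illustration of how those settings map onto a standard generation call, the sketch below loads the model with Hugging Face transformers and samples with the recommended values. The repository id and the example prompt are assumptions and should be checked against the official model card.

```python
# Sketch of running VibeThinker with the recommended sampling settings.
# The repo id below is an assumption; verify it against the official Hugging Face model card.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "WeiboAI/VibeThinker-1.5B"  # assumed repository id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

prompt = "Find all integer solutions of x^2 - 5x + 6 = 0."  # hypothetical example
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Recommended settings from the release: temperature 0.6, top_p 0.95, up to 40,960 tokens.
outputs = model.generate(
    **inputs,
    do_sample=True,
    temperature=0.6,
    top_p=0.95,
    max_new_tokens=40960,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```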

The model is small enough to be deployed on edge devices, including mobile phones and vehicle-embedded systems, and its inference is estimated to cost 20 to 70 times less than that of larger models.

This establishes VibeThinker-1.5B not only as a research achievement, but as a potential basis for cost-efficient, locally deployable reasoning systems.

Weibo’s strategy and market position

Weibo, launched by Sina Corporation in 2009, remains a cornerstone of China’s social media ecosystem. Often described as China’s version of X (formerly Twitter), the platform blends microblogging, multimedia content and trending-topic features with a regulatory environment shaped by stringent government oversight.

Despite counting roughly 600 million monthly active users, the platform has faced pressure on advertising growth and stiff competition from short-video rivals.

In response, Weibo has leaned toward creator-economy monetization, live-streaming, and vertical video — adding tools for influencer engagement, e-commerce integration, and rich analytics for brands.

The platform’s role as a digital public square also makes it the focus of regulatory scrutiny. Chinese authorities continue to exert pressure on issues ranging from content governance to data security. In September 2025, Weibo was one of the platforms cited in official warnings, highlighting its continued exposure to policy risks.

Weibo’s push into AI R&D, exemplified by the release of VibeThinker-1.5B, signals a change in ambition. In addition to being a media platform, Weibo is positioning itself as a player in the next phase of Chinese AI development by using its capital reserves, user behavior data, and in-house research capacity to pursue adjacent technological domains.

What this means for enterprise technology decision makers

For engineering leaders and enterprise AI teams, the release of Vibethinker has practical implications for everything from orchestration pipelines to cost modeling.

A 1.5B-parameter model that outperforms models 100 times its size on math and programming tasks doesn’t just save compute – it changes the architectural calculus. This enables LLM inference on constrained infrastructure, reduces latency at the edge, and lowers the barrier to entry for applications that would otherwise require API access to closed, frontier-scale models.

This matters for enterprise ML leaders trying to deploy reasoning-enabled agents within existing systems or for platform owners looking to integrate ML into automated workflows.

It also speaks to those managing reinforcement learning from human feedback (RLHF) pipelines or inference optimization in hybrid cloud environments.

The model’s post-training methodology—specifically its entropy-targeted reinforcement learning approach—provides a roadmap for teams refining smaller checkpoints rather than relying on massive pre-training.

VibeThinker’s benchmark transparency and data-refinement steps also address another emerging priority in enterprise AI: auditability. While its performance on general-knowledge tests still lags behind larger frontier models, its task-specific reliability makes it an attractive candidate for controlled environments where accuracy matters more than coverage.

In short, VibeThinker-1.5B is not just a research milestone – it is a strong candidate for practical enterprise use, deployment, and learning. It suggests that a new class of compact, reasoning-optimized models is viable for enterprise use cases that were previously the domain of much larger systems. For organizations trying to balance cost, latency, interpretability, and control, it is a compelling addition to the long and growing list of Chinese open source offerings.


