
Joining the ranks of the growing number of small, powerful reasoning models is MiroMind's MiroThinker 1.5, which has only 30 billion parameters, compared to the hundreds of billions or even trillions of parameters used by leading foundation large language models (LLMs).
But MiroThinker 1.5 stands out among these smaller reasoners for one key reason: It offers agentic research capabilities that rival trillion-parameter competitors like Kimi K2 and DeepSeek at a fraction of the estimated cost.
This release marks a milestone toward intelligent, deployable AI agents. Enterprises have long been forced to choose between expensive API calls to frontier models or compromised local performance. MiroThinker 1.5 offers a third way: open-weight models specifically designed for extended tool use and multi-step reasoning.
One of the biggest trends emerging in the industry is the move from highly specialized agents to more generalized ones. Until recently, that capability was largely limited to proprietary models. MiroThinker 1.5 represents a serious open-weight contender in this field.
Risk of hallucinations reduced through verifiable logic
For IT teams evaluating AI deployments, hallucination remains the primary barrier to using open models in production. MiroThinker 1.5 addresses this through what MiroMind calls "scientific mode" – a fundamental architectural change in how the model handles uncertainty.
Rather than generating statistically plausible answers from memorized patterns (the root cause of most hallucinations), MiroThinker is trained to execute a verifiable research loop: proposing hypotheses, interrogating external sources for evidence, identifying mismatches, then revising and re-verifying its findings. During training, the model is explicitly penalized for high-confidence outputs that lack source support.
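The research loop described above can be sketched in a few lines. This is a hypothetical illustration, not MiroMind's implementation: `search` and `supports` are stand-in functions, and the tiny corpus is invented for the example.

```python
def search(query):
    # Stand-in retrieval step: return a list of (source, claim) evidence pairs.
    corpus = {
        "model size": [("paper", "30B parameters")],
        "license": [("repo", "MIT license")],
    }
    return corpus.get(query, [])

def supports(evidence, hypothesis):
    # Stand-in verification: does any retrieved claim back the hypothesis?
    return any(claim == hypothesis for _, claim in evidence)

def research_loop(query, hypotheses, max_rounds=3):
    """Propose -> gather evidence -> verify -> revise, keeping an audit trail."""
    trail = []
    for hypothesis in hypotheses[:max_rounds]:
        evidence = search(query)
        verified = supports(evidence, hypothesis)
        trail.append({"hypothesis": hypothesis,
                      "evidence": evidence,
                      "verified": verified})
        if verified:
            return hypothesis, trail  # answer plus the reasoning chain
    return None, trail  # refuse rather than guess when nothing verifies

answer, trail = research_loop("license", ["Apache-2.0", "MIT license"])
```

The key design point is the `trail`: every hypothesis, the evidence consulted, and the verification outcome are recorded, which is what makes the reasoning auditable after the fact.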
The practical implication for enterprise deployment is auditability. When MiroThinker formulates an answer, it can reveal both the reasoning chain and the external sources it consulted. For regulated industries like financial services, healthcare, and legal, this creates a documentation path that memorization-based models cannot provide. Compliance teams can review not only what the model concluded, but how it got there.
This approach also reduces the "confident hallucination" problem common in production AI systems. The model is trained to seek verification rather than guessing when uncertain – a behavior that directly translates into fewer costly errors.
Benchmark Performance: Punching Above Its Weight
Despite its size, MiroThinker-v1.5-30B delivers performance comparable to models with roughly 30× more parameters, including the trillion-parameter Kimi-K2-Thinking model.
On BrowseComp-ZH, a key benchmark for web research capabilities, the 30B model actually outperformed its trillion-parameter competitor with a score of 69.8.
The cost difference is equally notable. MiroMind reports inference costs as low as $0.07 per call for the 30B variant – about one-twentieth the cost of Kimi-K2-Thinking – alongside faster inference speeds.
A larger 235B version (with 22B active parameters in its mixture-of-experts design) is in the global top tier on several search-agent benchmarks. On general agentic search evaluations, these models hold their own against the DeepSeek v3.2, MiniMax, GLM, and Kimi K2 systems.
In testing, the larger model approaches Gemini 3 Pro on several benchmarks and comes closer to GPT-5-class systems than its parameter count would suggest. While benchmark hill-climbing is increasingly common, what matters more is overall competitiveness – and MiroThinker holds up well.
Extended tool usage: up to 400 tool calls per session
The defining capability of MiroThinker 1.5 is sustained tool use.
The models support up to 256,000 tokens of context and, MiroMind claims, up to 400 tool calls per session – a critical requirement for complex research workflows involving extensive information gathering, synthesis, and cross-checking.
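A session with a hard tool-call budget can be sketched as a simple loop. Everything here is a placeholder: `run_agent`, `fake_model`, and the `lookup` tool are invented for illustration and assume a generic function-calling interface, not MiroThinker's actual API.

```python
MAX_TOOL_CALLS = 400  # MiroThinker's reported per-session ceiling

def run_agent(model, task, tools, max_calls=MAX_TOOL_CALLS):
    """Loop: ask the model for a step; execute requested tools until it
    produces a final answer or the tool-call budget is exhausted."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_calls):
        reply = model(messages, tools)          # one reasoning step
        if reply.get("tool_call") is None:      # model produced a final answer
            return reply["content"], messages
        call = reply["tool_call"]
        result = tools[call["name"]](**call["args"])
        messages.append({"role": "tool", "content": str(result)})
    return None, messages  # budget exhausted without a final answer

def fake_model(messages, tools):
    # Toy stand-in model: call the lookup tool once, then answer with its result.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_call": {"name": "lookup", "args": {"key": "context"}}}
    return {"tool_call": None, "content": messages[-1]["content"]}

answer, transcript = run_agent(fake_model, "What is the context window?",
                               {"lookup": lambda key: "256,000 tokens"})
```

The budget cap matters operationally: a runaway agent that never converges burns 400 calls and stops, rather than looping indefinitely.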
This places MiroThinker firmly in the emerging category of agentic models designed to accomplish autonomous tasks rather than single-turn Q&A. Practical applications include in-depth research workflows, content pipelines, report generation, and podcast-style output, similar to NotebookLM.
Training Innovation: Time-Sensitive Sandbox
Another major innovation in MiroThinker 1.5 is its time-sensitive training sandbox.
Traditional model training operates from what MiroMind describes as a "God's-eye view": the model has access to final outcomes within a static dataset, which creates hindsight bias. MiroThinker's training removes that advantage.
During training, the model can only interact with information published before a given timestamp, preventing future leakage and forcing it to reason in realistic situations of incomplete information.
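The cutoff mechanic is easy to picture as a filter over a timestamped corpus. This is a minimal sketch of the idea as described; the corpus and field names are illustrative, not MiroMind's actual training schema.

```python
from datetime import date

# Toy timestamped corpus standing in for the training sandbox's data.
corpus = [
    {"text": "Q1 earnings report", "published": date(2024, 3, 1)},
    {"text": "Q2 earnings report", "published": date(2024, 6, 1)},
    {"text": "Q3 earnings report", "published": date(2024, 9, 1)},
]

def sandbox_view(corpus, as_of):
    """Return only documents published strictly before the episode timestamp,
    preventing the 'future leakage' that causes hindsight bias."""
    return [doc for doc in corpus if doc["published"] < as_of]

# For an episode dated July 2024, only Q1 and Q2 are visible to the model.
visible = sandbox_view(corpus, as_of=date(2024, 7, 1))
```

The point of the constraint is that the model must form and verify hypotheses with incomplete information, exactly as it would at inference time.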
The pipeline combines supervised fine-tuning with reinforcement learning using verifiable rewards through Group Relative Policy Optimization (GRPO), an advanced reinforcement learning algorithm popularized by DeepSeek, which encourages the model to select the right tool at the right time.
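At the heart of GRPO is a group-relative advantage: several rollouts are sampled for the same prompt, each is scored with a verifiable reward, and advantages are normalized within the group instead of coming from a learned value network. The sketch below shows only that normalization step, with toy rewards; it is not MiroMind's training code.

```python
from statistics import mean, stdev

def grpo_advantages(rewards):
    """Group-relative advantage: A_i = (r_i - mean) / std, computed over
    one group of rollouts sampled for the same prompt."""
    mu = mean(rewards)
    sigma = stdev(rewards) or 1.0  # guard against a zero-variance group
    return [(r - mu) / sigma for r in rewards]

# Four rollouts: reward 1.0 when the answer verified against sources, else 0.0.
advs = grpo_advantages([1.0, 0.0, 0.0, 1.0])
```

Rollouts that beat their group's average get positive advantages and are reinforced; the rest are discouraged. With a binary "did it verify" reward, this directly rewards choosing the right tool at the right time.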
This approach is particularly relevant to enterprise use cases where models must reason about emerging situations rather than memorizing static facts.
Practical deployment considerations
For IT teams considering deployment, hardware requirements still matter. Even 30B models require a substantial amount of GPU memory, and smaller setups may struggle.
One advantage is compatibility. MiroThinker runs on vLLM servers with OpenAI-compatible API endpoints, making it easy to integrate into existing toolchains and function-calling workflows as a drop-in replacement.
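Concretely, OpenAI compatibility means existing clients send the same request shape to a local vLLM server. The fragment below builds such a request body without sending it; the endpoint URL and model id are placeholders for illustration.

```python
import json

# Request body in the OpenAI chat-completions shape that a vLLM
# OpenAI-compatible server accepts. Nothing is sent here.
payload = {
    "model": "MiroThinker-v1.5-30B",  # placeholder model id
    "messages": [{"role": "user", "content": "Summarize the report."}],
    "tools": [],       # existing function-calling schemas plug in unchanged
    "max_tokens": 512,
}
body = json.dumps(payload)
# POST this body to e.g. http://localhost:8000/v1/chat/completions.
```

Because the wire format is unchanged, swapping a proprietary endpoint for a self-hosted one is typically a matter of changing the base URL and model name.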
Both model sizes are available on Hugging Face under the permissive, enterprise-friendly MIT license, and an online demo is available for evaluation. Permissive licensing removes major barriers to internal deployment and fine-tuning.
The big picture: interactive scaling vs parameter scaling
MiroThinker 1.5 comes as the industry faces the limitations of traditional scaling laws. Larger models no longer guarantee better performance in the real world. As Artificial Analysis notes, many benchmarks are saturated, pushing the industry toward evaluation based on economic utility rather than abstract reasoning.
MiroMind's bet is on interactive scaling – improving capability through deep tool interactions rather than massive parameter counts. If the bet pays off, it could enable sophisticated agents on infrastructure that does not rely on expensive frontier APIs.
The company, founded by Tianqiao Chen and AI scientist Jifeng Dai, describes its mission as building “native intelligence” – AI that learns through interaction, not memorization.
Whether this approach becomes the norm or remains a niche is still an open question. But for enterprises struggling with cost-performance tradeoffs, MiroThinker 1.5 provides a compelling data point: Sometimes, teaching a model to do research makes more sense than teaching it to remember everything.