Ai2's new Olmo 3.1 extends reinforcement learning training for stronger reasoning benchmarks

The Allen Institute for AI (Ai2) recently released what it calls its most powerful model family to date, Olmo 3. But the company continued iterating on the models, expanding its reinforcement learning (RL) runs to create Olmo 3.1.

The new Olmo 3.1 models focus on efficiency, transparency, and control for enterprises.

Ai2 updated two of the three versions of Olmo 3: Olmo 3.1 Think 32B, the flagship model optimized for advanced reasoning, and Olmo 3.1 Instruct 32B, designed for instruction-following, multi-turn dialogue, and tool use.

Olmo 3 has a third version, Olmo 3-Base, which is geared toward programming, comprehension, and mathematics and also works well as a starting point for continued fine-tuning.

Ai2 said that to upgrade Olmo 3 Think 32B to Olmo 3.1, its researchers extended the model's best RL run with a longer training schedule.

“Following the original Olmo 3 launch, we restarted our RL training runs for Olmo 3 32B Think, training for an additional 21 days on 224 GPUs with additional epochs on our Dolci-Think-RL dataset,” Ai2 said in a blog post. “This resulted in Olmo 3.1 32B Think, which brings substantial gains on math, logic, and instruction-following benchmarks: 5+ points on AIME, 4+ points on ZebraLogic, 4+ points on IFEval, and a 20+ point improvement on IFBench, with stronger performance on coding and complex multi-step tasks.”

To create Olmo 3.1 Instruct, Ai2 said its researchers applied the recipe behind the smaller 7B Instruct model to the larger model.

Olmo 3.1 Instruct 32B is “optimized for chat, tool use and multi-turn dialogue – making it the more performant sibling of Olmo 3 Instruct 7B and ready for real-world applications,” Ai2 said in a post on X.

For now, the new checkpoints are available in the Ai2 Playground and on Hugging Face, with API access coming soon.

Better performance on benchmarks

The Olmo 3.1 models performed well on benchmark tests, predictably outperforming their Olmo 3 predecessors.

Olmo 3.1 Think outperformed Qwen 3 32B on the AIME 2025 benchmark and came close to Gemma 3 27B.

Olmo 3.1 Instruct performed strongly against its open-source peers, even beating models like Gemma 3 on math benchmarks.

“Olmo 3.1 32B Instruct is a large-scale instruct-tuned model built for chat, tool use, and multi-turn dialogue. It is our most capable fully open chat model to date and, in our evaluation, the most robust fully open 32B-scale instruct model,” the company said.

Ai2 has also upgraded its RL-Zero 7B models for math and coding. The company said on X that both models benefited from longer, more stable training runs.

Commitment to transparency and open source

Ai2 previously told VentureBeat that it designed the Olmo 3 family of models to give enterprises and research labs greater control over, and understanding of, the data and training that go into the models.

Organizations can add to the model’s data mix and retrain it to learn from what has been added.

This has long been a commitment for Ai2, which also offers a tool called OlmoTrace that tracks how an LLM's output matches its training data.

“Together, Olmo 3.1 Think 32B and Olmo 3.1 Instruct 32B show that openness and performance can go together. By extending the same model flow, we continue to improve capabilities while maintaining end-to-end transparency over data, code, and training decisions,” Ai2 said.


