Another day, another Google AI model. Google has been releasing new AI tools at a rapid clip lately, most recently launching Gemini 3 in November. Today, it is bumping the flagship model to version 3.1. The new Gemini 3.1 Pro is launching (in preview) today for developers and consumers, with the promise of improved problem-solving and reasoning capabilities.
Google announced improvements to its Deep Think tools last week, and apparently, the “key intelligence” behind that update was Gemini 3.1 Pro. As always, Google’s latest model announcement comes with a raft of benchmarks that show mostly minor improvements. On the popular Humanity’s Last Exam, which tests advanced domain-specific knowledge, Gemini 3.1 Pro scored a record 44.4 percent. Gemini 3 Pro scored 37.5 percent, while OpenAI’s GPT 5.2 managed 34.5 percent.

Google also claims improvements on ARC-AGI-2, which features novel reasoning problems that cannot be trained directly into an AI. Gemini 3 Pro lagged behind on this assessment, managing just 31.1 percent compared to scores in the 50s and 60s for competing models. Gemini 3.1 Pro more than doubles Google’s score, reaching 77.1 percent.
When Google releases new models, it often celebrates that they have already reached the top of the Arena leaderboard (formerly LM Arena), but that is not the case this time. For text, Claude Opus 4.6 is four points ahead of the new Gemini at 1504. As for code, Opus 4.6, Opus 4.5, and GPT 5.2 High are all slightly ahead of Gemini 3.1 Pro. However, it’s worth noting that the Arena leaderboard runs on vibes: users vote for the outputs they like best, which can reward answers that merely look right, whether or not they actually are.