OpenAI releases GPT-5.2 after “code red” Google threat alert

In an effort to keep up with (or stay ahead of) the competition, model releases proceed at a steady clip: GPT-5.2 is OpenAI’s third major model release since August. GPT-5 launched that month with a new routing system that toggles between quick-response and simulated reasoning modes, though users complained about responses that felt cold and clinical. November’s GPT-5.1 update added eight preset “personality” options and focused on making the system feel more conversational.

number increases

Strangely, even though the GPT-5.2 model release is clearly a response to Gemini 3’s performance, OpenAI decided not to list any benchmarks comparing the two models on its promotional website. Instead, the official blog post focuses on GPT-5.2’s improvements over its predecessors and its performance on OpenAI’s new GDPval benchmark, which attempts to measure professional knowledge work tasks across 44 occupations.

During the press briefing, OpenAI shared some benchmark comparisons with competing models, including Gemini 3 Pro and Claude Opus 4.5, but downplayed the idea that GPT-5.2 was built in response to Google. “It’s important to note that this has been in the works for several months, although choosing when to release it is, we note, a strategic decision,” Simo told reporters.

According to the shared data, GPT-5.2 Thinking scored 55.6 percent on the software engineering benchmark SWE-Bench Pro, compared to 43.3 percent for Gemini 3 Pro and 52.0 percent for Claude Opus 4.5. On the graduate-level science benchmark GPQA Diamond, GPT-5.2 scored 92.4 percent versus Gemini 3 Pro’s 91.9 percent.

GPT-5.2 benchmarks that OpenAI shared with the press.

Credit: OpenAI/VentureBeat

OpenAI says that GPT-5.2 Thinking outperforms or ties “human professionals” on 70.9 percent of tasks in the GDPval benchmark (compared to 53.3 percent for Gemini 3 Pro). The company also claims that the model completes these tasks at more than 11 times the speed and at less than 1 percent of the cost of human experts.

GPT-5.2 Thinking reportedly generates responses with 38 percent fewer confabulations than GPT-5.1, according to OpenAI post-training lead Max Schwarzer, who told VentureBeat that the model produces “significantly fewer hallucinations” than its predecessor.

However, we always take benchmarks with a grain of salt because they are easy to present in a way that flatters a company, especially when the science of measuring AI performance objectively has not caught up with corporate sales pitches about humanlike AI capabilities.

It will take time for independent benchmark results from researchers outside OpenAI to emerge. Meanwhile, if you use ChatGPT for work tasks, expect a capable model with incremental improvements and somewhat better coding performance thrown in for good measure.


