Google says its latest reasoning model is its "most intelligent" — but Microsoft's CEO claims Google already fumbled its AI opportunity


Google's latest model beats the best from OpenAI and DeepSeek in several key benchmarks.

Google's latest model, Gemini 2.5, posted high benchmark scores and uses "reasoning" for better results. (Image credit: Getty Images | Bloomberg)

Google just introduced Gemini 2.5, which the company calls its "most intelligent AI model." The first version of the model is Gemini 2.5 Pro, which posted impressive scores across a wide range of benchmarks.

Google claims that its Gemini 2.5 model outperforms the best models from OpenAI, DeepSeek, and other AI tech giants.

Gemini 2.5 Pro is available now through Google AI Studio and within the Gemini app if you are a Gemini Advanced user. Gemini 2.5 Pro will also be available through Vertex AI in the near future.

At this time, Google has not shared pricing for Gemini 2.5 Pro or other Gemini 2.5 models.

All Gemini 2.5 models are "thinking models," meaning they can work through a thought process before generating a response. These "reasoning" models are a major focus in the AI space since they can work through more complex problems and generally give more accurate answers.

"Now, with Gemini 2.5, we've achieved a new level of performance by combining a significantly enhanced base model with improved post-training," said Google.

"Going forward, we’re building these thinking capabilities directly into all of our models, so they can handle more complex problems and support even more capable, context-aware agents."

Gemini 2.5 vs. OpenAI models


Google's Gemini 2.5 Pro models outperformed previous leading models from OpenAI and DeepSeek. (Image credit: Google)

The benchmarks Google shared for Gemini 2.5 are impressive. Gemini 2.5 Pro Experimental scored 18.5% on Humanity's Last Exam.

That score means that, at least for now, Gemini 2.5 Pro Experimental is the best model by that metric. Its score beats those of OpenAI o3-mini (14%) and DeepSeek R1 (8.6%).

Related: Microsoft CEO says Google missed its opportunity with AI

That specific test is considered difficult, though it's far from the only way to measure the effectiveness of an AI model.

Google also highlighted Gemini 2.5 Pro's ability to code and the model's benchmarks for math and science. Gemini 2.5 Pro currently leads in math and science benchmarks when measured through GPQA and AIME 2025.

Can you code with Gemini 2.5?

Video: "Gemini 2.5: Create your own dinosaur game from a single line prompt" (YouTube)

Coding is a major focus of Gemini 2.5. Google claims a "big leap over 2.0" and teased more improvements that are on the way.

The new model from Google can create web apps and agentic code applications. A demo from Google shows Gemini 2.5 Pro Experimental being used to create a game from a single-line prompt.

Kevin Okemwa
Contributor

Kevin Okemwa is a seasoned tech journalist based in Nairobi, Kenya with lots of experience covering the latest trends and developments in the industry at Windows Central. With a passion for innovation and a keen eye for detail, he has written for leading publications such as OnMSFT, MakeUseOf, and Windows Report, providing insightful analysis and breaking news on everything revolving around the Microsoft ecosystem. You'll also catch him occasionally contributing at iMore about Apple and AI. While AFK and not busy following the ever-emerging trends in tech, you can find him exploring the world or listening to music.


