On Tuesday, Google unveiled Gemini 2.5, a new family of AI reasoning models that pause to “think” before answering a question.
To kick off the new family of models, Google is launching Gemini 2.5 Pro Experimental, a multimodal, reasoning AI model that the company claims is its most intelligent model yet. This model will be available on Tuesday in the company’s developer platform, Google AI Studio, as well as in the Gemini app for subscribers to the company’s $20-a-month AI plan, Gemini Advanced.
Moving forward, Google says all of its new AI models will have reasoning capabilities baked in.
Since OpenAI launched the first AI reasoning model, o1, in September 2024, the tech industry has raced to match or exceed that model’s capabilities with models of its own. Today, Anthropic, DeepSeek, Google, and xAI all offer AI reasoning models, which use extra computing power and time to fact-check and reason through problems before delivering an answer.
Reasoning techniques have helped AI models achieve new heights in math and coding tasks. Many in the tech world believe reasoning models will be a key component of AI agents, autonomous systems that can perform tasks largely sans human intervention. However, these models are also more expensive to run.
Google has experimented with AI reasoning models before, previously releasing a “thinking” version of Gemini in December. But Gemini 2.5 represents the company’s most serious attempt yet at besting OpenAI’s o series of models.
Google claims that Gemini 2.5 Pro outperforms its previous frontier AI models, and some of the leading competing AI models, on several benchmarks. Specifically, Google says it designed Gemini 2.5 to excel at creating visually compelling web apps and agentic coding applications.
On an evaluation measuring code editing, called Aider Polyglot, Google says Gemini 2.5 Pro scores 68.6%, outperforming top AI models from OpenAI, Anthropic, and Chinese AI lab DeepSeek.
However, on another test measuring software development abilities, SWE-bench Verified, Gemini 2.5 Pro scores 63.8%, outperforming OpenAI’s o3-mini and DeepSeek’s R1 but underperforming Anthropic’s Claude 3.7 Sonnet, which scored 70.3%.
On Humanity’s Last Exam, a multimodal test consisting of thousands of crowdsourced questions relating to mathematics, humanities, and the natural sciences, Google says Gemini 2.5 Pro scores 18.8%, performing better than most rival flagship models.
To start, Google says Gemini 2.5 Pro is shipping with a 1-million-token context window, which means the AI model can take in roughly 750,000 words in a single go. That’s longer than the entire “The Lord of the Rings” book series. And soon, Gemini 2.5 Pro will support double the input length (2 million tokens).
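The rough token-to-word arithmetic above follows the common rule of thumb that one token corresponds to about 0.75 English words; this ratio is an industry convention, not an official Gemini figure, so the sketch below is an estimate only:

```python
# Back-of-envelope conversion from a model's token budget to English words.
# The 0.75 words-per-token ratio is an assumed rule of thumb for English
# prose, not a figure published by Google.
WORDS_PER_TOKEN = 0.75

def tokens_to_words(tokens: int, ratio: float = WORDS_PER_TOKEN) -> int:
    """Estimate how many English words fit in a given token budget."""
    return int(tokens * ratio)

# Current 1M-token window and the planned 2M-token window.
print(tokens_to_words(1_000_000))  # 750000
print(tokens_to_words(2_000_000))  # 1500000
```

Actual capacity varies with the tokenizer and the text: code, non-English languages, and unusual formatting typically consume more tokens per word.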
Google didn’t publish API pricing for Gemini 2.5 Pro. The company says it’ll share more in the coming weeks.