Good Morning
A new study reveals AI still struggles with real math Olympiad problems. Google just launched Gemini 2.5 Pro, now topping the LLM leaderboard with advanced reasoning and massive context. And in image land, Midjourney faces heat as new rivals flood the scene. Here’s what’s up.
🎓 AI Math Hype Doesn’t Add Up
A few months ago, Google DeepMind claimed that “AI achieves silver-medal standard solving International Mathematical Olympiad problems.” Reasoning models like OpenAI’s o1-pro and DeepSeek R1 are marketed as capable of exactly this kind of task. A new study put that claim to the test by feeding these models problems from the 2025 USA Mathematical Olympiad, published too recently to have leaked into training data.
The results? Not great. Average scores ranged from 2.08% (o3-mini) to 4.76% (DeepSeek R1). Human experts graded the responses and found recurring issues, such as unproven assumptions. That said, in one instance DeepSeek R1 came close to fully solving a problem.
This study shows that LLMs are still far from reliably solving olympiad-level math. It also highlights how hard it is to measure model performance accurately: earlier high scores may have come from training data contamination (test problems leaking into training sets) or from overfitting to popular leaderboards. With billions on the line, it’s not hard to imagine some labs gaming the benchmarks.
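To see what “contamination” means in practice, here is a toy sketch (not from the study; all names and data are invented for illustration) of one common probe researchers use: measuring word-level n-gram overlap between a benchmark problem and text a model may have trained on. High overlap suggests the model could have memorized the problem rather than solved it.

```python
def ngrams(text: str, n: int = 8) -> set:
    """Return the set of word-level n-grams in `text`."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def overlap_ratio(problem: str, corpus: str, n: int = 8) -> float:
    """Fraction of the problem's n-grams that also appear in the corpus."""
    prob = ngrams(problem, n)
    if not prob:
        return 0.0
    return len(prob & ngrams(corpus, n)) / len(prob)

# Invented example: the "corpus" quotes the whole problem verbatim,
# so every 8-gram of the problem is found and the ratio is 1.0.
problem = ("prove that for every positive integer n the sum of the "
           "first n odd numbers equals n squared")
corpus = ("a classic exercise asks to prove that for every positive "
          "integer n the sum of the first n odd numbers equals n squared")

print(overlap_ratio(problem, corpus))  # 1.0 -> likely seen in training
```

A genuinely fresh problem, like the 2025 questions used in the study, should score near zero against any pre-2025 corpus, which is exactly why testing on brand-new problems is such a strong check.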
👑 Google Crowns Itself King Again
Google DeepMind has launched Gemini-2.5-Pro-Exp-03-25, now ranked as the top model on the LMArena (Chatbot Arena) leaderboard. According to Google, it brings stronger reasoning, improved accuracy, and powerful coding capabilities. It supports multimodal input and features a massive 1 million-token context window, with plans to expand to 2 million.
The model is available in Google AI Studio and the Gemini app. Gemini 2.5 Pro excels on benchmarks like GPQA and AIME 2025 and scored 18.8% on “Humanity’s Last Exam.” It can reason over entire codebases, build applications, and outperforms Gemini 2.0 in reasoning and code editing. Higher rate limits and billing support are also planned.
This launch reinforces Google’s position as a leader in the AI space, now facing growing competition from open-source rivals like DeepSeek. Gemini’s performance reflects Google’s continued push in reasoning and agentic AI, giving users smarter, more context-aware tools for coding, science, and beyond.
📸 Midjourney’s Throne Faces New Rivals
In just the past few days, multiple companies have released or updated their image generation models. Yesterday, OpenAI launched native image generation in GPT-4o, with advanced generation and editing capabilities. Google’s Gemini 2.0 Flash has also drawn praise for its ability to render legible text inside images, and criticism for how easily it removes watermarks.
Reve Image 1.0 is a new contender in the space, making a bold entrance with claims of superior performance. In just 24 hours, users generated over 1.07 million images with the model, highlighting the booming interest in image generation tools.
For a long time, Midjourney was the dominant player in this space. While open-source models like Stable Diffusion and FLUX.1 have challenged its position, Midjourney’s image quality remained unmatched. Now, the landscape is shifting as more serious competitors enter the field.