OpenAI's new "Deep Research" blows ChatGPT o3-mini and DeepSeek out of the water with 26.6% accuracy in the world's hardest "AI exam" — but it skipped the line

The logos of OpenAI and DeepSeek artificial intelligence apps on mobile phones.
Deep Research holds a significant lead over ChatGPT o3-mini and DeepSeek's R1 V3-powered model. (Image credit: Getty Images | Bloomberg)

On Sunday, OpenAI unveiled Deep Research, an agentic AI tool that can conduct multi-step research on the internet for complex tasks. The ChatGPT maker says the tool can simulate a human research analyst and claims that what the agent accomplishes in ten minutes would take a human several hours.

And as it now seems, the tool is living up to the hype. According to benchmarks shared for Humanity's Last Exam, arguably the hardest AI exam, which was released less than two weeks ago, Deep Research holds a significant lead over ChatGPT o3-mini and DeepSeek's R1 V3-powered model (via TechRadar).

Kevin Okemwa
Contributor

Kevin Okemwa is a seasoned tech journalist based in Nairobi, Kenya with lots of experience covering the latest trends and developments in the industry at Windows Central. With a passion for innovation and a keen eye for detail, he has written for leading publications such as OnMSFT, MakeUseOf, and Windows Report, providing insightful analysis and breaking news on everything revolving around the Microsoft ecosystem. While AFK and not busy following the ever-emerging trends in tech, you can find him exploring the world or listening to music.