OpenAI's new "Deep Research" blows ChatGPT o3-mini and DeepSeek out of the water with 26.6% accuracy in the world's hardest "AI exam" — but it skipped the line

The logos of OpenAI and DeepSeek artificial intelligence apps on mobile phones.

Deep Research holds a significant lead over ChatGPT o3-mini and DeepSeek's V3-powered R1 model. (Image credit: Getty Images | Bloomberg)

On Sunday, OpenAI unveiled Deep Research — an agentic AI tool that can conduct multi-step research on the internet for complex tasks. The ChatGPT maker says the tool can simulate a human research analyst and claims what the agent accomplishes in ten minutes would take several hours for a human equivalent.

And as it now seems, the tool is living up to the hype. According to shared benchmarks on what is arguably the hardest AI exam, Humanity's Last Exam, which was released less than two weeks ago, Deep Research holds a significant lead over ChatGPT o3-mini and DeepSeek's V3-powered R1 model (via TechRadar).

For context, the AI exam was created by subject-matter experts from around the world and features some of the most complex questions. DeepSeek previously held a significant lead over proprietary models with a 9.4% accuracy score.

However, the Chinese AI model was dethroned from the top spot following the launch of OpenAI's o3-mini model, which posted a 10.5% accuracy score. Things got a tad interesting when the setting was adjusted to o3-mini-high, pushing the accuracy score to 13%. The difference between the two settings comes down to the fact that o3-mini-high takes longer to analyze and reason when presented with a complex query.

On the other hand, OpenAI's new Deep Research agentic AI tool scored 26.6% in Humanity's Last Exam, which works out to roughly a 183% relative increase over DeepSeek R1's 9.4% accuracy score.
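For readers who want to check the math, here is a minimal sketch of how a relative increase like the 183% figure can be worked out. It assumes the figure is measured against DeepSeek R1's 9.4% score (the article does not name the baseline, but that is the one the numbers match), and the `relative_increase` helper is our own illustrative name, not part of any benchmark tooling.

```python
def relative_increase(new: float, baseline: float) -> float:
    """Return the percentage by which `new` exceeds `baseline`."""
    return (new - baseline) / baseline * 100


# Accuracy scores (%) on Humanity's Last Exam, as quoted in this article.
scores = {
    "DeepSeek R1": 9.4,
    "o3-mini": 10.5,
    "o3-mini-high": 13.0,
}
deep_research = 26.6

# Relative gain of Deep Research over each of the other models.
gains = {
    name: relative_increase(deep_research, score)
    for name, score in scores.items()
}

for name, gain in gains.items():
    print(f"Deep Research vs {name}: +{gain:.0f}%")
```

Measured this way, the gain comes to about 183% over DeepSeek R1, about 153% over o3-mini, and about 105% over o3-mini-high, so the headline percentage depends heavily on which model is used as the baseline.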

Granted, the tool ships with resourceful search capabilities, which allow it to scour the web for answers to some of the general knowledge questions featured in the complex test, ultimately giving it a competitive advantage over other models in the running.

An OpenAI employee described his experience with Deep Research as "a personal AGI moment," saying:

"Using Deep Research has been a personal AGI moment for me. It takes 10 mins to generate accurate and thorough competitive and market research (with sources) that previously used to take me 3 hours."

Kevin Okemwa is a seasoned tech journalist based in Nairobi, Kenya with lots of experience covering the latest trends and developments in the industry at Windows Central. With a passion for innovation and a keen eye for detail, he has written for leading publications such as OnMSFT, MakeUseOf, and Windows Report, providing insightful analysis and breaking news on everything revolving around the Microsoft ecosystem. While AFK and not busy following the ever-emerging trends in tech, you can find him exploring the world or listening to music.