Microsoft Copilot scores low on AI IQ tests — but that's not the full story

Microsoft Copilot has the dumb, unfortunately. (Image credit: Cheng Xin | Getty Images)

Not all brains are created equal, and that is just as true of artificial intelligence.

As big tech companies from Apple to Meta desperately try to figure out how to profit from the big AI boom, one player we're all too familiar with continues to scheme somewhat outside of the limelight.

While Meta reportedly drops tens of millions in bonuses to poach OpenAI researchers, and Apple spooks shareholders by not having a clear plan for its own AI efforts, Microsoft seems relatively content with powering the back-end for a lot of these services.

Indeed, Microsoft's mass layoffs throughout 2025 to the tune of over 15,000 employees are reportedly designed to fund a splurge on new AI-focused data centers for Azure, as Microsoft bets big on powering AI for other companies.

Microsoft has its own home-grown AI efforts, of course. Microsoft Copilot, for example, is the firm's answer to ChatGPT and other similar AI assistant apps. Microsoft has also baked "AI" features into the Photos app, Microsoft Paint, and even Notepad. So far, few people seem to care, though. And this might be at least partially why.

I recently came across a website called TrackingAI, which stacks up different models in a variety of IQ-oriented challenges. The website runs LLMs through both Mensa Norway's notoriously difficult reasoning tests and fully offline tests designed to prevent AI from surfing the web for the answers. How did Microsoft Copilot do? Well ... not particularly well (at least on paper).

Copilot seems to be struggling to catch up, or is it? (Image credit: TrackingAI)

In the offline tests, Microsoft Copilot languished at the very bottom of the barrel, with a score of 67. By comparison, OpenAI o3 Pro is the current frontrunner, hitting 117. Copilot fared a little better in the Mensa Norway test, hitting 84. Elon Musk's Grok-4 won out on the Mensa Norway test, hitting 136. OpenAI o3 Pro was close behind with 135.

But does it matter? It should be noted that Microsoft Copilot is based on GPT-4o, which prioritizes versatility, speed, and cost-effectiveness over reasoning. OpenAI's o3 models aren't generally available, because they're several times more expensive to run than GPT-4o. Most of the models that beat Microsoft Copilot are "pro" models that cost more to run. Copilot is free, and its performance generally reflects that fact.

One of Microsoft's core research areas revolves around figuring out how to make more powerful models more cost-effective. Microsoft itself reported a huge surge in carbon emissions, driven almost entirely by AI power use. Microsoft's own Phi models, which also aren't as widely available, prioritize performance as so-called Small Language Models, potentially designed to run on-device and with minimal costs. Sadly, from what I could tell, Phi models aren't listed on TrackingAI yet.

The reality with these tests is that each language model, at least as of today, is custom-purposed for specific tasks. GPT-4o and Copilot are designed to be more consumer-friendly, dare I say, "fun" to use, even if they suffer when it comes to raw academia. Copilot has deep research modes that boost its accuracy if you're willing to subscribe.

There's a ton of hype in the AI arena right now, with players like Google Gemini, X's Grok, and OpenAI's ChatGPT frequently leapfrogging each other in certain parameters. It's perhaps a bit sad that Microsoft itself, for all its investments, rarely seems to be in the conversation — unless it's negatively so, with the flop of Copilot+ PCs and the privacy backlash of Windows Recall.

Perhaps Microsoft is simply content being in the background, powering the future with Azure instead of being in the limelight.

Jez Corden
Executive Editor

Jez Corden is the Executive Editor at Windows Central, focusing primarily on all things Xbox and gaming. Jez is known for breaking exclusive news and analysis as relates to the Microsoft ecosystem while being powered by tea. Follow on Twitter (X) and tune in to the XB2 Podcast, all about, you guessed it, Xbox!
