See how Microsoft Copilot stacked up head-to-head with other AI LLMs in a direct IQ test 🧠

Microsoft Copilot has the dumb, unfortunately. (Image credit: Cheng Xin | Getty Images)

Not all brains are created equal, and that is just as true of artificial intelligence.

As big tech companies from Apple to Meta desperately try to figure out how to profit from the big AI boom, one player we're all too familiar with continues to scheme somewhat outside of the limelight.

Copilot seems to be struggling to catch up, or is it? (Image credit: TrackingAI)

In the offline tests, Microsoft Copilot languished at the very bottom of the barrel, with a score of 67. By comparison, OpenAI o3 Pro is the current frontrunner, hitting 117. Copilot fared a little better in the Mensa Norway test, hitting 84. Elon Musk's Grok-4 won out on the Mensa Norway test, hitting 136. OpenAI o3 Pro was close behind with 135.

But does it matter? It should be noted that Microsoft Copilot is based on GPT-4o, which prioritizes versatility, speed, and cost-effectiveness over reasoning. OpenAI's o3 models aren't generally available, because they're several times more expensive to run than GPT-4o. Most of the models that beat Microsoft Copilot are "pro" models that cost more to run. Copilot is free, and its performance largely reflects that fact.

One of Microsoft's core research areas revolves around figuring out how to make more powerful models more cost-effective. Microsoft itself reported a huge surge in carbon emissions, driven almost entirely by AI power use. Microsoft's own Phi models, which also aren't as widely available, prioritize performance as so-called Small Language Models, potentially designed to run on-device and with minimal costs. Sadly, from what I could tell, Phi models aren't listed on TrackingAI yet.

The reality with these tests is that each language model, at least as of today, is custom-purposed for specific tasks. GPT-4o and Copilot are designed to be more consumer-friendly and, dare I say, "fun" to use, even if they suffer when it comes to raw academia. Copilot has deep research modes that boost its accuracy if you're willing to subscribe.

There's a ton of hype in the AI arena right now, with players like Google Gemini, X's Grok, and OpenAI's ChatGPT frequently leapfrogging each other in certain parameters. It's perhaps a bit sad that Microsoft itself, for all its investments, rarely seems to be in the conversation — unless it's negatively so, with the flop of Copilot+ PCs and the privacy backlash of Windows Recall.

Perhaps Microsoft is simply content being in the background, powering the future with Azure instead of being in the limelight.

Jez Corden
Executive Editor

Jez Corden is the Executive Editor at Windows Central, focusing primarily on all things Xbox and gaming. Jez is known for breaking exclusive news and analysis as relates to the Microsoft ecosystem while being powered by tea. Follow on Twitter (X) and tune in to the XB2 Podcast, all about, you guessed it, Xbox!
