Microsoft's neural language AI model surpasses human performance in SuperGLUE test

Microsoft Logo at Ignite (Image credit: Windows Central)

What you need to know

Microsoft's DeBERTa AI model outperformed humans in a test of natural language understanding.
The AI earned higher marks than the human baseline in the SuperGLUE test.
Google also has an AI that beats the human baseline, though Microsoft's AI model scores higher on the same test.

Microsoft invests heavily in artificial intelligence across a wide range of sectors. One of those sectors is natural language understanding, which aims to have AI models understand everyday language. This is a particularly tricky challenge for machines, but Microsoft's DeBERTa AI model recently scored higher than the human baseline in the SuperGLUE test.

As explained by Microsoft, SuperGLUE is one of the most challenging benchmarks for natural language understanding. Microsoft shares an example in its recent blog post:

Given the premise "the child became immune to the disease" and the question "what's the cause for this?," the model is asked to choose an answer from two plausible candidates: 1) "he avoided exposure to the disease" and 2) "he received the vaccine for the disease."

This is a simple question for humans. We have background information and are used to placing things within context, but it's a challenging question for AI. To answer it correctly, an AI model needs to understand cause and effect, as well as both of the options presented to it. The SuperGLUE test includes natural language inference, co-reference resolution, and word sense disambiguation, as explained by Microsoft.
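For readers who want to see the shape of such a question, here is a minimal Python sketch of the example above as a two-choice item, plus a plain accuracy check. The names `CausalItem`, `accuracy`, and `always_first` are hypothetical and chosen for illustration; this is not the SuperGLUE data format or Microsoft's code, and the "model" here is a stand-in that always picks the first option.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class CausalItem:
    """One two-choice cause/effect question (hypothetical structure)."""
    premise: str
    question: str
    choices: Tuple[str, str]
    label: int  # index of the correct choice (0 or 1)


def accuracy(items: List[CausalItem], predict: Callable[[CausalItem], int]) -> float:
    """Fraction of items where the predicted choice index matches the label."""
    correct = sum(1 for item in items if predict(item) == item.label)
    return correct / len(items)


# The example quoted in the article; the second option is the sensible cause.
item = CausalItem(
    premise="the child became immune to the disease",
    question="what's the cause for this?",
    choices=(
        "he avoided exposure to the disease",
        "he received the vaccine for the disease",
    ),
    label=1,
)


def always_first(_: CausalItem) -> int:
    """Stand-in 'model' that ignores the text and picks the first option."""
    return 0


print(accuracy([item], always_first))
```

A predictor that ignores the premise gets this item wrong, which is the point of the benchmark: picking the right option requires relating the premise to each candidate cause.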


The DeBERTa model was recently updated to include 48 Transformer layers and 1.5 billion parameters. With that update, it earned a macro-average score of 90.3 in the SuperGLUE test. The human baseline for the same test is 89.8.
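A macro-average is an unweighted mean of per-task scores, so every task counts equally regardless of how many questions it contains. The sketch below shows the arithmetic in Python. The task names and scores are placeholders invented for illustration, not DeBERTa's actual per-task results, and `macro_average` is a hypothetical helper; only the 89.8 human baseline comes from the article.

```python
from typing import Dict

HUMAN_BASELINE = 89.8  # human baseline reported for SuperGLUE


def macro_average(task_scores: Dict[str, float]) -> float:
    """Unweighted mean of per-task scores: each task counts equally."""
    scores = list(task_scores.values())
    return sum(scores) / len(scores)


# Placeholder scores for illustration only; these are NOT real results.
placeholder_scores = {"task_a": 80.0, "task_b": 90.0, "task_c": 100.0}

overall = macro_average(placeholder_scores)
print(overall, overall > HUMAN_BASELINE)
```

Because the mean is unweighted, a weak score on one small task pulls the overall number down as much as a weak score on a large one.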

Microsoft states that it will release the DeBERTa model and its source code to the public.

Microsoft explains that the DeBERTa AI model beating humans in the SuperGLUE test doesn't mean that it's as intelligent as humans.

Despite its promising results on SuperGLUE, the model is by no means reaching the human-level intelligence of NLU. Humans are extremely good at leveraging the knowledge learned from different tasks to solve a new task with no or little task-specific demonstration. This is referred to as compositional generalization, the ability to generalize to novel compositions (new tasks) of familiar constituents (subtasks or basic problem-solving skills). Moving forward, it is worth exploring how to make DeBERTa incorporate compositional structures in a more explicit manner, which could allow combining neural and symbolic computation of natural language similar to what humans do.

Microsoft's DeBERTa model isn't the first to beat the human baseline on the SuperGLUE test. Google's "T5 + Meena" model hit a score of 90.2 on January 5, 2021. Microsoft's DeBERTa model beat Google's with a score of 90.3 just a day later.

Sean Endicott is a News Writer at Windows Central, where he covers Windows 11, Surface hardware, Microsoft 365, AI, apps, and the broader PC ecosystem. Since joining the site in 2017, he has written well over a thousand articles across the Microsoft landscape, covering breaking news, analysis, and feature reporting.

He writes Windows Wrap, a weekly column covering the biggest stories in Windows and the PC industry, and what they mean for the platform going forward.

Before joining Windows Central full-time, Sean worked in journalism and media production after earning a First Class degree in Broadcast Journalism from Nottingham Trent University. Outside of tech, he is an award-winning American football coach based in Nottingham, England, and was named BAFCA Youth Coach of the Year in 2024.