
Microsoft says it's reached a big speech recognition milestone

Microsoft has announced (via GeekWire) that its speech recognition technology has reached an error rate of 5.1 percent. The research team describes this as a new industry milestone that substantially surpasses the 5.9 percent error rate that the researchers achieved last year. More importantly, a word error rate of 5.1 percent matches the human error rate achieved in a commonly cited study.
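
For readers unfamiliar with the metric, word error rate is conventionally computed as the minimum number of word substitutions, deletions, and insertions needed to turn a system's transcript into the reference transcript, divided by the number of words in the reference. The Python sketch below illustrates that general definition only. The function name and sample sentences are made up for illustration, and this is not Microsoft's evaluation code, which the announcement does not describe.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words so far
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical example: one substituted word and one dropped word.
reference = "the cat sat on the mat"
hypothesis = "the cat sit on mat"
print(f"{word_error_rate(reference, hypothesis):.1%}")
```

On this toy pair, two errors against six reference words give a rate of about 33 percent. By the same arithmetic, a 5.1 percent rate works out to roughly one error in every 20 words.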

Microsoft used a number of new tools to reduce its speech recognition error rate by around 12 percent compared to last year's result. That's important because this research underpins the accuracy of services used by millions of people every day, like Cortana, Presentation Translator, and Microsoft Cognitive Services.

Reaching human parity in general speech recognition is certainly an achievement, but Microsoft says it still has some work to do in perfecting speech recognition in more complicated environments. From Microsoft:

While achieving a 5.1 percent word error rate on the Switchboard speech recognition task is a significant achievement, the speech research community still has many challenges to address, such as achieving human levels of recognition in noisy environments with distant microphones, in recognizing accented speech, or speaking styles and languages for which only limited training data is available. Moreover, we have much work to do in teaching computers not just to transcribe the words spoken, but also to understand their meaning and intent. Moving from recognizing to understanding speech is the next major frontier for speech technology.

This development follows an increased emphasis from Microsoft, as well as the tech industry as a whole, on improving artificial intelligence (AI). In addition to speech recognition, Microsoft has been pursuing image recognition, natural language processing, and much more. Microsoft researchers even taught AI to master Ms. Pac-Man, of all things.

Dan Thorp-Lancaster is the Editor in Chief for Windows Central. He began working with Windows Central as a news writer in 2014 and is obsessed with tech of all sorts. You can follow Dan on Twitter @DthorpL and Instagram @heyitsdtl. Got a hot tip? Send it to daniel.thorp-lancaster@futurenet.com.

18 Comments
  • Cortana can now work outside of the US/UK on Android & Xbox? 😂 #SorryNotSorry More seriously: I struggle to understand why it is so hard for them to make Cortana work in French on Xbox & Android (because it works "pretty well" on Windows...)
    So sad, I don't use Kinect anymore because of that. EDIT: well done MS for lowering the error rate anyways.
  • Between their translation software, and now their apparent great speech recognition I don't get it either. They must have dictionaries in place somewhere that they can apply to Cortana along with a generic voice (Unless they wanna get Jen Taylor to do syllables in every language which would be cool)
  • That isn't even the most baffling thing. What kind of irritates me is that they already have perfectly working speech recognition as well as output for these languages on Windows 10, but they just don't move it over to other platforms. It took quite a while until Cortana was available in German, for example, although Windows 10 pretty much shipped with support for the German language, also regarding Cortana ... And this is also not only about Cortana ... there are so many other technologies where Microsoft already has everything developed, but just seems to sit on ass for half a year until they move it over to other services / platforms / languages. It seems like this is just a "Microsoft thing" to do. Oh well.
  • One thing I've always been happy with Cortana about is the recognition of what I'm saying. I have a speech impediment, and it does a very good job getting every word right anyway, most of the time. The only time I've had issues with it is with the Cortrigga app which requires all your commands start with the word "Trigger". I can't hit the r's hard enough for Cortana to understand me, making the app pretty much useless. And to be fair in the context of this achievement, most humans probably wouldn't understand me saying "Trigger" either, at least not without contextual clues. Next up, give it enough functionality that I actually want to use it regularly. I mostly use it to ask for public transit directions home. I'm sure there is other value in there, but I would love to see more things like home automation or telling Cortana on one device to do something on another device (e.g., start my Xbox). 
  • In my opinion, that should be the next step for Cortana: self-awareness, to sort of know where you are and only activate itself on the closest device, so if you are in the living room and give Cortana a command, your cell phone, Xbox, PC, tablet, and laptop don't all go crazy at the same time.
  • I've seen what happens when AI becomes self aware....you don't want that....trust me. With hope, John Conner
  • 🤓
  • Was texting something about Godzilla vs Mothra, I was shocked when Cortana nailed it, capitals and all. 
  • Still she refuses to say she is better than Siri. Shy girl...
  • Speech recognition is only the tip of the AI iceberg. Translating what I said into words is the first step. Then you have to understand what I mean by those words. Cortana does that pretty well if confined to the stuff she knows about. She isn't all that good at adding that context to stuff that is said subsequently. Google Assistant is better at that. "Who is the current president?" followed by "How old is he?" confuses her. One thing they all need to implement is to continue listening for a period after providing an answer. While Google Assistant will carry on a contextual conversation (to a point), you still need to initiate each request/question. The computer on the original Enterprise was only half a century ago, so we are getting there ;)
  • To be cancelled September 2019 and restarted July 2020 with something similar but fewer features.
  • And yet Amazon rules the living room in voice activated devices. That I don't get.
  • That's because Amazon has products that people want to buy and they market and integrate Alexa. Our home thermostat died 2 days ago and I've been wanting a smart thermostat to save money and manage the temps when I travel. I picked up the top rated, Ecobee 4, and it is an actual Alexa device just like an Echo. I would rather have Cortana on my thermostat, but Alexa it is. 2 hours later, I've revived my near dead Amazon account and will be buying other home devices such as lights that work with Alexa. Oh well, Microsoft's loss. Microsoft makes me sick at times, with their ridiculous procrastination and lack of consumer vision.
  • Should work on fixing Word Flow on iOS.
  • Wordflow on iOS is dead. Long live Swiftkey. https://www.microsoft.com/en-us/garage/profiles/word-flow-keyboard/?grou...
  • She errors all the time on my Xbox One.
  • Next then will be understanding foreign languages, also based on religion, history, country, mood...
  • Speech recognition will probably also get even better when it can understand what one is trying to say. Some words simply sound exactly the same and you can only distinguish them by getting the context.