Microsoft has announced (via GeekWire) that its speech recognition technology has reached a word error rate of 5.1 percent. The research team describes this as a new industry milestone that substantially surpasses the 5.9 percent error rate the researchers achieved last year. More importantly, a word error rate of 5.1 percent matches the human error rate measured in a commonly cited study.
Microsoft used a number of new tools to reduce its speech recognition error rate by around 12 percent compared to last year's result. That matters because this research underpins the accuracy of services used by millions of people every day, like Cortana, Presentation Translator, and Microsoft Cognitive Services.
Reaching human parity in general speech recognition is certainly an achievement, but Microsoft says it still has some work to do in perfecting speech recognition in more complicated environments. From Microsoft:
"While achieving a 5.1 percent word error rate on the Switchboard speech recognition task is a significant achievement, the speech research community still has many challenges to address, such as achieving human levels of recognition in noisy environments with distant microphones, in recognizing accented speech, or speaking styles and languages for which only limited training data is available. Moreover, we have much work to do in teaching computers not just to transcribe the words spoken, but also to understand their meaning and intent. Moving from recognizing to understanding speech is the next major frontier for speech technology."
This development follows an increased emphasis from Microsoft, as well as the tech industry as a whole, on improving artificial intelligence (AI). In addition to speech recognition, Microsoft has been pursuing image recognition, natural language processing, and much more. Microsoft researchers even taught AI to master Ms. Pac-Man, of all things.