What you need to know
- Microsoft today announced a major breakthrough in automatic image captioning powered by AI.
- Microsoft says its new computer vision model can describe images as accurately as humans can.
- The company is using the model to improve accessibility across Office apps and its Seeing AI app, and is offering it to customers through Azure Cognitive Services.
Microsoft revealed that it has achieved a breakthrough in using artificial intelligence (AI) to automatically caption images. In tests on a novel object captioning benchmark, Microsoft says its AI model described images as well as humans do. The model could help improve accessibility across the web and in apps by providing automatic image captions for people with visual impairments.
"Ideally, everyone would include alt text for all images in documents, on the web, in social media – as this enables people who are blind to access the content and participate in the conversation. But, alas, people don't," said Microsoft AI platform group software engineering manager Saqib Shaikh in Microsoft's blog post announcing the breakthrough. "So, there are several apps that use image captioning as a way to fill in alt text when it's missing."
Microsoft says this new model is coming to Outlook, Word, and PowerPoint, and it's already being used in its Seeing AI app. Further, Microsoft is bringing the capability to its Azure customers as part of Azure Cognitive Services.
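For developers, captioning of this kind is exposed through the Computer Vision "describe" operation in Azure Cognitive Services. Below is a minimal sketch of how a caption might be requested and the best result extracted; the endpoint host, resource name, and subscription key are placeholders you would replace with your own Azure resource's values, and the sample response shown is an abridged illustration of the documented shape, not real output from the new model.

```python
import json
import urllib.request

# Placeholders -- substitute the endpoint and key from your own Azure resource.
ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
API_KEY = "<your-subscription-key>"


def describe_image(image_url: str) -> dict:
    """Ask the Computer Vision 'describe' operation to caption a public image URL."""
    req = urllib.request.Request(
        f"{ENDPOINT}/vision/v3.1/describe",
        data=json.dumps({"url": image_url}).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": API_KEY,
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def top_caption(response: dict) -> str:
    """Pick the highest-confidence caption from a describe response."""
    captions = response["description"]["captions"]
    best = max(captions, key=lambda c: c["confidence"])
    return best["text"]


# Abridged example of the response shape, for offline illustration only:
sample = {
    "description": {
        "captions": [
            {"text": "a person using a laptop", "confidence": 0.92},
        ]
    }
}
print(top_caption(sample))  # a person using a laptop
```

The service returns candidate captions with confidence scores, so an app filling in missing alt text would typically take the top-scoring caption, as `top_caption` does here.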
"We're taking this AI breakthrough to Azure as a platform to serve a broader set of customers," said Lijuan Wang, a principal research manager in Microsoft's Redmond research lab, in the blog post announcing the model breakthrough. "It is not just a breakthrough on the research; the time it took to turn that breakthrough into production on Azure is also a breakthrough."
As Microsoft notes in the blog post announcing the breakthrough, this is the latest in a line of AI achievements the company has notched in recent years. The company has previously achieved human parity in speech recognition, machine translation, conversational question answering, and machine reading comprehension.
Dan Thorp-Lancaster is the Editor in Chief for Windows Central. He began working with Windows Central as a news writer in 2014 and is obsessed with tech of all sorts. You can follow Dan on Twitter @DthorpL and Instagram @heyitsdtl. Got a hot tip? Send it to email@example.com.