What you need to know
- Microsoft today announced a major breakthrough in automatic image captioning powered by AI.
- Using computer vision, Microsoft's new model can describe images as well as humans can, according to the company's tests.
- The company is using the model to improve accessibility in Office apps and its Seeing AI app, and is offering it to customers through Azure Cognitive Services.
Microsoft revealed that it has achieved a breakthrough in using artificial intelligence (AI) to automatically caption images. In its own tests and on a novel object captioning benchmark, Microsoft says its AI model described images as well as humans do. The model could improve accessibility across the web and in apps by automatically generating image captions for people with visual impairments.
"Ideally, everyone would include alt text for all images in documents, on the web, in social media – as this enables people who are blind to access the content and participate in the conversation. But, alas, people don't," said Saqib Shaikh, a software engineering manager with Microsoft's AI platform group, in the company's blog post announcing the breakthrough. "So, there are several apps that use image captioning as [a] way to fill in alt text when it's missing."
Microsoft says the new model is coming to Outlook, Word, and PowerPoint, and it's already being used in the company's Seeing AI app. Microsoft is also bringing the capability to Azure customers as part of Azure Cognitive Services.
"We're taking this AI breakthrough to Azure as a platform to serve a broader set of customers," said Lijuan Wang, a principal research manager in Microsoft's Redmond research lab, in the same blog post. "It is not just a breakthrough on the research; the time it took to turn that breakthrough into production on Azure is also a breakthrough."
As Microsoft notes in the post, this is the latest in a string of AI milestones the company has reached in recent years. Microsoft has also achieved human parity in speech recognition, machine translation, conversational question answering, and machine reading comprehension.