What you need to know
- Microsoft today announced a major breakthrough in automatic image captioning powered by AI.
- Microsoft says its new computer vision model can describe images as accurately as humans can.
- The company is using the model to improve accessibility across Office apps and its Seeing AI app, and is offering it to customers through Azure Cognitive Services.
Microsoft revealed that it has achieved a breakthrough in using artificial intelligence (AI) to automatically caption images. In tests on a novel object captioning benchmark, Microsoft says its AI model described images as well as humans do. The model could help improve accessibility across the web and in apps by providing automatic image captions for people with visual impairments.
"Ideally, everyone would include alt text for all images in documents, on the web, in social media – as this enables people who are blind to access the content and participate in the conversation. But, alas, people don't," said Microsoft AI platform group software engineering manager Saqib Shaikh in Microsoft's blog post announcing the breakthrough. "So, there are several apps that use image captioning as a way to fill in alt text when it's missing."
Microsoft says this new model is coming to Outlook, Word, and PowerPoint, and it's already being used in its Seeing AI app. Further, Microsoft is bringing the capability to its Azure customers as part of Azure Cognitive Services.
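For developers, captioning of this kind is exposed through the Computer Vision "describe" operation in Azure Cognitive Services. Below is a minimal sketch of how a caption might be requested and the best result extracted; the endpoint host, resource name, and subscription key are placeholders you would replace with your own Azure resource's values, and the sample response shown is an abridged illustration of the documented shape, not real output from the new model.

```python
import json
import urllib.request

# Placeholders -- substitute the endpoint and key from your own Azure resource.
ENDPOINT = "https://<your-resource>.cognitiveservices.azure.com"
API_KEY = "<your-subscription-key>"


def describe_image(image_url: str) -> dict:
    """Ask the Computer Vision 'describe' operation to caption a public image URL."""
    req = urllib.request.Request(
        f"{ENDPOINT}/vision/v3.1/describe",
        data=json.dumps({"url": image_url}).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": API_KEY,
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def top_caption(response: dict) -> str:
    """Pick the highest-confidence caption from a describe response."""
    captions = response["description"]["captions"]
    best = max(captions, key=lambda c: c["confidence"])
    return best["text"]


# Abridged example of the response shape, for offline illustration only:
sample = {
    "description": {
        "captions": [
            {"text": "a person using a laptop", "confidence": 0.92},
        ]
    }
}
print(top_caption(sample))  # a person using a laptop
```

The service returns candidate captions with confidence scores, so an app filling in missing alt text would typically take the top-scoring caption, as `top_caption` does here.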
"We're taking this AI breakthrough to Azure as a platform to serve a broader set of customers," said Lijuan Wang, a principal research manager in Microsoft's Redmond research lab, in the blog post announcing the model breakthrough. "It is not just a breakthrough on the research; the time it took to turn that breakthrough into production on Azure is also a breakthrough."
As Microsoft notes in the blog post announcing the breakthrough, this is the latest in a line of AI achievements the company has notched in recent years. The company has previously achieved human parity in speech recognition, machine translation, conversational question answering, and machine reading comprehension.
Dan Thorp-Lancaster is the Editor in Chief for Windows Central. He began working with Windows Central as a news writer in 2014 and is obsessed with tech of all sorts. You can follow Dan on Twitter @DthorpL and Instagram @heyitsdtl. Got a hot tip? Send it to email@example.com.