Azure Speech AI: How future Xbox, PC games could say your custom character's name

Xbox Series X, Xbox Series S (Image credit: Matt Brown | Windows Central)

Microsoft's recent Game Stack Live event showcased a ton of new stuff for developers, including the DX12 Agility SDK, updates for Xbox Velocity Architecture, and much more. Many of the sessions are public on the official Game Stack YouTube channel, but we recently got our hands on some of the behind-the-scenes sessions intended for developers' eyes only.

The teams across Microsoft's Game Stack platform are doing a ton of incredible work, not only to make developers' lives easier but also to foster the next generation of gaming. One particularly compelling area revolves around Microsoft's Azure Speech services, which use cloud processing and AI algorithms to produce next-gen accessibility and gameplay features.

Microsoft's Azure team is working on "neural voice" features, which are effectively the next generation of text-to-speech processing. Your Amazon Echo speaker uses machine voice to relay information, but it sounds incredibly robotic. Microsoft is working towards AI-generated speech that sounds far more "human." A lot of these applications are oriented towards enterprise use cases, but there are applications for video games too.

Benefits of neural voice

Microsoft Flight Simulator 2020 Daher Socata TBM 930 (Image credit: Microsoft)

An obvious benefit for gaming is text-to-speech and speech-to-text for accessibility reasons. Sea of Thieves already has this to some degree for U.S. English players, but future versions will incorporate more languages, and eventually enable full speech transcription over comms in scenarios where another player may have hearing or visual impairments. Another benefit could be adding voices to large numbers of NPCs in RPGs, particularly in smaller-budget games. Microsoft also used Azure Speech Services to create some of the radio transmissions for Microsoft Flight Simulator. Given the huge range of airports, flight numbers, and other dynamic scenarios, generating that speech with AI lets Flight Simulator scale in a more cost-effective way than recording every combination.

Star Wars: The Old Republic holds the world record for most dialogue recorded for a video game, with over 200,000 fully voice-acted lines for the game's hundreds of characters. While that's an easy cost to absorb for a large publisher, neural voice features could unlock voiceovers for larger indie projects from smaller teams, who don't necessarily have the cash to spend on recording thousands of lines of speech.

Next-gen immersion

Fallout 4 (Image credit: Bethesda Softworks)

Perhaps the coolest scenario Microsoft demonstrated was the ability for games to inject your chosen character name into the game's dialogue, without disrupting the natural flow of the script. The robot Codsworth in Fallout 4 can speak the player's name if it appears on a large pre-recorded list of popular names, but if you have an unusual name like Jez, for example, it's not likely to be included in that sort of hard-coded system.

Microsoft used examples from one of its existing games, replacing generic references to the character with the player's chosen name and using AI to adapt the voice actor's speech to match. Microsoft also showed a demo featuring none other than CEO Satya Nadella, which incorporated his user-inputted name into the demo's dialogue. Hearing Azure Speech AI inject custom names into the dialogue script without sounding overly robotic felt truly next-gen. It reminded me of playing WWE: Smackdown on the PlayStation 1, wondering if there would ever be a system that would let the commentators speak my custom wrestler's name.
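To give a rough sense of the simplest form of this idea, here is a minimal Python sketch that slots a player's name into a scripted line and wraps it in SSML, the W3C markup that text-to-speech services such as Azure's accept. It is not how Microsoft's demo works (that adapted a real voice actor's recording); it only shows plain text-to-speech templating. The `build_ssml` helper and the `{name}` placeholder are hypothetical names made up for illustration, the voice name is an assumed example, and the step of actually sending the markup to a speech service is left out.

```python
from xml.sax.saxutils import escape

# Assumption: an example neural voice name; swap in whichever voice you use.
VOICE_NAME = "en-US-JennyNeural"


def build_ssml(line_template: str, player_name: str, voice: str = VOICE_NAME) -> str:
    """Fill the {name} slot in a scripted line and wrap the result in SSML.

    Hypothetical helper for illustration only; not part of any SDK.
    """
    # Escape XML special characters so a name like "Tom & Jerry" cannot
    # break the markup that is later handed to the speech service.
    safe_name = escape(player_name.strip())
    text = escape(line_template).replace("{name}", safe_name)
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">'
        f'<voice name="{voice}">{text}</voice>'
        "</speak>"
    )


if __name__ == "__main__":
    # The same scripted line can be reused for any name the player types in.
    print(build_ssml("Good morning, {name}! Ready to head out?", "Jez"))
```

The point of the sketch is that the script only needs to be written once with a placeholder, rather than recorded separately for every name on a fixed list.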

Microsoft also recently purchased Nuance, the AI speech firm responsible for Apple's Siri assistant. Clearly, Microsoft sees an opportunity in the AI speech space for consumers, despite the fact that its own assistant, Cortana, is basically dead.

Either way, Azure Speech Services are available right now on all platforms as part of Azure's Cognitive Services platform. Coupled with Nuance, it's exciting to imagine where Microsoft could push this tech in the future.

Jez Corden is the Executive Editor at Windows Central, focusing primarily on all things Xbox and gaming. Jez is known for breaking exclusive news and analysis as relates to the Microsoft ecosystem while being powered by tea. Follow on Twitter (X) and tune in to the XB2 Podcast, all about, you guessed it, Xbox!