Microsoft envisions Cortana doing much more than reminding us to pick up toilet paper on the way home from work.
Back in 2014, we Windows Phone fans could barely contain ourselves as we eagerly awaited Cortana's arrival on Windows Phone 8.1. At the time, like many writers, I had a vision of what Cortana would mean for Microsoft and mobile computing. So, well, I wrote about it.
Alas, time has moved on, and that initial fervor that fueled the "Cortana conversations" of many Windows phone fans has transitioned through other topics. The Lumia 950 and 950 XL had their time in the limelight. HoloLens enters and re-enters the conversation. Windows 10 updates are a consistent topic, further fueled by Gabe Aul's passing of the torch to Dona Sarkar as the new face for the Insider Program. And the beat goes on.
We are now more than two years down the road, and though she is not the center of fans' conversations as she once was, Cortana has remained an evolving and growing part of the Windows ecosystem dialogue. As such, she does pop up in headlines now and again as she makes strides toward Redmond's vision for her. As she has grown, my vision for Cortana as originally presented is largely unchanged. But pieces have been added to the puzzle that I did not anticipate.
Help Wanted: Evolving platform in need of AI
In my "Smartphones are dead" series, we focused on the shift of the smartphone to the form and functionality of mini-tablet computers. Microsoft's anticipated 2017 Surface "phone" (for lack of another name) will presumably lead the next hardware shift further toward an all-in-one ultra-mobile PC.
Not only has smartphone hardware shed the archaic remnants of a physical keyboard-centric past, but the ephemeral cloud-centric digital platforms that accommodate our digital lives are also in flux.
It is evident that the big players in personal computing (Apple, Google and Microsoft) are evolving their platforms to better accommodate increasingly complex mobile personal computing and the mobility of a user's digital experiences across devices. Apple's Continuity, Google's move to make Android apps available on Chromebooks and Microsoft's Universal Windows Platform are these companies' distinct approaches to the same challenge.
As we've argued previously, Apple's and Google's approaches perpetuate the current, and arguably dead-end, iteration-focused smartphone model. Microsoft's unique Universal Windows Platform combined with Continuum-enabled context-sensitive devices conversely positions Redmond on a divergent path toward an evolutionary device: an ultra-mobile PC.
We've talked a lot about the evolution of the platform and the devices that the platform currently supports or will support. However, with the even more dynamic growth of cloud computing, personal computing's transition to the cloud and the inefficiency of the current "warehouse of apps" app model, there exists a need for an equally dynamic cloud-based intelligent user interface. Enter artificial intelligence, Cortana and an ecosystem of bots.
A healthy perception
Before we get there, allow me to share a little background on how I initially saw this whole Cortana and AI thing. During the cold winter of 2014, February 25th to be precise, I published an article focused on a hot topic for Windows Phone fans, titled "Why Windows Phone 8.1, particularly Cortana must be revolutionary." Yes, revolutionary.
I posited in that piece, and in others at the time, that Cortana would be the next big thing. I suppose that, to some, my emphasis on the importance of Cortana, which many saw as Microsoft's "Johnny-come-lately" digital assistant arriving in the wake of the more mature Siri and Google Now, may have seemed to reflect a lack of awareness of what was important that year. Health and wearables were the big focus in 2014, after all.
The blogosphere, fueled by the rumor mill, leaks and various sources, was ablaze with talk of Apple's and Samsung's next wave of phones (a larger iPhone or two), wearables (Apple Watch and Samsung Gear) and these devices' integration with health monitoring technologies.
Yes, health platforms, smartphone-integrated biometric monitoring and such pumped excitement and speculation through the heart of the industry that year. Thus, Microsoft's heavy focus on artificial intelligence and Cortana, and my analysis that she was "The Next Big Thing" and not just a "digital-assistant box" checked for Microsoft, may have seemed sorely out of step with the direction of the industry. That's OK; I dare to be different. And apparently, Redmond does too. I did get some reassuring feedback from our favorite firm confirming my analysis.
That is not to say that the industry's move toward health was a miss. Far from it. Microsoft's own Health platform and the Microsoft Band are the company's cross-platform commitment to that sector. But health monitoring via our dizzying array of smart devices and wearables is more of a "thing" all of our stuff can do, rather than being the "Next Big Thing."
The "Next Big Thing," in my estimation, will bind our digital experiences and devices together with a single UI and help us more easily "do" stuff with them. Indeed, I deemed that artificial intelligence and voice, as the next evolution of the personal computer user interface after the graphical user interface, the mouse and touch, would be the "Next Big Thing." An all-encompassing and intelligent agent is arguably a necessary next step in the UI's evolution as we move more firmly into the age of "transient computing" or, as Satya Nadella likes to say, the mobility of experiences. Artificial intelligence would be essential in managing experiences as cloud computing carried a user's data across contexts and devices.
The Next Big Thing
This junction is precisely where Cortana, and my 2014 analysis, come in. With the mobility of computing having hit its stride and our digital experiences living more comfortably in the cloud, following us through life and across an expanding portfolio of devices, the way we as humans interact with these devices and experiences in such a dynamic environment requires an equally dynamic interface.
In the various pieces I wrote about Cortana in 2014, I pressed the importance of voice and digital assistants as that new user interface, beyond static input methods such as mouse and keyboard and the more dynamic but still limiting interfaces such as touch and pen. The assistant would be the common thread, an intelligent foundation that binds all of our devices and experiences together, while voice (as was my view then, though new information has since changed this assessment) would be the primary, and natural, means by which we engaged the assistant.
I envisioned a world where apps would essentially "vanish" (which is an understandable challenge for developers) as Cortana would act as the mediator between us and the things we wanted to do (through apps). I conceived something more complex than the mere voice assistant we saw in Siri and something more personal, interactive and seamless even than Google Now. Cortana in her initial and current iteration isn't there yet. But I posited that she would be.
After all, Microsoft has a sturdy foundation of over two decades of voice research. It also has a deep knowledge base in what was previously called the firm's Satori engine. Redmond also has a broad base of interconnected dual-user (enterprise/consumer) services, profound investments and advancements in machine learning, and a growing Bing search engine (or, as it was previously marketed, "decision engine") that collects essential data across a connected array of services. That's a pretty comprehensive set of resources for a comprehensive AI.
Admittedly, I did not foresee bots, the Microsoft Bot Framework or what Satya Nadella introduced at Build 2016 as Conversation Canvases. But those are indeed the tools that will help build the personal computing world of which I envisioned Cortana being a central part.
A more detailed vision
Bots and Conversation Canvases are the components missing from my original analysis. They would serve as the medium by which Cortana, and eventually the Bing Concierge Bot, would connect us to our digital experiences and facilitate our requests. I believe that these tools were even alluded to in part by former Microsoft CEO Steve Ballmer in a July 11th, 2013 memo. In this memo he described the interconnectivity of the digital resources from which the AI would draw, though he could not fully define the coming experience:
The experience we will deliver across all our devices centers on the idea of better connecting people with the things they care about most. This includes their files, documents, photos, videos, notes, websites, snippets, digital history, schedules, tasks, and mail and other messages, combined with real-time information from our devices and services. It is more than what we think of as the shell today, and no current label really fits where we are headed. Neither the desktop nor the social graph describes this new experience, and neither does the search box, the pin board or the file system. The shell will support the experiences layer and broker information among our services to bring them together on our devices in ways that will enable richer and deeper app experiences.
Ballmer painted a picture of the interconnected and transient nature of our cloud-based experiences, which is an essential foundation of our highly mobile personal computing landscape. This statement also clearly communicates the ease of access we as users would have to all of this information and the synergy that interconnectivity would lend to the varied content. Of course, this notion has been repeatedly echoed by Microsoft's current CEO, Satya Nadella. In his July 10, 2014 "Bold Ambition and Our Core" memo, Nadella stated:
Microsoft has a unique ability to harmonize the world's devices, apps, docs, data and social networks in digital work and life experiences so that people are at the center and are empowered to do more and achieve more with what is becoming an increasingly scarce commodity – time! Productivity for us goes well beyond documents, spreadsheets and slides. We will reinvent productivity for people who are swimming in a growing sea of devices, apps, data and social networks.
We will build the solutions that address the productivity needs of groups and entire organizations as well as individuals by putting them at the center of their computing experiences. We will shift the meaning of productivity beyond solely producing something to include empowering people with new insights. We will build tools to be more predictive, personal and helpful.
We will enable organizations to move from automated business processes to intelligent business processes. Every experience Microsoft builds will understand the rich context of an individual at work and in life to help them organize and accomplish things with ease.
…We will create more natural human-computing interfaces that empower all individuals.
Clearly, a CEO's corporate strategies outline goals that are five to ten years down the road. That said, though Nadella's and Ballmer's leadership styles are vastly different, and Microsoft has undergone profound shifts and reorganizations during Nadella's tenure, the current CEO has carried the baton of some of the company's long-range strategies initiated under the previous administration.
A pervasive, intelligent cloud and a cloud-based intelligent assistant have been on Redmond's roadmap for years. The technology and other industry factors, however, have only in recent years reached a point where the plan could become a reality. An easily accessible Internet, ubiquitous and always-connected powerful smartphones (computers), speech recognition and machine learning (all areas where Microsoft has invested) have finally coalesced to a point that can support Microsoft's vision for "The Next Big Thing." We are finally taking our first careful steps into the long-awaited age of the AI sidekick.
As a kid growing up in the '80s, one of my favorite television programs was Knight Rider. The booming popularity of Michael Knight and his AI sidekick, KITT, suggests that I was not the only one enamored with the series. What enticed both young and old fans of that show was not the run-of-the-mill "crime fighter saves the day" theme. It was, rather, the absolute coolness of a car that could think, act and operate autonomously. And let's not forget KITT's ability to talk with razor-sharp wit.
Of course, movies, television, books and pop culture in general have given us no shortage of artificially intelligent inanimate "beings" over the years. Lost in Space's Robot, 2001's HAL, Quantum Leap's Ziggy and even the more recent Samantha, Joaquin Phoenix's AI companion from the 2013 movie "Her," are wildly popular AI icons. The popularity of these characters in modern culture is a reflection, in my estimation, of our growing desire for that "AI sidekick" that knows us intimately and serves us selflessly throughout the day.
With the advent of Siri on one of the world's most popular platforms, iOS (and now macOS Sierra), Google Now on the world's most pervasive mobile platform, and Cortana on both as well as her evolving integration within the Windows ecosystem, AI digital assistants are commonplace. Given this fact, coupled with the aforementioned affinity we have for them, I think it safe to say that the desire for an artificially intelligent sidekick has grown beyond the realm of the "nerd's dream" and has now piqued the collective interest of the masses.
Despite pragmatic concerns about privacy and the fancied dystopian visions of machine overlords, the consensus seems to be that AI sidekicks are welcomed by the masses in the name of convenience. Nadella summed up our developing relationship with our evolving digital companions this way:
The future isn't going to be about man versus machines, it's going to be about man with machine.
The future is (almost) here
During the Build 2016 keynote, Nadella acknowledged that we are at the beginning of this journey of artificial intelligence, bots and Conversation Canvases. Microsoft has, however, laid a strong foundation upon which to build this ambitious vision, as Eric Horvitz of Microsoft Research explained in a previous interview.
First, Redmond has established itself as a leader in machine learning, deep neural networks, natural language processing and artificial intelligence. Each of these areas is an essential component of Redmond's plan to remain a leader in the future of personal computing: a future where the artificially intelligent agent, such as Cortana, is a meta-app and bots operate as "expert" extensions to the AI to accomplish a user's tasks.
In addition to being a leader in the foundational components necessary to make AI and bots a success, Microsoft is one of the world's most powerful software companies, a leading cloud service provider, and a critical IT infrastructure partner for firms around the globe. These strengths enable Microsoft to position itself as a platform company for services from cloud computing to app development, to a range of work and personal computing services, and now to the new frontier of AI and bot development and Conversation Canvases.
These strengths, combined with the firm's industry-leading work in the previously mentioned areas of machine learning, deep neural networks, natural language processing and more, position Redmond to reap the benefits of rich data for its AI platform. They also provide a broad base to extend the firm's AI and bot initiative and a pervasive ecosystem into which the company will integrate AI and bot interaction.
Cortana's progression from Windows phone to PC, to the Edge browser, iOS and Android, and her imminent arrival in Outlook with this summer's Windows 10 Anniversary Update, merely foreshadows Microsoft's pervasive plan for the intelligent agent and bots. This is only the beginning.
Speaking the same language
Whereas my initial 2014 analysis positioned Cortana and voice as the next user interface, Nadella deems human language the next UI, bots (as previously mentioned) the next generation of apps, and Cortana a meta-app. Nadella also echoed Ballmer's earlier assertion that intelligence would be infused into all of our interactions.
By targeting human language as the next user interface, Microsoft (and others) create a broad "canvas," as it were, for digital and human interaction. Human beings, after all, use a variety of tools to communicate across the "digital divide." Whatever the medium, however, the single common denominator is human language. Whether it is via voice, text, writing or even images, human language is the common "voice" across canvases such as Skype, Cortana, WeChat, Kik, Line, email, Slack, GroupMe and even SMS.
For a point of context as to the power of these canvases, Skype sees over 300 million connected users per month and over three billion minutes of voice calls per day. With this kind of user engagement as a foundation, Microsoft has made the strategic decision to create a Bot Framework, which will provide developers with the tools to connect their apps (as bots) to these common canvases.
Nadella referred to these, and other as-yet-unspecified tools we use to communicate, as Conversation Canvases. He further shared that "developers have a new opportunity to take expertise and intelligence [they] have in [their] apps and services and then register them as extensions, insights and actions."
This broad-scale approach brings developers' apps (as bots) to common and central "platforms" where infused intelligence can promote the app in useful ways to many users where they are. This strategy would ideally increase app engagement and a user's personal and/or professional productivity, not to mention potentially earn the app/bot a higher usefulness rating from users.
And just as fictional AIs that can perceive and respond to the physical world have whetted our collective appetite for digital companions, Microsoft has a similar vision for the bots that will serve us.
Microsoft's Cognitive Services, with the accompanying tagline "Give your apps a human side," aim to give developers the APIs to imbue or expand their apps' perceptive abilities beyond "seeing" and "hearing." With emotion and speaker APIs, apps will be able to detect emotion and recognize who's speaking. It is Microsoft's intent that, like .NET, these Cognitive Services become core to all applications going forward.
Preparing for the long voyage
Nadella's long-term vision of this bot-populated digital world sees bots moving well beyond text to include animation and even holograms. For my fellow nerds reading this, images of the artificially intelligent hologram, the Doctor, from Star Trek: Voyager may spring to mind. We're a long, long, long way from the Doc, but we are, with Microsoft's Universal Windows Platform, HoloLens and "bot-ready" Conversation Canvases such as Skype on HoloLens, at the beginning of the voyage.
At this stage in the journey, though, the rules are still being defined, the efficacy of the strategy is being challenged, the AI/bot infrastructure is still being built, the vision is still being communicated, and, quite frankly, many people are still struggling to wrap their heads around what this all means.
Tech giants Facebook and Google, along with Siri's creators, who now head the start-up Viv, are all approaching AI and/or bots a little differently. Facebook is using its wildly popular Messenger app, via the recently introduced Messenger Platform, as a "conversation canvas" akin to Microsoft's approach, albeit with a much narrower scope (one canvas rather than many).
Google has Google Assistant and its new intelligent messaging app, Allo, which benefits from machine learning. Apple, which has found success in its walled-garden approach, has finally opened Siri to third-party developers, as Microsoft did with Cortana two years ago (though with limited developer response).
Though Cupertino is not passionately pursuing bots, the company's increased integration of Siri within its ecosystem, and the support developers are expected to lend the assistant (because, well, Apple), may bring Apple its own brand of success on the evolving intelligent personal computing landscape. Apple's evolving penchant for openness doesn't end with Siri. Cupertino has also opened iMessage up to third-party developers. That said, the efforts of rivals make it clear that Microsoft is not alone in pushing the industry toward an AI and bot-ruled paradigm.
In our attempt to find a "mental vantage point" from which to watch this AI and bot story unfold and grasp what Microsoft and others are doing, we can focus on the following points.
One of the fundamental points to grasp is that there are many "actors" on the stage: humans, AI digital assistants and what will presumably be thousands of bots. As a result, the "conversations," which have traditionally been two-way, as in person to person or person to AI, will become increasingly multi-directional (human, AI and bot) and at times may even exclude us altogether.
Now don't feel slighted. Just as the autopilot handles the standard flight path of a passenger jet, or "Computer" autonomously maintains the life support and other tasks of the fictitious Starship Enterprise from Star Trek, bots and AI will in essence support our lives by acting as our autopilots for many tasks.
As digital assistants that get to know us begin acting on our behalf more frequently, the conversations may increasingly be between AIs and bots, more often than we may currently be comfortable conceiving. Yes, an (A)I and (B)ot conversation of which we can, for the sake of convenience, "C" our way out.
Of course, Microsoft is intent on ensuring that developers have the tools to build the bots that will facilitate our digital whims. Thus, at this very early stage of what may be a shift toward a post-app age of personal computing, Microsoft's Bot Builder SDK is the developer's doorway into Microsoft's Bot Framework. If it is adopted at the level Redmond hopes, the firm may very well become the platform for a coming bot revolution.
Approximately a year after Apple introduced the iPhone, Cupertino introduced the App Store. This move made app discovery and access on the "new and simple" platform easy, and it worked seamlessly with the consumer-friendly form of mobile personal computing. (Though the growth of this user-initiated "warehouse of apps" model has birthed its own challenges.)
Nearly ten years later, in a cloud-centric, multi-device mobile personal computing world, Microsoft has introduced a new "storefront" for the AI and bot age. Just like the bots demonstrated during Build 2016, all bots made with the Bot Framework will be discoverable by users and able to be added to conversation platforms directly from Microsoft's Bot Directory.
As the title of this piece suggests, my vision of Microsoft's AI vision has evolved. Not in that my overall perspective has changed, but in that my 2014 view of Cortana as the UI between us and the things we want to do has altered with the introduction of more information. The introduction of bots as a major part of Microsoft's strategy, and as part of a profound industry shift, was not part of my original vision. The deliberate integration of a variety of Conversation Canvases was also absent from my view. I did, however, see Cortana as overseeing and interjecting into our conversations across mediums such as messaging.
The AI as a transient user interface for our mobile experiences was also a part of my vision, though I saw voice and the AI, rather than "human language," as the next UI. Admittedly, the way I initially perceived Microsoft's AI vision was more akin to what Apple presented in relation to Siri during WWDC 2016.
Apple's opening of Siri to third-party developers, thereby enabling the assistant to interact with a variety of apps via natural language, was the end-game I saw for Cortana, which has the same API openness to developers.
Of course, there are shortcomings to this model. The user would still have to be cognizant of the app they want to use, have that app installed and then initiate it via a command to the digital assistant. As we shared in "The Untold App Gap Story," many users are simply not searching for and installing many apps. The advantage of the intelligent bot model is that a developer's app, as a bot, "introduces" itself (where applicable) when needed. Because it surfaces through communication with the digital assistant or interaction with a Conversation Canvas, the bot is not user-initiated. That said, we are still at ground level with this technology, and the players are still jockeying for position and for the minds of users.
So what's on your mind? Who has the right approach? Is there a correct approach? Or, with our affinity for the mobile web, do Google's Instant Apps throw a sneaky wrench into the whole "bots-making-apps-easier" mix?
And tell us, has your vision of Cortana and AI changed over time? Sound off in the comments and let's talk on Twitter. We're sure you have a lot to say!