Next generation automation with digital assistants
Digital assistants are software agents that reside on a device and help the user perform tasks in a more intuitive and efficient way. In most scenarios the digital assistant is voice-enabled; in some cases it is controlled by text or gestures. This article covers the current landscape of voice-enabled digital assistants and their application in the field of automation. The most popular digital assistants available today are Apple’s Siri, Google Assistant, Microsoft’s Cortana and Amazon’s Alexa. Here are a few predictions for digital assistants:
“By 2019, at least 25 percent of households in developed economies, digital assistants on smartphones and other devices will serve as the primary interface to connected home services” – Gartner
“The virtual digital assistant market will reach $15.8 billion worldwide by 2021” – Tractica
Digital assistant landscape
Let’s have a look at the digital assistants from different companies. The vision for each digital assistant is different and depends on the overall strategy of its company.
Cortana is an intelligent personal assistant developed by Microsoft, primarily available on Windows 10, Windows Mobile and Windows IoT Core. It has also recently been released on other platforms such as Android and iOS. It aims at automating tasks or workflows performed on Windows 10 laptops and PCs. A typical example is integration with other Microsoft productivity applications such as Skype and Outlook for voice-enabled automation of routine jobs like sending emails and messages or attending conference calls.
Apple’s Siri is available only on Apple’s own platforms, such as iPhone, iPad, Apple Watch and Apple TV. So far Siri has been confined to the Apple ecosystem and is mainly used to automate a few tasks on an iPhone or iPad, such as setting alarms, creating reminders and calling people. The recent announcement of HomePod, Apple’s version of a smart speaker, hints at Apple’s intention to join Amazon and Google in home automation.
Google Assistant is available on Android-enabled devices as well as on the smart speaker called Google Home. With Google Home, Google is clearly banking on its strategy of organizing the world’s information and offering one more way to consume it.
Amazon’s Alexa is primarily available on the Amazon Echo, a voice-enabled wireless speaker. Alexa seems more inclined towards Amazon’s core business of online retail and gives the user an easy way to order items.
Even though the business goals behind the digital assistants differ, they have one thing in common: automating day-to-day activities for consumers and enterprises.
Digital assistants in real-life automation
Digital assistant technology is still in the early stages of development and has mainly been limited to the platform each assistant supports. Recently, vendors have started opening up their digital assistants for third-party application integration. This signals the start of a paradigm shift that will change the way we interact with applications and devices.
Consider the example of booking a cab. Today the user has to go through a number of steps to book a cab. With a digital assistant such as Apple’s Siri, the user can book the cab in natural language, such as English, in a very efficient and intuitive way. This is possible because Apple allows third-party applications like Uber to integrate their capabilities with Siri. It is a great example of automating a consumer workflow for a day-to-day activity.
Technology behind the digital assistants
Let’s take a closer look at the digital assistant technology that enables consumer and enterprise automation.
Apple provides SiriKit to handle the requests, or intents, that originate from Siri. Typically, a developer implements an Intents app extension that receives the intent from Siri and turns it into application-specific actions. With SiriKit, developers can automate specific workflows in their applications. The other interesting toolkit from Apple is HomeKit, which is used to communicate with and control smart home appliances. HomeKit actions can be triggered from Siri, enabling voice-controlled home automation use cases.
Microsoft provides the Cortana Skills Kit to help developers build extensions, or skills, for Cortana that connect with third-party applications. A skill detects the user’s intent from the spoken words with the help of machine learning and sends the request to the associated application for a response. Microsoft also provides cognitive services such as LUIS (Language Understanding Intelligent Service) for natural language understanding.
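The detect-then-dispatch flow described above can be sketched in a few lines. This is only an illustration, not the Cortana Skills Kit or LUIS API: a simple keyword-overlap scorer stands in for the machine-learned model, and the intent names, keyword sets and handler functions are all hypothetical.

```python
# Toy stand-in for machine-learned intent detection; all names are hypothetical.
INTENT_KEYWORDS = {
    "send_email": {"send", "email", "mail"},
    "join_call": {"join", "call", "conference", "meeting"},
}


def detect_intent(utterance):
    """Return the intent whose keywords overlap most with the utterance, or None."""
    words = set(utterance.lower().split())
    best_intent, best_score = None, 0
    for intent, keywords in INTENT_KEYWORDS.items():
        score = len(words & keywords)
        if score > best_score:
            best_intent, best_score = intent, score
    return best_intent


# Hypothetical application handlers that the skill forwards requests to.
HANDLERS = {
    "send_email": lambda text: "Drafting an email.",
    "join_call": lambda text: "Joining the conference call.",
}


def handle_utterance(utterance):
    """Detect the intent and pass the request to the associated handler."""
    intent = detect_intent(utterance)
    if intent is None:
        return "Sorry, I did not understand that."
    return HANDLERS[intent](utterance)
```

A production skill would replace `detect_intent` with a call to a trained language understanding service; the dispatch step would stay much the same.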
Google provides the Google Assistant SDK to enable voice interaction with a device. With the SDK, developers can create their own voice request and response workflows to control a device. The SDK can turn even a small device such as a Raspberry Pi into a fully voice-enabled machine that can automate day-to-day activities.
Amazon provides the Alexa Skills Kit to add voice capabilities to an application. A skill typically handles a request and its associated intent. A developer can create a custom skill or a standard skill such as home automation. For standard home automation skills, Amazon also provides the Smart Home Skill API.
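To make the request-and-intent handling concrete, here is a minimal sketch of a custom skill's handler. It assumes the JSON envelope used by Alexa custom skills (a `request` object with a `type` and, for intent requests, an `intent` with a `name` and `slots`). The `OrderCoffeeIntent` intent and its `drink` slot are hypothetical, and a real skill would normally rely on Amazon's SDK rather than raw dictionaries.

```python
def build_response(text, end_session=True):
    """Wrap plain text in an Alexa-style response envelope."""
    return {
        "version": "1.0",
        "response": {
            "outputSpeech": {"type": "PlainText", "text": text},
            "shouldEndSession": end_session,
        },
    }


def handle_request(event):
    """Route an incoming skill request by its type and intent name."""
    request = event.get("request", {})
    if request.get("type") == "LaunchRequest":
        return build_response("Welcome. What would you like?", end_session=False)
    if request.get("type") == "IntentRequest":
        intent = request.get("intent", {})
        # "OrderCoffeeIntent" and its "drink" slot are hypothetical examples.
        if intent.get("name") == "OrderCoffeeIntent":
            drink = intent.get("slots", {}).get("drink", {}).get("value")
            if drink:
                return build_response(f"Preparing your {drink}.")
            return build_response("Which drink would you like?", end_session=False)
    return build_response("Sorry, I cannot help with that.")
```

Keeping the session open when the slot is missing lets the assistant ask a follow-up question instead of failing the request outright.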
Note that most of the development kits above are in the preview stage and will mature over time. The important point is that they are available to developers today for building the next innovative thing in consumer and enterprise automation.
Challenges for digital assistants
Let’s have a look at a few challenges faced by digital assistants and the underlying technology.
Recognizing precise words and different user accents has been a big challenge. Over the last few years machine learning has improved this significantly, but it is not yet perfect. Consumer frustration when an assistant fails to recognize the spoken word or the intent is still very common. Coupled with environmental factors such as noise and audio hardware quality, this makes speech recognition a challenging task.
Understanding the context of a verbal conversation is another challenge. The digital assistant should understand the context in which the words are spoken. A mismatch in context can result in incorrect intent detection and possibly in the wrong action being performed. In automation, this can be annoying or, in some cases, catastrophic.
The most advanced challenge in this area is identifying the user by voice and keeping track of the context for that specific user.
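One possible safeguard against acting on a misdetected intent, sketched here under stated assumptions, is to ask for confirmation whenever the detection is uncertain or the action is hard to undo. The thresholds, the intent names and the idea of a numeric `confidence` score are all hypothetical; real assistants expose such signals in different ways, if at all.

```python
# Hypothetical values chosen for illustration only.
REJECT_BELOW = 0.4
EXECUTE_FROM = 0.8
HIGH_RISK_INTENTS = {"open_garage_door", "place_order"}


def decide(intent, confidence):
    """Return 'execute', 'confirm' or 'reject' for a detected intent."""
    if confidence < REJECT_BELOW:
        return "reject"   # too uncertain: ask the user to repeat
    if confidence < EXECUTE_FROM or intent in HIGH_RISK_INTENTS:
        return "confirm"  # ask the user to confirm before acting
    return "execute"
```

Under these assumptions, a garage door command is always confirmed, while a confidently recognized low-risk request runs immediately.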
Example automation use cases
Here are a few example scenarios where automation can be performed with digital assistants.
A voice-enabled bar where, instead of a human bartender, a machine or device takes orders from the customer and prepares the perfect drink.
An interactive vending machine, where a user asks for the items they want to purchase. In turn, the machine guides the customer by voice through paying for the item and dispenses the item after a valid payment.
Human-like interaction with home appliances to perform related tasks, such as asking a coffee machine to prepare an espresso or cappuccino, asking an air conditioner to set a temperature and a timer for automatic turn-off, or instructing the garage door to open or close.
In a typical office environment, a user can ask the digital assistant to book conference rooms or instruct a leave management application to apply for leave and set an out-of-office status.
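The vending machine scenario above maps naturally onto a small state machine. The sketch below is hypothetical throughout: it assumes speech has already been transcribed into an item name, models prices in cents, and returns the text the machine would speak back.

```python
class VendingMachine:
    """Tiny state machine: idle -> awaiting_payment -> idle."""

    def __init__(self, prices):
        self.prices = prices          # item name -> price in cents
        self.state = "idle"
        self.selected = None

    def request_item(self, item):
        """Handle an already transcribed spoken request for an item."""
        if item not in self.prices:
            return f"Sorry, {item} is not available."
        self.selected = item
        self.state = "awaiting_payment"
        return f"{item} costs {self.prices[item]} cents. Please pay."

    def pay(self, amount):
        """Dispense the selected item only after a valid payment."""
        if self.state != "awaiting_payment":
            return "Please choose an item first."
        price = self.prices[self.selected]
        if amount < price:
            return f"Payment of {amount} cents is not enough."
        item, self.selected, self.state = self.selected, None, "idle"
        return f"Dispensing {item}. Change: {amount - price} cents."
```

Keeping the state explicit means an out-of-order command, such as paying before choosing an item, produces a spoken prompt rather than an unintended action.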
Digital assistants, or voice-enabled agents, will be the game changers that automate day-to-day tasks in a more human way. They will remove the intricacies of today’s automation processes and make automation pervasive in the consumer and enterprise world.