What Amazon’s Alexa Voice Profiles Means for Customer Service and Voice Biometrics

Matt Smallman

This week Amazon announced ‘Alexa Voice Profiles’ as a new feature for their best-selling voice assistant. This follows Google’s addition of a similar feature on its ‘Home’ devices earlier in the year and helps validate our view on the direction that the big four (Amazon, Google, Apple and Microsoft) are likely to take with Speaker Recognition and Voice Biometrics. In this article we will delve into how Amazon’s feature works, what happens behind the scenes and the implications for customer service and security.

How does it work?

We’re big fans, here at SymNex, of Amazon’s Alexa and have been using a number of devices for more than a year. This new feature is not yet fully available and probably won’t be in the UK for a few months, so we haven’t actually managed to get our hands on the service yet. However, there is a lot of information on Amazon’s site (LINK) that shows how it works.

Setting it up

Alexa Voice Profiles Sign Up Flow

An Alexa user can set up this new feature on any device that they are associated with, using the Alexa app on their phone or tablet. The set-up process requires them to create a “Voice Profile” under Your Voice in Settings. After giving Amazon permission to create the profile and store it in the cloud, the user is then asked to speak ten different phrases to a designated device from a short distance and in a quiet environment. These phrases are displayed on the screen and include the Alexa wake word as well as the sort of routine commands you would give the devices, e.g. playing music and getting a weather forecast. Once this is complete, Amazon says it may take up to 20 minutes for the profile to be fully working on your device.

The enrolment journey is pretty well done, if a little obscure to find, although we are confident it will soon become part of the standard new device set-up journey. Amazon handles the increasingly regulated issue of obtaining user consent well, with clear and simple language, so it’s transparent how users’ information will be used. Whilst we haven’t seen the flow for EU countries yet, it’s likely to need only a few tweaks to comply with the impending General Data Protection Regulation (GDPR), so it provides a good example for others to follow. The demo video conveniently skips seven of the ten training phrases, so the real process is likely to take several minutes, which may put some users off. Finally, there is a nice touch in allowing you to ask the question “Alexa, who am I?” in order to confirm that the profile is set up.

In order to add another user, you can either set them up as part of your Amazon household (via Amazon’s Prime or Kindle Services), or they can create their own profile by signing in to the Alexa app on your or their own device and creating a “someone else” profile.

What can you do with it?

After you are set up, Alexa can tell who you are and play your messages from the new Alexa Calling and Messaging feature, or your music from the Amazon Music Unlimited service. Where Amazon does move forward from Google’s approach is that you no longer need your voice ordering PIN to complete orders on Prime. This is particularly useful if you don’t want a doll’s house being delivered as a result of Alexa comments by oblivious TV news readers (LINK), or if you just have a Lego-obsessed, tech-savvy child! Clearly this just removes a little more friction from the impulse buying process on Amazon and, given their scale, that alone probably covers the development cost of the feature.

What’s going on behind the scenes?

Amazon Web Services Application Architecture

Image – Copyright Amazon Web Services Inc

First, we must remember that not all Alexa commands need to know who you are, and those that do, like messaging and purchasing, are likely to be longer than “Alexa, what’s the time?”. Second, the construction of these command statements is reasonably consistent so, using their already excellent speech recognition capability, it won’t be hard for Amazon to use a text-dependent algorithm against the most common words and a text-independent algorithm against the whole statement. The training phrases are probably well chosen to represent the range and frequency of these common words to make this even easier. It’s also possible that Amazon will further enhance this with other audio acquired outside the enrolment process. They have publicly said the feature will get better with time, although it’s not clear if this is in terms of accuracy or available functions.
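To make the hybrid idea concrete, here is a minimal Python sketch of how the two kinds of score might be blended. It is our own illustration, not Amazon’s method: the word list, the 0 to 1 score range, the weighting and the function name are all assumptions.

```python
# Hypothetical set of frequent command words that have per-word
# (text-dependent) models; any real list is not public.
COMMON_WORDS = {"alexa", "play", "order", "send", "message"}


def fuse_scores(recognised_words, word_scores, utterance_score, td_weight=0.6):
    """Blend text-dependent and text-independent scores for one command.

    recognised_words: words returned by speech recognition for the command.
    word_scores: text-dependent match score (0..1) per recognised common word.
    utterance_score: text-independent score (0..1) for the whole statement.
    td_weight: assumed weight given to the text-dependent evidence.
    """
    matched = [
        word_scores[w]
        for w in recognised_words
        if w in COMMON_WORDS and w in word_scores
    ]
    if not matched:
        # No common words heard: rely on the text-independent score alone.
        return utterance_score
    td_mean = sum(matched) / len(matched)
    return td_weight * td_mean + (1 - td_weight) * utterance_score


score = fuse_scores(
    ["alexa", "order", "lego"],
    {"alexa": 0.9, "order": 0.7},
    utterance_score=0.5,
)
print(round(score, 2))
```

In this toy example the two common words pull the combined score above the text-independent score on its own, which is the benefit a longer, more predictable command would bring.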

In most cases this is an identification rather than an authentication service that merely needs to identify the user from amongst a small group of potential speakers on any one device (it’s not yet clear if there is a limit to how many “someone else’s” there can be on an account). Amazon consciously avoids any mention of authentication or security in its help or marketing material. However, given the lengthy enrolment process, which is likely to create a reasonably high-quality voiceprint, and the replacement of the voice ordering PIN, it’s likely that Amazon is considering wider applications.
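The difference between identification and authentication is easier to see in a short Python sketch. This is a toy illustration under our own assumptions: voiceprints are short lists of numbers, similarity is plain cosine similarity, and the names and the 0.8 threshold are made up. Nothing here reflects how Amazon actually implements the feature.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length feature lists."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def identify(sample, household):
    """Identification: pick the closest of a small group of enrolled speakers."""
    return max(
        ((name, cosine(sample, voiceprint)) for name, voiceprint in household.items()),
        key=lambda pair: pair[1],
    )


def verify(sample, claimed_voiceprint, threshold=0.8):
    """Authentication: accept or reject one claimed identity against a threshold."""
    return cosine(sample, claimed_voiceprint) >= threshold


# Made-up voiceprints for a two-person household.
household = {"alice": [1.0, 0.0, 0.0], "bob": [0.0, 1.0, 0.0]}

name, similarity = identify([0.9, 0.1, 0.0], household)
accepted = verify([0.9, 0.1, 0.0], household["alice"])
rejected = verify([0.5, 0.5, 0.0], household["alice"])
```

Note that `identify` always returns somebody from the enrolled group, even for a visitor who never enrolled, whereas `verify` can say no. That is why an identification service can get away with less certainty than one used for security.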

It’s also clear from the disclosure that all the processing is taking place in the cloud (something Google shied away from, claiming that this element of processing is done on the device). Whilst this has all sorts of privacy and data security implications, particularly as this type of data gets more scrutiny in the future, Amazon is already accustomed to these challenges from its other lines of business. The advantage is a significant increase in convenience: for example, any new Alexa device added to a household already knows who you are, so you only have to train it once. Profiles are also portable between households, which will particularly appeal to students and other co-habiting groups.

What does this mean for the rest of us?

The topic of Voice Assistants often crops up when we are discussing new Speaker Recognition applications with clients. There is often some excitement about how the effort we put into enrolling customers in other channels might be transferable. However, the implications of this story really depend on your applications.

Within the Alexa Ecosystem

For organisations looking to provide service through the growing ecosystem of Alexa devices, it’s unclear how Amazon will allow other applications (or ‘Skills’, as they refer to them) to integrate with the voice profile service. To date there is no publicly available documentation, but given their track record this is surely just a matter of time. It’s likely that this will be similar to the existing process of associating calendars and music services through a standard OAuth (Open Authorization) login process.

When this is released it will enable these ‘Skills’ to go beyond the purely informational service they provide today, and could include meter readings, account balances and transaction reporting, as well as some low-risk transactions. The big overhead and barrier to take-up will be in getting users to associate these accounts but, fortunately, due to the design of the feature you’ll probably only have to do it once.

Unfortunately, the bad news for organisations with greater security requirements (often driven by regulation) who already have their own voice templates for customers is that Amazon continues to show no signs of allowing access to the raw audio stream. Access would have enabled them to use their own judgment as to whether the speaker was who they claimed to be, and for these activities Amazon asserting the user’s identity is unlikely to be sufficient. We think that this is a shame. We understand the privacy implications of sending the wrong audio to the wrong organisation, but are sure there are pragmatic ways to mitigate this.

Beyond Alexa

For applications beyond the Alexa ecosystem this could be the beginning of a more widely consumable Voice Recognition API (Application Programming Interface). Amazon is of course famous for building everything as something they can later sell and make available to others as an API, from contact centres (Amazon Connect) to warehouses (Fulfilment by Amazon). The theoretical API would work by submitting a claim of identity and an audio file to Amazon’s servers and, subject to the claimed identity having authorised the submitting application, Amazon would return a level of confidence that would enable the application to continue. Microsoft already has a similar service, Speaker Recognition API, available in preview under their Azure Cognitive Services banner, but the critical difference here is that Amazon could, if they choose to, re-use all the profiles already registered by Alexa devices as well as add new registrations from these applications to its database.
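The flow of such a theoretical API can be sketched in a few lines of Python. To be clear, no such Amazon service exists as far as we know: the class, method and identifier names below are hypothetical, the service is simulated in memory, and the scoring is a deliberately crude placeholder.

```python
from dataclasses import dataclass, field


@dataclass
class HypotheticalVoiceService:
    """Stand-in for an imagined identity service; not a real Amazon API."""

    voiceprints: dict = field(default_factory=dict)      # user_id -> feature list
    authorised_apps: dict = field(default_factory=dict)  # user_id -> set of app ids

    def verify(self, app_id, claimed_user_id, audio_features):
        """Return a 0..1 confidence that the audio matches the claimed user."""
        if app_id not in self.authorised_apps.get(claimed_user_id, set()):
            raise PermissionError("user has not authorised this application")
        enrolled = self.voiceprints[claimed_user_id]
        # Placeholder scoring: 1 minus the mean absolute difference, floored
        # at 0. A real engine would use a proper speaker model.
        diff = sum(abs(a - b) for a, b in zip(enrolled, audio_features)) / len(enrolled)
        return max(0.0, 1.0 - diff)


service = HypotheticalVoiceService(
    voiceprints={"user-1": [0.2, 0.4, 0.9]},
    authorised_apps={"user-1": {"energy-co"}},
)

confidence = service.verify("energy-co", "user-1", [0.2, 0.5, 0.9])
# The calling application, not the service, decides how much confidence is enough.
may_continue = confidence >= 0.9
```

The two design points worth noticing are the authorisation check before any comparison takes place, and the fact that the service returns a confidence rather than a yes or no, leaving each application to set a threshold that matches its own risk.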

Amazon could obviously choose to keep this to themselves and just use it in their own contact centres but, as they have already made the software which runs those contact centres publicly available (with Alexa integration), we are hopeful that we might see some movement here, particularly as it provides Amazon with an advantage over the likes of Twilio. This approach would overcome some of the significant burdens that more convenience-focused organisations have in reducing their dependence on knowledge-based authentication processes. It would avoid the need to procure the technology and the cost of training customers, but is not without a significant number of hurdles. The key challenge is likely to be the low penetration of voice profiles amongst the population in general, but it could still be a powerful additional feature of an Alexa-powered IVR (Interactive Voice Response), which we will be writing about shortly.

More broadly, Amazon’s confidence in allowing purchases, as well as privacy-sensitive messaging, to be made using a “Voice Profile” provides an exemplar for the emerging “Third Way” to use Voice Biometrics for Speaker Recognition. With clever design, pragmatic risk management, speech recognition and hybrid text-dependent and text-independent algorithms, it’s possible to use the short pieces of audio typically acquired in self-service applications to get enough confidence in a caller’s identity to complete these transactions.

Looking Forward

Apple’s HomePod – Image courtesy of Apple

It’s going to be really interesting to watch this story develop over the next few months as the public get their hands on it (and inevitably break some of it), and Amazon makes it more widely available in terms of customer base and functionality for third parties to access. Whilst it’s not quite what some of our enterprise clients might have been hoping for, it does have a wide range of positive implications for the future of Voice Biometrics in customer service. We’re also interested to see how Apple will handle the similar challenge they face with their imminent HomePod. We think that it is unlikely, given their track record and privacy leaning, that they can better Amazon’s Alexa Voice Profiles but are eager to test them both out as soon as they become available. Watch this space…