Twilio's new Streams API and Amazon's Kinesis will increase adoption of Voice Biometrics

The Announcements

At last week's Signal conference in San Francisco, Twilio announced the public beta of its Streams API (https://www.twilio.com/blog/media-streams-public-beta). This capability finally allows users to manipulate and process the audio from calls on the platform in real time. It's now possible to send this audio to any endpoint on the internet and do all sorts of exciting things, including, most relevant for us, passive text-independent Voice Biometrics for speaker recognition. Text-independent Voice Biometrics does not depend on the utterance used during enrolment being repeated during authentication or identification.
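As a minimal sketch of what receiving this audio can look like, the example below collects the caller's side of a stream. It assumes the audio arrives as JSON messages with `start`, `media` and `stop` events, where each `media` event carries a base64-encoded payload of 8 kHz mu-law audio. That is our reading of Twilio's Media Streams documentation, so check the current reference before relying on it. The function names are hypothetical, and the WebSocket server and the hand-off to a biometric engine are deliberately left out.

```python
import base64
import json
from typing import Iterable

# Assumption: 8 kHz, 8-bit mu-law audio, so one byte is one sample.
SAMPLE_RATE_HZ = 8000


def collect_caller_audio(messages: Iterable[str], track: str = "inbound") -> bytes:
    """Concatenate the decoded audio payloads for one track of a stream."""
    audio = bytearray()
    for raw in messages:
        message = json.loads(raw)
        event = message.get("event")
        if event == "media":
            media = message["media"]
            # Keep only the requested track (the caller's side by default).
            if media.get("track", track) == track:
                audio.extend(base64.b64decode(media["payload"]))
        elif event == "stop":
            break  # the stream has ended; ignore anything after this
    return bytes(audio)


def seconds_of_audio(audio: bytes) -> float:
    """Duration of the collected audio under the sample-rate assumption."""
    return len(audio) / SAMPLE_RATE_HZ
```

In a real service the messages would arrive over a WebSocket and the audio would be passed to the engine incrementally rather than collected in full. How much speech an engine needs before it can return a result is vendor-specific.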

Amazon released a similar capability for its Connect contact centre platform last December (https://aws.amazon.com/about-aws/whats-new/2018/12/amazon-connect-adds-real-time-customer-voice-stream/) using its Kinesis Video Streams product. Although this is currently limited to customer-side audio only, it opens up the same possibilities.

Twilio, but interestingly not Amazon, has always made the snippets of audio acquired in the self-service telephone applications that are typical for its platform available for use by other applications. It has therefore been possible for some time to develop active text-dependent Voice Biometrics solutions on the platform. These solutions depend on the caller repeating the same utterance that was used during enrolment. VoiceIT (https://voiceit.io) has even published the code for its demonstration application (https://github.com/voiceittech/VoiceIt2-Twilio-Demo), but this approach requires the enrolment of a static passphrase, which typically has lower customer acceptance rates and is vulnerable to presentation and replay attacks.

Until now, it was not possible to get access to the caller's full audio until after the call had finished. This prevented the real-time processing and feedback that are essential to deliver the seamless and transparent authentication experiences promised by text-independent Voice Biometrics. This passive approach is used by some of the most successful implementations of the technology, at organisations such as Fidelity (https://www.fidelity.com/security/fidelity-myvoice/overview) and First Direct (https://www1.firstdirect.com/banking/ways-to-bank/telephone-banking/#voice-id-security).

AWS Connect, Lambda and Kinesis with a Voice Biometrics engine: AWS's infinitely extendable feature set makes integrating Voice Biometrics with Amazon Connect very simple

I've been designing and implementing Voice Biometrics services for more than ten years now, and while the underlying technology is proven, best practice established and customer acceptance high, there are significant barriers to broader adoption, which we recently wrote about (https://www.symnexconsulting.com/blog/voice-biometric-adoption-challenges/). The most obvious of these is that the costs still significantly outweigh the benefits, and a large proportion of this cost comes from the requirement to build, test and implement custom integrations with the contact centre telephony platform to acquire the customer audio.

Why is this important?

Easier

These new APIs eliminate the vast majority of this complexity and, more importantly, make it accessible to any reasonably competent developer. I'm not going to suggest that there is no work to do, but Amazon recently published a solution template demonstrating how to use its technologies for real-time text transcription (https://aws.amazon.com/solutions/ai-powered-speech-analytics-for-amazon-connect/), and the same approach works for Voice Biometrics.

Twilio Studio for Voice Biometrics integration: Twilio's Studio makes Voice Biometrics integration as easy as point and click

Standardised

For systems integrators and biometric engine vendors, both of these platforms provide a far more stable and predictable target to build against. So, as the user base of both grows rapidly, it now makes sense to invest in development upfront and spread the cost across multiple customers rather than wait for specific end-user requirements. Auraya, an Australia-based Voice Biometrics vendor, has already released a solution to connect its Armorvox product to Amazon Connect (https://aurayasystems.com/eva/), and we expect more to follow its lead shortly. We also hope that some systems integrators will develop agnostic applications that allow different vendors' Voice Biometrics engines to be substituted depending on the use case.
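One way to picture such an engine-agnostic application is a thin interface that each vendor adapter implements, with the engine chosen per use case. The sketch below is purely illustrative: `VoiceBiometricEngine`, `VerificationResult`, `FixedScoreEngine` and the use-case routing table are hypothetical names of ours, not any vendor's API.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


@dataclass(frozen=True)
class VerificationResult:
    speaker_id: str
    score: float
    accepted: bool


class VoiceBiometricEngine(Protocol):
    """Hypothetical interface that every vendor adapter would implement."""

    def verify(self, speaker_id: str, audio: bytes) -> VerificationResult:
        ...


class FixedScoreEngine:
    """Stand-in engine for tests; a real adapter would call a vendor's service."""

    def __init__(self, score: float, threshold: float) -> None:
        self.score = score
        self.threshold = threshold

    def verify(self, speaker_id: str, audio: bytes) -> VerificationResult:
        accepted = self.score >= self.threshold
        return VerificationResult(speaker_id, self.score, accepted)


def authenticate(
    engines: Dict[str, VoiceBiometricEngine],
    use_case: str,
    speaker_id: str,
    audio: bytes,
) -> VerificationResult:
    """Route the request to whichever engine is configured for this use case."""
    return engines[use_case].verify(speaker_id, audio)
```

Because the calling code depends only on the interface, swapping one vendor's engine for another becomes a configuration change to the routing table rather than a new integration project.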

Speed

The speed at which these platforms can be stood up also encourages end-user organisations and their partners to experiment and test new solutions before committing to full delivery. If end-user technologists and contact centre business owners can get their hands on this technology and play with it, without lengthy procurement processes, they will quickly understand its full potential. These demonstration systems will also make far more compelling cases to those controlling investment.

What will happen next?

Of course, Twilio and Amazon are not the only players in the space, but they are the best known and will set the bar for others to meet. As well as its own Flex product, Twilio also provides the underlying infrastructure for a range of cloud contact centres. Several of these provide integrated solutions focused on specific industries where there may be even more significant benefits from the close linkage with CRM systems. Amazon is also rapidly winning over the world's largest enterprises to its Contact Centre approach, with many pilots and deployments underway. We expect its increasingly large group of delivery and technology partners to use the lower cost of implementing solutions such as Voice Biometrics as part of the case for migration from legacy platforms.

Judging by the partners announced on stage, it's pretty clear that Twilio doesn't intend, in the short term, to develop its own Voice Biometrics solution. While Amazon has some of the required capabilities in its Alexa product (https://www.symnexconsulting.com/blog/alexa-voice-profiles-implications-customer-service-and-voice-biometrics/), it's a reasonably big leap from this to authenticating calls for contact centres. We therefore expect Amazon to follow the partner route, but only Auraya and Pindrop are currently listed as partners, and Twilio lists none.

Unfortunately, only a handful of Voice Biometrics engine vendors currently offer the kind of "hand over your credit card details and get started" customer onboarding approach that users of these platforms are familiar with and expect, and those that do only provide this for active text-dependent solutions. Just as a customer-focused consent and enrolment process is essential for successful implementation of Voice Biometrics, so vendors will need to remove friction from their own onboarding processes if they want to win in a rapidly scaling market.

Conclusion

This announcement underpins one of the three key trends we've observed that lead us to expect Voice Biometrics to be on the verge of far broader adoption. I'll be discussing how the other two trends, developments in Voice Biometrics as a Service offerings and the use of hybrid authentication approaches in conversational IVRs, complete this hypothesis in separate articles shortly. Make sure you sign up for our email alerts if you want to be the first to receive them.