Twilio’s new Streams API and Amazon’s Kinesis will increase adoption of Voice Biometrics

Matt Smallman
Last updated
18 Oct 2024

At last week’s Signal conference in San Francisco, Twilio announced the public beta of its Streams API (https://www.twilio.com/blog/media-streams-public-beta). This capability finally allows users to manipulate and process the audio from calls on the platform in real time. It’s now possible to send this audio to any endpoint on the internet and do all sorts of exciting things, including, most relevant for us, passive text-independent Voice Biometrics for speaker recognition.
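To give a sense of how little code the receiving endpoint needs, here is a minimal sketch in Python. It assumes the stream arrives as JSON text frames carrying an `event` field, with `media` events holding base64-encoded 8 kHz mu-law audio in `media.payload` and a `media.track` label, which is our understanding of the documented message format. The `CallAudioBuffer` class and the three-second threshold are purely illustrative, and the hand-off to a Voice Biometrics engine is left out.

```python
import base64
import json

SAMPLE_RATE_HZ = 8000  # assumption: 8 kHz mu-law audio, one byte per sample


class CallAudioBuffer:
    """Accumulates caller audio from Media Streams style JSON messages.

    Illustrative only: the class name and threshold are our own, not part
    of any vendor SDK.
    """

    def __init__(self, min_seconds=3.0):
        self.min_bytes = int(min_seconds * SAMPLE_RATE_HZ)
        self.audio = bytearray()
        self.stopped = False

    def handle_message(self, raw):
        """Process one JSON text frame; return True once enough audio is held."""
        message = json.loads(raw)
        event = message.get("event")
        if event == "media":
            media = message["media"]
            # Only the caller's side is useful for speaker recognition.
            if media.get("track", "inbound") == "inbound":
                self.audio.extend(base64.b64decode(media["payload"]))
        elif event == "stop":
            self.stopped = True
        return self.ready()

    def ready(self):
        """True when the buffer holds at least the minimum amount of audio."""
        return len(self.audio) >= self.min_bytes

    def seconds_buffered(self):
        """Duration of buffered audio, given one byte per 8 kHz sample."""
        return len(self.audio) / SAMPLE_RATE_HZ
```

A WebSocket handler would call `handle_message` for every frame it receives and pass `audio` to the biometric engine as soon as the method returns `True`, rather than waiting for the call to end.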

Amazon released a similar capability for its Connect contact centre platform last December (https://aws.amazon.com/about-aws/whats-new/2018/12/amazon-connect-adds-real-time-customer-voice-stream/) using its Kinesis Video Streams product. Although this is currently limited to customer-side audio, it opens up the same possibilities.

Twilio, but interestingly not Amazon, has always made the snippets of audio acquired in the self-service telephone applications that are typical for its platform available for use by other applications. It has therefore been possible to develop active text-dependent Voice Biometrics solutions on the platform for some time. VoiceIt (https://voiceit.io) has even published the code for its demonstration application (https://github.com/voiceittech/VoiceIt2-Twilio-Demo). However, this approach requires the enrolment of a static passphrase, which typically has lower customer acceptance rates and is vulnerable to presentation and replay attacks. Until now, it was not possible to access the caller’s full audio until after the call had finished, which prevented the real-time processing and feedback that are essential to deliver the seamless and transparent authentication experiences promised by text-independent Voice Biometrics. This passive approach is used by some of the most successful implementations of the technology, at organisations such as Fidelity (https://www.fidelity.com/security/fidelity-myvoice/overview) and First Direct (https://www1.firstdirect.com/banking/ways-to-bank/telephone-banking/#voice-id-security).

AWS’s infinitely extendable feature set makes integrating Voice Biometrics with Amazon Connect very simple

We’ve been designing and implementing Voice Biometrics services for more than ten years now, and while the underlying technology is proven, best practice is established and customer acceptance is high, there are significant barriers to broader adoption, which we recently wrote about (https://www.symnexconsulting.com/blog/voice-biometric-adoption-challenges/). The most obvious of these is that the costs still significantly outweigh the benefits, and a large proportion of this cost comes from the requirement to build, test and implement custom integrations with the contact centre telephony platform to acquire the customer audio.

Why is this important?

  • Easier – These new APIs eliminate the vast majority of this complexity and, more importantly, make the integration accessible to any reasonably competent developer. I’m not going to suggest that there is no work to do, but Amazon recently published a solution template demonstrating how to use its technologies for real-time text transcription (https://aws.amazon.com/solutions/ai-powered-speech-analytics-for-amazon-connect/), and the same approach works for Voice Biometrics.
Twilio’s Studio makes Voice Biometrics integration as easy as point and click
  • Standardised – For systems integrators and biometric engine vendors, both of these platforms provide a far more stable and predictable target to build against. So, as the user base of both grows rapidly, it now makes sense to invest in development upfront and spread the cost across multiple customers rather than wait for specific end-user requirements. Auraya, an Australian Voice Biometrics vendor, has already released a solution to connect its ArmorVox product to Amazon Connect (https://aurayasystems.com/eva/), and we expect more to follow its lead shortly. We also hope that some systems integrators will develop agnostic applications that allow different vendors’ Voice Biometric engines to be substituted depending on the use case.
  • Speed – The speed at which these platforms can be stood up also encourages end-user organisations and their partners to experiment and test new solutions before committing to full delivery. If end-user technologists and contact centre business owners can get their hands on this technology and play with it, without lengthy procurement processes, they will quickly understand its full potential. These demonstration systems will also make far more compelling cases to those controlling investment.
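The vendor-agnostic application we hope to see could take roughly the following shape. This is a sketch under our own assumptions, not any vendor’s API: `VoiceBiometricEngine`, `VerificationResult`, `EngineRouter` and the use-case names are all hypothetical, and `FixedScoreEngine` is a stand-in where a real adapter would call the vendor’s service.

```python
from dataclasses import dataclass
from typing import Dict, Protocol


@dataclass(frozen=True)
class VerificationResult:
    speaker_id: str
    score: float    # hypothetical scale: higher means a closer match
    accepted: bool


class VoiceBiometricEngine(Protocol):
    """Hypothetical contract that every vendor adapter would satisfy."""

    def verify(self, speaker_id: str, audio: bytes) -> VerificationResult:
        ...


class FixedScoreEngine:
    """Stand-in for a vendor adapter; returns a preset score for any audio."""

    def __init__(self, score: float, threshold: float) -> None:
        self.score = score
        self.threshold = threshold

    def verify(self, speaker_id: str, audio: bytes) -> VerificationResult:
        if not audio:
            raise ValueError("no audio supplied")
        accepted = self.score >= self.threshold
        return VerificationResult(speaker_id, self.score, accepted)


class EngineRouter:
    """Chooses an engine per use case so the call flow never names a vendor."""

    def __init__(self, engines: Dict[str, VoiceBiometricEngine], default: str) -> None:
        if default not in engines:
            raise KeyError(default)
        self.engines = engines
        self.default = default

    def verify(self, use_case: str, speaker_id: str, audio: bytes) -> VerificationResult:
        # Fall back to the default engine for use cases with no specific mapping.
        engine = self.engines.get(use_case, self.engines[self.default])
        return engine.verify(speaker_id, audio)
```

Because the call flow only ever talks to `EngineRouter`, swapping one vendor for another, or using a stricter engine for higher-risk transactions, becomes a configuration change rather than a new integration project.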

Of course, Twilio and Amazon are not the only players in the space, but they are the best known and will set the bar for others to meet. As well as its own Flex product, Twilio also provides the underlying infrastructure for a range of cloud contact centres. Several of these provide integrated solutions focused on specific industries where there may be even more significant benefits from the close linkage with CRM systems. Amazon is also rapidly winning over the world’s largest enterprises to its Contact Centre approach, with many pilots and deployments underway. We expect its increasingly large group of delivery and technology partners to use the lower cost of implementing solutions such as Voice Biometrics as part of the case for migration from legacy platforms.

Judging by the partners announced on stage, it’s pretty clear that Twilio doesn’t, in the short term, intend to develop its own Voice Biometrics solution. While Amazon has some of the required capabilities in its Alexa product (https://www.symnexconsulting.com/blog/alexa-voice-profiles-implications-customer-service-and-voice-biometrics/), it’s a reasonably big leap from this to authenticating calls for contact centres. We therefore expect Amazon to follow the partner route, but only Auraya and Pindrop are currently listed as partners, and Twilio has none.

Unfortunately, only a handful of Voice Biometric engine vendors currently offer the kind of “hand over your credit card details and get started” onboarding that users of these platforms are familiar with and expect, and those that do only provide it for active text-dependent solutions. Just as a customer-focused consent and enrolment process is essential for successful implementation of Voice Biometrics, so vendors will need to remove friction from their onboarding processes if they want to win in a rapidly scaling market.

Conclusion

This announcement underpins one of the three key trends we’ve observed that lead us to expect Voice Biometrics to be on the verge of far broader adoption. We’ll be discussing how the other two trends, developments in Voice Biometrics as a Service offerings and the use of hybrid authentication approaches in conversational IVRs, complete this hypothesis in separate articles shortly. Make sure you sign up for our email alerts if you want to be the first to receive them.
