Battling Deepfakes and Synthetic Voices: Safeguarding Voice Biometrics Systems
In this insightful webinar, we explored the growing challenge of deepfake and synthetic voice attacks on Voice Biometrics systems. With the rapid advancements in artificial intelligence and machine learning, malicious actors are finding increasingly sophisticated ways to bypass security measures, making it crucial for organisations to stay ahead of these emerging threats.
This session covered the following key topics:
- The evolution of text-to-speech and the origins of the deepfake and synthetic voice threat
- How that threat manifests itself against Voice Biometrics systems
- The range of mitigations available, and how to think about countering the threat when implementing these and other technologies
- The broader implications of synthetic voices for society as a whole
Watch the full session to equip yourself with the knowledge and tools to safeguard your Voice Biometrics systems from these emerging threats.
Matt is the author of “Unlock Your Call Centre: A proven way to upgrade security, efficiency and caller experience”, a book based on his more than a decade’s experience transforming the security processes of the world’s most customer-centric organisations.
Matt’s mission is to remove “Security Farce” from the call centre and all our lives. All organisations need to secure their call centre interactions, but very few do this effectively today. The processes and methods they use should deliver real security appropriate to the risk, with as little impact on the caller and agent experience as possible. Matt is an independent consultant engaged by end-users of the latest authentication and fraud prevention technologies. As a direct result of his guidance, his clients are some of the most innovative users of modern security technology and have the highest levels of customer adoption. He is currently leading the business design and implementation of modern security for multiple clients in the US and UK.
[00:00:00] Matt: Hi. Good afternoon, everyone, and thank you very much for joining us this afternoon for our Modern Security Community session. My name is Matt Smallman. I’m the author of Unlock Your Call Centre, tactfully positioned here just behind me. And just to introduce myself briefly: my job really is to help organizations remove those time-consuming, frustrating, and pointless security processes that we’re all too familiar with.
[00:00:24] Uh, through the Modern Security Community we’ve been doing our best to improve understanding, uh, and build the case for better security processes, particularly, uh, in consumer-facing applications. So I’m delighted that you’ve been able to join us this afternoon for our special, uh, session, where we’ll be looking at battling deepfakes and synthetic voices, uh, subtitled Safeguarding Voice Biometric Systems.
[00:00:47] Matt: The objective of this afternoon’s session really is to increase understanding of this deepfake threat to voice biometrics, discuss its real-world implications, and discuss potential mitigation approaches. This is an incredibly popular topic, but we have specifically scheduled this session after a number of previous ones, because I think it’s really important to put it in the context of both how voice biometrics works as a technology and the entire range of vulnerabilities that voice biometrics and other call center security technologies are prone to.
[00:01:21] So if you haven’t already, or you don’t have a great understanding, I’d encourage you to check out the previous sessions. We had myself and Ian McGuire running a Beginner’s Guide to Voice Biometrics, which is a really interesting session with some interesting and amusing anecdotes from Ian as well. And then a couple of weeks ago we ran Understanding and Mitigating Voice Biometrics Vulnerabilities.
[00:01:44] Both of those are available, uh, on the Modern Security Community website and I particularly encourage you to check them out because we’ve improved our video player, um, which should enable you to jump exactly to the right sections in the things that you want to know about. So on the, on the right-hand side you’ll see all of the key sections in the video, um, highlighted and just click on those and it will take you to that section.
[00:02:03] So y- you don’t have to sit through, uh, 60 minutes of, uh, Scottish jokes in order to find the nugget of insight that you, that you want. So that’s the kind of prelude.
[00:02:15] Matt: I also want to introduce, uh, our special guest in this afternoon’s session, uh, h- Haydar Talib, who’s joining us, uh, from Nuance Communications. Um, do you want to give a quick, uh, intro, Haydar?
[00:02:25] Haydar: Sure thing, Matt. Hi, everybody. Thanks for having me on today. Really interesting topic. So my name is Haydar Talib and I manage the security and biometrics research team at Nuance, which, as some of you may know, is now a Microsoft company. We actually have a long history in the space of voice biometrics; we’ve been developing voice biometrics technology for over two decades.
[00:02:51] And for reasons that will become more obvious as Matt and I have the discussion on the topic of deepfakes, we’ve also expanded the technologies that we’ve developed in this space of secure authentication and fraud detection, namely anti-spoofing technologies, as well as technological factors that are complementary to voice biometrics. So my team are the group that develops the underlying technologies that are used in our solutions.
[00:03:25] Matt: Thanks very much. So, this afternoon’s session, we’re going to go through three main areas. We’re going to talk about the evolution of text-to-speech and the origins of this deepfake and synthetic voice threat. We’re going to talk about how that manifests itself as a threat to voice biometrics, and then we’re going to discuss the range of mitigations, and the ways to think about how to counter this threat when implementing these and other technologies.
[00:03:50] At the end, we’ll have an opportunity to look at the broader implications of synthetic voices and what we, as an industry and as a set of experts in that field, might be able to do to help society as a whole tackle some of the challenges that are inevitably going to come down the road. This is a really popular and interesting topic that lots of people have thoughts, concerns, and interests about, so we would love to hear your questions. I will moderate those and drop them in at the right sections in the session.
[00:04:21] There are two ways you can do that. You can use the chat feature in Zoom. If you use the chat feature though, bear in mind that everyone else will be able to see your, your name and, and details. Um, or you can use the Q and A feature where only I can see, um, who’s asked a question. And I won’t attribute it to any particular organization when, when I’m doing that. So I’d encourage you to use both of those. The more interactivity we get the better. Otherwise, Haydar and I are gonna just kind of drift off into, uh, into a conversation about different, different aspects of this that we find interesting.
[00:04:50] But what we really want to hear from is the things that are worrying you, keeping you up at night, and how we might help, um, address those, uh, issues. Okay.
[00:04:57] Matt: So making a start then. Just to find the right buttons here. So this is the Kempelen speaking machine, from 1783 to 1784. A guy called Kempelen, a really interesting scientific inventor, came up with this, and effectively it’s mimicking the human voice, mimicking the way in which the human vocal cords create voice.
[00:05:24] And it’s quite [laughs] amazing to think that it was almost 250 years ago now that this was created. This guy is also famous for coming up with the chess-playing Turk, which was a complete fraud: a machine built to play chess that actually had a man hidden underneath. So humans have been interested, naturally, in replicating human speech since time immemorial.
[00:05:51] There are instances of people theorizing about how this could be done going back to before the current era. So this is nothing new, and you can see why there’s that interest. But really it’s accelerated in the last few years. This is the 1939 World’s Fair, where Bell Labs demonstrated the VODER system, a sort of pseudo-mechanical, electrical system for producing the phonetics that are key to the human voice.
[00:06:22] Not entirely practical, but anyone who’s following the final series of Mrs. Maisel on Amazon Prime might recognize it from a previous episode. Even going back to the 1980s, the technology and systems required to recreate even the most simplistic human voice were really challenging. This is the amount of kit that Stephen Hawking had to carry around on his wheelchair in order to create that really quite artificial voice you’ll recognize from his work.
[00:06:57] Obviously it got smaller and reduced over time, but this is just the 1980s version. Now, clearly, over time, in recent years, and in fact months, the technology has improved dramatically. I just want to hat-tip some of the organizations involved there, some of the both good and potentially bad actors in here, and also touch on why this is so important.
[00:07:24] Matt: Well, the first thing is the emergence of deep neural networks. As opposed to effectively hand-crafting the vocal cords necessary to create different words, and creating vast dictionaries of the way in which every single word is said, by exposing AI, machine learning, and deep neural networks to large repositories of human speech, we’ve been able to get the computer to effectively figure out the way to replicate human speech, without us having to do all the hard work in between.
[00:07:53] And that has created a dramatic increase in both the quality of these voices and the speed with which they can be created. Why is that of interest commercially? Well, if you think about the cost it might take to get a Hollywood actor back on set to rerecord a certain piece of dialogue, and the delay to the schedule that might cause to next year’s Hollywood blockbuster or Marvel series film, then it’s absolutely worth the time and money to do that.
[00:08:22] If you thought about how you might want to create more and more high-quality audiobooks, how you might want to localize your services in many different languages, how you might just want to improve the quality of your interactive voice response service: there are a whole host of completely legitimate commercial applications for better, higher-quality, and even cloned synthetic voices. I think the one that sticks in my mind most interestingly is that just last week Apple announced that they’ll be introducing a service in the next generation of iOS specifically designed to help people who are at risk of losing their voice capture elements of their voice, such that they can sound like themselves in the future.
[00:09:06] So there is a huge, genuine commercial, accessibility, and even public desire and need for these kinds of services. They are not going away, but there are some key characteristics that we’ll come back to later in our session that are interesting and can be picked up on again.
[00:09:28] Haydar: I think it’s worth highlighting here, and you touched upon this a little bit, Matt, that the primary use cases of these technologies actually have little or nothing to do with voice biometrics at all. I know we’re going to be looking at these through the prism of the voice biometrics world, but this point is going to come back in a subtle way shortly. The goal is to produce natural, human-sounding speech, synthesized by these advanced neural network-based technologies. But the ultimate goal, the sort of objective function they’re optimizing for, is not one that is necessarily targeted at voice biometrics.
[00:10:08] Matt: Yeah. So that’s a really important factor. I think the other things to bear in mind as you think about the origins of this technology are where that learning’s taking place. There are only a finite number of extremely large audio, spoken-word repositories on which these systems are trained. And a lot of these systems today rely on what we call pretrained models: they spend a huge amount of computational power, effort, and money to produce these pretrained models, on which a small overlay is effectively created to make the voice sound a little bit different.
[00:10:48] So the fact that there is a corpus used to create them in the first place, and the fact that there are a finite number of these pretrained models, which are very expensive and time-consuming to create, both give some clue as to how we might think about mitigating this threat as it evolves over the next few years. Cool.
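As an aside, the pretrained-model-plus-overlay pattern Matt describes can be sketched in a few lines. Everything below (the toy backbone, the shapes, the training data) is an illustrative stand-in for how speaker adaptation works in general, not any vendor's actual system:

```python
# Minimal sketch of the "pretrained model + small overlay" pattern: a large,
# frozen backbone stands in for the expensive pretrained acoustic model, and
# only a tiny speaker-specific embedding is trained on the target voice.
# All modules, shapes, and data here are illustrative stand-ins.
import torch
import torch.nn as nn

class PretrainedAcousticModel(nn.Module):
    """Stand-in for a large text-to-spectrogram backbone (kept frozen)."""
    def __init__(self, text_dim=64, spk_dim=16, mel_dim=80):
        super().__init__()
        self.encoder = nn.Linear(text_dim, 128)
        self.decoder = nn.Linear(128 + spk_dim, mel_dim)

    def forward(self, text_feats, spk_embedding):
        h = torch.relu(self.encoder(text_feats))
        spk = spk_embedding.expand(h.shape[0], -1)
        return self.decoder(torch.cat([h, spk], dim=-1))

backbone = PretrainedAcousticModel()
for p in backbone.parameters():
    p.requires_grad = False  # the expensive pretrained part stays frozen

# The "overlay": a single small trainable embedding for the target voice.
spk_embedding = nn.Parameter(torch.zeros(1, 16))
opt = torch.optim.Adam([spk_embedding], lr=1e-2)

# Dummy stand-ins for (text features, target mel-spectrogram) pairs that
# would come from a small amount of transcribed audio of the target speaker.
text_feats = torch.randn(100, 64)
target_mels = torch.randn(100, 80)

for step in range(200):  # adaptation is cheap compared to pretraining
    pred = backbone(text_feats, spk_embedding)
    loss = nn.functional.mse_loss(pred, target_mels)
    opt.zero_grad()
    loss.backward()
    opt.step()
```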
[00:11:10] Matt: So I just want to talk for a few minutes about what I see as the … ‘Cause this stuff is not new. I’ve been implementing voice biometric systems for more than a decade, and synthetic voices have existed for exactly that same period of time. This has always been what we would call a horizon risk: it was reasonably inevitable that at some point synthetic voices would be good enough to start fooling the human ear, and that they might become good enough to start fooling biometric systems.
[00:11:39] So as we’ve been thinking about that threat over the last decade, I’ve identified what I call three key boundary conditions. Uh, oh, I just put my slides back up. They are: quality, the quality of the audio that’s being produced; the amount of audio required to produce that, both its content and its duration; and the speed of that production. We’re going to have a little look at each of those in the next few minutes, ‘cause I think Haydar’s got some really interesting examples.
[00:12:11] Matt: So actually, let’s go to those now. Let’s talk first about the quality aspects, okay? So here we go. We’re going to try and run this test online. Haydar (and in fact I have as well) has created a number of synthetic versions of his voice, so we’re going to play a little game.
[00:12:31] If I can just launch the quiz? The quiz should have launched online now. I’m going to play seven different audio samples, and if you could use the quiz function to tell us whether you think they’re real or fake? And then we’ll come back to those at the end. So here’s sample number one.
[00:12:50] Haydar: Hi. This is Haydar.
[00:12:51] Matt: What do people think? Let’s have your, have your answers. Oh … Well, this is an interesting experiment, this. Okay.
[00:12:59] Indeed.
[00:13:00] And the answer is … It’s a fake, which, uh, 92% of you got correct. So, uh, so well done [laughs] everyone. Uh, next sample.
[00:13:10] Haydar: How can we help you today?
[00:13:11] Matt: Oh, there’s some hesitation going on. Not everyone’s voting quite as quickly. Okay, so 75% of people think that’s a fake, but in practice it’s real. Let’s try another one.
[00:13:25] Haydar: Yes. Of course, I could.
[00:13:26] Matt: Oh, okay. Apparently some people are having difficulty accessing the quiz, ‘cause if you hit submit then it disappears. So I’m sorry about that. You can [laughs] put your comments in the chat if you like. So that was sample three, which the one person left in the quiz (because I didn’t explain it very well, apologies) said was fake, but in practice is real. Okay. We’ll just do one or two more. Let’s do this one, because it tells you a bit about Haydar’s personality.
[00:13:58] Haydar: Marvel superhero films are not as good as they used to be.
[00:14:00] Matt: Just throw some answers in the chat. It’s a fake… there’s some reals… reals, fakes… fakes, fakes. Well, oh, there’s a consensus emerging for real. But it is, in fact, fake.
[00:14:19] Yeah.
[00:14:19] Okay. So, given the quiz has failed, we’ll move on and not necessarily cover all of the rest of those samples, but I think that makes some really interesting points, Haydar. Do you want to jump on those before we carry on?
[00:14:31] Haydar: I, I mean, I won’t, [laughs], I won’t make anyone, uh, feel, feel ashamed for their, their performance here. I think the, the key point is, you know, it’s i- indeed, these technologies have advanced to the level where they are extremely difficult to detect by the human ear and in this case, you know, you all have just heard me speak for, for a few moments.
[00:14:53] Um, but, you know, a lot of these fake, uh, synthetic voices, uh, clones of myself using some of the more advanced technologies available today, um, can indeed produce, you know, what seem to be true-to-life, um, spoofs, in so far as the human ear is concerned. Um, and I think that’s, that’s a key point, um, to bring in at this point. And, um, I know Matt, you wanted to segue into maybe the technological side of things. How do, how do they look?
[00:15:22] Matt: Yeah. There are a couple of things that flow out of that. One is that you’re listening to this on a reasonably high-definition Zoom call, which isn’t a great comparison for the quality of audio we sometimes get in the phone channel as people speak to the contact center. But more importantly, I wanted to come back to a slide that we used in a previous session, in the introduction to voice biometrics, and this talks about this decision-making.
[00:15:49] Voice biometrics is a probabilistic technology, not deterministic, so every single test produces a score, and we use this nought-to-100 scale. Everyone uses different ways of scoring it, but nought to 100 works for this purpose, and you’ll see the vast majority of genuine speakers sound like themselves, but they never sound perfectly like the enrolled samples. So up here we see the genuine speakers.
[00:16:12] A high proportion of those tend to score really well. Most imposters sound nothing like the real person they’re impersonating, so the majority of imposters score far over to the left, but they never score quite zero, because they’re still humans, probably speaking English; there are some elements of commonality in there. And what synthetic voices are really doing is trying to create something in this middle ground.
[00:16:35] We’ve drawn the curve here for illustrative purposes, and Haydar, I’m sure, can tell us about some results. What we tend to see with synthetic voices is that, as Haydar says, they’re designed to replicate the elements of the voice that humans think are most distinctive: those we would recognize in a movie, or a podcast, or when we’re listening to a localized version of a service, or when we’re listening to our great aunt’s voice that’s been preserved over time because of her loss of voice.
[00:17:06] So synthetic voices are moving the curve, but they’re not moving it to completely overlap with genuine, and this is the result of some real-world experiments. What we actually see is, yes, there is a significant proportion that gets over that threshold line, and you could move the threshold. But there is also a huge chunk that still isn’t good enough, in that case, to make the threshold. Haydar, I think you’ve done some experiments on this. Do you want to give us some-
[00:17:30] Yeah.
[00:17:30] … real-world insight?
[00:17:31] Haydar: I can get into a little bit of storytelling on how we would interpret a graph like this. I know you put this up for illustrative purposes, but I would say this is a good illustration of what the state of the art might look like today. But there’s a deeper story here that I think is interesting to highlight, and it’s a story of fraudsters trying to impersonate legitimate individuals, right?
[00:18:01] So the introduction of voice biometrics was crucial in making sure that the customers with whom you interact are really who they claim to be, the voice on the other line, as it were. And the goal of any fraudster, or non-legitimate person, is to try to be confused for the legitimate account holder, let’s say. Historically, right, like, you know, Matt, you were saying we’ve all been in the voice biometrics business for about a decade here.
[00:18:32] And 10 years ago, if you look at the technology back then, the two bumps that you showed, the imposter and the genuine, were a lot closer to one another. So the fact that we can show them now and they look far apart, even though, as you said, it’s not zero, there’s that little overlapping region somewhere in the middle, and you’ve drawn your threshold line vertically.
[00:18:58] You know, these are the pragmatic realities, and we’re not even talking about synthetic voices or deepfakes yet. We’re only talking about human malicious actors trying to commit crime against innocent customers. So the bumps used to be a lot closer to one another, meaning the trade-off between security and convenience was a lot more difficult a decision to make 10 years ago. But the same AI technologies that have benefited advances in different areas, like human-like natural TTS engines, have also benefited voice biometrics technology itself.
[00:19:40] And that’s what’s actually enabled those two bumps to separate further and further apart. Those trade-offs between allowing somebody who is not who they claim to be to be authenticated wrongly, versus the convenience and security that you provide to legitimate customers, those trade-offs are a lot easier now. The overlapping region is still there probabilistically, but it’s very, very small, and that is actually thanks to AI advances in deep neural networks and deep learning in recent years.
[00:20:09] I would say the major revolution as far as voice biometrics was concerned started approximately in 2018, and we’ve just been reaping the rewards, developing advancements in those technologies every year in the domain of voice biometrics. But where does the spoofing come in all of a sudden? Well, you can actually look at it as moving the bump: the red, which are the bad guys, the fraudsters if you will, are now again closer to the green bump, right?
[00:20:41] So, yes, for them that seems like they’re gaining ground, if they’re actually even able to use these kinds of technologies and synthetic TTS voices, and use them effectively. But the other way of looking at it is that it’s certainly not 100% overlap. So even before we talk about some of the other technologies that we’re going to talk about shortly, I think what’s interesting here is, yes, the bump has moved back closer to perhaps what the difference might have been a few years ago. Again, that’s not easy to achieve even with tools today, but it’s certainly not 100%. It’s not a foolproof way to fool a modern, state-of-the-art voice biometrics system.
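The threshold trade-off the two describe can be made concrete with a few lines of code. The Gaussian parameters below are invented purely to mimic the shape of the slide (genuine scores high, human impostors low, synthetic voices shifted into the middle ground); none of these numbers come from a real system:

```python
# Illustrative sketch of the score distributions discussed above, on the
# nought-to-100 scale Matt mentions. All parameters are made up.
import numpy as np

rng = np.random.default_rng(0)
genuine   = rng.normal(80, 8,  100_000)  # enrolled speakers score high
impostor  = rng.normal(25, 10, 100_000)  # other humans score far to the left
synthetic = rng.normal(55, 12, 100_000)  # clones: shifted right, not fully overlapping

threshold = 65.0  # the vertical line; moving it trades convenience for security

frr = np.mean(genuine < threshold)              # genuine callers wrongly rejected
far = np.mean(impostor >= threshold)            # human impostors wrongly accepted
spoof_accept = np.mean(synthetic >= threshold)  # deepfakes that clear the bar

print(f"False reject rate:    {frr:.2%}")
print(f"False accept (human): {far:.2%}")
print(f"Accepted deepfakes:   {spoof_accept:.2%}")
```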
[00:21:25] Matt: And I think that’s true when we look at those media cases. Inevitably the media showed us the videos of the one-and-done successes, and we know in practice that’s not what happens. There were dozens, if not more, attempts to successfully attack those applications. So-
[00:21:44] Yes.
[00:21:45] … this is what’s really happening, and, if we’re breaking down the threat into these different categories, I think this means it’s starting to approach technical feasibility. Yes, voices can, constraints aside, be created that are starting to be good enough, in a proportion of cases, to beat a voice biometric authentication system the way they are currently configured in some applications.
[00:22:11] Matt: There are, however, two other boundary conditions worth coming back to, and the second of those is about the amount of audio required. I think when we were trying to create synthetic voices even maybe five, six years ago, we’d have to put a recording artist in the studio for like a week [laughs] to create a generalized voice that we could use in a typical financial services application.
[00:22:35] In practice, in many cases we just used prerecorded samples because it just wasn’t worth the effort. And that has come down significantly: I think all of the targets that we’ve seen in the media reports have talked about in the range of 90 seconds to 15 minutes of audio being used to create these samples. So that starts to become practical. That starts to become practical if somebody is a high-profile individual with a lot of YouTube videos of them available online; then there starts to be a high volume of high-quality audio available.
[00:23:08] And the other thing that’s improved recently is the ability to transcribe that audio effectively and automatically, because it’s not enough just to have samples: we need to know what they’re saying, because that’s the input required to train these models. And thinking outside the voice biometrics arena directly, we have seen in the U.S. a couple of instances of parents being extorted for money using synthetic voices of their children to suggest that they’d been kidnapped, which is a topic we’ll come back to later.
[00:23:46] Maybe the bigger threat of this technology is beyond our scope, and is to the consumer rather than to the enterprise. So the amount of audio required has dramatically reduced, and Microsoft, of which Nuance is now part, is one of the players in this. There are models in existence now requiring five to 10 seconds of audio to create a human-like sample.
[00:24:12] But I think from some of our own research we can say that, whilst they’re not available in the wild, the samples we do have access to suggest that, whilst they might sound okay, they’re probably not going to defeat a biometric system today. I don’t know if you’ve got more to add on those?
[00:24:26] Haydar: Yeah. That’s exactly right, and that example of the grandparent being taken advantage of by a very clever fraudster who somehow has impersonated their grandchild, possibly using something like this. It goes back to that quiz you just ran for everybody, right? What sounds real or natural to the human ear is not necessarily good enough to fool the machine.
[00:24:50] For sure, the Venn diagram has a big overlapping region between natural-sounding speech and what voice biometrics is doing, but it’s not a full overlap. A state-of-the-art voice biometrics technology is looking for a plurality of characteristics in the frequency domain that are related to and associated with voice, but not necessarily in the particular recipe or combination in which these TTS technologies are producing the speech. So it’s those kinds of things that, yes, make them sound good to the human ear, but not necessarily good enough to fool a voice biometrics engine.
[00:25:31] Matt: That’s right. Thank you. And the final category of boundary conditions that we continue to track, and the one that’s least met today, is really about speed of output. These systems, even the best state-of-the-art systems, often require 30, 40 seconds to generate a new phrase or utterance. We are not yet at the stage where I can type and the words will come out, because actually you need to know what the next word is in order to figure out how to make the current word sound correct.
[00:26:03] So there is almost an in-built delay in how these systems work, which our human minds take care of without even worrying, apart from the ums and uhs that I inevitably drop into sessions like [laughs] this. But that’s something that isn’t really available yet. I suspect it’s a function of computational power and it will emerge over time. But that means that systems that are predictable in terms of the input they require are far more vulnerable to this form of attack than unpredictable systems.
[00:26:34] And the biggest unpredictable system in the world is the call center agent: what they might ask, and what they might do next. So there is still some reduced feasibility as a result of the way in which those interactions take place. Yes, in an IVR, where the interactions are reasonably predictable, it’s far more likely that a fraudster could pre-prepare the required utterances. But when they get into a conversation with a human, being able to maintain a consistent human conversation is pretty challenging without significant amounts of effort. So, do you have anything to add on that area, Haydar?
[00:27:11] Haydar: No. I think you’ve covered it. You’re absolutely right.
[00:27:14] Matt: So, just to summarize where we are on the threat spectrum: yes, absolutely, these forms of attack are feasible. There are constraints on fraudsters’ ability to execute them, but there is almost certainly a more significant constraint on their intent and desire to use them, which we’ll come back to later when we look at this threat in context of the whole. But right now, we’re just talking about the theoretical vulnerability of this technology.
[00:27:43] Matt: What we want to move onto next is thinking about how we might mitigate this threat, and there are a couple of areas worth thinking about. We’ll wrap these up in a summary at the end, but I think of this in three buckets, and we’ll dig into each of them. The first is what I call the first-line defense, and that is the biometric system itself. A customer who is not enrolled is far more vulnerable than a customer who is enrolled, because I don’t …
[00:28:10] To sound like them, I don’t need to pass any test. So that’s the first line of defense. But then, as we’ve talked about in these sessions, the curve isn’t exactly overlapped with a real customer. There are many attempts that fail to meet the required threshold, and that in its own right is detectable.
[00:28:35] Implementing simple rate-limiting methodologies, either permanently or temporarily locking voiceprints and accounts, restricts a fraudster’s ability to exploit them. As does tuning and calibrating your system to be optimized for your particular environment, your particular acoustic characteristics, and your particular customers. So the biometric system itself is really the first line of defense, and I think it actually provides more security and safety than media reports might have suggested. Haydar, I’m sure you’ve got some thoughts there.
[00:29:12] Haydar: Yeah. I certainly don’t want to [laughs] start a debate about journalistic integrity here or anything, but for sure, some of those articles perhaps lean a little bit towards sensationalization, I think is where we’re going with this. They are indeed showing the one attempt that succeeded, and indeed that can certainly be something that is demonstrated.
[00:29:36] It’s also worth recalling the point that even the fake audios of myself that you all heard, well, those were generated with my full participation, and I was able to iterate and reiterate on the voice until it sounded true to life, providing as much audio of myself as the system needed. So it’s kind of like stacking the odds.
[00:29:57] Everything that you do like that, which is not something a fraudster could necessarily, realistically pull off with ease, obviously gives an advantage towards the example of running the synthetic speech attack. And then finally, if you can try it any number of times and only play back the one time it fooled the system, as it-
[00:30:20] Hmm.
[00:30:20] … were, um, then of course, you’re, you’re painting a certain picture that again, is not necessarily realistic.
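Matt's rate-limiting point above (failed attempts are themselves detectable, and locking a voiceprint after repeated failures removes a fraudster's ability to iterate the way Haydar could) reduces to a simple pattern. A minimal sketch, with illustrative parameters and in-memory state rather than anything a production system would use:

```python
# Minimal sketch of rate-limiting failed voice-biometric verifications: after
# N failures in a rolling window, temporarily lock the voiceprint. Thresholds,
# windows, and storage here are illustrative assumptions only.
import time
from collections import defaultdict, deque

MAX_FAILURES = 3
WINDOW_SECS = 24 * 3600   # count failures over a rolling day
LOCK_SECS = 72 * 3600     # temporary lock, e.g. pending manual review

failures = defaultdict(deque)  # account_id -> timestamps of failed attempts
locked_until = {}              # account_id -> unlock time

def record_attempt(account_id: str, score: float, threshold: float) -> str:
    now = time.time()
    if locked_until.get(account_id, 0) > now:
        return "locked"                   # route to an alternative process
    if score >= threshold:
        failures[account_id].clear()      # success resets the failure count
        return "accepted"
    q = failures[account_id]
    q.append(now)
    while q and q[0] < now - WINDOW_SECS:  # drop failures outside the window
        q.popleft()
    if len(q) >= MAX_FAILURES:
        locked_until[account_id] = now + LOCK_SECS
        return "locked"
    return "rejected"
```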
[00:30:26] Matt: That’s great. Thanks for that. The second bucket then, and you will definitely have more comments on this, is what I would call feature detection. There are a couple of characteristics of this kind of attack that maybe today we haven’t prioritized detection of, but that are certainly feasible to detect. We’ll look at each of these in turn, but the first is the fact that the audio has to be played back through some form of device.
[00:30:51] Presentation attacks are something that we’ve been dealing with in the voice biometrics space for a while, and these attacks are in fact no different from other presentation attacks: they have to be generated, and they have to be played back using speakers and microphones in most cases. The second area is watermarking, an increasingly important emerging field, which we’ll come back to. And then finally, as we talked about a little bit earlier, there’s actually detecting the characteristics of the text-to-speech engines themselves that are creating these voices, whether that’s the processes they use or the corpora on which they’re trained.
[00:31:26] So let’s look at each of those in a bit more detail. Playback, or presentation attacks, is obviously something we’ve had to deal with for as long as voice biometric systems have existed, and there are distinct characteristics in some cases that can be detected, which provide an opportunity to prevent cursory and trivial attacks. And those technologies have also improved at a rate of knots given the advances in AI and DNNs. So, Haydar, how are your latest playback and recording detection processes?
[00:32:04] Haydar: Um, I mean, I w- I won’t get too far into the specifics, but, you know, I think philosophically, so, you know, everything I said earlier about voice biometrics, that was only considering it on its own. But, um, you know, just speaking from my, my personal, uh, experience and where I’ve been working for, for the last number of years, um, as I mentioned earlier, we’ve actually gone on to develop additional technologies. And the reason for that is because, you know, as, as strong as voice biometrics might be on its own, it’s just safer and perhaps even wiser to, um, you know, deploy technologies without allowing for, let’s say a single point of failure, right?
[00:32:44] So what you want to do is now look into all the different things you can do to potentially ensure robustness of your solution as a whole, right? No longer think of voice biometrics as something on its own, but think of voice biometrics complemented by additional security features, or additional factors, that can ensure robustness. So playback detection is what we would refer to as the feature that helps with detecting presentation attacks. The way we frame this is under the anti-spoofing module, which also includes other detection features. I think we’re gonna touch upon these other-
[00:33:25] Okay, yeah.
[00:33:26] … technologies as well, but some of these other, other features that we, we aim to detect as well. Um, on the playback specifically, I mean, yes. It is, it is basically, uh, at this point a deep neural network, um, based technology. I would say for all of our features, uh, in our suite of, of factors, they’re all fundamentally based on deep neural network, uh, methods.
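As a toy illustration of that feature-detection idea, and emphatically not Nuance's actual DNN-based module, the sketch below trains a simple classifier on hand-crafted spectral features to separate "genuine" audio from "replayed" audio. The synthetic data exaggerates one real cue: playback through a speaker and microphone often dulls high frequencies:

```python
# Toy playback-detection sketch: spectral features + a linear classifier.
# Real anti-spoofing systems use deep networks trained on large labelled
# corpora; the "genuine" and "replayed" data here are synthetic stand-ins.
import numpy as np
from sklearn.linear_model import LogisticRegression

def spectral_features(audio: np.ndarray, sr: int = 8000) -> np.ndarray:
    spectrum = np.abs(np.fft.rfft(audio))
    freqs = np.fft.rfftfreq(len(audio), d=1.0 / sr)
    centroid = np.sum(freqs * spectrum) / np.sum(spectrum)
    rolloff = freqs[np.searchsorted(np.cumsum(spectrum), 0.95 * spectrum.sum())]
    hf_ratio = spectrum[freqs > 3000].sum() / spectrum.sum()  # replay dulls highs
    return np.array([centroid, rolloff, hf_ratio])

rng = np.random.default_rng(1)
# Stand-in corpora: broadband noise vs. the same noise low-pass filtered,
# mimicking the muffling effect of a playback loudspeaker.
genuine = [rng.normal(size=8000) for _ in range(200)]
replayed = [np.convolve(rng.normal(size=8000), np.ones(8) / 8, "same")
            for _ in range(200)]

X = np.array([spectral_features(a) for a in genuine + replayed])
y = np.array([0] * 200 + [1] * 200)  # 0 = genuine, 1 = spoof
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("training accuracy:", clf.score(X, y))
```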
[00:33:49] Matt: Cool. The other category that’s quite interesting among these emerging areas is watermarking. If we just think back to that slide with all the logos on a few minutes ago: none of those organizations want to be associated with this kind of activity. To some extent they’ve all been caught by surprise by how the technology they invented, matured, and developed for completely legitimate commercial reasons has been misused in the way in which it has.
[00:34:20] ElevenLabs most famously: whilst they were actually the service that was used to create most of the recent press articles, in practice they have also been used to create deepfake videos and voiceovers of famous celebrities in very awkward and disturbing situations. And they have been forced to change their ways of operating in order to reduce the damage to them and their brand.
[00:34:50] And increasingly, nearly everyone on that slide would stand up and tell you about their responsible AI practices: recognizing the potential harms and doing what is necessary to mitigate those risks. So watermarking is something we’ve been talking about for probably as long as I’ve been involved, but it’s something that’s come to fruition literally in the last six months, as organizations have recognized that they really need some traceability and accountability for the voices that they’re creating. There don’t appear to be-
[00:35:21] Yes, I agree-
[00:35:21] Oh, go, go on, Haydar, yeah.
[00:35:22] Haydar: … I agree with that. Sorry to cut in. I think you’re touching upon an important theme here, right? The providers of the technologies need to continue to recognize their responsibility: they provide potentially powerful capabilities that may be abused outside the original intent of the technology, but they have a responsibility nonetheless. And watermarking is indeed one thing the providers of these lifelike TTS technologies can do to help the community as a whole continue to operate in a trustworthy manner.
[00:36:00] But I would say that we don’t rest on our laurels, right? As technology providers, or those who deploy these kinds of technologies, we essentially can’t take for granted that the watermarks will exist. So we still have to continue collectively developing these countermeasures, staying up to date with voice biometrics evolution and with anti-spoofing measures, assuming that there are no watermarks. But certainly when they are there, that will definitely help the overall defensiveness of these kinds of solutions.
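The watermarking concept can be sketched at its simplest: embed a low-amplitude pseudorandom signature keyed by a secret seed, then detect it later by correlation. Real TTS watermarks are far more robust (they have to survive re-encoding and the phone channel); this toy version only shows the underlying idea:

```python
# Conceptual sketch of audio watermarking via a secret-keyed spread-spectrum
# signature. Amplitudes, thresholds, and the detection statistic are
# illustrative, not any provider's actual scheme.
import numpy as np

def embed_watermark(audio: np.ndarray, seed: int, strength: float = 0.005) -> np.ndarray:
    rng = np.random.default_rng(seed)
    signature = rng.choice([-1.0, 1.0], size=audio.shape)  # secret-keyed pattern
    return audio + strength * signature

def detect_watermark(audio: np.ndarray, seed: int, threshold: float = 3.0) -> bool:
    rng = np.random.default_rng(seed)
    signature = rng.choice([-1.0, 1.0], size=audio.shape)
    # Normalized correlation: a large positive value means the watermark is present.
    stat = np.dot(audio, signature) / (np.std(audio) * np.sqrt(len(audio)))
    return stat > threshold

rng = np.random.default_rng(42)
clean = rng.normal(0, 0.1, 80_000)          # 10 s of stand-in audio at 8 kHz
marked = embed_watermark(clean, seed=1234)

print(detect_watermark(marked, seed=1234))  # True: signature correlates
print(detect_watermark(clean, seed=1234))   # False: no watermark present
```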
[00:36:35] Matt: Yeah. And I think there are some challenges with watermarks. Right now, though every provider has them, not every provider is making them publicly, or even privately, available to those deploying detection technology. So I think we’re in a bit of an awkward period where this is emerging. There are regulations being proposed in Europe and regulation being discussed in the U.S. I think this will settle down in the medium term, so that these kinds of commercial sources are likely to be protected in this way.
[00:37:04] And we’ve already seen improved consent-based processes with people like Resemble. I think ElevenLabs doesn’t require an audio consent statement, but pretty much everyone else now requires the real speaker to read out that they are happy for a voice to be created, and that is manually checked before a voice is created. So processes are emerging that will constrain the abilities and sources for their … That’s not to say, however, that there won’t be ways to circumvent those, and that there might be sources that don’t use that technology.
[00:37:34] So that leads us on to that final question, Haydar, which is: can you actually detect the characteristics of either the pretrained models, the way in which they’re developed, the way in which they’re overlaid, or the corpora themselves? How effectively are you able to do that?
[00:37:51] Haydar: Um, I’ll, I’ll speak broadly here, right? So this, that’s a whole area of, of, of research and it’s a whole area of technology and development. I mean, it’s certainly something we’ve included in our, in our practice for the better part of a decade. So in a sense, again, just speaking through, you know, the lens of my own experience in this, in this industry and obviously working at Nuance, um, you know, the, the problems are not new in and of themselves. What’s, what’s really new here is the, uh, revolution in the quality of these types of attacks, right, the synthetic speech and, uh, and so on.
[00:38:27] Perhaps even the access to data is far more prevalent now than it was a decade ago when we first started in this area. Certainly there’s a bit of a cat-and-mouse-game aspect to this, right? And it goes back to that idea: if you’re reading between the lines, you will hear we have said things like modern systems and, you know-
[00:38:49] [laughs]
[00:38:50] … contemporary systems. The technologies for voice biometrics, for synthetic speech detection, for playback detection, the things we develop that make the solution as a whole continue to be effective and robust, those themselves evolve along with what we’re seeing in the TTS technology evolution. So it does become important to understand how you keep pace with those technologies. For sure, the vendors and providers of the technology want to keep evolving their own solutions; it’s in their best interest.
[00:39:27] They make sure that they’re not vulnerable to things like sophisticated TTS attacks. But, yeah, it’s important to stay current in terms of developing the technologies, but also in deploying the technologies and staying up to date with the current thing. And, yes, unfortunately, there’s a little bit of a cat-and-mouse dimension to that if you really look at it at the technical level.
[00:39:50] Matt: And I think it’s reached a point that fortunately coincides with the move of a large number of implementations to the cloud, but I liken this to going back to the early noughties: updating your virus definitions once a week just to make sure you’ve got the latest patch for the-
[00:40:07] Haydar: Yeah.
[00:40:07] Matt: … for the latest things that emerged. Because I think, to some extent, for a period of time, whether that’s a few months or a few years, this will be a game of Whac-A-Mole. It’s not that these things aren’t detectable and preventable; it’s that the speed of that cycle, to observe and act on what you see, is going to be critical. So that for me would be one of the major takeaways for organizations battling with this risk right now.
[00:40:34] Matt: Just conscious of the time, and people may have questions. We’ve got one very interesting question already come up in the chat, which I will save because I need to read it properly. The final bit I want to talk about is, for me, actually quite positive. What this range of attacks has shown us is that voice biometrics is not impenetrable. It never was; it was always a probabilistic system.
[00:40:58] There were always risks to it. But they have shown us that it’s not the golden bullet, or the silver bullet I think is probably the correct analogy, to every security problem. It needs to be implemented as part of a layered, considered, and deliberate approach to security. And therefore, when you combine voice biometrics with other forms of detection, and I’m not talking about going back to passwords and PINs here.
[00:41:24] I’m talking about leveraging the continued advances in technology: things like network authentication, things like device identification and management, such that, without requiring any additional customer effort, you can add these factors together. I think Haydar described it as a security web rather than a chain. You shouldn’t rely on a chain, where breaking one link breaks the system; if you have a web, you can respond depending on where the weight of the attack is.
[00:41:52] Haydar: Yeah.
[00:41:53] Matt: And I know, Haydar, and I didn’t know this until earlier in the week, you actually have a patent yourself on ConversationPrint-
[00:42:00] Right.
[00:42:00] … which is kind of a- an approach to this. I don’t know if you want to give us a bit more detail on that whilst you’re on there?
[00:42:05] Haydar: Yeah, I’d be happy to. I’ll try to keep it brief, but I’m happy to talk more about this if you have additional questions. What you’re referring to, ConversationPrint … so indeed, as I said earlier, we were conscious of the evolution of these technologies in general, and spoofing was absolutely part of it. But we were always looking at the bigger fraud picture, and the philosophy is: the more things you’re using as part of your authentication solution, the more things you’re looking at at the same time, the harder it becomes for somebody who has malicious intentions to bypass any one of the sub-technologies, if you will.
[00:42:44] So now voice biometrics, while still a critical part of the solution, um, can be complemented by other things and one of the things we started to do a few years ago was, you know, analogously to how the voice biometrics tries to identify the acoustic characteristics of somebody’s voice based on the audio signal. We developed something called ConversationPrint, which actually starts to do analogously something similar, but focusing on the language that somebody employs.
[00:43:12] So not the language you’re speaking literally, but the expressions you use, the vocabulary you tend to use, um, you know, the, the, the grammatical constructions of how you, you put your phrases together when you’re in a conversation with somebody. Um, that as it turns out also helps to identify us, uh, individually. So now, when you combine this as a new dimension to the voice biometrics, well, all of a sudden, you’re looking at a much richer set of features using complementary technologies making it a lot more difficult potentially for anyone with malicious intention now to try to fool the whole as opposed to one, one specific, uh, type of technology.
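As a generic illustration of the stylometry idea Haydar outlines, and explicitly not Nuance's ConversationPrint implementation, the toy sketch below characterizes two "speakers" by their word choice using n-gram features and a simple classifier:

```python
# Toy stylometry sketch: word-choice features feeding a classifier to tell
# two "speakers" apart. The transcripts, pipeline, and features are generic
# illustrations of the concept only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Stand-in transcripts: two speakers with different habitual phrasing.
speaker_a = ["to be fair, I would probably say that",
             "to be fair, it depends really",
             "I would probably want the balance first"]
speaker_b = ["gimme the balance please",
             "yeah just read me the last transactions",
             "yeah gimme the last payment"]

texts = speaker_a + speaker_b
labels = ["A"] * len(speaker_a) + ["B"] * len(speaker_b)

# Word n-grams capture vocabulary and simple grammatical constructions.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
model.fit(texts, labels)

print(model.predict(["to be fair I would say the balance"]))  # likely "A"
print(model.predict(["yeah gimme my balance"]))               # likely "B"
```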
[00:43:54] Matt: That’s great. That’s great. And I guess that’s most effective when you’ve got a chunk of a call rather than two, three, four seconds.
[00:44:02] Haydar: Yeah.
[00:44:02] Matt: You need more, that’s ‘cause there’s less data, I guess, in a-
[00:44:06] Haydar: Yeah.
[00:44:06] Matt: … conversation the best-
[00:44:08] Haydar: Yeah, on the practical side, what you’re doing is basically transcribing in real time, if you’re talking about telephonic conversations, ‘cause the same technology, by the way, can apply in chat as well.
[00:44:19] Matt: Yeah.
[00:44:19] Haydar: But for our purposes, a modern voice biometric system can probably get you a good authentication result with as little as one or two seconds of speech. That is barely a full sentence, so you’d be surprised what you can do with one full sentence, but generally, for something like ConversationPrint, we prefer to have a certain number of sentences strung together, a sequence of phrases, before rendering a more and more confident decision, if you will.
[00:44:47] Matt: Oh, that’s great. And I think that just reinforces, and actually I’m pretty excited about, a range of developments in this space. If you think about the phone call itself: yes, there is the way in which the person speaks on the phone call. There are also the things they say when they’re on the phone call. There is the way in which the phone call gets to you, and there is the noise and the signal in the silence in between what people are saying.
[00:45:11] And all of those have characteristics that can be used for authentication, and for fraud detection, with different degrees of confidence, yes. But none of them require the customer to go through any extra effort, and none of them have the same vulnerabilities as knowledge-based authentication or some other authentication schemes. So I think it’s a really exciting time to think about how we expand: we already have the phone call, so what else can we do with it that increases security whilst not having a detrimental effect on the customer experience?
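At its simplest, the "web" of factors Matt describes is a score-fusion problem: combine independent signals into one decision, with a step-up path rather than a hard fail. A minimal sketch with entirely illustrative weights and thresholds:

```python
# Minimal sketch of multi-factor score fusion for call authentication.
# Factor names, weights, and thresholds are illustrative assumptions; a real
# deployment would calibrate all of these on its own data.
from dataclasses import dataclass

@dataclass
class FactorScores:
    voice: float          # voice biometric score, 0..1
    network: float        # network/call-delivery validation confidence, 0..1
    device: float         # known-device confidence, 0..1
    conversation: float   # language/behaviour consistency, 0..1

WEIGHTS = {"voice": 0.5, "network": 0.2, "device": 0.2, "conversation": 0.1}

def fused_score(s: FactorScores) -> float:
    return (WEIGHTS["voice"] * s.voice + WEIGHTS["network"] * s.network
            + WEIGHTS["device"] * s.device + WEIGHTS["conversation"] * s.conversation)

def decide(s: FactorScores, accept_at: float = 0.75, refer_at: float = 0.5) -> str:
    f = fused_score(s)
    if f >= accept_at:
        return "authenticate"
    if f >= refer_at:
        return "step-up"           # ask for one more factor rather than hard-fail
    return "refer-to-fraud-team"

print(decide(FactorScores(voice=0.9, network=0.8, device=0.9, conversation=0.7)))
print(decide(FactorScores(voice=0.9, network=0.1, device=0.2, conversation=0.4)))
```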
[00:45:43] So I think we’re in for an interesting few years as this evolves. Just conscious of time, I’ll ask again for any questions in the chat. I just want to zoom back out, because we’ve gone right down the rabbit hole of deepfake and synthetic risks.
[00:46:00] So first off, um, o- one of the questions that’s come in the chat has been around the kind of the, the different ways in which this kind of technology could be used, uh, and the different methods of attack that could be used.
[00:46:11] And, and we’ve, clearly we’ve seen a couple of those in the press. Um, in, in practice, there are probably four, uh, or so kind of modes of attack. Uh, we’re not going to lay those out today because I think that would be slightly, um, irresponsible, but if you want to reach out afterwards, uh, and if we can kind of qualify who, who, who you are, we can probably have, uh, an off, an offline conversation about those, um, specifically.
[00:46:36] But taking it back to, to context then, if I just, uh, pull up my last slides. Oh, yeah. Sorry. Yeah, we …
[00:46:44] Matt: Just finally, to summarize what we’re talking about in terms of mitigations.
[00:46:47] We’ve got that first level of defense, the biometrics: getting it tuned correctly, maybe adjusting the threshold as appropriate for different customers and different types of interaction, and rate limiting, in order to reduce the opportunity to try and test this kind of service.
[00:47:02] Looking at different detection characteristics or detection services, and systems, and processes that are available in order to pick up those deepfakes, uh, whether that be through a recording, whether through watermarks or, or TTS detection, and bearing in mind that those things are going to evolve pretty rapidly over the next couple of years. And therefore, there’s a need to kind of keep on top of them. And then finally, other factors as we talked about using information from the network.
[00:47:25] Using information that we get about the device that’s being used, and finally the things that are being said. All of these are other factors that can improve the security of the overall process. So there are a range of mitigations available, but they do require some considered implementation and integration. This is not about a single-factor authentication solution anymore; this is about weighing the risks across various detection and authentication factors.
[00:47:53] Zooming right back out, though, and thinking about the fraudster threat overall. This is the slide I used in a session a couple of weeks ago when we looked at vulnerabilities overall. For a threat to a system to exist, it requires the vulnerability to exist, fraudsters to have the capability, and, most importantly, the fraudster to have the intent to compromise the system in that way. What we’ve talked about today is that capability, and to some extent the vulnerabilities, and how we can mitigate those vulnerabilities.
[00:48:25] We haven’t really talked a lot about the intent. Whilst we have seen in the wild a handful of test attacks against these kinds of systems, they are only that. Today, the vast majority of callers to contact centers and to enterprises across the world are still protected by knowledge-based authentication processes, which are quite frankly trivial to compromise through social engineering, phishing, and other methods.
[00:48:52] So, yes, whilst voice biometric systems may be vulnerable to this form of attack, a- as yet, we do not have a clear intent or exploitation of that, uh, at, at any scale, or, nor do we expect it whilst those, um, those other methods of authentication made. So it’s simply too easy to compromise someone’s password. It, it’s trivially easy now to compromise their SMS or, or online credentials. So I think it’s important to, yes, acknowledge there’s threats, continue to monitor it, make sure that you have met appropriate mitigations in place, evaluate the risk, uh, as, as we all should.
[00:49:27] But we should bear in mind that the intent doesn't yet exist, because nearly every organization implementing this technology today has a process by which, if a customer fails biometrics, they go to an alternative process. And that alternative process probably doesn't have a lot of scrutiny on it; it probably has very similar rules set up around what transactions can or cannot be done on it. So it's only when we remove some of those fallbacks, and the proportion of customers enrolled in these systems increases to the majority, that there's an incentive to attack the system head-on.
[00:50:03] That said, there are also far easier organizations to compromise and potentially easier channels to attack. So whilst we have talked about the theoretical vulnerability, this is still not trivial, and we expect that it will never be trivial to automate this form of attack. And therefore, whilst it would be irresponsible to say, "It's never going to happen," it needs to be borne in mind and evaluated against the complete spectrum of risks that an enterprise faces.
[00:50:37] Voice biometrics still provides a quicker, easier, and significantly more secure authentication method than traditional knowledge-based authentication, and we mustn't lose sight of that, particularly those ease-of-use and efficiency aspects that are the reason why a number of people on this call have implemented these technologies. And finally, when we look at voice biometrics, there is a spectrum of other risks that need to be considered.
[00:51:05] And the one I keep coming back to is this bypass risk, which I see many organizations have not appropriately addressed. When you fail the biometric, you probably fall back to a security process that has far more holes in it and is far easier to exploit than going and creating a synthetic voice for your customer. So with that, I'd like to open the floor to any questions.
[00:51:24] We’re, we’re coming up on time, but, um, th- this is a, is a popular topic, so I, I’m sure there’ll be more questions than, than the one I had. Uh, and apologies I can’t answer that specifically on, on this call, but I don’t think that would be particularly responsible, but we will in follow-up if, if you want to. Gonna pause then. Haydar, do you have anything you want to add at this point?
[00:51:47] Haydar: No. I think, you know, hopefully this was illuminating. I think we gave a pretty complete overview of the different aspects. I think those last points you mentioned really get us out of the rabbit hole by reminding everybody, and also, you know, the work that you yourself do, Matt, working with organizations, I think is incredibly valuable.
[00:52:08] I think probably the key takeaway here is that even with a session like this, the goal is to inform everybody and provide a reasoned, reasonable, realistic view of what things look like today. What are the actual risks? Yes, we can't rest on our laurels, but on the other hand, you know, we have to assess everything with the right sense of magnitude for every item we consider.
[00:52:38] Matt: Yeah. And this very much reminds me of something. I've always been nervous implementing this technology because, as we talked about with those curves, inevitably it gets put to the test: customers themselves test it with their friends and relatives, and those friends and relatives are far more like them than the population as a whole.
[00:53:01] And they test it enough that the laws of probability suggest somebody's going to get through at some point, particularly when we're dealing with millions and millions of users. And I always worry about the organizations that haven't had to deal with that, because they haven't really understood the technology; they still have a perception that it's completely bulletproof, and I think, as we've talked about here and as these press articles highlight, it's not.
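(To put that "laws of probability" intuition in rough numbers: if each impostor attempt has some small chance of a false accept, the chance that at least one succeeds compounds quickly across a large user base. A minimal editorial sketch; the per-attempt false-accept rate below is purely an assumed, illustrative figure, not one from the session.)

```python
# Minimal sketch: even a tiny per-attempt false-accept rate (FAR) compounds
# across many independent attempts. The FAR below is an assumed figure.
def p_at_least_one(far: float, attempts: int) -> float:
    """Probability of at least one false accept across `attempts` trials."""
    return 1.0 - (1.0 - far) ** attempts

for attempts in (1, 1_000, 1_000_000):
    p = p_at_least_one(far=1e-4, attempts=attempts)
    print(f"{attempts:>9,} attempts -> P(>=1 false accept) = {p:.4f}")
# 1 -> 0.0001, 1,000 -> ~0.0952, 1,000,000 -> ~1.0000
```

And in practice, as Matt notes, the per-attempt rate against friends and relatives is higher than the population average, which only strengthens the point.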
[00:53:26] But it is still significantly better, and it is still more secure in general, so I think that's an important point to come back to. I've had one more question.
[00:53:36] Matt: Yeah. I think it’s, uh, I may be not quoting it correctly. I, I, I think that there is, um, a genuine point, uh, that somebody’s raised and, and this is the, the knowledge-based authentication. It, it, it’s the human psychology aspect of this, yeah. So, um, now that there’s been these high-profile attacks, does that, does that reduce people’s confidence in, in the technology and therefore their propensity to enroll?
[00:54:01] And I think that's a delicate line for an enterprise to walk, because in many cases the law requires that we take the risk: we are ultimately responsible if our customers are defrauded. Consumers are well protected in most jurisdictions, and we are responsible for most losses. And we'd love them to move to more secure processes. Actually, for most enterprises, security works at an aggregate level rather than an individual level.
[00:54:32] But if lots of individuals choose not to use our most secure methods, then that can have a large aggregate effect. So when we've talked about incentivizing enrollment and encouraging customers to enroll in voice biometric systems, we've always stopped short of just saying how bad and how insecure the existing processes are, and how much fraud goes through those systems. And I think maybe there's an element of human psychology here that we've kind of missed, that we should probably …
[00:55:02] Maybe the fraud that happens in that space has become such background noise that it's these new, exciting forms of attack that have started to worry people. And I have absolutely seen this: nearly every client has been on the phone talking about what our Q&As back to customers should be in this situation. I think it is an awkward question, and we need to navigate it carefully. But if you've got any more specific guidance for people... Haydar? No? [laughs] Okay.
[00:55:32] Haydar: No. I think you nailed it, Matt. Yeah.
[00:55:34] Matt: Yeah. I think we always try to stick with reasonably generic responses in these cases, like: we have many methods of security; not all of them are visible; they evolve over time; we still believe this is the most secure method; and we wouldn't be recommending it if we didn't think that was true. But it does remain people's right in most jurisdictions to opt out of the use of the technology if they choose to do so, though they may bear a greater risk of being defrauded if that's the case. Whether you actually explicitly state it in that way is another matter.
[00:56:06] Haydar: Well, again, I would just say that even this series of sessions you've set up, Matt, and this community, are super important, because it's only through reasoned information and education that everybody develops a well-calibrated view of what the risks are and the magnitude of those risks. Whereas if you just see an article posted somewhere, it kind of sensationalizes things, and the public will fall into a quick panic. But that's not unique to our space either. You know, this year I think has been sort of bonkers in terms of the amount of-
[00:56:44] [laughs]
… revolutionary advances in AI and how it might displace and disrupt all over. So for sure, there are things to pay attention to, and there are things that are changing in a very real and material way. But I think the challenge is going to be keeping that information flowing, keeping the dialogue going, and I would say, Matt, what you're doing here with this forum is exactly the right approach. You know, set up the community, let's talk about these things, let's get people who are experts in the various facets to discuss and share information in that sense.
[00:57:16] Matt: That’s great. So thank you very much, Haydar, for your contribution today. Really, really useful. Really interesting to have, uh, kind of an expert on to give us the, the lowdown on the, on the detailed technology as well. So thanks so much for your contribution.
[00:57:29] Haydar: Yeah.
[00:57:29] Matt: Thank you, everyone who attended, as well. We are taking a little bit of a break now with the Modern Security Community, so we will be back on the 22nd of June, when we'll be holding a private, members-only roundtable discussion, of which I'm sure this will be one of the topics people wish to talk about. That will be going up on the website very shortly, and I'll be reaching out to a specific group to talk about how they want to participate in that. So again, thank you very much for joining. Haydar, thank you for contributing.
[00:58:02] Haydar: Sure.
[00:58:03] And we’ll speak to you all soon. Oh, we got a clap. Hooray. Thank you, bye-bye.