Voice biometric systems use machine learning algorithms trained on generic datasets to recognise and differentiate individual voices. Tuning, in this context, refers to optimising system performance for a particular task or operating environment. It involves determining the best algorithm for comparison, identifying the best configuration of any audio pre-processing steps, and augmenting the existing training with data from the actual operating environment to improve discrimination between speakers. Tuning generally involves several steps:
- Data Collection – Collect a representative sample of enrolment and verification utterances from the operating environment. The sample should contain multiple verification utterances for each speaker. Ideally it is gathered over a significant period so that it captures natural variation in speakers' voices and in the channels they use.
- Training and Testing – Part of the data set is used to augment or, in some cases, replace the Voice Biometric model's existing training, often producing a custom Background Model (BGM). The new model is then evaluated on separate sample data using a True User Imposter Test (TUIT). In a TUIT, existing users of the system simulate imposter attempts so that both the False Accept and False Reject rates can be measured; this is an essential step in establishing the appropriate biometric threshold. Other parameters are also evaluated to understand their impact on performance, such as minimum enrolment and verification audio lengths and audio processing settings such as signal-to-noise ratio thresholds. This may also include evaluating additional detection measures, such as synthetic speech detection, which protects the system from presentation attacks using synthetic speech by detecting characteristics inherent in the text-to-speech (TTS) generation process.
- Decision – The test results are reviewed, and the optimal biometric threshold and other key configuration parameters are chosen based on the implementing organisation's risk and performance objectives.
- Implementation – The new model and updated configuration are deployed to production. This may require regenerating existing speakers' Voiceprints against the new model from their original enrolment audio.
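The Data Collection criteria (several verification utterances per speaker, spread over a significant period) can be checked mechanically before tuning starts. The following is a minimal Python sketch. It assumes each collected verification utterance is recorded as a (speaker ID, capture date) pair. The function name `eligible_speakers` is hypothetical, and the defaults of three utterances over 30 days are illustrative placeholders, not recommended values.

```python
from collections import defaultdict
from datetime import date


def eligible_speakers(verifications, min_utterances=3, min_span_days=30):
    """Return speakers with enough verification utterances spread over time.

    verifications: iterable of (speaker_id, capture_date) pairs.
    The thresholds are hypothetical placeholders, not product defaults.
    """
    dates_by_speaker = defaultdict(list)
    for speaker_id, captured in verifications:
        dates_by_speaker[speaker_id].append(captured)

    eligible = set()
    for speaker_id, dates in dates_by_speaker.items():
        # Days between the first and last capture for this speaker.
        span_days = (max(dates) - min(dates)).days
        if len(dates) >= min_utterances and span_days >= min_span_days:
            eligible.add(speaker_id)
    return eligible


calls = [
    ("alice", date(2024, 1, 3)), ("alice", date(2024, 1, 20)), ("alice", date(2024, 3, 1)),
    ("bob", date(2024, 2, 1)), ("bob", date(2024, 2, 2)), ("bob", date(2024, 2, 3)),
]
print(eligible_speakers(calls))  # {'alice'}
```

Here "bob" has enough utterances, but they were all captured within two days. They would say little about how his voice or channel varies over time.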
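In the Training and Testing step, part of the collected data augments the model and different data evaluates it. The following is a minimal Python sketch of one way to keep those two sets apart. It assumes utterances are available as (speaker ID, utterance ID) pairs and that the split is made by speaker, so no speaker contributes to both sets. The helper `split_by_speaker` and its 50/50 default are hypothetical, not part of any particular Voice Biometric product.

```python
import random
from collections import defaultdict


def split_by_speaker(utterances, train_fraction=0.5, seed=0):
    """Partition (speaker_id, utterance_id) pairs into speaker-disjoint sets.

    Hypothetical helper: the first set would be used to augment the BGM,
    the second held back to evaluate the tuned model.
    """
    by_speaker = defaultdict(list)
    for speaker_id, utterance_id in utterances:
        by_speaker[speaker_id].append(utterance_id)

    speakers = sorted(by_speaker)          # fixed order before shuffling
    random.Random(seed).shuffle(speakers)  # seeded, so the split is reproducible
    train_speakers = set(speakers[: int(len(speakers) * train_fraction)])

    train, evaluation = [], []
    for speaker_id in speakers:
        bucket = train if speaker_id in train_speakers else evaluation
        bucket.extend((speaker_id, u) for u in by_speaker[speaker_id])
    return train, evaluation


data = [("alice", "a1"), ("alice", "a2"), ("bob", "b1"), ("carol", "c1"), ("dave", "d1")]
train, evaluation = split_by_speaker(data)
# Every utterance from a given speaker lands in exactly one of the two sets.
```

Splitting by speaker rather than by utterance avoids evaluating the model on voices it was tuned on. Doing so would tend to make the measured error rates look better than they will be in production.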
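The True User Imposter Test and the Decision step can be illustrated with a short Python sketch. It assumes each verification utterance has already been scored against every enrolled Voiceprint in the evaluation set, with higher scores meaning a closer match. The scoring engine itself is out of scope. Same-speaker comparisons are genuine attempts, and comparisons against other users' Voiceprints simulate imposter attempts. The helper names and the target False Accept rate are hypothetical. In practice, the target would come from the organisation's risk objectives.

```python
def tuit_scores(score_matrix):
    """Split TUIT scores into genuine and imposter lists.

    score_matrix maps (verifying_speaker, enrolled_speaker) -> list of scores.
    """
    genuine, imposter = [], []
    for (speaker, enrolled), scores in score_matrix.items():
        (genuine if speaker == enrolled else imposter).extend(scores)
    return genuine, imposter


def error_rates(genuine, imposter, threshold):
    """Return (FAR, FRR) for an assumed 'accept if score >= threshold' rule."""
    far = sum(s >= threshold for s in imposter) / len(imposter)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


def choose_threshold(genuine, imposter, target_far):
    """Return (threshold, FAR, FRR) for the lowest threshold meeting target_far.

    FRR can only rise as the threshold rises, so the lowest threshold that
    meets the FAR target also has the lowest FRR among those that do.
    """
    # Every distinct decision is reproduced by thresholding at an observed
    # score; infinity (reject everything) guarantees FAR = 0 as a fallback.
    candidates = sorted(set(genuine) | set(imposter)) + [float("inf")]
    for t in candidates:
        far, frr = error_rates(genuine, imposter, t)
        if far <= target_far:
            return t, far, frr


scores = {
    ("alice", "alice"): [0.9, 0.8],
    ("bob", "bob"): [0.7, 0.95],
    ("alice", "bob"): [0.3, 0.6],   # alice posing as bob
    ("bob", "alice"): [0.2, 0.75],  # bob posing as alice
}
genuine, imposter = tuit_scores(scores)
print(choose_threshold(genuine, imposter, target_far=0.25))  # (0.7, 0.25, 0.0)
```

Tightening the target to `target_far=0.0` in this toy data moves the threshold to 0.8, which raises the False Reject rate to 0.25. This is the trade-off the Decision step weighs. The `>=` acceptance rule is an assumption: some engines use a strict comparison or report distances, where lower is better.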