Voice Biometrics
Introduction
Voice biometrics is a technology that uses the unique characteristics of a person’s voice to identify or authenticate them.
Use cases
Authentication: Verifies if the speaker matches a specific enrolled identity.
Identification: Determines which enrolled user is speaking.
Providers
| Feature | TSSV | IDVoice |
|---|---|---|
| Accuracy & Performance | Faster, but less accurate | Slower, but more accurate |
| Result Behavior | Returns results only if confidence ≥ threshold | Returns all results, regardless of confidence |
| Language Dependency | Language-agnostic | Language-agnostic |
| Enrollment Flow | Identical for both providers | Identical for both providers |
| Supported Modes | Text-dependent and text-independent | Text-dependent and text-independent |
The SDKs report results differently: for example, vsdk-idvoice reports intermediate results as it analyzes the audio, while vsdk-tssv only sends a result when the engine considers it acceptable (depending on the confidence level you set).
We recommend testing the application in real conditions to select a minimum score that matches your requirements for false rejections and false acceptances. By default, you can simply check whether the score is above 0.
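As a sketch of how such a threshold could be chosen offline, assuming you have collected scores from trial runs with the true enrolled speaker (genuine) and with other speakers (impostors):

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """Compute the false acceptance rate (FAR) and false rejection rate (FRR)
    at a given threshold.

    genuine_scores: scores from attempts by the true enrolled speaker.
    impostor_scores: scores from attempts by other speakers.
    """
    far = sum(s >= threshold for s in impostor_scores) / len(impostor_scores)
    frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
    return far, frr


def pick_threshold(genuine_scores, impostor_scores):
    """Pick the candidate threshold minimizing |FAR - FRR| (near the equal
    error rate). Candidates are the observed scores themselves."""
    candidates = sorted(set(genuine_scores) | set(impostor_scores))
    return min(
        candidates,
        key=lambda t: abs(far_frr(genuine_scores, impostor_scores, t)[0]
                          - far_frr(genuine_scores, impostor_scores, t)[1]),
    )
```

This is a generic error-rate analysis, not part of the VDK API; it only helps you choose the minimum score you will enforce client-side.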
Audio Format
The input audio data for enrollment and recognition is a 16-bit signed PCM buffer in little-endian format. It is always mono (1 channel), and the sample rate is 16 kHz.
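As a minimal sketch, the standard-library `wave` module can verify that a WAV buffer matches this format before you stream its frames (the `check_wav_format` helper is an illustration, not part of the VDK API; note that WAV PCM data is little-endian by specification):

```python
import io
import wave

def check_wav_format(data: bytes) -> bool:
    """Return True if the WAV buffer is 16-bit PCM, mono, 16 kHz."""
    with wave.open(io.BytesIO(data), "rb") as w:
        return (w.getnchannels() == 1          # mono (1 channel)
                and w.getsampwidth() == 2      # 16-bit samples
                and w.getframerate() == 16000) # 16 kHz

# Example: build a one-second silent WAV in the expected format.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 16000)

print(check_wav_format(buf.getvalue()))  # True
```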
Examples
You can see the different routes available in the REST API documentation, in the Voice biometrics section.
Inspection
Several routes are available to inspect your models.
Retrieve a list of available biometric models
[GET] /voice-biometrics/models
Retrieve information about the specified model
[GET] /voice-biometrics/models/{model}
Retrieve enrolled users in the specified model
[GET] /voice-biometrics/models/{model}/users
Deletion
You can either delete an entire model:
[DELETE] /voice-biometrics/models/{model}
or a single user from a model:
[DELETE] /voice-biometrics/models/{model}/users/{user}
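The inspection and deletion routes above can be wrapped in a small client. This is a sketch using only the standard library; the base URL is an assumption, so point it at your actual VDK Service instance:

```python
import urllib.request

class BiometricsClient:
    """Minimal sketch of a client for the inspection and deletion routes.
    The default base URL is an assumption, not a documented endpoint."""

    def __init__(self, base_url: str = "http://localhost:8080"):
        self.base_url = base_url.rstrip("/")

    def _url(self, *parts: str) -> str:
        # Build e.g. <base>/voice-biometrics/models/<model>/users/<user>
        return "/".join([self.base_url, "voice-biometrics", *parts])

    def _request(self, method: str, *parts: str) -> bytes:
        req = urllib.request.Request(self._url(*parts), method=method)
        with urllib.request.urlopen(req) as resp:  # performs the HTTP call
            return resp.read()

    def list_models(self) -> bytes:
        return self._request("GET", "models")

    def model_info(self, model: str) -> bytes:
        return self._request("GET", "models", model)

    def list_users(self, model: str) -> bytes:
        return self._request("GET", "models", model, "users")

    def delete_model(self, model: str) -> bytes:
        return self._request("DELETE", "models", model)

    def delete_user(self, model: str, user: str) -> bytes:
        return self._request("DELETE", "models", model, "users", user)
```

For example, `BiometricsClient().delete_user("myModel", "paul")` issues `DELETE /voice-biometrics/models/myModel/users/paul`.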
Enrollment (and model creation)
Enroll a user into an existing model, or automatically create a new model during the process if none exists.
[POST] /voice-biometrics/enroll
The request schema is described in the REST API documentation. Below is a quick start example that enrolls the user paul into the myModel model, configured as text_independent (no passphrases required).
{
"model": "myModel",
"model_type": "text_independent",
"user": "paul"
}
By default, creating a user requires 13 seconds of speech for text_independent models and 4 passphrase utterances for text_dependent models.
This behavior can be adjusted using the minimum_speech_duration parameter (see REST API documentation).
If your request is accepted, a token is returned in the JSON response.
Use this token to open a WebSocket connection and stream your audio data.
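The handshake can be sketched as follows. The name of the token field in the response is an assumption here; check the REST API documentation for the exact schema:

```python
import json

# Request body matching the quick start example above.
enroll_request = {
    "model": "myModel",
    "model_type": "text_independent",
    "user": "paul",
}
body = json.dumps(enroll_request).encode("utf-8")

# POST `body` to /voice-biometrics/enroll (e.g. with urllib or requests).
# The response payload below is illustrative; the "token" field name is an
# assumption.
sample_response = '{"token": "abc123"}'
token = json.loads(sample_response)["token"]
# Use `token` to open the WebSocket connection and stream PCM audio chunks.
```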
You may need to provide more audio than minimum_speech_duration: silence within the audio can lead the underlying engine to decide there is not enough valid speech to create the user.
During streaming, enrollment progress events are sent over the socket to indicate the current stage of the process.
Below is an example of events received during enrollment.
Notice that progress may exceed 100%. Any additional audio is not discarded and will be incorporated into the enrollment process.
[INFO] Event: {'message': 'Analysis progress: 15%', 'progress': 15, ...}
[INFO] Event: {'message': 'Analysis progress: 26%', 'progress': 26, ...}
...
[INFO] Event: {'message': 'Analysis progress: 113%', 'progress': 113, ...}
[INFO] Event: {'message': 'Model compiled', ...}
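A client-side handler for these events could look like the sketch below. The event shape mirrors the log excerpt above; fields beyond `message` and `progress` are elided:

```python
def handle_event(event: dict) -> str:
    """Turn an enrollment event into a short status string."""
    progress = event.get("progress")
    if progress is not None:
        # Progress may exceed 100%; clamp only for display, since the extra
        # audio is still incorporated into the enrollment.
        return f"enrolling: {min(progress, 100)}%"
    if event.get("message") == "Model compiled":
        return "done"
    return "unknown event"
```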
Authentication & Identification
To authenticate or identify a user, use one of the following routes. The request body requires at least the model name (the model must already exist).
[POST] /voice-biometrics/authenticate
[POST] /voice-biometrics/identify
{
"model": "myModel", # Mandatory
"user": "paul" # Authentication only
}
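Since JSON itself does not allow comments, the annotated example above can be reproduced as plain request builders. These helpers are illustrative, not part of the VDK API:

```python
import json

def identify_body(model: str) -> str:
    """Body for POST /voice-biometrics/identify: only the model is required."""
    return json.dumps({"model": model})

def authenticate_body(model: str, user: str) -> str:
    """Body for POST /voice-biometrics/authenticate: also names the user."""
    return json.dumps({"model": model, "user": user})
```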
If your request is accepted, a token is returned in the JSON response.
Use this token to open a WebSocket connection and stream your audio data.
Here is an example of a result response (using IDVoice):
{
'result': {
'id': 'emmanuel',
'probability': 0.9999721050262451,
'score': 0.7068846821784973
}
}
id: The identified user, or the user being authenticated.
probability: Authentication only. Indicates the likelihood that the identified user is the correct match.
score: Raw similarity score between the input audio and the enrolled user model.
When using IDVoice, we recommend relying primarily on the probability value, with an authentication threshold around 0.4–0.5. You may also monitor low score values. A simple decision rule could be: min(probability, score) > 0.5.
When using TSSV, probability is binary and therefore not sufficient for decision-making. In this case, authentication must be based on the score property.
Note that TSSV scores are not bounded to [0, 1].
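The two decision rules above can be sketched as follows. The 0.5 threshold for IDVoice follows the recommendation in the text; the TSSV threshold has no default and must come from your own tuning:

```python
def accept_idvoice(result: dict, threshold: float = 0.5) -> bool:
    """IDVoice: rely primarily on probability, but also reject suspiciously
    low raw scores, i.e. min(probability, score) > threshold."""
    return min(result["probability"], result["score"]) > threshold

def accept_tssv(score: float, threshold: float) -> bool:
    """TSSV: probability is binary, so decide on the raw score alone.
    TSSV scores are not bounded to [0, 1], so the threshold scale differs."""
    return score >= threshold
```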
Sample project
A sample project is available for Voice Biometrics usage with VDK Service (in C# or Python).