Voice Biometrics - C++
Introduction
Voice Biometrics is a technology that uses the unique characteristics of a person’s voice to identify or authenticate them.
Use cases
Authentication: Verifies if the speaker matches a specific enrolled identity.
Identification: Determines which enrolled user is speaking.
Providers
| Feature | TSSV | IDVoice |
|---|---|---|
Accuracy & Performance | Faster, but less accurate | Slower, but more accurate |
Result Behavior | Returns results only if confidence ≥ threshold | Returns all results, regardless of confidence |
Language Dependency | Language-agnostic | Language-agnostic |
Enrollment Flow | Identical for both providers | Identical for both providers |
Supported Modes | Text-dependent and text-independent | Text-dependent and text-independent |
When using IDVoice:
We recommend relying primarily on the `probability` value, with an authentication threshold around 0.4–0.5. You may also monitor low `score` values. A simple decision rule could be: `min(probability, score) > 0.5`. Both `probability` and `score` are clamped to [0; 1].
When using TSSV:
`probability` does not exist. In this case, your threshold must be based on the `score` property. `score` is unbounded: scores can vary from, e.g., -10 to 30.
Audio Format
The input audio data for enrollment and recognition is a 16-bit signed PCM buffer in little-endian format. It is always mono (1 channel), and the sample rate is 16 kHz.
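If you assemble raw buffers yourself, the little-endian 16-bit layout can be decoded as a sanity check. This is a standalone sketch, independent of the VSDK API:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode a raw little-endian 16-bit PCM byte stream into samples.
// This matches the expected input format: 16-bit signed PCM, mono, 16 kHz.
std::vector<std::int16_t> decodePcm16le(std::vector<std::uint8_t> const & bytes)
{
    std::vector<std::int16_t> samples;
    samples.reserve(bytes.size() / 2);
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2)
    {
        // Little-endian: low byte first, then high byte.
        auto const lo = static_cast<std::uint16_t>(bytes[i]);
        auto const hi = static_cast<std::uint16_t>(bytes[i + 1]);
        samples.push_back(static_cast<std::int16_t>(lo | (hi << 8)));
    }
    return samples;
}
```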
Getting started
Creating a project
We have a sample project called voice-biometrics; you can find all the steps here:
How-to: Download, Compile & Run C++ Samples | List-of-available-samples
Alternatively, you can set up your own project from the VDK Studio: https://doc.vivoka.com/online/
Once this is done, you can create a C++ project. We use Conan to manage all dependencies.
Install the necessary libraries
- `vsdk-tssv/<version>@vivoka/customer`
- `vsdk-idvoice/<version>@vivoka/customer`
- `vsdk-audio-portaudio/<version>@vivoka/customer` (microphone recording)
- `vsdk-samples-utils/<version>@vivoka/customer` (EventLoop)
These steps are better explained in the Get Started guide.
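Assuming a plain Conan setup, the references above would typically be declared in a conanfile.txt along these lines. The generator choice is an assumption and depends on your build system; versions are left as placeholders.

```ini
[requires]
vsdk-tssv/<version>@vivoka/customer
vsdk-idvoice/<version>@vivoka/customer
vsdk-audio-portaudio/<version>@vivoka/customer
vsdk-samples-utils/<version>@vivoka/customer

[generators]
cmake
```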
Now that the project environment is set up, we’ll go through the key steps and the main logic behind using VSDK Voice Biometrics.
Engine initialization
You should have at least one of these two headers:
- `<vsdk/biometrics/tssv.hpp>`
- `<vsdk/biometrics/idvoice.hpp>`
Everything related to Voice Biometrics is located in the Vsdk::Biometrics namespace. We will use namespace Vsdk::Biometrics for the remainder of this guide.
```cpp
#include <vsdk/biometrics/tssv.hpp>
#include <vsdk/biometrics/idvoice.hpp>

using namespace Vsdk::Biometrics;

auto const engine = Engine::make<Tssv::Engine>("config/vsdk.json");    // TSSV, or:
auto const engine = Engine::make<Idvoice::Engine>("config/vsdk.json"); // IDVoice
```
Working with models
You can either create or retrieve an existing model by calling Engine::makeModel. It requires:
A model name
A model type (text-dependent or text-independent)
A confidence threshold ranging from 1 to 10, where 10 is the strictest value (used for enrollment)
```cpp
auto model = engine->makeModel(modelName, ModelType::TextDependant, 5);   // text-dependent
auto model = engine->makeModel(modelName, ModelType::TextIndependant, 5); // text-independent
```
From this point, you can either enroll users or perform authentication/identification. Let’s start with enrollment.
Enrolling users
You can enroll users either through the VDK-Studio, to be exported along with the project configuration, or directly within your application; the latter is what we'll go over here.
Enrollment Requirements
Text-dependent: Requires at least 4 utterances of the same phrase.
Text-independent: Requires at least 13 seconds of speech (excluding silence).
You can either use an audio file path or Vsdk::Audio::Buffer. Make sure the Audio Format is supported.
```cpp
// Available overloads:
Model::addRecord(std::string const & user, std::string const & path);
Model::addRecord(std::string const & user, Vsdk::Audio::Buffer buffer);

model->addRecord("user-name", filePath1);
model->addRecord("user-name", filePath2);
// ...
model->compile();

fmt::print("Enrolled users: '{}'\n", fmt::join(model->users(), "', '"));
```
The more data you provide, the better the model's performance will be. For best results, record the data under conditions that match the model's intended use case.
The preferred format is 16 kHz, mono.
You can create such a WAV file as follows:
On Linux:
```sh
arecord -c 1 -f S16_LE -r 16000 filename.wav
```
On Windows:
Use Audacity or any audio recorder that allows manual format selection.
In Audacity:
1. Set the Project Rate (Hz) (bottom left) to 16000.
2. Set the recording to Mono (1 channel).
3. Record your audio.
4. Export it as WAV (Signed 16-bit PCM).
Authentication and Identification
In order to perform authentication or identification, you need to set up a pipeline with one of the implementations of Vsdk::Biometrics::Recognizer, which is a consumer:
- Engine::makeAuthenticator (requires setUserToRecognize)
- Engine::makeIdentificator
```cpp
auto identificator = engine->makeIdentificator(recognizerName, model, 5);

auto authenticator = engine->makeAuthenticator(recognizerName, model, 5);
authenticator->setUserToRecognize("user-name");
```
`recognizerName` can be anything you want; `5` is the confidence threshold.
Setting up callbacks
In order to know what's happening, you need to set up two callbacks for the model (event and error) and one for the recognizer (result).
```cpp
// Callback signatures:
void onEvent (Model::Event const & e);
void onError (Model::Error const & e);
void onResult(Authenticator::Result const & r); // Authentication
void onResult(Identificator::Result const & r); // Identification

// As Authenticator::Result and Identificator::Result
// share the same base result, you can use the same callback for both:
void onResult(Vsdk::details::StatusResult const & r);
```

```cpp
void onEvent(Model::Event const & e)
{
    auto const msg = e.message.empty() ? "" : ": " + e.message;
    fmt::print("Event: [{}]{}\n", e.codeString, msg);
}

void onError(Model::Error const & e)
{
    auto const type = e.type == Model::ErrorType::Error ? "Error" : "Warn";
    auto const msg  = e.message.empty() ? "" : ": " + e.message;
    fmt::print("Error: ({:<5}) {}{}\n", type, e.codeString, msg);
}

void onResult(Vsdk::details::StatusResult const & r)
{
    auto const id    = r.json["id"   ].get<std::string>();
    auto const score = r.json["score"].get<float>();
    fmt::print("Result: '{}' (score: {})\n", id, score);
}
```

```cpp
model->subscribe(&onEvent);
model->subscribe(&onError);
authenticator->subscribe([] (Authenticator::Result const & r) { onResult(r); });
identificator->subscribe([] (Identificator::Result const & r) { onResult(r); });
```
Running the pipeline
```cpp
#include <vsdk/audio/producers/PaMicrophone.hpp>
#include <vsdk/audio/Pipeline.hpp>
#include <vsdk/utils/PortAudio.hpp>

// ...

Vsdk::Audio::Pipeline pipeline;

auto const mic = Vsdk::Audio::Producer::PaMicrophone::make(); // Microphone
pipeline.setProducer(mic);
pipeline.pushBackConsumer(recognizer); // identificator or authenticator

pipeline.start();
```
You must use start() when working with PaMicrophone, as run() is unavailable. Make sure the application does not exit immediately afterward.
Once a pipeline has been stopped, you can restart it at any time by simply calling .start() again.