Voice Biometrics - C++
Introduction
Voice biometrics is a technology that uses the unique characteristics of a person’s voice to identify or authenticate them.
Use cases
Authentication: Verifies if the speaker matches a specific enrolled identity.
Identification: Determines which enrolled user is speaking.
Providers
| Feature | TSSV | IDVoice |
|---|---|---|
| Accuracy & Performance | Faster, but less accurate | Slower, but more accurate |
| Result Behavior | Returns results only if confidence ≥ threshold | Returns all results, regardless of confidence |
| Language Dependency | Language-agnostic | Language-agnostic |
| Enrollment Flow | Identical for both providers | Identical for both providers |
| Supported Modes | Text-dependent and text-independent | Text-dependent and text-independent |
Audio Format
The input audio data for enrollment and recognition is a 16-bit signed PCM buffer in little-endian byte order. It is always mono (1 channel), and the sample rate is 16 kHz.
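Given this format, raw buffer sizes are simple arithmetic. The sketch below is only an illustration of that arithmetic — the constant and function names are ours, not part of the SDK:

```cpp
#include <cstddef>
#include <cstdint>

// Expected input format: 16-bit signed PCM, little-endian, mono, 16 kHz.
constexpr std::size_t kSampleRateHz   = 16000;
constexpr std::size_t kChannels       = 1;
constexpr std::size_t kBytesPerSample = sizeof(std::int16_t); // 2 bytes

// Bytes needed to hold 'durationMs' milliseconds of audio in this format.
constexpr std::size_t bufferSizeBytes(std::size_t durationMs)
{
    return kSampleRateHz * kChannels * kBytesPerSample * durationMs / 1000;
}

// One second of audio occupies 32,000 bytes:
static_assert(bufferSizeBytes(1000) == 32000, "16000 samples/s * 2 bytes");
```

For example, the 13 seconds of speech required for text-independent enrollment correspond to 416,000 bytes of raw PCM.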
Getting Started
Before you begin, make sure you’ve completed all the necessary preparation steps.
There are two ways to prepare your project for Voice Biometrics:
Using sample code
Starting from scratch
From Sample Code
To download the sample code, you'll need Conan. All the necessary steps are outlined in the general Getting Started guide.
📦 voice-biometrics
conan search -r vivoka-customer voice-biometrics # To get the latest version.
conan inspect -r vivoka-customer -a options voice-biometrics/<version>@vivoka/customer
conan install -if voice-biometrics voice-biometrics/<version>@vivoka/customer -o voice_bio_engine=tssv
Open project.vdk in VDK-Studio
Export the assets from VDK-Studio into the same directory (optionally, you can enroll users through VDK-Studio)
conan install . -if build
conan build . -if build
./build/Release/voice-biometrics
From Scratch
Before proceeding, make sure you’ve completed the following steps:
1. Prepare your VDK Studio project
Create a new project in VDK Studio
Add the Voice Biometrics technology
Add a model (you can optionally enroll users now, or handle enrollment later within your app)
Export the project to generate the required assets and configuration
2. Set up your project
Install the necessary libraries
vsdk-audio-portaudio/<version>@vivoka/customer
vsdk-samples-utils/<version>@vivoka/customer
vsdk-tssv/<version>@vivoka/customer
vsdk-idvoice/<version>@vivoka/customer
These steps are explained in more detail in the Getting Started guide.
Start Recognition
1. Initialize Engine
Start by initializing the Voice Biometrics engine and model:
You cannot create two instances of the same engine.
#include <vsdk/global.hpp>
#include <vsdk/Exception.hpp>
#include <vsdk/utils/samples/EventLoop.hpp>
#include <vsdk/biometrics/tssv.hpp>
// #include <vsdk/biometrics/idvoice.hpp>
using namespace Vsdk::Biometrics;
using BioEngine = Vsdk::Biometrics::Tssv::Engine;
// using BioEngine = Vsdk::Biometrics::Idvoice::Engine; // Use idvoice include if you prefer IDRD engine
auto const engine = Vsdk::Biometrics::Engine::make<BioEngine>("config/vsdk.json");
int confidenceThreshold = 5;
auto model = engine->makeModel(modelName, ModelType::TextDependant, confidenceThreshold);
// or, for text-independent mode:
// auto model = engine->makeModel(modelName, ModelType::TextIndependant, confidenceThreshold);
The third parameter is the required confidence level. It ranges from 0 to 10 and behaves differently depending on your provider. A value of 10 makes the recognizer as strict as possible.
We recommend testing the application in real-world conditions to determine the minimum score that best fits your needs. This helps you balance between two types of errors:
False rejection: when a valid user is incorrectly rejected.
False acceptance: when an invalid user is incorrectly accepted.
By default, you can simply check if the score is greater than 0, but fine-tuning it based on your use case will give you better accuracy and security.
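As a sketch of that tuning, the accept/reject decision could be isolated in a small helper like the one below. The function name and the `minScore` parameter are ours (not SDK API); the default of 0 matches the simple "score greater than 0" check mentioned above:

```cpp
// Hypothetical helper: decide whether to accept a biometric result.
// 'minScore' is the minimum score you determined through real-world
// testing; raising it reduces false acceptances at the cost of more
// false rejections.
bool acceptResult(float score, float minScore = 0.0f)
{
    return score > minScore;
}
```

With IDVoice, which delivers every result regardless of confidence, a check like this in your result callback is mandatory; with TSSV, low-confidence results are filtered out before your callback is even invoked.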
2. Enroll users
You can enroll users either through the VDK-Studio interface or directly within your application. In the app, enrollment can be done using a file or a buffer.
Enrollment Requirements
Text-dependent: Requires at least 4 recordings of the same phrase.
Text-independent: Requires at least 13 seconds of speech.
model->addRecord("user-name", filePath1);
model->addRecord("user-name", filePath2);
...
model->compile();
fmt::print("Enrolled users: '{}'", fmt::join(model->users(), "', '"));
The more data you provide, the better the model's performance will be. For best results, record the data under conditions that match the model's intended use case.
The preferred format is 16 kHz, mono.
You can create such a WAV file as follows:
On Linux: arecord -c 1 -f S16_LE -r 16000 filename.wav
On Windows:
Use Audacity or any audio recorder that allows manual format selection.
In Audacity:
Set the Project Rate (Hz) (bottom left) to 16000.
Set the recording to Mono (1 channel).
Record your audio.
Export it as WAV (Signed 16-bit PCM).
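If you capture or generate samples programmatically, you can also write a compliant WAV file yourself. This is a minimal sketch (the function name and layout are ours, not SDK API): it writes the canonical 44-byte RIFF/WAVE header for 16-bit mono 16 kHz PCM, and assumes a little-endian host, which matches the required byte order.

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Write 'samples' as a 16 kHz, mono, 16-bit signed PCM WAV file --
// the format expected for enrollment recordings.
// Assumes a little-endian host (the WAV header fields are little-endian).
void writeWav16kMono(std::string const & path,
                     std::vector<std::int16_t> const & samples)
{
    std::uint32_t const sampleRate    = 16000;
    std::uint16_t const channels      = 1;
    std::uint16_t const bitsPerSample = 16;
    std::uint32_t const byteRate   = sampleRate * channels * bitsPerSample / 8;
    std::uint16_t const blockAlign = channels * bitsPerSample / 8;
    auto const dataSize =
        static_cast<std::uint32_t>(samples.size() * sizeof(std::int16_t));

    std::ofstream f(path, std::ios::binary);
    auto u32 = [&f](std::uint32_t v) { f.write(reinterpret_cast<char const *>(&v), 4); };
    auto u16 = [&f](std::uint16_t v) { f.write(reinterpret_cast<char const *>(&v), 2); };

    f.write("RIFF", 4); u32(36 + dataSize); f.write("WAVE", 4);
    f.write("fmt ", 4); u32(16);            // fmt chunk, 16 bytes
    u16(1);                                 // audio format 1 = PCM
    u16(channels); u32(sampleRate); u32(byteRate);
    u16(blockAlign); u16(bitsPerSample);
    f.write("data", 4); u32(dataSize);
    f.write(reinterpret_cast<char const *>(samples.data()), dataSize);
}
```

A file produced this way can then be passed to `model->addRecord()` like any other recording.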
3. Recognition
For authentication, you must tell the recognizer which user to verify (you set it once and change it only when needed).
void onVoiceBioResult(Vsdk::details::StatusResult const & r)
{
auto const id = r.json["id" ].get<std::string>();
auto const score = r.json["score"].get<float>();
fmt::print("[{}] Result: '{}' (score: {})\n", gRecognizerName, id, score);
}
auto authenticator = engine->makeAuthenticator(gRecognizerName, model, gConfidence);
authenticator->subscribe([] (Authenticator::Result const & r) { onVoiceBioResult(r); });
authenticator->setUserToRecognize("user-name");
auto identificator = engine->makeIdentificator(gRecognizerName, model, gConfidence);
identificator->subscribe([] (Identificator::Result const & r) { onVoiceBioResult(r); });
We’ll implement a simple pipeline that records audio from the microphone and sends it to the recognizer:
#include <vsdk/audio/producers/PaMicrophone.hpp>
#include <vsdk/utils/PortAudio.hpp>
auto rec = std::move(identificator);
// or: auto rec = std::move(authenticator);
auto const mic = Vsdk::Audio::Producer::PaMicrophone::make();
Vsdk::Audio::Pipeline pipeline;
pipeline.setProducer(mic);
pipeline.pushBackConsumer(rec);
pipeline.start();
// ...
pipeline.stop();
// or, to run synchronously:
pipeline.run();
.start() runs the pipeline in a new thread (non-blocking).
.run() runs the pipeline and waits until it is finished (blocking).
.stop() terminates the pipeline execution.
Once a pipeline has been stopped, you can restart it at any time by simply calling .start() again.
4. Events and errors
#include <vsdk/biometrics/tssv/Constants.hpp>
// #include <vsdk/biometrics/idvoice/Constants.hpp>
void onVoiceBioEvent(Model::Event const & e)
{
namespace Key = Vsdk::Constants::Tssv::IdentResult;
// or namespace Key = Vsdk::Constants::Idvoice::IdentResult;
auto const user = e.json[Key::id ].get<std::string>();
auto const score = e.json[Key::score].get<float>();
fmt::print("Ident Result: '{}' (score: {})\n", user, score);
}
void onVoiceBioError(Model::Error const & e)
{
namespace Key = Vsdk::Constants::Tssv::AuthResult;
// or namespace Key = Vsdk::Constants::Idvoice::AuthResult;
auto const user = e.json[Key::id ].get<std::string>();
auto const score = e.json[Key::score].get<float>();
fmt::print("Auth Result: '{}' (score: {})\n", user, score);
}
model->subscribe(&onVoiceBioEvent);
model->subscribe(&onVoiceBioError);