Voice Biometrics - C++
Introduction
Voice biometrics is a technology that uses the unique characteristics of a person’s voice to identify or authenticate them.
Use cases
Authentication: Verifies if the speaker matches a specific enrolled identity.
Identification: Determines which enrolled user is speaking.
Providers
| Feature | TSSV | IDVoice |
|---|---|---|
| Accuracy & Performance | Faster, but less accurate | Slower, but more accurate |
| Result Behavior | Returns results only if confidence ≥ threshold | Returns all results, regardless of confidence |
| Language Dependency | Language-agnostic | Language-agnostic |
| Enrollment Flow | Identical for both providers | Identical for both providers |
| Supported Modes | Text-dependent and text-independent | Text-dependent and text-independent |
Audio Format
The input audio data for enrollment and recognition is a 16-bit signed PCM buffer in little-endian byte order. It is always mono (1 channel), and the sample rate is 16 kHz.
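Given this format, raw buffer sizes are simple arithmetic. The sketch below is only an illustration of that arithmetic — the constant and function names are ours, not part of the SDK:

```cpp
#include <cstddef>
#include <cstdint>

// Expected input format: 16-bit signed PCM, little-endian, mono, 16 kHz.
constexpr std::size_t kSampleRateHz   = 16000;
constexpr std::size_t kChannels       = 1;
constexpr std::size_t kBytesPerSample = sizeof(std::int16_t); // 2 bytes

// Bytes needed to hold 'durationMs' milliseconds of audio in this format.
constexpr std::size_t bufferSizeBytes(std::size_t durationMs)
{
    return kSampleRateHz * kChannels * kBytesPerSample * durationMs / 1000;
}

// One second of audio occupies 32,000 bytes:
static_assert(bufferSizeBytes(1000) == 32000, "16000 samples/s * 2 bytes");
```

For example, the 13 seconds of speech required for text-independent enrollment correspond to 416,000 bytes of raw PCM.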
Getting Started
Before you begin, make sure you’ve completed all the necessary preparation steps.
There are two ways to prepare your project for Voice Biometrics:
Using sample code
Starting from scratch
From Sample Code
To download the sample code, you'll need Conan. All the necessary steps are outlined in the general Getting Started guide.
📦 voice-biometrics
conan search -r vivoka-customer voice-biometrics # To get the latest version.
conan inspect -r vivoka-customer -a options voice-biometrics/<version>@vivoka/customer
conan install -if voice-biometrics voice-biometrics/<version>@vivoka/customer -o voice_bio_engine=tssv
Open project.vdk in VDK-Studio
Export the assets from VDK-Studio into the same directory (optionally, you can enroll users through VDK-Studio)
conan install . -if build
conan build . -if build
./build/Release/voice-biometrics
From Scratch
Before proceeding, make sure you’ve completed the following steps:
1. Prepare your VDK Studio project
Create a new project in VDK Studio
Add the Voice Biometrics technology
Add a model (you can optionally enroll users now, or handle enrollment later within your app)
Export the project to generate the required assets and configuration
2. Set up your project
Install the necessary libraries
vsdk-audio-portaudio/<version>@vivoka/customer
vsdk-samples-utils/<version>@vivoka/customer
vsdk-tssv/<version>@vivoka/customer
vsdk-idvoice/<version>@vivoka/customer
These steps are explained in more detail in the Getting Started guide.
Start Recognition
1. Initialize Engine
Start by initializing the Voice Biometrics engine and model:
You cannot create two instances of the same engine.
#include <vsdk/global.hpp>
#include <vsdk/Exception.hpp>
#include <vsdk/utils/samples/EventLoop.hpp>
#include <vsdk/biometrics/tssv.hpp>
// #include <vsdk/biometrics/idvoice.hpp>
using namespace Vsdk::Biometrics;
using BioEngine = Vsdk::Biometrics::Tssv::Engine;
// using BioEngine = Vsdk::Biometrics::Idvoice::Engine; // Use idvoice include if you prefer IDRD engine
auto const engine = Vsdk::Biometrics::Engine::make<BioEngine>("config/vsdk.json");
int confidenceThreshold = 5;
auto model = engine->makeModel(modelName, ModelType::TextDependant, confidenceThreshold);
// or, for text-independent mode:
// auto model = engine->makeModel(modelName, ModelType::TextIndependant, confidenceThreshold);
The third parameter is the required confidence level. It ranges from 0 to 10 and behaves differently depending on your provider. A value of 10 makes the recognizer as strict as possible.
We recommend testing the application in real-world conditions to determine the minimum score that best fits your needs. This helps you balance between two types of errors:
False rejection: when a valid user is incorrectly rejected.
False acceptance: when an invalid user is incorrectly accepted.
By default, you can simply check if the score is greater than 0, but fine-tuning it based on your use case will give you better accuracy and security.
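As a sketch of that tuning, the accept/reject decision could be isolated in a small helper like the one below. The function name and the `minScore` parameter are ours (not SDK API); the default of 0 matches the simple "score greater than 0" check mentioned above:

```cpp
// Hypothetical helper: decide whether to accept a biometric result.
// 'minScore' is the minimum score you determined through real-world
// testing; raising it reduces false acceptances at the cost of more
// false rejections.
bool acceptResult(float score, float minScore = 0.0f)
{
    return score > minScore;
}
```

With IDVoice, which delivers every result regardless of confidence, a check like this in your result callback is mandatory; with TSSV, low-confidence results are filtered out before your callback is even invoked.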
2. Enroll users
You can enroll users either through the VDK-Studio interface or directly within your application. In the app, enrollment can be done using a file or a buffer.
Enrollment Requirements
Text-dependent: Requires at least 4 recordings of the same phrase.
Text-independent: Requires at least 13 seconds of speech.
model->addRecord("user-name", filePath1);
model->addRecord("user-name", filePath2);
...
model->compile();
fmt::print("Enrolled users: '{}'", fmt::join(model->users(), "', '"));
The more data you provide, the better the model's performance will be. For best results, record the data under conditions that match the model's intended use case.
The preferred format is 16 kHz, mono.
You can create such a WAV file as follows:
On Linux: arecord -c 1 -f S16_LE -r 16000 filename.wav
On Windows:
Use Audacity or any audio recorder that allows manual format selection.
In Audacity:
Set the Project Rate (Hz) (bottom left) to 16000.
Set the recording to Mono (1 channel).
Record your audio.
Export it as WAV (Signed 16-bit PCM).
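If you capture or generate samples programmatically, you can also write a compliant WAV file yourself. This is a minimal sketch (the function name and layout are ours, not SDK API): it writes the canonical 44-byte RIFF/WAVE header for 16-bit mono 16 kHz PCM, and assumes a little-endian host, which matches the required byte order.

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Write 'samples' as a 16 kHz, mono, 16-bit signed PCM WAV file --
// the format expected for enrollment recordings.
// Assumes a little-endian host (the WAV header fields are little-endian).
void writeWav16kMono(std::string const & path,
                     std::vector<std::int16_t> const & samples)
{
    std::uint32_t const sampleRate    = 16000;
    std::uint16_t const channels      = 1;
    std::uint16_t const bitsPerSample = 16;
    std::uint32_t const byteRate   = sampleRate * channels * bitsPerSample / 8;
    std::uint16_t const blockAlign = channels * bitsPerSample / 8;
    auto const dataSize =
        static_cast<std::uint32_t>(samples.size() * sizeof(std::int16_t));

    std::ofstream f(path, std::ios::binary);
    auto u32 = [&f](std::uint32_t v) { f.write(reinterpret_cast<char const *>(&v), 4); };
    auto u16 = [&f](std::uint16_t v) { f.write(reinterpret_cast<char const *>(&v), 2); };

    f.write("RIFF", 4); u32(36 + dataSize); f.write("WAVE", 4);
    f.write("fmt ", 4); u32(16);            // fmt chunk, 16 bytes
    u16(1);                                 // audio format 1 = PCM
    u16(channels); u32(sampleRate); u32(byteRate);
    u16(blockAlign); u16(bitsPerSample);
    f.write("data", 4); u32(dataSize);
    f.write(reinterpret_cast<char const *>(samples.data()), dataSize);
}
```

A file produced this way can then be passed to `model->addRecord()` like any other recording.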
3. Recognition
For authentication, you must tell the recognizer which user to verify (you set it once and change it only when needed).
void onVoiceBioResult(Vsdk::details::StatusResult const & r)
{
auto const id = r.json["id" ].get<std::string>();
auto const score = r.json["score"].get<float>();
fmt::print("[{}] Result: '{}' (score: {})\n", gRecognizerName, id, score);
}
auto authenticator = engine->makeAuthenticator(gRecognizerName, model, gConfidence);
authenticator->subscribe([] (Authenticator::Result const & r) { onVoiceBioResult(r); });
authenticator->setUserToRecognize("user-name");
auto identificator = engine->makeIdentificator(gRecognizerName, model, gConfidence);
identificator->subscribe([] (Identificator::Result const & r) { onVoiceBioResult(r); });
We’ll implement a simple pipeline that records audio from the microphone and sends it to the recognizer:
#include <vsdk/audio/producers/PaMicrophone.hpp>
#include <vsdk/utils/PortAudio.hpp>
auto rec = std::move(identificator);
// or: auto rec = std::move(authenticator);
auto const mic = Vsdk::Audio::Producer::PaMicrophone::make();
Vsdk::Audio::Pipeline pipeline;
pipeline.setProducer(mic);
pipeline.pushBackConsumer(rec);
pipeline.start();
// ...
pipeline.stop();
// or, to run synchronously:
pipeline.run();
.start() runs the pipeline in a new thread (non-blocking).
.run() runs the pipeline and waits until it is finished (blocking).
.stop() terminates the pipeline execution.
Once a pipeline has been stopped, you can restart it at any time by simply calling .start() again.
4. Events and errors
#include <vsdk/biometrics/tssv/Constants.hpp>
// #include <vsdk/biometrics/idvoice/Constants.hpp>
void onVoiceBioEvent(Model::Event const & e)
{
namespace Key = Vsdk::Constants::Tssv::IdentResult;
// or namespace Key = Vsdk::Constants::Idvoice::IdentResult;
auto const user = e.json[Key::id ].get<std::string>();
auto const score = e.json[Key::score].get<float>();
fmt::print("Ident Result: '{}' (score: {})\n", user, score);
}
void onVoiceBioError(Model::Error const & e)
{
namespace Key = Vsdk::Constants::Tssv::AuthResult;
// or namespace Key = Vsdk::Constants::Idvoice::AuthResult;
auto const user = e.json[Key::id ].get<std::string>();
auto const score = e.json[Key::score].get<float>();
fmt::print("Auth Result: '{}' (score: {})\n", user, score);
}
model->subscribe(&onVoiceBioEvent);
model->subscribe(&onVoiceBioError);