Voice Biometrics - C++
Introduction
Voice Biometrics is a technology that uses the unique characteristics of a person’s voice to identify or authenticate them.
Use cases
Authentication: Verifies if the speaker matches a specific enrolled identity.
Identification: Determines which enrolled user is speaking.
Providers
| Feature | TSSV | IDVoice |
|---|---|---|
Accuracy & Performance | Faster, but less accurate | Slower, but more accurate |
Result Behavior | Returns results only if confidence ≥ threshold | Returns all results, regardless of confidence |
Language Dependency | Language-agnostic | Language-agnostic |
Enrollment Flow | Identical for both providers | Identical for both providers |
Supported Modes | Text-dependent and text-independent | Text-dependent and text-independent |
When using IDVoice:
We recommend relying primarily on the `probability` value, with an authentication threshold around 0.4–0.5. You may also monitor low `score` values. A simple decision rule could be: `min(probability, score) > 0.5`. Both `probability` and `score` are clamped to [0; 1].
When using TSSV:
`probability` does not exist. In this case, your threshold must be based on the `score` property. `score` is unbounded: scores can vary from, e.g., -10 to 30.
Audio Format
The input audio data for enrollment and recognition is a 16-bit signed PCM buffer in little-endian format. It is always mono (1 channel), and the sample rate is 16 kHz.
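If you assemble raw buffers yourself, the little-endian 16-bit layout can be decoded as a sanity check. This is a standalone sketch, independent of the VSDK API:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Decode a raw little-endian 16-bit PCM byte stream into samples.
// This matches the expected input format: 16-bit signed PCM, mono, 16 kHz.
std::vector<std::int16_t> decodePcm16le(std::vector<std::uint8_t> const & bytes)
{
    std::vector<std::int16_t> samples;
    samples.reserve(bytes.size() / 2);
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2)
    {
        // Little-endian: low byte first, then high byte.
        auto const lo = static_cast<std::uint16_t>(bytes[i]);
        auto const hi = static_cast<std::uint16_t>(bytes[i + 1]);
        samples.push_back(static_cast<std::int16_t>(lo | (hi << 8)));
    }
    return samples;
}
```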
Getting started
Creating a project
We have a sample project called voice-biometrics; you can find all the steps here:
How-to: Download, Compile & Run C++ Samples | List-of-available-samples
Alternatively, you can set up your own project from the VDK Studio: https://doc.vivoka.com/online/
Once this is done, you can create a C++ project. We use Conan to manage all dependencies.
Install the necessary libraries
- `vsdk-tssv/<version>@vivoka/customer`
- `vsdk-idvoice/<version>@vivoka/customer`
- `vsdk-audio-portaudio/<version>@vivoka/customer` (microphone recording)
- `vsdk-samples-utils/<version>@vivoka/customer` (EventLoop)
These steps are better explained in the Get Started guide.
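Assuming a plain Conan setup, the references above would typically be declared in a conanfile.txt along these lines. The generator choice is an assumption and depends on your build system; versions are left as placeholders.

```ini
[requires]
vsdk-tssv/<version>@vivoka/customer
vsdk-idvoice/<version>@vivoka/customer
vsdk-audio-portaudio/<version>@vivoka/customer
vsdk-samples-utils/<version>@vivoka/customer

[generators]
cmake
```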
Now that the project environment is set up, we’ll go through the key steps and the main logic behind using VSDK Voice Biometrics.
Engine initialization
You should have at least one of these two headers:
- `<vsdk/biometrics/tssv.hpp>`
- `<vsdk/biometrics/idvoice.hpp>`
Everything related to Voice Biometrics is located in the Vsdk::Biometrics namespace. We will use namespace Vsdk::Biometrics for the remainder of this guide.
```cpp
#include <vsdk/biometrics/tssv.hpp>
#include <vsdk/biometrics/idvoice.hpp>

using namespace Vsdk::Biometrics;

auto const engine = Engine::make<Tssv::Engine>("config/vsdk.json");    // TSSV, or:
auto const engine = Engine::make<Idvoice::Engine>("config/vsdk.json"); // IDVoice
```
Working with models
You can either create or retrieve an existing model by calling Engine::makeModel. It requires:
A model name
A model type (text-dependent or text-independent)
A confidence threshold ranging from 1 to 10, where 10 is the strictest value (used for enrollment)
```cpp
auto model = engine->makeModel(modelName, ModelType::TextDependant, 5);   // text-dependent
auto model = engine->makeModel(modelName, ModelType::TextIndependant, 5); // text-independent
```
From this point, you can either enroll users or perform authentication/identification. Let’s start with enrollment.
Enrolling users
You can enroll users either through the VDK-Studio, to be exported along with the project configuration, or directly within your application; the latter is what we'll go over here.
Enrollment Requirements
Text-dependent: Requires at least 4 utterances of the same phrase.
Text-independent: Requires at least 13 seconds of speech (excluding silence).
You can either use an audio file path or Vsdk::Audio::Buffer. Make sure the Audio Format is supported.
```cpp
// Available overloads:
Model::addRecord(std::string const & user, std::string const & path);
Model::addRecord(std::string const & user, Vsdk::Audio::Buffer buffer);

model->addRecord("user-name", filePath1);
model->addRecord("user-name", filePath2);
// ...
model->compile();

fmt::print("Enrolled users: '{}'\n", fmt::join(model->users(), "', '"));
```
The more data you provide, the better the model's performance will be. For best results, record the data under conditions that match the model's intended use case.
The preferred format is 16 kHz, mono.
You can create such a WAV file as follows:
On Linux:
```sh
arecord -c 1 -f S16_LE -r 16000 filename.wav
```
On Windows:
Use Audacity or any audio recorder that allows manual format selection.
In Audacity:
1. Set the Project Rate (Hz) (bottom left) to 16000.
2. Set the recording to Mono (1 channel).
3. Record your audio.
4. Export it as WAV (Signed 16-bit PCM).
Authentication and Identification
In order to perform authentication or identification, you need to set up a pipeline with one of the implementations of Vsdk::Biometrics::Recognizer, which is a consumer:
- Engine::makeAuthenticator (requires setUserToRecognize)
- Engine::makeIdentificator
```cpp
auto identificator = engine->makeIdentificator(recognizerName, model, 5);

auto authenticator = engine->makeAuthenticator(recognizerName, model, 5);
authenticator->setUserToRecognize("user-name");
```
`recognizerName` can be anything you want; `5` is the confidence threshold.
Setting up callbacks
In order to know what's happening, you need to set up two callbacks for the model (event and error) and one for the recognizer (result).
```cpp
// Callback signatures:
void onEvent (Model::Event const & e);
void onError (Model::Error const & e);
void onResult(Authenticator::Result const & r); // Authentication
void onResult(Identificator::Result const & r); // Identification

// As Authenticator::Result and Identificator::Result
// share the same base result, you can use the same callback for both:
void onResult(Vsdk::details::StatusResult const & r);
```

```cpp
void onEvent(Model::Event const & e)
{
    auto const msg = e.message.empty() ? "" : ": " + e.message;
    fmt::print("Event: [{}]{}\n", e.codeString, msg);
}

void onError(Model::Error const & e)
{
    auto const type = e.type == Model::ErrorType::Error ? "Error" : "Warn";
    auto const msg  = e.message.empty() ? "" : ": " + e.message;
    fmt::print("Error: ({:<5}) {}{}\n", type, e.codeString, msg);
}

void onResult(Vsdk::details::StatusResult const & r)
{
    auto const id    = r.json["id"   ].get<std::string>();
    auto const score = r.json["score"].get<float>();
    fmt::print("Result: '{}' (score: {})\n", id, score);
}
```

```cpp
model->subscribe(&onEvent);
model->subscribe(&onError);
authenticator->subscribe([] (Authenticator::Result const & r) { onResult(r); });
identificator->subscribe([] (Identificator::Result const & r) { onResult(r); });
```
Running the pipeline
```cpp
#include <vsdk/audio/producers/PaMicrophone.hpp>
#include <vsdk/audio/Pipeline.hpp>
#include <vsdk/utils/PortAudio.hpp>

// ...

Vsdk::Audio::Pipeline pipeline;

auto const mic = Vsdk::Audio::Producer::PaMicrophone::make(); // Microphone
pipeline.setProducer(mic);
pipeline.pushBackConsumer(recognizer); // identificator or authenticator

pipeline.start();
```
You must use start() when working with PaMicrophone, as run() is unavailable. Make sure the application does not exit immediately afterward.
Once a pipeline has been stopped, you can restart it at any time by simply calling .start() again.