Voice Biometrics - C++

Introduction

Voice Biometrics is a technology that uses the unique characteristics of a person’s voice to identify or authenticate them.

Use cases

  • Authentication: Verifies if the speaker matches a specific enrolled identity.

  • Identification: Determines which enrolled user is speaking.

Providers

| Feature              | TSSV                                           | IDVoice                                        |
|----------------------|------------------------------------------------|------------------------------------------------|
| Accuracy & Performance | Faster, but less accurate                    | Slower, but more accurate                      |
| Result Behavior      | Returns results only if confidence ≥ threshold | Returns all results, regardless of confidence  |
| Language Dependency  | Language-agnostic                              | Language-agnostic                              |
| Enrollment Flow      | Identical for both providers                   | Identical for both providers                   |
| Supported Modes      | Text-dependent and text-independent            | Text-dependent and text-independent            |

When using IDVoice:

  • We recommend relying primarily on the probability value, with an authentication threshold around 0.4–0.5. You may also monitor low score values. A simple decision rule could be:
    min(probability, score) > 0.5.

  • Both probability and score are clamped to [0, 1].

When using TSSV:

  • There is no probability value. In this case, your threshold must be based on the score property.

  • score is unbounded; for example, scores can range from -10 to 30.
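The two decision rules above can be sketched as plain helper functions (a sketch only, not part of the VSDK API; the 0.5 and per-score thresholds are the example values from this guide):

```cpp
#include <algorithm>

// IDVoice: probability and score are both clamped to [0, 1].
// Example rule from this guide: accept when min(probability, score) > 0.5.
constexpr bool acceptIdVoice(float probability, float score, float threshold = 0.5f)
{
    return std::min(probability, score) > threshold;
}

// TSSV: there is no probability; the score is unbounded (e.g. -10 to 30),
// so the threshold must be tuned empirically on raw scores.
constexpr bool acceptTssv(float score, float threshold)
{
    return score > threshold;
}

static_assert( acceptIdVoice(0.8f, 0.6f), "both values above 0.5 -> accept");
static_assert(!acceptIdVoice(0.8f, 0.3f), "low score -> reject");
```

Whatever rule you choose, keep the two providers' scales separate: an IDVoice threshold is meaningless for TSSV scores and vice versa.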

Audio Format

The input audio data for enrollment and recognition is a 16-bit signed PCM buffer in little-endian format. It is always mono (1 channel) with a 16 kHz sample rate.
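As a quick sanity check when sizing buffers: at this format, one second of audio is 16000 samples × 2 bytes × 1 channel = 32000 bytes. A small helper (a sketch, not part of the VSDK API):

```cpp
#include <cstddef>

// 16-bit signed PCM, little-endian, mono, 16 kHz (the format described above).
constexpr std::size_t kSampleRateHz   = 16000;
constexpr std::size_t kBytesPerSample = 2;   // 16-bit samples
constexpr std::size_t kChannels       = 1;   // mono

// Bytes needed to hold `seconds` of audio in this format.
constexpr std::size_t pcmBufferBytes(std::size_t seconds)
{
    return seconds * kSampleRateHz * kBytesPerSample * kChannels;
}

static_assert(pcmBufferBytes(1) == 32000, "1 s of 16 kHz mono 16-bit PCM");
```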

Getting started

Creating a project

We have a sample project called voice-biometrics; you can find all the steps here:

How-to: Download, Compile & Run C++ Samples | List-of-available-samples

Alternatively you can set up your own project from the VDK Studio: https://doc.vivoka.com/online/

Once this is done, you can create a C++ project. We use Conan to manage all dependencies:

  • Install the necessary libraries

    • vsdk-tssv/<version>@vivoka/customer

    • vsdk-idvoice/<version>@vivoka/customer

    • vsdk-audio-portaudio/<version>@vivoka/customer (Microphone recording)

    • vsdk-samples-utils/<version>@vivoka/customer (EventLoop)

These steps are better explained in the Get Started guide.

Now that the project environment is set up, we’ll go through the key steps and the main logic behind using VSDK Voice Biometrics.

Engine initialization

You should have at least one of these two headers:

  • <vsdk/biometrics/tssv.hpp>

  • <vsdk/biometrics/idvoice.hpp>

Everything related to Voice Biometrics is located in the Vsdk::Biometrics namespace. We will use namespace Vsdk::Biometrics for the remainder of this guide.

CPP
#include <vsdk/biometrics/tssv.hpp>
#include <vsdk/biometrics/idvoice.hpp>
using namespace Vsdk::Biometrics;
// Choose one provider:
auto const engine = Engine::make<Tssv::Engine>("config/vsdk.json");       // TSSV, or:
// auto const engine = Engine::make<Idvoice::Engine>("config/vsdk.json"); // IDVoice

Working with models

You can create a new model or retrieve an existing one by calling Engine::makeModel. It requires:

  • A model name

  • A model type (text-dependent or text-independent)

  • A confidence threshold ranging from 1 to 10, where 10 is the strictest value (used for enrollment)

CPP
auto model = engine->makeModel(modelName, ModelType::TextDependant, 5);      // text-dependent, or:
// auto model = engine->makeModel(modelName, ModelType::TextIndependant, 5); // text-independent

From this point, you can either enroll users or perform authentication/identification. Let’s start with enrollment.

Enrolling users

You can enroll users either through VDK Studio (exported along with the project configuration) or directly within your application, which is what we'll go over here.

Enrollment Requirements
  • Text-dependent: Requires at least 4 utterances of the same phrase.

  • Text-independent: Requires at least 13 seconds of speech (excluding silence).

You can either use an audio file path or Vsdk::Audio::Buffer. Make sure the Audio Format is supported.

CPP
Model::addRecord(std::string const & user, std::string const & path);
Model::addRecord(std::string const & user, Vsdk::Audio::Buffer buffer);
CPP
model->addRecord("user-name", filePath1);
model->addRecord("user-name", filePath2);
...
model->compile();
fmt::print("Enrolled users: '{}'", fmt::join(model->users(), "', '"));

The more data you provide, the better the model's performance will be. For best results, record the data under conditions that match the model's intended use case.

The preferred format is 16 kHz, mono.
You can create such a WAV file as follows:

  • On Linux: arecord -c 1 -f S16_LE -r 16000 filename.wav

  • On Windows:
    Use Audacity or any audio recorder that allows manual format selection.
    In Audacity:

  1. Set the Project Rate (Hz) (bottom left) to 16000.

  2. Set the recording to Mono (1 channel).

  3. Record your audio.

  4. Export it as WAV (Signed 16-bit PCM).
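If you generate audio programmatically instead of recording it, you can write a compatible file yourself. Below is a minimal sketch of a standard RIFF/WAVE writer for 16-bit mono 16 kHz PCM (generic WAV layout, not a VSDK API; `writeWav16kMono` is a hypothetical helper name):

```cpp
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Write 16-bit mono 16 kHz PCM samples as a standard WAV file.
// Assumes a little-endian host (true on x86/x64 and most ARM targets).
bool writeWav16kMono(std::string const & path, std::vector<int16_t> const & samples)
{
    std::ofstream out(path, std::ios::binary);
    if (!out)
        return false;

    uint32_t const sampleRate = 16000;
    uint16_t const channels   = 1;
    uint16_t const bitDepth   = 16;
    uint32_t const byteRate   = sampleRate * channels * bitDepth / 8;
    uint16_t const blockAlign = static_cast<uint16_t>(channels * bitDepth / 8);
    uint32_t const dataBytes  = static_cast<uint32_t>(samples.size() * sizeof(int16_t));

    auto put = [&out] (auto value) {
        out.write(reinterpret_cast<char const *>(&value), sizeof(value));
    };

    out.write("RIFF", 4); put(uint32_t{36 + dataBytes});
    out.write("WAVE", 4);
    out.write("fmt ", 4); put(uint32_t{16}); // fmt chunk size
    put(uint16_t{1});                        // audio format: PCM
    put(channels); put(sampleRate);
    put(byteRate); put(blockAlign); put(bitDepth);
    out.write("data", 4); put(dataBytes);
    out.write(reinterpret_cast<char const *>(samples.data()),
              static_cast<std::streamsize>(dataBytes));
    return out.good();
}
```

The resulting file matches the Audio Format described above and can be passed to Model::addRecord as a file path.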

Authentication and Identification

To perform authentication or identification, you need to set up a pipeline with one of the implementations of Vsdk::Biometrics::Recognizer, which is a pipeline consumer:

  • Engine::makeAuthenticator (requires setUserToRecognize)

  • Engine::makeIdentificator

CPP
auto identificator = engine->makeIdentificator(recognizerName, model, 5);
auto authenticator = engine->makeAuthenticator(recognizerName, model, 5);
authenticator->setUserToRecognize("user-name");

recognizerName can be anything you want; the 5 is the confidence threshold.

Setting up callbacks

To know what's happening, you need to set up two callbacks on the model (event and error) and one on the recognizer (result).

CPP
void onEvent (Model::Event const & e)
void onError (Model::Error const & e)

void onResult(Authenticator::Result const & r) // Authentication
void onResult(Identificator::Result const & r) // Identification
// As Authenticator::Result and Identificator::Result share the same
// base result, you can use the same callback for both:
void onResult(Vsdk::details::StatusResult const & r)
CPP
void onEvent(Model::Event const & e)
{
    auto const msg = e.message.empty() ? "" : ": " + e.message;
    fmt::print("Event: [{}]{}\n", e.codeString, msg);
}
void onError(Model::Error const & e)
{
    auto const type = e.type == Model::ErrorType::Error ? "Error" : "Warn";
    auto const msg  = e.message.empty() ? "" : ": " + e.message;
    fmt::print("Error: ({:<5}) {}{}\n", type, e.codeString, msg);
}
void onResult(Vsdk::details::StatusResult const & r)
{
    auto const id    = r.json["id"   ].get<std::string>();
    auto const score = r.json["score"].get<float>();
    fmt::print("Result: '{}' (score: {})\n", id, score);
}
CPP
model->subscribe(&onEvent);
model->subscribe(&onError);
authenticator->subscribe([] (Authenticator::Result const & r) { onResult(r); });
identificator->subscribe([] (Identificator::Result const & r) { onResult(r); });

Running the pipeline

CPP
#include <vsdk/audio/producers/PaMicrophone.hpp>
#include <vsdk/audio/Pipeline.hpp>
#include <vsdk/utils/PortAudio.hpp>
...

Vsdk::Audio::Pipeline pipeline;
auto const mic = Vsdk::Audio::Producer::PaMicrophone::make(); // Microphone
pipeline.setProducer(mic);
pipeline.pushBackConsumer(recognizer); // identificator or authenticator
pipeline.start();

You must call start() when working with PaMicrophone, as run() is unavailable. Make sure the application does not exit immediately afterward (for example, keep the main thread alive while audio is being processed).

Once a pipeline has been stopped, you can restart it at any time by simply calling .start() again.
