VSDK - C++
Introduction
The Vivoka SDK (VSDK) is a C++ software development kit designed to simplify the integration of voice technologies into your application. It enables you to easily embed and run voice projects exported from VDK-Studio, providing a seamless way to bring embedded voice capabilities into your C++ environment.
Get Started Options
Start with the provided C++ sample project – Ideal for quickly getting up and running with a working example using VSDK in a C++ environment.
Integrate VSDK into your existing C++ application – Recommended if you're adding voice capabilities to an already developed application or system.
Package manager
VSDK is installable through Conan, the C++ package manager. Before getting started, make sure to install and configure Conan; we have a guide for that. Conan will be used to download libraries and/or sample code.
Option 1: Start from Sample Project
Quickly get up and running with a preconfigured project showcasing VSDK usage.
Review the descriptions below to choose the sample(s) that best match your use case.
| Package name | Description |
|---|---|
| simple-application | Demonstrates Speech Synthesis usage immediately after completing Voice Recognition. |
| dynamic-grammar | Demonstrates the use of dynamic models by utilizing slots within grammars and populating them at runtime. |
| chained-grammars | Demonstrates how to perform seamless Voice Recognition between a Wake Word and a follow-up model without any gap in detection. |
| tts | Demonstrates how to implement and use Speech Synthesis functionality. |
| voice-biometrics | Demonstrates how to implement and use Voice Biometrics functionality. |
| voice-commands-language-understanding | Shows how to integrate Natural Language Understanding (NLU) with Voice Recognition to interpret and act on spoken user input. |
| speech-enhancement | Demonstrates how to use the Speech Enhancement technology by running two Speech Recognition instances—one with Speech Enhancement enabled and one without—so their performance can be compared. |
| barge-in | Demonstrates the barge-in feature (Speech Enhancement technology) to filter out a reference audio from the input using files only. |
| barge-in-asr-tts | Shows how to use the barge-in feature along with Voice Recognition and Speech Synthesis. |
For detailed setup instructions, please refer to the accompanying page: How-to: Download, Compile & Run C++ Samples.
In short, you need to complete two steps:
1. Download all the required libraries (using conan).
2. Open the VDK project in VDK-Studio and export it to the sample directory.
Don’t forget to check the README.md file included in the sample code for additional guidance and usage instructions.
Option 2: Integrate into Existing Project
Add VSDK to your current app by following the setup and integration steps.
1. Creating and Exporting VDK-Studio Project
Currently, you cannot export a VSDK project directly from the online VDK Studio. Only VDK Service exports are supported at this time.
Before exporting, you’ll first need to create a VDK-Studio project. To do this, follow our dedicated setup guide. For the purpose of this integration, you don’t need to configure any specific technologies; you can export a project with whatever technology you have access to.
You need to create a project with VSDK, not with VDK-Service!
When exporting your project:
Select Linux/Windows as the target platform.
Set the target folder in your project’s directory.
After the export completes, you will see the following folder structure inside your project:
# VDK-Studio >=5.9.1
config/
└── vsdk.json # VSDK configuration file
data/ # Contains required voice technology resources
2. Install Libraries
To install the required dependencies, we use the Conan package manager. To get started, first install and configure Conan, then create a Conan configuration file specifying the necessary libraries.
As an example of a Conan configuration file, we’ll use the one provided in the sample code.
To learn more about installing libraries, compiling, and running your app, please refer to our detailed guide.
To follow this guide, you’ll need the libraries listed below. These are common across all technologies. More specific dependencies will be provided in technology-specific guides.
vsdk-audio-portaudio/4.1.0@vivoka/customer
vsdk/10.1.2@vivoka/customer
vsdk-samples-utils/1.1.0@vivoka/customer
If you’re using Conan 2 you can’t use old packages. Try to stick to the latest versions when possible.
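For reference, a minimal conanfile.txt listing these packages might look like the sketch below. The [generators] section is an assumption for a CMake-based build with a recent Conan; adapt it to your build system and Conan version.
[requires]
vsdk-audio-portaudio/4.1.0@vivoka/customer
vsdk/10.1.2@vivoka/customer
vsdk-samples-utils/1.1.0@vivoka/customer

[generators]
CMakeDeps
CMakeToolchain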
3. Version
Once the libraries are installed, you should be able to run the code and retrieve the VSDK version.
The next step is to explore key concepts such as the Engine and Pipeline.
#include <vsdk/global.hpp>
fmt::print("VSDK v{}", Vsdk::version());
Engine
Each technology-provider pair in VSDK has its own dedicated engine, which must be initialized once.
Here’s an example initializing both Speech Enhancement and ASR engines:
#include <vsdk/asr/csdk.hpp>
#include <vsdk/speech-enhancement/s2c.hpp>
using AsrEngine = Vsdk::Asr::Csdk::Engine;
auto const asrEngine = Vsdk::Asr::Engine::make<AsrEngine>("config/vsdk.json");
using S2cEngine = Vsdk::SpeechEnhancer::S2c::Engine;
auto const s2cEngine = Vsdk::SpeechEnhancer::Engine::make<S2cEngine>("config/vsdk.json");
Once an engine is initialized, you can build your audio pipeline using the corresponding components for that technology.
The following Audio Pipeline example does not require any engine initialization, as it doesn’t rely on any specific technology.
Audio Pipeline
What is a Pipeline?
A pipeline is a processing chain that handles audio flow through three types of components:
Producer: Captures or generates audio (e.g., microphone input, or TTS channel).
Modifiers (optional): Process or alter the audio (e.g., filters, noise reduction).
Consumers: Use or analyze the audio (e.g., speaker, ASR recognizer).
Flow
Producer → [Modifiers] → [Consumers]
Examples:
TTS channel (Producer) → AudioPlayer (Consumer)
AudioRecorder (Producer) → ASR Recognizer (Consumer)
AudioRecorder (Producer) → Speech Enhancer (Modifier) → ASR Recognizer (Consumer)
This modular design allows you to plug and play components based on your use case.
Pipeline class
#include <vsdk/audio/Pipeline.hpp>
#include <vsdk/audio/producers/File.hpp>
#include <vsdk/audio/consumers/File.hpp>
Vsdk::Audio::Pipeline p;
p.setProducer<Vsdk::Audio::Producer::File>(inPath);       // inPath: path to the input audio file
p.pushBackConsumer<Vsdk::Audio::Consumer::File>(outPath); // outPath: path to the output audio file
p.start();
The usage of .start(), .run(), and .stop() may vary depending on the technology you’re using (e.g., ASR, TTS). Always refer to the specific guide for each module.
However, some behaviors are consistent:
.start() runs the pipeline in a new thread.
.run() runs the pipeline and waits till it is finished (blocking).
.stop() is used to terminate the pipeline execution.
A pipeline can be stopped and safely restarted by calling .start() again when needed.
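For instance, with the file-to-file pipeline p from the snippet above, the two execution modes look like this (a sketch, not tied to any specific technology):
// Blocking mode: run() returns once the producer has delivered all of its audio
p.run();

// Threaded mode: start() returns immediately and the pipeline runs in the background
p.start();
// ... do other work while the audio is being processed ...
p.stop();  // terminate the pipeline execution
p.start(); // a stopped pipeline can safely be restarted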
Custom Modules
You can implement your own audio modules. This is particularly useful for custom pre-processing or post-processing stages in your voice workflow.
Types of modules
ProducerModule
ModifierModule
ConsumerModule
Implementation
Example: Creating a basic Audio Recording Pipeline
#include <csignal>
#include <sstream>
#include <vsdk/audio/Pipeline.hpp>
#include <vsdk/audio/producers/PaMicrophone.hpp>
#include <vsdk/audio/consumers/File.hpp>
#include <vsdk/utils/PortAudio.hpp>
#include <vsdk/utils/samples/EventLoop.hpp>
using Vsdk::Utils::Samples::EventLoop;
int main() try
{
// RAII guard: destroys the EventLoop when main() returns, on any path
std::shared_ptr<void> const eventLoopGuard(nullptr, [] (auto) { EventLoop::destroy(); });
auto const mic = Vsdk::Audio::Producer::PaMicrophone::make();
Vsdk::Utils::PortAudio::printAvailableDeviceNames(Vsdk::Utils::PortAudio::DeviceType::Input);
fmt::print("Using input device '{}'\n", mic->name());
Vsdk::Audio::Pipeline p1;
p1.setProducer(mic);
p1.pushBackConsumer<Vsdk::Audio::Consumer::File>("output.wav", true);
EventLoop::instance().queue([&]
{
p1.start();
});
EventLoop::instance().run(); // Block on run() and wait for jobs until explicit shutdown
return EXIT_SUCCESS;
}
catch (std::exception const & e)
{
fmt::print(stderr, "A fatal error occured:\n");
Vsdk::printExceptionStack(e);
return EXIT_FAILURE;
}
If everything is configured correctly, a new file should be created. You can play the audio using the following command:
aplay -f S16_LE -r 16000 -c 1 output.wav
Explanation:
-f S16_LE — Signed 16-bit Little Endian (common PCM format)
-r 16000 — 16,000 Hz sample rate
-c 1 — Mono (single audio channel)
Error Handling
The VSDK uses exceptions to report errors, reducing the need to manually check every function call.
To help trace the origin of an error, an exception stack is maintained. The following base program is recommended for printing the full error stack:
#include <vsdk/Exception.hpp>
int main() try
{
// use VSDK here
return EXIT_SUCCESS;
}
catch (std::exception const & e)
{
fmt::print(stderr, "A fatal error occured:\n");
Vsdk::printExceptionStack(e);
return EXIT_FAILURE;
}
Please note that some parts of the SDK may run on separate threads, and exceptions cannot cross thread boundaries. To ensure stability, you must either catch exceptions within those threads or delegate tasks to the main thread for safe execution.
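As an illustration, here is a minimal sketch using a plain std::thread (not a VSDK API): the worker wraps its body in its own try/catch so that no exception escapes the thread.
#include <thread>
#include <vsdk/Exception.hpp>

int main()
{
    std::thread worker([]
    {
        try
        {
            // use VSDK here
        }
        catch (std::exception const & e)
        {
            // Handle the error inside the thread: it cannot reach a catch block in main()
            Vsdk::printExceptionStack(e);
        }
    });
    worker.join();
    return 0;
}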