VSDK - C++
Introduction
The Vivoka SDK (VSDK) is a C++ software development kit designed to simplify the integration of voice technologies into your application. It enables you to easily embed and run voice projects exported from VDK-Studio, providing a seamless way to bring embedded voice capabilities into your C++ environment.
Get Started Options
Start with the provided C++ sample project – Ideal for quickly getting up and running with a working example using VSDK in a C++ environment.
Integrate VSDK into your existing C++ application – Recommended if you're adding voice capabilities to an already developed application or system.
Package manager
VSDK is installable through Conan, the C++ package manager. Before getting started, make sure to install and configure Conan; we have a guide for that. Conan will be used to download libraries and/or sample code.
Option 1: Start from Sample Project
Quickly get up and running with a preconfigured project showcasing VSDK usage.
Review the descriptions below to choose the sample(s) that best match your use case.
| Package name | Description |
|---|---|
| simple-application | Demonstrates Speech Synthesis usage immediately after completing Voice Recognition. |
| dynamic-grammar | Demonstrates the use of dynamic models by utilizing slots within grammars and populating them at runtime. |
| chained-grammars | Demonstrates how to perform seamless Voice Recognition between a Wake Word and a follow-up model without any gap in detection. |
| tts | Demonstrates how to implement and use Speech Synthesis functionality. |
| voice-biometrics | Demonstrates how to implement and use Voice Biometrics functionality. |
| voice-commands-language-understanding | Shows how to integrate Natural Language Understanding (NLU) with Voice Recognition to interpret and act on spoken user input. |
| speech-enhancement | Demonstrates how to use the Speech Enhancement technology by running two Speech Recognition instances—one with Speech Enhancement enabled and one without—so their performance can be compared. |
| barge-in | Demonstrates the barge-in feature (Speech Enhancement technology) to filter out a reference audio from the input using files only. |
| barge-in-asr-tts | Shows how to use the barge-in feature along with Voice Recognition and Speech Synthesis. |
For detailed setup instructions, please refer to the accompanying page: How-to: Download, Compile & Run C++ Samples.
In short, you need to complete two steps:
1. Download all the required libraries (using conan).
2. Open the VDK project in VDK-Studio and export it to the sample directory.
Don’t forget to check the README.md file included in the sample code for additional guidance and usage instructions.
Option 2: Integrate into Existing Project
Add VSDK to your current app by following the setup and integration steps.
1. Creating and Exporting VDK-Studio Project
Currently, you cannot export a VSDK project directly from the online VDK Studio. Only VDK Service exports are supported at this time.
Before exporting, you’ll first need to create a VDK-Studio project. To do this, follow our dedicated setup guide. For the purpose of this integration, you don’t need to configure any specific technologies; you can export a project with whatever technology you have access to.
You need to create a project with VSDK, not with VDK-Service!
When exporting your project:
Select Linux/Windows as the target platform.
Set the target folder in your project’s directory.
After the export completes, you will see the following folder structure inside your project:
# VDK-Studio >=5.9.1
config/
└── vsdk.json # VSDK configuration file
data/ # Contains required voice technology resources
2. Install Libraries
To install the required dependencies, we use the Conan package manager. To get started, first install and configure Conan, then create a Conan configuration file specifying the necessary libraries.
As an example of a Conan configuration file, we’ll use the one provided in the sample code.
To learn more about installing libraries, compiling, and running your app, please refer to our detailed guide.
To follow this guide, you’ll need the libraries listed below. These are common across all technologies. More specific dependencies will be provided in technology-specific guides.
vsdk-audio-portaudio/4.1.0@vivoka/customer
vsdk/10.1.2@vivoka/customer
vsdk-samples-utils/1.1.0@vivoka/customer
If you’re using Conan 2 you can’t use old packages. Try to stick to the latest versions when possible.
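For reference, a minimal conanfile.txt listing these packages might look like the sketch below. The [generators] section is an assumption for a CMake-based build with a recent Conan; adapt it to your build system and Conan version.
[requires]
vsdk-audio-portaudio/4.1.0@vivoka/customer
vsdk/10.1.2@vivoka/customer
vsdk-samples-utils/1.1.0@vivoka/customer

[generators]
CMakeDeps
CMakeToolchain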
3. Version
Once the libraries are installed, you should be able to run the code and retrieve the VSDK version.
The next step is to explore key concepts such as the Engine and Pipeline.
#include <vsdk/global.hpp>
fmt::print("VSDK v{}", Vsdk::version());
Engine
Each technology-provider pair in VSDK has its own dedicated engine, which must be initialized once.
Here’s an example initializing both Speech Enhancement and ASR engines:
#include <vsdk/asr/csdk.hpp>
#include <vsdk/speech-enhancement/s2c.hpp>
using AsrEngine = Vsdk::Asr::Csdk::Engine;
auto const asrEngine = Vsdk::Asr::Engine::make<AsrEngine>("config/vsdk.json");
using S2cEngine = Vsdk::SpeechEnhancer::S2c::Engine;
auto const s2cEngine = Vsdk::SpeechEnhancer::Engine::make<S2cEngine>("config/vsdk.json");
Once an engine is initialized, you can build your audio pipeline using the corresponding components for that technology.
The following Audio Pipeline example does not require any engine initialization, as it doesn’t rely on any specific technology.
Audio Pipeline
What is a Pipeline?
A pipeline is a processing chain that handles audio flow through three types of components:
Producer: Captures or generates audio (e.g., microphone input, or TTS channel).
Modifiers (optional): Process or alter the audio (e.g., filters, noise reduction).
Consumers: Use or analyze the audio (e.g., speaker, ASR recognizer).
Flow
Producer → [Modifiers] → [Consumers]
Examples:
TTS channel (Producer) → AudioPlayer (Consumer)
AudioRecorder (Producer) → ASR Recognizer (Consumer)
AudioRecorder (Producer) → Speech Enhancer (Modifier) → ASR Recognizer (Consumer)
This modular design allows you to plug and play components based on your use case.
Pipeline class
#include <vsdk/audio/Pipeline.hpp>
#include <vsdk/audio/producers/File.hpp>
#include <vsdk/audio/consumers/File.hpp>
Vsdk::Audio::Pipeline p;
p.setProducer<Vsdk::Audio::Producer::File>(inPath);       // inPath: path to the input audio file
p.pushBackConsumer<Vsdk::Audio::Consumer::File>(outPath); // outPath: path to the output audio file
p.start();
The usage of .start(), .run(), and .stop() may vary depending on the technology you’re using (e.g., ASR, TTS). Always refer to the specific guide for each module.
However, some behaviors are consistent:
.start() runs the pipeline in a new thread.
.run() runs the pipeline and waits till it is finished (blocking).
.stop() is used to terminate the pipeline execution.
A pipeline can be stopped and safely restarted by calling .start() again when needed.
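For instance, with the file-to-file pipeline p from the snippet above, the two execution modes look like this (a sketch, not tied to any specific technology):
// Blocking mode: run() returns once the producer has delivered all of its audio
p.run();

// Threaded mode: start() returns immediately and the pipeline runs in the background
p.start();
// ... do other work while the audio is being processed ...
p.stop();  // terminate the pipeline execution
p.start(); // a stopped pipeline can safely be restarted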
Custom Modules
You can implement your own audio modules. This is particularly useful for custom pre-processing or post-processing stages in your voice workflow.
Types of modules
ProducerModule
ModifierModule
ConsumerModule
Implementation
Example: Creating a basic Audio Recording Pipeline
#include <csignal>
#include <sstream>
#include <vsdk/audio/Pipeline.hpp>
#include <vsdk/audio/producers/PaMicrophone.hpp>
#include <vsdk/audio/consumers/File.hpp>
#include <vsdk/utils/PortAudio.hpp>
#include <vsdk/utils/samples/EventLoop.hpp>
using Vsdk::Utils::Samples::EventLoop;
int main() try
{
// RAII guard: destroys the EventLoop when main() returns, on any path
std::shared_ptr<void> const eventLoopGuard(nullptr, [] (auto) { EventLoop::destroy(); });
auto const mic = Vsdk::Audio::Producer::PaMicrophone::make();
Vsdk::Utils::PortAudio::printAvailableDeviceNames(Vsdk::Utils::PortAudio::DeviceType::Input);
fmt::print("Using input device '{}'\n", mic->name());
Vsdk::Audio::Pipeline p1;
p1.setProducer(mic);
p1.pushBackConsumer<Vsdk::Audio::Consumer::File>("output.wav", true);
EventLoop::instance().queue([&]
{
p1.start();
});
EventLoop::instance().run(); // Block on run() and wait for jobs until explicit shutdown
return EXIT_SUCCESS;
}
catch (std::exception const & e)
{
fmt::print(stderr, "A fatal error occured:\n");
Vsdk::printExceptionStack(e);
return EXIT_FAILURE;
}
If everything is configured correctly, a new file should be created. You can play the audio using the following command:
aplay -f S16_LE -r 16000 -c 1 output.wav
Explanation:
-f S16_LE — Signed 16-bit Little Endian (common PCM format)
-r 16000 — 16,000 Hz sample rate
-c 1 — Mono (single audio channel)
Error Handling
The VSDK uses exceptions to report errors, reducing the need to manually check every function call.
To help trace the origin of an error, an exception stack is maintained. The following base program is recommended for printing the full error stack:
#include <vsdk/Exception.hpp>
int main() try
{
// use VSDK here
return EXIT_SUCCESS;
}
catch (std::exception const & e)
{
fmt::print(stderr, "A fatal error occured:\n");
Vsdk::printExceptionStack(e);
return EXIT_FAILURE;
}
Please note that some parts of the SDK may run on separate threads, and exceptions cannot cross thread boundaries. To ensure stability, you must either catch exceptions within those threads or delegate tasks to the main thread for safe execution.
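As an illustration, here is a minimal sketch using a plain std::thread (not a VSDK API): the worker wraps its body in its own try/catch so that no exception escapes the thread.
#include <thread>
#include <vsdk/Exception.hpp>

int main()
{
    std::thread worker([]
    {
        try
        {
            // use VSDK here
        }
        catch (std::exception const & e)
        {
            // Handle the error inside the thread: it cannot reach a catch block in main()
            Vsdk::printExceptionStack(e);
        }
    });
    worker.join();
    return 0;
}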