
Voice Synthesis - C++

VDK features two Voice Synthesis libraries: vsdk-csdk and vsdk-baratinoo.

Configuring the engine

Voice synthesis engines must be configured before the program starts.

Both an empty channel list and an empty voice list will trigger an error!

Use the VDK Studio to generate the configuration and the data directory. After creating a custom project with the channels and the voices of your choice, just export it to your project’s location.

Voice ID format

Each engine has its own voice ID format.
Starting the engine

Baratinoo engine:
#include <vsdk/tts/baratinoo.hpp>

using TtsEngine = Vsdk::Tts::Baratinoo::Engine;
auto engine     = Vsdk::Tts::Engine::make<TtsEngine>("config/vsdk.json");
CSDK engine:
#include <vsdk/tts/csdk.hpp>

using TtsEngine = Vsdk::Tts::Csdk::Engine;
auto engine     = Vsdk::Tts::Engine::make<TtsEngine>("config/vsdk.json");

Listing the configured channels and voices

// C++17 or higher
for (auto const & [channel, voices] : engine->availableVoices())
    fmt::print("Available voices for '{}': ['{}']\n", channel, fmt::join(voices, "'; '"));

// C++11 or higher
for (auto const & it : engine->availableVoices())
    fmt::print("Available voices for '{}': ['{}']\n", it.first, fmt::join(it.second, "'; '"));

Getting a channel

auto channelFr = engine->channel("MyChannel_fr");
channelFr->setCurrentVoice("Arnaud_neutre"); // mandatory before any synthesis can take place

You can also activate a voice right away:

auto channelEn = engine->makeChannel("MyChannel_en", "laura");

The engine instance must outlive every channel instance created from it. Destruction order is important!

Blocking Speech Synthesis

The following method will block until the synthesis is fully finished, then return a buffer you can play right away.

Synthesizing raw text:

Vsdk::Audio::Buffer const buffer = Vsdk::Tts::synthesizeFromText(channel, "Hello");

Audio::Buffer is NOT a pointer type! Avoid copying it around, prefer move operations.

Synthesizing SSML text:

auto const ssml = R"(<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  Here is an <say-as interpret-as="characters">SSML</say-as> sample.
</speak>)";
auto const buffer = Vsdk::Tts::synthesizeFromText(channel, ssml);

Synthesizing from a file:

auto const buffer = Vsdk::Tts::synthesizeFromFile(channel, "path/to/file.txt");

The synthesis result is a buffer that contains raw audio data. The audio format is 16-bit signed little-endian PCM. Channel count and sample rate can be queried using Channel::channelCount() and Channel::sampleRate().


Playing the result

VSDK provides a cross-platform player in the vsdk-audio-portaudio package.

Playing the result is very easy:

#include <vsdk/audio/PaStandalonePlayer.hpp>

auto buffer = Vsdk::Tts::synthesizeFromText(channel, "Text to synthesize");
Vsdk::Audio::PaStandalonePlayer player;

Storing the result on disk


Only the PCM (raw) format is available, which means the file has no audio header of any sort. You can play it by supplying the right parameters, e.g.: aplay -f S16_LE -r <sample rate> -c 1 file.pcm, or add a WAV header: ffmpeg -f s16le -ar <sample rate> -ac 1 -i file.pcm file.wav.

On Windows, you can use Audacity to import the raw data, then play or convert it.

Streaming Speech Synthesis

Streaming the synthesis lets you receive audio chunks regularly instead of waiting for the whole generation process to finish. This is done through the pipeline system, which lets you choose between the synchronous and asynchronous modes (pipeline.run() vs. pipeline.start()).


Starts an asynchronous synthesis that plays the result on the default output device right away:

#include <vsdk/audio/consumer/PaPlayer.hpp>

Vsdk::Audio::Pipeline pipeline;
pipeline.pushBackConsumer<Vsdk::Audio::PaPlayer>(); // Plays audio chunks as they arrive
pipeline.start(); // Starts the channel for future synthesis requests

// Starts the actual synthesis
channel->synthesizeFromFile("...");
// Since we are using the asynchronous mode, we reach this point without blocking!
// Make sure to do something or wait, else the pipeline will go out of scope and stop.

Starts a synchronous synthesis from a text file whose audio data gets stored in a buffer:

#include <vsdk/audio/BufferModule.hpp>

Vsdk::Audio::Pipeline pipeline;
auto bufferModule = *pipeline.pushBackConsumer<Vsdk::Audio::BufferModule>();
// Channel not started yet, waiting for start() or run() to truly do the job
channel->synthesizeFromFile("..."); // Blocks this thread until the task is done
auto const result = std::move(bufferModule->buffer()); // Get available result data

Most of the functions used here have useful return values that indicate whether everything worked. Pipeline never throws, so when one of its operations fails you can get the error string with the lastError() method.

Marker events and runtime errors or warnings

Be it synchronous or asynchronous, you can subscribe to a channel to get events and/or errors & warnings happening during synthesis:

channel->subscribe([&] (Vsdk::Tts::Channel::Event const & e)
{
    if (e.type == Vsdk::Tts::Channel::EventCode::WordMarkerStart)
    {
        Vsdk::Tts::Events::WordMarker const marker = nlohmann::json::parse(e.message);
        fmt::print("[{}] Current word being played: '{}'\n", channel->name(), marker.word);
    }
    else if (e.type == Vsdk::Tts::Channel::EventCode::ProcessFinished)
    {
        // Synthesis finished, we can now start a new one or signal another process
        auto const msg = e.message.empty() ? "" : ": " + e.message;
        fmt::print("[{}] {}{}\n", channel->name(), e.codeString, msg);
    }
});
