
Voice Synthesis - C++

VDK features three Voice Synthesis libraries: CSDK (Cerence), Baratinoo (Voxygen) and VtApi (ReadSpeaker).

Configuring the engine

Voice synthesis engines must be configured before the program starts. Here is a complete setup with two channels, one per language.

Baratinoo
Configuration file: config/vsdk.json
JSON
{
  "version": "2.0",
  "baratinoo": {
    "paths": {
      "data_root": "../data"
    },
    "tts": {
      "channels": {
        "MyChannel_fr": {
          "voices": [ "Arnaud_neutre" ]
        },
        "MyChannel_en": {
          "voices": [ "Laura" ]
        }
      }
    }
  }
}
Csdk
Configuration file: config/vsdk.json
JSON
{
  "csdk": {
    "paths": {
      "data_root": "../data/csdk",
      "tts": "tts"
    },
    "tts": {
      "channels": {
        "MyChannel_en": {
          "voices": [
            "enu,zoe-ml,embedded-premium",
            "enu,tom,embedded-high"
          ]
        },
        "MyChannel_fr": {
          "voices": [
            "frf,audrey,embedded-compact",
            "frf,thomas,embedded-pro"
          ]
        }
      }
    }
  },
  "version": "2.0"
}
VtApi
Configuration file: config/vsdk.json
JSON
{
  "version": "2.0",
  "vtapi": {
    "paths": {
      "data_root": "../data"
    },
    "tts": {
      "channels": {
        "MyChannel_fr": {
          "voices": [ "louis,p22" ]
        },
        "MyChannel_en": {
          "voices": [ "kate,d22" ]
        }
      }
    }
  }
}

| Configuration parameter | Type | Description |
|---|---|---|
| version | String | The configuration version number. Always "2.0". |
| <provider>.paths.data_root / <provider>.paths.tts | String | The location of the voice data. This is relative to vsdk.json itself, NOT the program's working directory! |
| <provider>.<tech>.channels | Object | Collection of channel descriptions. The key is the channel name. |
| <provider>.<tech>.channels.<channelid>.voices | Array | List of the voices used by the channel. |

An empty channel list will trigger an error, as will an empty voice list!

You can use the VDK to generate the configuration and the data directory. After creating a custom project with the channels and the voices of your choice, just export it to your binary location.

Voice id format

Each engine has its own voice format, described in the following table:

| Engine | Format | Example |
|---|---|---|
| vsdk-csdk | <language>,<name>,<quality> | enu,evan,embedded-pro |
| vsdk-vtapi | <name>,<quality> | alice,d22 |
| vsdk-baratinoo | <name> | Arnaud_neutre |
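
The voice id formats above differ only in the number of comma-separated fields. As a minimal sketch, a configuration can be sanity-checked before it reaches the engine. The helpers splitVoiceId and hasExpectedFieldCount are hypothetical names used for illustration; they are not part of the VSDK API.

```cpp
#include <string>
#include <sstream>
#include <vector>

// Splits a voice id such as "enu,evan,embedded-pro" on commas.
// Hypothetical helper for illustration; not part of the VSDK API.
std::vector<std::string> splitVoiceId(std::string const & voiceId)
{
    std::vector<std::string> fields;
    std::istringstream stream(voiceId);
    std::string field;
    while (std::getline(stream, field, ','))
        fields.push_back(field);
    return fields;
}

// Checks the number of comma-separated fields against the documented format.
bool hasExpectedFieldCount(std::string const & engine, std::string const & voiceId)
{
    auto const count = splitVoiceId(voiceId).size();
    if (engine == "vsdk-csdk")      return count == 3; // <language>,<name>,<quality>
    if (engine == "vsdk-vtapi")     return count == 2; // <name>,<quality>
    if (engine == "vsdk-baratinoo") return count == 1; // <name>
    return false; // unknown engine name
}
```

This only checks the shape of the id. Whether the voice actually exists still depends on the exported data directory.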

Starting the engine

Baratinoo
Source code
CPP
#include <vsdk/tts/baratinoo.hpp>

using TtsEngine = Vsdk::Tts::Baratinoo::Engine;
auto const engine = Vsdk::Tts::Engine::make<TtsEngine>("config/vsdk.json");
Csdk
Source code
CPP
#include <vsdk/tts/csdk.hpp>

using TtsEngine = Vsdk::Tts::Csdk::Engine;
auto const engine = Vsdk::Tts::Engine::make<TtsEngine>("config/vsdk.json");
VtApi
Source code
CPP
// VtApi defines ERROR which is already defined in windows
#undef ERROR
#include <vsdk/tts/vtapi.hpp>

using TtsEngine = Vsdk::Tts::VtApi::Engine;
auto const engine = Vsdk::Tts::Engine::make<TtsEngine>("config/vsdk.json");

Listing the configured channels and voices

CPP
// With C++17 or higher
for (auto const & [channel, voices] : engine->availableVoices())
 fmt::print("Available voices for '{}': ['{}']\n", channel, fmt::join(voices, "'; '"));

// With C++11 or higher
for (auto const & it : engine->availableVoices())
 fmt::print("Available voices for '{}': ['{}']\n", it.first, fmt::join(it.second, "'; '"));

Creating a channel

Remember, the channel must be configured beforehand!

CPP
Vsdk::Tts::ChannelPtr const channelFr = engine->channel("MyChannel_fr");
channelFr->setCurrentVoice("Arnaud_neutre"); // mandatory before any synthesis can take place

You can also activate a voice right away:

CPP
auto const channelEn = engine->makeChannel("MyChannel_en", "Laura"); // voice id as written in vsdk.json
// ChannelPtr is also a std::shared_ptr

The engine instance can't die while at least one channel instance is alive. Destruction order is important!
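
One simple way to get a safe destruction order is to rely on the C++ rule that local objects are destroyed in reverse order of declaration: declare the engine first and the channels afterwards in the same scope. The sketch below uses stand-in types (FakeEngine, FakeChannel) that only log their destruction. They are assumptions for illustration and say nothing about how the VSDK classes are implemented.

```cpp
#include <string>
#include <vector>

// Records destruction events so the order can be inspected.
std::vector<std::string> & destructionLog()
{
    static std::vector<std::string> log;
    return log;
}

// Stand-in types, NOT the VSDK classes.
struct FakeEngine  { ~FakeEngine()  { destructionLog().push_back("engine"); } };
struct FakeChannel { ~FakeChannel() { destructionLog().push_back("channel"); } };

void runScope()
{
    FakeEngine  engine;   // declared first, destroyed last
    FakeChannel channel;  // declared last, destroyed first
}   // leaving the scope destroys 'channel', then 'engine'
```

The same reasoning applies to class members, which are destroyed in reverse order of declaration, so an engine member should be declared before the channel members.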

Speech Synthesis

Synthesizing raw text:

CPP
Vsdk::Audio::Buffer const resultFr = channelFr->synthesizeFromText("Bonjour!");

Synthesizing SSML text:

CPP
auto const ssml = R"(<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
 Here is an <say-as interpret-as="characters">SSML</say-as> sample.
 </speak>)";
auto const resultEn = channelEn->synthesisFromText(ssml);

Synthesizing SSML from file:

CPP
auto const result = channel->synthesisFromFile("path/to/file");

Speech synthesis is synchronous! The call blocks the calling thread until the synthesis is done or an error occurs. If you need to keep going in the meantime, run the synthesis in another thread.
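
A minimal sketch of moving a blocking call off the current thread with std::async. The function blockingSynthesis is a stand-in that sleeps and returns fake samples. It is an assumption for illustration, not the VSDK API; in real code the lambda or function passed to std::async would call the channel's synthesis method instead.

```cpp
#include <chrono>
#include <future>
#include <string>
#include <thread>
#include <utility>
#include <vector>

// Stand-in for a blocking synthesis call; NOT the VSDK API.
std::vector<short> blockingSynthesis(std::string const & text)
{
    std::this_thread::sleep_for(std::chrono::milliseconds(50)); // simulate work
    return std::vector<short>(text.size() * 100, 0);            // fake samples
}

// Starts the blocking call on another thread and returns immediately.
std::future<std::vector<short>> synthesizeAsync(std::string text)
{
    return std::async(std::launch::async, blockingSynthesis, std::move(text));
}
```

The caller can keep working and call get() on the returned future when the audio is needed. get() blocks only if the synthesis has not finished yet, and rethrows any exception raised on the worker thread.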

Audio::Buffer is NOT a pointer type! Avoid copying it around, prefer move operations.

The synthesis result is a buffer that contains raw audio data: 16-bit signed little-endian PCM. The channel count is always 1 and the sample rate depends on the engine:

| Engine | Sample rate (Hz) |
|---|---|
| csdk | 22050 |
| baratinoo | 24000 |
| vtapi | 22050 |
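
Because the format is fixed (mono, 2 bytes per sample), the duration of a result follows directly from its size in bytes and the engine's sample rate. A small sketch, where pcmDurationSeconds is a hypothetical helper and not part of the VSDK API:

```cpp
#include <cstddef>

// Duration in seconds of a mono, 16-bit PCM buffer.
// Hypothetical helper for illustration; not part of the VSDK API.
double pcmDurationSeconds(std::size_t byteCount, unsigned sampleRateHz)
{
    constexpr std::size_t bytesPerSample = 2; // 16-bit samples
    constexpr std::size_t channelCount   = 1; // synthesis output is always mono
    if (sampleRateHz == 0)
        return 0.0;                           // avoid dividing by zero
    auto const frames = byteCount / (bytesPerSample * channelCount);
    return static_cast<double>(frames) / sampleRateHz;
}
```

For example, 44100 bytes of csdk output at 22050 Hz is 22050 samples, which is one second of audio.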

Playing the result

VSDK provides a cross-platform player in the vsdk-audio-portaudio package.

Playing the result is very easy:

CPP
#include <vsdk/audio/consumers/PaPlayer.hpp>

...
Vsdk::Audio::Consumer::PaPlayer player;

player.play(buffer.data(), buffer.sampleRate(), buffer.channelCount());
// Or more simply
player.play(buffer);
...

Storing the result on disk

CPP
buffer.saveToFile("path/to/file.pcm");

Only the PCM extension is available, which means the file has no audio header of any sort. You can play it by supplying the right parameters, e.g. aplay -f S16_LE -r <sample_rate> -c 1 file.pcm, or add a WAV header: ffmpeg -f s16le -ar <sample_rate> -ac 1 -i file.pcm file.wav. Use the sample rate of the engine that produced the file.

On Windows, you can use Audacity to import the raw data and then play or convert it.
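
If ffmpeg is not available, the WAV header can also be produced in code. The sketch below builds the standard 44-byte RIFF/WAVE header for mono 16-bit PCM. makeWavHeader and its helpers are hypothetical names, not part of the VSDK API, and the sample rate is assumed to be the one of the engine that produced the data.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Appends the low 'byteCount' bytes of 'value' in little-endian order.
void appendLe(std::vector<std::uint8_t> & out, std::uint32_t value, int byteCount)
{
    for (int i = 0; i < byteCount; ++i)
        out.push_back(static_cast<std::uint8_t>((value >> (8 * i)) & 0xFF));
}

// Appends a four-character chunk tag such as "RIFF".
void appendTag(std::vector<std::uint8_t> & out, std::string const & tag)
{
    out.insert(out.end(), tag.begin(), tag.end());
}

// Builds the 44-byte WAV header for mono, 16-bit PCM data of 'dataSize' bytes.
std::vector<std::uint8_t> makeWavHeader(std::uint32_t dataSize, std::uint32_t sampleRate)
{
    std::uint32_t const channels = 1, bitsPerSample = 16;
    std::uint32_t const blockAlign = channels * bitsPerSample / 8;
    std::vector<std::uint8_t> header;
    appendTag(header, "RIFF"); appendLe(header, 36 + dataSize, 4);
    appendTag(header, "WAVE");
    appendTag(header, "fmt "); appendLe(header, 16, 4); // size of the fmt chunk
    appendLe(header, 1, 2);                             // audio format: PCM
    appendLe(header, channels, 2);
    appendLe(header, sampleRate, 4);
    appendLe(header, sampleRate * blockAlign, 4);       // byte rate
    appendLe(header, blockAlign, 2);
    appendLe(header, bitsPerSample, 2);
    appendTag(header, "data"); appendLe(header, dataSize, 4);
    return header;
}
```

Writing this header to a file followed by the raw PCM bytes yields a .wav file that ordinary players can open.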

Known issues

ReadSpeaker

The errors below were observed on the ARMv7HF architecture.

CODE
A fatal error occurred:
 * Failed to load VTAPI engine
  * Wrong 'vtapi.paths.data_root' value 'config/../data/vtapi/tts': no such file or directory [Engine.cpp:106]

The voice data was not exported by VDK Studio. Make sure to create a custom project that contains the ReadSpeaker voice of your choice, then export the data to your project directory.

CODE
A fatal error occurred:
 * Failed to load VTAPI engine
  * Failed to load licence: No such file or directory [VoiceEngineManager.cpp:52]

The license file was not exported by VDK Studio. Make sure it is in the correct location: data/vtapi/tts/verification.enc.
We suggest that you delete your current data directory and export your custom project again.

CODE
A fatal error occurred:
 * Failed to synthesize text
  * Synthesis failed [Channel.cpp:102]
   * Error: VTAPI_OVER_CHANNEL_ERROR (code: -11) [Channel.cpp:102]

This error indicates that your license file is invalid.
We suggest that you delete your current data directory and export your custom project again.

CODE
[WARN] Failed to load the following modules: [libvtconv, libvteffect, libvtsave, libvtssml] (11110)

These libraries are exported by VDK Studio to the following directory: data/vtapi/tts.
We suggest that you delete your current data directory and export your custom project again.

The ReadSpeaker voices for armv7hf do not include the libvteffect library.

CODE
A fatal error occurred:
Engine not null * Failed to set voice 'elisa,p22' on channel 'main'
  * Failed to load voice engine for 'elisa,p22' [VoiceEngineManager.cpp:114]
   * Missing voice info [VoiceEngineManager.cpp:93]

The voice info does not exist. Make sure that you use the voice id and channel name that you created in your custom project in VDK Studio. For ReadSpeaker channels, the voice data is exported to the following directory: data/vtapi/tts/{voice_name}/{quality}.
We suggest that you delete your current data directory and export your custom project again.

CODE
A fatal error occurred:
Engine not null * Failed to set voice 'elisa,p22' on channel 'main'
  * Failed to load voice engine for 'elisa,p22' [VoiceEngineManager.cpp:114]
   * Failed to load voice engine from '/home/pi/project/data/vtapi/tts/elisa/p22' [VoiceEngine.cpp:43]
    * Error: VTENGINE_INIT_ERROR (code: -111) [VoiceEngine.cpp:43]

This error indicates that the voice library was not found. Make sure the voice's .so files are located under the data/vtapi/tts/{voice_name}/{quality}/bin directory.
We suggest that you delete your current data directory and export your custom project again.

CODE
A fatal error occurred:
Engine not null * Failed to set voice 'elisa,p22' on channel 'main'
  * Failed to load voice engine for 'elisa,p22' [VoiceEngineManager.cpp:114]
   * Failed to load voice engine from '/home/pi/project/data/vtapi/tts/elisa/p22' [VoiceEngine.cpp:43]
    * Error: Can't load engine [elisa][p22](-133) (code: -133) [VoiceEngine.cpp:43]

This error indicates that the voice data was not found.
We suggest that you delete your current data directory and export your custom project again.
