Voice Synthesis - C++

VDK features three Voice Synthesis libraries: CSDK (Cerence), Baratinoo (Voxygen) and VtApi (ReadSpeaker).

Configuring the engine

Voice synthesis engines must be configured before the program starts. Each complete setup below defines two channels, one per language.

Baratinoo

Configuration file: `config/vsdk.json`

JSON

{
  "version": "2.0",
  "baratinoo": {
    "paths": {
      "data_root": "../data"
    },
    "tts": {
      "channels": {
        "MyChannel_fr": {
          "voices": [ "Arnaud_neutre" ]
        },
        "MyChannel_en": {
          "voices": [ "Laura" ]
        }
      }
    }
  }
}

Csdk

Configuration file: `config/vsdk.json`

JSON

{
  "csdk": {
    "paths": {
      "data_root": "../data/csdk",
      "tts": "tts"
    },
    "tts": {
      "channels": {
        "MyChannel_en": {
          "voices": [
            "enu,zoe-ml,embedded-premium",
            "enu,tom,embedded-high"
          ]
        },
        "MyChannel_fr": {
          "voices": [
            "frf,audrey,embedded-compact",
            "frf,thomas,embedded-pro"
          ]
        }
      }
    }
  },
  "version": "2.0"
}

VtApi

Configuration file: `config/vsdk.json`

JSON

{
  "version": "2.0",
  "vtapi": {
    "paths": {
      "data_root": "../data"
    },
    "tts": {
      "channels": {
        "MyChannel_fr": {
          "voices": [ "louis,p22" ]
        },
        "MyChannel_en": {
          "voices": [ "kate,d22" ]
        }
      }
    }
  }
}

Configuration parameter	Type	Description
`version`	String	The configuration version number. Constant `2.0`.
`<provider>.paths.data_root` / `<provider>.paths.tts`	String	The voice data location. This is relative to the location of vsdk.json itself, NOT the program's working directory!
`<provider>.<tech>.channels`	Object	Collection of channel descriptions. Each key is a channel name.
`<provider>.<tech>.channels.<channelid>.voices`	Array	List of the voices used by the channel.

An empty channel list will trigger an error, as well as an empty voice list!
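The path rule from the table above can be illustrated with a minimal sketch. `resolveFromConfig` is a hypothetical helper written for this page, not a VSDK function, and its handling of absolute paths is an assumption. It only shows what "relative to vsdk.json" means in terms of `std::filesystem` (C++17).

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical helper (NOT part of VSDK): resolves a configured path against
// the directory that contains the configuration file, not the working directory.
fs::path resolveFromConfig(fs::path const & configFile, fs::path const & configuredPath)
{
    // Assumption: an absolute path in the configuration is used unchanged.
    if (configuredPath.is_absolute())
        return configuredPath;
    return configFile.parent_path() / configuredPath;
}
```

With `config/vsdk.json` and a `data_root` of `../data`, the helper yields `config/../data`, which `lexically_normal()` reduces to `data`.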

You can use VDK Studio to generate both the configuration and the data directory. Create a custom project with the channels and voices of your choice, then export it to your binary's location.

Voice id format

Each engine has its own voice format, described in the following table:

Engine	Format	Example
vsdk-csdk	`<language>,<name>,<quality>`	`enu,evan,embedded-pro`
vsdk-vtapi	`<name>,<quality>`	`alice,d22`
vsdk-baratinoo	`<name>`	`Arnaud_neutre`
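Because the formats in the table differ only in the number of comma-separated fields, a voice id can be sanity-checked before it goes into the configuration. The sketch below is an illustration only: `splitVoiceId` and `hasFieldCount` are hypothetical helpers, not VSDK functions, and the engines may apply stricter rules than a simple field count.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (NOT part of VSDK): splits a voice id on commas.
std::vector<std::string> splitVoiceId(std::string const & voiceId)
{
    std::vector<std::string> fields;
    std::istringstream stream(voiceId);
    std::string field;
    while (std::getline(stream, field, ','))
        fields.push_back(field);
    return fields;
}

// Field counts taken from the table: csdk = 3, vtapi = 2, baratinoo = 1.
bool hasFieldCount(std::string const & voiceId, std::size_t expected)
{
    return splitVoiceId(voiceId).size() == expected;
}
```

For example, `hasFieldCount("enu,evan,embedded-pro", 3)` accepts the vsdk-csdk example, while the same id fails the two-field check used for vsdk-vtapi.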

Starting the engine

Baratinoo

Source code

CPP

#include <vsdk/tts/baratinoo.hpp>

using TtsEngine = Vsdk::Tts::Baratinoo::Engine;
auto const engine = Vsdk::Tts::Engine::make<TtsEngine>("config/vsdk.json");

Csdk

Source code

CPP

#include <vsdk/tts/csdk.hpp>

using TtsEngine = Vsdk::Tts::Csdk::Engine;
auto const engine = Vsdk::Tts::Engine::make<TtsEngine>("config/vsdk.json");

VtApi

Source code

CPP

// VtApi defines ERROR, which the Windows headers already define as a macro
#undef ERROR
#include <vsdk/tts/vtapi.hpp>

using TtsEngine = Vsdk::Tts::VtApi::Engine;
auto const engine = Vsdk::Tts::Engine::make<TtsEngine>("config/vsdk.json");

Listing the configured channels and voices

CPP

#include <fmt/format.h>
#include <fmt/ranges.h> // fmt::join

// With C++17 or higher
for (auto const & [channel, voices] : engine->availableVoices())
    fmt::print("Available voices for '{}': ['{}']\n", channel, fmt::join(voices, "'; '"));

// With C++11 or higher
for (auto const & it : engine->availableVoices())
    fmt::print("Available voices for '{}': ['{}']\n", it.first, fmt::join(it.second, "'; '"));

Creating a channel

Remember: a channel must be declared in the configuration before it can be created!

CPP

Vsdk::Tts::ChannelPtr const channelFr = engine->channel("MyChannel_fr");
channelFr->setCurrentVoice("Arnaud_neutre"); // mandatory before any synthesis can take place

You can also activate a voice right away:

CPP

auto const channelEn = engine->makeChannel("MyChannel_en", "Laura"); // voice id as listed in the configuration
// ChannelPtr is also a std::shared_ptr

The engine instance must not be destroyed while at least one channel instance is still alive. Destruction order is important!
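One simple way to get the order right is to rely on the C++ rule that local objects are destroyed in reverse order of declaration. The sketch below demonstrates that rule with a stand-in type. `Tracked` is purely illustrative and is not a VSDK class; it records its name when destroyed.

```cpp
#include <memory>
#include <string>
#include <utility>
#include <vector>

// Stand-in type for illustration only (NOT a VSDK class): logs its own destruction.
struct Tracked
{
    Tracked(std::string n, std::vector<std::string> & l) : name(std::move(n)), log(l) {}
    ~Tracked() { log.push_back(name); }

    std::string name;
    std::vector<std::string> & log;
};

std::vector<std::string> destructionOrder()
{
    std::vector<std::string> log;
    {
        auto const engine  = std::make_shared<Tracked>("engine", log);  // declared first
        auto const channel = std::make_shared<Tracked>("channel", log); // declared after the engine
    } // locals are destroyed in reverse declaration order: channel, then engine
    return log;
}
```

Declaring the engine before any channel in the same scope, or storing the engine in a longer-lived owner, gives the same guarantee in application code.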

Speech Synthesis

Synthesizing raw text:

CPP

Vsdk::Audio::Buffer const resultFr = channelFr->synthesizeFromText("Bonjour!");

Synthesizing SSML text:

CPP

auto const ssml = R"(<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
 Here is an <say-as interpret-as="characters">SSML</say-as> sample.
 </speak>)";
auto const resultEn = channelEn->synthesizeFromText(ssml);

Synthesizing SSML from a file:

CPP

auto const result = channel->synthesizeFromFile("path/to/file");

Speech synthesis is synchronous! The call blocks the calling thread until the synthesis is done or an error occurs. If your application must keep working in the meantime, run the synthesis in another thread.
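A minimal sketch of moving a blocking call off the calling thread with `std::async` follows. `blockingSynthesis` is a stand-in that returns silence and is not a VSDK function. In real code the lambda would call the channel's synthesis method instead, and whether a channel may be used from several threads at once is not stated here, so treat that as something to verify.

```cpp
#include <cstdint>
#include <future>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a blocking synthesis call (NOT a VSDK function):
// returns one silent 16-bit sample per input character.
std::vector<std::int16_t> blockingSynthesis(std::string const & text)
{
    return std::vector<std::int16_t>(text.size(), 0);
}

// Runs the blocking call on another thread and returns immediately.
std::future<std::vector<std::int16_t>> synthesizeAsync(std::string text)
{
    return std::async(std::launch::async,
                      [text = std::move(text)] { return blockingSynthesis(text); });
}
```

The caller keeps working and calls `get()` on the returned future when the audio is needed. `get()` blocks only if the work is not finished yet and rethrows any exception raised inside the task.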

Audio::Buffer is NOT a pointer type! Avoid copying it around, prefer move operations.

The synthesis result is a buffer that contains raw audio data: 16-bit signed little-endian PCM. The channel count is always 1, and the sample rate depends on the engine:

Engine	Sample Rate (Hz)
csdk	22050
baratinoo	24000
vtapi	22050
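Given this format, the duration of a result follows from its size: each sample takes 2 bytes and there is a single channel. The helper below is a hypothetical illustration of that arithmetic, not a VSDK function, and it assumes you know the byte count of the raw data.

```cpp
#include <cstddef>

// Hypothetical helper (NOT part of VSDK): duration of 16-bit mono PCM data.
// duration = bytes / (bytes per sample * channels) / samples per second
double pcmDurationSeconds(std::size_t byteCount, unsigned sampleRateHz)
{
    constexpr std::size_t bytesPerSample = 2; // 16-bit samples
    constexpr std::size_t channelCount   = 1; // always mono
    return static_cast<double>(byteCount) / (bytesPerSample * channelCount) / sampleRateHz;
}
```

For instance, 44100 bytes at 22050 Hz and 48000 bytes at 24000 Hz both correspond to one second of audio.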

Playing the result

VSDK provides a cross-platform player in the vsdk-audio-portaudio package.

Playing the result is very easy:

CPP

#include <vsdk/audio/consumers/PaPlayer.hpp>

...
Vsdk::Audio::Consumer::PaPlayer player;

player.play(buffer.data(), buffer.sampleRate(), buffer.channelCount());
// Or more simply
player.play(buffer);
...

Storing the result on disk

CPP

buffer.saveToFile("path/to/file.pcm");

Only the PCM extension is available, which means the file has no audio header of any sort. You can play it by supplying the right parameters, e.g. `aplay -f S16_LE -r <sample_rate> -c 1 file.pcm`, or add a WAV header with `ffmpeg -f s16le -ar <sample_rate> -ac 1 -i file.pcm file.wav`, where `<sample_rate>` is the engine's sample rate in Hz.

On Windows, you can use Audacity to import the raw data, then play or convert it.
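If you prefer to produce a WAV file from your own code, the header can be built by hand. The sketch below fills in the standard 44-byte RIFF/WAVE header for 16-bit mono PCM. `makeWavHeader` is a hypothetical helper, not a VSDK function. Writing the header followed by the raw PCM bytes gives a playable `.wav` file.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper (NOT part of VSDK): 44-byte WAV header for 16-bit mono PCM.
std::array<std::uint8_t, 44> makeWavHeader(std::uint32_t dataBytes, std::uint32_t sampleRateHz)
{
    std::array<std::uint8_t, 44> h{};
    // Small writers: 4-character tags and little-endian integers.
    auto tag = [&h](std::size_t at, char const * s) {
        for (int i = 0; i < 4; ++i) h[at + i] = static_cast<std::uint8_t>(s[i]);
    };
    auto u32 = [&h](std::size_t at, std::uint32_t v) {
        for (int i = 0; i < 4; ++i) h[at + i] = static_cast<std::uint8_t>(v >> (8 * i));
    };
    auto u16 = [&h](std::size_t at, std::uint16_t v) {
        for (int i = 0; i < 2; ++i) h[at + i] = static_cast<std::uint8_t>(v >> (8 * i));
    };

    std::uint16_t const channels = 1, bitsPerSample = 16;
    std::uint16_t const blockAlign = channels * bitsPerSample / 8; // bytes per frame

    tag(0, "RIFF");  u32(4, 36 + dataBytes);          // file size minus 8 bytes
    tag(8, "WAVE");
    tag(12, "fmt "); u32(16, 16);                     // size of the fmt chunk
    u16(20, 1);      u16(22, channels);               // format 1 = PCM
    u32(24, sampleRateHz);
    u32(28, sampleRateHz * blockAlign);               // byte rate
    u16(32, blockAlign); u16(34, bitsPerSample);
    tag(36, "data"); u32(40, dataBytes);              // size of the PCM payload
    return h;
}
```

Pass the engine's sample rate in Hz (22050 or 24000 according to the table in the Speech Synthesis section) and the size of the PCM data in bytes.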

Known issues

ReadSpeaker

The errors below were observed on the ARMv7HF architecture.

CODE

A fatal error occurred:
 * Failed to load VTAPI engine
  * Wrong 'vtapi.paths.data_root' value 'config/../data/vtapi/tts': no such file or directory [Engine.cpp:106]

The voice data was not exported by VDK Studio. Make sure to create a custom project that contains the ReadSpeaker voice of your choice, and export the data to your project directory.

CODE

A fatal error occurred:
 * Failed to load VTAPI engine
  * Failed to load licence: No such file or directory [VoiceEngineManager.cpp:52]

The license file was not exported by VDK Studio. Make sure it is in the correct location: data/vtapi/tts/verification.enc.
We suggest that you delete your current data directory and export your custom project again.

CODE

A fatal error occurred:
 * Failed to synthesize text
  * Synthesis failed [Channel.cpp:102]
   * Error: VTAPI_OVER_CHANNEL_ERROR (code: -11) [Channel.cpp:102]

This error indicates that your license file is invalid.
We suggest that you delete your current data directory and export your custom project again.

CODE

[WARN] Failed to load the following modules: [libvtconv, libvteffect, libvtsave, libvtssml] (11110)

These libraries are exported by VDK Studio into the data/vtapi/tts directory.
We suggest that you delete your current data directory and export your custom project again.

The ReadSpeaker voices for armv7hf do not include the libvteffect library.

CODE

A fatal error occurred:
Engine not null * Failed to set voice 'elisa,p22' on channel 'main'
  * Failed to load voice engine for 'elisa,p22' [VoiceEngineManager.cpp:114]
   * Missing voice info [VoiceEngineManager.cpp:93]

The voice info doesn’t exist. Make sure that you use the voice id and channel name that you defined in your custom project in VDK Studio. For ReadSpeaker channels, the voice data is exported to the data/vtapi/tts/{voice_name}/{quality} directory.
We suggest that you delete your current data directory and export your custom project again.

CODE

A fatal error occurred:
Engine not null * Failed to set voice 'elisa,p22' on channel 'main'
  * Failed to load voice engine for 'elisa,p22' [VoiceEngineManager.cpp:114]
   * Failed to load voice engine from '/home/pi/project/data/vtapi/tts/elisa/p22' [VoiceEngine.cpp:43]
    * Error: VTENGINE_INIT_ERROR (code: -111) [VoiceEngine.cpp:43]

This error indicates that the voice library was not found. Make sure the voice’s .so files are located under the data/vtapi/tts/{voice_name}/{quality}/bin directory.
We suggest that you delete your current data directory and export your custom project again.

CODE

A fatal error occurred:
Engine not null * Failed to set voice 'elisa,p22' on channel 'main'
  * Failed to load voice engine for 'elisa,p22' [VoiceEngineManager.cpp:114]
   * Failed to load voice engine from '/home/pi/project/data/vtapi/tts/elisa/p22' [VoiceEngine.cpp:43]
    * Error: Can't load engine [elisa][p22](-133) (code: -133) [VoiceEngine.cpp:43]

This error indicates that the voice data was not found.
We suggest that you delete your current data directory and export your custom project again.
