VSDK Voice Synthesis - C++
VDK features three Voice Synthesis libraries: CSDK (Cerence), Baratinoo (Voxygen) and VtApi (ReadSpeaker).
Configuring the engine
Voice synthesis engines must be configured before the program starts. Here is a complete setup with two channels, one per language.
Configuration parameter | Type | Description |
---|---|---|
| | String | The configuration version number. Constant. |
| | String | The voices data location. This is relative to vsdk.json itself, NOT the program's working directory! |
| | Object | Contains the collection of channel descriptions. The key is the channel name. |
| | Array | List of the voices used by the channel. |
An empty channel list will trigger an error, as well as an empty voice list!
You can use the VDK to generate the configuration and the data directory. After creating a custom project with the channels and the voices of your choice, just export it to your binary location.
Voice id format
Each engine has its own voice format, described in the following table:
Engine | Format | Example |
---|---|---|
vsdk-csdk | | |
vsdk-vtapi | | |
vsdk-baratinoo | | |
Starting the engine
Listing the configured channels and voices
// With C++17 or higher
for (auto const & [channel, voices] : engine->availableVoices())
    fmt::print("Available voices for '{}': ['{}']\n", channel, fmt::join(voices, "'; '"));
// With C++11 or higher
for (auto const & it : engine->availableVoices())
    fmt::print("Available voices for '{}': ['{}']\n", it.first, fmt::join(it.second, "'; '"));
Creating a channel
Remember, a channel must be configured beforehand!
Vsdk::Tts::ChannelPtr const channelFr = engine->channel("MyChannel_fr");
channelFr->setCurrentVoice("Arnaud_neutre"); // mandatory before any synthesis can take place
You can also activate a voice right away:
auto const channelEn = engine->makeChannel("MyChannel_en", "laura");
// ChannelPtr is also a std::shared_ptr
The engine instance can't die while at least one channel instance is alive. Destruction order is important!
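To make the destruction-order remark concrete, here is a minimal sketch that relies only on the C++ rule that local objects are destroyed in reverse order of declaration. FakeEngine and FakeChannel are hypothetical stand-ins that merely record when they are destroyed; they are not VSDK types, and whether a real channel keeps its engine alive internally is not stated here.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-ins, NOT the VSDK types: they only record destruction order.
struct FakeEngine
{
    explicit FakeEngine(std::vector<std::string> & log) : log_(log) {}
    ~FakeEngine() { log_.push_back("engine"); }
    std::vector<std::string> & log_;
};

struct FakeChannel
{
    explicit FakeChannel(std::vector<std::string> & log) : log_(log) {}
    ~FakeChannel() { log_.push_back("channel"); }
    std::vector<std::string> & log_;
};

// Declares the engine first and the channel second, then reports
// the order in which they were destroyed when leaving the scope.
std::vector<std::string> destructionOrder()
{
    std::vector<std::string> log;
    {
        auto const engine  = std::make_shared<FakeEngine>(log);
        auto const channel = std::make_shared<FakeChannel>(log);
        assert(log.empty()); // nothing has been destroyed yet
    } // the channel is released first, then the engine
    return log;
}
```

Declaring the engine before any channel, in the same or an enclosing scope, is the simplest way to get this order without extra bookkeeping.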
Speech Synthesis
Synthesizing raw text:
Vsdk::Audio::Buffer const resultFr = channelFr->synthesizeFromText("Bonjour!");
Synthesizing ssml text:
auto const ssml = R"(<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
Here is an <say-as interpret-as="characters">SSML</say-as> sample.
</speak>)";
auto const resultEn = channelEn->synthesisFromText(ssml);
Synthesizing ssml from file:
auto const result = channel->synthesisFromFile("path/to/file");
Speech synthesis is synchronous! The call blocks the calling thread until the synthesis is done or an error occurs. If you need to keep working in the meantime, run the synthesis in another thread.
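One possible way to keep the calling thread free is std::async, sketched below. blockingSynthesis is a hypothetical stand-in for a blocking synthesis call and is not part of VSDK; in real code the lambda would call the channel instead. Whether a channel may safely be used from several threads at once is not covered here, so treat this as an illustration of the threading pattern only.

```cpp
#include <cassert>
#include <chrono>
#include <future>
#include <string>
#include <thread>
#include <vector>

// Hypothetical stand-in for a blocking synthesis call; NOT part of VSDK.
// It sleeps briefly and returns fake audio bytes, one per input character.
std::vector<unsigned char> blockingSynthesis(std::string const & text)
{
    std::this_thread::sleep_for(std::chrono::milliseconds(50));
    return std::vector<unsigned char>(text.size(), 0);
}

// Runs the blocking call on another thread and returns a future to its result.
std::future<std::vector<unsigned char>> synthesizeAsync(std::string text)
{
    assert(!text.empty()); // nothing to synthesize otherwise
    return std::async(std::launch::async,
                      [text] { return blockingSynthesis(text); });
}
```

The caller keeps working after synthesizeAsync("Bonjour!") returns and only blocks when it calls get() on the future. An exception thrown inside the task is rethrown by get().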
Audio::Buffer is NOT a pointer type! Avoid copying it around, prefer move operations.
The synthesis result is a buffer that contains raw audio data: 16-bit signed little-endian PCM. The channel count is always 1 and the sample rate depends on the engine:
Engine | Sample Rate (Hz) |
---|---|
csdk | 22050 |
baratinoo | 24000 |
vtapi | 22050 |
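As a rough illustration of this format, the sketch below decodes 16-bit signed little-endian bytes into samples and derives the duration of a mono buffer from its size. decodePcm16Le and pcmDurationSeconds are hypothetical helpers, not VSDK functions, and they work on a plain byte vector rather than on Vsdk::Audio::Buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Decodes raw 16-bit signed little-endian PCM bytes into samples.
// A trailing odd byte, if any, is ignored.
std::vector<std::int16_t> decodePcm16Le(std::vector<std::uint8_t> const & bytes)
{
    std::vector<std::int16_t> samples;
    samples.reserve(bytes.size() / 2);
    for (std::size_t i = 0; i + 1 < bytes.size(); i += 2)
    {
        // Low byte first, then high byte (little-endian).
        auto const raw = static_cast<std::uint16_t>(bytes[i] | (bytes[i + 1] << 8));
        samples.push_back(static_cast<std::int16_t>(raw));
    }
    return samples;
}

// Duration in seconds of a mono, 16-bit PCM buffer of the given byte size.
double pcmDurationSeconds(std::size_t byteCount, int sampleRateHz)
{
    assert(sampleRateHz > 0);
    return static_cast<double>(byteCount / 2) / sampleRateHz;
}
```

For example, 44100 bytes at 22050 Hz hold 22050 samples, which is exactly one second of audio.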
Playing the result
VSDK provides a cross-platform player in the vsdk-audio-portaudio package.
Playing the result is very easy:
#include <vsdk/audio/consumers/PaPlayer.hpp>
...
Vsdk::Audio::Consumer::PaPlayer player;
player.play(buffer.data(), buffer.sampleRate(), buffer.channelCount());
// Or more simply
player.play(buffer);
...
Storing the result on disk
buffer.saveToFile("path/to/file.pcm");
Only the PCM extension is available, which means the file has no audio header of any sort. You can play it by supplying the right parameters, e.g. aplay -f S16_LE -r SAMPLE_RATE -c 1 file.pcm, or add a WAV header with ffmpeg -f s16le -ar SAMPLE_RATE -ac 1 -i file.pcm file.wav, where SAMPLE_RATE is the engine's sample rate from the table above (for example 22050).
On Windows, you can use Audacity to import the raw data, then play or convert it.
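If no external tool is at hand, a WAV header can also be built in code. The sketch below produces the canonical 44-byte header for mono, 16-bit PCM data. makeWavHeader and its helpers are hypothetical names, not VSDK functions, and the data size and sample rate are supplied by the caller.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

namespace
{
    // Appends the low byteCount bytes of value in little-endian order.
    void appendLe(std::vector<std::uint8_t> & out, std::uint32_t value, int byteCount)
    {
        for (int i = 0; i < byteCount; ++i)
            out.push_back(static_cast<std::uint8_t>((value >> (8 * i)) & 0xFF));
    }

    // Appends a four-character chunk tag such as "RIFF".
    void appendTag(std::vector<std::uint8_t> & out, std::string const & tag)
    {
        out.insert(out.end(), tag.begin(), tag.end());
    }
}

// Builds the canonical 44-byte WAV header for mono, 16-bit PCM data.
std::vector<std::uint8_t> makeWavHeader(std::uint32_t dataSize, std::uint32_t sampleRateHz)
{
    assert(sampleRateHz > 0);
    std::uint32_t const channels = 1;
    std::uint32_t const bitsPerSample = 16;
    std::uint32_t const blockAlign = channels * bitsPerSample / 8;

    std::vector<std::uint8_t> header;
    appendTag(header, "RIFF");
    appendLe(header, 36 + dataSize, 4);              // size of everything after this field
    appendTag(header, "WAVE");
    appendTag(header, "fmt ");
    appendLe(header, 16, 4);                         // size of the fmt chunk
    appendLe(header, 1, 2);                          // audio format 1 = PCM
    appendLe(header, channels, 2);
    appendLe(header, sampleRateHz, 4);
    appendLe(header, sampleRateHz * blockAlign, 4);  // byte rate
    appendLe(header, blockAlign, 2);
    appendLe(header, bitsPerSample, 2);
    appendTag(header, "data");
    appendLe(header, dataSize, 4);
    return header;
}
```

Writing these 44 bytes to a file, followed by the raw PCM bytes, should give a file that common players recognize as WAV.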
Known issues
ReadSpeaker
The errors below were observed on the armv7hf architecture.
A fatal error occurred:
* Failed to load VTAPI engine
* Wrong 'vtapi.paths.data_root' value 'config/../data/vtapi/tts': no such file or directory [Engine.cpp:106]
The voice data was not exported by VDK Studio. Make sure to create a custom project that contains the ReadSpeaker voice of your choice and export the data to your project directory.
A fatal error occurred:
* Failed to load VTAPI engine
* Failed to load licence: No such file or directory [VoiceEngineManager.cpp:52]
The license file was not exported by VDK Studio. Make sure it is in the correct location: data/vtapi/tts/verification.enc.
We suggest that you delete your current data directory and export your custom project again.
A fatal error occurred:
* Failed to synthesize text
* Synthesis failed [Channel.cpp:102]
* Error: VTAPI_OVER_CHANNEL_ERROR (code: -11) [Channel.cpp:102]
This error indicates that your license file is invalid.
We suggest that you delete your current data directory and export your custom project again.
[WARN] Failed to load the following modules: [libvtconv, libvteffect, libvtsave, libvtssml] (11110)
These libraries are exported by VDK Studio to the following directory: data/vtapi/tts.
We suggest that you delete your current data directory and export your custom project again.
The ReadSpeaker voices for armv7hf do not include the libvteffect library.
A fatal error occurred:
Engine not null * Failed to set voice 'elisa,p22' on channel 'main'
* Failed to load voice engine for 'elisa,p22' [VoiceEngineManager.cpp:114]
* Missing voice info [VoiceEngineManager.cpp:93]
The voice info doesn’t exist. Make sure that you use the voice id and channel name that you created in your custom project with VDK Studio. For ReadSpeaker channels, the voice data is exported to the following directory: data/vtapi/tts/{voice_name}/{quality}.
We suggest that you delete your current data directory and export your custom project again.
A fatal error occurred:
Engine not null * Failed to set voice 'elisa,p22' on channel 'main'
* Failed to load voice engine for 'elisa,p22' [VoiceEngineManager.cpp:114]
* Failed to load voice engine from '/home/pi/project/data/vtapi/tts/elisa/p22' [VoiceEngine.cpp:43]
* Error: VTENGINE_INIT_ERROR (code: -111) [VoiceEngine.cpp:43]
This error indicates that the voice library was not found. Make sure the voice’s .so files are located under the data/vtapi/tts/{voice_name}/{quality}/bin directory.
We suggest that you delete your current data directory and export your custom project again.
A fatal error occurred:
Engine not null * Failed to set voice 'elisa,p22' on channel 'main'
* Failed to load voice engine for 'elisa,p22' [VoiceEngineManager.cpp:114]
* Failed to load voice engine from '/home/pi/project/data/vtapi/tts/elisa/p22' [VoiceEngine.cpp:43]
* Error: Can't load engine [elisa][p22](-133) (code: -133) [VoiceEngine.cpp:43]
This error indicates that the voice data was not found.
We suggest that you delete your current data directory and export your custom project again.