VSDK Voice Synthesis - Android
VDK features two TTS libraries: CSDK and Baratinoo.
Configuration
TTS engines must be configured before the program starts. Here is a complete setup for the CSDK engine with two channels, each one using a different language:
{
    "version": "2.0",
    "csdk": {
        "tts": {
            "channels": {
                "channelFrf": {
                    "voices": ["frf,aurelie,embedded-compact", "enu,ava,embedded-compact"]
                },
                "channelEnu": {
                    "voices": ["enu,ava,embedded-compact"]
                }
            }
        }
    }
}
An empty channel list will trigger an error, as will an empty voice list!
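As a rough sketch of a pre-flight check for the two error cases above, the helper below inspects a channel-to-voices map before the configuration is handed to the engine. `TtsConfigCheck` and `firstProblem` are hypothetical names that are not part of VSDK, and the map is assumed to mirror the `channels` object of the configuration file once it has been parsed.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not part of VSDK) that mirrors the two rules stated above. */
final class TtsConfigCheck {
    private TtsConfigCheck() {}

    /**
     * Returns a description of the first problem found, or null when every
     * channel lists at least one voice.
     *
     * @param channels channel name mapped to its voice list (assumed shape)
     */
    static String firstProblem(Map<String, List<String>> channels) {
        if (channels == null || channels.isEmpty()) {
            return "no channel configured";
        }
        for (Map.Entry<String, List<String>> entry : channels.entrySet()) {
            List<String> voices = entry.getValue();
            if (voices == null || voices.isEmpty()) {
                return "channel '" + entry.getKey() + "' has no voice";
            }
        }
        return null;
    }
}
```

Running such a check before initialization only reports the problem earlier; the engine's own validation remains the reference.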
Voice format
Each engine has its own voice format, described in the following table:
Engine | Format | Example |
---|---|---|
vsdk-csdk | | frf,aurelie,embedded-compact |
vsdk-baratinoo | | |
Starting the engine
com.vivoka.vsdk.Vsdk.init(mContext, "config/main.json", vsdkSuccess -> {
    if (vsdkSuccess)
    {
        com.vivoka.csdk.tts.Engine.getInstance().init(mContext, engineSuccess -> {
            if (engineSuccess)
            {
                // At this point the TTS engine has been correctly initialized.
            }
        });
    }
});
Creating a channel
Remember, the channel must be configured beforehand!
Channel channelFrf = com.vivoka.csdk.tts.Engine.getInstance().makeChannel("channelFrf", "frf,aurelie,embedded-compact");
Speech Synthesis
Speech synthesis is asynchronous! The call does not block the calling thread during synthesis; the result is available once the callback is invoked.
channelFrf.synthesisFromText("Bonjour ! Je suis une voix synthétique", () -> {
    // channelFrf.synthesisResult contains the audio data to play
});
// Also works with SSML input
final String ssml = "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"fr-FR\">Bonjour Vivoka</speak>";
channelFrf.synthesisFromSSML(ssml, () -> {
    // channelFrf.synthesisResult contains the audio data to play
});
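Because the result only arrives in a callback, code that needs it in sequence (a test or a background worker, for example) has to wait for that callback. The following is a minimal sketch of one way to do so with a `CountDownLatch`. `AsyncWait` and `await` are hypothetical names that are not part of VSDK, and the sketch assumes the callback is delivered on a thread other than the waiting one; if it is delivered on the waiting thread, the wait would simply time out.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical helper (not part of VSDK): blocks until a completion callback has run. */
final class AsyncWait {
    private AsyncWait() {}

    /**
     * Starts an asynchronous operation and waits for its completion callback.
     *
     * @param start     receives a Runnable that the operation must run when it completes
     * @param timeoutMs maximum time to wait, in milliseconds
     * @return true if the callback ran in time, false on timeout or interruption
     */
    static boolean await(Consumer<Runnable> start, long timeoutMs) {
        CountDownLatch done = new CountDownLatch(1);
        start.accept(done::countDown);
        try {
            return done.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            // Preserve the interrupt status for the caller.
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

With the channel above, a call could look like `AsyncWait.await(done -> channelFrf.synthesisFromText("Bonjour", () -> done.run()), 5000)`. Blocking the Android main thread this way should be avoided.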
Playing the result
VSDK provides an audio player. Playing the result is very easy:
AudioPlayer.play(channel.synthesisResult.getAudioData(),
                 channel.synthesisResult.getSampleRate(),
                 new AudioTrack.OnPlaybackPositionUpdateListener()
                 {
                     @Override
                     public void onMarkerReached(AudioTrack track) {}

                     @Override
                     public void onPeriodicNotification(AudioTrack track) {}
                 });
The audio data is a 16-bit signed little-endian PCM buffer. The channel count is always 1, and the sample rate depends on the engine:
Engine | Sample Rate (Hz) |
---|---|
csdk | 22050 |
baratinoo | 24000 |
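To make the buffer layout concrete, here is a small sketch that decodes such a buffer into samples and computes its duration. `PcmInfo` and its methods are hypothetical names that are not part of VSDK, and the audio data is assumed to be available as a plain `byte[]`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper (not part of VSDK) for 16-bit signed little-endian mono PCM. */
final class PcmInfo {
    private PcmInfo() {}

    /** Decodes the raw bytes into one short per sample; a trailing odd byte is ignored. */
    static short[] toSamples(byte[] pcm) {
        short[] samples = new short[pcm.length / 2];
        ByteBuffer.wrap(pcm)
                  .order(ByteOrder.LITTLE_ENDIAN)
                  .asShortBuffer()
                  .get(samples);
        return samples;
    }

    /** Duration in seconds: mono, 2 bytes per sample, so samples divided by sample rate. */
    static double durationSeconds(byte[] pcm, int sampleRateHz) {
        return (pcm.length / 2) / (double) sampleRateHz;
    }
}
```

For instance, 44100 bytes at the csdk rate of 22050 Hz correspond to 22050 samples, that is one second of audio.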
Storing the result on disk
channel.synthesisResult.saveToFile("directory", "filename", new ICreateAudioFileListener(){});
Only the PCM format is available, which means the file contains raw audio with no header of any sort.
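Since a headerless file cannot be opened directly by most audio players, one option is to prepend a standard 44-byte WAV header to the raw data. The sketch below does this for 16-bit signed little-endian mono PCM. `PcmWav` and `wrap` are hypothetical names that are not part of VSDK, and the sample rate has to be supplied by the caller (for example the value returned by `getSampleRate()`).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of VSDK): wraps raw PCM in a RIFF/WAVE container. */
final class PcmWav {
    private PcmWav() {}

    /** Returns a WAV file image for 16-bit signed little-endian mono PCM data. */
    static byte[] wrap(byte[] pcm, int sampleRateHz) {
        final int channels = 1;
        final int bitsPerSample = 16;
        final int blockAlign = channels * bitsPerSample / 8; // bytes per sample frame
        final int byteRate = sampleRateHz * blockAlign;      // bytes per second

        ByteBuffer out = ByteBuffer.allocate(44 + pcm.length).order(ByteOrder.LITTLE_ENDIAN);
        out.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        out.putInt(36 + pcm.length);            // size of everything after this field
        out.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        out.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        out.putInt(16);                         // size of the fmt chunk for PCM
        out.putShort((short) 1);                // audio format 1 = uncompressed PCM
        out.putShort((short) channels);
        out.putInt(sampleRateHz);
        out.putInt(byteRate);
        out.putShort((short) blockAlign);
        out.putShort((short) bitsPerSample);
        out.put("data".getBytes(StandardCharsets.US_ASCII));
        out.putInt(pcm.length);                 // size of the sample data
        out.put(pcm);
        return out.array();
    }
}
```

The returned array can be written to a `.wav` file as is; the same bytes saved by `saveToFile` could be read back and passed through this helper.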