VSDK Voice Synthesis - Android

VDK features two TTS libraries: CSDK and Baratinoo.


Configuration

TTS engines must be configured before the program starts. Here is a complete setup for the CSDK engine with two channels: channelFrf, which declares a French voice and an English voice, and channelEnu, which declares an English voice only:

JSON
{
    "version": "2.0",
    "csdk": {
        "tts": {
            "channels": {
                "channelFrf": {
                    "voices": ["frf,aurelie,embedded-compact", "enu,ava,embedded-compact"]
                },
                "channelEnu": {
                    "voices": ["enu,ava,embedded-compact"]
                }
            }
        }
    }
}

An empty channel list triggers an error, and so does an empty voice list.
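The two rules above can be checked before the configuration is handed to the SDK. The following is only a sketch: `TtsConfigCheck` is a hypothetical helper, not part of VSDK, and it assumes the "channels" object has already been read into a map from channel name to voice list by whatever JSON parser the application uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical pre-flight check (not part of VSDK) for the channel and voice rules. */
final class TtsConfigCheck {

    /** Returns the list of problems found; an empty list means both rules hold. */
    static List<String> validate(Map<String, List<String>> channels) {
        List<String> problems = new ArrayList<>();
        if (channels == null || channels.isEmpty()) {
            problems.add("no channel configured");
            return problems;
        }
        for (Map.Entry<String, List<String>> entry : channels.entrySet()) {
            List<String> voices = entry.getValue();
            if (voices == null || voices.isEmpty()) {
                problems.add("channel '" + entry.getKey() + "' has no voice");
            }
        }
        return problems;
    }
}
```

Returning a list of problems rather than throwing on the first one lets the caller report every misconfigured channel at once.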

Voice format

Each engine has its own voice format, described in the following table:

Engine          Format                       Example
--------------  ---------------------------  ---------------------
vsdk-csdk       <language>,<name>,<quality>  enu,evan,embedded-pro
vsdk-baratinoo  <name>                       Arnaud_neutre
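Because the CSDK voice string packs three fields into one value, a typo in the configuration can be hard to spot. The sketch below splits and validates such a string. `CsdkVoice` is a hypothetical helper, not part of VSDK, and it only checks the shape of the string, not whether the voice is actually installed.

```java
import java.util.Objects;

/** Hypothetical helper (not part of VSDK): one CSDK voice, split into its three fields. */
final class CsdkVoice {
    final String language;
    final String name;
    final String quality;

    private CsdkVoice(String language, String name, String quality) {
        this.language = language;
        this.name = name;
        this.quality = quality;
    }

    /** Parses "<language>,<name>,<quality>", for example "enu,evan,embedded-pro". */
    static CsdkVoice parse(String voice) {
        Objects.requireNonNull(voice, "voice");
        // Limit -1 keeps trailing empty fields so that "enu,evan," is rejected below.
        String[] parts = voice.split(",", -1);
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                    "Expected <language>,<name>,<quality> but got: " + voice);
        }
        for (String part : parts) {
            if (part.trim().isEmpty()) {
                throw new IllegalArgumentException("Empty field in voice: " + voice);
            }
        }
        return new CsdkVoice(parts[0].trim(), parts[1].trim(), parts[2].trim());
    }

    /** Rebuilds the string in the format used by the configuration file. */
    @Override
    public String toString() {
        return language + "," + name + "," + quality;
    }
}
```

A Baratinoo voice such as Arnaud_neutre has a single field and is deliberately rejected by this parser.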

Starting the engine

Java
com.vivoka.vsdk.Vsdk.init(mContext, "config/main.json", vsdkSuccess -> {
    if (vsdkSuccess)
    {
        com.vivoka.csdk.tts.Engine.getInstance().init(mContext, engineSuccess -> {
            if (engineSuccess)
            {
                // At this point the TTS engine has been correctly initialized.
            }
        });
    }
});

Creating a channel

Remember: the channel must be declared in the configuration beforehand.

Java
Channel channelFrf = com.vivoka.csdk.tts.Engine.getInstance().makeChannel("channelFrf", "frf,aurelie,embedded-compact");

Speech Synthesis

Speech synthesis is asynchronous: the call returns immediately and does not block the calling thread while synthesis runs. The result is available once the callback is invoked.

Java
channelFrf.synthesisFromText("Bonjour ! Je suis une voix synthétique", () -> {
    // channelFrf.synthesisResult contains the audioData to play
});

// Also works with SSML input
final String ssml = "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"fr-FR\">Bonjour Vivoka</speak>";
channelFrf.synthesisFromSSML(ssml, () -> {
    // channelFrf.synthesisResult contains the audioData to play
});
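Some callers, such as a test or a background worker, may want to wait for the callback instead of continuing immediately. The sketch below shows one generic way to do that with a `CountDownLatch`. `SynthesisWaiter` is a hypothetical helper, not part of VSDK, and the usage shown in the comment assumes the synthesis callback takes no arguments, as in the calls above.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical helper (not part of VSDK): waits for a completion callback. */
final class SynthesisWaiter {

    /**
     * Starts an asynchronous operation and blocks until its completion callback
     * runs or the timeout elapses. Returns true if the callback ran in time.
     *
     * Assumed usage with a VSDK channel:
     *   SynthesisWaiter.await(
     *       onDone -> channelFrf.synthesisFromText("Bonjour", () -> onDone.run()),
     *       5000);
     */
    static boolean await(Consumer<Runnable> start, long timeoutMs) {
        CountDownLatch done = new CountDownLatch(1);
        // The operation receives a Runnable and must call it when it completes.
        start.accept(done::countDown);
        try {
            return done.await(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        }
    }
}
```

This should not be called from the Android main thread: blocking there freezes the UI, and if the SDK happens to deliver its callback on that same thread the wait can only end by timing out.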

Playing the result

VSDK provides an audio player. Playing the result is very easy:

Java
// "channel" is any channel whose synthesis callback has already been invoked.
AudioPlayer.play(channel.synthesisResult.getAudioData(),
                 channel.synthesisResult.getSampleRate(),
                 new AudioTrack.OnPlaybackPositionUpdateListener()
                 {
                     @Override
                     public void onMarkerReached(AudioTrack track) {}

                     @Override
                     public void onPeriodicNotification(AudioTrack track) {}
                 });

The audio data is a 16-bit signed little-endian PCM buffer. There is always one channel (mono), and the sample rate depends on the engine:

Engine     Sample rate (Hz)
---------  ----------------
csdk       22050
baratinoo  24000
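Given that layout (mono, 2 bytes per sample, little-endian), the duration of a buffer and its individual sample values follow from simple arithmetic. The sketch below illustrates this. `PcmInfo` is a hypothetical helper, not part of VSDK, and it assumes the audio data is available as a `byte[]`, which the SDK may or may not expose in that form.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper (not part of VSDK) for mono 16-bit little-endian PCM. */
final class PcmInfo {
    static final int BYTES_PER_SAMPLE = 2; // 16-bit samples
    static final int CHANNELS = 1;         // always mono

    /** Duration in milliseconds of a buffer of the given size at the given sample rate (Hz). */
    static long durationMs(int byteCount, int sampleRateHz) {
        long samples = byteCount / (BYTES_PER_SAMPLE * CHANNELS);
        return samples * 1000L / sampleRateHz;
    }

    /** Decodes little-endian byte pairs into signed 16-bit samples. */
    static short[] toSamples(byte[] pcm) {
        short[] samples = new short[pcm.length / BYTES_PER_SAMPLE];
        ByteBuffer.wrap(pcm)
                  .order(ByteOrder.LITTLE_ENDIAN)
                  .asShortBuffer()
                  .get(samples);
        return samples;
    }
}
```

For example, one second of csdk audio at 22050 Hz occupies 44100 bytes, so `PcmInfo.durationMs(44100, 22050)` returns 1000.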

Storing the result on disk

Java
channel.synthesisResult.saveToFile("directory", "filename", new ICreateAudioFileListener(){});

Only the raw PCM format is available, which means the file has no audio header of any kind.
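Since a headerless file cannot be opened by most audio tools, it can be useful to prepend a standard 44-byte WAV header. The sketch below builds one in memory for mono 16-bit PCM. `WavWrapper` is a hypothetical helper, not part of VSDK. The caller must supply the sample rate that matches the engine that produced the data.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of VSDK): wraps raw mono 16-bit PCM in a WAV container. */
final class WavWrapper {

    /** Returns a new array holding a 44-byte RIFF/WAVE header followed by the PCM data. */
    static byte[] wrap(byte[] pcm, int sampleRateHz) {
        final int channels = 1;
        final int bitsPerSample = 16;
        final int blockAlign = channels * bitsPerSample / 8; // bytes per sample frame
        final int byteRate = sampleRateHz * blockAlign;      // bytes per second

        ByteBuffer out = ByteBuffer.allocate(44 + pcm.length).order(ByteOrder.LITTLE_ENDIAN);
        out.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        out.putInt(36 + pcm.length);          // RIFF chunk size: total size minus 8
        out.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        out.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        out.putInt(16);                       // size of the fmt chunk for PCM
        out.putShort((short) 1);              // audio format 1 = linear PCM
        out.putShort((short) channels);
        out.putInt(sampleRateHz);
        out.putInt(byteRate);
        out.putShort((short) blockAlign);
        out.putShort((short) bitsPerSample);
        out.put("data".getBytes(StandardCharsets.US_ASCII));
        out.putInt(pcm.length);               // size of the sample data in bytes
        out.put(pcm);
        return out.array();
    }
}
```

The resulting bytes can be written to a .wav file with any standard file API, after reading the saved PCM file back from disk.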