VSDK Voice Synthesis - Android

VDK features two TTS libraries: CSDK and Baratinoo.

Configuration

TTS engines must be configured before the program starts. Here is a complete setup with two channels, each with its own voice list (using the CSDK engine):

JSON

{
    "version": "2.0",
    "csdk": {
        "tts": {
            "channels": {
                "channelFrf": {
                    "voices": ["frf,aurelie,embedded-compact", "enu,ava,embedded-compact"]
                },
                "channelEnu": {
                    "voices": ["enu,ava,embedded-compact"]
                }
            }
        }
    }
}

An empty channel list triggers an error, and so does an empty voice list!

Voice format

Each engine has its own voice format, described in the following table:

Engine	Format	Example
vsdk-csdk	`<language>,<name>,<quality>`	`enu,evan,embedded-pro`
vsdk-baratinoo	`<name>`	`Arnaud_neutre`
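
As an illustration of the CSDK format above, the following sketch splits a voice string into its three fields and rejects malformed input before it reaches the engine. `CsdkVoice` is a hypothetical helper written for this page, not a class shipped with VSDK, and it only checks the shape of the string, not whether the voice is actually installed.

```java
/** Hypothetical helper (not part of VSDK): holds the fields of a CSDK voice string. */
final class CsdkVoice {
    final String language;
    final String name;
    final String quality;

    private CsdkVoice(String language, String name, String quality) {
        this.language = language;
        this.name = name;
        this.quality = quality;
    }

    /** Parses "<language>,<name>,<quality>", for example "enu,evan,embedded-pro". */
    static CsdkVoice parse(String descriptor) {
        // A limit of -1 keeps trailing empty fields so "enu,evan," is rejected below.
        String[] parts = descriptor.split(",", -1);
        if (parts.length != 3) {
            throw new IllegalArgumentException(
                    "Expected <language>,<name>,<quality> but got: " + descriptor);
        }
        for (int i = 0; i < parts.length; i++) {
            parts[i] = parts[i].trim();
            if (parts[i].isEmpty()) {
                throw new IllegalArgumentException("Empty field in voice: " + descriptor);
            }
        }
        return new CsdkVoice(parts[0], parts[1], parts[2]);
    }

    /** Rebuilds the string in the format the CSDK engine expects. */
    @Override
    public String toString() {
        return language + "," + name + "," + quality;
    }
}
```

A Baratinoo voice such as `Arnaud_neutre` has a single field, so `CsdkVoice.parse` throws on it; that format needs no splitting.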

Starting the engine

JAVA

com.vivoka.vsdk.Vsdk.init(mContext, "config/main.json", vsdkSuccess -> {
    if (vsdkSuccess)
    {
        com.vivoka.csdk.tts.Engine.getInstance().init(mContext, engineSuccess -> {
            if (engineSuccess)
            {
                // at this point the TtsEngine has been correctly initialized   
            }    
        });
    }
});

Creating a channel

Remember, the channel must be configured beforehand!

JAVA

Channel channelFrf = com.vivoka.csdk.tts.Engine.getInstance().makeChannel("channelFrf", "frf,aurelie,embedded-compact");

Speech Synthesis

Speech synthesis is asynchronous: the call does not block the calling thread while synthesis runs, and the callback is invoked once the result is available.

JAVA

channelFrf.synthesisFromText("Bonjour ! Je suis une voix synthétique", () -> {
    // channelFrf.synthesisResult contains the audioData to play
});

// Also works with SSML input
final String ssml = "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"fr-FR\">Bonjour Vivoka</speak>";
channelFrf.synthesisFromSSML(ssml, () -> {
    // channelFrf.synthesisResult contains the audioData to play
});

Playing the result

VSDK provides an audio player. Playing the result is very easy:

JAVA

AudioPlayer.play(channel.synthesisResult.getAudioData(),
                 channel.synthesisResult.getSampleRate(),
                 new AudioTrack.OnPlaybackPositionUpdateListener()
                 {
                     @Override
                     public void onMarkerReached(AudioTrack track) {}

                     @Override
                     public void onPeriodicNotification(AudioTrack track) {}
                 });

The audio data is a 16-bit signed little-endian PCM buffer. The channel count is always 1, and the sample rate depends on the engine:

Engine	Sample Rate (Hz)
csdk	22050
baratinoo	24000
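
To make the buffer layout concrete, here is a minimal sketch that decodes such a buffer into samples and computes its duration. It assumes the audio data is available as a `byte[]`; this page does not state the exact type returned by `getAudioData()`. `PcmInfo` is a hypothetical helper, not part of VSDK.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/** Hypothetical helper (not part of VSDK) for mono 16-bit little-endian PCM. */
final class PcmInfo {
    /** 16 bits per sample, one channel. */
    static final int BYTES_PER_SAMPLE = 2;

    /** Decodes raw little-endian bytes into signed 16-bit samples. */
    static short[] toSamples(byte[] pcm) {
        if (pcm.length % BYTES_PER_SAMPLE != 0) {
            throw new IllegalArgumentException("Odd byte count: " + pcm.length);
        }
        short[] samples = new short[pcm.length / BYTES_PER_SAMPLE];
        // The short view inherits the byte order set on the buffer.
        ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(samples);
        return samples;
    }

    /** Duration in milliseconds of a mono 16-bit buffer at the given rate in Hz. */
    static long durationMillis(int byteCount, int sampleRateHz) {
        return (byteCount / BYTES_PER_SAMPLE) * 1000L / sampleRateHz;
    }
}
```

For example, 44100 bytes of CSDK output at 22050 Hz hold 22050 samples, so `PcmInfo.durationMillis(44100, 22050)` returns 1000.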

Storing the result on disk

JAVA

channel.synthesisResult.saveToFile("directory", "filename", new ICreateAudioFileListener(){});

Only the PCM extension is available, which means the file contains raw samples with no audio header of any sort.
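
Because the file is headerless, most desktop players will not open it directly. If a playable file is needed, one option is to prepend a standard 44-byte WAV (RIFF) header yourself. The sketch below does this in memory for mono 16-bit PCM; `WavWrapper` is a hypothetical helper, not a VSDK class, and the sample rate must be supplied by the caller (22050 Hz for CSDK, 24000 Hz for Baratinoo).

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

/** Hypothetical helper (not part of VSDK): wraps raw mono 16-bit PCM in a WAV container. */
final class WavWrapper {
    private static final int HEADER_SIZE = 44;

    static byte[] wrap(byte[] pcm, int sampleRateHz) {
        final int channels = 1;
        final int bitsPerSample = 16;
        final int blockAlign = channels * bitsPerSample / 8; // bytes per sample frame
        final int byteRate = sampleRateHz * blockAlign;      // bytes per second

        // All multi-byte header fields are little-endian, like the PCM data itself.
        ByteBuffer wav = ByteBuffer.allocate(HEADER_SIZE + pcm.length)
                                   .order(ByteOrder.LITTLE_ENDIAN);
        wav.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        wav.putInt(36 + pcm.length);              // size of everything after this field
        wav.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        wav.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        wav.putInt(16);                           // size of the fmt chunk body
        wav.putShort((short) 1);                  // audio format 1 = uncompressed PCM
        wav.putShort((short) channels);
        wav.putInt(sampleRateHz);
        wav.putInt(byteRate);
        wav.putShort((short) blockAlign);
        wav.putShort((short) bitsPerSample);
        wav.put("data".getBytes(StandardCharsets.US_ASCII));
        wav.putInt(pcm.length);                   // size of the sample data
        wav.put(pcm);
        return wav.array();
    }
}
```

The returned array can be written to a `.wav` file with any standard file API, for example `java.nio.file.Files.write`.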