VDK features three Voice Synthesis libraries: CSDK (Cerence), Baratinoo (Voxygen) and VtApi (ReadSpeaker).

Configuration

Voice synthesis engines must be configured before the program starts. Below is a complete setup for each engine, with two channels: one per language (French and English).

Baratinoo
Configuration file: config/vsdk.json
{
  "version": "2.0",
  "baratinoo": {
    "paths": {
      "data_root": "../data"
    },
    "tts": {
      "channels": {
        "MyChannel_fr": {
          "voices": [ "Arnaud_neutre" ]
        },
        "MyChannel_en": {
          "voices": [ "Laura" ]
        }
      }
    }
  }
}
CSDK
Configuration file: config/vsdk.json
{
  "csdk": {
    "paths": {
      "data_root": "../data/csdk",
      "tts": "tts"
    },
    "tts": {
      "channels": {
        "MyChannel_en": {
          "voices": [
            "enu,zoe-ml,embedded-premium",
            "enu,tom,embedded-high"
          ]
        },
        "MyChannel_fr": {
          "voices": [
            "frf,audrey,embedded-compact",
            "frf,thomas,embedded-pro"
          ]
        }
      }
    }
  },
  "version": "2.0"
}
VtApi
Configuration file: config/vsdk.json
{
  "version": "2.0",
  "vtapi": {
    "paths": {
      "data_root": "../data"
    },
    "tts": {
      "channels": {
        "MyChannel_fr": {
          "voices": [ "louis,p22" ]
        },
        "MyChannel_en": {
          "voices": [ "kate,d22" ]
        }
      }
    }
  }
}

| Configuration parameter | Type | Description |
| --- | --- | --- |
| version | String | The configuration version number. Always "2.0". |
| <provider>.paths.data_root / <provider>.paths.tts | String | The location of the voice data. This is relative to vsdk.json itself, NOT the program's working directory! |
| <provider>.<tech>.channels | Object | A collection of channel descriptions. The key is the channel name. |
| <provider>.<tech>.channels.<channelid>.voices | Array | List of the voices used by the channel. |

An empty channel list will trigger an error, as will an empty voice list!
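
To illustrate the path rule from the table, here is a minimal sketch of how a data_root value resolves against the directory holding vsdk.json rather than the working directory. The class and method names (DataRootResolver, resolve) are hypothetical helpers written for this example; they are not part of the VSDK API and do not claim to mirror its internal implementation.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper (not a VSDK class): shows what "relative to vsdk.json" means.
final class DataRootResolver {

    // Resolves dataRoot against the directory that contains the configuration file.
    static Path resolve(String configFile, String dataRoot) {
        Path configDir = Paths.get(configFile).toAbsolutePath().getParent();
        return configDir.resolve(dataRoot).normalize();
    }

    public static void main(String[] args) {
        // With config/vsdk.json and "../data", the data directory sits next to config/.
        System.out.println(resolve("config/vsdk.json", "../data"));
    }
}
```

With the Baratinoo configuration above, "../data" therefore points to a data directory that is a sibling of config, wherever the program is launched from.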

You can use the VDK to generate the configuration and the data directory. After creating a custom project with the channels and the voices of your choice, just export it to your binary location.

Voice format

Each engine has its own voice format, described in the following table:

| Engine | Format | Example |
| --- | --- | --- |
| vsdk-csdk | <language>,<name>,<quality> | enu,evan,embedded-pro |
| vsdk-vtapi | <name>,<quality> | alice,d22 |
| vsdk-baratinoo | <name> | Arnaud_neutre |
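
The formats in the table can be checked before a voice string is written to the configuration. The following is a small sketch under that assumption; VoiceFormat, expectedFields and split are hypothetical names introduced for this example and are not provided by VSDK. It only verifies the number of comma-separated fields per engine, not whether the voice is actually installed.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not a VSDK class): validates a voice string against the table above.
final class VoiceFormat {

    // Number of comma-separated fields each engine expects.
    static int expectedFields(String engine) {
        switch (engine) {
            case "vsdk-csdk":      return 3; // <language>,<name>,<quality>
            case "vsdk-vtapi":     return 2; // <name>,<quality>
            case "vsdk-baratinoo": return 1; // <name>
            default: throw new IllegalArgumentException("Unknown engine: " + engine);
        }
    }

    // Splits the voice string into its fields, rejecting a wrong field count or an empty field.
    static List<String> split(String engine, String voice) {
        String[] fields = voice.split(",", -1); // -1 keeps trailing empty fields so they are detected
        if (fields.length != expectedFields(engine)) {
            throw new IllegalArgumentException(
                "Expected " + expectedFields(engine) + " field(s) for " + engine + ": " + voice);
        }
        for (String field : fields) {
            if (field.trim().isEmpty()) {
                throw new IllegalArgumentException("Empty field in voice: " + voice);
            }
        }
        return Arrays.asList(fields);
    }
}
```

For example, VoiceFormat.split("vsdk-csdk", "enu,evan,embedded-pro") yields the three fields enu, evan and embedded-pro, while VoiceFormat.split("vsdk-vtapi", "alice") is rejected because the quality is missing.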

Starting the engine

// Load the VSDK configuration first; the TTS engine is initialized only if that succeeds
com.vivoka.vsdk.Vsdk.init(mContext, "config/vsdk.json", vsdkSuccess -> {
    if (vsdkSuccess)
    {
        Engine.getInstance().init(mContext, engineSuccess -> {
            if (engineSuccess)
            {
                // At this point the TTS engine has been correctly initialized
            }
        });
    }
});

Creating a channel

Remember, the channel must be configured beforehand!

// The channel name and the voice must match an entry of the configuration file
Channel channelFrf = Engine.getInstance().makeChannel("MyChannel_fr", "frf,audrey,embedded-compact");

The engine instance can't die while at least one channel instance is alive. Destruction order is important!

Speech Synthesis

channelFrf.synthesisFromText("Bonjour ! Je suis une voix synthétique", () -> {
    // channelFrf.synthesisResult contains the audioData to play
});

// Also works with SSML input
final String ssml = "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"fr-FR\">Bonjour Vivoka</speak>";
channelFrf.synthesisFromSSML(ssml, () -> {
    // channelFrf.synthesisResult contains the audioData to play
});

Speech Synthesis is asynchronous! That means the call will not block the thread during the synthesis.
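
Because the call returns immediately, code that needs the result must wait for the callback. The sketch below shows one common way to do this with a CountDownLatch. FakeChannel is a hypothetical stand-in written for this example so that it runs without VSDK; it only imitates the callback shape of synthesisFromText shown above and produces placeholder bytes, not real audio.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

final class AsyncSynthesisDemo {

    // Hypothetical stand-in for a channel: works on a background thread, then runs the callback.
    static final class FakeChannel {
        private final ExecutorService worker = Executors.newSingleThreadExecutor(runnable -> {
            Thread thread = new Thread(runnable, "fake-synthesis");
            thread.setDaemon(true); // do not keep the JVM alive
            return thread;
        });
        volatile byte[] result;

        void synthesisFromText(String text, Runnable onDone) {
            worker.submit(() -> {
                result = new byte[text.length() * 2]; // placeholder "audio"
                onDone.run();
            });
        }

        void shutdown() {
            worker.shutdown();
        }
    }

    // Starts the synthesis and blocks the calling thread until the callback fires or the timeout expires.
    static byte[] synthesizeBlocking(FakeChannel channel, String text, long timeoutMs) {
        CountDownLatch done = new CountDownLatch(1);
        channel.synthesisFromText(text, done::countDown);
        try {
            if (!done.await(timeoutMs, TimeUnit.MILLISECONDS)) {
                throw new IllegalStateException("Synthesis timed out");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt status
            throw new IllegalStateException("Interrupted while waiting for synthesis", e);
        }
        return channel.result;
    }
}
```

On Android, blocking like this is only appropriate on a background thread; on the main thread, handle the result inside the callback instead.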

The audio data is a 16-bit signed little-endian PCM buffer. The channel count is always 1 and the sample rate depends on the engine:

| Engine | Sample rate (Hz) |
| --- | --- |
| csdk | 22050 |
| baratinoo | 24000 |
| vtapi | 22050 |
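
As an illustration of this layout, the sketch below decodes such a buffer into samples and computes its duration. It assumes the audio data is available as a byte[]; that is an assumption of this example, not something stated above, and no conversion is needed if the API already returns samples. PcmUtil and its methods are hypothetical helpers, not VSDK classes.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper (not a VSDK class) for 16-bit signed little-endian mono PCM.
final class PcmUtil {

    private static final int BYTES_PER_SAMPLE = 2; // 16-bit samples, 1 channel

    // Converts raw little-endian bytes into signed 16-bit samples.
    static short[] toSamples(byte[] pcm) {
        if (pcm.length % BYTES_PER_SAMPLE != 0) {
            throw new IllegalArgumentException("Odd byte count for 16-bit PCM: " + pcm.length);
        }
        short[] samples = new short[pcm.length / BYTES_PER_SAMPLE];
        ByteBuffer.wrap(pcm).order(ByteOrder.LITTLE_ENDIAN).asShortBuffer().get(samples);
        return samples;
    }

    // Duration of a mono 16-bit buffer in seconds, for a sample rate given in Hz.
    static double durationSeconds(int byteCount, int sampleRateHz) {
        return (byteCount / BYTES_PER_SAMPLE) / (double) sampleRateHz;
    }
}
```

At 22050 Hz, one second of audio is 22050 samples, which is 44100 bytes.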

Playing the result

VSDK provides an audio player. Playing the result is very easy:

AudioPlayer.play(channelFrf.synthesisResult.getAudioData(),
                 channelFrf.synthesisResult.getSampleRate(),
                 new AudioTrack.OnPlaybackPositionUpdateListener()
                 {
                     // Called when playback reaches a marker previously set on the track
                     @Override
                     public void onMarkerReached(AudioTrack track) {}

                     // Called periodically during playback
                     @Override
                     public void onPeriodicNotification(AudioTrack track) {}
                 });

Storing the result on disk

channelFrf.synthesisResult.saveToFile("directory", "filename", new ICreateAudioFileListener(){});

Only the PCM format is available: the file contains raw samples with no audio header of any sort.
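
Since most players cannot open a headerless file, one option is to prepend a WAV header yourself. The sketch below builds the standard 44-byte RIFF/WAVE header for 16-bit mono PCM. WavHeader, build and wrap are hypothetical helpers written for this example; they are not part of VSDK.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper (not a VSDK class): wraps raw 16-bit mono PCM in a WAV container.
final class WavHeader {

    static final int HEADER_SIZE = 44;

    // Builds the 44-byte RIFF/WAVE header for pcmByteCount bytes of 16-bit mono PCM.
    static byte[] build(int pcmByteCount, int sampleRateHz) {
        final int channels = 1;
        final int bitsPerSample = 16;
        final int blockAlign = channels * bitsPerSample / 8; // bytes per sample frame
        final int byteRate = sampleRateHz * blockAlign;      // bytes per second

        ByteBuffer header = ByteBuffer.allocate(HEADER_SIZE).order(ByteOrder.LITTLE_ENDIAN);
        header.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        header.putInt(36 + pcmByteCount);          // size of everything after this field
        header.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        header.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        header.putInt(16);                         // size of the fmt chunk for PCM
        header.putShort((short) 1);                // audio format 1 = uncompressed PCM
        header.putShort((short) channels);
        header.putInt(sampleRateHz);
        header.putInt(byteRate);
        header.putShort((short) blockAlign);
        header.putShort((short) bitsPerSample);
        header.put("data".getBytes(StandardCharsets.US_ASCII));
        header.putInt(pcmByteCount);               // size of the sample data
        return header.array();
    }

    // Returns header + samples, ready to be written to a .wav file.
    static byte[] wrap(byte[] pcm, int sampleRateHz) {
        byte[] wav = new byte[HEADER_SIZE + pcm.length];
        System.arraycopy(build(pcm.length, sampleRateHz), 0, wav, 0, HEADER_SIZE);
        System.arraycopy(pcm, 0, wav, HEADER_SIZE, pcm.length);
        return wav;
    }
}
```

The sample rate passed in must be the one of the engine that produced the audio (for example 24000 for baratinoo), otherwise playback will be too fast or too slow.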