Voice Synthesis - Android
VDK features three Voice Synthesis libraries: VSDK-CSDK, VSDK-BARATINOO and VSDK-VTAPI.
Configuration
Voice synthesis engines must be configured before the program starts. The parameters below describe a complete setup with two channels, one for each language you want to support.
Configuration parameter | Type | Description |
---|---|---|
`version` | String | The configuration version number. Constant. |
`data_root` | String | The voices data location. This is relative to vsdk.json itself, NOT the program's working directory! |
`channels` | Object | Contains the collection of channel descriptions. The key is the channel name. |
`voices` | Array | List of the voices used by the channel. |
An empty channel list will trigger an error, as will an empty voice list!
You can use the VDK to generate the configuration and the data directory. After creating a custom project with the channels and the voices of your choice, just export it to your binary location.
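For reference, here is a minimal vsdk.json sketch matching the parameters above. The exact key names, the nesting under the engine name, and the `enu` voice are assumptions for illustration only; the file exported by the VDK is authoritative.

```json
{
  "version": "2.0",
  "csdk": {
    "paths": { "data_root": "data" },
    "tts": {
      "channels": {
        "channelFrf": { "voices": ["frf,aurelie,embedded-compact"] },
        "channelEnu": { "voices": ["enu,zoe,embedded-compact"] }
      }
    }
  }
}
```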
Voice format
Each engine has its own voice format, described in the following table:
Engine | Format | Example |
---|---|---|
vsdk-csdk | `language,voice,quality` | `frf,aurelie,embedded-compact` |
vsdk-vtapi | | |
vsdk-baratinoo | | |
Starting the engine
```java
import com.vivoka.vsdk.Vsdk;
import com.vivoka.vsdk.tts.Engine;

// Initialize the VSDK runtime from the configuration file,
// then initialize the TTS engine itself.
Vsdk.init(mContext, "config/main.json", vsdkSuccess -> {
    if (vsdkSuccess)
    {
        Engine.getInstance().init(mContext, engineSuccess -> {
            if (engineSuccess)
            {
                // At this point the TtsEngine has been correctly initialized.
            }
        });
    }
});
```
Creating a channel
Remember, the channel must be declared in the configuration beforehand!
```java
Channel channelFrf = Engine.getInstance().makeChannel("channelFrf", "frf,aurelie,embedded-compact",
    new IChannelListener() {
        @Override
        public void onEvent(Event<com.vivoka.vsdk.tts.Channel.EventCode> event) {
            Log.d(TAG, "On channel event: " + event.codeString + " - " + event.message);
        }

        @Override
        public void onError(Error<com.vivoka.vsdk.tts.Channel.ErrorCode> error) {
            Log.e(TAG, "On channel error: " + error.codeString);
        }
    });
```
The engine instance must not be destroyed while at least one channel instance is alive: release all channels before the engine. Destruction order is important!
Creating the Audio Pipeline and the listeners
```java
// Create the audio pipeline.
mPipeline = new Pipeline();

// Audio player designed to play audio and send progression events.
mAudioPlayer = new AudioPlayer(mChannel.getSampleRate());

// Optional listener to synchronize words with playback: it fetches word
// markers from the TTS engine and notifies each word as the AudioPlayer plays it.
mWordMarkerManager = new WordMarkerManager(new WordMarkerManager.WordMarkerNotifier() {
    @Override
    public void onEvent(WordMarker wordMarker) {
        onTextPlayed(wordMarker);
    }
});

// Optional: register the WordMarkerManager instantiated above so the
// AudioPlayer notifies it about the playback position.
mAudioPlayer.setPlaybackPositionUpdateListener(mWordMarkerManager);

// Give a producer to the pipeline: the channel produces the audio data.
mPipeline.setProducer(mChannel);

// Set the AudioPlayer as consumer: this class is designed to play the
// synthesized audio for you.
mPipeline.pushBackConsumer(mAudioPlayer);

// Start the pipeline in asynchronous mode to allow future voice syntheses.
mPipeline.start();
```
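The pipeline follows a simple producer/consumer design: the channel produces PCM buffers and each registered consumer receives them. Since consumers are pushed to the back of a list, you can presumably chain additional consumers (a file writer, for instance) after the AudioPlayer.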
Speech Synthesis
```java
// Synthesize voice data.
channelFrf.synthesizeFromText("Text to say");

// Also works with SSML input.
final String ssml = "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"fr-FR\">Bonjour Vivoka</speak>";
channelFrf.synthesizeFromText(ssml);

// If the pipeline is used in synchronous mode, call the line below each
// time you want to synthesize data.
pipeline.run();
```
Speech synthesis is asynchronous if Pipeline.start() is used! That means the call will not block the thread during the synthesis.
The audio data is a 16-bit signed little-endian PCM buffer. The channel count is always 1 and the sample rate varies depending on the engine (a playback sketch follows the table):
Engine | Sample Rate (Hz) |
---|---|
csdk | 22050 |
baratinoo | 24000 |
vtapi | 22050 |
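If you want to play or post-process the PCM buffers yourself instead of relying on the provided AudioPlayer, a plain Android AudioTrack configured for this format does the job. Here is a minimal sketch; the `buildTrack` helper is hypothetical, and the sample rate should be read from the channel via `mChannel.getSampleRate()`.

```java
import android.media.AudioAttributes;
import android.media.AudioFormat;
import android.media.AudioTrack;

// Hypothetical helper: builds an AudioTrack matching the engine output
// (16-bit signed little-endian PCM, mono). Pass mChannel.getSampleRate(),
// e.g. 22050 for csdk/vtapi or 24000 for baratinoo.
AudioTrack buildTrack(int sampleRate) {
    int minBufferSize = AudioTrack.getMinBufferSize(sampleRate,
            AudioFormat.CHANNEL_OUT_MONO, AudioFormat.ENCODING_PCM_16BIT);
    return new AudioTrack.Builder()
            .setAudioAttributes(new AudioAttributes.Builder()
                    .setUsage(AudioAttributes.USAGE_MEDIA)
                    .setContentType(AudioAttributes.CONTENT_TYPE_SPEECH)
                    .build())
            .setAudioFormat(new AudioFormat.Builder()
                    .setEncoding(AudioFormat.ENCODING_PCM_16BIT)
                    .setSampleRate(sampleRate)
                    .setChannelMask(AudioFormat.CHANNEL_OUT_MONO)
                    .build())
            .setBufferSizeInBytes(minBufferSize)
            .build();
}

// Usage: call track.play(), then track.write(pcmBuffer, 0, pcmBuffer.length)
// for each PCM buffer received from the pipeline.
```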
Playing the result
The data is forwarded by the Pipeline to the Consumer which will play the audio for you.
The class `AudioPlayer` is included in the Vsdk library (`com.vivoka.vsdk.audio.AudioPlayer`) as an example of a ConsumerModule.
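If you need custom handling of the audio, such as saving it to a file or streaming it over the network, you can write your own consumer following the same pattern and register it with pushBackConsumer(); the AudioPlayer source is a good starting point.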