Speech Synthesis - Android
Introduction
Speech synthesis (also known as text-to-speech or TTS) is the process of converting written text into spoken audio.
In VSDK, speech synthesis is powered by CSDK, which offers a wide range of voices across different languages, genders, and voice qualities (see Voice quality availability).
Channels
A channel is what you use to generate speech. It holds one or more voices.
A channel itself doesn’t have a language—the language is defined by the voices you assign to it.
This means a single channel can include voices in different languages.
You can also define multiple channels in your configuration. This is useful when:
You want to synthesize multiple texts at the same time (parallel TTS).
You want to organize voices based on use case (e.g., one channel for alerts, another for navigation).
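For illustration, here is a minimal sketch of the second case, using the Engine.getChannel() API covered later in this guide (channel names are placeholders, and the shared listener is kept deliberately small):

```java
import android.util.Log;

import com.vivoka.vsdk.common.Error;
import com.vivoka.vsdk.common.Event;
import com.vivoka.vsdk.tts.Channel;
import com.vivoka.vsdk.tts.IChannelListener;

try {
    // A single listener shared by both channels, for brevity.
    IChannelListener listener = new IChannelListener() {
        @Override
        public void onEvent(Event<Channel.EventCode> event) {
            Log.d("TTS", event.codeString + " : " + event.message);
        }

        @Override
        public void onError(Error<Channel.ErrorCode> error) {
            Log.e("TTS", error.message);
        }
    };

    // One channel per use case; each can drive its own pipeline in parallel.
    // The two channels could just as well hold voices in different languages.
    Channel alerts = com.vivoka.vsdk.tts.csdk.Engine.getInstance()
            .getChannel("alerts", "enu,ava,embedded-compact", listener);
    Channel navigation = com.vivoka.vsdk.tts.csdk.Engine.getInstance()
            .getChannel("navigation", "enu,ava,embedded-compact", listener);
} catch (Exception e) {
    e.printFormattedMessage();
}
```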
SSML Support
VSDK also supports SSML (Speech Synthesis Markup Language), which gives you finer control over how the text is spoken—allowing adjustments such as:
Pronunciation
Pauses
Pitch
Rate
Emphasis
SSML is supported for embedded voices, but not for neural voices (if present in your configuration). Neural voices are more natural-sounding but behave as a black box and do not support markup-based control.
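For example, here is a short SSML document using standard SSML 1.0 markup (the exact set of supported tags may vary by voice):

```java
// Standard SSML 1.0: a pause, a slower and higher-pitched span, and emphasis.
String ssml = "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"en-US\">"
        + "Hello<break time=\"500ms\"/> "
        + "<prosody rate=\"slow\" pitch=\"+10%\">world</prosody>, "
        + "<emphasis level=\"strong\">welcome</emphasis>!"
        + "</speak>";
```

See the Start Synthesis section below for how to submit this string to a channel.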
Audio Format
The audio data is a 16-bit signed PCM buffer in Little-Endian format.
It is always mono (1 channel), and the sample rate depends on the engine being used.
| Engine | Sample Rate (Hz) |
|---|---|
| csdk | 22050 |
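If you consume the raw buffer yourself rather than handing it to AudioPlayer, decode it accordingly. A minimal sketch (the helper is ours, not part of VSDK):

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Hypothetical helper: view a raw PCM buffer as 16-bit signed samples.
static short[] toSamples(byte[] pcmLittleEndian) {
    short[] samples = new short[pcmLittleEndian.length / 2];
    ByteBuffer.wrap(pcmLittleEndian)
            .order(ByteOrder.LITTLE_ENDIAN) // matches VSDK's byte order
            .asShortBuffer()
            .get(samples);
    return samples;
}
```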
Voice Format
For <language>, refer to the table and use the value from the Vsdk-csdk Code column.
For <name>, use the lowercase version of the name shown in VDK-Studio.
For <quality>, you can find this information in VDK-Studio under Resources → Voice.
| Engine | Format | Example |
|---|---|---|
| vsdk-csdk | `<language>,<name>,<quality>` | `enu,ava,embedded-compact` |
Getting Started
Before you begin, make sure you’ve completed all the necessary preparation steps.
There are two ways to prepare your Android project for Voice Synthesis:
Using sample code
Starting from scratch
From Sample Code
Start by downloading the sample package from the Vivoka Console:
Open the Vivoka Console and navigate to your Project Settings.
Go to the Downloads section.
In the search bar, enter the package name shown below:
📦 sample-tts-x.x.x-android-deps-vsdk-x.x.x.zip
Once downloaded, you’ll have a fully functional project that you can test, customize, and extend to fit your specific use case.
From Scratch
Before proceeding, make sure you’ve completed the following steps:
1. Prepare your VDK Studio project
Create a new project in VDK Studio
Add the Voice Synthesis technology and channel with voice(s)
Export the project to generate the required assets and configuration
2. Set up your Android project
Install the necessary libraries (vsdk-csdk-tts-x.x.x-android-deps-vsdk-x.x.x.zip)
Initialize VSDK in your application code
These steps are explained in more detail in the Integrating Vsdk Libraries guide.
Start Synthesis
1. Initialize Engine
Start by initializing the VSDK, followed by the Voice Synthesis engine:
```java
import android.util.Log;

import com.vivoka.vsdk.Vsdk;

Vsdk.init(context, "config/vsdk.json", vsdkSuccess -> {
    if (!vsdkSuccess) {
        return;
    }
    com.vivoka.vsdk.tts.csdk.Engine.getInstance().init(engineSuccess -> {
        if (!engineSuccess) {
            return;
        }
        // The TTS engine is now ready
        Log.i("CSDK", "TTS Engine is successfully started.");
    });
});
```
You cannot create two instances of the same engine.
If you call Engine.getInstance() multiple times, you will receive the same singleton instance.
2. Build Pipeline
For the sake of this example, we’ll implement a simple pipeline that plays synthesized voice directly to the speaker:
```java
import android.util.Log;

import com.vivoka.vsdk.audio.Pipeline;
import com.vivoka.vsdk.audio.consumers.AudioPlayer;
import com.vivoka.vsdk.common.Error;
import com.vivoka.vsdk.common.Event;
import com.vivoka.vsdk.tts.Channel;
import com.vivoka.vsdk.tts.IChannelListener;

try {
    // Create channel (producer)
    String channelName = "channel-1";
    String voice = "enu,ava,embedded-compact"; // <language>,<name>,<quality>
    Channel channel = com.vivoka.vsdk.tts.csdk.Engine.getInstance().getChannel(
        channelName, voice, new IChannelListener() {
            @Override
            public void onEvent(Event<Channel.EventCode> event) {
                Log.d(TAG, "onEvent: " + event.codeString + " : " + event.message);
            }

            @Override
            public void onError(Error<Channel.ErrorCode> error) {
                Log.e(TAG, "onError: " + error.type.toString() + " on channel '" + channelName + "': " + error.message);
            }
        }
    );

    // Create audio player (consumer)
    AudioPlayer audioPlayer = new AudioPlayer(channel.getSampleRate(), channel.getChannelCount());
    audioPlayer.setOnFinished(() -> {
        // On audio finished playing.
    });

    // Create and start pipeline
    Pipeline pipeline = new Pipeline();
    pipeline.setProducer(channel);
    pipeline.pushBackConsumer(audioPlayer);
    pipeline.start();
} catch (Exception e) {
    // Exception here is VSDK's own type, which provides printFormattedMessage()
    e.printFormattedMessage();
}
```
Calling .getChannel() with the same name twice will always return the same channel instance.
When you call pipeline.pushBackConsumer(audioPlayer), the audioPlayer becomes linked to the channel instance.
If you do this a second time—even with a new Pipeline—the same channel will send audio to both consumers, resulting in the audio being played twice.
To avoid this issue, make sure to create the pipeline for a given channel only once, or remove existing consumers from the pipeline before creating a new one.
3. Start/Stop Pipeline
```java
pipeline.start();
pipeline.stop();
pipeline.run();
```
.start() runs the pipeline in a new thread.
.run() runs the pipeline and waits until it is finished (blocking).
.stop() terminates the pipeline execution.
Once a pipeline has been stopped, you can restart it at any time by simply calling .start() again.
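For example:

```java
pipeline.stop();  // terminate the current run

// ...later, reuse the same pipeline and channel:
pipeline.start();
channel.synthesizeFromText("Hello again!");
```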
To stop playing:
```java
pipeline.stop();
audioPlayer.stop();
```
4. Synthesize
Before calling .synthesizeFromText(), you need to start the pipeline first:

```java
pipeline.start();
channel.synthesizeFromText("Hello world!");
```

The same method accepts SSML input as well:

```java
String ssml = "<speak version=\"1.0\" xmlns=\"http://www.w3.org/2001/10/synthesis\" xml:lang=\"fr-FR\">Bonjour Vivoka</speak>";
channel.synthesizeFromText(ssml);
```
To pause/resume TTS:
```java
audioPlayer.pause();
audioPlayer.resume();
```
If you call .synthesizeFromText() more than once, the last call will override all the previous ones.
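If you need several texts spoken in sequence, one option is to submit the next one only after the previous playback has finished, using the audio player's setOnFinished() callback shown earlier. A sketch (channel and audioPlayer as in the pipeline example above):

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Queue;

// Queue the utterances and feed them one at a time, so no call
// overrides a synthesis that is still playing.
Queue<String> utterances = new ArrayDeque<>(Arrays.asList(
        "First sentence.", "Second sentence.", "Third sentence."));

audioPlayer.setOnFinished(() -> {
    String next = utterances.poll();
    if (next != null) {
        channel.synthesizeFromText(next);
    }
});
channel.synthesizeFromText(utterances.poll());
```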
5. Destroy Engine
```java
com.vivoka.vsdk.tts.csdk.Engine.getInstance().release();
```
The engine instance cannot be destroyed while at least one channel is still active.
Make sure to release all channel instances before shutting down the engine—the destruction order matters!
Audio Player
AudioPlayer is a consumer module provided by VSDK that handles the playback of synthesized audio.
When used in a pipeline, it receives audio data from a TTS channel and plays it back through the device’s speaker. It also supports progress tracking, including word-level markers, which can be used to synchronize text display or trigger actions as speech is spoken.
Text Marker
The following code demonstrates how to integrate word-level markers into the pipeline for synchronized text playback.
```java
import android.util.Log;

import com.vivoka.vsdk.audio.Pipeline;
import com.vivoka.vsdk.audio.consumers.AudioPlayer;
import com.vivoka.vsdk.common.Error;
import com.vivoka.vsdk.common.Event;
import com.vivoka.vsdk.tts.Channel;
import com.vivoka.vsdk.tts.IChannelListener;
import com.vivoka.vsdk.tts.csdk.Engine;

// channel, textMarker and audioPlayer are fields of the enclosing class,
// so the listener callbacks below can reference them.
String channelName = "channel-1";
String voice = "enu,ava,embedded-compact";

try {
    // Create channel (producer)
    channel = Engine.getInstance().getChannel(channelName, voice, new IChannelListener() {
        @Override
        public void onEvent(Event<Channel.EventCode> event) {
            Log.d(TAG, "onEvent: " + event.codeString + " : " + event.message);
            if (event.code == Channel.EventCode.WORD_MARKER_END_EVENT) {
                try {
                    if (textMarker != null) {
                        textMarker.addMarker(event.message);
                    }
                } catch (Exception e) {
                    e.printFormattedMessage();
                }
            }
        }

        @Override
        public void onError(Error<Channel.ErrorCode> error) {
            Log.e(TAG, "onError: " + error.type.toString() + " on channel '" + channelName + "': " + error.message);
        }
    });

    // Create text marker
    textMarker = new TextMarker(channel.getSampleRate(), channel.getChannelCount());
    textMarker.setReachedMarkerCallback(marker -> {
        Log.d("Marker", marker.text());
    });

    // Create audio player (consumer)
    audioPlayer = new AudioPlayer(channel.getSampleRate(), channel.getChannelCount());
    audioPlayer.setOnProgress(position -> {
        try {
            // Forward the playback position so markers fire in sync with the audio
            textMarker.onPlayerProgress(position);
        } catch (Exception e) {
            e.printFormattedMessage();
        }
    });
    audioPlayer.setOnFinished(() -> {
        textMarker.reset();
    });

    // Create and start pipeline, then submit text
    Pipeline pipeline = new Pipeline();
    pipeline.setProducer(channel);
    pipeline.pushBackConsumer(audioPlayer);
    pipeline.start();
    channel.synthesizeFromText("Hello world");
} catch (Exception e) {
    e.printFormattedMessage();
}
```