Speech Synthesis

Introduction

Speech synthesis (also known as text-to-speech or TTS) is the process of converting written text into spoken audio.

In VSDK, speech synthesis is powered by CSDK, which offers a wide range of voices across different languages, genders, and voice qualities (see Voice quality availability).

Voice Format

For <language>, refer to the language table and use the value from its Vsdk-csdk Code column.
For <name>, use the lowercase version of the name shown in VDK-Studio.
For <quality>, use the value shown in VDK-Studio under Resources → Voice.

Engine | Format | Example
vsdk-csdk | <language>,<name>,<quality> | enu,evan,embedded-pro
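To illustrate the <language>,<name>,<quality> convention, here is a small Python sketch that builds and splits voice identifiers. The VoiceId class is hypothetical and not part of VSDK; it only encodes the comma-separated format described above, lowercasing the name as required.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VoiceId:
    """Hypothetical helper for a vsdk-csdk voice identifier."""

    language: str  # value from the Vsdk-csdk Code column, e.g. "enu"
    name: str      # voice name as shown in VDK-Studio
    quality: str   # value from Resources → Voice, e.g. "embedded-pro"

    def __str__(self) -> str:
        # Comma-separated, no spaces, name lowercased.
        return f"{self.language},{self.name.lower()},{self.quality}"

    @classmethod
    def parse(cls, voice_id: str) -> "VoiceId":
        parts = voice_id.split(",")
        # Exactly three non-empty fields are expected.
        if len(parts) != 3 or not all(parts):
            raise ValueError(f"expected <language>,<name>,<quality>, got {voice_id!r}")
        language, name, quality = parts
        return cls(language, name, quality)


print(VoiceId("enu", "Evan", "embedded-pro"))  # enu,evan,embedded-pro
```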

SSML Support

VSDK also supports SSML (Speech Synthesis Markup Language), which gives you finer control over how text is spoken, allowing adjustments such as:

  • Pronunciation

  • Pauses

  • Pitch

  • Rate

  • Emphasis

SSML is supported for embedded voices, but not for neural voices (if present in your configuration). Neural voices are more natural-sounding but behave as a black box and do not support markup-based control.
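For reference, the Python sketch below assembles a short SSML document from standard W3C SSML 1.1 elements (<break>, <emphasis>, <prosody>). It is only an illustration: which elements and attribute values VSDK's embedded voices honor, and whether SSML is sent in the same "text" field used for plain synthesis requests, are assumptions to verify against the REST API reference. The build_ssml function is hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr

SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(before: str, pause_ms: int, emphasized: str, after: str,
               rate: str = "slow", lang: str = "en-US") -> str:
    """Hypothetical helper: build a small SSML 1.1 document.

    Whether VSDK supports each element used here is an assumption.
    """
    return (
        f'<speak version="1.1" xmlns="{SSML_NS}" xml:lang={quoteattr(lang)}>'
        f"{escape(before)} "                               # plain text, XML-escaped
        f'<break time="{int(pause_ms)}ms"/>'               # pause
        f"<emphasis>{escape(emphasized)}</emphasis> "      # emphasis
        f"<prosody rate={quoteattr(rate)}>{escape(after)}</prosody>"  # speaking rate
        "</speak>"
    )


print(build_ssml("Hello world,", 300, "my name", "is Tom."))
```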

Audio Format

The audio data is a buffer of 16-bit signed PCM samples in little-endian byte order.
It is always mono (1 channel), and the sample rate depends on the engine being used.

Engine | Sample rate (Hz)
csdk | 22050 (22.05 kHz)
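Because the buffer is headerless PCM, most audio players cannot open it directly. A minimal sketch, assuming you already hold the raw bytes in memory, wraps them in a WAV container with Python's standard wave module using the parameters above (16-bit, mono, 22050 Hz). The pcm_to_wav function is hypothetical.

```python
import wave

CSDK_SAMPLE_RATE_HZ = 22050  # csdk engine sample rate
CHANNELS = 1                 # always mono
SAMPLE_WIDTH_BYTES = 2       # 16-bit signed PCM


def pcm_to_wav(pcm: bytes, path: str, sample_rate: int = CSDK_SAMPLE_RATE_HZ) -> float:
    """Hypothetical helper: write raw PCM to a WAV file, return duration in seconds."""
    if len(pcm) % SAMPLE_WIDTH_BYTES:
        raise ValueError("PCM buffer length must be a multiple of 2 bytes")
    with wave.open(path, "wb") as wav:
        wav.setnchannels(CHANNELS)
        wav.setsampwidth(SAMPLE_WIDTH_BYTES)
        wav.setframerate(sample_rate)
        # The wave module treats input as native-endian; on the usual
        # little-endian hosts the VSDK bytes are written unchanged.
        wav.writeframes(pcm)
    return len(pcm) / (SAMPLE_WIDTH_BYTES * CHANNELS * sample_rate)
```

The returned duration follows from the format: 2 bytes per sample × 22050 samples per second = 44100 bytes per second of audio.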

Examples

The available routes are listed in the Voice synthesis section of the REST API reference.

Synthesis

You can retrieve a list of the voices you configured for your loaded project.

CODE
[GET] /voice-synthesis/voices
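A minimal sketch of this call with Python's standard library follows. BASE_URL is a hypothetical placeholder for wherever your VDK Service listens, and the code assumes the response body is JSON; check the REST API reference for the actual address and response schema.

```python
import json
import urllib.request

# Hypothetical address of the running VDK Service; adjust host and port.
BASE_URL = "http://localhost:8080"


def voices_request(base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the GET request for the voices configured in the loaded project."""
    return urllib.request.Request(
        f"{base_url}/voice-synthesis/voices",
        method="GET",
        headers={"Accept": "application/json"},
    )


def list_voices(base_url: str = BASE_URL, timeout: float = 5.0):
    """Send the request and decode the body (JSON is assumed)."""
    with urllib.request.urlopen(voices_request(base_url), timeout=timeout) as resp:
        return json.load(resp)
```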

Then you can request a synthesis.

CODE
[POST] /voice-synthesis/synthesize
JSON
{
  "text": "Hello world, my name is Tom!",
  "voice_id": "enu,tom,embedded-compact"
}
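The same request can be sketched in Python with urllib. BASE_URL is again a hypothetical placeholder. The name of the response field that holds the token is not given here, so the sketch returns the whole decoded body (assumed to be JSON); urlopen raises an HTTPError for non-2xx responses, which covers the unsuccessful case.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # hypothetical VDK Service address


def synthesize_request(text: str, voice_id: str,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST request with the JSON body shown above."""
    body = json.dumps({"text": text, "voice_id": voice_id}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/voice-synthesis/synthesize",
        data=body,
        method="POST",
        headers={"Content-Type": "application/json"},
    )


def synthesize(text: str, voice_id: str, base_url: str = BASE_URL,
               timeout: float = 10.0) -> dict:
    """Send the request; the decoded body should contain the token (field name assumed unknown)."""
    with urllib.request.urlopen(synthesize_request(text, voice_id, base_url),
                                timeout=timeout) as resp:
        return json.load(resp)
```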

If the request succeeds, you receive a token, which you then use with the WebSocket API to open a socket.

The generated audio is then delivered to you through that socket.
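The socket URL and the way the token is passed are described in the WebSocket API reference and are not reproduced here. Assuming the audio arrives as binary messages and any text messages are status information (both assumptions), the collection step might look like the sketch below. It accepts any iterable of messages, such as a connection from the third-party websockets package, which yields text frames as str and binary frames as bytes. The collect_audio function is hypothetical.

```python
from typing import Iterable, Tuple, Union

Message = Union[bytes, str]


def collect_audio(messages: Iterable[Message],
                  sample_rate: int = 22050) -> Tuple[bytes, float]:
    """Hypothetical helper: join binary frames into one PCM buffer.

    Assumption: audio comes as binary messages; text messages are skipped.
    """
    pcm = bytearray()
    for message in messages:
        if isinstance(message, (bytes, bytearray)):
            pcm.extend(message)
    # 16-bit mono PCM: 2 bytes per sample.
    duration_s = len(pcm) / (2 * sample_rate)
    return bytes(pcm), duration_s
```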

Sample project

Sample projects showing how to use speech synthesis with VDK Service are available in C# and Python.

Python
  • Download and extract the zip below

  • Head inside the project

  • (Optional) Create and activate a virtual environment (Python Venv documentation)

  • Install the project: pip install .

  • Run the script: vdk-synthesis --help

If you see the list of options, you can start your configured VDK Service and interact with it using the available options. For example, vdk-synthesis --list lists the available voices.

VdkServiceSample-VoiceSynthesis-Python-v1.0.0.zip

C#
  • Download and extract the zip below

  • Open the project solution (.sln)

  • Build and run the project with the argument --help

If you see the list of options, you can start your configured VDK Service and interact with it using the available options. For example, --list lists the available voices.

VdkServiceSample-VoiceSynthesis-CSharp-v1.0.0.zip
