Voice Synthesis

Voice synthesis, also known as Text-to-Speech or Text-to-Voice, is a technology that generates speech in real time to read your text aloud. Synthetic voices can be selected by language, gender and size.

Voice synthesis card

Voice synthesis settings. This button opens the voice synthesis settings, which allow you to edit a channel's provider or delete a channel.
Channel card. Click on this card to open the voice synthesis main screen.
Channel name. The name given to the channel during the creation step.
Channel provider. Indicates which provider is used for the channel. Please refer to Providers specifics for Voice Synthesis to edit it.
Voices list. The list of voices used in the channel. Please refer to Providers specifics for Voice Synthesis to choose your channel voices.
Add a channel. This button will open a new wizard which allows you to add a new voice synthesis channel to your current project.

Voice synthesis main screen

Section text input. Enter your text in this section.
Fine tune voice (SSML). This button opens a new window in which you can customize your text input using SSML markup (SSML reference). Check out the SSML Markups screen to learn more about this section.
Phonetic dictionary. If you have already chosen dictionaries for your channel, you can enable them by checking this box. This option is not available if the channel doesn’t have a dictionary.
Section available voices. Select the voices to use for voice synthesis in this section. Once you are satisfied with the entry, click on Save as audio files to export it to your computer (.wav file).
Voices list. This section lists all the voices available for testing. To test one or several voices for your project, click on the voice name in the list. To select or deselect all the voices in the list, check or uncheck the Selection indicator box. To save your selection in your current project channel, click on the Save configuration button.
Voice filtering. You can filter the voices by “Languages”, “Genders” and/or “size”. To show or hide the voice filter, click on the Filter icon. The list below shows the voices that match the filters you have set. To reset all the current filters, click on the Reset icon.
Play voices. Click on the Play icon to test your selected entries. To stop playback, click on the Stop icon. Uncheck the Use SSML checkbox to play the input as raw text.

The voices shown in the list depend on the voices you have already downloaded in VDK Studio and on your channel provider. To download voices, click on the Download more voices button.

You can only select downloaded voices from your channel's provider.

Not all SSML markups are supported by all providers.
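
For orientation, here is a minimal sketch of what SSML input can look like, written in Python so that the markup can be checked for well-formedness before it is pasted into the text input. The `speak`, `break`, `prosody` and `emphasis` elements come from the W3C SSML specification; whether a given provider honors each of them is not guaranteed, as noted above. Whether the text input expects the `speak` root element is an assumption, and the helper name `is_well_formed` is hypothetical and not part of VDK Studio.

```python
import xml.etree.ElementTree as ET

# Example SSML input (illustrative text; element support depends on the provider).
SSML_TEXT = (
    "<speak>"
    "Welcome to the demo."
    '<break time="500ms"/>'
    '<prosody rate="slow">This sentence is read slowly.</prosody> '
    "<emphasis>This part is stressed.</emphasis>"
    "</speak>"
)


def is_well_formed(ssml: str) -> bool:
    """Return True if the string parses as XML.

    This is a necessary check only: it does not validate the markup
    against the SSML schema or against any provider's supported subset.
    """
    try:
        ET.fromstring(ssml)
        return True
    except ET.ParseError:
        return False


if __name__ == "__main__":
    print(is_well_formed(SSML_TEXT))
    # An element that is opened but never closed is rejected.
    print(is_well_formed("<speak><emphasis>unclosed</speak>"))
```

A check like this only catches unbalanced or malformed tags; testing the result with the Play voices controls remains the way to confirm how a specific voice renders the markup.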

SSML Markups screen

Markups. Choose the markup you want to insert in your text input.
Description. Once you click on a tag on the markup list, you will find here a description of what the markup is used for.
Parameters. For each markup, you can select attributes and values to apply on it.
Insert SSML tag. Once you have selected and configured your markup tag, click on Insert SSML tag to insert it in your text input.

Once you are done with the markup selection, you can close the current window to get back to the main screen. When you return to the main screen, your markup tag will appear in the text input.

Note: By default, the markup is placed at the location of the text cursor. You can also select a specific part of the text you want to modify. In this case, the markup you insert will automatically be placed around this selected part.
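
The placement behavior described in the note can be pictured with the following Python sketch. It is only an illustration of the two cases (cursor versus selection), not the editor's actual implementation. The function name `insert_markup` and its parameters are hypothetical, and emitting an empty element when nothing is selected is an assumption.

```python
from typing import Dict, Optional


def insert_markup(
    text: str,
    start: int,
    end: int,
    tag: str,
    attrs: Optional[Dict[str, str]] = None,
) -> str:
    """Insert an SSML tag at a cursor position or around a selection.

    start == end models a plain text cursor; start < end models a selection.
    """
    if not 0 <= start <= end <= len(text):
        raise ValueError("selection out of range")

    # Render attributes as ' name="value"' pairs in insertion order.
    attr_text = "".join(f' {name}="{value}"' for name, value in (attrs or {}).items())

    if start == end:
        # No selection: an empty element is placed at the cursor (assumption).
        return f"{text[:start]}<{tag}{attr_text}/>{text[end:]}"

    # Selection: the selected text ends up inside the element.
    return f"{text[:start]}<{tag}{attr_text}>{text[start:end]}</{tag}>{text[end:]}"


if __name__ == "__main__":
    sentence = "Hello world"
    print(insert_markup(sentence, 6, 11, "emphasis"))
    # Hello <emphasis>world</emphasis>
    print(insert_markup(sentence, 5, 5, "break", {"time": "300ms"}))
    # Hello<break time="300ms"/> world
```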

Add a voice synthesis channel screen

Channel name. The channel name must be unique within the voice synthesis technology.
Provider. Three providers (Cerence, Readspeaker and Voxygen) are available. Choose the provider best suited to your channel. Your choice determines which voices list you can later choose from when synthesizing your text. Please refer to chapter Providers specifics to determine which provider you should use.
Add to project. Once you have chosen your channel name and provider, add the channel to your current project by clicking on the Add to project button.

Voice synthesis settings screen

Channels. The list of channels in your current project. From here you can choose the channel to edit or delete.
Delete. Once you have selected a channel, use this button to delete it.
Channel name. The channel name is read-only and cannot be modified.
Provider. Three providers (Cerence, Readspeaker and Voxygen) are available. Choose the provider best suited to your channel. Your choice determines which voices list you can later choose from when synthesizing your text. Please refer to chapter Providers specifics to determine which provider you should use.
Save. Once you have chosen your new channel provider, save your changes to your current project by clicking on the Save button.

How to create a channel?

Go to Playground.
In the voice synthesis card, click on Add a channel.
In the wizard that opens, enter:
1. Your channel name: It must be unique within the voice synthesis technology.
2. Your provider: The SDK provider that will be used for voice synthesis. You can check the user guide for a full comparison of the providers.
Finish by clicking on Add to project.

You can see your newly created channels in the voice synthesis card.

How to open the voice synthesis settings screen?

In the Voice synthesis card click on the Settings button to open the settings window that allows you to:

Change a channel's provider
Delete a channel

If you are using a project and you don't have voice synthesis technology yet, you can click on Add a technology to add it.

How to use the SSML Markups screen?

Select some words in the text input if you want to add a markup with content.
Click on the Fine tune voice (SSML) button to open the SSML markups window.
Select and configure an SSML markup.
Click on the Insert SSML tag button to insert the markup at the cursor position in the text input (any selected text will be moved inside the markup).

How to generate audio files from text input?

In the Voice synthesis screen, enter your text in the text input.
Select the voices to use in the Available voices section.
Click on Save as audio files to save synthesized text as ".wav" files.

A file will be created for each selected voice with the following name format {your_prefix}_{voice_id}.wav.
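
As a quick illustration of this naming format, the Python sketch below lists the files to expect for a given prefix and set of selected voices. The function name `expected_wav_names` and the sample voice identifiers are hypothetical; actual voice identifiers depend on your provider and downloaded voices.

```python
from pathlib import Path
from typing import Iterable, List


def expected_wav_names(prefix: str, voice_ids: Iterable[str], directory: str = ".") -> List[Path]:
    """Build one output path per selected voice as {prefix}_{voice_id}.wav."""
    return [Path(directory) / f"{prefix}_{voice_id}.wav" for voice_id in voice_ids]


if __name__ == "__main__":
    # "voice_a" and "voice_b" are placeholder identifiers, not real voice names.
    for path in expected_wav_names("greeting", ["voice_a", "voice_b"]):
        print(path.name)
    # greeting_voice_a.wav
    # greeting_voice_b.wav
```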

How to add Voice Synthesis technology to your project?

Open or create a custom project.
In the project editor (left sidebar), click on the Add a technology button.
Select Add voice synthesis and click on Next.
Enter your channel name, select a provider and click on Next.
Click on Add to project button.