SSML reference

The Speech Synthesis Markup Language (SSML) is an XML-based markup language for speech synthesis. Its essential role is to provide authors of synthesizable content a standard way to control aspects of speech such as pronunciation, volume, pitch, rate, etc. across different synthesis-capable platforms.

Note that not all of the elements and options described in the W3C SSML specification are currently supported by all SDKs. This page details which elements are available for each SDK.

Reserved characters

Avoid using SSML reserved characters in the text to be converted to audio. When you need one, use its escape code so that the character is not interpreted as markup. The following table shows the reserved SSML characters and their escape codes.

| Character | Escape code |
| --- | --- |
| " | &quot; |
| & | &amp; |
| ' | &apos; |
| < | &lt; |
| > | &gt; |
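
As an illustration, the sentence Tom & Jerry said "5 < 7" could be written as follows once each reserved character is replaced by its escape code. The sentence is an arbitrary example, not taken from the SDK documentation.

```xml
<speak>
  Tom &amp; Jerry said &quot;5 &lt; 7&quot;
</speak>
```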

Markups

Audio VSDK-CSDK VSDK-BARATINOO VSDK-VTAPI

Audio SSML Markup is used to insert a recorded audio file. The element's content is fallback text: it is synthesized if the audio file cannot be retrieved or is not supported.

Example:

XML
<audio src="file:laugh">
  haha
</audio>

Attribute

Description

src

The URI of a document with an appropriate MIME type. URIs may be absolute or relative to the base URI specified in the <speak> element. Audio files may be local (file://, or absolute paths) or remote (http://).

Supported audio files:

  • VSDK-CSDK .WAV containing linear 16 bit PCM samples.

  • VSDK-BARATINOO .au (audio/x-au), .wav (audio/x-wav), .a8k .alaw (audio/x-alaw-basic), .raw .ulaw (audio/basic)

  • VSDK-VTAPI WAV or PCM format.

VSDK-VTAPI
The path can also be file:FILENAME, where FILENAME is a prerecorded paralinguistic sound (laughter, coughs, etc.). Lists of available sounds for each voice are found in a separate document.

VSDK-CSDK VSDK-VTAPI
The audio file will automatically be resampled to match the current sampling rate before inserting it in the speech output.

mode VSDK-VTAPI

A custom attribute. If it is set to background, the audio is mixed with the synthesized text inside the <audio> element.

fetchtimeout VSDK-BARATINOO VSDK-VTAPI (SSML 1.1)

Signed or unsigned positive number or zero followed by s for seconds or ms for milliseconds.

VSDK-BARATINOO
Default: 30s

VSDK-VTAPI
Default: 10000ms

fetchhint VSDK-BARATINOO (SSML 1.1)

This tells the synthesis processor whether or not it can attempt to optimize rendering by pre-fetching audio. Available values:

  • prefetch (default)

  • safe

maxage VSDK-BARATINOO (SSML 1.1)

A positive integer or zero.

maxstale VSDK-BARATINOO (SSML 1.1)

A positive integer or zero.

clipBegin VSDK-BARATINOO (SSML 1.1)

Signed or unsigned positive number or zero followed by s for seconds or ms for milliseconds. Default 0s.

clipEnd VSDK-BARATINOO (SSML 1.1)

Signed or unsigned positive number or zero followed by s for seconds or ms for milliseconds.

repeatCount VSDK-BARATINOO (SSML 1.1)

Signed or unsigned positive number or zero. Default 1.

VSDK-BARATINOO
If the repeatCount attribute is used, the maximum duration of the audio insertion is 5 minutes.

repeatDur VSDK-BARATINOO (SSML 1.1)

Signed or unsigned positive number or zero followed by s for seconds or ms for milliseconds.

VSDK-BARATINOO
If the repeatDur attribute is used, the maximum duration of the audio insertion is 5 minutes.

soundLevel VSDK-VTAPI VSDK-BARATINOO (SSML 1.1)

Signed number followed by dB for decibels. Default +0.0dB.

VSDK-BARATINOO
The soundLevel attribute is truncated to the interval [-90.0dB;+12dB].

speed VSDK-VTAPI VSDK-BARATINOO (SSML 1.1)

Unsigned positive number or zero followed by %. Default 100%.

VSDK-BARATINOO
The speed attribute is truncated to the interval [50%;200%].

vox:gain VSDK-BARATINOO

Signed number followed by dB for decibels. Default +0.0dB.

vox:fadelevel VSDK-BARATINOO

Signed number followed by dB for decibels. Default +0.0dB.

vox:fadein VSDK-BARATINOO

Signed or unsigned positive number or zero followed by s for seconds or ms for milliseconds. Default 0s.

VSDK-BARATINOO
The fadein attributes are truncated to the interval [0s;60s].

vox:fadeout VSDK-BARATINOO

Signed or unsigned positive number or zero followed by s for seconds or ms for milliseconds. Default 0s.

VSDK-BARATINOO
The fadeout attributes are truncated to the interval [0s;60s].

vox:fadeinAttack VSDK-BARATINOO

Signed or unsigned positive number or zero followed by s for seconds or ms for milliseconds. Default 20s.

vox:fadeinRelease VSDK-BARATINOO

Signed or unsigned positive number or zero followed by s for seconds or ms for milliseconds. Default 20s.

vox:fadeoutAttack VSDK-BARATINOO

Signed or unsigned positive number or zero followed by s for seconds or ms for milliseconds. Default 20s.

vox:fadeoutRelease VSDK-BARATINOO

Signed or unsigned positive number or zero followed by s for seconds or ms for milliseconds. Default 20s.

vox:tempo VSDK-BARATINOO

The tempo attribute can be used to speed up or slow down the rate of the audio file without changing the pitch level.
Unsigned positive number followed by %. Default 100%.

VSDK-BARATINOO
The tempo attribute is truncated to the interval [50%;200%].
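
A sketch of how several of the attributes above might be combined, assuming VSDK-BARATINOO and a hypothetical local file sounds/jingle.wav. The path and the values are illustrative only.

```xml
<!-- Hypothetical file: skip its first 500 ms, play it at 120% speed,
     6 dB quieter, and stop waiting for it after 5 seconds. -->
<audio src="sounds/jingle.wav"
       fetchtimeout="5s"
       clipBegin="500ms"
       soundLevel="-6dB"
       speed="120%">
  jingle
</audio>
```

If the file cannot be retrieved, the fallback text (here, jingle) is synthesized instead.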

Audio Mix VSDK-BARATINOO

Audiomix SSML Markup is used to insert a recorded audio file and mix it with the element content. If the audio file is longer than the speech, it is truncated. If it is shorter, it is played repeatedly.

Attributes of the <audiomix> element have the same meaning and restrictions as those of the <audio> element, but the default fade attack and release durations may differ.

Example:

XML
<vox:audiomix src="file:laugh" fetchtimeout="3ms">
  haha
</vox:audiomix>

Attribute

Description

src

Name of file (absolute or relative URI)

fetchtimeout

Signed or unsigned positive number or zero followed by s for seconds or ms for milliseconds. Default 30s.

fetchhint

This tells the synthesis processor whether or not it can attempt to optimize rendering by pre-fetching audio. Available values:

  • prefetch (default)

  • safe

maxage

A positive integer or zero.

maxstale

A positive integer or zero.

clipBegin

Signed or unsigned positive number or zero followed by s for seconds or ms for milliseconds. Default 0s.

clipEnd

Signed or unsigned positive number or zero followed by s for seconds or ms for milliseconds.

soundLevel

Signed number followed by dB for decibels. Default +0.0dB.

VSDK-BARATINOO
The soundLevel attribute is truncated to the interval [-90.0dB;+12dB].

speed

Unsigned positive number or zero followed by %. Default 100%.

VSDK-BARATINOO
The speed attribute is truncated to the interval [50%;200%].

gain

Signed number followed by dB for decibels. Default +0.0dB.

fadelevel

Signed number followed by dB for decibels. Default +0.0dB.

fadein

Signed or unsigned positive number or zero followed by s for seconds or ms for milliseconds. Default 0s.

VSDK-BARATINOO
The fadein attributes are truncated to the interval [0s;60s].

fadeout

Signed or unsigned positive number or zero followed by s for seconds or ms for milliseconds. Default 0s.

VSDK-BARATINOO
The fadeout attributes are truncated to the interval [0s;60s].

fadeinAttack

Signed or unsigned positive number or zero followed by s for seconds or ms for milliseconds. Default 480s.

fadeinRelease

Signed or unsigned positive number or zero followed by s for seconds or ms for milliseconds. Default 480s.

fadeoutAttack

Signed or unsigned positive number or zero followed by s for seconds or ms for milliseconds. Default 480s.

fadeoutRelease

Signed or unsigned positive number or zero followed by s for seconds or ms for milliseconds. Default 480s.
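
For instance, a background track could be faded in and out under the spoken text. This is only a sketch: music.wav is a hypothetical file, the values are arbitrary, and the vox prefix is assumed to be available as in the example above.

```xml
<!-- Hypothetical file mixed 12 dB quieter, with 2-second fades at both ends -->
<vox:audiomix src="music.wav" soundLevel="-12dB" fadein="2s" fadeout="2s">
  Welcome to the voice assistant.
</vox:audiomix>
```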

Break VSDK-CSDK VSDK-VTAPI VSDK-BARATINOO

Break SSML Markup is used to temporarily pause the speech.

It is an empty element inserted at the position where the pause should occur, and its duration can be given in milliseconds or seconds.

Example:

XML
<break time="300ms"/>

Attribute

Description

time

Signed or unsigned positive number or zero followed by s for seconds or ms for milliseconds.

VSDK-BARATINOO Extension: percentage values are also accepted.

strength

| Value | VSDK-CSDK | VSDK-BARATINOO | VSDK-VTAPI |
| --- | --- | --- | --- |
| none | 0ms | ≅ 0ms | ≅ 0ms |
| x-weak | 20ms | ≅ 50ms | ≅ 200ms |
| weak | 100ms | ≅ 100ms | ≅ 450ms |
| medium | 500ms | ≅ 500ms | ≅ 700ms |
| strong | 1000ms | ≅ 1000ms | ≅ 900ms |
| x-strong | 1500ms | ≅ 2000ms | ≅ 1200ms |
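
The strength attribute can be used instead of time when an approximate, engine-defined pause is enough. A minimal sketch, with an arbitrary sentence:

```xml
<!-- medium: about 500 ms on VSDK-CSDK and VSDK-BARATINOO, about 700 ms on VSDK-VTAPI -->
Take a deep breath <break strength="medium"/> and continue.
```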

Checksum VSDK-BARATINOO

Enables a cyclic redundancy check to be performed on the signal and events in the most recent breath group (delimited by silence) rendered from the content of the current document.

Example:

XML
<vox:checksum crc32="2016915618"></vox:checksum>

Attribute

Description

crc32

Unsigned positive number or zero.

Computed duration VSDK-BARATINOO

Example:

XML
<prosody vox:computedduration="on"></prosody>

Attribute

Description

vox:computedduration

on: Apply phoneme duration computed by system.
off: Intrinsic phoneme duration.
default: Reset to default behavior of voice.

Computed pitch VSDK-BARATINOO

Example:

XML
<prosody vox:computedpitch="on"></prosody>

Attribute

Description

vox:computedpitch

on: Apply pitch contour computed by system.
off: Intrinsic pitch contour.
default: Reset to default behavior of voice.

Contour VSDK-BARATINOO

Contour SSML Markup is used to set different pitch values at different timestamps.
In each pair (time, pitch), the first value is a percentage of the period of the contained text and the second value is the value of the pitch attribute.

Example:

XML
<prosody contour="(0%, +10%) (50%, +50%) (100%, +90%)">
  I am speaking.
</prosody>

Duration VSDK-BARATINOO

Duration SSML Markup is used to set the duration of the marked speech. Signed or unsigned positive number or zero followed by s for seconds or ms for milliseconds.

Example:

XML
<prosody duration="5s">I'm speaking very slow.</prosody>

Emphasis VSDK-CSDK VSDK-VTAPI VSDK-BARATINOO

Emphasis SSML Markup is used to request that the contained text be spoken with emphasis. Please note that the realization of emphasis is voice dependent.

Example:

XML
That is a <emphasis> big </emphasis> car!

VSDK-BARATINOO
The realization of emphasis is voice dependent.

Attribute

Description

level

none
reduced
moderate (default)
strong
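
The level attribute selects one of the values listed above. A short sketch reusing the sentence from the example (how each level actually sounds depends on the voice):

```xml
That is a <emphasis level="strong">big</emphasis> car!
That is a <emphasis level="reduced">big</emphasis> car!
```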

Lang VSDK-CSDK VSDK-VTAPI VSDK-BARATINOO

The lang SSML markup is used to switch language. Changing the language also changes the voice, if a voice is available for that language.

The lang element can only contain text to be rendered and the following elements: audio, break, emphasis, lang, lookup, mark, p, phoneme, prosody, say-as, sub, s, token, voice and w.

Example:

XML
English, <lang xml:lang="de">Deutsch</lang>, English.

Attribute

Description

xml:lang

A required attribute specifying the language of the element.

onlangfailure

An optional attribute specifying the desired behavior upon language speaking failure.

VSDK-VTAPI
The attribute onlangfailure is always treated as ignoretext.

Value

Description

ignoretext

The synthesis processor will not attempt to render the text that is in the failed language.

ignorelang

The synthesis processor will ignore the change in language and speak as if the content were in the previous language.

changevoice

If a voice exists that can speak the language, the synthesis processor will switch to that voice and speak the content. Otherwise, the processor chooses another behavior (either ignoretext or ignorelang).

processorchoice

The synthesis processor chooses the behavior (either changevoice, ignoretext, or ignorelang).
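
As a sketch of the onlangfailure attribute, the example above could request that the German word still be spoken with the previous language if the language switch fails. Keep in mind that VSDK-VTAPI always treats the attribute as ignoretext.

```xml
English, <lang xml:lang="de" onlangfailure="ignorelang">Deutsch</lang>, English.
```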


Lexicon VSDK-VTAPI VSDK-BARATINOO

Lexicon SSML markup is used to reference a lexicon document.

VSDK-VTAPI

Supported format: PLS (Pronunciation Lexicon Specification 1.0) and CSV (User-Dictionary of vsdk-vtapi).

VSDK-BARATINOO

Supported format: PLS (Pronunciation Lexicon Specification 1.0).

Example:

XML
<lexicon xml:id="myLexiconDoc"></lexicon>

Attribute

Description

uri

Location of the lexicon document.

xml:id

A unique identifier for the lexicon document.

type

VSDK-BARATINOO
Preferred media type of the lexicon document.

fetchtimeout

VSDK-BARATINOO
Signed or unsigned positive number or zero followed by s for seconds or ms for milliseconds. Default 30s.

maxage

VSDK-BARATINOO
A positive integer or zero.

maxstale

VSDK-BARATINOO
A positive integer or zero.

Lookup VSDK-VTAPI VSDK-BARATINOO

Example:

XML
<lookup ref="myLexiconDoc"></lookup>

Attribute

Description

ref

The ref attribute specifies a name that references a lexicon document as assigned by the xml:id attribute of the lexicon element.
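
The lexicon and lookup elements work together: lexicon declares a lexicon document under an identifier, and lookup applies it to the enclosed text. The following is a minimal sketch, assuming a PLS lexicon at a hypothetical location. The version and xmlns values on <speak> come from the W3C SSML 1.1 specification; whether each SDK requires them is not stated on this page.

```xml
<speak version="1.1" xmlns="http://www.w3.org/2001/10/synthesis" xml:lang="en-US">
  <!-- Hypothetical location of a PLS lexicon document -->
  <lexicon uri="file:///path/to/myLexicon.pls" xml:id="myLexiconDoc"/>
  <!-- Pronunciations from myLexiconDoc apply to the text inside lookup -->
  <lookup ref="myLexiconDoc">
    Welcome to Vivoka.
  </lookup>
</speak>
```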

Mark VSDK-VTAPI VSDK-BARATINOO

The mark element specifies a named event which is triggered by the TTS engine when that location in the text is encountered in the generated audio stream. (What effect this event has is application specific, but it doesn’t affect the audio being generated.)

The mark element must have a name attribute. The given name doesn’t have any meaning to the TTS engine, but is included in the generated event.

Note that built-in normalization rules might, in some particular contexts such as date and currency expressions, cause adjacent words and numbers to be reordered. The TTS engine will generally try to preserve the association between marks and adjacent words in such cases, meaning that the mark events are not necessarily triggered in the exact order in which they occur in the SSML input but rather in a way that is more true to the reading order.

Example:

XML
<mark name="item1"/>First item, <mark name="item2"/>second item.

Attribute

Description

name

Marker name

vox:type VSDK-BARATINOO

Value

Description

sync (default)

The voice synthesis engine will trigger an event when that location in the text is encountered in the generated audio stream.

wait

A wait marker allows rendering of the audio signal to be deferred until the duration of the immediately following content has been determined. The end of the content whose duration is to be determined is marked by either the end of the root <speak> element or a <mark> element, of any type, that bears the same name (case-sensitive).
For example:

XML
Text before…
<mark name="foo" vox:type="wait"/>
piece of text
<mark name="foo"/> 
Text after…

When Baratinoo processes the above markup, notification is first made by a WAITMARKER event with the name foo and the duration in samples of the rendered content piece of text. Then the signal for the piece of text is sent, and finally, notification is made by a MARKER event with the name foo, signaling the end of the marked sequence.
It is possible to set another <mark>, of any type, before the end of the deferred content is encountered.

Paragraph VSDK-CSDK VSDK-VTAPI VSDK-BARATINOO

Paragraph SSML markup is used to indicate a paragraph in your text.

Although the TTS engine already recognizes paragraphs automatically, marking them explicitly can help it to better understand and render your text.

Example:

XML
<p>
  You have 4 new messages.
</p>

VSDK-VTAPI adds a sentence break before and after the element.

Attribute

Description

xml:lang

An optional attribute specifying the language of the element.

onlangfailure

An optional attribute specifying the desired behavior upon language speaking failure.

Not supported by VSDK-VTAPI.

Value

Description

ignoretext

The synthesis processor will not attempt to render the text that is in the failed language.

ignorelang

The synthesis processor will ignore the change in language and speak as if the content were in the previous language.

changevoice

If a voice exists that can speak the language, the synthesis processor will switch to that voice and speak the content. Otherwise, the processor chooses another behavior (either ignoretext or ignorelang).

processorchoice

The synthesis processor chooses the behavior (either changevoice, ignoretext, or ignorelang).

Phoneme VSDK-CSDK VSDK-VTAPI VSDK-BARATINOO

Phoneme SSML Markup is used to provide a phonetic pronunciation for the contained text.

Example:

XML
<phoneme alphabet="ipa" ph="vivo͡ʊkə">
  Vivoka
</phoneme>

Support of the alphabet is limited to sounds that map to the phonetic symbols of the current voice.

VSDK-BARATINOO

The value of the ph attribute is ignored for unsupported alphabets and a warning is issued.

Attribute

Description

alphabet

| SDK | Values |
| --- | --- |
| VSDK-CSDK | lhp, nt-sampa, sxm-sampa, pinyin (for Chinese only), diacritized (for Arabic only) |
| VSDK-VTAPI | ipa |
| VSDK-BARATINOO | x-voxygen, ipa |

ph

List of phonetic symbols.

Symbols are separated by an underscore (_) when the x-voxygen alphabet is used.

type (SSML 1.1)

VSDK-BARATINOO
Indicates additional information about how the pronunciation information is to be interpreted. The only allowed values for this attribute are default, which has no implications, and ruby, which indicates that the pronunciation information is from ruby text. The default value of this attribute is default.

VSDK-VTAPI
It is provided as an optional attribute for some engines.

vox:idl

VSDK-BARATINOO

Control the inclusion or exclusion of specific acoustic units as candidate realizations for each part of the given phonetic pronunciation.

[ids]pho[ids]pho...[ids]

  • ids: list of comma separated acoustic unit identifiers (integers). An identifier may be preceded by + for inclusion during unit selection, otherwise exclusion from unit selection is inferred.

  • pho: a x-voxygen phonetic symbol

Pitch VSDK-CSDK VSDK-VTAPI VSDK-BARATINOO

Pitch SSML Markup is used to set the pitch of the voice.

It accepts predefined values as well as relative percentages (a number followed by %).

Example:

XML
<prosody pitch="x-low">Oh my voice</prosody>

| Value | VSDK-BARATINOO | VSDK-CSDK | VSDK-VTAPI |
| --- | --- | --- | --- |
| x-low | 50% of default | -30% | 50 |
| low | 75% of default | -15% | 75 |
| medium | 100% of default | 0% | 100 |
| high | 133% of default | +35% | 150 |
| x-high | 200% of default | +60% | 200 |
| default | Initial value for current voice | 0% | 100 |

Relative percentage: [+/-] number followed by %
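
A relative percentage can be used in place of a predefined value. A brief sketch, with arbitrary values and sentences:

```xml
<prosody pitch="+10%">This is spoken slightly higher.</prosody>
<prosody pitch="-10%">This is spoken slightly lower.</prosody>
```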

Prompt VSDK-CSDK

Prompt SSML Markup is used to insert an ActivePrompt at a specific location in the text.

Example:

XML
<prompt id="myPrompt"></prompt>

Attribute

Description

id

The prompt id.

Range VSDK-BARATINOO

Range SSML Markup is used to set the range of the voice.

It accepts predefined values as well as relative percentages (a number followed by %).

Example:

XML
I'm going <prosody range="x-low">far</prosody>

Value

Description

x-low

50% of default

low

75% of default

medium

100% of default

high

133% of default

x-high

200% of default

default

Initial value for current voice

Relative percentage

[+/-] number followed by %

Relative change

[+/-] number followed by Hz for Hertz or st for semitones

Absolute value in Hertz

Unsigned number followed by Hz
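
The value forms listed above could be written as follows. The numbers are arbitrary and only illustrate the syntax of each form.

```xml
<prosody range="+20%">relative percentage</prosody>
<prosody range="+2st">relative change in semitones</prosody>
<prosody range="-10Hz">relative change in Hertz</prosody>
<prosody range="80Hz">absolute value in Hertz</prosody>
```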

Rate VSDK-CSDK VSDK-VTAPI VSDK-BARATINOO

Rate SSML Markup is used to set speech rate of the voice.

It accepts predefined values as well as relative percentages (a number followed by %).

Example:

XML
<p>
  <s>
    The subject is <prosody rate="-20%">ski trip</prosody>
  </s>
</p>

| Value | VSDK-BARATINOO | VSDK-CSDK | VSDK-VTAPI |
| --- | --- | --- | --- |
| x-slow | 50% of default | 50 | 50 |
| slow | 75% of default | 75 | 75 |
| medium | 100% of default | 100 | 100 |
| fast | 125% of default | 150 | 125 |
| x-fast | 150% of default | 200 | 150 |
| default | Initial value for current voice | 100 | 100 |

Relative percentage: [+/-] number followed by %. Extension of SSML 1.1.

Rate subject VSDK-BARATINOO

Example:

XML
<prosody vox:rate-subject="pause"></prosody>

Value

Description

vox:rate-subject

articulation: Rate value affects only speech.
pause: Rate value affects only pauses originating from the synthesis engine (<break> values are not affected).
all: Rate value affects both speech and pauses (default value).
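
A possible use, assuming the attribute is combined with rate on the same prosody element (the descriptions above suggest this, but the example does not show it):

```xml
<!-- Assumption: rate and vox:rate-subject are set together on one element -->
<prosody rate="-20%" vox:rate-subject="articulation">
  The words are slower, but the pauses keep their usual length.
</prosody>
```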

Say as VSDK-CSDK VSDK-VTAPI VSDK-BARATINOO

Say-as SSML Markup is used to indicate the type of text construct contained within the element.

Multiple format values are available for each interpret-as value, but their realization is voice-dependent.

The attribute values that may have an effect on rendering depend on the current voice.

Example (read as "third"):

XML
<say-as interpret-as="ordinal">3</say-as>

Attribute

Description

format

The date format may be optionally specified via format attribute, to supersede the language defaults, e.g. dmy or mdy.

interpret-as

Indicates the content type of the contained text construct.

Value

Description

address VSDK-CSDK

Expand text as an address, including street names and numbers, zip codes, state names, etc.

boolean VSDK-VTAPI

Reads as a boolean.

cardinal VSDK-CSDK VSDK-BARATINOO VSDK-VTAPI

Reads as a cardinal number.

characters VSDK-BARATINOO VSDK-VTAPI

Spells out letters, reads digits one by one, and expands non-alphabetical characters.

code VSDK-CSDK

Expand numbers or codes reading them digit by digit

currency VSDK-CSDK VSDK-VTAPI

Expand text as a decimal currency including currency abbreviations.

date VSDK-CSDK VSDK-BARATINOO VSDK-VTAPI

Read digits as date.

decimal VSDK-CSDK

Same as number but including comma/dot normalization.

digits VSDK-CSDK VSDK-VTAPI

Expand numbers or codes reading them digit by digit.

distance VSDK-CSDK

Expand text as a distance measurement.

normal VSDK-CSDK

Default text normalization

number VSDK-CSDK VSDK-VTAPI

Expand cardinal or comma-formatted numbers up to 15 digits.

ordinal VSDK-CSDK VSDK-BARATINOO VSDK-VTAPI

Reads as an ordinal number.

phone VSDK-CSDK VSDK-VTAPI

Expand text as a telephone number including country codes, prefixes, tel. word indicators, etc.

rational VSDK-CSDK

Same as number but including comma/dot normalization.

real VSDK-CSDK

Same as number but including comma/dot normalization.

sms VSDK-CSDK

Expand text as an SMS message, reading web addresses, smileys, email addresses, etc.

spell VSDK-CSDK

Spell out the input text that follows.

telephone VSDK-CSDK VSDK-BARATINOO VSDK-VTAPI

Reads as a telephone number.

time VSDK-CSDK VSDK-BARATINOO VSDK-VTAPI

Expand text as a clock reading (hour, minutes, am, pm), a duration or a time range.

zip VSDK-CSDK

Expand text as a zip code.

detail VSDK-VTAPI

An optional attribute whose accepted values depend on the interpret-as value.

type VSDK-VTAPI

A custom attribute that bypasses interpret-as and renders the content using a duration format. Available values: duration(:hms), duration:hm, duration:ms, duration:h, duration:m, duration:s.
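
A few more say-as sketches using values from the table above. The input strings are arbitrary, and both the accepted input formats and the exact reading depend on the SDK and the current voice.

```xml
<!-- date with an explicit day-month-year format -->
<say-as interpret-as="date" format="dmy">25/12/2024</say-as>
<!-- spell the content out character by character (VSDK-BARATINOO, VSDK-VTAPI) -->
<say-as interpret-as="characters">SSML</say-as>
<!-- read the digits as a telephone number -->
<say-as interpret-as="telephone">0123456789</say-as>
```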

Sentence VSDK-CSDK VSDK-VTAPI VSDK-BARATINOO

Sentence SSML markup is used to indicate a sentence in your text.

Although the TTS engine already recognizes sentences automatically, marking them explicitly can help it to better understand and render your text. You can place multiple sentences in a paragraph.

Example:

XML
  <p>
    <s>This is the first sentence of the paragraph.</s>
    <s>Here's another sentence.</s>
  </p>

Attribute

Description

xml:lang

An optional attribute specifying the language of the element.

onlangfailure

An optional attribute specifying the desired behavior upon language speaking failure.

Not supported by VSDK-VTAPI.

Value

Description

ignoretext

The synthesis processor will not attempt to render the text that is in the failed language.

ignorelang

The synthesis processor will ignore the change in language and speak as if the content were in the previous language.

changevoice

If a voice exists that can speak the language, the synthesis processor will switch to that voice and speak the content. Otherwise, the processor chooses another behavior (either ignoretext or ignorelang).

processorchoice

The synthesis processor chooses the behavior (either changevoice, ignoretext, or ignorelang).

Style VSDK-CSDK

Style SSML Markup is used to set an alternative speaking style instead of the normal one.

Please note that a particular style can be incompatible with some voices.

Example:

XML
Sorry <break time="300ms"/>
<style name="lively">sorry</style>

Not all styles are supported by all VSDK-CSDK voices.

Attribute

Description

name

The speaking style name to use.

You can check this page to get the supported values for each voice.

Sub VSDK-CSDK VSDK-VTAPI VSDK-BARATINOO

Sub SSML Markup is used to substitute text for the purposes of pronunciation. The sub element can contain only text (no elements).

Example:

XML
<sub alias="Voice Development Kit">VDK</sub>

Attribute

Description

alias

The content that the voice synthesis will read instead of the content of the element.

Timbre VSDK-CSDK VSDK-BARATINOO

Timbre SSML Markup is a rate/pitch warping coefficient that maintains the duration of phonemes and enables voice timbre to be modified.

It accepts predefined values as well as relative percentages (a number followed by %).

VSDK-CSDK example:

XML
<prosody timbre="+100%">
    I am speaking with a different voice timbre.
</prosody>

VSDK-BARATINOO example:

XML
<prosody vox:timbre="+100%">
    I am speaking with a different voice timbre.
</prosody>

Attribute

Description

timbre VSDK-CSDK

x-young

+35%

young

+20%

medium

0%

old

-20%

x-old

-35%

default

0%

Relative percentage

[+/-] number followed by %

vox:timbre VSDK-BARATINOO

Relative percentage

[+/-] number followed by %

Relative value

[+/-] number with no units

Absolute value

Multiplier of the initial timbre value for the current voice (unsigned number with no units or followed by %).

Token VSDK-VTAPI VSDK-BARATINOO

Token SSML Markup can be used to disambiguate heteronyms.

Example:

XML
<token xml:id="myToken">VDK</token>

Attribute

Description

xml:lang

An optional attribute specifying the language of the element.

role

A QName used in conjunction with lexicons.

onlangfailure

VSDK-BARATINOO
changevoice
ignoretext
ignorelang
processorchoice

xml:id

VSDK-BARATINOO
A unique identifier for the token.

Voice VSDK-CSDK VSDK-VTAPI VSDK-BARATINOO

Voice SSML Markup is used to change the language and voice applied to the text for rendering.

Example:

XML
<voice xml:lang="de">Deutsch</voice>

Attribute

Description

name

VSDK-CSDK VSDK-VTAPI VSDK-BARATINOO
Voice name

gender

VSDK-CSDK VSDK-VTAPI VSDK-BARATINOO
male
female
neutral

xml:lang

VSDK-CSDK VSDK-BARATINOO
An optional attribute specifying the language of the element.

age

VSDK-CSDK VSDK-BARATINOO
Positive integer or zero.

languages

VSDK-VTAPI VSDK-BARATINOO
List of space-separated languages the voice is desired to speak.

required

VSDK-BARATINOO
A list of space-separated feature names from gender, age, variant, languages, name. Initial value is languages.

ordering

VSDK-BARATINOO
A list of space-separated feature names from gender, age, variant, languages, name. Initial value is languages

onvoicefailure

VSDK-BARATINOO

priorityselect
keepexisting
processorchoice

variant

VSDK-BARATINOO
Positive integer or zero.
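
A sketch for VSDK-BARATINOO that selects a voice by language rather than by name. The sentence is arbitrary. In the W3C SSML 1.1 specification, keepexisting means the current voice is kept when no matching voice is found; this page does not state how the SDK behaves.

```xml
<voice languages="fr" required="languages" onvoicefailure="keepexisting">
  Bonjour tout le monde.
</voice>
```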

Volume VSDK-CSDK VSDK-VTAPI VSDK-BARATINOO

Volume SSML Markup is used to set the volume of the voice. It accepts predefined values as well as positive numbers.

Example:

XML
<prosody volume="+100%">
    I am speaking this at approximately twice the original signal amplitude.
</prosody>

| Value | VSDK-BARATINOO | VSDK-CSDK | VSDK-VTAPI |
| --- | --- | --- | --- |
| default | Initial value for current voice (60) | 80 | 100 |
| silent | 0 relative to default | 0 | 0 |
| x-soft | 20 relative to default | 26 | 32 |
| soft | 40 relative to default | 52 | 66 |
| medium | 60 relative to default | 80 | 100 |
| loud | 80 relative to default | 90 | 200 |
| x-loud | 100 relative to default | 100 | 300 |

Relative percentage: [+/-] number followed by %

Relative value: [+/-] number with no units

Absolute value: Multiplier of the initial volume value for the current voice (unsigned number with no units or followed by %).
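
Predefined values and relative percentages can be used in the same way. A short sketch with arbitrary sentences:

```xml
<prosody volume="soft">This is spoken quietly.</prosody>
<prosody volume="-20%">This is spoken with the volume lowered by a relative 20%.</prosody>
```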

Word VSDK-CSDK VSDK-VTAPI VSDK-BARATINOO

Word SSML Markup can be used to express segmentation of a word.

Example:

XML
<w>Apple</w>

Attribute

Description

xml:lang

An optional attribute specifying the language of the element.

role

A QName used in conjunction with lexicons.

onlangfailure

VSDK-BARATINOO
changevoice
ignoretext
ignorelang
processorchoice

xml:id

VSDK-BARATINOO
A unique identifier for the token.

vox:modes

VSDK-BARATINOO
A space-separated list of speech mode names.
