SSML reference

The Speech Synthesis Markup Language (SSML) is an XML-based markup language for speech synthesis. Its essential role is to provide authors of synthesizable content a standard way to control aspects of speech such as pronunciation, volume, pitch, rate, etc. across different synthesis-capable platforms.

Note that not all of the elements and options described in the W3C SSML specification are currently supported by all providers. This page details which elements are available for each provider.

Reserved characters

Avoid using SSML reserved characters in the text that is to be converted to audio. When you need to use a reserved character, use its escape code so that it is not interpreted as markup. The following table shows the reserved SSML characters and their associated escape codes.

Character	Escape code
"	`&quot;`
&	`&amp;`
'	`&apos;`
<	`&lt;`
>	`&gt;`
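
As a minimal sketch of the escaping rule above, the following Python helper replaces the five reserved characters with the standard XML entities (`&quot;`, `&amp;`, `&apos;`, `&lt;`, `&gt;`) before the text is placed inside SSML. The function name `escape_ssml_text` is hypothetical. It relies on the standard library's `xml.sax.saxutils.escape`, which handles `&`, `<`, and `>` itself and is given the two quote characters as extra entities.

```python
from xml.sax.saxutils import escape

# escape() already converts &, < and >; the quote characters are passed as extras.
_QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}


def escape_ssml_text(text: str) -> str:
    """Return text with the five reserved characters replaced by escape codes."""
    return escape(text, _QUOTE_ENTITIES)


print(escape_ssml_text('Tom & Jerry say "1 < 2"'))
```

Only the plain text should be escaped this way, not the SSML tags themselves.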

Markups

Audio CERENCE VOXYGEN READSPEAKER

Audio SSML Markup is used to insert a recorded audio file. The element's content is fallback text: it is synthesized if the audio file cannot be retrieved or its format is not supported.

Example:

XML

<audio src="file:laugh">
  haha
</audio>

Attribute	Description
src	The URI of a document with an appropriate MIME type. URIs may be absolute or relative to the base:uri specified in the <speak> element. Audio files may be local (`file://`, or absolute paths) or remote (`http://`). Supported audio files: CERENCE `.WAV` containing linear 16-bit PCM samples. VOXYGEN `.au` (`audio/x-au`), `.wav` (`audio/x-wav`), `.a8k` `.alaw` (`audio/x-alaw-basic`), `.raw` `.ulaw` (`audio/basic`). READSPEAKER `WAV` or `PCM` format. READSPEAKER The path can also be `file:FILENAME`, where `FILENAME` is a prerecorded paralinguistic sound (laughter, cough, etc.). Lists of available sounds for each voice are found in a separate document. CERENCE READSPEAKER The audio file is automatically resampled to match the current sampling rate before it is inserted into the speech output.
mode READSPEAKER	Custom attribute. When set to `background`, the audio is mixed with the text inside the `<audio>` element.
fetchtimeout VOXYGEN READSPEAKER (SSML 1.1)	Signed or unsigned positive number or zero followed by `s` for seconds or `ms` for milliseconds. VOXYGEN Default: `30s` READSPEAKER Default: `10000ms`
fetchhint VOXYGEN (SSML 1.1)	This tells the synthesis processor whether or not it can attempt to optimize rendering by pre-fetching audio. Available values: `prefetch` (default) `safe`
maxage VOXYGEN (SSML 1.1)	A positive integer or zero.
maxstale VOXYGEN (SSML 1.1)	A positive integer or zero.
clipBegin VOXYGEN (SSML 1.1)	Signed or unsigned positive number or zero followed by `s` for seconds or `ms` for milliseconds. Default `0s`.
clipEnd VOXYGEN (SSML 1.1)	Signed or unsigned positive number or zero followed by `s` for seconds or `ms` for milliseconds.
repeatCount VOXYGEN (SSML 1.1)	Signed or unsigned positive number or zero. Default `1`. VOXYGEN If the `repeatCount` attribute is used, the maximum duration of the audio insertion is 5 minutes.
repeatDur VOXYGEN (SSML 1.1)	Signed or unsigned positive number or zero followed by `s` for seconds or `ms` for milliseconds. VOXYGEN If the `repeatDur` attribute is used, the maximum duration of the audio insertion is 5 minutes.
soundLevel READSPEAKER VOXYGEN (SSML 1.1)	Signed number followed by `dB` for decibels. Default `+0.0dB`. VOXYGEN The soundLevel attribute is truncated to the interval `[-90.0dB;+12dB]`.
speed READSPEAKER VOXYGEN (SSML 1.1)	Unsigned positive number or zero followed by `%`. Default `100%`. VOXYGEN The speed attribute is truncated to the interval `[50%;200%]`.
vox:gain VOXYGEN	Signed number followed by `dB` for decibels. Default `+0.0dB`.
vox:fadelevel VOXYGEN	Signed number followed by `dB` for decibels. Default `+0.0dB`.
vox:fadein VOXYGEN	Signed or unsigned positive number or zero followed by `s` for seconds or `ms` for milliseconds. Default `0s`. VOXYGEN The `fadein` attributes are truncated to the interval `[0s;60s]`.
vox:fadeout VOXYGEN	Signed or unsigned positive number or zero followed by `s` for seconds or `ms` for milliseconds. Default `0s`. VOXYGEN The `fadeout` attributes are truncated to the interval `[0s;60s]`.
vox:fadeinAttack VOXYGEN	Signed or unsigned positive number or zero followed by `s` for seconds or `ms` for milliseconds. Default `20s`.
vox:fadeinRelease VOXYGEN	Signed or unsigned positive number or zero followed by `s` for seconds or `ms` for milliseconds. Default `20s`.
vox:fadeoutAttack VOXYGEN	Signed or unsigned positive number or zero followed by `s` for seconds or `ms` for milliseconds. Default `20s`.
vox:fadeoutRelease VOXYGEN	Signed or unsigned positive number or zero followed by `s` for seconds or `ms` for milliseconds. Default `20s`.
vox:tempo VOXYGEN	The tempo attribute can be used to speed up or slow down the rate of the audio file without changing the pitch level. Unsigned positive number followed by `%`. Default `100%`. VOXYGEN The tempo attribute is truncated to the interval `[50%;200%]`.
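
The VOXYGEN truncation intervals listed in the table above can be illustrated with a short Python sketch. It only mirrors the documented intervals for `soundLevel` (`[-90.0dB;+12dB]`) and `speed` (`[50%;200%]`) on the client side. The helper names are hypothetical, and the engine's own truncation remains authoritative.

```python
def clamp(value: float, low: float, high: float) -> float:
    """Limit value to the closed interval [low, high]."""
    return max(low, min(high, value))


def voxygen_sound_level(db: float) -> str:
    """Format a soundLevel value after truncating it to [-90.0dB;+12dB]."""
    return f"{clamp(db, -90.0, 12.0):+.1f}dB"


def voxygen_speed(percent: float) -> str:
    """Format a speed value after truncating it to [50%;200%]."""
    return f"{clamp(percent, 50, 200):g}%"


print(voxygen_sound_level(20))  # out of range, truncated to the upper bound
print(voxygen_speed(30))        # out of range, truncated to the lower bound
```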

Audio Mix VOXYGEN

Audiomix SSML Markup is used to insert a recorded audio file and mix it with the element content. If the audio file is longer than the speech, it is truncated. If it is shorter, it is repeated.

Attributes of the <audiomix> element have the same meaning and restrictions as those of the <audio> element, but the default fade attack and release durations may differ.

Example:

XML

<vox:audiomix src="file:laugh" fetchtimeout="3ms">
  haha
</vox:audiomix>

Attribute	Description
src	Name of file (absolute or relative URI)
fetchtimeout	Signed or unsigned positive number or zero followed by `s` for seconds or `ms` for milliseconds. Default `30s`.
fetchhint	This tells the synthesis processor whether or not it can attempt to optimize rendering by pre-fetching audio. Available values: `prefetch` (default) `safe`
maxage	A positive integer or zero.
maxstale	A positive integer or zero.
clipBegin	Signed or unsigned positive number or zero followed by `s` for seconds or `ms` for milliseconds. Default `0s`.
clipEnd	Signed or unsigned positive number or zero followed by `s` for seconds or `ms` for milliseconds.
soundLevel	Signed number followed by `dB` for decibels. Default `+0.0dB`. VOXYGEN The soundLevel attribute is truncated to the interval `[-90.0dB;+12dB]`.
speed	Unsigned positive number or zero followed by `%`. Default `100%`. VOXYGEN The speed attribute is truncated to the interval `[50%;200%]`.
gain	Signed number followed by `dB` for decibels. Default `+0.0dB`.
fadelevel	Signed number followed by `dB` for decibels. Default `+0.0dB`.
fadein	Signed or unsigned positive number or zero followed by `s` for seconds or `ms` for milliseconds. Default `0s`. VOXYGEN The `fadein` attributes are truncated to the interval `[0s;60s]`.
fadeout	Signed or unsigned positive number or zero followed by `s` for seconds or `ms` for milliseconds. Default `0s`. VOXYGEN The `fadeout` attributes are truncated to the interval `[0s;60s]`.
fadeinAttack	Signed or unsigned positive number or zero followed by `s` for seconds or `ms` for milliseconds. Default `480s`.
fadeinRelease	Signed or unsigned positive number or zero followed by `s` for seconds or `ms` for milliseconds. Default `480s`.
fadeoutAttack	Signed or unsigned positive number or zero followed by `s` for seconds or `ms` for milliseconds. Default `480s`.
fadeoutRelease	Signed or unsigned positive number or zero followed by `s` for seconds or `ms` for milliseconds. Default `480s`.

Break CERENCE READSPEAKER VOXYGEN

Break SSML Markup is used to temporarily pause the speech.

It is an empty element placed where the pause should occur. The duration can be given in milliseconds or seconds.

Example:

XML

<break time="300ms"/>

Attribute	Description
time	Signed or unsigned positive number or zero followed by `s` for seconds or `ms` for milliseconds. VOXYGEN Extension: percentage values are also accepted.
strength	Value	CERENCE	VOXYGEN	READSPEAKER
	none	0ms	≅ 0ms	≅ 0ms
	x-weak	20ms	≅ 50ms	≅ 200ms
	weak	100ms	≅ 100ms	≅ 450ms
	medium	500ms	≅ 500ms	≅ 700ms
	strong	1000ms	≅ 1000ms	≅ 900ms
	x-strong	1500ms	≅ 2000ms	≅ 1200ms
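
As a hedged illustration of the two attributes above, this Python sketch builds a `<break>` element from either a duration in milliseconds or one of the six strength names. The function `break_tag` and its parameters are hypothetical. Validation is limited to what the table states: a zero or positive time and a known strength.

```python
# Strength names taken from the table above.
STRENGTHS = ("none", "x-weak", "weak", "medium", "strong", "x-strong")


def break_tag(time_ms=None, strength=None):
    """Return a <break/> element with the requested attributes."""
    attrs = []
    if time_ms is not None:
        if time_ms < 0:
            raise ValueError("time must be zero or positive")
        attrs.append(f'time="{time_ms}ms"')
    if strength is not None:
        if strength not in STRENGTHS:
            raise ValueError(f"unknown strength: {strength!r}")
        attrs.append(f'strength="{strength}"')
    return "<break " + " ".join(attrs) + "/>" if attrs else "<break/>"


print(break_tag(time_ms=300))
print(break_tag(strength="medium"))
```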

Checksum VOXYGEN

Enables a cyclic redundancy check (CRC) to be performed on the signal and events in the most recent breath group (delimited by silence) rendered from the content of the current document.

Example:

XML

<vox:checksum crc32="2016915618"></vox:checksum>

Attribute	Description
crc32	Unsigned positive number or zero.

Computed duration VOXYGEN

Example:

XML

<prosody vox:computedduration="on"></prosody>

Attribute	Description
vox:computedduration	`on`: Apply phoneme duration computed by system. `off`: Intrinsic phoneme duration. `default`: Reset to default behavior of voice.

Computed pitch VOXYGEN

Example:

XML

<prosody vox:computedpitch="on"></prosody>

Attribute	Description
vox:computedpitch	`on`: Apply pitch contour computed by system. `off`: Intrinsic pitch contour. `default`: Reset to default behavior of voice.

Contour VOXYGEN

Contour SSML Markup is used to set different pitch values at different timestamps.
In each pair (time, pitch), the first value is a percentage of the period of the contained text and the second value is the value of the pitch attribute.

Example:

XML

<prosody contour="(0%, +10%) (50%, +50%) (100%, +90%)">
  I am speaking.
</prosody>
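
The (time, pitch) pair format described above can be sketched in Python as follows. The helper names `contour_value` and `prosody_with_contour` are hypothetical, and the sketch assumes both values in each pair are percentages, as in the example.

```python
from xml.sax.saxutils import escape


def contour_value(points):
    """Format (time %, pitch change %) pairs as a contour attribute value."""
    return " ".join(f"({time:g}%, {pitch:+g}%)" for time, pitch in points)


def prosody_with_contour(text, points):
    """Wrap escaped text in a <prosody> element carrying the contour."""
    return f'<prosody contour="{contour_value(points)}">{escape(text)}</prosody>'


print(prosody_with_contour("I am speaking.", [(0, 10), (50, 50), (100, 90)]))
```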

Duration VOXYGEN

Duration SSML Markup is used to set the duration of the marked speech. The value is a signed or unsigned positive number or zero followed by `s` for seconds or `ms` for milliseconds.

Example:

XML

<prosody duration="5s">I'm speaking very slow.</prosody>

Emphasis CERENCE READSPEAKER VOXYGEN

Emphasis SSML Markup is used to request that the contained text be spoken with emphasis. Please note that the realization of emphasis is voice dependent.

Example:

XML

That is a <emphasis> big </emphasis> car!

VOXYGEN
The realization of emphasis is voice dependent.

Attribute	Description
level	`none` `reduced` `moderate` (default) `strong`

Lang CERENCE READSPEAKER VOXYGEN

The lang SSML markup switches the language. Changing the language also changes the voice, if a voice is available for that language.

The lang element can only contain text to be rendered and the following elements: audio, break, emphasis, lang, lookup, mark, p, phoneme, prosody, say-as, sub, s, token, voice and w.

Example:

XML

English, <lang xml:lang="de">Deutsch</lang>, English.

Attribute	Description
xml:lang	A required attribute specifying the language of the element.
onlangfailure	An optional attribute specifying the desired behavior upon language speaking failure. READSPEAKER The attribute `onlangfailure` is always treated as `ignoretext`.
	Value	Description
	ignoretext	The synthesis processor will not attempt to render the text that is in the failed language.
	ignorelang	The synthesis processor will ignore the change in language and speak as if the content were in the previous language.
	changevoice	If a voice exists that can speak the language, the synthesis processor will switch to that voice and speak the content. Otherwise, the processor chooses another behavior (either `ignoretext` or `ignorelang`).
	processorchoice	The synthesis processor chooses the behavior (either `changevoice`, `ignoretext`, or `ignorelang`).

Lexicon READSPEAKER VOXYGEN

Lexicon SSML markup is used to reference a lexicon document.

READSPEAKER

Supported format: PLS (Pronunciation Lexicon Specification 1.0) and CSV (User-Dictionary of ReadSpeaker).

VOXYGEN

Supported format: PLS (Pronunciation Lexicon Specification 1.0).

Example:

XML

<lexicon xml:id="myLexiconDoc"></lexicon>

Attribute	Description
uri	Location of the lexicon document.
xml:id	A unique identifier for the lexicon document.
type	VOXYGEN Preferred media type of the lexicon document.
fetchtimeout	VOXYGEN Signed or unsigned positive number or zero followed by `s` for seconds or `ms` for milliseconds. Default `30s`.
maxage	VOXYGEN A positive integer or zero.
maxstale	VOXYGEN A positive integer or zero.

Lookup READSPEAKER VOXYGEN

Example:

XML

<lookup ref="myLexiconDoc"></lookup>

Attribute	Description
ref	The `ref` attribute specifies a name that references a lexicon document as assigned by the `xml:id` attribute of the lexicon element.

Mark READSPEAKER VOXYGEN

The mark element specifies a named event which is triggered by the TTS engine when that location in the text is encountered in the generated audio stream. (What effect this event has is application-specific, but it doesn’t affect the audio being generated.)

The mark element must have a name attribute. The given name doesn’t have any meaning to the TTS engine, but is included in the generated event.

Note that built-in normalization rules might, in some particular contexts such as date and currency expressions, cause adjacent words and numbers to be reordered. The TTS engine will generally try to preserve the association between marks and adjacent words in such cases, meaning that the mark events are not necessarily triggered in the exact order in which they occur in the SSML input but rather in a way that is more true to the reading order.

Example:

XML

<mark name="item1"/>First item, <mark name="item2"/>second item.

Attribute	Description
name	Marker name
vox:type VOXYGEN	Value	Description
	sync (default)	The voice synthesis engine will trigger an event when that location in the text is encountered in the generated audio stream.
	wait	A wait marker allows rendering of the audio signal to be deferred until the duration of the immediately following content has been determined. The end of the content whose duration is to be determined is marked by either the end of the root <speak> element or a <mark> element, of any type, that bears the same name (case-sensitive). For example: XML `Text before… <mark name="foo" vox:type="wait"/> piece of text <mark name="foo"/> Text after…` When Baratinoo processes the above markup, notification is first made by a `WAITMARKER` event with the name `foo` and the duration in samples of the rendered content `piece of text`. Then the signal for the `piece of text` is sent, and finally, notification is made by a `MARKER` event with the name `foo`, signaling the end of the marked sequence. It is possible to set another <mark>, of any type, before the end of the deferred content is encountered.
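
To know which mark events an application should expect, the mark names can be collected from the SSML before synthesis. The following Python sketch does this with the standard library's `xml.etree.ElementTree`. The function name `mark_names` is hypothetical, and the input is assumed to be wrapped in a root <speak> element without an XML namespace. The list reflects input order only; as noted above, the engine may trigger events in reading order instead.

```python
import xml.etree.ElementTree as ET


def mark_names(ssml: str) -> list[str]:
    """Return the name of every <mark> element, in document order."""
    names = []
    for mark in ET.fromstring(ssml).iter("mark"):
        name = mark.get("name")
        if not name:
            # The name attribute is mandatory on <mark>.
            raise ValueError("every <mark> needs a name attribute")
        names.append(name)
    return names


ssml = '<speak><mark name="item1"/>First item, <mark name="item2"/>second item.</speak>'
print(mark_names(ssml))
```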

Paragraph CERENCE READSPEAKER VOXYGEN

Paragraph SSML markup is used to indicate a paragraph in your text.

The TTS engine already recognizes paragraphs automatically, but marking them explicitly can help it better understand and render your text.

Example:

XML

<p>
  You have 4 new messages.
</p>

ReadSpeaker adds a sentence break before and after the element.

Attribute	Description
xml:lang	An optional attribute specifying the language of the element.
onlangfailure	An optional attribute specifying the desired behavior upon language speaking failure. Not supported by READSPEAKER.
	Value	Description
	ignoretext	The synthesis processor will not attempt to render the text that is in the failed language.
	ignorelang	The synthesis processor will ignore the change in language and speak as if the content were in the previous language.
	changevoice	If a voice exists that can speak the language, the synthesis processor will switch to that voice and speak the content. Otherwise, the processor chooses another behavior (either `ignoretext` or `ignorelang`).
	processorchoice	The synthesis processor chooses the behavior (either `changevoice`, `ignoretext`, or `ignorelang`).

Phoneme CERENCE READSPEAKER VOXYGEN

Phoneme SSML Markup is used to provide a phonetic pronunciation for the contained text.

Example:

XML

<phoneme alphabet="ipa" ph="vivo͡ʊkə">
  Vivoka
</phoneme>

Support of the alphabet is limited to sounds that map to the phonetic symbols of the current voice.

VOXYGEN

The value of the ph attribute is ignored for unsupported alphabets and a warning is issued.

Attribute	Description
alphabet	Provider	Values
	CERENCE	`lhp`, `nt-sampa`, `sxm-sampa`, `pinyin` (for Chinese only), `diacritized` (for Arabic only)
	READSPEAKER	`ipa`
	VOXYGEN	`x-voxygen`, `ipa`
ph	List of phonetic symbols. Separated by underscore `_` when `x-voxygen` alphabet is used.
type (SSML 1.1)	VOXYGEN Indicates additional information about how the pronunciation information is to be interpreted. The only allowed values for this attribute are `default`, which has no implications, and `ruby`, which indicates that the pronunciation information is from ruby text. The default value of this attribute is `default`. READSPEAKER It is provided as an optional attribute for some engines.
vox:idl	VOXYGEN Control the inclusion or exclusion of specific acoustic units as candidate realizations for each part of the given phonetic pronunciation. `[ids]pho[ids]pho...[ids]` ids: list of comma separated acoustic unit identifiers (integers). An identifier may be preceded by + for `inclusion` during unit selection, otherwise `exclusion` from unit selection is inferred. pho: a `x-voxygen` phonetic symbol

Pitch CERENCE READSPEAKER VOXYGEN

Pitch SSML Markup is used to set the pitch of the voice.

It accepts predefined values as well as relative percentages (a number followed by `%`).

Example:

XML

<prosody pitch="x-low">Oh my voice</prosody>

Value	VOXYGEN	CERENCE	READSPEAKER
`x-low`	50% of `default`	-30%	50
`low`	75% of `default`	-15%	75
`medium`	100% of `default`	0%	100
`high`	133% of `default`	+35%	150
`x-high`	200% of `default`	+60%	200
`default`	Initial value for current voice	0%	100
Relative percentage	[+/-] number followed by `%`

Prompt CERENCE

Prompt SSML Markup is used to insert an ActivePrompt at a specific location in the text.

Example:

XML

<prompt id="myPrompt"></prompt>

Attribute	Description
id	The prompt id.

Range VOXYGEN

Range SSML Markup is used to set the range of the voice.

It accepts predefined values as well as relative percentages (a number followed by `%`).

Example:

XML

I'm going <prosody range="x-low">far</prosody>

Value	Description
`x-low`	50% of `default`
`low`	75% of `default`
`medium`	100% of `default`
`high`	133% of `default`
`x-high`	200% of `default`
`default`	Initial value for current voice
Relative percentage	[+/-] number followed by `%`
Relative change	[+/-] number followed by `Hz` for Hertz or `st` for semitones
Absolute value in Hertz	Unsigned number followed by `Hz`

Rate CERENCE READSPEAKER VOXYGEN

Rate SSML Markup is used to set speech rate of the voice.

It accepts predefined values as well as relative percentages (a number followed by `%`).

Example:

XML

<p>
  <s>
    The subject is <prosody rate="-20%">ski trip</prosody>
  </s>
</p>

Value	VOXYGEN	CERENCE	READSPEAKER
`x-slow`	50% of `default`	50	50
`slow`	75% of `default`	75	75
`medium`	100% of `default`	100	100
`fast`	125% of `default`	150	125
`x-fast`	150% of `default`	200	150
`default`	Initial value for current voice	100	100
Relative percentage	[+/-] number followed by `%`, Extension of SSML 1.1.

Rate subject VOXYGEN

Example:

XML

<prosody vox:rate-subject="pause"></prosody>

Value	Description
vox:rate-subject	`articulation` Rate value affects only speech. `pause` Rate value affects only pauses originating from the synthesis engine (<break> values are not affected). `all` Rate value affects both speech and pauses (default value).

Say as CERENCE READSPEAKER VOXYGEN

Say-as SSML Markup is used to indicate the type of text construct contained within the element.

Multiple format values are available for each interpret-as value, but their realization is voice-dependent.

The attribute values that may have an effect on rendering depend on the current voice.

Example (read as "third"):

XML

<say-as interpret-as="ordinal">3</say-as>

Attribute	Description
format	The date format may be optionally specified via format attribute, to supersede the language defaults, e.g. `dmy` or `mdy`.
interpret-as	Indicates the content type of the contained text construct.
	Value	Description
	address CERENCE	Expand text as an address, including street names and numbers, zip codes, state names, etc.
	boolean READSPEAKER	Reads as a boolean.
	cardinal CERENCE VOXYGEN READSPEAKER	Reads as a cardinal number.
	characters VOXYGEN READSPEAKER	Spells out letters, reads digits one by one, and expands non-alphabetical characters.
	code CERENCE	Expand numbers or codes reading them digit by digit
	currency CERENCE READSPEAKER	Expand text as a decimal currency including currency abbreviations.
	date CERENCE VOXYGEN READSPEAKER	Read digits as date.
	decimal CERENCE	Same as number but including comma/dot normalization.
	digits CERENCE READSPEAKER	Expand numbers or codes reading them digit by digit.
	distance CERENCE	Expand text as a distance measurement.
	normal CERENCE	Default text normalization
	number CERENCE READSPEAKER	Expand cardinal/ comma formatted numbers up to 15 digits.
	ordinal CERENCE VOXYGEN READSPEAKER	Reads as an ordinal number.
	phone CERENCE READSPEAKER	Expand text as a telephone number including country codes, prefixes, tel. word indicators, etc.
	rational CERENCE	Same as number but including comma/dot normalization.
	real CERENCE	Same as number but including comma/dot normalization.
	sms CERENCE	Expand text as an SMS message, reading web addresses, smileys, email addresses, etc.
	spell CERENCE	Spell out the input text that follows.
	telephone CERENCE VOXYGEN READSPEAKER	Reads as a telephone number.
	time CERENCE VOXYGEN READSPEAKER	Expand text as a clock reading (hour, minutes, am, pm), a duration or a time range.
	zip CERENCE	Expand text as a zip code.
detail READSPEAKER	An optional attribute whose accepted values depend on the `interpret-as` value.
type READSPEAKER	A custom attribute that bypasses `interpret-as` and renders the content using a duration format. Available values: `duration(:hms)`, `duration:hm`, `duration:ms`, `duration:h`, `duration:m`, `duration:s`.

Sentence CERENCE READSPEAKER VOXYGEN

Sentence SSML markup is used to indicate a sentence in your text.

The TTS engine already recognizes sentences automatically, but marking them explicitly can help it better understand and render your text. You can place multiple sentences in a paragraph.

Example:

XML

  <p>
    <s>This is the first sentence of the paragraph.</s>
    <s>Here's another sentence.</s>
  </p>

Attribute	Description
xml:lang	An optional attribute specifying the language of the element.
onlangfailure	An optional attribute specifying the desired behavior upon language speaking failure. Not supported by READSPEAKER.
	Value	Description
	ignoretext	The synthesis processor will not attempt to render the text that is in the failed language.
	ignorelang	The synthesis processor will ignore the change in language and speak as if the content were in the previous language.
	changevoice	If a voice exists that can speak the language, the synthesis processor will switch to that voice and speak the content. Otherwise, the processor chooses another behavior (either `ignoretext` or `ignorelang`).
	processorchoice	The synthesis processor chooses the behavior (either `changevoice`, `ignoretext`, or `ignorelang`).

Style CERENCE

Style SSML Markup is used to set an alternative speaking style instead of the normal one.

Please note that a particular style can be incompatible with some voices.

Example:

XML

Sorry <break time="300ms"/>
<style name="lively">sorry</style>

Not all styles are supported by all Cerence voices.

Attribute	Description
name	The speaking style name to use. You can check this page to get the supported values for each voice.

Sub CERENCE READSPEAKER VOXYGEN

Sub SSML Markup is used to substitute text for the purposes of pronunciation. The sub element can contain only text (no elements).

Example:

XML

<sub alias="Voice Development Kit">VDK</sub>

Attribute	Description
alias	The content that the voice synthesis will read instead of the content of the element.
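
A minimal Python sketch of generating a `<sub>` element is shown below. The helper name `sub_tag` is hypothetical. It uses the standard library's `xml.sax.saxutils.quoteattr` and `escape` so that reserved characters in the alias or the text do not break the markup.

```python
from xml.sax.saxutils import escape, quoteattr


def sub_tag(text: str, alias: str) -> str:
    """Return a <sub> element that reads alias aloud in place of text."""
    # quoteattr() adds the surrounding quotes and escapes the attribute value.
    return f"<sub alias={quoteattr(alias)}>{escape(text)}</sub>"


print(sub_tag("VDK", "Voice Development Kit"))
print(sub_tag("R&D", "research & development"))
```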

Timbre CERENCE VOXYGEN

Timbre SSML Markup is a rate/pitch warping coefficient that maintains the duration of phonemes and enables voice timbre to be modified.

It accepts predefined values as well as relative percentages (a number followed by `%`).

Cerence example:

XML

<prosody timbre="+100%">
    I am speaking with a different voice timbre.
</prosody>

Voxygen example:

XML

<prosody vox:timbre="+100%">
    I am speaking with a different voice timbre.
</prosody>

Attribute	Description
timbre CERENCE	x-young	+35%
	young	+20%
	medium	0%
	old	-20%
	x-old	-35%
	default	0%
	Relative percentage	[+/-] number followed by `%`
vox:timbre VOXYGEN	Relative percentage	[+/-] number followed by `%`
	Relative value	[+/-] number with no units
	Absolute value	Multiplier of the initial timbre value for the current voice (unsigned number with no units or followed by `%`).

Token READSPEAKER VOXYGEN

Token SSML Markup can be used to disambiguate heteronyms.

Example:

XML

<token xml:id="myToken">VDK</token>

Attribute	Description
xml:lang	An optional attribute specifying the language of the element.
role	A QName used in conjunction with lexicons.
onlangfailure	VOXYGEN `changevoice` `ignoretext` `ignorelang` `processorchoice`
xml:id	VOXYGEN A unique identifier for the token.

Voice CERENCE READSPEAKER VOXYGEN

Voice SSML Markup is used to change the language and voice applied to the text for rendering.

Example:

XML

<voice xml:lang="de">Deutsch</voice>

Attribute	Description
name	CERENCE READSPEAKER VOXYGEN Voice name
gender	CERENCE READSPEAKER VOXYGEN `male` `female` `neutral`
xml:lang	CERENCE VOXYGEN An optional attribute specifying the language of the element.
age	CERENCE VOXYGEN Positive integer or zero.
languages	READSPEAKER VOXYGEN List of space-separated languages the voice is desired to speak.
required	VOXYGEN A list of space-separated feature names from `gender`, `age`, `variant`, `languages`, `name`. Initial value is `languages`.
ordering	VOXYGEN A list of space-separated feature names from `gender`, `age`, `variant`, `languages`, `name`. Initial value is `languages`
onvoicefailure	VOXYGEN `priorityselect` `keepexisting` `processorchoice`
variant	VOXYGEN Positive integer or zero.

Volume CERENCE READSPEAKER VOXYGEN

Volume SSML Markup is used to set the volume of the voice. It accepts predefined values as well as positive numbers.

Example:

XML

<prosody volume="+100%">
    I am speaking this at approximately twice the original signal amplitude.
</prosody>

Value	VOXYGEN	CERENCE	READSPEAKER
`default`	Initial value for current voice (60)	80	100
`silent`	0 relative to `default`	0	0
`x-soft`	20 relative to `default`	26	32
`soft`	40 relative to `default`	52	66
`medium`	60 relative to `default`	80	100
`loud`	80 relative to `default`	90	200
`x-loud`	100 relative to `default`	100	300
Relative percentage	[+/-] number followed by `%`
Relative value	[+/-] number with no units
Absolute value	Multiplier of the initial volume value for the current voice (unsigned number with no units or followed by `%`).

Word CERENCE READSPEAKER VOXYGEN

Word SSML Markup can be used to express segmentation of a word.

Example:

XML

<w>Apple</w>

Attribute	Description
xml:lang	An optional attribute specifying the language of the element.
role	A QName used in conjunction with lexicons.
onlangfailure	VOXYGEN `changevoice` `ignoretext` `ignorelang` `processorchoice`
xml:id	VOXYGEN A unique identifier for the token.
vox:modes	VOXYGEN A space-separated list of speech mode names.