Speech Synthesis (TTS)

Speech synthesis, also known as text-to-speech (TTS), converts written text into audible speech. Our TTS module provides configurable speech output that can be adjusted to fit the requirements of your application.

VDK’s engine supports up to 65 languages for TTS and offers multiple voices, with adjustable parameters such as pitch, rate, volume, and timbre.

Key features

Natural-Sounding Speech:
Produce lifelike audio output using models of different sizes and quality levels, with smooth intonation and flexible expressiveness.

Multilingual and Multi-Dialect Support:
Synthesize speech in up to 65 languages and multiple regional dialects. Choose from a wide variety of voices to match your target audience.

SSML Compatibility:
Enhance your synthesized speech using the Speech Synthesis Markup Language (SSML) to control pronunciation, volume, pitch, rate, and other expressive features.

SSML is not available for neural voices.
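
To make the SSML feature more concrete, here is a minimal sketch that builds a small SSML document with Python's standard library. It uses the `speak` and `prosody` elements from the W3C SSML specification to set pitch, rate, and volume. The helper name `build_ssml` is hypothetical and not part of VDK. Which SSML elements and attribute values a given VDK voice honors, and how the resulting string is passed to the engine, are not covered here and should be checked against the VDK reference.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the W3C SSML specification.
SSML_NS = "http://www.w3.org/2001/10/synthesis"


def build_ssml(text: str, pitch: str = "medium", rate: str = "medium",
               volume: str = "medium", lang: str = "en-US") -> str:
    """Wrap plain text in a <speak>/<prosody> SSML document.

    Hypothetical helper for illustration only; it is not a VDK API.
    pitch, rate, and volume take SSML prosody labels such as
    "low", "slow", or "loud".
    """
    speak = ET.Element("speak", {
        "version": "1.0",
        "xmlns": SSML_NS,
        "xml:lang": lang,
    })
    prosody = ET.SubElement(speak, "prosody", {
        "pitch": pitch,
        "rate": rate,
        "volume": volume,
    })
    # ElementTree escapes special characters such as "&" on serialization.
    prosody.text = text
    return ET.tostring(speak, encoding="unicode")


if __name__ == "__main__":
    print(build_ssml("Your order has shipped.", rate="slow", volume="loud"))
```

Building the markup with an XML library rather than string concatenation keeps the document well-formed when the input text contains characters like `&` or `<`.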

Performance and Optimization

Real-Time Synthesis
Our engine is optimized for low-latency synthesis, providing near-instantaneous feedback for interactive applications.

Resource Management
Choose the voice quality that fits your deployment: embedded voices are designed for devices with limited resources, while neural voices offer enhanced naturalness on more powerful systems.
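
The trade-off described on this page can be summarized as a small decision helper. The sketch below is only an illustration of that reasoning. `VoiceKind` and `pick_voice_kind` are hypothetical names, not VDK identifiers, and the rule assumes that embedded voices accept SSML, which this page implies but does not state outright.

```python
from enum import Enum


class VoiceKind(Enum):
    """Hypothetical labels for the two voice families described on this page."""
    EMBEDDED = "embedded"
    NEURAL = "neural"


def pick_voice_kind(limited_resources: bool, needs_ssml: bool) -> VoiceKind:
    """Suggest a voice family from two deployment constraints.

    Assumptions taken from this page: embedded voices target devices with
    limited resources, and SSML is not available for neural voices.
    """
    if limited_resources or needs_ssml:
        return VoiceKind.EMBEDDED
    # No constraint rules out neural voices, so prefer their naturalness.
    return VoiceKind.NEURAL


if __name__ == "__main__":
    print(pick_voice_kind(limited_resources=False, needs_ssml=True).value)
```

In practice the choice also depends on which voices are available for the target language, so treat the helper as a starting point and not a rule.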