Glossary
Term | Definition |
---|---|
AFE | Audio Front-End. |
ASR | Automatic Speech Recognition. |
Bandpass filter | A bandpass filter “passes” a band of frequencies (a defined range above a low cutoff and below a high cutoff) while progressively attenuating frequencies below the low cutoff and above the high cutoff. |
Bitrate | Bitrate is the number of bits transferred from one location to another per unit of time. It is commonly measured in bits per second (bps), kilobits per second (kbps), or megabits per second (Mbps). |
Channels | A channel is a representation of sound coming from or going to a single point. A single microphone can produce one channel of audio, and a single speaker can accept one channel of audio, for example. |
Emphasis | Emphasis refers to the particular prominence given to one or more words or syllables when reading or pronouncing them. |
Highpass filter | A highpass filter “passes” the frequencies above its cutoff frequency while progressively attenuating frequencies below it. In other words, a highpass filter removes the low-frequency content of an audio signal below a defined cutoff point. |
Logs | Logs are records of the events that occur while an application runs. These events are stored in an event log, which you can open to review each action the process performed. |
Lowpass filter | A lowpass filter is an audio signal processor that removes unwanted frequencies from a signal above a determined cutoff frequency. It progressively filters out (attenuates) the high-end above its cutoff frequency while allowing the low-end to pass through, ideally without any changes. |
NLU | Natural Language Understanding is a branch of artificial intelligence that uses computer software to understand input expressed as sentences, in text or speech. |
Phonemes | A phoneme is the smallest unit of sound that distinguishes the meaning of two words. Example: the word dad is transcribed /dæd/ and the word bad is transcribed /bæd/. The two words have different meanings, and the only difference between them is the initial sound, /d/ versus /b/. We can therefore say that /d/ and /b/ are phonemes. |
Pitch | Pitch refers to the highness or lowness of the voice. |
Rate | Rate refers to how fast or slow a person speaks. |
Samplerate (Hz) | In audio production, a sample rate (or "sampling rate") defines how many times per second a sound is sampled. Technically speaking, it is the frequency of samples used in a digital recording. |
Script | A script is a program that executes one or more predefined actions in response to a user action. |
SDK | Software Development Kit. |
Speech Synthesis Markup Language (SSML) | An XML-based markup language that lets developers specify how input text is converted into synthesized speech by a text-to-speech engine. |
Timbre | Timbre refers to the rate/pitch warping coefficient that maintains the duration of phonemes. |
TTS | Text-To-Speech. |
UI | User Interface. |
VDK | Voice Development Kit. |
VSDK | Voice Software Development Kit. |
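
The Bitrate, Channels, and Samplerate entries are related: for uncompressed PCM audio, the bitrate is the product of the sample rate, the number of bits per sample, and the number of channels. The sketch below illustrates that arithmetic. The bit depth parameter and the function names `pcm_bitrate_bps` and `format_bitrate` are assumptions made for illustration; they are not part of any SDK described here.

```python
def pcm_bitrate_bps(sample_rate_hz: int, bit_depth: int, channels: int) -> int:
    """Bitrate of uncompressed PCM audio, in bits per second.

    bit_depth (bits per sample) is not a glossary term; it is assumed here
    because it is needed to turn samples into bits.
    """
    return sample_rate_hz * bit_depth * channels


def format_bitrate(bps: int) -> str:
    """Render a bitrate in bps, kbps, or Mbps (decimal prefixes)."""
    if bps >= 1_000_000:
        return f"{bps / 1_000_000:.3f} Mbps"
    if bps >= 1_000:
        return f"{bps / 1_000:.1f} kbps"
    return f"{bps} bps"


# Mono speech capture at 16 kHz, 16 bits per sample
print(format_bitrate(pcm_bitrate_bps(16_000, 16, 1)))   # 256.0 kbps
# Stereo CD-quality audio at 44.1 kHz, 16 bits per sample
print(format_bitrate(pcm_bitrate_bps(44_100, 16, 2)))   # 1.411 Mbps
```

Compressed formats generally have a lower bitrate than this product, so the formula only applies to raw PCM streams.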
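
The Lowpass, Highpass, and Bandpass filter entries can be illustrated with simple first-order (RC-style) digital filters. This is a minimal sketch for intuition only, not the filtering used by any particular audio front-end: the function names, the cutoff values, and the choice of first-order filters are all assumptions. The bandpass here is built by chaining a highpass at the low cutoff with a lowpass at the high cutoff.

```python
import math


def lowpass(samples, cutoff_hz, sample_rate_hz):
    """First-order lowpass: attenuates content above cutoff_hz."""
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = dt / (rc + dt)
    out, prev = [], 0.0
    for x in samples:
        prev = prev + alpha * (x - prev)
        out.append(prev)
    return out


def highpass(samples, cutoff_hz, sample_rate_hz):
    """First-order highpass: attenuates content below cutoff_hz."""
    if not samples:
        return []
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    dt = 1.0 / sample_rate_hz
    alpha = rc / (rc + dt)
    out = [samples[0]]
    for i in range(1, len(samples)):
        out.append(alpha * (out[-1] + samples[i] - samples[i - 1]))
    return out


def bandpass(samples, low_cutoff_hz, high_cutoff_hz, sample_rate_hz):
    """Bandpass as a highpass (low cutoff) followed by a lowpass (high cutoff)."""
    return lowpass(highpass(samples, low_cutoff_hz, sample_rate_hz),
                   high_cutoff_hz, sample_rate_hz)


def tone(freq_hz, sample_rate_hz, n=1600):
    """Generate n samples of a unit-amplitude sine wave."""
    return [math.sin(2 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]


def rms(samples):
    """Root-mean-square level, used here to compare signal strength."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


FS = 16_000  # sample rate in Hz (assumed for the example)
low_tone, mid_tone, high_tone = tone(50, FS), tone(1_000, FS), tone(7_000, FS)

# A 300 Hz to 3000 Hz bandpass keeps the 1 kHz tone stronger than the other two.
for name, sig in [("50 Hz", low_tone), ("1 kHz", mid_tone), ("7 kHz", high_tone)]:
    print(name, round(rms(bandpass(sig, 300, 3_000, FS)), 3))
```

First-order filters roll off gently, which matches the "progressively attenuating" wording in the definitions; steeper filters need higher-order designs.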
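
The Emphasis, Pitch, Rate, and SSML entries come together when writing markup for a TTS engine. The sketch below builds a small SSML document using the standard `speak`, `prosody`, and `emphasis` elements from the W3C SSML specification. The helper name `build_ssml` and the attribute values are illustrative assumptions, the namespace declaration is omitted for brevity, and whether a given TTS engine honors each element should be checked against that engine's documentation.

```python
import xml.etree.ElementTree as ET


def build_ssml(before: str, emphasized: str, after: str,
               rate: str = "slow", pitch: str = "high") -> str:
    """Build an SSML string that sets rate and pitch and stresses one phrase.

    Hypothetical helper for illustration; not part of any SDK.
    """
    speak = ET.Element("speak", {"version": "1.0", "xml:lang": "en-US"})
    # <prosody> controls speaking rate and pitch for the enclosed text.
    prosody = ET.SubElement(speak, "prosody", {"rate": rate, "pitch": pitch})
    prosody.text = before
    # <emphasis> gives extra prominence to the enclosed words.
    emphasis = ET.SubElement(prosody, "emphasis", {"level": "strong"})
    emphasis.text = emphasized
    emphasis.tail = after
    return ET.tostring(speak, encoding="unicode")


ssml = build_ssml("Please ", "do not", " unplug the device.")
print(ssml)

# Parsing the result back confirms the markup is well-formed XML.
root = ET.fromstring(ssml)
print(root.find("prosody").get("rate"))
```

Building the markup with an XML library rather than string concatenation ensures that special characters in the input text, such as `&` or `<`, are escaped correctly.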