Speech Enhancement

Introduction

Speech enhancement improves the quality of audio captured from the microphone by reducing noise, removing artifacts, and enhancing clarity before the audio is sent to ASR, saved to a file, or forwarded elsewhere.

This makes it especially useful when you want to improve speech recognition accuracy.

You configure your Speech Enhancer in VDK-Studio. No single configuration fits every use case, so start from the available templates and choose the one that best matches your needs.

Barge-In (AEC)

Acoustic Echo Cancellation (AEC) is a technique used to eliminate the echo that can occur when a device plays audio (e.g., TTS output) and simultaneously captures audio through its microphone. Without AEC, the playback audio may be picked up by the microphone and misinterpreted as user input—especially problematic in interactive voice applications.

Barge-In relies on AEC to ensure the system doesn’t mistakenly detect its own voice as user input.
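To make the idea concrete, here is a toy sketch of echo cancellation with a normalized least-mean-squares (NLMS) adaptive filter. It is only an illustration of the principle under simplified assumptions (a synthetic playback signal, a one-sample echo delay, no user speech); it is not how Android or the VSDK implements AEC, and the function name `nlms_echo_cancel` is hypothetical.

```python
import random


def nlms_echo_cancel(playback, mic, taps=4, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the playback echo from the mic signal."""
    weights = [0.0] * taps   # current estimate of the echo path
    history = [0.0] * taps   # most recent playback samples, newest first
    cleaned = []
    for far, near in zip(playback, mic):
        history = [far] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = near - echo_estimate          # what remains after removing the echo
        norm = sum(x * x for x in history) + eps
        # NLMS update: move the weights toward values that shrink the error
        weights = [w + mu * error * x / norm for w, x in zip(weights, history)]
        cleaned.append(error)
    return cleaned, weights


# Synthetic example: the microphone hears the playback attenuated to 60%
# and delayed by one sample, with no user speech on top.
rng = random.Random(0)
playback = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [0.0] + [0.6 * s for s in playback[:-1]]

cleaned, weights = nlms_echo_cancel(playback, mic)
residual = sum(e * e for e in cleaned[-200:])
original = sum(m * m for m in mic[-200:])
print("residual echo energy ratio:", residual / original)
```

Once the filter has adapted, almost nothing of the playback remains in the cleaned signal, which is what keeps the recognizer from hearing the device's own voice.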

AEC is already available in the Android API. To enable it, set the AudioSource to VOICE_COMMUNICATION in your AudioRecorder configuration:

JAVA

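// Assumption: AudioSource refers to Android's MediaRecorder.AudioSource constants.
import android.media.MediaRecorder.AudioSource;
import com.vivoka.vsdk.audio.producers.AudioRecorder;

// VOICE_COMMUNICATION is tuned for voice calls; Android applies echo cancellation when the device supports it.
AudioRecorder audioRecorder = new AudioRecorder(AudioSource.VOICE_COMMUNICATION);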

Once the audio is clean enough, you can use your ASR module's Speech Detected and Silence Detected events to implement barge-in. For instance, you could stop the TTS pipeline when speech is detected.
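The barge-in logic itself can be very small. The sketch below shows one possible shape for it in Python. Everything in it is hypothetical and not part of the VSDK API: the `BargeInController` class, the `stop_tts` callback, and the event names "SpeechDetected" and "SilenceDetected", which merely mirror the events named above. Check your ASR module for the real identifiers.

```python
class BargeInController:
    """Stops TTS playback as soon as the recognizer reports user speech."""

    def __init__(self, stop_tts):
        self._stop_tts = stop_tts   # hypothetical callback that halts the TTS pipeline
        self.tts_playing = False

    def on_tts_started(self):
        self.tts_playing = True

    def on_tts_finished(self):
        self.tts_playing = False

    def on_asr_event(self, event_name):
        # Hypothetical event names; the real ones come from your ASR module.
        if event_name == "SpeechDetected" and self.tts_playing:
            self._stop_tts()
            self.tts_playing = False
        # "SilenceDetected" needs no action here; playback stays stopped
        # until the application starts the next prompt.


# Usage: record what the controller does instead of driving a real TTS engine.
actions = []
controller = BargeInController(stop_tts=lambda: actions.append("tts stopped"))
controller.on_tts_started()
controller.on_asr_event("SpeechDetected")
print(actions)
```

Keeping the playing/not-playing state in one place avoids stopping a pipeline that is already idle when speech is detected outside of a prompt.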

Audio Format

Input: 16 kHz, 16-bit signed PCM, mono or stereo.

Output: 16 kHz, 16-bit signed PCM, mono.

Note that mono or stereo input is defined when configuring a Speech Enhancement technology in VDK-Studio.
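If you feed the enhancer from recorded files, it can help to verify the format first. The following sketch uses only the Python standard library to check a WAV file against the input format listed above. The helper names `check_input_format` and `make_wav` are illustrative and not part of any Vivoka tooling.

```python
import io
import wave

EXPECTED_RATE_HZ = 16000
EXPECTED_SAMPLE_WIDTH_BYTES = 2   # 16-bit samples
ALLOWED_CHANNELS = (1, 2)         # mono or stereo


def check_input_format(wav_source):
    """Return a list of format problems; an empty list means the file matches."""
    problems = []
    with wave.open(wav_source, "rb") as wav:
        if wav.getframerate() != EXPECTED_RATE_HZ:
            problems.append(f"sample rate is {wav.getframerate()} Hz, expected {EXPECTED_RATE_HZ}")
        if wav.getsampwidth() != EXPECTED_SAMPLE_WIDTH_BYTES:
            problems.append(f"sample width is {wav.getsampwidth() * 8} bits, expected 16")
        if wav.getnchannels() not in ALLOWED_CHANNELS:
            problems.append(f"{wav.getnchannels()} channels, expected 1 or 2")
    return problems


def make_wav(rate_hz, channels, frames=160):
    """Build a silent 16-bit PCM WAV in memory, for demonstration only."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)
        wav.setframerate(rate_hz)
        wav.writeframes(b"\x00\x00" * channels * frames)
    buffer.seek(0)
    return buffer


print(check_input_format(make_wav(16000, 1)))   # matching file
print(check_input_format(make_wav(44100, 2)))   # wrong sample rate
```

`check_input_format` also accepts a file path, since `wave.open` takes either a path or a file-like object.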

Sample project

A sample project showing Speech Enhancement with VDK Service is available in C# and Python.

Python

1. Download and extract the zip below.
2. Go to the project directory.
3. Create and activate a virtual environment (see the Python venv documentation).
4. Install the project: pip install -e .
5. Run the script: vdk-enhancement --help

If you see the list of options, you can start your configured VDK Service and interact with it using those options. For example, vdk-enhancement --list lists the available enhancers.

VdkServiceSample-SpeechEnhancement (python).zip

C#

1. Download and extract the zip below.
2. Open the project solution (.sln).
3. Set SpeechEnhancement as the startup project.
4. Build and run the project with the argument --help

If you see the list of options, you can start your configured VDK Service and interact with it using those options. For example, --list lists the available enhancers.

VdkServiceSample-SpeechEnhancement (C#).zip