
Speech Enhancement

Introduction

Speech enhancement improves the quality of audio captured from the microphone by reducing noise, removing artifacts, and enhancing clarity before the audio is sent to ASR, saved to a file, or forwarded elsewhere.

This makes it especially useful when you want to improve speech recognition accuracy.

You can configure your Speech Enhancer using VDK-Studio. No single configuration fits all use cases, but you can start from one of the available templates, choosing the one that best matches your needs.

Format

Input: 16 kHz, 16-bit signed PCM, mono or stereo.

Reference: 16 kHz, 16-bit signed PCM, mono.

Output: 16 kHz, 16-bit signed PCM, mono.

Note that whether the input is mono or stereo is defined when you configure a Speech Enhancement technology in VDK-Studio.
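When sizing buffers, it helps to know how many bytes a given duration occupies in this format. The Java sketch below uses only the sample rate, sample width and channel count listed above. The PcmFormat class and the 20 ms duration used in the usage note are illustrative assumptions, not part of VDK Service.

```java
// Sketch: byte sizes for 16 kHz, 16-bit signed PCM audio.
// PcmFormat is a hypothetical helper, not a VDK Service class.
final class PcmFormat {
    static final int SAMPLE_RATE_HZ = 16_000;
    static final int BYTES_PER_SAMPLE = 2; // 16-bit signed samples

    // Bytes needed to hold `durationMs` milliseconds of audio with `channels` channels.
    static int bytesFor(int durationMs, int channels) {
        if (durationMs < 0 || channels < 1) {
            throw new IllegalArgumentException("invalid duration or channel count");
        }
        // samples per channel = sample rate * duration in seconds
        long samplesPerChannel = (long) SAMPLE_RATE_HZ * durationMs / 1000;
        return Math.toIntExact(samplesPerChannel * BYTES_PER_SAMPLE * channels);
    }

    // Size of one frame (one sample for every channel), useful to keep chunks aligned.
    static int frameSize(int channels) {
        return BYTES_PER_SAMPLE * channels;
    }
}
```

With these numbers, 20 ms of mono audio is 640 bytes and 20 ms of stereo input is 1,280 bytes. The reference and output streams are mono, so they use the mono figures.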

Examples

The available routes are listed in the Speech Enhancement section of the REST API reference.

Enhancement

Before starting an enhancement, we can retrieve the list of available enhancers to make sure the enhancer we configured in VDK-Studio is available.

HTTP
[GET] /speech-enhancement/enhancers
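The GET route above can be called with the standard java.net.http client. This is a minimal sketch: the base URL is a hypothetical placeholder for wherever your VDK Service listens, and the response body is returned raw because its exact JSON shape is described in the REST API reference rather than here.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.List;

// Sketch: list the enhancers exposed by a running VDK Service.
final class EnhancerListing {
    static final String BASE_URL = "http://localhost:8080"; // hypothetical address

    // Builds the GET request for the enhancers route.
    static HttpRequest listEnhancersRequest(String baseUrl) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/speech-enhancement/enhancers"))
                .GET()
                .build();
    }

    // Sends the request and returns the raw JSON body (not parsed here).
    static String fetchEnhancers(HttpClient client, String baseUrl) throws Exception {
        HttpResponse<String> response =
                client.send(listEnhancersRequest(baseUrl), HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() / 100 != 2) {
            throw new IllegalStateException("unexpected status " + response.statusCode());
        }
        return response.body();
    }

    // After extracting the names from the response, check that the
    // enhancer configured in VDK-Studio is among them.
    static boolean isAvailable(List<String> enhancerNames, String configuredName) {
        return enhancerNames.contains(configuredName);
    }
}
```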

Then we can start the enhancement with the following route, passing the name of our enhancer in the request body:

HTTP
[POST] /speech-enhancement/enhance
JSON
{
  "speech_enhancer": "my_enhancer"
}
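The same request can be built in Java as shown in the sketch below. It only constructs the POST request with the JSON body above; "my_enhancer" is the placeholder name from the example, the base URL passed in is up to you, and the enhancer name is assumed to need no JSON escaping. Send it with java.net.http.HttpClient and read the token from the response as described in the REST API reference.

```java
import java.net.URI;
import java.net.http.HttpRequest;

// Sketch: build the POST request that starts an enhancement session.
final class EnhanceRequest {
    static HttpRequest build(String baseUrl, String enhancerName) {
        // Minimal JSON body; assumes the enhancer name contains no characters needing escaping.
        String body = "{\"speech_enhancer\": \"" + enhancerName + "\"}";
        return HttpRequest.newBuilder(URI.create(baseUrl + "/speech-enhancement/enhance"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }
}
```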

If the request is successful, we receive a token and can then move on to the WebSocket API.

Once the socket is open, you can send and receive audio through it, as described in the SEND-Audio-Chunk-Message section of the WebSocket API reference.
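Audio is usually streamed in small chunks rather than as one large buffer. The sketch below splits a PCM buffer into fixed-size chunks that stay aligned on whole frames. The chunk size is an assumption left to the caller, and how each chunk is wrapped in a WebSocket message is defined by the WebSocket API reference, so it is not shown.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch: split PCM audio into frame-aligned chunks before streaming it.
final class AudioChunker {
    static List<byte[]> split(byte[] pcm, int chunkBytes, int frameSize) {
        if (chunkBytes <= 0 || frameSize <= 0 || chunkBytes % frameSize != 0) {
            throw new IllegalArgumentException("chunk size must be a positive multiple of the frame size");
        }
        if (pcm.length % frameSize != 0) {
            throw new IllegalArgumentException("buffer does not contain whole frames");
        }
        List<byte[]> chunks = new ArrayList<>();
        for (int offset = 0; offset < pcm.length; offset += chunkBytes) {
            int end = Math.min(offset + chunkBytes, pcm.length);
            chunks.add(Arrays.copyOfRange(pcm, offset, end)); // the last chunk may be shorter
        }
        return chunks;
    }
}
```

For 16-bit mono audio the frame size is 2 bytes, so a 640-byte chunk holds 20 ms of audio.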

Acoustic Echo Cancellation (AEC)

Acoustic Echo Cancellation (AEC) is an audio processing technique used to remove playback echo from the microphone signal when a device is simultaneously playing and recording audio.

Without AEC, the system may incorrectly interpret its own playback audio as user speech, which can negatively affect barge-in performance.
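To make the idea concrete, the sketch below implements a toy echo canceller using the normalized LMS (NLMS) adaptive filter, a classic AEC building block. It is not VDK's implementation: it only shows how an estimate of the echo of the reference (playback) signal can be learned and subtracted from the microphone signal. The class name, tap count and step size are illustrative choices.

```java
// Toy NLMS echo canceller, for illustration only (not VDK's algorithm).
final class NlmsEchoCanceller {
    private static final double EPS = 1e-8; // avoids division by zero on silent reference

    private final double[] weights; // current estimate of the echo path
    private final double[] history; // most recent reference samples, newest at index 0
    private final double stepSize;  // NLMS adaptation rate, typically between 0 and 1

    NlmsEchoCanceller(int taps, double stepSize) {
        this.weights = new double[taps];
        this.history = new double[taps];
        this.stepSize = stepSize;
    }

    // Processes one microphone/reference sample pair and returns the echo-cancelled sample.
    double process(double mic, double reference) {
        // shift the reference history by one sample and insert the newest one
        System.arraycopy(history, 0, history, 1, history.length - 1);
        history[0] = reference;

        double echoEstimate = 0.0;
        double energy = EPS;
        for (int i = 0; i < weights.length; i++) {
            echoEstimate += weights[i] * history[i];
            energy += history[i] * history[i];
        }
        double error = mic - echoEstimate; // what remains after removing the estimated echo

        // normalized LMS update of the echo-path estimate
        double gain = stepSize * error / energy;
        for (int i = 0; i < weights.length; i++) {
            weights[i] += gain * history[i];
        }
        return error;
    }
}
```

In a real system the microphone also captures the user's voice, which is what should remain after subtraction. Production AEC adds double-talk detection and nonlinear processing, which this sketch omits.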

Android

AEC is already available on Android. To enable it, set the audio source to VOICE_COMMUNICATION when creating your AudioRecorder:

JAVA
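import android.media.MediaRecorder.AudioSource; // assumed: the Android audio source constants AudioRecorder expects
import com.vivoka.vsdk.audio.producers.AudioRecorder;

// VOICE_COMMUNICATION lets Android apply its voice processing, including echo cancellation when available
AudioRecorder audioRecorder = new AudioRecorder(AudioSource.VOICE_COMMUNICATION);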

Once your audio is clean enough, you can use the Speech Detected and Silence Detected events of your ASR module to implement barge-in, for instance by stopping the TTS pipeline when the user starts speaking.
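The barge-in logic itself can stay small. In the sketch below, the class, its methods and the Runnable used to stop TTS are all hypothetical: connect onSpeechDetected and onSilenceDetected to the actual Speech Detected and Silence Detected events of your ASR module, and stopTts to your own TTS pipeline.

```java
// Sketch of barge-in handling driven by ASR events (hypothetical names, not a VSDK API).
final class BargeInController {
    private final Runnable stopTts;
    private boolean ttsPlaying;
    private boolean userSpeaking;

    BargeInController(Runnable stopTts) {
        this.stopTts = stopTts;
    }

    void onTtsStarted() { ttsPlaying = true; }

    void onTtsFinished() { ttsPlaying = false; }

    // Call when the ASR reports Speech Detected.
    void onSpeechDetected() {
        userSpeaking = true;
        if (ttsPlaying) {
            stopTts.run();      // the user interrupts: stop playback
            ttsPlaying = false;
        }
    }

    // Call when the ASR reports Silence Detected.
    void onSilenceDetected() { userSpeaking = false; }

    boolean isUserSpeaking() { return userSpeaking; }
}
```

This only works reliably with AEC in place: otherwise the TTS playback itself could raise Speech Detected and interrupt the prompt.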

Other platforms (Windows, Linux)

AEC uses the same API route as regular enhancement. The difference lies in the enhancer configuration (which must have multi_mic enabled) and in the audio streams that are sent.

HTTP
[POST] /speech-enhancement/enhance

See the Route-Enhance section of the WebSocket API reference.

When performing AEC, two audio streams must be sent through the socket: the primary input stream, which needs to be cleaned, and a reference stream (the playback audio), which is removed from the input signal. The reference stream is identified using the is_reference parameter when sending the audio.
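One way to keep both streams in step is to send, for each time slice, the input chunk followed by the matching reference chunk. The sketch below only builds that schedule. The Outgoing record and its isReference flag merely mirror the documented is_reference parameter, and the input-then-reference ordering is an assumption; the actual message format comes from the WebSocket API reference. Per the Format section, a stereo input chunk is twice as large in bytes as a mono reference chunk covering the same duration.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: interleave primary (input) and reference chunks covering the same time spans.
final class AecStreamScheduler {
    // Hypothetical outgoing item; isReference mirrors the is_reference parameter.
    record Outgoing(byte[] pcm, boolean isReference) {}

    static List<Outgoing> interleave(List<byte[]> inputChunks, List<byte[]> referenceChunks) {
        if (inputChunks.size() != referenceChunks.size()) {
            throw new IllegalArgumentException("both streams must have the same number of chunks");
        }
        List<Outgoing> messages = new ArrayList<>();
        for (int i = 0; i < inputChunks.size(); i++) {
            messages.add(new Outgoing(inputChunks.get(i), false));    // audio to be cleaned
            messages.add(new Outgoing(referenceChunks.get(i), true)); // playback to remove
        }
        return messages;
    }
}
```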

Sample project

Sample projects showing Speech Enhancement usage with VDK Service are available in Python and C#.

Python
  • Download and extract the zip below

  • Head inside the project

  • (Optional) Create and activate a virtual environment (Python Venv documentation)

  • Install the project: pip install .

  • Run the script: vdk-enhancement --help

If you see the list of options, you can start your configured VDK Service and interact with it using the available options. For example, vdk-enhancement --list lists the available enhancers.

VdkServiceSample-SpeechEnhancement-Python-v1.0.0.zip

C#
  • Download and extract the zip below

  • Open the project solution (.sln)

  • Build and run the project with the argument “--help”

If you see the list of options, you can start your configured VDK Service and interact with it using the available options. For example, --list lists the available enhancers.

VdkServiceSample-SpeechEnhancement-CSharp-v1.0.0.zip

Assets

Along with these projects, we provide assets containing audio files that have been tested for AEC and noise reduction.

vdk-service-sample-assets-1.0.0.zip
