Speech Enhancement - Android
Introduction
Speech enhancement improves the quality of audio captured from the microphone by reducing noise, removing artifacts, and enhancing clarity before the audio is sent to ASR, saved to a file, or forwarded elsewhere.
This makes it especially useful when you want to improve speech recognition accuracy.
You can configure your Speech Enhancer using VDK-Studio. No single configuration fits all use cases, but you can start from one of the available templates and choose the one that best matches your needs.
In a pipeline, the Speech Enhancer sits between the Producer (e.g., the microphone) and the Consumer. To learn more about pipelines, see the Get Started guide.
Acoustic Echo Cancellation (AEC)
This component prevents the device’s own playback audio (such as TTS output) from being captured by the microphone and mistakenly interpreted as user speech. Android already provides AEC through the system’s audio framework. To enable it, use the audio source AudioSource.VOICE_COMMUNICATION, which activates the built-in echo cancellation. Starting with VSDK 6.0.0, this is the default audio source. If you want raw microphone input instead, you can still use AudioSource.MIC.
Audio Format
Input: 16 kHz, 16-bit signed PCM, mono or stereo.
Output: 16 kHz, 16-bit signed PCM, mono.
Whether the input is mono or stereo is set when you configure the Speech Enhancement technology in VDK-Studio.
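The formats above translate directly into byte rates, which helps when sizing buffers or estimating file sizes. The plain-Java sketch below only does that arithmetic; the class name PcmFormat is hypothetical and not part of VSDK.

```java
// Sketch: byte-rate arithmetic for 16 kHz, 16-bit signed PCM.
// PcmFormat is a hypothetical helper, not a VSDK class.
final class PcmFormat {
    static final int SAMPLE_RATE_HZ = 16_000; // 16 kHz, for both input and output
    static final int BYTES_PER_SAMPLE = 2;    // 16-bit signed PCM

    private PcmFormat() {}

    /** Bytes in one frame: one 16-bit sample per channel. */
    static int frameSize(int channels) {
        return BYTES_PER_SAMPLE * channels;
    }

    /** Bytes produced by one second of audio. */
    static int bytesPerSecond(int channels) {
        return SAMPLE_RATE_HZ * frameSize(channels);
    }

    /** Bytes needed for a chunk of the given duration in milliseconds. */
    static int bytesForMillis(int channels, int millis) {
        // 16 kHz is exactly 16 frames per millisecond, so no rounding occurs.
        return SAMPLE_RATE_HZ / 1000 * millis * frameSize(channels);
    }
}
```

For example, one second of stereo input is 64,000 bytes, one second of mono output is 32,000 bytes, and a 10 ms mono chunk is 320 bytes.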
Getting Started
Before you begin, make sure you’ve completed all the necessary preparation steps.
There are two ways to prepare your Android project for Speech Enhancement:
Using sample code
Starting from scratch
From Sample Code
Start by downloading the sample package from the Vivoka Console:
Open the Vivoka Console and navigate to your Project Settings.
Go to the Downloads section.
In the search bar, enter the package name listed below:
📦 sample-speech-enhancement-x.x.x-deps-vsdk-x.x.x.zip
Once downloaded, you’ll have a fully functional project that you can test, customize, and extend to fit your specific use case.
From Scratch
Before proceeding, make sure you’ve completed the following steps:
1. Prepare your VDK Studio project
Create a new project in VDK Studio
Add the Speech Enhancement technology, then add an enhancer.
Export the project to generate the required assets and configuration
2. Set up your Android project
Install the necessary libraries (vsdk-s2c-x.x.x-android-deps-vsdk-x.x.x.zip)
Initialize VSDK in your application code
These steps are explained in more detail in the Integrating Vsdk Libraries guide.
Start Speech Enhancement
1. Initialize Engine
Start by initializing the VSDK, followed by the Speech Enhancement engine:
import com.vivoka.vsdk.Vsdk;
Vsdk.init(context, "config/vsdk.json", vsdkSuccess -> {
    if (!vsdkSuccess) {
        return; // VSDK initialization failed
    }
    com.vivoka.vsdk.speechenhancement.s2c.Engine.getInstance().init(engineSuccess -> {
        if (!engineSuccess) {
            return; // Speech Enhancement engine initialization failed
        }
        // The Speech Enhancement engine is now ready!
    });
});
You cannot create two instances of the same engine.
If you call Engine.getInstance() multiple times, you will receive the same singleton instance.
2. Build Pipeline
For the sake of this example, we’ll implement a simple pipeline that records audio from the microphone, applies speech enhancement, and saves the processed audio to a file:
import android.media.AudioFormat;
import com.vivoka.vsdk.audio.Pipeline;
import com.vivoka.vsdk.audio.consumers.File;
import com.vivoka.vsdk.audio.modifiers.ChannelExtractor;
import com.vivoka.vsdk.audio.producers.AudioRecorder;
import com.vivoka.vsdk.Exception;
// Create the audio recorder (producer): 16 kHz, stereo capture
AudioRecorder audioRecorder = new AudioRecorder(16000, AudioFormat.CHANNEL_IN_STEREO, 1024);
// Create the speech enhancer (modifier)
SpeechEnhancer enhancer = com.vivoka.vsdk.speechenhancement.s2c.Engine.getInstance().getSpeechEnhancer("enhancer-1");
// Create the file (consumer); it is saved in your app's internal storage
File file = new File(getFilesDir().getAbsolutePath() + "/audio.pcm", true);
// Create the pipeline and start it
Pipeline pipeline = new Pipeline();
pipeline.setProducer(audioRecorder);
if (enhancer.getInputChannelCount() == 1) {
    // The enhancer expects mono input (chosen in VDK-Studio):
    // keep only the first channel of the stereo recording.
    pipeline.pushBackModifier(new ChannelExtractor(0));
}
pipeline.pushBackModifier(enhancer);
pipeline.pushBackConsumer(file);
pipeline.start(); // Call pipeline.stop() to end the recording
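In the pipeline above, ChannelExtractor(0) keeps only the first channel when the enhancer expects mono input. Assuming the usual interleaved layout for stereo 16-bit PCM (left, right, left, right, ...), keeping channel 0 means taking every other sample starting at index 0. The plain-Java sketch below illustrates that idea on a short[] buffer; it is not VSDK's ChannelExtractor implementation, and ChannelSplit and extractChannel are hypothetical names.

```java
// Sketch of "keep channel N" for interleaved 16-bit PCM.
// Illustrative only; ChannelSplit is hypothetical, not VSDK's ChannelExtractor.
final class ChannelSplit {
    private ChannelSplit() {}

    /**
     * Returns the samples of one channel from an interleaved buffer laid out as
     * [c0, c1, ..., c0, c1, ...], i.e. one sample per channel in each frame.
     */
    static short[] extractChannel(short[] interleaved, int channels, int channel) {
        if (channels <= 0 || channel < 0 || channel >= channels) {
            throw new IllegalArgumentException("invalid channel " + channel + " of " + channels);
        }
        if (interleaved.length % channels != 0) {
            throw new IllegalArgumentException("buffer does not hold whole frames");
        }
        int frames = interleaved.length / channels;
        short[] out = new short[frames];
        for (int frame = 0; frame < frames; frame++) {
            // Sample for `channel` sits at offset `channel` inside each frame.
            out[frame] = interleaved[frame * channels + channel];
        }
        return out;
    }
}
```

Dropping the second channel halves the data rate: at 16 kHz, 16-bit, stereo input is 64,000 bytes per second and the extracted mono stream is 32,000 bytes per second.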
3. Start/Stop Pipeline
pipeline.start();
pipeline.stop();
pipeline.run();
.start() runs the pipeline in a new thread.
.run() runs the pipeline and waits until it has finished (blocking).
.stop() terminates the pipeline execution.
Once a pipeline has been stopped, you can restart it at any time by simply calling .start() again.
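Once the pipeline is stopped, the File consumer's audio.pcm can be inspected. Assuming that file holds headerless 16-bit little-endian PCM (the .pcm extension suggests no header, and little-endian is the usual layout on Android devices), most audio players will not open it directly. A common workaround is to prepend a standard 44-byte WAV (RIFF) header. The sketch below is plain Java and independent of VSDK; WavHeader, forPcm16, and convert are hypothetical names.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch: wrap headerless 16-bit PCM in a canonical 44-byte WAV (RIFF) header.
// Assumes little-endian samples; WavHeader is hypothetical, not a VSDK class.
final class WavHeader {
    private WavHeader() {}

    static byte[] forPcm16(int sampleRate, int channels, int dataBytes) {
        int blockAlign = channels * 2;          // bytes per frame
        int byteRate = sampleRate * blockAlign; // bytes per second
        ByteBuffer b = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        b.put("RIFF".getBytes(StandardCharsets.US_ASCII));
        b.putInt(36 + dataBytes);               // size of everything after this field
        b.put("WAVE".getBytes(StandardCharsets.US_ASCII));
        b.put("fmt ".getBytes(StandardCharsets.US_ASCII));
        b.putInt(16);                           // fmt chunk size for PCM
        b.putShort((short) 1);                  // audio format 1 = uncompressed PCM
        b.putShort((short) channels);
        b.putInt(sampleRate);
        b.putInt(byteRate);
        b.putShort((short) blockAlign);
        b.putShort((short) 16);                 // bits per sample
        b.put("data".getBytes(StandardCharsets.US_ASCII));
        b.putInt(dataBytes);                    // size of the PCM payload
        return b.array();
    }

    /** Writes header + raw PCM to a new WAV file (reads the whole PCM file into memory). */
    static void convert(Path pcmFile, Path wavFile, int sampleRate, int channels) throws IOException {
        byte[] pcm = Files.readAllBytes(pcmFile);
        try (OutputStream out = Files.newOutputStream(wavFile)) {
            out.write(forPcm16(sampleRate, channels, pcm.length));
            out.write(pcm);
        }
    }
}
```

For the enhanced output described under Audio Format, you would call convert with sampleRate 16000 and channels 1. Note that java.nio.file is only available on Android API level 26 and higher; on older devices the same header bytes can be written with java.io streams instead.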
4. Release the Engine
com.vivoka.vsdk.speechenhancement.s2c.Engine.getInstance().release();