How-to: Set up barge-in
What is barge-in?
Barge-in is the capability of a voice system to detect and process user speech while it is playing audio output, allowing the user to interrupt the system.
Barge-in is a conversational interaction feature. It relies on automatic speech recognition (ASR) and speech events to determine when a user starts speaking. It does not refer to any specific audio processing technique.
Since barge-in relies on speech event detection, its performance is strongly influenced by speech enhancement and noise reduction. However, speech enhancement alone may not be sufficient to filter out Text-to-Speech (TTS) audio that is being played back and simultaneously captured by the microphone. To ensure reliable barge-in across diverse acoustic environments, we provide Acoustic Echo Cancellation (AEC).
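To illustrate why echo cancellation needs the playback signal as a reference, here is a minimal sketch of a normalized least-mean-squares (NLMS) adaptive filter, a common textbook approach to AEC. It is not the AEC implementation referred to above. All function names, the filter length, and the step size are assumptions chosen for illustration, and the "room" is a toy simulation.

```python
import random


def white_noise(n, seed=0):
    """Generate a reproducible stand-in for the TTS playback (reference) signal."""
    rng = random.Random(seed)
    return [rng.uniform(-1.0, 1.0) for _ in range(n)]


def simulate_echo(ref, path):
    """Toy room model: the microphone hears the reference through a short FIR path."""
    return [
        sum(path[k] * ref[n - k] for k in range(len(path)) if n - k >= 0)
        for n in range(len(ref))
    ]


def nlms_echo_cancel(mic, ref, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the playback echo from the microphone signal."""
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    residual = []
    for mic_sample, ref_sample in zip(mic, ref):
        history = [ref_sample] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = mic_sample - echo_estimate  # what remains once the echo is removed
        power = sum(x * x for x in history) + eps
        step = mu * error / power  # normalizing by power keeps the update stable
        weights = [w + step * x for w, x in zip(weights, history)]
        residual.append(error)
    return residual
```

In this sketch the residual is what would be passed on to speech detection: once the filter has adapted, the played-back audio is largely removed, so only the user's own speech should trigger a speech event.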
Setting Up
(Optional but recommended) Apply AEC and noise reduction to the captured audio.
Set up real-time speech recognition to capture speech events.
When a speech detection event occurs, interrupt the system’s audio playback.
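The steps above can be sketched as a small controller that listens for speech events and stops playback. This is a minimal illustration under stated assumptions: `Player`, `BargeInController`, and the `{"type": "speech_start"}` event shape are hypothetical placeholders, not the API of any particular ASR or TTS engine.

```python
import threading


class Player:
    """Hypothetical stand-in for a TTS audio player."""

    def __init__(self):
        self._playing = threading.Event()

    def play(self, text):
        # A real player would start streaming synthesized audio here.
        self._playing.set()

    def stop(self):
        # A real player would also flush any buffered audio.
        self._playing.clear()

    @property
    def is_playing(self):
        return self._playing.is_set()


class BargeInController:
    """Stops playback when the recognizer reports that the user started speaking."""

    def __init__(self, player):
        self.player = player
        self.interruptions = 0

    def on_speech_event(self, event):
        # `event` is a hypothetical dict; real ASR engines define their own event types.
        if event.get("type") == "speech_start" and self.player.is_playing:
            self.player.stop()
            self.interruptions += 1
            return True
        return False
```

In use, the recognizer's event callback would be wired to `on_speech_event`, and the audio fed to the recognizer would ideally be the echo-cancelled, denoised signal, so that the system's own playback does not trigger an interruption.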