Speech To Text

Speech-to-Text is a voice technology, based on deep learning language models, that transforms audio signals into transcribed text. The results are determined statistically, favoring the sentence structures and words that occur most frequently in the identified context.

This technology comes in two flavors: Continuous and Dictation. Both can recognize almost any word in the language's dictionary, but Dictation goes a step further by handling punctuation and other linguistic nuances.

While both let us type with our voice, each has its own set of use cases. Dictation is well suited to tasks such as taking notes and writing emails or documents, where punctuation and other linguistic nuances matter. Continuous is better suited to scenarios where punctuation and linguistic nuances are not critical, such as when it is coupled with an NLU system. Let's see how they respond to the same input speech.

Continuous:

hey remember to water the plants today

Dictation:

Hey, remember to water the plants today.
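
To make the difference between the two outputs concrete, here is a minimal Python sketch that reduces Dictation-style text to the Continuous-style form shown above, as one might do before passing text to an NLU system. The function name to_continuous_style is hypothetical and is not part of the product; the sketch only handles ASCII punctuation and does not reflect how the recognizer itself works.

```python
import string


def to_continuous_style(dictation_text: str) -> str:
    """Approximate Continuous-style output from Dictation-style text.

    Illustrative helper only (not a product API): it removes ASCII
    punctuation, lowercases the text, and collapses whitespace.
    """
    no_punctuation = dictation_text.translate(
        str.maketrans("", "", string.punctuation)
    )
    return " ".join(no_punctuation.lower().split())


if __name__ == "__main__":
    print(to_continuous_style("Hey, remember to water the plants today."))
    # prints: hey remember to water the plants today
```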

https://www.youtube.com/watch?v=-J-70KG5TDA&list=PLxpkg3kmxJgii81jzA9lgohtwcxSm0SHB

Main Screen

Audio Recording. Starts recognition using the microphone as input.
Audio File. Starts recognition using an audio file as input.
Result Panel. Displays the previous records and their hypotheses. The number of hypotheses can be set in the model settings dialog (accessible through the Modify button). Records can be removed individually with the Delete button.
Hypothesis Explorer. Displays the selected hypothesis; its text can be selected and copied.

Please note that the speech-to-text widget is used for both types of recognition: continuous and dictation.

Create a model

Go to the Playground.
In the Voice recognition card, click on Add a Model.
In the wizard that opens, select either Continuous or Dictation.
Next, choose a name and a language for your model.
Finish by clicking on Add to project.

Test

Click on the model inside the Voice recognition card to open the widget.
You can now import an audio file or record audio to have it fully transcribed.
To change the number of results, open the ASR settings, go to the advanced settings, and add the key LH_SEARCH_PARAM_MAXNBEST.
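
As a rough illustration of what limiting the number of results means, the following Python sketch keeps only the best-scoring hypotheses from a list. Everything in it is an assumption made for illustration: the Hypothesis class, the confidence field, and the sample values are invented, and max_n_best merely stands in for the value given to LH_SEARCH_PARAM_MAXNBEST. It does not show the product's API or how the engine ranks hypotheses internally.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    """Hypothetical container for one recognition hypothesis."""
    text: str
    confidence: float  # assumed score; higher means more likely


def n_best(hypotheses: List[Hypothesis], max_n_best: int) -> List[Hypothesis]:
    """Return at most max_n_best hypotheses, best first."""
    if max_n_best < 1:
        raise ValueError("max_n_best must be at least 1")
    ranked = sorted(hypotheses, key=lambda h: h.confidence, reverse=True)
    return ranked[:max_n_best]


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    candidates = [
        Hypothesis("hey remember to water the plans today", 0.71),
        Hypothesis("hey remember to water the plants today", 0.93),
        Hypothesis("hey remember to order the plants today", 0.64),
    ]
    for hypothesis in n_best(candidates, max_n_best=2):
        print(hypothesis.text)
```

With max_n_best set to 2, only the two highest-scoring candidates are kept, which mirrors how the Result Panel would list fewer hypotheses per record when the limit is lowered.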