VASR - C++
VASR is the name of Vivoka's own ASR engine. It is provided through the VSDK framework under the designation VSDK-VASR.
It is designed to run completely offline, even on small devices such as the Raspberry Pi 3 / 4.
Supported languages
Currently, 5 languages are supported:
🇫🇷 fra-FR → French (France)
🇺🇸 eng-US → English (United States)
🇮🇹 ita-IT → Italian (Italy)
🇪🇸 spa-ES → Spanish (Spain)
🇩🇪 deu-DE → German (Germany)
Supported features
Here is a list of all the features currently supported by the engine:
Feature name | Description |
---|---|
Grammars | Ability to compile / load BNF formatted grammars |
Dynamic Content | A.k.a. dynamic slots. Ability to declare a grammar rule whose values are provided later, at runtime |
Custom phonetics | Ability to specify a custom phonetic transcription for any word or expression in a grammar |
Custom phonetics in dynamic content | Ability to specify one or more custom phonetic transcriptions along with the values of a particular slot |
Tag annotations | Ability to specify tags in the grammar, used to easily retrieve information from the result |
Intermediate results | Ability to return non-final results while the user is still speaking |
VAD (Voice Activity Detection) | Ability to automatically detect when a user speaks |
Confidence score | Ability to return a confidence score with the result |
Event detection | Ability to send feedback when speech / silence is detected |
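To give a feel for the grammar-related features above, here is a minimal sketch of a command grammar in a generic BNF-style notation. This is illustrative only: the rule names are invented, and the exact syntax accepted by VASR (notably for tags and dynamic slots) is documented on the Compiled grammar page.

```bnf
<main>   : <action> the <device> ;
<action> : turn on | turn off ;
<device> : light | fan | heater ;
```

A grammar like this constrains recognition to a closed set of phrases, which is what makes fully offline, small-footprint recognition practical.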
Engine configuration
Like any other engine in VSDK, this one has its own configuration, which must be provided as a JSON file when instantiating the engine.
Here is a sample of the configuration for VASR:
{
"version": "2.0",
"vasr": {
"paths": {
"data_root": "../data"
},
"asr": {
"recognizers": {
"rec": {
"acmods": ["eng-US.vam"]
}
},
"models": {
"cmd": {
"type": "static",
"file": "eng-US.vgg"
}
}
}
}
}
This configuration represents the minimum required to operate the engine. The complete configuration, with a description of every field, is available on the Configuration file page.
Required resources
To run correctly, the engine requires at least 2 files: an acoustic model and a compiled grammar.
The acoustic model file uses the .vam
extension (Vivoka Acoustic Model) and the compiled grammar uses the .vgg
extension (Vivoka Grammar Graph). For more details about these files, see the Acoustic model file page for the acoustic model and the Compiled grammar page for the compiled grammar.
Sample code
This sample code assumes that the content of the vsdk.json
configuration file is similar to the one above.
#include <vasr/Engine.hpp>
#include <vasr/GrammarComposer.hpp>
#include <vasr/LanguageModel.hpp>
#include <vasr/Recognizer.hpp>

#include <fmt/format.h>

#include <cstdlib>
#include <vector>

std::vector<float> audioData(); // Provides the audio samples to recognize
int main() try
{
Vasr::Engine engine("vsdk.json");
auto & rec = engine.recognizer("rec"); // String taken from the vsdk.json file at vasr/asr/recognizers
auto & grm = engine.grammarComposer("cmd"); // String taken from the vsdk.json file at vasr/asr/models
if (grm.hasSlots())
{
// Fill slots
}
auto model = grm.compose();
rec.setModels({ model });
rec.installResultCallback([](Vasr::AsrResult result)
{
if (result.isFinal())
{
// Process final result
}
else
{
// Process intermediate result
}
});
rec.processAudioBuffer(audioData(), true);
}
catch (std::exception const & e)
{
fmt::print(stderr, "A fatal error occurred:\n");
Vsdk::printExceptionStack(e);
return EXIT_FAILURE;
}