VASR - C++
VASR is the name of Vivoka's own ASR engine. It is provided through the VSDK framework under the designation VSDK-VASR.
It is designed to run completely offline, even on small devices such as the Raspberry Pi 3 / 4.
Supported languages
Currently, 5 languages are supported:
🇫🇷 fra-FR → French (France)
🇺🇸 eng-US → English (United States)
🇮🇹 ita-IT → Italian (Italy)
🇪🇸 spa-ES → Spanish (Spain)
🇩🇪 deu-DE → German (Germany)
Supported features
Here is a list of all the features currently supported by the engine:
Feature name | Description |
---|---|
Grammars | Ability to compile / load BNF formatted grammars |
Dynamic Content | A.k.a. dynamic slots. Ability to declare a grammar rule whose values are provided later, at runtime |
Custom phonetics | Ability to specify a custom phonetic transcription for any word or expression in a grammar |
Custom phonetics in dynamic content | Ability to specify one or more custom phonetic transcriptions along with the values of a particular slot |
Tag annotations | Ability to specify tags in the grammar, used to easily retrieve information from the result |
Intermediate results | Ability to return non-final results while the user is still speaking |
VAD (Voice Activity Detection) | Ability to automatically detect when a user speaks |
Confidence score | Ability to return a confidence score with the result |
Event detection | Ability to send feedback when speech / silence is detected |
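To give a feel for the grammar-related features above, here is a minimal sketch of a command grammar in a generic BNF-style notation. This is illustrative only: the rule names are invented, and the exact syntax accepted by VASR (notably for tags and dynamic slots) is documented on the Compiled grammar page.

```bnf
<main>   : <action> the <device> ;
<action> : turn on | turn off ;
<device> : light | fan | heater ;
```

A grammar like this constrains recognition to a closed set of phrases, which is what makes fully offline, small-footprint recognition practical.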
Engine configuration
Like any other engine in VSDK, this one has its own configuration, which must be provided as a JSON file when instantiating the engine.
Here is a sample of the configuration for VASR:
{
"version": "2.0",
"vasr": {
"paths": {
"data_root": "../data"
},
"asr": {
"recognizers": {
"rec": {
"acmods": ["eng-US.vam"]
}
},
"models": {
"cmd": {
"type": "static",
"file": "eng-US.vgg"
}
}
}
}
}
This configuration represents the minimum required to operate the engine. The complete configuration, with a description of every field, is available on the Configuration file page.
Required resources
To run correctly, the engine requires at least 2 files: an acoustic model and a compiled grammar.
The acoustic model file uses the .vam
extension (Vivoka Acoustic Model) and the compiled grammar uses the .vgg
extension (Vivoka Grammar Graph). For more details about these files, see the Acoustic model file page for the acoustic model and the Compiled grammar page for the compiled grammar.
Sample code
This sample code assumes that the content of the vsdk.json
configuration file is similar to the one above.
#include <vasr/Engine.hpp>
#include <vasr/GrammarComposer.hpp>
#include <vasr/LanguageModel.hpp>
#include <vasr/Recognizer.hpp>

#include <fmt/format.h>

#include <cstdlib>
#include <vector>

std::vector<float> audioData(); // Provides the audio samples to recognize
int main() try
{
Vasr::Engine engine("vsdk.json");
auto & rec = engine.recognizer("rec"); // String taken from the vsdk.json file at vasr/asr/recognizers
auto & grm = engine.grammarComposer("cmd"); // String taken from the vsdk.json file at vasr/asr/models
if (grm.hasSlots())
{
// Fill slots
}
auto model = grm.compose();
rec.setModels({ model });
rec.installResultCallback([](Vasr::AsrResult result)
{
if (result.isFinal())
{
// Process final result
}
else
{
// Process intermediate result
}
});
rec.processAudioBuffer(audioData(), true);
}
catch (std::exception const & e)
{
fmt::print(stderr, "A fatal error occurred:\n");
Vsdk::printExceptionStack(e);
return EXIT_FAILURE;
}