
VASR - C++

VASR is the name of Vivoka's own ASR engine. It is provided through the VSDK framework under the designation VSDK-VASR.
It is designed to run completely offline and on small devices (such as a Raspberry Pi 3 or 4).

Supported languages

Currently 5 languages are supported:

  • 🇫🇷 fra-FR → French (France)

  • 🇺🇸 eng-US → English (United States)

  • 🇮🇹 ita-IT → Italian (Italy)

  • 🇪🇸 spa-ES → Spanish (Spain)

  • 🇩🇪 deu-DE → German (Germany)

Supported features

Here is a list of all the features currently supported by the engine:

  • Grammars: ability to compile and load BNF-formatted grammars

  • Dynamic content: a.k.a. dynamic slots; ability to declare a grammar rule whose values are provided later, at runtime

  • Custom phonetics: ability to specify any phonetic transcription for a given word or expression in a grammar

  • Custom phonetics in dynamic content: ability to attach one or more custom phonetic transcriptions to the values of a particular slot

  • Tag annotations: ability to place tags in a grammar so information can easily be retrieved from the result

  • Intermediate results: ability to return non-final results while the user is still speaking

  • VAD (Voice Activity Detection): ability to automatically detect when a user speaks

  • Confidence score: ability to return a confidence score with each result

  • Event detection: ability to send feedback when speech or silence is detected

Engine configuration

Like any other engine in VSDK, this one has its own configuration, which must be provided as a JSON file when instantiating the engine.
Here is a sample configuration for VASR:

JSON
{
  "version": "2.0",
  "vasr": {
    "paths": {
      "data_root": "../data"
    },  
    "asr": {
      "recognizers": {
        "rec": {
          "acmods": ["eng-US.vam"]
        }
      },  
      "models": {
        "cmd": {
          "type": "static",
          "file": "eng-US.vgg"
        }
      }   
    }   
  }
}

This configuration represents the minimum required to operate the engine. The complete configuration, with a description of every field, is documented on the Configuration file page.

Required resources

To run correctly, the engine requires at least two files: an acoustic model and a compiled grammar.

The acoustic model file ends with a .vam extension (Vivoka Acoustic Model) and the compiled grammar ends with .vgg (Vivoka Grammar Graph).

Sample code

This sample code assumes that the content of the vsdk.json configuration file is similar to the one shown above.

CPP
#include <vasr/Engine.hpp>
#include <vasr/GrammarComposer.hpp>
#include <vasr/LanguageModel.hpp>
#include <vasr/Recognizer.hpp>

#include <cstdlib>    // EXIT_FAILURE
#include <fmt/core.h> // fmt::print
 
std::vector<float> audioData();
 
int main() try
{
    Vasr::Engine engine("vsdk.json");
 
    auto & recognizer = engine.recognizer("rec");  // Name taken from vsdk.json, under vasr/asr/recognizers
    auto & grm = engine.grammarComposer("cmd");    // Name taken from vsdk.json, under vasr/asr/models
    if (grm.hasSlots())
    {
        // Fill slots
    }
 
    auto model = grm.compose();
    recognizer.setModels({ model });
 
    recognizer.installResultCallback([](Vasr::AsrResult result)
    {
        if (result.isFinal())
        {
            // Process final result
        }
        else
        {
            // Process intermediate result
        }
    });
 
    recognizer.processAudioBuffer(audioData(), true);
}
catch (std::exception const & e)
{
    fmt::print(stderr, "A fatal error occurred:\n");
    Vsdk::printExceptionStack(e);
    return EXIT_FAILURE;
}
