
Voice Recognition - Android

VDK features two different ASR SDKs: vsdk-csdk and vsdk-vasr.

Basics

You will need to manipulate two concepts: recognizers and models. Both need to be configured, but first let's define each one.

Models are fed to the recognizer and describe the range of words and utterances that can be recognized. Depending on the SDK, they are either shipped pre-compiled (as with free-speech models) or compiled from a grammar you wrote beforehand in VDK Studio.

There are 3 types of models:

| Type | Description |
| --- | --- |
| static | Static models embed all possible vocabulary inside a single file or folder. |
| dynamic | Dynamic models have “holes” where you can plug new vocabulary in at runtime. These need to be prepared and compiled at runtime before being installed on a recognizer. |
| free-speech | Free-speech models are very large vocabulary static models. They often require additional files and are not supported by all engines. |

Configuration

Each engine has its own configuration quirks and tweaks, but here is a common (though incomplete) pattern using vsdk-csdk, which supports all 3 types of models:

JSON
{
    "version": "2.0",
    "csdk": {
        "paths": {
            "data_root": "../data"
        },
        "asr": {
            "recognizers": {
                "rec": { ... }
            },
            "models": {
                "static_example": {
                    "type": "static",
                    "file": "<model_name>.fcf"
                },
                "dynamic_example": {
                    "type": "dynamic",
                    "file": "<base_model_name>.fcf",
                    "slots": {
                        "firstname": { ... },
                        "lastname": { ... }
                    },
                    ...
                },
                "free-speech_example": {
                    "type": "free-speech",
                    "file": "<base_model_name>.fcf",
                    "extra_models": { ... }
                }
            }
        }
    }
}

Starting the engine

JAVA
com.vivoka.vsdk.Vsdk.init(mContext, "config/main.json", vsdkSuccess -> {
    if (vsdkSuccess)
    {
        com.vivoka.csdk.asr.Engine.getInstance().init(mContext, engineSuccess -> {
            if (engineSuccess)
            {
                // at this point the AsrEngine has been correctly initialized
            }
        });
    }
});
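If either initialization stage fails, you will likely want to surface the failure rather than fail silently. Here is a minimal sketch of the same two-stage init with error reporting; the `android.util.Log` calls and the tag name are our additions, not part of the SDK:

```java
import android.content.Context;
import android.util.Log;

class VoiceEngineBootstrap {
    private static final String TAG = "VdkInit"; // hypothetical tag

    // Wraps the nested init shown above, logging each failure case.
    void initVoiceEngine(Context mContext) {
        com.vivoka.vsdk.Vsdk.init(mContext, "config/main.json", vsdkSuccess -> {
            if (!vsdkSuccess) {
                Log.e(TAG, "VSDK initialization failed — check the config path");
                return;
            }
            com.vivoka.csdk.asr.Engine.getInstance().init(mContext, engineSuccess -> {
                if (!engineSuccess) {
                    Log.e(TAG, "ASR engine initialization failed");
                    return;
                }
                // Engine ready: recognizers can now be created.
            });
        });
    }
}
```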

You can't create two separate instances of the same engine! Attempting to create a second one will simply return a pointer to the existing engine.

Each engine has its own configuration document; check it for further details, and see the ASR samples to get started with actual, production-ready code.

Creating a Recognizer

JAVA
// First you have to create a recognizer listener to subscribe to the recognizer events
IRecognizerListener recognizerListener = new IRecognizerListener()
{
    @Override
    public void onEvent(RecognizerEventCode eventCode, int timeMarker, String message) {}

    @Override
    public void onResult(String result, RecognizerResultType resultType, boolean isFinal) {}

    @Override
    public void onError(RecognizerErrorCode error, String message) {}

    @Override
    public void onWarning(RecognizerErrorCode error, String message) {}
};

// Then you can create a recognizer
recognizer = Engine.getInstance().makeRecognizer("rec", recognizerListener);

And finally, apply a model to actually recognize vocabulary:

JAVA
recognizer.setModel("static_example"); // same call whether the model is static, dynamic or free-speech!

Also, don't forget to insert the recognizer into the audio pipeline, or nothing will happen:

JAVA
pipeline.pushBackConsumer(recognizer);
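Putting these pieces together, here is a sketch of a listener that only reacts to final results. The handling logic (logging via `android.util.Log`, ignoring intermediate hypotheses) is illustrative, not prescribed by the SDK:

```java
IRecognizerListener loggingListener = new IRecognizerListener()
{
    @Override
    public void onEvent(RecognizerEventCode eventCode, int timeMarker, String message) {}

    @Override
    public void onResult(String result, RecognizerResultType resultType, boolean isFinal) {
        // Intermediate hypotheses also arrive here; act only on the final one.
        if (isFinal) {
            android.util.Log.i("ASR", "Final result: " + result);
        }
    }

    @Override
    public void onError(RecognizerErrorCode error, String message) {
        android.util.Log.e("ASR", "Recognition error: " + message);
    }

    @Override
    public void onWarning(RecognizerErrorCode error, String message) {
        android.util.Log.w("ASR", "Recognition warning: " + message);
    }
};
```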

Dynamic Models

Only dynamic models need to be manipulated explicitly to add the missing data at runtime:

JAVA
DynamicModel model = Engine.getInstance().getDynamicModel("dynamic_example");
model.addData("firstname", "André");
model.addData("lastname", "Lemoine");
model.compile();

// We can now apply it to a recognizer!
recognizer.setModel("dynamic_example");
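In practice, slot data often comes from app data such as a contact list. A sketch that fills both slots from a hypothetical `Contact` collection (the `Contact` class and `loadContacts()` helper are assumptions, not SDK APIs):

```java
// Hypothetical contact holder — your app will have its own representation.
List<Contact> contacts = loadContacts(); // assumed helper

DynamicModel model = Engine.getInstance().getDynamicModel("dynamic_example");
for (Contact contact : contacts) {
    model.addData("firstname", contact.firstName);
    model.addData("lastname", contact.lastName);
}
// Compile once, after all entries have been added, to avoid redundant work.
model.compile();

recognizer.setModel("dynamic_example");
```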

Releasing the engine

Some resources are not handled by the JVM and need to be released when your app is being destroyed.

JAVA
// Stop the audio pipeline before releasing the resources
pipeline.stop();
// Destroy all resources not handled by the JVM
Engine.getInstance().destroy();

You can now init the Engine again when restarting your application.
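On Android, a natural place for this teardown is the hosting component's `onDestroy()`. A sketch assuming `pipeline` and the engine were set up earlier in the same Activity:

```java
@Override
protected void onDestroy() {
    // Stop the audio pipeline before releasing native resources
    pipeline.stop();
    // Release resources not handled by the JVM
    Engine.getInstance().destroy();
    super.onDestroy();
}
```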
