
Voice Error Correction (VEC)

This page describes how to use Vivoka's VEC (Voice Error Correction) module.

Automatic Speech Recognition (ASR) engines often struggle with alphanumeric input such as serial codes. Even modern models confuse similar-sounding letters and digits ("b" vs "d", "m" vs "n", "o" vs "0" in English for example). These errors can frustrate users and make ASR unusable for tasks where a single wrong character invalidates the entire sequence.

The VEC (Voice Error Correction) module addresses this problem by acting as a post-processor on top of the ASR system. It analyzes the ASR output, applies targeted corrections, and delivers more reliable results without requiring changes to application code. Importantly, VEC is structure-preserving: the JSON schema of the ASR result remains identical, only values are adjusted.

VEC currently supports two operating modes:

  • Alphanumeric sequences (available today and documented on this page)

  • Free-speech (planned for future release)

Because the module is language-dependent, it must be paired with the correct acoustic model and lexicon on the ASR side to function properly.

System Requirements

To function correctly, VEC depends on the following conditions:

  • Language resources
    VEC is language-dependent. It must run with the matching acoustic model and lexicon used by the ASR engine. Using mismatched resources will reduce or eliminate its effectiveness.

  • ASR output requirements
    VEC operates on the ASR result and relies on N-best hypotheses being available (see the Configuration changes section).

  • Runtime compatibility
    VEC runs in the same pipeline as the ASR engine and has no additional platform dependencies beyond those already required for the ASR.

  • Performance

    • Latency: typically under 10 ms per recognition result.

    • Memory: stable even with large context lists (up to several hundred thousand entries).

    • Throughput: scales linearly with ASR output; no bottleneck in real-time use.

How to enable it?

To activate VEC in your project, three things must be done:

  1. Place the VEC add-on library in the same directory as the other VSDK libraries.

  2. Update the grammar, so the system knows which parts can be corrected by VEC.

  3. Update the configuration, so the runtime knows which post-processor, models and context list (if any) to use.

Install the add-on library

VEC is delivered as a shared library. It must be placed next to the other ASR libraries (e.g. in the same lib/ or bin/ directory as the existing recognizer components).

  • If the library is missing or misplaced, the recognizer will fail to start.

  • No additional environment variables are required if the library is in the standard location.

Grammar changes

VEC does not apply globally — you must mark the grammar regions that can be corrected by VEC. This prevents commands such as help or yes from being altered.

To do this, wrap the alphanumeric span in !tag(vec, …) and limit its length with !repeat (1–7 characters, the range supported by VEC; see Limitations). Note that this range can be narrower but not wider. If you need a larger range, feel free to contact us.

Example before (without VEC):

CODE
<main>: <commands> | ["ok"] !repeat(<alphanum>, 1, 7);

Example after (VEC-enabled):

CODE
<main>: <commands> | ["ok"] !tag(vec, !repeat(<alphanum>, 1, 7));

If your grammar contains more than one alphanumeric input, each must be tagged with a different name. The only requirement is that the tag name starts with vec; beyond that it can be anything, such as vec-1, vec-2 or vec-postprocess. Tag names just have to be unique.
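For example, a grammar with two alphanumeric inputs could tag them as follows (a sketch; the "serial" and "pin" command words and the second span's length are hypothetical):

CODE
<main>: <commands>
      | ["serial"] !tag(vec-serial, !repeat(<alphanum>, 1, 7))
      | ["pin"] !tag(vec-pin, !repeat(<alphanum>, 1, 4));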

On top of this required change, it is highly recommended to remove any custom pronunciations you may have added to alphanumeric sequences in your grammar via the !pronounce directive. The VEC model was trained with the default pronunciations, and adding custom ones will degrade its performance.

Configuration changes

On the configuration side, add a post_processing section to the recognizer config. This tells the runtime to activate VEC and load its model. It is also possible to specify an initial context and accent if needed.

In addition to this new section in the configuration, you must also update the model settings used by the recognizer: set the parameter LH_SEARCH_PARAM_MAXNBEST to at least 10.

This controls how many alternative hypotheses the ASR engine produces for each recognition.

VEC uses these multiple hypotheses to improve correction accuracy — a value of 10 has shown the best performance in internal benchmarks, but you may adjust it as needed.

JSON
{
  "version": "2.0",
  "csdk": {
    "paths": {
      "data_root": "../data"
    },
    "asr": {
      "recognizers": {
        "rec": {
          "acmods": ["am_enu_vocon_car_202312090302.dat"],
          "post_processing": {
            "type": "vec",
            "model": "VEC-250901-eng-US.vec",
            "context": "len6_100k.txt",
            "accent": "eni"
          }
        }
      },
      "models": {
        "grm": {
          "type": "static",
          "file": "alphanum.fcf",
          "acmod": "am_enu_vocon_car_202312090302.dat",
          "lexicon": {
            "clc": "clc_enu_cfg3_v14_8_000000.dat"
          },
          "settings": {
            "LH_SEARCH_PARAM_MAXNBEST": 10
          }
        }
      }
    }
  }
}
  • type — must be "vec" to activate VEC.

  • model — (Optional) path to the VEC correction model (.vec). Relative paths are resolved against the configuration file's directory. If not provided, VEC works only with the provided context.

  • context — (Optional) path to a context list file containing valid sequences. Large files (hundreds of thousands of entries) are supported with minimal overhead. More on this in the Context list section.

  • accent — (Optional) default accent used by the model. It defaults to unknown and can be changed at any point at runtime using the invoke interface.

Context list

One of VEC’s most effective features is its ability to use a context list: a predefined set of valid alphanumeric sequences.

Why Context Helps

ASR errors are often phonetically plausible but semantically invalid. By checking results against a context list, VEC can prefer corrections that yield sequences known to be valid in your application (e.g. product codes, customer IDs).

Format

  • Flat list of strings

  • Each string is a space-separated alphanumeric sequence (lowercase letters and digits only)

CODE
6 i 
j y 
v 7 
0 c 0 
s h z 
8 1 8
5 j g
e c 8 8
7 5 k p 3 a
1 l k 0 z w
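As a sketch of producing this format, the following helper (hypothetical, not part of the VSDK) converts a raw code such as "7G9" into the expected space-separated lowercase form:

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not part of the VSDK): converts a raw code such as
// "7G9" into the space-separated lowercase form expected in a context list
// ("7 g 9"). Non-alphanumeric characters (e.g. '-') are dropped.
std::string toContextEntry(const std::string& code) {
    std::string out;
    for (char c : code) {
        if (!std::isalnum(static_cast<unsigned char>(c)))
            continue;                        // skip separators such as '-'
        if (!out.empty())
            out += ' ';                      // one space between characters
        out += static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return out;
}
```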

Size & Performance

  • Supports hundreds of thousands of entries without significant memory growth or latency impact

  • Lookup is optimized for large sets

  • Works equally well with small lists (dozens of entries)

Behavior

  • If the ASR output matches a sequence in the list → passed through unchanged

  • If the ASR output is close to one or more entries → VEC corrects towards the best candidate

  • If a non-alphanumeric value is inserted into the context, an exception is thrown.
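The acceptance conditions above can be checked up front with a small validator sketch (a hypothetical helper, not part of the SDK), which mirrors the format rules: 1–7 space-separated tokens, each a single lowercase letter or digit:

```cpp
#include <cctype>
#include <sstream>
#include <string>

// Hypothetical pre-check (not part of the VSDK): returns true when an entry
// matches the context-list format (1-7 space-separated lowercase
// alphanumeric characters), i.e. when VEC would accept it without throwing.
bool isValidContextEntry(const std::string& entry) {
    std::istringstream tokens(entry);
    std::string tok;
    int count = 0;
    while (tokens >> tok) {
        if (tok.size() != 1)
            return false;                    // exactly one character per token
        unsigned char c = static_cast<unsigned char>(tok[0]);
        if (!std::isdigit(c) && !std::islower(c))
            return false;                    // lowercase letters and digits only
        ++count;
    }
    return count >= 1 && count <= 7;         // length range supported by VEC
}
```

Running such a check before set-context or add-context avoids runtime exceptions from malformed entries.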

Changing Context / Accent at Runtime

While VEC can load a static context list from configuration, applications may also modify the context or accent at runtime using the recognizer’s post-processor invoke interface.

This makes it possible to adapt dynamically to user input, environment changes, or session-specific requirements without restarting the recognizer.

API Usage

The invoke method accepts two arguments:

C++
CPP
auto rec = engine->recognizer("rec");
auto pp  = rec->postProcessor();

pp->invoke(command, params);
Android
JAVA
Recognizer rec = com.vivoka.vsdk.asr.csdk.Engine.getInstance().getRecognizer("rec", recognizerListener);
IEntryPoint pp = rec.postProcessor();

pp.invoke(command, params);
  • command: the operation to perform (string). The full list of possible actions for this add-on is given in the next section. If an unknown command is given, an exception is thrown.

  • params: a JSON object whose structure depends on the command.

Supported Commands

1. set-context

Replaces the current runtime context with the provided list of sequences.

Parameters: JSON array of strings.
Return: none.

Example:

C++
CPP
pp->invoke("set-context", {{ "context", {"a 1 3", "b 1 4"} }});
Android
JAVA
JSONObject params = new JSONObject();
params.put("context", com.vivoka.vsdk.util.JsonUtils.makeJsonArray("a 1 3", "b 1 4"));
pp.invoke("set-context", params);

2. add-context

Adds new entries to the current runtime context without removing existing ones.

Parameters: JSON array of strings.
Return: none.

Example:

C++
CPP
pp->invoke("add-context", {{ "context", {"c 2 7", "x 9 9"} }});
Android
JAVA
JSONObject params = new JSONObject();
params.put("context", com.vivoka.vsdk.util.JsonUtils.makeJsonArray("c 2 7", "x 9 9"));
pp.invoke("add-context", params);

3. remove-context

Removes the specified entries from the current runtime context.

Parameters: JSON array of strings.
Return: none.

Example:

C++
CPP
pp->invoke("remove-context", {{ "context", {"a 1 3"} }});
Android
JAVA
JSONObject params = new JSONObject();
params.put("context", com.vivoka.vsdk.util.JsonUtils.makeJsonArray("a 1 3"));
pp.invoke("remove-context", params);

4. clear-context

Clears the current runtime context completely.

Parameters: unused.
Return: none.

Example:

C++
CPP
pp->invoke("clear-context", {});
Android
JAVA
pp.invoke("clear-context", new JSONObject());

5. load-context

Loads a context list from a file and replaces the current runtime context.

Parameters: JSON string with the path to the file, relative to the location of the configuration file.
Return: none.

Example:

C++
CPP
pp->invoke("load-context", {{ "file", "context.txt" }});
Android
JAVA
JSONObject params = new JSONObject();
params.put("file", "context.txt");
pp.invoke("load-context", params);

6. set-accent

Sets the user’s accent at runtime.

Parameters: JSON string with the accent name. Use an empty string if unknown.
Return: none.

Example:

C++
CPP
pp->invoke("set-accent", {{ "accent", "eng" }});
pp->invoke("set-accent", {{ "accent", "" }});  // reset to unknown
Android
JAVA
JSONObject params = new JSONObject();
params.put("accent", "eng");
pp.invoke("set-accent", params);

params.put("accent", ""); // reset to unknown
pp.invoke("set-accent", params);

7. get-accent

Retrieves the currently configured accent.

Parameters: unused.
Return: JSON string with the accent name, or empty string if unknown.

Example:

C++
CPP
auto const accent = pp->invoke("get-accent", {})["accent"].get<std::string>();
Android
JAVA
JSONObject result = pp.invoke("get-accent", new JSONObject());
String accent = result.getString("accent");

8. list-accent

Retrieves available accents for this model.

Parameters: unused.
Return: JSON array of strings.

Example:

C++
CPP
auto const availableAccents = pp->invoke("list-accent", {})["accents"];
Android
JAVA
JSONObject result = pp.invoke("list-accent", new JSONObject());
JSONArray accents = result.getJSONArray("accents");

9. version

Retrieves the version of the VEC module.

Parameters: unused.
Return: JSON string with the version in the form of X.Y.Z.

Example:

C++
CPP
auto const version = pp->invoke("version", {})["version"].get<std::string>();
Android
JAVA
JSONObject result = pp.invoke("version", new JSONObject());
String version = result.getString("version");

Behavior Notes

  • All operations are lightweight; context updates and accent changes take effect immediately.

  • Changes remain until explicitly overridden or until the recognizer is destroyed.

Limitations

VEC is powerful but has clear boundaries:

  • Sequence length: Only supports 1–7 alphanumeric characters. Longer sequences are not handled. If you need to go beyond this limit, feel free to contact us.

  • Letter case: you must use lowercase letters, never uppercase ones. We enforce this because the underlying engine may only understand a specific pronunciation when uppercase letters are used. For example, in English you may have to say 'Capital A' instead of just 'A', and in French 'A majuscule'. To avoid any misconfiguration, VEC only accepts lowercase letters.

  • Mode availability: Currently limited to alphanumeric mode. Free-speech mode is planned but not yet available.

  • Language dependence: Requires the correct acoustic model and lexicon. Using mismatched resources reduces or eliminates accuracy gains.

  • Error types: Optimized for typical letter/digit confusions. It does not correct arbitrary word errors.
