
Acoustic model file

When you use the VASR engine (VASR - C++) you need to provide it with one or more acoustic models (.vam files). This page explains exactly what is inside those files.
A .vam file is in reality an encrypted zip archive which contains multiple files. Here is a list of all of them:

config.json: Main configuration file for this acoustic model. A complete description of its content is given later on this page.
g2p_config.json: G2P configuration file. A complete description of its content is given later on this page.
g2p_decoder.onnx: ONNX decoder of the G2P model, used when converting a word (or set of words) into its phonetic representation.
g2p_encoder.onnx: ONNX encoder of the G2P model, used when converting a word (or set of words) into its phonetic representation.
g2p_graphemes.txt: List of all the graphemes supported by this G2P model.
g2p_phones.txt: List of all the phonemes supported by this G2P model.
ipa.csv: CSV file with two columns, an index and its associated phoneme. Used to convert from one phonetic alphabet to another.
kirshenbaum.csv: CSV file with two columns, an index and its associated phoneme. Used to convert from one phonetic alphabet to another.
lhp.csv: CSV file with two columns, an index and its associated phoneme. Used to convert from one phonetic alphabet to another.
silero_vad.onnx: VAD (Voice Activity Detection) ONNX model. Mainly used by the ASR to detect the beginning and end of speech for a given utterance.
t7l2.conf_model.txt: Confidence model used by the engine to compute the confidence score of each word of the utterance.
pretrained.dyn.uint8-quant.onnx: The acoustic model itself. It receives audio features as input and outputs an encoded nnet result.
tokens.txt: List of the phonemes specific to the acoustic model engine (used by the VASR-compiler).

It is important to note that, with the exception of config.json, the actual names of these files can differ: the engine does not rely on the names themselves, because they are all listed in the config.json file.
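
Because a .vam file is a zip archive, its entries can be inspected with ordinary zip tooling once you have a decrypted copy (the encryption scheme is not covered on this page). The following sketch simply lists the archive entries; it assumes libzip is available and is only an illustration, not part of the VASR API.

C++
// List the entries of a (decrypted) .vam archive using libzip.
// Standalone inspection sketch, not engine code.
#include <zip.h>
#include <cstdio>

int main(int argc, char** argv) {
    if (argc < 2) {
        std::fprintf(stderr, "usage: %s <model.vam>\n", argv[0]);
        return 1;
    }
    int err = 0;
    zip_t* archive = zip_open(argv[1], ZIP_RDONLY, &err);
    if (archive == nullptr) {
        std::fprintf(stderr, "cannot open archive (libzip error %d)\n", err);
        return 1;
    }
    // If the archive uses standard zip encryption and you know the password,
    // zip_set_default_password(archive, "...") could be called here.
    zip_int64_t count = zip_get_num_entries(archive, 0);
    for (zip_int64_t i = 0; i < count; ++i) {
        std::printf("%s\n", zip_get_name(archive, i, 0));
    }
    zip_close(archive);
    return 0;
}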

Config.json content

JSON
{
    "feature_extraction_attributes": {
        "sample_rate": 16000,
        "dither": 0.0,
        "snip_edges": false,
        "num_mel_bins": 80,
        "vtln_wrap": 1.0
    },
    "decoding_attributes": {
        "search_beam": 20.0,
        "output_beam": 5.0,
        "min_activate_states": 30,
        "max_activate_states": 10000,
        "sub_sampling_factor": 3,
        "num_detailed_nbest": 10
    },
    "online_decoding_attributes": {
        "cache_size": 2,
        "hidden_layer_size": 500,
        "num_lstms": 2,
        "num_tdnns": 7,
        "feat_frame_size": 25,
        "feat_frame_shift": 10,
        "decode_chunk_size": 75,
        "inference_chunk_size": 75,
        "accumulate_decoding": true,
        "padding_frames": 45,
        "nbest_scale": 0.5
    },
    "grammar_compiler_attributes": {
        "sil_score": 0.693147182,
        "no_sil_score": 0.693147182,
        "add_self_loops": true
    },
    "vad_attributes": {
        "min_speech_duration": 500,
        "min_silence_duration": 700,
        "speech_probability_threshold": 0.5,
        "window_sample_size": 1536,
        "onnx": {
            "session_options": {
                "intra_op_thread": 1
            },
            "input": {
                "input": "input",
                "history_context": "h0",
                "cell_state": "c0"
            },
            "output": {
                "output": "output",
                "history_context": "hn",
                "cell_state": "cn"
            }
        }
    },
    "ignored_phone_id_of_alphabets": [
        100,
        103
    ],
    "onnx_model_names": {
        "session_options": {
            "intra_op_thread": 1
        },
        "input": {
            "feature": "feats",
            "feature_cache": "feat_cache",
            "tdnn_cache": "tdnn_cache",
            "lstm_context": "in_lstm_cntxts"
        },
        "output": {
            "result": "posts",
            "tdnn_cache": "tdnn_out",
            "lstm_context": "out_lstm_cntxts"
        }
    },
    "files": {
        "accoustic_model": "pretrained.dyn.uint8-quant.onnx",
        "confidence_model": "t7l2.conf_model.txt",
        "g2p_config": "g2p_config.json",
        "vad_model": "silero_vad.onnx",
        "phoneme_table": "tokens.txt",
        "phonetic_alphabets": {
            "ipa": "ipa.csv",
            "kirshenbaum": "kirshenbaum.csv",
            "lhp": "lhp.csv"
        }
    },
    "metadata": {
        "version": 1,
        "model_version": 1,
        "language": "eng-US",
        "id": "vasr-eng-US-t7l2-1.0"
    }
}
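
The "files" object is what maps each role (acoustic model, confidence model, VAD model, phoneme table, phonetic alphabets) to the corresponding entry inside the archive. As a rough illustration of how that mapping can be read, here is a small sketch that parses an extracted config.json and prints the role-to-file mapping together with the metadata. It assumes the nlohmann/json single-header library is available; it is not how the engine itself loads the configuration.

C++
// Read config.json from an extracted .vam model and print which file
// plays which role. Sketch only; assumes nlohmann/json is available.
#include <fstream>
#include <iostream>
#include <nlohmann/json.hpp>

int main() {
    std::ifstream in("config.json");
    const nlohmann::json cfg = nlohmann::json::parse(in);

    std::cout << "model id: " << cfg["metadata"]["id"].get<std::string>() << "\n";
    std::cout << "language: " << cfg["metadata"]["language"].get<std::string>() << "\n";

    // Each entry of "files" is either a single file name (string) or, as for
    // "phonetic_alphabets", an object with one CSV file per alphabet.
    for (const auto& [role, value] : cfg["files"].items()) {
        if (value.is_string()) {
            std::cout << role << " -> " << value.get<std::string>() << "\n";
        } else if (value.is_object()) {
            for (const auto& [alphabet, csv] : value.items()) {
                std::cout << role << "/" << alphabet << " -> "
                          << csv.get<std::string>() << "\n";
            }
        }
    }
    return 0;
}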
