Acoustic model file

When you use VASR (VASR - C++), you need to provide one or more acoustic models (.vam files) to the engine. This page explains exactly what is inside those files.
A .vam file is in fact an encrypted zip archive that contains the following files:

Main configuration file for this acoustic model. Its content is described in full later on this page.

G2P configuration file. Its content is described in full later on this page.

ONNX decoder of the G2P model, used when converting a word (or a set of words) into its phonetic representation.

ONNX encoder of the G2P model, used when converting a word (or a set of words) into its phonetic representation.

List of all the graphemes supported by this G2P model.

List of all the phonemes supported by this G2P model.

CSV file with 2 columns: an index and its associated phoneme. Used to convert from one phonetic alphabet to another.

CSV file with 2 columns: an index and its associated phoneme. Used to convert from one phonetic alphabet to another.

CSV file with 2 columns: an index and its associated phoneme. Used to convert from one phonetic alphabet to another.

VAD (Voice Activity Detection) ONNX model. Mainly used by the ASR to detect the beginning and end of speech in a given utterance.

Confidence model used by the engine to compute the confidence score of each word of the utterance.

The acoustic model itself. It receives audio features as input and outputs an encoded nnet result.

List of the phonemes specific to the acoustic model engine (used by the VASR-compiler).
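
As an illustration of how the two-column CSV files could be used to convert between phonetic alphabets, here is a minimal Python sketch. It assumes that the same index refers to the same phoneme in every alphabet file, that the delimiter is a comma, and that a non-numeric header row may or may not be present. The function names and the sample rows are hypothetical and are not part of the VASR API or of a real .vam file.

```python
import csv
import io


def load_alphabet(csv_text):
    """Parse a two-column CSV (index, phoneme) into an index -> phoneme dict."""
    table = {}
    for row in csv.reader(io.StringIO(csv_text)):
        if len(row) < 2:
            continue  # skip blank or malformed rows
        try:
            index = int(row[0])
        except ValueError:
            continue  # skip a header row such as "index,phoneme"
        table[index] = row[1]
    return table


def convert(phonemes, source, target):
    """Translate phonemes between alphabets through their shared index."""
    index_of = {symbol: index for index, symbol in source.items()}
    return [target[index_of[p]] for p in phonemes]


# Made-up sample rows, NOT taken from a real acoustic model.
ALPHABET_A = "0,sym_a0\n1,sym_a1\n2,sym_a2\n"
ALPHABET_B = "0,sym_b0\n1,sym_b1\n2,sym_b2\n"

source = load_alphabet(ALPHABET_A)
target = load_alphabet(ALPHABET_B)
print(convert(["sym_a2", "sym_a0"], source, target))
```

A phoneme that is missing from the source alphabet raises a `KeyError` here; a real converter would probably need an explicit policy for unknown symbols.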

Note that, with the exception of config.json, the actual names of these files can differ: the engine does not rely on fixed names but reads them from the config.json file.
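
The name lookup can be sketched in Python as follows. The excerpt is copied from the "files" section of the config.json sample on this page; `referenced_files` is a hypothetical helper written for illustration, not a function of the engine, and the dotted role names it produces are an arbitrary choice.

```python
import json

# Excerpt of the "files" section from the config.json sample on this page.
CONFIG_EXCERPT = """
{
    "files": {
        "accoustic_model": "pretrained.dyn.uint8-quant.onnx",
        "confidence_model": "t7l2.conf_model.txt",
        "g2p_config": "g2p_config.json",
        "vad_model": "silero_vad.onnx",
        "phoneme_table": "tokens.txt",
        "phonetic_alphabets": {
            "ipa": "ipa.csv",
            "kirshenbaum": "kirshenbaum.csv",
            "lhp": "lhp.csv"
        }
    }
}
"""


def referenced_files(config):
    """Flatten the "files" section into a role -> file name dict."""
    flat = {}

    def walk(prefix, node):
        for key, value in node.items():
            role = prefix + "." + key if prefix else key
            if isinstance(value, dict):
                walk(role, value)  # nested group such as "phonetic_alphabets"
            else:
                flat[role] = value

    walk("", config.get("files", {}))
    return flat


files = referenced_files(json.loads(CONFIG_EXCERPT))
for role, name in files.items():
    print(role, "->", name)
```

Because every role is resolved through this section, renaming a file inside the archive only requires updating the matching entry in config.json.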

Config.json content

{
    "feature_extraction_attributes": {
        "sample_rate": 16000,
        "dither": 0.0,
        "snip_edges": false,
        "num_mel_bins": 80,
        "vtln_wrap": 1.0
    },
    "decoding_attributes": {
        "search_beam": 20.0,
        "output_beam": 5.0,
        "min_activate_states": 30,
        "max_activate_states": 10000,
        "sub_sampling_factor": 3,
        "num_detailed_nbest": 10
    },
    "online_decoding_attributes": {
        "cache_size": 2,
        "hidden_layer_size": 500,
        "num_lstms": 2,
        "num_tdnns": 7,
        "feat_frame_size": 25,
        "feat_frame_shift": 10,
        "decode_chunk_size": 75,
        "inference_chunk_size": 75,
        "accumulate_decoding": true,
        "padding_frames": 45,
        "nbest_scale": 0.5
    },
    "grammar_compiler_attributes": {
        "sil_score": 0.693147182,
        "no_sil_score": 0.693147182,
        "add_self_loops": true
    },
    "vad_attributes": {
        "min_speech_duration": 500,
        "min_silence_duration": 700,
        "speech_probability_threshold": 0.5,
        "window_sample_size": 1536,
        "onnx": {
            "session_options": {
                "intra_op_thread": 1
            },
            "input": {
                "input": "input",
                "history_context": "h0",
                "cell_state": "c0"
            },
            "output": {
                "output": "output",
                "history_context": "hn",
                "cell_state": "cn"
            }
        }
    },
    "ignored_phone_id_of_alphabets": [
    ],
    "onnx_model_names": {
        "session_options": {
            "intra_op_thread": 1
        },
        "input": {
            "feature": "feats",
            "feature_cache": "feat_cache",
            "tdnn_cache": "tdnn_cache",
            "lstm_context": "in_lstm_cntxts"
        },
        "output": {
            "result": "posts",
            "tdnn_cache": "tdnn_out",
            "lstm_context": "out_lstm_cntxts"
        }
    },
    "files": {
        "accoustic_model": "pretrained.dyn.uint8-quant.onnx",
        "confidence_model": "t7l2.conf_model.txt",
        "g2p_config": "g2p_config.json",
        "vad_model": "silero_vad.onnx",
        "phoneme_table": "tokens.txt",
        "phonetic_alphabets": {
            "ipa": "ipa.csv",
            "kirshenbaum": "kirshenbaum.csv",
            "lhp": "lhp.csv"
        }
    },
    "metadata": {
        "version": 1,
        "model_version": 1,
        "language": "eng-US",
        "id": "vasr-eng-US-t7l2-1.0"
    }
}
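
To get a feel for some of these values, the following Python sketch derives a few quantities from them. It rests on several assumptions that this page does not confirm: that `window_sample_size` is a number of audio samples at `sample_rate`, that `min_speech_duration`, `min_silence_duration` and `feat_frame_shift` are expressed in milliseconds, and that `sil_score` is a negative natural-log probability. The helper names are hypothetical.

```python
import math

# Values copied from the config.json sample on this page.
SAMPLE_RATE_HZ = 16000        # feature_extraction_attributes.sample_rate
VAD_WINDOW_SAMPLES = 1536     # vad_attributes.window_sample_size
MIN_SPEECH_DURATION = 500     # vad_attributes.min_speech_duration (ms assumed)
MIN_SILENCE_DURATION = 700    # vad_attributes.min_silence_duration (ms assumed)
FEAT_FRAME_SHIFT = 10         # online_decoding_attributes.feat_frame_shift (ms assumed)
SUB_SAMPLING_FACTOR = 3       # decoding_attributes.sub_sampling_factor
SIL_SCORE = 0.693147182       # grammar_compiler_attributes.sil_score


def window_ms(samples, rate_hz):
    """Duration in milliseconds of a window of `samples` audio samples."""
    return 1000.0 * samples / rate_hz


def windows_to_cover(duration_ms, win_ms):
    """Smallest number of whole windows spanning at least `duration_ms`."""
    return math.ceil(duration_ms / win_ms)


vad_ms = window_ms(VAD_WINDOW_SAMPLES, SAMPLE_RATE_HZ)
print("VAD window (ms):", vad_ms)
print("windows for min speech:", windows_to_cover(MIN_SPEECH_DURATION, vad_ms))
print("windows for min silence:", windows_to_cover(MIN_SILENCE_DURATION, vad_ms))

# Assumed: the acoustic model emits one output per SUB_SAMPLING_FACTOR frames.
print("output step (ms):", FEAT_FRAME_SHIFT * SUB_SAMPLING_FACTOR)

# Assumed: the score is -ln(p), so p = exp(-score).
print("silence probability:", math.exp(-SIL_SCORE))
```

Under these assumptions, one VAD window covers 96 ms of audio, and `sil_score` and `no_sil_score` both correspond to a probability of about 0.5, since 0.693147182 is approximately ln 2.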
