Compiled grammar
When you use VASR (VASR - C++), you need to provide one or more compiled grammar files (.vgg files) to the engine. This page explains exactly what is inside those files.
A .vgg file is actually an encrypted zip archive that contains the following files:
| Filename | Description |
|---|---|
| ast.json | AST representation of the grammar. Mainly used for tag retrieval |
| config.json | Main configuration file for this acoustic model. A complete description of its content is given later in this page |
| L_disambig.fst | Lexicon in FST format (a.k.a. l.fst). Contains every word written in the grammar with its associated pronunciation |
| root.fst | Grammar in FST format (a.k.a. g.fst). Contains the structure of the different rules in the grammar |
| tokens.txt | Symbol table which contains every phoneme and disambiguation symbol the ASR needs to understand |
| words.txt | Symbol table which contains the list of all words contained in the grammar |
| | (Optional) Pre-built H.fst that will be composed with LG at runtime. Will be created by the engine if not provided |
Note that, with the exception of config.json, the actual names of these files can differ: the engine does not depend on them, because it reads them from the config.json file.
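The name lookup described above can be sketched as follows. This is a minimal illustration in Python, not the engine's actual code: it assumes config.json has already been extracted from the archive as text, the function name `resolve_grammar_files` is hypothetical, and treating all five roles from the sample configuration as required is an assumption of this sketch.

```python
import json

# Roles taken from the "files" section of the sample config.json on this page.
# Treating every one of them as mandatory is an assumption of this sketch.
EXPECTED_ROLES = ("ast", "g_fst", "l_fst", "phoneme_table", "word_table")


def resolve_grammar_files(config_text):
    """Return a dict mapping each role to the file name declared in config.json."""
    config = json.loads(config_text)
    files = config.get("files", {})
    missing = [role for role in EXPECTED_ROLES if role not in files]
    if missing:
        raise KeyError(f"config.json is missing file entries: {missing}")
    return {role: files[role] for role in EXPECTED_ROLES}


# The file names below are deliberately different from the defaults to show
# that only the role keys matter, not the names themselves.
sample = """
{"files": {"ast": "my_ast.json", "g_fst": "G.fst", "l_fst": "L.fst",
           "phoneme_table": "phones.txt", "word_table": "vocab.txt"}}
"""
resolved = resolve_grammar_files(sample)
print(resolved["g_fst"])
```

Looking files up by role rather than by a hard-coded name is what allows the archive members to be renamed freely, as long as config.json itself keeps its name.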
config.json content
{
  "dynamic_lexicon_attributes": {
    "add_self_loops": true,
    "loop_state": 1,
    "no_sil_score": 0.693147182,
    "sil_score": 0.693147182,
    "sil_state": 2
  },
  "files": {
    "ast": "ast.json",
    "g_fst": "root.fst",
    "l_fst": "L_disambig.fst",
    "phoneme_table": "tokens.txt",
    "word_table": "words.txt"
  },
  "metadata": {
    "acmod_id": "vasr-eng-US-t7l2-1.0",
    "language": "eng-US",
    "version": 1
  }
}
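The values of `sil_score` and `no_sil_score` in the sample above are numerically very close to ln 2. One plausible reading, which this page does not confirm, is that they are negative natural-log probabilities, a common convention for FST weights. Under that assumption, the short Python check below converts them back to probabilities; the helper name `cost_to_probability` is hypothetical.

```python
import math

# Values copied from the sample config.json on this page.
attrs = {"no_sil_score": 0.693147182, "sil_score": 0.693147182}


def cost_to_probability(cost):
    """Invert a negative natural-log cost (assumed convention, not confirmed)."""
    return math.exp(-cost)


sil_prob = cost_to_probability(attrs["sil_score"])
no_sil_prob = cost_to_probability(attrs["no_sil_score"])
print(round(sil_prob, 6), round(no_sil_prob, 6))
```

If this reading is right, the sample gives optional silence and no silence an equal probability of about 0.5 each.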