Configuration file
The table below describes the VSDK configuration file, located at config/vsdk.json:
Field | Description/Notes | Optional | Default Value | Type | Possible Values |
version | Version of the whole document | ✘ | | String | 2.0 |
csdk | | ✘ | | Object | |
csdk/paths | | ✔ | | Object | |
csdk/paths/cache | Absolute or relative to vsdk.json | ✔ | cache | Path | |
csdk/paths/data_root | Absolute or relative to vsdk.json | ✔ | . | Path | |
csdk/paths/acmod | Absolute or relative to data_root | ✔ | acmod | Path | |
csdk/paths/asr | Absolute or relative to data_root | ✔ | asr | Path | |
csdk/paths/clc | Absolute or relative to data_root | ✔ | clc | Path | |
csdk/paths/clc_ruleset | Absolute or relative to data_root | ✔ | clc | Path | |
csdk/paths/dictionary | Absolute or relative to data_root | ✔ | dictionaries | Path | |
csdk/paths/search | Absolute or relative to data_root | ✔ | ctx | Path | |
csdk/paths/sem3 | Absolute or relative to data_root | ✔ | ctx | Path | |
csdk/paths/users | Absolute or relative to data_root | ✔ | users | Path | |
csdk/paths/audio_based_classifier_model | Absolute or relative to data_root | ✔ | abc | Path | |
csdk/paths/confusion_dictionary | Absolute or relative to data_root | ✔ | dictionaries | Path | |
csdk/paths/language_model | Absolute or relative to data_root | ✔ | lm | Path | |
csdk/tts | | ✘ | | Object | |
csdk/tts/channels | | ✘ | | Object | |
csdk/tts/channels/<channel_name_1> | Name of the channel, used in code | ✘ | | String | |
csdk/tts/channels/<channel_name_1>/voices | | ✘ | | Array | |
csdk/tts/channels/<channel_name_1>/voices/0 | | ✘ | | String | <speaker>,<lang>,<quality> |
csdk/asr | | ✘ | | Object | |
csdk/asr/recognizers | | ✘ | | Object | |
csdk/asr/recognizers/<recognizer_name_1> | Name of the recognizer, used in code | ✘ | | String | |
csdk/asr/recognizers/<recognizer_name_1>/acmods | Recognizers accept multiple acoustic models | ✘ | | Array | |
csdk/asr/recognizers/<recognizer_name_1>/acmods/0 | | ✘ | | String | |
csdk/asr/models | | ✘ | | Object | |
csdk/asr/models/<model_name_1> | Name of the model, used in code | ✘ | | String | |
csdk/asr/models/<model_name_1>/type | | ✘ | | String | static, dynamic, |
csdk/asr/models/<model_name_1>/file | Compiled model file name, extension is .fcf | ✘ | | File | |
csdk/asr/models/<model_name_1>/sem3 | Compiled semantic model file name, extension is .s3c | ✔ | "" | String | |
csdk/asr/models/<model_name_1>/settings | | ✔ | | Object | |
csdk/asr/models/<model_name_1>/settings/LH_SEARCH_PARAM_STREAM_RESULT_MODE | The mode in which intermediate results are displayed during recognition. 1 enables partial results | ✔ | 0 | Int | 0, 1 |
csdk/asr/models/<model_name_1>/settings/LH_SEARCH_PARAM_ACCURACY | Trade-off between CPU load, memory requirements, and the accuracy of the search | ✔ | 10000 | Int | [100 ; 50000] |
csdk/asr/models/<model_name_1>/settings/LH_SEARCH_PARAM_MAXNBEST | Maximum number of hypotheses returned in a result | ✔ | 3 | Int | [1 ; 1000] |
csdk/asr/models/<model_name_1>/settings/LH_SEARCH_PARAM_TSILENCE | Minimum amount of trailing silence, in milliseconds. Use a higher value for non-WUW models | ✔ | 100 | Int | [100 ; 10000] |
csdk/asr/models/<model_name_1>/settings/LH_SEARCH_PARAM_IG_LOWCONF | Maximum confidence level that indicates that a spoken utterance is out of grammar | ✔ | 5000 | Int | [0 ; 10000] |
csdk/asr/models/<model_name_1>/settings/LH_SEARCH_PARAM_IG_HIGHCONF | Minimum confidence level that indicates that a spoken utterance is in grammar | ✔ | 5000 | Int | [0 ; 10000] |
csdk/asr/models/<model_name_1>/settings/LH_SEARCH_PARAM_INITBEAMWIDTH | Initial beam width. This parameter affects the low-level behavior of the algorithm | ✔ | 2500 | Int | [0 ; 10000] |
csdk/asr/models/<model_name_1>/settings/LH_SEARCH_PARAM_TANYSPEECH | Allows the recognizer to stop the recognition process during the trailing AnySpeech state | ✔ | LH_FALSE | String | LH_TRUE, LH_FALSE |
csdk/asr/models/<model_name_1>/settings/LH_SEARCH_PARAM_NBESTRESULT_SETHIDDENKEYS | When enabled, additional information is included in the ASR result that can be used for the FM use case | ✔ | LH_FALSE | String | LH_TRUE, LH_FALSE |
csdk/asr/models/<model_name_1>/settings/LH_SEARCH_PARAM_ONDEMANDLOADING | Context on-demand loading | ✔ | LH_FALSE | String | LH_TRUE, LH_FALSE |
csdk/asr/models/<model_name_1>/settings/LH_SEARCH_PARAM_SPEECH_TIMEOUT | Speech duration timeout in milliseconds | ✔ | 0 | Int | 0, [100 ; 60000] |
csdk/asr/models/<model_name_1>/acmod | ∗Only for dynamic models. Must match the one configured on the recognizer that will apply this model | ✘∗ | | String | |
csdk/asr/models/<model_name_1>/slots | ∗Only for dynamic models. | ✘∗ | | Object | |
csdk/asr/models/<model_name_1>/slots/<slot_name_1> | Name of the slot, used in the code | ✘ | | Object | |
csdk/asr/models/<model_name_1>/slots/<slot_name_1>/values | Values of the given slot | ✘ | | Array | |
csdk/asr/models/<model_name_1>/slots/<slot_name_1>/category | | ✔ | normal | String | normal, name, artist |
csdk/asr/models/<model_name_1>/slots/<slot_name_1>/allow_custom_phonetic | Setting to true will allow for custom phonetics to be provided for this slot | ✔ | false | Bool | |
csdk/asr/models/<model_name_1>/lexicon | ∗Only for dynamic models. | ✘∗ | | String | |
csdk/asr/models/<model_name_1>/lexicon/clc | Used during runtime compilation. Use a language that matches the rest of the grammar and the recognizer this model will be applied to | ✘ | | File | |
csdk/asr/models/<model_name_1>/lexicon/settings | | ✔ | | Object | |
csdk/asr/models/<model_name_1>/extra_models | ∗Only for free-speech models. All models for a given language must be listed or the program won't function properly | ✘∗ | | Object | |
csdk/asr/models/<model_name_1>/extra_models/<name> | | ✘ | | File | |
vasr | | ✘ | | Object | |
vasr/paths | | ✔ | | Object | |
vasr/paths/data_root | Absolute or relative to vsdk.json | ✔ | . | Path | |
vasr/paths/acmod | Absolute or relative to data_root | ✔ | acmod | Path | |
vasr/paths/graph | Absolute or relative to data_root | ✔ | graph | Path | |
vasr/paths/log | Directory where log will be placed. If relative, it will be based on the data_root path. | ✔ | log | String | |
vasr/log | | ✔ | | Object | |
vasr/log/<logger_name> | | ✔ | | Object | *, perf |
vasr/log/<logger_name>/level | Level of the debugging information printed | ✔ | | String | info, debug |
vasr/log/<name>/sink | Set the destination of log messages for this logger. | ✔ | stdout-color | String | stdout, stdout-color, stderr, stderr-color, file |
vasr/log/<name>/sink_options | Contains all options specific to the type of sink. | ✔ | N/A | Object | |
vasr/log/<name>/sink_options/path | (For file sink type only) Set the output file path. If relative, it will be based on the paths/log path. | ✔ | | String | |
vasr/log/<name>/sink_options/max_size | (For file sink type only) Specify the maximum file size (in kbytes) before rotating the file. | ✔ | | Int | [1'024 - inf] |
vasr/log/<name>/sink_options/max_file | (For file sink type only) Specify the maximum number of log files that will be kept. | ✔ | | Int | [1 - inf] |
vasr/log/<name>/sink_options/truncate | (For file sink type only) If set to true, the file will be truncated or rotated (depending on whether the max_size / max_file options have been specified) when the engine is created. | ✔ | false | Boolean | |
vasr/asr | Object that contains all the configuration of VASR | ✘ | | Object | |
vasr/asr/recognizers | List here all recognizers that will be created during run-time | ✘ | | Object | |
vasr/asr/recognizers/<recognizer_name> | Name of the recognizer, used in the code | ✘ | | Object | |
vasr/asr/recognizers/<recognizer_name>/acmods | Array containing all acmods used | ✘ | | Array | |
vasr/asr/recognizers/<recognizer_name>/acmods/0 | Acmod file name | ✘ | | String | |
vasr/asr/recognizers/<recognizer_name>/settings | Object containing optional settings | ✔ | | Object | |
vasr/asr/recognizers/<recognizer_name>/settings/min_speech_duration | Amount of time (in milliseconds) that speech needs to be detected before triggering a recognition. Note that after a successful trigger, the audio covering this duration is also passed on to the recognizer. It needs to be a multiple of 100 | ✔ | 500 | Int | [100 - 10'000] |
vasr/asr/recognizers/<recognizer_name>/settings/min_silence_duration | Minimum amount of trailing silence, in milliseconds. Use a higher value for non-WUW models. It needs to be a multiple of 100 | ✔ | 700 | Int | [100 - 10'000] |
vasr/asr/recognizers/<recognizer_name>/settings/padding_size | Amount of perfect silence (in milliseconds) that will be artificially added to the audio at its beginning and its end. This improves performance at almost no cost. Rarely needs to be modified. | ✔ | | Int | [0 - 1'000] |
vasr/asr/recognizers/<recognizer_name>/settings/speech_probability_threshold | Sets the sensitivity of the VAD. A lower value makes it more prone to false positives, a higher value more prone to false negatives | ✔ | 0.5 | Float | ]0.0 - 1.0[ |
vasr/asr/recognizers/<recognizer_name>/settings/left_padding_size | Amount of perfect silence (in milliseconds) that will be artificially added to the audio at its beginning. This improves performance at almost no cost. Rarely needs to be modified. Note that this setting is only for free-speech mode. This setting takes priority over the … Since VASR version 2.0.0 (VSDK-VASR version 4.0.0) | ✔ | | Int | [0 - 1'000] |
vasr/asr/recognizers/<recognizer_name>/settings/right_padding_size | Amount of perfect silence (in milliseconds) that will be artificially added to the audio at its end. This improves performance at almost no cost. Rarely needs to be modified. Note that this setting is only for free-speech mode. This setting takes priority over the … Since VASR version 2.0.0 (VSDK-VASR version 4.0.0) | ✔ | | Int | [0 - 1'000] |
vasr/asr/recognizers/<recognizer_name>/settings/audio_cache_size | Specify the maximum amount of audio (in milliseconds) that a recognizer will keep in its internal buffer. The more audio it keeps, the higher the memory consumption. The audio buffering is used for gapless applications. It needs to be a multiple of 100. Since VASR version 2.1.0 (VSDK-VASR version 4.2.0) | ✔ | 10'000 | Int | [0 - inf] |
vasr/asr/recognizers/<recognizer_name>/settings/invalid_start_time_strategy | If, when calling the Since VASR version 2.2.0 (VSDK-VASR version 5.2.0) | ✔ | warn_and_clamp | String | [warn_and_clamp, error_and_unset] |
vasr/asr/recognizers/<name>/settings/intermediate_result_frequency | Amount of audio (in milliseconds) that needs to be sent to the engine before it returns an intermediate result. A value of 0 completely disables intermediate results. Note that this value is only an approximation. | ✔ | 750 | Int | 0 or [450 - 10'000] |
vasr/asr/models | | ✘ | | Object | |
vasr/asr/models/<model_name_1> | Name of the model, used in the code | ✘ | | String | |
vasr/asr/models/<model_name_1>/type | | ✘ | | String | static, dynamic, |
vasr/asr/models/<model_name_1>/file | Absolute or relative to paths/models | ✘ | | Path | |
vasr/asr/models/<model_name_1>/recognizer | ∗Only for dynamic models. Absolute or relative to recognizer | ✘∗ | | File | |
vasr/asr/models/<model_name_1>/slots | ∗Only for dynamic models. | ✘∗ | | Array | |
vasr/asr/models/<model_name_1>/slots/<slot_name_1> | Name of the slot, used in the code | ✘ | | String | |
vasr/asr/models/<name>/settings/max_hypothesis | Set the number of hypotheses that will be returned by the engine for this model. | ✔ | 3 | Int | [1 - 10] |
vnlu | | ✘ | | Object | |
vnlu/paths | Object that contains the different paths required by the engine to load specific resources | ✔ | | Object | |
vnlu/paths/data_root | Base location where the engine will load resources from. If relative, it will be based on the configuration file base path | ✔ | | String | |
vnlu/paths/parser | Directory where nlu models (.vum) will be loaded from. If relative, it will be based on the data_root path | ✔ | parsers | String | |
vnlu/paths/log | Directory where log will be placed. If relative, it will be based on the data_root path | ✔ | log | String | |
vnlu/log | Object that contains all the logging configuration of VASR | ✔ | | Object | |
vnlu/log/<name> | Configuration for the logger <name>. Possible logger names are [vasr (the default logger), perf (the performance logger), recognizer:<name> (the recognizer named name), model:<name> (the model named name)]. It is also possible to use wildcards in specific places to configure multiple loggers at once. Here are the allowed values: [*, recognizer:*, model:*] | ✔ | vasr | Object | [vasr, perf, recognizer:<name>, model:<name>] |
vnlu/log/<name>/level | Set the minimal log level for this logger. Messages at the specified level and above are printed, e.g. setting the level to warning will still print error and critical messages. warn and warning, as well as err and error, are equivalent | ✔ | info | String | [trace, debug, info, warning, warn, error, err, critical, off] |
vnlu/log/<name>/sink | Set the destination of log messages for this logger | ✔ | stdout-color | String | [stdout, stdout-color, stderr, stderr-color, file] |
vnlu/log/<name>/sink_options | Contains all options specific to the type of sink | ✔ | | Object | |
vnlu/log/<name>/sink_options/path | (For file sink type only) Set the output file path. If relative, it will be based on the paths/log path | ✔ | | String | |
vnlu/log/<name>/sink_options/max_size | (For file sink type only) Specify the maximum file size (in kbytes) before rotating the file | ✔ | | Int | [1024 - inf] |
vnlu/log/<name>/sink_options/max_file | (For file sink type only) Specify the maximum number of log files that will be kept | ✔ | | Int | [1 - inf] |
vnlu/log/<name>/sink_options/truncate | (For file sink type only) If set to true, the file will be truncated or rotated (depending on whether the max_size / max_file options have been specified) when the engine is created | ✔ | false | Boolean | |
vnlu/nlu | Object that contains all the configuration of VNLU | ✘ | | Object | |
vnlu/nlu/parsers | List here all parsers that will be created during run-time | ✘ | | Object | |
vnlu/nlu/parsers/<name> | Specify the definition of a parser. The name is completely free and will be used in the code to create the Parser object | ✘ | | Object | |
vnlu/nlu/parsers/<name>/model | Path to the nlu model file (.vum) that will be loaded by the Parser. If relative, it will be based on the paths/parser path | ✘ | | String | |
baratinoo | | ✘ | | Object | |
baratinoo/paths | | ✔ | | Object | |
baratinoo/paths/data_root | Absolute or relative to vsdk.json | ✔ | | Path | |
baratinoo/tts | | ✘ | | Object | |
baratinoo/tts/channels | | ✘ | | Object | |
baratinoo/tts/channels/<channel_name_1> | | ✘ | | Object | |
baratinoo/tts/channels/<channel_name_1>/voices | | ✘ | | Array | |
baratinoo/tts/channels/<channel_name_1>/voices/0 | | ✘ | | String | <speaker> |
vtapi | | ✘ | | Object | |
vtapi/paths | | ✔ | | Object | |
vtapi/paths/data_root | Absolute or relative to vsdk.json | ✔ | | Path | |
vtapi/tts | | ✘ | | Object | |
vtapi/tts/channels | | ✘ | | Object | |
vtapi/tts/channels/<channel_name_1> | | ✘ | | Object | |
vtapi/tts/channels/<channel_name_1>/voices | | ✘ | | Array | |
vtapi/tts/channels/<channel_name_1>/voices/0 | | ✘ | | String | <speaker>,<quality> |
tssv | | ✘ | | Object | |
tssv/biometrics | | ✘ | | Object | |
tssv/biometrics/generated_models_path | Absolute or relative to the program's working directory | ✘ | | Path | |
tssv/biometrics/background_model_TD | Absolute or relative to the program's working directory | ✘ | | File | |
tssv/biometrics/background_model_TI | Absolute or relative to the program's working directory | ✘ | | File | |
idvoice | | ✘ | | Object | |
idvoice/biometrics | | ✘ | | Object | |
idvoice/biometrics/generated_models_path | Absolute or relative to the program's working directory | ✘ | | Path | |
idvoice/biometrics/background_model_TD | Absolute or relative to the program's working directory | ✘ | | File | |
idvoice/biometrics/background_model_TI | Absolute or relative to the program's working directory | ✘ | | File | |
s2c | | ✘ | | Object | |
s2c/speech_enhancement | | ✘ | | Object | |
s2c/speech_enhancement/speech_enhancers | | ✘ | | Object | |
s2c/speech_enhancement/speech_enhancers/<name> | Enhancer name. The name is completely free and will be used in the code to create the Enhancer object | ✘ | | Object | |
s2c/speech_enhancement/speech_enhancers/<name>/configuration | A JSON object that holds the Speech Enhancement configuration generated by vdk-studio | ✘ | | Object | |
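
The field names in the table read as slash-separated paths into the JSON document. As a rough illustration, the Python sketch below loads a small vasr fragment and resolves such paths. It assumes that each path segment maps to a nested JSON key and that a numeric segment such as /0 is an array index. The names rec and commands, the file names my-acmod.dat and commands.graph, and the helpers lookup and check_duration are hypothetical placeholders, not part of the VSDK.

```python
import json

# Hypothetical minimal configuration. "rec", "commands", "my-acmod.dat" and
# "commands.graph" are placeholders chosen for this sketch only.
CONFIG_TEXT = """
{
  "version": "2.0",
  "vasr": {
    "paths": {"data_root": ".", "acmod": "acmod", "graph": "graph"},
    "asr": {
      "recognizers": {
        "rec": {
          "acmods": ["my-acmod.dat"],
          "settings": {"min_speech_duration": 500, "min_silence_duration": 700}
        }
      },
      "models": {
        "commands": {"type": "static", "file": "commands.graph"}
      }
    }
  }
}
"""


def lookup(config, field):
    """Resolve a slash-separated table path, e.g. 'vasr/asr/recognizers/rec/acmods/0'.

    Assumption: segments are object keys, except inside arrays, where a
    segment is read as a zero-based index.
    """
    node = config
    for key in field.split("/"):
        node = node[int(key)] if isinstance(node, list) else node[key]
    return node


def check_duration(value, low=100, high=10_000):
    """Check the constraints the table documents for min_speech_duration and
    min_silence_duration: an integer in [low, high] that is a multiple of 100."""
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    return low <= value <= high and value % 100 == 0


config = json.loads(CONFIG_TEXT)
settings = lookup(config, "vasr/asr/recognizers/rec/settings")
for name in ("min_speech_duration", "min_silence_duration"):
    print(name, settings[name], "ok" if check_duration(settings[name]) else "invalid")
```

A check like this only covers the ranges listed in the table. The engine itself remains the authority on what it accepts.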