How-to: Create your grammar
Overview
This document describes how to write a grammar, which specifies the set of commands a voice recognition engine will be able to recognize.
Grammars follow a syntax called BNF+, which is based on the Backus-Naur form (also called Backus normal form).
Use case example
The following is a working example of what a grammar can look like.
That grammar uses some more advanced features that will be explained later on.
#BNF+EM V2.1;
!grammar Home_Automation;
!start <main>;
// Knixhult is a Swedish lamp name pronounced like "nixult"
!pronounce "knixhult" PRONAS "nixult";
<main>: <...> <command> !repeat([<repeat_word>] <command>, 0, *) [<politeness>];
<command>: <verb> <device>;
<repeat_word>: and | "along with" | as well as;
<politeness>: please | if you may;
<verb>: turn (on | off | down | up) | raise | lower | open | close;
<device>: [<determiner>] (lamp | lamps | knixhult | oven | blinds);
<determiner>: a | an | the | all [the];
Using this grammar, you can recognize many sentences, including:
Turn on all lamps
Lower the blinds and turn on the lamp
Could you close the oven please
Turn up the knixhult
Basic grammar concepts
Header
In the use case example, the first line is #BNF+EM V2.1;. It is called the grammar’s header and must be the first thing you write, as it declares the format and the version used in the document.
The header must be the first line and cannot be preceded by any character of any sort.
The body of a grammar consists of a collection of statements (mainly rule statements), each ending with a semicolon (;).
Grammar statement
Example:
!grammar Home_Automation;
A grammar statement specifies the grammar’s name, which is Home_Automation in our example. It always starts with !grammar. Keywords starting with an exclamation mark (!) are called directives and have a special meaning and role in a grammar. Other directives exist and are shown later on.
Every grammar file must contain a grammar statement.
Rule statement
Example:
<verb>: turn (on | off | down | up) | raise | lower | open | close;
A rule statement describes the actual content to be recognized (i.e. words). It is composed of the rule’s name enclosed in <>, followed by a colon (:) and a combination of terminals and non terminals, which are explained later. In our example, several rules are declared, such as main, verb, command, device, …
A rule statement cannot be cyclic (reference itself), either directly or indirectly.
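The no-cycle constraint can be checked mechanically before a grammar is handed to the engine. The Python sketch below is only an illustration and not part of any SDK: it assumes a hypothetical model in which each rule name maps to the list of rule names it references, and find_cycle is a name made up for this example.

```python
def find_cycle(rules):
    """Return the rule names forming a cycle, or None if there is none.

    `rules` is a hypothetical model: rule name -> names of the rules it references.
    """
    in_progress, done = set(), set()

    def visit(name, path):
        in_progress.add(name)
        path.append(name)
        for ref in rules.get(name, ()):
            if ref in in_progress:  # we came back to a rule still being expanded
                return path[path.index(ref):] + [ref]
            if ref not in done:
                cycle = visit(ref, path)
                if cycle:
                    return cycle
        path.pop()
        in_progress.discard(name)
        done.add(name)
        return None

    for name in rules:
        if name not in done:
            cycle = visit(name, [])
            if cycle:
                return cycle
    return None


# References taken from the use case example: no rule leads back to itself.
home_automation = {
    "main": ["command", "repeat_word", "politeness"],
    "command": ["verb", "device"],
    "device": ["determiner"],
    "repeat_word": [], "politeness": [], "verb": [], "determiner": [],
}
# An indirect cycle: command -> device -> command.
broken = {"command": ["device"], "device": ["command"]}

print(find_cycle(home_automation))
print(find_cycle(broken))
```

A direct cycle (a rule referencing itself) is reported the same way, as a path that starts and ends with the same rule name.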
Terminal
Example:
and | "along with" | as well as
In the previous rule statement’s example, on the right hand side of the colon (:), there are terminals. Those are words or groups of words that need to be spoken in order for the rule verb to be recognized by the engine (we will talk later about |, (), and other special characters).
Terminals can be written as is, or enclosed in double quotes (") to group several words (which can have some impact on performance) or when they contain a special character.
Non terminal
Example:
<verb> <device>;
Instead of using terminals to fill your rule statement, you can also use non terminals. Those are enclosed in <> and represent the content of the corresponding rule statement with the same name.
For example, <command>: <verb> <device>; is a rule statement with the name command, while verb and device are non terminals referencing other rule statements with the names verb and device declared elsewhere in the grammar. It means that in order for the engine to recognize the rule command, it needs to recognize the rule verb and then the rule device.
It is generally a good idea to factor terminals with the same meaning into a dedicated rule to organize your grammar better (see the device or verb rule statements from the use case example).
There are three reserved non terminals that have a special meaning:
<VOID>: This rule will not match anything. Inserting <VOID> into a sequence of grammar symbols automatically makes that sequence unspeakable.
<NULL>: This rule is automatically matched with an empty result. The optional symbol [] (explained later) in fact uses this rule internally: <command>: I want [a] (dollar | pennies); is similar to <command>: I want (a | <NULL>) (dollar | pennies);.
<...>: This is the any-speech rule. It will absorb any speech: <command>: <...> turn on the light <...>; This rule will match, for example, “can you turn on the light please”. Even though it can match anything, each <...> must match at least something, so the sentence “Can you please turn on the light” won’t be recognized because the last <...> has nothing to match. To solve this problem, you can make this rule optional by surrounding it with [].
The <NULL> and <VOID> rules are almost never used, but the <...> rule has many uses. Even though you may be tempted to add this rule almost everywhere to be resilient to what the user says, keep in mind that it significantly impacts performance and the confidence of results.
Special characters
|
The pipe creates an alternative between its left and right hand sides.
please | if you may will recognize either please or if you may.
()
Parentheses group terminals and/or non terminals together so they can’t be separated.
turn (on | off | down | up) applies the | only to on, off, down and up. The recognition possibilities are then turn on, turn off, turn down or turn up.
[]
Square brackets turn their content into optional items.
[<determiner>] (lamp | lamps | knixhult | oven | blinds) allows the engine to recognize a device either with or without a determiner before it.
[] is a shorter alias for the directive !optional(...).
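To see how |, () and [] combine, it can help to enumerate the sentences an expression allows. The following Python sketch is only an illustration under a made-up representation (nested tuples tagged "alt", "seq" and "opt", with plain strings as terminals); it does not reflect how the engine works internally.

```python
from itertools import product


def expand(node):
    """Return every sentence (as a string) that the expression can produce."""
    if isinstance(node, str):  # a terminal, possibly made of several words
        return [node]
    kind, *items = node
    if kind == "alt":  # a | b
        return [sentence for item in items for sentence in expand(item)]
    if kind == "opt":  # [a]: either nothing or a
        return [""] + expand(items[0])
    if kind == "seq":  # a b: one choice from each item, in order
        combos = product(*(expand(item) for item in items))
        return [" ".join(word for word in combo if word) for combo in combos]
    raise ValueError(f"unknown node kind: {kind}")


# <verb>: turn (on | off | down | up) | raise | lower | open | close;
verb = ("alt", ("seq", "turn", ("alt", "on", "off", "down", "up")),
        "raise", "lower", "open", "close")
# A shortened device rule: [the] (lamp | oven)
device = ("seq", ("opt", "the"), ("alt", "lamp", "oven"))

print(expand(verb))
print(expand(device))
```

The parentheses in the verb rule are what keep turn attached to each of on, off, down and up, while raise, lower, open and close remain standalone alternatives.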
A grammar cannot have empty paths: every rule must lead to at least one terminal.
For example, the following grammar is not valid:
!start <main>;
<main>: [<singular>] | [<plural>];
Even though singular and plural can be non empty, both of them are optional, so the rule main can be composed of nothing and is thus invalid.
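The empty-path problem can be spotted by asking whether a rule is able to match nothing at all. The Python sketch below is a rough illustration under stated assumptions: expressions use a made-up nested-tuple representation ("alt", "seq", "opt", "ref"), the grammar is acyclic, and the check mirrors the explanation above rather than the engine's actual validation.

```python
def can_be_empty(node, rules):
    """Return True if the expression can match without any terminal being spoken."""
    if isinstance(node, str):  # a terminal always requires speech
        return False
    kind, *items = node
    if kind == "opt":  # [x] may be skipped entirely
        return True
    if kind == "ref":  # <name>: look at the referenced rule
        return can_be_empty(rules[items[0]], rules)
    if kind == "alt":  # one empty alternative is enough
        return any(can_be_empty(item, rules) for item in items)
    if kind == "seq":  # every item must be skippable
        return all(can_be_empty(item, rules) for item in items)
    raise ValueError(f"unknown node kind: {kind}")


# <main>: [<singular>] | [<plural>];   (invalid: both alternatives are optional)
invalid = {
    "main": ("alt", ("opt", ("ref", "singular")), ("opt", ("ref", "plural"))),
    "singular": "lamp",
    "plural": "lamps",
}
# <main>: <singular> | <plural>;       (valid: a terminal is always required)
valid = dict(invalid, main=("alt", ("ref", "singular"), ("ref", "plural")))

print(can_be_empty(invalid["main"], invalid))
print(can_be_empty(valid["main"], valid))
```

An optional item inside a sequence is fine as long as another item in that sequence always requires a terminal, as in [<determiner>] (lamp | lamps | knixhult | oven | blinds).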
Start statement
Example:
!start <main>;
The second statement of our use case example is the start statement and represents the entry point of the recognition.
Every grammar must contain a start statement, but it is not limited to one start rule. You can have as many start rules as you want by chaining them like this:
!start <rule_1> <rule_2> <rule_3> ...;
A rule that is neither a start rule nor referenced, directly or indirectly, by one will never be recognized by the engine.
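Rules that can never be recognized can be found by walking the references from the start rules. The Python sketch below is a minimal illustration, assuming a hypothetical model in which each rule name maps to the names of the rules it references; unreachable_rules and the unused rule are made up for this example.

```python
def unreachable_rules(rules, start_rules):
    """Return the sorted names of rules that no start rule leads to."""
    seen = set()
    stack = list(start_rules)
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        stack.extend(rules.get(name, ()))  # follow every reference of this rule
    return sorted(set(rules) - seen)


# References from the use case example, plus a hypothetical rule nobody uses.
rules = {
    "main": ["command", "repeat_word", "politeness"],
    "command": ["verb", "device"],
    "device": ["determiner"],
    "repeat_word": [], "politeness": [], "verb": [], "determiner": [],
    "unused": [],
}

print(unreachable_rules(rules, ["main"]))
```

With !start <main>; every rule of the use case example is reachable, so only the hypothetical unused rule is reported.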
Advanced features
Earlier, we talked about directives. Some more advanced ones also exist and allow you to customize your grammar in more depth.
Slot
Example:
!slot <contact>;
Writing down all the words and sentences you want your engine to recognize is nice, but what if you don’t know everything at the time of writing the grammar?
The slot directive will allow you to mark a rule as “to be determined later” so you don’t have to implement it anywhere in the actual grammar. You can see slot rules as “holes” that will be filled later on, programmatically.
You can find more details on how to program the use of slots by looking at the dynamic_grammar sample program and reading the following section: Voice Recognition - C++ | Dynamic Models.
Slots only allow terminals and custom pronunciations when filled programmatically. You can fill a slot with multiple terminals; they will be interpreted as alternatives.
A slot statement is not limited to one rule name. Several rules can be specified per statement:
!slot <rule_1> <rule_2> <rule_3> ...;
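The idea of a slot being a hole filled with alternatives can be pictured with a small model. The Python sketch below is purely illustrative: fill_slot and sentences are made-up helpers, the contact names are invented, and nothing here corresponds to the actual SDK calls (see the dynamic_grammar sample for those).

```python
def fill_slot(slots, name, terminals):
    """Record the terminals for a slot; several terminals act as alternatives."""
    slots[name] = list(terminals)


def sentences(prefix, slot_name, slots):
    """List what a rule like `<command>: call <contact>;` could recognize."""
    return [f"{prefix} {terminal}" for terminal in slots[slot_name]]


slots = {}
# At runtime, for example once the user's address book is known:
fill_slot(slots, "contact", ["alice", "bob martin"])

print(sentences("call", "contact", slots))
```

Filling the slot again with a different list would change what the rule recognizes without touching the grammar text itself.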
Pronunciation (phonetic)
Example (statement):
!pronounce "knixhult" PRONAS "nixult";
Example (directive):
<device>: lamp | oven | !pronounce(PRONAS "nixult") knixhult | blinds;
Before being able to recognize a word, the engine will have to be aware of how each word is pronounced. To do that, it uses a grapheme to phoneme (G2P) algorithm to know which phonemes (smallest unit of sound) compose those words.
It can sometimes happen that the automatic transcription gives a wrong result (different accent in a language, custom unknown word, etc…).
Fortunately, to help the engine, you can tell it how words are pronounced.
You have two different ways of doing it:
Using a pronounce statement: You will affect that word globally in the grammar (every occurrence).
The format is the following: !pronounce "my_word" [alphabet] "my_transcription".
Using a pronounce directive: You will affect that word locally only, for this unique instance.
The format is the following: "my_word" !pronounce([alphabet] "my_transcription").
All supported alphabets for every supported language can be found here. If no alphabet is specified, the Kirshenbaum (ASCII-IPA) alphabet will be selected.
Instead of an alphabet, you can also choose PRONAS, which indicates that the word is pronounced like the one written after that keyword (no phonetics involved).
When a local pronounce directive is used on a word, it overrides the global one if it exists.
You can give several pronunciations for the same word by chaining them like this:
!pronounce "word" PRONAS "pronunciation_1" | PRONAS "pronunciation_2";
<rule_name>: word !pronounce(PRONAS "pronunciation_1" | PRONAS "pronunciation_2");
Repeat
Example:
<command>: <action> !repeat( "and" <action>, 0, *);
Sometimes, you want to be able to say things several times in a row. It would be really annoying to write an expression four times just to allow it to be said four times. Instead, you can use the !repeat directive.
Its format is the following: !repeat(X, <min>, <max>)
The first parameter X represents any expression that you could have used normally in a rule (even a nested repeat).
The second parameter <min> is the minimum number of times you can repeat X. It must be zero or more.
The third and last parameter <max> is the maximum number of times you can repeat X. It must be one or more. You can also set it to the special value *, which means an infinite number of times.
You can shorten the repeat directive for two special cases: !repeat(X, 0, *) and !repeat(X, 1, *).
The first one can be replaced by X* and the second one by X+.
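The effect of the min and max parameters can be pictured by listing what a repeat allows. The Python sketch below is only an illustration: expand_repeat is a made-up helper, X is given as a list of phrases, and since * cannot be enumerated it is replaced by a hypothetical cap. Rejecting a max lower than min is an assumption, not something stated above.

```python
from itertools import product


def expand_repeat(options, minimum, maximum, cap=3):
    """List the sentences allowed by !repeat(X, minimum, maximum).

    `options` are the phrases X can produce; "*" is enumerated up to `cap`.
    """
    if minimum < 0:
        raise ValueError("min must be zero or more")
    if maximum == "*":
        maximum = max(minimum, cap)  # stand-in for "infinite"
    elif maximum < 1:
        raise ValueError("max must be one or more")
    if maximum < minimum:  # assumption: max below min makes no sense
        raise ValueError("max must not be lower than min")
    results = []
    for count in range(minimum, maximum + 1):
        for combo in product(options, repeat=count):
            results.append(" ".join(combo))
    return results


print(expand_repeat(["and on"], 0, 2))       # zero, one or two repetitions
print(expand_repeat(["x"], 1, "*", cap=2))   # X+ enumerated up to the cap
```

With a minimum of 0, the empty string appears in the list, which corresponds to X not being said at all.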
Tag
Example:
<command>: !tag(ACTION_ON, turn on | activate | light) <device>;
Processing voice recognition results can sometimes be a little bit messy. Let’s say you want to build a home automation assistant and, when you say turn on the light, you want to run a script that sends a command to the device to turn it on.
Now let’s say you want several different sentences to trigger this task: you will have to check for every one of those sentences, only to run the same process in the end.
Let’s go a bit further and imagine we want an assistant that understands multiple languages. You now have to process those same sentences for several languages.
Instead of having to check for every possibility, you can use the tag directive to tell the engine to output the specific tag ACTION_ON whenever one of the tagged verbs is recognized. You can also have the grammars for your other languages do the same with the same tag name, and congratulations, you now have only one thing to check in your process: the ACTION_ON tag!
This grammar feature is really useful as it simulates NLU processing but on a grammar instead of a text.
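On the application side, handling a tag instead of the raw text could look like the following Python sketch. It is only an illustration: the shape of a recognition result (text plus a list of tag names), the French sentence and the handler name are all assumptions made for this example, not the engine's actual output format.

```python
def actions_for(tags, handlers):
    """Return the actions triggered by the tags of one recognition result."""
    return [handlers[tag] for tag in tags if tag in handlers]


# One entry per tag, regardless of wording or language.
HANDLERS = {"ACTION_ON": "switch_device_on"}

# Hypothetical results as (recognized text, emitted tags) pairs.
results = [
    ("turn on the lamp", ["ACTION_ON"]),
    ("activate the oven", ["ACTION_ON"]),
    ("allume la lampe", ["ACTION_ON"]),  # from a French grammar using the same tag
]

triggered = [actions_for(tags, HANDLERS) for _text, tags in results]
print(triggered)
```

The application never inspects the recognized text here: adding a new phrasing or a new language only requires tagging it in the grammar.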
Grammar syntax formalism
The following documents describe the exact syntax of a BNF+ grammar as used in the VDK Studio.
The lexer defines all usable characters and categorizes them into tokens for the parser.
The parser defines the order in which the lexer’s tokens may appear.