How-to: Create your grammar

Overview

This document describes how to write a grammar, which is used in voice recognition to specify the set of commands a voice recognition engine can recognize.

Grammars follow a syntax called BNF+, which is based on the Backus-Naur form (also called Backus normal form).

Use case example

The following is a working example of what a grammar can look like.
It uses some more advanced features that are explained later on.

#BNF+EM V2.1;

!grammar Home_Automation;
!start <main>;

// Knixhult is a Swedish lamp name pronounced as "nixult"
!pronounce "knixhult" PRONAS "nixult";

<main>: [<...>] <command> !repeat([<repeat_word>] <command>, 0, *) [<politeness>];

<command>: <verb> <device>;
<repeat_word>: and | "along with" | as well as;
<politeness>: please | if you may;

<verb>: turn (on | off | down | up) | raise | lower | open | close;
<device>: [<determiner>] (lamp | lamps | knixhult | oven | blinds);
<determiner>: a | an | the | all [the];

Using this grammar, you can recognize many sentences, including:

Turn on all lamps
Lower the blinds and turn on the lamp
Could you close the oven please
Turn up the knixhult

Basic grammar concepts

Header

In the use case example, the first line is #BNF+EM V2.1;. It is called the grammar’s header and must be the first thing you write, as it describes the format and the version used in the document.

The header must be the first line and cannot be preceded by any character of any kind.

The body of a grammar consists of a collection of statements (mainly rule statements), each ending with a semicolon ;.

Grammar statement

Example:
!grammar Home_Automation;

A grammar statement specifies the grammar’s name, which is Home_Automation in our example. It always starts with !grammar. Keywords starting with an exclamation mark ! are called directives and have a special meaning and role in a grammar. Other directives exist and are shown later on.

Every grammar file must contain a grammar statement.

Rule statement

Example:
<verb>: turn (on | off | down | up) | raise | lower | open | close;

A rule statement describes the actual content to be recognized (i.e., words). It is composed of the rule’s name enclosed in <>, followed by a colon : and a combination of terminals and non terminals, which are explained later. In our example, several rules are declared, such as main, verb, command, device, …

A rule statement cannot be cyclic: it cannot reference itself, either directly or indirectly.

Terminal

Example:
and | "along with" | as well as

In the previous rule statement’s example, the right-hand side of the colon : contains terminals. Those are words or groups of words that need to be spoken for the rule verb to be recognized by the engine (we will talk later about |, (), and other special characters).

Terminals can be written as is, or enclosed in double quotes " to group several words into a single terminal (which can have some impact on performance) or to include a special character.
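
As an illustration, the hypothetical rule below mixes both forms. Going by the lexer given at the end of this document, an unquoted terminal may only contain letters, digits, marks, dashes, connector punctuation, and the characters ' & and ., so a word containing anything else (here a slash) is assumed to require quotes. The rule name and the words are made up for the example.

<artist>: madonna | "daft punk" | "AC/DC";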

Non terminal

Example:
<verb> <device>;

Instead of using terminals to fill your rule statement, you can also use non terminals. Those are enclosed in <> and represent the content of the rule statement with the same name.

For example, <command>: <verb> <device>; is a rule statement with the name command, while verb and device are non terminals referencing other rule statements with the names verb and device declared elsewhere in the grammar. It means that for the engine to recognize the rule command, it must recognize the rule verb and then the rule device.

It is generally a good idea to factor terminals with the same meaning into a dedicated rule to better organize your grammar (see the device or verb rule statements in the use case example).
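
A small sketch of this advice, using made-up rule names: the first version repeats the same list of words in two rules, while the second moves them into a dedicated <room> rule that both commands reference. The two versions are alternatives and are not meant to coexist in one grammar.

// Without factoring: the list of rooms is duplicated
<light_on>: turn on the light in the (kitchen | bedroom | garage);
<light_off>: turn off the light in the (kitchen | bedroom | garage);

// With factoring: a single rule holds the shared terminals
<light_on>: turn on the light in the <room>;
<light_off>: turn off the light in the <room>;
<room>: kitchen | bedroom | garage;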

There are three reserved non terminals that have a special meaning:

<VOID>: This rule will not match anything. Inserting <VOID> into a sequence of grammar symbols automatically makes that sequence unspeakable.
<NULL>: This rule is automatically matched with an empty result. The optional symbol [] (explained later) in fact uses this rule internally:
<command>: I want [a] (dollar | pennies); is similar to <command>: I want (a | <NULL>) (dollar | pennies);.
<...>: This is the any speech rule. It will absorb any speech:
<command>: <...> turn on the light <...>; This rule will match, for example, “can you turn on the light please”. Even though it can match anything, each <...> is required to match at least one result, so the sentence “Can you please turn on the light” won’t get recognized because the trailing <...> has nothing left to match. To solve this problem, you can make this rule optional by surrounding it with [].

The <NULL> and <VOID> rules are almost never used, but <...> has many uses. Although it is tempting to add this rule almost everywhere to be resilient to whatever the user says, keep in mind that it significantly impacts performance and the confidence of results.
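
Following the description of <...> above, a sketch of the optional form could look like this. Both fillers are wrapped in [], so the sentence is accepted with or without extra speech on either side. The rule remains valid because the terminals turn on the light are mandatory.

<command>: [<...>] turn on the light [<...>];

With this version, “turn on the light”, “can you turn on the light please”, and “can you please turn on the light” should all be candidates for recognition.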

Special characters

| The pipe creates an alternative between its left and right hand side
please | if you may will recognize either please or if you may
() Parentheses will group terminals and/or non terminals together so they can’t be separated
turn (on | off | down | up) applies the | only on on, off, down and up. The recognition possibilities will then be turn on, turn off, turn down or turn up
[] Square brackets will turn their content into optional items
[<determiner>] (lamp | lamps | knixhult | oven | blinds) allows the engine to recognize a device either with or without determiner before it

[] is a shorter alias for the directive !optional(...)

A grammar cannot have empty paths. It means that every rule must lead to at least one terminal.

For example, the following grammar is not valid:
!start <main>;
<main>: [<singular>] | [<plural>];

Even though singular and plural can be non-empty, both of them are optional, so the rule main can be composed of nothing and is thus invalid.
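
One way to make this grammar valid is to drop the square brackets so that every path through <main> contains at least one terminal. The sketch below assumes that <singular> and <plural> are defined as shown, since their definitions are not given in the example above.

!start <main>;
<main>: <singular> | <plural>;
<singular>: lamp;
<plural>: lamps;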

Start statement

Example:
!start <main>;

The second statement of our use case example is the start statement. It represents the entry point of the recognition.

Every grammar must contain a start statement, and a start statement is not limited to one rule. You can declare as many start rules as you want by listing them like this:

!start <rule_1> <rule_2> <rule_3> ...;

A rule that is neither a start rule nor reachable from one, directly or indirectly, will never be recognized by the engine.
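
As a minimal sketch tying the basic concepts together, the grammar below contains only the mandatory parts: the header, a grammar statement, a start statement, and the rules it references. The grammar name Minimal_Example and the rule names are made up for illustration.

#BNF+EM V2.1;

!grammar Minimal_Example;
// Two entry points: either rule can be recognized on its own
!start <greeting> <farewell>;

<greeting>: hello | good morning;
<farewell>: goodbye | see you [later];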

Advanced features

Earlier, we talked about directives. Some more advanced ones also exist and allow you to customize your grammar in more depth.

Slot

Example:
!slot <contact>;

Writing down all the words and sentences you want your engine to recognize is nice, but what if you don’t know everything at the time of writing the grammar?

The slot directive will allow you to mark a rule as “to be determined later” so you don’t have to implement it anywhere in the actual grammar. You can see slot rules as “holes” that will be filled later on, programmatically.

You can find more details on how to use slots programmatically by looking at the dynamic_grammar sample program and in the following section: Voice Recognition - C++ | Dynamic Models

When filled programmatically, slots only accept terminals and custom pronunciations. You can fill a slot with multiple terminals, which are interpreted as alternatives.

A slot statement is not limited to one rule name. Several rules can be specified per statement:

!slot <rule_1> <rule_2> <rule_3> ...;
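
As a sketch of how a slot might be used inside a grammar, the hypothetical example below declares <contact> as a slot and references it from a regular rule. No rule statement is written for <contact>: its terminals are expected to be supplied programmatically before recognition. The grammar name and the <call> rule are assumptions made for the example.

#BNF+EM V2.1;

!grammar Phone_Call;
!start <call>;
!slot <contact>;

// <contact> has no rule statement: it is filled at runtime
<call>: (call | phone | dial) <contact>;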

Pronunciation (phonetic)

Example (statement):
!pronounce "knixhult" PRONAS "nixult";
Example (directive):
<device>: lamp | oven | knixhult !pronounce(PRONAS "nixult") | blinds;

Before being able to recognize a word, the engine has to know how that word is pronounced. To do that, it uses a grapheme-to-phoneme (G2P) algorithm to determine which phonemes (the smallest units of sound) compose the word.
The automatic transcription can sometimes give a wrong result (a different accent in a language, a custom unknown word, etc.).
Fortunately, to help the engine, you can tell it how words are pronounced.

You have two different ways of doing it:

Using a pronounce statement: this affects the word globally in the grammar (every occurrence).
The format is the following: !pronounce "my_word" [alphabet] "my_transcription";
Using a pronounce directive: this affects the word locally, for that single instance only.
The format is the following: "my_word" !pronounce([alphabet] "my_transcription").

All supported alphabets for every supported language can be found here. If no alphabet is specified, the Kirshenbaum (ASCII-IPA) alphabet is selected.
Instead of an alphabet, you can also choose PRONAS, which indicates that the word is pronounced like the one written after that keyword (no phonetics involved).

When a local pronounce directive is used on a word, it overrides the global statement, if one exists.

You can give several pronunciations for the same word by chaining them like this:
!pronounce "word" PRONAS "pronunciation_1" | PRONAS "pronunciation_2";
<rule_name>: word !pronounce(PRONAS "pronunciation_1" | PRONAS "pronunciation_2");

Repeat

Example:
<command>: <action> !repeat( "and" <action>, 0, *);

Sometimes, you want to be able to say the same thing several times in a row. Writing it out four times just to allow four repetitions would be tedious. Instead, you can use the !repeat directive.

Its format is the following:
!repeat(X, <min>, <max>)

The first parameter X represents any expression that you could have used normally in a rule (even a nested repeat).

The second parameter <min> is the minimum number of times X can be repeated. It must be zero or more.

The third and last parameter <max> is the maximum number of times X can be repeated. It must be one or more. You can also set it to the special value *, which means an unlimited number of times.

You can shorten the repeat directive for two special cases: !repeat(X, 0, *) and !repeat(X, 1, *).
The first one can be replaced by X* and the second one by X+.
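
To illustrate, the first two rules below are intended to be equivalent ways of accepting one action followed by any number of additional actions introduced by and. The rule names are hypothetical, <action> is assumed to be defined elsewhere in the grammar, and the shorthand is applied to a parenthesized group so that it covers the whole sequence.

<command_a>: <action> !repeat(and <action>, 0, *);
<command_b>: <action> (and <action>)*;

// A bounded variant: at most three additional actions
<command_c>: <action> !repeat(and <action>, 0, 3);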

Tag

Example:
<command>: !tag(ACTION_ON, turn on | activate | light) <device>;

Processing voice recognition results can sometimes be a little messy. Say you are building a home automation assistant: when you say turn on the light, you want to run a script that sends a command to the device to turn it on.
Now say you want several different sentences to trigger this task: you have to check for every one of them, only to run the same process in the end.
Going a bit further, imagine an assistant that understands multiple languages. You now have to process those same sentences for each language.

Instead of checking for every possibility, you can use the tagging directive to tell the engine to output a specific tag, here ACTION_ON, whenever one of the tagged verbs is recognized. Your grammars for other languages can do the same with the same tag name, and you then have only one thing to check in your process: the ACTION_ON tag.

This grammar feature is really useful, as it simulates NLU processing at the grammar level instead of on the recognized text.
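
A sketch of the multi-phrasing case described above, using hypothetical rule names and a second tag, ACTION_OFF, that is not part of the original example. Each group of verbs is wrapped in its own tag, so the application only has to look at the tag rather than at the exact words spoken.

<command>: <action> <device>;
<action>: !tag(ACTION_ON, turn on | switch on | activate)
        | !tag(ACTION_OFF, turn off | switch off | deactivate);
<device>: [the] (light | lamp | oven);

A grammar for another language could reuse the same tag names, for example in French (the verbs below are only an illustration):

<action>: !tag(ACTION_ON, allume | active)
        | !tag(ACTION_OFF, éteins | désactive);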

Grammar syntax formalism

The following sections describe the exact syntax of a BNF+ grammar as used in VDK Studio.

The lexer defines all usable characters and categorizes them into tokens to be used by the parser.

Lexer

NEWLINE
        : ('\r\n' | [\r\n] ) -> skip
        ;

WHITE_SPACE
        : [\u{0009}\u{000d}\u{000a}\u{0020}\u{00a0}] -> skip
        ;

// Just white space between letters, no tabs or new lines
SIMPLE_SPACE
        : [\u{0009}\u{0020}\u{00a0}] -> skip
        ;

COMMENT
        : ('/*' (COMMENT|.)*? '*/') -> skip
        ;
LINE_COMMENT
        : ('//' .*? '\n') -> skip
        ;


            // ------ Fragments ------ //


fragment ESCAPE_SEQUENCE
        : '\\'('n' | 'r' | 't' | 'v' | '\\' | '"')
        ;

fragment DIGIT
        : [0-9]
        ;

fragment HEX_DIGIT
        : DIGIT | [a-fA-F]
        ;

fragment BASE64_SYMBOL
        : [a-zA-Z+/=]
        ;

fragment UNICODE_CHAR
        : '\\u'HEX_DIGIT HEX_DIGIT HEX_DIGIT HEX_DIGIT
        ;

// Letters
fragment UNICHAR_L
        : [\p{L}]
        ;

// Marks
fragment UNICHAR_M
        : [\p{M}]
        ;

// Numbers
fragment UNICHAR_N
        : [\p{N}]
        ;

// Punctuations
fragment UNICHAR_P
        : [\p{P}]
        ;

// Symbols
fragment UNICHAR_S
        : [\p{S}]
        ;

// Separator
fragment UNICHAR_Z
        : [\p{Z}]
        ;

// Connector punctuations
fragment UNICHAR_PC
        : [\p{Pc}]
        ;

// Dash punctuations
fragment UNICHAR_PD
        : [\p{Pd}]
        ;

fragment QUOTED_STRING_CHAR
        : (UNICHAR_L | UNICHAR_M | UNICHAR_N | UNICHAR_P | UNICHAR_S | UNICHAR_Z)
            { !endsWithBackslashOrDoubleQuote() }?
        ;

//  ASCII + Latin-1 without control chars and without < and >
fragment NON_TERMINAL_CHAR
        : ([\u{0020}-\u{003b}] | [\u{003f}-\u{007e}] | [\u{00a0}-\u{00ff}] | [\u{003d}])
        ;

fragment TERMINAL_LETTER
        : (['&.] | UNICHAR_L | UNICHAR_M | UNICHAR_N | UNICHAR_PC | UNICHAR_PD)
        ;

            // ------ Tokens ------ //

// A quoted string can contain any character that is not a control character.
// The characters " and \ must be escaped inside a quoted string.
QUOTED_STRING
        : '"' (QUOTED_STRING_CHAR | ESCAPE_SEQUENCE | UNICODE_CHAR)*? '"'
        ;

INTEGER
        : [-]?DIGIT+
        | '0x' HEX_DIGIT+
        | [aA]'64' BASE64_SYMBOL+
        ;

// A non-terminal can contain any character that is not a control character.
// The exception is the use of < and >.
// There is no escape mechanism for these characters.
NON_TERMINAL
        : '<' NON_TERMINAL_CHAR+ '>'
        ;

// A terminal primarily consists of alphanumeric characters.
TERMINAL
        : (TERMINAL_LETTER | DIGIT)* TERMINAL_LETTER (TERMINAL_LETTER | DIGIT)*
        ;

GRAMMAR_KEYWORD
        : '!grammar'
        ;

START_KEYWORD
        : '!start'
        ;

SLOT_KEYWORD
        : '!slot'
        ;

PRONOUNCE_KEYWORD
        : '!pronounce'
        ;

PRONOUNCE_OPEN
        : PRONOUNCE_KEYWORD SIMPLE_SPACE* '('
        ;

OPTIONAL_OPEN
        : '!optional' SIMPLE_SPACE* '('
        ;

REPEAT_OPEN
        : '!repeat' SIMPLE_SPACE* '('
        ;

TAG_OPEN
        : '!tag' SIMPLE_SPACE* '('
        ;

HEADER
        : '#BNF+EM' WHITE_SPACE* 'V' DIGIT+ '.' DIGIT+ ';'
        ;

OPEN_BRACKET
        : '{' -> pushMode(NLU)
        ;

OPEN_SQUARE_BRACKET
        : '['
        ;

CLOSE_SQUARE_BRACKET
        : ']'
        ;

OPEN_PARENTHESIS
        : '('
        ;

CLOSE_PARENTHESIS
        : ')'
        ;

COLON
        : ':'
        ;

PIPE
        : '|'
        ;

PLUS
        : '+'
        ;

STAR
        : '*'
        ;

COMMA
        : ','
        ;

SEMI_COLON
        : ';'
        ;

The parser defines the order in which the lexer’s tokens may appear.

Parser

main
        : header statement* EOF
        ;

header
        : HEADER
        ;

statement
        : startStatement
        | slotStatement
        | grammarStatement
        | pronounceStatement
        | ruleStatement
        ;

grammarStatement
        : GRAMMAR_KEYWORD (QUOTED_STRING | TERMINAL) SEMI_COLON
        ;

startStatement
        : START_KEYWORD ruleList SEMI_COLON
        ;

slotStatement
        : SLOT_KEYWORD ruleList SEMI_COLON
        ;

pronounceStatement
        : PRONOUNCE_KEYWORD (TERMINAL | QUOTED_STRING | INTEGER) transcriptionList SEMI_COLON
        ;

ruleList
        : NON_TERMINAL+
        ;

transcriptionList
        : transcription (PIPE transcription)*
        ;

transcription
        : TERMINAL? QUOTED_STRING
        ;

ruleStatement
        : NON_TERMINAL COLON expression exportAttachment? SEMI_COLON
        ;

expression
        : alternativeList (PIPE alternativeList)* extensionAttachment?
        ;

kleeneModifier
        : PLUS
        | STAR
        ;

alternativeList
        : alternative+
        ;

alternative
        : factor kleeneModifier*
        ;

factor
        : (TERMINAL | QUOTED_STRING | INTEGER) terminalModifier*    # factorTerminal
        | NON_TERMINAL importAttachment?                            # factorNonTerminal
        | OPEN_PARENTHESIS expression CLOSE_PARENTHESIS             # groupExpression
        | OPEN_SQUARE_BRACKET expression CLOSE_SQUARE_BRACKET       # optionalExpression
        | OPTIONAL_OPEN expression CLOSE_PARENTHESIS                # optionalDirective
        | REPEAT_OPEN expression COMMA repeatBody CLOSE_PARENTHESIS # repeatDirective
        | TAG_OPEN tag COMMA expression CLOSE_PARENTHESIS           # tagDirective
        ;

repeatBody
        : INTEGER (COMMA (INTEGER | STAR))?  # repeatRange
        | STAR                               # repeatAtLeastZero
        | PLUS                               # repeatAtLeastOne
        ;

terminalModifier
        : PRONOUNCE_OPEN transcriptionList CLOSE_PARENTHESIS
        ;

tag
        : TERMINAL
        ;