How-to: Create a good NLU model
Machine Learning is not Magic!
While Natural Language Understanding (NLU) has made great strides in recent years and has been able to achieve impressive accuracy rates, it is important to note that it is not a magical solution that can understand any form of language input.
The performance of ML models is still dependent on the training data used. That means that if you use “bad” data you will have “bad” results even if you have an immaculate model. On the other hand, if you use a “weak” model combined with “high quality” data, you would be surprised by the results. That is why data scientists often spend more than 70% of their time on data processing.
A prevalent error in creating data is prioritizing quantity over quality. Many resort to automated tools that generate training examples rapidly, resulting in a large dataset. However, the generated data may be of lower quality and may not accurately reflect the complexity and nuances of real use cases. Instead, it is essential to focus on creating high-quality data, even if it means having a smaller dataset, to ensure the best performance of your model. Build your data up gradually, over time and through experiments.
This guideline provides a bag of tricks that you need to build high quality NLU data from scratch. By the end, we expect to obtain an accurate NLU model that can be deployed on offline embedded devices. We target a real use case: Smart Offline Cooker.
Before starting, here is the first tip: Avoid improvisation! You should carefully prepare your data and anticipate your use case even if it is a simple one. Spending time on designing data will not only help you to better understand your data, but also save you countless hours of debugging.
Device
Since it will hold and run your model, verify that the device setup is compatible with the expected model footprint. If the device does not have enough memory, then the model will not generate any results.
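A minimal sketch of that compatibility check, assuming the model footprint and the memory the device can give to it are both known in bytes. The function name `fits_on_device`, the 20% headroom and the sizes below are illustrative assumptions, not values from a real device.

```python
MB = 1024 * 1024


def fits_on_device(model_bytes, available_bytes, headroom=0.2):
    """Return True if the model fits in memory while keeping a safety margin.

    headroom is the fraction of available memory kept free for the runtime
    (an assumed value; tune it for your own device).
    """
    usable = available_bytes * (1.0 - headroom)
    return model_bytes <= usable


# Hypothetical sizes: a 7 MB model on devices with 8 MB and 16 MB available.
small_device_ok = fits_on_device(7 * MB, 8 * MB)
large_device_ok = fits_on_device(7 * MB, 16 * MB)
print(small_device_ok, large_device_ok)
```

Running this kind of check before training helps you avoid designing a model that the target device cannot load.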
Intents
These are the actions that the user wants to accomplish with the device. Any user should be able to distinguish them easily without confusion. “start” and “stop” are good Intents because they are different and clear. If you keep these two, avoid defining “begin”, “activate”, or similar intents in addition, because not only your model but also humans will confuse them with “start”. Even “turn-on” is to be avoided, for the same reason.
Gather as much information as possible from the use case specification, draw a table containing all your expected actions, and transform them into intents.
You might assume the best way to design data is to increase the number of Intents. No! Keep your Intents general and avoid detailed ones like “increase_heat_maximum” or “increase_heat_medium”; merge them into a single Intent: “increase_heat”.
Always include an out-of-scope Intent. If you only have “start” and “stop” Intents, then the model will always provide one of them as Intent, even if the user command is “hello world”. Here, the intent “None” will include what the model should not handle/recognize.
Action | Intent |
---|---|
We can start the cooker | start |
We can stop the cooker | stop |
We can increase the cooker’s heat | increase_heat |
We can decrease the cooker’s heat | decrease_heat |
We can lock the cooker’s security | lock_security |
We can unlock the cooker’s security | unlock_security |
We can ask the cooker how long it still needs | tell_remaining_time |
Everything else we don’t want to handle | None |
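The intent design can also be written down in code, so that it is checked every time the data changes. The sketch below is one possible representation, using the intent names from this guide; the `Intent` enum and the two helper functions are hypothetical names introduced for illustration.

```python
from enum import Enum


class Intent(Enum):
    """Intents of the Smart Offline Cooker, one member per expected action."""
    START = "start"
    STOP = "stop"
    INCREASE_HEAT = "increase_heat"
    DECREASE_HEAT = "decrease_heat"
    LOCK_SECURITY = "lock_security"
    UNLOCK_SECURITY = "unlock_security"
    TELL_REMAINING_TIME = "tell_remaining_time"
    NONE = "None"  # out-of-scope: everything the cooker should not handle


# Names this guide advises against because users confuse them with "start".
NEAR_SYNONYMS_OF_START = {"begin", "activate", "turn-on"}


def confusable_intents(names):
    """Return the proposed intent names that duplicate the "start" intent."""
    return sorted(set(names) & NEAR_SYNONYMS_OF_START)


def has_out_of_scope(names):
    """Check that the out-of-scope intent is part of the design."""
    return Intent.NONE.value in names


names = [intent.value for intent in Intent]
print(confusable_intents(names))              # nothing to merge in this design
print(confusable_intents(names + ["begin"]))  # "begin" would clash with "start"
print(has_out_of_scope(names))
```

Keeping the list of intents in one place makes it harder to add a near-duplicate intent by accident later on.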
Slots
If you expect only Intents from your model, you can skip this section, because slots are optional.
Otherwise, remember that slots are the information that your device needs for the action (intent).
For example, if you have the intent “increase_heat”, then your cooker should be able to increase the heat to a precise value. In “Increase the heat to 5”, “5” must be recognized as the slot “heat_level”. Choose relevant slot names. At this stage, in order to define your potential slots, just focus on the categories of slots and ignore what values they would have. You will come back to this later.
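To make the idea of a slot concrete, here is a minimal sketch of how an annotated example could be stored. The `Example` and `SlotAnnotation` classes and the character-offset format are assumptions made for illustration; your NLU toolkit will have its own annotation format.

```python
from dataclasses import dataclass


@dataclass
class SlotAnnotation:
    slot: str   # slot category, e.g. "heat_level"
    start: int  # index of the first character of the value
    end: int    # index just past the last character of the value


@dataclass
class Example:
    text: str
    intent: str
    slots: list


def annotate(text, intent, values):
    """Build an Example, locating each slot value by its first occurrence."""
    slots = []
    for slot, value in values.items():
        start = text.find(value)
        if start == -1:
            raise ValueError(f"{value!r} not found in {text!r}")
        slots.append(SlotAnnotation(slot, start, start + len(value)))
    return Example(text, intent, slots)


example = annotate("Increase the heat to 5", "increase_heat", {"heat_level": "5"})
span = example.slots[0]
print(span.slot, example.text[span.start:span.end])
```

Note that the annotation only records the slot category and where the value sits in the sentence; the list of possible values is defined later.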
In your initial table, add a column “Intent Specification” containing all the specifications that each intent may have.
In this column, contrary to the previous table, you need more specifications for your intents. It will help you gather all the slots you need.
For “start”, the user can simply ask for the “start” action, or for the “start” action later (or at another wanted time). This specification can be modeled by “start + time”.
Here, you get a first slot: “action_time”. Remember that slots are optional; some specifications need no slots at all.
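Intent | Intent Specification | Slots |
---|---|---|
start, stop | start/stop | |
start, stop | start/stop + time | action_time |
start, stop | start/stop + mode_of_cooking | cooking_mode |
start, stop | start/stop + mode_of_cooking + time + duration | cooking_mode, action_time, cooking_duration |
increase_heat, decrease_heat | increase_heat/decrease_heat | |
increase_heat, decrease_heat | increase_heat/decrease_heat + time | action_time |
increase_heat, decrease_heat | increase_heat/decrease_heat + level | heat_level |
increase_heat, decrease_heat | increase_heat/decrease_heat + level + time | heat_level, action_time |
lock_security, unlock_security | lock_security/unlock_security | |
tell_remaining_time | tell_remaining_time | |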
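The step from intent specifications to slots can be sketched in a few lines. The mapping from “time” to “action_time” comes from the text; the three other mappings in `SPEC_TERM_TO_SLOT` are assumptions based on the slot names used elsewhere in this guide, and both function names are hypothetical.

```python
# Shorthand used in the "Intent Specification" column, mapped to slot names.
# "time" -> "action_time" comes from the text; the other three are assumed.
SPEC_TERM_TO_SLOT = {
    "time": "action_time",
    "mode_of_cooking": "cooking_mode",
    "duration": "cooking_duration",
    "level": "heat_level",
}

SPECIFICATIONS = [
    "start",
    "start + time",
    "start + mode_of_cooking",
    "start + mode_of_cooking + time + duration",
    "increase_heat",
    "increase_heat + time",
    "increase_heat + level",
    "increase_heat + level + time",
    "lock_security",
    "tell_remaining_time",
]


def parse_specification(specification):
    """Split 'intent + term + term' into the intent and its list of slots."""
    intent, *terms = [part.strip() for part in specification.split("+")]
    return intent, [SPEC_TERM_TO_SLOT[term] for term in terms]


def collect_slots(specifications):
    """Gather every slot needed across all specifications, without duplicates."""
    slots = set()
    for specification in specifications:
        slots.update(parse_specification(specification)[1])
    return sorted(slots)


print(parse_specification("start + time"))
print(collect_slots(SPECIFICATIONS))
```

A specification term that is missing from the mapping raises a `KeyError`, which is a cheap way to notice a slot you forgot to name.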
Now, the data is starting to take shape. If you identify some bottlenecks at this level, remember that often in NLU, what is difficult for humans will probably be difficult for models. Thus, simplify the data structure as much as possible so the model can understand it.
Examples
These are the expected user commands and also what the model will learn during the training process.
Intents need examples that reflect the real use case. For the model to effectively distinguish different intents, it is crucial to have distinct examples.
Remember two rules:
For each example: The closer your example is to the real use case, the more successful the user experience will be. Each example should be inspired by a real use case. Put yourself in the shoes of the final user.
For each intent: The more consistent your examples are, the more accurate your model will be. Put yourself in the shoes of a learner. If you want to learn an Intent, but the Examples do not express the same meaning, you will not be able to learn it well.
When creating your examples, keep in mind the following tips:
Inside the same intent:
Each intent should contain only consistent examples: They must have the same meaning and they must cover the most common ways to express an Intent. “start the cooker” and “you can start the cooking program” are two acceptable ways to express the same Intent: “start”.
Each example should be typical of its intent: One example should fit with only one intent. “start the cooker” and “you can start the cooking program” are typical examples of the “start” intent only. You can also consider “go ahead” or “let the cooking start” as common ways to express the same Intent. Sometimes, you want to add less typical examples, such as “everything is in the cooker, finish the job” for “start” or “not enough heat” for “increase_heat”. These implicit examples should not be confused with another intent.
Avoid redundancy: The model will process examples as patterns, so avoid adding similar examples that do not bring new knowledge to the intent. Knowledge can be a new word position like “start in 2 min” or “in 2 min starts”, a new word, or information that may help a human learner (and thus, the model).
Avoid negative forms: Traditionally in Human Machine Interfaces, machines accomplish positive actions. “do not increase the heat” is not a form the model handles well.
Entities (and also words) should have the same distribution: In general, NLU models can be sensitive to data distribution. Unless you want your model to recognize some Intents depending on some “key words”, avoid having a dominant word or dominant entities. Make your data balanced. Otherwise, if the intent “start” only has examples like “start now”, “start in two minutes”, “start...”, the model would not recognize “turn the cooking program on”. Instead, if you need 12 examples, then use three with “start...”, three with “begin...”, three with “turn-on...”, and three with “activate...”.
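Two of the tips above, redundancy and word distribution, are easy to check automatically. The sketch below is one simple way to do it, assuming examples are plain strings; the function names and the 50% threshold are arbitrary choices made for illustration.

```python
from collections import Counter


def normalize(text):
    """Lowercase and collapse whitespace so trivial variants compare equal."""
    return " ".join(text.lower().split())


def find_duplicates(examples):
    """Return the normalized examples that appear more than once."""
    counts = Counter(normalize(example) for example in examples)
    return sorted(text for text, count in counts.items() if count > 1)


def leading_word_share(examples):
    """Share of examples that begin with each word (examples must be non-empty)."""
    first_words = Counter(normalize(example).split()[0] for example in examples)
    total = sum(first_words.values())
    return {word: count / total for word, count in first_words.items()}


def dominant_words(examples, max_share=0.5):
    """Words that open more than max_share of the examples of one intent."""
    shares = leading_word_share(examples)
    return sorted(word for word, share in shares.items() if share > max_share)


skewed = ["start now", "start in two minutes", "start the cooker", "Start now"]
balanced = ["start now", "begin the program", "turn-on the cooker", "activate the cooker"]

print(find_duplicates(skewed))
print(dominant_words(skewed))
print(dominant_words(balanced))
```

Only the first word is inspected here to keep the sketch short; the same counting can be applied to every word or to entity values.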
Between intents:
Examples must be distinct across intents: So avoid having similar examples in different intents as it can lead to confusion. Examples that share many words are too similar.
Examples should be balanced across intents. Avoid having Intents with fewer examples than others. If you have one dominant intent in terms of number of examples (100 for one versus fewer than 10 for others), your model will tend to recognize this intent all the time (since it is the most probable).
Entities (and also words) should be balanced: If an entity (or a word) occurs in several intents, make sure that it appears the same number of times in each of them.
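The checks between intents can be sketched in the same spirit. The code below measures word overlap with Jaccard similarity, which is one possible measure among others and not something this guide prescribes; the 0.5 threshold and the examples for “stop” and “increase_heat” are made up for illustration.

```python
def token_set(text):
    """Lowercased set of whitespace-separated words."""
    return set(text.lower().split())


def jaccard(first, second):
    """Shared words divided by all words used by the two examples."""
    a, b = token_set(first), token_set(second)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def similar_across_intents(data, threshold=0.5):
    """Return pairs of examples from different intents that share many words."""
    pairs = []
    intents = sorted(data)
    for index, first in enumerate(intents):
        for second in intents[index + 1:]:
            for a in data[first]:
                for b in data[second]:
                    if jaccard(a, b) >= threshold:
                        pairs.append((first, a, second, b))
    return pairs


def imbalance_ratio(data):
    """Size of the largest intent divided by the smallest (no empty intents)."""
    sizes = [len(examples) for examples in data.values()]
    return max(sizes) / min(sizes)


data = {
    "start": ["start the cooker", "go ahead", "you can start the cooking program"],
    "stop": ["stop the cooker", "halt the cooking program", "that is enough"],
    "increase_heat": ["increase the heat"],
}

print(similar_across_intents(data))
print(imbalance_ratio(data))
```

Flagged pairs are candidates for human review rather than errors: two intents may legitimately share a sentence frame, but many such pairs suggest that the examples need more variety.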
Go back to the previous table and add a new column “Example”. For every Intent Specification row, add three examples with the previous tips in mind.
The following table is an example for the intent “start”:
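Intent Specification | Slots | Example |
---|---|---|
start | | Start the cooker |
start | | Activate the cooking program please |
start | | Go on |
start + time | action_time | Start in ... |
start + time | action_time | In ... |
start + time | action_time | Go on in ... |
start + mode_of_cooking | cooking_mode | Start the ... |
start + mode_of_cooking | cooking_mode | Please activate the ... |
start + mode_of_cooking | cooking_mode | Alright you can go on with ... |
start + duration | cooking_duration | Start cooking for ... |
start + duration | cooking_duration | Can you activate the program for ... |
start + duration | cooking_duration | Go on for a duration of ... |
start + mode_of_cooking + time | cooking_mode, action_time | |
start + mode_of_cooking + time + duration | cooking_mode, action_time, cooking_duration | |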
Now, you can delete the Intent Specification column, and add a new column Relevant Tokens. This column contains the relevant words/entities for each Intent. If a token occurs only in one intent, it can be discriminative for it: the model will probably recognize intents thanks to these tokens. If it fits with the target use case, it will increase the model accuracy. Otherwise, add examples including these words to the right Intents.
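Slots | Example | Relevant Tokens |
---|---|---|
| | Start the cooker | |
| | Activate the cooking program please | |
| | Go on | |
action_time | Start in ... | |
action_time | In ... | |
action_time | Go on in ... | |
cooking_mode | Start the ... | |
cooking_mode | Please activate the ... | |
cooking_mode | Alright you can go on with ... | |
cooking_duration | Start cooking for ... | |
cooking_duration | Can you activate the program for ... | |
cooking_duration | Go on for a duration of ... | |
cooking_mode, action_time | | |
cooking_mode, action_time, cooking_duration | | |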
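Tokens that occur in only one intent can be listed automatically, which gives a first draft of the Relevant Tokens column. This is a minimal sketch that splits on whitespace; the examples for “stop” are invented for illustration and the function names are hypothetical.

```python
from collections import defaultdict


def tokens_by_intent(data):
    """Map each intent to the set of lowercased words used in its examples."""
    result = {}
    for intent, examples in data.items():
        tokens = set()
        for example in examples:
            tokens.update(example.lower().split())
        result[intent] = tokens
    return result


def exclusive_tokens(data):
    """For each intent, list the tokens that appear in no other intent."""
    per_intent = tokens_by_intent(data)
    owners = defaultdict(set)
    for intent, tokens in per_intent.items():
        for token in tokens:
            owners[token].add(intent)
    return {
        intent: sorted(token for token in tokens if owners[token] == {intent})
        for intent, tokens in per_intent.items()
    }


data = {
    "start": ["Start the cooker", "Activate the cooking program please", "Go on"],
    "stop": ["Stop the cooker", "Stop the cooking program please"],
}

print(exclusive_tokens(data))
```

Review the result by hand: a token that is exclusive by accident (for example “please” appearing in a single intent) can teach the model a shortcut you did not intend.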
This looks simple, but while defining examples you may realize that the initial split into intents is not clear. If you start to gather similar examples in different intents, it makes sense to reassess your intent design and merge similar Intents into a more general one.
Entities
Finally, once you've made improvements to your data in the previous table, the final step is to define each slot’s values, called entities. You do not need to list all the possible words. Add only the main words that represent the slot. For “cooking_mode”, values could be “high”, “low”, “medium”, “gentle” and “gentle high”. If you expect numeric values, as for “action_time”, then choose them randomly: “1 min”, “3.4 min”, “5 min”, “7.10 min”, “10 min”, etc. Some values can be shared between different slots; the model will handle this. For the cooker, “action_time” and “cooking_duration” both have numerical values. It would be better to avoid adding exactly the same values to these slots.
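One way to pick random numeric values without repeating them across the two slots is to draw all of them in a single sample and split the result. The sketch below assumes whole minutes between 1 and 60 and a fixed seed; those bounds, the seed and the function name are illustrative choices.

```python
import random

# Main values for a word-based slot, taken from the text above.
COOKING_MODE_VALUES = ["high", "low", "medium", "gentle", "gentle high"]


def sample_disjoint_minutes(n_per_slot, low=1, high=60, seed=0):
    """Draw random minute values for two numeric slots with no value in common."""
    rng = random.Random(seed)  # fixed seed so the dataset is reproducible
    drawn = rng.sample(range(low, high + 1), 2 * n_per_slot)
    action_time = [f"{value} min" for value in drawn[:n_per_slot]]
    cooking_duration = [f"{value} min" for value in drawn[n_per_slot:]]
    return action_time, cooking_duration


action_time_values, cooking_duration_values = sample_disjoint_minutes(5)
entities = {
    "cooking_mode": COOKING_MODE_VALUES,
    "action_time": action_time_values,
    "cooking_duration": cooking_duration_values,
}
print(entities)
```

Because `random.sample` draws without replacement, no value can end up in both lists.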
Conclusion
Do not skip testing! Testing ensures that your model is providing accurate predictions as intended.
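A test run can be as simple as comparing predictions with expected intents on examples that were not used for training. In the sketch below, `keyword_predict` is a toy stand-in for your trained model, not a real NLU API, and the test set is made up for illustration.

```python
from collections import Counter


def evaluate(predict, test_set):
    """Return overall accuracy and a count of (expected, predicted) mistakes."""
    confusions = Counter()
    correct = 0
    for text, expected in test_set:
        predicted = predict(text)
        if predicted == expected:
            correct += 1
        else:
            confusions[(expected, predicted)] += 1
    return correct / len(test_set), confusions


def keyword_predict(text):
    """Toy stand-in for a trained model: looks for a single key word."""
    words = text.lower().split()
    if "start" in words:
        return "start"
    if "stop" in words:
        return "stop"
    return "None"


test_set = [
    ("start the cooker", "start"),
    ("stop the cooker", "stop"),
    ("hello world", "None"),
    ("go ahead", "start"),
]

accuracy, confusions = evaluate(keyword_predict, test_set)
print(accuracy)
print(confusions)
```

The confusion counts show which intents are mistaken for which, and therefore which examples to add or revise first.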
It's essential to keep in mind that models are not static and require continual updates with new data to improve their accuracy and enable them to tackle new scenarios. If you have a messy data set, it may be better to start from scratch, and assess your data based on the best practices listed above.