How Do Chatbots Understand?

Building a Chatbot with Rasa — part IV

Aniruddha Karajgi
Towards Data Science


In the previous posts in this series, we’ve discussed the fundamentals of building chatbots, slots and entities and handling bot failure. We’ll now talk about how Rasa implements bot understanding through intents. We’ll also build a simple custom intent classifier based on Logistic Regression, though Rasa does provide some good ones right out of the box.

Table of Contents

- Natural Language Understanding
- Intent Classification
- The DIETClassifier
- A custom intent Classifier

Previous articles in the series

Part I: Building Chatbots with Rasa

Part II: Are Slots and Entities the same?

Part III: Handling Chatbot Failure

Natural Language Understanding

Natural Language Understanding, or NLU for short, is the sub-field of Natural Language Processing that deals with machine reading comprehension.

While NLP covers processing language in general, NLU focuses on the understanding itself: deriving context and meaning that may not be directly apparent. Rasa uses intents to help the bot understand what the user’s saying.

Entities are used to extract key information that also helps the bot formulate a response. In this post, we’ll focus on intent classification.

Intent Classification

Intent classification was one of the techniques we just discussed. You assign a label to each message that the user enters. This label represents a topic, called an intent. For example, you can define intents like greet, goodbye, supply_contact_info, etc.

Since we have this training data already labelled as part of our nlu data, it turns into a (usually) straightforward text classification problem. I say “usually” because the way you define your intents has a lot to do with how easy they are to classify.

Well, this is true for any classification problem: it's easier to discriminate between different mammals compared to different species of horse. Similarly, it's easier to have an intent called book_flight with entities for extracting destinations, dates, etc. than to have separate intents for each use case:

  • book_flights_destination_given
  • book_flights_date_mentioned
  • book_flights_budget_mentioned, etc.
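In Rasa’s NLU training-data format, the broader book_flight intent with entities might look something like this (the entity names and example messages here are illustrative, not from the bot built earlier):

```yaml
nlu:
  - intent: book_flight
    examples: |
      - I want to fly to [Paris](destination)
      - book me a flight on [friday](date)
      - flights to [Tokyo](destination) under [$500](budget)
```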

Intent Classifiers

Many approaches can be taken to classify intents. You could have a purely rule-based system, which would look for particular words and phrases to figure out what the user’s trying to say. As you can imagine, this approach won’t work too well, especially for more complex use cases.

Machine learning approaches are really good here, especially with the development that’s happening in the field of NLP. For example, you could build your own intent classifier using something as simple as a Naive Bayes model.
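To illustrate how simple such a baseline can be, here’s a toy Naive Bayes intent classifier built with sklearn. The training examples and the classify helper are made up for the sketch:

```python
# A toy intent classifier: bag-of-words features + Multinomial Naive Bayes.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

training_examples = [
    ("hello there", "greet"),
    ("hi, how are you", "greet"),
    ("good morning", "greet"),
    ("bye for now", "goodbye"),
    ("see you later", "goodbye"),
    ("goodbye and thanks", "goodbye"),
]
texts, intents = zip(*training_examples)

model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(texts, intents)

def classify(message):
    """Return the predicted intent and its confidence score."""
    probs = model.predict_proba([message])[0]
    best = probs.argmax()
    return model.classes_[best], probs[best]
```

Even this tiny pipeline gives you the two things discussed later in this post: a predicted intent and a confidence score.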

On the other hand, you could use the DIETClassifier, a transformer-based model that can perform both entity extraction and intent classification, which we’ll discuss in a minute.

Rasa provides some intent classifiers which you can use directly by mentioning them in your config.yml file. Some of these are:

  • DIETClassifier
  • SklearnIntentClassifier: uses sklearn’s SVC
  • KeywordIntentClassifier: matches keywords in the message against the training data (rule-based)
  • FallbackClassifier: helps handle edge-cases and low confidence results. This was discussed in the previous post.
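As a sketch, a config.yml pipeline using these components might look like the following; the choice of featurizers and the threshold value are illustrative, not prescriptive:

```yaml
pipeline:
  - name: WhitespaceTokenizer
  - name: CountVectorsFeaturizer
  - name: DIETClassifier
    epochs: 100
  - name: FallbackClassifier
    threshold: 0.4
```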

Note

For more information on fallback and handling failure, check out Part III: Handling Chatbot Failure.

Let’s look into the information that intent classifiers generally provide apart from the predicted intent itself.

A typical output of an intent classifier — image by author

Along with the intent, these classifiers usually return a confidence score (if it's a probabilistic model) and a ranking of intents. These are really useful while debugging your bot’s performance.

Some intent classifiers like DIET also output intent rankings — image by author
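In text form, the structure of such an output looks roughly like this. The field names follow Rasa’s message format, but the intents and numbers are made up:

```python
# Illustrative shape of an intent classifier's output.
prediction = {
    "intent": {"name": "greet", "confidence": 0.97},
    "intent_ranking": [
        {"name": "greet", "confidence": 0.97},
        {"name": "goodbye", "confidence": 0.02},
        {"name": "book_flight", "confidence": 0.01},
    ],
}
```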

The DIETClassifier

The DIETClassifier is a transformer-based model that performs both entity extraction and intent classification, hence the name: Dual Intent and Entity Transformer. It was developed by Rasa.

It's one of the components that come with Rasa. Let’s discuss some of the most interesting parts of the architecture.

  • There are two special tokens: _MASK_ and _CLS_.
  • The _CLS_ token represents the entire user message through the sum of the sparse embeddings of the other tokens or a pre-trained embedding like BERT directly. It helps in intent classification.
  • The _MASK_ token helps generalize the model by masking random tokens. It’s used to calculate mask loss.
  • The feed-forward modules are shared for all tokens and have 80% dropped connections by default.
  • These tokens are passed through feed-forward layers into a transformer that has 2 layers by default.
  • These are then passed into a Conditional Random Field module, implemented using Tensorflow. This is where the entity extraction happens (recall that DIETClassifier is capable of doing both extraction and intent classification).
  • Finally, the three losses (entity loss, intent loss and mask loss) are summed into a total loss, which we minimize through backpropagation.

You can see the architecture below.

The DIETClassifier — image from Rasa’s paper

Why the DIETClassifier is so flexible

Rasa designed the DIETClassifier to be very customizable. Apart from the usual hyperparameters that you can change in most models, like number of epochs, number of hidden layers, etc., these are the features that Rasa’s DIETClassifier provides:

  • Adding pre-trained embeddings: There’s support for embeddings like BERT, GloVe and ConveRT. This gives you the flexibility of using both sparse embeddings (CountVectorizers, for example) and dense ones too.
  • Using it only as an intent classifier, or an entity extractor: You can use DIET for a single task and have other components do the rest.
  • The _MASK_ flag: masks certain tokens randomly, so there’s a better chance for the model to generalize.
  • Modifying the connection density of the feed-forward layers: By default, the DIETClassifier keeps only 20% of all weights as non-zero, keeping the model light and reducing the chance of overfitting. You can modify these hyperparameters to build increasingly complex models.
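Taken together, these knobs live in config.yml. A sketch of a customized DIETClassifier entry might look like this; the parameter names follow Rasa 2.x, and the values are illustrative:

```yaml
pipeline:
  - name: LanguageModelFeaturizer
    model_name: bert
  - name: DIETClassifier
    epochs: 100
    entity_recognition: False    # use DIET only as an intent classifier
    connection_density: 0.2      # keep 20% of feed-forward weights
    number_of_transformer_layers: 2
```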

Building our intent classifier

Rasa makes it really simple to build our own components, from entity extractors, policies and intent classifiers all the way to spellcheckers and semantic analyzers.

Let’s build a logistic regression-based intent classifier that takes in our NLU data and fits sklearn’s LogisticRegression model. We’ll use the same chatbot as before, including NLU data, stories and actions.

The blueprint

Every component is defined as a class inheriting from Component. Some attributes are defined in the class, along with a set of necessary methods, which Rasa uses to train the component and ultimately pass data to it, based on the steps defined in the config.yml file.

Our intent classifier’s high-level structure looks something like this:

    # imports

    class LRClassifier(IntentClassifier):

        # necessary attributes: e.g. name, provides, requires, etc.

        def __init__(
            self, component_config: Optional[Dict[Text, Any]] = None
        ) -> None:
            super().__init__(component_config)

        # necessary methods: e.g. train, process, persist, etc.

The train method

This method simply trains our classifier. Since we’re using sklearn, it’s pretty straightforward.
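Outside of Rasa, the core of train can be sketched like this. The CountVectorizer featurizer and the plain list of examples are our own stand-ins; in the real component, the (text, intent) pairs would come from Rasa’s training_data.intent_examples:

```python
# Sketch of the training step, stripped of the Rasa plumbing.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression

class LRClassifier:
    def train(self, examples):
        """Fit a logistic regression model on (text, intent) pairs."""
        texts = [text for text, _ in examples]
        labels = [intent for _, intent in examples]
        # Turn raw text into bag-of-words features.
        self.vectorizer = CountVectorizer()
        features = self.vectorizer.fit_transform(texts)
        self.clf = LogisticRegression()
        self.clf.fit(features, labels)

clf = LRClassifier()
clf.train([
    ("hello", "greet"),
    ("hi there", "greet"),
    ("bye", "goodbye"),
    ("see you", "goodbye"),
])
```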

The process method

This method is executed every time Rasa’s pipeline is run, which happens after every user message. It contains the logic of your component. In the case of our intent classifier, the process method will contain a predict call, which predicts an intent, along with an intent ranking if we want.
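The prediction logic behind process can be sketched like this. In the real component, the text would come from message.get(TEXT) and the result would be attached with message.set("intent", ...); the toy model below is just a stand-in:

```python
# Sketch of the prediction logic a `process` method would wrap.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(
    ["hello", "hi there", "bye", "see you later"],
    ["greet", "greet", "goodbye", "goodbye"],
)

def predict_intent(text):
    """Return the top intent plus a full intent ranking."""
    probs = model.predict_proba([text])[0]
    ranking = sorted(
        ({"name": name, "confidence": float(p)}
         for name, p in zip(model.classes_, probs)),
        key=lambda d: d["confidence"],
        reverse=True,
    )
    return ranking[0], ranking

intent, ranking = predict_intent("hello")
```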

The persist method

This method saves our model for later. We’ll use joblib since we’re using sklearn; sklearn’s documentation recommends joblib because its models usually contain a lot of numpy arrays.

The load method

The load method is called whenever you start the chatbot. It loads the model from the file it was saved to in the persist method.
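The persist/load pair boils down to a joblib round trip. A minimal sketch, assuming a fitted sklearn estimator; the file name is illustrative, and in the real component Rasa passes the model directory into persist() and load():

```python
# Save a fitted model to disk and read it back with joblib.
import os
import tempfile

import joblib
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression().fit([[0.0], [1.0]], ["goodbye", "greet"])

model_dir = tempfile.mkdtemp()
path = os.path.join(model_dir, "lr_classifier.joblib")

joblib.dump(clf, path)        # what persist() would do
restored = joblib.load(path)  # what load() would do
```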

Evaluating our Model

Rasa provides support for evaluating both the NLU and the Core of your bot. All we have to do is create some test data and run rasa test. There’s support for cross-validation too.

To test our classifier, add testing data to tests/test.yml. It’ll look something like this:
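Assuming Rasa’s NLU test-data format, the file holds held-out examples per intent; the intents and messages below are illustrative:

```yaml
nlu:
  - intent: greet
    examples: |
      - hey there
      - good evening
  - intent: goodbye
    examples: |
      - catch you later
      - bye bye
```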

After that, run:

rasa test

A results folder will be generated and you can view the performance of your classifier through charts and reports. You’ll also see DIETClassifier's entity extraction performance.

A chart generated by Rasa’s testing support — image by author

Notes

Intents are limiting

You can probably imagine that it’s pretty limiting to have a bot classify a message into a set of mutually exclusive classes. Rasa helps with this by providing support for hierarchical intents and is working on removing intents altogether.

In the meantime, we can design a better conversational agent by structuring our intents to be very generic, and then extracting the more nuanced aspects of a user message using entities or hierarchical intents.

Advanced testing

Rasa provides a lot of features that you can use while testing your components. These include:

  • cross-validation
  • comparing models
  • policy testing

and a lot more, though we won’t be discussing them in this post.

Conclusion

In this post, we discussed how chatbots actually understand what the user is saying. We also built a custom model that understands simple queries by classifying each user message into a fixed set of intents.

We also touched on why intents are limiting and whether there are better ways to handle bot understanding.

Hopefully, this post gave you some idea of how chatbots extract meaning from user messages.
