Handling chatbot failure gracefully

Building Chatbots with Rasa — part III

Aniruddha Karajgi
Towards Data Science


Image by author

No matter how well you train your Rasa Chatbot, there are going to be scenarios your bot will not know how to handle. The goal isn’t to make a perfect failproof chatbot, since that isn’t even possible. In fact, as humans, we often ask for clarifications ourselves. Rather, the goal is to have systems in place when things go south.

People prefer being asked for clarification over being misunderstood. Of course, asking a user to repeat themselves five times isn’t ideal and that is exactly why a fallback system should be in place.

In this post, we’ll talk about handling cases where your bot doesn’t quite understand what a user is trying to say. In Rasa terminology, this is called a Fallback.

Quick Note

This is the next post in my series on Rasa. You can check out the previous articles in the series below:

They cover the building blocks of Rasa and of chatbots in general; if you're new to Rasa, going through the first two articles will definitely help.

Table of Contents

- An example
- The Fallback Classifier
- Simple Fallback
- Single-stage Fallback
- Two-stage Fallback
- Handling Human Handoff

An example

We’ll look at a simple case where there are no fallback mechanisms in place, and then we’ll work up from there.

Let’s continue with the same chatbot we’ve been using in this series — it asks the user for their contact information. We’ll be simulating a fallback scenario by just reducing the training epochs for our DIETClassifier, which tries to understand what the user is saying.

The default config file (which you get when you run rasa init) has a FallbackClassifier component by default. We’ll make two changes here:

  • We’ll remove the FallbackClassifier component, just to see what happens.
  • We’ll also reduce the training epochs for the DIETClassifier to 2, to simulate a message the bot struggles to understand.
language: en

pipeline:
- name: WhitespaceTokenizer
- name: RegexFeaturizer
- name: LexicalSyntacticFeaturizer
- name: CountVectorsFeaturizer
- name: CountVectorsFeaturizer
  analyzer: char_wb
  min_ngram: 1
  max_ngram: 4
- name: DIETClassifier
  epochs: 2 # reduced epochs
  constrain_similarities: true
- name: EntitySynonymMapper
- name: ResponseSelector
  epochs: 100
  constrain_similarities: true
# removed the FallbackClassifier from here

policies:
- name: MemoizationPolicy
- name: TEDPolicy
  max_history: 5
  epochs: 100
  constrain_similarities: true
- name: RulePolicy

After training the bot, we’ll chat with it using

rasa shell --debug

The conversation will go something like this:

user: Hi
bot: Sorry, you'll have to provide your information to proceed.

This does not make sense at all. Digging through the debug logs, we see this:

Received user message 'hi' with intent '{'id': -4215159201959237708, 'name': 'deny', 'confidence': 0.37452369928359985}' and entities '[]'

The bot predicted the user’s message “hi” as the deny intent instead of the greet intent. The bot’s response will likely confuse the user, and the problem will be more pronounced in real projects, where you typically have many intents whose training data can be quite similar.

Now, let’s see what happens when we just include the FallbackClassifier in the config, without actually setting up the entire mechanism.

Adding to the end of the pipeline:

- name: FallbackClassifier
  threshold: 0.5
  ambiguity_threshold: 0.1

and retraining, we see a change in the logs when the user says “hi”:

Received user message 'hi' with intent '{'name': 'nlu_fallback', 'confidence': 0.5}' and entities '[]'

It no longer goes ahead with whichever intent has the highest confidence, even when that confidence is very low. The FallbackClassifier overrides it with nlu_fallback. Before we build out the full fallback mechanism, let’s take a closer look at the FallbackClassifier.

The Fallback Classifier

The FallbackClassifier implements two thresholds:

  • the nlu fallback threshold
  • and the ambiguity_threshold
The fallback classifier thresholds — image by author

In your config file, it's defined after the intent classifiers like so:

language: en
pipeline:
..
..
# intent classifier like DIETClassifier defined here
- name: FallbackClassifier
  threshold: 0.7
  ambiguity_threshold: 0.1

(NLU Fallback) Threshold

The threshold key is used to specify the threshold for NLU fallback in config.yml. If the top intent’s confidence is below this, the fallback classifier ignores it and goes ahead with nlu_fallback .

Ambiguity Threshold

This key specifies how far apart the confidence scores of the top two intents must be for the regular conversation flow to continue. If the difference between them is smaller than this value, the FallbackClassifier predicts nlu_fallback, even if the top intent clears the threshold above. For example, with ambiguity_threshold: 0.1, top-two confidences of 0.80 and 0.75 (a difference of 0.05) would still trigger nlu_fallback.

A Simple Fallback

The Simple Fallback — image by author

This kind of fallback is the most basic: when the bot’s confidence in the predicted intent falls below the NLU fallback threshold, we simply transfer the chat to a human agent.

Now, the actual mechanism of transferring a chat to a human agent, called “human handoff”, depends on how and where your chatbot is deployed. To keep this applicable to everyone, we’ll create a placeholder custom action that just sends a message to let us know the handoff was triggered.

Let’s implement it.

Adding the rule

We’ll first need to define a rule: whenever nlu_fallback is predicted, run action_default_fallback. Add this to rules.yml:

# simple fallback:
- rule: Implementation of the simple-Fallback
  steps:
  - intent: nlu_fallback
  - action: action_default_fallback

Implementation

Next, we’ll override action_default_fallback. Simply add this class to actions.py:

from typing import Text

from rasa_sdk import Action
from rasa_sdk.events import ConversationPaused, UserUtteranceReverted


class ActionDefaultFallback(Action):
    def name(self) -> Text:
        return "action_default_fallback"

    def run(self, dispatcher, tracker, domain):
        # output a message saying that the conversation will now be
        # continued by a human
        message = "Sorry! Let me connect you to a human..."
        dispatcher.utter_message(text=message)

        # pause the tracker and undo the last user interaction
        return [ConversationPaused(), UserUtteranceReverted()]

We’ll also define it in the domain under the actions field.
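For reference, the actions field in domain.yml would now look something like this:

actions:
- action_default_fallback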

Testing

We get a much better experience now.

user: Hi
bot: Sorry, couldn't understand that! Let me connect you to a
human...

Remember that a properly trained bot will almost certainly be able to respond to “hi”. Here, we’re just simulating a user message that the bot can’t understand.

If you look through the logs, you can see that Rasa correctly applied the rule we defined.

rasa.core.policies.rule_policy - There is a rule for the next action 'action_default_fallback'.

Single-Stage Fallback

The single-stage fallback mechanism — image by author

The Simple Fallback was better than just forging ahead with a conversation almost certain to fail.

But we can do better. Instead of performing a handoff the moment the bot isn’t quite sure what the user means, we can have the bot ask the user to clarify by offering suggestions.

Of course, even after that, if the bot still doesn’t get what the user wants to say, we can perform a handoff.

This will be more useful since

  • it reduces the load on the human agents
  • it keeps the user engaged for longer; if no human is available right away, the user most likely won’t have the patience to wait.

By default, when asking for clarification, the bot suggests the intent with the highest confidence.

The utter_ask_rephrase response — image by the author using carbon.now.sh

You’ll probably notice two things here:

  • that only the top intent is suggested, rather than providing, say, the top 3 predictions
  • the name of the intent, exactly as defined in your NLU data, is used, which isn’t a great experience for the user, especially for intents with more cryptic names.

We can solve both of these by overriding a default action called action_default_ask_affirmation.

Let’s implement this.

Adding relevant rules

This is a little more involved than the previous case. We’ll define two rules here.

nlu_fallback → provide suggestions to the user

Once the bot fails to understand the user, it will provide relevant suggestions for what the user could have meant. Here, we’ll run the overridden action_default_ask_affirmation custom action, which we’ll implement in a bit.

- rule: Single stage fallback | ask user to choose what they meant
  steps:
  - intent: nlu_fallback
  - action: action_default_ask_affirmation

user doesn’t like suggestions → human handoff

The second rule handles the case where the user doesn’t want to go ahead with any of the provided suggestions and clicks an option like “None of These” (we’ll implement that option too), which is mapped to the out_of_scope intent.

When that happens, we want to directly connect the user to a human, so we’ll call the action_default_fallback .

- rule: Single stage fallback | call default fallback if user is not ok
  steps:
  - action: action_default_ask_affirmation
  - intent: out_of_scope
  - action: action_default_fallback

Overriding the default affirmation custom action

We’ll do two things here to solve the issues we noticed above:

  • create a mapping between intents and readable phrasing. Something like: supply_contact_info --> "Supply Contact Information"
  • provide a user with the top three intent predictions as suggestions, rather than only the top one, so the user is more likely to find what they were looking for.
# add this to actions.py as well; it uses the same rasa_sdk imports as before

class ActionDefaultAskAffirmation(Action):
    def name(self):
        return "action_default_ask_affirmation"

    async def run(self, dispatcher, tracker, domain):
        # select the top three intents from the tracker,
        # ignoring the first one -- nlu_fallback itself
        predicted_intents = tracker.latest_message["intent_ranking"][1:4]

        # a prompt asking the user to select an option
        message = "Sorry! What do you want to do?"

        # a mapping between intents and user-friendly wordings
        intent_mappings = {
            "supply_contact_info": "Supply Contact Information",
            "affirm": "Agree",
            "deny": "Disagree",
            "goodbye": "End conversation",
        }

        # show the top three intents as buttons to the user
        buttons = [
            {
                "title": intent_mappings[intent["name"]],
                "payload": "/{}".format(intent["name"]),
            }
            for intent in predicted_intents
        ]

        # add a "None of These" button for when the user doesn't
        # agree with any of the suggestions
        buttons.append({
            "title": "None of These",
            "payload": "/out_of_scope",
        })

        dispatcher.utter_message(text=message, buttons=buttons)

        return []

Note that you may not want some intents to appear as suggestions at all: “Do you want to greet me?” sounds off. If such an intent lands in the top three, a simple check (sketched below) can make sure the user never sees it as a suggestion.
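Here’s a rough sketch of what that check could look like; the intent names in the deny-list are just examples, not something defined elsewhere in this series:

# intents we never want to show as suggestions (example names)
IGNORED_SUGGESTIONS = {"greet", "out_of_scope"}

buttons = [
    {
        "title": intent_mappings[intent["name"]],
        "payload": "/{}".format(intent["name"]),
    }
    for intent in predicted_intents
    if intent["name"] not in IGNORED_SUGGESTIONS
]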

Remember to add action_default_ask_affirmation to your domain as a registered action.
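With both custom actions in place, the actions field of the domain should now list them both:

actions:
- action_default_fallback
- action_default_ask_affirmation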

Testing

We’ll try the same conversation flow as before.

Testing the single-stage fallback mechanism — image by the author using carbon.now.sh

This is much better than before. Let’s try a third strategy now.

Two-Stage Fallback

Two-stage fallback was introduced relatively recently. It lets the user clarify what they meant (by suggesting likely intents to them) twice before falling back to the ultimate fallback action.

The two-stage fallback mechanism — image by author

Though it looks complex, Rasa has an implementation for this already, so it’s much easier to set this up, especially because we’ve done half the work with the previous two approaches.

Adding a response that asks the user to rephrase

Rasa provides a default response called utter_ask_rephrase . Of course, there’s also a default custom action called action_default_ask_rephrase that you can override to implement custom behaviour, just like we did for action_default_fallback and action_default_ask_affirmation previously.

For now, we’ll just use the response. We’ll define it under the responses field of domain.yml.

utter_ask_rephrase:
- text: I'm sorry, I didn't quite understand that. Could you rephrase?

Adding the two-stage fallback rule

We’ll replace the rules for the previous single-stage fallback approach with this.

# two stage fallback:
- rule: Implementation of the Two-Stage-Fallback
  steps:
  - intent: nlu_fallback
  - action: action_two_stage_fallback
  - active_loop: action_two_stage_fallback

No other changes are necessary.

Testing

This is how the flow goes:

testing the two-stage fallback policy — image by the author using carbon.now.sh

What do you do when a human handoff isn’t successful?

Sometimes, even after multiple clarifications from the user, your bot may not understand what they are trying to say. In this case, it’s smart to let the user know that they will now be connected to a human, and then transfer the conversation to where a human can access it.

Sending the tracker along will also help, since the human agent then already has the context of the conversation, meaning the user won’t have to explain their request from the beginning.
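How the tracker actually gets to the agent depends entirely on your deployment, but here’s a minimal sketch, assuming a hypothetical HTTP endpoint (HANDOFF_URL) exposed by your support tooling; the action name and URL are placeholders:

import requests

from rasa_sdk import Action
from rasa_sdk.events import ConversationPaused

# hypothetical endpoint exposed by your support tooling
HANDOFF_URL = "https://example.com/handoff"


class ActionHandoffWithContext(Action):
    def name(self):
        return "action_handoff_with_context"

    def run(self, dispatcher, tracker, domain):
        # forward the conversation history so the human agent has context
        payload = {
            "sender_id": tracker.sender_id,
            "events": tracker.events,
        }
        requests.post(HANDOFF_URL, json=payload, timeout=10)

        dispatcher.utter_message(text="Connecting you to a human agent...")
        return [ConversationPaused()]

In practice, this could just as well be a call into your live-chat provider’s API instead of a plain HTTP request.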

Even after this, a human may not be available to attend to this user. Maybe the user is in a different timezone, or all customer care agents are occupied. Sometimes, the mechanism to connect the user to an agent may break. In these cases, rather than keep the user waiting with no further information on how to proceed, it makes sense to inform them of this issue.

Apart from a simple message stating that human agents are currently unavailable, you can add some functionality that lets the user leave a message or share their contact details so they can be contacted later.
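As a small example, a response along these lines could be defined in domain.yml; the name and wording are placeholders:

responses:
  utter_agents_unavailable:
  - text: Sorry, no agents are available at the moment. Please leave a message with your contact details and we'll get back to you.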

Conclusion

No matter how good your bot is, there will be scenarios where the bot fails to understand what the user is trying to say, especially early on in development. You may have good data to train your bot on, but it may not be representative of how real users would converse.

Learning how to fail gracefully is important when it comes to building chatbots. After all, even as humans, we may often end up having to do the same, due to a plethora of communication issues, from language to poor network quality.

Ultimately, my goal with this post was to give you an idea of what’s possible when designing a fallback strategy for your bot.

Hope it helped!

Updates

20.3.2022

Added links to other posts in the series.
