A few years ago, a client asked us to create an application that allows its users to create bookings for conference rooms and workspaces. That looks quite easy, right? A few database tables, a thin server, and thick web and mobile applications for a smooth user experience. Almost every company has a solution like that, so it should be fairly easy. But wait, there is a catch! The user interface has to be a chatbot!
That’s a completely different situation. How do we build something like that from scratch? We need to adjust our strategy a bit: we are going to need a thick server and thin web and mobile applications. To limit the scope of this article, we will focus on the server side.
So it begins
After a few searches and a fair number of experiments, we stumbled across NLP (natural language processing). These three words describe a key component of every modern chatbot platform. The NLP component takes ordinary sentences and transforms them into a data structure that can be easily processed further, without all the noise that surrounds the core information.
Let's look at this example:
A simple sentence like this is split into multiple items that can be named and searched for.
In this case, the phrase “I need place” is identified as a general intent that can
be interpreted as a request for booking. Other items add information to this request.
These attributes can carry either simple or complex information.
In this example, the word “some” gives us the freedom to select any room from a list of available rooms,
and the word “meeting” is interpreted as a request for a meeting room.
Those parts were the easiest to classify. Time recognition attributes are more complex.
This is great for identifying atomic attributes in the sentence, but the result is still just text.
It took us almost a year to put together a comprehensive training data set for our target languages
(English and German), but our bot finally understands the vast majority of user requests.
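The recognized items described above can be pictured as a small data structure: one intent plus a list of named attributes. The following is only a sketch in Python. The class names `Attribute` and `NlpResult`, the attribute names, and the "next Monday" date phrase are assumptions made for illustration, and the result is written by hand rather than produced by a trained model.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Attribute:
    name: str  # what kind of information this is, e.g. "room_type" (hypothetical name)
    text: str  # the words from the sentence that carried the information


@dataclass
class NlpResult:
    intent: str  # the general request, e.g. "book"
    attributes: List[Attribute] = field(default_factory=list)

    def get(self, name: str) -> Optional[str]:
        """Return the raw text of the first attribute with this name, if any."""
        for attribute in self.attributes:
            if attribute.name == name:
                return attribute.text
        return None


# Hand-written stand-in for what the NLP step might return.
result = NlpResult(
    intent="book",  # derived from "I need place"
    attributes=[
        Attribute("selection", "some"),      # any available room will do
        Attribute("room_type", "meeting"),   # a meeting room is requested
        Attribute("date", "next Monday"),    # hypothetical time attribute
    ],
)

print(result.intent, result.get("room_type"), result.get("date"))
```

Note that every value is still a plain string at this point. Nothing here knows which room or which calendar date is meant.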
But how do you connect a room number to a specific room entity, a username to a user,
or a date description to an actual date?
For that, we had to build an additional layer. Some of the post-processors would each need a whole blog post to describe, but in the end, we managed to get a nice set of domain objects that are used in the bot’s decision-making process. In general, it looks like this:
Input sentences are processed by the NLP and each intent or attribute is then passed to an interpreter that
creates one or more objects that are used in conversation flow.
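To make the interpreter idea more concrete, here is a minimal sketch in Python. It is not the production code: the names `interpret_date`, `interpret_room`, `interpret`, the `ROOMS` table with its room kinds, and the rule that "next Monday" means the first Monday after today are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Dict

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


@dataclass(frozen=True)
class Room:
    number: str
    kind: str


# Hypothetical room table; a real system would query the reservation database.
ROOMS = {
    "R.23": Room("R.23", "meeting room"),
    "E.3.30": Room("E.3.30", "meeting room"),
}


def interpret_date(text: str, today: date) -> date:
    """Turn a phrase such as 'next Monday' into a calendar date.

    Assumption: 'next <weekday>' is the first such weekday strictly after today.
    """
    words = text.lower().split()
    if len(words) != 2 or words[0] != "next" or words[1] not in WEEKDAYS:
        raise ValueError(f"cannot interpret date phrase: {text!r}")
    target = WEEKDAYS.index(words[1])
    days_ahead = (target - today.weekday() - 1) % 7 + 1  # always between 1 and 7
    return today + timedelta(days=days_ahead)


def interpret_room(text: str) -> Room:
    """Connect a recognized room number to a room entity."""
    room = ROOMS.get(text.strip().upper())
    if room is None:
        raise LookupError(f"unknown room: {text!r}")
    return room


def interpret(attributes: Dict[str, str], today: date) -> Dict[str, object]:
    """Pass each recognized attribute to its interpreter, skipping unknown ones."""
    interpreters = {
        "room": interpret_room,
        "date": lambda text: interpret_date(text, today),
    }
    return {
        name: interpreters[name](text)
        for name, text in attributes.items()
        if name in interpreters
    }


# The reference date is an arbitrary Wednesday picked for the example.
objects = interpret({"room": "r.23", "date": "next Monday"}, date(2021, 2, 24))
print(objects["room"].number, objects["date"])  # R.23 2021-03-01
```

Keeping one small interpreter per attribute type means each one can be tested on its own, independently of the NLP model.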
The most difficult part - the recognition - was solved (or so we thought).
NLP gave us a nice structure with multiple items that can be interpreted as simple data objects.
Neurons or no neurons, that’s the question
The logic for converting recognized data into actions on the database was quite simple at the beginning.
We had a few separated, well-defined use cases that were easy to implement.
But complexity grew quite rapidly. A few 'if's were not sufficient anymore,
so we had to look for a more robust solution.
After a little bit of research, we found that most of the solutions depend heavily on neural networks.
That gives these solutions an edge with multiple short sentences, and general conversations about
weather, sport, local natural wonders, etc. This is a robust solution for general use,
when the conversations flow naturally from beginning to end.
Decision-making is hidden in the neural network, which is trained with a sample data set.
Neural networks are easy to start with, and adding new features is simple. Let's use one!
Well, not so fast... In testing, it worked wonders, but as soon as we put it into the hands of test users,
we were bombarded with bugs. There was something we forgot: real people. Users were giving us only partial
information, and we didn't cover every possible angle. We quickly lost control over the conversation flow,
with multiple use cases and various responses from the database.
This was not what we needed. If we were aiming for a small-talk bot, a neural network would be ideal,
but we were building a single-purpose bot. Users know exactly why they open a conversation with our
bot - they want a reservation. We had to regain control of the conversation flow in the code and get all
the information which the app needed from the user. The solution had to be simple, maintainable,
testable, and scalable.
And so we rebuilt the application into a state machine where the bot is in control of the conversation flow.
Simply put, it gets a state, in our case a point in the conversation with the user,
and a list of information that the user already gave to the bot. Based on these data, it transitions
into the next state - the next point in the conversation.
Users seek to achieve a goal, and the bot guides them through the conversation and asks for the information needed.
We didn't forget to add a few simple small talk phrases as well, but not many users come to our bot for small talk.
The core of the state machine deserves its own blog post, but here is a small example:
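What follows is only a minimal sketch in Python, not the production code. The state names, the slot keys (`kind`, `room`, `date`, `time`), and the fixed order in which slots are requested are assumptions made for illustration. The access to the reservation system is left out entirely: reaching `BOOKED` here only means the request is complete.

```python
from enum import Enum, auto
from typing import Dict, Tuple


class State(Enum):
    START = auto()
    ASK_KIND = auto()
    ASK_ROOM = auto()
    ASK_DATE = auto()
    ASK_TIME = auto()
    BOOKED = auto()


# Each required slot and the state that asks for it, in the order the bot asks.
REQUIRED_SLOTS = [
    ("kind", State.ASK_KIND),
    ("room", State.ASK_ROOM),
    ("date", State.ASK_DATE),
    ("time", State.ASK_TIME),
]

# What the bot says in each asking state.
QUESTIONS = {
    State.ASK_KIND: "Do you want me to book you a seat or a meeting room?",
    State.ASK_ROOM: "What room would that be?",
    State.ASK_DATE: "For which time shall I reserve the room?",
    State.ASK_TIME: "Can you specify the time for me, please?",
}


def transition(state: State, slots: Dict[str, str]) -> State:
    """Pick the next state from the current state and the known information."""
    if state is State.BOOKED:
        return State.BOOKED  # terminal state: nothing left to ask
    for slot, asking_state in REQUIRED_SLOTS:
        if slot not in slots:
            return asking_state  # ask for the first missing piece
    return State.BOOKED


def step(state: State, slots: Dict[str, str],
         new_info: Dict[str, str]) -> Tuple[State, Dict[str, str]]:
    """Merge what the user just said, then move to the next state."""
    merged = {**slots, **new_info}
    return transition(state, merged), merged


# The user states the date early, so the bot never asks for it.
state, slots = step(State.START, {}, {"kind": "meeting room", "date": "next Monday"})
print(QUESTIONS[state])  # What room would that be?
state, slots = step(state, slots, {"room": "R.23"})
print(QUESTIONS[state])  # Can you specify the time for me, please?
state, slots = step(state, slots, {"time": "9:30 to 10:30"})
print(state)  # State.BOOKED
```

Because the next state depends only on the current state and the collected information, the same function handles a user who provides everything in one sentence and a user who has to be asked for each detail.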
What does the bot say?
Let's say we have a user who needs to book a room for an important meeting next Monday from 9:30 to 10:30. In general, to make such a reservation, the bot needs to know the time and the room. This is how the conversation could unfold:
User : Book me room E.3.30 for next Friday from 9:30 to 10:30.
Bot : I have booked room E.3.30 on 26.02. from 09:30 to 10:30 for you.
Quite simple, right? But the same conversation could also look like this:
User : I need a reservation.
Bot : Do you want me to book you a seat or a meeting room?
User : Meeting room.
Bot : Alright! Would you like to book a specific meeting room or shall I just book one that is free?
User : Specific meeting room.
Bot : What room would that be?
User : R.23.
Bot : For which time shall I reserve the room?
User : Next Monday.
Bot : Can you specify the time for me, please?
User : 9:30 to 10:30.
Bot : I have booked room R.23 for 1st March from 09:30 to 10:30 for you.
In the first example, the user knew exactly what they wanted. In the second conversation, the bot guides the user. These examples are at opposite ends of the conversation spectrum, but we also cover everything in between. When the user states the date and time earlier in the conversation, the bot should not ask for it again. The main point is that all of these conversations are processed with the same conversation flow (same code, same tests).
What is neat about this approach is that we can take a part of the conversation and re-use it for multiple intents. For example, time validation can be reused in any conversation where a time specification is needed.
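The reuse described above can be sketched as a conversation step that is written once and plugged into several flows. This is an illustrative Python sketch under stated assumptions: the function names, the accepted time format ("9:30 to 10:30"), and the choice of which intents need a time are hypothetical and not taken from the real application.

```python
import re
from datetime import time
from typing import Callable, Dict, List, Optional, Tuple

# Accepts ranges such as "9:30 to 10:30" or "09:30-10:30" (assumed format).
TIME_RANGE = re.compile(r"(\d{1,2}):(\d{2})\s*(?:to|-)\s*(\d{1,2}):(\d{2})")


def parse_time_range(text: str) -> Optional[Tuple[time, time]]:
    """Return (start, end) if the text holds a valid time range, else None."""
    match = TIME_RANGE.search(text)
    if match is None:
        return None
    h1, m1, h2, m2 = (int(group) for group in match.groups())
    if h1 > 23 or h2 > 23 or m1 > 59 or m2 > 59:
        return None
    start, end = time(h1, m1), time(h2, m2)
    return (start, end) if start < end else None


def time_step(slots: dict, user_text: str) -> Optional[str]:
    """Shared step: fill the 'time' slot, or return the question to ask again."""
    parsed = parse_time_range(user_text)
    if parsed is None:
        return "Can you specify the time for me, please?"
    slots["time"] = parsed
    return None


def room_step(slots: dict, user_text: str) -> Optional[str]:
    """Step used only where a room is needed: fill the 'room' slot."""
    text = user_text.strip()
    if not text:
        return "What room would that be?"
    slots["room"] = text
    return None


Step = Callable[[dict, str], Optional[str]]

# The same time_step function object serves both intents.
FLOWS: Dict[str, List[Step]] = {
    "book": [room_step, time_step],
    "cancel": [time_step],
}

slots: dict = {}
print(time_step(slots, "Next Monday"))    # the bot has to ask again
print(time_step(slots, "9:30 to 10:30"))  # None: the slot is filled
```

Since the step is a single function, a fix or a new test for time validation benefits every intent that uses it.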
There is one part of the example that I've excluded, and that's the access to the reservation system itself. Here we simply save the request and call it a day, but in everyday use, there are some limitations - the reservation may very well be refused. All of these possibilities have to be covered, and users have to be properly informed. Again, how to do that is a topic for a whole new blog post.
As you can see, there are a number of topics to consider when building a chatbot from scratch: from NLP to decision making, to actions in the reservation system, and finally to the answers.
Thanks to rigorous testing and a clear framework, we are not blocked by bloated training data sets, and multiple devs can develop independently of each other.
Currently, our application can process multiple base intents, such as show, cancel, check, or
book, in English and German. Based on these intents, the bot can lead up to 300 different conversations
with multiple responses. More conversations and variations are still in development, and we hope to reach
500 in the near future. Our system is currently used by more than 1400 users, with an average of 2000
interactions every week.