How-to: Dialogue management¶
Overview¶
At the heart of the communication subsystem, dialogue management provides a multi-party, multi-turn design to handle conversation flows. Dialogues can be kept separate by actor and purpose, and can be started and stopped independently of each other using a high-level API.
Dialogue anatomy¶
Dialogue role¶
Each dialogue (1 in the figure) handled by communication_hub has a specific purpose named role, which is used to configure the chatbot for that specific dialogue. Examples of roles are chitchat, ask, entertain, plan, etc. Such roles define the style and goal of the conversation, and can be used by the chatbot to steer the conversation and finally close it when the goal is achieved (e.g., an ask dialogue can be closed when the user has provided the requested information). A machine-readable result may also be returned on dialogue closure (e.g., an ask dialogue returns the requested information in a structured format defined by the role configuration). The support for each role is chatbot-dependent, but the default role is generally available and is used for generic conversations. See How-to: RASA chatbot and How-to: LLM chatbot for more information.
Dialogue actor¶
A dialogue also has a target actor the robot is talking to. This actor can be a person, a group of people, or just anyone. The actor is used to filter the incoming speeches from ASR (2 in the figure) and to select which dialogues they are relevant to. For example, if a dialogue A is opened targeting Bob and a dialogue B targets anyone, an incoming speech from Bob will be added to the dialogue history of both dialogues, while a speech from Alice will be added only to dialogue B. When multiple dialogues match the incoming speech, the one with the highest priority is selected to formulate a response (3 in the figure).
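The selection logic can be summarized with the following illustrative sketch, a restatement of the rules above rather than the actual implementation of communication_hub:

```python
from dataclasses import dataclass, field

ANYONE = None  # a dialogue targeting anyone matches every speaker


@dataclass
class Dialogue:
    role: str
    actor: str | None  # ANYONE, or the name of a specific actor
    priority: int
    history: list = field(default_factory=list)


def route_speech(dialogues, speaker, speech):
    # A speech is relevant to every dialogue targeting that speaker or anyone.
    matching = [d for d in dialogues if d.actor in (ANYONE, speaker)]
    # All matching dialogues record the speech in their history...
    for d in matching:
        d.history.append((speaker, speech))
    # ...but only the highest-priority one formulates the response.
    return max(matching, key=lambda d: d.priority) if matching else None
```

Here, route_speech(dialogues, 'Bob', 'hello') would append the speech to every dialogue targeting Bob or anyone, and return the one dialogue that should answer.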
Parallel dialogue management¶
With such a design, it is possible to have multiple dialogues open at the same time, each with a different purpose and actor, and have the robot switch between them without losing the context of the conversation.
For example, you might start with a generic chitchat dialogue targeting anyone. Then Bob, while talking to the robot, expresses the request to fetch him a book. An intent is recognized and sent to the application (see Chatbot interaction). The application might know about two different books (likely by querying the knowledge_base), and might be programmed to start a new high-priority dialogue to ask Bob which one he prefers. After Bob has answered, the ask dialogue closes returning the selected book, and the application can run a routine to fetch it. When the robot is done, the chitchat dialogue is resumed with full knowledge of the conversation that happened in the ask dialogue, so the book can naturally come up in conversation.
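Schematically, the application side of this example might look like the sketch below, where every name (the intent fields, the knowledge base query, the ask and fetch helpers) is a hypothetical stand-in rather than a real API:

```python
def query_knowledge_base(title: str) -> list[str]:
    # Stand-in for a knowledge_base query returning matching books.
    return ['The Hobbit', 'The Silmarillion']


def start_ask_dialogue(actor: str, question: str, options: list[str]) -> str:
    # Stand-in for opening a high-priority ask dialogue and blocking
    # until it closes with its machine-readable result.
    return options[0]


def fetch(book: str) -> None:
    # Stand-in for the robot routine that fetches the book.
    print(f'Fetching {book}...')


def on_intent(speaker: str, name: str, title: str) -> None:
    if name == 'fetch_book':
        candidates = query_knowledge_base(title)
        if len(candidates) > 1:
            # The chitchat dialogue stays open; it is resumed once the
            # higher-priority ask dialogue closes.
            choice = start_ask_dialogue(
                actor=speaker,
                question='Which book would you like?',
                options=candidates,
            )
            candidates = [choice]
        fetch(candidates[0])
```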
ROS interface¶
communication_hub is the node performing dialogue management. It listens to the human speech recognized by ASR, and uses chatbots (like RASA or an LLM) to extract the Intents and get a response, which is played back to the human using text-to-speech (potentially as a multi-modal expression).
Currently, communication_hub can only connect to a single chatbot at a given time, and the default one is chatbot_rasa. The Chatbot interaction section (below) shows how a chatbot can be integrated with communication_hub and how to select which one to use.
To manage the dialogues, the communication_hub provides the following skills:
/chat: the general-purpose interface to open a dialogue
/ask: the interface to open a dialogue with the specific purpose of getting some pieces of information from someone
See Dialogue anatomy, above, for more information on how to use these interfaces.
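As a purely illustrative example of calling one of these skills, the sketch below assumes they are exposed as ROS 2 actions; the Chat action type, its package, and its goal fields are hypothetical, so check the interface definitions shipped with your version of communication_hub:

```python
import rclpy
from rclpy.action import ActionClient
from rclpy.node import Node

# Hypothetical action definition: the real package, type, and fields of
# the /chat skill may differ.
from communication_msgs.action import Chat


class ChatCaller(Node):
    def __init__(self):
        super().__init__('chat_caller')
        # /chat is the general-purpose skill to open a dialogue
        self._client = ActionClient(self, Chat, '/chat')

    def open_chitchat(self):
        goal = Chat.Goal()
        goal.role = 'chitchat'  # dialogue role (see Dialogue anatomy)
        goal.actor = ''         # assumed convention: empty targets anyone
        self._client.wait_for_server()
        return self._client.send_goal_async(goal)


def main():
    rclpy.init()
    node = ChatCaller()
    node.open_chitchat()
    rclpy.spin(node)
```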
By default, a dialogue for chitchatting with anyone is opened by 🚧 Default application while the robot is awake. This can be changed by replacing the default 🚧 Default application with a custom one.
You can also set the parameter enable_default_chatbot to true to open on startup a dialogue with anyone, with the role specified by the parameter default_chatbot_role.
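For instance, a launch file could set both parameters as follows (a minimal sketch; the package and executable names are assumptions to adapt to your installation):

```python
from launch import LaunchDescription
from launch_ros.actions import Node


def generate_launch_description():
    return LaunchDescription([
        Node(
            package='communication_hub',     # assumed package name
            executable='communication_hub',  # assumed executable name
            parameters=[{
                'enable_default_chatbot': True,
                'default_chatbot_role': 'chitchat',  # example role
            }],
        ),
    ])
```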
Web interface¶
The Web User Interface offers some insight into the status of communication_hub under Diagnostics > Communication > Manager > communication_hub.
The most relevant values are:
Not listening: a flag indicating that any incoming speech from ASR is temporarily ignored (which happens while a chatbot is already busy processing some previous speech)
Dialogue N: an ongoing dialogue opened by /chat, /ask or enable_default_chatbot
Expression N: a queued multi-modal expression, created by /say or from a response of an ongoing dialogue
Chatbot interaction¶
The communication_hub node is designed to work with any chatbot that implements the following interfaces:
/chatbot/start_dialogue: opens a new dialogue with the chatbot, according to a role (the same as in Dialogue anatomy). When the dialogue fulfills its purpose, it returns a role-specific machine-readable result, which communication_hub forwards to the associated /chat or /ask (if any).
/chatbot/dialogue_interaction: given an open dialogue, sends a human speech input to the chatbot to get a response and/or infer the human intent. communication_hub plays back the response using speech synthesis (see How-to: Speech synthesis (TTS)) and forwards the detected Intents to /intents.
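An illustrative skeleton of a compliant chatbot node might look like the sketch below; the StartDialogue and DialogueInteraction service definitions are hypothetical stand-ins for the actual interface types expected by communication_hub:

```python
import rclpy
from rclpy.node import Node

# Hypothetical interface definitions: replace them with the real ones
# implemented by chatbot_rasa / chatbot_ollama in your installation.
from chatbot_msgs.srv import DialogueInteraction, StartDialogue


class MyChatbot(Node):
    def __init__(self):
        super().__init__('chatbot_custom')
        # Resolves to /chatbot_custom/start_dialogue
        self.create_service(StartDialogue, '~/start_dialogue',
                            self.on_start_dialogue)
        # Resolves to /chatbot_custom/dialogue_interaction
        self.create_service(DialogueInteraction, '~/dialogue_interaction',
                            self.on_dialogue_interaction)

    def on_start_dialogue(self, request, response):
        # Configure a new dialogue for the requested role
        # (e.g. chitchat, ask, ...).
        return response

    def on_dialogue_interaction(self, request, response):
        # Produce a response and/or detected intents for the incoming
        # human speech of the selected dialogue.
        return response


def main():
    rclpy.init()
    rclpy.spin(MyChatbot())
```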
Currently, two different chatbot nodes are supported, chatbot_rasa and chatbot_ollama, both of which implement the above interfaces under the names /<chatbot_node_name>/start_dialogue and /<chatbot_node_name>/dialogue_interaction.
To select which chatbot to use, the communication_hub interfaces should be remapped to the correct chatbot ones. By default, this is done in favor of chatbot_rasa using the Configuration files. The user can use the same configuration files to remap the interfaces to any other compliant chatbot instead.
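As an illustration, a launch file could remap the interfaces to chatbot_ollama along these lines (a sketch under the same naming assumptions as above):

```python
from launch import LaunchDescription
from launch_ros.actions import Node


def generate_launch_description():
    return LaunchDescription([
        Node(
            package='communication_hub',     # assumed package name
            executable='communication_hub',  # assumed executable name
            remappings=[
                ('/chatbot/start_dialogue',
                 '/chatbot_ollama/start_dialogue'),
                ('/chatbot/dialogue_interaction',
                 '/chatbot_ollama/dialogue_interaction'),
            ],
        ),
    ])
```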
If you want to use no chatbot, you can set the parameter enable_chatbot to false. In this case, communication_hub will directly forward the incoming speech to /intents as raw_user_speech, letting the application handle it, if required.
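A minimal sketch of an application consuming that topic is given below, assuming /intents carries an Intent message with intent and data fields; the message package and field names are assumptions, so check the Intents documentation for the actual definition:

```python
import rclpy
from rclpy.node import Node

# Hypothetical message type: see the Intents documentation for the
# definition actually shipped with your robot.
from intents_msgs.msg import Intent


class RawSpeechHandler(Node):
    def __init__(self):
        super().__init__('raw_speech_handler')
        self.create_subscription(Intent, '/intents', self.on_intent, 10)

    def on_intent(self, msg):
        # With enable_chatbot set to false, each recognized speech
        # arrives as a raw_user_speech intent.
        if msg.intent == 'raw_user_speech':
            self.get_logger().info(f'User said: {msg.data}')


def main():
    rclpy.init()
    rclpy.spin(RawSpeechHandler())
```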