How-to: Dialogue management

Overview

At the heart of the communication subsystem, dialogue management provides a multi-party, multi-turn design to handle conversation flows. Dialogues can be kept separate by actor and purpose, and can be started and stopped independently of each other through a high-level API.

Dialogue anatomy

[Figure: dialogue management architecture. Inputs to the communication hub: mic (reSpeaker, /audio_in/raw), ASR (vosk, /humans/voices/*/speech), people perception (ROS4HRI), a semantic state aggregator (~/get_updates, ~/configure) fed by the 3D environment (reMap) and other semantic knowledge sources, the knowledge base (KnowledgeCore), the robot state, and a soft wake-up word. The mission controller uses the chat skill (/chat [skills_list/Chat.action]), ask skill (/ask [.../Ask.action]) and say skill (/say [.../Say.action]) to (1) create a dialogue; the hub (2) maps each input to one of the active dialogues (tracked with their role, person_id/group_id and priority) and (3) gets a response and/or user intent from the chatbot engine (chatbot_rasa or chatbot_ollama: ~/get_supported_roles, ~/start_dialogue, ~/dialogue_interaction). Responses are rendered as speech (/tts_engine/say), gestures, expressions (/robot_face/expressions) and closed captions (/communication_hub/closed_captions); detected intents are published on /intents [hri_action_msgs/Intent.msg].]

Dialogue role

Each dialogue (1 in the figure) handled by communication_hub has a specific purpose called its role, which is used to configure the chatbot for that dialogue. Examples of roles are chitchat, ask, entertain, plan, etc. The role defines the style and goal of the conversation, and can be used by the chatbot to steer the conversation and eventually close it when the goal is achieved (e.g., an ask dialogue can be closed once the user has provided the requested information). A machine-readable result may also be returned on dialogue closure (e.g., an ask dialogue returns the requested information in a structured format defined by the role configuration). The support for each role is chatbot-dependent, but the default role is generally available and is used for generic conversations. See How-to: RASA chatbot and How-to: LLM chatbot for more information.
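
Purely as an illustration, the machine-readable result returned when an ask dialogue closes might look like the snippet below. The actual schema is entirely defined by the role configuration of your chatbot; none of the field names here are fixed.

    # Purely illustrative: a possible result for a closed 'ask' dialogue.
    # The real structure depends on the role configuration.
    ask_result = {
        'role': 'ask',
        'fulfilled': True,
        'slots': {'requested_info': "Bob's favourite book"},
    }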

Dialogue actor

Each dialogue also has a target actor: who the robot is talking to. The actor can be a specific person, a group of people, or just anyone. It is used to filter the incoming speech from the ASR (2 in the figure) and to select which dialogues it is relevant to. For example, if dialogue A is open targeting Bob and dialogue B targets anyone, incoming speech from Bob will be added to the dialogue history of both dialogues, while speech from Alice will be added only to dialogue B. When multiple dialogues match the incoming speech, the one with the highest priority is selected to formulate a response (3 in the figure).
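
The matching rule can be summarised with a short, self-contained sketch (plain Python, not communication_hub internals): speech is appended to the history of every dialogue whose actor matches the speaker, and the highest-priority match is the one that answers.

    from dataclasses import dataclass, field

    ANYONE = None  # a dialogue targeting anyone matches every speaker

    @dataclass
    class Dialogue:
        name: str
        actor: str | None  # a person/group id, or ANYONE
        priority: int
        history: list = field(default_factory=list)

    def on_speech(dialogues, speaker_id, text):
        # every matching dialogue keeps the speech in its history...
        matches = [d for d in dialogues if d.actor in (ANYONE, speaker_id)]
        for d in matches:
            d.history.append((speaker_id, text))
        # ...but only the highest-priority match formulates a response
        return max(matches, key=lambda d: d.priority) if matches else None

    dialogues = [Dialogue('A', actor='bob', priority=10),
                 Dialogue('B', actor=ANYONE, priority=1)]
    assert on_speech(dialogues, 'bob', 'Hello!').name == 'A'
    assert on_speech(dialogues, 'alice', 'Hi there').name == 'B'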

Parallel dialogues management

With such a design, multiple dialogues can be open at the same time, each with a different purpose and actor, and the robot can switch between them without losing the context of each conversation.

For example, you might start with a generic chitchat dialogue targeting anyone. Then Bob, while talking to the robot, asks it to fetch him a book. An intent is recognized and sent to the application (see Chatbot interaction). The application might know about two different books (likely by querying the knowledge_base) and might be programmed to start a new high-priority ask dialogue to ask Bob which one he prefers. After Bob has answered, the ask dialogue closes, returning the selected book, and the application can run a routine to fetch it. When the robot is done, the chitchat dialogue resumes with full knowledge of the conversation that happened in the ask dialogue, so talking about the book can happen naturally.

ROS interface

communication_hub is the node performing dialogue management. It listens to the human speech recognized by the ASR, uses chatbots (like RASA or an LLM) to extract the Intents and get a response, which is played back to the human using text-to-speech (potentially as a multi-modal expression).

Currently, communication_hub can only connect to a single chatbot at a time, and the default one is chatbot_rasa. The Chatbot interaction section (below) shows how a chatbot can be integrated with communication_hub and how to select which one to use.

To manage the dialogues, the communication_hub provides the following skills:

  • /chat: the general-purpose interface to open a dialogue

  • /ask: the interface to open a dialogue with the specific purpose of getting some pieces of information from someone

See Dialogue anatomy, above, for more information on how to use these interfaces.
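
As a sketch of what calling these skills looks like from an application node, the following uses a standard rclpy action client to open an ask dialogue, as in the book example above. The action type (skills_list/Ask.action, as named in the figure) and the goal fields are assumptions; check the actual .action definitions on your system.

    import rclpy
    from rclpy.action import ActionClient
    from rclpy.node import Node
    from skills_list.action import Ask  # package/type name as in the figure

    def main():
        rclpy.init()
        node = Node('ask_example')
        client = ActionClient(node, Ask, '/ask')
        client.wait_for_server()

        goal = Ask.Goal()
        goal.role = 'ask'    # hypothetical field: the dialogue role
        goal.actor = 'bob'   # hypothetical field: the target actor

        # send the goal, then wait for the role-specific machine-readable
        # result returned when the dialogue closes
        goal_future = client.send_goal_async(goal)
        rclpy.spin_until_future_complete(node, goal_future)
        result_future = goal_future.result().get_result_async()
        rclpy.spin_until_future_complete(node, result_future)
        node.get_logger().info('result: %s' % result_future.result().result)
        rclpy.shutdown()

    if __name__ == '__main__':
        main()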

By default, a dialogue for chitchatting with anyone is opened by 🚧 Default application while the robot is awake. This can be changed by replacing the default 🚧 Default application with a custom one.

You can also set the parameter enable_default_chatbot to true to open, on startup, a dialogue with anyone, using the role specified by the parameter default_chatbot_role.
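
For instance, in a plain ROS 2 launch file, enabling this behaviour could look like the sketch below. Package and executable names are assumptions; on a packaged robot these parameters are normally set through the configuration files instead.

    from launch import LaunchDescription
    from launch_ros.actions import Node

    def generate_launch_description():
        return LaunchDescription([
            Node(
                package='communication_hub',     # assumed package name
                executable='communication_hub',  # assumed executable name
                parameters=[{
                    'enable_default_chatbot': True,
                    'default_chatbot_role': 'chitchat',
                }],
            ),
        ])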

Web interface

The Web User Interface offers some insight into the status of communication_hub under Diagnostics > Communication > Manager > communication_hub.

The most relevant values are:

  • Not listening: flag indicating that any incoming speech from the ASR is temporarily ignored (which happens while the chatbot is still busy processing previous speech)

  • Dialogue N: an ongoing dialogue opened by /chat, /ask or enable_default_chatbot

  • Expression N: a queued multi-modal expression, created by /say or from a response of an ongoing dialogue

Chatbot interaction

The communication_hub node is designed to work with any chatbot that implements the following interfaces:

  • /chatbot/start_dialogue: opens a new dialogue with the chatbot, according to a role (the same as in Dialogue anatomy). When the dialogue fulfills its purpose, it returns a role-specific, machine-readable result, which communication_hub forwards to the associated /chat or /ask (if any).

  • /chatbot/dialogue_interaction: given an open dialogue, sends a human speech input to the chatbot to get a response and/or infer the human intent. communication_hub plays back the response via speech synthesis (see How-to: Speech synthesis (TTS)) and forwards the detected Intents to /intents.

Currently, two different chatbot nodes are supported, chatbot_rasa and chatbot_ollama, both of which implement the above interfaces under the names /<chatbot_node_name>/start_dialogue and /<chatbot_node_name>/dialogue_interaction.

To select which chatbot to use, the communication_hub interfaces should be remapped to those of the chosen chatbot. By default, this is done in favor of chatbot_rasa using the Configuration files. The same configuration files can be used to remap the interfaces to any other compliant chatbot instead.
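
As an illustration of what this remapping amounts to, here is how it could be expressed in a plain ROS 2 launch file, switching to chatbot_ollama. Package and executable names are again assumptions; prefer the configuration files on a packaged robot.

    from launch import LaunchDescription
    from launch_ros.actions import Node

    def generate_launch_description():
        return LaunchDescription([
            Node(
                package='communication_hub',     # assumed package name
                executable='communication_hub',  # assumed executable name
                remappings=[
                    # point communication_hub at chatbot_ollama instead
                    # of the default chatbot_rasa
                    ('~/start_dialogue', '/chatbot_ollama/start_dialogue'),
                    ('~/dialogue_interaction',
                     '/chatbot_ollama/dialogue_interaction'),
                ],
            ),
        ])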

If you do not want to use any chatbot, you can set the parameter enable_chatbot to false. In this case, communication_hub will directly forward the incoming speech to /intents as raw_user_speech, letting the application handle it, if required.
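
A minimal sketch of an application consuming this raw speech is shown below. The Intent message package name follows the figure (hri_action_msgs), and the field and value names are assumptions; check the installed message definition.

    import rclpy
    from rclpy.node import Node
    from hri_action_msgs.msg import Intent  # package name as in the figure

    class RawSpeechHandler(Node):
        def __init__(self):
            super().__init__('raw_speech_handler')
            self.create_subscription(Intent, '/intents', self.on_intent, 10)

        def on_intent(self, msg):
            # with enable_chatbot=false, speech arrives as raw_user_speech
            if msg.intent == 'raw_user_speech':  # assumed field/value names
                self.get_logger().info('user said: %s' % msg.data)

    def main():
        rclpy.init()
        rclpy.spin(RawSpeechHandler())

    if __name__ == '__main__':
        main()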

See also