ASR, TTS and dialogue management APIs#

ASR API#

The vosk_asr node processes the /audio/channel0 input from the ReSpeaker microphone. It runs entirely on the CPU (GPU acceleration is not currently available).

Once a language is selected through the /start_asr ROS action, the node processes audio until the /stop_asr ROS action is called. The recognized text is published on the /humans/voices/*/speech topic corresponding to the current voice ID.

Warning

As of pal-sdk-23.1, automatic voice separation and identification are not available. Therefore, all detected speech is published on the topic /humans/voices/anonymous_speaker/speech.

The available ROS interfaces to process speech are:

ROS actions#

/start_asr ROS action (type hri_actions_msgs/StartASR): starts processing audio captured through the ReSpeaker microphone in a given language
/stop_asr ROS action (type hri_actions_msgs/StopASR): stops processing the captured audio

Published topics#

/humans/voices/*/speech ROS topic (type hri_msgs/LiveSpeech): publishes the incremental and final recognized text
/humans/voices/*/is_speaking ROS topic (type std_msgs/Bool): publishes a boolean indicating whether a person is speaking or not
/humans/voices/*/audio ROS topic (type audio_common_msgs/AudioData): republishes the /audio/channel0 processed audio topic coming from the ReSpeaker array
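
As an illustration of how the speech topic might be consumed, here is a minimal Python sketch. It assumes a ROS 1 (rospy) environment and assumes that hri_msgs/LiveSpeech exposes `incremental` and `final` string fields; the node name and helper functions are hypothetical. ROS is imported lazily so the helpers can be tried without a robot.

```python
from types import SimpleNamespace

# Voice ID used while voice identification is unavailable (see the warning above)
ANONYMOUS_VOICE_ID = "anonymous_speaker"


def speech_topic(voice_id=ANONYMOUS_VOICE_ID):
    """Build the speech topic name for a given voice ID."""
    return "/humans/voices/{}/speech".format(voice_id)


def extract_final_text(msg):
    """Return the final recognized text, or None for incremental-only updates.

    Assumption: the message has `incremental` and `final` string fields.
    """
    final = getattr(msg, "final", "")
    return final if final else None


def listen():
    """Log every final utterance (requires a running ROS 1 system)."""
    import rospy  # imported here so the helpers above work without ROS
    from hri_msgs.msg import LiveSpeech

    def on_speech(msg):
        text = extract_final_text(msg)
        if text is not None:
            rospy.loginfo("Heard: %s", text)

    rospy.init_node("speech_listener_example")  # hypothetical node name
    rospy.Subscriber(speech_topic(), LiveSpeech, on_speech)
    rospy.spin()


# Stand-in messages for trying the helper offline
partial = SimpleNamespace(incremental="hello", final="")
done = SimpleNamespace(incremental="", final="hello robot")
```

Calling `listen()` on the robot would log final utterances only; incremental updates are ignored by this sketch.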

Wake-up word#

ROS services#

/wakeup_monitor/enable (type soft_wakeup_word/Enable): enable/disable the monitoring
/wakeup_monitor/set_wakeup_pattern (type soft_wakeup_word/SetWakeupPattern): set a custom ‘wakeup’ pattern (C++ regular expression)
/wakeup_monitor/get_wakeup_pattern (type soft_wakeup_word/GetWakeupPattern): get the current wake-up pattern
/wakeup_monitor/set_sleep_pattern (type soft_wakeup_word/SetSleepPattern): set a custom ‘sleep’ pattern (C++ regular expression)
/wakeup_monitor/get_sleep_pattern (type soft_wakeup_word/GetSleepPattern): get the current ‘sleep’ pattern
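
Before sending a custom pattern to the services above, it can help to try it against a few sample utterances. The sketch below does this with Python's `re` module as an approximation: the patterns are hypothetical examples, not the robot's defaults, and the matching mode (case-insensitive, partial match) is an assumption. C++ and Python regular expression syntax differ in some constructs, although simple groups and alternations like these behave the same way in both.

```python
import re

# Hypothetical patterns, for illustration only; not the robot's defaults
WAKEUP_PATTERN = r"(hey|hello|hi) robot"
SLEEP_PATTERN = r"(go to sleep|stop listening)"


def matches(pattern, utterance):
    """Return True if the pattern is found anywhere in the utterance.

    Assumption: case-insensitive, partial-match semantics. The matching
    mode actually used by the wake-up monitor may differ.
    """
    return re.search(pattern, utterance, flags=re.IGNORECASE) is not None
```

For example, `matches(WAKEUP_PATTERN, "Hey robot, what time is it?")` is true under these assumptions, while the same utterance does not match `SLEEP_PATTERN`.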

Published topics#

/active_listening (type std_msgs/Bool): whether or not the robot is ‘awake’ and should actively process incoming speech. In particular, the dialogue manager (chatbot) uses this topic to decide whether to process incoming speech.

Note that you can publish true or false on this topic yourself to activate or deactivate the chatbot's processing of incoming speech.
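
A minimal sketch of publishing on /active_listening from Python follows, assuming a ROS 1 (rospy) environment. The node name, the helper function and the `FakePublisher` stand-in are hypothetical; the stand-in only exists so the helper can be tried without a robot.

```python
def set_active_listening(publisher, active):
    """Publish the desired listening state on a std_msgs/Bool publisher."""
    # rospy publishers accept message fields as keyword arguments
    publisher.publish(data=bool(active))


def main():
    """Activate speech processing (requires a running ROS 1 system)."""
    import rospy  # imported here so the helper above works without ROS
    from std_msgs.msg import Bool

    rospy.init_node("active_listening_example")  # hypothetical node name
    # latch=True so subscribers that connect late still receive the value
    pub = rospy.Publisher("/active_listening", Bool, queue_size=1, latch=True)
    set_active_listening(pub, True)
    rospy.sleep(1.0)  # give the message time to go out before exiting


class FakePublisher:
    """Stand-in publisher that records what would be sent."""

    def __init__(self):
        self.sent = []

    def publish(self, **fields):
        self.sent.append(fields)
```

Passing `False` instead of `True` deactivates processing in the same way.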

Chatbot/Dialogue management#

The chatbot engine comes with the following set of ROS interfaces:

ROS actions#

/manage_chatbot (type chatbot_msgs/ChatbotServerAction). Loads the RASA language model for the language given in the goal, unless that model is already loaded.

Warning

Switching the RASA language takes about 20 seconds; the exact time depends on the size of the model.

/train_chatbot (type chatbot_msgs/ChatbotTrainAction). Trains the specified chatbot. Note that the lang given in the goal must exist in either the ~/.pal/chatbot_cfg/rasa_chatbot_chitchat or the /opt/pal/gallium/share/rasa_chatbot_chitchat directory.
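
Before requesting a training run, one might check that the language is present in one of the two directories. The sketch below does so under a loud assumption: that each language appears as an entry named after the language inside those directories. The actual layout is not described here, and the function name is hypothetical.

```python
import os

# Directories named in the documentation above
CHATBOT_DIRS = (
    "~/.pal/chatbot_cfg/rasa_chatbot_chitchat",
    "/opt/pal/gallium/share/rasa_chatbot_chitchat",
)


def find_chatbot_language(lang, search_dirs=CHATBOT_DIRS):
    """Return the first path containing an entry named `lang`, else None.

    Assumption: each language is stored as an entry named after the
    language inside one of the search directories.
    """
    for base in search_dirs:
        candidate = os.path.join(os.path.expanduser(base), lang)
        if os.path.exists(candidate):
            return candidate
    return None
```

The user directory is listed first, so an entry there is returned in preference to the system-wide one; whether the chatbot itself applies the same precedence is not stated.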

ROS services#

/active_chatbot (type chatbot_msgs/GetActiveChatbot). Returns the chatbot that is currently running on the robot.

Published topics#

/intents (type hri_actions_msgs/Intent). The chatbot optionally publishes detected intents on this topic. It does so only when the response requires the robot to perform an action beyond text-to-speech.
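
One possible way to react to published intents is a small dispatcher that maps intent names to handler functions, sketched below. It assumes the message carries the intent name in a string field called `intent`; the intent names, handlers and function name are hypothetical. On the robot, `dispatch` could be passed as the callback of a subscriber on /intents.

```python
from types import SimpleNamespace


def make_intent_dispatcher(handlers, default=None):
    """Build a callback that routes each message to a handler by intent name.

    Assumption: the message exposes the intent name as `msg.intent`.
    """
    def dispatch(msg):
        name = getattr(msg, "intent", None)
        handler = handlers.get(name, default)
        if handler is None:
            return None  # no handler registered for this intent
        return handler(msg)

    return dispatch


# Hypothetical intent names and handlers, for illustration only
handlers = {
    "greet": lambda msg: "greeting handled",
    "move_to": lambda msg: "navigation requested",
}
dispatch = make_intent_dispatcher(handlers)

# Stand-in messages for trying the dispatcher offline
greet_msg = SimpleNamespace(intent="greet")
unknown_msg = SimpleNamespace(intent="unknown")
```

Unknown intents are silently ignored unless a `default` handler is supplied.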
