ASR, TTS and dialogue management APIs#
ASR API#
The vosk_asr node processes the /audio/channel0 input from the ReSpeaker microphone. It runs entirely on the CPU (GPU acceleration is not currently available).
Once the language is selected, the node processes audio until the /stop_asr ROS action is called. The recognized text is published on the /humans/voices/*/speech topic corresponding to the current voice ID.
Warning
As of pal-sdk-23.1, automatic voice separation and identification are not available. Therefore, all detected speech is published on the topic /humans/voices/anonymous_speaker/speech.
The available ROS interfaces to process speech are:
ROS actions#
/start_asr ROS action (type hri_actions_msgs/StartASR): starts processing audio captured through the ReSpeaker microphone in a given language
/stop_asr ROS action (type hri_actions_msgs/StopASR): stops processing audio captured
Published topics#
/humans/voices/*/speech ROS topic (type hri_msgs/LiveSpeech): publishes the incremental and final text recognized
/humans/voices/*/is_speaking ROS topic (type std_msgs/Bool): publishes a boolean indicating whether a person is speaking or not
/humans/voices/*/audio ROS topic (type audio_common_msgs/AudioData): republishes the /audio/channel0 processed audio topic coming from the ReSpeaker array
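The speech topic described above can be consumed with a small subscriber. The following is a minimal sketch, not a reference implementation: it assumes a ROS 1 (rospy) environment and that hri_msgs/LiveSpeech exposes `incremental` and `final` string fields (check the message definition on your robot). The helper names `speech_topic`, `final_text` and `listen` are hypothetical.

```python
from types import SimpleNamespace


def speech_topic(voice_id="anonymous_speaker"):
    """Build the speech topic name for a given voice ID."""
    return "/humans/voices/{}/speech".format(voice_id)


def final_text(msg):
    """Return the final recognized text, or None while only partial text exists.

    Assumption: the message exposes `incremental` and `final` string fields.
    """
    text = msg.final.strip()
    return text if text else None


def listen(voice_id="anonymous_speaker"):
    """Log every final utterance. Run this on the robot (requires ROS)."""
    # Imported lazily so the helpers above stay usable without a ROS install.
    import rospy
    from hri_msgs.msg import LiveSpeech

    def on_speech(msg):
        text = final_text(msg)
        if text is not None:
            rospy.loginfo("Heard: %s", text)

    rospy.init_node("speech_listener", anonymous=True)
    rospy.Subscriber(speech_topic(voice_id), LiveSpeech, on_speech)
    rospy.spin()


# Offline check with stand-in messages (no ROS required).
partial = SimpleNamespace(incremental="hello ro", final="")
done = SimpleNamespace(incremental="", final="hello robot")
```

On the robot, calling `listen()` subscribes to the anonymous speaker topic, which is where all speech is published as of pal-sdk-23.1.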
Wake-up word#
ROS services#
/wakeup_monitor/enable (type soft_wakeup_word/Enable): enable/disable the monitoring
/wakeup_monitor/set_wakeup_pattern (type soft_wakeup_word/SetWakeupPattern): set a custom ‘wakeup’ pattern (C++ regular expression)
/wakeup_monitor/get_wakeup_pattern (type soft_wakeup_word/GetWakeupPattern): get the current wake-up pattern
/wakeup_monitor/set_sleep_pattern (type soft_wakeup_word/SetSleepPattern): set a custom ‘sleep’ pattern (C++ regular expression)
/wakeup_monitor/get_sleep_pattern (type soft_wakeup_word/GetSleepPattern): get the current ‘sleep’ pattern
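Before sending a custom pattern to the wake-up monitor, it can help to try it offline. The sketch below uses Python's `re` module as a stand-in for the C++ regular expression engine; the two syntaxes agree for simple patterns such as alternations and word boundaries, but not in every detail. How the node applies the pattern (substring search, case sensitivity) is not documented here, so the case-insensitive search in `matches_wakeup` is an assumption, as is the single string field assumed for the service request. The function names and the example pattern are hypothetical.

```python
import re


def matches_wakeup(pattern, utterance):
    """Check a candidate pattern against an utterance.

    Assumption: case-insensitive search anywhere in the utterance.
    """
    return re.search(pattern, utterance, flags=re.IGNORECASE) is not None


def set_wakeup_pattern(pattern):
    """Send the pattern to the wake-up monitor (requires ROS on the robot)."""
    # Imported lazily so matches_wakeup() works without a ROS install.
    import rospy
    from soft_wakeup_word.srv import SetWakeupPattern

    name = "/wakeup_monitor/set_wakeup_pattern"
    rospy.wait_for_service(name, timeout=5.0)
    # Positional argument: assumes the request has a single string field.
    return rospy.ServiceProxy(name, SetWakeupPattern)(pattern)


# Hypothetical pattern: "hi robot" or "hello robot" as whole words.
candidate = r"\b(hi|hello) robot\b"
```

Once `matches_wakeup(candidate, ...)` behaves as intended on a few sample sentences, `set_wakeup_pattern(candidate)` can be called on the robot, and the result verified with the get_wakeup_pattern service.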
Published topics#
/active_listening (type std_msgs/Bool): whether or not the robot is ‘awake’ and should actively process incoming speech. In particular, the dialogue manager (chatbot) uses this topic to decide whether to process incoming speech. Note that you can manually publish true or false on this topic to activate or deactivate the processing of incoming speech by the chatbot.
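Manually toggling the listening state amounts to publishing a std_msgs/Bool on /active_listening. A minimal sketch, assuming a ROS 1 (rospy) environment; the function names are hypothetical, and the latched publisher is a design choice of this example so that late subscribers still receive the last value.

```python
def set_active_listening(publisher, awake):
    """Publish the desired listening state on an /active_listening publisher."""
    # rospy builds the std_msgs/Bool message from the keyword argument.
    publisher.publish(data=bool(awake))


def make_publisher():
    """Create the /active_listening publisher (requires ROS on the robot)."""
    # Imported lazily so set_active_listening() can be exercised without ROS.
    import rospy
    from std_msgs.msg import Bool

    rospy.init_node("active_listening_toggle", anonymous=True)
    return rospy.Publisher("/active_listening", Bool, queue_size=1, latch=True)
```

On the robot, `set_active_listening(make_publisher(), False)` would stop the chatbot from processing incoming speech until true is published again.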
Chatbot/Dialogue management#
The chatbot engine comes with the following set of ROS interfaces:
ROS actions#
/manage_chatbot (type chatbot_msgs/ChatbotServerAction). Starts the RASA language model corresponding to the language goal input, unless that model has already been loaded.
Warning
Switching RASA language takes about 20 seconds, depending on the size of the model.
/train_chatbot (type chatbot_msgs/ChatbotTrainAction). Trains the specified chatbot. Note that the goal lang must exist in either the ~/.pal/chatbot_cfg/rasa_chatbot_chitchat or the /opt/pal/gallium/share/rasa_chatbot_chitchat directory.
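Switching the chatbot language through /manage_chatbot can be sketched with a standard actionlib client. This assumes a ROS 1 environment in which chatbot_msgs generates `ChatbotServerAction` and `ChatbotServerGoal` classes, and that the goal field is named `language` as described above; the helper names and the 60-second timeout are choices of this example, the latter picked to leave margin over the roughly 20 seconds a model switch takes.

```python
def make_goal(goal_cls, language):
    """Fill the `language` field of a freshly created goal object."""
    goal = goal_cls()
    goal.language = language
    return goal


def switch_chatbot_language(language, timeout_s=60.0):
    """Ask /manage_chatbot to load a language model (requires ROS on the robot)."""
    # Imported lazily so make_goal() can be exercised without a ROS install.
    import actionlib
    import rospy
    from chatbot_msgs.msg import ChatbotServerAction, ChatbotServerGoal

    rospy.init_node("chatbot_language_switch", anonymous=True)
    client = actionlib.SimpleActionClient("/manage_chatbot", ChatbotServerAction)
    if not client.wait_for_server(rospy.Duration(5.0)):
        raise RuntimeError("/manage_chatbot action server not available")

    client.send_goal(make_goal(ChatbotServerGoal, language))
    # Loading a model is slow, so wait with a generous timeout.
    if not client.wait_for_result(rospy.Duration(timeout_s)):
        client.cancel_goal()
        raise RuntimeError("chatbot did not finish loading within the timeout")
    return client.get_result()
```

The exact format expected for the language value is not specified here; check which languages are installed on the robot before calling `switch_chatbot_language`.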
ROS services#
/active_chatbot (type
chatbot_msgs/GetActiveChatbot
). Returns the chatbot that is currently running on the robot.
Published topics#
/intents (type hri_actions_msgs/Intent). The chatbot optionally publishes detected intents on this topic. It does so only when the response requires the robot to perform an action beyond text-to-speech.
Text-to-speech (TTS)#
/tts (documentation)