ASR, TTS and dialogue management APIs

ASR API
The vosk_asr node processes the /audio/channel0 input from the ReSpeaker microphone array. It runs entirely on the CPU (no GPU acceleration is currently available).
Once a language is selected, the node starts processing the audio. The recognized text is published on the /humans/voices/*/speech topic corresponding to the current voice ID.
Warning
As of pal-sdk-23.12, automatic voice separation and identification is not available. Therefore, all detected speech is published on the topic /humans/voices/anonymous_speaker/speech.
The available ROS interfaces to process speech are:
ASR ROS actions

- /asr/set_locale ROS action (type i18n_msgs/SetLocaleAction): change the ASR language
ASR published topics

- /humans/voices/*/speech ROS topic (type hri_msgs/LiveSpeech): publishes the incremental and final recognized text
- /humans/voices/*/is_speaking ROS topic (type std_msgs/Bool): publishes a boolean indicating whether a person is currently speaking
- /humans/voices/*/audio ROS topic (type audio_common_msgs/AudioData): republishes the processed /audio/channel0 audio coming from the ReSpeaker array
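To illustrate how the speech topic is typically consumed: hri_msgs/LiveSpeech carries an incremental field with the partial hypothesis and a final field set once the utterance is complete (check `rosmsg show hri_msgs/LiveSpeech` on the robot to confirm the exact fields). A minimal Python sketch of a subscriber callback, with a stand-in message class so it runs without ROS:

```python
from dataclasses import dataclass

# Stand-in for hri_msgs/LiveSpeech. In an actual node you would use:
#   rospy.Subscriber("/humans/voices/anonymous_speaker/speech",
#                    LiveSpeech, on_speech)
@dataclass
class LiveSpeech:
    incremental: str = ""  # partial hypothesis, updated as the ASR refines it
    final: str = ""        # set once the utterance is complete

transcript = []  # completed utterances, in order

def on_speech(msg: LiveSpeech) -> None:
    """Callback: show partial results, store completed utterances."""
    if msg.final:
        transcript.append(msg.final)
    elif msg.incremental:
        print(f"[partial] {msg.incremental}")

# Simulated stream of messages as the ASR refines its hypothesis:
on_speech(LiveSpeech(incremental="bring"))
on_speech(LiveSpeech(incremental="bring me the"))
on_speech(LiveSpeech(final="bring me the glass"))

print(transcript)  # ['bring me the glass']
```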
Wake-up word

Wake-up word ROS services

- /wakeup_monitor/enable (type soft_wakeup_word/Enable): enable/disable the monitoring
- /wakeup_monitor/set_wakeup_pattern (type soft_wakeup_word/SetWakeupPattern): set a custom ‘wake-up’ pattern (C++ regular expression)
- /wakeup_monitor/get_wakeup_pattern (type soft_wakeup_word/GetWakeupPattern): get the current ‘wake-up’ pattern
- /wakeup_monitor/set_sleep_pattern (type soft_wakeup_word/SetSleepPattern): set a custom ‘sleep’ pattern (C++ regular expression)
- /wakeup_monitor/get_sleep_pattern (type soft_wakeup_word/GetSleepPattern): get the current ‘sleep’ pattern
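The wake-up and sleep patterns are regular expressions matched against the recognised text (the node expects C++ regex syntax; basic patterns look the same in most engines). A minimal sketch of the monitoring logic, with hypothetical patterns, using Python's re module for illustration:

```python
import re

# Hypothetical patterns -- set your own via the
# /wakeup_monitor/set_wakeup_pattern and /wakeup_monitor/set_sleep_pattern
# services (as C++ regular expressions on the robot).
WAKEUP_PATTERN = re.compile(r"\b(hey|hello)\s+robot\b", re.IGNORECASE)
SLEEP_PATTERN = re.compile(r"\bgo\s+to\s+sleep\b", re.IGNORECASE)

def update_listening_state(text: str, awake: bool) -> bool:
    """Return the new 'awake' state after hearing `text`.

    This mirrors what /wakeup_monitor publishes on /active_listening:
    the wake-up pattern switches it to True, the sleep pattern to False.
    """
    if not awake and WAKEUP_PATTERN.search(text):
        return True
    if awake and SLEEP_PATTERN.search(text):
        return False
    return awake

state = False
state = update_listening_state("Hey robot, what time is it?", state)
print(state)  # True: the robot is now actively listening
state = update_listening_state("please go to sleep now", state)
print(state)  # False: back to passive monitoring
```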
Wake-up word published topics

- /active_listening (type std_msgs/Bool): whether the robot is ‘awake’ and should actively process incoming speech. In particular, the dialogue manager (chatbot) uses this topic to decide whether to process incoming speech. Note that you can manually publish true or false on this topic to activate or deactivate the processing of incoming speech by the chatbot.
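From a terminal, the topic can be toggled with the standard ROS CLI, e.g. `rostopic pub /active_listening std_msgs/Bool "data: true"`. The gating behaviour itself can be sketched as follows (a plain-Python illustration of the logic, not the dialogue manager's actual implementation):

```python
class SpeechGate:
    """Sketch of the gating on /active_listening: speech is forwarded to
    the chatbot only while the last value published was True."""

    def __init__(self):
        self.active = False   # last value seen on /active_listening
        self.forwarded = []   # utterances handed to the chatbot

    def on_active_listening(self, value: bool) -> None:
        # callback for the /active_listening (std_msgs/Bool) topic
        self.active = value

    def on_speech(self, text: str) -> None:
        # callback for a final result on /humans/voices/*/speech
        if self.active:
            self.forwarded.append(text)
        # otherwise the utterance is silently ignored

gate = SpeechGate()
gate.on_speech("ignored while asleep")
gate.on_active_listening(True)
gate.on_speech("what's the weather like?")
print(gate.forwarded)  # ["what's the weather like?"]
```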
Chatbot/Dialogue management

The chatbot engine comes with the following set of ROS interfaces:

Chatbot ROS actions

- /chatbot/set_locale (type i18n_msgs/SetLocaleAction): changes the current language of the RASA chatbot engine.
Warning
Switching the RASA language takes about 30 seconds, depending on the size of the model.
Chatbot ROS services

- /active_chatbot (type chatbot_msgs/GetActiveChatbot): returns the chatbot currently running on the robot.
Topics subscribed to by the chatbot#
the main input of the chatbot is the text recognised from the people speaking around the robot on the /humans/voices/*/speech family of topics.
You can manually publish text on one of these topics to test the chatbot behaviour.
/active_listening: the chatbot only processes text input if the last value published on /active_listening is
true
. If not, the text input is ignored.You can use this topic to temporarily disable the chatbot (for instance, if you want to process the ASR speech yourself).
Topics published by the chatbot

- /intents (type hri_actions_msgs/Intent): the chatbot publishes the recognised intents (for instance, BRING something somewhere) on the /intents topic. See Intents for details on intents.
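A sketch of how an /intents subscriber might dispatch on the message. The stand-in class below assumes the message carries an intent identifier and a data field with JSON-encoded parameters; the exact field names and intent identifiers should be checked with `rosmsg show hri_actions_msgs/Intent` on the robot:

```python
import json
from dataclasses import dataclass

# Stand-in for hri_actions_msgs/Intent (field names assumed for
# illustration -- verify against the real message definition).
@dataclass
class Intent:
    intent: str = ""    # intent identifier
    data: str = "{}"    # intent parameters, JSON-encoded

def on_intent(msg: Intent) -> str:
    """Callback for /intents: dispatch on the intent identifier."""
    params = json.loads(msg.data or "{}")
    if msg.intent == "BRING_OBJECT":  # hypothetical identifier
        return f"bringing {params.get('object')} to {params.get('to')}"
    return f"unhandled intent: {msg.intent}"

# A 'BRING something somewhere' intent as it might arrive on /intents:
msg = Intent(intent="BRING_OBJECT",
             data=json.dumps({"object": "glass", "to": "kitchen"}))
print(on_intent(msg))  # bringing glass to kitchen
```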
Note
Some interactions do not lead to intents being published, as they are fully handled within the chatbot engine. For instance, if you say ‘Hi’ to the robot, it will directly reply with a greeting, without emitting a dedicated intent.
Text-to-speech (TTS)

TTS ROS actions

- /tts (documentation)