ASR, TTS and dialogue management APIs

ASR API
The vosk_asr node processes the /audio/channel0 input from the ReSpeaker microphone array. It runs entirely on the CPU (no GPU acceleration is currently available).
Once a language is selected, the node starts processing the audio. The recognized text is published on the /humans/voices/*/speech topic corresponding to the current voice ID.
Warning
As of pal-sdk-23.12, automatic voice separation and identification is not available. Therefore, all detected speech is published on the topic /humans/voices/anonymous_speaker/speech.
The available ROS interfaces to process speech are:
ASR ROS actions

- /asr/set_locale ROS action (type i18n_msgs/SetLocaleAction): change the ASR language
ASR published topics

- /humans/voices/*/speech ROS topic (type hri_msgs/LiveSpeech): publishes the incremental and final recognized text
- /humans/voices/*/is_speaking ROS topic (type std_msgs/Bool): publishes a boolean indicating whether a person is currently speaking
- /humans/voices/*/audio ROS topic (type audio_common_msgs/AudioData): republishes the processed /audio/channel0 audio coming from the ReSpeaker array
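To illustrate how the speech topic is typically consumed: hri_msgs/LiveSpeech carries an incremental field with the partial hypothesis and a final field set once the utterance is complete (check `rosmsg show hri_msgs/LiveSpeech` on the robot to confirm the exact fields). A minimal Python sketch of a subscriber callback, with a stand-in message class so it runs without ROS:

```python
from dataclasses import dataclass

# Stand-in for hri_msgs/LiveSpeech. In an actual node you would use:
#   rospy.Subscriber("/humans/voices/anonymous_speaker/speech",
#                    LiveSpeech, on_speech)
@dataclass
class LiveSpeech:
    incremental: str = ""  # partial hypothesis, updated as the ASR refines it
    final: str = ""        # set once the utterance is complete

transcript = []  # completed utterances, in order

def on_speech(msg: LiveSpeech) -> None:
    """Callback: show partial results, store completed utterances."""
    if msg.final:
        transcript.append(msg.final)
    elif msg.incremental:
        print(f"[partial] {msg.incremental}")

# Simulated stream of messages as the ASR refines its hypothesis:
on_speech(LiveSpeech(incremental="bring"))
on_speech(LiveSpeech(incremental="bring me the"))
on_speech(LiveSpeech(final="bring me the glass"))

print(transcript)  # ['bring me the glass']
```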
Wake-up word

Wake-up word ROS services

- /wakeup_monitor/enable (type soft_wakeup_word/Enable): enable/disable the monitoring
- /wakeup_monitor/set_wakeup_pattern (type soft_wakeup_word/SetWakeupPattern): set a custom ‘wake-up’ pattern (C++ regular expression)
- /wakeup_monitor/get_wakeup_pattern (type soft_wakeup_word/GetWakeupPattern): get the current ‘wake-up’ pattern
- /wakeup_monitor/set_sleep_pattern (type soft_wakeup_word/SetSleepPattern): set a custom ‘sleep’ pattern (C++ regular expression)
- /wakeup_monitor/get_sleep_pattern (type soft_wakeup_word/GetSleepPattern): get the current ‘sleep’ pattern
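The wake-up and sleep patterns are regular expressions matched against the recognised text (the node expects C++ regex syntax; basic patterns look the same in most engines). A minimal sketch of the monitoring logic, with hypothetical patterns, using Python's re module for illustration:

```python
import re

# Hypothetical patterns -- set your own via the
# /wakeup_monitor/set_wakeup_pattern and /wakeup_monitor/set_sleep_pattern
# services (as C++ regular expressions on the robot).
WAKEUP_PATTERN = re.compile(r"\b(hey|hello)\s+robot\b", re.IGNORECASE)
SLEEP_PATTERN = re.compile(r"\bgo\s+to\s+sleep\b", re.IGNORECASE)

def update_listening_state(text: str, awake: bool) -> bool:
    """Return the new 'awake' state after hearing `text`.

    This mirrors what /wakeup_monitor publishes on /active_listening:
    the wake-up pattern switches it to True, the sleep pattern to False.
    """
    if not awake and WAKEUP_PATTERN.search(text):
        return True
    if awake and SLEEP_PATTERN.search(text):
        return False
    return awake

state = False
state = update_listening_state("Hey robot, what time is it?", state)
print(state)  # True: the robot is now actively listening
state = update_listening_state("please go to sleep now", state)
print(state)  # False: back to passive monitoring
```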
Wake-up word published topics

- /active_listening (type std_msgs/Bool): whether the robot is ‘awake’ and should actively process incoming speech. In particular, the dialogue manager (chatbot) uses this topic to decide whether to process incoming speech. Note that you can manually publish true or false on this topic to activate or deactivate the processing of incoming speech by the chatbot.
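From a terminal, the topic can be toggled with the standard ROS CLI, e.g. `rostopic pub /active_listening std_msgs/Bool "data: true"`. The gating behaviour itself can be sketched as follows (a plain-Python illustration of the logic, not the dialogue manager's actual implementation):

```python
class SpeechGate:
    """Sketch of the gating on /active_listening: speech is forwarded to
    the chatbot only while the last value published was True."""

    def __init__(self):
        self.active = False   # last value seen on /active_listening
        self.forwarded = []   # utterances handed to the chatbot

    def on_active_listening(self, value: bool) -> None:
        # callback for the /active_listening (std_msgs/Bool) topic
        self.active = value

    def on_speech(self, text: str) -> None:
        # callback for a final result on /humans/voices/*/speech
        if self.active:
            self.forwarded.append(text)
        # otherwise the utterance is silently ignored

gate = SpeechGate()
gate.on_speech("ignored while asleep")
gate.on_active_listening(True)
gate.on_speech("what's the weather like?")
print(gate.forwarded)  # ["what's the weather like?"]
```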
Chatbot/Dialogue management

The chatbot engine comes with the following set of ROS interfaces:

Chatbot ROS actions

- /chatbot/set_locale (type i18n_msgs/SetLocaleAction): changes the current language of the RASA chatbot engine.
Warning
Switching the RASA language takes about 30 seconds, depending on the size of the model.
Chatbot ROS services

- /active_chatbot (type chatbot_msgs/GetActiveChatbot): returns the chatbot currently running on the robot.
Topics subscribed to by the chatbot#
the main input of the chatbot is the text recognised from the people speaking around the robot on the /humans/voices/*/speech family of topics.
You can manually publish text on one of these topics to test the chatbot behaviour.
/active_listening: the chatbot only processes text input if the last value published on /active_listening is
true
. If not, the text input is ignored.You can use this topic to temporarily disable the chatbot (for instance, if you want to process the ASR speech yourself).
Topics published by the chatbot

- /intents (type hri_actions_msgs/Intent): the chatbot publishes the recognised intents (for instance, BRING something somewhere) on the /intents topic. See Intents for details on intents.
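A sketch of how an /intents subscriber might dispatch on the message. The stand-in class below assumes the message carries an intent identifier and a data field with JSON-encoded parameters; the exact field names and intent identifiers should be checked with `rosmsg show hri_actions_msgs/Intent` on the robot:

```python
import json
from dataclasses import dataclass

# Stand-in for hri_actions_msgs/Intent (field names assumed for
# illustration -- verify against the real message definition).
@dataclass
class Intent:
    intent: str = ""    # intent identifier
    data: str = "{}"    # intent parameters, JSON-encoded

def on_intent(msg: Intent) -> str:
    """Callback for /intents: dispatch on the intent identifier."""
    params = json.loads(msg.data or "{}")
    if msg.intent == "BRING_OBJECT":  # hypothetical identifier
        return f"bringing {params.get('object')} to {params.get('to')}"
    return f"unhandled intent: {msg.intent}"

# A 'BRING something somewhere' intent as it might arrive on /intents:
msg = Intent(intent="BRING_OBJECT",
             data=json.dumps({"object": "glass", "to": "kitchen"}))
print(on_intent(msg))  # bringing glass to kitchen
```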
Note
Some interactions do not lead to intents being published, as they are fully handled within the chatbot engine. For instance, if you say ‘Hi’ to the robot, it will directly reply with a greeting, without emitting a dedicated intent.
Text-to-speech (TTS)

TTS ROS actions

- /tts (documentation)