How-to: Automatic Speech Recognition (ASR)#
🏁 Goal of this tutorial
By the end of this tutorial, you will know how to call the speech recognition of PAL’s robots and use its output.
PAL’s robots use Vosk, an offline speech recognition engine, for automatic speech recognition. It supports more than 20 languages and dialects. The ROS wrapper the robot uses is the vosk_asr node.
ROS interface#
The vosk_asr node is in charge of processing the /audio/channel0 input from the ReSpeaker microphone. It runs entirely on the CPU (no GPU acceleration is currently available).
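If the robot does not seem to react to speech, it can help to first check that audio is actually flowing on that topic. Below is a minimal sketch, assuming /audio/channel0 carries audio_common_msgs/AudioData messages (verify the actual type with rostopic info /audio/channel0):
#!/usr/bin/env python3
# Minimal sketch: check that microphone audio is reaching the ASR input.
# Assumption: /audio/channel0 carries audio_common_msgs/AudioData messages.
import rospy
from audio_common_msgs.msg import AudioData


def on_audio(msg):
    # log the size of each raw audio chunk, at most once per second
    rospy.loginfo_throttle(1.0, "audio chunk: %d bytes" % len(msg.data))


rospy.init_node("audio_probe")
rospy.Subscriber("/audio/channel0", AudioData, on_audio)
rospy.spin()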
The recognized text is published on the /humans/voices/*/speech topic corresponding to the current voice ID.
Warning
As of pal-sdk-23.12, automatic voice separation and identification is not
available. Therefore all detected speech will be published on the topic
/humans/voices/anonymous_speaker/speech.
The available ROS interfaces to process speech are listed on the ASR, TTS and dialogue management APIs page.
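Since the topic name depends on the voice ID, you can discover the speech topics advertised at runtime with standard rospy introspection; a minimal sketch:
#!/usr/bin/env python3
# Minimal sketch: list all currently advertised per-voice speech topics.
import rospy

rospy.init_node("list_speech_topics", anonymous=True)
for name, msg_type in rospy.get_published_topics():
    if name.startswith("/humans/voices/") and name.endswith("/speech"):
        print("%s [%s]" % (name, msg_type))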
Step 1: Installed languages#
vosk_asr requires a language model to recognize a given language.
These language models are installed on the robot under
/opt/pal/gallium/share/vosk_language_models.
You can quickly check which languages are available for ASR by connecting to the robot over SSH and typing:
ls /opt/pal/gallium/share/vosk_language_models/
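If you prefer to perform this check from code, for example before enabling a language, a minimal Python sketch using the same path:
#!/usr/bin/env python3
# Minimal sketch: list the Vosk language models installed on the robot.
import os

MODELS_DIR = "/opt/pal/gallium/share/vosk_language_models"
for model in sorted(os.listdir(MODELS_DIR)):
    print(model)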
Note
Contact PAL support if you need additional languages.
Step 2: Testing ASR from the terminal#
The ASR is started automatically when you boot up the robot (you can check whether it is currently running from the WebCommander’s Start-ups and Diagnostics tabs).
Try speaking to the robot and monitor the recognized output by echoing the
/humans/voices/anonymous_speaker/speech topic:
rostopic echo /humans/voices/anonymous_speaker/speech
header:
  seq: 1
  stamp:
    secs: 0
    nsecs: 0
  frame_id: ''
incremental: ''
final: "hi robot"
confidence: 0.0
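Besides final, the message also carries the incremental field, which streams partial results while the sentence is still being spoken. A minimal Python subscriber sketch that prints both fields:
#!/usr/bin/env python3
# Minimal sketch: print partial (incremental) and full (final) ASR results.
import rospy
from hri_msgs.msg import LiveSpeech


def on_speech(msg):
    # 'incremental' updates while the sentence is in progress;
    # 'final' is only filled once the full sentence is recognized.
    if msg.incremental:
        rospy.loginfo("partial: %s" % msg.incremental)
    if msg.final:
        rospy.loginfo("final: %s" % msg.final)


rospy.init_node("asr_echo")
rospy.Subscriber("/humans/voices/anonymous_speaker/speech",
                 LiveSpeech, on_speech)
rospy.spin()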
Step 3: Testing ASR from code#
From Python you can call the relevant ROS actions and subscribe to the audio
transcription. The asr_tutorial example package below shows how the robot
replies based on the recognized speech. To answer the user, the robot uses
its text-to-speech capability, calling the /tts action.
Example script name: asr_tutorial.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import rospy

from actionlib import SimpleActionClient
from hri_msgs.msg import LiveSpeech
from pal_interaction_msgs.msg import TtsAction, TtsGoal


# The following demo subscribes to the speech-to-text output and triggers TTS
# based on the recognized sentence
class ASRDemo(object):
    def __init__(self):
        self.asr_sub = rospy.Subscriber(
            '/humans/voices/anonymous_speaker/speech',
            LiveSpeech,
            self.asr_result)

        self.tts_client = SimpleActionClient("/tts", TtsAction)
        self.tts_client.wait_for_server()
        self.language = "en_US"
        rospy.loginfo("ASR demo ready")

    def asr_result(self, msg):
        # the LiveSpeech message has two main fields: incremental and final.
        # 'incremental' is updated as soon as a word is recognized, and
        # will change while the sentence recognition progresses.
        # 'final' is only set at the end, when a full sentence is
        # recognized.
        sentence = msg.final
        rospy.loginfo("Understood sentence: " + sentence)

        if sentence == "what is your name":
            self.tts_output("My name is ARI")
        elif sentence == "how are you":
            self.tts_output("I am feeling great")
        elif sentence == "goodbye":
            self.tts_output("See you!")

    def tts_output(self, answer):
        # interrupt any ongoing speech before sending the new sentence
        self.tts_client.cancel_goal()

        goal = TtsGoal()
        goal.rawtext.lang_id = self.language
        goal.rawtext.text = str(answer)
        self.tts_client.send_goal_and_wait(goal)


if __name__ == "__main__":
    rospy.init_node("asr_tutorial")
    node = ASRDemo()
    rospy.spin()
You can run the script from the robot docker as explained in ROS development, or deploy the package to the robot as described in Deploying code.
Make sure the CMakeLists.txt includes the dependencies for hri_msgs and
pal_interaction_msgs, and installs the Python script on the robot if you
plan to deploy it.
cmake_minimum_required(VERSION 3.0.2)
project(asr_tutorial)

find_package(catkin REQUIRED COMPONENTS
  rospy
  std_msgs
  hri_msgs
  pal_interaction_msgs
)

catkin_package(
  CATKIN_DEPENDS rospy std_msgs hri_msgs pal_interaction_msgs
)

catkin_install_python(PROGRAMS src/asr_tutorial.py
  DESTINATION ${CATKIN_PACKAGE_BIN_DESTINATION}
)
And the following package.xml:
<?xml version="1.0"?>
<package format="2">
  <name>asr_tutorial</name>
  <version>1.0.0</version>
  <description>The asr_tutorial package</description>
  <maintainer email="user@example.com">My Name</maintainer>

  <buildtool_depend>catkin</buildtool_depend>

  <depend>rospy</depend>
  <depend>std_msgs</depend>
  <depend>actionlib</depend>
  <depend>hri_msgs</depend>
  <depend>pal_interaction_msgs</depend>

  <export>
  </export>
</package>
You can execute the code in two ways:
From the robot docker, deploy the code to the robot and run it there directly, as in Deploying code.
$ cd ~/my_ws/
$ source devel/setup.bash
$ rosrun pal_deploy deploy.py ari-0c --package "asr_tutorial"
$ ssh pal@robot-0c
Run directly from an external computer by exporting ROS_MASTER_URI to point at the robot.
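For example, assuming your robot’s hostname is ari-0c (adapt it to your robot) and your computer is on the same network:
$ export ROS_MASTER_URI=http://ari-0c:11311
$ export ROS_IP=<ip-of-your-computer>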
In either case, finally run the ROS script:
$ rosrun asr_tutorial asr_tutorial.py