How-to: Automatic Speech Recognition (ASR)#
🏁 Goal of this tutorial
By the end of this tutorial, you will know how to call the speech recognition of PAL’s robots and use its output.
PAL’s robots use Vosk, an offline speech recognition engine, for automatic speech recognition. It supports more than 20 languages and dialects. The ROS wrapper the robot uses is the vosk_asr node.
ROS interface#
The vosk_asr node is in charge of processing the /audio/channel0 input from the ReSpeaker microphone. It runs entirely on the CPU (no GPU acceleration is currently available).
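If the robot does not seem to react to speech, it can help to first check that audio is actually flowing on that topic. Below is a minimal sketch, assuming /audio/channel0 carries audio_common_msgs/AudioData messages (verify the actual type with rostopic info /audio/channel0):
#!/usr/bin/env python3
# Minimal sketch: check that microphone audio is reaching the ASR input.
# Assumption: /audio/channel0 carries audio_common_msgs/AudioData messages.
import rospy
from audio_common_msgs.msg import AudioData


def on_audio(msg):
    # log the size of each raw audio chunk, at most once per second
    rospy.loginfo_throttle(1.0, "audio chunk: %d bytes" % len(msg.data))


rospy.init_node("audio_probe")
rospy.Subscriber("/audio/channel0", AudioData, on_audio)
rospy.spin()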
The recognized text is published on the /humans/voices/*/speech topic corresponding to the current voice ID.
Warning
As of pal-sdk-23.12, automatic voice separation and identification is not
available. Therefore all detected speech will be published on the topic
/humans/voices/anonymous_speaker/speech.
The available ROS interfaces to process speech are listed on the ASR, TTS and dialogue management APIs page.
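Since the topic name depends on the voice ID, you can discover the speech topics advertised at runtime with standard rospy introspection; a minimal sketch:
#!/usr/bin/env python3
# Minimal sketch: list all currently advertised per-voice speech topics.
import rospy

rospy.init_node("list_speech_topics", anonymous=True)
for name, msg_type in rospy.get_published_topics():
    if name.startswith("/humans/voices/") and name.endswith("/speech"):
        print("%s [%s]" % (name, msg_type))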
Step 1: Installed languages#
vosk_asr requires a language model to recognize a given language.
These language models are installed on the robot under
/opt/pal/gallium/share/vosk_language_models.
You can quickly check which languages are available for ASR by connecting to the robot over SSH and typing:
ls /opt/pal/gallium/share/vosk_language_models/
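If you prefer to perform this check from code, for example before enabling a language, a minimal Python sketch using the same path:
#!/usr/bin/env python3
# Minimal sketch: list the Vosk language models installed on the robot.
import os

MODELS_DIR = "/opt/pal/gallium/share/vosk_language_models"
for model in sorted(os.listdir(MODELS_DIR)):
    print(model)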
Note
Contact PAL support if you need additional languages.
Step 2: Testing ASR from the terminal#
The ASR is started automatically when you boot up the robot (you can check whether it is currently running from the WebCommander’s Start-ups and Diagnostics tabs).
Try speaking to the robot and monitor the recognized output by echoing the
/humans/voices/anonymous_speaker/speech topic:
rostopic echo /humans/voices/anonymous_speaker/speech
header:
  seq: 1
  stamp:
    secs: 0
    nsecs: 0
  frame_id: ''
incremental: ''
final: "hi robot"
confidence: 0.0
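Besides final, the message also carries the incremental field, which streams partial results while the sentence is still being spoken. A minimal Python subscriber sketch that prints both fields:
#!/usr/bin/env python3
# Minimal sketch: print partial (incremental) and full (final) ASR results.
import rospy
from hri_msgs.msg import LiveSpeech


def on_speech(msg):
    # 'incremental' updates while the sentence is in progress;
    # 'final' is only filled once the full sentence is recognized.
    if msg.incremental:
        rospy.loginfo("partial: %s" % msg.incremental)
    if msg.final:
        rospy.loginfo("final: %s" % msg.final)


rospy.init_node("asr_echo")
rospy.Subscriber("/humans/voices/anonymous_speaker/speech",
                 LiveSpeech, on_speech)
rospy.spin()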
Step 3: Testing ASR from code#
From Python you can call the relevant ROS actions and subscribe to the audio
transcription. The asr_tutorial example package below shows how the robot
replies based on the recognized speech. To answer the user, the robot uses
its text-to-speech capability, calling the /tts action.
Example script name: asr_tutorial.py
#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import rospy

from actionlib import SimpleActionClient
from hri_msgs.msg import LiveSpeech
from pal_interaction_msgs.msg import TtsAction, TtsGoal


# The following demo subscribes to the speech-to-text output and triggers TTS
# based on the recognized sentence
class ASRDemo(object):
    def __init__(self):
        self.asr_sub = rospy.Subscriber(
            '/humans/voices/anonymous_speaker/speech',
            LiveSpeech,
            self.asr_result)

        self.tts_client = SimpleActionClient("/tts", TtsAction)
        self.tts_client.wait_for_server()
        self.language = "en_US"
        rospy.loginfo("ASR demo ready")

    def asr_result(self, msg):
        # the LiveSpeech message has two main fields: incremental and final.
        # 'incremental' is updated as soon as a word is recognized, and
        # will change while the sentence recognition progresses.
        # 'final' is only set at the end, when a full sentence is
        # recognized.
        sentence = msg.final
        rospy.loginfo("Understood sentence: " + sentence)

        if sentence == "what is your name":
            self.tts_output("My name is ARI")
        elif sentence == "how are you":
            self.tts_output("I am feeling great")
        elif sentence == "goodbye":
            self.tts_output("See you!")

    def tts_output(self, answer):
        # interrupt any ongoing speech before sending the new sentence
        self.tts_client.cancel_goal()

        goal = TtsGoal()
        goal.rawtext.lang_id = self.language
        goal.rawtext.text = str(answer)
        self.tts_client.send_goal_and_wait(goal)


if __name__ == "__main__":
    rospy.init_node("asr_tutorial")
    node = ASRDemo()
    rospy.spin()
You can run the script from the robot docker as explained in ROS development, or deploy the package to the robot as described in Deploying code.
Make sure the CMakeLists.txt includes the dependencies for hri_msgs and
pal_interaction_msgs, and installs the Python script on the robot if you
plan to deploy it.
cmake_minimum_required(VERSION 3.0.2)
project(asr_tutorial)

find_package(catkin REQUIRED COMPONENTS
  rospy
  std_msgs
  hri_msgs
  pal_interaction_msgs
)

catkin_package(
  CATKIN_DEPENDS rospy std_msgs hri_msgs pal_interaction_msgs
)

catkin_install_python(PROGRAMS src/asr_tutorial.py
  DESTINATION ${CATKIN_PACKAGE_BIN_DESTINATION}
)
And the following package.xml:
<?xml version="1.0"?>
<package format="2">
  <name>asr_tutorial</name>
  <version>1.0.0</version>
  <description>The asr_tutorial package</description>
  <maintainer email="user@example.com">My Name</maintainer>

  <buildtool_depend>catkin</buildtool_depend>

  <depend>rospy</depend>
  <depend>std_msgs</depend>
  <depend>actionlib</depend>
  <depend>hri_msgs</depend>
  <depend>pal_interaction_msgs</depend>

  <export>
  </export>
</package>
You can execute the code in two ways:
From the robot docker, deploy the code to the robot and run it there directly, as in Deploying code.
$ cd ~/my_ws/
$ source devel/setup.bash
$ rosrun pal_deploy deploy.py ari-0c --package "asr_tutorial"
$ ssh pal@robot-0c
Run directly from an external computer by exporting ROS_MASTER_URI to point at the robot.
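For example, assuming your robot’s hostname is ari-0c (adapt it to your robot) and your computer is on the same network:
$ export ROS_MASTER_URI=http://ari-0c:11311
$ export ROS_IP=<ip-of-your-computer>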
In either case, finally run the ROS script:
$ rosrun asr_tutorial asr_tutorial.py