How-to: Automatic Speech Recognition (ASR)

Overview

The robot uses Vosk, an offline speech recognition engine, for automatic speech recognition. Vosk supports more than 20 languages and dialects. On the robot, it is integrated through the asr_vosk ROS wrapper.

ROS interface

The asr_vosk node processes the /audio_in/raw input from the microphone. It runs entirely on the CPU (no GPU acceleration is currently available).

The recognized text is published on the /humans/voices/*/speech topic corresponding to the current voice ID.

Warning

As of PAL OS edge, automatic voice separation and identification are not available. Therefore, all detected speech is published on the topic /humans/voices/anonymous_speaker/speech.

To see how the recognized speech is used on your robot, check How-to: Dialogue management.
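If you want to consume the recognized speech from your own ROS 2 node, a minimal subscriber can look like the sketch below. This is only an illustrative example: it assumes the speech topic carries ROS4HRI hri_msgs/msg/LiveSpeech messages (whose final field holds the complete recognized sentence, as used in the terminal example at the end of this page) and that all speech is published under anonymous_speaker, as explained in the warning above.

import rclpy
from rclpy.node import Node

# Assumption: the speech topic uses the ROS4HRI LiveSpeech message,
# whose 'final' field contains the complete recognized sentence.
from hri_msgs.msg import LiveSpeech


class SpeechListener(Node):
    def __init__(self):
        super().__init__('speech_listener')
        # All speech is currently published under 'anonymous_speaker'
        # (no voice separation/identification in PAL OS edge).
        self.create_subscription(
            LiveSpeech,
            '/humans/voices/anonymous_speaker/speech',
            self.on_speech,
            10)

    def on_speech(self, msg):
        # Ignore partial results; only log complete sentences.
        if msg.final:
            self.get_logger().info(f'Recognized: {msg.final}')


def main():
    rclpy.init()
    rclpy.spin(SpeechListener())


if __name__ == '__main__':
    main()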

Warning

To avoid self-listening, the ASR node does not publish recognized speech while the robot itself is speaking, i.e. whenever /robot_speaking is True.
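If your own application also needs to know whether the robot is currently speaking (for instance, to ignore audio input during that time), you can monitor the same signal. The sketch below assumes /robot_speaking is published as a simple std_msgs/msg/Bool; check the actual message type on your robot with ros2 topic info /robot_speaking.

import rclpy
from rclpy.node import Node

# Assumption: /robot_speaking is published as a plain boolean flag.
from std_msgs.msg import Bool


class RobotSpeakingMonitor(Node):
    def __init__(self):
        super().__init__('robot_speaking_monitor')
        self.robot_speaking = False
        self.create_subscription(Bool, '/robot_speaking', self.on_update, 10)

    def on_update(self, msg):
        # While True, the ASR node suppresses its own output as well.
        self.robot_speaking = msg.data


def main():
    rclpy.init()
    rclpy.spin(RobotSpeakingMonitor())


if __name__ == '__main__':
    main()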

asr_vosk is a localized node: its active language is selected by the i18n_manager. Refer to [‼️ROS 1] Internationalisation and language support for more information on language availability and selection.

Web interface

The Web User Interface reports the status of the asr_vosk node under Diagnostics > Communication > Speech recognition > asr_vosk.

There you can check, among other things:

  • Supported locales: the Vosk language models currently installed on the robot;

  • Current default locale: the language currently used by the ASR node;

  • Currently listening: whether the ASR node is currently listening to the microphone;

  • Last recognized sentence: the last recognized sentence, useful for debugging purposes.

Install additional languages

asr_vosk requires a language model to recognize a given language.

The language models provided by PAL are Debian packages following the naming convention pal-*-asr-vosk-language-model-<language>-<region>-*. For example, the American English language model (installed by default) has language code en and region code us.
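As a quick way to see which language models are already installed, you can query the Debian package database for packages matching that naming convention. The snippet below is only a convenience sketch using standard dpkg tooling; package availability depends on your robot's software configuration.

import subprocess

# List installed Debian packages and keep the Vosk language models, which
# follow the pal-*-asr-vosk-language-model-<language>-<region>-* convention.
packages = subprocess.run(
    ['dpkg-query', '-W', '-f', '${Package}\n'],
    capture_output=True, text=True, check=True).stdout.splitlines()

for name in packages:
    if 'asr-vosk-language-model' in name:
        print(name)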

You can also check the installed language models from the 🚧 On-screen menu.

Note

Contact PAL support if you need additional languages.

Check the ASR from the terminal

Speak to the robot and monitor the recognized output by echoing the /humans/voices/anonymous_speaker/speech topic:

$ ros2 topic echo /humans/voices/anonymous_speaker/speech --field final

---
hi robot
---

You can also check the latest recognized speech in the Web interface.