Microphone array and audio recording#

TIAGoPro and ARI have a ReSpeaker Mic Array V2.0 consisting of 4 microphones, positioned in the head (TIAGoPro) or in the torso, just below the touch-screen (ARI).

Caution

TIAGo (v1) features a simple microphone. Some of the features available on robots equipped with the ReSpeaker microphone array (like sound source localisation or noise cancellation) are therefore not available on this robot.

The microphone is connected via USB to the main PC. The PC outputs audio through the robot’s two speakers, which are located on either side of the torso and include a 30 W amplifier.

Main hardware features:

Supports USB Audio Class 1.0 (UAC 1.0)
Four-microphone array
Sensitivity: -26 dBFS (omnidirectional)
Signal-to-Noise Ratio: 63 dB
12 programmable RGB LED indicators

Note

By default, the microphone LEDs are configured to turn blue when the microphone hears something. In addition, a light blue LED indicates the current sound source direction.

The parameter enable_leds can be set to False in the respeaker_ros launch file to disable this behaviour.

The ReSpeaker microphone also implements several audio processing features directly in hardware:

far-field Voice Activity Detection (up to 5m away);
Direction of Arrival (DoA) estimation;
Beamforming (BF) to focus on sound coming from a specific source;
noise suppression;
de-reverberation;
acoustic echo cancellation, enabling the robot to ignore its own voice.

ROS API#

PAL’s robots rely on a heavily modified version of the open-source respeaker_ros driver.

It exposes the following topics:

/audio/raw: Merged audio channel of the 4 microphones (alias for /audio/channel0).
/audio/channel0: Merged audio channel of the 4 microphones.
/audio/channel1: Audio stream from the first microphone.
/audio/channel2: Audio stream from the second microphone.
/audio/channel3: Audio stream from the third microphone.
/audio/channel4: Audio stream from the fourth microphone.
/audio/channel5: Monitor audio stream from the audio input (used for self-echo cancellation).
/audio/voice_detected: Publishes a boolean indicating whether a voice is currently detected (i.e., whether someone is currently speaking).
/audio/speech: Raw audio data of detected speech (published once the person has finished speaking).
/audio/sound_direction: The estimated Direction of Arrival of the detected sound.
/audio/sound_localization: The estimated sound source location.

Audio can then be recorded as a rosbag, for example:

ssh pal@<robot>-0c

rosbag record -O audio_sample.bag /audio/channel0

Regardless of how the audio is captured, it can later be processed.
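As one illustration of such processing, the sketch below computes the RMS level of a block of audio samples. It assumes the samples are raw signed 16-bit little-endian PCM (the S16_LE format); this page does not state the encoding or message type of the audio topics, so check it on your robot first. The function name rms_dbfs is hypothetical and not part of any PAL or respeaker_ros API.

```python
import math
import struct


def rms_dbfs(pcm_bytes):
    """Return the RMS level of 16-bit PCM audio, in dB relative to full scale.

    Assumption: pcm_bytes holds signed 16-bit little-endian samples (S16_LE).
    A constant full-scale signal (amplitude 32768) maps to 0 dB.
    """
    n_samples = len(pcm_bytes) // 2
    if n_samples == 0:
        return float("-inf")

    # Decode the byte string into a tuple of signed 16-bit integers.
    samples = struct.unpack("<%dh" % n_samples, pcm_bytes[: n_samples * 2])

    mean_square = sum(s * s for s in samples) / n_samples
    if mean_square == 0:
        return float("-inf")  # digital silence

    return 10.0 * math.log10(mean_square / (32768.0 ** 2))
```

A level close to 0 dB suggests the signal is near clipping, while very low values indicate silence or a muted input.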

Recording audio directly from ALSA#

You can also access the ReSpeaker as a regular Linux ALSA recording device: log onto the robot and it should appear when running the command arecord -l (and arecord -L to get the list of ALSA device names).

To record, you first need to stop the ROS driver and find the correct device name in the arecord -L list. The example below records all 6 channels of the ReSpeaker at a 16 kHz sampling rate for 10 seconds.

> pal-stop respeaker_ros
> arecord -D "hw:CARD=ArrayUAC10,DEV=0" -fS16_LE -c6 -d 10 -r16000 > audio.wav

Play back the recorded sound:

> aplay audio.wav

You can then re-enable the ROS interface:

> pal-restart respeaker_ros
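The 6-channel recording stores the channels interleaved in a single file. To work with one channel at a time, a minimal sketch using Python's standard wave module could look like the following. The helper name extract_channel is hypothetical, and the mapping between WAV channel indices and the /audio/channelN topics is an assumption this page does not confirm, so verify it by listening to the output.

```python
import wave


def extract_channel(src_path, dst_path, channel):
    """Copy one channel of an interleaved PCM WAV file into a mono WAV file.

    channel is a zero-based index into the interleaved channels.
    Returns the number of samples written.
    """
    with wave.open(src_path, "rb") as src:
        n_channels = src.getnchannels()
        sample_width = src.getsampwidth()  # bytes per sample, 2 for S16_LE
        frame_rate = src.getframerate()
        if not 0 <= channel < n_channels:
            raise ValueError("channel must be in [0, %d)" % n_channels)
        frames = src.readframes(src.getnframes())

    # One frame holds one sample per channel, stored back to back.
    frame_size = n_channels * sample_width
    offset = channel * sample_width

    mono = bytearray()
    for start in range(0, len(frames) - frame_size + 1, frame_size):
        mono += frames[start + offset:start + offset + sample_width]

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(sample_width)
        dst.setframerate(frame_rate)
        dst.writeframes(bytes(mono))

    return len(mono) // sample_width
```

For example, extract_channel("audio.wav", "channel0.wav", 0) writes the first channel of the recording above to channel0.wav, which can then be played with aplay.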
