ARI microphone array and audio recording#

ARI has a ReSpeaker Mic Array V2.0 consisting of 4 microphones, positioned in the torso, just below the touch-screen.

The microphone array is connected to the main PC via USB. The PC outputs audio through the robot’s two speakers, located on either side of the torso, which include a 30 W amplifier.

Main hardware features:

Supports USB Audio Class 1.0 (UAC 1.0)
Four-microphone array
Sensitivity: -26 dBFS (omnidirectional)
Signal-to-Noise Ratio: 63 dB
12 programmable RGB LED indicators

Note

By default, the microphone LEDs are configured to turn blue when the microphone hears something. In addition, a light blue LED indicates the current sound source direction.

The parameter enable_leds can be set to False in the respeaker_ros launch file to disable this behaviour.

The ReSpeaker microphone also implements several audio processing functions directly on the hardware:

far-field Voice Activity Detection (up to 5m away);
Direction of Arrival (DoA) estimation;
Beamforming (BF) to focus on sound coming from a specific source;
noise suppression;
de-reverberation;
acoustic echo cancellation, enabling the robot to ignore its own voice.

ROS API#

ARI relies on a heavily modified version of the open-source respeaker_ros driver.

It exposes the following topics:

/audio/raw: Merged audio channel of the 4 microphones (alias for /audio/channel0).
/audio/channel0: Merged audio channel of the 4 microphones.
/audio/channel1: Audio stream from the first microphone.
/audio/channel2: Audio stream from the second microphone.
/audio/channel3: Audio stream from the third microphone.
/audio/channel4: Audio stream from the fourth microphone.
/audio/channel5: Monitor audio stream from the audio input (used for self-echo cancellation).
/audio/voice_detected: Publishes a boolean indicating whether a voice is currently detected (i.e., whether someone is currently speaking).
/audio/speech: Raw audio data of detected speech (published once the person has finished speaking).
/audio/sound_direction: The estimated Direction of Arrival of the detected sound.
/audio/sound_localization: The estimated sound source location.
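
As a rough illustration of consuming these topics, the sketch below turns /audio/voice_detected into speech segments with a start and an end time. It assumes ROS 1 with rospy, and that the topic carries std_msgs/Bool, which this page does not state. VoiceActivityLog, main and the node name are hypothetical names chosen for the example.

```python
class VoiceActivityLog:
    """Turns a stream of (timestamp, detected) samples into speech segments."""

    def __init__(self):
        self.segments = []   # completed (start, end) pairs, in seconds
        self._start = None   # start time of the segment in progress, if any

    def update(self, stamp, detected):
        """Feed one sample; return the (start, end) segment it closes, else None."""
        if detected and self._start is None:
            # Rising edge: someone started speaking
            self._start = stamp
            return None
        if not detected and self._start is not None:
            # Falling edge: the current segment is complete
            segment = (self._start, stamp)
            self._start = None
            self.segments.append(segment)
            return segment
        return None


def main():
    # Imported lazily so that VoiceActivityLog can be used without ROS installed.
    import rospy
    from std_msgs.msg import Bool  # ASSUMPTION: message type is not documented here

    rospy.init_node("voice_activity_logger")  # hypothetical node name
    log = VoiceActivityLog()

    def on_voice(msg):
        segment = log.update(rospy.get_time(), msg.data)
        if segment is not None:
            rospy.loginfo("speech lasted %.2f s", segment[1] - segment[0])

    rospy.Subscriber("/audio/voice_detected", Bool, on_voice)
    rospy.spin()
```

On the robot, calling main() starts the node. The segment logic is kept separate from the ROS wiring so it can be checked offline with recorded values.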

Audio can then be recorded as a rosbag, for example:

ssh pal@ari-0c

rosbag record -O audio_sample.bag /audio/channel0

Regardless of how the audio is captured, it can be processed later.
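
One possible first processing step is to convert the recorded bag into a WAV file. The sketch below is a minimal example under several assumptions that this page does not confirm: the messages on /audio/channel0 expose their payload in a data field (as audio_common_msgs/AudioData does), and the payload is mono, signed 16-bit PCM at 16 kHz. write_wav and bag_audio_chunks are hypothetical helper names.

```python
import wave


def write_wav(path, chunks, sample_rate=16000, channels=1, sample_width=2):
    """Write an iterable of raw PCM byte chunks to a WAV file; return the frame count."""
    total_bytes = 0
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)  # 2 bytes = 16-bit samples (assumed)
        wav.setframerate(sample_rate)   # 16 kHz (assumed)
        for chunk in chunks:
            data = bytes(chunk)
            wav.writeframes(data)
            total_bytes += len(data)
    return total_bytes // (channels * sample_width)


def bag_audio_chunks(bag_path, topic="/audio/channel0"):
    """Yield the raw payload of every message recorded on the given topic."""
    # ROS 1 Python API, imported lazily so write_wav also works without ROS.
    import rosbag

    with rosbag.Bag(bag_path) as bag:
        for _topic, msg, _stamp in bag.read_messages(topics=[topic]):
            # ASSUMPTION: the message has a `data` field holding PCM bytes
            yield msg.data
```

With ROS available, write_wav("audio_sample.wav", bag_audio_chunks("audio_sample.bag")) would convert the bag recorded above. If the result plays back at the wrong speed or as noise, the sample rate or sample format assumption is the first thing to check.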

Recording audio directly from ALSA#

You can also access the ReSpeaker as a regular Linux ALSA recording device: log in to the robot and the device should appear in the output of arecord -l (use arecord -L to list the ALSA device names).

To record, you first need to stop the ROS driver and find the correct device name in the arecord -L list. The example below records all 6 channels of the ReSpeaker at a 16 kHz sampling rate for 10 seconds.

> pal-stop respeaker_ros
> arecord -D "hw:CARD=ArrayUAC10,DEV=0" -fS16_LE -c6 -d 10 -r16000 > audio.wav

Play back the recorded sound:

> aplay audio.wav

You can then re-enable the ROS interface:

> pal-restart respeaker_ros
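
The six-channel file recorded above can be split into one mono file per channel for further analysis. The sketch below uses only the Python standard library and assumes 16-bit samples, as requested with S16_LE. split_channels and the output file pattern are hypothetical names. Whether ALSA channel N corresponds to the /audio/channelN topic is an assumption that should be verified by listening to the output.

```python
import array
import wave


def split_channels(src_path, dst_pattern="channel{}.wav"):
    """Split an interleaved 16-bit WAV into one mono WAV per channel; return the paths."""
    with wave.open(src_path, "rb") as src:
        channels = src.getnchannels()
        rate = src.getframerate()
        if src.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples (arecord -f S16_LE)")
        # One flat array of samples, interleaved as c0, c1, ..., c0, c1, ...
        samples = array.array("h", src.readframes(src.getnframes()))

    paths = []
    for index in range(channels):
        path = dst_pattern.format(index)
        with wave.open(path, "wb") as dst:
            dst.setnchannels(1)
            dst.setsampwidth(2)
            dst.setframerate(rate)
            # Every `channels`-th sample, starting at `index`, belongs to this channel
            dst.writeframes(samples[index::channels].tobytes())
        paths.append(path)
    return paths
```

For example, split_channels("audio.wav") writes channel0.wav to channel5.wav in the current directory.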
