ARI microphone array and audio recording#
ARI has a ReSpeaker Mic Array V2.0 consisting of 4 microphones, positioned in the torso, just below the touch-screen.

The microphone array is connected via USB to the main PC. The PC outputs audio through the robot’s two speakers, which are located on either side of the torso and include a 30 W amplifier.

Main hardware features:
Supports USB Audio Class 1.0 (UAC 1.0)
Four-microphone array
Sensitivity: -26 dBFS (omnidirectional)
Signal-to-Noise Ratio: 63 dB
12 programmable RGB LED indicators
Note
By default, the microphone LEDs are configured to turn blue when the microphone hears something. In addition, a light blue LED indicates the current sound source direction.
The parameter enable_leds can be set to False in the respeaker_ros launch file to disable this behaviour.
The ReSpeaker microphone also implements several audio processing features directly on the hardware:
far-field Voice Activity Detection (up to 5m away);
Direction of Arrival (DoA) estimation;
Beamforming (BF) to focus on sound coming from a specific source;
noise suppression;
de-reverberation;
acoustic echo cancellation, enabling the robot to ignore its own voice.
ROS API#
ARI relies on a heavily modified version of the open-source respeaker_ros driver.
It exposes the following topics:
/audio/raw: Merged audio channel of the 4 microphones (alias for /audio/channel0).
/audio/channel0: Merged audio channel of the 4 microphones.
/audio/channel1: Audio stream from the first microphone.
/audio/channel2: Audio stream from the second microphone.
/audio/channel3: Audio stream from the third microphone.
/audio/channel4: Audio stream from the fourth microphone.
/audio/channel5: Monitor audio stream from the audio input (used for self-echo cancellation).
/audio/voice_detected: Publishes a boolean indicating whether a voice is currently detected (i.e., whether someone is currently speaking).
/audio/speech: Raw audio data of detected speech (published once the person has finished speaking).
/audio/sound_direction: The estimated Direction of Arrival of the detected sound.
/audio/sound_localization: The estimated sound source location.
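As an illustration of how these topics might be consumed, the sketch below turns the boolean stream on /audio/voice_detected into "start" and "stop" speech events. It is a minimal example rather than part of the driver: the message type (std_msgs/Bool), the node name voice_activity_listener and the VoiceActivityTracker class are assumptions made for this example, and should be checked against the robot with rostopic info /audio/voice_detected.

```python
from types import SimpleNamespace


class VoiceActivityTracker:
    """Tracks speech on/off transitions from a stream of boolean messages."""

    def __init__(self):
        self.speaking = False
        self.transitions = []  # "start" / "stop" events, in order of arrival

    def on_voice_detected(self, msg):
        # msg.data is assumed to be a bool (std_msgs/Bool layout).
        detected = bool(msg.data)
        if detected and not self.speaking:
            self.transitions.append("start")
        elif not detected and self.speaking:
            self.transitions.append("stop")
        self.speaking = detected


def run_node(tracker):
    """Subscribe to /audio/voice_detected; meant to be called on the robot."""
    # Imported lazily so the tracker can be exercised without ROS installed.
    import rospy
    from std_msgs.msg import Bool  # assumed message type

    rospy.init_node("voice_activity_listener", anonymous=True)
    rospy.Subscriber("/audio/voice_detected", Bool, tracker.on_voice_detected)
    rospy.spin()


# Offline check with stand-in messages (no ROS required).
tracker = VoiceActivityTracker()
for value in (False, True, True, False):
    tracker.on_voice_detected(SimpleNamespace(data=value))
print(tracker.transitions)  # ['start', 'stop']
```

On the robot, calling run_node(VoiceActivityTracker()) would attach the same callback to the live topic.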
Audio can then be recorded as a rosbag, for example:
ssh pal@ari-0c
rosbag record -O audio_sample.bag /audio/channel0
Regardless of how the audio is captured, it can later be processed.
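One possible way to process a recorded bag offline is to convert it to a WAV file. The Python 3 sketch below rests on several assumptions that are not stated on this page: that the messages on /audio/channel0 carry their payload in a data field (as in audio_common_msgs/AudioData), and that the stream is mono, 16-bit, 16 kHz PCM. The helper names write_wav and bag_audio_chunks are hypothetical.

```python
import wave


def write_wav(chunks, wav_path, rate=16000, channels=1, sample_width=2):
    """Concatenate raw PCM chunks into a WAV file and return the frame count."""
    with wave.open(wav_path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(rate)
        for chunk in chunks:
            wav.writeframes(bytes(chunk))
    # Re-open the file to report how many frames were actually stored.
    with wave.open(wav_path, "rb") as wav:
        return wav.getnframes()


def bag_audio_chunks(bag_path, topic="/audio/channel0"):
    """Yield the raw payload of every message recorded on `topic`."""
    import rosbag  # imported lazily: only available in a ROS environment

    with rosbag.Bag(bag_path) as bag:
        for _topic, msg, _stamp in bag.read_messages(topics=[topic]):
            yield msg.data  # assumed audio_common_msgs/AudioData layout
```

With a bag recorded as above, write_wav(bag_audio_chunks("audio_sample.bag"), "audio_sample.wav") would produce a file playable with aplay, provided the format assumptions hold.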
Recording audio directly from ALSA#
You can also access the ReSpeaker as a regular Linux ALSA recording device:
log onto the robot and it should appear in the output of arecord -l (use arecord -L to get the list of ALSA device names).
To record, you first need to stop the ROS driver and find the correct device name in the arecord -L list. The example below records all 6 channels of the ReSpeaker at a 16 kHz sampling rate for a duration of 10 seconds.
> pal-stop respeaker_ros
> arecord -D "hw:CARD=ArrayUAC10,DEV=0" -fS16_LE -c6 -d 10 -r16000 > audio.wav
Play back the recorded sound:
> aplay audio.wav
You can then re-enable the ROS interface:
> pal-restart respeaker_ros
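The 6-channel audio.wav file recorded above is interleaved, so a single channel usually has to be extracted before further processing. The following Python 3 sketch does this with the standard wave module. The helper name extract_channel is hypothetical, and the mapping of ALSA channel indices to the ROS topics (0 for the merged channel, 1 to 4 for the individual microphones, 5 for the monitor stream) is an assumption that should be verified by listening to the output.

```python
import wave


def extract_channel(src_path, dst_path, channel=0):
    """Copy one channel of an interleaved 16-bit WAV file into a mono WAV file."""
    with wave.open(src_path, "rb") as src:
        n_channels = src.getnchannels()
        width = src.getsampwidth()
        rate = src.getframerate()
        if width != 2:
            raise ValueError("expected 16-bit samples (S16_LE)")
        if not 0 <= channel < n_channels:
            raise ValueError("channel index out of range")
        frames = src.readframes(src.getnframes())

    frame_size = n_channels * width
    start = channel * width
    # Each frame holds one sample per channel; keep only the bytes of `channel`.
    mono = b"".join(
        frames[i + start:i + start + width]
        for i in range(0, len(frames) - frame_size + 1, frame_size)
    )

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(width)
        dst.setframerate(rate)
        dst.writeframes(mono)
    return len(mono) // width  # number of samples written
```

For example, extract_channel("audio.wav", "channel0.wav", channel=0) would write the first channel to a mono file that keeps the original sampling rate.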