ARI microphone array and audio recording
ARI has a ReSpeaker Mic Array V2.0 consisting of 4 microphones, positioned in the torso, just below the touch-screen.

The microphone array is connected via USB to the main PC. The PC outputs audio through the robot's two speakers, one on each side of the torso, which include a 30 W amplifier.

Its software features include:
Far-field Voice Activity Detection
Direction of Arrival
Beamforming
Noise Suppression
De-reverberation
Acoustic Echo Cancellation
Its hardware features include:
Support for USB Audio Class 1.0 (UAC 1.0)
Four-microphone array
12 programmable RGB LED indicators
Sensitivity: -26 dBFS (Omnidirectional)
SNR: 61 dB
Recording audio data
There are two ways to extract audio from the ReSpeaker microphone:
As a Linux capture device managed by the ALSA driver
As ROS topics
For the first case, log in to the robot and run the arecord -l command to list the ReSpeaker among the available recording devices. Stop the ROS interface of the microphone first, then check the name of the available microphone.
ssh pal@ari-0c
pal-stop respeaker_ros
arecord -l

Note the card and subdevice of the ReSpeaker device, and use that information to record audio. If the ReSpeaker is not the default capture device, it can be selected with the -D option of arecord. The example below records all 6 channels of the ReSpeaker at a 16 kHz sampling rate for 10 seconds.
arecord -fS16_LE -c6 -d 10 -r16000 > micro.wav
Play the recorded sound:
aplay micro.wav
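The six-channel recording can also be split into single-channel files for offline processing. The following is a minimal sketch using only Python's standard wave module, assuming micro.wav is an interleaved PCM WAV file as produced by the arecord command above. The function name extract_channel and the output file name are illustrative and not part of the robot's software.

```python
import wave


def extract_channel(src_path, dst_path, channel=0):
    """Copy one channel of an interleaved PCM WAV file into a mono WAV file.

    Returns the number of frames written.
    """
    with wave.open(src_path, "rb") as src:
        n_channels = src.getnchannels()
        width = src.getsampwidth()      # bytes per sample (2 for S16_LE)
        rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    if not 0 <= channel < n_channels:
        raise ValueError("channel %d not in file with %d channels"
                         % (channel, n_channels))

    frame_size = width * n_channels
    # Ignore a trailing partial frame, if any.
    usable = len(frames) - len(frames) % frame_size
    offset = channel * width
    # Pick the bytes of the requested channel out of every frame.
    mono = b"".join(
        frames[i + offset:i + offset + width]
        for i in range(0, usable, frame_size)
    )

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(width)
        dst.setframerate(rate)
        dst.writeframes(mono)
    return len(mono) // width


# Example (hypothetical output name):
# extract_channel("micro.wav", "micro_ch0.wav", channel=0)
```

The bytes are copied without decoding the samples, so the sketch does not depend on the byte order of the machine it runs on.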
For the second case, first re-enable the ROS interface. By default, the ROS interface of the ReSpeaker starts automatically when the robot is switched on.
pal-restart respeaker_ros
The ReSpeaker is integrated with the open-source respeaker_ros package, offering the following interfaces:
/sound_direction std_msgs/Int32 Result of Direction of Arrival (DoA)
/sound_localization geometry_msgs/PoseStamped Result of DoA as a pose
/is_speeching std_msgs/Bool Result of Voice Activity Detection (VAD)
/audio audio_common_msgs/AudioData Raw audio of the microphone
/speech_audio audio_common_msgs/AudioData Audio data captured while speech is detected
Additional audio topics:
/audio/channel0 Processed audio coming from channel 0 of the ReSpeaker, used for speech recognition
/audio/channel1 Channel 1 of the audio output
/audio/channel2 Channel 2 of the audio output
/audio/channel3 Channel 3 of the audio output
/audio/channel4 Channel 4 of the audio output
/audio/channel5 Channel 5 of the audio output
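As an illustration of how these topics might be consumed, the sketch below subscribes to /is_speeching and /audio/channel0 and logs the signal level while speech is detected. It assumes Python 3 and that the AudioData payload is signed 16-bit little-endian PCM, which should be verified on the robot. The node name and the helper names pcm16_to_samples and rms are hypothetical.

```python
import array
import math
import sys


def pcm16_to_samples(data):
    """Decode signed 16-bit little-endian PCM bytes into a list of ints."""
    usable = len(data) - len(data) % 2   # drop a trailing odd byte, if any
    samples = array.array("h")
    samples.frombytes(bytes(data[:usable]))
    if sys.byteorder == "big":
        samples.byteswap()
    return samples.tolist()


def rms(samples):
    """Root mean square level of a list of samples (0.0 if empty)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def main():
    # ROS imports are kept here so the helpers above work without ROS.
    import rospy
    from audio_common_msgs.msg import AudioData
    from std_msgs.msg import Bool

    state = {"speaking": False}

    def on_vad(msg):
        state["speaking"] = msg.data

    def on_audio(msg):
        # Assumption: msg.data holds 16-bit little-endian PCM samples.
        if state["speaking"]:
            level = rms(pcm16_to_samples(msg.data))
            rospy.loginfo("speech level (RMS): %.1f", level)

    rospy.init_node("respeaker_level_monitor")  # hypothetical node name
    rospy.Subscriber("/is_speeching", Bool, on_vad)
    rospy.Subscriber("/audio/channel0", AudioData, on_audio)
    rospy.spin()
```

Calling main() from a script in a ROS environment with access to the robot's topics starts the node. The two helper functions can be used on their own for offline analysis.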
Audio can then be recorded as a rosbag, for example:
ssh pal@ari-0c
rosbag record -O audio_sample.bag /audio/channel0
Regardless of how the audio is captured, it can be processed later.
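For example, a recorded bag could be converted into a WAV file for use with standard audio tools. The sketch below relies on the rosbag Python API (rosbag.Bag and read_messages) and assumes the topic carries 16-bit mono PCM at 16 kHz. These parameters are assumptions and should be checked against the actual stream. The function names write_wav and bag_to_wav are illustrative.

```python
import wave


def write_wav(chunks, path, rate=16000, channels=1, sample_width=2):
    """Write an iterable of raw PCM byte chunks to a WAV file.

    Returns the number of audio bytes written.
    """
    total = 0
    with wave.open(path, "wb") as out:
        out.setnchannels(channels)
        out.setsampwidth(sample_width)
        out.setframerate(rate)
        for chunk in chunks:
            data = bytes(chunk)
            out.writeframes(data)
            total += len(data)
    return total


def bag_to_wav(bag_path, wav_path, topic="/audio/channel0"):
    """Extract the audio of one topic from a rosbag into a WAV file."""
    import rosbag  # only available in a ROS environment

    with rosbag.Bag(bag_path) as bag:
        # Each message is assumed to be audio_common_msgs/AudioData.
        chunks = (msg.data for _, msg, _ in bag.read_messages(topics=[topic]))
        return write_wav(chunks, wav_path)


# Example, using the bag recorded above (output name is hypothetical):
# bag_to_wav("audio_sample.bag", "audio_sample.wav")
```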