ARI microphone array and audio recording#

ARI has a ReSpeaker Mic Array V2.0 consisting of 4 microphones, positioned in the torso, just below the touch-screen.

../_images/respeaker.png

The microphone array is connected via USB to the main PC. The PC outputs audio through the robot’s two speakers, located one on each side of the torso, which include a 30 W amplifier.

../_images/audio_flow.png

The ReSpeaker's software features include:

  • Far-field Voice Activity Detection

  • Direction of Arrival

  • Beamforming

  • Noise Suppression

  • De-reverberation

  • Acoustic Echo Cancellation

Its hardware features are:

  • Supports USB Audio Class 1.0 (UAC 1.0)

  • Four-microphone array

  • 12 programmable RGB LED indicators

  • Sensitivity: -26 dBFS (Omnidirectional)

  • SNR: 61 dB

Recording audio data#

There are two ways to extract audio from the ReSpeaker microphone.

  1. As a Linux capture device managed by the ALSA driver

  2. As ROS topics

For the first case, you can see the ReSpeaker as an available recording device by logging in to the robot and running the arecord -l command. You first need to stop the ROS interface of the microphone, and then check the name of the available device.

ssh pal@ari-0c

pal-stop respeaker_ros

arecord -l

../_images/arecord.png

Note the card and subdevice names of the ReSpeaker device, and use that information to record audio. The example below records all 6 channels of the ReSpeaker at a 16 kHz sampling rate for 10 seconds.

arecord -fS16_LE -c6 -d 10 -r16000 > micro.wav

Play the recorded sound:

aplay micro.wav
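Because the recording interleaves all 6 channels in one file, it is often convenient to separate them before further processing. The following is a minimal sketch using only the Python standard library. It assumes a 16-bit little-endian PCM WAV file such as the micro.wav produced above; the helper names split_channels and write_mono are hypothetical and not part of any ARI or ReSpeaker tooling.

```python
import array
import sys
import wave


def split_channels(path):
    """Return (sample_rate, [ch0_samples, ch1_samples, ...]) for a 16-bit PCM WAV file."""
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples (arecord -f S16_LE)")
        n_channels = wav.getnchannels()
        rate = wav.getframerate()
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    # Samples are interleaved frame by frame: ch0, ch1, ..., chN-1, ch0, ...
    return rate, [samples[ch::n_channels] for ch in range(n_channels)]


def write_mono(path, rate, samples):
    """Write one channel of 16-bit samples as a mono WAV file."""
    out = array.array("h", samples)
    if sys.byteorder == "big":
        out.byteswap()
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(out.tobytes())
```

For example, `rate, channels = split_channels("micro.wav")` followed by `write_mono("channel0.wav", rate, channels[0])` would save the first channel on its own, which can then be played with aplay.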

For the second case, first re-enable the ROS interface. Note that the ROS interface of the ReSpeaker starts by default when the robot is switched on.

pal-restart respeaker_ros

The ReSpeaker is integrated with the open-source respeaker_ros package, offering the following interfaces:

  • /sound_direction (std_msgs/Int32): result of Direction of Arrival (DoA)

  • /sound_localization (geometry_msgs/PoseStamped): result of DoA as a pose

  • /is_speeching (std_msgs/Bool): result of Voice Activity Detection (VAD)

  • /audio (audio_common_msgs/AudioData): raw audio of the microphone. Additional audio topics:

      • /audio/channel0: processed audio coming from channel 0 of the ReSpeaker, used for speech recognition

      • /audio/channel1: channel 1 of audio output

      • /audio/channel2: channel 2 of audio output

      • /audio/channel3: channel 3 of audio output

      • /audio/channel4: channel 4 of audio output

      • /audio/channel5: channel 5 of audio output

  • /speech_audio (audio_common_msgs/AudioData): audio data captured while speech is detected

Audio can then be recorded as a rosbag, for example:

ssh pal@ari-0c

rosbag record -O audio_sample.bag /audio/channel0
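A recorded bag can be turned back into a WAV file for offline listening or analysis. The sketch below is illustrative only: it assumes a ROS 1 environment with Python 3, and that each AudioData message carries raw 16-bit mono PCM at 16 kHz, which should be checked against the actual topic before relying on it. The helper names bag_audio_chunks and write_wav are hypothetical.

```python
import wave


def write_wav(path, chunks, rate=16000, sample_width=2, channels=1):
    """Concatenate raw PCM byte chunks into a WAV file (format values are assumptions)."""
    with wave.open(path, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(sample_width)
        wav.setframerate(rate)
        for chunk in chunks:
            wav.writeframes(bytes(chunk))


def bag_audio_chunks(bag_path, topic="/audio/channel0"):
    """Yield the raw payload of every AudioData message recorded on `topic`."""
    import rosbag  # imported here because it is only available in a ROS 1 environment

    with rosbag.Bag(bag_path) as bag:
        for _topic, msg, _stamp in bag.read_messages(topics=[topic]):
            yield msg.data
```

On a machine with ROS installed, `write_wav("audio_sample.wav", bag_audio_chunks("audio_sample.bag"))` would convert the bag recorded above. Keeping the ROS import inside bag_audio_chunks lets write_wav be reused on any machine, including one without ROS.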

Regardless of how the audio is captured, it can be processed afterwards.

See also#