Social perception

The robot can detect and identify faces, detect 2D and 3D skeletons, perform speech and intent recognition, and fuse these social signals together to track people as multi-modal persons.
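The fusion step can be illustrated with a minimal sketch. It is not the robot's actual algorithm. It assumes that each detected face, body or voice comes with candidate person matches scored by a confidence value (in the spirit of ROS4HRI candidate matches), and it assigns each feature to its best-scoring person above a threshold. The `CandidateMatch`, `Person` and `fuse` names and the 0.5 threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical sketch, not the robot's fusion implementation.
FEATURE_TYPES = ("face", "body", "voice")


@dataclass
class CandidateMatch:
    """Hypothetical record: one feature that may belong to a given person."""
    feature_type: str  # one of FEATURE_TYPES
    feature_id: str
    person_id: str
    confidence: float  # higher means more likely


@dataclass
class Person:
    """A fused person with at most one face, one body and one voice."""
    person_id: str
    face_id: Optional[str] = None
    body_id: Optional[str] = None
    voice_id: Optional[str] = None


def fuse(matches: List[CandidateMatch], threshold: float = 0.5) -> Dict[str, Person]:
    """Assign each feature to its best-scoring person, ignoring weak matches."""
    best: Dict[Tuple[str, str], CandidateMatch] = {}
    for m in matches:
        if m.feature_type not in FEATURE_TYPES:
            raise ValueError(f"unknown feature type: {m.feature_type!r}")
        if m.confidence < threshold:
            continue  # too uncertain to associate
        key = (m.feature_type, m.feature_id)
        if key not in best or m.confidence > best[key].confidence:
            best[key] = m

    persons: Dict[str, Person] = {}
    # Apply matches in increasing confidence order, so that if two features
    # of the same kind point to the same person, the most confident one wins.
    for c in sorted(best.values(), key=lambda c: c.confidence):
        person = persons.setdefault(c.person_id, Person(c.person_id))
        setattr(person, f"{c.feature_type}_id", c.feature_id)
    return persons
```

For example, `fuse([CandidateMatch("face", "f1", "alice", 0.9), CandidateMatch("face", "f1", "bob", 0.3), CandidateMatch("body", "b1", "alice", 0.7)])` yields a single person, `alice`, with face `f1` and body `b1`; the weak match to `bob` is discarded. A real tracker would also have to handle persons appearing and disappearing over time, which this sketch ignores.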

The robot’s social perception pipeline complies with the ROS4HRI standard (ROS REP-155).
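ROS4HRI (REP-155) groups human-related topics under the `/humans` namespace. For each kind of entity (faces, bodies, voices, persons), a `tracked` topic lists the IDs currently being tracked, and each ID gets its own sub-namespace for per-entity data such as a face's region of interest. The helpers below are a hypothetical sketch that only builds topic names following that layout; check the exact sub-topic names against REP-155 and the robot's API reference before relying on them.

```python
# Hypothetical helpers (not part of any ROS4HRI library) that build topic
# names following the REP-155 /humans/... layout.
ENTITY_TYPES = ("faces", "bodies", "voices", "persons")


def _check_entity(entity: str) -> None:
    if entity not in ENTITY_TYPES:
        raise ValueError(f"unknown entity type: {entity!r}")


def tracked_topic(entity: str) -> str:
    """Topic listing the IDs of currently tracked entities of one kind."""
    _check_entity(entity)
    return f"/humans/{entity}/tracked"


def entity_topic(entity: str, entity_id: str, subtopic: str) -> str:
    """Per-entity topic, e.g. /humans/faces/<id>/roi."""
    _check_entity(entity)
    if not entity_id or "/" in entity_id:
        raise ValueError(f"invalid entity id: {entity_id!r}")
    return f"/humans/{entity}/{entity_id}/{subtopic}"
```

A client could then subscribe to `tracked_topic("faces")` to discover face IDs, and to `entity_topic("faces", face_id, "roi")` for each of them.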

Note that the entire pipeline runs on-board; no cloud-based services are used (and consequently, no Internet connection is required).

The following figure provides an overview of the pipeline:

Attention

Limitations

These are the main limitations of the pal-sdk-23.1 social perception capabilities:

Person detection and face identification rely on external tools (Google MediaPipe and dlib). Like all vision-based algorithms, these tools do not always provide accurate estimates and may mis-detect or misclassify people;
Body detection is currently limited to a single body;
Faces need to be within about 2 m of the robot to be detected;
No voice separation, identification or localisation is currently available: from the robot’s point of view, all speech comes from one and the same voice;
ARI does not yet implement the ‘group interactions’ part of the specification (e.g. there is no automatic group detection).
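Because the robot hears a single voice and only detects faces within about 2 m, an application cannot tell from audio alone who is speaking. One possible application-side heuristic, not something provided by the SDK, is to attribute speech to the closest tracked person within face-detection range. The sketch assumes person positions are already available as (x, y) coordinates in metres in the robot's frame; the function name and the `FACE_RANGE_M` constant are hypothetical.

```python
import math
from typing import Dict, Optional, Tuple

# Hypothetical constant: approximate face-detection range (~2 m) from the
# limitations listed above.
FACE_RANGE_M = 2.0


def closest_person_in_range(
    positions: Dict[str, Tuple[float, float]],
    max_range: float = FACE_RANGE_M,
) -> Optional[str]:
    """Return the ID of the nearest person within max_range, or None.

    positions maps person IDs to (x, y) in metres, in the robot's frame.
    """
    best_id: Optional[str] = None
    best_dist = math.inf
    for person_id, (x, y) in positions.items():
        dist = math.hypot(x, y)  # planar distance to the robot
        if dist <= max_range and dist < best_dist:
            best_id, best_dist = person_id, dist
    return best_id
```

With `{"alice": (1.0, 0.5), "bob": (0.8, 0.0), "carol": (3.0, 0.0)}` it returns `"bob"`; `carol` is out of range and ignored. If nobody is in range it returns `None`, which an application might use as a cue to invite the user to come closer.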


General documentation

Tutorials and how-tos

API reference