Social perception with ROS4HRI
ROS for Human-Robot Interaction (or ROS4HRI [ros4hri]) is the main API that ARI implements to represent information about the humans surrounding and interacting with the robot.
ROS4HRI refers to a set of conventions and tools that help develop Human-Robot Interaction capabilities. The specification (originally developed by PAL Robotics) is available online as the ROS REP-155.
The ROS4HRI API defines several types of identifiers (IDs): transient face, body and voice IDs, plus a permanent person ID.
On ARI, we have implemented the following main parts of the specification:
- we follow the ROS4HRI human model representation, as a combination of a permanent identity (person) and transient parts that are intermittently detected (e.g. face, skeleton, voice);
- in pal-sdk-23.1, we specifically support:
  - face detection and recognition (including extraction of facial landmarks);
  - single-body detection, and 2D and 3D skeleton tracking;
  - speech recognition (without support for voice separation or voice identification);
  - probabilistic fusion of faces, bodies and voices;
- we follow the ROS4HRI topic naming conventions, with all human-related messages published under the /humans/ topic namespace (see the sketch after this list);
- we follow the ROS4HRI kinematic model of the human and the 3D tf frame conventions (naming, orientation), as specified in REP-155.
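As an illustration of these topic conventions, here is a minimal Python sketch that lists the faces the robot currently tracks. It assumes ROS 1 and the hri_msgs package (the reference message definitions for REP-155); treat it as a sketch, not as the exact ARI pipeline:

    #!/usr/bin/env python
    # Minimal ROS4HRI sketch: list the currently tracked faces.
    import rospy
    from hri_msgs.msg import IdsList

    def on_tracked_faces(msg):
        # msg.ids contains the transient face IDs currently tracked; per
        # REP-155, per-face data lives under /humans/faces/<id>/ (e.g. the
        # region of interest or the facial landmarks).
        rospy.loginfo("Currently tracked faces: %s", ", ".join(msg.ids))

    rospy.init_node("tracked_faces_lister")
    rospy.Subscriber("/humans/faces/tracked", IdsList, on_tracked_faces)
    rospy.spin()

The same pattern applies to the other ROS4HRI groups: bodies, voices and persons each publish their own tracked list under /humans/bodies/, /humans/voices/ and /humans/persons/.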
In addition, ARI also provides:
- gaze estimation;
- automatic engagement detection (see the sketch below).
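Engagement detection follows the same conventions: per REP-155, each person's engagement status is published under that person's namespace. Below is a hedged sketch, assuming the hri_msgs/EngagementLevel message and a /humans/persons/<id>/engagement_status topic; the person ID used here is hypothetical, so replace it with a real one taken from /humans/persons/tracked:

    #!/usr/bin/env python
    # Sketch: monitor the engagement status of one tracked person.
    import rospy
    from hri_msgs.msg import EngagementLevel

    def on_engagement(msg):
        # msg.level holds one of the EngagementLevel constants
        if msg.level == EngagementLevel.ENGAGED:
            rospy.loginfo("This person is engaged with the robot")

    rospy.init_node("engagement_monitor")
    person_id = "anonymous_person_2fe4"  # hypothetical ID, for illustration
    rospy.Subscriber("/humans/persons/%s/engagement_status" % person_id,
                     EngagementLevel, on_engagement)
    rospy.spin()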
Limitations
These are the main limitations of the pal-sdk-23.1 social perception capabilities:
- person detection and face identification rely on external tools (Google MediaPipe and dlib); like all vision-based algorithms, these tools do not always provide accurate estimates, and might mis-detect or mis-classify people;
- body detection is currently single-body only;
- faces need to be within a ~2 m range of the robot to be detected;
- no voice separation, identification or localisation is currently available: from the robot's point of view, it always hears one and the same voice;
- ARI does not yet implement the 'group interactions' part of the specification (e.g. no automatic group detection).
How to use ROS4HRI?
The ROS4HRI topics are documented on the Social perception topics page.
To ease access to these (many!) topics, you can use pyhri (Python) or libhri (C++). These two open-source libraries are developed by PAL Robotics, and are documented on the ROS wiki:
- pyhri on the ROS wiki
- libhri on the ROS wiki
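For instance, with pyhri the 'who do I see?' loop reduces to a few lines. This is a hedged sketch assuming pyhri's HRIListener API as documented on the ROS wiki (check your installed version for the exact attribute names):

    #!/usr/bin/env python
    # Sketch: list visible faces using pyhri instead of raw topics.
    import rospy
    from hri import HRIListener

    rospy.init_node("people_watcher")
    listener = HRIListener()  # subscribes to the /humans/ topics for you

    rate = rospy.Rate(1)
    while not rospy.is_shutdown():
        # listener.faces maps transient face IDs to Face objects
        for face_id, face in listener.faces.items():
            rospy.loginfo("Seeing face %s", face_id)
        rate.sleep()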
Next steps
- understand ARI's social perception pipeline
- Python tutorial: detecting people oriented toward the robot
- C++ tutorial: detecting people around the robot
- More tutorials and how-tos on the Social capabilities main page.
See also
The REP-155 (aka ROS4HRI) specification, on the ROS website
The ROS wiki contains useful resources about ROS4HRI.
References
[ros4hri] Y. Mohamed and S. Lemaignan, "ROS for Human-Robot Interaction", IROS 2021, doi: 10.1109/IROS51168.2021.9636816