Multi-modal matching of faces, bodies and voices#
On this page, you will learn how your robot's social perception pipeline matches the detected faces, bodies and voices into a single person instance.
General principle#
Multiple pipelines run on your robot, each providing the perception of
different features of a person, such as their face, body or voice.
For instance, hri_face_detect
detects faces in the camera
images, while hri_body_detect
detects the skeletal points of the
bodies.
To achieve a holistic representation of a person, these features must be associated with each other and with the permanently identified person owning them. Some features are directly matched to a person, others indirectly, by first matching to another feature already matched to a person.
For example, faces can be reliably and uniquely matched to a person (this is
what hri_face_identification
does), while bodies are more easily associated
with the corresponding face in the scene (that's
hri_face_body_matcher's
job) than directly with a person.
These possible associations can have different degrees of confidence
and may even be mutually contradictory when taken together.
The final step of the multi-modal fusion in the social perception pipeline
(performed by hri_person_manager
) is to collect all such
candidate matches into a consistent, holistic representation of the people in the scene.
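As an illustration of what a candidate match looks like, the sketch below publishes a single match by hand on the /humans/candidate_matches topic. It assumes the hri_msgs/IdsMatch message defined by the ROS4HRI standard, with two IDs, their types and a confidence score; the IDs themselves are made up, and you should check the message definition shipped with your hri_msgs version.

# Minimal sketch: publishing one hand-crafted candidate match, assuming the
# hri_msgs/IdsMatch message from the ROS4HRI standard. The IDs are made up.
import time

import rclpy
from rclpy.node import Node
from hri_msgs.msg import IdsMatch

rclpy.init()
node = Node("candidate_match_example")
pub = node.create_publisher(IdsMatch, "/humans/candidate_matches", 10)

match = IdsMatch()
match.id1 = "face_0"              # a (made up) temporary face ID
match.id1_type = IdsMatch.FACE
match.id2 = "body_0"              # a (made up) temporary body ID
match.id2_type = IdsMatch.BODY
match.confidence = 0.8            # how confident the matcher is

time.sleep(0.5)  # crude way to let subscribers connect in this one-shot script
pub.publish(match)
rclpy.shutdown()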
Note
Currently, only face-to-person and face-to-body matching are described on this page.
Testing the multi-modal fusion#
Prerequisites#
The following instructions require you to be familiar with the ROS command-line tools.
The following nodes should be running: hri_face_detect, hri_face_identification, hri_body_detect, hri_face_body_matcher and hri_person_manager.
Note
If you are working directly with a PAL robot, these nodes are started by default. You have nothing special to do, unless you have disabled the modules running them. See the application management page for more information.
If you are working in a Docker image with no access to the robot and want to run everything locally, first start the USB camera (more details on the How to launch the different components for person detection page):
ros2 run usb_cam usb_cam_node_exe --ros-args -p camera_info_url:="file:///<calibration_file_location>"
Then start the nodes with the following commands:
ros2 launch hri_face_detect face_detect_with_args.launch.py
ros2 launch hri_face_identification face_identification.launch.py
ros2 launch hri_body_detect hri_body_detect_with_args.launch.py use_depth:=false
ros2 launch hri_face_body_matcher hri_face_body_matcher.launch.py
ros2 launch hri_person_manager person_manager.launch.py
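Once everything is launched, you can check that the nodes are up with the standard ROS 2 CLI (note that the exact node names may differ slightly from the package names):

ros2 node list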
Testing#
As described in the ROS4HRI standard, each detected person's
information is published under the /humans/persons/<personID>/
namespace.
In particular, the matching face, body and voice IDs are published on the sub-topics
/face_id
, /body_id
and /voice_id
. You can simply echo these topics to
see the current matching of the detected persons with faces, bodies and voices:
ros2 topic echo /humans/persons/<personID>/face_id
ros2 topic echo /humans/persons/<personID>/body_id
ros2 topic echo /humans/persons/<personID>/voice_id
We can make use of this information, for instance to add an entry to the knowledge base (that's what people_facts does!).
Below we present two examples, one in Python and one
in C++, showing how to print this information using the helper libraries
pyhri
and libhri
. You can learn more about these libraries in the
Tutorial: Detect people around the robot (C++) and Tutorial: Detect people oriented toward the robot (Python) pages.
import rclpy
from rclpy.node import Node

import hri

if __name__ == "__main__":
    rclpy.init()
    node = Node("test")
    hri_listener = hri.HRIListener("hri_listener_node")
    while rclpy.ok():
        # Here we poll the tracked persons with HRIListener.tracked_persons,
        # but one could alternatively set up callbacks on each new person
        # tracked and lost using HRIListener.on_tracked_person() and
        # HRIListener.on_tracked_person_lost().
        for id, person in hri_listener.tracked_persons.items():
            face_id = ""
            body_id = ""
            voice_id = ""
            face = person.face
            body = person.body
            voice = person.voice
            if face is not None:
                face_id = face.id
            if body is not None:
                body_id = body.id
            if voice is not None:
                voice_id = voice.id
            node.get_logger().info(
                f"Detected personID {id} with matching: "
                f"faceID: {face_id}, bodyID: {body_id}, voiceID: {voice_id}")
        # Process pending callbacks and throttle the loop to roughly 1 Hz.
        rclpy.spin_once(node, timeout_sec=1.0)
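To try the Python example, save it to a file (for instance print_persons.py, a name chosen here purely for illustration) and run it with python3 while the pipeline is up. The equivalent C++ version follows: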
#include <hri/hri.hpp>
#include "rclcpp/rclcpp.hpp"

using namespace std::chrono_literals;

class PrintPersons : public rclcpp::Node
{
public:
  PrintPersons()
  : rclcpp::Node("hri_example")
  {}

  void init()
  {
    // "shared_from_this()" cannot be used in the constructor!
    hri_listener_ = hri::HRIListener::create(shared_from_this());
    timer_ = create_wall_timer(1000ms, std::bind(&PrintPersons::timer_callback, this));
  }

  void timer_callback()
  {
    for (auto const & person_kv : hri_listener_->getTrackedPersons()) {
      if (auto person = person_kv.second) {
        std::string face_id = "";
        std::string body_id = "";
        std::string voice_id = "";
        if (person->face()) {
          face_id = person->face()->id();
        }
        if (person->body()) {
          body_id = person->body()->id();
        }
        if (person->voice()) {
          voice_id = person->voice()->id();
        }
        RCLCPP_INFO(
          get_logger(), "Detected personID %s with matching: faceID: %s, bodyID: %s, voiceID: %s",
          person_kv.first.c_str(), face_id.c_str(), body_id.c_str(), voice_id.c_str());
      }
    }
  }

private:
  std::shared_ptr<hri::HRIListener> hri_listener_;
  rclcpp::TimerBase::SharedPtr timer_;
};

int main(int argc, char * argv[])
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<PrintPersons>();
  node->init();
  rclcpp::spin(node->get_node_base_interface());
  rclcpp::shutdown();
  return 0;
}
Results#
After running one of the test nodes above (the output shown corresponds to the Python version), you should see output similar to the following in the command line when one person is detected:
[INFO] [1689001257.952060]: Detected personID lbytj with matching: faceID: lxgep, bodyID: dpmxy, voiceID:
The steps of the social perception pipeline that contribute to this outcome are:

1. The hri_face_detect node assigns the temporary ID lxgep to the detected face and publishes it on the /humans/faces/tracked topic.
2. The hri_body_detect node assigns the temporary ID dpmxy to the detected body and publishes it on the /humans/bodies/tracked topic.
3. The hri_face_identification node proposes a candidate match between the temporary face ID lxgep and the permanent person ID lbytj, and publishes it on the /humans/candidate_matches topic.
4. The hri_face_body_matcher node associates the temporary face ID lxgep with the temporary body ID dpmxy and publishes it on the /humans/candidate_matches topic.
5. The hri_person_manager node evaluates the candidate matches and publishes the person-related topics.
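You can observe steps 3 and 4 live by echoing the candidate matches topic while a person is in front of the camera:

ros2 topic echo /humans/candidate_matches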
If the detected person exits the scene and then re-enters, the temporary face and body IDs change, while the person ID remains the same.
If the hri_face_identification
node is stopped, the face and body would
still be associated with each other, but this time with a temporary person id of the form
anonymous_person_<ID>
.
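At run time, a program can tell the two cases apart. The snippet below (reusing hri_listener and node from the Python example above) assumes that pyhri's Person exposes an anonymous attribute mirroring the /humans/persons/<personID>/anonymous topic; check your pyhri version for the exact name.

# Sketch: distinguishing permanently identified from anonymous persons,
# assuming pyhri's Person exposes an `anonymous` attribute.
for id, person in hri_listener.tracked_persons.items():
    if person.anonymous:
        node.get_logger().info(f"{id} is anonymous: not yet identified by face")
    else:
        node.get_logger().info(f"{id} is a permanently identified person")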
See also#
Social perception for an overview of the social perception pipeline
Social perception topics for the full list of topics published by the social perception pipeline