Applies to: TIAGo, TIAGo Pro, ARI

How-to: Match the face, body and voice of a person#

On this page, you will learn how the social perception pipeline of PAL's robots matches detected faces, bodies and voices into a single person instance.

Note

Currently, only face-to-person and face-to-body matching are described on this page.

Prerequisites#

  • The following instructions require you to be familiar with the ROS command-line tools.

  • The following startups (and related nodes) should be running:

    • head_rgbd_camera

    • hri_face_detect (node)

    • hri_face_identification (node)

    • hri_fullbody (node)

    • hri_face_body_matcher (node)

    • hri_person_manager (node)

Description#

PAL's robots run multiple perception pipelines, each extracting a different feature of a person, such as their face, body or voice. For instance, hri_face_detect detects faces in the camera images, while hri_fullbody estimates the skeletal points of the bodies.
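For instance, you can list the faces currently detected by hri_face_detect by listening to the /humans/faces/tracked topic. Below is a minimal sketch, assuming this topic carries hri_msgs/IdsList messages, as defined by the ROS4HRI specification:

#! /usr/bin/env python3

# Minimal sketch: list the faces currently detected by hri_face_detect.
# Assumes /humans/faces/tracked carries hri_msgs/IdsList messages, as
# defined by the ROS4HRI specification.
import rospy
from hri_msgs.msg import IdsList

def on_tracked_faces(msg):
    rospy.loginfo("Currently tracked faces: %s", ", ".join(msg.ids) or "none")

if __name__ == "__main__":
    rospy.init_node("tracked_faces_example")
    rospy.Subscriber("/humans/faces/tracked", IdsList, on_tracked_faces)
    rospy.spin()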

To achieve a holistic representation of a person, these features must be associated with each other and with the permanently identified person owning them. Some features are matched to a person directly; others indirectly, by matching them to another feature that is already matched to a person.

For example, faces can be reliably and uniquely matched to a person (this is what hri_face_identification does), while bodies are more easily associated with the corresponding face in the scene (that is hri_face_body_matcher's job) than directly with a person.

These candidate associations can have different degrees of confidence and may even be contradictory when taken together. The final step of the multi-modal fusion in the social perception pipeline (performed by hri_person_manager) is to collect all such candidate matches into a consistent, holistic representation of the people around the robot.
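In practice, each matcher node publishes its candidate matches on the /humans/candidate_matches topic, which hri_person_manager subscribes to. The following is a minimal sketch of such a publisher, assuming the hri_msgs/IdsMatch message (fields id1, id1_type, id2, id2_type, confidence) from the ROS4HRI specification; the face and body IDs below are hypothetical placeholders:

#! /usr/bin/env python3

# Minimal sketch: publish a candidate face-body match by hand, as a matcher
# node such as hri_face_body_matcher would. Assumes the hri_msgs/IdsMatch
# message from the ROS4HRI specification; the IDs are hypothetical.
import rospy
from hri_msgs.msg import IdsMatch

if __name__ == "__main__":
    rospy.init_node("candidate_match_example")
    pub = rospy.Publisher("/humans/candidate_matches", IdsMatch, queue_size=1)
    rospy.sleep(0.5)  # leave some time for the subscribers to connect

    match = IdsMatch()
    match.id1 = "lxgep"              # a face ID
    match.id1_type = IdsMatch.FACE
    match.id2 = "dpmxy"              # a body ID
    match.id2_type = IdsMatch.BODY
    match.confidence = 0.8           # how confident we are in this association
    pub.publish(match)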

Usage#

As described in the ROS4HRI standard, the information about each detected person is published under the /humans/persons/<personID>/ namespace. In particular, the matching face, body and voice IDs are published on the /face_id, /body_id and /voice_id sub-topics.
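You can inspect these sub-topics from the command line with rostopic echo /humans/persons/<personID>/face_id, or read them programmatically. Below is a minimal sketch using plain rospy, assuming the sub-topics carry std_msgs/String messages as per the ROS4HRI specification; lbytj is a hypothetical person ID:

#! /usr/bin/env python3

# Minimal sketch: read the face ID matched to a given person directly from
# the topic structure, without the helper libraries. 'lbytj' is a
# hypothetical person ID; the sub-topic is assumed to carry std_msgs/String.
import rospy
from std_msgs.msg import String

def on_face_id(msg):
    rospy.loginfo("Person lbytj currently matches face %s", msg.data)

if __name__ == "__main__":
    rospy.init_node("face_id_listener")
    rospy.Subscriber("/humans/persons/lbytj/face_id", String, on_face_id)
    rospy.spin()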

We can make use of this information, for instance to add an entry to the knowledge base (that's what people_facts does!). In the following, we present two examples, one in C++ and one in Python, showing how to print this information using the helper libraries libhri and pyhri.

C++ example#

#include <hri/hri.h>
#include <ros/ros.h>

int main(int argc, char** argv)
{
  ros::init(argc, argv, "test");
  hri::HRIListener hri_listener{};
  ros::Rate loop_rate(1);
  while (ros::ok()) {
    /*
     * Here we poll the tracked persons with HRIListener::getTrackedPersons(), but one could
     * alternatively set up callbacks for each new person tracked and lost using
     * HRIListener::onTrackedPerson() and HRIListener::onTrackedPersonLost().
     */
    for (auto const& person_kv : hri_listener.getTrackedPersons()) {
      // getTrackedPersons() returns weak pointers: lock them before use
      if (auto person = person_kv.second.lock()) {
        ROS_INFO("Detected personID %s with matching: faceID: %s, bodyID: %s, voiceID: %s",
                 person_kv.first.c_str(), person->face_id.c_str(), person->body_id.c_str(),
                 person->voice_id.c_str());
      }
    }
    loop_rate.sleep();
    ros::spinOnce();
  }
  return 0;
}

Python example#

#! /usr/bin/env python3

import rospy
from pyhri import HRIListener

if __name__ == "__main__":
    rospy.init_node("test")
    hri_listener = HRIListener()
    rate = rospy.Rate(1)
    while not rospy.is_shutdown():
        # Here we poll the tracked persons with HRIListener.tracked_persons, but one could
        # alternatively set up callbacks for each new person tracked and lost using
        # HRIListener.on_tracked_person() and HRIListener.on_tracked_person_lost().
        for person_id, person in hri_listener.tracked_persons.items():
            rospy.loginfo("Detected personID %s with matching: faceID: %s, bodyID: %s, voiceID: %s",
                          person_id, person.face_id, person.body_id, person.voice_id)
        rate.sleep()

Results#

If only one person is detected, the test node above (here, the Python version) should produce output similar to:

[INFO] [1689001257.952060]: Detected personID lbytj with matching: faceID: lxgep, bodyID: dpmxy, voiceID: None

The steps of the social perception pipeline that contribute to this outcome are:

  • hri_face_detect detects the face and assigns it a temporary face ID (lxgep in the output above);

  • hri_fullbody detects the body and assigns it a temporary body ID (dpmxy);

  • hri_face_identification matches the face to the permanently identified person (lbytj);

  • hri_face_body_matcher matches the body to the face;

  • hri_person_manager fuses these candidate matches into a single, consistent person instance.

Since no voice perception pipeline is running, the voiceID remains None.

If the detected person exits the scene and then re-enters, the temporary face and body IDs will change, while the person ID will remain the same.

If hri_face_identification is stopped, the face and body will still be associated with each other, but this time to a temporary person ID of the form anonymous_person_<ID>.
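If your application needs to tell permanently identified persons apart from such anonymous ones, a simple check on the ID prefix is enough. A minimal sketch, reusing the structure of the Python example above:

#! /usr/bin/env python3

# Minimal sketch: distinguish permanently identified persons from temporary
# anonymous ones by the 'anonymous_person_' ID prefix mentioned above.
import rospy
from pyhri import HRIListener

if __name__ == "__main__":
    rospy.init_node("anonymous_check")
    hri_listener = HRIListener()
    rate = rospy.Rate(1)
    while not rospy.is_shutdown():
        for person_id in hri_listener.tracked_persons:
            if person_id.startswith("anonymous_person_"):
                rospy.loginfo("%s: anonymous (not yet identified)", person_id)
            else:
                rospy.loginfo("%s: permanently identified", person_id)
        rate.sleep()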