How-to: How to match face, body and voice of a person#
On this page, you will learn how the social perception pipeline of PAL's robots matches detected faces, bodies and voices into a single person instance.
Note
Currently, only face-to-person and face-to-body matching are described on this page.
Prerequisites#
Description#
PAL's robots run multiple perception pipelines, each extracting a different feature of a person, such as their face, body or voice. For instance, hri_face_detect detects faces in the camera images, while hri_fullbody estimates the skeletal points of the bodies.
To achieve a holistic representation of a person, these features must be associated with each other and with the permanently identified person they belong to. Some features are matched directly to a person; others are matched indirectly, through another feature that has already been matched to a person.
For example, faces can be reliably and uniquely matched to a person (this is what hri_face_identification does), while bodies are more easily associated with the corresponding face in the scene (that is hri_face_body_matcher's job) than directly with a person.
These candidate associations can have different degrees of confidence and may even be contradictory when taken together. The final step of the multi-modal fusion in the social perception pipeline (performed by hri_person_manager) collects all such candidate matches into a consistent, holistic representation of the people in the scene.
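To give an intuition of this fusion step, the following sketch groups feature ids that are transitively connected by candidate matches, using a plain connected-components traversal. This is an illustration only, not hri_person_manager's actual algorithm (which, among other things, also weighs the confidence of each match); the ids and confidence values are made up.

```python
# Illustrative sketch only: group feature ids (face, body, voice, person)
# that are transitively connected by candidate matches.
from collections import defaultdict


def fuse(candidate_matches):
    """Return groups of feature ids connected by candidate matches."""
    # Build an undirected adjacency map from the candidate matches
    adjacency = defaultdict(set)
    for id1, id2, _confidence in candidate_matches:
        adjacency[id1].add(id2)
        adjacency[id2].add(id1)

    seen, groups = set(), []
    for node in adjacency:
        if node in seen:
            continue
        # Depth-first traversal of one connected component
        group, stack = set(), [node]
        while stack:
            current = stack.pop()
            if current in group:
                continue
            group.add(current)
            stack.extend(adjacency[current])
        seen |= group
        groups.append(group)
    return groups


# Made-up ids: face 'lxgep' matched to person 'lbytj' and to body 'dpmxy'
matches = [("face:lxgep", "person:lbytj", 0.9),
           ("face:lxgep", "body:dpmxy", 0.8)]
print(fuse(matches))  # one group containing the face, body and person ids
```

Because the face is matched both to the person and to the body, the body ends up in the same group as the person even though no direct body-to-person match was ever proposed: this is the indirect matching described above.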
Usage#
As described in the ROS4HRI standard, the detected person information is published under the /humans/persons/<personID>/ namespace. In particular, the matching face, body and voice ids are published on the /face_id, /body_id and /voice_id sub-topics.
We can make use of this information, for instance to add an entry to the knowledge base (that is what people_facts does!). In the following, we present two examples, one in C++ and one in Python, showing how to print this information using the helper libraries libhri and pyhri.
C++ example#
#include <hri/hri.h>
#include <ros/ros.h>

int main(int argc, char** argv)
{
  ros::init(argc, argv, "test");
  hri::HRIListener hri_listener{};
  ros::Rate loop_rate(1);
  while (ros::ok()) {
    /*
     * Here we poll the tracked persons with HRIListener::getTrackedPersons(), but one could
     * alternatively set up callbacks on each new person tracked and lost using
     * HRIListener::onTrackedPerson() and HRIListener::onTrackedPersonLost()
     */
    for (auto const& person_kv : hri_listener.getTrackedPersons()) {
      if (auto person = person_kv.second.lock())
      {
        ROS_INFO("Detected personID %s with matching: faceID: %s, bodyID: %s, voiceID: %s",
                 person_kv.first.c_str(), person->face_id.c_str(),
                 person->body_id.c_str(), person->voice_id.c_str());
      }
    }
    loop_rate.sleep();
    ros::spinOnce();
  }
  return 0;
}
Python example#
#! /usr/bin/env python3

import rospy
from pyhri import HRIListener

if __name__ == "__main__":
    rospy.init_node("test")
    hri_listener = HRIListener()
    rate = rospy.Rate(1)
    while not rospy.is_shutdown():
        # Here we poll the tracked persons with HRIListener.tracked_persons, but one could
        # alternatively set up callbacks on each new person tracked and lost using
        # HRIListener.on_tracked_person() and HRIListener.on_tracked_person_lost()
        for person_id, person in hri_listener.tracked_persons.items():
            rospy.loginfo("Detected personID %s with matching: faceID: %s, bodyID: %s, voiceID: %s",
                          person_id, person.face_id, person.body_id, person.voice_id)
        rate.sleep()
Results#
If only one person is detected, the test node above (in particular the Python version) should produce output similar to:
[INFO] [1689001257.952060]: Detected personID lbytj with matching: faceID: lxgep, bodyID: dpmxy, voiceID: None
The steps of the social perception pipeline that contribute to this outcome are:
1. The hri_face_detect node assigns the temporary id lxgep to the detected face and publishes it on the /humans/faces/tracked topic.
2. The hri_fullbody node assigns the temporary id dpmxy to the detected body and publishes it on the /humans/bodies/tracked topic.
3. The hri_face_identification node proposes a candidate match between the temporary face id lxgep and the permanent person id lbytj, and publishes it on the /humans/candidate_matches topic.
4. The hri_face_body_matcher node associates the temporary face id lxgep with the temporary body id dpmxy, and publishes it on the /humans/candidate_matches topic.
5. The hri_person_manager node evaluates the candidate matches and publishes the person-related topics.
If the detected person exits the scene and then re-enters, the temporary face and body ids change, while the person id remains the same.
If hri_face_identification is stopped, the face and body would still be associated with each other, but this time to a temporary person id anonymous_person_<ID>.
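To picture this fallback, here is a small, self-contained sketch (again, not hri_person_manager's actual algorithm): given groups of feature ids that have already been associated with each other, each group is labelled with its permanent person id if one is present, and with a generated anonymous person id otherwise. The anonymous_person_ prefix matches the one used by hri_person_manager, but the numbering scheme here is made up, and the ids are the example ones from above.

```python
import itertools


def label_groups(groups):
    """Map each group of associated feature ids to a person id.

    If a group contains a permanent person id (e.g. one proposed by
    hri_face_identification), use it; otherwise fall back to a generated
    anonymous person id. The numbering scheme is made up for illustration.
    """
    counter = itertools.count(1)
    labels = {}
    for group in groups:
        person_ids = sorted(i for i in group if i.startswith("person:"))
        labels[frozenset(group)] = (
            person_ids[0] if person_ids
            else f"person:anonymous_person_{next(counter)}"
        )
    return labels


# With hri_face_identification running, the face/body group includes a
# permanent person id, which labels the whole group:
print(label_groups([{"face:lxgep", "body:dpmxy", "person:lbytj"}]))
# Without it, the face and body are still associated with each other,
# but the group is labelled with an anonymous person id:
print(label_groups([{"face:lxgep", "body:dpmxy"}]))
```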