Multi-modal matching of faces, bodies and voices#
On this page, you will learn how your robot's social perception pipeline matches the detected faces, bodies and voices into a single person instance.
General principle#
Multiple pipelines run on your robot, each providing the perception of
different features of a person, such as their face, body or voice.
For instance, hri_face_detect
detects faces in the camera
images, while hri_body_detect
detects the skeletal points of the
bodies.
To achieve a holistic representation of a person, these features must be associated with each other and with the permanently identified person owning them. Some features are directly matched to a person, others indirectly, by first matching to another feature already matched to a person.
For example, faces can be reliably and uniquely matched to a person (this is
what hri_face_identification
does), while bodies are more easily associated
with the corresponding face in the scene (that's
hri_face_body_matcher's
job) than directly with a person.
These possible associations can have different degrees of confidence
and may even be mutually contradictory when taken together.
The final step of the multi-modal fusion in the social perception pipeline
(performed by hri_person_manager
) is to collect all such
candidate matches into a consistent, holistic representation of the people in the scene.
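As an illustration of what a candidate match looks like, the sketch below publishes a single match by hand on the /humans/candidate_matches topic. It assumes the hri_msgs/IdsMatch message defined by the ROS4HRI standard, with two IDs, their types and a confidence score; the IDs themselves are made up, and you should check the message definition shipped with your hri_msgs version.

# Minimal sketch: publishing one hand-crafted candidate match, assuming the
# hri_msgs/IdsMatch message from the ROS4HRI standard. The IDs are made up.
import time

import rclpy
from rclpy.node import Node
from hri_msgs.msg import IdsMatch

rclpy.init()
node = Node("candidate_match_example")
pub = node.create_publisher(IdsMatch, "/humans/candidate_matches", 10)

match = IdsMatch()
match.id1 = "face_0"              # a (made up) temporary face ID
match.id1_type = IdsMatch.FACE
match.id2 = "body_0"              # a (made up) temporary body ID
match.id2_type = IdsMatch.BODY
match.confidence = 0.8            # how confident the matcher is

time.sleep(0.5)  # crude way to let subscribers connect in this one-shot script
pub.publish(match)
rclpy.shutdown()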
Note
Currently, only face-to-person and face-to-body matching are described on this page.
Testing the multi-modal fusion#
Prerequisites#
The following instructions require you to be familiar with the ROS command-line tools.
The following nodes should be running: hri_face_detect, hri_face_identification, hri_body_detect, hri_face_body_matcher and hri_person_manager.
Note
If you are working directly with a PAL robot, these nodes are started by default. You have nothing special to do, unless you have disabled the modules running them. See the application management page for more information.
If you are working in a Docker image with no access to the robot and want to run everything locally, first start the USB camera (more details on the How to launch the different components for person detection page):
ros2 run usb_cam usb_cam_node_exe --ros-args -p camera_info_url:="file:///<calibration_file_location>"
Then start the nodes with the following commands:
ros2 launch hri_face_detect face_detect_with_args.launch.py
ros2 launch hri_face_identification face_identification.launch.py
ros2 launch hri_body_detect hri_body_detect_with_args.launch.py use_depth:=false
ros2 launch hri_face_body_matcher hri_face_body_matcher.launch.py
ros2 launch hri_person_manager person_manager.launch.py
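Once everything is launched, you can check that the nodes are up with the standard ROS 2 CLI (note that the exact node names may differ slightly from the package names):

ros2 node list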
Testing#
As described in the ROS4HRI standard, each detected person's
information is published under the /humans/persons/<personID>/
namespace.
In particular, the matching face, body and voice IDs are published on the sub-topics
/face_id
, /body_id
and /voice_id
. You can simply echo these topics to
see the current matching of the detected persons with faces, bodies and voices:
ros2 topic echo /humans/persons/<personID>/face_id
ros2 topic echo /humans/persons/<personID>/body_id
ros2 topic echo /humans/persons/<personID>/voice_id
We can make use of this information, for instance to add an entry to the knowledge base (that's what people_facts does!).
Below we present two examples, one in Python and one
in C++, showing how to print this information using the helper libraries
pyhri
and libhri
. You can learn more about these libraries in the
Tutorial: Detect people around the robot (C++) and Tutorial: Detect people oriented toward the robot (Python) pages.
import rclpy
from rclpy.node import Node

import hri

if __name__ == "__main__":
    rclpy.init()
    node = Node("test")
    hri_listener = hri.HRIListener("hri_listener_node")
    while rclpy.ok():
        # Here we poll the tracked persons with HRIListener.tracked_persons,
        # but one could alternatively set up callbacks on each new person
        # tracked and lost using HRIListener.on_tracked_person() and
        # HRIListener.on_tracked_person_lost().
        for id, person in hri_listener.tracked_persons.items():
            face_id = ""
            body_id = ""
            voice_id = ""
            face = person.face
            body = person.body
            voice = person.voice
            if face is not None:
                face_id = face.id
            if body is not None:
                body_id = body.id
            if voice is not None:
                voice_id = voice.id
            node.get_logger().info(
                f"Detected personID {id} with matching: "
                f"faceID: {face_id}, bodyID: {body_id}, voiceID: {voice_id}")
        # Process pending callbacks and throttle the loop to roughly 1 Hz.
        rclpy.spin_once(node, timeout_sec=1.0)
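To try the Python example, save it to a file (for instance print_persons.py, a name chosen here purely for illustration) and run it with python3 while the pipeline is up. The equivalent C++ version follows: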
#include <hri/hri.hpp>
#include "rclcpp/rclcpp.hpp"

using namespace std::chrono_literals;

class PrintPersons : public rclcpp::Node
{
public:
  PrintPersons()
  : rclcpp::Node("hri_example")
  {}

  void init()
  {
    // "shared_from_this()" cannot be used in the constructor!
    hri_listener_ = hri::HRIListener::create(shared_from_this());
    timer_ = create_wall_timer(1000ms, std::bind(&PrintPersons::timer_callback, this));
  }

  void timer_callback()
  {
    for (auto const & person_kv : hri_listener_->getTrackedPersons()) {
      if (auto person = person_kv.second) {
        std::string face_id = "";
        std::string body_id = "";
        std::string voice_id = "";
        if (person->face()) {
          face_id = person->face()->id();
        }
        if (person->body()) {
          body_id = person->body()->id();
        }
        if (person->voice()) {
          voice_id = person->voice()->id();
        }
        RCLCPP_INFO(
          get_logger(), "Detected personID %s with matching: faceID: %s, bodyID: %s, voiceID: %s",
          person_kv.first.c_str(), face_id.c_str(), body_id.c_str(), voice_id.c_str());
      }
    }
  }

private:
  std::shared_ptr<hri::HRIListener> hri_listener_;
  rclcpp::TimerBase::SharedPtr timer_;
};

int main(int argc, char * argv[])
{
  rclcpp::init(argc, argv);
  auto node = std::make_shared<PrintPersons>();
  node->init();
  rclcpp::spin(node->get_node_base_interface());
  rclcpp::shutdown();
  return 0;
}
Results#
After running one of the test nodes above (the output shown corresponds to the Python version), you should see output similar to the following in the command line when one person is detected:
[INFO] [1689001257.952060]: Detected personID lbytj with matching: faceID: lxgep, bodyID: dpmxy, voiceID:
The steps of the social perception pipeline that contribute to this outcome are:

1. The hri_face_detect node assigns the temporary ID lxgep to the detected face and publishes it on the /humans/faces/tracked topic.
2. The hri_body_detect node assigns the temporary ID dpmxy to the detected body and publishes it on the /humans/bodies/tracked topic.
3. The hri_face_identification node proposes a candidate match between the temporary face ID lxgep and the permanent person ID lbytj, and publishes it on the /humans/candidate_matches topic.
4. The hri_face_body_matcher node associates the temporary face ID lxgep with the temporary body ID dpmxy and publishes it on the /humans/candidate_matches topic.
5. The hri_person_manager node evaluates the candidate matches and publishes the person-related topics.
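You can observe steps 3 and 4 live by echoing the candidate matches topic while a person is in front of the camera:

ros2 topic echo /humans/candidate_matches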
If the detected person exits the scene and then re-enters, the temporary face and body IDs change, while the person ID remains the same.
If the hri_face_identification
node is stopped, the face and body would
still be associated with each other, but this time with a temporary person id of the form
anonymous_person_<ID>
.
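At run time, a program can tell the two cases apart. The snippet below (reusing hri_listener and node from the Python example above) assumes that pyhri's Person exposes an anonymous attribute mirroring the /humans/persons/<personID>/anonymous topic; check your pyhri version for the exact name.

# Sketch: distinguishing permanently identified from anonymous persons,
# assuming pyhri's Person exposes an `anonymous` attribute.
for id, person in hri_listener.tracked_persons.items():
    if person.anonymous:
        node.get_logger().info(f"{id} is anonymous: not yet identified by face")
    else:
        node.get_logger().info(f"{id} is a permanently identified person")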
See also#
Social perception for an overview of the social perception pipeline
Social perception topics for the full list of topics published by the social perception pipeline