Applies to: TIAGo Pro, Kangaroo, TIAGo, ARI, TALOS and mobile bases.

Tutorial: Detect people oriented toward the robot (Python)#

๐Ÿ Goal of this tutorial

By the end of this tutorial, you will know how to establish which bodies are oriented towards the robot using pyhri and its API. This information is useful when you want the robot to be socially proactive and attract people who might be looking at it from a distance, unsure whether to come closer or not.

Prerequisites#

  • This tutorial requires you to be familiar with the concepts of reference frames, 3D rotation transformations and 3D translation transformations.

  • This tutorial assumes that you already have an up and running body detection pipeline. The pipeline must be ROS4HRI compatible. If you do not have one available, you can start using hri_fullbody.

  • This tutorial assumes that you already have installed pyhri. If you do not have it yet, please refer to the official repo.

Note

Your robot already comes with hri_fullbody and pyhri installed.

The code#

We next provide an example of a node that detects bodies oriented towards the robot.

node.py#
#!/usr/bin/env python3

import rclpy
from rclpy.node import Node
import tf_transformations as t

import numpy as np
import hri


class BodyOrientationListener(Node):

    def __init__(self, base_frame="camera_link", threshold=30):
        """ Constructor, defines some of the class
            objects that will be used later in the code
        """

        super().__init__('body_orientation_listener')
        # HRIListener objects help managing
        # some ROS4HRI related aspects.
        # Refer to the official API documentation
        # for a better understanding
        self.hri_listener = hri.HRIListener("hri_listener_node")
        self.hri_listener.set_reference_frame(base_frame)

        # half of the amplitude of the attention cone
        self.threshold = threshold

        # the frequency for the run function execution
        timer_period = 1  # seconds
        self.timer = self.create_timer(timer_period, self.run)

    def run(self):
        """ The run function implements the main functionality of the
        BodyOrientationListener object: computing which bodies
        are oriented toward the robot. The base_frame specified
        during the object initialisation represents the robot.
        """

        # an iterable object representing the bodies
        # detected through the body detection pipeline
        bodies = self.hri_listener.bodies

        # the list of the ids representing
        # the bodies oriented toward the robot
        bodies_facing_robot = []

        # this loop iterates over every single body
        # detected by the body detection pipeline
        for body in bodies:
            print("Evaluating body id: %s" % body)
            # the body frame to base frame
            # (that is, the robot frame) transformation is required.
            # Since the pyhri API provides the base frame to
            # body frame transformation, we have to invert it.
            # transform is a stamped transform representing the
            # base frame to body frame transformation
            if bodies[body].valid:  # make sure it still has valid info
                transform = bodies[body].transform
            else:
                print("Body %s information not valid anymore" % body)
                continue

            # the translational and rotational components
            # of the base frame to body frame transformation,
            # expressed as a 3D vector and a quaternion
            trans = transform.transform.translation
            rot = transform.transform.rotation

            # the homogeneous transformation matrix representing
            # the base frame to body frame transformation
            # (translation only)
            translation_matrix = t.translation_matrix((trans.x,
                                                       trans.y,
                                                       trans.z))

            # the homogeneous transformation matrix representing
            # the base frame to body frame transformation
            # (rotation only)
            quaternion_matrix = t.quaternion_matrix((rot.x,
                                                     rot.y,
                                                     rot.z,
                                                     rot.w))

            # the homogeneous transformation matrix representing
            # the base frame to body frame transformation
            transform = t.concatenate_matrices(translation_matrix,
                                               quaternion_matrix)

            # the inverse of the transform matrix,
            # that is, the homogeneous transformation matrix
            # for the body frame to base frame transformation
            inv_transform = t.inverse_matrix(transform)

            # b2r = body to robot
            # the x and y components of the body frame
            # to base frame translation transformation
            b2r_translation_x = inv_transform[0, 3]
            b2r_translation_y = inv_transform[1, 3]

            # the norm of the projection on the xy plane
            # of the body frame to base frame translation
            b2r_xy_norm = np.linalg.norm([b2r_translation_x,
                                          b2r_translation_y],
                                         ord=2)

            # this if statement checks whether the base frame
            # lies inside the body frame-based cone of attention
            # with 2*threshold amplitude or not. When it does,
            # the body id is appended to the list of the bodies
            # oriented toward the robot
            if ((np.arccos(b2r_translation_x/b2r_xy_norm) <
                    (self.threshold/180*np.pi)) and
                    b2r_translation_x > 0):
                bodies_facing_robot.append(body)

        # print the list of bodies oriented towards the robot
        print("Bodies facing the robot: ", bodies_facing_robot)


if __name__ == "__main__":
    # initialise rclpy library
    rclpy.init(args=None)

    # instantiate a BodyOrientationListener
    bol = BodyOrientationListener(
        base_frame="default_cam",  # If using webcam
        threshold=20)

    # spin BodyOrientationListener so its callback (the run function) is
    # called
    rclpy.spin(bol)

The code explained#

Note

In this section, we will use the pyhri and tf APIs. If you are not familiar with them, or want to take a deeper dive, refer to the pyhri API documentation and the tf API documentation.

As a first step, the code defines a new class, BodyOrientationListener. This class processes the bodies' orientation to establish which ones are oriented towards the robot.

node.py#
class BodyOrientationListener(Node):

    def __init__(self, base_frame="camera_link", threshold=30):
        """
            Constructor, defines some of the class
            objects that will be used later in the code
        """
        super().__init__('body_orientation_listener')
        # HRIListener objects help managing
        # some ROS4HRI related aspects.
        # Refer to the official API documentation
        # for a better understanding
        self.hri_listener = hri.HRIListener("hri_listener_node")
        self.hri_listener.set_reference_frame(base_frame)

        # half of the amplitude of the attention cone
        self.threshold = threshold

        # the frequency for the run function execution
        timer_period = 1  # seconds
        self.timer = self.create_timer(timer_period, self.run)

Here, self.hri_listener is an HRIListener object. HRIListener abstracts away some ROS4HRI aspects: for instance, it manages the callbacks reading the lists of detected bodies, faces, voices and persons. This way, you don't have to define the same callbacks over and over again in different ROS nodes. Later, you will discover more about the HRIListener and pyhri API capabilities.
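To make the role of HRIListener more concrete, here is a minimal standalone sketch that simply lists the detected bodies and their position with respect to the reference frame once per second. It only uses calls already shown in this tutorial; the node names and the camera_link frame are arbitrary choices for the example and should be adapted to your robot.

#!/usr/bin/env python3
# Minimal sketch: periodically list the detected bodies.
# Assumption: a ROS4HRI-compatible body detection pipeline is running
# and "camera_link" exists in your TF tree (adapt the frame as needed).

import rclpy
from rclpy.node import Node
import hri


class BodyLister(Node):
    def __init__(self):
        super().__init__("body_lister")
        # HRIListener manages the ROS4HRI subscriptions for us
        self.hri_listener = hri.HRIListener("body_lister_hri")
        self.hri_listener.set_reference_frame("camera_link")
        self.timer = self.create_timer(1.0, self.on_timer)

    def on_timer(self):
        # bodies is a dictionary: body id -> hri.Body object
        for body_id, body in self.hri_listener.bodies.items():
            if body.valid:
                trans = body.transform.transform.translation
                print(f"body {body_id}: x={trans.x:.2f} "
                      f"y={trans.y:.2f} z={trans.z:.2f}")


if __name__ == "__main__":
    rclpy.init(args=None)
    rclpy.spin(BodyLister())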

We also initialise two more objects:

  • self.threshold defines half of the amplitude of the attention cone, i.e. the cone, originating in the body frame around its \(x\) axis, within which the base frame must lie for the body to be considered oriented toward the robot.

  • self.timer defines a timer that calls the run function every timer_period seconds (in this case, every second).

The core process happens entirely in the run function. It defines the geometric process to evaluate which bodies are oriented towards the robot.

BodyOrientationListener.run()#
bodies = self.hri_listener.bodies

bodies_facing_robot = []

for body in bodies:

The HRIListener object provides a dictionary mapping ids to objects representing the detected bodies. bodies_facing_robot is initialised as an empty list, which will be used to keep track of the bodies oriented towards the robot. We next loop over the bodies. Let's see what happens inside!

BodyOrientationListener.run()#
if bodies[body].valid:  # make sure it still has valid info
    transform = bodies[body].transform
else:
    print("Body %s information not valid anymore" % body)
    continue

Here you can see another example of how the pyhri API simplifies the management of human-related information. Since the body detection pipeline is ROS4HRI compatible, it publishes a frame for each detected body, representing it in space. The frame syntax is body_<body_id>. There is no need to create a tf listener object, since the hri.Body object already provides the transformation between the base frame and the aforementioned body frame. Note that, before accessing the transformation, we make sure the body still carries valid data.
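For comparison, this is roughly the boilerplate that pyhri spares you: looking up the same body frame manually with a tf2 listener. The snippet below is only a sketch; body_abcde is a hypothetical body id and camera_link is an assumed base frame.

# Rough equivalent without pyhri: looking up a body frame with tf2.
# "body_abcde" stands for a hypothetical id published by the pipeline.
import rclpy
from rclpy.node import Node
from rclpy.time import Time
from tf2_ros import TransformException
from tf2_ros.buffer import Buffer
from tf2_ros.transform_listener import TransformListener


class ManualBodyLookup(Node):
    def __init__(self):
        super().__init__("manual_body_lookup")
        self.tf_buffer = Buffer()
        self.tf_listener = TransformListener(self.tf_buffer, self)
        self.timer = self.create_timer(1.0, self.lookup)

    def lookup(self):
        try:
            # base frame -> body frame, latest available transform
            transform = self.tf_buffer.lookup_transform(
                "camera_link", "body_abcde", Time())
            print(transform.transform.translation)
        except TransformException as error:
            self.get_logger().warn(f"lookup failed: {error}")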

Now, the idea is to start working from the body's perspective, not the robot's. In fact, to understand whether a person is facing the robot or not, you are not interested in the position of the person with respect to the robot, but rather in the position of the robot with respect to the body frame. If the body is oriented towards the robot, with a body frame following the REP 155 frame definition, then the robot frame (i.e., the base frame) origin expressed in body frame coordinates will have a positive \(x\) component and a relatively small \(y\) component.

../_images/body_toward_robot.svg

Fig 1. Human oriented toward the robot.#

../_images/body_not_toward_robot.svg

Fig 2. Human not oriented toward the robot.#

In Fig. 1, you can see how \(d_{b2r}\) (where \(b2r\) stands for body-to-robot), expressed in body frame coordinates, has a relatively small \(y\) component compared to the \(x\) one. In contrast, in Fig. 2 \(d_{b2r}\) has a larger \(y\) component in body frame coordinates, suggesting that the human is not oriented toward the robot.
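The two figures can be summarised by the check that the code implements below: calling \(x_{b2r}\) and \(y_{b2r}\) the components of \(d_{b2r}\) in body frame coordinates, the body is considered to be facing the robot when

\[
\theta = \arccos\left(\frac{x_{b2r}}{\sqrt{x_{b2r}^{2} + y_{b2r}^{2}}}\right) < \theta_{threshold}
\quad \text{and} \quad x_{b2r} > 0
\]

where \(\theta_{threshold}\) is the threshold angle passed to the constructor (30° by default).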

The transform previously highlighted in the code provides the base frame to body frame transformation. Therefore, to execute the geometric reasoning we have previously described, the transformation has to be inverted.

BodyOrientationListener.run()#
trans = transform.transform.translation
rot = transform.transform.rotation

translation_matrix = t.translation_matrix((trans.x,
                                           trans.y,
                                           trans.z))

quaternion_matrix = t.quaternion_matrix((rot.x,
                                         rot.y,
                                         rot.z,
                                         rot.w))

transform = t.concatenate_matrices(translation_matrix,
                                   quaternion_matrix)

inv_transform = t.inverse_matrix(transform)

You obtain the translation and rotation homogeneous matrices, multiply them (t.concatenate_matrices) and finally invert the result, obtaining the homogeneous matrix that represents the transformation from the body frame to the robot frame, in body frame coordinates. The fourth column of this matrix is the base frame origin expressed in body frame coordinates, that is, \(d_{b2r}\) from Fig. 1 and Fig. 2.
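To see what this inversion yields in practice, here is a small self-contained example (the numbers are made up purely for illustration): a body standing 2 m in front of the base frame and rotated 180° around the vertical axis, i.e. facing back toward the robot.

import numpy as np
import tf_transformations as t

# Illustrative numbers: the body frame is 2 m in front of the base frame
# and rotated 180 degrees around z, so it faces back toward the robot.
translation_matrix = t.translation_matrix((2.0, 0.0, 0.0))
quaternion_matrix = t.quaternion_matrix(
    t.quaternion_from_euler(0.0, 0.0, np.pi))

# base frame -> body frame homogeneous transformation
transform = t.concatenate_matrices(translation_matrix, quaternion_matrix)

# body frame -> base frame homogeneous transformation
inv_transform = t.inverse_matrix(transform)

# fourth column: the base frame origin expressed in body frame coordinates
print(inv_transform[:3, 3])   # ~[2. 0. 0.]

The positive \(x\) and null \(y\) components confirm that, in this configuration, the base frame lies straight ahead of the body.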

At this point, all you have to do is check whether the base frame origin lies inside the attention cone defined by the previously set threshold parameter.

BodyOrientationListener.run()#
# b2r = body to robot
b2r_translation_x = inv_transform[0, 3]
b2r_translation_y = inv_transform[1, 3]

b2r_xy_norm = np.linalg.norm([b2r_translation_x,
                              b2r_translation_y],
                             ord=2)

if ((np.arccos(b2r_translation_x/b2r_xy_norm) <
        (self.threshold/180*np.pi)) and
        b2r_translation_x > 0):
    bodies_facing_robot.append(body)

We first compute the norm of the \(xy\) projection of the body-to-robot translation. Then, we use it to compute the angle between the body frame \(x\) axis and the \(d_{b2r}\) vector. If this angle is smaller than the threshold (and the \(x\) component is positive), the body id is appended to bodies_facing_robot.
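If you want to get a feel for the numbers, the same check can be evaluated in isolation (a standalone sketch, not part of the node, with illustrative translation values):

import numpy as np

threshold = 30  # degrees, half of the attention cone amplitude

# illustrative body-to-robot translations (x, y), in metres
for b2r_x, b2r_y in [(1.5, 0.2), (1.5, 1.5), (-1.0, 0.3)]:
    norm = np.linalg.norm([b2r_x, b2r_y], ord=2)
    angle = np.degrees(np.arccos(b2r_x / norm))
    facing = b2r_x > 0 and angle < threshold
    print(f"x={b2r_x:+.1f} y={b2r_y:+.1f} -> "
          f"angle={angle:.1f} deg, facing={facing}")

# x=+1.5 y=+0.2 ->   7.6 deg, facing=True
# x=+1.5 y=+1.5 ->  45.0 deg, facing=False
# x=-1.0 y=+0.3 -> 163.3 deg, facing=False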

BodyOrientationListener.run()#
print("Bodies facing the robot: ", bodies_facing_robot)

The last line of the run function prints the ids of the bodies oriented towards the robot.

In the main block, you can find the instructions to initialise the rclpy context and spin the node.

node.py#
if __name__ == "__main__":
    # initialise rclpy library
    rclpy.init(args=None)

    # instantiate a BodyOrientationListener
    bol = BodyOrientationListener(
        base_frame="default_cam",  # If using webcam
        threshold=20)

    # spin BodyOrientationListener so its callback (the run function) is
    # called
    rclpy.spin(bol)
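As a side note, in a longer-lived application you may also want to shut the node down cleanly when the program exits. A common rclpy pattern (not part of the tutorial code above) is a sketch like:

if __name__ == "__main__":
    rclpy.init(args=None)
    bol = BodyOrientationListener(base_frame="default_cam", threshold=20)
    try:
        rclpy.spin(bol)
    except KeyboardInterrupt:
        pass
    finally:
        # release the node and the ROS 2 context on exit
        bol.destroy_node()
        rclpy.shutdown()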

Next steps#

  • The node you have developed through this tutorial does not really use the information about the bodies oriented toward the robot. You might think about a cool behaviour exploiting this information and implement it!

  • You might argue that face and gaze orientations could tell us more about a person's engagement with the robot… and you would be right! Check this tutorial for a possible implementation of an engagement detection node based on face and gaze orientation.

  • If you're interested in the C++ version of this tutorial, check the libhri tutorial.