Tutorial: detect people oriented toward the robot (Python)#

🏁 Goal of this tutorial

By the end of this tutorial, you will know how to establish which bodies are oriented toward the robot using pyhri and its API. This information can be useful when you want ARI to be socially proactive, for instance to attract people who might be looking at it from a distance, unsure whether to come closer or not.

Pre-requisites#

  • This tutorial requires you to be familiar with the concepts of reference frames, 3D rotation transformations and 3D translation transformations.

  • This tutorial assumes that you already have an up and running body detection pipeline. The pipeline must be ROS4HRI compatible. If you do not have one available, you can start using hri_fullbody.

  • This tutorial assumes that you already have installed pyhri. If you have not, please refer to the official repo.

Note

ARI already comes with hri_fullbody and pyhri installed.

The code#

One possible implementation of a node detecting the bodies oriented toward the robot is the following:

node.py#
#!/usr/bin/env python3

import rospy
from tf import transformations as t

import numpy as np

import hri
from hri_msgs.msg import IdsList


class BodyOrientationListener:

    def __init__(self,
                 base_frame="camera_link",
                 threshold=30):
        """ Constructor, defines some of the class
            objects that will be used later in the code """

        # HRIListener objects help manage
        # some ROS4HRI related aspects.
        # Refer to the official API documentation
        # for a better understanding
        self.hri_listener = hri.HRIListener()

        # the name of the frame representing the robot
        self.base_frame = base_frame

        # half of the amplitude of the attention cone
        self.threshold = threshold

        # the frequency for the run function execution
        self.rate = rospy.Rate(1)


    def run(self):
        """ The run function implements the main
            functionality of the BodyOrientationListener
            object, that is, understanding which bodies
            are oriented toward the robot. Here, the
            base_frame specified during the object
            initialisation represents the robot. """

        # a loop based on the status of the node itself.
        # It keeps looping until the node receives
        # a shutdown signal (a change in the shutdown flag)
        while not rospy.is_shutdown():

            # an iterable object representing the bodies
            # detected through the body detection pipeline
            bodies = self.hri_listener.bodies.items()

            # the list of the ids representing
            # the bodies oriented toward the robot
            bodies_facing_robot = []

            # this loop iterates over every single body
            # detected by the body detection pipeline
            for body in bodies:
                # the body frame to base frame
                # (that is, the robot frame) transformation is required.
                # Since the pyhri API provides the base frame to
                # body frame transformation, we have to invert it

                # a TransformStamped object representing the base frame
                # to body frame transformation
                transform = body[1].transform(self.base_frame)

                # the translational and rotational components
                # of the base frame to body frame transformation,
                # expressed as a 3D vector and a quaternion
                trans = transform.transform.translation
                rot = transform.transform.rotation

                # the homogeneous transformation matrix representing
                # the base frame to body frame transformation
                # (translation only)
                translation_matrix = t.translation_matrix((trans.x,
                                                           trans.y,
                                                           trans.z))

                # the homogeneous transformation matrix representing
                # the base frame to body frame transformation
                # (rotation only)
                quaternion_matrix = t.quaternion_matrix((rot.x,
                                                         rot.y,
                                                         rot.z,
                                                         rot.w))

                # the homogeneous transformation matrix representing
                # the base frame to body frame transformation
                transform = t.concatenate_matrices(translation_matrix,
                                                   quaternion_matrix)

                # the inverse of the transform matrix,
                # that is, the homogeneous transformation matrix
                # for the body frame to base frame transformation
                inv_transform = t.inverse_matrix(transform)

                # b2r = body to robot

                # the x and y components of the body frame
                # to base frame translation transformation
                b2r_translation_x = inv_transform[0, 3]
                b2r_translation_y = inv_transform[1, 3]

                # the norm of the projection on the xy plane
                # of the body frame to base frame translation
                b2r_xy_norm = np.linalg.norm([b2r_translation_x,
                                              b2r_translation_y],
                                             ord=2)

                # this if statement checks whether the base frame
                # lies inside the body frame-based cone of attention
                # with 2*threshold amplitude or not. When it does,
                # the body id is appended to the list of the bodies
                # oriented toward the robot
                if np.arccos(b2r_translation_x/b2r_xy_norm) < \
                   (self.threshold/180*np.pi) and b2r_translation_x > 0:
                    bodies_facing_robot.append(body[0])

            # instruction printing the list of bodies oriented
            # toward the robot
            print("Bodies facing the robot: ", bodies_facing_robot)

            # sleep instruction. In this case, the node will sleep
            # for 1 second, considering that the rate is 1
            self.rate.sleep()


if __name__ == "__main__":

    # When running a ROS node, it's required to invoke
    # the init_node function. Here, it's possible to
    # specify the name of the node
    rospy.init_node("body_orientation_listener")

    # bol is a BodyOrientationListener object
    bol = BodyOrientationListener(threshold=20)

    # this instruction starts the bodies orientation processing
    bol.run()

The code explained#

Note

In this section, we will use the pyhri and tf APIs. If you are not familiar with them, or want to take a deeper dive, refer to the pyhri API documentation and the tf API documentation.

As a first step, the code defines a new class, BodyOrientationListener, that will later process the bodies' orientations to establish which ones are oriented toward the robot.

node.py#
class BodyOrientationListener:

    def __init__(self,
                 base_frame="camera_link",
                 threshold=30):

        self.hri_listener = hri.HRIListener()

        self.base_frame = base_frame

        self.threshold = threshold

        self.rate = rospy.Rate(1)

Here, self.hri_listener is an HRIListener object. HRIListener abstracts away some ROS4HRI aspects: for instance, it manages the callbacks that read the lists of detected bodies, faces, voices and persons. This way, you don’t have to define the same callbacks over and over again in different ROS nodes. Later, you will discover more about the HRIListener and pyhri API capabilities.
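For example, a minimal standalone sketch using HRIListener to print the ids of the currently detected bodies and faces could look like this (the one-second pause simply gives the internal subscribers time to receive their first messages):

#!/usr/bin/env python3

import rospy
import hri

rospy.init_node("hri_listener_example")

# HRIListener subscribes to the ROS4HRI topics under the hood
listener = hri.HRIListener()

# give the internal callbacks some time to receive the first messages
rospy.sleep(1.)

# bodies and faces are dictionaries indexed by their ROS4HRI id
print("Detected bodies: ", list(listener.bodies.keys()))
print("Detected faces: ", list(listener.faces.keys()))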

In the constructor, you have three more objects initialized:

  • self.base_frame defines the robot-related frame to use in the bodies orientation evaluation process.

  • self.threshold defines half the amplitude of the attention cone, i.e. the cone with origin in the body frame within which the base frame must lie for the body to be considered oriented toward the robot.

  • self.rate defines the rate for the execution of the evaluation process.

The evaluation process happens entirely in the run function. Here you can find, within a typical while not rospy.is_shutdown(): loop, the geometric process that establishes which bodies are oriented toward the robot.

BodyOrientationListener.run()#
bodies = self.hri_listener.bodies.items()

bodies_facing_robot = []

for body in bodies:

Here, the HRIListener object previously instantiated obtains the ids and objects representing the detected bodies. In fact, HRIListener.bodies returns a dictionary where the keys are the body ids and the values are the corresponding hri.Body objects. bodies_facing_robot is an empty list we will use to keep track of the bodies oriented toward the robot. Finally, a loop over the detected bodies starts: let’s see what’s inside it!
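Since HRIListener.bodies is a plain dictionary, an equivalent and arguably more readable way to write the same loop is to unpack the id and the hri.Body object directly, instead of indexing the (id, body) tuples with body[0] and body[1]:

# drop-in alternative for the loop used in this tutorial
for body_id, body in self.hri_listener.bodies.items():
    transform = body.transform(self.base_frame)
    # ... same geometric reasoning as in the following sections, then:
    # bodies_facing_robot.append(body_id)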

BodyOrientationListener.run()#
transform = body[1].transform(self.base_frame)

Here, you can see another example where the pyhri API simplifies the management of human-related information. Since the body detection pipeline is ROS4HRI compatible, it publishes a frame for each detected body, representing its pose in space. The frame syntax is body_<body_id>. Without any need to create a tf.TransformListener object, the hri.Body object provides the transformation between the base frame (self.base_frame) and the aforementioned body frame.
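For comparison, a rough equivalent with the plain tf API would require you to instantiate your own listener and query the transform by frame name. In this sketch, body_abcde is a made-up body id, and the frame name simply follows the body_<body_id> convention mentioned above:

import rospy
import tf

rospy.init_node("manual_tf_lookup")

tf_listener = tf.TransformListener()

body_frame = "body_abcde"   # hypothetical body id published by the pipeline
base_frame = "camera_link"

# wait for the transform to become available, then query it
tf_listener.waitForTransform(base_frame, body_frame,
                             rospy.Time(0), rospy.Duration(1.0))
trans, rot = tf_listener.lookupTransform(base_frame, body_frame,
                                         rospy.Time(0))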

Now, the idea is to start working from a body perspective, not a robot one. In fact, to understand whether a person is facing the robot or not, you are not interested in the position of the person with respect to the robot, but rather in the position of the robot with respect to the body frame. If the body is oriented toward the robot, with a body frame following the REP 155 frame definition, then the robot frame (i.e., the base frame) origin expressed in body frame coordinates will have a positive \(x\) component and a relatively small \(y\) component.

../_images/body_toward_robot.svg

Fig 1. Human oriented toward the robot.#

../_images/body_not_toward_robot.svg

Fig 2. Human not oriented toward the robot.#

In Fig. 1, you can see how the \(d_{b2r}\) vector (where \(b2r\) stands for body-to-robot), expressed in body frame coordinates, has a relatively small \(y\) component compared to the \(x\) one. In contrast, in Fig. 2 the \(d_{b2r}\) vector has a larger \(y\) component in body frame coordinates, suggesting that the human is not oriented toward the robot.

The transform previously highlighted in the code provides the base frame to body frame transformation. Therefore, to execute the geometric reasoning we have previously described, the transformation needs to be inverted.

BodyOrientationListener.run()#
trans = transform.transform.translation
rot = transform.transform.rotation

translation_matrix = t.translation_matrix((trans.x,
                                           trans.y,
                                           trans.z))

quaternion_matrix = t.quaternion_matrix((rot.x,
                                         rot.y,
                                         rot.z,
                                         rot.w))

transform = t.concatenate_matrices(translation_matrix,
                                   quaternion_matrix)
inv_transform = t.inverse_matrix(transform)

In these lines, you obtain the translation and rotation homogeneous matrices, multiply them (t.concatenate_matrices) and finally invert the result, obtaining the homogeneous matrix representing the body frame to base frame transformation. The fourth column of this matrix is the base frame origin expressed in body frame coordinates, that is, \(d_{b2r}\) from Fig. 1 and Fig. 2.
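To convince yourself that the fourth column behaves as described, you can run a quick check with made-up numbers: a body standing 2 m in front of the camera and rotated 180° around \(z\), i.e. facing the robot.

import numpy as np
from tf import transformations as t

# illustrative base frame to body frame transformation:
# the body is 2 m in front of the camera, facing back toward it
transform = t.concatenate_matrices(
    t.translation_matrix((2.0, 0.0, 0.0)),
    t.quaternion_matrix(t.quaternion_from_euler(0.0, 0.0, np.pi)))

# body frame to base frame transformation
inv_transform = t.inverse_matrix(transform)

# fourth column: base frame origin in body frame coordinates
print(inv_transform[:3, 3])   # approximately [2, 0, 0]: the robot is straight ahead of the body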

At this point, all you have to do is check whether the base frame origin lies inside the attention cone defined by the threshold parameter set earlier.

BodyOrientationListener.run()#
# b2r = body to robot

b2r_translation_x = inv_transform[0, 3]
b2r_translation_y = inv_transform[1, 3]

b2r_xy_norm = np.linalg.norm([b2r_translation_x,
                              b2r_translation_y],
                             ord=2)

if np.arccos(b2r_translation_x/b2r_xy_norm) < \
   (self.threshold/180*np.pi) and b2r_translation_x > 0:
    bodies_facing_robot.append(body[0])

In the first lines, we compute the norm of the projection of \(d_{b2r}\) on the body frame \(xy\) plane. Then, we use it to compute the angle between the body frame \(x\) axis and the \(d_{b2r}\) vector. If this angle is smaller than the threshold (and the robot is actually in front of the body, i.e. \(x_{b2r} > 0\)), the body id is appended to bodies_facing_robot.
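In formulas, the condition implemented by the if statement reads:

\[
\theta \,=\, \arccos\!\left(\frac{x_{b2r}}{\sqrt{x_{b2r}^{2} + y_{b2r}^{2}}}\right) \,<\, \theta_{thr}\,\frac{\pi}{180}
\quad \text{and} \quad x_{b2r} > 0,
\]

where \(x_{b2r}\) and \(y_{b2r}\) are the first two components of \(d_{b2r}\) and \(\theta_{thr}\) is the threshold parameter expressed in degrees.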

The last few lines of the run function are:

BodyOrientationListener.run()#
1 print("Bodies facing the robot: ", bodies_facing_robot)
2
3 self.rate.sleep()

Here, the node prints the ids of the bodies detected as oriented toward the robot and puts itself to sleep for \(\frac{1}{rate}\) seconds.
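Instead of (or in addition to) printing, you could publish the result so that other nodes can react to it. A possible sketch, reusing the IdsList message already imported at the top of node.py (the /bodies_facing_robot topic name is made up for this example), would be:

# in the constructor:
self.facing_pub = rospy.Publisher("/bodies_facing_robot",
                                  IdsList, queue_size=1)

# in run(), next to the print instruction:
msg = IdsList()
msg.ids = bodies_facing_robot
self.facing_pub.publish(msg)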

In main, you can find the instructions to initialise the node and start the body orientation evaluation process:

node.py#
1 if __name__=="__main__":
2
3     rospy.init_node("body_orientation_listener")
4     bol = BodyOrientationListener(threshold=20)
5
6     bol.run()

Next steps#

  • The node you have developed through this tutorial does not really use the information about the bodies oriented toward the robot. You might think about a cool behaviour exploiting this information and implement it!

  • You might argue that face and gaze orientations could tell us more about a person’s engagement with the robot… and you would be right! Check this tutorial for a possible implementation of an engagement detection node based on face and gaze orientation.

  • If your’re interested in the C++ version of this tutorial, check the libhri tutorial.