
Build a complete LLM-enabled interactive app#

🏁 Goal of this tutorial

This tutorial will guide you through the installation and use of the ROS4HRI framework, a set of ROS nodes and tools to build interactive social robots.

We will use a set of pre-configured Docker containers to simplify the setup process.

We will also explore how a simple yet complete social robot architecture can be assembled using ROS 2, PAL Robotics’ toolset to quickly generate robot application templates, and a LLM backend.

Social interaction simulator

PAL’s Social interaction simulator#

PART 0: Preparing your environment#

Pre-requisites#

To follow the tutorial hands-on, you will need to be able to run a Docker container on your machine, with access to an X server (to display graphical applications like rviz and rqt). We will also use your computer's webcam.

Any recent Linux distribution should work, as well as MacOS (with XQuartz installed).

The tutorial also assumes that you have a basic understanding of ROS 2 concepts (topics, nodes, launch files, etc). If you are not familiar with ROS 2, you can check the official ROS 2 tutorials.

Get the public PAL tutorials Docker image#

Fetch the PAL tutorials public Docker image:

docker pull palrobotics/public-tutorials-alum-devel:hri25

Then, run the container, with access to your webcam and your X server.

xhost +
mkdir ros4hri-exchange
docker run -it --name ros4hri \
               --device /dev/video0:/dev/video0 \
               -e DISPLAY=$DISPLAY \
               -v /tmp/.X11-unix:/tmp/.X11-unix \
               -v `pwd`/ros4hri-exchange:/home/user/exchange \
               --net=host \
               palrobotics/public-tutorials-alum-devel:hri25 bash

Note

The --device option is used to pass the webcam to the container, and the -e DISPLAY and -v /tmp/.X11-unix:/tmp/.X11-unix options are used to display graphical applications on your screen.

PART 1: Warm-up with face detection#

Start the webcam node#

First, let’s start a webcam node to publish images from the webcam to ROS.

In the terminal, type:

ros2 run gscam gscam_node --ros-args -p gscam_config:='v4l2src device=/dev/video0 ! video/x-raw,framerate=30/1 ! videoconvert' \
                                     -p use_sensor_data_qos:=True \
                                     -p camera_name:=camera \
                                     -p frame_id:=camera \
                                     -p camera_info_url:=package://interaction_sim/config/camera_info.yaml

Note

The gscam node is a ROS 2 node that captures images from a webcam and publishes them on a ROS topic. The gscam_config parameter is used to specify the webcam device to use (/dev/video0), and the camera_info_url parameter is used to specify the camera calibration file. We use a default calibration file that works reasonably well with most webcams.

You can open rqt to check that the images are indeed published:

rqt

Note

If you need to open another Docker terminal, run

docker exec -it -u user ros4hri bash

Then, in the Plugins menu, select Visualization > Image View, and choose the topic /camera/image_raw:

rqt image view

rqt image view#

Face detection#

hri_face_detect is an open-source ROS 1/ROS 2 node, compatible with ROS4HRI, that detects faces in images. This node is installed by default on all PAL robots.

It is already installed in the Docker container.

By default, hri_face_detect expects images on the /image topic, so before starting the node, we need to configure topic remapping:

mkdir -p $HOME/.pal/config
nano $HOME/.pal/config/ros4hri-tutorials.yml

Then, paste the following content:

/hri_face_detect:
   remappings:
      image: /camera/image_raw
      camera_info: /camera/camera_info

Press Ctrl+O to save, then Ctrl+X to exit.

Then, you can launch the node:

ros2 launch hri_face_detect face_detect.launch.py

You should see on your console which configuration files are used:

$ ros2 launch hri_face_detect face_detect.launch.py
[INFO] [launch]: All log files can be found below /home/user/.ros/log/2024-10-16-12-39-10-518981-536d911a0c9c-203
[INFO] [launch]: Default logging verbosity is set to INFO
[INFO] [launch.user]: Loaded configuration for <hri_face_detect>:
- System configuration (from lower to higher precedence):
    - /opt/pal/alum/share/hri_face_detect/config/00-defaults.yml
- User overrides (from lower to higher precedence):
    - /home/user/.pal/config/ros4hri-tutorials.yml
[INFO] [launch.user]: Parameters:
- processing_rate: 30
- confidence_threshold: 0.75
- image_scale: 0.5
- face_mesh: True
- filtering_frame: camera_color_optical_frame
- deterministic_ids: False
- debug: False
[INFO] [launch.user]: Remappings:
- image -> /camera/image_raw
- camera_info -> /camera/camera_info
[INFO] [face_detect-1]: process started with pid [214]
...

Note

This way of managing launch parameters and remappings is not part of base ROS 2: it is an extension (available on ROS 2 Humble) provided by PAL Robotics to simplify the management of ROS 2 node configuration.

See for instance the launch file of hri_face_detect to understand how it is used.

You should immediately see on the console that faces are being detected.

Let’s visualise them:

  1. start rviz2:

rviz2
  2. In rviz, visualize the detected faces by adding the Humans plugin, which you can find in the hri_rviz plugins group. The plugin setup requires you to specify the image stream you want to use to visualize the detection results, in this case /camera/image_raw. You can also find the plugin as one of those available for the /camera/image_raw topic.

Important

Set the quality of service (QoS) of the /camera/image_raw topic to Best Effort, otherwise no image will be displayed:

Set the QoS of the /camera/image_raw topic to Best Effort

Set the QoS of the /camera/image_raw topic to Best Effort#

  3. In rviz, also enable the tf plugin, and set the fixed frame to camera. You should now see a 3D frame representing the position and orientation of your face.

rviz showing a 3D face frame

rviz showing a 3D face frame#

📚 Learn more

This tutorial does not go much further into the ROS4HRI tools and nodes. For more information, you can check the ROS4HRI Github organisation and the original paper.

PART 2: the social interaction simulator#

Starting the interaction simulator#

Instead of running nodes manually, we are now going to use PAL's social interaction simulator:

Social interaction simulator

PAL’s Social interaction simulator#

To start the simulator:

  1. stop all the nodes that are running (like gscam, hri_face_detect, rqt, etc)

  2. in one of your Docker terminals, launch the simulator:

ros2 launch interaction_sim simulator.launch.py

To load a pre-configured scene:

In the screenshot above, the objects (sofa, table, etc.) are defined in a custom SVG file. You can load such a scene as follows:

  1. click on Settings, then Download environment SVG template and save the file in the exchange folder (call it for instance scene.svg).

  2. click on Load environment and select the file you just saved.

Note

You can open the SVG file in a vector editor like Inkscape to modify the scene (add new objects, change the layout, etc). Check the instructions written in the template itself.

Interaction simulator architecture#

The interaction simulator starts several nodes:

The two nodes we have already used:

  1. gscam to publish images from the webcam

  2. hri_face_detect to detect faces in images

And the following new nodes:

  1. hri_person_manager, to ‘combine’ faces, bodies, voices into full persons

  2. hri_emotion_recognizer, to recognize emotions on the detected faces

  3. knowledge_core, an open-source OWL/RDF-based knowledge base

  4. hri_visualization to generate a camera image overlay with the faces, bodies, emotions, etc

  5. attention_manager (not open-source), that decides where to look based on where the faces are

  6. expressive_eyes (not open-source), that procedurally generates the robot’s face and moves the eyes

  7. communication_hub (not open-source), that manages the dialogues with the user (user input speech, and robot output speech)

Finally, it launches rqt with two custom plugins:

  1. rqt_human_radar, to visualize the detected people around the robot (and simulated interactions with a knowledge base)

  2. rqt_chat, to chat with the robot. When you type a message, it is sent to the ROS4HRI topic /humans/voices/anonymous_speaker/speech, and the robot's responses (sent via the /tts_engine/tts action) are displayed back.

The next figure shows the architecture of the interaction simulator:

Interaction simulator architecture

Interaction simulator architecture#

Using the simulator to add symbolic knowledge#

When starting the simulator, knowledge_core is also started. knowledge_core is a simple OWL/RDF-based knowledge base that can be used to store symbolic information about the world (get more information about 💡 Knowledge and reasoning on PAL robots).

By right-clicking on the top-down view of the environment, you can add new objects to the knowledge base:

Adding a new object to the knowledge base

Adding a new object to the knowledge base#

The simulator will then publish the new facts to the knowledge base, including whether or not a given object is in the field of view of the robot and/or humans (eg myself sees cup_abcd or person_lkhgx sees sofa).

To visualize the knowledge base, we need to start PAL's web-based knowledge base viewer.

In a new terminal, start the viewer:

ros2 launch knowledge_core knowledge_viewer.launch.py

Then, open your web browser at http://localhost:8000 to explore the knowledge base.

For instance, with the following scene:

Scene with a sofa and a cup

Scene with a sofa and a cup#

the knowledge base will contain new facts, including:

Knowledge base viewer

Knowledge base viewer#

Note

The robot’s own ‘instance’ is always called myself in the knowledge base.

For instance myself sees cup_oojnc means that the robot sees the cup cup_oojnc.

Note

As of Feb 2025, rqt_human_radar computes and pushes to the knowledge base the following facts:

  • <object|agent> rdf:type <class>

  • myself sees <object>

  • person_<id> sees <object>

  • <object|agent> isIn <zone>

  • <object|agent> isOn <object>

(note that the field of view of the robot/humans does not take walls into account yet!)

Accessing the knowledge base from Python#

You can easily query the knowledge base from Python:

Start ipython3 in the terminal:

ipython3

Then, in the Python shell:

from knowledge_core.api import KB
kb = KB()

kb["* sees *"]

This will return all the facts in the knowledge base that match the pattern * sees * (ie, all the objects that are seen by someone).

Likewise,

kb["* isIn kitchen"]

will return all the objects (or simulated humans) that are in the kitchen.

You can also create more complex queries by passing a list of semantic patterns and using named variables:

kb[["?h sees ?o", "?o rdf:type dbr:Cup", "?h rdf:type Human"]]

This will return all the facts in the knowledge base where a human sees a cup.
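When using named variables, the query returns a list of bindings, one dictionary per match, keyed by the variable names (without the leading ?). This is the same access pattern used by the chatbot code later in this tutorial. For instance:

results = kb[["?h sees ?o", "?o rdf:type dbr:Cup", "?h rdf:type Human"]]

for binding in results:
    # each binding maps a variable name to the matched instance
    print(f"{binding['h']} sees {binding['o']}")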

Note

Note the dbr: prefix in front of Cup: the simulator uses the ‘cup’ concept defined in the DBPedia ontology.

📚 Learn more

PART 3: Building a simple social behaviour#

Our first mission controller#

A mission controller is a ROS node that orchestrates the robot’s behaviour.

We will implement our first mission controller as a simple Python script that copies your facial expression onto the robot’s face: an emotion mirroring game.

Since creating a complete ROS 2 node from scratch can be a bit tedious, we will use rpk, a command-line tool created by PAL Robotics that generates ROS 2 nodes from templates.

Note

rpk is already installed in the Docker container. You can also install it easily on your own machine with pip install rpk.

As the generator tool itself does not require ROS, you can use it on any machine, including, for example, Windows.

Learn more about rpk in its dedicated page: Automatic code generation with rpk.

Step 1: generating the mission controller#

  • go to your exchange folder and create a new workspace:

cd ~/exchange

# you might have to change the rights of the folder
sudo chown user:user .

mkdir ws
cd ws
  • run rpk to create the mission controller:

$ rpk create -p src/ mission
ID of your application? (must be a valid ROS identifier without spaces or hyphens. eg 'robot_receptionist')
emotion_mirror
Full name of your skill/application? (eg 'The Receptionist Robot' or 'Database connector', press Return to use the ID. You can change it later)


Choose a template:
1: base robot supervisor [python]
2: robot supervisor with pre-filled intent handlers [python]
3: robot supervisor with a GUI and pre-filled intent handlers [python]
4: complete supervisor example, using a basic chatbot to manage interactions with users [python]
5: complete supervisor example, using LLMs to manage interactions with users [python]

Your choice? 1

What robot are you targeting?
1: Generic robot (generic)
2: Generic PAL robot/simulator (generic-pal)
3: PAL ARI (ari)
4: PAL TIAGo (tiago)
5: PAL TIAGo Pro (tiago-pro)
6: PAL TIAGo Head (tiago-head)

Your choice? (default: 1: generic) 2

Choose a base robot supervisor template, and the generic-pal robot.

The tool will then create a complete ROS 2 mission controller, ready to listen to incoming user intents (eg, ROS messages published on the /intents topic).

  • build and source the workspace:

colcon build

source install/setup.bash
  • start the mission controller:

ros2 launch emotion_mirror emotion_mirror.launch.py

If you now write a line in the rqt_chat plugin, you should see the mission controller reacting to it:

[...]
[run_app-1] [INFO] [1729672049.773179100] [emotion_mirror]: Listening to /intents topic
[run_app-1] [INFO] [1729672099.529532393] [emotion_mirror]: Received an intent: __raw_user_input__
[run_app-1] [INFO] [1729672099.529859652] [emotion_mirror]: Processing input: __raw_user_input__

Note

rqt_chat displays ‘Processing…’ as it waits for a response from the mission controller. The mission controller is not doing anything yet, so ‘Processing…’ is displayed indefinitely (for now!).

The intent __raw_user_input__ is emitted by the communication_hub, and is a special intent that essentially means: “I don’t know what to do with this input, but I received it”.

Since we are not doing anything with the input yet (like trying to figure out what the user wants by calling a dedicated chatbot), the communication hub simply sends back the same message with this intent.

Step 2: add basic interactivity#

Modify the template to say something back. To start with, we will always respond with the same sentence.

  • open mission_controller.py:

cd src/emotion_mirror/emotion_mirror
nano mission_controller.py
  • in the constructor, create an action client for the TTS action:

 1#...
 2from rclpy.action import ActionClient
 3from tts_msgs.action import TTS
 4#...
 5
 6class MissionController(Node):
 7    def __init__(self):
 8
 9        #...
10        self.tts = ActionClient(self, TTS, '/say')
11        self.tts.wait_for_server()
  • modify the handling of incoming intents to say something back:

1    def on_intent(self, msg):
2        #...
3
4        if msg.intent == Intent.RAW_USER_INPUT:
5            goal = TTS.Goal()
6            goal.input = "I'm the famous emotion mirroring robot! (but I'm not great at chatting yet)"
7            self.tts.send_goal_async(goal)

(up to you to come up with something funny!)

Re-run colcon build and relaunch your mission controller to test it with the chat interface.

Since we are using communication_hub as a middle-man, we can also use markup in our sentences to change the expression of the robot.

For example, you can use the following markup to display a happy face:

"<set expression(happy)> I'm the famous emotion mirroring robot! My holiday was great <set expression(amazed)>, thank you for asking! <set expression(neutral)>"

📚 Learn more

You can check the documentation of the markup language used by the communication_hub to get the list of available actions and expressions.

Step 3: emotion mimicking game#

We can now extend our mission controller to implement our emotion mirroring game.

To get the recognised facial expression of the current user, we can use the ROS4HRI tools:

For instance:

1from hri import HRIListener
2
3hri_listener = HRIListener("mimic_emotion_hrilistener")
4
5for face_id, face in hri_listener.faces.items():
6    if face.expression:
 7        print(f"Face {face_id} shows expression {face.expression}")

Then, we can set the same expression onto the robot face:

1from hri_msgs.msg import Expression
2
3expression_pub = self.create_publisher(Expression, "/robot_face/expression", QoSProfile(depth=10))
4
5msg = Expression()
6msg.expression = "happy" # see Expression.msg for all available expressions
7expression_pub.publish(msg)

Modify the mission controller to add a background ‘job’ that checks people's facial expressions and sets the robot's expression accordingly:

Add a run() method at the bottom of your mission controller:

 1def run(self) -> None:
 2
 3    self.get_logger().info("Checking expressions...")
 4
 5    for face_id, face in self.hri_listener.faces.items():
 6        if face.expression:
 7            expression = face.expression.name.lower()
 8            print(f"Face {face_id} shows expression {expression}")
 9
10            msg = Expression()
11            msg.expression = expression
12            self.expression_pub.publish(msg)

Then, in the __init__ constructor, create the HRI listener, the expression publisher, and a timer to regularly call the run() method:

1def __init__(self):
2
3
4    # ...
5
6    self.hri_listener = HRIListener("mimic_emotion_hrilistener")
7    self.expression_pub = self.create_publisher(Expression, "/robot_face/expression", QoSProfile(depth=10))
8
9    self._timer = self.create_timer(0.1, self.run) # check at 10Hz

The final complete script should look like:

mission_controller.py#
 1import json
 2
 3from rclpy.node import Node
 4from rclpy.qos import QoSProfile
 5from rclpy.action import ActionClient
 6
 7from hri_actions_msgs.msg import Intent
 8from hri_msgs.msg import Expression
 9from tts_msgs.action import TTS
10from hri import HRIListener
11
12class MissionController(Node):
13
14    def __init__(self) -> None:
15        super().__init__('app_emotion_mirror')
16
17        self.get_logger().info("Initialising...")
18
19        self._intents_sub = self.create_subscription(
20            Intent,
21            '/intents',
22            self.on_intent,
23            10)
24        self.get_logger().info("Listening to %s topic" %
25                               self._intents_sub.topic_name)
26
27        self.hri_listener = HRIListener("mimic_emotion_hrilistener")
28        self.expression_pub = self.create_publisher(Expression, "/robot_face/expression", QoSProfile(depth=10))
29        self.last_expression = ""
30
31        self.tts = ActionClient(self, TTS, '/say')
32        self.tts.wait_for_server()
33
34        self._timer = self.create_timer(0.1, self.run) # check at 10Hz
35
36    def __del__(self):
37        self.get_logger().info("Destroying Mission Controller")
38        self.destroy_subscription(self._intents_sub)
39
40    def on_intent(self, msg) -> None:
41
42        self.get_logger().info("Received an intent: %s" % msg.intent)
43
44        data = json.loads(msg.data) if msg.data else {}  # noqa: F841
45        source = msg.source  # noqa: F841
46        modality = msg.modality  # noqa: F841
47        confidence = msg.confidence  # noqa: F841
48        priority_hint = msg.priority  # noqa: F841
49
50        if msg.intent == Intent.RAW_USER_INPUT:
51            self.get_logger().info(f"Processing input: {msg.intent} -> {data}")
52            goal = TTS.Goal()
53            goal.input = "I'm the famous emotion mirroring robot!"
54            self.tts.send_goal_async(goal)
55        else:
56            self.get_logger().warning("I don't know how to process intent "
57                                      "<%s>!" % msg.intent)
58    def run(self) -> None:
59
60        faces = list(self.hri_listener.faces.items())
61        if faces:
62            # only consider the first detected face
63            face_id, face = faces[0]
64            if face.expression and face.expression != self.last_expression:
65                expression = face.expression.name.lower()
66                print(f"Face {face_id} shows expression {expression}")
67
68                goal = TTS.Goal()
69                goal.input = f"you look {expression}. Same for me!"
70                self.tts.send_goal_async(goal)
71
72                msg = Expression()
73                msg.expression = expression
74                self.expression_pub.publish(msg)
75
76                self.last_expression = face.expression

PART 4: Integration with LLMs#

Adding a chatbot#

Step 1: creating a chatbot#

  1. use rpk to create a new chatbot skill using the basic chatbot intent extraction template:

$ rpk create -p src intent
ID of your application? (must be a valid ROS identifier without spaces or hyphens. eg 'robot_receptionist')
chatbot
Full name of your skill/application? (eg 'The Receptionist Robot' or 'Database connector', press Return to use the ID. You can change it later)


Choose a template:
1: basic chatbot template [python]
2: complete intent extraction example: LLM bridge using the OpenAI API (ollama, chatgpt) [python]

Your choice? 1

What robot are you targeting?
1: Generic robot (generic)
2: Generic PAL robot/simulator (generic-pal)
3: PAL ARI (ari)
4: PAL TIAGo (tiago)
5: PAL TIAGo Pro (tiago-pro)
6: PAL TIAGo Head (tiago-head)

Your choice? (default: 1: generic) 2

Compile and run the chatbot:

colcon build
source install/setup.bash
ros2 launch chatbot chatbot.launch.py

If you now type a message in the rqt_chat plugin, you should see the chatbot responding to it:

Chatbot responding to a message

Chatbot responding to a message#

You can also see in the chat window the intents that the chatbot has identified in the user input. For now, our basic chatbot only recognises the __intent_greet__ intent when you type Hi or Hello.

Step 2: integrating the chatbot with the mission controller#

To fully understand the intent pipeline, we will modify the chatbot to recognise a ‘pick up’ intent, and the mission controller to handle it.

  • open chatbot/node_impl.py and modify your chatbot to check whether incoming speech matches [please] pick up [the] <object>:

 1import re
 2
 3# add this as a method of your chatbot class
 4def contains_pickup(self, sentence):
 5    sentence = sentence.lower()
 6
 7    # matches sentences like: [please] pick up [the] <object> and returns <object>
 8    pattern = r"(?:please\s+)?pick\s+up\s+(?:the\s+)?(\w+)"
 9    match = re.search(pattern, sentence)
10    if match:
11        return match.group(1)
  • then, in the on_get_response function, check if the incoming speech matches the pattern, and if so, return a __intent_grab_object__:

 1def on_get_response(self, request, response):
 2
 3    #...
 4
 5    pick_up_object = self.contains_pickup(input)
 6    if pick_up_object:
 7        self.get_logger().warn(f"I think the user wants to pick up a {pick_up_object}. Sending a GRAB_OBJECT intent")
 8        intent = Intent(intent=Intent.GRAB_OBJECT,
 9                        data=json.dumps({"object": pick_up_object}),
10                        source=user_id,
11                        modality=Intent.MODALITY_SPEECH,
12                        confidence=.8)
13        suggested_response = f"Sure, let me pick up this {pick_up_object}"
14    # elif ...

Note

The Intent message is defined in the hri_actions_msgs package, and contains the intent itself, the data associated with the intent, the source of the intent (here, the current user_id), the modality (here, speech), and the confidence of the recognition.

Check the Intents documentation for details, or directly the Intent.msg definition.

Test your updated chatbot by recompiling the workspace (colcon build) and relaunching the chatbot.

If you now type pick up the cup in the chat window, you should see the chatbot recognising the intent and sending a GRAB_OBJECT intent to the mission controller.
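You can also sanity-check the regular expression on its own, outside ROS, before testing through the chat (the test sentences below are just examples):

import re

pattern = r"(?:please\s+)?pick\s+up\s+(?:the\s+)?(\w+)"

for sentence in ["please pick up the cup",
                 "Pick up the bottle",
                 "hello robot"]:
    match = re.search(pattern, sentence.lower())
    print(sentence, "->", match.group(1) if match else None)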

  • finally, modify the mission controller function handling inbound intents (on_intent in mission_controller.py) to manage the GRAB_OBJECT intent:

     1def on_intent(self, msg):
     2    #...
     3
     4    if msg.intent == Intent.GRAB_OBJECT:
     5        # on a real robot, you would call here a manipulation skill
     6        goal = TTS.Goal()
     7        goal.input = f"<set expression(tired)> That {data['object']} is really heavy...! <set expression(neutral)>"
     8        self.tts.send_goal_async(goal)
     9
    10    # ...
    

Re-compile and re-run the mission controller. If you now type pick up the cup in the chat window, you should see the mission controller reacting to it.

📚 Learn more

In this example, we directly use the /say skill to respond to the user.

When developing a full application, you usually want to split your architecture into multiple nodes, each responsible for a specific task.

The PAL application model, based on the RobMoSys methodology, encourages the development of a single mission controller, and a series of tasks and skills that are orchestrated by the mission controller.

You can read more about this model here: 📝 Developing robot apps.

Integrating with a Large Language Model (LLM)#

Next, let’s integrate with an LLM.

Step 1: install ollama#

ollama is an open-source tool that provides a simple REST API to interact with a variety of LLMs. It makes it easy to install different LLMs, and to call them using the same REST API as, eg, OpenAI’s ChatGPT.

To install ollama on your machine, follow the instructions on the official repository:

curl -fsSL https://ollama.com/install.sh | sh

Once it is installed, you can start the ollama server with:

ollama serve

Open a new Docker terminal, and run the following command to download a first model and check it works:

ollama run llama3.2:1b

Note

Visit the ollama model page to see the list of available models.

Depending on the size of the model and your computer configuration, the response time can be quite long.

If you have a NVIDIA GPU, you might want to relaunch your Docker container with GPU support. Check the instructions on the NVidia website.

Alternatively, you can run ollama on your host machine, as we will interact with it via a REST API.

Step 2: calling ollama from the chatbot#

ollama can be accessed from your code either by calling the REST API directly, or by using the ollama Python binding. While the REST API is more flexible (and makes it possible to easily use other OpenAI-compatible services, like ChatGPT), the Python binding is very easy to use.

Note

If you are curious about the REST API, use the rpk LLM chatbot template to generate an example of a chatbot that calls ollama via the REST API.

  • install the ollama Python binding inside your Docker container:

    pip install ollama
    
  • Modify your chatbot to connect to ollama, using a custom prompt. Open chatbot/chatbot/node_impl.py and make the following changes:

 1# add to the imports
 2from ollama import Client
 3
 4# ...
 5
 6class IntentExtractorImpl(Node):
 7
 8    # modify the constructor:
 9    def __init__(self) -> None:
10        # ...
11
12        self._ollama_client = Client()
13        # if ollama does not run on the local host, you can specify the host and
14        # port. For instance:
15        # self._ollama_client = Client("x.x.x.x:11434")
16
17        # dialogue history
18        self.messages = [
19            {"role": "system",
20             "content": """
21                You are a helpful robot, always eager to help.
22                You always respond with concise and to-the-point answers.
23             """
24             }]
25
26    # modify on_get_response:
27    def on_get_response(self, request: GetResponse.Request, response: GetResponse.Response):
28
29        user_id = request.user_id
30        input = request.input
31
32        self.get_logger().info(
33            f"new input from {user_id}: {input}... sending it to the LLM")
34        self._nb_requests += 1
35
36        self.messages.append({"role": "user", "content": input})
37
38        llm_res = self._ollama_client.chat(
39            messages=self.messages,
40            model="llama3.2:1b"
41        )
42
43        content = llm_res.message.content
44
45        self.get_logger().info(f"The LLM answered: {content}")
46
47        self.messages.append({"role": "assistant", "content": content})
48
49        response.response = content
50        response.intents = []
51
52        return response

As you can see, calling ollama is as simple as creating a Client object and calling its chat method with the messages to send to the LLM and the model to use.

In this example, we append to the chat history (self.messages) the user input and the LLM response after each interaction, thus building a complete dialogue.
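The full history is re-sent to the LLM on every call, so long conversations keep growing the prompt. Below is a minimal sketch of one possible mitigation (the MAX_TURNS constant and trim_history method are hypothetical, not part of the template): cap the history while keeping the system prompt.

MAX_TURNS = 20  # hypothetical cap on stored user/assistant messages

# add this as a method of the chatbot class, and call self.trim_history()
# after appending each exchange to self.messages
def trim_history(self):
    # keep the system prompt (first message) plus the most recent messages
    system_prompt, rest = self.messages[0], self.messages[1:]
    self.messages = [system_prompt] + rest[-MAX_TURNS:]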

Recompile and restart the chatbot. If you now type a message in the chat window, you should see the chatbot responding with a text generated by the LLM:

Example of a chatbot response generated by an LLM

Example of a chatbot response generated by an LLM#

Attention

Depending on the LLM model you use, the response time can be quite long. By default, after 10s, communication_hub will time out. In that case, the chatbot answer will not be displayed in the chat window.

Step 3: extract user intents#

To recognise intents from the LLM response, we can use a combination of prompt engineering and LLM structured output.

  • to generate structured output (ie, a JSON-structured response that includes the recognised intents), we first need to define a Python model that describes the expected output of the LLM:

 1from pydantic import BaseModel
 2from typing import Literal
 3from hri_actions_msgs.msg import Intent
 4
 5# Define the data models for the chatbot response and the user intent
 6class IntentModel(BaseModel):
 7    type: Literal[Intent.BRING_OBJECT,
 8                  Intent.GRAB_OBJECT,
 9                  Intent.PLACE_OBJECT,
10                  Intent.GUIDE,
11                  Intent.MOVE_TO,
12                  Intent.SAY,
13                  Intent.GREET,
14                  Intent.START_ACTIVITY,
15                  ]
16    object: str | None
17    recipient: str | None
18    input: str | None
19    goal: str | None
20
21class ChatbotResponse(BaseModel):
22    verbal_ack: str | None
23    user_intent: IntentModel | None

Here, we derive from BaseModel (from the pydantic library) so that we can generate the formal schema corresponding to this Python object (using the JSON Schema specification).
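You can check in an interactive session what the generated schema looks like, and that a hand-written example response validates (the sample values below are purely illustrative):

import json

# inspect the JSON schema generated from the pydantic models above
print(json.dumps(ChatbotResponse.model_json_schema(), indent=2))

# validate a hand-written example response (illustrative values only)
sample = ('{"verbal_ack": "Sure", '
          '"user_intent": {"type": "__intent_grab_object__", "object": "cup", '
          '"recipient": null, "input": null, "goal": null}}')
parsed = ChatbotResponse.model_validate_json(sample)
print(parsed.user_intent.type, parsed.user_intent.object)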

  • then, modify the chatbot to force the LLM to return a JSON-structured response that includes the recognised intents:

 1    # ...
 2
 3    def on_get_response(self, request: GetResponse.Request, response: GetResponse.Response):
 4
 5        user_id = request.user_id
 6        input = request.input
 7
 8        self.get_logger().info(
 9            f"new input from {user_id}: {input}... sending it to the LLM")
10        self._nb_requests += 1
11
12        self.messages.append({"role": "user", "content": input})
13
14        llm_res = self._ollama_client.chat(
15            messages=self.messages,
16            model="llama3.2:1b",
17            format=ChatbotResponse.model_json_schema()
18        )
19
20        json_res = ChatbotResponse.model_validate_json(llm_res.message.content)
21
22        self.get_logger().info(f"The LLM answered: {json_res}")
23
24        verbal_ack = json_res.verbal_ack
25        if verbal_ack:
26            # if we have a verbal acknowledgement, add it to the dialogue history,
27            # and send it to the user
28            self.messages.append({"role": "assistant", "content": verbal_ack})
29            response.response = verbal_ack
30
31        user_intent = json_res.user_intent
32        if user_intent:
33            response.intents = [Intent(
34                intent=user_intent.type,
35                data=json.dumps(user_intent.model_dump())
36            )]
37
38        return response

Now, the LLM will always return a JSON-structured response that includes an intent (if one was recognised), and a verbal acknowledgement. For instance, when asking the robot to bring an apple, it returns an intent PLACE_OBJECT with the object apple:

Example of a structured LLM response

Example of a structured LLM response#

Step 4: prompt engineering to improve intent recognition#

To improve the intent recognition, we can use prompt engineering: we can provide the LLM with a prompt that will guide it towards generating a response that includes the intents we are interested in.

One key trick is to provide the LLM with examples of the intents we are interested in.

Here is an example of a longer prompt that should yield better results:

PROMPT = """
You are a friendly robot called $robot_name. You try to help the user to the best of your abilities.
You are always helpful, and ask further questions if the desires of the user are unclear.
Your answers are always polite yet concise and to-the-point.

Your aim is to extract the user goal.

Your response must be a JSON object with the following fields (both are optional):
- verbal_ack: a string acknowledging the user request (like 'Sure', 'I'm on it'...)
- user_intent: the user overall goal (intent), with the following fields:
  - type: the type of intent to perform (e.g. "__intent_say__", "__intent_greet__", "__intent_start_activity__", etc.)
  - any thematic role required by the intent. For instance: `object` to
    relate the intent to the object to interact with (e.g. "lamp",
    "door", etc.)

Importantly, `verbal_ack` is meant to be a *short* acknowledgement sentence,
unconditionally uttered by the robot, indicating that you have understood the request -- or that we need more information.
For more complex verbal actions, return a `__intent_say__` instead.

However, for answers to general questions that do not require any action
(eg: 'what is your name?'), the 'user_intent' field can be omitted, and the
'verbal_ack' field should contain the answer.

The user_id of the person you are talking to is $user_id. Always use this ID when referring to the person in your responses.

Examples
- if the user says 'Hello robot', you could respond:
{
    "user_intent": {"type": "__intent_greet__", "recipient": "$user_id"}
}

- if the user says 'What is your name?', you could respond:
{
    "verbal_ack":"My name is $robot_name. What is your name?"
}

- if the user says 'take a fruit', you could respond (assuming an object 'apple1' of type 'Apple' is visible):
{
    "user_intent": {
            "type":"__intent_grab_object__",
            "object":"apple1",
    },
    "verbal_ack": "Sure"
}

- if the user says 'take a fruit' but you do not know about any fruit, you could respond:
{
    "verbal_ack": "I haven't seen any fruits around. Do you want me to check in the kitchen?"
}

- the user says: 'clean the table'. You could return:
{
    "user_intent": {
        "type":"__intent_start_activity__",
        "object": "cleaning_table"
    },
    "verbal_ack": "Sure, I'll get started"
}

If you are not sure about the intention of the user, return an empty user_intent and ask for confirmation with the verbal_ack field.
"""

This prompt uses Python’s templating system to include the robot’s name and the user’s ID in the prompt.

You can use this prompt in your script by substituting the variables with the actual values:

from string import Template
actual_prompt = Template(PROMPT).safe_substitute(robot_name="Robbie", user_id="Alice")
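We use safe_substitute rather than substitute so that placeholders without a value are left in place instead of raising a KeyError; this is convenient because the prompt will later gain an additional $environment variable:

from string import Template

t = Template("Hello $robot_name. Environment: $environment")

# substitute() would raise KeyError because $environment has no value;
# safe_substitute() leaves the unknown placeholder untouched:
print(t.safe_substitute(robot_name="Robbie"))
# -> Hello Robbie. Environment: $environment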

Then, you can use this prompt in the ollama call:

# ...

def __init__(self) -> None:

    # ...

    self.messages = [
        {"role": "system",
            "content": Template(PROMPT).safe_substitute(robot_name="Robbie", user_id="user1")
            }]

    # ...

Closing the loop: integrating LLM and symbolic knowledge representation#

Finally, we can use the knowledge base to improve the intent recognition.

For instance, if the user asks the robot to bring the apple, we can use the knowledge base to check whether an apple is in the field of view of the robot.

Note

It is often convenient to have a Python interpreter open to quickly test knowledge base queries.

Open ipython3 in a terminal from within your Docker container, and then:

from knowledge_core.api import KB; kb = KB()
kb["* sees *"] # etc.

First, let’s query the knowledge base for all the objects that are visible to the robot:

 1from knowledge_core.api import KB
 2
 3# ...
 4
 5def __init__(self) -> None:
 6
 7    # ...
 8
 9    self.kb = KB()
10
11
12def environment(self) -> str:
13    """ fetch all the objects and humans visible to the robot,
14    get for each of them their class and label, and return a string
15    that list them all.
16    """
17
18    environment_description = ""
19
20    seen_objects = self.kb["myself sees ?obj"]
21    for obj in [item["obj"] for item in seen_objects]:
 22        details = self.kb.details(obj)
 23        label = details["label"]["default"]
 24        classes = details["attributes"][0]["values"]
 25        class_name = None
 26        if classes:
 27            class_name = classes[0]["label"]["default"]
 28            environment_description += f"- I see a {class_name} labeled {label}.\n"
 29        else:
 30            environment_description += f"- I see {label}.\n"
31
32    self.get_logger().info(
33        f"Environment description:\n{environment_description}")
34    return environment_description

Note

The kb.details method returns a dictionary with details about a given knowledge concept. The attributes field contains e.g. the class of the object (if known or inferred by the knowledge base).

📚 Learn more

To inspect the knowledge base in detail, we recommend using Protégé, an open-source tool to explore and modify ontologies.

The ontology used by the robot (and the interaction simulator) is stored in /opt/pal/alum/share/oro/ontologies/oro.owl. Copy this file to your ~/exchange folder to access it from your host and inspect it with Protégé.

We can then use this information to ground the user intents in the physical world of the robot.

First, add the following two lines at the end of your prompt template:

This is a description of the environment:

$environment

Then, add a new method to your chatbot to generate the prompt:

 1def __init__(self) -> None:
 2
 3    # ...
 4
 5    self.messages = [
 6        {"role": "system",
 7            "content": self.prepare_prompt("user1")
 8            }]
 9
10    # ...
11
12def prepare_prompt(self, user_id: str) -> str:
13
14    environment = self.environment()
15
16    return Template(PROMPT).safe_substitute(robot_name="Robbie",
17                                            environment=environment,
18                                            user_id=user_id)

You could also call the environment method before each call to the LLM, to get the latest environment description.
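For instance (a sketch, assuming the prepare_prompt method and messages list shown above), you can regenerate the system prompt at the start of on_get_response so that the LLM always sees the current scene:

def on_get_response(self, request, response):

    user_id = request.user_id

    # refresh the system prompt with the latest environment description
    self.messages[0] = {"role": "system",
                        "content": self.prepare_prompt(user_id)}

    # ... rest of the callback unchanged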

Re-compile and restart your chatbot. You can now ask the robot e.g. what it sees.

The final chatbot code should look like:

chatbot/node_impl.py#
  1import json
  2from ollama import Client
  3
  4from knowledge_core.api import KB
  5
  6from rclpy.lifecycle import Node
  7from rclpy.lifecycle import State
  8from rclpy.lifecycle import TransitionCallbackReturn
  9from rcl_interfaces.msg import ParameterDescriptor
 10from rclpy.action import ActionServer, GoalResponse
 11
 12from chatbot_msgs.srv import GetResponse, ResetModel
 13from hri_actions_msgs.msg import Intent
 14from i18n_msgs.action import SetLocale
 15from i18n_msgs.srv import GetLocales
 16
 17from diagnostic_msgs.msg import DiagnosticArray, DiagnosticStatus, KeyValue
 18
 19from pydantic import BaseModel
 20from typing import Literal
 21from hri_actions_msgs.msg import Intent
 22from string import Template
 23
 24PROMPT = """
 25You are a friendly robot called $robot_name. You try to help the user to the best of your abilities.
 26You are always helpful, and ask further questions if the desires of the user are unclear.
 27Your answers are always polite yet concise and to-the-point.
 28
 29Your aim is to extract the user goal.
 30
 31Your response must be a JSON object with the following fields (both are optional):
 32- verbal_ack: a string acknowledging the user request (like 'Sure', 'I'm on it'...)
 33- user_intent: the user overall goal (intent), with the following fields:
 34  - type: the type of intent to perform (e.g. "__intent_say__", "__intent_greet__", "__intent_start_activity__", etc.)
 35  - any thematic role required by the intent. For instance: `object` to
 36    relate the intent to the object to interact with (e.g. "lamp",
 37    "door", etc.)
 38
 39Importantly, `verbal_ack` is meant to be a *short* acknowledgement sentence,
 40unconditionally uttered by the robot, indicating that you have understood the request -- or that we need more information.
 41For more complex verbal actions, return a `__intent_say__` instead.
 42
 43However, for answers to general questions that do not require any action
 44(eg: 'what is your name?'), the 'user_intent' field can be omitted, and the
 45'verbal_ack' field should contain the answer.
 46
 47The user_id of the person you are talking to is $user_id. Always use this ID when referring to the person in your responses.
 48
 49Examples
 50- if the user says 'Hello robot', you could respond:
 51{
 52    "user_intent": {"type": "__intent_greet__", "recipient": "$user_id"}
 53}
 54
 55- if the user says 'What is your name?', you could respond:
 56{
 57    "verbal_ack":"My name is $robot_name. What is your name?"
 58}
 59
 60- if the user says 'take a fruit', you could respond (assuming an object 'apple1' of type 'Apple' is visible):
 61{
 62    "user_intent": {
 63            "type":"__intent_grab_object__",
 64            "object":"apple1",
 65    },
 66    "verbal_ack": "Sure"
 67}
 68
 69- if the user says 'take a fruit' but you do not know about any fruit, you could respond:
 70{
 71    "verbal_ack": "I haven't seen any fruits around. Do you want me to check in the kitchen?"
 72}
 73
 74- the user says: 'clean the table'. You could return:
 75{
 76    "user_intent": {
 77        "type":"__intent_start_activity__",
 78        "object": "cleaning_table"
 79    },
 80    "verbal_ack": "Sure, I'll get started"
 81}
 82
 83If you are not sure about the intention of the user, return an empty user_intent and ask for confirmation with the verbal_ack field.
 84
 85This is a description of the environment:
 86
 87$environment
 88"""
 89
 90
 91# Define the data models for the chatbot response and the user intent
 92class IntentModel(BaseModel):
 93    type: Literal[Intent.BRING_OBJECT,
 94                  Intent.GRAB_OBJECT,
 95                  Intent.PLACE_OBJECT,
 96                  Intent.GUIDE,
 97                  Intent.MOVE_TO,
 98                  Intent.SAY,
 99                  Intent.GREET,
100                  Intent.START_ACTIVITY,
101                  ]
102    object: str | None
103    recipient: str | None
104    input: str | None
105    goal: str | None
106
107
108class ChatbotResponse(BaseModel):
109    verbal_ack: str | None
110    user_intent: IntentModel | None
111##################################################
112
113class IntentExtractorImpl(Node):
114
115    def __init__(self) -> None:
116        super().__init__('intent_extractor_chatbot')
117
118        # Declare ROS parameters. Should mimic the ones listed in config/00-defaults.yaml
119        self.declare_parameter(
120            'my_parameter', "my_default_value.",
121            ParameterDescriptor(
122                description='Important parameter for my chatbot')
123        )
124
125        self.get_logger().info("Initialising...")
126
127        self._get_response_srv = None
128        self._reset_srv = None
129        self._get_supported_locales_server = None
130        self._set_default_locale_server = None
131
132        self._timer = None
133        self._diag_pub = None
134        self._diag_timer = None
135
136        self.kb = KB()
137
138        self._nb_requests = 0
139
140        self._ollama_client = Client()
141        # if ollama does not run on the local host, you can specify the host and
142        # port. For instance:
143        # self._ollama_client = Client("x.x.x.x:11434")
144
145        self.messages = [
146            {"role": "system",
147             "content": self.prepare_prompt("user1")
148             }]
149
150        self.get_logger().info('Chatbot chatbot started, but not yet configured.')
151
152    def environment(self) -> str:
153        environment_description = ""
154
155        seen_objects = self.kb["myself sees ?obj"]
156        for obj in [item["obj"] for item in seen_objects]:
157            details = self.kb.details(obj)
158            label = details["label"]["default"]
159            classes = details["attributes"][0]["values"]
160            class_name = None
161            if classes:
162                class_name = classes[0]["label"]["default"]
163                environment_description += f"- I see a {class_name} labeled {label}.\n"
164            else:
165                environment_description += f"- I see {label}.\n"
166
167        self.get_logger().info(
168            f"Environment description:\n{environment_description}")
169        return environment_description
170
171    def prepare_prompt(self, user_id: str) -> str:
172
173        environment = self.environment()
174
175        return Template(PROMPT).safe_substitute(robot_name="Robbie",
176                                                environment=environment,
177                                                user_id=user_id)
178
179    def on_get_response(self, request: GetResponse.Request, response: GetResponse.Response):
180
181        user_id = request.user_id
182        input = request.input
183
184        self.get_logger().info(
185            f"new input from {user_id}: {input}... sending it to the LLM")
186        self._nb_requests += 1
187
188        self.messages.append({"role": "user", "content": input})
189
190        llm_res = self._ollama_client.chat(
191            messages=self.messages,
192            # model="llama3.2:1b",
193            model="phi4",
194            format=ChatbotResponse.model_json_schema()
195        )
196
197        json_res = ChatbotResponse.model_validate_json(llm_res.message.content)
198
199        self.get_logger().info(f"The LLM answered: {json_res}")
200
201        verbal_ack = json_res.verbal_ack
202        if verbal_ack:
203            self.messages.append({"role": "assistant", "content": verbal_ack})
204            response.response = verbal_ack
205
206        user_intent = json_res.user_intent
207        if user_intent:
208            response.intents = [Intent(
209                intent=user_intent.type,
210                data=json.dumps(user_intent.model_dump())
211            )]
212
213        return response
214
215    def on_reset(self, request: ResetModel.Request, response: ResetModel.Response):
216        self.get_logger().info('Received reset request. Not implemented yet.')
217        return response
218
219    def on_get_supported_locales(self, request, response):
220        response.locales = []  # list of supported locales; empty means any
221        return response
222
223    def on_set_default_locale_goal(self, goal_request):
224        return GoalResponse.ACCEPT
225
226    def on_set_default_locale_exec(self, goal_handle):
227        """Change here the default locale of the chatbot."""
228        result = SetLocale.Result()
229        goal_handle.succeed()
230        return result
231
232    #################################
233    #
234    # Lifecycle transitions callbacks
235    #
236    def on_configure(self, state: State) -> TransitionCallbackReturn:
237
238        # configure and start diagnostics publishing
239        self._nb_requests = 0
240        self._diag_pub = self.create_publisher(
241            DiagnosticArray, '/diagnostics', 1)
242        self._diag_timer = self.create_timer(1., self.publish_diagnostics)
243
244        # start advertising supported locales
245        self._get_supported_locales_server = self.create_service(
246            GetLocales, "~/get_supported_locales", self.on_get_supported_locales)
247
248        self._set_default_locale_server = ActionServer(
249            self, SetLocale, "~/set_default_locale",
250            goal_callback=self.on_set_default_locale_goal,
251            execute_callback=self.on_set_default_locale_exec)
252
253        self.get_logger().info("Chatbot chatbot is configured, but not yet active")
254        return TransitionCallbackReturn.SUCCESS
255
256    def on_activate(self, state: State) -> TransitionCallbackReturn:
257        """
258        Activate the node.
259
260        You usually want to do the following in this state:
261        - Create and start any timers performing periodic tasks
262        - Start processing data, and accepting action goals, if any
263
264        """
265        self._get_response_srv = self.create_service(
266            GetResponse, '/chatbot/get_response', self.on_get_response)
267        self._reset_srv = self.create_service(
268            ResetModel, '/chatbot/reset', self.on_reset)
269
270        # Define a timer that fires every second to call the run function
271        timer_period = 1  # in sec
272        self._timer = self.create_timer(timer_period, self.run)
273
274        self.get_logger().info("Chatbot chatbot is active and running")
275        return super().on_activate(state)
276
277    def on_deactivate(self, state: State) -> TransitionCallbackReturn:
278        """Stop the timer to stop calling the `run` function (main task of your application)."""
279        self.get_logger().info("Stopping chatbot...")
280
281        self.destroy_timer(self._timer)
282        self.destroy_service(self._get_response_srv)
283        self.destroy_service(self._reset_srv)
284
285        self.get_logger().info("Chatbot chatbot is stopped (inactive)")
286        return super().on_deactivate(state)
287
288    def on_shutdown(self, state: State) -> TransitionCallbackReturn:
289        """
290        Shutdown the node, after a shutting-down transition is requested.
291
292        :return: The state machine either invokes a transition to the
293            "finalized" state or stays in the current state depending on the
294            return value.
295            TransitionCallbackReturn.SUCCESS transitions to "finalized".
296            TransitionCallbackReturn.FAILURE remains in current state.
297            TransitionCallbackReturn.ERROR or any uncaught exceptions to
298            "errorprocessing"
299        """
300        self.get_logger().info('Shutting down chatbot node.')
301        self.destroy_timer(self._diag_timer)
302        self.destroy_publisher(self._diag_pub)
303
304        self.destroy_service(self._get_supported_locales_server)
305        self._set_default_locale_server.destroy()
306
307        self.get_logger().info("Chatbot chatbot finalized.")
308        return TransitionCallbackReturn.SUCCESS
309
310    #################################
311
312    def publish_diagnostics(self):
313
314        arr = DiagnosticArray()
315        msg = DiagnosticStatus(
316            level=DiagnosticStatus.OK,
317            name="/intent_extractor_chatbot",
318            message="chatbot chatbot is running",
319            values=[
320                KeyValue(key="Module name", value="chatbot"),
321                KeyValue(key="Current lifecycle state",
322                         value=self._state_machine.current_state[1]),
323                KeyValue(key="# requests since start",
324                         value=str(self._nb_requests)),
325            ],
326        )
327
328        arr.header.stamp = self.get_clock().now().to_msg()
329        arr.status = [msg]
330        self._diag_pub.publish(arr)
331
332    def run(self) -> None:
333        """
334        Background task of the chatbot.
335
336        For now, we do not need to do anything here, as the chatbot is
337        event-driven, and the `on_user_input` callback is called when a new
338        user input is received.
339        """
340        pass

Next steps#

Interaction simulator architecture

Interaction simulator architecture#

We have completed a simple social robot architecture, with a mission controller that can react to user intents, and a chatbot that can extract intents from the user's input.

You can now:

See also#