Build a complete LLM-enabled interactive app#
🏁 Goal of this tutorial
This tutorial will guide you through the installation and use of the ROS4HRI framework, a set of ROS nodes and tools to build interactive social robots.
We will use a set of pre-configured Docker containers to simplify the setup process.
We will also explore how a simple yet complete social robot architecture can be assembled using ROS 2, PAL Robotics’ toolset to quickly generate robot application templates, and an LLM backend.

PAL’s Social interaction simulator#
PART 0: Preparing your environment#
Pre-requisites#
To follow the tutorial ‘hands-on’, you will need to be able to run a Docker container on your machine, with access to an X server (to display graphical applications like rviz and rqt). We will also use the webcam of your computer.
Any recent Linux distribution should work, as well as MacOS (with XQuartz installed).
The tutorial also assumes that you have a basic understanding of ROS 2 concepts (topics, nodes, launch files, etc.). If you are not familiar with ROS 2, you can check the official ROS 2 tutorials.
Get the public PAL tutorials Docker image#
Fetch the PAL tutorials
public Docker image:
docker pull palrobotics/public-tutorials-alum-devel:hri25
Then, run the container, with access to your webcam and your X server.
xhost +
mkdir ros4hri-exchange
docker run -it --name ros4hri \
--device /dev/video0:/dev/video0 \
-e DISPLAY=$DISPLAY \
-v /tmp/.X11-unix:/tmp/.X11-unix \
-v `pwd`/ros4hri-exchange:/home/user/exchange \
--net=host \
palrobotics/public-tutorials-alum-devel:hri25 bash
Note
The --device option is used to pass the webcam to the container, and the -e DISPLAY=$DISPLAY and -v /tmp/.X11-unix:/tmp/.X11-unix options are used to display graphical applications on your screen.
PART 1: Warm-up with face detection#
Start the webcam node#
First, let’s start a webcam node to publish images from the webcam to ROS.
In the terminal, type:
ros2 run gscam gscam_node --ros-args -p gscam_config:='v4l2src device=/dev/video0 ! video/x-raw,framerate=30/1 ! videoconvert' \
-p use_sensor_data_qos:=True \
-p camera_name:=camera \
-p frame_id:=camera \
-p camera_info_url:=package://interaction_sim/config/camera_info.yaml
Note
The gscam node is a ROS 2 node that captures images from a webcam and publishes them on a ROS topic. The gscam_config parameter is used to specify the webcam device to use (/dev/video0), and the camera_info_url parameter is used to specify the camera calibration file. We use a default calibration file that works reasonably well with most webcams.
You can open rqt to check that the images are indeed published:
rqt
Note
If you need to open another Docker terminal, run
docker exec -it -u user ros4hri bash
Then, in the Plugins menu, select Visualization > Image View, and choose the topic /camera/image_raw:

rqt image view#
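You can also run a quick sanity check from the command line (the reported rate depends on your webcam; with the gscam configuration above it should be around 30 Hz):
ros2 topic list | grep camera
ros2 topic hz /camera/image_raw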
Face detection#
hri_face_detect is an open-source ROS 1/ROS 2 node, compatible with ROS4HRI, that detects faces in images. This node is installed by default on all PAL robots.
It is already installed in the Docker container.
By default, hri_face_detect expects images on the /image topic. Before starting the node, we need to configure topic remapping:
mkdir -p $HOME/.pal/config
nano $HOME/.pal/config/ros4hri-tutorials.yml
Then, paste the following content:
/hri_face_detect:
  remappings:
    image: /camera/image_raw
    camera_info: /camera/camera_info
Press Ctrl+O to save, then Ctrl+X to exit.
Then, you can launch the node:
ros2 launch hri_face_detect face_detect.launch.py
You should see on your console which configuration files are used:
$ ros2 launch hri_face_detect face_detect.launch.py
[INFO] [launch]: All log files can be found below /home/user/.ros/log/2024-10-16-12-39-10-518981-536d911a0c9c-203
[INFO] [launch]: Default logging verbosity is set to INFO
[INFO] [launch.user]: Loaded configuration for <hri_face_detect>:
- System configuration (from lower to higher precedence):
- /opt/pal/alum/share/hri_face_detect/config/00-defaults.yml
- User overrides (from lower to higher precedence):
- /home/user/.pal/config/ros4hri-tutorials.yml
[INFO] [launch.user]: Parameters:
- processing_rate: 30
- confidence_threshold: 0.75
- image_scale: 0.5
- face_mesh: True
- filtering_frame: camera_color_optical_frame
- deterministic_ids: False
- debug: False
[INFO] [launch.user]: Remappings:
- image -> /camera/image_raw
- camera_info -> /camera/camera_info
[INFO] [face_detect-1]: process started with pid [214]
...
Note
This way of managing launch parameters and remappings is not part of base ROS 2: it is an extension (available in ROS 2 Humble) provided by PAL Robotics to simplify the management of ROS 2 node configuration.
See for instance the launch file of hri_face_detect to understand how it is used.
You should immediately see on the console that some faces are indeed detected.
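You can also inspect the ROS4HRI output directly from the command line; the list of currently tracked faces should be published on the /humans/faces/tracked topic:
ros2 topic echo /humans/faces/tracked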
Let’s visualise them. Start rviz2:
rviz2
In rviz, visualize the detected faces by adding the Humans plugin, which you can find in the hri_rviz plugins group. The plugin setup requires you to specify the image stream you want to use to visualize the detection results, in this case /camera/image_raw. You can also find the plugin as one of those available for the /camera/image_raw topic.
Important
Set the quality of service (QoS) of the /camera/image_raw topic to Best Effort, otherwise no image will be displayed:

Set the QoS of the /camera/image_raw topic to Best Effort#
In rviz, enable as well the tf plugin, and set the fixed frame to camera. You should now see a 3D frame, representing the position and orientation of your face.

rviz showing a 3D face frame#
📚 Learn more
This tutorial does not go much further with exploring the ROS4HRI tools and nodes. However, you can find more information:
in the 👥 Social perception section of this documentation
in the ROS4HRI wiki page
You can also check the ROS4HRI Github organisation and the original paper.
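If you want to consume the ROS4HRI data from your own nodes, here is a minimal rclpy sketch (assuming the hri_msgs package available in the Docker image) that prints the IDs of the currently tracked faces:
import rclpy
from rclpy.node import Node
from rclpy.qos import qos_profile_sensor_data
from hri_msgs.msg import IdsList  # ROS4HRI: list of currently tracked IDs


class FaceLister(Node):
    def __init__(self):
        super().__init__("face_lister")
        # /humans/faces/tracked lists the IDs of the faces currently tracked by hri_face_detect;
        # a best-effort subscription is compatible with either publisher QoS
        self.create_subscription(IdsList, "/humans/faces/tracked",
                                 self.on_faces, qos_profile_sensor_data)

    def on_faces(self, msg):
        self.get_logger().info(f"currently tracked faces: {list(msg.ids)}")


if __name__ == "__main__":
    rclpy.init()
    rclpy.spin(FaceLister())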
PART 4: Integration with LLMs#
Adding a chatbot#
Step 1: creating a chatbot#
use rpk to create a new chatbot skill using the basic chatbot intent extraction template:
$ rpk create -p src intent
ID of your application? (must be a valid ROS identifier without spaces or hyphens. eg 'robot_receptionist')
chatbot
Full name of your skill/application? (eg 'The Receptionist Robot' or 'Database connector', press Return to use the ID. You can change it later)
Choose a template:
1: basic chatbot template [python]
2: complete intent extraction example: LLM bridge using the OpenAI API (ollama, chatgpt) [python]
Your choice? 1
What robot are you targeting?
1: Generic robot (generic)
2: Generic PAL robot/simulator (generic-pal)
3: PAL ARI (ari)
4: PAL TIAGo (tiago)
5: PAL TIAGo Pro (tiago-pro)
6: PAL TIAGo Head (tiago-head)
Your choice? (default: 1: generic) 2
Compile and run the chatbot:
colcon build
source install/setup.bash
ros2 launch chatbot chatbot.launch.py
If you now type a message in the rqt_chat plugin, you should see the chatbot responding to it:

Chatbot responding to a message#
You can also see in the chat window the intents that the chatbot has identified in the user input. For now, our basic chatbot only recognises the __intent_greet__ intent when you type Hi or Hello.
Step 2: integrating the chatbot with the mission controller#
To fully understand the intent pipeline, we will modify the chatbot to recognise a ‘pick up’ intent, and the mission controller to handle it.
open chatbot/node_impl.py and modify your chatbot to check whether the incoming speech matches [please] pick up [the] <object>:
# add to the imports at the top of node_impl.py
import re

# add this method to your chatbot class (it is called below as self.contains_pickup)
def contains_pickup(self, sentence):
    sentence = sentence.lower()

    # matches sentences like: [please] pick up [the] <object>, and returns <object>
    pattern = r"(?:please\s+)?pick\s+up\s+(?:the\s+)?(\w+)"
    match = re.search(pattern, sentence)
    if match:
        return match.group(1)
    return None
then, in the on_get_response function, check whether the incoming speech matches the pattern and, if so, return a __intent_grab_object__:
def on_get_response(self, request, response):

    # ...

    pick_up_object = self.contains_pickup(input)
    if pick_up_object:
        self.get_logger().warn(f"I think the user wants to pick up a {pick_up_object}. Sending a GRAB_OBJECT intent")
        intent = Intent(intent=Intent.GRAB_OBJECT,
                        data=json.dumps({"object": pick_up_object}),
                        source=user_id,
                        modality=Intent.MODALITY_SPEECH,
                        confidence=.8)
        suggested_response = f"Sure, let me pick up this {pick_up_object}"
    # elif ...
Note
the Intent message is defined in the hri_actions_msgs package, and contains the intent, the data associated with the intent, the source of the intent (here, the current user_id), the modality (here, speech), and the confidence of the recognition.
Check the Intents documentation for details, or directly the Intent.msg definition.
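On the receiving side, the data field arrives as a JSON-encoded string. Here is a minimal sketch of how a mission controller intent callback (named on_intent in the template used later in this tutorial) could decode it:
import json
from hri_actions_msgs.msg import Intent

def on_intent(self, msg: Intent):
    if msg.intent == Intent.GRAB_OBJECT:
        # the data field is a JSON-encoded dictionary of thematic roles
        data = json.loads(msg.data) if msg.data else {}
        self.get_logger().info(
            f"asked to grab '{data.get('object')}' "
            f"(source: {msg.source}, confidence: {msg.confidence})")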
Test your updated chatbot by recompiling the workspace (colcon build) and relaunching the chatbot.
If you now type pick up the cup in the chat window, you should see the chatbot recognising the intent and sending a GRAB_OBJECT intent to the mission controller.
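If you want to double-check what the chatbot emits, you can watch the intents as they reach the rest of the architecture. Assuming the intents are forwarded on the /intents topic (check ros2 topic list if unsure), you could run:
ros2 topic echo /intents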
finally, modify the mission controller function handling inbound intents, in order to manage the GRAB_OBJECT intent. Open the mission controller implementation and update on_intent:

def on_intent(self, msg):
    # ...

    if msg.intent == Intent.GRAB_OBJECT:
        # on a real robot, you would call here a manipulation skill
        goal = TTS.Goal()
        goal.input = f"<set expression(tired)> That {data['object']} is really heavy...! <set expression(neutral)>"
        self.tts.send_goal_async(goal)

    # ...
Re-compile and re-run the mission controller. If you now type pick up the cup in the chat window, you should see the mission controller reacting to it.
📚 Learn more
In this example, we directly use the /say skill to respond to the user.
When developing a full application, you usually want to split your architecture into multiple nodes, each responsible for a specific task.
The PAL application model, based on the RobMoSys methodology, encourages the development of a single mission controller, and a series of tasks and skills that are orchestrated by the mission controller.
You can read more about this model here: 📝 Developing robot apps.
Integrating with a Large Language Model (LLM)#
Next, let’s integrate with an LLM.
Step 1: install ollama#
ollama is an open-source tool that provides a simple REST API to interact with a variety of LLMs. It makes it easy to install different LLMs, and to call them using the same REST API as, e.g., OpenAI’s ChatGPT.
To install ollama on your machine, follow the instructions on the official repository:
curl -fsSL https://ollama.com/install.sh | sh
Once it is installed, you can start the ollama server with:
ollama serve
Open a new Docker terminal, and run the following command to download a first model and check it works:
ollama run llama3.2:1b
Note
Visit the ollama model page to see the list of available models.
Depending on the size of the model and your computer configuration, the response time can be quite long.
If you have an NVIDIA GPU, you might want to relaunch your Docker container with GPU support. Check the instructions on the NVIDIA website.
Alternatively, you can run ollama on your host machine, as we will interact with it via a REST API.
Step 2: calling ollama from the chatbot#
ollama can be accessed from your code either by calling the REST API directly, or by using the ollama Python binding. While the REST API is more flexible (and makes it possible to easily use other OpenAI-compatible services, like ChatGPT), the Python binding is very easy to use.
Note
If you are curious about the REST API, use the rpk LLM chatbot template to generate an example of a chatbot that calls ollama via the REST API.
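As a quick sanity check of the REST endpoint (assuming ollama listens on its default port, 11434), you can also query it directly with curl:
curl http://localhost:11434/api/chat -d '{"model": "llama3.2:1b", "messages": [{"role": "user", "content": "Hello!"}], "stream": false}'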
install the ollama Python binding inside your Docker image:
pip install ollama
Modify your chatbot to connect to ollama, using a custom prompt. Open chatbot/chatbot/node_impl.py and make the following changes:
# add to the imports
from ollama import Client

# ...

class IntentExtractorImpl(Node):

    # modify the constructor:
    def __init__(self) -> None:
        # ...

        self._ollama_client = Client()
        # if ollama does not run on the local host, you can specify the host and
        # port. For instance:
        # self._ollama_client = Client("x.x.x.x:11434")

        # dialogue history
        self.messages = [
            {"role": "system",
             "content": """
                You are a helpful robot, always eager to help.
                You always respond with concise and to-the-point answers.
                """
             }]

    # modify on_get_response:
    def on_get_response(self, request: GetResponse.Request, response: GetResponse.Response):

        user_id = request.user_id
        input = request.input

        self.get_logger().info(
            f"new input from {user_id}: {input}... sending it to the LLM")
        self._nb_requests += 1

        self.messages.append({"role": "user", "content": input})

        llm_res = self._ollama_client.chat(
            messages=self.messages,
            model="llama3.2:1b"
        )

        content = llm_res.message.content

        self.get_logger().info(f"The LLM answered: {content}")

        self.messages.append({"role": "assistant", "content": content})

        response.response = content
        response.intents = []

        return response
As you can see, calling ollama is as simple as creating a Client object and calling its chat method with the messages to send to the LLM and the model to use.
In this example, we append to the chat history (self.messages) the user input and the LLM response after each interaction, thus building a complete dialogue.
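Keep in mind that this history grows with every exchange, while small models have a limited context window. One possible refinement (not part of the generated template, just a sketch) is to cap the history while preserving the system prompt:
MAX_HISTORY = 20  # arbitrary: number of recent messages to keep besides the system prompt

def trim_history(self):
    # keep the system prompt (first message) plus the most recent messages
    if len(self.messages) > MAX_HISTORY + 1:
        self.messages = [self.messages[0]] + self.messages[-MAX_HISTORY:]
You could then call self.trim_history() just before each call to self._ollama_client.chat(...).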
Recompile and restart the chatbot. If you now type a message in the chat window, you should see the chatbot responding with a text generated by the LLM:

Example of a chatbot response generated by an LLM#
Attention
Depending on the LLM model you use, the response time can be quite long. By default, after 10s, communication_hub will time out. In that case, the chatbot answer will not be displayed in the chat window.
Step 3: extract user intents#
To recognise intents from the LLM response, we can use a combination of prompt engineering and LLM structured output.
to generate structured output (i.e., a JSON-structured response that includes the recognised intents), we first need to define a Python model that corresponds to the expected output of the LLM:
from pydantic import BaseModel
from typing import Literal
from hri_actions_msgs.msg import Intent

# Define the data models for the chatbot response and the user intent
class IntentModel(BaseModel):
    type: Literal[Intent.BRING_OBJECT,
                  Intent.GRAB_OBJECT,
                  Intent.PLACE_OBJECT,
                  Intent.GUIDE,
                  Intent.MOVE_TO,
                  Intent.SAY,
                  Intent.GREET,
                  Intent.START_ACTIVITY,
                  ]
    object: str | None
    recipient: str | None
    input: str | None
    goal: str | None

class ChatbotResponse(BaseModel):
    verbal_ack: str | None
    user_intent: IntentModel | None
Here, we use the BaseModel type from the pydantic library so that we can generate the formal model corresponding to this Python object (using the JSON schema specification).
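If you are curious about what the LLM is constrained to, you can inspect the generated schema and test the validation from a Python interpreter (assuming the two classes above are defined in the session; the sample response below is hand-written for illustration):
import json

# print the JSON schema that will be passed to ollama's format parameter
print(json.dumps(ChatbotResponse.model_json_schema(), indent=2))

# validate a hand-written response, as the LLM would produce it
sample = '{"verbal_ack": "Sure", "user_intent": {"type": "__intent_grab_object__", "object": "apple1", "recipient": null, "input": null, "goal": null}}'
print(ChatbotResponse.model_validate_json(sample))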
then, modify the chatbot to force the LLM to return a JSON-structured response that includes the recognised intents:
    # ...

    def on_get_response(self, request: GetResponse.Request, response: GetResponse.Response):

        user_id = request.user_id
        input = request.input

        self.get_logger().info(
            f"new input from {user_id}: {input}... sending it to the LLM")
        self._nb_requests += 1

        self.messages.append({"role": "user", "content": input})

        llm_res = self._ollama_client.chat(
            messages=self.messages,
            model="llama3.2:1b",
            format=ChatbotResponse.model_json_schema()
        )

        json_res = ChatbotResponse.model_validate_json(llm_res.message.content)

        self.get_logger().info(f"The LLM answered: {json_res}")

        verbal_ack = json_res.verbal_ack
        if verbal_ack:
            # if we have a verbal acknowledgement, add it to the dialogue history,
            # and send it to the user
            self.messages.append({"role": "assistant", "content": verbal_ack})
            response.response = verbal_ack

        user_intent = json_res.user_intent
        if user_intent:
            response.intents = [Intent(
                intent=user_intent.type,
                data=json.dumps(user_intent.model_dump())
            )]

        return response
Now, the LLM will always return a JSON-structured response that includes an intent (if one was recognised), and a verbal acknowledgement. For instance, when asking the robot to bring an apple, it returns an intent PLACE_OBJECT with the object apple:

Example of a structured LLM response#
Step 4: prompt engineering to improve intent recognition#
To improve the intent recognition, we can use prompt engineering: we can provide the LLM with a prompt that will guide it towards generating a response that includes the intents we are interested in.
One key trick is to provide the LLM with examples of the intents we are interested in.
Here is an example of a longer prompt that would yield better results:
PROMPT = """
You are a friendly robot called $robot_name. You try to help the user to the best of your abilities.
You are always helpful, and ask further questions if the desires of the user are unclear.
Your answers are always polite yet concise and to-the-point.
Your aim is to extract the user goal.
Your response must be a JSON object with the following fields (both are optional):
- verbal_ack: a string acknowledging the user request (like 'Sure', 'I'm on it'...)
- user_intent: the user overall goal (intent), with the following fields:
- type: the type of intent to perform (e.g. "__intent_say__", "__intent_greet__", "__intent_start_activity__", etc.)
- any thematic role required by the intent. For instance: `object` to
relate the intent to the object to interact with (e.g. "lamp",
"door", etc.)
Importantly, `verbal_ack` is meant to be a *short* acknowledgement sentence,
unconditionally uttered by the robot, indicating that you have understood the request -- or that we need more information.
For more complex verbal actions, return a `__intent_say__` instead.
However, for answers to general questions that do not require any action
(eg: 'what is your name?'), the 'user_intent' field can be omitted, and the
'verbal_ack' field should contain the answer.
The user_id of the person you are talking to is $user_id. Always use this ID when referring to the person in your responses.
Examples
- if the user says 'Hello robot', you could respond:
{
"user_intent": {"type": "__intent_greet__", "recipient": "$user_id"}
}
- if the user says 'What is your name?', you could respond:
{
"verbal_ack":"My name is $robot_name. What is your name?"
}
- if the user say 'take a fruit', you could respond (assuming a object 'apple1' of type 'Apple' is visible):
{
"user_intent": {
"type":"__intent_grab_object__",
"object":"apple1",
},
"verbal_ack": "Sure"
}
- if the user say 'take a fruit', but you do not know about any fruit. You could respond:
{
"verbal_ack": "I haven't seen any fruits around. Do you want me to check in the kitchen?"
}
- the user says: 'clean the table'. You could return:
{
"user_intent": {
"type":"__intent_start_activity__",
"object": "cleaning_table"
},
"verbal_ack": "Sure, I'll get started"
}
If you are not sure about the intention of the user, return an empty user_intent and ask for confirmation with the verbal_ack field.
"""
This prompt uses Python’s templating system to include the robot’s name and the user’s ID in the prompt.
You can use this prompt in your script by substituting the variables with the actual values:
from string import Template
actual_prompt = Template(PROMPT).safe_substitute(robot_name="Robbie", user_id="Alice")
Then, you can use this prompt in the ollama call:
# ...
def __init__(self) -> None:
    # ...
    self.messages = [
        {"role": "system",
         "content": Template(PROMPT).safe_substitute(robot_name="Robbie", user_id="user1")
         }]
    # ...
Closing the loop: integrating LLM and symbolic knowledge representation#
Finally, we can use the knowledge base to improve the intent recognition.
For instance, if the user asks the robot to bring the apple, we can use the knowledge base to check whether an apple is in the field of view of the robot.
Note
It is often convenient to have a Python interpreter open to quickly test knowledge base queries.
Open ipython3 in a terminal from within your Docker image, and then:
from knowledge_core.api import KB; kb = KB()
kb["* sees *"] # etc.
First, let’s query the knowledge base for all the objects that are visible to the robot:
from knowledge_core.api import KB

# ...

def __init__(self) -> None:

    # ...

    self.kb = KB()


def environment(self) -> str:
    """ fetch all the objects and humans visible to the robot,
    get for each of them their class and label, and return a string
    that lists them all.
    """

    environment_description = ""

    seen_objects = self.kb["myself sees ?obj"]
    for obj in [item["obj"] for item in seen_objects]:
        details = self.kb.details(obj)
        label = details["label"]["default"]
        classes = details["attributes"][0]["values"]
        class_name = None
        if classes:
            class_name = classes[0]["label"]["default"]
            environment_description += f"- I see a {class_name} labeled {label}.\n"
        else:
            environment_description += f"- I see {label}.\n"

    self.get_logger().info(
        f"Environment description:\n{environment_description}")
    return environment_description
Note
The kb.details method returns a dictionary with details about a given knowledge concept. The attributes field contains e.g. the class of the object (if known or inferred by the knowledge base).
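For illustration only (the exact content depends on the loaded ontology), the structure accessed by the code above looks roughly like this:
# purely illustrative shape of kb.details("apple1"); actual fields depend on the ontology
{
    "label": {"default": "apple1"},
    "attributes": [
        {"values": [{"label": {"default": "Apple"}}]},
    ],
}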
📚 Learn more
To inspect the knowledge base in detail, we recommend using Protégé, an open-source tool to explore and modify ontologies.
The ontology used by the robot (and the interaction simulator) is stored in /opt/pal/alum/share/oro/ontologies/oro.owl. Copy this file to your ~/exchange folder to access it from your host and inspect it with Protégé.
We can then use this information to ground the user intents in the physical world of the robot.
First, add the following two lines at the end of your prompt template:
This is a description of the environment:
$environment
Then, add a new method to your chatbot to generate the prompt:
def __init__(self) -> None:

    # ...

    self.messages = [
        {"role": "system",
         "content": self.prepare_prompt("user1")
         }]

    # ...

def prepare_prompt(self, user_id: str) -> str:

    environment = self.environment()

    return Template(PROMPT).safe_substitute(robot_name="Robbie",
                                            environment=environment,
                                            user_id=user_id)
You could also call the environment method before each call to the LLM, to get the latest environment description.
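A minimal sketch of that variant, refreshing the system prompt at the beginning of on_get_response so that the LLM always sees the current scene:
def on_get_response(self, request, response):
    # refresh the system prompt (self.messages[0]) with the latest environment description
    self.messages[0] = {"role": "system",
                        "content": self.prepare_prompt(request.user_id)}

    # ... rest of the method unchanged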
Re-compile and restart your chatbot. You can now ask the robot e.g. what it sees.
The final chatbot code should look like:
1import json
2from ollama import Client
3
4from knowledge_core.api import KB
5
6from rclpy.lifecycle import Node
7from rclpy.lifecycle import State
8from rclpy.lifecycle import TransitionCallbackReturn
9from rcl_interfaces.msg import ParameterDescriptor
10from rclpy.action import ActionServer, GoalResponse
11
12from chatbot_msgs.srv import GetResponse, ResetModel
13from hri_actions_msgs.msg import Intent
14from i18n_msgs.action import SetLocale
15from i18n_msgs.srv import GetLocales
16
17from diagnostic_msgs.msg import DiagnosticArray, DiagnosticStatus, KeyValue
18
19from pydantic import BaseModel
20from typing import Literal
21from hri_actions_msgs.msg import Intent
22from string import Template
23
24PROMPT = """
25You are a friendly robot called $robot_name. You try to help the user to the best of your abilities.
26You are always helpful, and ask further questions if the desires of the user are unclear.
27Your answers are always polite yet concise and to-the-point.
28
29Your aim is to extract the user goal.
30
31Your response must be a JSON object with the following fields (both are optional):
32- verbal_ack: a string acknowledging the user request (like 'Sure', 'I'm on it'...)
33- user_intent: the user overall goal (intent), with the following fields:
34 - type: the type of intent to perform (e.g. "__intent_say__", "__intent_greet__", "__intent_start_activity__", etc.)
35 - any thematic role required by the intent. For instance: `object` to
36 relate the intent to the object to interact with (e.g. "lamp",
37 "door", etc.)
38
39Importantly, `verbal_ack` is meant to be a *short* acknowledgement sentence,
40unconditionally uttered by the robot, indicating that you have understood the request -- or that we need more information.
41For more complex verbal actions, return a `__intent_say__` instead.
42
43However, for answers to general questions that do not require any action
44(eg: 'what is your name?'), the 'user_intent' field can be omitted, and the
45'verbal_ack' field should contain the answer.
46
47The user_id of the person you are talking to is $user_id. Always use this ID when referring to the person in your responses.
48
49Examples
50- if the user says 'Hello robot', you could respond:
51{
52 "user_intent": {"type": "__intent_greet__", "recipient": "$user_id"}
53}
54
55- if the user says 'What is your name?', you could respond:
56{
57 "verbal_ack":"My name is $robot_name. What is your name?"
58}
59
60- if the user say 'take a fruit', you could respond (assuming a object 'apple1' of type 'Apple' is visible):
61{
62 "user_intent": {
63 "type":"__intent_grab_object__",
64 "object":"apple1",
65 },
66 "verbal_ack": "Sure"
67}
68
69- if the user say 'take a fruit', but you do not know about any fruit. You could respond:
70{
71 "verbal_ack": "I haven't seen any fruits around. Do you want me to check in the kitchen?"
72}
73
74- the user says: 'clean the table'. You could return:
75{
76 "user_intent": {
77 "type":"__intent_start_activity__",
78 "object": "cleaning_table"
79 },
80 "verbal_ack": "Sure, I'll get started"
81}
82
83If you are not sure about the intention of the user, return an empty user_intent and ask for confirmation with the verbal_ack field.
84
85This is a description of the environment:
86
87$environment
88"""
89
90
91# Define the data models for the chatbot response and the user intent
92class IntentModel(BaseModel):
93 type: Literal[Intent.BRING_OBJECT,
94 Intent.GRAB_OBJECT,
95 Intent.PLACE_OBJECT,
96 Intent.GUIDE,
97 Intent.MOVE_TO,
98 Intent.SAY,
99 Intent.GREET,
100 Intent.START_ACTIVITY,
101 ]
102 object: str | None
103 recipient: str | None
104 input: str | None
105 goal: str | None
106
107
108class ChatbotResponse(BaseModel):
109 verbal_ack: str | None
110 user_intent: IntentModel | None
111##################################################
112
113class IntentExtractorImpl(Node):
114
115 def __init__(self) -> None:
116 super().__init__('intent_extractor_chatbot')
117
118 # Declare ROS parameters. Should mimick the one listed in config/00-defaults.yaml
119 self.declare_parameter(
120 'my_parameter', "my_default_value.",
121 ParameterDescriptor(
122 description='Important parameter for my chatbot')
123 )
124
125 self.get_logger().info("Initialising...")
126
127 self._get_response_srv = None
128 self._reset_srv = None
129 self._get_supported_locales_server = None
130 self._set_default_locale_server = None
131
132 self._timer = None
133 self._diag_pub = None
134 self._diag_timer = None
135
136 self.kb = KB()
137
138 self._nb_requests = 0
139
140 self._ollama_client = Client()
141 # if ollama does not run on the local host, you can specify the host and
142 # port. For instance:
143 # self._ollama_client = Client("x.x.x.x:11434")
144
145 self.messages = [
146 {"role": "system",
147 "content": self.prepare_prompt("user1")
148 }]
149
150 self.get_logger().info('Chatbot chatbot started, but not yet configured.')
151
152 def environment(self) -> str:
153 environment_description = ""
154
155 seen_objects = self.kb["myself sees ?obj"]
156 for obj in [item["obj"] for item in seen_objects]:
157 details = self.kb.details(obj)
158 label = details["label"]["default"]
159 classes = details["attributes"][0]["values"]
160 class_name = None
161 if classes:
162 class_name = classes[0]["label"]["default"]
163 environment_description += f"- I see a {class_name} labeled {label}.\n"
164 else:
165 environment_description += f"- I see {label}.\n"
166
167 self.get_logger().info(
168 f"Environment description:\n{environment_description}")
169 return environment_description
170
171 def prepare_prompt(self, user_id: str) -> str:
172
173 environment = self.environment()
174
175 return Template(PROMPT).safe_substitute(robot_name="Robbie",
176 environment=environment,
177 user_id=user_id)
178
179 def on_get_response(self, request: GetResponse.Request, response: GetResponse.Response):
180
181 user_id = request.user_id
182 input = request.input
183
184 self.get_logger().info(
185 f"new input from {user_id}: {input}... sending it to the LLM")
186 self._nb_requests += 1
187
188 self.messages.append({"role": "user", "content": input})
189
190 llm_res = self._ollama_client.chat(
191 messages=self.messages,
192 # model="llama3.2:1b",
193 model="phi4",
194 format=ChatbotResponse.model_json_schema()
195 )
196
197 json_res = ChatbotResponse.model_validate_json(llm_res.message.content)
198
199 self.get_logger().info(f"The LLM answered: {json_res}")
200
201 verbal_ack = json_res.verbal_ack
202 if verbal_ack:
203 self.messages.append({"role": "assistant", "content": verbal_ack})
204 response.response = verbal_ack
205
206 user_intent = json_res.user_intent
207 if user_intent:
208 response.intents = [Intent(
209 intent=user_intent.type,
210 data=json.dumps(user_intent.model_dump())
211 )]
212
213 return response
214
215 def on_reset(self, request: ResetModel.Request, response: ResetModel.Response):
216 self.get_logger().info('Received reset request. Not implemented yet.')
217 return response
218
219 def on_get_supported_locales(self, request, response):
220 response.locales = [] # list of supported locales; empty means any
221 return response
222
223 def on_set_default_locale_goal(self, goal_request):
224 return GoalResponse.ACCEPT
225
226 def on_set_default_locale_exec(self, goal_handle):
227 """Change here the default locale of the chatbot."""
228 result = SetLocale.Result()
229 goal_handle.succeed()
230 return result
231
232 #################################
233 #
234 # Lifecycle transitions callbacks
235 #
236 def on_configure(self, state: State) -> TransitionCallbackReturn:
237
238 # configure and start diagnostics publishing
239 self._nb_requests = 0
240 self._diag_pub = self.create_publisher(
241 DiagnosticArray, '/diagnostics', 1)
242 self._diag_timer = self.create_timer(1., self.publish_diagnostics)
243
244 # start advertising supported locales
245 self._get_supported_locales_server = self.create_service(
246 GetLocales, "~/get_supported_locales", self.on_get_supported_locales)
247
248 self._set_default_locale_server = ActionServer(
249 self, SetLocale, "~/set_default_locale",
250 goal_callback=self.on_set_default_locale_goal,
251 execute_callback=self.on_set_default_locale_exec)
252
253 self.get_logger().info("Chatbot chatbot is configured, but not yet active")
254 return TransitionCallbackReturn.SUCCESS
255
256 def on_activate(self, state: State) -> TransitionCallbackReturn:
257 """
258 Activate the node.
259
260 You usually want to do the following in this state:
261 - Create and start any timers performing periodic tasks
262 - Start processing data, and accepting action goals, if any
263
264 """
265 self._get_response_srv = self.create_service(
266 GetResponse, '/chatbot/get_response', self.on_get_response)
267 self._reset_srv = self.create_service(
268 ResetModel, '/chatbot/reset', self.on_reset)
269
270 # Define a timer that fires every second to call the run function
271 timer_period = 1 # in sec
272 self._timer = self.create_timer(timer_period, self.run)
273
274 self.get_logger().info("Chatbot chatbot is active and running")
275 return super().on_activate(state)
276
277 def on_deactivate(self, state: State) -> TransitionCallbackReturn:
278 """Stop the timer to stop calling the `run` function (main task of your application)."""
279 self.get_logger().info("Stopping chatbot...")
280
281 self.destroy_timer(self._timer)
282 self.destroy_service(self._get_response_srv)
283 self.destroy_service(self._reset_srv)
284
285 self.get_logger().info("Chatbot chatbot is stopped (inactive)")
286 return super().on_deactivate(state)
287
288 def on_shutdown(self, state: State) -> TransitionCallbackReturn:
289 """
290 Shutdown the node, after a shutting-down transition is requested.
291
292 :return: The state machine either invokes a transition to the
293 "finalized" state or stays in the current state depending on the
294 return value.
295 TransitionCallbackReturn.SUCCESS transitions to "finalized".
296 TransitionCallbackReturn.FAILURE remains in current state.
297 TransitionCallbackReturn.ERROR or any uncaught exceptions to
298 "errorprocessing"
299 """
300 self.get_logger().info('Shutting down chatbot node.')
301 self.destroy_timer(self._diag_timer)
302 self.destroy_publisher(self._diag_pub)
303
304 self.destroy_service(self._get_supported_locales_server)
305 self._set_default_locale_server.destroy()
306
307 self.get_logger().info("Chatbot chatbot finalized.")
308 return TransitionCallbackReturn.SUCCESS
309
310 #################################
311
312 def publish_diagnostics(self):
313
314 arr = DiagnosticArray()
315 msg = DiagnosticStatus(
316 level=DiagnosticStatus.OK,
317 name="/intent_extractor_chatbot",
318 message="chatbot chatbot is running",
319 values=[
320 KeyValue(key="Module name", value="chatbot"),
321 KeyValue(key="Current lifecycle state",
322 value=self._state_machine.current_state[1]),
323 KeyValue(key="# requests since start",
324 value=str(self._nb_requests)),
325 ],
326 )
327
328 arr.header.stamp = self.get_clock().now().to_msg()
329 arr.status = [msg]
330 self._diag_pub.publish(arr)
331
332 def run(self) -> None:
333 """
334 Background task of the chatbot.
335
336 For now, we do not need to do anything here, as the chatbot is
337 event-driven, and the `on_user_input` callback is called when a new
338 user input is received.
339 """
340 pass
Next steps#
Interaction simulator architecture#
We have completed a simple social robot architecture, with a mission controller that can react to user intents, and a chatbot that can extract intents from the user’s input.
You can now:
Extend the mission controller: add more intents, more complex behaviours, etc;
Structure your application: split your mission controller into tasks and skills, and orchestrate them: 📝 Developing robot apps;
Integrate navigation and manipulation, using the corresponding navigation skills and manipulation skills
Finally, deploy your application on your robot: Deploying ROS 2 packages on your robot.
See also#
📝 Developing robot apps: learn more about the PAL application model
Intents: learn more about the intent messages
PAL Interaction simulator: learn more about the interaction simulator
💡 Knowledge and reasoning: learn more about the knowledge base and reasoning
return to the 🎯 Tutorials main page