Speech synthesis (TTS)#
Overview of the technology#
Your robot can generate speech using a text-to-speech (TTS) engine. As of PAL OS 25.01, two backends are available: Acapela and a non-verbal backend. Additionally, the multi-modal expression markup language can be used to synchronize speech with other communication modalities, such as gestures or lights, and with other advanced features.
Acapela backend#
The default TTS backend is the proprietary speech synthesis engine from Acapela Group.
This engine is based on unit selection, the market-leading technology for synthetic voices, and produces highly natural speech in a formal style. Given an input text utterance, the system performs the phonetic transcription of the text, predicts the appropriate prosody for the utterance, and finally generates the signal waveform [1].
Every time a text utterance is sent to the TTS engine, it generates the corresponding waveform and plays it through the robot's speakers.
Non-verbal backend#
The non-verbal backend is a TTS engine that generates an ‘R2D2’-like non-verbal utterance. This utterance is deterministically generated from the input text: the same input text will always produce the same output.
Using non-verbal TTS is useful when you choose to design a robot persona that is less anthropomorphic. In particular, it typically reduces the expectation that the robot can understand and respond to arbitrary spoken language.
To enable the non-verbal TTS, the TTS node parameter non_verbal_mode must be set to true.
To set it temporarily, the following command can be executed from the command line:
ros2 param set /tts_engine non_verbal_mode true
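You can check the current value of the parameter with the standard ROS 2 command:
ros2 param get /tts_engine non_verbal_mode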
To persist the parameter setting through robot reboots, see the section Configuration files.
Multi-modal expression markup language#
The multi-modal expression markup language is a feature added on top of TTS synthesis. Using markups inserted in the text to be synthesized, it integrates the speech synthesis with other robot functionalities.
The full markup action format is <verb name(arguments) timeout>. The arguments and timeout are optional, and the minimal markup action format is <verb name>.
The verb must be one of:
set: ‘start and forget’ the action; useful when you do not need to know if/when the action is completed
start: start an action
wait: wait for a previously started action to finish (the first one found backwards with the same name)
stop: stop an on-going action (the first one found backwards with the same name)
do: equivalent to start immediately followed by wait (i.e., blocks until the action is completed)
Markup actions which are started (not set) and are neither waited for nor stopped explicitly are implicitly waited for at the end of the multi-modal expression.
The currently supported actions are:
motion(name): perform the name predefined motion
expression(name): set the name predefined facial expression
led_fixed(r,g,b): set the LEDs to a fixed color, with RGB values in the range [0, 255]
led_blink(r,g,b): set the LEDs to a blinking color, with RGB values in the range [0, 255]
The timeout specifies the maximum number of seconds to wait for the execution of a markup action.
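For illustration, here are a few more expressions (the motion, expression and color values are examples and must correspond to motions and expressions actually installed on your robot): the first one nods before speaking (do blocks until the motion completes), the second one blinks the LEDs red while speaking, and the third one waves only while the sentence is being spoken.
<do motion(nod)> I agree!
<set led_blink(255,0,0)> My battery is getting low.
<start motion(wave)> Goodbye! <stop motion(wave)>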
Using markup actions, one can synchronize the speech with a facial expression or a gesture.
For instance, the expression:
<set expression(happy)> <start motion(wave)> Hello! <wait motion(wave) timeout=1> <set expression(neutral)>
will make the robot say “Hello!” while waving and with a happy expression, wait until the waving motion is finished (or 1 second has passed since the motion start), and then return to a neutral expression.
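A multi-modal expression is embedded directly in the text sent to the TTS engine. Assuming the /say action interface described below, the expression above could for instance be sent from the command line as:
ros2 action send_goal /say tts_msgs/TTS "input: '<set expression(happy)> <start motion(wave)> Hello! <wait motion(wave) timeout=1> <set expression(neutral)>'
locale: ''
voice: ''"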
Text-to-Speech node#
Launching the node#
The system diagnostics described in section Web user interface allow you to check the status of the TTS service running on the robot. The /say action is provided by the communication_hub node. These services are started by default on start-up, so normally there is no need to start them manually. To start or stop them, the following commands can be executed in a terminal opened on the multimedia computer of the robot:
pal module start tts_engine
pal module start communication_hub
pal module stop tts_engine
pal module stop communication_hub
Action interface#
See /say
Examples of usage#
Web user interface#
Command line#
Goals to the action server can be sent from the command line by typing:
ros2 action send_goal /say tts_msgs/TTS "i
Then, by pressing Tab, the required message type will be auto-completed. The fields of the goal message can be edited to synthesize the desired sentence, as in the following example:
ros2 action send_goal /say tts_msgs/TTS "input: 'Hello world!'
locale: ''
voice: ''"
Note
The locale
field can be used to select a specific language.
If left empty, the current system language will be used.
The voice
field can be used to select a specific voice.
The list of available locales and voices is printed by the tts_engine node on startup.
You can check it by running the following terminal command:
pal module log tts_engine | head -n 50
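For instance, to explicitly request a given language (assuming the corresponding locale, e.g. en_US, appears in that list):
ros2 action send_goal /say tts_msgs/TTS "input: 'Good morning!'
locale: 'en_US'
voice: ''"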