AI and ICT

Last modified by Vladimir Rullens on 2025/11/09 18:29

Our robot is expected to communicate with its patients when not dancing, ideally in a way tailored to each patient. Its responses should not be static, so it feels more human. An LLM is ideal for this. The LLM will keep the following in mind:

- The patient it is talking to → For this, it needs an understanding of its current location, which is handled by the robot itself.

- The music and dances the patient likes → For this, the LLM should be able to activate, remember, and play the tunes the patient likes.

- Any other activities that need to be done → For this, speech should be the method alongside a task list.

- Due to the above, a separate text-to-speech program is needed to voice the robot's responses, while a speech-to-text program allows the patient to communicate with it using their own voice.

- To allow for the above, a microphone and speaker should be included with the robot. As will be described in the Social Robot section, this is already handled.

- A general idea of the patient's personality → For this, it needs experience with the patient. Prior knowledge given by caregivers or family members could also be useful. Through knowledge of the patient's personality, the robot can develop a sense for the Human Factor of 'interaction fluency'.
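To make the points above concrete, the per-patient context could be modelled as a small data structure that is turned into a system prompt for the LLM. This is only a sketch; all names (`PatientProfile`, `build_system_prompt`, the fields) are hypothetical, not an existing API:

```python
from dataclasses import dataclass, field

@dataclass
class PatientProfile:
    """Hypothetical per-patient context the LLM keeps in mind."""
    name: str
    room: str                                               # current location, reported by the robot
    liked_tunes: list = field(default_factory=list)         # music the robot can play on request
    task_list: list = field(default_factory=list)           # pending activities, driven by speech
    personality_notes: list = field(default_factory=list)   # from experience, caregivers, or family

def build_system_prompt(p: PatientProfile) -> str:
    """Condense the profile into a system prompt for the local LLM."""
    return (
        f"You are talking to {p.name} in {p.room}. "
        f"Favourite music: {', '.join(p.liked_tunes) or 'unknown'}. "
        f"Open tasks: {', '.join(p.task_list) or 'none'}. "
        f"Personality notes: {'; '.join(p.personality_notes) or 'none yet'}."
    )
```

The profile would be updated over time (new tunes, completed tasks, fresh personality notes), so the prompt always reflects the robot's current experience with the patient.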

Risks and mitigations

With the above-mentioned systems, there are a few concerns that should be kept in mind, with the focus being on privacy:

- If the LLM used is connected to a third party (such as OpenAI), patient data may end up on their servers, which would be a liability. As such, the LLM should be stored and run locally on the robot itself, so privacy concerns can be handled more effectively. With this in mind, the LLM should operate only with transcripts from the past 3 months; any data older than this should be removed from its context window. Note, however, that a local LLM may be weaker than a public one.

- The text-to-speech and speech-to-text programs should be executed locally, without storing any data, for the sake of privacy. While transcripts may still be kept for the LLM's context, this reduces the number of potential security risks.

- With voice-based interaction, the patient may constantly be "heard" by the robot, even when not interacting with it. To make sure sensitive information is not overheard, there should be a standby mode, which activates whenever the robot is not in use. While in standby mode, the robot will not "listen" to the patient, and the LLM will receive no transcripts. There will also be a way to "wake up" the robot (e.g. by petting it, or whenever the robot wants to initiate a conversation).
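The two privacy rules above, the 3-month transcript retention and the standby gate on listening, could be sketched as follows. This is a minimal illustration, assuming transcripts are kept as `(timestamp, text)` pairs; the `Robot` class and its method names are hypothetical:

```python
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)  # "past 3 months" retention rule

def prune_transcripts(transcripts, now=None):
    """Drop transcript entries older than 3 months before they
    reach the LLM's context window."""
    now = now or datetime.now(timezone.utc)
    return [(ts, text) for ts, text in transcripts if now - ts <= RETENTION]

class Robot:
    """Gates the microphone behind a standby mode."""

    def __init__(self):
        self.standby = True  # start in standby: not listening

    def wake(self):
        """Leave standby, e.g. when petted or when the robot
        itself wants to start a conversation."""
        self.standby = False

    def sleep(self):
        self.standby = True

    def on_audio(self, text, transcripts, ts):
        """Record a speech-to-text result. While in standby,
        audio is discarded and nothing reaches the LLM."""
        if not self.standby:
            transcripts.append((ts, text))
```

Running `prune_transcripts` whenever the context is assembled (rather than on a timer) keeps the retention rule enforced at the only point where old data could leak into the LLM.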