Test
Evaluation is an iterative process in which the initial iterations examine whether the proposed idea works as intended. We therefore first want to understand how realistic and convincing the provided dialogues and suggested activities are, and whether they can prevent people from wandering. To examine this, we conduct a small pilot study with students who role-play having dementia. We then observe their interaction with Pepper to assess how effective our dialogue flow is at preventing people from wandering.
Problem statement and research questions
Goal: How effective are music and dialogue in preventing people with dementia (PwD) from wandering?
Research Questions (RQ):
- What percentage of people are prevented from going out unsupervised? (Quantitative) (CL01, CL05)
- How does the interaction change the participant's mood? (CL02, CL13)
- Can the robot respond appropriately to the participant's intention? (CL03)
- How do the participants react to the music? (CL04)
- Does the activity that the robot suggests prevent people from wandering/leaving? (CL06)
- Can Pepper identify and catch the attention of the PwD? (CL07)
Future research questions
- Does the interaction with Pepper make PwD come back to reality? (CL08)
- Does the interaction with Pepper make PwD feel he/she is losing freedom? (CL09)
- Does preventing the participant from going out alone make them feel dependent? (CL10)
Method
We will conduct a between-subjects study with students who play the role of having dementia. Data will be collected with a questionnaire that participants fill out before and after interacting with Pepper. The questionnaire captures different aspects of the conversation, along with the participant's mood before and after the interaction.
The independent variable is whether Pepper tries to distract users by suggesting activities accompanied by corresponding music. The dependent variable is how effectively participants are prevented from leaving the care home. We therefore developed two prototype designs:
Design X is the full interaction flow, in which Pepper suggests activities and uses music to distract people from leaving.
Design Y is the control condition, in which Pepper simply tries to stop people from leaving by physically keeping its hand on the door.
Participants
The ideal participants for our user study would have been people who have dementia. However, because people with dementia are a vulnerable group, testing with them would have been very difficult during the current pandemic. We therefore planned to conduct our experiments with students instead.
Our experiment involves 17 students who play the role of having dementia. They are divided into two groups: one group (11 participants) interacts with design X, while the other group (6 participants) interacts with design Y.
Experimental design
Before Experiment:
To prevent ambiguity, we will explain to the participants the goal of the experiment and what they need to do. Because our participants are students who are only playing the role of having dementia, we will also assign each of them a level of stubbornness/willpower with which they try to leave the care home.
Participants will also be given a reason to leave from the list below:
- going to the supermarket
- going to the office
- going for a walk
After this preparation, the participant fills out the first part of the questionnaire.
Experiment:
The participant begins interacting with Pepper, which is standing near the exit door. During the interaction, the robot tries to convince the participant to stay inside.
After Experiment:
After the participant finishes interacting with Pepper, he/she will be asked to fill out the rest of the questionnaire. Almost all the questions collect quantitative data on a 5-point Likert scale. The questionnaire also uses images from the Self-Assessment Manikin (SAM) so that participants can self-report their mood before and after their interaction with Pepper.
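The before/after mood comparison can be expressed as a per-participant change score. The snippet below is a minimal sketch, assuming each SAM rating is coded as an integer for valence and for arousal. The names `SamRating`, `mood_change`, and `mean_change` are hypothetical, and the example values are placeholders rather than study data.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class SamRating:
    """One SAM self-report: valence and arousal coded as integers."""
    valence: int
    arousal: int


def mood_change(before: SamRating, after: SamRating) -> tuple[int, int]:
    """Return (valence change, arousal change) as after minus before."""
    return after.valence - before.valence, after.arousal - before.arousal


def mean_change(pairs: list[tuple[SamRating, SamRating]]) -> tuple[float, float]:
    """Average the per-participant changes over one condition."""
    changes = [mood_change(before, after) for before, after in pairs]
    return mean(c[0] for c in changes), mean(c[1] for c in changes)


# Placeholder ratings for illustration only, not study data.
example = [
    (SamRating(3, 3), SamRating(4, 2)),
    (SamRating(2, 4), SamRating(2, 4)),
]
print(mean_change(example))
```

Computing the change per participant before averaging keeps each person's baseline mood out of the comparison between conditions.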
Material
The items required for this evaluation are the following:
- Pepper
- Door
- Caretaker in a nearby room in case of emergency
Results
RQ1: Are people convinced not to go out unsupervised?
We used a Likert scale for this question, with 1 being the lowest and 5 the highest. Participants who interacted with design X tended to agree to stay inside more than those who interacted with design Y.
RQ2: How does the interaction change the participant's mood?
We notice a positive, although negligible, change in valence with the full flow (design X). This could be because of the music. Valence does not decrease for the baseline (design Y), which might be due to the novelty effect of seeing Pepper for the first time. The change in arousal is nearly negligible in both conditions, possibly because the interaction with Pepper was very short. Additionally, for the full flow, these values might not have changed as expected (higher valence, lower arousal) because the music was not personalized for the participants.
RQ3: Can the robot respond appropriately to the participant's intention?
We notice only a very small difference between the full flow (design X) and the control condition (design Y). There may be several reasons for this. Pepper's speech recognition module did not cope well with different accents and therefore misunderstood words in some cases. The null hypothesis is that perceived message understanding is equal in both conditions. Given the p-value, the null hypothesis cannot be rejected. High variance in the data and the small sample size could explain the non-significant result.
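The report does not name the statistical test that was used. As an illustration of how a null hypothesis of equal scores can be checked for two small, unequal groups (11 and 6 participants), the following is a minimal sketch of a two-sided permutation test on the difference of group means. The function name `permutation_test`, its parameters, and the example scores are assumptions for illustration only and are not study data.

```python
import random
from statistics import mean


def permutation_test(group_x, group_y, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the difference of group means.

    Returns an estimated p-value under the null hypothesis that both
    groups come from the same distribution.
    """
    rng = random.Random(seed)
    observed = abs(mean(group_x) - mean(group_y))
    pooled = list(group_x) + list(group_y)
    n_x = len(group_x)
    extreme = 0
    for _ in range(n_resamples):
        # Reassign participants to the two conditions at random.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_x]) - mean(pooled[n_x:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (extreme + 1) / (n_resamples + 1)


# Placeholder Likert scores for illustration only, not study data.
design_x = [4, 3, 5, 4, 2, 4, 3, 5, 4, 3, 4]  # 11 participants
design_y = [3, 4, 2, 3, 4, 3]                 # 6 participants
print(f"p = {permutation_test(design_x, design_y):.3f}")
```

A permutation test makes no normality assumption, which is convenient for ordinal Likert data and small samples. It does not, however, compensate for the low statistical power of a 17-person study.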
RQ4: How do the participants react to the music?
We found that participants who knew the songs enjoyed the music more, and thought it fit the situation better, than those who did not know the songs.
RQ5: Does the activity that the robot suggests prevent people from wandering/ leaving?
These results suggest that participants who like the suggested activity are more likely to stay in; that is, interest in the activity is positively associated with staying in. We expect the score to increase further after personalization.
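One way to quantify the association between interest in the activity and willingness to stay is a rank correlation, which suits ordinal Likert responses. The snippet below is a minimal sketch of Spearman's rho computed from average ranks. It is not necessarily the analysis used in this study, and the helper names `average_ranks` and `spearman_rho` are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks.

    Raises ZeroDivisionError if either input has no variation.
    """
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

Here `xs` would hold each participant's interest rating for the suggested activity and `ys` the corresponding willingness-to-stay rating. Values near 1 indicate that higher interest goes with higher willingness to stay.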
RQ6: Can Pepper identify and catch the attention of the PwD?
We find that the co-presence values are very similar for both conditions. This may be attributed to the novelty effect and to the fact that the face recognition module is the same in both conditions.
The values for attention allocation are also similar, but the control condition (design Y) has a higher value. We suspect that people start to lose focus during the longer conversations of the full flow. Apart from co-presence, none of the observations are statistically significant, owing to the high variance in the limited number of responses.
Reliability Scores
We achieved a high Cronbach's alpha (> 0.6) for almost all sections of our analysis, which supports the reliability of our evaluation.
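For reference, Cronbach's alpha for a questionnaire section with k items is k / (k - 1) multiplied by (1 - sum of the item variances / variance of the participants' total scores). The snippet below is a minimal sketch of that standard formula. The function name `cronbach_alpha` and the input layout (one row per participant, one column per item) are assumptions, not the tooling used in this study.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for one questionnaire section.

    `responses` holds one row per participant and one column per item.
    """
    k = len(responses[0])
    if k < 2 or len(responses) < 2:
        raise ValueError("need at least 2 items and 2 participants")
    # Sample variance of each item across participants.
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each participant's summed score.
    total_var = variance([sum(row) for row in responses])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)
```

Alpha approaches 1 when the items in a section vary together across participants. The function raises `ZeroDivisionError` if every participant has the same total score, because alpha is undefined in that case.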
Limitations
- Lab Environment: Because the lab environment differs from a care home, participants found it difficult to act on the suggestions made by Pepper. For example, when Pepper asked someone to visit the living room, participants were confused about what to do next.
- Role-Playing: The participants in the experiment are not actual people with dementia. It is therefore naturally difficult for them to enact the situations and replicate the mental state of a person with dementia.
- Speech Recognition: The speech recognition module inside Pepper is not perfect. In certain cases, Pepper misinterpreted words spoken by the participants and triggered the wrong dialogue flow. The problems commonly occurred with words that sound similar, such as "work" and "walk". Moreover, hardware limitations hampered the speech recognition system. One prominent issue is that Pepper's microphone is only active when the speaker is turned off; a blue light in Pepper's eyes indicates that the microphone is active. Since most participants were not used to interacting with Pepper, they found it difficult to keep this limitation in mind while trying to have a natural conversation.
- Face Detection: The face recognition module within Pepper is also rudimentary. It cannot detect partially visible faces, for example when participants approach from the side. In addition, the lighting in the lab was not sufficient for the face recognition module to work reliably. As a result, Pepper failed to notice the participant in some cases and did not start the dialogue flow.
Conclusions
- People who liked the activity tended to stay in
- People who knew the music found it more fitting
- People are more convinced to stay in with the full interaction flow (design X)
- We cannot conclude whether moods improved
- We need to experiment with the actual target user group to draw concrete conclusions
- Experiment with personalization
Future Work
- Personalisation: Personalize music and activity preferences according to the person interacting with Pepper.
- Robot Collaboration: Collaborate with other robots, such as Miro, so that a robot rather than the caretaker can accompany a person with dementia on a walk.
- Recognise Person: For a personalised experience, it is essential that Pepper is able to identify each person based on an internal database.
- Fine Tune Speech Recognition: Improvements are necessary for the speech recognition module before the actual deployment of the project in a care home. Additionally, support for multiple languages can be considered to engage with non-English speaking people.