Wiki source code of Test

Version 101.1 by Clara Stiller on 2022/04/05 13:42

Evaluation is an iterative process in which the initial iterations examine whether the proposed idea works as intended. We therefore first want to understand how realistic and convincing the provided dialogues and suggested activities are, and whether they can prevent people from wandering. To examine this, we conduct a small pilot study with students who role-play having dementia. We then observe their interaction with Pepper to assess how effective our dialogue flow is at preventing people from wandering.

= Problem statement and research questions =

**Goal**: How effective are music and dialogue in preventing people with dementia from wandering?

**Research Questions (RQ):**

1. What percentage of people are prevented from going out unsupervised? (Quantitative) (CL01, CL05)
1. How does the interaction change the participant's mood? (CL02, CL13)
1. Can the robot respond appropriately to the participant's intention? (CL03)
1. How do the participants react to the music? (CL04)
1. Does the activity that the robot suggests prevent people from wandering/leaving? (CL06)
1. Can Pepper identify and catch the attention of the person with dementia (PwD)? (CL07)

//Future research questions//

1. Does the interaction with Pepper make the PwD come back to reality? (CL08)
1. Does the interaction with Pepper make the PwD feel that they are losing freedom? (CL09)
1. Does preventing the participant from going out alone make them feel dependent? (CL10)

= Method =

We will conduct a between-subject study with students who play the role of having dementia. Data will be collected with a questionnaire that participants fill out before and after interacting with Pepper. The questionnaire captures different aspects of the conversation, along with the participants' mood before and after the interaction.

The independent variable in our between-subject study is whether Pepper tries to distract users by suggesting activities accompanied by corresponding music. The dependent variable is how effectively this prevents people from leaving the care home. We therefore developed two prototype designs:

Design X is the full interaction flow, in which Pepper suggests activities and uses music to distract people from leaving.
Design Y is the control condition, in which Pepper simply tries to stop people from leaving by physically keeping its hand on the door.

== Participants ==

The ideal participants for our user study would have been people with dementia. However, because people with dementia belong to a vulnerable group, testing with them would have been very difficult during the current pandemic. We therefore planned to conduct our experiments with students instead.
Our experiment involves 17 students who play the role of having dementia. They are divided into two groups: one group (11 participants) interacts with design X, while the other group (6 participants) interacts with design Y.

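The split into an 11-person and a 6-person group can be sketched in a few lines of Python. This is only an illustration: the text does not say how participants were assigned, so the random assignment, the fixed seed, the participant IDs, and the function name `assign_groups` are all assumptions.

```python
import random


def assign_groups(participant_ids, n_design_x, seed=0):
    """Randomly split participants into a design X group and a design Y group.

    Random assignment is an assumption of this sketch, not a documented
    step of the study.
    """
    if not 0 <= n_design_x <= len(participant_ids):
        raise ValueError("n_design_x must be between 0 and the number of participants")
    shuffled = list(participant_ids)
    # A fixed seed keeps the split reproducible between runs.
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_design_x], shuffled[n_design_x:]


# 17 hypothetical participant IDs, P01 .. P17
participants = [f"P{i:02d}" for i in range(1, 18)]
design_x, design_y = assign_groups(participants, n_design_x=11)
print(len(design_x), len(design_y))
```

Because the groups are unequal in size, any later comparison between design X and design Y should use a method that does not assume balanced groups.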
== Experimental design ==

**Before Experiment:**
To prevent ambiguity, we will explain the goal of the experiment to the participants and tell them what they need to do. Because our participants are students who only play the role of having dementia, we will assign each of them a level of stubbornness/willpower with which they try to leave the care home.
Participants will also be given a reason to leave from the list below:

* going to the supermarket
* going to the office
* going for a walk

After this preparation, the participant fills out a part of the questionnaire.

**Experiment:**
The participant begins interacting with Pepper, which is standing near the exit door. During the interaction, the robot tries to convince the participant to stay inside.

**After Experiment:**
After the participant finishes interacting with Pepper, they will be asked to fill out the rest of the questionnaire. Almost all questions collect quantitative data on a 5-point Likert scale. The questionnaire also uses images from the Self-Assessment Manikin (SAM) so that participants can self-report their mood before and after their interaction with Pepper.

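As a rough illustration of how the before/after mood ratings could be compared, the sketch below computes the per-participant change and a short summary. It assumes the SAM ratings are coded as integers where a higher number means a happier mood. The ratings shown are made up for the example, and the function names are hypothetical.

```python
from statistics import mean


def mood_changes(before, after):
    """Return after-minus-before mood change for each participant."""
    if len(before) != len(after):
        raise ValueError("before and after must contain one rating per participant")
    return [a - b for b, a in zip(before, after)]


def summarize(changes):
    """Count improved, unchanged and worsened moods and the mean change."""
    return {
        "improved": sum(1 for c in changes if c > 0),
        "unchanged": sum(1 for c in changes if c == 0),
        "worsened": sum(1 for c in changes if c < 0),
        "mean_change": mean(changes),
    }


# Hypothetical ratings for five participants (not study data)
before = [3, 4, 3, 2, 4]
after = [4, 4, 3, 3, 3]
summary = summarize(mood_changes(before, after))
print(summary)
```

Keeping the ratings paired per participant matters here, because the question is how each person's mood changed, not how the group averages differ.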
== Material ==

The items required for this evaluation are the following:

* Pepper
* Door
* Caretaker in a nearby room in case of emergency

= Results =

{{html}}
<!--=== Comparison between intelligent (cond. 1) and less intelligent (cond. 2) prototype ===

{{html}}
<img src="/xwiki/wiki/sce2022group05/download/Test/WebHome/Stay_inside.svg" width="500" height="270" />
{{/html}}

None of the participants who interacted with the less intelligent robot were prevented from leaving. Still, 3 of the 11 people assigned to condition 1 weren't convinced to stay inside. A failure rate of 27.3% is too high for this application, since people could be in danger if the system fails.

**Mood evolution**
[[image:mood_only.png||height="150"]]

{{html}}<img src="/xwiki/wiki/sce2022group05/download/Test/WebHome/mood_before.svg" width="500" height="270" /><br>
<img src="/xwiki/wiki/sce2022group05/download/Test/WebHome/mood_after.svg" width="500" height="270" /><br>{{/html}}

Regarding the changes in mood, 4 out of 11 participants assigned to condition 1 reported an improved mood over the course of the interaction. Only one participant felt less happy afterward; the rest stayed at the same level of happiness. Overall, the mood shifted toward happier (as the graphic above shows), even though only small improvements in mood were detected (<= 2 steps on the scale).
The participants from condition 2 mostly stayed at the same mood level: 2 were less happy and one was happier afterward. Comparing both conditions, it becomes clear that condition 1 had a more positive impact on the participants' mood.

It is also interesting that none of the participants was in a really bad mood at the beginning or the end.

==== Condition 1 - intelligent prototype: ====

{{html}}
<img src="/xwiki/wiki/sce2022group05/download/Test/WebHome/Music_reco.svg" width="500" height="270" /> <br>
{{/html}}

8 out of 11 participants answered that they did not know the music that was played. When we told them the title of the song afterward, most participants did know it. Why didn't they recognize it during the interaction?
There are two possible reasons. The part of the song we picked may have been too short to be recognized, or it was not the most recognizable part of the song. For example, the beginning of "Escape (The Piña Colada Song)" is not as well known as its chorus. Alternatively, the participant may have been distracted or confused by the robot and therefore could not listen carefully to the music.

{{html}}
<img src="/xwiki/wiki/sce2022group05/download/Test/WebHome/Music_fit.svg" width="500" height="270" /> <br>
{{/html}}

Only 4 out of 11 people agreed that the music fit the situation. One of our claims, to use music that fits the situation or place, was therefore not met, and the music did not have the intended effect. Even though we chose the music carefully and discussed our choice at length, it was hard to find music that different people associate with a certain place or activity. One way to improve this could be an individual playlist for each participant.

==== Condition 2 - less intelligent prototype: ====

Participants assigned to condition 2 weren't convinced to stay. We saw that most of them tried to continue talking to Pepper when it raised its arm to block the door, even though it didn't listen. They were surprised by Pepper's reaction and asked why they were not allowed to leave. For a natural conversation flow, the robot should provide an explanation in each scenario of why the person is not allowed to leave. This suggests that our approach of giving a reason to stay inside might help convince PwD to stay inside.

=== Problems that occurred during the evaluation ===

1. Many difficulties with speech recognition:
1.1. Even though the participant said one of the expected words, Pepper sometimes misunderstood it and continued down the wrong path.
1.2. If the participant started to talk before Pepper was listening (eyes turning blue), it missed a "yes" or "no" at the beginning of the sentence, which caused misunderstandings.
1. Problems with face detection:
2.1. Due to poor lighting, the face was not recognized.
2.2. If the participant passed Pepper from the side, the face was not recognized. Therefore, we told people to walk toward Pepper from the front. In most cases that helped detect the face.
2.3. Face detection does not work with face masks. This could cause serious problems when using Pepper in care homes.

One of the most frequent and noticeable reactions from participants was **confusion**. This feeling was caused by two main factors:
misunderstandings from speech recognition, which led to unsuitable answers from Pepper, and the unsuitable environment and setting of our evaluation.
The reasons for the speech recognition failures are listed above. An unsuitable answer can be, for example, an argument to stay inside that does not fit the participant's reason to leave. Also, some people explained in a long sentence that they did not like the suggested activity and still wanted to leave. If speech recognition failed in this case and Pepper understood that the participant would like to do the activity, Pepper appeared to encourage them to leave instead of doing the activity. This is the exact opposite of our intention.
Furthermore, we found that our prototype does not fit the lab environment. We encourage participants to do activities that they cannot do in the lab (go to the living room, have a coffee, or do a puzzle). When the robot asked whether they wanted to do the activity, most people did not know how to react and were unsure how to answer. Participants "froze" in front of the robot or simply left the room. -->
{{/html}}

=== RQ1: Are people convinced not to go out unsupervised? ===

{{html}}
<table style="width: 100%">
<tr>
<td style="width: 50%">
<img src="/xwiki/wiki/sce2022group05/download/Foundation/Operational%20Demands/Personas/WebHome/RQ1.jpg?height=250&rev=1.1" />
</td>
<td>
We used a 5-point Likert scale for this question, with 1 the lowest and 5 the highest. Participants who interacted with design X tended to agree more strongly to stay inside than those who interacted with design Y.

</td>
</tr>
</table>
{{/html}}

=== RQ2: How does the interaction change the participant's mood? ===

{{html}}
<table style="width: 100%">
<tr>
<td style="width: 50%">
<img src="/xwiki/wiki/sce2022group05/download/Foundation/Operational%20Demands/Personas/WebHome/RQ2.jpg?height=250&rev=1.1" />
</td>
<td>
We notice a positive change in valence with the full flow, i.e. design X, although the change is negligible. This could be because of the music. The valence does not decrease for the baseline, which might be due to the novelty effect of seeing Pepper for the first time. The change in arousal is nearly negligible in both scenarios. This might be because the interaction with Pepper was very short. <br>
Additionally, for the full flow (design X), these values may not have changed as much as expected (higher valence, lower arousal) because the music was not personalized for the participants.

</td>
</tr>
</table>
{{/html}}

=== RQ3: Can the robot respond appropriately to the participant's intention? ===

{{html}}
<table style="width: 100%">
<tr>
<td style="width: 50%">
<img src="/xwiki/wiki/sce2022group05/download/Foundation/Operational%20Demands/Personas/WebHome/RQ3.jpg?height=250&rev=1.1" />
</td>
<td>
We notice only a very small difference between the full flow (design X) and the control condition (design Y). There may be several reasons for this. For one, Pepper's speech recognition module did not handle different accents well and therefore misunderstood words in some cases. <br>
The null hypothesis is that perceived message understanding is equal for both conditions. Given the p-value, the null hypothesis cannot be rejected. The high variance in the data and the small sample size could explain the non-significant result.

</td>
</tr>
</table>
{{/html}}

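To make the hypothesis test concrete, here is a minimal sketch of one way to compare two small, unequal groups: a two-sided permutation test on the difference of group means. The text does not say which statistical test was actually used, so this is a stand-in technique and not the authors' analysis. The scores below are invented, and `permutation_test` is a hypothetical helper.

```python
import random


def group_mean(values):
    return sum(values) / len(values)


def permutation_test(group_x, group_y, n_permutations=10_000, seed=0):
    """Estimate a two-sided p-value for the difference of two group means.

    Null hypothesis: both groups come from the same distribution, so the
    group labels are exchangeable.
    """
    rng = random.Random(seed)
    observed = abs(group_mean(group_x) - group_mean(group_y))
    pooled = list(group_x) + list(group_y)
    n_x = len(group_x)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign the group labels
        diff = abs(group_mean(pooled[:n_x]) - group_mean(pooled[n_x:]))
        if diff >= observed - 1e-12:  # small tolerance for float rounding
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical 5-point Likert scores (not study data): 11 for X, 6 for Y
scores_x = [4, 3, 5, 4, 2, 3, 4, 5, 3, 4, 2]
scores_y = [3, 4, 2, 3, 4, 3]
p_value = permutation_test(scores_x, scores_y)
print(f"p = {p_value:.3f}")
```

A permutation test makes no normality assumption, which is convenient for ordinal Likert data and small samples. With only 11 and 6 participants its power is low, which is consistent with the non-significant result described above.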
=== RQ4: How do the participants react to the music? ===

{{html}}
<table style="width: 100%">
<tr>
<td style="width: 50%">
<img src="/xwiki/wiki/sce2022group05/download/Foundation/Operational%20Demands/Personas/WebHome/RQ4.jpg?height=250&rev=1.1" />
</td>
<td>
We found that participants who knew the songs enjoyed the music more and thought it fit the situation better than those who did not know the songs.
</td>
</tr>
</table>
{{/html}}

=== RQ5: Does the activity that the robot suggests prevent people from wandering/leaving? ===

{{html}}
<table style="width: 100%">
<tr>
<td style="width: 50%">
<img src="/xwiki/wiki/sce2022group05/download/Foundation/Operational%20Demands/Personas/WebHome/RQ5.jpg?height=250&rev=1.1" />
</td>
<td>
These results suggest that participants who like the suggested activity are more likely to stay in. In other words, staying in correlates positively with interest in the activity. We expect the score to increase further once the activities are personalized.
</td>
</tr>
</table>
{{/html}}

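One way to quantify such a relationship for ordinal Likert answers is Spearman's rank correlation. The sketch below is illustrative only: the text does not state which correlation measure was used, the ratings are invented, and the function names are hypothetical.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (spread_x * spread_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical Likert answers (not study data)
interest_in_activity = [5, 4, 2, 3, 5, 1, 4]
willingness_to_stay = [4, 4, 1, 3, 5, 2, 3]
rho = spearman(interest_in_activity, willingness_to_stay)
print(f"rho = {rho:.2f}")
```

A coefficient near +1 would support the claim that interest in the activity and staying in go together. A correlation alone does not show that the activity causes people to stay.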
=== RQ6: Can Pepper identify and catch the attention of the PwD? ===

{{html}}
<table style="width: 100%">
<tr>
<td style="width: 50%">
<img src="/xwiki/wiki/sce2022group05/download/Foundation/Operational%20Demands/Personas/WebHome/RQ6.jpg?height=250&rev=1.1" />
</td>
<td>
We find that the co-presence values for both conditions are very similar. This may be attributed to the novelty effect and to the fact that the face recognition module is the same in both conditions.
The values for attention allocation are also similar, but the control flow (design Y) has a higher value. We suspect that people start to lose focus during long conversations. <br>

Apart from co-presence, none of the observations are statistically significant because of the high variance in the limited number of responses.

</td>
</tr>
</table>
{{/html}}

=== Reliability Scores ===

{{html}}
<table style="width: 100%">
<tr>
<td style="width: 50%">
<img src="/xwiki/wiki/sce2022group05/download/Foundation/Operational%20Demands/Personas/WebHome/RelScores.jpg?height=250&rev=1.1" />
</td>
<td>
We achieved a high Cronbach's alpha (above 0.6) for almost all sections of our analysis, which supports the reliability of our evaluation.
</td>
</tr>
</table>
{{/html}}

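For reference, Cronbach's alpha for a questionnaire section with k items is k/(k-1) * (1 - sum of item variances / variance of the total scores). The sketch below computes it for a small response matrix. The data layout (one row per participant, one column per item) and the example answers are assumptions made for illustration.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for rows of item scores (one row per participant)."""
    if len(responses) < 2:
        raise ValueError("need at least two participants")
    n_items = len(responses[0])
    if n_items < 2:
        raise ValueError("need at least two items")
    if any(len(row) != n_items for row in responses):
        raise ValueError("every participant must answer every item")
    # Sample variance of each item across participants
    item_variances = [variance([row[i] for row in responses]) for i in range(n_items)]
    # Sample variance of each participant's total score
    total_variance = variance([sum(row) for row in responses])
    if total_variance == 0:
        raise ValueError("alpha is undefined when all total scores are equal")
    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical answers of five participants to a three-item section
section = [
    [4, 5, 4],
    [2, 3, 2],
    [5, 5, 4],
    [3, 3, 3],
    [1, 2, 2],
]
alpha = cronbach_alpha(section)
print(f"alpha = {alpha:.2f}, above 0.6: {alpha > 0.6}")
```

Alpha approaches 1 when the items of a section rise and fall together across participants, which is why it is read as a measure of internal consistency.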
= Limitations =

* **Lab Environment**: Because the lab environment differs from a care home, the participants found it difficult to act on the suggestions made by Pepper. For example, when Pepper asked someone to visit the living room, participants were confused about what to do next.

* **Role-Playing**: The participants in the experiment do not actually have dementia. It is therefore naturally difficult for them to act out the situations and replicate the mental state of a person with dementia.

* **Speech Recognition**: The speech recognition module inside Pepper is not perfect. In certain cases, Pepper therefore misinterpreted words spoken by the participants and triggered an erroneous dialogue flow. The problems commonly occurred with words that sound similar, such as "work" and "walk". Moreover, some additional hardware limitations hampered the speech recognition system. One prominent issue is that the microphone within Pepper is only active when the speaker is turned off. A blue light in Pepper's eyes indicates that the microphone is active. Since most participants were not used to interacting with Pepper, they found it difficult to keep this limitation in mind while trying to have a natural conversation.

* **Face Detection**: The face recognition module within Pepper is also rudimentary. It cannot detect partially visible faces, for example when participants approach from the side. In addition, the lighting in the lab was not sufficient for the face recognition module to work reliably. Hence Pepper failed to notice the participant in some cases and did not start the dialogue flow.

= Conclusions =

* People who liked the activity tended to stay in
* People who knew the music found it more fitting
* People were more convinced to stay in with the intelligent prototype (design X)
* We cannot conclude whether moods were improved
* We need to experiment with the actual target user group to draw concrete conclusions
* Personalization should be explored in further experiments

= Future Work =

* **Personalisation**: Personalise the music and the suggested activities according to the person interacting with Pepper.
* **Robot Collaboration**: Collaborate with other robots, such as Miro, that could accompany a person with dementia on a walk in place of a caretaker.
* **Recognise Person**: For a personalised experience, it is essential that Pepper can identify each person based on an internal database.
* **Fine Tune Speech Recognition**: The speech recognition module needs to be improved before the project is actually deployed in a care home. Additionally, support for multiple languages could be considered to engage with non-English-speaking people.