Wiki source code of Test

Version 94.1 by Xinqi Li on 2022/04/02 01:33

Our robot aims to help delay the progression of dementia or slow the deterioration of memory. Ideally, we would test the robot with real people with dementia (PwD) over a relatively long period to see whether it really works, but that is not feasible within our project. Our evaluation therefore uses a control-group design: participants are divided into two groups, group A with the intelligent robot and group B with the dumb robot.

= Problem statement and research questions =

The evaluation focuses on two main use cases, UC001: Daily todo list and UC005: Quiz. Based on the claims corresponding to those use cases, we derive the following research questions:

1. Are PwD willing to play the quiz?
1. Are PwD happy to listen to music?
1. Are PwD happy when they get the correct answer?
1. Do PwD improve their memory of the association between music and activities?

= Method =

We used a control-group evaluation. One group of participants interacted with the dumb robot and the other group interacted with the intelligent robot. The only difference between the two groups was the independent variable, the dumb or the intelligent robot, which makes the comparison between the groups more meaningful.

In addition, our group decided to use a mixed-method approach for the evaluation.

* Quantitative data were collected during the experiment, such as the number of mistakes a participant made during the quiz. The participants were also asked to provide a score based on the given system usability scale^^1^^.
* Qualitative data were gathered through questionnaires, such as the extent to which participants were satisfied with using the robot.

By measuring these two types of data, we can assess whether our claims hold and answer the research questions.

== Participants ==

We invited 19 participants. To test our research question of whether the quiz helps people better memorize music-activity links, the participants were divided into two groups: Group A (9 participants) with the intelligent robot and Group B (10 participants) with the dumb robot.

== Experimental design ==

The experiment was designed to simulate the reinforcement learning process of musical memory related to daily activities and to investigate whether the quiz is indeed able to help with this learning.
All participants signed a consent form that informed them of how the collected data would be used and of the goal of our evaluation. In our prototype, users can personalize the association between music and activities based on their existing knowledge. However, because of the limited time and the need for comparable results between groups, in the evaluation we fixed 6 pairs of music pieces and activities. Participants listened to the music and were asked to remember the associated activities.
At the end, the participants took a quiz to see how much they remembered. They were also asked to fill in a questionnaire about their impression of the robot and any feedback:

1. How many questions did you answer correctly? (Points from 0-6)
1. You feel the robot can help you remember the task. (Agree, Neutral, Disagree)
1. You feel the robot is annoying. (Agree, Neutral, Disagree)
1. Based on the given system usability scale, please give our robot a score. (0-100)

In addition to the previous questions, we also collected open feedback from participants:

1. What did you like most about the robot?
1. What did you dislike most about the robot?
1. Do you have any further suggestions? (optional)

== Tasks ==

The participants were asked to memorize the association between the given music and activities as well as they could while playing with the robot.
The robot played the music and asked the participant to name the matching activity.
At the end, the participant took the final test and we counted the number of correct answers.

== Measures ==

We counted the number of correct answers in the final test.
After the experiment, we asked each participant to fill in the system usability scale and the questionnaire regarding mood and satisfaction.
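
This page does not record how the 0-100 usability score was derived. If the ten standard SUS items are administered, one common way to compute the score is sketched below, assuming responses on a 1-5 scale given in the standard item order. The function name `sus_score` is hypothetical and is not part of our prototype.

```python
def sus_score(responses):
    """Return the 0-100 SUS score for one participant.

    responses: ten integers from 1 (strongly disagree) to 5 (strongly agree),
    in the standard SUS item order (assumption: the full ten-item form is used).
    """
    if len(responses) != 10:
        raise ValueError("SUS needs exactly 10 item responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    total = 0
    for index, response in enumerate(responses):
        if index % 2 == 0:
            # Items 1, 3, 5, 7, 9 are positively worded: contribution = response - 1
            total += response - 1
        else:
            # Items 2, 4, 6, 8, 10 are negatively worded: contribution = 5 - response
            total += 5 - response

    # The summed contributions (0-40) are scaled to the 0-100 range.
    return total * 2.5
```

A participant who answers 3 on every item gets `sus_score([3] * 10) == 50.0`. Per-group averages can then be taken over the individual scores.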

== Procedure ==

**Event: Quiz**

{{html}}
<table>
<tr>
<td>No.</td>
<td>Group A with the intelligent robot</td>
<td>Group B with the dumb robot</td>
</tr>
<tr>
<td>1</td>
<td>Participants sign the consent form and read the instructions for the evaluation.</td>
<td>Participants sign the consent form and read the instructions for the evaluation.</td>
</tr>
<tr>
<td>2</td>
<td>Participants memorize six pieces of music, each corresponding to a different activity.</td>
<td>Participants memorize six pieces of music, each corresponding to a different activity.</td>
</tr>
<tr>
<td>3</td>
<td>Participants play the quiz with the intelligent robot for three minutes; the robot corrects the participant when a wrong answer is given.</td>
<td>Participants play the quiz with the dumb robot for three minutes; the robot does not correct the participant when a wrong answer is given.</td>
</tr>
<tr>
<td>4</td>
<td>Test how well participants remember the music-activity pairs by counting the mistakes made.</td>
<td>Test how well participants remember the music-activity pairs by counting the mistakes made.</td>
</tr>
<tr>
<td>5</td>
<td>Participants fill in the questionnaire and give feedback.</td>
<td>Participants fill in the questionnaire and give feedback.</td>
</tr>
</table>
{{/html}}

== Material ==

NAO robot with the preset music, consent form, laptop

= Results =

[[image:result2.png||height="400px"]]
The left figure shows the distribution of the number of correct answers. The average score of all participants was 3.6 out of 6 questions. For group A the average score was 3.3, and for group B it was 3.8. This difference, which runs against our expectation, may be explained by our group size not being large enough to even out the differences in memory ability between participants. However, every participant in group A learned something, since none of them scored 0, whereas several participants in group B scored 0. To this extent, the results suggest that our robot does help with memory.
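
As a quick consistency check on the averages above, the overall mean should equal the size-weighted mean of the two group means. The sketch below uses only the group sizes and averages reported on this page; the variable names are illustrative.

```python
# Group sizes and per-group average scores as reported on this page.
group_sizes = {"A": 9, "B": 10}
group_means = {"A": 3.3, "B": 3.8}

total_participants = sum(group_sizes.values())

# Weighted mean: each group mean counts in proportion to its group size.
overall_mean = (
    sum(group_sizes[g] * group_means[g] for g in group_sizes) / total_participants
)

print(total_participants)      # 19
print(round(overall_mean, 1))  # 3.6
```

The weighted mean rounds to 3.6, which agrees with the reported overall average.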

The middle figure shows that people in group A tended to think that our robot can help with the memory task, and the right figure shows that only a few of them found our robot annoying.

[[image:result4.png||height="400px"]]
As shown in the figure above, group A with our intelligent robot gave our robot an average score of 66.7, and group B with the dumb robot gave an average of 58.2. On this scale, participants appear more willing to play with our intelligent robot.

We also collected some feedback from the participants. Most of them liked the appearance of the robot, which is consistent with the reasons we chose the NAO: people are more engaged and more willing to interact with a humanoid robot. Some of them complained about the robot's speech recognition.

= Discussion =

We assumed that our intelligent robot can help people strengthen the association between music and activities. The average number of correct answers did not confirm this, for several reasons. First, our participants were not real PwD and their memory abilities varied. Second, our group size (about 10 per group) was not large enough. Third, participants were given only a limited time. The short duration of the quiz and the lack of personalised music may also have contributed to this result. However, the usability scores of the two groups and some of the quantitative results above suggest that our claims that PwD are more willing to play with our intelligent robot and that PwD are happy to use the robot could still hold.

In addition, our evaluation was limited by several key factors:

* Due to the limited time and resources, we could not evaluate all the claims made in the use cases. This limits the breadth of our conclusions about the effectiveness of the system.
* As mentioned before, the small sample size makes the accuracy of the result doubtful. A larger and more diverse sample group would allow us to predict real-world usage more accurately.
* The accuracy of the NAO's speech recognition system and the availability of test subjects and robots also limited the evaluation.
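
To make the sample-size concern concrete, one option (not used in this evaluation) is an exact permutation test on the difference in group means. The sketch below is a minimal illustration under that assumption; `permutation_p_value` is a hypothetical helper, and the numbers in the usage note are toy values, not our data.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Two-sided exact permutation test on the difference in group means.

    Enumerates every way of splitting the pooled scores into groups of the
    original sizes and returns the fraction of splits whose absolute mean
    difference is at least as large as the observed one.
    """
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_b = len(pooled) - n_a
    total = sum(pooled)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)

    at_least_as_extreme = 0
    splits = 0
    for indices in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in indices)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance so that ties are not lost to floating-point rounding.
        if diff >= observed - 1e-12:
            at_least_as_extreme += 1
        splits += 1
    return at_least_as_extreme / splits
```

For example, `permutation_p_value([0, 0, 0], [6, 6, 6])` gives 0.1: with three participants per group, even a perfect separation cannot reach p < 0.05, which shows why small groups limit what can be concluded.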

In the future, we could improve on the following aspects:

* Test a full implementation of the system in a real setting with PwD.
* Investigate whether the robot is actually necessary, or whether the advantages of the system could be achieved with a cheaper alternative, such as a virtual robot on a tablet. (This was also inspired by the feedback we received: one participant asked why we didn't create an app.)

= Conclusion =

= Reference =

Bangor, A., Kortum, P. T., & Miller, J. T. (2008). An empirical evaluation of the system usability scale. International Journal of Human–Computer Interaction, 24(6), 574-594.