= 1. Introduction =

In section [[a. Prototype>>3\. Evaluation.a\. Prototype.WebHome]], two versions of the robot were presented: one with voice functionality and one without.

The main claims we aim to test with this procedure concern the functionality and usability of the robot.

The participants are other students taking this course. They are placed in the shoes of a person with dementia (PwD) and tasked with completing several basic actions with the robot while impaired in several controlled ways that simulate the difficulties a PwD faces.

After the experiment, the participants fill out a survey and answer some open-ended questions, with the aim of understanding how the interaction with the robot went and whether anything concerns them about the possible use of the system and its functions in a real-life setting.

On top of this, a short questionnaire will be sent to several care homes throughout the Netherlands to get a general idea of whether the caregivers at these facilities think the system would be a good fit for the proposed use case.

= 2. Method =

The prototypes are evaluated through simulated, in-person experiments. Participants are given a [[persona>>doc:Main.sdf.Persona Scenarios.WebHome]] to act out.

== 2.1 Participants ==

All students of CS4235 Socio-Cognitive Engineering (2022-2023) at TU Delft were invited to test the robot. In the end, 14 students participated.

== 2.2 Experimental design ==

Prior to the experiment, participants were asked to simulate the experience of a person with dementia. Participants who wore glasses were asked to remove them, while those who did not were provided with short-sighted or far-sighted glasses to replicate the blurred vision and degraded perception that are common in PwDs. Additionally, their index and middle fingers were taped together to simulate the difficulty in controlling movements that many PwDs experience.

For this experiment, we used a between-subjects design. The control group interacted solely with the stand-alone application, which represented the robot without voice functionality, and received guidance only through a task sheet on paper. In contrast, the experimental group engaged with the fully functional robot, which provided audio instructions, guidance, and encouragement.

== 2.3 Tasks ==

In the user test, participants were asked to complete the following tasks:

==== Reminders for activities ====

* (((
Add a reminder that a relative will pay a visit on Sunday, in the format "<relative name> will visit you on Sunday at 3 pm for some tea". Set the reminder to notify you 10 minutes before the event.
)))
* (((
Check the reminders for this week and verify that the new reminder for the event has been added.
)))

==== Personal profile ====

* (((
Browse through the relatives' profiles and read the information.
)))
* (((
Add relatives as contacts in the "profile" section.
)))

==== Memory games ====

* (((
Go to the Games section and play the game.
)))

==== Medicine reminders ====

(for professional caregivers - not part of the evaluation)

* (((
In the “My Health” section, add a medicine reminder to take the medicine **Donepezil** once per day at 9 PM, before going to bed.
)))
* (((
Check the medicines that have been added.
)))
* (((
Delete the medicines that have been added.
)))

== 2.4 Measures ==

Two quantitative measures were employed in the user evaluation. The first measured system attributes including accessibility, trustworthiness, perceivability, understandability, and empowerment. The second was the System Usability Scale (SUS), a widely used scale for evaluating software usability.
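
For reference, standard SUS scoring converts the ten 1-5 Likert responses into a single 0-100 score. A minimal Python sketch of this well-known formula (the example responses are made up for illustration, not data from this evaluation):

{{code language="python"}}
def sus_score(responses):
    """Compute a SUS score from ten Likert responses (1-5), in questionnaire order.

    Odd-numbered items contribute (response - 1), even-numbered items (5 - response);
    the summed contributions are scaled by 2.5 to give a 0-100 score.
    """
    if len(responses) != 10:
        raise ValueError("SUS expects exactly 10 responses")
    total = 0
    for i, r in enumerate(responses, start=1):
        total += (r - 1) if i % 2 == 1 else (5 - r)
    return total * 2.5

# Hypothetical respondent (not from our data):
print(sus_score([4, 2, 4, 2, 3, 3, 4, 2, 3, 3]))  # 65.0
{{/code}}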
72 | |||
73 | In addition to the quantitative measures, a structured interview with open-ended questions was conducted with randomly selected participants. The aim of this interview was to gain a deeper understanding of participants' experiences with the robot, including any concerns they may have regarding the potential use of the system and its functions in a real-life setting. | ||
74 | |||
75 | By using both quantitative and qualitative measures, the user evaluation can provide a more comprehensive understanding of participants' experiences and perceptions of the system. This approach allows for a more nuanced analysis of the data and can provide valuable insights into the strengths and weaknesses of the system, as well as areas for improvement. | ||
76 | |||
== 2.5 Procedure ==

The procedure was conducted as follows:

1. Welcome participants and give an introduction.
1. Ask them to sign a consent form.
1. Prepare them to simulate being a person with dementia.
1. Let them interact with the robot and complete the tasks.
1. Have them complete a questionnaire.
1. Conduct a short interview with randomly selected participants.

== 2.6 Material ==

1. Consent form. To protect the privacy of participants and ensure the evaluation runs smoothly, we ask participants to sign a consent form indicating that they are willing to take part in the evaluation and that the data gathered from the experiment will be analyzed by the researchers.
1. [[Pepper robot>>doc:Main.c\. Technology.Humanoid Robot.WebHome]].
1. [[Questionnaire>>doc:Main.b\. Human Factors.Measuring Instruments.Questionnaire\: SUS.WebHome]]. Validated questions to test the functionality and usability of the system.

= 3. Results =

=== Results of the survey ===

[[Figure: //Percentage of user satisfaction and SUS score//>>image:attach:chart.png]]

(% class="wikigeneratedid" %)
As mentioned earlier, the user evaluation incorporated two quantitative measures. The first measure evaluated attributes of the system, including accessibility, trustworthiness, perceivability, understandability, and empowerment. The second measure employed was the System Usability Scale (SUS).

(% class="wikigeneratedid" %)
The attribute-related evaluation was analyzed as follows: a respondent with a total score of at least 60% (15 out of 25) was considered satisfied with the application. 11 out of 14 users (78.57%) achieved a score of 15 or higher, and the average score was 18. According to the standard operating protocol (Quintana et al., 2020), the feasibility test was considered successfully completed if at least 75% of respondents were satisfied with the use of the application. Based on this criterion, the feasibility test was therefore successfully completed.

The System Usability Scale (SUS) was interpreted in terms of percentile ranking. The average SUS score for the stand-alone application is 54.17 (grade D), and that of the robot is 71.88 (grade B). Based on prior research, a SUS score above 68 is considered above average, and anything below 68 is below average. From this point of view, the usability evaluation of the robot was therefore considered successful.
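
The two acceptance checks above can be reproduced directly from the reported numbers; a minimal Python sketch using only figures stated in this section:

{{code language="python"}}
# Feasibility criterion: at least 75% of respondents satisfied (score >= 15 out of 25).
satisfied, total = 11, 14
satisfaction_rate = satisfied / total           # ~0.7857 -> 78.57%
feasibility_passed = satisfaction_rate >= 0.75  # True

# SUS criterion: an average score above 68 counts as above-average usability.
sus_app, sus_robot = 54.17, 71.88
print(feasibility_passed, sus_app > 68, sus_robot > 68)  # True False True
{{/code}}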
107 | |||
108 | |||
109 | |(% style="width:215px" %)**Attributes**|(% style="width:211px" %)**Mean (control group)**|(% style="width:229px" %)**Mean (Experimental group)**|(% style="width:197px" %)**P-value** | ||
110 | |(% style="width:215px" %)Accessibility|(% style="width:211px" %)2,33|(% style="width:229px" %)3,25 |(% style="width:197px" %)0,0644 | ||
111 | |(% style="width:215px" %)Trustability|(% style="width:211px" %)3,83|(% style="width:229px" %)4,125|(% style="width:197px" %)0,3165 | ||
112 | |(% style="width:215px" %)Perceivability|(% style="width:211px" %)3,33|(% style="width:229px" %)3,5|(% style="width:197px" %)0,4112 | ||
113 | |(% style="width:215px" %)Understandability|(% style="width:211px" %)3,33|(% style="width:229px" %)4,25|(% style="width:197px" %)0,1151 | ||
114 | |(% style="width:215px" %)Empowerment|(% style="width:211px" %)3,33|(% style="width:229px" %)4|(% style="width:197px" %)0,0895 | ||
115 | |(% style="width:215px" %)Usability|(% style="width:211px" %)54,16666667|(% style="width:229px" %)71,875|(% style="width:197px" %)0,0903 | ||
116 | |||
117 | (% class="wikigeneratedid" %) | ||
118 | //Table: User evaluation score// | ||
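
The report does not state which statistical test produced these p-values; a minimal sketch assuming Welch's two-sample t-test on per-participant attribute ratings (the rating vectors below are placeholders chosen only to match the reported group sizes and means, not the actual data):

{{code language="python"}}
from scipy import stats

# Placeholder ratings on a 1-5 scale (control n=6, experimental n=8),
# chosen so the means match the reported 2.33 and 3.25 for Accessibility.
control_accessibility = [2, 3, 2, 3, 2, 2]
experimental_accessibility = [3, 4, 3, 3, 4, 3, 3, 3]

t_stat, p_value = stats.ttest_ind(control_accessibility,
                                  experimental_accessibility,
                                  equal_var=False)  # Welch's t-test
print(f"t = {t_stat:.3f}, p = {p_value:.4f}")
{{/code}}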
119 | |||
120 | === Observation: (Total percentage sums up to 100) === | ||
121 | |||
|=(% style="width: 199px;" %)Tasks|=(% style="width: 147px;" %)Succeeded by Themselves|=(% style="width: 146px;" %)Succeeded with Some Guidance|=(% style="width: 185px;" %)Succeeded with Detailed Explicit Instructions|=(% style="width: 175px;" %)Average Time to Complete Task (s)
|(% style="width:199px" %)Add a reminder|(% style="width:147px" %)14.29%|(% style="width:146px" %)28.57%|(% style="width:185px" %)57.14%|(% style="width:175px" %)97
|(% style="width:199px" %)Check weekly reminders on the Calendar page|(% style="width:147px" %)100%|(% style="width:146px" %)0%|(% style="width:185px" %)0%|(% style="width:175px" %)36
|(% style="width:199px" %)Create a personal profile|(% style="width:147px" %)7.14%|(% style="width:146px" %)50%|(% style="width:185px" %)42.86%|(% style="width:175px" %)69
|(% style="width:199px" %)Verify current profiles|(% style="width:147px" %)85.71%|(% style="width:146px" %)14.29%|(% style="width:185px" %)0%|(% style="width:175px" %)32
|(% style="width:199px" %)Play memory game|(% style="width:147px" %)0%|(% style="width:146px" %)42.86%|(% style="width:185px" %)57.14%|(% style="width:175px" %)208

//Table: Results of user performance on the tasks//

|(% style="width:330px" %)**Tasks**|(% style="width:523px" %)**Parts where people struggled**
|(% style="width:330px" %)Add a reminder|(% style="width:523px" %)(((
* Participants did not know where to start
* There was no immediate audio feedback indicating success
)))
|(% style="width:330px" %)Create a personal profile|(% style="width:523px" %)(((
* Typing was difficult with the fingers taped together
)))
|(% style="width:330px" %)Play memory game|(% style="width:523px" %)(((
* Participants did not know how to play the game, sometimes even after listening to the explicit instructions
* The on-screen text contained too many words
* The beta version gave no right-or-wrong prompts, unlike what the instructions described, which confused participants
)))

//Table: Difficulties users encountered when solving the tasks//

= 4. Discussion =

(% class="wikigeneratedid" %)
In light of our research question, we found no notable disparities between the stand-alone application and the robot. The results may have been influenced by the experimental setup. While we aimed to emulate a real-world scenario for participants to perform the tasks in, their pre-existing proficiency with digital devices could have played a role: participants may have had prior experience with similar applications or robots, which could have affected their performance and perceptions in both groups. Nevertheless, given that people with dementia likely have limited experience with mobile devices, we remain positive about the potential of the robot's audio-visual aid to enhance the experience of PwDs using the application.

==== Limitations ====

* We could not adapt the robot to the PwD due to time constraints. This means that we did not take into account the severity of the PwD's visual, auditory and kinesthetic limitations while setting up Pepper.
* We could not test the full capabilities of the robot due to privacy constraints. Since we fabricated the information about relatives to protect the privacy of participants, we were not able to perform the scenarios in a realistic manner.
* Since the version of the Google Chrome browser on the Pepper tablet was outdated, we were not able to load our Flutter application onto it and simulate actual scenarios.
* Participants came from a wide variety of backgrounds and mother tongues, so it was not possible to adjust Pepper to the specific culture of each participant.

==== Future Improvements ====

* We can make our system more realistic and better adapted to PwDs by incorporating human-like responses, gestures and movements into Pepper.
* We can make our system fully gesture/voice controlled to enable the PwD to use the system without assistance from a caregiver, increasing their autonomy.
* We can incorporate privacy protocols such as voice authentication and gaze detection to ensure that all personal information about the PwD, relatives and caregivers is kept safe and confidential.

= 5. Conclusions =

After performing the experiment and running various statistical tests on the results obtained, we draw the following conclusions, which hopefully answer some of our research questions:

1. We believe that an information support application **DOES IMPROVE** a PwD's well-being, since it can provide them with access to important information and support, improving their overall quality of life.
1. We believe that a robot assistant **DOES IMPROVE** the experience of a PwD using it. The robot can provide companionship and assistance, making PwDs feel more independent and less isolated.

While our experiment had its limitations, we believe that it provides a foundation for future research in developing personalized memory robots for people with dementia. We also believe that our research is applicable to mobile agents, which increases the accessibility of the solution.