b. Test

= 1. Introduction =

We aim to measure the effectiveness of a robot that tells stories interactively, offering personalization and opportunities for interaction and shared activities with the patient's family members. The control situation was a storytelling robot that narrates the story without any interaction or personalization. We evaluate the claims made earlier using a modified Godspeed questionnaire, to which the following questions were added:

~1. The mood of the patient after the meal with storytelling.

2. The feedback of the patient for the story.

3. The enjoyment level of the patient (given by caregiver)

4. Was the meal completed? (given by caregiver)

5. Time taken to complete the meal. A long meal could mean either that the patient did not enjoy it or that they were so engaged that it took longer, so this measure needs careful interpretation.

The questionnaire also measured possible negative effects of Pepper (covered by the Godspeed items):

~1. Pepper was annoying.

2. Pepper was not human.

3. Pepper was disturbing.

= 2. Method =

== 2.1 Participants ==

A total of 14 TU Delft students participated in the study: 9 male and 5 female. Due to limited time, the target users (patients with dementia) could not be included, so all participants were students.

== 2.2 Experimental design ==

A within-subject design was chosen because of the limited number of participants, so each participant interacted with the robot twice. The group was split in half: the first half interacted with the non-interactive storytelling robot first (control situation) and then with the interactive storytelling robot. For the second half, the order was reversed. This counterbalancing was done to offset any carryover effect of experiencing one version before the other.
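As an illustration, the split into two order groups could be produced as follows. This is a minimal sketch, not the procedure actually used in the study: the random assignment, the fixed seed, the function name assign_orders and the condition labels are all assumptions made for this example.

```python
import random


def assign_orders(participant_ids, seed=0):
    """Split participants into two equally sized order groups.

    Hypothetical helper: the first half of the shuffled list sees the
    control version first, the second half sees the interactive version first.
    """
    ids = list(participant_ids)
    rng = random.Random(seed)  # fixed seed so the assignment is reproducible
    rng.shuffle(ids)
    half = len(ids) // 2
    orders = {}
    for pid in ids[:half]:
        orders[pid] = ("non_interactive", "interactive")
    for pid in ids[half:]:
        orders[pid] = ("interactive", "non_interactive")
    return orders


# 14 participants, numbered 1 to 14 for illustration
orders = assign_orders(range(1, 15))
for pid in sorted(orders):
    print(pid, orders[pid])
```

With 14 participants this gives 7 people per order, so each version is experienced first equally often.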

== 2.3 Tasks ==

The Pepper robot was powered on and connected to the laptop before the participants came in. The participants had to sign the consent form, interact with each version of the robot once, and fill in the questionnaire after each interaction.

== 2.4 Measures ==

The experiment measured the differences between the non-interactive storytelling robot (control situation) and the interactive storytelling robot (experimental situation). After each interaction, the participant filled in a questionnaire about their experience, with questions answered on a scale of 1 to 5. The link to the questionnaire is [[here>>https://tudelft.eu.qualtrics.com/jfe/preview/previewId/e0402219-f101-4c82-b651-98bbf7cdcc58/SV_8Gtn6s6loYOYXl4?Q_CHL=preview&Q_SurveyVersionID=current]].

The answers to both questionnaires were recorded, and a p-value was calculated for each question to test whether the differences (if any) were significant.
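The significance test reported in the Results is a one-tailed paired-sample t-test. The sketch below shows how such a test could be computed for a single question using only the Python standard library. It is an illustration under stated assumptions: the ratings are made-up example data (not the study's data), the function names are hypothetical, and the t-distribution CDF is approximated by numerical integration rather than taken from a statistics package.

```python
import math
from statistics import mean, stdev


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + x * x / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """CDF via Simpson's rule on [0, |t|], using the symmetry of the density."""
    a = abs(t)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if t >= 0 else 0.5 - area


def paired_t_one_tailed(interactive, control):
    """Test H1: mean(interactive - control) > 0. Returns (t, p).

    Raises ZeroDivisionError if all paired differences are identical.
    """
    diffs = [a - b for a, b in zip(interactive, control)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, 1 - t_cdf(t, n - 1)


# Made-up ratings (1 to 5) for one question from 14 participants
interactive = [4, 5, 4, 3, 5, 4, 4, 5, 3, 4, 5, 4, 4, 3]
control = [3, 4, 4, 3, 4, 3, 4, 4, 3, 3, 4, 4, 3, 3]

t, p = paired_t_one_tailed(interactive, control)
print(f"t = {t:.3f}, one-tailed p = {p:.5f}")
```

The test is paired because every participant rated both versions, so the analysis works on the per-participant differences rather than on two independent groups.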

== 2.5 Procedure ==

For the experiment, the following steps were performed for each participant:

~1. Welcome the participant, and explain their tasks.

2. Ask them to sign the consent form, which informs them of the data being collected.

3. Power on the robot version that corresponds to the participant's group.

4. Make them interact with the first version of the robot.

5. Once completed, give them the questionnaire to document their experience.

6. Power on the second version of the robot and let them interact with it.

7. Once completed, give them the same questionnaire to document their experience.

8. The experiment is now completed and they can leave.

The non-interactive and interactive stories were hard-coded for this experiment and personalized based on a made-up scenario. In a real deployment they must be adapted to the patient's own experiences. The same story was used in both situations. The only difference was that the non-interactive robot narrated the story without any scope for interaction with family members, whereas the interactive robot used the same story to promote engagement with the family.

== 2.6 Material ==

The following materials were used during the experiments:

~1. Laptops for consent forms, questionnaires and the code.

2. The Pepper robot.

= 3. Results =

With the answers from both questionnaires (i.e., those from the control and experimental situations), the following means were observed for the questions rated on a 1 to 5 scale:

[[image:attach:QuestionnaireMeans.png||height="407" width="551"]]

The x-axis denotes the question number from the [[questionnaire>>https://tudelft.eu.qualtrics.com/jfe/preview/previewId/e0402219-f101-4c82-b651-98bbf7cdcc58/SV_8Gtn6s6loYOYXl4?Q_CHL=preview&Q_SurveyVersionID=current]] (these questions can also be seen in the table below) and the y-axis denotes the mean rating of that question over all participants. Interactive storytelling has a higher mean than non-interactive storytelling for most questions, which indicates a more positive response from the participants towards the robot with interactive storytelling. This suggests that the interactive storytelling robot might also elicit a more positive response from patients with dementia.

To test whether these differences were significant, we performed a one-tailed paired-sample t-test. The table below summarizes the p-values obtained for all questions (robot characteristics).

(% style="width:509px" summary="Table showing obtained p-values for all questions" %)
|=(% style="width: 47px;" %)Q No.|=(% style="width: 279px;" %)Robot Characteristics (Questions from Questionnaire)|=(% style="width: 179px;" %)p-value
|(% style="width:47px" %)1|(% style="width:279px" %)//**1 - Fake, 5 - Natural**//|(% style="width:179px" %)//**0.04113**//
|(% style="width:47px" %)2|(% style="width:279px" %)1 - Machine-like, 5 - Human-like|(% style="width:179px" %)0.20205
|(% style="width:47px" %)3|(% style="width:279px" %)1 - Unconscious, 5 - Conscious|(% style="width:179px" %)0.14581
|(% style="width:47px" %)4|(% style="width:279px" %)1 - Artificial, 5 - Lifelike|(% style="width:179px" %)0.09371
|(% style="width:47px" %)5|(% style="width:279px" %)1 - Moving Rigidly, 5 - Moving Elegantly|(% style="width:179px" %)0.15096
|(% style="width:47px" %)6|(% style="width:279px" %)1 - Dead, 5 - Alive|(% style="width:179px" %)0.5
|(% style="width:47px" %)7|(% style="width:279px" %)1 - Stagnant, 5 - Lively|(% style="width:179px" %)0.30576
|(% style="width:47px" %)8|(% style="width:279px" %)1 - Mechanical, 5 - Organic|(% style="width:179px" %)0.40923
|(% style="width:47px" %)9|(% style="width:279px" %)1 - Artificial, 5 - Lifelike|(% style="width:179px" %)0.26641
|(% style="width:47px" %)10|(% style="width:279px" %)//**1 - Inert, 5 - Interactive**//|(% style="width:179px" %)//**0.00495**//
|(% style="width:47px" %)11|(% style="width:279px" %)//**1 - Apathetic, 5 - Responsive**//|(% style="width:179px" %)//**0.00759**//
|(% style="width:47px" %)12|(% style="width:279px" %)//**1 - Dislike, 5 - Like**//|(% style="width:179px" %)//**0.00143**//
|(% style="width:47px" %)13|(% style="width:279px" %)1 - Unfriendly, 5 - Friendly|(% style="width:179px" %)0.10392
|(% style="width:47px" %)14|(% style="width:279px" %)//**1 - Unkind, 5 - Kind**//|(% style="width:179px" %)//**0.02226**//
|(% style="width:47px" %)15|(% style="width:279px" %)//**1 - Unpleasant, 5 - Pleasant**//|(% style="width:179px" %)//**0.00845**//
|(% style="width:47px" %)16|(% style="width:279px" %)//**1 - Awful, 5 - Nice**//|(% style="width:179px" %)//**0.00329**//
|(% style="width:47px" %)17|(% style="width:279px" %)1 - Ignorant, 5 - Knowledgeable|(% style="width:179px" %)0.13154
|(% style="width:47px" %)18|(% style="width:279px" %)//**1 - Irresponsible, 5 - Responsible**//|(% style="width:179px" %)//**0.01425**//
|(% style="width:47px" %)19|(% style="width:279px" %)1 - Unintelligent, 5 - Intelligent|(% style="width:179px" %)0.37610
|(% style="width:47px" %)20|(% style="width:279px" %)1 - Foolish, 5 - Sensible|(% style="width:179px" %)0.06823
|(% style="width:47px" %)21|(% style="width:279px" %)1 - Anxious, 5 - Relaxed|(% style="width:179px" %)0.16778
|(% style="width:47px" %)22|(% style="width:279px" %)1 - Agitated, 5 - Calm|(% style="width:179px" %)0.05548
|(% style="width:47px" %)23|(% style="width:279px" %)//**1 - Quiescent, 5 - Surprised**//|(% style="width:179px" %)//**0.04806**//
|(% style="width:47px" %)24|(% style="width:279px" %)1 - Demotivated, 5 - Motivated|(% style="width:179px" %)0.10392
|(% style="width:47px" %)25|(% style="width:279px" %)//**Mood of the patient after the activity.**//|(% style="width:179px" %)//**0.00055**//
|(% style="width:47px" %)26|(% style="width:279px" %)//**Patient's feedback about the story experience**//|(% style="width:179px" %)//**0.00037**//
|(% style="width:47px" %)27|(% style="width:279px" %)//**Patient's enjoyment**//|(% style="width:179px" %)//**0.00016**//
|(% style="width:47px" %)28|(% style="width:279px" %)Did the patient complete the activity?|(% style="width:179px" %)NA
|(% style="width:47px" %)29|(% style="width:279px" %)How many minutes did the patient take to complete the activity?|(% style="width:179px" %)NA

With a significance threshold of 0.05, the questions with a p-value below this threshold are highlighted in the table. We observe that 12 of the 27 tested questions are statistically significant, including all three custom questions (Qs 25, 26 and 27). We could not evaluate the last two questions (whether the patient completed the activity, and the time taken to complete it) because of the limited scope of the experiment: the participants were neither dementia patients nor actually eating, so basing any conclusions on these questions would not have been meaningful.

The following graphs show the difference in means for the statistically significant questions:

[[image:https://lh6.googleusercontent.com/nTIJLrcxdbvmHRNU7AqEsBcRRH_29l73c4o1dbeVDAwvC9sKoupoUEHvoCqf3_LIAD2Uk51kRURT6jj5sMxTDnexvBHbWbyy9e-wSLmIyvowqEcNmeacWsCdbSXjGg0vVwtcPRg8hF_cSrtetKObNmITTg=s2048||height="361" width="509"]]

Difference in means for Questions in Godspeed questionnaire

[[image:https://lh4.googleusercontent.com/I3voGI5LPNJH_jm41KqgOFHY1PixCJRBbZJosr0KlcuLrTlA7dC8dod_rU0Oex5kkjmADyc9e4rcNkhXsBV7IMPe-vVXj5gq2i7KOclqr_q5UdVijJe4f1dX9MWaejgSPdFkDlskEKrlG7djOYGMDFFUEg=s2048||height="312" width="518"]]

Difference in means for custom questions

= 4. Discussion =

From the above results, we do observe a difference between interactive and non-interactive storytelling. Overall, the interactive storytelling robot was rated as more likeable and responsive (Qs 11, 12 and 14 to 16), which could make a difference for patients with dementia. It was rated significantly more interactive than the non-interactive storytelling robot (Q10), and it is designed to promote conversations between the patient and the family, which is beneficial for the patient. Though it was not rated significantly more intelligent than the robot in the control situation (Q19), the robot with interactive storytelling was perceived as more pleasant and natural (Qs 1 and 15). As mentioned above, meal completion and the time taken to complete the meal could not be judged due to the limited scope of the experiment. Overall, these results answer the research questions in an encouraging direction.

However, the statistical testing does not account for the fact that there are multiple questions, and the significance threshold might need to be adjusted to factor this in. In other words, we are testing multiple features on the same data, which increases the probability that at least some features appear significant even if they are not. Hence, ideally we should divide the significance threshold by 27 (a Bonferroni correction for the 27 tested questions) for a more conservative result. Another limitation is that we could not experiment with actual patients: our participants were students who took the place of these patients, which could have affected the final results. Finally, interactive storytelling requires story templates from the patient's life for a biographical story experience, which could raise privacy concerns. It also requires an investment of time and work from the family, as the patient with dementia might not be able to recall the stories of their past.
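To make the effect of such a correction concrete, the sketch below applies a threshold of 0.05 / 27 to the p-values reported in the Results table. The p-values are copied from that table. The variable names are only illustrative, and the Bonferroni correction is one possible choice of correction rather than something the original analysis performed.

```python
# p-values from the Results table, keyed by question number (Q1 to Q27).
# Q28 and Q29 are omitted because no p-value was computed for them.
p_values = {
    1: 0.04113, 2: 0.20205, 3: 0.14581, 4: 0.09371, 5: 0.15096,
    6: 0.5, 7: 0.30576, 8: 0.40923, 9: 0.26641, 10: 0.00495,
    11: 0.00759, 12: 0.00143, 13: 0.10392, 14: 0.02226, 15: 0.00845,
    16: 0.00329, 17: 0.13154, 18: 0.01425, 19: 0.37610, 20: 0.06823,
    21: 0.16778, 22: 0.05548, 23: 0.04806, 24: 0.10392, 25: 0.00055,
    26: 0.00037, 27: 0.00016,
}

alpha = 0.05
threshold = alpha / len(p_values)  # Bonferroni: 0.05 / 27

# Questions significant without and with the correction
uncorrected = sorted(q for q, p in p_values.items() if p < alpha)
corrected = sorted(q for q, p in p_values.items() if p < threshold)

print(f"corrected threshold: {threshold:.5f}")
print("significant at 0.05:", uncorrected)
print("significant after Bonferroni:", corrected)
```

With the reported values, the corrected threshold is about 0.00185, and four of the twelve originally significant questions stay below it: Q12 (Dislike / Like) and the three custom questions (Qs 25, 26 and 27). Bonferroni is conservative, so less strict procedures could retain more questions.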

All in all, even after factoring in the limitations, we see promising results in this research towards the well-being of patients with dementia. More relevant experimentation would give more accurate results, and formative assessment at each stage could steer the research in the direction that is most beneficial for the patients. In the future, more extensive research must be performed with patients, using stories curated to their experiences. For the most reliable results, the experiment should be organic: the patient's family members should be involved, and it should take place in the environment where the robot will actually be deployed.

= 5. Conclusions =

In this report, we presented research on how an intervention with robots can help patients with dementia. Stories about a patient's life and past can help promote memory retention, and talking about them with the people around the patient can help with mood management and with connecting to others. Building on this, we built a robot for interactive biographical storytelling, i.e., a robot that uses the patient's past experiences to spark conversations and interactions with family members. We conducted experiments using a non-interactive storytelling robot as the control situation and observed hopeful results for the well-being of patients with dementia. These experiments have their limitations (as mentioned above) and can be improved in future work, but they give an optimistic start to future research in the field.
