Changes for page b. Test

Last modified by Jean-Paul Smit on 2024/04/09 15:23

From version 25.1
edited by Jean-Paul Smit
on 2024/04/07 00:14
Change comment: There is no comment for this version
To version 36.1
edited by Jean-Paul Smit
on 2024/04/09 15:23
Change comment: There is no comment for this version

Summary

Details

Page properties
Content
... ... @@ -74,13 +74,8 @@
74 74  
75 75  
76 76  
77 -Competence. No significant effect on using the NAO (p > 0.05)
77 +A paired t-test on participants' competence, comparing pre-study and post-study measures, found no significant effect of using the NAO (p > 0.05) over the setup with only the conversational agent. Covariates had no significant influence on recall score (p > 0.05). Both Student's t-tests on the relationship between familiarity and attitude towards robots and conversational agents showed no significant difference between the two groups' dependent variables. For the other analyses, most resulting p-values were above the 0.05 significance level and were thus not statistically significant.
78 78  
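As a minimal sketch of the paired t-test mentioned above, the statistic can be computed from the per-participant differences between the post-study and pre-study measures. The function name `paired_t_statistic` and the scores below are hypothetical and are not data from the study. The p-value itself would normally come from a statistics package, for example `scipy.stats.ttest_rel`.

```python
import math
import statistics


def paired_t_statistic(pre, post):
    """Return (t, degrees_of_freedom) for a paired t-test on two equal-length samples."""
    if len(pre) != len(post):
        raise ValueError("paired samples must have the same length")
    if len(pre) < 2:
        raise ValueError("need at least two pairs")
    # Work on the per-participant differences (post minus pre).
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    # t = mean difference divided by its standard error.
    t = mean_diff / (sd_diff / math.sqrt(n))
    return t, n - 1


# Hypothetical scores for illustration only, NOT data from the study.
pre_scores = [3, 4, 2, 5, 4]
post_scores = [4, 4, 3, 5, 6]
t_value, dof = paired_t_statistic(pre_scores, post_scores)
```

The resulting `t_value` is compared against the t-distribution with `dof` degrees of freedom. A result is reported as not significant when the corresponding p-value exceeds 0.05.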
79 -Covariates. had no significant influence on recall score (p > 0.05)
80 -
81 -
82 -For other analyses, most resulting p values had an alpha above 0.05 and thus were not significant enough.
83 -
84 84  [[Table 1: showing the Affect variables>>image:Table of means.png||height="214" width="737"]]
85 85  
86 86  === 4. Discussion ===
... ... @@ -90,27 +90,35 @@
90 90  
91 91  **Sample Group and Size impact validity of the study**
92 92  
93 -The ecological validity of the study is impacted by the fact that there were no PwD in our sample. The scope of the experiment was limited to TU Delft University students. That means that future research may benefit from a closer approach to an experiment which is closer to the experience of PwD. Moreover, the controlled experiment was restricted to a cohort of 20 participants, underscoring the potential for enhancing result validity through the utilization of a larger sample size
88 +The ecological validity of the study is limited by the fact that there were no PwD in our sample: the scope of the experiment was limited to TU Delft University students. Future research may therefore benefit from an experimental setup that is closer to the experience of PwD. Moreover, the controlled experiment was restricted to a cohort of 20 participants, so a larger sample size could improve the validity of the results.
94 94  
95 95  
96 96  **//Participants didn't know about the points system, so they didn't answer with "getting the most points" in mind//**
97 97  
98 -//Ambiguities in the evaluation briefing has led to several aspects in the results that might misrepresent the participants' gathered knowledge. Points were awarded to the participant for certain key descriptors, each family member's role, occupation, likes, dislikes and so on. Of course, this wasn't known to the participants, so they might have omitted descriptors they deemed less important or trivial and therefore scored worse, even though they'd heard and remembered them.//
93 +Ambiguities in the evaluation briefing have led to several aspects of the results that might misrepresent the participants' gathered knowledge. Points were awarded to the participant for certain key descriptors: each family member's role, occupation, likes, dislikes and so on. Of course, this wasn't known to the participants, so they might have omitted descriptors they deemed less important or trivial and therefore scored worse, even though they'd heard and remembered them.
99 99  
100 100  
101 101  //**Participants don't know which people or facts are important, so they can get stuck in spots that are unrewarded**//
102 102  
103 -//The choice to create a sprawling, multi-faceted database also had the side-effect of participants finding out a lot of information that was not rewarded by the grading system in any way. For example, the user can ask the robot to elaborate on certain memories or character traits of family members. There are also people in the database that act as ancillary characters and to create a sense of realism to the database, but participants can likely get stuck on learning about them as there is no implied hierarchy of importance to the participant. //
98 +The choice to create a sprawling, multi-faceted database also had the side effect that participants found out a lot of information that was not rewarded by the grading system in any way. For example, the user can ask the robot to elaborate on certain memories or character traits of family members. There are also people in the database who serve as ancillary characters to give the database a sense of realism, but participants can easily get stuck on learning about them, as no hierarchy of importance is conveyed to the participant.
104 104  
105 105  
106 106  **//GPT Assistant can elaborate on any question, and therefore the user does not know what belongs to the database and does not know where to focus//**
107 107  
108 -//Another limitation of the evaluation is with the GPT Assistant's ability to consistently elaborate on any question posed by the participant.//
103 +Another limitation of the evaluation lies in the GPT Assistant's ability to consistently elaborate on any question posed by the participant, which makes it hard for the participant to tell what belongs to the database and where to focus.
109 109  
110 110  
106 +(% class="wikigeneratedid" id="HEthicalConsiderations" %)
107 +//**Ethical Considerations**//
111 111  
109 +We commit to high ethical standards, respecting the sensitive nature of simulating dementia conditions, and ensuring the well-being and dignity of all participants throughout the study. To ensure this, we present each participant with a form containing the ethical considerations.
110 +
111 +
112 112  === 5. Conclusions ===
113 113  
114 -===== //**Ethical Considerations**// =====
114 +With regard to our final insights, there were some areas for improvement. For example, regarding the validity of the testing procedure, we had a relatively small sample size and none of the participants had dementia. This means that the target group for this NAO system was not tested. The sample was also limited to a specific group of people, namely students aged 20-25. There was also a level of ambiguity in the point system of the evaluation and, with regard to the focus of the LLM, we found that the GPT deviated from the main objective. However, despite these limitations, we were still able to evaluate the NAO system and could recognise that the NAO may positively affect participants' affective state.
115 115  
116 -We commit to high ethical standards, respecting the sensitive nature of simulating dementia conditions, and ensuring the well-being and dignity of all participants throughout the study. To ensure this we present a form containing the ethical considerations to each participant.
116 +
117 +===== **//Avenues for Future Work//** =====
118 +
119 +As mentioned, there were some limitations, which could be taken into consideration when continuing work on the Personal Encyclopaedia. For future work, testing can be done with the correct target group, the GPT could be optimised to avoid deviations, and more focus could be placed on our last stage (Extension Phase) to see whether additional functionalities are needed.
Test group data.jpg
Author
... ... @@ -1,0 +1,1 @@
1 +xwiki:XWiki.jeanpaulsmit
Size
... ... @@ -1,0 +1,1 @@
1 +157.5 KB
Content