4. Evaluation Methods
In a within-subject design, each participant is exposed to more than one experimental condition; in a between-subject design, each participant is exposed to only one condition [1]. A risk of the within-subject design is the so-called 'demand effect': participants may infer which results the researchers expect and adjust their behaviour accordingly. Another risk of the within-subject design is the learning effect, i.e. participants learning from the first condition and carrying that experience over to later ones [2].
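The two designs can be illustrated with a short sketch (not part of the cited works; the condition names and the simple alternating counterbalancing scheme are hypothetical). Counterbalancing the order of conditions across participants is one common way to mitigate the learning effect in a within-subject design:

```python
import random

CONDITIONS = ["A", "B"]  # hypothetical experimental conditions

def between_subject_assignment(participants, seed=0):
    """Between-subject: each participant experiences exactly one condition."""
    rng = random.Random(seed)
    return {p: rng.choice(CONDITIONS) for p in participants}

def within_subject_assignment(participants):
    """Within-subject: each participant experiences every condition.
    The order alternates across participants (a minimal counterbalancing
    scheme) so that learning effects do not systematically favour one
    condition."""
    orders = [["A", "B"], ["B", "A"]]
    return {p: orders[i % len(orders)] for i, p in enumerate(participants)}
```

Note that counterbalancing addresses order effects only; it does not remove the demand effect, which is inherent to participants experiencing multiple conditions.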
Several established questionnaires exist for human-robot interaction. However, most focus on the usability of a system in which the user pursues a specific goal; examples of such questionnaires are SASSI [3], SUS [4], and the ASA questionnaire [5]. Questionnaires that also cover the robot's perceived likeability and the interaction in general are the Godspeed series [6] and a questionnaire proposed by Heerink et al. [7], where the latter is more elaborate. Bradley and Lang [8] propose the Self-Assessment Manikin (SAM), a non-verbal, picture-based assessment used to measure pleasure, arousal, and dominance as a reaction to some form of stimulus. Finally, [9] describes the AffectButton, an interface component that lets users enter the most appropriate affective expression by moving their mouse to the corresponding location.
References
[1] Greenwald, A. G. (1976). Within-subjects designs: To use or not to use?. Psychological Bulletin, 83(2), 314.
[2] Seltman, H. J. (2012). Experimental design and analysis (p. 340).
[3] Hone, K. S., & Graham, R. (2000). Towards a tool for the subjective assessment of speech system interfaces (SASSI). Natural Language Engineering, 6(3-4), 287-303.
[4] Lewis, J. R. (2018). The system usability scale: past, present, and future. International Journal of Human–Computer Interaction, 34(7), 577-590.
[5] Fitrianie, S., Bruijnes, M., Li, F., Abdulrahman, A., & Brinkman, W. P. (2022, September). The artificial-social-agent questionnaire: establishing the long and short questionnaire versions. In Proceedings of the 22nd ACM International Conference on Intelligent Virtual Agents (pp. 1-8).
[6] Bartneck, C. (2023). Godspeed Questionnaire Series: Translations and Usage.
[7] Heerink, M., Krose, B., Evers, V., & Wielinga, B. (2009, September). Measuring acceptance of an assistive social robot: a suggested toolkit. In RO-MAN 2009-The 18th IEEE International Symposium on Robot and Human Interactive Communication (pp. 528-533). IEEE.
[8] Bradley, M. M., & Lang, P. J. (1994). Measuring emotion: the self-assessment manikin and the semantic differential. Journal of Behavior Therapy and Experimental Psychiatry, 25(1), 49-59.
[9] Broekens, J., & Brinkman, W. P. (2013). AffectButton: A method for reliable and valid affective self-report. International Journal of Human-Computer Studies, 71(6), 641-667.