4. Evaluation Methods

In a within-subject design, each participant is exposed to more than one of the conditions under test. In a between-subject design, each participant experiences only one condition [1]. A risk of the within-subject design is the so-called 'demand effect': participants may infer which results the researchers expect and adjust their behavior accordingly. Participants may also show a learning effect, i.e. they carry over what they learned in the first condition to later ones [2].
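To make the difference between the two designs concrete, the following minimal Python sketch assigns participants to conditions under each design. The function names, participant labels, and condition labels are hypothetical and chosen only for illustration. Rotating the starting condition in the within-subject case is an added assumption, not something the sources above prescribe; it is one common way to spread a possible learning effect across conditions.

```python
import random


def between_subject(participants, conditions, seed=0):
    """Assign each participant exactly one condition (hypothetical helper)."""
    rng = random.Random(seed)
    shuffled = list(participants)
    rng.shuffle(shuffled)
    # Deal the shuffled participants round-robin over the conditions,
    # so that group sizes differ by at most one.
    return {p: [conditions[i % len(conditions)]] for i, p in enumerate(shuffled)}


def within_subject(participants, conditions):
    """Give each participant every condition, rotating the starting point."""
    k = len(conditions)
    # Participant i starts at condition i mod k; this rotation is an
    # illustrative assumption meant to balance order effects.
    return {
        p: conditions[i % k:] + conditions[:i % k]
        for i, p in enumerate(participants)
    }
```

Calling `within_subject(["P1", "P2"], ["A", "B"])` gives every participant both conditions in alternating order, whereas `between_subject` returns a single condition per participant.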

Several established questionnaires exist for human-robot interaction. However, most focus on the usability of a system with which the user pursues a specific goal. Examples are SASSI [3], SUS [4], and the ASA questionnaire [5]. Questionnaires that also address the robot's perceived likeability and the general interaction are Godspeed [6] and the questionnaire proposed by Heerink et al. [7], the latter being more elaborate. Bradley and Lang [8] propose the Self-Assessment Manikin (SAM), a non-verbal, picture-based assessment that measures pleasure, arousal, and dominance in response to a stimulus. Finally, Broekens and Brinkman [9] describe the AffectButton, an interface component that lets users select the most appropriate affective expression by moving the mouse to the corresponding location.

=== References ===

[1] Greenwald, A. G. (1976). Within-subjects designs: To use or not to use? //Psychological Bulletin//, //83//(2), 314.

[2] Seltman, H. J. (2012). Experimental design and analysis (p. 340).

[3] Hone, K. S., & Graham, R. (2000). Towards a tool for the subjective assessment of speech system interfaces (SASSI). //Natural Language Engineering//, //6//(3-4), 287-303.

[4] Lewis, J. R. (2018). The system usability scale: past, present, and future. //International Journal of Human–Computer Interaction//, //34//(7), 577-590.

[5] Fitrianie, S., Bruijnes, M., Li, F., Abdulrahman, A., & Brinkman, W. P. (2022, September). The artificial-social-agent questionnaire: establishing the long and short questionnaire versions. In //Proceedings of the 22nd ACM International Conference on Intelligent Virtual Agents// (pp. 1-8).

[6] Bartneck, C. (2023). Godspeed Questionnaire Series: Translations and Usage.

[7] Heerink, M., Kröse, B., Evers, V., & Wielinga, B. (2009, September). Measuring acceptance of an assistive social robot: a suggested toolkit. In //RO-MAN 2009: The 18th IEEE International Symposium on Robot and Human Interactive Communication// (pp. 528-533). IEEE.

[8] Bradley, M. M., & Lang, P. J. (1994). Measuring emotion: the self-assessment manikin and the semantic differential. //Journal of Behavior Therapy and Experimental Psychiatry//, //25//(1), 49-59.

[9] Broekens, J., & Brinkman, W. P. (2013). AffectButton: A method for reliable and valid affective self-report. //International Journal of Human-Computer Studies//, //71//(6), 641-667.
