a. Prototype

Version 24.1 by William OGrady on 2024/04/04 15:05

Materials

The prototype for this research consists of several components in the following Wizard of Oz style setup:

  • The NAO robot (human-like movements)
  • A chatbot with pre-trained custom OpenAI conversational model, running on a smartphone
  • A script with keyboard shortcuts to movements for the NAO, running on a laptop
  • A supervisor

Communication

The conversational model will handle the information exchange and verbal cues of the test, while the supervisor will act as the wizard and provide movements through the running robotsindeklas script. The prototype will be used according to the following steps:

  • NAO will be started by the supervisor, as this part of the NAO's usability is not a focus of the research.
  • The participant will, as instructed, initiate conversation with the NAO. The NAO listens 'implicitly' through the chatbot.
  • The chatbot responds automatically, without supervisor intervention.
  • The supervisor will be able to listen to the entire conversation and trigger appropriate NAO movements by pressing keyboard shortcuts.
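The keyboard-shortcut step above can be sketched as a simple dispatch table. The key bindings and movement names here are illustrative assumptions, not the actual robotsindeklas script:

```python
# Hypothetical wizard-side dispatcher: maps single-key shortcuts to
# NAO movement triggers. In the real setup each callback would send a
# movement command to the robot; here each simply returns the movement name.

def nod():
    return "nod"           # emblem: back-channel nod

def wave_arms():
    return "wave_arms"     # illustrator: arm movement while talking

def sit_down():
    return "sit_down"      # regulator: sit down during conversation

def look_away():
    return "look_away"     # regulator: signal leaving the conversation

# Assumed key bindings (not taken from the real script).
SHORTCUTS = {
    "n": nod,
    "a": wave_arms,
    "s": sit_down,
    "l": look_away,
}

def handle_key(key):
    """Dispatch a pressed key to the corresponding movement, or ignore it."""
    action = SHORTCUTS.get(key)
    return action() if action is not None else None
```

In practice the supervisor listens along and presses a key whenever a movement fits the moment; unmapped keys are simply ignored.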

Non-verbal communication

Besides communicating through the conversational agent, the NAO can exhibit non-verbal behaviour such as human-like movements. Non-verbal communication has been shown to serve a variety of cognitive functions, reducing cognitive load to the benefit of working memory [1]. The design choices for which movements to implement are based on research by Ekman [2,3]. Ekman is an authority on non-verbal behaviour and therefore a suitable basis for this part of the implementation; he divides these behaviours into five groups, namely:

  1. Emblems: these are culture-specific, such as nodding, which means 'yes' in European cultures but can mean 'no' in India. The NAO can perform these, but we have to reflect on their cultural context. The NAO may use nodding as a back-channel that signals a feedback cue [4].
  2. Illustrators: gestures that clearly depict something, such as making a heart sign with your hands. Not relevant to the NAO as we use it.
  3. Manipulators: manipulations of body parts or objects, such as chewing on a pencil. Also not relevant for the NAO, although the NAO could pretend to scratch its head to signal that it is reasoning about an answer, for instance when the database is slow.
  4. Regulators: probably the most relevant to what we want to achieve with our gestures on the NAO; regulators signal conversation initiation, termination and turn-taking. Initiation is usually done by means of forward leans, walking toward someone, and making eye contact. Conversation termination behaviours include decreasing gaze, facing away, gathering possessions, and looking at one's watch or a clock. Maintaining a conversation can be done by maintaining gaze.
  5. Emotional Expressions: Ekman argues that these are not tied to culture but are universal across human behaviour worldwide.

Synthesizing this to non-verbal cues expressed by the NAO, the implemented movements are:

  • Emblem: Nodding the head as a back-channel, which signals a response has been heard by the NAO.
  • Illustrator: Moving arms while talking.
  • Regulator: Following eye gaze while initiating a conversation.
  • Regulator: Sitting down when being in conversation.
  • Regulator: Likewise, looking away when attempting to leave the conversation.
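The movement set above can be summarised as a small registry grouped by Ekman category. The movement identifiers are assumptions for illustration, not names from the actual implementation:

```python
# Illustrative registry of the implemented movements per Ekman category.
GESTURES = {
    "emblem": ["nod_backchannel"],
    "illustrator": ["move_arms_while_talking"],
    "regulator": ["follow_eye_gaze", "sit_down", "look_away"],
}

def gestures_for(category):
    """Return the movement names implemented for a given Ekman category."""
    return GESTURES.get(category, [])
```

Categories with no implemented movements (manipulators, emotional expressions) simply return an empty list.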

The protocol is explained in section 2.4, Test.

References

[1] Clough, S., & Duff, M. C. (2020). The role of gesture in communication and cognition: Implications for understanding and treating neurogenic communication disorders. Frontiers in Human Neuroscience, 14. doi:10.3389/fnhum.2020.00323

[2] Ekman, P. (1976). Movements with precise meanings. The Journal of Communication, 26(3), 14–26. doi:10.1111/j.1460-2466.1976.tb01898.x

[3] Ekman, P. (2004). Emotional and Conversational Nonverbal Signals. In: Larrazabal, J.M., Miranda, L.A.P. (eds) Language, Knowledge, and Representation. Philosophical Studies Series, vol 99. Springer, Dordrecht. https://doi.org/10.1007/978-1-4020-2783-3_3

[4] Granström, B., House, D., & Swerts, M. (2002). Multimodal feedback cues in human-machine interactions. In Speech prosody 2002, international conference (pp. 347–350). Laboratoire Parole et Langage.