Comments on 3. Evaluation

Last modified by Vladimir Rullens on 2025/11/08 23:23

  • Mark Neerincx
    Mark Neerincx, 2025/10/21 12:07

    1. Prototype Scope: “Is the Artifact or Prototype described with a clear scope, indicating which specific Requirements and Design Specifications it implements for testing?”

    Applies to: Prototype

    • Prototype:
      • Meets Criterion:
        • States a focal Claim about dancing improving patient happiness and outlines a robot-based dance session using a Miro robot connected via a Python library and an LLM.
        • Lists planned components (songs, genres/BPMs) and preliminary success metrics (interaction success rate).
      • (Potential) Improvements
        • Scope does not list the specific Requirements/Design Specifications implemented; functions and constraints are not enumerated.
        • Prototype maturity and boundaries are unclear (what is functional now vs. to-be-implemented).
        • No explicit traceability from the evaluated functions to prior Specification artifacts (Use Cases, Requirements, Claims identifiers); a possible traceability scheme is sketched after this list.
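
    For illustration, a minimal Python sketch of how the prototype scope could enumerate implemented vs. to-be-implemented functions with explicit traceability to Requirements and Claims. All identifiers and function names below are hypothetical placeholders, not the project's actual artifacts:

```python
# Illustrative only: one way to make prototype scope and traceability explicit.
# The RQ/CL identifiers and function names are placeholders.

from dataclasses import dataclass, field

@dataclass
class PrototypeFunction:
    name: str
    implemented: bool                                   # functional now vs. to-be-implemented
    requirements: list = field(default_factory=list)    # e.g. ["RQ001"]
    claims: list = field(default_factory=list)          # e.g. ["CL001"]

SCOPE = [
    PrototypeFunction("play_song_by_genre", True, ["RQ001"], ["CL002"]),
    PrototypeFunction("prompt_dance_moves", True, ["RQ002"], ["CL001"]),
    PrototypeFunction("robot_co_dancing", False, ["RQ003"], ["CL001"]),  # planned
]

def scope_report(scope):
    """List which Requirements/Claims each prototype function covers."""
    for f in scope:
        status = "implemented" if f.implemented else "to-be-implemented"
        print(f"{f.name}: {status}; reqs={f.requirements}; claims={f.claims}")

scope_report(SCOPE)
```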

    2. Methodological Rigor: “Is the chosen Evaluation Method clearly described and justified as being appropriate for testing the specific Claims outlined in the Specification stage?”

    Applies to: Test

    • Test:
      • Meets Criterion:
        • Outlines two alternative experimental designs (A/B with/without dancing; pre-post mood questionnaire).
        • Defines roles and tasks for host, participant, and robot; includes an informed consent step.
      • (Potential) Improvements
        • The descriptions of the two candidate designs should be elaborated (e.g., which design will be used, the procedure, and how participants are assigned to conditions).

    3. Clarity of Measures: “Are the Measures used in the test clearly defined and directly linked to the operationalization of the Claims?”

    Applies to: Test

    • Test:
      • Meets Criterion:
        • Identifies physiological measures (heart rate, calories burned) and session duration for physical engagement.
        • Includes a Likert questionnaire for mental well-being with items adapted from established instruments.
      • (Potential) Improvements
        • Operational definitions are incomplete (e.g., exact scales, scoring, pre/post timing, aggregation rules); a possible scoring scheme is sketched after this list.
        • Improve the justification of item selection, generation, or adjustment (with reference to the Foundation and existing instruments).
        • Explain how each measure relates to the Claims (i.e., the hypothesized effects).
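
    As a concrete illustration of such operational definitions, a minimal Python sketch assuming 5-point Likert items, positive/negative subscale means, and pre/post change scores; the item names are placeholders, not the team's instrument:

```python
# Illustrative scoring scheme (not the team's protocol): subscales aggregated
# as item means on a 1-5 Likert scale, with pre/post difference scores per
# participant. Item keys are assumptions.

POSITIVE = ["interested", "excited", "proud"]    # hypothetical item keys
NEGATIVE = ["upset", "miserable", "nervous"]     # hypothetical item keys

def subscale_mean(responses, items):
    """Aggregate a subscale as the mean of its 1-5 Likert responses."""
    return sum(responses[i] for i in items) / len(items)

def mood_change(pre, post):
    """Pre/post change scores; positive values indicate improvement."""
    return {
        "positive_affect": subscale_mean(post, POSITIVE) - subscale_mean(pre, POSITIVE),
        "negative_affect": subscale_mean(pre, NEGATIVE) - subscale_mean(post, NEGATIVE),
    }

pre  = {"interested": 3, "excited": 2, "proud": 3, "upset": 2, "miserable": 2, "nervous": 3}
post = {"interested": 4, "excited": 4, "proud": 3, "upset": 1, "miserable": 1, "nervous": 2}
print(mood_change(pre, post))  # both change scores come out at about +1.0 here
```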

    4. Results and Claim Validation: “Do the Evaluation Results provide clear empirical evidence that either supports or refutes the tested Claims?”

    Applies to: Test

    • Test:
      • Meets Criterion:
      • (Potential) Improvements
        • No results reported yet.

    5. Discussion of Results: “Is there a thorough discussion of the Evaluation Results, including an analysis of the study's limitations and any unexpected findings?”

    Applies to: Test

    • Test:
      • Meets Criterion:
      • (Potential) Improvements
        • No discussion provided yet; consider limitations such as non-target participants, novelty effects, device accuracy, and setting constraints.

    6. Iterative Feedback Loop: “Does the evaluation conclude with a clear analysis of how the Evaluation Results could inform the next iteration of the Foundation or Specification?”

    Applies to: Test

    • Test:
      • Meets Criterion:
      • (Potential) Improvements
        • No explicit plan yet for feeding findings into the next iteration (e.g., adjusting dance prompting frequency, music personalization, or Claims refinement).

     

  • Bernd Dudzik
    Bernd Dudzik, 2025/11/03 23:40

    Feedback on Revised Draft

    1. Prototype Scope: "Is the Artifact or Prototype described with a clear scope, indicating which specific Requirements and Design Specifications it implements for testing?"

    Applies to: Prototype

    Prototype

    Meets the criterion:

    • Provides a description of the prototype, though a very barebones one.

    (Potential) Improvements:

    • Scope does not list the specific Requirements/Design Specifications implemented; functions and constraints are not enumerated.
    • Prototype maturity and boundaries are unclear (what is functional now vs. to-be-implemented).
    • No explicit traceability from the evaluated functions to prior Specification artifacts (Use Cases, Requirements, Claims identifiers).

    2. Methodological Rigor: "Is the chosen Evaluation Method clearly described and justified as being appropriate for testing the specific Claims outlined in the Specification stage?"

    Applies to: Test

    Test

    Meets the criterion:

    • Describes participants, tasks, informed consent, and pre/post testing; collects physiological and self-report data.
    • Outlines the intended comparison of mood/engagement changes with dancing.

    (Potential) Improvements:

    • Justify the proxy sample and define inclusion/exclusion criteria; add a description of the study design and of randomization/counterbalancing for conditions (a counterbalancing sketch follows below).
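
    For concreteness, a small Python sketch of randomized counterbalancing for a within-subject dancing vs. no-dancing comparison; the condition labels are assumed:

```python
# Illustrative counterbalancing: half the participants get each condition
# order, assigned at random (orders exactly balanced for even N).

import random

CONDITIONS = ("dancing", "no_dancing")   # assumed condition labels

def counterbalanced_orders(participant_ids, seed=42):
    """Assign each participant a condition order, balanced across the sample."""
    orders = [CONDITIONS, CONDITIONS[::-1]]
    # Repeat the two orders enough times to cover everyone, then shuffle.
    schedule = (orders * (len(participant_ids) // 2 + 1))[: len(participant_ids)]
    rng = random.Random(seed)            # fixed seed for a reproducible schedule
    rng.shuffle(schedule)
    return dict(zip(participant_ids, schedule))

print(counterbalanced_orders(["P1", "P2", "P3", "P4"]))
```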

    3. Clarity of Measures: "Are the Measures used in the test clearly defined and directly linked to the operationalization of the Claims?"

    Applies to: Test

    Test

    Meets the criterion:

    • Specifies heart rate, calories, and time dancing; uses a PANAS-inspired mood questionnaire; reports an unexpected reduction in physical exhaustion.
    • Links the physical measures to CL001 and mood to CL002 at a high level.

    (Potential) Improvements:

    • Detail timing windows, scaling, aggregation; justify adapted items or use validated subscales; map each item to a claim construct.
    • Add objective activity classification (steps, cadence) and set success thresholds for claim support (a possible classification rule is sketched below).
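
    One possible classification rule and success threshold are sketched below in Python; the zone boundaries, the 10-minute threshold, and the link to CL001 are assumptions the team would need to justify:

```python
# Illustrative activity classification from per-minute heart-rate samples:
# map each minute to a training zone via % of age-predicted HRmax (220 - age),
# then apply a hedged success rule for CL001. All thresholds are assumptions.

def hr_zone(bpm, age):
    """Map a heart-rate sample to a training zone via % of HRmax."""
    pct = bpm / (220 - age)
    if pct < 0.60:
        return 1            # very light
    if pct < 0.70:
        return 2            # light / "Zone 2"
    if pct < 0.80:
        return 3            # moderate
    return 4                # vigorous or above

def supports_cl001(minute_bpm, age, min_active_minutes=10):
    """Hypothetical success rule: enough minutes at Zone 2 or higher."""
    active = sum(1 for bpm in minute_bpm if hr_zone(bpm, age) >= 2)
    return active >= min_active_minutes

session = [95, 112, 120, 124, 128, 131, 127, 125, 129, 133, 130, 126]
print(supports_cl001(session, age=22))  # True: 10 of 12 minutes at Zone 2 or above
```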

    4. Results and Claim Validation: "Do the Evaluation Results provide clear empirical evidence that either supports or refutes the tested Claims?"

    Applies to: Test

    Test

    Meets the criterion:

    • Shows overall mood improvement and decreased negative affect across participants; interprets Zone-2 intensity from cardio data.
    • Identifies detrimental effects from ASR issues on some items (proudness, miserable).

    (Potential) Improvements:

    • Add inferential statistics (pre/post tests), effect sizes, and confidence intervals; report per-participant traces (an analysis sketch follows below).
    • Separate usability/ASR issues from well-being outcomes; conduct sensitivity analyses.
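
    A Python sketch of the suggested analysis, run on fabricated example data: a paired t-test, Cohen's d for paired samples, and a 95% confidence interval on the mean change:

```python
# Illustrative pre/post analysis on made-up positive-affect scores (one pair
# per participant). Values are fabricated for the example only.

import numpy as np
from scipy import stats

pre  = np.array([2.8, 3.1, 2.5, 3.0, 2.7])
post = np.array([3.6, 3.4, 3.1, 3.5, 3.3])

diff = post - pre
res = stats.ttest_rel(post, pre)              # paired t-test
d = diff.mean() / diff.std(ddof=1)            # Cohen's d for paired samples
ci = stats.t.interval(0.95, len(diff) - 1,
                      loc=diff.mean(),
                      scale=stats.sem(diff))  # 95% CI on the mean change

print(f"t={res.statistic:.2f}, p={res.pvalue:.3f}, d={d:.2f}, "
      f"95% CI [{ci[0]:.2f}, {ci[1]:.2f}]")
```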

    5. Discussion of Results: "Is there a thorough discussion of the Evaluation Results, including an analysis of the study's limitations and any unexpected findings?"

    Applies to: Test

    Test

    Meets the criterion:

    • Discusses how music/dancing improved mood and how recognition errors harmed the experience; acknowledges the light exercise intensity.
    • Notes the student sample and prototype constraints.

    (Potential) Improvements:

    • Structure threats to validity (sample, instrumentation, novelty); propose mitigations (improved ASR, co-dancing).
    • Relate findings to HF concepts (e.g., motivation, social facilitation) to guide next design choices.

    6. Iterative Feedback Loop: "Does the evaluation conclude with a clear analysis of how the Evaluation Results could inform the next iteration of the Foundation or Specification?"

    Applies to: Test

    Test

    Meets the criterion:

    • Proposes improving ASR and adding robot dancing; suggests testing with PwD and larger N.
    • Points to refining study design and recruiting target users.

    (Potential) Improvements:

    • Translate improvements into updated Requirements/Claims (e.g., ASR accuracy prerequisite, co-dancing function).
    • Add a plan to update Specification patterns (timing, encouragement) based on empirical findings.