Measuring Theory of Mind: A Multiple-Choice Response Format Version of the Short Story Task.
Test-retest reliability of common ToM measures is uneven, so favor cognitive and spontaneous tasks over affective ones in your assessments.
01 Research in Context
What this study did
The team gave six common Theory-of-Mind tasks to autistic kids aged 7-11. They used a new multiple-choice version of the Short Story Task and tested each child twice to see which tasks stayed consistent.
All tasks were paper-and-picture stories or short videos. Kids picked answers instead of talking. The researchers gave the same tasks again two weeks later to check reliability.
What they found
Reliability ranged from poor to good, with most measures landing in the moderate range. Cognitive tasks and spontaneous tasks held up best. Affective tasks, like reading emotions, dipped the most.
In plain words: if you test again two weeks later, cognitive and spontaneous tasks give similar scores. Emotion stories do not.
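For readers who want to see how a "poor to good" label can come out of two testing sessions, here is a minimal sketch. It assumes reliability is summarized with a two-way random-effects intraclass correlation, ICC(2,1), and labeled with commonly cited cutoffs (below 0.50 poor, 0.50 to 0.75 moderate, 0.75 to 0.90 good, above 0.90 excellent). The summary above does not say which statistic or cutoffs the study used, so both are assumptions, and the function names and scores are made up for illustration.

```python
def icc_2_1(scores):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    `scores` holds one row per child and one column per testing session.
    Assumption: the study summary does not state which reliability
    statistic was used; ICC(2,1) is one common choice.
    """
    n = len(scores)        # number of children
    k = len(scores[0])     # number of sessions
    if n < 2 or k < 2:
        raise ValueError("need at least 2 children and 2 sessions")

    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Two-way ANOVA sums of squares: children, sessions, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_err = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)              # between-children mean square
    msc = ss_cols / (k - 1)              # between-sessions mean square
    mse = ss_err / ((n - 1) * (k - 1))   # residual mean square

    return (msr - mse) / (msr + (k - 1) * mse + k * (msc - mse) / n)


def reliability_label(icc):
    """Map an ICC to a verbal label (assumed cutoffs, not the study's)."""
    if icc < 0.50:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc < 0.90:
        return "good"
    return "excellent"


# Hypothetical scores: each pair is (first session, retest two weeks later).
hypothetical_task = [(7, 8), (4, 4), (9, 7), (5, 6), (6, 6), (3, 5)]
icc = icc_2_1(hypothetical_task)
print(f"ICC = {icc:.2f} ({reliability_label(icc)})")
```

A task where every child scores about the same at both sessions pushes the ICC toward 1. A task where children swap rank order between sessions pulls it down, which is the pattern the summary reports for affective tasks.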
How this fits with other research
Tsui et al. (2024) pooled 27 ToM tools and also found mixed reliability. Their meta-analysis says half of common tasks fail in non-autistic samples. Abney et al. (2026) narrow the field: among autistic kids, pick cognitive or spontaneous formats.
Ferreri et al. (2011) tried multiple-choice Frith-Happé animations in adults and kept sensitivity. The new study extends that idea downward to late-elementary kids and adds hard numbers on repeat testing.
Jones et al. (2010) warned that the Eyes test misses some adults with high-functioning autism (HFA). Abney et al. (2026) echo the caution: affective tasks show the shakiest reliability in children too.
Why it matters
When you pick a ToM probe for reassessment or research, grab cognitive or spontaneous story tasks first. Skip emotion-only stories unless you need them for a specific goal. The new multiple-choice Short Story Task saves scoring time and keeps reliability at an acceptable level for autistic 7-11-year-olds.
Swap your emotion-story task for a cognitive false-belief story with multiple-choice answers and re-test in two weeks.
03 Original abstract
The present study evaluates the test-retest reliability of six theory of mind (ToM) tasks that measured cognitive, affective, and spontaneous ToM in 7 to 11 year-old children with autism spectrum disorder. Our results revealed considerable variation in test-retest reliability depending on the type of ToM task, which ranged from poor to good with the majority of the measures exhibiting moderate reliability. Results inform which common measures of cognitive ToM should be selected versus avoided in future intervention work, suggest our measure of spontaneous ToM should be used more widely in intervention and ToM research more broadly, and indicate more work is needed to develop reliable measures of affective ToM. Implications for research and clinical practice are discussed.
Journal of Autism and Developmental Disorders, 2026 · doi:10.1016/0010-0277(83)90004-5