Reliability of Theory of Mind Tasks in Schizophrenia, ASD, and Nonclinical Populations: A Systematic Review and Reliability Generalization Meta-analysis

Tsui et al. (2024) · Neuropsychology Review

★ The Verdict

About half of common ToM tasks showed unsatisfactory internal consistency in nonclinical samples, and only two were reliable across populations. Verify the reliability coefficient before you baseline social-cognition goals.

✓ Read this if you are a BCBA who assesses or writes social-skills goals for autistic clients in clinic or school settings.

✗ Skip if you are a practitioner who only runs skill-acquisition programs with no social-cognition component.

01 Research in Context

What this study did

Tsui and colleagues pulled every paper that used a theory-of-mind task in people with autism, schizophrenia, or no diagnosis.

They ran a reliability-generalization meta-analysis. That means they asked: does each task measure consistently, both across its own items (internal consistency) and across repeated administrations (test-retest)?

Only papers that reported test-retest or internal-consistency numbers were kept. The final pool spanned 35 years and included 90 studies, 27 distinct tasks, and 19,060 participants.
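Internal consistency is often summarized with Cronbach's alpha, although this summary does not say which coefficient each pooled study reported. The sketch below is only an illustration of how alpha is computed from item-level scores. The function name and the sample scores are hypothetical and do not come from the review.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Return Cronbach's alpha for a score table.

    scores: one row per participant, each row a list of item scores.
    Illustrative helper only; not taken from Tsui et al. (2024).
    """
    if len(scores) < 2:
        raise ValueError("need at least two participants")
    k = len(scores[0])
    if k < 2:
        raise ValueError("need at least two items")
    # Sample variance of each item (column) across participants.
    item_variance_sum = sum(variance(column) for column in zip(*scores))
    # Sample variance of each participant's total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary")
    return k / (k - 1) * (1 - item_variance_sum / total_variance)


if __name__ == "__main__":
    # Made-up data: four participants, three pass/fail ToM items.
    made_up_scores = [[1, 1, 0], [0, 0, 0], [1, 1, 1], [1, 0, 1]]
    print(round(cronbach_alpha(made_up_scores), 2))
```

Alpha rises when items move together across participants, which is why a task built from loosely related items can fall below the 0.70 bar even when each item looks sensible on its own.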

What they found

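About half of the ToM tasks fell short on internal consistency in nonclinical samples, including the widely used Reading the Mind in the Eyes Test and the Hinting Task.

Reliability also shifted by population: every task showed satisfactory internal consistency in the autism and schizophrenia samples, so a task that looks solid in one group can look shaky in another.

Only the Faux Pas Test and the Movie for the Assessment of Social Cognition showed satisfactory reliability across populations, and that conclusion rests on a small number of studies. The Hinting Task had poor test-retest reliability.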

How this fits with other research

Simó-Pinatella et al. (2013) already showed that poor ToM links to missed future plans in autistic kids. If the tool you use to measure ToM is noisy, you may misread that link.

Ciaramelli et al. (2018) found that autistic teens give sparse personal stories and tied the gap to ToM. If the ToM task added error variance, the size of that link could be misestimated.

Embregts (2000) warned that the Child Behavior Checklist loses reliability in youth with ID. Tsui et al. (2024) now show that the reliability of social-cognitive tasks also depends on the population tested, and that only ten studies examined it in autistic adults.

Kang et al. (2013) showed that some preference assessments give steadier reinforcer picks. Tsui’s team extends the same psychometric vigilance to the social domain.

Why it matters

Before you write a goal that says “client will infer others’ thoughts,” check the test manual for test-retest numbers. If the coefficient is below 0.70, pick a different task or collect your own baseline twice. Reliable data keeps you from chasing noise and wasting therapy hours.
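If you do collect two baseline probes, one rough way to gauge stability is to correlate the first and second scores across several clients. The sketch below uses a Pearson correlation and the 0.70 bar mentioned above. Published studies often report an intraclass correlation instead, so treat this as a quick screen under stated assumptions. All names and scores are hypothetical.

```python
from math import sqrt

RELIABILITY_BAR = 0.70  # threshold used in this summary


def pearson_r(first, second):
    """Pearson correlation between two equally long lists of scores."""
    n = len(first)
    if n != len(second) or n < 3:
        raise ValueError("need two lists of equal length, at least 3 scores each")
    mean_first = sum(first) / n
    mean_second = sum(second) / n
    sxy = sum((x - mean_first) * (y - mean_second) for x, y in zip(first, second))
    sxx = sum((x - mean_first) ** 2 for x in first)
    syy = sum((y - mean_second) ** 2 for y in second)
    if sxx == 0 or syy == 0:
        raise ValueError("scores in one probe do not vary")
    return sxy / sqrt(sxx * syy)


def stable_enough(first, second, bar=RELIABILITY_BAR):
    """True when the probe-to-probe correlation meets the bar."""
    return pearson_r(first, second) >= bar


if __name__ == "__main__":
    # Made-up ToM task scores for five clients, probed two weeks apart.
    probe_1 = [12, 18, 9, 15, 20]
    probe_2 = [13, 17, 11, 14, 19]
    print(round(pearson_r(probe_1, probe_2), 2), stable_enough(probe_1, probe_2))
```

With only a handful of clients the estimate is imprecise, so a value near 0.70 is a reason to probe again, not a verdict on the task.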

→ Action — try this Monday

Flip to the appendix of your ToM test manual and circle the test-retest value; if it is under 0.70, switch to a tool Tsui et al. found reliable across populations (the Faux Pas Test or the Movie for the Assessment of Social Cognition) or run two baseline probes before you start.

02 At a glance

Intervention

not applicable

Design

systematic review and reliability generalization meta-analysis

Sample size

19,060

Population

schizophrenia, autism spectrum disorder, nonclinical

Finding

not reported

03 Original abstract

Though theory of mind (ToM) is an important area of study for different disciplines, however, the psychometric evaluations of ToM tasks have yielded inconsistent results across studies and populations, raising the concerns about the accuracy, consistency, and generalizability of these tasks. This systematic review and meta-analysis examined the psychometric reliability of 27 distinct ToM tasks across 90 studies involving 2771 schizophrenia (SZ), 690 autism spectrum disorder (ASD), and 15,599 nonclinical populations (NC). Findings revealed that while all ToM tasks exhibited satisfactory internal consistency in ASD and SZ, about half of them were not satisfactory in NC, including the commonly used Reading the Mind in the Eye Test and Hinting Task. Other than that, Reading the Mind in the Eye Test showed acceptable reliability across populations, whereas Hinting Task had poor test–retest reliability. Notably, only Faux Pas Test and Movie for the Assessment of Social Cognition had satisfactory reliability across populations albeit limited numbers of studies. However, only ten studies examined the psychometric properties of ToM tasks in ASD adults, warranting additional evaluations. The study offered practical implications for selecting ToM tasks in research and clinical settings, and underscored the importance of having a robust psychometric reliability in ToM tasks across populations.

Neuropsychology Review, 2024 · doi:10.1007/s11065-024-09652-4