Consistency and reliability of automated language measures across expressive language samples in autism.
Automated language scores hold steady when the same task is repeated at retest, so you can track real language gains in highly verbal clients with ASD, provided you compare scores from the same sampling context.
01 Research in Context
What this study did
MacFarlane et al. (2023) asked whether computer-generated counts of expressive language stay the same when you test the same person twice. They collected three language samples from 64 highly verbal children and young adults with autism: an interview task from the ADOS-2, a conversation task, and a narration task, gathered at baseline and again four weeks later.
Natural Language Processing software scored each sample on six Automated Language Measures, including mean length of utterance in morphemes (MLU), number of distinct word roots (a word-diversity index), speaking rate, and rates of unintelligible speech, fillers ("um"), and repetitions. The team then checked whether the baseline and four-week scores matched.
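To make these measures concrete, here is a minimal sketch of how two of them could be computed from a transcript. This is not the authors' NLP pipeline: it counts MLU in words rather than morphemes, treats distinct lowercased tokens as word roots, and the `sample` utterances are invented for illustration.

```python
import re

def mlu_in_words(utterances):
    """Mean length of utterance in words (the study used morphemes,
    which needs a morphological analyzer; words are a simpler proxy)."""
    lengths = [len(re.findall(r"[a-zA-Z']+", u)) for u in utterances]
    lengths = [n for n in lengths if n > 0]
    return sum(lengths) / len(lengths) if lengths else 0.0

def distinct_word_roots(utterances):
    """Number of distinct word types; true 'roots' would require
    lemmatization (e.g., with spaCy), so this is an approximation."""
    words = set()
    for u in utterances:
        words.update(w.lower() for w in re.findall(r"[a-zA-Z']+", u))
    return len(words)

# Hypothetical five-utterance sample
sample = [
    "the dog ran to the park",
    "he saw a big red ball",
    "um he kicked it",
    "then the dog chased it",
    "they played all day",
]
print(mlu_in_words(sample))         # -> 5.0
print(distinct_word_roots(sample))  # -> 20 distinct word types
```

A production tool would add morphological segmentation and lemmatization, which is what lets the published measures count morphemes and true word roots rather than raw tokens.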
What they found
Within the same task, the four-week scores closely matched baseline: five of the six measures showed good test-retest reliability (median concordance of roughly 0.73 to 0.88), with repetition proportion as the one exception. Across different tasks, the score distributions shifted significantly, which means scores should only be compared when the sampling context is held constant.
In plain words, the automated counts are stable enough to track growth in clients who already speak in sentences, as long as you retest with the same task.
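The reliability statistic behind that finding is Lin's Concordance Correlation Coefficient (CCC), which only reaches 1 when retest scores land exactly on the baseline scores, not merely when the two correlate. A minimal sketch, with invented scores for illustration:

```python
import numpy as np

def lins_ccc(x, y):
    """Lin's CCC = 2*cov(x, y) / (var(x) + var(y) + (mean_x - mean_y)**2).
    Equals 1 only when every (baseline, retest) pair lies on the y = x line."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    cov = np.mean((x - x.mean()) * (y - y.mean()))  # population covariance
    return 2 * cov / (x.var() + y.var() + (x.mean() - y.mean()) ** 2)

# Hypothetical MLU scores for six clients at baseline and four weeks later
baseline  = [3.1, 4.5, 5.2, 6.0, 4.8, 5.5]
follow_up = [3.3, 4.4, 5.0, 6.2, 4.9, 5.6]
print(round(lins_ccc(baseline, follow_up), 2))  # ~0.99, close agreement
```

Unlike a plain Pearson correlation, the mean-difference term in the denominator penalizes a tool that drifts upward or downward between sessions even if it ranks clients the same way.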
How this fits with other research
Lopata et al. (2020) found similar stability over nine months with a teacher-rated social-skills checklist, easing the concern that autism traits alone make scores bounce around.
Smit et al. (2019) likewise reported solid short-term test-retest reliability on the DANVA-2 emotion-recognition test. Together, the three papers build a pattern: brief, structured tasks can yield stable data in HFASD.
M-Patterson et al. (2012) ran the same kind of test-retest analysis on a Cantonese preschool language test and also found strong reliability, so the stability is not just an English-language, school-age quirk.
Why it matters
In practice, you can collect a brief language sample, run it through transcript-analysis software, and trust that the score will hold next month, as long as you repeat the same task. If the number jumps, you can be reasonably confident the client actually grew rather than the tool wobbling. Use it to set MLU or word-diversity goals and show parents clear, low-cost progress lines without extra testing days.
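As a usage sketch of the "same task, same measure" rule, here is a hypothetical monthly progress check; the session transcripts are invented, and MLU is again counted in words as a simplified proxy:

```python
import re

def mlu_in_words(utterances):
    """Mean words per utterance (same simplified proxy as the sketch above)."""
    lengths = [len(re.findall(r"[a-zA-Z']+", u)) for u in utterances]
    lengths = [n for n in lengths if n > 0]
    return sum(lengths) / len(lengths) if lengths else 0.0

# Hypothetical monthly narration samples for one client (invented data)
sessions = {
    "2024-01": ["the boy fell", "he cried", "mom came"],
    "2024-02": ["the boy fell off the bike", "he cried loudly", "his mom came to help"],
}
for month, utterances in sessions.items():
    print(f"{month}: MLU = {mlu_in_words(utterances):.2f}")
# Prints 2.33 then 4.67; compare scores only within the same task context,
# since the study found distributions shift when the task changes.
```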
Try it: record a short play or story sample, transcribe it, run it through a transcript analyzer such as CLAN (free) or SALT, save the MLU, and compare it against the same task next month.
02 Original abstract
Autism Spectrum Disorder (ASD) is a neurodevelopmental disorder with substantial clinical heterogeneity, especially in language and communication ability. There is a need for validated language outcome measures that show sensitivity to true change for this population. We used Natural Language Processing to analyze expressive language transcripts of 64 highly-verbal children and young adults (age: 6-23 years, mean 12.8 years; 78.1% male) with ASD to examine the validity across language sampling context and test-retest reliability of six previously validated Automated Language Measures (ALMs), including Mean Length of Utterance in Morphemes, Number of Distinct Word Roots, C-units per minute, unintelligible proportion, um rate, and repetition proportion. Three expressive language samples were collected at baseline and again 4 weeks later. These samples comprised interview tasks from the Autism Diagnostic Observation Schedule (ADOS-2) Modules 3 and 4, a conversation task, and a narration task. The influence of language sampling context on each ALM was estimated using either generalized linear mixed-effects models or generalized linear models, adjusted for age, sex, and IQ. The 4 weeks test-retest reliability was evaluated using Lin's Concordance Correlation Coefficient (CCC). The three different sampling contexts were associated with significantly (P < 0.001) different distributions for each ALM. With one exception (repetition proportion), ALMs also showed good test-retest reliability (median CCC: 0.73-0.88) when measured within the same context. Taken in conjunction with our previous work establishing their construct validity, this study demonstrates further critical psychometric properties of ALMs and their promising potential as language outcome measures for ASD research.
Autism Research (official journal of the International Society for Autism Research), 2023