Clinical correlates of errors in machine-learning diagnostic model of autism spectrum disorder: Impact of sample cohorts.
Autism-detecting AI trained on one kind of kid can mislabel a third of true cases when faced with a different kind; check the training cohort before you click 'diagnose'.
01Research in Context
What this study did
The team built machine-learning models to spot autism. One model was trained on kids seen at specialty clinics. Another was trained on kids recruited from the community.
Then they tested both models on new kids from each setting. They counted how often the computer was wrong and looked at what the mislabeled kids had in common.
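That train-here, test-there loop can be sketched in a few lines of Python. Everything below is illustrative: the scores, cohort sizes, and the simple threshold "model" are invented stand-ins, not the study's actual classifier or data. The point is only to show how a decision rule learned on one cohort can miss cases in a second cohort whose autistic members score lower.

```python
import random
from statistics import mean

random.seed(0)

def make_cohort(n_aut, n_non, aut_mu, non_mu, sd=8.0):
    """Simulate SRS-like total scores (hypothetical means and spread)."""
    scores = [random.gauss(aut_mu, sd) for _ in range(n_aut)] + \
             [random.gauss(non_mu, sd) for _ in range(n_non)]
    labels = [1] * n_aut + [0] * n_non   # 1 = autistic, 0 = non-autistic
    return scores, labels

def fit_threshold(scores, labels):
    """'Train': midpoint between class means (a toy stand-in for a real model)."""
    aut = [s for s, y in zip(scores, labels) if y == 1]
    non = [s for s, y in zip(scores, labels) if y == 0]
    return (mean(aut) + mean(non)) / 2

def evaluate(scores, labels, thr):
    """Count the four error cells and return (sensitivity, specificity)."""
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= thr)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < thr)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < thr)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= thr)
    return tp / (tp + fn), tn / (tn + fp)

# Cohort A: clearly separated profiles; cohort B: milder autistic profiles.
a_scores, a_labels = make_cohort(1200, 1180, aut_mu=80, non_mu=50)
b_scores, b_labels = make_cohort(35, 3300, aut_mu=62, non_mu=48)

thr = fit_threshold(a_scores, a_labels)          # trained on cohort A only
sens_within, spec_within = evaluate(a_scores, a_labels, thr)
sens_cross, spec_cross = evaluate(b_scores, b_labels, thr)

print(f"within-cohort: sens={sens_within:.2f} spec={spec_within:.2f}")
print(f"cross-cohort:  sens={sens_cross:.2f} spec={spec_cross:.2f}")
```

With these made-up numbers the within-cohort figures look excellent while cross-cohort sensitivity collapses, which is the same failure shape the study reports: the model did not get worse, the population changed under it.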
What they found
Tested within the same kind of group, the models were highly accurate, with sensitivity and specificity around 0.9 or better. But across cohorts, performance dropped: applied to the other cohort, sensitivity fell to 0.65, meaning about a third of true autism cases were wrongly called 'not autism'.
Most errors happened when the child also had ADHD symptoms or an uneven profile—high IQ but low daily living scores.
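The error-audit step, asking what the mislabeled kids have in common, can be sketched the same way. The ADHD prevalence and miss rates below are invented purely for illustration; the sketch just shows how comparing a flag rate between missed and correctly identified cases surfaces a correlate of error.

```python
import random

random.seed(1)

# Hypothetical records: (true_label, predicted_label, has_adhd_flag).
# Assumed for illustration: autistic kids with ADHD traits are missed more often.
records = []
for _ in range(500):
    adhd = random.random() < 0.3          # assumed 30% co-occurring ADHD
    miss_p = 0.35 if adhd else 0.10       # assumed miss rates, not study values
    pred = 0 if random.random() < miss_p else 1
    records.append((1, pred, adhd))       # all records are true autism cases

def adhd_rate(recs):
    """Share of records carrying the ADHD flag."""
    return sum(r[2] for r in recs) / len(recs) if recs else 0.0

missed = [r for r in records if r[1] == 0]   # false negatives
caught = [r for r in records if r[1] == 1]   # true positives

print(f"ADHD rate among missed cases: {adhd_rate(missed):.2f}")
print(f"ADHD rate among caught cases: {adhd_rate(caught):.2f}")
```

If the flag rate among the missed cases clearly exceeds the rate among the caught ones, the comorbidity is a candidate correlate of misclassification, which is the kind of comparison the study ran with CBCL symptoms, IQ, and psychiatric diagnoses.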
How this fits with other research
Schaaf et al. (2015) warned that switching to DSM-5 excludes a large share of prior PDD-NOS cases. Yen-Chin et al. now show those same 'borderline' profiles trip up ML models, so label noise from criteria shifts becomes algorithm error.
Craddock et al. (1994) found old tools like ADI and ABC each missed some high-functioning kids. Thirty years later the new study echoes the message: cohort makeup decides who gets missed, whether the tool is an interview or an algorithm.
Schiltz et al. (2017) showed anxiety and ADHD traits stay stable in autism. The ML paper links those very ADHD flags to misclassification, showing that comorbid symptoms still skew measurement no matter the method.
Why it matters
Before you trust an autism-screening app, ask where its training data came from. If your client looks different (higher IQ, co-occurring ADHD, uneven adaptive scores), the model may quietly fail. Cross-check with clinical judgment and use DSM-5-TR level guidelines to keep your caseload consistent.
Open the 'about' page of any autism-screening software you use; if the training sample is only 'research volunteers,' plan a second ADOS or interview before signing the report.
02At a glance
03Original abstract
Machine-learning models can assist in diagnosing autism but have biases. We examined the correlates of misclassifications and how training data affect model generalizability. Social Responsiveness Scale data were collected from two cohorts in Taiwan: the clinical cohort comprised 1203 autistic participants and 1182 non-autistic comparisons, and the community cohort consisted of 35 autistic participants and 3297 non-autistic comparisons. Classification models were trained, and misclassified cases were investigated for associations with sex, age, intelligence quotient (IQ), symptoms from the Child Behavior Checklist (CBCL), and co-occurring psychiatric diagnoses. Models showed high within-cohort accuracy (clinical: sensitivity 0.91-0.95, specificity 0.93-0.94; community: sensitivity 0.91-1.00, specificity 0.89-0.96), but generalizability across cohorts was limited. When the community-trained model was applied to the clinical cohort, performance declined (sensitivity 0.65, specificity 0.95). In both models, non-autistic individuals misclassified as autistic showed elevated behavioral symptoms and attention-deficit hyperactivity disorder (ADHD) prevalence. Conversely, autistic individuals who were misclassified tended to show fewer behavioral symptoms and, in the community model, higher IQ and aggressive behavior but fewer social and attention problems. Error patterns of machine-learning models and the impact of training data warrant careful consideration in future research.

Lay Abstract

Machine-learning is a type of computer model that can help identify patterns in data and make predictions. In autism research, these models may support earlier or more accurate identification of autistic individuals. But to be useful, they need to make reliable predictions across different groups of people. In this study, we explored when and why these models might make mistakes, and how the kind of data used to train them affects their accuracy.
Training models means using information to teach the computer model how to tell the difference between autistic and non-autistic individuals. We used information from the Social Responsiveness Scale (SRS), a questionnaire that measures autistic features. We tested these models on two different groups: one from clinical settings and one from the general community. The models worked well when tested within the same type of group they were trained on. However, a model trained on the community group did not perform as accurately when tested on the clinical group. Sometimes, the model got it wrong. For example, in the clinical group, some autistic individuals were mistakenly identified as non-autistic. These individuals tended to have fewer emotional or behavioral difficulties. In the community group, autistic individuals who were mistakenly identified as non-autistic had higher IQs and showed more aggressive behaviors but fewer attention or social problems. Conversely, some non-autistic people were incorrectly identified as autistic. These people had more emotional or behavioral challenges and were more likely to have attention-deficit hyperactivity disorder (ADHD). These findings highlight that machine-learning models are sensitive to the type of data they are trained on. To build fair and accurate models for predicting autism, it is essential to consider where the training data come from and whether they represent the full diversity of individuals. Understanding these patterns of error can help improve future tools used in both research and clinical care.
Autism : the international journal of research and practice, 2025 · doi:10.1177/13623613251360271