Diagnostic accuracy of AI-based models for autism spectrum disorder: A systematic review and meta-analysis with a focus on Arab populations.

Aldakhil et al. (2025) · Research in Developmental Disabilities

★ The Verdict

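AI screeners detect ASD with high accuracy (pooled sensitivity 91.8%, specificity 90.7%), but in Arab-only cohorts specificity drops to 87.6%. Add culturally tailored checks or risk more false positives.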

✓ Read this if you are a BCBA who assesses culturally diverse toddlers and wants faster intakes.

✗ Skip if you are a clinician who already uses the gold-standard ADOS for every case and has no wait-list pressure.

01 Research in Context

What this study did

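Aldakhil et al. (2025) searched PubMed, Scopus, and Web of Science for studies published between January 2019 and September 2025 that tested AI for classifying ASD versus non-ASD against a clinician-confirmed diagnosis. Fifteen studies entered the review, and ten were pooled in the meta-analysis (59 model evaluations; 26,569 instances). They compared three kinds of AI: conventional machine learning, deep learning, and hybrid models (deep feature extractors paired with classical classifiers). They also checked whether the tools performed the same in Arab-only cohorts.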

What they found

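Hybrid AI models won, with sensitivity of 95.2% and specificity of 96.0%. Conventional machine learning came second (91.6% and 90.3%). Deep learning alone ranked third (87.3% and 86.0%). The catch: in Arab-only cohorts, sensitivity rose to 94.2% but specificity fell to 87.6%, below the pooled 90.7%. That means fewer missed cases but more false positives, which the authors link to the need for cultural and linguistic tailoring of tools and datasets.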
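To see what the specificity gap means in practice, here is a minimal Python sketch. The sensitivity and specificity values come from the abstract. The 10% prevalence is a hypothetical figure chosen only for illustration, and the function names are ours, not the authors'.

```python
def false_positives_per_1000(specificity: float) -> float:
    """Expected false alarms among 1,000 children who do not have ASD."""
    return (1.0 - specificity) * 1000


def positive_predictive_value(sensitivity: float, specificity: float,
                              prevalence: float) -> float:
    """Share of positive screens that are true cases (Bayes' rule)."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Values reported in the abstract.
COHORTS = {
    "pooled": {"sensitivity": 0.918, "specificity": 0.907},
    "arab_only": {"sensitivity": 0.942, "specificity": 0.876},
}

# ASSUMPTION: hypothetical prevalence in a referred sample, not from the study.
ASSUMED_PREVALENCE = 0.10

for label, m in COHORTS.items():
    fp = false_positives_per_1000(m["specificity"])
    ppv = positive_predictive_value(
        m["sensitivity"], m["specificity"], ASSUMED_PREVALENCE
    )
    print(f"{label}: {fp:.0f} false positives per 1,000 non-ASD children, "
          f"PPV {ppv:.1%} at {ASSUMED_PREVALENCE:.0%} prevalence")
```

Per 1,000 children without ASD, that works out to roughly 93 false alarms at the pooled specificity versus 124 in Arab-only cohorts. The predictive value depends heavily on the prevalence you plug in, so treat that part as illustrative only.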

How this fits with other research

Zhu et al. (2026) give you a quick fix for the Arab gap. They show an Arabic eye-tracking index separates ASD from language delay with good accuracy. Use it while you wait for Arab-trained AI.

Andrews et al. (2024) go further. Their eight-marker blood panel hits very high accuracy in Qatari toddlers. It backs the review’s point that biology plus culture beats generic screens.

Two older studies look worse only because they tested weaker tools. Maddox et al. (2015) found adult self-report screens miss too many cases. Lotfizadeh et al. (2020) saw billing-code algorithms misclassify kids. The Aldakhil et al. review suggests AI can beat both, provided the model is tuned to the population it screens.

Why it matters

You can start using AI screener apps today, but run them side-by-side with an Arab-culture check. Add the Arabic eye-tracking test or ask parents about red flags the AI was not trained on. This two-step keeps your false positives low and your wait list short.
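The two-step idea can be sketched with the standard formulas for serial testing, where a child counts as screen-positive only if both steps are positive. This is a rough illustration under two loud assumptions: the two checks err independently of each other, and the second-step accuracy (90% sensitivity, 90% specificity) is a made-up placeholder, since none of the studies cited here report a figure we can reuse. Only the first-step values come from the abstract's Arab-only cohort.

```python
def serial_two_step(sens1: float, spec1: float,
                    sens2: float, spec2: float) -> tuple[float, float]:
    """Combined accuracy when a positive requires BOTH steps to be positive.

    ASSUMPTION: the two tests are conditionally independent given true status.
    """
    combined_sensitivity = sens1 * sens2
    combined_specificity = 1.0 - (1.0 - spec1) * (1.0 - spec2)
    return combined_sensitivity, combined_specificity


# Step 1: AI screener in Arab-only cohorts (from the abstract).
AI_SENS, AI_SPEC = 0.942, 0.876

# Step 2: culturally tailored check. HYPOTHETICAL values, not from any study.
CHECK_SENS, CHECK_SPEC = 0.90, 0.90

sens, spec = serial_two_step(AI_SENS, AI_SPEC, CHECK_SENS, CHECK_SPEC)
print(f"combined sensitivity: {sens:.1%}")
print(f"combined specificity: {spec:.1%}")
```

Under these assumptions, specificity climbs well above the AI screener's 87.6%, but sensitivity drops below its 94.2%. Requiring two positives trades false alarms for some missed cases, so keep a route to full assessment for children who fail only one step but still worry you.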

→ Action — try this Monday

Pick one hybrid AI app, then add the Arabic eye-tracking task or parent questions on social norms before you score.

02 At a glance

Intervention: not applicable
Design: systematic review and meta-analysis
Sample size: 26,569 instances (10 studies, 59 model evaluations)
Population: autism spectrum disorder
Finding: positive
Magnitude: large

03 Original abstract

BACKGROUND: Autism Spectrum Disorder (ASD) is a prevalent neurodevelopmental condition globally, including in Arab countries, where stigma, limited awareness, and scarce specialized services often delay diagnosis and care. Artificial intelligence (AI) offers scalable solutions for screening, early diagnosis, and intervention programmes.
AIMS: To evaluate the diagnostic accuracy of AI-based models for ASD with a specific focus on Arab cohorts, and to appraise methodological quality and potential cultural influences on model performance.
METHODS: We searched PubMed, Scopus, and Web of Science for studies published between January 2019 and September 2025. Eligible studies evaluated supervised AI systems, machine learning (ML), or deep learning (DL) that classify individuals as ASD versus non-ASD against a clinician-confirmed reference standard. Study quality was assessed using QUADAS-2. Diagnostic accuracy metrics (sensitivity, specificity, likelihood ratios, diagnostic odds ratio) were pooled using a bivariate random-effects model.
RESULTS: Fifteen studies were included in the systematic review; ten studies were eligible for meta-analysis (59 model evaluations; 26,569 instances), comparing AI models against clinician-confirmed autism diagnoses. Pooled sensitivity was 91.8 % (95 % CI [89.0, 94.2]) and specificity 90.7 % (95 % CI [87.6, 93.5]), yielding a diagnostic odds ratio (DOR) of 109.0 (95 % CI [59.5, 227.9]), positive likelihood ratio (LR⁺) of 9.8, and negative likelihood ratio (LR⁻) of 0.09. Subgroup analysis revealed hybrid models (deep feature extractors with classical classifiers) achieved the highest accuracy (sensitivity 95.2 %, specificity 96.0 %), followed by conventional ML (sensitivity 91.6 %, specificity 90.3 %), and DL alone (sensitivity 87.3 %, specificity 86.0 %). In Arab-only cohorts, models showed higher sensitivity (94.2 %) but lower specificity (87.6 %), suggesting stronger rule-out potential but more false positives.
CONCLUSION: To our knowledge, this is the first systematic meta-analysis of AI-based ASD diagnostics confirms high accuracy, with hybrid models excelling compared to both traditional ML and DL alone. In Arab cohorts, models showed higher sensitivity but lower specificity, highlighting the importance of cultural and linguistic tailoring of assessment tools, diagnostic protocols, and datasets, alongside regional challenges such as stigma and limited resources. These findings support AI as a valuable tool for early detection and screening.
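The likelihood ratios and diagnostic odds ratio in the abstract follow from sensitivity and specificity by standard formulas. Here is a rough Python check. It uses simple point-estimate arithmetic rather than the authors' bivariate random-effects model, so it should agree only approximately with the reported pooled values. The subgroup ratios it prints are our own derived figures, not numbers reported in the abstract.

```python
def likelihood_ratios(sensitivity: float,
                      specificity: float) -> tuple[float, float, float]:
    """Return (LR+, LR-, DOR) from sensitivity and specificity."""
    lr_pos = sensitivity / (1.0 - specificity)   # how much a positive raises the odds
    lr_neg = (1.0 - sensitivity) / specificity   # how much a negative lowers the odds
    dor = lr_pos / lr_neg                        # diagnostic odds ratio
    return lr_pos, lr_neg, dor


# Sensitivity and specificity as reported in the abstract.
REPORTED = {
    "pooled": (0.918, 0.907),
    "hybrid": (0.952, 0.960),
    "conventional ML": (0.916, 0.903),
    "DL alone": (0.873, 0.860),
    "Arab-only cohorts": (0.942, 0.876),
}

for label, (sens, spec) in REPORTED.items():
    lr_pos, lr_neg, dor = likelihood_ratios(sens, spec)
    print(f"{label}: LR+ {lr_pos:.1f}, LR- {lr_neg:.2f}, DOR {dor:.0f}")
```

For the pooled inputs this gives an LR⁺ of about 9.9, an LR⁻ of about 0.09, and a DOR of about 109, close to the reported 9.8, 0.09, and 109.0.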

Research in Developmental Disabilities, 2025 · doi:10.1016/j.ridd.2025.105166