An evaluation of the Gilliam Autism Rating Scale.
The first GARS misses too many autistic kids and its subscales do not hold up — use newer tools.
Research in Context
What this study did
Lecavalier (2005) checked if the Gilliam Autism Rating Scale (GARS) works as promised.
The team ran factor analysis and looked at sensitivity and inter-rater reliability.
They used an independent sample of children already diagnosed with autism.
What they found
The test missed many kids who really had autism — poor sensitivity.
Two raters often scored the same child differently — low reliability.
The subscales in the manual did not show up in the numbers.
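To make "poor sensitivity" concrete: sensitivity is the share of children with a confirmed diagnosis whose score reaches the screening cutoff. The sketch below is illustrative only. The scores, the cutoff value, and the function name are hypothetical and are not taken from Lecavalier (2005) or the GARS manual.

```python
def sensitivity(scores, cutoff):
    """Share of confirmed cases whose score meets or exceeds the cutoff."""
    if not scores:
        raise ValueError("scores must not be empty")
    flagged = sum(1 for s in scores if s >= cutoff)
    return flagged / len(scores)


# Hypothetical Autism Quotient scores for ten children who all have
# a confirmed autism diagnosis (made-up numbers for illustration).
hypothetical_aq = [72, 85, 91, 78, 95, 88, 102, 69, 93, 81]
hypothetical_cutoff = 90  # illustrative value only, not from the study

print(sensitivity(hypothetical_aq, hypothetical_cutoff))
```

With these made-up numbers, only four of the ten diagnosed children reach the cutoff, so the tool would miss the other six. A lower average score in a diagnosed sample pushes sensitivity down in the same way.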
How this fits with other research
Pandolfi et al. (2010) repeated the factor work on the newer GARS-2 and got the same bad fit.
Yang et al. (2026) looks like a contradiction: their Chinese GARS-3 reached 86-89% accuracy.
The difference is version and language: the original English GARS performed poorly, while the Chinese third edition performed well.
Sutton et al. (2022) adds that boys and girls score differently on GARS-3 items, so even the better version needs sex-aware cut-offs.
Why it matters
If your clinic still uses the original GARS, pause. It under-identifies autism and gives shaky scores.
Switch to GARS-3 or pair it with ADOS-2. Always cross-check scores with developmental history and direct observation.
Pull any old GARS protocols and re-screen those kids with a validated tool this week.
Original abstract
The Gilliam Autism Rating Scale was developed to identify individuals with autism in research and clinical settings. It has benefited from wide use and acceptance but has received little empirical attention. The purpose of this study was to evaluate the construct and diagnostic validity, interrater reliability, and effects of participant characteristics of the GARS in a large and heterogeneous sample of children and adolescents with autism spectrum disorders. 360 parent and teacher ratings were submitted to factor analysis. A three-factor solution explaining 38% of the variance was obtained. Almost half of all items loaded on a Repetitive and Stereotyped Behavior factor. The Developmental Disturbance subscale did not contribute to the Autism Quotient (AQ) and was poorly related to other subscales. Internal consistency for the three behavioral subscales was good but low for the Developmental Disturbance subscale. The average AQ was significantly lower than what was reported in the test manual, suggesting low sensitivity with the current cutoff criteria. Interrater reliability was also much lower than originally reported by the instrument's developer. No significant age or gender effects were found. Level of impairment, as measured by adaptive behavior, was negatively related to total and subscale scores. The implications of these findings were discussed, as was the use of diagnostic instruments in the field in general.
Journal of Autism and Developmental Disorders, 2005 · doi:10.1007/s10803-005-0025-6