An evaluation of the Gilliam Autism Rating Scale.

Lecavalier (2005) · Journal of Autism and Developmental Disorders

★ The Verdict

The first GARS misses too many autistic kids and its subscales do not hold up — use newer tools.

✓ Read this if you are a BCBA who screens or re-evaluates school-age clients in clinic or school settings.

✗ Skip if you already use GARS-3 plus ADOS-2 or ADI-R.

01 Research in Context

What this study did

Lecavalier (2005) checked if the Gilliam Autism Rating Scale (GARS) works as promised.

The team ran factor analysis and looked at sensitivity and inter-rater reliability.

They analyzed 360 parent and teacher ratings from an independent, heterogeneous sample of children and adolescents already diagnosed with autism spectrum disorders.

What they found

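Average Autism Quotient (AQ) scores were significantly lower than the manual reports, so the current cutoff misses many kids who really have autism: poor sensitivity.

Interrater reliability was much lower than the developer originally reported, meaning two raters often scored the same child differently.

The subscales in the manual did not show up in the numbers. Factor analysis produced a three-factor solution explaining 38% of the variance, and almost half of all items loaded on a single Repetitive and Stereotyped Behavior factor.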
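To make the two headline numbers concrete, here is a minimal Python sketch of how sensitivity at a cutoff and rater agreement could be computed. Everything in it is illustrative: the scores are invented, the cutoff of 90 is an assumption (check the test manual for the actual criterion), and Pearson correlation is only one common agreement statistic. The summary does not say which statistic the study used.

```python
from math import sqrt


def sensitivity(scores, cutoff):
    """Share of already-diagnosed children whose score meets or exceeds the cutoff."""
    if not scores:
        raise ValueError("scores must not be empty")
    flagged = sum(1 for s in scores if s >= cutoff)
    return flagged / len(scores)


def pearson_r(x, y):
    """Pearson correlation between two raters' scores for the same children."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one rater's scores never vary")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for eight children who all have a confirmed diagnosis.
# These numbers are made up for illustration and are NOT from the study.
parent_scores = [95, 82, 101, 88, 76, 93, 85, 110]
teacher_scores = [84, 90, 92, 79, 88, 101, 73, 96]

ASSUMED_CUTOFF = 90  # assumption for this sketch, not taken from the article

print("Parent-rated sensitivity:", sensitivity(parent_scores, ASSUMED_CUTOFF))
print("Parent-teacher correlation:", round(pearson_r(parent_scores, teacher_scores), 2))
```

Because every child in the sample already has a diagnosis, each score below the cutoff counts as a miss. A low correlation between the two lists would mean that a child's result depends heavily on who fills out the form.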

How this fits with other research

Pandolfi et al. (2010) repeated the factor work on the newer GARS-2 and got the same bad fit.

Yang et al. (2026) looks like a contradiction: their Chinese GARS-3 hit 86-89% accuracy.

The difference is version and language: the original English GARS performed poorly, while the Chinese third edition performed well.

Sutton et al. (2022) adds that boys and girls score differently on GARS-3 items, so even the better version needs sex-aware cut-offs.

Why it matters

If your clinic still uses the original GARS, pause. It under-identifies autism and gives shaky scores.

Switch to GARS-3 or pair it with ADOS-2. Always cross-check scores with developmental history and direct observation.

→ Action — try this Monday

Pull any old GARS protocols and re-screen those kids with a validated tool this week.

02 At a glance

Intervention

not applicable

Design

other

Sample size

360 parent and teacher ratings

Population

autism spectrum disorder

Finding

negative

03 Original abstract

The Gilliam Autism Rating Scale was developed to identify individuals with autism in research and clinical settings. It has benefited from wide use and acceptance but has received little empirical attention. The purpose of this study was to evaluate the construct and diagnostic validity, interrater reliability, and effects of participant characteristics of the GARS in a large and heterogeneous sample of children and adolescents with autism spectrum disorders. 360 parent and teacher ratings were submitted to factor analysis. A three-factor solution explaining 38% of the variance was obtained. Almost half of all items loaded on a Repetitive and Stereotyped Behavior factor. The Developmental Disturbance subscale did not contribute to the Autism Quotient (AQ) and was poorly related to other subscales. Internal consistency for the three behavioral subscales was good but low for the Developmental Disturbance subscale. The average AQ was significantly lower than what was reported in the test manual, suggesting low sensitivity with the current cutoff criteria. Interrater reliability was also much lower than originally reported by the instrument's developer. No significant age or gender effects were found. Level of impairment, as measured by adaptive behavior, was negatively related to total and subscale scores. The implications of these findings were discussed, as was the use of diagnostic instruments in the field in general.

Journal of Autism and Developmental Disorders, 2005 · doi:10.1007/s10803-005-0025-6