Assessment & Research

Reliability of the ADI-R for the single case-part II: clinical versus statistical significance.

Cicchetti et al. (2014) · Journal of autism and developmental disorders

★ The Verdict

For the ADI-R, every item with clinically acceptable agreement was also statistically significant, but statistical significance alone did not guarantee clinically meaningful agreement.

✓ Read this if you are a BCBA who uses the ADI-R or trains others to use it.

✗ Skip if you only do brief rating-scale screenings.

01 Research in Context

What this study did

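The authors tested how well clinicians agree with one another on individual ADI-R interview items. Multiple clinicians rated the same single case, a 3-year-old girl suspected of having an autism spectrum disorder. The authors then applied a modified Z test to each item, comparing its level of inter-examiner agreement against a lower limit of 70%.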

The goal was to see which items meet both clinical rules of thumb and statistical rules of thumb.
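
To make the statistical side concrete, here is a minimal sketch of a standard one-sample Z test for a proportion, used to compare an observed agreement level against the 70% lower limit. It is not the authors' exact procedure: the paper describes a modification of the Z test, and the details of that modification are not given here. The function names, the meaning of `n` (the number of ratings behind the agreement figure), and the example values are all assumptions for illustration.

```python
import math


def z_against_lower_limit(p_obs, n, p0=0.70):
    """Z statistic for an observed agreement proportion vs. a lower limit p0.

    Assumption: plain one-sample z test for a proportion, with the standard
    error computed under the lower-limit value p0. The paper's modified
    test may differ.
    """
    if not 0.0 < p0 < 1.0:
        raise ValueError("p0 must be strictly between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    standard_error = math.sqrt(p0 * (1.0 - p0) / n)
    return (p_obs - p0) / standard_error


def one_sided_p(z):
    """Upper-tail probability of the standard normal distribution."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Hypothetical item: 90% agreement based on 20 ratings (illustrative only).
z = z_against_lower_limit(p_obs=0.90, n=20)
p = one_sided_p(z)
print(f"z = {z:.2f}, one-sided p = {p:.3f}")
```

With these made-up numbers the item clears the 70% floor at the conventional 0.05 level. Whether that level of agreement is also clinically meaningful is a separate judgment made against the clinical criteria.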

What they found

Every item that looked good to clinicians also passed the math test. But some items that passed the math test still looked weak to clinicians.

In short, statistical significance does not guarantee clinical usefulness.

How this fits with other research

Older papers like Rider (1977) and Yelton (1979) already warned that percent agreement and correlation numbers can hide weak spots. Cicchetti et al. (2014) now give a concrete example with the ADI-R.

Oliver et al. (2002) and Gustafsson et al. (2005) saw the same pattern in other tools: item-level agreement was only moderate even when total scores looked fine. The new study echoes their advice to watch single items, not just totals.

Cacciani et al. (2013) shortened questionnaires and still found IQ swayed scores. Cicchetti et al. (2014) add that even statistically significant agreement can fall short of clinically meaningful agreement, so check both the numbers and your clinical sense.

Why it matters

In this study, every ADI-R item with clinically acceptable agreement was also statistically significant, but not the other way around. If the numbers say “significant” yet the item feels shaky, trust your clinical eye and gather more data. This helps keep your autism diagnosis solid and may reduce the risk of diagnostic errors for families.

→ Action — try this Monday

Pull your last ADI-R protocol, flag any item that passed the math but felt weak, and re-interview on those points.

02 At a glance

Intervention

not applicable

Design

case study

Sample size

1 (single case, rated by multiple clinicians)

Population

autism spectrum disorder

Finding

not reported

03 Original abstract

In an earlier investigation, the authors assessed the reliability of the ADI-R when multiple clinicians evaluated a single case, here a female 3 year old toddler suspected of having an autism spectrum disorder (Cicchetti et al. in J Autism Dev Disord 38:764-770, 2008). Applying the clinical criteria of Cicchetti and Sparrow (Am J Men Def 86:127-137, 1981); and those of Cicchetti et al. (Child Neuropsychol 126-137, 1995): 74 % of the ADI-R items showed 100 % agreement; 6 % showed excellent agreement; 7 % showed good agreement; 3 % manifested average agreement; and the remaining 10 % evidenced poor agreement. In this follow-up investigation, the authors described and applied a novel method for determining levels of statistical significance of the reliability coefficients obtained in the earlier investigation. It is based upon a modification of the Z test for comparing a given level of inter-examiner reliability with a lower limit value of 70 % (Dixon and Massey in Introduction to statistical analysis. McGraw-Hill, New York, 1957). Results indicated that every item producing a clinically acceptable level of inter-examiner reliability was also statistically significant. However, the reverse was not true, since a number of the items with statistically significant reliability levels did not reach levels of agreement that were clinically meaningful. This indicated that clinical significance was an accurate marker of statistical significance. The generalization of these findings to other areas of diagnostic interest and importance is also examined.

Journal of autism and developmental disorders, 2014 · doi:10.1007/s10803-014-2177-8