Reliability of the ADI-R for the single case-part II: clinical versus statistical significance.
For the ADI-R, every item with clinically acceptable agreement was also statistically significant, but statistical significance alone did not guarantee clinically meaningful agreement.
01 Research in Context
What this study did
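The authors tested how well the ADI-R interview items agree when multiple clinicians rate the same child. They used a single case, a 3-year-old girl suspected of having an autism spectrum disorder, and applied a modified Z-test to each item, comparing its level of agreement with a lower limit of 70%.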
The goal was to see which items meet both clinical rules of thumb and statistical rules of thumb.
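To make the idea of a "math test" concrete, here is a minimal sketch of a standard one-sample Z-test for a proportion, checking whether an observed agreement rate sits above a 70% lower limit. This is not the authors' modified test, whose details are not given in this summary. The function name, the one-sided form, and the example counts are all assumptions made for illustration.

```python
import math


def z_test_vs_lower_limit(agreements, n_comparisons, p0=0.70):
    """One-sided, one-sample Z-test for a proportion (textbook form).

    Illustrative only: the study used a modification of the Z-test,
    and that modification is NOT reproduced here.

    agreements    -- hypothetical count of rater comparisons that agreed
    n_comparisons -- hypothetical total number of rater comparisons
    p0            -- lower limit to test against (70% in the abstract)
    """
    if n_comparisons <= 0:
        raise ValueError("n_comparisons must be positive")
    if not 0 <= agreements <= n_comparisons:
        raise ValueError("agreements must be between 0 and n_comparisons")

    p_hat = agreements / n_comparisons              # observed agreement
    se = math.sqrt(p0 * (1 - p0) / n_comparisons)   # standard error under H0
    z = (p_hat - p0) / se
    # Upper-tail probability of the standard normal distribution.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return p_hat, z, p_value


if __name__ == "__main__":
    # Made-up counts, not data from the study.
    p_hat, z, p_value = z_test_vs_lower_limit(agreements=90, n_comparisons=100)
    print(f"agreement={p_hat:.2f}  z={z:.2f}  one-sided p={p_value:.2g}")
```

A test like this only asks whether agreement is reliably above the lower limit. Whether that level of agreement is good enough for clinical use is a separate judgment.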
What they found
Every item that looked good to clinicians also passed the math test. But some items that passed the math test still looked weak to clinicians.
In short, statistical significance does not guarantee clinical usefulness.
How this fits with other research
Older papers like Rider (1977) and Yelton (1979) already warned that percent agreement and correlation numbers can hide weak spots. Cicchetti et al. (2014) now give a concrete example with the ADI-R.
Oliver et al. (2002) and Gustafsson et al. (2005) saw the same pattern in other tools: item-level agreement was only moderate even when total scores looked fine. The new study echoes their advice to watch single items, not just totals.
Cacciani et al. (2013) shortened questionnaires and still found IQ swayed scores. Cicchetti et al. (2014) add that even statistically significant agreement can fall short of clinically meaningful levels, so check both the numbers and your clinical sense.
Why it matters
When you give the ADI-R, an item with clinically acceptable agreement is always statistically reliable, but not the other way around. If the numbers say “significant” yet the item feels shaky, trust your clinical eye and gather more data. This helps keep your autism diagnosis solid and reduces the risk of false positives for families.
Pull your last ADI-R protocol, flag any item that passed the math but felt weak, and re-interview on those points.
03 Original abstract
In an earlier investigation, the authors assessed the reliability of the ADI-R when multiple clinicians evaluated a single case, here a female 3 year old toddler suspected of having an autism spectrum disorder (Cicchetti et al. in J Autism Dev Disord 38:764-770, 2008). Applying the clinical criteria of Cicchetti and Sparrow (Am J Men Def 86:127-137, 1981); and those of Cicchetti et al. (Child Neuropsychol 126-137, 1995): 74 % of the ADI-R items showed 100 % agreement; 6 % showed excellent agreement; 7 % showed good agreement; 3 % manifested average agreement; and the remaining 10 % evidenced poor agreement. In this follow-up investigation, the authors described and applied a novel method for determining levels of statistical significance of the reliability coefficients obtained in the earlier investigation. It is based upon a modification of the Z test for comparing a given level of inter-examiner reliability with a lower limit value of 70 % (Dixon and Massey in Introduction to statistical analysis. McGraw-Hill, New York, 1957). Results indicated that every item producing a clinically acceptable level of inter-examiner reliability was also statistically significant. However, the reverse was not true, since a number of the items with statistically significant reliability levels did not reach levels of agreement that were clinically meaningful. This indicated that clinical significance was an accurate marker of statistical significance. The generalization of these findings to other areas of diagnostic interest and importance is also examined.
Journal of autism and developmental disorders, 2014 · doi:10.1007/s10803-014-2177-8