A graphical judgment aid that summarizes obtained and chance reliability data and helps assess the believability of experimental effects.
Draw a disagreement band around your graphed data to see at a glance whether observer agreement beats chance and whether your effect is believable.
Research in Context
What this study did
The authors built a simple paper-and-pencil graph. It draws two lines around your data line.
The space between the lines shows the amount of disagreement between two observers.
A second, shaded band shows how much disagreement you would expect by pure chance.
If the disagreement band sits inside the chance band, your data are probably solid.
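The paper's method needs only pencil and graph paper, but a minimal Python sketch shows the mechanics. The function name, the simulated data, and the independence-based chance formula below are illustrative assumptions, not the paper's notation:

```python
# Sketch of a disagreement-bandwidth plot (illustrative, not the paper's code).
import numpy as np
import matplotlib.pyplot as plt

def session_bands(obs1, obs2):
    """Band limits for one session of interval recording by two observers.

    obs1, obs2: boolean arrays, one entry per interval (True = behavior scored).
    The obtained band runs from "both scored" up to "either scored", so its
    width equals the disagreement rate. The chance band assumes the observers
    score independently at their observed base rates.
    """
    obs1, obs2 = np.asarray(obs1, bool), np.asarray(obs2, bool)
    lower = np.mean(obs1 & obs2)        # agreed occurrences / N
    upper = np.mean(obs1 | obs2)        # agreed + disagreed occurrences / N
    p1, p2 = obs1.mean(), obs2.mean()
    return lower, upper, p1 * p2, p1 + p2 - p1 * p2

# Simulate five 60-interval sessions: both observers watch the same behavior,
# each with a 5% chance of mis-scoring any interval.
rng = np.random.default_rng(0)
sessions = []
for _ in range(5):
    truth = rng.random(60) < 0.3
    sessions.append((truth ^ (rng.random(60) < 0.05),
                     truth ^ (rng.random(60) < 0.05)))

bands = np.array([session_bands(o1, o2) for o1, o2 in sessions])
x = np.arange(1, len(sessions) + 1)
plt.fill_between(x, bands[:, 2], bands[:, 3], alpha=0.2, label="chance disagreement range")
plt.fill_between(x, bands[:, 0], bands[:, 1], alpha=0.6, label="obtained disagreement band")
plt.xlabel("Session")
plt.ylabel("Proportion of intervals scored")
plt.legend()
plt.show()
```

A narrow obtained band sitting inside a wide chance band is the visual signature of true observer agreement.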
What they found
The graph lets you see at a glance whether observer agreement beats chance.
It also shows whether a behavior change is bigger than the measurement noise.
No math beyond plotting points is needed.
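The effect-versus-noise check reduces to asking whether the disagreement bands from two phases overlap. A minimal sketch, with the function name and the numbers invented for illustration:

```python
def bands_overlap(band_a, band_b):
    """True if two (lower, upper) disagreement bands overlap.

    Per the abstract's rule of thumb, a claimed effect whose phase bands
    do NOT overlap is probably believable; one whose bands overlap is not.
    """
    (lo_a, hi_a), (lo_b, hi_b) = band_a, band_b
    return lo_a <= hi_b and lo_b <= hi_a

# Example: baseline band 0.10-0.20 vs. treatment band 0.45-0.55.
print(bands_overlap((0.10, 0.20), (0.45, 0.55)))  # False -> believable effect
```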
How this fits with other research
Hopkins et al. (1977) came first. They provided curves for comparing interval IOA with chance levels. The 1979 paper turns that idea into a general bandwidth plot you can use with any interval data.
Manolov et al. (2015) and Wolfe et al. (2023) push the same spirit forward. They offer free R and Brinley-plot software that judges single-case effects, not just reliability.
Sunde et al. (2022) show the idea still works today. Their visual checklist for latency-based FA graphs reached 98% rater agreement, evidence that structured graphics keep decisions consistent.
Why it matters
Next time you finish an observation session, plot the disagreement band before you trust the numbers. If the band is wider than the chance zone, train observers again. If it is narrow, you can defend your data in supervision or in court. The tool costs nothing and takes two minutes.
After the next IOA check, plot the disagreement bandwidth and compare it to the shaded chance area before you write the session note.
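As a hedged sketch of that decision rule (the helper function and the width comparison are mine, not the paper's): flag retraining whenever the obtained band is at least as wide as the chance zone.

```python
def needs_retraining(lower, upper, chance_lower, chance_upper):
    """Flag a session whose obtained disagreement band is as wide as (or wider
    than) the chance band, i.e., agreement has not clearly beaten chance."""
    return (upper - lower) >= (chance_upper - chance_lower)

# Example: obtained band 0.25-0.35 (width 0.10) vs. chance zone 0.09-0.51
# (width 0.42): agreement beats chance, so no retraining is flagged.
print(needs_retraining(0.25, 0.35, 0.09, 0.51))  # False
```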
Original abstract
Interval by interval reliability has been criticized for "inflating" observer agreement when target behavior rates are very low or very high. Scored interval reliability and its converse, unscored interval reliability, however, vary as target behavior rates vary when observer disagreement rates are constant. These problems, along with the existence of "chance" values of each reliability which also vary as a function of response rate, may cause researchers and consumers difficulty in interpreting observer agreement measures. Because each of these reliabilities essentially compares observer disagreements to a different base, it is suggested that the disagreement rate itself be the first measure of agreement examined, and its magnitude relative to occurrence and to nonoccurrence agreements then be considered. This is easily done via a graphic presentation of the disagreement range as a bandwidth around reported rates of target behavior. Such a graphic presentation summarizes all the information collected during reliability assessments and permits visual determination of each of the three reliabilities. In addition, graphing the "chance" disagreement range around the bandwidth permits easy determination of whether or not true observer agreement has likely been demonstrated. Finally, the limits of the disagreement bandwidth help assess the believability of claimed experimental effects: those leaving no overlap between disagreement ranges are probably believable, others are not.
Journal of Applied Behavior Analysis, 1979 · doi:10.1901/jaba.1979.12-523
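To connect the abstract's three reliabilities to the bandwidth, here is a minimal worked sketch using the standard interval-IOA formulas; the variable names and the 60-interval example are mine:

```python
def reliabilities(both, only1, only2, neither):
    """The abstract's three reliabilities, plus the disagreement rate and its
    chance value, from a tally of intervals.

    both    = intervals both observers scored the behavior
    only1/2 = intervals only one observer scored it (the disagreements)
    neither = intervals neither observer scored it
    """
    n = both + only1 + only2 + neither
    d = only1 + only2                                  # disagreements
    p1, p2 = (both + only1) / n, (both + only2) / n    # observers' base rates
    return {
        "interval_by_interval": (both + neither) / n,  # total agreement
        "scored_interval": both / (both + d),          # occurrence IOA
        "unscored_interval": neither / (neither + d),  # nonoccurrence IOA
        "disagreement_rate": d / n,                    # the suggested first measure
        "chance_disagreement": p1 * (1 - p2) + p2 * (1 - p1),  # independence assumption
    }

# Worked example: 60 intervals with 18 joint hits, 6 disagreements, 36 joint blanks.
for name, value in reliabilities(18, 4, 2, 36).items():
    print(f"{name}: {value:.3f}")
# disagreement_rate (0.100) falls well below chance_disagreement (0.456),
# so true observer agreement has likely been demonstrated.
```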