A graphical judgment aid that summarizes obtained and chance reliability data and helps assess the believability of experimental effects.
Draw a disagreement band around your graphed data to see at a glance whether observer agreement beats chance and whether your effect is believable.
Research in Context
What this study did
The authors built a simple paper-and-pencil graph. It draws two lines around your data line.
The space between the lines shows the amount of disagreement between two observers.
A second, shaded band shows how much disagreement you would expect by pure chance.
If the disagreement band sits inside the chance band, your data are probably solid.
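The paper's method needs only pencil and graph paper, but a minimal Python sketch shows the mechanics. The function name, the simulated data, and the independence-based chance formula below are illustrative assumptions, not the paper's notation:

```python
# Sketch of a disagreement-bandwidth plot (illustrative, not the paper's code).
import numpy as np
import matplotlib.pyplot as plt

def session_bands(obs1, obs2):
    """Band limits for one session of interval recording by two observers.

    obs1, obs2: boolean arrays, one entry per interval (True = behavior scored).
    The obtained band runs from "both scored" up to "either scored", so its
    width equals the disagreement rate. The chance band assumes the observers
    score independently at their observed base rates.
    """
    obs1, obs2 = np.asarray(obs1, bool), np.asarray(obs2, bool)
    lower = np.mean(obs1 & obs2)        # agreed occurrences / N
    upper = np.mean(obs1 | obs2)        # agreed + disagreed occurrences / N
    p1, p2 = obs1.mean(), obs2.mean()
    return lower, upper, p1 * p2, p1 + p2 - p1 * p2

# Simulate five 60-interval sessions: both observers watch the same behavior,
# each with a 5% chance of mis-scoring any interval.
rng = np.random.default_rng(0)
sessions = []
for _ in range(5):
    truth = rng.random(60) < 0.3
    sessions.append((truth ^ (rng.random(60) < 0.05),
                     truth ^ (rng.random(60) < 0.05)))

bands = np.array([session_bands(o1, o2) for o1, o2 in sessions])
x = np.arange(1, len(sessions) + 1)
plt.fill_between(x, bands[:, 2], bands[:, 3], alpha=0.2, label="chance disagreement range")
plt.fill_between(x, bands[:, 0], bands[:, 1], alpha=0.6, label="obtained disagreement band")
plt.xlabel("Session")
plt.ylabel("Proportion of intervals scored")
plt.legend()
plt.show()
```

A narrow obtained band sitting inside a wide chance band is the visual signature of true observer agreement.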
What they found
The graph lets you see at a glance whether observer agreement beats chance.
It also shows whether a behavior change is bigger than the measurement noise.
No math beyond plotting points is needed.
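The effect-versus-noise check reduces to asking whether the disagreement bands from two phases overlap. A minimal sketch, with the function name and the numbers invented for illustration:

```python
def bands_overlap(band_a, band_b):
    """True if two (lower, upper) disagreement bands overlap.

    Per the abstract's rule of thumb, a claimed effect whose phase bands
    do NOT overlap is probably believable; one whose bands overlap is not.
    """
    (lo_a, hi_a), (lo_b, hi_b) = band_a, band_b
    return lo_a <= hi_b and lo_b <= hi_a

# Example: baseline band 0.10-0.20 vs. treatment band 0.45-0.55.
print(bands_overlap((0.10, 0.20), (0.45, 0.55)))  # False -> believable effect
```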
How this fits with other research
Hopkins et al. (1977) came first. They provided curves for comparing interval IOA with chance levels. The 1979 paper turns that idea into a general bandwidth plot you can use with any interval data.
Manolov et al. (2015) and Wolfe et al. (2023) push the same spirit forward. They offer free R and Brinley-plot software that judges single-case effects, not just reliability.
Sunde et al. (2022) show the idea still works today. Their visual checklist for latency-based FA graphs reached 98% rater agreement, evidence that structured graphics keep decisions consistent.
Why it matters
Next time you finish an observation session, plot the disagreement band before you trust the numbers. If the band is wider than the chance zone, train observers again. If it is narrow, you can defend your data in supervision or in court. The tool costs nothing and takes two minutes.
After the next IOA check, plot the disagreement bandwidth and compare it to the shaded chance area before you write the session note.
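As a hedged sketch of that decision rule (the helper function and the width comparison are mine, not the paper's): flag retraining whenever the obtained band is at least as wide as the chance zone.

```python
def needs_retraining(lower, upper, chance_lower, chance_upper):
    """Flag a session whose obtained disagreement band is as wide as (or wider
    than) the chance band, i.e., agreement has not clearly beaten chance."""
    return (upper - lower) >= (chance_upper - chance_lower)

# Example: obtained band 0.25-0.35 (width 0.10) vs. chance zone 0.09-0.51
# (width 0.42): agreement beats chance, so no retraining is flagged.
print(needs_retraining(0.25, 0.35, 0.09, 0.51))  # False
```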
Original abstract
Interval by interval reliability has been criticized for "inflating" observer agreement when target behavior rates are very low or very high. Scored interval reliability and its converse, unscored interval reliability, however, vary as target behavior rates vary when observer disagreement rates are constant. These problems, along with the existence of "chance" values of each reliability which also vary as a function of response rate, may cause researchers and consumers difficulty in interpreting observer agreement measures. Because each of these reliabilities essentially compares observer disagreements to a different base, it is suggested that the disagreement rate itself be the first measure of agreement examined, and its magnitude relative to occurrence and to nonoccurrence agreements then be considered. This is easily done via a graphic presentation of the disagreement range as a bandwidth around reported rates of target behavior. Such a graphic presentation summarizes all the information collected during reliability assessments and permits visual determination of each of the three reliabilities. In addition, graphing the "chance" disagreement range around the bandwidth permits easy determination of whether or not true observer agreement has likely been demonstrated. Finally, the limits of the disagreement bandwidth help assess the believability of claimed experimental effects: those leaving no overlap between disagreement ranges are probably believable, others are not.
Journal of Applied Behavior Analysis, 1979 · doi:10.1901/jaba.1979.12-523
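To connect the abstract's three reliabilities to the bandwidth, here is a minimal worked sketch using the standard interval-IOA formulas; the variable names and the 60-interval example are mine:

```python
def reliabilities(both, only1, only2, neither):
    """The abstract's three reliabilities, plus the disagreement rate and its
    chance value, from a tally of intervals.

    both    = intervals both observers scored the behavior
    only1/2 = intervals only one observer scored it (the disagreements)
    neither = intervals neither observer scored it
    """
    n = both + only1 + only2 + neither
    d = only1 + only2                                  # disagreements
    p1, p2 = (both + only1) / n, (both + only2) / n    # observers' base rates
    return {
        "interval_by_interval": (both + neither) / n,  # total agreement
        "scored_interval": both / (both + d),          # occurrence IOA
        "unscored_interval": neither / (neither + d),  # nonoccurrence IOA
        "disagreement_rate": d / n,                    # the suggested first measure
        "chance_disagreement": p1 * (1 - p2) + p2 * (1 - p1),  # independence assumption
    }

# Worked example: 60 intervals with 18 joint hits, 6 disagreements, 36 joint blanks.
for name, value in reliabilities(18, 4, 2, 36).items():
    print(f"{name}: {value:.3f}")
# disagreement_rate (0.100) falls well below chance_disagreement (0.456),
# so true observer agreement has likely been demonstrated.
```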