Influences of response rate and distribution on the calculation of interobserver reliability scores.
High-rate or end-of-interval responding can make total IOA look perfect while exact-agreement IOA reveals real observer disagreement.
01 Research in Context
What this study did
Rolider et al. (2012) ran math checks on four common IOA formulas: total, interval, exact-agreement, and proportional reliability. They asked what happens when the same data have high response rates or most responses pile up at the end of each interval.
Trained observers scored computer-generated responding on a screen, and the resulting records were pushed through all four calculations. The goal was to see if the numbers still tell the truth when behavior gets fast or bunched.
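To make the comparison concrete, here is a minimal Python sketch of the four calculations, using per-interval response counts from two observers. The formulas follow common textbook definitions of these indices; the exact scoring conventions used in the study may differ in detail.

```python
def total_ioa(a, b):
    """Total (count) IOA: smaller session total / larger session total x 100."""
    lo, hi = sorted([sum(a), sum(b)])
    return 100.0 if hi == 0 else 100.0 * lo / hi

def interval_ioa(a, b):
    """Interval IOA: % of intervals where both observers agree that the
    response occurred (count >= 1) or did not occur (count == 0)."""
    agree = sum((x > 0) == (y > 0) for x, y in zip(a, b))
    return 100.0 * agree / len(a)

def exact_agreement_ioa(a, b):
    """Exact-agreement IOA: % of intervals where the two counts match exactly."""
    agree = sum(x == y for x, y in zip(a, b))
    return 100.0 * agree / len(a)

def proportional_ioa(a, b):
    """Proportional IOA: smaller count / larger count within each interval
    (scored 1.0 when both counts are zero), averaged across intervals."""
    ratios = [1.0 if max(x, y) == 0 else min(x, y) / max(x, y)
              for x, y in zip(a, b)]
    return 100.0 * sum(ratios) / len(ratios)
```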
What they found
Exact-agreement IOA cracked first. In Study 1 it was the lowest of the four measures, especially for high-rate responding; Study 2 showed the real culprit was responses landing at the end of intervals, not rate or bursting per se.
Total IOA stayed high and cheerful even when observers quietly disagreed interval by interval, and interval IOA was spuriously high for fast responding. The paper warns that total IOA can hide real disagreement.
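Plugging made-up counts into the functions above shows the masking effect in miniature. Both observers total 10 responses, so total IOA is perfect, yet no single interval matches exactly (these numbers are hypothetical, not from the study):

```python
obs1 = [5, 0, 3, 2]   # observer 1, responses per interval (hypothetical)
obs2 = [3, 2, 2, 3]   # observer 2, same session (hypothetical)

print(total_ioa(obs1, obs2))            # 100.0 -- looks perfect
print(interval_ioa(obs1, obs2))         # 75.0
print(proportional_ioa(obs1, obs2))     # ~48.3
print(exact_agreement_ioa(obs1, obs2))  # 0.0  -- no interval matches exactly
```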
How this fits with other research
Cox et al. (2025) now give you eight better tools. They say swap in precision, recall, or F1 instead of raw percent agreement (a rough sketch of that idea follows this section). Their toolkit directly targets the ceiling-effect trap that Rolider et al. exposed.
Hausman et al. (2022) pushed the same lever from another angle. They cut the number of IOA sessions and still saw the same rate-driven swings, which suggests the problem holds whether you collect a lot of agreement data or a little.
Jones et al. (1977) saw it coming. They were already calling for kappa and occurrence/nonoccurrence splits long before Rolider et al. proved raw percent agreement can lie.
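Cox et al.'s published formulations aren't reproduced here, but the general idea is easy to sketch: treat one observer's interval record as the reference and score the other observer's record against it. The function below is an illustrative assumption of how that could look, not the toolkit itself.

```python
def precision_recall_f1(ref, other):
    """Precision/recall/F1 over interval occurrence records, treating `ref`
    (observer 1) as ground truth. Illustrative only; not Cox et al.'s code."""
    ref_occ = [x > 0 for x in ref]      # did observer 1 score the interval?
    oth_occ = [x > 0 for x in other]    # did observer 2 score the interval?
    tp = sum(r and o for r, o in zip(ref_occ, oth_occ))      # both scored it
    fp = sum(o and not r for r, o in zip(ref_occ, oth_occ))  # only observer 2
    fn = sum(r and not o for r, o in zip(ref_occ, oth_occ))  # only observer 1
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1
```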
Why it matters
Next time you track hand flaps or vocal stereotypy that tops 100 responses per minute, do not trust a glossy 95% total IOA. Flip to exact-agreement or Cox's precision score. If the second number dips, retrain observers before you trust any trend line.
Run both total and exact-agreement IOA on your fastest behavior this week; if the gap is over 10 percentage points, retrain observers.
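Continuing the sketch above, that weekly check is a two-line comparison. The 10-point threshold is this article's rule of thumb, not a value from Rolider et al. (2012):

```python
gap = total_ioa(obs1, obs2) - exact_agreement_ioa(obs1, obs2)
if gap > 10:  # 10-point cutoff is the article's rule of thumb, not the paper's
    print(f"Gap = {gap:.1f} points -- retrain observers before trusting the trend.")
```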
02 Original abstract
We examined the effects of several variations in response rate on the calculation of total, interval, exact-agreement, and proportional reliability indices. Trained observers recorded computer-generated data that appeared on a computer screen. In Study 1, target responses occurred at low, moderate, and high rates during separate sessions so that reliability results based on the four calculations could be compared across a range of values. Total reliability was uniformly high, interval reliability was spuriously high for high-rate responding, proportional reliability was somewhat lower for high-rate responding, and exact-agreement reliability was the lowest of the measures, especially for high-rate responding. In Study 2, we examined the separate effects of response rate per se, bursting, and end-of-interval responding. Response rate and bursting had little effect on reliability scores; however, the distribution of some responses at the end of intervals decreased interval reliability somewhat, proportional reliability noticeably, and exact-agreement reliability markedly.
Journal of Applied Behavior Analysis, 2012 · doi:10.1901/jaba.2012.45-753