Assessment & Research

A cautionary note on the use of probability values to evaluate interobserver agreement.

Hartmann et al. (1982) · Journal of Applied Behavior Analysis, 1982
★ The Verdict

Stop relying on p-values for IOA when successive data points influence each other. Skip intervals or model the links instead.

✓ Read this if you are a BCBA who collects continuous observation data for research or billing.
✗ Skip if you are a practitioner who only uses permanent product or trial-by-trial data.

01 Research in Context

01

What this study did

The authors looked at how we test interobserver agreement. They asked: do the usual p-values still work when data points are linked over time?

They wrote a short warning paper. They showed that the usual significance tests for agreement, such as chi-square, give erroneous p-values when one score predicts the next score.

02

What they found

The paper says the math breaks down. If your behavior data are serially correlated, the p-values attached to IOA are erroneous and should not be trusted.

They give two fixes. Skip every other interval to break the chain. Or switch to a Markov model that expects the links.
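The first fix, keeping only every kth interval, can be sketched in a few lines of Python. This is a minimal illustration, not the authors' procedure: the function names, the 0/1 coding of intervals, and the two example records are all hypothetical, and the summary does not say how to choose k.

```python
from typing import List, Sequence, Tuple


def every_kth(record: Sequence[int], k: int, start: int = 0) -> List[int]:
    """Keep every k-th interval of a 0/1 record, beginning at index `start`."""
    if k < 1:
        raise ValueError("k must be at least 1")
    return list(record[start::k])


def agreement_table(obs_a: Sequence[int], obs_b: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return counts (both_1, a_only, b_only, both_0) for two 0/1 records."""
    if len(obs_a) != len(obs_b):
        raise ValueError("records must have the same number of intervals")
    pairs = list(zip(obs_a, obs_b))
    both_1 = sum(1 for a, b in pairs if a == 1 and b == 1)
    a_only = sum(1 for a, b in pairs if a == 1 and b == 0)
    b_only = sum(1 for a, b in pairs if a == 0 and b == 1)
    both_0 = sum(1 for a, b in pairs if a == 0 and b == 0)
    return both_1, a_only, b_only, both_0


def percent_agreement(obs_a: Sequence[int], obs_b: Sequence[int]) -> float:
    """Interval-by-interval percent agreement between two observers."""
    both_1, a_only, b_only, both_0 = agreement_table(obs_a, obs_b)
    total = both_1 + a_only + b_only + both_0
    if total == 0:
        raise ValueError("records are empty")
    return 100.0 * (both_1 + both_0) / total


# Hypothetical interval records (1 = behavior scored, 0 = not scored).
observer_a = [1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0]
observer_b = [1, 1, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0]

# Thin both records the same way (k = 2 keeps every other interval).
thinned_a = every_kth(observer_a, k=2)
thinned_b = every_kth(observer_b, k=2)

full_ioa = percent_agreement(observer_a, observer_b)
thinned_ioa = percent_agreement(thinned_a, thinned_b)
thinned_table = agreement_table(thinned_a, thinned_b)
```

Any significance test would then be run on the thinned 2x2 table rather than the full record. Thinning discards data, so the session has to be long enough that the retained intervals still support a test.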

03

How this fits with other research

Parsons et al. (1981) found observers cheat when they score their own agreement. Hartmann et al. (1982) adds a second trap: even honest scores can fail the math test.

Fisch (1998) showed our eyes miss slow trends. Together these papers say: don’t trust people, don’t trust p-values, and don’t trust your eyes alone.

Hastings et al. (2001) later showed staff reports swing wildly day to day. That daily swing is the same serial correlation Hartmann et al. (1982) warned about.

04

Why it matters

Before you report “significant IOA,” run a quick check. Plot your data. If one interval looks like the next, skip intervals or use the Markov fix. It takes five extra minutes and saves you from publishing bad numbers.
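One rough way to put a number on "one interval looks like the next" is to compare how often the behavior is scored right after a scored interval with how often it is scored right after an unscored one. The sketch below assumes a single observer's record coded as 0/1; the function name and the example record are hypothetical, and this is a simple descriptive check rather than the Markovian analysis the paper refers to.

```python
from typing import Dict, Sequence


def transition_proportions(record: Sequence[int]) -> Dict[str, float]:
    """Estimate P(next = 1 | current = 1) and P(next = 1 | current = 0)
    from a 0/1 interval record."""
    counts = {(0, 0): 0, (0, 1): 0, (1, 0): 0, (1, 1): 0}
    # Walk through consecutive pairs of intervals and tally each transition.
    for current, nxt in zip(record, record[1:]):
        counts[(current, nxt)] += 1
    after_1 = counts[(1, 0)] + counts[(1, 1)]
    after_0 = counts[(0, 0)] + counts[(0, 1)]
    if after_1 == 0 or after_0 == 0:
        raise ValueError("record needs at least one transition out of each state")
    return {
        "p_1_given_1": counts[(1, 1)] / after_1,
        "p_1_given_0": counts[(0, 1)] / after_0,
    }


# Hypothetical record in which the behavior comes in runs.
record = [1, 1, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0]
result = transition_proportions(record)

# If intervals were independent, the two proportions would be similar.
gap = result["p_1_given_1"] - result["p_1_given_0"]
```

In this made-up record the behavior follows a scored interval about two thirds of the time but follows an unscored interval only one fifth of the time, which is the kind of gap that points to serial dependence. How large a gap matters in practice is a judgment call that this sketch does not settle.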

→ Action — try this Monday

Re-score one recent session using every other interval and compare the IOA with the full-session value. A large shift is a hint, not proof, that neighboring intervals are serially correlated.

02 At a glance

Intervention: not applicable
Design: theoretical
Finding: not reported

03 Original abstract

Proposed methods of assessing the statistical significance of interobserver agreements provide erroneous probability values when conducted on serially correlated data. Investigators who wish to evaluate interobserver agreements by means of statistical significance can do so by limiting the analysis to every kth interval of data, or by using Markovian techniques which accommodate serial correlations.

Journal of Applied Behavior Analysis, 1982 · doi:10.1901/jaba.1982.15-189