Assessment & Research

Getting More From Your IOA Data: Alternative Measures to Total, Occurrence, and Non‐Occurrence Agreement

Cox et al. (2025) · Behavioral Interventions 2025
★ The Verdict

Swap in precision, recall, or F1 when you need to know what kind of IOA error you have, especially with very low or very high behavior rates.

✓ Read this if you're a BCBA who collects IOA on any behavior and wants clearer graphs for supervision or publication.
✗ Skip if your team already uses classification metrics in its data sheets.

01Research in Context

01

What this study did

Cox et al. (2025) wrote a how-to paper. They describe eight alternative ways to score inter-observer agreement (IOA).

Each index tells you where the errors live. Some work best when behavior is rare. Others help when behavior fills the session.

The authors give a short pick-list so you can match the math to your data shape.

02

What they found

The paper itself has no new numbers. It is a map, not a study.

The big idea: stop reporting only "percent agreement." Use precision, recall, or F1 when you need to know what kind of disagreement happened.
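The paper itself includes no code, but these metrics map directly onto paired trial-by-trial records. A minimal sketch with hypothetical data, treating one observer as the reference (the paper does not prescribe which observer plays that role):

```python
# Two observers score the same 10 trials (1 = behavior occurred).
obs_a = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]  # treated as the reference
obs_b = [1, 0, 0, 0, 0, 0, 0, 1, 1, 0]  # second observer

tp = sum(a == 1 and b == 1 for a, b in zip(obs_a, obs_b))  # both scored it
fp = sum(a == 0 and b == 1 for a, b in zip(obs_a, obs_b))  # B scored it, A didn't
fn = sum(a == 1 and b == 0 for a, b in zip(obs_a, obs_b))  # A scored it, B missed it

precision = tp / (tp + fp)  # of B's recordings, how many A confirms
recall = tp / (tp + fn)     # of A's recordings, how many B caught
f1 = 2 * precision * recall / (precision + recall)
```

Unlike a single percent-agreement figure, the false-positive and false-negative counts tell you which direction the observers disagree in.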

03

How this fits with other research

Jones et al. (1977) first told the field to dump simple percent agreement and add Kappa or Phi. Cox et al. (2025) answer that 48-year-old call with fresh tools.

Harris et al. (1978) built a weighted IOA to fight chance inflation. Cox replaces that single fix with a full toolkit that includes precision and recall—metrics made for skewed data.

Rolider et al. (2012) showed exact-agreement IOA breaks when response rates are high, while total IOA stays misleadingly high. Cox gives you precision and recall so you can spot exactly where the drift lives.

04

Why it matters

Next time you write "IOA averaged 95%," ask what the 5% error looked like. Were the misses on low-rate self-injury or on high-rate stereotypy? Pick precision or recall, rerun the sheet, and tell your reader the shape of the error. It takes one extra column and makes your data bulletproof in supervision, publication, and peer review.
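To see why the 5% matters, here is a hypothetical rare-behavior session where total agreement looks excellent but recall exposes a serious miss pattern:

```python
# 100-trial session; the reference observer scores a rare behavior 5 times.
obs_a = [1] * 5 + [0] * 95   # reference observer: 5 occurrences
obs_b = [1] * 2 + [0] * 98   # second observer catches only 2 of them

total_agreement = sum(a == b for a, b in zip(obs_a, obs_b)) / len(obs_a)
tp = sum(a == 1 and b == 1 for a, b in zip(obs_a, obs_b))
fn = sum(a == 1 and b == 0 for a, b in zip(obs_a, obs_b))
recall = tp / (tp + fn)

print(total_agreement)  # 0.97 — looks fine on the summary sheet
print(recall)           # 0.4  — the second observer missed 3 of 5 rare responses
```

The 97% total agreement is driven almost entirely by the 95 non-occurrence trials; recall is the number that reveals the problem.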

→ Action — try this Monday

Open last session’s IOA sheet, add a precision column, and see if your high agreement hides a miss pattern.

02At a glance

Intervention
not applicable
Design
methodology paper
Finding
not reported

03Original abstract

ABSTRACT Behavior analysts often need to rely on interobserver agreement (IOA) as a substitute for direct measures of the accuracy of collected trial‐by‐trial data. In these situations, behavior analysts often report the sum of instances wherein two people agreed divided by the total opportunities where they could have agreed. Though this is easy to calculate, easy to communicate, and provides a balanced comparison of agreement relative to total opportunities, it does not provide specific information on disagreements, is misleading with imbalanced datasets, and has decreased corrective utility with ceiling/floor effects. In this manuscript, we describe eight alternative measures from the binary classification literature. For each measure, we describe its benefits and drawbacks and the conditions under which each might have greater practical utility based on the function of analyzing IOA data. Understanding the measures described in this manuscript may allow behavior analysts to get more information from their IOA data.

Behavioral Interventions, 2025 · doi:10.1002/bin.70031