Back to basics: Percentage agreement measures are adequate, but there are easier ways.
Count disagreements; if they stay at 10% or less after 50 observations, you can trust your data and skip harder math.
01 Research in Context
What this study did
The authors looked at the everyday data sheets behavior analysts already use: interval recording, time-sampling, and trial-by-trial scoring. They asked: how much observer disagreement could be explained by chance alone?
They worked out a simple cut-off: 10% or fewer disagreements. Hit that mark across 50 or more observation occasions, with the behavior occurring in 10% to 90% of them, and the agreement is very unlikely to be chance. No tables or extra formulas needed.
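As a concrete illustration, here is a minimal sketch of that rule in Python. The function name and the boolean interval-record format are our own illustrative assumptions, not anything specified by the paper:

```python
# Minimal sketch of the 10% disagreement rule. The record format (parallel
# lists of booleans, True = behavior scored in that interval/trial) is an
# illustrative assumption, not the paper's own notation.

def passes_ten_percent_rule(obs_a, obs_b):
    """Return True when the rule's conditions are met for two observers."""
    n = len(obs_a)
    if n < 50 or n != len(obs_b):
        return False  # the rule requires 50+ paired observation occasions

    disagreements = sum(a != b for a, b in zip(obs_a, obs_b))
    rate = sum(obs_a) / n  # behavior rate, taken from the primary observer

    # 10%-90% behavior rate and at most 10% disagreements.
    return 0.10 <= rate <= 0.90 and disagreements / n <= 0.10
```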
What they found
The plain percentage agreement measures held up: they are adequate for ruling out chance agreement. The paper supplies tables for checking whether an obtained disagreement level is improbable by chance, and when the 10% rule is met, even the tables become unnecessary.
Teams save time while still meeting journal standards for reliable data.
How this fits with other research
Jones et al. (1977) pushed a chance-corrected formula. Birkimer and Brown (1979), the authors of this paper, say you can skip it when the 10% rule is met. The two papers seem opposite, but they serve different moments. Use a chance-corrected formula when disagreement is high; use the shortcut when it is low.
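Jones et al.'s exact formula is not reproduced in this summary; as a hedged stand-in, the sketch below shows the general shape of a chance-corrected measure from the same family (a Cohen's-kappa-style correction), assuming paired occurrence/nonoccurrence records:

```python
# Kappa-style chance correction, shown as a representative example only;
# this is not necessarily the formula Jones et al. proposed.

def kappa(obs_a, obs_b):
    """Chance-corrected agreement for two parallel boolean records."""
    n = len(obs_a)
    observed = sum(a == b for a, b in zip(obs_a, obs_b)) / n

    # Expected chance agreement from each observer's marginal rate.
    p_a, p_b = sum(obs_a) / n, sum(obs_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)

    return (observed - expected) / (1 - expected)
```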
That same year, Goldman et al. (1979) published a table for converting percentage agreement into a phi coefficient. Birkimer and Brown say you can often skip that step. Again, the clash is only on paper. The table helps when a journal wants a formal statistic; the 10% rule helps when you just need to move on.
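Goldman et al.'s table itself is not reproduced here; if a journal does ask for phi, it can be computed directly from the four cells of the 2×2 agreement table, as in this sketch:

```python
import math

# Phi coefficient from a 2x2 agreement table. Cell naming follows the usual
# convention: a = both scored occurrence, d = both scored nonoccurrence,
# b and c = the two kinds of disagreement.

def phi(a, b, c, d):
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0

# Hypothetical check: 40 joint occurrences, 3 + 2 disagreements,
# 55 joint nonoccurrences out of 100 intervals.
print(phi(40, 3, 2, 55))  # ~0.90, agreement well beyond chance
```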
Lanovaz et al. (2017) give a phase-length rule to curb false positives. Like Birkimer and Brown, they trade a little rigor for a big gain in speed. Both papers give you permission to stop collecting earlier when simple checks pass.
Why it matters
You can check reliability on the clinic floor without a calculator. Run your session, count disagreements, and if you are at or below 10% after 50 or more intervals or trials (with the behavior occurring in roughly 10% to 90% of them), keep going with your intervention. File the sheet and focus on your client, not on stats software. The rule keeps you honest while giving back precious therapy minutes. The back-of-the-envelope check below shows why the cut-off is safe.
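The check uses one deliberately simplified chance model: both observers score independently at the behavior rate p, so a chance disagreement happens with probability 2p(1-p). The paper's own derivation is more careful, so treat these numbers as intuition only:

```python
from math import comb

def p_chance_few_disagreements(n=50, max_disagree=5, rate=0.30):
    """P(at most max_disagree disagreements in n occasions under chance)."""
    p_dis = 2 * rate * (1 - rate)  # chance disagreement probability
    return sum(
        comb(n, k) * p_dis**k * (1 - p_dis) ** (n - k)
        for k in range(max_disagree + 1)
    )

# At a 30% behavior rate, 5 or fewer disagreements out of 50 is vanishingly
# unlikely by chance; the case is weakest near the 10% and 90% edges.
print(p_chance_few_disagreements())           # ~7e-7
print(p_chance_few_disagreements(rate=0.10))  # ~0.09
```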
02 At a glance
After 50 intervals, tally disagreements; if ≤10%, move on to teaching.
03 Original abstract
Percentage agreement measures of interobserver agreement or "reliability" have traditionally been used to summarize observer agreement from studies using interval recording, time-sampling, and trial-scoring data collection procedures. Recent articles disagree on whether to continue using these percentage agreement measures, and on which ones to use, and what to do about chance agreements if their use is continued. Much of the disagreement derives from the need to be reasonably certain we do not accept as evidence of true interobserver agreement those agreement levels which are substantially probable as a result of chance observer agreement. The various percentage agreement measures are shown to be adequate to this task, but easier ways are discussed. Tables are given to permit checking to see if obtained disagreements are unlikely due to chance. Particularly important is the discovery of a simple rule that, when met, makes the tables unnecessary. If reliability checks using 50 or more observation occasions produce 10% or fewer disagreements, for behavior rates from 10% through 90%, the agreement achieved is quite improbably the result of chance agreement.
Journal of Applied Behavior Analysis, 1979 · doi:10.1901/jaba.1979.12-535