Back to basics: Percentage agreement measures are adequate, but there are easier ways.
Count disagreements; if they stay at 10% or less after 50 observations, you can trust your data and skip harder math.
01 Research in Context
What this study did
The authors looked at the everyday data sheets behavior analysts already use: interval recording, time-sampling, and trial-by-trial scoring. They asked: how much observer disagreement could be explained by chance alone?
They worked out a simple cut-off: 10% or fewer disagreements. Hit that mark across 50 or more observation occasions, with the behavior occurring in 10% to 90% of them, and the agreement is very unlikely to be chance. No tables or extra formulas needed.
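As a concrete illustration, here is a minimal sketch of that rule in Python. The function name and the boolean interval-record format are our own illustrative assumptions, not anything specified by the paper:

```python
# Minimal sketch of the 10% disagreement rule. The record format (parallel
# lists of booleans, True = behavior scored in that interval/trial) is an
# illustrative assumption, not the paper's own notation.

def passes_ten_percent_rule(obs_a, obs_b):
    """Return True when the rule's conditions are met for two observers."""
    n = len(obs_a)
    if n < 50 or n != len(obs_b):
        return False  # the rule requires 50+ paired observation occasions

    disagreements = sum(a != b for a, b in zip(obs_a, obs_b))
    rate = sum(obs_a) / n  # behavior rate, taken from the primary observer

    # 10%-90% behavior rate and at most 10% disagreements.
    return 0.10 <= rate <= 0.90 and disagreements / n <= 0.10
```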
What they found
The plain percentage agreement measures held up: they are adequate for ruling out chance agreement. The paper supplies tables for checking whether an obtained disagreement level is improbable by chance, and when the 10% rule is met, even the tables become unnecessary.
Teams save time while still meeting journal standards for reliable data.
How this fits with other research
Jones et al. (1977) pushed a chance-corrected formula. Birkimer and Brown (1979), the authors of this paper, say you can skip it when the 10% rule is met. The two papers seem opposite, but they serve different moments. Use a chance-corrected formula when disagreement is high; use the shortcut when it is low.
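Jones et al.'s exact formula is not reproduced in this summary; as a hedged stand-in, the sketch below shows the general shape of a chance-corrected measure from the same family (a Cohen's-kappa-style correction), assuming paired occurrence/nonoccurrence records:

```python
# Kappa-style chance correction, shown as a representative example only;
# this is not necessarily the formula Jones et al. proposed.

def kappa(obs_a, obs_b):
    """Chance-corrected agreement for two parallel boolean records."""
    n = len(obs_a)
    observed = sum(a == b for a, b in zip(obs_a, obs_b)) / n

    # Expected chance agreement from each observer's marginal rate.
    p_a, p_b = sum(obs_a) / n, sum(obs_b) / n
    expected = p_a * p_b + (1 - p_a) * (1 - p_b)

    return (observed - expected) / (1 - expected)
```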
That same year, Goldman et al. (1979) published a table for converting percentage agreement into a phi coefficient. Birkimer and Brown say you can often skip that step. Again, the clash is only on paper. The table helps when a journal wants a formal statistic; the 10% rule helps when you just need to move on.
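Goldman et al.'s table itself is not reproduced here; if a journal does ask for phi, it can be computed directly from the four cells of the 2×2 agreement table, as in this sketch:

```python
import math

# Phi coefficient from a 2x2 agreement table. Cell naming follows the usual
# convention: a = both scored occurrence, d = both scored nonoccurrence,
# b and c = the two kinds of disagreement.

def phi(a, b, c, d):
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom if denom else 0.0

# Hypothetical check: 40 joint occurrences, 3 + 2 disagreements,
# 55 joint nonoccurrences out of 100 intervals.
print(phi(40, 3, 2, 55))  # ~0.90, agreement well beyond chance
```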
Lanovaz et al. (2017) give a phase-length rule to curb false positives. Like Birkimer and Brown, they trade a little rigor for a big gain in speed. Both papers give you permission to stop collecting earlier when simple checks pass.
Why it matters
You can check reliability on the clinic floor without a calculator. Run your session, count disagreements, and if you are at or below 10% after 50 or more intervals or trials (with the behavior occurring in roughly 10% to 90% of them), keep going with your intervention. File the sheet and focus on your client, not on stats software. The rule keeps you honest while giving back precious therapy minutes. The back-of-the-envelope check below shows why the cut-off is safe.
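The check uses one deliberately simplified chance model: both observers score independently at the behavior rate p, so a chance disagreement happens with probability 2p(1-p). The paper's own derivation is more careful, so treat these numbers as intuition only:

```python
from math import comb

def p_chance_few_disagreements(n=50, max_disagree=5, rate=0.30):
    """P(at most max_disagree disagreements in n occasions under chance)."""
    p_dis = 2 * rate * (1 - rate)  # chance disagreement probability
    return sum(
        comb(n, k) * p_dis**k * (1 - p_dis) ** (n - k)
        for k in range(max_disagree + 1)
    )

# At a 30% behavior rate, 5 or fewer disagreements out of 50 is vanishingly
# unlikely by chance; the case is weakest near the 10% and 90% edges.
print(p_chance_few_disagreements())           # ~7e-7
print(p_chance_few_disagreements(rate=0.10))  # ~0.09
```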
02 At a glance
After 50 intervals, tally disagreements; if ≤10%, move on to teaching.
03 Original abstract
Percentage agreement measures of interobserver agreement or "reliability" have traditionally been used to summarize observer agreement from studies using interval recording, time-sampling, and trial-scoring data collection procedures. Recent articles disagree on whether to continue using these percentage agreement measures, and on which ones to use, and what to do about chance agreements if their use is continued. Much of the disagreement derives from the need to be reasonably certain we do not accept as evidence of true interobserver agreement those agreement levels which are substantially probable as a result of chance observer agreement. The various percentage agreement measures are shown to be adequate to this task, but easier ways are discussed. Tables are given to permit checking to see if obtained disagreements are unlikely due to chance. Particularly important is the discovery of a simple rule that, when met, makes the tables unnecessary. If reliability checks using 50 or more observation occasions produce 10% or fewer disagreements, for behavior rates from 10% through 90%, the agreement achieved is quite improbably the result of chance agreement.
Journal of Applied Behavior Analysis, 1979 · doi:10.1901/jaba.1979.12-535