A review of the observational data-collection and reliability procedures reported in the Journal of Applied Behavior Analysis.
Early JABA papers under-checked observer agreement. Keep your own reliability at 90% or above, and measure it in every condition.
01 Research in Context
What this study did
The author read every paper in JABA from 1968 through 1975. He counted how many used direct observation. He noted which studies reported reliability checks and how often.
What the study found
Three out of four studies watched behavior live. Almost all said "we checked reliability." Yet only about half kept agreement scores consistently above the 90% mark. Fewer than one in four checked reliability in every condition.
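For readers who want to see the arithmetic, here is a minimal sketch of the two scores the survey found most often: total percentage agreement and point-by-point (interval-by-interval) agreement. The interval records are hypothetical, invented for illustration; they are not data from the paper.

```python
# Minimal sketch of the two agreement scores the survey found most often.
# The interval records below are hypothetical, not data from the paper.

def total_agreement(primary, secondary):
    # Total percentage agreement: compare each observer's overall count,
    # then take smaller/larger * 100.
    c1, c2 = sum(primary), sum(secondary)
    return 100.0 if max(c1, c2) == 0 else min(c1, c2) / max(c1, c2) * 100

def point_by_point_agreement(primary, secondary):
    # Point-by-point (interval-by-interval) agreement:
    # agreements / (agreements + disagreements) * 100.
    agreements = sum(a == b for a, b in zip(primary, secondary))
    return agreements / len(primary) * 100

# 1 = behavior scored in that interval, 0 = not scored
primary   = [1, 0, 1, 1, 0, 1, 0, 1, 1, 0]
secondary = [1, 0, 1, 0, 0, 1, 1, 1, 1, 0]

print(f"total agreement: {total_agreement(primary, secondary):.0f}%")          # 100% (6 vs 6)
print(f"point-by-point:  {point_by_point_agreement(primary, secondary):.0f}%") # 80% (8 of 10)
```

Note that total agreement reads 100% here even though the observers disagreed on two individual intervals. That is one reason point-by-point scoring is the stricter of the two checks.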
How this fits with other research
Lancioni et al. (2008) later showed the eyeball method itself is shaky. Raters looking at the same FA graphs agreed only about half the time. This builds on the 1977 review by showing that weak reliability is not just under-reported; it is also hard to achieve.
Spanoudis et al. (2011) offered a fix: calibrate observers against scripted videos instead of trusting a second human. Their observers' accuracy landed within 0.1 responses per minute of the scripted values. This successor paper suggests the field has moved beyond simple inter-observer agreement.
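As a rough illustration of that calibration logic (the function and numbers here are hypothetical, not Spanoudis et al.'s actual procedure), an observer's scored rate can be checked against the rate scripted into a criterion video:

```python
# Hypothetical sketch of criterion-video calibration: check an observer's
# scored rate against the rate scripted into a gold-standard video.

def calibrated(observer_count, scripted_count, session_minutes, tolerance=0.1):
    # True when the observer's rate falls within `tolerance` responses per
    # minute of the scripted rate (0.1 is the figure reported above).
    observed = observer_count / session_minutes
    scripted = scripted_count / session_minutes
    return abs(observed - scripted) <= tolerance

# A 10-minute video scripted to contain 42 target responses:
print(calibrated(observer_count=41, scripted_count=42, session_minutes=10))  # True  (off by 0.1/min)
print(calibrated(observer_count=38, scripted_count=42, session_minutes=10))  # False (off by 0.4/min)
```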
Taylor et al. (2022) added computer tools. Objective algorithms matched visual inspection on only 84% of graphs. The gap echoes the original review's warning: without tight checks, our conclusions wobble.
Why it matters
If you run sessions without fresh reliability data, you may be chasing noise. Check agreement at least once per condition and aim for at least 90%. Better yet, calibrate observers with gold-standard videos or add an algorithmic backup. Solid data make solid decisions.
Try it this week: schedule one reliability check in each condition and score it live.
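One way to keep yourself honest about that tip (a minimal sketch; the session log and condition names are invented): group each session's agreement score by condition and flag anything missing or below 90%.

```python
# Hypothetical session log: flag conditions that still lack a reliability
# check, or whose agreement fell below the 90% benchmark.
from collections import defaultdict

sessions = [
    {"condition": "baseline",     "ioa": 94.0},
    {"condition": "baseline",     "ioa": None},  # no second observer that day
    {"condition": "intervention", "ioa": 86.0},
    {"condition": "follow-up",    "ioa": None},
]

scores_by_condition = defaultdict(list)
for s in sessions:
    if s["ioa"] is not None:
        scores_by_condition[s["condition"]].append(s["ioa"])

for condition in dict.fromkeys(s["condition"] for s in sessions):  # keep session order
    scores = scores_by_condition.get(condition, [])
    if not scores:
        print(f"{condition}: no reliability check yet -- schedule one")
    elif min(scores) < 90.0:
        print(f"{condition}: agreement dipped to {min(scores):.0f}% -- retrain or recalibrate")
    else:
        print(f"{condition}: on track (lowest score {min(scores):.0f}%)")
```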
02 At a glance
- Survey scope: every JABA study from 1968 through 1975.
- Three-quarters of studies reported observational data.
- Common methods: event recording, trial scoring, interval recording, and time-sample recording.
- Almost all studies reported observer reliability, usually as total or point-by-point percentage agreement.
- About half of the agreement scores stayed consistently above 90%.
- Fewer than one in four studies assessed reliability at least once per condition.
03 Original abstract
The research published in the Journal of Applied Behavior Analysis (1968 to 1975) was surveyed for three basic elements: data-collection methods, reliability procedures, and reliability scores. Three-quarters of the studies reported observational data. Most of these studies' observational methods were variations of event recording, trial scoring, interval recording, or time-sample recording. Almost all studies reported assessment of observer reliability, usually total or point-by-point percentage agreement scores. About half the agreement scores were consistently above 90%. Less than one-quarter of the studies reported that reliability was assessed at least once per condition.
Journal of Applied Behavior Analysis, 1977 · doi:10.1901/jaba.1977.10-97