Reliability in the context of the experiment: A commentary on two articles by Birkimer and Brown.
Track observer agreement across the whole study and let the stability of those numbers decide whether your single-case data are trustworthy.
01 Research in Context
What this study did
Yelton (1979) wrote a short, sharp commentary on two earlier articles by Birkimer and Brown. He took their ideas about observer agreement and stretched them into a full system for judging data quality inside any single-case experiment. The paper is pure theory—no new data, just a blueprint for practitioners who want to know when their numbers are solid enough to trust.
What they found
The core message is simple: stop treating observer agreement as a one-time checkbox. Instead, use those agreement scores to measure point-by-point variability across the whole study. High, stable agreement means the data picture is clear; dips or jumps warn you the picture is blurry. If the variability is too wild, the experiment is not ready for a final call.
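To make that concrete, here is a minimal Python sketch of continuous agreement tracking: interval-by-interval percentage agreement is computed for every session, and sessions that dip below a threshold get flagged. The session records and the 80% warning line are invented for illustration; they are not values from Yelton (1979).

```python
# Minimal sketch: track interval-by-interval observer agreement across every
# session of a study and flag sessions where agreement dips. The session
# records and the 80% warning threshold are illustrative assumptions, not
# values from Yelton (1979).

def percent_agreement(obs1, obs2):
    """Interval-by-interval percentage agreement between two observers."""
    matches = sum(a == b for a, b in zip(obs1, obs2))
    return 100.0 * matches / len(obs1)

# Each session: parallel interval records (1 = behavior scored, 0 = not scored).
sessions = {
    1: ([1, 0, 1, 1, 0, 1, 0, 0], [1, 0, 1, 1, 0, 1, 0, 0]),
    2: ([1, 1, 0, 1, 0, 0, 1, 0], [1, 0, 0, 1, 0, 0, 1, 0]),
    3: ([0, 1, 1, 0, 1, 1, 0, 1], [1, 1, 0, 0, 1, 0, 0, 1]),
}

agreement_by_session = {s: percent_agreement(o1, o2) for s, (o1, o2) in sessions.items()}

for session, pct in agreement_by_session.items():
    flag = "  <-- below threshold, data picture is blurry here" if pct < 80 else ""
    print(f"Session {session}: {pct:.0f}% agreement{flag}")
```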
How this fits with other research
Rider (1977) set the table two years earlier by mapping two basic ways to calculate reliability—percentage agreement for trial-level checks and correlational indices for session-level trends. Yelton (1979) keeps both tools but adds the new rule: track them continuously, not just at the start.
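A hedged sketch of those two families of indices, using made-up observer records: trial-by-trial percentage agreement within one session, and a Pearson correlation between the two observers' session totals across the study. The specific data and the use of Python's statistics.correlation are illustrative choices, not Rider's procedure.

```python
# Sketch of the two reliability families Rider (1977) distinguishes:
# trial-level percentage agreement and a session-level correlational index.
# The observer records below are invented for illustration.
from statistics import correlation  # requires Python 3.10+

def percent_agreement(obs1, obs2):
    """Trial-by-trial percentage agreement."""
    return 100.0 * sum(a == b for a, b in zip(obs1, obs2)) / len(obs1)

# One session of trial-level records from two observers.
trials_obs1 = [1, 0, 1, 1, 0, 1]
trials_obs2 = [1, 0, 1, 0, 0, 1]
print(f"Trial-level agreement: {percent_agreement(trials_obs1, trials_obs2):.0f}%")

# Session totals across the study for both observers: a correlational index
# tells you whether the observers track the same session-to-session trend.
totals_obs1 = [12, 15, 9, 20, 17, 11]
totals_obs2 = [11, 16, 10, 19, 18, 12]
print(f"Session-level correlation (Pearson r): {correlation(totals_obs1, totals_obs2):.2f}")
```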
Cook et al. (2020) give a modern example of the same spirit. They show how to slip brief duration probes into momentary time-sampling sessions so you can spot measurement drift while the study runs—exactly the kind of live quality check Yelton (1979) was asking for.
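The sketch below only illustrates the general idea, not Cook et al.'s actual procedure: a momentary-time-sampling estimate for a session is compared against a brief continuous-duration probe of the same session, and a large gap is treated as a drift warning. The interval length, probe values, and 15-point criterion are assumptions.

```python
# Rough sketch of the idea behind Cook et al. (2020): alongside momentary
# time sampling (MTS), run an occasional continuous-duration probe and compare
# the two estimates so measurement drift shows up while the study is running.
# The interval length, probe data, and 15-point drift threshold are
# illustrative assumptions, not Cook et al.'s actual procedure.

def mts_estimate(momentary_checks):
    """Percent of intervals in which the behavior was occurring at the check."""
    return 100.0 * sum(momentary_checks) / len(momentary_checks)

def duration_estimate(seconds_engaged, session_seconds):
    """Percent of the session the behavior actually occupied, from a timed probe."""
    return 100.0 * seconds_engaged / session_seconds

# One session: 1 = behavior observed at the 30-s check, 0 = not observed.
checks = [1, 1, 0, 1, 0, 0, 1, 1, 1, 0]   # 10 checks -> 5-minute session
mts_pct = mts_estimate(checks)             # 60%

# Brief duration probe run on the same session: 150 s engaged out of 300 s.
probe_pct = duration_estimate(150, 300)    # 50%

drift = abs(mts_pct - probe_pct)
print(f"MTS estimate: {mts_pct:.0f}%, duration probe: {probe_pct:.0f}%, gap: {drift:.0f} points")
if drift > 15:
    print("Gap exceeds the (assumed) 15-point criterion -- check the measurement system.")
```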
Wilder et al. (2023) push the idea beyond observer agreement. They argue that procedural-fidelity percentages hide useful rate information, just like raw agreement percentages can hide variability. Both papers say the same thing: add a rate metric if you want to see the real story.
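A toy illustration of why a rate metric matters, with invented numbers rather than anything from Wilder et al. (2023): two sessions share the same fidelity percentage but differ five-fold in implementation errors per minute.

```python
# Sketch of the point made by Wilder et al. (2023): a fidelity percentage
# can hide rate information. Two sessions with the same percentage of
# correctly implemented steps can involve very different error rates once
# session length is taken into account. The numbers are invented.

def fidelity_percent(correct_steps, total_steps):
    return 100.0 * correct_steps / total_steps

def error_rate_per_min(errors, session_minutes):
    return errors / session_minutes

# Session A: 18 of 20 steps correct in 10 minutes.
# Session B: 90 of 100 steps correct in 10 minutes.
for label, correct, total, minutes in [("A", 18, 20, 10), ("B", 90, 100, 10)]:
    errors = total - correct
    print(f"Session {label}: fidelity {fidelity_percent(correct, total):.0f}%, "
          f"errors/min {error_rate_per_min(errors, minutes):.1f}")
# Both sessions read 90% fidelity, but Session B exposes the client to five
# times as many implementation errors per minute.
```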
Sasson et al. (2018) close the loop by giving a ready-made reporting scaffold. They turn the old plea for transparent data into fill-in-the-blank language you can drop into your next single-case paper so meta-analysts can judge adequacy without guessing.
Why it matters
Next time you graph a client’s data, pick three random sessions and run fresh agreement checks. If the new numbers match your original reliability file and stay flat across sessions, you can rest easy about those data. If they wobble, retrain observers or collect more data before you claim victory. It takes ten minutes and saves your reputation.
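A rough sketch of that spot check, assuming you keep per-session agreement values in a reliability file; the data, the 80% floor, and the 10-point tolerance for "matching" the original file are illustrative assumptions.

```python
# Sketch of the spot check described above: re-score three randomly chosen
# sessions, compute fresh agreement, and compare against the agreement values
# in the original reliability file. The data, the 80% floor, and the
# 10-point "match" tolerance are illustrative assumptions.
import random

def percent_agreement(obs1, obs2):
    return 100.0 * sum(a == b for a, b in zip(obs1, obs2)) / len(obs1)

# Original per-session agreement (%) from the reliability file.
original_ioa = {1: 95, 2: 90, 3: 88, 4: 92, 5: 94, 6: 91}

# Fresh interval records from the re-scored sessions (primary, secondary observer).
rescored = {
    2: ([1, 0, 1, 1, 0, 1, 0, 1], [1, 0, 1, 1, 0, 1, 0, 1]),
    4: ([0, 1, 1, 0, 1, 0, 1, 1], [0, 1, 1, 0, 1, 0, 0, 1]),
    5: ([1, 1, 0, 0, 1, 1, 0, 0], [1, 1, 0, 1, 1, 1, 0, 0]),
}

sampled = random.sample(sorted(rescored), k=3)
for s in sampled:
    fresh = percent_agreement(*rescored[s])
    drift = abs(fresh - original_ioa[s])
    verdict = "OK" if fresh >= 80 and drift <= 10 else "retrain observers / collect more data"
    print(f"Session {s}: fresh IOA {fresh:.0f}% vs file {original_ioa[s]}% -> {verdict}")
```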
Add a standing column to your data sheet labeled ‘IOA tonight’ and calculate agreement on the last three data points before every team meeting.
02 At a glance
03 Original abstract
Two sources of variability must each be considered when examining change in level between two sets of data obtained by human observers; namely, variance within data sets (phases) and variability attributed to each data point (reliability). Birkimer and Brown (1979a, 1979b) have suggested that both chance levels and disagreement bands be considered in examining observer reliability and have made both methods more accessible to researchers. By clarifying and extending Birkimer and Brown's papers, a system is developed using observer agreement to determine the data point variability and thus to check the adequacy of obtained data within the experimental context.
Journal of Applied Behavior Analysis, 1979 · doi:10.1901/jaba.1979.12-565