Assessment & Research

Observer agreement, credibility, and judgment: some considerations in presenting observer agreement data.

Kratochwill et al. (1977) · Journal of Applied Behavior Analysis, 1977
★ The Verdict

Add occurrence/non-occurrence agreement and Kappa/Phi to your next IOA report instead of relying solely on percent agreement.

✓ Read this if you are a BCBA who collects direct-observation data for publication or funding review.
✗ Skip if you are a practitioner who only shares raw data with in-house teams and never publishes.

01 Research in Context

01

What this study did

Kratochwill et al. (1977) wrote a conceptual piece, not an experiment. They examined how applied behavior analysis papers presented observer agreement. Most authors gave only one number: total percent agreement. The authors argued that this single number hides important information.

They urged writers to add two more checks: report separate agreement scores for intervals in which the behavior occurred and intervals in which it did not, and add Kappa or Phi to correct for chance agreement.
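A minimal sketch of those calculations in Python, assuming interval-by-interval records coded 1 for occurrence and 0 for non-occurrence; the variable names and example data are illustrative only, not drawn from the paper:

# Interval-by-interval records for two observers: 1 = occurrence, 0 = non-occurrence.
obs_a = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
obs_b = [1, 0, 0, 0, 1, 0, 0, 0, 1, 0]

pairs = list(zip(obs_a, obs_b))
n = len(pairs)

# Total percent agreement: intervals where the two records match.
total_agreement = sum(a == b for a, b in pairs) / n * 100

# Occurrence agreement: both scored an occurrence, out of intervals where either scored one.
occ_intervals = [(a, b) for a, b in pairs if a == 1 or b == 1]
occ_agreement = sum(a == b for a, b in occ_intervals) / len(occ_intervals) * 100

# Non-occurrence agreement: both scored a non-occurrence, out of intervals where either scored one.
non_intervals = [(a, b) for a, b in pairs if a == 0 or b == 0]
non_agreement = sum(a == b for a, b in non_intervals) / len(non_intervals) * 100

# Cohen's Kappa: observed agreement corrected for agreement expected by chance.
p_obs = sum(a == b for a, b in pairs) / n
p_a1 = sum(a for a, _ in pairs) / n   # proportion of intervals Observer A scored occurrence
p_b1 = sum(b for _, b in pairs) / n   # proportion of intervals Observer B scored occurrence
p_chance = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
kappa = (p_obs - p_chance) / (1 - p_chance)

print(f"Total: {total_agreement:.0f}%  Occurrence: {occ_agreement:.0f}%  "
      f"Non-occurrence: {non_agreement:.0f}%  Kappa: {kappa:.2f}")

With these toy records, total agreement is 90% while occurrence agreement is only 75% and Kappa is about 0.78, which is exactly the kind of gap the single headline number conceals.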

02

What they found

The paper reported no new data. Instead it identified a reporting gap. Readers could not tell whether high agreement reflected real consensus or chance. The authors warned that weak reporting hurts the field's credibility.

03

How this fits with other research

Harris et al. (1978) answered the call the very next year with a weighted formula that combines occurrence and non-occurrence scores. Cox et al. (2025) went further, replacing percent agreement with eight alternative indices, such as precision and recall, for cases with very low or very high behavior rates.

Hausman et al. (2022) tested a related worry: do we need many extra IOA sessions? Their re-analysis showed that stable coefficients emerge quickly with well-trained observers, so you can follow Kratochwill et al.'s advice without piling on sessions.

Essig et al. (2023) widened the lens. They audited recent JABA and BAP articles and found that half still skip procedural-fidelity IOA. This gap keeps the credibility problem alive almost fifty years after the target paper sounded the alarm.

04

Why it matters

Next time you write a report, do not stop at 95% total agreement. Add occurrence agreement, non-occurrence agreement, and Kappa or Phi. These extra numbers take five minutes to calculate and show reviewers that you did not hide disagreement. The habit strengthens your study's credibility and keeps pace with newer indices if you later need them.
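A quick worked illustration of why the extra numbers matter (the figures are hypothetical, not from the paper): suppose Observer A scores a behavior in 5 of 100 intervals, Observer B scores it in none, and the two match on all 95 empty intervals.

Total agreement = 95 / 100 = 95%
Occurrence agreement = 0 / 5 = 0%
Chance agreement = (0.05 × 0) + (0.95 × 1.00) = 0.95, so Kappa = (0.95 − 0.95) / (1 − 0.95) = 0

The headline number looks excellent while the chance-corrected index shows no agreement at all on the behavior itself.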

→ Action — try this Monday

Open your last data sheet, calculate occurrence and non-occurrence IOA, and add both numbers to the file note.
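If you work it out by hand, the standard interval-recording formulas (as commonly defined; the wording is not quoted from the paper) are:

Occurrence IOA = (intervals both observers scored an occurrence) ÷ (intervals at least one observer scored an occurrence) × 100
Non-occurrence IOA = (intervals both observers scored a non-occurrence) ÷ (intervals at least one observer scored a non-occurrence) × 100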

02 At a glance

Intervention: not applicable
Design: theoretical (conceptual paper)
Finding: not reported (no new data)

03 Original abstract

Graphical and statistical indices employed to represent observer agreement in interval recording are described as "judgmental aids", stimuli to which the researcher and scientific community must respond when viewing observer agreement data. The advantages and limitations of plotting calibrating observer agreement data and reporting conventional statistical aids are discussed in the context of their utility for researchers and research consumers of applied behavior analysis. It is argued that plotting calibrating observer data is a useful supplement to statistical aids for researchers but is of only limited utility for research consumers. Alternatives to conventional per cent agreement statistics for research consumers include reporting special agreement estimates (e.g., per cent occurrence agreement and nonoccurrence agreement) and correlational statistics (e.g., Kappa and Phi).

Journal of Applied Behavior Analysis, 1977 · doi:10.1901/jaba.1977.10-133