Assessment & Research

Observer agreement, credibility, and judgment: some considerations in presenting observer agreement data.

Kratochwill et al. (1977) · Journal of Applied Behavior Analysis, 1977
★ The Verdict

Add occurrence/non-occurrence agreement and Kappa/Phi to your next IOA report instead of relying solely on percent agreement.

✓ Read this if you are a BCBA who collects direct-observation data for publication or funding review.
✗ Skip if you are a practitioner who only shares raw data with in-house teams and never publishes.

01 Research in Context

01

What this study did

Kratochwill et al. (1977) wrote a conceptual piece, not an experiment. They examined how applied behavior analysis papers presented observer agreement. Most authors gave only one number: total percent agreement. The authors argued that this single number hides important information.

They urged writers to add two more checks: report separate agreement scores for intervals in which the behavior occurred and intervals in which it did not, and add Kappa or Phi to correct for chance agreement.
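A minimal sketch of those calculations in Python, assuming interval-by-interval records coded 1 for occurrence and 0 for non-occurrence; the variable names and example data are illustrative only, not drawn from the paper:

# Interval-by-interval records for two observers: 1 = occurrence, 0 = non-occurrence.
obs_a = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]
obs_b = [1, 0, 0, 0, 1, 0, 0, 0, 1, 0]

pairs = list(zip(obs_a, obs_b))
n = len(pairs)

# Total percent agreement: intervals where the two records match.
total_agreement = sum(a == b for a, b in pairs) / n * 100

# Occurrence agreement: both scored an occurrence, out of intervals where either scored one.
occ_intervals = [(a, b) for a, b in pairs if a == 1 or b == 1]
occ_agreement = sum(a == b for a, b in occ_intervals) / len(occ_intervals) * 100

# Non-occurrence agreement: both scored a non-occurrence, out of intervals where either scored one.
non_intervals = [(a, b) for a, b in pairs if a == 0 or b == 0]
non_agreement = sum(a == b for a, b in non_intervals) / len(non_intervals) * 100

# Cohen's Kappa: observed agreement corrected for agreement expected by chance.
p_obs = sum(a == b for a, b in pairs) / n
p_a1 = sum(a for a, _ in pairs) / n   # proportion of intervals Observer A scored occurrence
p_b1 = sum(b for _, b in pairs) / n   # proportion of intervals Observer B scored occurrence
p_chance = p_a1 * p_b1 + (1 - p_a1) * (1 - p_b1)
kappa = (p_obs - p_chance) / (1 - p_chance)

print(f"Total: {total_agreement:.0f}%  Occurrence: {occ_agreement:.0f}%  "
      f"Non-occurrence: {non_agreement:.0f}%  Kappa: {kappa:.2f}")

With these toy records, total agreement is 90% while occurrence agreement is only 75% and Kappa is about 0.78, which is exactly the kind of gap the single headline number conceals.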

02

What they found

The paper reported no new data. Instead it identified a reporting gap. Readers could not tell whether high agreement reflected real consensus or chance. The authors warned that weak reporting hurts the field's credibility.

03

How this fits with other research

Harris et al. (1978) answered the call the very next year with a weighted formula that combines occurrence and non-occurrence scores. Cox et al. (2025) went further, replacing percent agreement with eight alternative indices, such as precision and recall, for cases with very low or very high behavior rates.

Hausman et al. (2022) tested a related worry: do we need many extra IOA sessions? Their re-analysis showed that stable coefficients emerge quickly with well-trained observers, so you can follow Kratochwill et al.'s advice without piling on sessions.

Essig et al. (2023) widened the lens. They audited recent JABA and BAP articles and found that half still skip procedural-fidelity IOA. This gap keeps the credibility problem alive almost fifty years after the target paper sounded the alarm.

04

Why it matters

Next time you write a report, do not stop at 95% total agreement. Add occurrence agreement, non-occurrence agreement, and Kappa or Phi. These extra numbers take five minutes to calculate and show reviewers that you did not hide disagreement. The habit strengthens your study's credibility and keeps pace with newer indices if you later need them.
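A quick worked illustration of why the extra numbers matter (the figures are hypothetical, not from the paper): suppose Observer A scores a behavior in 5 of 100 intervals, Observer B scores it in none, and the two match on all 95 empty intervals.

Total agreement = 95 / 100 = 95%
Occurrence agreement = 0 / 5 = 0%
Chance agreement = (0.05 × 0) + (0.95 × 1.00) = 0.95, so Kappa = (0.95 − 0.95) / (1 − 0.95) = 0

The headline number looks excellent while the chance-corrected index shows no agreement at all on the behavior itself.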

→ Action — try this Monday

Open your last data sheet, calculate occurrence and non-occurrence IOA, and add both numbers to the file note.
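If you work it out by hand, the standard interval-recording formulas (as commonly defined; the wording is not quoted from the paper) are:

Occurrence IOA = (intervals both observers scored an occurrence) ÷ (intervals at least one observer scored an occurrence) × 100
Non-occurrence IOA = (intervals both observers scored a non-occurrence) ÷ (intervals at least one observer scored a non-occurrence) × 100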

02 At a glance

Intervention: not applicable
Design: theoretical (conceptual paper)
Finding: not reported (no new data)

03 Original abstract

Graphical and statistical indices employed to represent observer agreement in interval recording are described as "judgmental aids", stimuli to which the researcher and scientific community must respond when viewing observer agreement data. The advantages and limitations of plotting calibrating observer agreement data and reporting conventional statistical aids are discussed in the context of their utility for researchers and research consumers of applied behavior analysis. It is argued that plotting calibrating observer data is a useful supplement to statistical aids for researchers but is of only limited utility for research consumers. Alternatives to conventional per cent agreement statistics for research consumers include reporting special agreement estimates (e.g., per cent occurrence agreement and nonoccurrence agreement) and correlational statistics (e.g., Kappa and Phi).

Journal of Applied Behavior Analysis, 1977 · doi:10.1901/jaba.1977.10-133