Measuring the reliability of observational data: a reactive process.
Announcing a reliability check inflates observer agreement, so collect some data without telling your coders.
01 Research in Context
What this study did
Romanczyk et al. (1973) asked a simple question: does telling observers you will check their agreement change how they code? They ran two conditions. In the overt condition, observers were told that one of two assessors, identified to them, would check their reliability. In the covert condition, both assessors kept collecting data, but the check stayed hidden.
The team then looked at how often each observer matched the two assessors. They wanted to see whether the simple act of announcing a reliability check skewed the numbers.
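To make that comparison concrete, here is a minimal sketch of interval-by-interval percent agreement, a standard index for observational codes of this kind. The session records and function below are hypothetical, for illustration only:

```python
# Minimal sketch of interval-by-interval percent agreement (hypothetical data).
# 1 = behavior scored in that interval, 0 = not scored.

def percent_agreement(observer, assessor):
    """Percent of observation intervals scored identically by both coders."""
    if len(observer) != len(assessor):
        raise ValueError("Both records must cover the same intervals")
    matches = sum(o == a for o, a in zip(observer, assessor))
    return 100 * matches / len(observer)

observer_record = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]  # trainee observer
assessor_record = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]  # reliability assessor

print(f"Agreement: {percent_agreement(observer_record, assessor_record):.0f}%")
# -> Agreement: 80%
```

Interval-by-interval agreement is strict: both coders must score the same interval the same way, which makes it harder to inflate than a simple comparison of session totals.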
What they found
When observers knew they were being checked, agreement with the identified assessor shot up, while agreement with the unidentified assessor lagged behind. The covert condition gave lower numbers with both assessors, and observers also recorded less disruptive behavior than either assessor. The authors argue the covert figures are the honest ones.
In short, the warning cue alone pushed coders to drift toward the known assessor's criteria.
How this fits with other research
Harris et al. (1978) extended the warning: they showed that even when reliability checks stay in place, watching a teacher hand out praise can nudge observers to over-count eye contact. Three of six coders inflated scores after seeing rewards delivered.
Palmer et al. (2018) echoed the same theme with college students. Just having an experimenter in the room cut off-task behavior in half. Both papers shout the same message: measurement itself changes what you measure.
Fahmie et al. (2013) built a new autism scale and reported decent inter-rater reliability. Yet that claim is open to the same reactivity Romanczyk et al. exposed: if the coders knew a check was coming, their agreement may be artificially rosy.
Why it matters
Next time you train a new RBT to collect data, run covert reliability trials at unannounced times. Rotate second observers quietly into sessions. Report both the open and hidden agreement numbers in your treatment reports. This small step keeps your data honest and your clinical decisions solid.
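One hedged sketch of how "report both numbers" might look in practice: tag each reliability session as overt or covert and average agreement within each tag. All session labels and figures below are hypothetical:

```python
# Minimal sketch: report overt (announced) and covert (unannounced) reliability
# separately. All session labels and agreement figures are hypothetical.
from statistics import mean

sessions = [
    {"condition": "overt",  "agreement": 92.0},
    {"condition": "overt",  "agreement": 95.0},
    {"condition": "covert", "agreement": 71.0},
    {"condition": "covert", "agreement": 68.0},
]

for condition in ("overt", "covert"):
    scores = [s["agreement"] for s in sessions if s["condition"] == condition]
    print(f"{condition}: mean agreement {mean(scores):.1f}% over {len(scores)} sessions")
```

A persistent gap between the two means is your estimate of reactivity; per Romanczyk et al., expect the covert figure to run lower.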
Schedule one silent reliability session this week: have a second observer code without the primary therapist knowing.
02 At a glance
03 Original abstract
Reliability of observational data was measured simultaneously by two assessors under two experimental conditions. During overt assessment, observers were told that reliability would be measured by one of the two assessors, thus permitting computation of reliability with an identified and an unidentified assessor. During covert assessment, observers were not informed that reliability was being measured. Throughout the study, each of the assessors employed a unique version of a standard observational code. In the overt assessment condition, reliability of observers with the identified assessor was consistently higher than reliability with the unidentified assessor, indicating that observers modified their observational criteria to approximate those of the identified assessor. In the covert assessment condition, reliability with the two assessors was substantially lower than during overt assessment. Further, observers consistently recorded lower frequencies of disruptive behavior than the two assessors during covert assessment.
Journal of Applied Behavior Analysis, 1973 · doi:10.1901/jaba.1973.6-175