Observation procedure, observer gender, and behavior valence as determinants of sampling error in a behavior assessment analogue.
Train observers with calibrated videos and check for bias monthly: systematic beats casual every time.
01 Research in Context
What this study did
The researchers set up a lab task to test how well people record behavior.
Some used a clear step-by-step plan. Others just watched and wrote notes.
They also looked at whether the observer’s gender and the type of behavior changed the scores.
What they found
The group with the plan made fewer errors.
Unsystematic watching led to shaky data.
Observer gender and whether the behavior was good or bad also shifted the numbers.
How this fits with other research
Spanoudis et al. (2011) give you the next step. They show how to calibrate observers with scripted videos until every score is within 0.1 responses per minute. Their method turns the 1980 “use a plan” idea into a day-one training tool.
Kazdin (1977) warned that drift and expectancies spoil data. This 1980 study demonstrated it, showing gender and valence bias in real numbers.
Abuin et al. (2026) extend the same worry to treatment fidelity: they found that 50% fidelity can outscore 100% fidelity unless conditions are switched quickly and without cues. All of these studies carry the same message: tiny procedural details move the graph.
Why it matters
Before you let a new tech collect data, have them watch a gold-standard video and score within 0.1 responses per minute of the master key. Pick reference clips that mix positive and negative behaviors so gender or likeability bias shows up early. Recalibrate each month. These steps take about twenty minutes and save you from bad calls on treatment change.
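The calibration check above can be sketched in a few lines. This is a minimal illustration, not a tool from the study: the observer names, counts, clip length, and the `needs_retraining` helper are all made up; only the 0.1 responses-per-minute criterion comes from the text.

```python
# Hypothetical calibration check: compare each observer's count on a
# reference video against the gold-standard (master) key.
# All values below are illustrative.

MASTER_COUNT = 24      # responses recorded in the master key
CLIP_MINUTES = 10      # length of the reference video, in minutes
TOLERANCE = 0.1        # allowed deviation, in responses per minute

def needs_retraining(observer_count: int,
                     master_count: int = MASTER_COUNT,
                     clip_minutes: float = CLIP_MINUTES,
                     tolerance: float = TOLERANCE) -> bool:
    """True if the observer's rate differs from the key by more than tolerance."""
    deviation = abs(observer_count - master_count) / clip_minutes
    return deviation > tolerance

# Each staff member scores the same clip; flag anyone off the key.
scores = {"tech_a": 25, "tech_b": 24, "tech_c": 20}
flagged = [name for name, count in scores.items() if needs_retraining(count)]
print(flagged)  # tech_c is 0.4 responses/min off the key -> ['tech_c']
```

Running the same check on next month's recalibration clip keeps the comparison consistent across staff.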
Pick one mastered client video, have each staff member score it, and compare their scores to the master key; retrain until every count is within 0.1 responses per minute.
03 Original abstract
Several factors thought to influence the representativeness of behavioral assessment data were examined in an analogue study using a multifactorial design. Systematic and unsystematic methods of observing group behavior were investigated using 18 male and 18 female observers. Additionally, valence properties of the observed behaviors were inspected. Observers' assessments of a videotape were compared to a criterion code that defined the population of behaviors. Results indicated that systematic observation procedures were more accurate than unsystematic procedures, though this factor interacted with gender of observer and valence of behavior. Additionally, males tended to sample more representatively than females. A third finding indicated that the negatively valenced behavior was overestimated, whereas the neutral and positively valenced behaviors were accurately assessed.
Journal of Applied Behavior Analysis, 1980 · doi:10.1901/jaba.1980.13-529