
The effects of instructions and calculation procedures on observers' accuracy, agreement, and calculation correctness.

Boykin et al. (1981) · Journal of Applied Behavior Analysis
★ The Verdict

Have someone outside the observer pair calculate agreement, and stress accuracy rather than high scores, to keep your data clean.

✓ Read this if you're a BCBA who uses partial-interval recording, momentary time sampling, or any observer-agreement measure in a clinic or school.
✗ Skip if you only take permanent-product data with no observer judgment.

01 Research in Context

01 What this study did

Researchers had 16 college students code child behavior from videotapes that came with a criterion protocol, a scored answer key for what actually happened. Each observer then calculated agreement scores on their own and their partner's data, and on a contrived data set presented as another pair's data.

Half the observers were told to emphasize accuracy; the other half were told to aim for high agreement. No one knew the real goal was to see whether the calculations themselves could bend the results.
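
In this context, "agreement" usually means interval-by-interval IOA (interobserver agreement): the percentage of intervals on which two observers recorded the same code. Here is a minimal Python sketch of that arithmetic, with illustrative data that is not from the paper:

```python
# Interval-by-interval interobserver agreement (IOA).
# Each list holds one 0/1 code per observation interval.

def interval_ioa(observer_a, observer_b):
    """Return percent agreement across intervals for two observers."""
    if len(observer_a) != len(observer_b):
        raise ValueError("Observers must score the same number of intervals")
    agreements = sum(a == b for a, b in zip(observer_a, observer_b))
    return 100 * agreements / len(observer_a)

# Example: two observers scoring the same 10 intervals.
obs_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
obs_b = [1, 0, 1, 0, 0, 0, 1, 1, 1, 1]
print(f"IOA = {interval_ioa(obs_a, obs_b):.0f}%")  # prints: IOA = 80%
```

The division itself is trivial; the study's point is that when the people being evaluated do the tally by hand, the counts feeding it drift in their favor.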

02 What they found

Observers who calculated their own scores inflated their own agreement and deflated the agreement on the contrived data set. In the 'be accurate' group, accuracy exceeded agreement; in the 'aim for agreement' group, agreement exceeded accuracy, so observers matched each other better than they matched the criterion.

Simply changing the instructions changed the final data.

03 How this fits with other research

Morris et al. (2020) also checked if the way you run an assessment changes the outcome. They found the same big idea: the method you pick decides the numbers you get.

Clarke et al. (1998) used observers to rate staff care quality. Their work shows observer codes drive real-world choices, so any bias in the math matters.

Wilkins et al. (2009) and Festinger et al. (1996) both built conclusions on observer-coded social skills. The 1981 paper warns that those very codes can be nudged up or down by who does the tally.

04 Why it matters

Next time you train staff to take data, let a third person do the agreement math. Tell the team to chase correct numbers, not just high agreement. One small change in the instructions can keep your treatment decisions honest.

→ Action — try this Monday

Pick one client's program, have a different staff member calculate IOA this week, and remind the team that 'correct' beats 'high'.

02 At a glance

Intervention: not applicable
Design: other
Sample size: 16
Population: neurotypical
Finding: mixed

03 Original abstract

Although the quality of observational data is generally evaluated by observer agreement, measures of both observer agreement and accuracy were available in the present study. Videotapes with a criterion protocol were coded by 16 observers. All observers calculated agreement scores both on their own and their partner's data and on a contrived data set misrepresented as data collected by other observers. Compared with agreement scores calculated by the experimenter, observers erroneously inflated their own agreement scores and deflated the agreement scores on the contrived data. Half of the observers (n = 8) had been given instructions emphasizing the importance of accuracy during observation while the other half had been given instructions emphasizing interobserver agreement. Accuracy exceeded agreement for the former group, whereas agreement exceeded accuracy for the latter group. The implications are that agreement should be calculated by the experimenter and that the accuracy-agreement relationship can be altered by differential observer instructions.

Journal of Applied Behavior Analysis, 1981 · doi:10.1901/jaba.1981.14-479