The effects of instructions and calculation procedures on observers' accuracy, agreement, and calculation correctness.
Have someone outside the observer pair calculate agreement, and stress accuracy rather than high scores, to keep your data clean.
01 Research in Context
What this study did
Researchers asked 48 college students to watch two people code child behavior. The students then calculated how often the two coders agreed.
Half the students were told to be accurate; the other half were told to aim for high agreement. No one knew the real goal was to see whether the calculation itself could bend the results.
What they found
Students who calculated their own agreement scores inflated them, and deflated the scores they calculated on data attributed to other observers. The 'be accurate' group got the numbers right; the 'aim for agreement' group let the math slide to look better.
Simply changing the instructions changed the final data.
How this fits with other research
Morris et al. (2020) also checked if the way you run an assessment changes the outcome. They found the same big idea: the method you pick decides the numbers you get.
Clarke et al. (1998) used observers to rate staff care quality. Their work shows observer codes drive real-world choices, so any bias in the math matters.
Wilkins et al. (2009) and Festinger et al. (1996) both built conclusions on observer-coded social skills. The 1981 paper warns that those very codes can be nudged up or down depending on who does the tally.
Why it matters
Next time you train staff to take data, let a third person do the agreement math. Tell the team to chase correct numbers, not just high agreement. One small change in the instructions can keep your treatment decisions honest.
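The agreement math itself is simple, which is exactly why it is easy to fudge. Below is a minimal sketch of interval-by-interval IOA in Python; it assumes two observers recorded occurrence (1) or nonoccurrence (0) for each interval, a common IOA format, not necessarily the exact procedure used in the 1981 study.

```python
def interval_ioa(observer_a, observer_b):
    """Percent of intervals in which both observers scored the same.

    A common interval-by-interval IOA formula:
    (agreements / total intervals) * 100.
    """
    if len(observer_a) != len(observer_b):
        raise ValueError("Observers must score the same number of intervals")
    agreements = sum(a == b for a, b in zip(observer_a, observer_b))
    return 100 * agreements / len(observer_a)

# Hypothetical session: 10 intervals, the observers match on 8 of them.
a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
b = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
print(interval_ioa(a, b))  # 80.0
```

Having a third person run this calculation, rather than the observers themselves, is precisely the safeguard the study recommends: the formula leaves no room for judgment, so any drift in self-calculated scores is motivated, not mathematical.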
Pick one client's program, have a different staff member calculate IOA this week, and remind the team that 'correct' beats 'high'.
02 At a glance
03 Original abstract
Although the quality of observational data is generally evaluated by observer agreement, measures of both observer agreement and accuracy were available in the present study. Videotapes with a criterion protocol were coded by 16 observers. All observers calculated agreement scores both on their own and their partner's data and on a contrived data set misrepresented as data collected by other observers. Compared with agreement scores calculated by the experimenter, observers erroneously inflated their own agreement scores and deflated the agreement scores on the contrived data. Half of the observers (n = 8) had been given instructions emphasizing the importance of accuracy during observation while the other half had been given instructions emphasizing interobserver agreement. Accuracy exceeded agreement for the former group, whereas agreement exceeded accuracy for the latter group. The implications are that agreement should be calculated by the experimenter and that the accuracy-agreement relationship can be altered by differential observer instructions.
Journal of Applied Behavior Analysis, 1981 · doi:10.1901/jaba.1981.14-479