
The effects of instructions and calculation procedures on observers' accuracy, agreement, and calculation correctness.

Boykin et al. (1981) · Journal of Applied Behavior Analysis
★ The Verdict

Have someone outside the observer pair calculate agreement, and stress accuracy rather than high scores, to keep your data clean.

✓ Read this if you're a BCBA who uses partial-interval recording, momentary time sampling, or any observer-agreement measure in a clinic or school.
✗ Skip if you only take permanent-product data with no observer judgment.

01 Research in Context

01 What this study did

Researchers had 16 college students code child behavior from videotapes that came with a criterion protocol, a scored answer key for what actually happened. Each observer then calculated agreement scores on their own and their partner's data, and on a contrived data set presented as another pair's data.

Half the observers were told to emphasize accuracy; the other half were told to aim for high agreement. No one knew the real goal was to see whether the calculations themselves could bend the results.
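
In this context, "agreement" usually means interval-by-interval IOA (interobserver agreement): the percentage of intervals on which two observers recorded the same code. Here is a minimal Python sketch of that arithmetic, with illustrative data that is not from the paper:

```python
# Interval-by-interval interobserver agreement (IOA).
# Each list holds one 0/1 code per observation interval.

def interval_ioa(observer_a, observer_b):
    """Return percent agreement across intervals for two observers."""
    if len(observer_a) != len(observer_b):
        raise ValueError("Observers must score the same number of intervals")
    agreements = sum(a == b for a, b in zip(observer_a, observer_b))
    return 100 * agreements / len(observer_a)

# Example: two observers scoring the same 10 intervals.
obs_a = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
obs_b = [1, 0, 1, 0, 0, 0, 1, 1, 1, 1]
print(f"IOA = {interval_ioa(obs_a, obs_b):.0f}%")  # prints: IOA = 80%
```

The division itself is trivial; the study's point is that when the people being evaluated do the tally by hand, the counts feeding it drift in their favor.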

02 What they found

Observers who calculated their own scores inflated their own agreement and deflated the agreement on the contrived data set. In the 'be accurate' group, accuracy exceeded agreement; in the 'aim for agreement' group, agreement exceeded accuracy, so observers matched each other better than they matched the criterion.

Simply changing the instructions changed the final data.

03 How this fits with other research

Morris et al. (2020) also checked if the way you run an assessment changes the outcome. They found the same big idea: the method you pick decides the numbers you get.

Clarke et al. (1998) used observers to rate staff care quality. Their work shows observer codes drive real-world choices, so any bias in the math matters.

Wilkins et al. (2009) and Festinger et al. (1996) both built conclusions on observer-coded social skills. The 1981 paper warns that those very codes can be nudged up or down by who does the tally.

04 Why it matters

Next time you train staff to take data, let a third person do the agreement math. Tell the team to chase correct numbers, not just high agreement. One small change in the instructions can keep your treatment decisions honest.

→ Action — try this Monday

Pick one client's program, have a different staff member calculate IOA this week, and remind the team that 'correct' beats 'high'.

02 At a glance

Intervention: not applicable
Design: other
Sample size: 16
Population: neurotypical
Finding: mixed

03 Original abstract

Although the quality of observational data is generally evaluated by observer agreement, measures of both observer agreement and accuracy were available in the present study. Videotapes with a criterion protocol were coded by 16 observers. All observers calculated agreement scores both on their own and their partner's data and on a contrived data set misrepresented as data collected by other observers. Compared with agreement scores calculated by the experimenter, observers erroneously inflated their own agreement scores and deflated the agreement scores on the contrived data. Half of the observers (n = 8) had been given instructions emphasizing the importance of accuracy during observation while the other half had been given instructions emphasizing interobserver agreement. Accuracy exceeded agreement for the former group, whereas agreement exceeded accuracy for the latter group. The implications are that agreement should be calculated by the experimenter and that the accuracy-agreement relationship can be altered by differential observer instructions.

Journal of Applied Behavior Analysis, 1981 · doi:10.1901/jaba.1981.14-479