Interrater agreement between visual analysts of single-case data: a meta-analysis.
Two trained raters agree on the same graph only about 76% of the time, so add clear decision rules and visual aids before you call an intervention effective.
01 Research in Context
What this study did
The team pooled 19 published studies (32 separate effects) in which pairs of trained raters judged the same single-case graphs.
They asked: how often do two people agree that a line is flat, rising, or falling?
The meta-analysis covered 1,000+ graphs from classrooms, clinics, and labs.
What they found
Across all studies, weighted agreement averaged only .76, or about 76%.
That means roughly one graph in four drew different calls from two trained viewers.
Agreement dropped even lower when trends were small or data were messy.
How this fits with other research
Lam et al. (2011) saw the same risk with numbers: switching from continuous recording to 10-s partial-interval recording inflated interobserver agreement (IOA) scores. Both papers warn that the method you pick can hide poor agreement.
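A minimal sketch (hypothetical observer data, not Lam et al.'s) of how that inflation happens: when two observers record very different event counts, total-count IOA shows the disagreement, but coarse 10-s partial-interval scoring can still return near-perfect agreement because empty intervals and repeated responses within one interval all count as matches.

```python
# Hypothetical observers logging event times (seconds) in one 5-minute session.
# Numbers are illustrative, not data from Lam et al. (2011).
def total_count_ioa(count_a, count_b):
    # Smaller count divided by larger count.
    if max(count_a, count_b) == 0:
        return 1.0
    return min(count_a, count_b) / max(count_a, count_b)

def partial_interval_ioa(times_a, times_b, session_s=300, interval_s=10):
    # Score each 10-s interval yes/no per observer, then count matching intervals.
    n_bins = session_s // interval_s
    bins_a = {int(t // interval_s) for t in times_a}
    bins_b = {int(t // interval_s) for t in times_b}
    agreements = sum((i in bins_a) == (i in bins_b) for i in range(n_bins))
    return agreements / n_bins

obs_a = [12, 14, 15, 45, 46, 130, 131, 132, 200]   # 9 responses recorded
obs_b = [13, 47, 133, 205]                          # only 4 responses recorded

print(round(total_count_ioa(len(obs_a), len(obs_b)), 2))  # 0.44 - large disagreement
print(round(partial_interval_ioa(obs_a, obs_b), 2))       # 1.00 - looks perfect
```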
Evers et al. (2020) found a similarly moderate level of overlap between two popular ASD parent interviews. Together the studies show that moderate agreement is common across very different tools: graphs, checklists, and interviews.
Samyn et al. (2015) showed questionnaires and lab tasks measure different skills, so you can’t swap them. Likewise, the meta-analysis authors remind us that eyeballing graphs without rules is not the same as using a defined rubric.
Why it matters
Before you write “intervention worked” in a report, pair up and score the graph with a coworker. Use a simple rule sheet: the trend must hold across at least three data points, no overlap with baseline, and so on. Add median or split-middle trend lines to the chart so both raters look at the same cue. These quick steps push your team past the shaky 76% mark and give parents, teachers, and payers clearer proof.
Print two copies of your last client graph, add a split-middle trend line, and have a colleague score it using the same three-rule checklist.
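Here is a rough Python sketch of that exercise, assuming plain lists of session values. The split-middle slope and the percentage-of-non-overlapping-data check are standard conventions, but the three-rule checklist and the data are illustrative, not criteria validated in the meta-analysis.

```python
# A minimal sketch for scoring one A-B graph with shared cues. The data and the
# pass/fail thresholds are illustrative, not criteria from the meta-analysis.
from statistics import median

def split_middle_slope(ys):
    # Split-middle trend: line through the median point of each half of the phase.
    n = len(ys)
    first, second = ys[: n // 2], ys[(n + 1) // 2 :]
    x1, y1 = (len(first) - 1) / 2, median(first)
    x2, y2 = (n + 1) // 2 + (len(second) - 1) / 2, median(second)
    return (y2 - y1) / (x2 - x1)

def percent_nonoverlap(baseline, treatment, increase_expected=True):
    # PND: share of treatment points beyond the most extreme baseline point.
    cutoff = max(baseline) if increase_expected else min(baseline)
    beyond = [(y > cutoff) if increase_expected else (y < cutoff) for y in treatment]
    return sum(beyond) / len(treatment)

baseline = [2, 3, 2, 4, 3]            # hypothetical sessions before intervention
treatment = [5, 6, 6, 8, 9, 9, 10]    # hypothetical sessions after intervention

rules = {
    "rising split-middle trend": split_middle_slope(treatment) > 0,
    "no overlap with baseline": percent_nonoverlap(baseline, treatment) == 1.0,
    "enough treatment sessions": len(treatment) >= 3,
}
print(rules)  # both raters apply the same checklist to the same printed graph
```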
02 At a glance
03 Original abstract
Visual analysis is the most widely applied method of data interpretation for single-case research as it encompasses multifaceted considerations relevant to evaluating behavior change. However, a previous research synthesis found low levels of interrater agreement between visually analyzed ratings of graphed data across all variables under analysis. The purpose of this meta-analysis was to evaluate the peer-reviewed literature to date for potential moderators affecting the proportion of interrater agreement between visual analysts. Nineteen articles with 32 effects were assembled. Potential moderators evaluated included (a) design families, (b) rater expertise, (c) the provision of contextual information for graphs, (d) the use of visual aids, (e) the provision of an operational definition of the construct being rated, and (f) rating scale ranges. Results yielded an overall weighted interrater agreement proportion of .76. Moderator variables identified produced low to adequate levels of interrater agreement. Practical recommendations for future research are discussed.
Behavior Modification, 2015 · doi:10.1177/0145445515581327
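For readers curious how a pooled figure like .76 is built, here is a simplified sketch of a sample-size-weighted mean agreement proportion. The study values are invented, and the actual meta-analysis may have weighted effects differently.

```python
# Simplified pooling of study-level agreement proportions; the numbers below
# are invented and the real analysis may use different weights.
studies = [
    {"agreement": 0.71, "n_graphs": 40},
    {"agreement": 0.82, "n_graphs": 120},
    {"agreement": 0.68, "n_graphs": 25},
]

total_n = sum(s["n_graphs"] for s in studies)
pooled = sum(s["agreement"] * s["n_graphs"] for s in studies) / total_n
print(round(pooled, 2))  # weighted average agreement across studies
```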