The influence of data characteristics on interrater agreement among visual analysts
Steep trends and high variability divide visual analysts the most, so stabilize your data before you share it.
01Research in Context
What this study did
Wolfe et al. (2023) asked 36 BCBAs to view 120 single-case graphs on a computer. The graphs varied along six data features: trend, effect size, variability, overlap, data points per phase, and phase length.
The analysts rated whether they saw a clear treatment effect. The team then checked which features made raters agree or disagree most strongly.
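The study's two core quantities are rater agreement per graph and an odds ratio comparing agreement across levels of a data feature. The sketch below is a hypothetical illustration of both calculations, not the authors' actual analysis code; the rating values and counts are made up for the example.

```python
from itertools import combinations

def pairwise_agreement(ratings):
    """Fraction of rater pairs giving the same yes/no judgment on one graph."""
    pairs = list(combinations(ratings, 2))
    return sum(a == b for a, b in pairs) / len(pairs)

def odds_ratio(agree_steep, total_steep, agree_flat, total_flat):
    """Odds of agreement on steep-trend graphs vs. flat-trend graphs."""
    odds_steep = agree_steep / (total_steep - agree_steep)
    odds_flat = agree_flat / (total_flat - agree_flat)
    return odds_steep / odds_flat

# Hypothetical ratings: 1 = "clear treatment effect", 0 = "no clear effect".
graph_ratings = [1, 1, 0, 1]              # four raters on one graph
print(pairwise_agreement(graph_ratings))  # 0.5 — half of the 6 rater pairs agree

# Hypothetical counts: raters agreed on 30 of 60 steep graphs
# but on 48 of 60 flat graphs.
print(odds_ratio(30, 60, 48, 60))         # 0.25 — steep trend cuts the odds of agreement
```

An odds ratio below 1 for a feature means graphs with that feature produce less agreement, which is the pattern the study reports for trend and effect size.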
What they found
Steep trend and large effect size hurt agreement the most. High variability also lowered agreement, but less than trend and effect size. The other three features—overlap, points per phase, and phase length—hardly mattered.
In plain words, when a graph shoots up or down sharply, or when dots bounce widely, BCBAs are more likely to argue about what they see.
How this fits with other research
McGonigle et al. (1982) first showed that experts ignore variability when they eyeball graphs. Wolfe et al. (2023) now show that variability still splits raters, even when they believe they ignore it. The older and newer studies line up: variability quietly sows doubt.
Diller et al. (2016) also ran a survey and found low agreement among BCBAs judging multielement designs. Both papers blame variability and trend, giving the same warning across very different graph types.
Peltier et al. (2024) tried shrinking the y-axis or removing dot boxes and saw no boost in confidence. Wolfe et al. (2023) add that you cannot fix disagreement with layout tricks if the data itself are steep or jumpy.
Why it matters
Before you show a graph to a team, check the slope and bounce. If the line rockets up or the dots scatter wide, add more stable baseline days or tighten the intervention first. Cleaner data make team decisions faster and reduce wasted time arguing over pictures.
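A quick pre-review check on "slope and bounce" can be scripted. This is a minimal sketch assuming an ordinary least-squares slope for trend and the coefficient of variation for variability; the threshold values are hypothetical placeholders you would tune to your own measurement scale.

```python
import statistics

def slope(values):
    """OLS slope of the values against session number (0, 1, 2, ...)."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = statistics.fmean(values)
    num = sum((x - x_mean) * (y - y_mean) for x, y in enumerate(values))
    den = sum((x - x_mean) ** 2 for x in range(n))
    return num / den

def bounce(values):
    """Coefficient of variation: spread relative to the mean level."""
    return statistics.stdev(values) / statistics.fmean(values)

# Hypothetical thresholds — calibrate to your own data before relying on them.
STEEP, JUMPY = 1.0, 0.25

baseline = [12, 15, 9, 14, 20, 8]  # made-up session data
if abs(slope(baseline)) > STEEP or bounce(baseline) > JUMPY:
    print("Collect more stable sessions before the team review.")
```

If either check fires, the study's findings suggest the team is likely to disagree about the graph, so adding sessions first is cheaper than debating later.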
Re-scan this week's graphs: if any trend line looks like a ski slope or the dots look like popcorn, add more sessions to calm the data before the next team review.
03Original abstract
Visual analysis is the primary method of analyzing single-case research data, yet relatively little is known about the variables that influence raters' decisions and rater agreement. Previous research has suggested that trend, variability, and autocorrelation may negatively affect interrater agreement, but studies have been limited by small numbers of graphs and participants whose knowledge of single-case research was not described. The purpose of this study was to examine the main and interaction effects of two values of each of six data characteristics (e.g., level, trend, and number of data points) on agreement among visual analysts. Using data from Lanovaz and Hranchuk (2021), we examined odds ratios to identify data characteristics that influence interrater agreement. Results suggest that trend and effect size, and to a lesser extent variability, have the largest effects on interrater agreement. We discuss the implications of our results for future research on improving interrater agreement among visual analysts.
Journal of Applied Behavior Analysis, 2023 · doi:10.1002/jaba.980