Inconsistent visual analyses of intrasubject data.
Visual inspection is shaky; draw clear phase-mean lines and always get a second reviewer.
Research in Context
What this study did
The authors constructed 36 "ABAB reversal" graphs and mailed them to 250 reviewers of behavioral journals. Each graph showed one subject's data across alternating baseline and treatment phases. Reviewers rated each figure on a 100-point scale of "experimental control" and described the criteria they used.
No statistics were involved. The study wanted to see how closely reviewers agreed by eye alone.
What they found
Mean interrater agreement was only 0.61. Reviewers agreed most when the pattern of phase-mean changes matched the hypothesized treatment effect; the sheer size of the mean shift and the amount of variability mattered less. Graphic features worked in concert rather than singly.
In short, messy graphs made people disagree.
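The agreement statistic above can be illustrated with a toy computation. A minimal sketch with hypothetical ratings, assuming "agreement" means the mean pairwise Pearson correlation of reviewers' 100-point scores (the paper does not spell out its exact formula):

```python
from itertools import combinations
from statistics import mean, stdev

def pearson(xs, ys):
    """Sample Pearson correlation between two equal-length rating lists."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return cov / ((len(xs) - 1) * stdev(xs) * stdev(ys))

def mean_interrater_agreement(ratings):
    """Average Pearson correlation over all reviewer pairs.

    ratings: dict mapping reviewer -> list of 100-point scores,
    one score per graph (made-up data, not the study's)."""
    return mean(pearson(a, b) for a, b in combinations(ratings.values(), 2))

# Three hypothetical reviewers scoring the same five graphs.
ratings = {
    "r1": [90, 20, 75, 10, 60],
    "r2": [85, 30, 70, 15, 55],
    "r3": [60, 40, 80, 35, 50],
}
print(round(mean_interrater_agreement(ratings), 2))
```

With ratings this aligned the average correlation is high; the study's 0.61 implies far more disagreement across its 250 reviewers.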
How this fits with other research
Nelson et al. (1978) warned that high autocorrelation makes visual and statistical answers clash. DeProspero and Cohen (1979) now show that, even without statistics, visual judgments can clash with one another on their own.
Branch (2019) later claimed that behavior analysis can save psychology from the “reproducibility crisis” because we replicate and visually inspect. The 1979 data say: first fix the visual inspection step, or the crisis walks right into our own journals.
Gilroy et al. (2018) offered a new number-crunching tool for discounting curves. Their spirit matches DeProspero and Cohen's: when old eyeball methods wobble, tighten the metric or the graph.
Why it matters
Your graphs are your main evidence. Make phase means jump off the page: bold mean lines, clean phase change labels, and stable baselines. Share graphs with a co-worker before you call an intervention a success. One extra set of eyes can catch what yours missed.
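The advice above — compare phase means and check that the effect replicates across phases — can be sketched as a quick self-check. A minimal sketch with made-up ABAB session data; the helper names and the "consistent pattern" rule are illustrative, not the paper's method:

```python
from statistics import mean

def phase_means(phases):
    """Mean of each phase, in order A1, B1, A2, B2."""
    return [mean(p) for p in phases]

def pattern_consistent(phases, min_shift=0.0):
    """True if both B-phase means shift away from the preceding A-phase
    means in the same direction, and the reversal (A2) moves back toward
    baseline -- the ABAB logic reviewers in the study rewarded."""
    a1, b1, a2, b2 = phase_means(phases)
    shift = b1 - a1
    return (abs(shift) > min_shift
            and (b1 - a1) * (b2 - a2) > 0    # both B shifts, same direction
            and (a2 - b1) * (a1 - b1) > 0)   # A2 reverses back toward A1

# Hypothetical sessions: baseline, intervention, reversal, reintervention.
abab = [[4, 5, 3, 4], [9, 10, 11, 9], [5, 4, 4, 5], [10, 11, 10, 12]]
print(phase_means(abab))        # [4, 9.75, 4.5, 10.75]
print(pattern_consistent(abab)) # True
```

Computing the phase means you would draw as bold lines makes the colleague check concrete: if the printed pattern does not replicate, neither will the eyeball verdict.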
Add a bold horizontal line at each phase mean on your current graph and ask a colleague to verify the change.
Original abstract
Visual inspection has been the method of analysis most widely employed to evaluate the functional control demonstrated by any given set of intrasubject replication data. To identify the influence of certain graphic characteristics on these evaluative behaviors, 36 "ABAB reversal" figures were constructed. They were sent to 250 reviewers of behavioral journals. Their evaluation of each figure was expressed as a rating on a 100-point scale of "experimental control." Mean interrater agreement was 0.61. In addition to this rating, a verbal description of evaluation criteria was requested. It was also found that graphic characteristics determine evaluative judgments in concert rather than singly. For example, phase mean changes had to be a pattern consistent with the hypothesized effect of the experimental variables, while degrees of mean shift and variability were less important. A description of the following evaluative criteria was presented: (a) topographic characteristics, (b) format of data presentation, (c) intra-experimental, and (d) extra-experimental circumstances.
Journal of Applied Behavior Analysis, 1979 · doi:10.1901/jaba.1979.12-573