Comparing Visual and Statistical Analysis of Multiple Baseline Design Graphs.
IRD and BC-SMD are the stats that most often agree with expert visual inspection of multiple-baseline graphs.
01 Research in Context
What this study did
Howard et al. (2019) asked a simple question. Which numbers match what our eyes already see?
They took graphs from real multiple-baseline studies and ran four quick stats on each one.
Experts judged the same graphs by eye. The team then checked which stat agreed best with the pros.
What they found
Two stats won. IRD and BC-SMD lined up closest to the visual calls. Tau-U looked strong on raw values but slipped once those values were boiled down to a yes/no call on a functional relation.
The naked eye was usually tougher. Visual judges said "no effect" more often than any stat did.
How this fits with other research
Lanovaz et al. (2017) set the table first. They showed you need at least three baseline (A) points and five intervention (B) points before you can even trust the simpler dual-criteria rule. Howard's team built on that by testing fancier stats on the same kind of graphs.
Falligant et al. (2020) double-checked the dual-criteria method with simulated data and found it keeps false alarms low. Howard et al. now give you two more tools, IRD and BC-SMD, that do the same job with tighter agreement with expert eyes.
Manolov (2019) ran a sister simulation on alternating-treatment designs and praised ALIV plus randomization. The takeaway across both 2019 papers: pair your eyes with one solid stat, but pick the stat that matches your design—ALIV for ATD, IRD or BC-SMD for multiple baseline.
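The dual-criteria rule that Lanovaz and Falligant examined can be sketched in a few lines. This is a minimal illustration of the commonly described procedure (project the baseline mean line and OLS trend line into the treatment phase, then count treatment points that beat both, judged against a one-sided binomial criterion at chance = .5), not the exact implementation any of these papers used; the function name and the p < .05 cutoff are my assumptions.

```python
import math

def fit_line(ys):
    """Ordinary least-squares slope and intercept for y over x = 0..n-1."""
    n = len(ys)
    xs = range(n)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx

def dual_criteria(baseline, treatment, increase=True):
    """Return (hits, needed): treatment points beating BOTH projected
    baseline lines, and the count required by a one-sided binomial
    test at p < .05 assuming each point beats chance with p = .5."""
    slope, intercept = fit_line(baseline)
    mean_level = sum(baseline) / len(baseline)
    n0 = len(baseline)
    hits = 0
    for i, y in enumerate(treatment):
        trend = slope * (n0 + i) + intercept
        if increase:
            hits += (y > mean_level and y > trend)
        else:
            hits += (y < mean_level and y < trend)
    n = len(treatment)
    # smallest k with P(X >= k) < .05 under Binomial(n, 0.5)
    needed = next(k for k in range(n + 1)
                  if sum(math.comb(n, j) for j in range(k, n + 1)) / 2 ** n < 0.05)
    return hits, needed
```

With a flat baseline of [2, 3, 2, 3, 2] and treatment data of [6, 7, 8, 7, 6], all five treatment points clear both lines, which meets the binomial criterion for five points, so the rule would call an effect. Note the Lanovaz et al. caveat above: with fewer than roughly three A and five B points, this rule is not trustworthy.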
Why it matters
You no longer have to guess if a graph "looks good enough." Slap IRD or BC-SMD onto your next multiple-baseline project and write the number right under the visual claim. If the stat agrees with your eye, you can sign off with confidence. If they clash, collect more data or tweak the intervention before you call it a win.
Try it yourself: open your last multiple-baseline graph, compute IRD with a free online calculator, and compare the number to your visual judgment.
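If you'd rather script the IRD calculation than use an online calculator, here is a minimal sketch of IRD as commonly defined (Parker et al.'s formulation: remove the fewest data points needed to eliminate all overlap between phases, then difference the resulting improvement rates). It assumes higher values mean improvement and is not the exact procedure from the paper under discussion.

```python
def ird(baseline, treatment):
    """Improvement rate difference (minimal sketch).

    Finds the fewest points whose removal leaves every remaining
    treatment point above every remaining baseline point (ties count
    as overlap). Removed baseline points count as 'improved' baseline;
    surviving treatment points count as 'improved' treatment.
    """
    values = sorted(set(baseline) | set(treatment))
    # cut c keeps baseline points <= c and treatment points > c
    cuts = [min(values) - 1] + values
    best = min(cuts, key=lambda c: sum(a > c for a in baseline)
                                   + sum(b <= c for b in treatment))
    removed_a = sum(a > best for a in baseline)
    removed_b = sum(b <= best for b in treatment)
    ir_treatment = (len(treatment) - removed_b) / len(treatment)
    ir_baseline = removed_a / len(baseline)
    return ir_treatment - ir_baseline
```

For fully separated phases such as baseline [2, 3, 2] and treatment [5, 6, 7], IRD is 1.0; one overlapping baseline point (e.g., baseline [2, 3, 6]) pulls it down to about 0.67. If your target behavior should decrease, negate the data before calling the function.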
02 At a glance
03 Original abstract
A growing number of statistical analyses are being developed for single-case research. One important factor in evaluating these methods is the extent to which each corresponds to visual analysis. Few studies have compared statistical and visual analysis, and information about more recently developed statistics is scarce. Therefore, our purpose was to evaluate the agreement between visual analysis and four statistical analyses: improvement rate difference (IRD); Tau-U; Hedges, Pustejovsky, Shadish (HPS) effect size; and between-case standardized mean difference (BC-SMD). Results indicate that IRD and BC-SMD had the strongest overall agreement with visual analysis. Although Tau-U had strong agreement with visual analysis on raw values, it had poorer agreement when those values were dichotomized to represent the presence or absence of a functional relation. Overall, visual analysis appeared to be more conservative than statistical analysis, but further research is needed to evaluate the nature of these disagreements.
Behavior Modification, 2019 · doi:10.1177/0145445518768723