Assessment & Research

Statistical comparison of four effect sizes for single-subject designs.

Campbell (2004) · Behavior Modification
★ The Verdict

Four common effect-size indices can call the same graph weak or strong, so report the formula you use and treat meta-analytic averages with caution.

✓ Read this if you're a BCBA who publishes or reviews single-case research.
✗ Skip if you're a clinician who only runs treatment and never crunches numbers.

01 Research in Context

01

What this study did

Campbell (2004) lined up four popular ways to turn single-case graphs into numbers: mean baseline reduction (MBLR), percentage of nonoverlapping data (PND), percentage of zero data (PZD), and a regression-based d statistic. Across 117 articles targeting problem behavior in 181 individuals with autism, the study asked whether the four indices told the same story about the same treatments.
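The three nonregression indices are simple enough to compute by hand. Below is a minimal Python sketch using common textbook definitions and assuming a behavior-reduction target (lower is better); exact formulas vary across sources, so verify against the original papers before reporting.

```python
# Minimal sketches of the three nonregression indices, assuming the
# goal is behavior reduction (lower values are better). Definitions
# follow common usage; published variants exist.

def pnd(baseline, treatment):
    """PND: percentage of treatment points below the lowest
    (i.e., best) baseline point."""
    floor = min(baseline)
    return 100 * sum(t < floor for t in treatment) / len(treatment)

def pzd(treatment):
    """PZD: from the first zero onward, the percentage of treatment
    points that stay at zero. Returns 0 if behavior never hits zero."""
    if 0 not in treatment:
        return 0.0
    tail = treatment[treatment.index(0):]
    return 100 * sum(t == 0 for t in tail) / len(tail)

def mblr(baseline, treatment):
    """MBLR: mean baseline reduction, the percentage drop from the
    baseline mean to the treatment mean."""
    b = sum(baseline) / len(baseline)
    t = sum(treatment) / len(treatment)
    return 100 * (b - t) / b
```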

02

What they found

All four indices agreed the treatments worked. But the indices did not agree on how well they worked. Only PZD spotted differences tied to moderators such as age or setting.
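A toy example (invented numbers, reusing the functions sketched above) shows how one graph can earn three verdicts: responding drops well below baseline but never reaches zero, so PND and MBLR look strong while PZD calls the treatment a failure.

```python
# Hypothetical AB data: responding falls but is never eliminated.
baseline  = [8, 9, 7, 10]
treatment = [5, 4, 3, 2, 3, 2, 3]

print(pnd(baseline, treatment))   # 100.0 -> "highly effective"
print(mblr(baseline, treatment))  # ~63.0 -> solid reduction
print(pzd(treatment))             # 0.0   -> "ineffective"
```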

03

How this fits with other research

Sen (2022) extends the warning: five newer regression formulas can swing Cohen's d from 0.003 to 3.47 on the same data. The numbers look precise, but they are not interchangeable.
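For a sense of where that spread comes from: one common regression recipe regresses the outcome on a phase dummy and standardizes the coefficient by the residual SD. The sketch below shows that one variant only (an illustrative assumption, not Sen's or Campbell's exact formula); adding trend terms or swapping the denominator yields a different d for the same graph.

```python
# One regression-based d: phase-dummy coefficient / residual SD.
# Published variants add trend terms or use other denominators,
# which is why the same data can yield wildly different d values.
import numpy as np

def regression_d(baseline, treatment):
    y = np.array(baseline + treatment, dtype=float)
    phase = np.array([0.0] * len(baseline) + [1.0] * len(treatment))
    X = np.column_stack([np.ones_like(phase), phase])  # intercept + phase
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    # For reduction targets the phase coefficient is negative.
    return beta[1] / resid.std(ddof=2)  # ddof=2: two parameters fit
```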

Cohn et al. (2007) acts as a successor. Their field test of 165 AB graphs crowned IRD the best nonoverlap index, pushing past the PND that Campbell (2004) still included.

Carter (2013) reframes the debate. The paper says overlap indices are not wrong, just misread. Use them to judge control, not size. This softens the apparent contradiction without erasing it.

04

Why it matters

When you write up a single-case study, pick one index and stick with it. State the formula in your method section so readers can compare across studies. If you need to hunt for moderators, try PZD first.

→ Action — try this Monday

Add one sentence to your next report that names the exact effect-size formula you used, e.g., "PND was calculated as the percentage of treatment-phase points falling below the lowest baseline point."

02 At a glance

Intervention: not applicable
Design: systematic review
Sample size: 181 individuals across 117 articles
Population: autism spectrum disorder
Finding: not reported

03 Original abstract

Controversy exists regarding appropriate methods for summarizing treatment outcomes for single-subject designs. Nonregression- and regression-based methods have been proposed to summarize the efficacy of single-subject interventions with proponents of both methods arguing for the superiority of their respective approaches. To compare findings for different single-subject effect sizes, 117 articles that targeted the reduction of problematic behaviors in 181 individuals diagnosed with autism were examined. Four effect sizes were calculated for each article: mean baseline reduction (MBLR), percentage of nonoverlapping data (PND), percentage of zero data (PZD), and one regression-based d statistic. Although each effect size indicated that behavioral treatment was effective, moderating variables were detected by the PZD effect size only. Pearson product-moment correlations indicated that effect sizes differed in statistical relationships to one another. In the present review, the regression-based d effect size did not improve the understanding of single-subject treatment outcomes when compared to nonregression effect sizes.

Behavior Modification, 2004 · doi:10.1177/0145445503259264