Statistical Decision-Making Accuracies for Some Overlap- and Distance-based Measures for Single-Case Experimental Designs

Carlin et al. (2022) · Perspectives on Behavior Science 2022
★ The Verdict

Use Tau to decide whether an effect is present, and RD or g to measure how large it is.

✓ Read this if you are a BCBA who publishes or reviews single-case data.
✗ Skip if you are a clinician who only runs standardized norm-referenced tests.

01 Research in Context

01

What this study did

Carlin et al. (2022) ran computer simulations to see which numbers best spot true effects in single-case graphs.

They compared overlap tools like Tau with distance tools like RD and g.

The goal was to tell BCBAs which statistic to trust for yes-or-no decisions and for sizing effects.
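
To make the simulation idea concrete, here is a minimal Python sketch of how false alarms (Type I errors) and hits (power) can be estimated for one measure. It is not the authors' code. The normally distributed data, the 2-SD treatment effect, the 0.6 Tau cutoff, and the function names `tau` and `detection_rate` are all assumptions chosen for illustration, and Tau here is the basic A-versus-B version with no baseline trend correction.

```python
import random


def tau(baseline, treatment):
    """Basic A-vs-B Tau: (improving pairs - worsening pairs) / all pairs.

    Assumes higher scores mean improvement and applies no trend correction.
    """
    pos = sum(b > a for a in baseline for b in treatment)
    neg = sum(b < a for a in baseline for b in treatment)
    return (pos - neg) / (len(baseline) * len(treatment))


def detection_rate(effect, n_per_phase=5, cutoff=0.6, reps=2000, seed=1):
    """Share of simulated AB data sets in which Tau reaches the cutoff.

    effect is the true shift (in SD units) between phases; 0.0 means no effect.
    The cutoff is a hypothetical decision rule, not one taken from the paper.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        baseline = [rng.gauss(0, 1) for _ in range(n_per_phase)]
        treatment = [rng.gauss(effect, 1) for _ in range(n_per_phase)]
        if tau(baseline, treatment) >= cutoff:
            hits += 1
    return hits / reps


# No true effect: every "effect" call is a false alarm (Type I error).
type_i_error = detection_rate(effect=0.0)
# Large true effect: every "effect" call is a correct detection (power).
power = detection_rate(effect=2.0)
print(f"Type I error: {type_i_error:.3f}, power: {power:.3f}")
```

A good measure keeps the first number low and the second number high. The abstract says the study checked both across phase lengths of 3 to 10 observations.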

02

What they found

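Tau, RD, and g had the highest decision accuracy, judged by Type I error rate (crying “effect” when none is there) and power (catching an effect that is real). Other overlap tools, such as PND and the dual-criterion method, did not perform as well.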

RD and g gave the clearest picture of how big a change was, not just if it happened.

Eyeballing the graph alone was not enough; the right number helps you act faster.

03

How this fits with other research

Costello et al. (2022) ran a similar test and also found that Tau and RD beat visual inspection alone.

Dowdy et al. (2021) had already warned that each effect-size index carries hidden rules; Carlin’s work now shows which index to pick in practice.

Manolov et al. (2025) focus on picking the right multilevel model, not the right measure; together the two papers give you a full checklist for solid SCED stats.

04

Why it matters

Next time you graph a client’s data, run Tau first. If Tau says “effect,” add RD or g to show the size. This two-step habit speeds up team decisions and makes your write-ups easier to defend to reviewers.
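
The two-step habit can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure from the paper: `tau` is the basic A-versus-B Tau without baseline trend correction, `hedges_g` uses a pooled standard deviation with the common approximate small-sample correction 1 - 3/(4df - 1), higher scores are assumed to mean improvement, and the data are made up. RD is left out because the source does not give its formula.

```python
from statistics import mean, variance


def tau(baseline, treatment):
    """Step 1 (presence): (improving pairs - worsening pairs) / all pairs."""
    pos = sum(b > a for a in baseline for b in treatment)
    neg = sum(b < a for a in baseline for b in treatment)
    return (pos - neg) / (len(baseline) * len(treatment))


def hedges_g(baseline, treatment):
    """Step 2 (size): standardized mean difference with small-sample correction."""
    n_a, n_b = len(baseline), len(treatment)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(baseline)
                  + (n_b - 1) * variance(treatment)) / df
    d = (mean(treatment) - mean(baseline)) / pooled_var ** 0.5
    return d * (1 - 3 / (4 * df - 1))  # approximate correction factor


# Hypothetical session data: five baseline and five treatment observations.
baseline = [2, 3, 2, 4, 3]
treatment = [6, 7, 5, 8, 7]

print("Tau:", tau(baseline, treatment))                   # 1.0, no overlap
print("g:  ", round(hedges_g(baseline, treatment), 2))    # 3.43
```

Here every treatment point exceeds every baseline point, so Tau is 1.0, and g reports the size of the change in pooled-SD units.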

→ Action — try this Monday

Open last week’s graph, calculate Tau in free SCED software, and note the RD value next to it.

02 At a glance

Intervention: not applicable
Design: methodology paper
Finding: not reported

03 Original abstract

Selecting a quantitative measure to guide decision making in single-case experimental designs (SCEDs) is complicated. Many measures exist and all have been rightly criticized. The two general classes of measure are overlap-based (e.g., percentage nonoverlapping data) and distance-based (e.g., Cohen’s d). We compare several measures from each category for Type I error rate and power across a range of designs using equal numbers of observations (i.e., 3–10) in each phase. Results showed that Tau and the distance-based measures (i.e., RD and g) provided the highest decision accuracies. Other overlap-based measures (e.g., PND, dual-criterion method) did not perform as well. It is recommended that Tau be used to guide decision making about the presence/absence of a treatment effect, and RD or g be used to quantify the magnitude of the treatment effect. The online version contains supplementary material available at 10.1007/s40614-021-00317-8.

Perspectives on Behavior Science, 2022 · doi:10.1007/s40614-021-00317-8