How Many Tiers Do We Need? Type I Errors and Power in Multiple Baseline Designs

Lanovaz et al. (2020) · Perspectives on Behavior Science

★ The Verdict

Accept two clear changes out of three or more tiers and move on — power stays strong and errors stay low.

✓ Read this if you are a BCBA who runs or reviews multiple-baseline studies.

✗ Skip if you are a practitioner who only uses group designs.

01 Research in Context

What this study did

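Lanovaz et al. generated 10,000 simulated multiple baseline graphs, applied the dual-criteria method to each tier, and then had two raters visually inspect 300 graphs to replicate the analysis. In designs with at least three tiers, requiring two or more tiers to show a clear change kept the Type I error rate below .05 and power above .80, whereas requiring every tier to change left power unacceptably low. In practice: don't scrap your study when the third tier wobbles. Plan for at least three tiers, and treat two clear changes as adequate evidence of an effect.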
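
The trade-off between requiring every tier to change and requiring only some of them can be illustrated with simple binomial arithmetic. The sketch below is not the paper's analysis: it assumes tiers are independent, and the per-tier false-positive rate (.10) and per-tier power (.85) are hypothetical values chosen only for illustration. The function names are likewise invented for this example.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(at least k of n independent tiers show a clear change), each with probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical per-tier rates for illustration only (not taken from the paper).
FALSE_POSITIVE_PER_TIER = 0.10  # chance a tier "changes" when there is no effect
POWER_PER_TIER = 0.85           # chance a tier changes when there is a real effect


def design_rates(n_tiers, k_required):
    """Return (Type I error rate, power) for a 'k of n tiers must change' rule."""
    type_i = prob_at_least(k_required, n_tiers, FALSE_POSITIVE_PER_TIER)
    power = prob_at_least(k_required, n_tiers, POWER_PER_TIER)
    return type_i, power


if __name__ == "__main__":
    for k in (1, 2, 3):
        type_i, power = design_rates(3, k)
        print(f"{k} of 3 tiers required: Type I = {type_i:.3f}, power = {power:.3f}")
```

With these made-up inputs, a two-of-three rule gives a Type I error rate of about .03 and power of about .94, while an all-three rule drops power to about .61. The exact values depend entirely on the assumed per-tier rates; the point is the direction of the trade-off.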

→ Action — try this Monday

Keep your three-tier study even if tier three wobbles — two clear changes are enough to claim an effect.

02 At a glance

Intervention: not applicable

Design: methodology paper

Finding: not reported

03 Original abstract

Design quality guidelines typically recommend that multiple baseline designs include at least three demonstrations of effects. Despite its widespread adoption, this recommendation does not appear grounded in empirical evidence. The main purpose of our study was to address this issue by assessing Type I error rate and power in multiple baseline designs. First, we generated 10,000 multiple baseline graphs, applied the dual-criteria method to each tier, and computed Type I error rate and power for different number of tiers showing a clear change. Second, two raters categorized the tiers for 300 multiple baseline graphs to replicate our analyses using visual inspection. When multiple baseline designs had at least three tiers and two or more of these tiers showed a clear change, the Type I error rate remained adequate (< .05) while power also reached acceptable levels (> .80). In contrast, requiring all tiers to show a clear change resulted in overly stringent conclusions (i.e., unacceptably low power). Therefore, our results suggest that researchers and practitioners should carefully consider limitations in power when requiring all tiers of a multiple baseline design to show a clear change in their analyses.
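
The simulation procedure described in the abstract can be sketched roughly as follows. This is a simplified illustration, not the authors' code. It assumes independent normally distributed data points, five sessions per phase, and no staggering of baselines. It also assumes the dual-criteria rule counts treatment points above both the baseline mean line and the baseline trend line, with the required count taken from a one-tailed binomial test at p = .5. All function names and parameters are hypothetical.

```python
import random
from math import comb


def binomial_threshold(n, alpha=0.05):
    """Smallest count c with P(X >= c) < alpha for X ~ Binomial(n, 0.5)."""
    for c in range(n + 1):
        tail = sum(comb(n, i) for i in range(c, n + 1)) / 2**n
        if tail < alpha:
            return c
    return n + 1  # no count is extreme enough for this n


def fit_trend(y):
    """Least-squares slope and intercept of y against session index 0..len(y)-1."""
    n = len(y)
    x_mean = (n - 1) / 2
    y_mean = sum(y) / n
    sxx = sum((x - x_mean) ** 2 for x in range(n))
    sxy = sum((x - x_mean) * (v - y_mean) for x, v in enumerate(y))
    slope = sxy / sxx
    return slope, y_mean - slope * x_mean


def clear_change(baseline, treatment):
    """Simplified dual-criteria check for an expected increase in behavior."""
    mean_line = sum(baseline) / len(baseline)
    slope, intercept = fit_trend(baseline)
    start = len(baseline)  # treatment sessions continue the session index
    above = sum(
        1
        for i, v in enumerate(treatment)
        if v > mean_line and v > intercept + slope * (start + i)
    )
    return above >= binomial_threshold(len(treatment))


def simulate_rate(n_graphs, n_tiers, k_required, effect, rng, n_base=5, n_treat=5):
    """Proportion of simulated graphs in which at least k_required tiers change."""
    hits = 0
    for _ in range(n_graphs):
        changed = 0
        for _ in range(n_tiers):
            baseline = [rng.gauss(0, 1) for _ in range(n_base)]
            treatment = [rng.gauss(effect, 1) for _ in range(n_treat)]
            changed += clear_change(baseline, treatment)
        hits += changed >= k_required
    return hits / n_graphs


if __name__ == "__main__":
    rng = random.Random(0)
    # effect=0 estimates the Type I error rate; effect>0 estimates power.
    print("Type I (2 of 3):", simulate_rate(2000, 3, 2, effect=0.0, rng=rng))
    print("Power  (2 of 3):", simulate_rate(2000, 3, 2, effect=3.0, rng=rng))
    print("Power  (3 of 3):", simulate_rate(2000, 3, 3, effect=3.0, rng=rng))
```

Setting `effect=0.0` estimates how often the rule signals a change when none exists, and a positive `effect` (in standard deviation units, an arbitrary choice here) estimates power. The numbers this sketch produces should not be expected to match the paper's, since the data-generating assumptions differ.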

Perspectives on Behavior Science, 2020 · doi:10.1007/s40614-020-00263-x