Detecting false positives in A-B designs: potential implications for practitioners.
A simple A-B design produced false alarms in fewer than 2 of 100 simulated graphs when both phases contained naturally variable data, so the design may deserve more clinical trust than it usually gets.
Research in Context
What this study did
Bigham et al. (2013) ran computer simulations of A-B graphs. They wanted to know how often a graph would fool you into thinking an intervention worked when nothing had changed. They tested 3,000 simulated data sets (1,000 per condition) that contained no real treatment effect.
The graphs came in three flavors: a baseline held perfectly stable at 25%, a baseline held perfectly stable at 50%, and a baseline of random values. Each baseline was followed by three random B-phase points. The team then looked for false alarms — times the data crossed a preset decision rule and cried 'success' when nothing had actually happened.
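The setup can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the decision rule used here (flag 'success' when every B-phase point exceeds the highest A-phase point) is an assumption, since this summary does not state the paper's exact rule, so the rates it produces will differ from the published ones.

```python
import random

random.seed(1)

def false_positive_rate(make_a_phase, n_graphs=1000):
    """Fraction of no-effect graphs that the decision rule flags as a win.

    Decision rule (an assumption for illustration): call 'success' when
    every B-phase point exceeds the highest A-phase point.
    """
    hits = 0
    for _ in range(n_graphs):
        a = make_a_phase()
        # B phase is pure noise: there is no real treatment effect.
        b = [random.uniform(0, 100) for _ in range(3)]
        if min(b) > max(a):
            hits += 1
    return hits / n_graphs

stable_25 = lambda: [25.0, 25.0, 25.0]                         # stable A phase at 25%
stable_50 = lambda: [50.0, 50.0, 50.0]                         # stable A phase at 50%
random_a = lambda: [random.uniform(0, 100) for _ in range(3)]  # random A phase

for name, gen in [("stable at 25%", stable_25),
                  ("stable at 50%", stable_50),
                  ("random A phase", random_a)]:
    print(f"{name}: {false_positive_rate(gen):.1%} false positives")
```

Even this toy version reproduces the direction of the finding: a perfectly flat baseline is easy for random B-phase noise to "beat," while a random baseline is much harder to beat by chance.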
What they found
False positives occurred in less than 2% of graphs when both phases contained random data. In plain words, a naturally bouncy baseline rarely cried wolf, and the design looked safe for everyday clinical use under those conditions.
Perfectly stable baselines were another story: a relatively high percentage of those graphs produced false alarms, because against a flat line any random bounce in the B phase looks like change. The lesson flips the usual intuition — natural variability, not artificial flatness, was the guardrail.
How this fits with other research
Suzuki et al. (2023) extend the same idea to N-of-1 trend analyses. They found that lower baseline variability boosts the accuracy of trend estimates. Both papers tell you to inspect the baseline carefully before you trust any later change.
Lanovaz et al. (2019) add a twist. They show that when an effect is large and clear in the first A-B segment, it usually repeats in a full reversal design. Bigham et al. give you confidence in a simple A-B; Lanovaz et al. say you can often stop there if the jump is obvious.
Smith et al. (2022) widen the lens. They argue that non-concurrent multiple-baseline designs can be just as valid. Taken together, the four papers give you a menu: a simple A-B comparison, a big visible change, or well-spaced baselines can all justify treatment decisions without extra phases.
Why it matters
You can feel safer running a short A-B study when the baseline varies naturally and shows no drift. Check the data first: if the points bounce within a stable range rather than trending, the chance of a false alarm is small. Be more cautious when a baseline looks unnaturally flat, since decision rules fire easily against a perfectly level line. This saves time in busy clinics and keeps parents from chasing ghost improvements. Pair that check with Lanovaz's big-effect shortcut and you will know when to push ahead or add more phases.
Practical rule of thumb: plot your client's baseline and draw a best-fit line; if it shows no trend, you can proceed with the intervention and a single A-B evaluation.
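The "draw a best-fit line" check can be done numerically with a least-squares slope. A minimal sketch follows; the slope cutoff of 1.0 percentage point per session is an arbitrary value for illustration, not a clinical standard.

```python
def baseline_slope(points):
    """Least-squares slope of baseline data against session number."""
    n = len(points)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(points) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, points))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

baseline = [42, 45, 40, 44, 43]  # hypothetical percent-correct per session
slope = baseline_slope(baseline)
print(f"slope = {slope:.2f} percentage points per session")

# The cutoff below is arbitrary, for illustration only.
if abs(slope) < 1.0:
    print("no meaningful trend: an A-B comparison is reasonable")
```

A slope near zero means the best-fit line is essentially flat, which is the "no trend" condition the rule of thumb asks you to verify before intervening.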
Original abstract
This study evaluated the probability of generating false positives with A-B graphs. We generated 1,000 graphs consisting of three stable A-phase data points at 25% and three random B-phase data points; 1,000 graphs consisting of three stable A-phase data points at 50% and three random B-phase data points; and 1,000 graphs consisting of three random A-phase data points and three random B-phase data points. Results indicate that false positives were produced for (a) a relatively high percentage of graphs containing nonrandom data points in the A phase and (b) less than 2% of graphs containing random data points in both the A and B phases. These findings suggest that A-B designs may be a stronger clinical tool for evaluating the effects of interventions than previously recognized.
Behavior Modification, 2013 · doi:10.1177/0145445512468754