Detecting false positives in A-B designs: potential implications for practitioners.
A simple A-B design produced false alarms in fewer than 2 of 100 simulated graphs when both phases contained naturally variable data, so the design may deserve more clinical trust than it usually gets.
Research in Context
What this study did
Bigham et al. (2013) ran computer simulations of A-B graphs. They wanted to know how often a graph would fool you into thinking an intervention worked when nothing had changed. They tested 3,000 simulated data sets (1,000 per condition) that contained no real treatment effect.
The graphs came in three flavors: a baseline held perfectly stable at 25%, a baseline held perfectly stable at 50%, and a baseline of random values. Each baseline was followed by three random B-phase points. The team then looked for false alarms — times the data crossed a preset decision rule and cried 'success' when nothing had actually happened.
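The setup can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the decision rule used here (flag 'success' when every B-phase point exceeds the highest A-phase point) is an assumption, since this summary does not state the paper's exact rule, so the rates it produces will differ from the published ones.

```python
import random

random.seed(1)

def false_positive_rate(make_a_phase, n_graphs=1000):
    """Fraction of no-effect graphs that the decision rule flags as a win.

    Decision rule (an assumption for illustration): call 'success' when
    every B-phase point exceeds the highest A-phase point.
    """
    hits = 0
    for _ in range(n_graphs):
        a = make_a_phase()
        # B phase is pure noise: there is no real treatment effect.
        b = [random.uniform(0, 100) for _ in range(3)]
        if min(b) > max(a):
            hits += 1
    return hits / n_graphs

stable_25 = lambda: [25.0, 25.0, 25.0]                         # stable A phase at 25%
stable_50 = lambda: [50.0, 50.0, 50.0]                         # stable A phase at 50%
random_a = lambda: [random.uniform(0, 100) for _ in range(3)]  # random A phase

for name, gen in [("stable at 25%", stable_25),
                  ("stable at 50%", stable_50),
                  ("random A phase", random_a)]:
    print(f"{name}: {false_positive_rate(gen):.1%} false positives")
```

Even this toy version reproduces the direction of the finding: a perfectly flat baseline is easy for random B-phase noise to "beat," while a random baseline is much harder to beat by chance.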
What they found
False positives occurred in less than 2% of graphs when both phases contained random data. In plain words, a naturally bouncy baseline rarely cried wolf, and the design looked safe for everyday clinical use under those conditions.
Perfectly stable baselines were another story: a relatively high percentage of those graphs produced false alarms, because against a flat line any random bounce in the B phase looks like change. The lesson flips the usual intuition — natural variability, not artificial flatness, was the guardrail.
How this fits with other research
Suzuki et al. (2023) extend the same idea to N-of-1 trend analyses. They found that lower baseline variability boosts the accuracy of trend estimates. Both papers tell you to inspect the baseline carefully before you trust any later change.
Lanovaz et al. (2019) add a twist. They show that when an effect is large and clear in the first A-B segment, it usually repeats in a full reversal design. Bigham et al. give you confidence in a simple A-B; Lanovaz et al. say you can often stop there if the jump is obvious.
Smith et al. (2022) widen the lens. They argue that non-concurrent multiple-baseline designs can be just as valid. Taken together, the four papers give you a menu: a simple A-B comparison, a big visible change, or well-spaced baselines can all justify treatment decisions without extra phases.
Why it matters
You can feel safer running a short A-B study when the baseline varies naturally and shows no drift. Check the data first: if the points bounce within a stable range rather than trending, the chance of a false alarm is small. Be more cautious when a baseline looks unnaturally flat, since decision rules fire easily against a perfectly level line. This saves time in busy clinics and keeps parents from chasing ghost improvements. Pair that check with Lanovaz's big-effect shortcut and you will know when to push ahead or add more phases.
Practical rule of thumb: plot your client's baseline and draw a best-fit line; if it shows no trend, you can proceed with the intervention and a single A-B evaluation.
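The "draw a best-fit line" check can be done numerically with a least-squares slope. A minimal sketch follows; the slope cutoff of 1.0 percentage point per session is an arbitrary value for illustration, not a clinical standard.

```python
def baseline_slope(points):
    """Least-squares slope of baseline data against session number."""
    n = len(points)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(points) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, points))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

baseline = [42, 45, 40, 44, 43]  # hypothetical percent-correct per session
slope = baseline_slope(baseline)
print(f"slope = {slope:.2f} percentage points per session")

# The cutoff below is arbitrary, for illustration only.
if abs(slope) < 1.0:
    print("no meaningful trend: an A-B comparison is reasonable")
```

A slope near zero means the best-fit line is essentially flat, which is the "no trend" condition the rule of thumb asks you to verify before intervening.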
Original abstract
This study evaluated the probability of generating false positives with A-B graphs. We generated 1,000 graphs consisting of three stable A-phase data points at 25% and three random B-phase data points; 1,000 graphs consisting of three stable A-phase data points at 50% and three random B-phase data points; and 1,000 graphs consisting of three random A-phase data points and three random B-phase data points. Results indicate that false positives were produced for (a) a relatively high percentage of graphs containing nonrandom data points in the A phase and (b) less than 2% of graphs containing random data points in both the A and B phases. These findings suggest that A-B designs may be a stronger clinical tool for evaluating the effects of interventions than previously recognized.
Behavior Modification, 2013 · doi:10.1177/0145445512468754