Assessment & Research

Replicability and randomization test logic in behavior analysis

Jacobs (2019) · Journal of the Experimental Analysis of Behavior
★ The Verdict

Randomization tests give you a p-value for single-case data without assuming normal distributions—use them instead of t-tests next time you analyze an AB design.

✓ Read this if: you're a BCBA who analyzes or publishes single-case data.
✗ Skip if: you're a clinician who only reads group studies.

01 · Research in Context

01

What this study did

Jacobs wrote a how-to paper, not an experiment. He looked at how we usually analyze single-case data and asked: why are we still using t-tests that assume large, normally distributed samples?

He explained randomization tests. These tests count up what actually happened versus what could have happened if the treatment order were shuffled. No need for normal curves or large N.

02

What they found

The paper says randomization tests fit single-case logic better. You get a clean p-value for one learner without pretending the data are normal.

Jacobs showed the math is simple: list every possible order of A and B phases, compute the stat for each order, and see where your real result falls.
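That counting logic can be sketched in a few lines. The data below are made up, and the test statistic (difference in phase means) is one common choice among several; this is a minimal illustration of the shuffle-and-count idea, not Jacobs' exact procedure.

```python
from itertools import combinations
from statistics import mean

# Hypothetical scores: 4 baseline (A) sessions and 4 treatment (B) sessions.
a = [2, 3, 2, 4]
b = [7, 8, 6, 9]

data = a + b
observed = mean(b) - mean(a)  # test statistic: shift in phase means

# Enumerate every way these 8 scores could have been split 4 vs. 4,
# and count the splits whose statistic is at least as large as the real one.
splits = list(combinations(range(len(data)), len(b)))
extreme = 0
for idx in splits:
    b_like = [data[i] for i in idx]
    a_like = [data[i] for i in range(len(data)) if i not in idx]
    if mean(b_like) - mean(a_like) >= observed:
        extreme += 1

p = extreme / len(splits)  # one-tailed p-value
print(f"observed shift = {observed}, p = {p:.4f}")  # only the real split is this extreme: p = 1/70
```

The p-value is just a proportion: how many of the possible arrangements look at least as extreme as the one you actually observed. No normal curve is invoked anywhere.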

03

How this fits with other research

Manolov et al. (2022) built a free visual tool that uses the same shuffle logic. You upload your data and get a modified Brinley plot that tells you if the effect replicates across kids. It turns Jacobs’ idea into a one-click website.

Iversen (2025) goes smaller, not bigger. He says each single trial inside a session is already a mini-replication. You graph moment-to-moment responses to see stimulus control appear or vanish. Same goal as Jacobs: stop waiting for big groups and use the data you have.

Solanas et al. (2010) look like a rival at first. They offer new slope and level estimators instead of randomization tests. But the two methods answer different questions: their tool describes the size of a change, while Jacobs' tool tests whether the change is unlikely under chance. You can run both on the same data set.

04

Why it matters

Next time you run an AB design, swap your t-test for a randomization test. Free online calculators now exist, so you can still get a p-value while staying true to single-case philosophy. Your graph stays small-N, but your inference gets stronger.

→ Action — try this Monday

Plug your last AB graph into a free randomization-test website and compare the new p-value to the t-test you used before.
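If you'd rather do the comparison in a script than a website, here is a hedged sketch with hypothetical data. It pits SciPy's conventional two-sample t-test against an exact two-tailed randomization test on the same scores; the numbers are invented, and SciPy is assumed to be installed.

```python
from itertools import combinations
from statistics import mean
from scipy import stats  # for the conventional t-test

a = [2, 3, 2, 4]  # hypothetical baseline (A) scores
b = [7, 8, 6, 9]  # hypothetical treatment (B) scores

# Conventional two-sample t-test (assumes normality -- shaky with n = 4 per phase).
t_p = stats.ttest_ind(b, a).pvalue

# Exact two-tailed randomization test: how often does a shuffled 4-vs-4 split
# produce a mean shift at least as extreme as the observed one?
data = a + b
obs = abs(mean(b) - mean(a))
splits = list(combinations(range(len(data)), len(b)))
extreme = sum(
    abs(mean([data[i] for i in idx])
        - mean([data[i] for i in range(len(data)) if i not in idx])) >= obs
    for idx in splits
)
rand_p = extreme / len(splits)

print(f"t-test p = {t_p:.4f}, randomization p = {rand_p:.4f}")
```

Both p-values may land below .05 here, but only the randomization p means exactly what it says for these eight observations: the proportion of possible splits at least this extreme.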

02 · At a glance

Intervention: not applicable
Design: theoretical
Finding: not reported

03 · Original abstract

Randomization tests are a class of nonparametric statistics that determine the significance of treatment effects. Unlike parametric statistics, randomization tests do not assume a random sample, or make any of the distributional assumptions that often preclude statistical inferences about single-case data. A feature that randomization tests share with parametric statistics, however, is the derivation of a p-value. P-values are notoriously misinterpreted and are partly responsible for the putative "replication crisis." Behavior analysts might question the utility of adding such a controversial index of statistical significance to their methods, so it is the aim of this paper to describe the randomization test logic and its potentially beneficial consequences. In doing so, this paper will: (1) address the replication crisis as a behavior analyst views it, (2) differentiate the problematic p-values of parametric statistics from the, arguably, more useful p-values of randomization tests, and (3) review the logic of randomization tests and their unique fit within the behavior analytic tradition of studying behavioral processes that cut across species.

Journal of the Experimental Analysis of Behavior, 2019 · doi:10.1002/jeab.501