Statistical inference in behavior analysis: Some things significance testing does and does not do.
Let the second curve be your p-value—replicate, don’t calculate.
Research in Context
What this study did
Branch (1999) wrote a plain-language essay. He asked one question: do p-values help or hurt behavior analysts?
He read psychology journals and spotted a pattern. Authors used big statistical tests to claim their graphs were “real.”
He argued this habit hides the actual data. The paper has no new numbers—just a warning.
What they found
The author found that p-values can fool us. A tiny p does not mean the behavior changed in a useful way.
He says the fix is simple: show the effect again in a new subject, a new room, a new day. If it repeats, it is real.
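Branch's point can be seen in a quick simulation. The sketch below is illustrative, not from the paper: with a large enough sample, even a practically meaningless difference produces a tiny p-value. All numbers here (response rates, sample size) are made up for the demonstration.

```python
# Illustrative sketch (not from Branch, 1999): with a huge sample, a
# trivially small difference still yields a "significant" p-value.
import math
import random

random.seed(42)
n = 100_000
# Hypothetical baseline vs. "treatment" responding: the true difference
# is only 0.05 responses per minute -- practically meaningless.
baseline = [random.gauss(10.0, 2.0) for _ in range(n)]
treatment = [random.gauss(10.05, 2.0) for _ in range(n)]

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

# Two-sample t statistic; with n this large a normal
# approximation to the p-value is fine.
diff = mean(treatment) - mean(baseline)
se = math.sqrt(var(baseline) / n + var(treatment) / n)
t = diff / se
p = 2 * (1 - 0.5 * (1 + math.erf(abs(t) / math.sqrt(2))))

print(f"difference = {diff:.3f}, t = {t:.2f}, p = {p:.2g}")
```

The difference is a rounding error in clinical terms, yet the p-value comes out far below .05. That is exactly why Branch prefers a replicated curve to a significance test: the p-value answers "is the difference exactly zero?", not "does this matter for the client?".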
How this fits with other research
Lemons et al. (2015) counted every article in JEAB for 55 years. They saw inferential stats keep rising, just like Branch (1999) feared.
Busch et al. (2010) went the other way: they taught college students to use t-tests in three short lessons. Branch (1999) would call this teaching the wrong skill.
The two papers seem to clash, but they talk past each other. Lemons et al. (2015) describe what we do; Branch (1999) tells us what we should do. Busch et al. (2010) show we can teach stats fast—yet that speed does not make the stats useful for single-case work.
Why it matters
You run sessions, not t-tests. Next time a graph looks good, skip the p-value. Run one more reversal or replicate with a new client. That second curve is your proof. Share that picture in your report and you follow Branch's advice: let the behavior speak, not the statistics.
At a glance
Pick one behavior, run one extra reversal, and plot it—no stats needed.
Original abstract
Significance testing plays a prominent role in behavioral science, but its value is frequently overestimated. It does not estimate the reliability of a finding, it does not yield a probability that results are due to chance, nor does it usually answer an important question. In behavioral science it can limit the reasons for doing experiments, reduce scientific responsibility, and emphasize population parameters at the expense of behavior. It can, and usually does, lead to a poor approach to theory testing, and it can also, in behavior-analytic experiments, discount reliability of data. At best, statistical significance is an ancillary aspect of a set of data, and therefore should play a relatively minor role in advancing a science of behavior.
The Behavior Analyst, 1999 · doi:10.1007/BF03391984