Predict, Control, and Replicate to Understand: How Statistics Can Foster the Fundamental Goals of Science
Swap p-values for Bayesian predictions and you can spot replicable effects before you collect the first data point.
Research in Context
What this study did
Killeen (2019) wrote a think-piece. He says p-values are broken. He wants scientists to forecast results before they run studies.
The paper is conceptual, with no new data. It maps out how Bayesian and predictive metrics can outperform null-hypothesis significance tests.
What they found
The big idea: replace p with prediction. State what you expect, collect data, then check how close you were.
This swap makes replication part of the design, not an afterthought.
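To see what "predict, then check" looks like in numbers, here is a minimal sketch (Python, not from the paper) of Killeen's p_rep statistic: the estimated probability that a same-sized replication would find an effect in the same direction. It assumes the common one-tailed normal-approximation form, and the p-values fed into it are made-up examples.

```python
# A minimal sketch of Killeen's p_rep ("probability of replication"),
# using the normal-approximation form: p_rep = Phi(z / sqrt(2)),
# where z is the z-score implied by a one-tailed p-value.
# The example p-values below are hypothetical, not from the paper.
from scipy.stats import norm

def p_rep(p_one_tailed: float) -> float:
    """Estimated probability that a same-sized replication
    finds an effect in the same direction."""
    z = norm.ppf(1 - p_one_tailed)   # z-score for the observed effect
    return norm.cdf(z / 2 ** 0.5)    # a replication doubles the sampling variance

for p in (0.005, 0.025, 0.05, 0.20):
    print(f"one-tailed p = {p:.3f}  ->  p_rep ~= {p_rep(p):.2f}")
```

Under this approximation a marginal p = .05 corresponds to roughly an 88% chance of seeing the same direction of effect again, which is exactly the kind of forecast Killeen wants stated up front rather than inferred after the fact.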
How this fits with other research
Branch (2019) agrees p-values hurt replication, but pushes behavior-analytic replication instead of Bayesian math. Both want the same fix—more built-in checks—through different doors.
Franck et al. (2019) show the Bayesian door in action. They re-analyzed delay-discounting data with credible intervals, giving the concrete recipe Killeen only describes.
Bacon et al. (1998) beat everyone to the punch. They told behavior analysts to drop inferential stats entirely and trust visual analysis. Killeen (2019) updates that call by offering Bayesian tools rather than none at all.
Why it matters
If you write or review studies, stop letting p-values do the thinking. Spell out your predicted effect size, plug it into a Bayes factor or credible interval, then run the study. This habit tells you, and every reader, how likely the finding is to repeat. Start small: add one Bayesian contrast to your next single-case design and compare the forecast to what the graph shows.
Write a numeric prediction for your next client target, then mark on the graph whether the data land inside your forecast range.
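Here is one way that "start small" might look in practice. This is a hedged sketch rather than Killeen's own procedure: a conjugate normal model for a baseline-versus-intervention difference, a 95% credible interval, a check of whether your pre-study forecast lands inside it, and a Savage-Dickey Bayes factor against "no change". The data, prior settings, and forecast value are all invented placeholders.

```python
# A sketch of one Bayesian contrast for a single-case-style comparison.
# All numbers (data, prior, forecast) are hypothetical placeholders.
import numpy as np
from scipy.stats import norm

baseline     = np.array([4, 5, 3, 4, 5], dtype=float)   # hypothetical session scores
intervention = np.array([6, 7, 5, 6, 7], dtype=float)

# Observed difference in means and its standard error.
diff = intervention.mean() - baseline.mean()
se   = np.sqrt(baseline.var(ddof=1) / len(baseline)
               + intervention.var(ddof=1) / len(intervention))

# Conjugate normal prior on the true difference (mean 0, generous spread).
prior_mu, prior_sd = 0.0, 5.0

# Normal-normal posterior for the true difference.
post_var = 1 / (1 / prior_sd**2 + 1 / se**2)
post_mu  = post_var * (prior_mu / prior_sd**2 + diff / se**2)
post_sd  = np.sqrt(post_var)

lo, hi = norm.ppf([0.025, 0.975], loc=post_mu, scale=post_sd)
print(f"posterior difference: {post_mu:.2f}  (95% credible interval {lo:.2f} to {hi:.2f})")

# Forecast made before the study: did it land inside the credible interval?
forecast = 2.5
print("forecast inside interval:", lo <= forecast <= hi)

# Savage-Dickey Bayes factor for 'some change' vs. 'no change':
# ratio of prior to posterior density at a difference of zero.
bf_10 = norm.pdf(0, prior_mu, prior_sd) / norm.pdf(0, post_mu, post_sd)
print(f"Bayes factor (change vs. no change): {bf_10:.1f}")
```

Swap in your own prior and forecast; the point is that the comparison between forecast and interval is written down before the study, not decided after the graph is drawn.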
Original abstract
Scientists abstract hypotheses from observations of the world, which they then deploy to test their reliability. The best way to test reliability is to predict an effect before it occurs. If we can manipulate the independent variables (the efficient causes) that make it occur, then ability to predict makes it possible to control. Such control helps to isolate the relevant variables. Control also refers to a comparison condition, conducted to see what would have happened if we had not deployed the key ingredient of the hypothesis: scientific knowledge only accrues when we compare what happens in one condition against what happens in another. When the results of such comparisons are not definitive, metrics of the degree of efficacy of the manipulation are required. Many of those derive from statistical inference, and many of those poorly serve the purpose of the cumulation of knowledge. Without ability to replicate an effect, the utility of the principle used to predict or control is dubious. Traditional models of statistical inference are weak guides to replicability and utility of results. Several alternatives to null hypothesis testing are sketched: Bayesian, model comparison, and predictive inference (p_rep). Predictive inference shows, for example, that the failure to replicate most results in the Open Science Project was predictable. Replicability is but one aspect of scientific understanding: it establishes the reliability of our data and the predictive ability of our formal models. It is a necessary aspect of scientific progress, even if not by itself sufficient for understanding.
Perspectives on Behavior Science, 2019 · doi:10.1007/s40614-018-0171-8