Predicting the Next Response: Demonstrating the Utility of Integrating Artificial Intelligence-Based Reinforcement Learning with Behavior Science
Feeding real-time operant data into a Q-learning algorithm lets you predict a person’s next response with 95% accuracy.
01 Research in Context
What this study did
Cox et al. (2025) built a computer model that learns like a person. They fed it real-time operant data: what response just happened and what consequence followed.
The model used Q-learning, an algorithm that updates its value estimates after every response. The goal was to see whether it could predict the next response a human would make.
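In tabular form, that per-response update can be sketched as follows. This is a minimal illustration of standard Q-learning, not the authors' published code; the learning rate, discount factor, and all names here are assumptions:

```python
# Minimal tabular Q-learning sketch (illustrative only; parameter values and
# function names are assumptions, not the authors' implementation).
from collections import defaultdict

ALPHA = 0.1   # learning rate: how fast estimates move toward new evidence
GAMMA = 0.9   # discount factor: weight on expected future reward

q = defaultdict(float)  # maps (context, response) -> learned value


def update(context, response, reward, next_context, responses):
    """Standard Q-learning update applied after one observed response."""
    best_next = max(q[(next_context, r)] for r in responses)
    q[(context, response)] += ALPHA * (
        reward + GAMMA * best_next - q[(context, response)]
    )


def predict(context, responses):
    """Predict the next response as the currently highest-valued option."""
    return max(responses, key=lambda r: q[(context, r)])
```

After each real response, you call `update` with the consequence that followed, then `predict` for the forecast; over repeated trials the reinforced response accrues the higher value and becomes the prediction.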
What they found
The Q-learning model hit 95% accuracy on average (range: 90%–99%). It beat every simpler model.
The runner-up, which needed both molecular and molar information plus punishment for wrong predictions, averaged 89%; without punishment, accuracy fell to 47%–54%. The machine acted like a seasoned BCBA reading a pattern.
How this fits with other research
Morris et al. (2021) used the Evolutionary Theory of Behavior Dynamics to forecast self-injury. Both papers marry formal models with operant rules, but Cox swaps evolution for AI.
Chadwick et al. (2000) also let a computer track and reinforce human vocal responses in real time. Their 2000 speech-recognition setup did the recording; Cox’s 2025 setup adds prediction.
These studies do not clash—they stack. Each shows that handing data chores to a machine frees you to focus on teaching.
Why it matters
If you run sessions on a tablet or laptop, you already collect time-stamped response data. Pipe that stream into a Q-learning script and you get a live forecast of what the client will do next. You can then deliver the reinforcer a beat earlier, test a new contingency, or fade prompts before errors pile up. No extra staff, no clipboards—just faster, sharper instruction.
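As a rough sketch of that pipeline, here is one way to turn a time-stamped response-consequence log into a rolling forecast. The list-of-dicts log format, the `response` and `reinforced` field names, and the two response classes are illustrative assumptions, not the authors' data format:

```python
# Sketch: stream a response-consequence log through a Q-learning update and
# emit a next-response forecast after each trial. Field names are assumptions.
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.9
RESPONSES = ["correct", "error"]  # assumed response classes


def forecast_stream(log):
    """Yield a predicted next response after each logged trial."""
    q = defaultdict(float)
    for trial in log:
        r, reward = trial["response"], float(trial["reinforced"])
        best_next = max(q[nr] for nr in RESPONSES)
        q[r] += ALPHA * (reward + GAMMA * best_next - q[r])
        yield max(RESPONSES, key=q.__getitem__)  # live forecast


log = [{"response": "correct", "reinforced": True},
       {"response": "error", "reinforced": False},
       {"response": "correct", "reinforced": True}]
print(list(forecast_stream(log))[-1])  # prints "correct"
```

Because the forecast updates trial by trial, it could drive the in-session decisions described above, such as delivering the reinforcer a beat earlier or fading prompts before errors accumulate.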
Export last session's response-consequence log, run the free Q-learning code the authors posted, and watch the forecast bar tick up.
02 At a glance
03 Original abstract
The concepts of reinforcement and punishment arose in two disparate scientific domains of psychology and artificial intelligence (AI). Behavior scientists study how biological organisms do behave as a function of their environment, whereas AI focuses on how artificial agents should behave to maximize reward or minimize punishment. This article describes the broad characteristics of AI-based reinforcement learning (RL), how those differ from operant research, and how combining insights from each might advance research in both domains. To demonstrate this mutual utility, 12 artificial organisms (AOs) were built for six participants to predict the next response they emitted. Each AO used one of six combinations of feature sets informed by operant research, with or without punishing incorrect predictions. A 13th predictive approach, termed “human choice modeled by Q-learning,” uses the mechanism of Q-learning to update context-response-outcome values following each response and to choose the next response. This approach achieved the highest average predictive accuracy of 95% (range 90%-99%). The next highest accuracy, averaging 89% (range: 85%–93%), required molecular and molar information and punishment contingencies. Predictions based only on molar or molecular information and with punishment contingencies averaged 71%–72% accuracy. Without punishment, prediction accuracy dropped to 47%–54%, regardless of the feature set. This work highlights how AI-based RL techniques, combined with operant and respondent domain knowledge, can enhance behavior scientists’ ability to predict the behavior of organisms. These techniques also allow researchers to address theoretical questions about important topics such as multiscale models of behavior and the role of punishment in learning.
Perspectives on Behavior Science, 2025 · doi:10.1007/s40614-025-00444-6