Machine learning to analyze single‐case graphs: A comparison to visual inspection
Machine-learning models judged 1,024 single-case graphs more consistently and with a better error balance than expert visual inspection.
01 Research in Context
What this study did
Lanovaz and colleagues built computer models that read single-case graphs. The models learned from simulated AB graphs with known answers.
Next they asked five expert visual raters to judge 1,024 of these simulated graphs by eye. The team then compared who got more calls right: people or code.
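In code, that recipe looks roughly like the sketch below. This is a minimal illustration, not the authors' actual pipeline: the graph parameters, the feature set, and the scikit-learn classifier are all assumptions for demonstration.

```python
# Minimal sketch of the study's recipe: simulate AB graphs with a known
# answer, summarize each graph as features, and fit a classifier.
# All parameters and the model choice are illustrative assumptions.
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

def simulate_ab(n_a=5, n_b=10, effect=0.0):
    """One AB graph: baseline (A), then treatment (B) shifted by `effect` SDs."""
    a = rng.normal(0.0, 1.0, n_a)
    b = rng.normal(effect, 1.0, n_b)
    return a, b

def features(a, b):
    """Simple phase summaries; the paper's feature set may differ."""
    return [a.mean(), a.std(), b.mean(), b.std(), b.mean() - a.mean()]

X, y = [], []
for _ in range(1024):
    has_effect = rng.random() < 0.5          # the known answer for this graph
    a, b = simulate_ab(effect=1.5 if has_effect else 0.0)
    X.append(features(a, b))
    y.append(int(has_effect))

X_tr, X_te, y_tr, y_te = train_test_split(np.array(X), np.array(y), random_state=0)
model = SGDClassifier(loss="log_loss", random_state=0).fit(X_tr, y_tr)
print(f"holdout accuracy: {model.score(X_te, y_te):.2f}")
```

The key point is that simulation supplies the ground truth that real clinical graphs never have, which is what makes error rates measurable at all.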
What they found
The machine-learning models balanced the two kinds of errors better than the human experts did: fewer false alarms (Type I errors) without missing more real effects (power).
Even the expert raters disagreed with one another, agreeing on only 75% of graphs on average. The code gave the same verdict every time.
How this fits with other research
Ferron et al. (2017) already showed that masked visual analysis keeps error rates low. Lanovaz and colleagues add a new layer: let a computer make the call instead.
Adams et al. (2024) pushed the same idea into functional-analysis data. Their simple script now agrees with experts 89% of the time, up from 81%, showing the field is moving fast.
Wolfe et al. (2026) compared masked and traditional visual analysis and found only modest agreement. The 2021 machine-learning approach may sidestep that reliability problem altogether.
Why it matters
If you run single-case studies, you can start testing free machine-learning tools on your own graphs. Upload an AB plot, let the model vote, then compare its call with yours. Over time you will see whether the code saves you from false positives and from long team debates about ‘Do you see an effect?’
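If you want a rule-based baseline to test against first, the conservative dual-criteria (CDC) method named in the abstract is easy to script. Below is a minimal sketch of the standard CDC rule for a hypothesized increase from A to B; the data are hypothetical, and this is not the code from the paper.

```python
# Minimal sketch of the conservative dual-criteria (CDC) rule for a
# hypothesized increase from phase A to phase B. Standard CDC logic;
# not the authors' code. Example data are hypothetical.
import numpy as np
from scipy.stats import binom, linregress

def cdc_increase(a, b):
    """Return True if phase B shows an effect under the CDC rule."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    shift = 0.25 * a.std(ddof=1)              # the "conservative" adjustment
    x_a = np.arange(len(a))
    fit = linregress(x_a, a)                  # baseline trend line
    x_b = np.arange(len(a), len(a) + len(b))
    mean_line = a.mean() + shift
    trend_line = fit.intercept + fit.slope * x_b + shift
    above_both = np.sum((b > mean_line) & (b > trend_line))
    # Smallest count a fair coin would reach less than 5% of the time.
    crit = int(binom.ppf(0.95, len(b), 0.5)) + 1
    return above_both >= crit

a = [2.1, 2.4, 1.9, 2.2, 2.0]                 # hypothetical baseline
b = [3.0, 3.4, 2.9, 3.6, 3.1, 3.3, 3.5, 3.2]  # hypothetical treatment
print("effect detected:", cdc_increase(a, b))
```

CDC flags an effect only when more treatment points sit above both shifted baseline lines than chance could plausibly explain, which is why it serves as a natural benchmark for both humans and models.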
02 Original abstract
Behavior analysts commonly use visual inspection to analyze single‐case graphs, but studies on its reliability have produced mixed results. To examine this issue, we compared the Type I error rate and power of visual inspection with a novel approach—machine learning. Five expert visual raters analyzed 1,024 simulated AB graphs, which differed on number of points per phase, autocorrelation, trend, variability, and effect size. The ratings were compared to those obtained by the conservative dual‐criteria method and two models derived from machine learning. On average, visual raters agreed with each other on only 75% of graphs. In contrast, both models derived from machine learning showed the best balance between Type I error rate and power while producing more consistent results across different graph characteristics. The results suggest that machine learning may support researchers and practitioners in making fewer errors when analyzing single‐case graphs, but replications remain necessary.
Journal of Applied Behavior Analysis, 2021 · doi:10.1002/jaba.863