Validity and Reliability Evidence for Assessments Based in Applied Behavior Analysis: A Systematic Review.
ABA's favorite checklists and apps often lack solid reliability and validity, so verify before you rely.
01 · Research in Context
What this study did
Howard et al. (2023) hunted for solid proof that ABA-based tests actually work. They pulled every paper that checked reliability or validity of skill checklists, preference screens, or data apps. The team graded how good the evidence was.
They scanned work from many labs and clinics to see whether the tools you use every day hold up to standard psychometric scrutiny.
What they found
Most ABA tools come with weak or mixed proof. Some show good numbers in one study but fall apart in the next. The gap between daily use and hard evidence is wide.
Bottom line: popular criterion-referenced tests lack steady, strong reliability and validity data.
How this fits with other research
Anonymous (2024) offers a bright spot. They tested the Catalyst Datafinch app with 363 autistic clients and found high internal consistency. That seems to clash with Howard et al.'s gloomy view, but the gap closes once you see the difference: Catalyst is one tool with fresh, direct data, while Howard et al. averaged across many tools, most with poor or dated reports.
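"Internal consistency" here almost certainly means Cronbach's alpha, the usual statistic for whether a tool's items hang together. A minimal sketch of how it is computed, with hypothetical toy scores (not data from the Catalyst study or any paper above):

```python
from statistics import pvariance

def cronbach_alpha(scores):
    """Cronbach's alpha for a set of respondents' item scores.

    scores: one row per respondent, one column per item.
    Hypothetical toy data only -- not from any cited study.
    """
    k = len(scores[0])                         # number of items
    items = list(zip(*scores))                 # transpose: one tuple per item
    item_vars = sum(pvariance(col) for col in items)
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - item_vars / total_var)

# Four respondents, three items (made-up numbers)
data = [[2, 3, 3], [4, 4, 5], [1, 2, 2], [3, 3, 4]]
print(round(cronbach_alpha(data), 3))  # 0.971 -- high consistency
```

Values near 1 mean the items rise and fall together; a single alpha, though, says nothing about validity, which is exactly the review's point.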
Rodgers et al. (2021) and Van Gaasbeek et al. (2026) show early ABA programs can produce real child gains. Those outcome numbers depend on the same assessments Howard et al. now question. The newer outcome meta-analyses extend the story, but they also rest on the shaky instruments the review warns about.
Bush et al. (2021) and Johnson et al. (2021) found that behavior-analytic teaching and noncontingent reinforcement (NCR) work, yet they echo the call for better real-world measurement. All these papers line up: intervention success looks promising, but we need tougher, cleaner tests to track it.
Why it matters
Before you pick your next skill tracker or preference screen, demand the psychometric report. If the manual only gives one alpha from 1998, treat it like a draft, not a done deal. Push publishers for reliability updates, run your own inter-observer checks, and triangulate with direct data. Good clinical decisions rest on good numbers; this review tells you to verify, not trust.
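The cheapest of those inter-observer checks is trial-by-trial interobserver agreement (IOA): two observers score the same session independently, and you report the percentage of trials they agree on. A minimal sketch with hypothetical observer records:

```python
def trial_ioa(obs_a, obs_b):
    """Trial-by-trial interobserver agreement (IOA), in percent.

    obs_a, obs_b: per-trial records from two independent observers
    (e.g. 1 = behavior occurred, 0 = did not). Toy data only.
    """
    if len(obs_a) != len(obs_b):
        raise ValueError("both observers must score the same trials")
    agreements = sum(a == b for a, b in zip(obs_a, obs_b))
    return 100 * agreements / len(obs_a)

# Made-up session: primary observer vs. reliability observer
primary     = [1, 0, 1, 1, 0, 1, 1, 0, 1, 1]
reliability = [1, 0, 1, 0, 0, 1, 1, 0, 1, 1]
print(trial_ioa(primary, reliability))  # 90.0
```

A common clinical convention is to probe IOA on a portion of sessions and treat figures below roughly 80% as a signal to retrain or tighten definitions; treat that threshold as a rule of thumb, not a standard from this review.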
Email one assessment company and ask for this year's reliability study; if they can't send it, schedule an inter-observer agreement probe with your team.
02 · At a glance
03 · Original abstract
The current article presents the findings from a systematic review of the available reliability and validity evidence supporting the use of criterion-referenced assessments based on the applied behavior analysis framework. We identified 46 studies that reported reliability and/or validity evidence for six assessments, 37 of which presented reliability evidence and 43 presented validity evidence. Additionally, we extracted and summarized information related to participant characteristics (e.g., age, sex, diagnosis), geographic location, and research setting (e.g., residential facility, home). Overall, we found conflicting support for the use of the assessments. When coupled with the reported usage by behavior analysis professionals, our findings suggest a misalignment between the reportedly used assessments and the number of published studies providing validity and/or reliability evidence. We found inconsistent use of measurement-related vocabulary and that many studies could have been strengthened by conducting different statistical analyses. We provide a summary of studies, findings, and offer recommendations for clinical practice and future measurement research.
Behavior Modification, 2023 · doi:10.1177/01454455221098151