Assessing the effectiveness of programmed generalization.
Always adjust generalization data for baseline levels or you may celebrate a failure.
Research in Context
What this study did
Kelly (1973) re-examined a well-known generalization study by Walker and Buckley (1972). The original paper claimed big maintenance gains, but those gains shrank once you adjusted for where the kids started.
The paper is a warning, not new data. It tells analysts to use change scores or ANCOVA before they brag about generalization.
What they found
Without baseline adjustment, the program looked like a winner. After adjustment, the same numbers said the program barely worked.
One simple math switch flipped the whole conclusion.
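A toy example shows the switch. These numbers are made up for illustration, not taken from the study:

```python
# Hypothetical numbers (not the study's actual data), showing how a
# baseline adjustment can flip which condition looks better.
treatment = {"baseline": 62.0, "follow_up": 77.0}
control   = {"baseline": 48.0, "follow_up": 67.0}

# Unadjusted follow-up scores: treatment looks clearly better.
raw_diff = treatment["follow_up"] - control["follow_up"]         # +10 points

# Change scores (post minus pre): the control group actually gained more.
treatment_gain = treatment["follow_up"] - treatment["baseline"]  # 15 points
control_gain   = control["follow_up"] - control["baseline"]      # 19 points
```

Same data, one subtraction added, opposite conclusion.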
How this fits with other research
Pigott (1987) extends the same worry and adds a scatterplot check: plot the treated behavior against the untreated behavior during baseline. If the two already move together, your "generalization" might be pre-existing covariation, not a treatment effect.
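Pigott's check boils down to a baseline correlation between the two behaviors. A minimal sketch, using invented session values:

```python
from statistics import fmean

def pearson(xs, ys):
    """Pearson correlation between two equal-length series."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical baseline sessions for a treated and an untreated behavior.
treated   = [10, 12, 15, 14, 18, 20]
untreated = [ 8, 11, 13, 13, 17, 19]

r = pearson(treated, untreated)
# An r near 1 means the behaviors already covaried before treatment,
# so later "generalization" may be pre-existing, not programmed.
```

Here r comes out above 0.9, which on Pigott's logic should make you cautious before crediting the program.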
Scheithauer et al. (2020) test the idea in functional analysis. They show that different baseline sources give the same treatment decision. This sounds like it clashes with Kelly (1973), but it doesn’t. Scheithauer compared baseline sources, not baseline adjustment. Both papers agree: pick your baseline carefully, then analyze it right.
Grauerholz-Fisher et al. (2023) show another baseline trap. Multiple-opportunity probes inflate baseline skill scores in task analyses. Again, the method you pick changes the story you tell.
Why it matters
Before you claim a skill "generalized," subtract where the client started. Run an ANCOVA or compute simple change scores, even in Excel. Do it even when the graph looks great. This five-minute step can save you from picking the wrong intervention or over-selling results to parents and funders.
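The covariance-adjustment idea fits in a few lines. This is a minimal sketch, not a full ANCOVA with an F-test, and the scores below are invented:

```python
from statistics import fmean

def adjusted_means(baseline, post, group):
    """Adjust post scores for baseline (the covariate) using a pooled
    regression slope, then average the adjusted scores per group."""
    bmean = fmean(baseline)
    pmean = fmean(post)
    slope = (sum((b - bmean) * (p - pmean) for b, p in zip(baseline, post))
             / sum((b - bmean) ** 2 for b in baseline))
    adj = [p - slope * (b - bmean) for b, p in zip(baseline, post)]
    return {g: fmean([a for a, gg in zip(adj, group) if gg == g])
            for g in sorted(set(group))}

# Invented scores: the treated group starts higher, so its raw advantage
# at follow-up mostly reflects its baseline, not the program.
baseline = [62, 60, 64, 48, 50, 46]
post     = [77, 75, 79, 67, 69, 65]
group    = ["treat", "treat", "treat", "ctrl", "ctrl", "ctrl"]

means = adjusted_means(baseline, post, group)
# After adjustment, the treated group's raw 10-point edge disappears.
```

With these numbers the adjusted means land within a fraction of a point of each other, which is exactly the kind of reversal Kelly warned about.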
Open your last generalization graph and re-calculate gains as post-minus-pre scores.
Original abstract
Issues related to assessing change and retention of change were discussed. An alternative analysis was suggested for the data of a recent study by Walker and Buckley (1972). These authors had found that peer reprogramming, equating stimulus conditions, teacher training, and control groups maintained 77, 74, 69, and 67%, respectively, of appropriate behavior produced in a token economy. Their analysis made no use of baseline levels. Two analyses incorporating baseline scores were suggested. One involved change scores; the other, analysis of covariance using baselines as the covariate. Problems with the data made a clear preference difficult, but it was concluded that either analysis would have resulted in conclusions different from those of Walker and Buckley.
Journal of Applied Behavior Analysis, 1973 · doi:10.1901/jaba.1973.6-713