Assessing the effectiveness of programmed generalization.
Always adjust generalization data for baseline levels or you may celebrate a failure.
Research in Context
What this study did
Kelly (1973) re-examined a well-known generalization study by Walker and Buckley (1972). The original paper claimed big maintenance gains, but those gains shrank once you adjusted for where the kids started.
The paper is a warning, not new data. It tells analysts to use change scores or ANCOVA before they brag about generalization.
What they found
Without baseline adjustment, the program looked like a winner. After adjustment, the same numbers said the program barely worked.
One simple math switch flipped the whole conclusion.
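A toy example shows the switch. These numbers are made up for illustration, not taken from the study:

```python
# Hypothetical numbers (not the study's actual data), showing how a
# baseline adjustment can flip which condition looks better.
treatment = {"baseline": 62.0, "follow_up": 77.0}
control   = {"baseline": 48.0, "follow_up": 67.0}

# Unadjusted follow-up scores: treatment looks clearly better.
raw_diff = treatment["follow_up"] - control["follow_up"]         # +10 points

# Change scores (post minus pre): the control group actually gained more.
treatment_gain = treatment["follow_up"] - treatment["baseline"]  # 15 points
control_gain   = control["follow_up"] - control["baseline"]      # 19 points
```

Same data, one subtraction added, opposite conclusion.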
How this fits with other research
Pigott (1987) extends the same worry and adds a scatterplot check: plot the treated behavior against the untreated behavior during baseline. If the two already move together, your "generalization" might be pre-existing covariation, not a treatment effect.
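Pigott's check boils down to a baseline correlation between the two behaviors. A minimal sketch, using invented session values:

```python
from statistics import fmean

def pearson(xs, ys):
    """Pearson correlation between two equal-length series."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Hypothetical baseline sessions for a treated and an untreated behavior.
treated   = [10, 12, 15, 14, 18, 20]
untreated = [ 8, 11, 13, 13, 17, 19]

r = pearson(treated, untreated)
# An r near 1 means the behaviors already covaried before treatment,
# so later "generalization" may be pre-existing, not programmed.
```

Here r comes out above 0.9, which on Pigott's logic should make you cautious before crediting the program.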
Scheithauer et al. (2020) test the idea in functional analysis. They show that different baseline sources give the same treatment decision. This sounds like it clashes with Kelly (1973), but it doesn’t. Scheithauer compared baseline sources, not baseline adjustment. Both papers agree: pick your baseline carefully, then analyze it right.
Grauerholz-Fisher et al. (2023) show another baseline trap. Multiple-opportunity probes inflate baseline skill scores in task analyses. Again, the method you pick changes the story you tell.
Why it matters
Before you claim a skill "generalized," subtract where the client started. Run an ANCOVA or compute simple change scores, even in Excel. Do it even when the graph looks great. This five-minute step can save you from picking the wrong intervention or over-selling results to parents and funders.
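The covariance-adjustment idea fits in a few lines. This is a minimal sketch, not a full ANCOVA with an F-test, and the scores below are invented:

```python
from statistics import fmean

def adjusted_means(baseline, post, group):
    """Adjust post scores for baseline (the covariate) using a pooled
    regression slope, then average the adjusted scores per group."""
    bmean = fmean(baseline)
    pmean = fmean(post)
    slope = (sum((b - bmean) * (p - pmean) for b, p in zip(baseline, post))
             / sum((b - bmean) ** 2 for b in baseline))
    adj = [p - slope * (b - bmean) for b, p in zip(baseline, post)]
    return {g: fmean([a for a, gg in zip(adj, group) if gg == g])
            for g in sorted(set(group))}

# Invented scores: the treated group starts higher, so its raw advantage
# at follow-up mostly reflects its baseline, not the program.
baseline = [62, 60, 64, 48, 50, 46]
post     = [77, 75, 79, 67, 69, 65]
group    = ["treat", "treat", "treat", "ctrl", "ctrl", "ctrl"]

means = adjusted_means(baseline, post, group)
# After adjustment, the treated group's raw 10-point edge disappears.
```

With these numbers the adjusted means land within a fraction of a point of each other, which is exactly the kind of reversal Kelly warned about.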
Open your last generalization graph and re-calculate gains as post-minus-pre scores.
Original abstract
Issues related to assessing change and retention of change were discussed. An alternative analysis was suggested for the data of a recent study by Walker and Buckley (1972). These authors had found that peer reprogramming, equating stimulus conditions, teacher training, and control groups maintained 77, 74, 69, and 67%, respectively, of appropriate behavior produced in a token economy. Their analysis made no use of baseline levels. Two analyses incorporating baseline scores were suggested. One involved change scores; the other, analysis of covariance using baselines as the covariate. Problems with the data made a clear preference difficult, but it was concluded that either analysis would have resulted in conclusions different from those of Walker and Buckley.
Journal of Applied Behavior Analysis, 1973 · doi:10.1901/jaba.1973.6-713