Assessment & Research

The behavior equivalence problem in within-subject treatment comparisons.

Romer et al. (1988) · Research in Developmental Disabilities
★ The Verdict

Match task difficulty before you compare treatments within one learner so the results show method effects, not task drift.

✓ Read this if you're a BCBA who runs alternating treatments or multielement designs in schools, clinics, or vocational programs.
✗ Skip if you only run between-subject group comparisons or pure baseline-to-treatment designs.

01Research in Context

01

What this study did

Romer et al. (1988) wrote a how-to paper, not an experiment. They asked: how can you compare two teaching methods on the same worker when the jobs themselves differ in difficulty?

Their fix is to build small groups of tasks that feel equally hard. They call these groups item cohorts. You pick one task from each cohort for each method you want to test.

02

What they found

The paper gives a step-by-step recipe. First, rank every task by how many correct responses it takes to finish. Next, bundle tasks with close scores into cohorts. Last, draw one task per cohort for each treatment arm.

This keeps difficulty the same across treatments, so any change you see is more likely from the teaching method, not from the job being easier or harder.
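The three-step recipe can be sketched in code. This is a minimal illustration, not the authors' procedure: the task names, scores, and the cohort width threshold are all hypothetical, and the paper describes the logic in prose rather than as an algorithm.

```python
def build_cohorts(task_scores, width=2):
    """Steps 1-2: rank tasks by difficulty score (e.g., correct responses
    needed to finish), then bundle tasks with close scores into cohorts.

    task_scores: dict mapping task name -> difficulty score.
    Returns a list of cohorts (lists of task names), easiest first.
    """
    ranked = sorted(task_scores, key=task_scores.get)  # rank by difficulty
    cohorts = []
    for task in ranked:
        # start a new cohort when the score drifts past the width threshold
        if cohorts and task_scores[task] - task_scores[cohorts[-1][0]] <= width:
            cohorts[-1].append(task)
        else:
            cohorts.append([task])
    return cohorts

def draw_arms(cohorts, n_treatments=2):
    """Step 3: draw one task per cohort for each treatment arm."""
    arms = [[] for _ in range(n_treatments)]
    for cohort in cohorts:
        if len(cohort) < n_treatments:
            continue  # cohort too small to supply every arm; skip it
        for arm, task in zip(arms, cohort):
            arm.append(task)
    return arms

# Hypothetical vocational tasks scored by correct responses to completion
tasks = {"sort mail": 4, "collate packets": 5, "label boxes": 9, "bag parts": 10}
arm_a, arm_b = draw_arms(build_cohorts(tasks))
# arm_a -> ['sort mail', 'label boxes'], arm_b -> ['collate packets', 'bag parts']
```

Each treatment arm now draws from the same cohorts, so the arms face matched difficulty by construction.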

03

How this fits with other research

McMillan (1973) warned that sequence effects can fool you in within-subject designs. Romer et al. answer that warning with a practical tool: item cohorts.

McLennan et al. (2008) later showed that kids shift toward easier work even when rewards stay equal. Their data back up the very threat Romer et al. set out to solve.

Cariveau et al. (2021) scanned 30 years of adapted alternating treatments studies. They found most papers skip any difficulty check. They echo Romer et al.'s call and extend it to academic targets like sight words.

04

Why it matters

Next time you run an alternating treatments design, take 15 minutes to build item cohorts. List your targets, score them for difficulty, then match them across conditions. This tiny step blocks a major threat to internal validity and makes your data cleaner without extra sessions.

→ Action — try this Monday

Rank your current targets by baseline accuracy and pair the closest scores across your two treatment sets.
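A quick sketch of that Monday action, under made-up target names and baseline accuracies: sort targets by baseline score, then split each adjacent pair across the two treatment sets so matched scores land in different conditions.

```python
def pair_targets(baseline):
    """baseline: dict mapping target name -> baseline accuracy (0.0-1.0).
    Returns two lists of targets with matched difficulty across sets.
    """
    ranked = sorted(baseline, key=baseline.get)  # easiest-to-hardest by accuracy
    set_a, set_b = [], []
    # adjacent targets in the ranking have the closest scores,
    # so one member of each adjacent pair goes to each treatment set
    for a, b in zip(ranked[::2], ranked[1::2]):
        set_a.append(a)
        set_b.append(b)
    return set_a, set_b

# Hypothetical sight-word targets with baseline accuracy
targets = {"went": 0.20, "said": 0.25, "from": 0.60, "have": 0.65}
set_a, set_b = pair_targets(targets)
# set_a -> ['went', 'from'], set_b -> ['said', 'have']
```

If you have an odd number of targets, the leftover one is dropped from the comparison; assign it outside the design rather than letting it unbalance a condition.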

02At a glance

Intervention
not applicable
Design
methodology paper
Finding
not reported

03Original abstract

Within-subject comparisons of multiple treatment effects raise a variety of issues for applied researchers. They include potential nonreversibility of behaviors, practice or habituation effects resulting from repeated presentations of the same stimulus, and the possibility of multiple-treatment interference. It has recently been suggested that the use of item cohorts with equivalent behavioral difficulty addresses those problems. In order to meet the needs of researchers whose primary interest is in domestic, vocational, or other nonacademic skills, a procedure is described for estimating equivalent difficulty for different vocational preparation tasks.

Research in Developmental Disabilities, 1988 · doi:10.1016/0891-4222(88)90007-8