Assessment & Research

The behavior equivalence problem in within-subject treatment comparisons.

Romer et al. (1988) · Research in Developmental Disabilities
★ The Verdict

Match task difficulty before you compare treatments within one learner so the results show method effects, not task drift.

✓ Read this if you're a BCBA who runs alternating treatments or multielement designs in schools, clinics, or vocational programs.
✗ Skip if you only run between-subject group comparisons or pure baseline-to-treatment designs.

01Research in Context

01

What this study did

Romer et al. (1988) wrote a how-to paper, not an experiment. They asked: how can you compare two teaching methods on the same worker when the jobs themselves differ in difficulty?

Their fix is to build small groups of tasks that feel equally hard. They call these groups item cohorts. You pick one task from each cohort for each method you want to test.

02

What they found

The paper gives a step-by-step recipe. First, rank every task by how many correct responses it takes to finish. Next, bundle tasks with close scores into cohorts. Last, draw one task per cohort for each treatment arm.

This keeps difficulty the same across treatments, so any change you see is more likely from the teaching method, not from the job being easier or harder.
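The three-step recipe can be sketched in code. This is a minimal illustration, not the authors' procedure: the task names, scores, and the cohort width threshold are all hypothetical, and the paper describes the logic in prose rather than as an algorithm.

```python
def build_cohorts(task_scores, width=2):
    """Steps 1-2: rank tasks by difficulty score (e.g., correct responses
    needed to finish), then bundle tasks with close scores into cohorts.

    task_scores: dict mapping task name -> difficulty score.
    Returns a list of cohorts (lists of task names), easiest first.
    """
    ranked = sorted(task_scores, key=task_scores.get)  # rank by difficulty
    cohorts = []
    for task in ranked:
        # start a new cohort when the score drifts past the width threshold
        if cohorts and task_scores[task] - task_scores[cohorts[-1][0]] <= width:
            cohorts[-1].append(task)
        else:
            cohorts.append([task])
    return cohorts

def draw_arms(cohorts, n_treatments=2):
    """Step 3: draw one task per cohort for each treatment arm."""
    arms = [[] for _ in range(n_treatments)]
    for cohort in cohorts:
        if len(cohort) < n_treatments:
            continue  # cohort too small to supply every arm; skip it
        for arm, task in zip(arms, cohort):
            arm.append(task)
    return arms

# Hypothetical vocational tasks scored by correct responses to completion
tasks = {"sort mail": 4, "collate packets": 5, "label boxes": 9, "bag parts": 10}
arm_a, arm_b = draw_arms(build_cohorts(tasks))
# arm_a -> ['sort mail', 'label boxes'], arm_b -> ['collate packets', 'bag parts']
```

Each treatment arm now draws from the same cohorts, so the arms face matched difficulty by construction.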

03

How this fits with other research

McMillan (1973) warned that sequence effects can fool you in within-subject designs. Romer et al. answer that warning with a practical tool: item cohorts.

McLennan et al. (2008) later showed that kids shift toward easier work even when rewards stay equal. Their data back up the very threat Romer et al. set out to solve.

Cariveau et al. (2021) scanned 30 years of adapted alternating treatments studies. They found most papers skip any difficulty check. They echo Romer et al.'s call and extend it to academic targets like sight words.

04

Why it matters

Next time you run an alternating treatments design, take 15 minutes to build item cohorts. List your targets, score them for difficulty, then match them across conditions. This tiny step blocks a major threat to internal validity and makes your data cleaner without extra sessions.

→ Action — try this Monday

Rank your current targets by baseline accuracy and pair the closest scores across your two treatment sets.
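A quick sketch of that Monday action, under made-up target names and baseline accuracies: sort targets by baseline score, then split each adjacent pair across the two treatment sets so matched scores land in different conditions.

```python
def pair_targets(baseline):
    """baseline: dict mapping target name -> baseline accuracy (0.0-1.0).
    Returns two lists of targets with matched difficulty across sets.
    """
    ranked = sorted(baseline, key=baseline.get)  # easiest-to-hardest by accuracy
    set_a, set_b = [], []
    # adjacent targets in the ranking have the closest scores,
    # so one member of each adjacent pair goes to each treatment set
    for a, b in zip(ranked[::2], ranked[1::2]):
        set_a.append(a)
        set_b.append(b)
    return set_a, set_b

# Hypothetical sight-word targets with baseline accuracy
targets = {"went": 0.20, "said": 0.25, "from": 0.60, "have": 0.65}
set_a, set_b = pair_targets(targets)
# set_a -> ['went', 'from'], set_b -> ['said', 'have']
```

If you have an odd number of targets, the leftover one is dropped from the comparison; assign it outside the design rather than letting it unbalance a condition.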

02At a glance

Intervention
not applicable
Design
methodology paper
Finding
not reported

03Original abstract

Within-subject comparisons of multiple treatment effects raise a variety of issues for applied researchers. They include potential nonreversibility of behaviors, practice or habituation effects resulting from repeated presentations of the same stimulus, and the possibility of multiple-treatment interference. It has recently been suggested that the use of item cohorts with equivalent behavioral difficulty addresses those problems. In order to meet the needs of researchers whose primary interest is in domestic, vocational, or other nonacademic skills, a procedure is described for estimating equivalent difficulty for different vocational preparation tasks.

Research in Developmental Disabilities, 1988 · doi:10.1016/0891-4222(88)90007-8