Assessment & Research

Methodological issues in group-matching designs: alpha levels for control variable comparisons and measurement characteristics of control and target variables.

Mervis et al. (2004) · Journal of Autism and Developmental Disorders

★ The Verdict

Stop trusting studies that use age-equivalent scores, or a loose cutoff such as p > .01, to claim groups are matched.

✓ Read this if you are a BCBA who reviews research or trains staff to read studies critically.

✗ Skip if you are an RBT who only implements protocols and doesn't evaluate research quality.

01 Research in Context

What this study did

The authors looked at how researchers pick control groups in developmental-disability studies.

They asked: When we say two groups are 'matched,' what rules should we follow?

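They focused on three problems: setting the right alpha level for matching tests, choosing control and target measures with sound measurement properties (which rules out age-equivalent scores), and interpreting group differences on the target variable once groups are matched.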

What they found

Using age-equivalent scores (like describing a 10-year-old as performing "at a 6-year-old level") can hide real differences between groups.
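One reason age equivalents mislead is that they are not on an equal-interval scale. The Python sketch below illustrates this with a hypothetical norm table; the ages, median raw scores, and the `age_equivalent` helper are invented for illustration and do not come from the paper or any real test. It shows the same 2-point raw-score gain mapping to very different age-equivalent gains depending on where it falls.

```python
# Hypothetical norm table: (age in months, median raw score at that age).
NORMS = [(48, 20), (60, 30), (72, 36), (84, 40), (96, 42)]


def age_equivalent(raw):
    """Return the age (months) whose norm median equals `raw`.

    Uses linear interpolation between adjacent rows of the hypothetical table.
    """
    for (age_lo, raw_lo), (age_hi, raw_hi) in zip(NORMS, NORMS[1:]):
        if raw_lo <= raw <= raw_hi:
            return age_lo + (raw - raw_lo) * (age_hi - age_lo) / (raw_hi - raw_lo)
    raise ValueError("raw score outside the hypothetical norm range")


# The same 2-point raw-score gain, at two points on the scale.
low_gap = age_equivalent(22) - age_equivalent(20)
high_gap = age_equivalent(42) - age_equivalent(40)
print(f"{low_gap:.1f} months vs {high_gap:.1f} months")  # 2.4 months vs 12.0 months
```

In this made-up table, a 2-point gain near the bottom moves the age equivalent by about 2.4 months, while the same gain near the top moves it by a full year. Equal-looking age-equivalent differences between groups therefore need not reflect equal differences in ability.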

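Researchers should use a stricter standard before declaring groups matched: raise the alpha level for the matching test, so groups count as matched only when the p-value is well above .05, not merely nonsignificant. A p-value of .06 or .10 does not show that two groups are equivalent.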
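The alpha-level point can be sketched in Python. This is a minimal illustration under stated assumptions: it uses a large-sample normal approximation rather than a t-test, and the helper names, the scores, and the .50 criterion are hypothetical choices for the example, not values reported here.

```python
from statistics import NormalDist, mean, variance


def matching_p_value(group_a, group_b):
    """Two-sided p-value for a mean difference on a control variable.

    Assumption: large-sample normal approximation with a Welch-style
    standard error. A real analysis would use a t-test instead.
    """
    se = (variance(group_a) / len(group_a) + variance(group_b) / len(group_b)) ** 0.5
    z = (mean(group_a) - mean(group_b)) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def is_matched(group_a, group_b, criterion):
    """Call the groups matched only if p exceeds `criterion`.

    A higher criterion is a stricter standard for claiming a match.
    """
    return matching_p_value(group_a, group_b) > criterion


# Hypothetical control-variable scores (e.g., a nonverbal reasoning measure).
group_a = [100, 104, 96, 102, 98]
group_b = [97, 101, 93, 99, 95]

p = matching_p_value(group_a, group_b)
print(f"p = {p:.3f}")                                              # p = 0.134
print("matched at p > .05:", is_matched(group_a, group_b, 0.05))  # True
print("matched at p > .50:", is_matched(group_a, group_b, 0.50))  # False
```

With these made-up scores the groups pass a conventional "not significantly different" check yet fail the stricter one: the kind of borderline case where the choice of alpha level decides whether a study can claim its groups are matched.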

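The paper lays out concrete steps: measure control variables with scores that have sound measurement properties, use a conservative criterion before calling groups matched, never rely on age equivalents, and consider characterizing the disorder case by case, then aggregating the cases using sensitivity and specificity from signal detection theory.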
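The abstract recommends characterizing a disorder case by case and then aggregating with sensitivity and specificity. A minimal Python sketch, assuming one target score per child and a hypothetical cutoff at or below which a child counts as showing the hypothesized weakness; the cutoff, helper name, and scores are illustrative only:

```python
def sensitivity_specificity(disorder_scores, comparison_scores, cutoff):
    """Classify each child individually, then aggregate across cases.

    Assumption: a target score at or below `cutoff` counts as showing the
    hypothesized weakness (a "positive" case).
    """
    # Sensitivity: share of the disorder group correctly classified as positive.
    true_pos = sum(score <= cutoff for score in disorder_scores)
    # Specificity: share of the comparison group correctly classified as negative.
    true_neg = sum(score > cutoff for score in comparison_scores)
    return true_pos / len(disorder_scores), true_neg / len(comparison_scores)


# Hypothetical target-variable scores.
disorder = [62, 70, 68, 75, 90]
comparison = [85, 92, 72, 88, 95, 80, 70, 91]

sens, spec = sensitivity_specificity(disorder, comparison, cutoff=75)
print(sens, spec)  # 0.8 0.75
```

Unlike a single comparison of group means, this shows how many individual children actually fit the proposed profile and how many comparison children would be misclassified.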

How this fits with other research

Flapper et al. (2013) built on these rules. They said effect sizes and variance ratios matter more than p-values when claiming groups are equivalent.
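The effect-size and variance-ratio checks mentioned above can be computed with Python's standard library. This sketch is illustrative: the helper names and scores are hypothetical, and it deliberately sets no cutoffs for "equivalent," since none are given here.

```python
from statistics import mean, variance


def cohens_d(a, b):
    """Standardized mean difference using the pooled sample standard deviation."""
    pooled_var = ((len(a) - 1) * variance(a) + (len(b) - 1) * variance(b)) / (len(a) + len(b) - 2)
    return (mean(a) - mean(b)) / pooled_var ** 0.5


def variance_ratio(a, b):
    """Ratio of sample variances; 1.0 means equal spread in both groups."""
    return variance(a) / variance(b)


# Hypothetical control-variable scores: a 3-point mean gap,
# with group B four times as variable as group A.
group_a = [100, 104, 96, 102, 98]
group_b = [97, 105, 89, 101, 93]

print(round(cohens_d(group_a, group_b), 2))        # 0.6
print(round(variance_ratio(group_a, group_b), 2))  # 0.25
```

Reporting these alongside the p-value makes it visible when groups have similar means but very different spread, which a single matching test can miss.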

Jarrold et al. (2004) agreed on the core problem but focused on autism studies. They suggested using control-task matching instead of IQ-only matching.

Zhou et al. (2018) showed that many motor-skills studies still use weak matching. Their review suggests that Mervis et al.'s 2004 advice is still often ignored in practice.

Why it matters

Next time you read a study claiming groups are "matched," check whether it used age-equivalent scores or a loose p-value criterion. If it did, the groups may not be as similar as claimed. Checking this protects your clinical decisions from shaky research.


→ Action: try this Monday

Pull the last journal article you read and check its Participants section. If age-equivalent scores were used to claim the groups were matched, flag it as weak evidence.

02 At a glance

Intervention

not applicable

Design

methodology paper

Population

developmental delay

Finding

not reported

03 Original abstract

Group-matching designs are commonly used to identify the diagnosis-specific characteristics of children with developmental disabilities. In this paper, we address three issues central to the use of this design. The first concerns the alpha level to be used for considering groups to be matched on the control variable(s). The second involves the measurement characteristics of the control and target variables. We discuss the properties of standard scores, raw scores, and age equivalents and argue against the use of age equivalents. In addition, we consider the appropriateness of the commonly made prediction that groups that are matched for a control variable such as language ability or nonverbal reasoning ability but are not matched for chronological age should perform at equivalent levels on the target variable. Finally, we discuss issues related to the interpretation of significant between-group differences on the target variable, assuming groups are well-matched on the control variables, and describe the benefits of a method that focuses on characterizing a disorder on a case-by-case basis and then aggregating the cases, using the measures of sensitivity and specificity from signal detection theory.

Journal of Autism and Developmental Disorders, 2004 · doi:10.1023/b:jadd.0000018069.69562.b8