Assessment & Research

Is the Intellectual Functioning Component of AAIDD's 12th Manual Satisficing?

McGrew (2021) · Intellectual and Developmental Disabilities
★ The Verdict

The AAIDD 12th manual endorses CHC theory but gives conflicting advice on using part scores in place of full-scale IQ—clinicians must look elsewhere for clear rules.

✓ Read this if you are a BCBA who completes or reviews ID assessments for school or waiver eligibility.
✗ Skip if you are an RBT who only runs skill-acquisition programs and never sees assessment reports.

01 Research in Context

01

What this study did

The author read the AAIDD 12th manual and asked a simple question: does the IQ section give clear steps for diagnosing intellectual disability?

The paper is a position piece. It walks through the manual’s advice on using part-scores instead of full-scale IQ. No new data were collected.

02

What they found

The manual praises CHC theory but then gives fuzzy rules. It says you can swap in index scores, yet never tells you when or how.

The result is a recipe for different clinicians to call the same person ID or not ID.

03

How this fits with other research

Howard et al. (2023) reviewed ABA-based assessments and found most lack solid reliability or validity evidence. Together, the two papers warn that both AAIDD guidance and common ABA tools are shaky.

Mottron (2004) showed that the PPVT and Raven over-estimate IQ in autism. McGrew (2021) widens the worry, arguing that any part score can mislead if the rules for using it are unclear.

Colbert et al. (2020) offer a possible fix: a quick 15-minute RAI+ that tracks relational skills. Together, these papers point toward shorter, function-based IQ proxies instead of vague part-score swaps.

04

Why it matters

If you evaluate ID, you now know the AAIDD manual alone will not tell you which IQ number to use. Check the psychometric evidence for every subtest, and write your justification in the report. Until clearer rules arrive, lean on full-scale IQ or validated brief tools like the RAI+ rather than mixing index scores on a hunch.

→ Action — try this Monday

Pull your last three ID reports and add one sentence that cites the specific IQ test manual for any part-score you used.

02 At a glance

Intervention: not applicable
Design: theoretical
Population: intellectual disability
Finding: not reported

03 Original abstract

In the world of design and decision making, perfect or optimal solutions typically only work in simplified worlds. In the complex constraint-driven nature of reality, satisfactory and sufficient (satisficing) designs and decisions are the norm (Leahey, 2003; Simon, 1956). Clearly a manual produced with input from a large and diverse committee of experts and stakeholders, when committees have been characterized as “a cul-de-sac down which ideas are lured and then quietly strangled” (Barnett Cocks, 1973), will not be perfect.

Given this context, the opinions expressed here are based on the author's 45+ years of experience, including 12 years as a practicing school psychologist, an intelligence researcher and scholar, a university professor, an author of a major intelligence test (Woodcock-Johnson IV [WJ IV]), and a frequent expert regarding the intelligence quotient (IQ) prong in intellectual disability (ID) death penalty cases following Atkins v. Virginia. This author offers opinions on whether the American Association on Intellectual and Developmental Disabilities (AAIDD) 12th edition of Intellectual Disability: Definition, Diagnosis, Classification, and Systems of Support (Schalock et al., 2021; hereafter referred to as the purple manual) provides satisficing treatment on a handful of select issues. The complex and unresolved issue of using part scores as proxies of general intelligence (g) receives the largest discussion.

Yes. Satisficing. Grade B+. As I argued in 2009/2010 (tinyurl.com/5dsaqh43), the Cattell-Horn-Carroll (CHC) theory of intelligence was, at the time of the 11th edition of the AAIDD manual (hereafter referred to as the green manual; Schalock et al., 2010), the consensus taxonomy of cognitive abilities (Floyd et al., 2021; McGrew, 2005, 2009, 2015; Schneider & McGrew, 2012, 2018; Watson, 2015).
The purple manual now recognizes this consensus by stating, “The approach to intellectual assessment used in this manual incorporates the Cattell-Horn-Carroll theory of intelligence, which is currently the most comprehensive and empirically supported theory of intelligence” (Schalock et al., 2021, p. 25). The AAIDD purple manual IQ prong is now firmly grounded in contemporary intelligence theory and research evidence.

This author frequently finds CHC analysis of IQ scores from different IQ tests, or an earlier version from a series of related tests (e.g., various editions of the Wechsler Intelligence Scale for Children and the Wechsler Adult Intelligence Scale; McGrew, 2015; Watson, 2015), useful when explaining to others the actual consistency of an individual's abilities across time or tests that, at first blush, may appear as inconsistency when only attending to full-scale IQ scores. AAIDD's formal recognition of the CHC theory supports this type of analysis. I would have liked to have seen the inclusion of a CHC model figure (of which many exist in a variety of publications) and a brief table of CHC broad ability construct definitions. Users will need to consult other sources such as Floyd et al. (2021) and Schneider and McGrew (2018).

An A grade was not assigned given the manual's inadvertent muddying of the CHC waters. The CHC glossary definition only mentions fluid (Gf) and crystallized (Gc) intelligence. Gf and Gc are also the only broad CHC abilities mentioned on pages 25–28 (save one exception noted below) and are featured in Table 3 of the manual. Gf and Gc are indeed the consensus king and queen of the CHC taxonomy (McGrew, 2015; Schneider & McGrew, 2012, 2018). “These factors appear to have a degree of centrality in relationship to other intellectual abilities and are the broad ability factors most closely associated with the general factor of intelligence” (emphasis added; Watson, 2015, p. 128).
However, the CHC abilities comprising most contemporary IQ tests may also include Gv (visual-spatial processing), Ga (auditory processing), Gwm (short-term working memory), Gs (processing speed), Gl (learning efficiency), or Gr (retrieval fluency) abilities (McGrew, 2015; Schneider & McGrew, 2012, 2018). (Please note that this author would typically not mention such a minor error, but the manual incorrectly references learning efficiency [Gl] as “efficiency,” as described in the AAIDD-referenced source [Schneider & McGrew, 2018]. However, as discussed later, the large number of copyedit errors in the purple manual tarnishes its authoritative stature. Another example is that when mentioning crystallized intelligence across pages 25–28 and in the CHC definition in the glossary [p. 118], the term is used 12 times, and is incorrectly spelled crystalized [sic] in 11 of the 12 instances.)

The manual mentions these “additional” abilities after the king and queen (Gf and Gc) are first anointed as the basis of the full-scale score that represents general intelligence—“the full-scale IQ score is based on general intelligence (i.e., g) that encompasses crystalized [sic] intelligence and fluid intelligence part scores, along with as many as six additional broad-strata abilities” (emphasis added; p. 27). The repeated reference to and preferential treatment of fluid and crystallized intelligence suggests AAIDD implicitly or explicitly endorses only a partial CHC model, or is trying to straddle a theoretical or measurement issue fence (e.g., see discussion of part scores vs. full-scale IQ scores).

Yes and no. Marginally satisficing. Grade B-.
In high-stakes IQ prong settings (e.g., social security and special education eligibility, Atkins death penalty cases), certain measurement issues almost always require attention. The purple manual, either in the glossary, text, or in both, provides satisficing treatment of SEM, confidence band intervals, norm obsolescence or the Flynn effect, and practice effects. However, this author was frustrated when trying to locate clear AAIDD descriptions or guidance for certain measurement issues. For example, the explanation of practice effects, although not in the green manual glossary, was listed in the green manual topic index that conveniently directed the reader to page 38 for a definition and practice guidance. Practice effects are now included under progressive error in the purple manual glossary, which may not be immediately apparent or familiar to all users, and only receive a mention in Table 3.5 and a passing comment on page 39 (“frequent re-administrations may lead to overestimating the examinee's true intelligence [i.e., practice effects]”). The description of the Flynn effect is mysteriously placed under the second-level subheading of “Making a Retrospective Diagnosis.” A Flynn effect adjustment is a function of the time between when an IQ test was administered and the norming year of the test, temporally orthogonal to whether a potential diagnosis is retrospective or not.

There are other examples of the need to search for the needle (i.e., term, definition, practice guidance) in the haystack (the body of the manual) that could be resolved by retaining the topic index from the green manual. The inability to readily locate the corpus of AAIDD's treatment of key measurement terms and concepts in a coherent, tractable manner is frustrating.
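For readers less familiar with these measurement issues, the SEM, confidence band, and Flynn effect adjustments named above follow standard psychometric formulas. A minimal sketch, assuming the conventional IQ metric (SD = 15), a reliability coefficient taken from a test manual, and the commonly cited 0.3-points-per-year Flynn correction; the function names and the worked numbers are illustrative, not values from the AAIDD manual:

```python
import math

def sem(sd: float, reliability: float) -> float:
    """Standard error of measurement: SD * sqrt(1 - rxx)."""
    return sd * math.sqrt(1.0 - reliability)

def confidence_band(score: float, sd: float, reliability: float, z: float = 1.96):
    """Symmetric confidence band around an obtained score (z = 1.96 for 95%)."""
    half = z * sem(sd, reliability)
    return (score - half, score + half)

def flynn_adjusted(score: float, admin_year: int, norm_year: int,
                   points_per_year: float = 0.3) -> float:
    """Subtract the conventional 0.3 IQ points per year of norm obsolescence."""
    return score - points_per_year * (admin_year - norm_year)

# Illustration: obtained FSIQ 75, test reliability .97, normed 10 years
# before administration.
lo, hi = confidence_band(75, 15, 0.97)     # approx. (69.9, 80.1)
adjusted = flynn_adjusted(75, 2013, 2003)  # 72.0
```

On these illustrative numbers, the 95% band straddles the conventional cutoff region, and the Flynn correction moves the score by 3 points, which is exactly why the manual's treatment of these adjustments matters in high-stakes eligibility decisions.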
This annoyance is exacerbated when attempting to crosswalk the same measurement concepts between the green and purple manuals. Finally, many high-stakes ID cases include case files with multiple IQ scores across time or from different IQ tests. Some form of guidance, at minimum a passing reference, to the issues of the convergence of indicators and IQ score exchangeability would have been useful. Users will need to go beyond the AAIDD manual for guidance (see Floyd et al., 2021; McGrew, 2015; and Watson, 2015).

Not satisficing. Grade C. This marginally passing grade is due to AAIDD's part-score position: (a) being inconsistent and confusing within the manual, (b) being at variance with other authoritative sources, and (c) not recognizing central scientific and legal tenets that underlie the complex issue. AAIDD needs to address the part-score issue with preemptive vigor to mitigate confusion and potential misuse of its ambiguous statements. Otherwise, legal entities may fill the void and prescribe a variety of case-specific remedies of dubious quality.

The manual states that “Part scores should not be used in determining whether the individual's level of intellectual functioning … the current evidence indicates that there is no reason to question the validity of the full-scale IQ, even in individual cases where there is significant part/factor score profiles” (emphasis added; p. 28). The “just say no to part scores” position seems clear in the statement that “Gf or Gc scores [sic] should not be used as a proxy for general intelligence, even in unusual cases, such as when there is a substantial spread of subtest scores” (emphasis added; p. 28). (Please note the manual's inattention to what I call “misplaced italics” [e.g., Gc scores] is, unfortunately, frequent in the manual. I comment on this and other numerous copyedit errors later.)
Yet, in the next sentence there is a suggestion that Gf and Gc part scores can be used: “Consistent with current thinking … the valid use of intelligence part scores requires at least 3-6 subtests [emphasis added] of Gf and [sic] Gc” (p. 28). Furthermore, by featuring crystallized and fluid intelligence part or factor scores from common IQ tests in Table 3.2, there is the implicit suggestion that fluid and crystallized part scores hold special value. AAIDD's ambiguous part-score statements only muddy an already contentious and complex issue in high-stakes ID diagnostic settings.

In AAIDD's The Death Penalty and Intellectual Disability (Polloway, 2015), both McGrew (2015) and Watson (2015) suggest that part scores can be used in special cases. (Note that these two chapters, although published in an AAIDD book, do not necessarily represent the official position of AAIDD.) The limited use of part scores is also described in the 2002 National Research Council book on ID and social security eligibility (see McGrew, 2015; Watson, 2015). The authoritative Diagnostic and Statistical Manual of Mental Disorders—Fifth Edition (DSM-5) implies that part scores may be necessary when it states that “highly discrepant subtest scores may make an overall IQ score invalid” (American Psychiatric Association, 2013, p. 37). Finally, in the recent APA Handbook of Intellectual and Developmental Disabilities (Glidden, 2021), Floyd et al. state that “in rare situations in which the repercussions of a false negative diagnostic decision would have undue or irreparable negative impact upon the client, a highly g-loaded part score (see McGrew, 2015a) might be selected to represent intellectual functioning” (emphasis added; p. 412).

Specifying (either implicitly or explicitly) fluid and crystallized intelligence measures as the most valid g-proxies for unique cases fails to recognize an important distinction between gf/gc and Gf/Gc.
As written, the purple manual references the broad Gf and Gc abilities as per contemporary CHC theory. However, it is not often understood that Horn and Carroll's broad Gf and Gc abilities are not isomorphic with Cattell's two gf/gc general abilities, constructs that are more consistent with the notion of general intelligence (g) as articulated by Cattell's mentor, Spearman (see Schneider & McGrew, 2018). (Note that CHC or the three-stratum Gf-Gc theory differentiates abilities at three levels [strata] of generality. General intelligence [g] is the most general and is at the apex [stratum III] of the hierarchy. Broad CHC abilities [Gf, Gc, Gv, etc.] are at stratum II, and narrow abilities at stratum I are subsumed by the broad abilities.) The purple manual's deference to fluid and crystallized intelligence, and particularly the passing mention of both abilities as potentially suitable part scores to represent general intelligence (see page 28), has a clear Cattell general ability (stratum III) construct ring, not the narrower (broad) notions of CHC Gf and Gc associated with the CHC theory endorsed in the manual. Perhaps this disconnect is the reason for the manual's ambiguous and contradictory treatment of fluid and crystallized intelligence g-proxy part scores.

AAIDD needs to provide guidance regarding whether g-proxy measures should be more broad-like CHC Gf and Gc composites present in most contemporary IQ batteries or more Cattell-like general gf/gc composites. For example, the Wechsler Intelligence Scale for Children Fifth Edition (WISC-V) provides four-subtest Expanded Verbal (crystallized intelligence; Gc) and Expanded Fluid Index (Gf) scores that are consistent with the broad Gc and Gf CHC constructs. The Woodcock-Johnson III (WJ III) had a four-test Thinking Ability cluster that was more akin to Cattell's general gf, as it was composed of tests that measured Gf, Gv, Ga, and Glr (now split into Gl and Gr).
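Composites like the expanded indexes and clusters just described rest on standard linear-combination psychometrics: the sum of k subtest z-scores has a variance equal to the sum of all entries of the subtest correlation matrix, and the composite standard score is that sum rescaled back to the IQ metric. A minimal sketch of the calculation; the correlation values below are made-up illustrations, not published WISC-V or WJ IV data:

```python
import math

def composite_standard_score(scores, corr, mean=100.0, sd=15.0):
    """Composite standard score for subtests on the same (mean, sd) metric.

    The sum of k z-scores has variance equal to the sum of all entries of
    the subtest correlation matrix (diagonal = 1), so the composite is the
    z-score sum rescaled to the original metric.
    """
    k = len(scores)
    z_sum = sum((s - mean) / sd for s in scores)
    var_sum = sum(corr[i][j] for i in range(k) for j in range(k))
    return mean + sd * z_sum / math.sqrt(var_sum)

# Three hypothetical Gf subtest scores with illustrative intercorrelations of .6
corr = [[1.0, 0.6, 0.6],
        [0.6, 1.0, 0.6],
        [0.6, 0.6, 1.0]]
print(round(composite_standard_score([70, 75, 72], corr), 1))  # prints 67.7
```

Note that the composite (67.7) is more extreme than the average of its parts (about 72.3): combining correlated low scores pushes the composite further from the mean. Averaging part scores by eye, without formulas of this kind, understates that effect, which is one reason unguided part-score mixing can mislead.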
Interestingly, the Comprehensive Test of Nonverbal Intelligence—Second Edition (CTONI-2), typically considered a special purpose test, produces a six-test Gf-like score that is likely a more robust Gf measure than any Gf score from any individually administered IQ test. The popular CHC cross-battery assessment and interpretation methods and software (Flanagan et al., 2013) allow users to generate unique mixtures of broad CHC-like Gf and Gc scores across multiple test batteries from as many individual tests as a psychologist desires. Schneider (2013) has also provided formulas (and software tools; tinyurl.com/3oj2wu79; tinyurl.com/nn11zg81) to calculate statistically sound clusters for any mixture of tests.

With the clear movement to flexible tablet-based digital test libraries and centralized online scoring platforms, publishers are soon likely to provide a menu-driven test selection approach where users can obtain broad CHC-like Gf and Gc scores based on three to four (or more) tests from the same battery of co-normed tests, or across different test batteries within a publisher's stable of test products. For the test battery this author coauthors (WJ IV), three-test Gf and Gc CHC broad clusters are available. By ignoring the WJ IV packaging boundaries of the Cognitive, Oral Language, and Achievement batteries, with minimal psychometric work and a software patch to the online scoring platform, four-test Gf and up to seven-test Gc (Schneider, 2016) IQ scores could readily be made available. Depending on which broad CHC abilities one considers as representing a general Cattell gf score (e.g., Gf, Gv, Ga, Gl, Gr), the current WJ IV could generate such a score based on five to approximately a dozen tests. Without theoretically and psychometrically sound guidance, there is the increased possibility of fluid and crystallized part-score IQ roulette.

The core part- vs.
full-scale IQ score issue, in part, reflects a fundamental tension between science and law. “While science attempts to discover universals hiding among the particulars, trial courts attempt to discover the particulars hiding among the universals” (Faigman, 1999, p. 69). A central issue is whether the scientific principle of ergodicity holds. In simple terms, do group-based research findings generalize or remain invariant when applied to individuals (Fisher et al., 2018; Gomes et al., 2019)? In the courts this is referred to as the group-to-individual or G2i principle (Faigman et al., 2017; National Academies of Sciences, Engineering, and Medicine, 2018). Group-based research consistently suggests that discrepant part scores do not invalidate full-scale IQ scores (Floyd et al., 2021). However, it has not been established, with any degree of certainty, that these group-based findings hold for any specific individual. In fact, almost all psychological processes are nonergodic (Gomes et al., 2019). In a unique n = 1 high-stakes setting, a psychologist may be ethically obligated to proffer an expert opinion on whether the full-scale score is (or is not) the best indicator of general intelligence. There must be room for the judicious use of clinical judgment-based part scores. AAIDD's purple manual complicates rather than elucidates guidance for psychologists and the courts. In high-stakes settings, a psychologist may be hard pressed to explain that their proffered expert opinions are grounded in the AAIDD purple manual, but then explain why they disagree with the “just say no to part scores” AAIDD position.

The theoretical construct of general intelligence (g) is the Loch Ness Monster of psychology. Since the early 1900s, psychologists have been searching for the theoretical basis of g in the form of a brain-based property, entity, or mechanism, to no avail.
There is a distinction made, typically overlooked in applied settings, between psychometric g (represented by a full-scale IQ score) and theoretical g (i.e., a brain-wide property or entity that produces psychometric g). Emerging contemporary research focused on brain networks, dynamic mutualism, and process overlap theories provides compelling evidence that theoretical g may not exist (Barbey, 2018; Kan et al., 2019; Schneider & McGrew, 2018). This research suggests that psychometric g is an emergent property rather than a causal entity: the full-scale IQ score is a statistical index of the positive correlations among cognitive ability tests and does not represent an entity in the brain. Some of this research further suggests that certain part scores (e.g., gf) may possess more coherent psychometric and theoretical validity than the full-scale IQ score. The fundamental issue is that the status of g as a theoretical construct remains unresolved, and AAIDD's ambiguous part-score statements confuse more than they clarify.

The construction of theoretically and psychometrically sound Gf and Gc (or gf/gc) composite scores raises at least three questions: (a) how many tests, at a minimum, should contribute to these fluid and crystallized psychometric g-proxy composites; (b) whether these composites should align more with the broad Gf and Gc abilities as per CHC theory or with the general Cattell gf/gc constructs; and (c) what psychometric methods are appropriate for constructing such scores (e.g., only composites of tests from the same co-normed battery, composites of tests with statistically linked norms, or composites calculated from formulas for tests that do not share common norms).

Not satisficing. This author has identified numerous copyedit errors, and these are only the errors noted in the pages devoted to the IQ prong. Several are the “misplaced italics” errors described earlier; others involve inconsistent or incorrect references and spellings. Copyedit errors of this number and variety should not be present in what is intended to be the authoritative source on ID. They may suggest that the manual was rushed to publication, and they tarnish both the purple manual and the stature of AAIDD.

Because a full-scale IQ score may not always be the best psychometric proxy for an individual's general intellectual functioning, this author withholds a higher grade for the IQ prong of the purple manual. In this author's opinion, AAIDD's formal recognition of the CHC theory of intelligence is the most important improvement to the intellectual functioning prong of the purple manual. The weakest part of the intellectual functioning section is AAIDD's g-proxy part-score position, a position at variance with most other authoritative sources and one that fails to recognize the central scientific and legal evidence issues. For the sake of individuals with unique cognitive profiles, it is hoped that more robust AAIDD part-score guidance will be available before the next edition of the manual is published.

Disclosure: The author is a coauthor of the WJ IV, an individually administered IQ test mentioned in this commentary and often used in the diagnosis of ID. The author thanks Floyd for comments on an earlier version of this article.

Intellectual and Developmental Disabilities, 2021 · doi:10.1352/1934-9556-59.5.369