Assessment & Research

Heterogeneity and the design of genetic studies in autism.

Sutcliffe (2008) · Autism research : official journal of the International Society for Autism Research

★ The Verdict

Small, ancestry-mixed gene studies miss real autism genes. Go big or go home.

✓ Read this if you are a BCBA who fields parent questions about genetic testing or research flyers.

✗ Skip if you are a clinician interested only in behavior intervention data.

01 Research in Context

What this study did

Sutcliffe (2008) looked at past autism gene hunts and asked, “Why do we keep missing real genes?”

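The author wrote a roadmap. It says future studies must enroll hundreds to thousands of families. Teams must characterize ancestry with genetic markers, not just a reported ethnicity breakdown, and stratify their analyses accordingly. They must also screen for every class of DNA change: common variants, rare variants, and copy number variation (CNV).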

What they found

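The paper argues that small samples with unaddressed ancestry differences create false negatives. In plain words, we say “no gene here” when a gene really exists. Small samples cut the other way too: a positive result from one is more likely to be spurious than a positive result from a large, well-powered sample.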
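To make the false-negative problem concrete, here is a minimal sketch of approximate statistical power for a transmission disequilibrium test (TDT), one common family-based association test. It is not taken from the paper. The function name, the risk allele frequency of 0.3, the relative risk of 1.2, and the simplifications noted in the comments are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist


def tdt_power(n_trios, risk_allele_freq, relative_risk, alpha=0.05):
    """Approximate power of a TDT using a normal approximation.

    Assumptions (illustrative, not from the paper):
    - multiplicative risk model, so a heterozygous parent transmits the
      risk allele to an affected child with probability RR / (1 + RR)
    - parental heterozygosity is taken from the general population
      (2 * p * (1 - p)), ignoring enrichment in affected families
    - only the upper tail of the two-sided test is counted
    """
    p = risk_allele_freq
    # Two parents per trio; only heterozygous parents are informative.
    informative = n_trios * 2 * 2 * p * (1 - p)
    if informative <= 0:
        return 0.0
    transmit = relative_risk / (1 + relative_risk)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Distance between the true transmission rate and the null rate of 0.5,
    # scaled by the number of informative transmissions.
    shift = abs(transmit - 0.5) * sqrt(informative)
    return NormalDist().cdf((shift - z_crit * 0.5) / sqrt(transmit * (1 - transmit)))


if __name__ == "__main__":
    for n in (100, 200, 500, 1000, 2000):
        print(n, "trios -> power", round(tdt_power(n, 0.3, 1.2), 2))
```

Under these assumptions, a study of 100 trios detects the modest effect only a small fraction of the time, while a study of 2,000 trios detects it most of the time. A "no association" result from the small study therefore says little about whether the gene is involved.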

Old studies often looked at only one type of DNA variant. That narrow view hides other true signals.

How this fits with other research

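Gaily et al. (1998) and Leung et al. (1998) already warned that single-strategy gene hunts lack power. Sutcliffe (2008) keeps the same worry but adds concrete benchmarks: samples of fewer than 100 families, once common, can no longer answer most association questions, and a negative result from fewer than 200 families cannot rule a gene out.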

Waterhouse (2022) extends the story. Sutcliffe (2008) says autism is too mixed inside its own label. Waterhouse agrees and tells us to cut across labels like ADHD and dyslexia to find smaller, cleaner “endophenotypes.”

Scior et al. (2023) brings the lesson to today. Big-data banks are the new mega samples. The same ancestry-matching and variant-screening rules still apply, or confounding will return.

Why it matters

You run behavioral assessments, not DNA labs. Still, you rely on genetic findings to understand etiology and to refer families to credible studies. When you read a new “autism gene” headline, check the sample size and ancestry matching. If the study is small or unmatched, stay skeptical and keep the family’s hope grounded. Share the Sutcliffe (2008) checklist with medical partners so future gene discoveries stand on solid ground.

→ Action — try this Monday

Add a one-page “gene study checklist” to your intake folder: sample size over 500, ancestry matching reported, all variant types scanned.
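If you prefer to keep the checklist in a spreadsheet or script, the three items above can be sketched as a small helper. This is only an illustration. The class name, field names, and variant-type labels are hypothetical, and the threshold of 500 is this summary's rule of thumb, with the unit (families or participants) left for you to decide.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the three variant classes named in the abstract.
REQUIRED_VARIANT_TYPES = {"common_snv", "rare_snv", "cnv"}
# Rule-of-thumb threshold from the checklist, not a figure from the paper.
MIN_SAMPLE_SIZE = 500


@dataclass
class GeneStudy:
    sample_size: int
    ancestry_matching_reported: bool
    variant_types_screened: set = field(default_factory=set)


def checklist_concerns(study):
    """Return a list of plain-language concerns; an empty list means all three items pass."""
    concerns = []
    if study.sample_size <= MIN_SAMPLE_SIZE:
        concerns.append(f"sample size {study.sample_size} is not over {MIN_SAMPLE_SIZE}")
    if not study.ancestry_matching_reported:
        concerns.append("ancestry matching not reported")
    missing = REQUIRED_VARIANT_TYPES - set(study.variant_types_screened)
    if missing:
        concerns.append("variant types not screened: " + ", ".join(sorted(missing)))
    return concerns


if __name__ == "__main__":
    flyer = GeneStudy(sample_size=80, ancestry_matching_reported=False,
                      variant_types_screened={"common_snv"})
    for concern in checklist_concerns(flyer):
        print("-", concern)
```

A study that returns no concerns is not guaranteed to be right. It has only cleared the three screening questions.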

02 At a glance

Intervention

not applicable

Design

narrative review

Population

autism spectrum disorder

Finding

not reported

03 Original abstract

The last several years have witnessed tremendous progress in our understanding of the genetic architecture of autism susceptibility. We now understand the underlying genetics to be highly complex, with liability conferred by a wide range of genetic variation at a large number of genes. With substantial genetic heterogeneity, coupled at least in part with phenotypic heterogeneity, the field, which indeed parallels the broader human genetics field, recognizes the need to accommodate this complexity in genetic study design. Here, we will briefly discuss some of the relevant issues. While much remains unknown, discoveries to date point in particular to three scenarios of genetic variation—not mutually exclusive for any given gene—that must be considered when evaluating a particular candidate locus or genome for risk alleles. Many examples exist in which discrete variation in DNA sequence (e.g. single nucleotide variation) of both the common and rare variety is associated with the condition. More recently, larger (>1,000 bp) gains or losses of DNA, termed copy number variation (CNV), are found to be a significant class of causal or risk factors. Further, rare variants (single nucleotide or CNV) may arise as de novo events in an affected individual or be inherited from either parent. De novo events in an affected individual are interpreted as causal of the condition, while interpretation of heritable variants is more complicated. Presence of a variant within an unaffected parent minimally indicates reduced penetrance for that allele. However, distinguishing between a benign polymorphism, even if a functional coding variant, and a disease-associated variant requires additional information. At any rate, uncovering a role, if any, for a given gene in contributing to autism risk requires addressing these different allelic scenarios. 
Studies of common variation are now quite straightforward with the availability of HapMap data to guide selection of single nucleotide polymorphisms across a gene to capture common haplotypes present within the general population. A greater problem is the reality that the strength of genetic effects imparted by common alleles is likely to be modest at best for most genes. Therefore, studies need to be sufficiently powered to have a reasonable opportunity to detect such effects. This directly translates into a need for the largest possible sample to maximize power. Several years ago, it was common to see studies involving fewer than 100 families. It seems clear now that such samples will not be sufficient to ask most questions regarding allelic association. Also important is the phenotypic make-up and rigor of phenotypic assessment for a sample. A given data set containing probands each possessing a narrow autism diagnosis is more likely to be informative than one in which phenotypes span the autism spectrum, particularly for an analysis of the categorical diagnosis. In the analysis of quantitative traits, a spectrum sample might be more desirable, unless such a sample might increase the inclusion of phenocopies. Given modest genetic effects, how do we interpret either a significant or non-significant result from allelic association from a study involving a relatively small sample? If no evidence for association is seen from a family-based study employing a sample of, say, <200 families, is it because the gene is not involved or because the sample was under-powered? Considering the potential for rare variants, or even multiple common variants, associated with disease, a negative result does not exclude possible involvement of the gene. If positive evidence for association is seen, is the effect real? 
Obviously, replication ultimately addresses this question, but it seems likely that smaller samples are more prone to spurious findings than large, well-powered samples, which in turn are more likely to provide a better estimate of true effect size. Over the last decade, as groups have continued ascertainment, larger collections have been established, and investigators have made use of central resources like NIMH, AGRE and the developing Simons Simplex Collection repositories as a source of family materials or as a means to augment sample size. This has gone a long way to address the sample size limitations for those groups with access to repository samples. Power afforded by larger samples, particularly when composed of subsamples from multiple ascertainment sources, does not necessarily overcome heterogeneity. Indeed, heterogeneity is not restricted to genes, alleles and phenotype. Population differences owing to different mixes of ancestry are one potential problem, and addressing potential population stratification effects is important even in a family-based design. The days are approaching when simply documenting ethnicity breakdown for a sample will not meet expectations in the field. With increasing availability of genome-wide data or even a modest number of ancestry-informative markers, we are easily able to characterize ancestry structure and substructure at a molecular level and stratify analyses accordingly. Unfortunately, it does not end there. Multiple ascertainment sources provide ample opportunity for other more subtle differences resulting from ascertainment bias. The nature of the clinical service in which a given recruiting group is involved and/or how families are identified and invited to participate in genetics research may have an important effect. For example, the overall profile of patients seen in a child psychiatry clinic is likely to be different from that at a clinic that is devoted to developmental psychological assessments. 
Such profile differences are almost certain to correlate with relative differences in representation of underlying susceptibility alleles. In terms of taking different ascertainment sites into account in a genetic study, it is useful to test for heterogeneity in the data when presented with a significant finding. The above discussion has largely focused on issues related to analysis of common alleles; however, it is clear from various studies that common and rare disease-associated alleles are not mutually exclusive. All too often in the past, association studies failing to find evidence supporting allelic association concluded that a given locus was not likely to be involved in disease risk. Too many examples of autism genes (e.g. NLGN3/4, NRXN1, SLC6A4) with rare, causal or clear risk alleles have been reported to permit such conclusions in the future unless an effort has been made to screen for rare variants. As indicated earlier, rare variation need not be of the single nucleotide variety. CNV data have been published and continue to develop on large autism family samples. These data are compiled into databases, such as the Autism CNV Database developed by The Centre for Applied Genomics in Toronto (http://projects.tcag.ca/autism). While tools for detection of CNV become more accessible to investigators, it is worthwhile to explore existing data that might help a group ask the question in silico without necessarily performing a microarray analysis of their own family collection. 
In summary, given what we now know about the challenges confronting us due to etiological heterogeneity in autism, the certain links between phenotypic and genetic heterogeneity, likely small effect sizes imparted by common alleles, the discovery of rare risk alleles or causal mutations in some genes and the emerging picture of CNV as a significant class of genomic liability, autism genetics researchers must embrace the growing sophistication in the field of human genetics with regard to approaches being used to identify susceptibility alleles for a complex disorder.

Autism research : official journal of the International Society for Autism Research, 2008 · doi:10.1002/aur.37