Evolutionary constraint facilitates interpretation of genetic variation in resequenced human genomes

David L Goode et al. Genome Res. 2010 Mar.

Abstract

Here, we demonstrate how comparative sequence analysis facilitates genome-wide, base-pair-level interpretation of individual genetic variation, and we address two questions of importance for human personal genomics: first, whether an individual's functional variation comes mostly from noncoding or coding polymorphisms; and second, whether population-specific or globally present polymorphisms contribute more to functional variation in any given individual. Neither question has been definitively answered by analyses of existing variation data because of a focus on coding polymorphisms, ascertainment biases in favor of common variation, and a lack of base-pair-level resolution for identifying functional variants. We resequenced 575 amplicons in 432 individuals at genomic sites enriched for evolutionary constraint, and we also analyzed variation within three published human genomes. We find that single-site measures of evolutionary constraint derived from mammalian multiple sequence alignments are strongly predictive of reductions in modern-day genetic diversity across a range of annotation categories and across the allele frequency spectrum, from rare (<1%) to high frequency (>10% minor allele frequency). Furthermore, we show that putatively functional variation in an individual genome is dominated by polymorphisms that do not change protein sequence, that originate in our shared ancestral population, and that commonly segregate in human populations. These observations show that common, noncoding alleles contribute substantially to human phenotypes and that constraint-based analyses will be of value in identifying phenotypically relevant variants in individual genomes.

Figures

Figure 1.

Derived allele frequency (DAF) compared with evolutionary constraint. (A) DAF spectrum of all SNVs. Each category spans one percentage point of DAF. Note the much higher number of SNVs with 0%–1% and 1%–2% DAF compared with the rest of the data. (B) Mean DAF as a function of RS score. Sites with RS > 0 are binned by increasing level of constraint. (C) Proportion of sites of the indicated DAF within each of five RS bins. The bars of each RS bin sum to 1 but are grouped by DAF to facilitate visual comparison between the RS bins. The greater the RS of a site, the rarer its derived allele. (D) Proportion of sites falling within the indicated RS bins, shown separately for sites of rare, intermediate-to-common, and very common DAF. The greater the DAF, the more strongly SNVs avoid constrained sites.
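The binning behind panel B (mean DAF for sites with RS > 0, grouped by increasing constraint) can be sketched as below. This is a minimal illustration, not the authors' pipeline: the bin edges in the hypothetical `RS_EDGES` list, the helper names `rs_bin` and `mean_daf_by_rs_bin`, and the toy input are all assumptions, since the exact RS bin boundaries are not stated here.

```python
from bisect import bisect_right
from collections import defaultdict
from statistics import mean

# Hypothetical lower edges of the RS bins (assumed, not taken from the paper):
# bins are (0, 1), [1, 2), [2, 3), [3, 4), [4, inf).
RS_EDGES = [0.0, 1.0, 2.0, 3.0, 4.0]


def rs_bin(rs, edges=RS_EDGES):
    """Return the index of the RS bin containing rs, or None if rs <= 0 (unconstrained)."""
    if rs <= 0:
        return None
    return bisect_right(edges, rs) - 1


def mean_daf_by_rs_bin(sites, edges=RS_EDGES):
    """Mean derived allele frequency per RS bin, over sites with RS > 0.

    sites: iterable of (rs_score, daf) pairs, with daf in [0, 1].
    Returns {bin_index: mean_daf} for non-empty bins, ordered by bin index.
    """
    by_bin = defaultdict(list)
    for rs, daf in sites:
        b = rs_bin(rs, edges)
        if b is not None:  # skip unconstrained sites (RS <= 0)
            by_bin[b].append(daf)
    return {b: mean(dafs) for b, dafs in sorted(by_bin.items())}


# Toy (rs_score, daf) pairs; these values are invented for illustration.
toy_sites = [(-1.2, 0.40), (0.5, 0.30), (0.8, 0.20), (2.5, 0.05), (4.6, 0.01), (5.1, 0.03)]
print(mean_daf_by_rs_bin(toy_sites))
```

With these toy values, mean DAF falls as the bin index (constraint) rises, which is the qualitative pattern the caption describes; real input would be the per-site RS scores and DAFs from the resequencing data.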

Figure 2.

Variation and evolutionary constraint in three personal genomes. (A) Coverage of the human genome by mammalian alignment depth (solid line) and level of constraint (broken line). (B) Ratio of the number of SNVs observed in the three individual genomes at sites within the given RS score range to the number expected, given the distribution of RS scores across the human genome.
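The observed-to-expected ratio in panel B can be sketched as follows, assuming that the expected SNV count for an RS range is the total number of SNVs multiplied by that range's share of genome positions. The function name `obs_exp_ratio`, the bin labels, and the counts are hypothetical and serve only to show the arithmetic.

```python
def obs_exp_ratio(snv_counts, genome_site_counts):
    """Observed/expected SNV ratio per RS bin.

    snv_counts: {bin_label: number of SNVs observed at sites in that bin}
    genome_site_counts: {bin_label: number of genome positions in that bin}
    Expected SNVs in a bin = total SNVs * (bin's share of all genome positions).
    """
    total_snvs = sum(snv_counts.values())
    total_sites = sum(genome_site_counts.values())
    ratios = {}
    for label, n_sites in genome_site_counts.items():
        expected = total_snvs * n_sites / total_sites
        observed = snv_counts.get(label, 0)
        ratios[label] = observed / expected if expected > 0 else float("nan")
    return ratios


# Hypothetical counts: 1,000 positions and 100 SNVs in total.
genome = {"RS<=0": 800, "0<RS<=2": 150, "RS>2": 50}
snvs = {"RS<=0": 90, "0<RS<=2": 8, "RS>2": 2}
print(obs_exp_ratio(snvs, genome))
```

A ratio below 1 means that fewer SNVs fall in that RS range than its share of the genome would predict, which is how depletion of variation at constrained sites shows up in this kind of plot.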

Figure 3.

Properties of derived alleles at constrained sites. (A) Percentage of sites homozygous for derived alleles whose frequency is above the threshold indicated along the _x_-axis, per individual, at all sites (solid line) or at constrained sites only (broken lines). (B) Percentage of derived alleles with a frequency above the threshold indicated along the _x_-axis, per individual, at all sites (solid line) or at constrained sites only (broken lines). (C) Mean number of sites per individual that bear derived alleles found in all five populations (global SNVs), found only in the indicated population (population specific), or shared between two to four populations. (D) Number of segregating sites in each individual at highly constrained (RS > 2) sites that are shared between individuals or are private to each individual. (E) Percentage of segregating sites at constrained (RS > 2) sites in each individual that are shared with at least one of the other individuals.
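Panels D and E separate each individual's constrained segregating sites into those shared with at least one other individual and those private to that individual. The sketch below works under stated assumptions: each individual is represented by a set of hypothetical integer positions that have already been filtered to RS > 2, and the function name `shared_vs_private` is illustrative only.

```python
def shared_vs_private(sites_by_individual):
    """For each individual, count segregating sites shared with >= 1 other individual.

    sites_by_individual: {name: set of positions already filtered to constrained sites}
    Returns {name: (n_shared, n_private, percent_shared)}.
    """
    result = {}
    for name, sites in sites_by_individual.items():
        # Union of every other individual's segregating sites.
        others = set().union(*(s for other, s in sites_by_individual.items() if other != name))
        n_shared = len(sites & others)
        n_private = len(sites - others)
        pct = 100.0 * n_shared / len(sites) if sites else float("nan")
        result[name] = (n_shared, n_private, pct)
    return result


# Hypothetical positions for three individuals.
toy = {"A": {1, 2, 3, 4}, "B": {2, 3, 5}, "C": {4, 6}}
print(shared_vs_private(toy))
```

Using a set union of "everyone else" makes "shared" mean present in at least one other individual, matching the definition in panel E, rather than present in all of them.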

Figure 4.

Relative abundance of coding and noncoding functional variation. SNVs were divided into three categories: those that cause nonsynonymous substitutions; those that cause synonymous substitutions or changes in the UTRs; and those that do not occur in exons (intronic and intergenic). (A) Contribution of SNVs in each category to total variation at constrained sites in each resequenced individual genome at sites with RS 2 to 4 and at sites with RS > 4. (B) Total number of segregating sites in our ENCODE resequencing sample that occur at constrained sites in our three annotation categories. Constrained sites are divided into bins of increasing constraint. (C) Mean number of segregating sites carried by the individuals in our ENCODE resequencing sample in all three categories, at moderately (RS 1 to 3) and highly (RS > 3) constrained positions. Diamonds correspond to the 10%, 50%, and 90% quantiles.
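The three-way annotation split and the moderate/high constraint levels of panel C can be tallied with a sketch like the one below. The annotation labels in the hypothetical `CATEGORY` map, the treatment of "RS 1 to 3" as the closed interval 1 ≤ RS ≤ 3, and the toy SNVs are assumptions made for illustration; they are not the paper's annotation pipeline.

```python
from collections import Counter

# Hypothetical annotation labels mapped onto the caption's three categories.
CATEGORY = {
    "nonsynonymous": "nonsynonymous",
    "synonymous": "synonymous/UTR",
    "5_UTR": "synonymous/UTR",
    "3_UTR": "synonymous/UTR",
    "intronic": "non-exonic",
    "intergenic": "non-exonic",
}


def constraint_level(rs):
    """'high' for RS > 3, 'moderate' for 1 <= RS <= 3 (assumed inclusive), else None."""
    if rs > 3:
        return "high"
    if rs >= 1:
        return "moderate"
    return None


def tally(snvs):
    """Count SNVs by (annotation category, constraint level); weakly constrained SNVs are skipped."""
    counts = Counter()
    for rs, annotation in snvs:
        level = constraint_level(rs)
        if level is None:
            continue
        counts[(CATEGORY[annotation], level)] += 1
    return counts


# Toy (rs_score, annotation) pairs; invented for illustration.
toy_snvs = [(3.5, "nonsynonymous"), (2.0, "intronic"), (4.2, "intergenic"),
            (1.5, "3_UTR"), (0.4, "intronic"), (3.8, "intronic")]
print(tally(toy_snvs))
```

Keeping the category map separate from the counting loop makes it easy to swap in whatever annotation vocabulary a real variant annotator emits.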
