Exome sequencing and the genetic basis of complex traits

Nat Genet. 2012 May 29;44(6):623-30.

doi: 10.1038/ng.2303.

Adam Kiezun, Kiran Garimella, Ron Do, Nathan O Stitziel, Benjamin M Neale, Paul J McLaren, Namrata Gupta, Pamela Sklar, Patrick F Sullivan, Jennifer L Moran, Christina M Hultman, Paul Lichtenstein, Patrik Magnusson, Thomas Lehner, Yin Yao Shugart, Alkes L Price, Paul I W de Bakker, Shaun M Purcell, Shamil R Sunyaev

Adam Kiezun et al. Nat Genet. 2012.

Abstract

Exome sequencing is emerging as a popular approach to study the effect of rare coding variants on complex phenotypes. The promise of exome sequencing is grounded in theoretical population genetics and in empirical successes of candidate gene sequencing studies. Many projects aimed at common diseases are underway, and their results are eagerly anticipated. In this Perspective, using exome sequencing data from 438 individuals, we discuss several aspects of exome sequencing studies that we view as particularly important. We review processing and quality control of raw sequence data, evaluate the statistical properties of exome sequencing studies, discuss rare variant burden tests to detect association with phenotypes, and demonstrate the importance of accounting for population stratification in the analysis of rare variants. We conclude that enthusiasm for exome sequencing studies of complex traits should be combined with the caution that thousands of samples may be required to reach sufficient statistical power.
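The closing caution about sample size can be illustrated with a textbook normal-approximation power calculation for comparing the fraction of rare-variant carriers in cases and controls. This is a minimal sketch, not the power analysis used in the paper: the function name `burden_power` and the carrier frequencies (1% in controls, 2% in cases) are hypothetical, equal group sizes are assumed, and the significance level 2.5 × 10^−6 is taken from the Bonferroni threshold in Figure 3.

```python
from math import sqrt
from statistics import NormalDist


def burden_power(n_per_group: int, p_controls: float, p_cases: float,
                 alpha: float = 2.5e-6) -> float:
    """Approximate power of a two-sided two-proportion z-test comparing the
    fraction of rare-variant carriers in cases vs. controls (hypothetical helper).

    Uses the normal approximation, assumes equal group sizes, and ignores the
    negligible chance of rejecting in the wrong direction.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)      # two-sided critical value
    p_bar = (p_controls + p_cases) / 2      # pooled carrier frequency under H0
    se_null = sqrt(2 * p_bar * (1 - p_bar) / n_per_group)
    se_alt = sqrt((p_cases * (1 - p_cases) + p_controls * (1 - p_controls)) / n_per_group)
    delta = abs(p_cases - p_controls)
    return z.cdf((delta - z_alpha * se_null) / se_alt)


if __name__ == "__main__":
    # Hypothetical effect size: 1% carriers among controls, 2% among cases.
    for n in (500, 1000, 5000, 10000, 20000):
        print(n, round(burden_power(n, 0.01, 0.02), 3))
```

Under these assumed frequencies, power is negligible with a few hundred individuals per group and becomes substantial only with many thousands, which matches the direction of the abstract's conclusion.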

Conflict of interest statement

Competing Interests: The authors declare that they have no competing financial interests.

Figures

Figure 1

Discovery of novel variants for increasing numbers of samples. For each functional class, the fold-increase over the number of variants in one sample for that class is plotted as a function of the number of samples in a sequencing experiment. For example, the number of nonsense variants discovered in 300 samples is 40 times greater than the average number discovered in a single sample, whereas the number of synonymous variants is only 10 times greater (although nonsense variants make up a relatively small proportion of all variants discovered); this effect is due to purifying selection. All classes of variants are discovered at rates exceeding what would be predicted under a neutral model of evolution in a population of constant size, an effect of population growth. The crossing between the curve for synonymous variants and the theoretical prediction is most likely a signature of the out-of-Africa bottleneck. See Methods for additional details.
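The caption does not spell out the neutral, constant-size prediction. A minimal sketch, assuming it follows Watterson's expectation for the number of segregating sites in n sampled chromosomes, E[S_n] = θ · Σ_{i=1}^{n−1} 1/i, computes the expected fold-increase for k diploid samples relative to one diploid sample (2 chromosomes). The function names are hypothetical, and the paper's Methods may use a different formulation.

```python
def watterson_a(n_chrom: int) -> float:
    """Harmonic sum a_n = sum_{i=1}^{n-1} 1/i for n sampled chromosomes."""
    return sum(1.0 / i for i in range(1, n_chrom))


def neutral_fold_increase(n_samples: int) -> float:
    """Expected fold-increase in discovered variants for n diploid samples
    relative to one diploid sample, assuming the standard neutral,
    constant-size model in which E[S_n] is proportional to a_n."""
    return watterson_a(2 * n_samples) / watterson_a(2)


if __name__ == "__main__":
    for k in (1, 10, 100, 300):
        print(k, round(neutral_fold_increase(k), 2))
```

Under this assumed model the expected fold-increase at 300 samples is about 7, below the roughly 10-fold increase reported for synonymous variants, which is consistent with the caption's statement that observed discovery outpaces the constant-size neutral prediction.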

Figure 2

Association analysis. (a) Q-Q plot of association _p_-values under the null hypothesis. (b) Distributions of the lowest _p_-values under whole-exome permutations. The histograms show the distribution of the lowest _p_-value across permutations for the T5 test. The red vertical line indicates the 0.05 exome-wide significance level for the most significant gene (i.e., the most significant gene is exome-wide significant if its _p_-value is lower than the level indicated by the red line).
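The whole-exome permutation procedure behind panel (b) can be sketched as: permute case/control labels, recompute every gene's burden _p_-value, keep the smallest, and take the 5th percentile of those minima as the exome-wide 0.05 threshold. This is a minimal sketch under stated assumptions: each gene is collapsed into a per-individual carrier flag over variants below 5% frequency, in the spirit of a T5 collapsing test; a two-sided Fisher exact test stands in for the paper's actual test statistic; and all function names and simulated data are hypothetical.

```python
import math
import random
from typing import List, Sequence


def fisher_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher exact test for the 2x2 table [[a, b], [c, d]].

    Rows are cases / controls; columns are carriers / non-carriers.
    """
    n = a + b + c + d
    row1 = a + b  # number of cases
    col1 = a + c  # number of carriers
    denom = math.comb(n, row1)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # Hypergeometric probability of x carriers among the cases, for every feasible x.
    probs = [math.comb(col1, x) * math.comb(n - col1, row1 - x) / denom
             for x in range(lo, hi + 1)]
    p_obs = probs[a - lo]
    # Sum over all tables no more likely than the observed one (small rounding tolerance).
    return min(1.0, sum(q for q in probs if q <= p_obs * (1 + 1e-7)))


def collapse_t5(genotypes: List[List[int]], mafs: Sequence[float],
                max_maf: float = 0.05) -> List[bool]:
    """Hypothetical T5-style collapsing: True if an individual carries an
    alternate allele at any variant in the gene with MAF below max_maf."""
    rare = [g for g, f in zip(genotypes, mafs) if f < max_maf]
    n = len(genotypes[0]) if genotypes else 0
    return [any(g[i] > 0 for g in rare) for i in range(n)]


def burden_p(carrier: Sequence[bool], is_case: Sequence[bool]) -> float:
    """Gene-level p-value comparing carrier counts in cases vs. controls."""
    a = sum(1 for k, y in zip(carrier, is_case) if k and y)          # case carriers
    b = sum(1 for k, y in zip(carrier, is_case) if not k and y)      # case non-carriers
    c = sum(1 for k, y in zip(carrier, is_case) if k and not y)      # control carriers
    d = sum(1 for k, y in zip(carrier, is_case) if not k and not y)  # control non-carriers
    return fisher_two_sided(a, b, c, d)


def permutation_threshold(genes: List[Sequence[bool]], is_case: Sequence[bool],
                          n_perm: int = 1000, alpha: float = 0.05,
                          seed: int = 0) -> float:
    """alpha-quantile of the exome-wide minimum p-value under label permutation."""
    rng = random.Random(seed)
    labels = list(is_case)
    min_ps = []
    for _ in range(n_perm):
        # Shuffling phenotypes keeps each gene's carrier pattern (and any
        # correlation between genes) fixed while breaking genotype-phenotype links.
        rng.shuffle(labels)
        min_ps.append(min(burden_p(g, labels) for g in genes))
    min_ps.sort()
    return min_ps[max(0, int(alpha * n_perm) - 1)]


if __name__ == "__main__":
    rng = random.Random(1)
    n = 100
    is_case = [i < n // 2 for i in range(n)]
    genes = []
    for _ in range(20):  # 20 hypothetical genes, each with 3 simulated rare variants
        genos = [[int(rng.random() < 0.02) for _ in range(n)] for _ in range(3)]
        genes.append(collapse_t5(genos, mafs=[0.01, 0.01, 0.01]))
    print(permutation_threshold(genes, is_case, n_perm=200))
```

Because the permutations keep each gene's carrier counts fixed, genes with very few carriers cannot produce tiny _p_-values, which is one reason a permutation threshold can be less stringent than a Bonferroni threshold in small samples.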

Figure 3

Extrapolation of gene burden results. The horizontal solid red line shows the Bonferroni genome-wide significance threshold of _P_ = 2.5 × 10^−6. The horizontal dashed line shows the threshold derived from whole-exome permutations (Figure 2b). For larger sample sizes, the permutation threshold would be closer to the Bonferroni threshold, approaching it asymptotically as sample sizes increase.
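As a worked check, assuming the solid line is a Bonferroni correction of a family-wise level of 0.05 (the level used for the exome-wide threshold in Figure 2), the stated threshold corresponds to about 20,000 tests, roughly one per protein-coding gene. The number of tests is inferred from the arithmetic, not given in the caption:

$$
\alpha_{\mathrm{Bonf}} = \frac{0.05}{m} = 2.5 \times 10^{-6}
\quad\Longrightarrow\quad
m = \frac{0.05}{2.5 \times 10^{-6}} = 2 \times 10^{4}
$$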
