Integrating common and rare genetic variation in diverse human populations

Nature. 2010 Sep 2;467(7311):52-8.

doi: 10.1038/nature09298.

David M Altshuler, Richard A Gibbs, Leena Peltonen, Emmanouil Dermitzakis, Stephen F Schaffner, Fuli Yu, Penelope E Bonnen, Paul I W de Bakker, Panos Deloukas, Stacey B Gabriel, Rhian Gwilliam, Sarah Hunt, Michael Inouye, Xiaoming Jia, Aarno Palotie, Melissa Parkin, Pamela Whittaker, Kyle Chang, Alicia Hawes, Lora R Lewis, Yanru Ren, David Wheeler, Donna Marie Muzny, Chris Barnes, Katayoon Darvishi, Matthew Hurles, Joshua M Korn, Kati Kristiansson, Charles Lee, Steven A McCarroll, James Nemesh, Alon Keinan, Stephen B Montgomery, Samuela Pollack, Alkes L Price, Nicole Soranzo, Claudia Gonzaga-Jauregui, Verneri Anttila, Wendy Brodeur, Mark J Daly, Stephen Leslie, Gil McVean, Loukas Moutsianas, Huy Nguyen, Qingrun Zhang, Mohammed J R Ghori, Ralph McGinnis, William McLaren, Fumihiko Takeuchi, Sharon R Grossman, Ilya Shlyakhter, Elizabeth B Hostetter, Pardis C Sabeti, Clement A Adebamowo, Morris W Foster, Deborah R Gordon, Julio Licinio, Maria Cristina Manca, Patricia A Marshall, Ichiro Matsuda, Duncan Ngare, Vivian Ota Wang, Deepa Reddy, Charles N Rotimi, Charmaine D Royal, Richard R Sharp, Changqing Zeng, Lisa D Brooks, Jean E McEwen

International HapMap 3 Consortium et al. Nature. 2010.

Abstract

Despite great progress in identifying genetic variants that influence human disease, most inherited risk remains unexplained. A more complete understanding requires genome-wide studies that fully examine less common alleles in populations with a wide range of ancestry. To inform the design and interpretation of such studies, we genotyped 1.6 million common single nucleotide polymorphisms (SNPs) in 1,184 reference individuals from 11 global populations, and sequenced ten 100-kilobase regions in 692 of these individuals. This integrated data set of common and rare alleles, called 'HapMap 3', includes both SNPs and copy number polymorphisms (CNPs). We characterized population-specific differences among low-frequency variants, measured the improvement in imputation accuracy afforded by the larger reference panel, especially in imputing SNPs with a minor allele frequency of ≤5%, and demonstrated the feasibility of imputing newly discovered CNPs and SNPs. This expanded public resource of genome variants in global populations supports deeper interrogation of genomic variation and its role in human disease, and serves as a step towards a high-resolution map of the landscape of human genetic variation.
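The frequency thresholds above rest on the minor allele frequency (MAF). As a minimal sketch, assuming diploid genotypes coded as 0, 1 or 2 copies of one allele with missing calls already removed, MAF can be computed and binned as below. The function names and the bin edges (borrowed from the Figure 5 caption) are illustrative assumptions, not the consortium's pipeline.

```python
from typing import Sequence


def minor_allele_frequency(genotypes: Sequence[int]) -> float:
    """Minor allele frequency (MAF) from diploid genotypes coded 0, 1 or 2.

    Each value is assumed to count copies of one fixed allele; missing
    calls are assumed to have been filtered out beforehand.
    """
    if len(genotypes) == 0:
        raise ValueError("need at least one genotype")
    if any(g not in (0, 1, 2) for g in genotypes):
        raise ValueError("genotypes must be 0, 1 or 2")
    allele_count = sum(genotypes)        # copies of the coded allele
    n_chromosomes = 2 * len(genotypes)   # two chromosomes per individual
    freq = allele_count / n_chromosomes
    return min(freq, 1.0 - freq)         # fold onto the less common allele


def frequency_class(maf: float) -> str:
    """Hypothetical helper: illustrative bins taken from the Figure 5 caption."""
    if maf < 0.005:
        return "rare"            # MAF < 0.5%
    if maf <= 0.05:
        return "low-frequency"   # 0.5% <= MAF <= 5%
    return "common"              # MAF > 5%


# Usage: 5 heterozygotes among 100 individuals gives MAF = 5 / 200 = 0.025
maf = minor_allele_frequency([0] * 95 + [1] * 5)
print(maf, frequency_class(maf))  # 0.025 low-frequency
```

Folding with `min(freq, 1 - freq)` makes the result independent of which allele the genotype codes count.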

Figures

Figure 1. Size and frequency spectra of common and rare CNPs

a, Estimated size distribution of common CNPs calculated from the physical span of the genomic probes supporting each CNP event. b, Allele frequency spectrum for biallelic CNPs calculated from integer CNP genotypes for the samples analysed in this work.
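Panel b bins biallelic CNPs by allele frequency. A minimal sketch of that calculation follows, assuming each CNP is a biallelic deletion whose integer genotype is a diploid copy number of 0, 1 or 2 (so a sample with copy number c carries 2 - c deletion alleles). The data layout, the 10 equal-width bins and the choice to drop monomorphic CNPs are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Sequence


def deletion_allele_frequency(copy_numbers: Sequence[int]) -> float:
    """Deletion-allele frequency at one biallelic deletion CNP.

    Assumes each integer genotype is a diploid copy number in {0, 1, 2},
    so an individual with copy number c carries (2 - c) deletion alleles.
    """
    if len(copy_numbers) == 0 or any(c not in (0, 1, 2) for c in copy_numbers):
        raise ValueError("copy numbers must be 0, 1 or 2")
    deletions = sum(2 - c for c in copy_numbers)
    return deletions / (2 * len(copy_numbers))


def frequency_spectrum(cnps: Dict[str, Sequence[int]], n_bins: int = 10) -> List[int]:
    """Number of polymorphic CNPs in each equal-width MAF bin over [0, 0.5]."""
    counts: Counter = Counter()
    for genotypes in cnps.values():
        f = deletion_allele_frequency(genotypes)
        maf = min(f, 1.0 - f)
        if maf == 0.0:
            continue  # monomorphic in this sample; not part of the spectrum
        # Map MAF to a bin index; MAF == 0.5 falls into the last bin.
        idx = min(int(maf / 0.5 * n_bins), n_bins - 1)
        counts[idx] += 1
    return [counts[i] for i in range(n_bins)]


# Usage with three hypothetical CNPs genotyped in four individuals
example = {
    "cnp_a": [2, 2, 1, 2],  # 1 deletion allele of 8 -> MAF 0.125
    "cnp_b": [0, 1, 1, 2],  # 4 deletion alleles of 8 -> MAF 0.5
    "cnp_c": [2, 2, 2, 2],  # monomorphic, skipped
}
print(frequency_spectrum(example))  # [0, 0, 1, 0, 0, 0, 0, 0, 0, 1]
```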

Figure 2. SNP discovery informativeness across populations

a, b, For each of the 7 populations in which at least 60 individuals were resequenced, we considered a sample of 30 individuals, a second, non-overlapping sample of 30 individuals from the same population, and a sample of 30 individuals from each of the 6 other populations (results are averaged over 1,000 random samplings). Of all SNPs that are polymorphic in the first sample of 30 individuals (a), or polymorphic with at most two copies of the minor allele in that sample (b), we show the fraction that are also polymorphic in a different sample, starting with the other sample from the same population (black bars). The black bars serve as a baseline that accounts for the effect of sampling stochasticity and sequencing errors on SNP discovery. The different _y_-axis scales reflect the lower likelihood that a low-frequency variant is seen in a different sample.
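The comparison in panels a and b can be sketched as follows, assuming genotypes are held as a SNP-by-individual matrix of 0/1/2 allele counts. `shared_fraction` and `within_population_baseline` are hypothetical helpers: the baseline mirrors the black bars (two non-overlapping samples of 30 from one population, averaged over random draws), and passing another population's matrix as `geno_b` gives the cross-population comparison.

```python
import math
import random
from typing import List, Optional, Sequence

# Rows are SNPs, columns are individuals; entries count copies (0/1/2) of one allele.
Genotypes = List[List[int]]


def _allele_count(row: Sequence[int], idx: Sequence[int]) -> int:
    return sum(row[i] for i in idx)


def is_polymorphic(row: Sequence[int], idx: Sequence[int]) -> bool:
    """True if both alleles occur among the chosen individuals."""
    count = _allele_count(row, idx)
    return 0 < count < 2 * len(idx)


def minor_copies(row: Sequence[int], idx: Sequence[int]) -> int:
    count = _allele_count(row, idx)
    return min(count, 2 * len(idx) - count)


def shared_fraction(geno_a: Genotypes, idx_a: Sequence[int],
                    geno_b: Genotypes, idx_b: Sequence[int],
                    max_minor_copies: Optional[int] = None) -> float:
    """Fraction of SNPs discovered in sample A that are also polymorphic in sample B.

    Both matrices must list the same SNPs in the same order. Setting
    max_minor_copies=2 restricts discovery to SNPs whose minor allele has
    at most two copies in sample A (the panel b condition).
    """
    discovered = [
        s for s, row in enumerate(geno_a)
        if is_polymorphic(row, idx_a)
        and (max_minor_copies is None or minor_copies(row, idx_a) <= max_minor_copies)
    ]
    if not discovered:
        return float("nan")
    shared = sum(1 for s in discovered if is_polymorphic(geno_b[s], idx_b))
    return shared / len(discovered)


def within_population_baseline(geno: Genotypes, k: int = 30, n_rep: int = 1000,
                               max_minor_copies: Optional[int] = None,
                               seed: int = 0) -> float:
    """Mean shared fraction between two non-overlapping samples of size k."""
    rng = random.Random(seed)
    n = len(geno[0])
    values = []
    for _ in range(n_rep):
        chosen = rng.sample(range(n), 2 * k)   # requires n >= 2k
        a, b = chosen[:k], chosen[k:]
        value = shared_fraction(geno, a, geno, b, max_minor_copies)
        if not math.isnan(value):              # skip draws with no discoveries
            values.append(value)
    return sum(values) / len(values) if values else float("nan")
```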

Figure 3. Effect of sample size on SNP ascertainment

The number of SNPs discovered as a function of sample size, averaged over 1,000 random samplings. For each population and each possible sample size, we randomly sampled a subset of individuals without replacement and counted the SNPs that were polymorphic in the resequencing data for that subset. For any given sample size, many more variants are discovered in populations with genetic proximity to Africa (LWK, ASW and YRI) than in populations of non-African ancestry.
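A minimal sketch of the subsampling behind this curve, assuming one population's genotypes are stored as a SNP-by-individual matrix of 0/1/2 allele counts. `discovery_curve` is a hypothetical name, and because subsets are drawn at random the intermediate points vary slightly with the seed.

```python
import random
from typing import List


def discovery_curve(geno: List[List[int]], n_rep: int = 1000, seed: int = 0) -> List[float]:
    """Mean number of polymorphic SNPs for every sample size 1..n.

    geno has one row per SNP and one column per individual, each entry the
    number of copies (0/1/2) of a fixed allele. For each size we draw n_rep
    subsets without replacement and average the count of SNPs at which both
    alleles are present in the subset.
    """
    n = len(geno[0])
    rng = random.Random(seed)
    curve = []
    for size in range(1, n + 1):
        total = 0
        for _ in range(n_rep):
            idx = rng.sample(range(n), size)
            total += sum(
                1 for row in geno
                if 0 < sum(row[i] for i in idx) < 2 * size
            )
        curve.append(total / n_rep)
    return curve


# Toy data: three SNPs in three individuals
toy = [
    [1, 0, 0],  # polymorphic only if individual 0 is sampled
    [2, 2, 2],  # never polymorphic
    [1, 1, 1],  # every individual heterozygous: always polymorphic
]
print(discovery_curve(toy, n_rep=200))  # the final entry (all three individuals) is 2.0
```

The curve rises with sample size because each added individual can only reveal new alleles; comparing curves across populations is what the figure does.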

Figure 4. Haplotype sharing around SNPs and CNPs

a, b, Extent of haplotype homozygosity around variant alleles of various frequencies. Shown are SNPs from the ENCODE sequence, CNPs of comparable frequency, SNPs from the arrays and on randomly grouped chromosomes, and (for YRI) the maximum possible sharing for a genotyping error rate of 0.2%. a, CEU. b, YRI.
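One common way to quantify haplotype homozygosity around a core allele is the extended haplotype homozygosity (EHH) statistic: among chromosomes carrying the allele, the probability that two chosen at random are identical at every site from the core out to a given position. The sketch below computes it from phased haplotypes written as 0/1 strings; whether the figure uses exactly this statistic is an assumption, and the helper names are hypothetical.

```python
from collections import Counter
from math import comb
from typing import List, Sequence


def ehh(haplotypes: Sequence[str], core: int, allele: str, end: int) -> float:
    """Extended haplotype homozygosity between site `core` and site `end`.

    Among chromosomes carrying `allele` at `core`, this is the probability
    that two drawn at random are identical at every site between core and
    end (inclusive). Haplotypes are equal-length strings such as "0110".
    """
    carriers = [h for h in haplotypes if h[core] == allele]
    n = len(carriers)
    if n < 2:
        return float("nan")
    lo, hi = sorted((core, end))
    groups = Counter(h[lo:hi + 1] for h in carriers)
    return sum(comb(c, 2) for c in groups.values()) / comb(n, 2)


def ehh_profile(haplotypes: Sequence[str], core: int, allele: str) -> List[float]:
    """EHH at every site; it decays away from the core as haplotypes diverge."""
    return [ehh(haplotypes, core, allele, x) for x in range(len(haplotypes[0]))]


haps = ["0010", "0011", "0110", "1010"]
print(ehh_profile(haps, core=2, allele="1"))  # [0.16666666666666666, 0.5, 1.0, 0.5]
```

On this toy input the profile is 1.0 at the core, drops to 0.5 one site to either side, and falls to 1/6 two sites away as carriers diverge.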

Figure 5. Imputation accuracy and reference panel size

a, b, Mean _r_² between true and imputed genotype dosage for SNPs imputed from a HapMap-II-sized panel of 120 CEU chromosomes (HMII-CEU) or a HapMap 3 panel of 410 European-ancestry chromosomes (CEU+TSI). Scatter plots show Affymetrix 500K SNPs on chromosome 20 imputed for 1,393 subjects of the 1958 British birth cohort. a, Rare SNPs (MAF <0.5%). b, Low-frequency SNPs (MAF = 0.5–5%).
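The accuracy measure here is the squared Pearson correlation between the true genotype and the imputed dosage (the expected allele count). A minimal sketch, assuming true genotypes are 0/1/2 counts and dosages lie in [0, 2]. SNPs whose genotypes or dosages are constant in the sample have undefined _r_²; this sketch skips them when averaging, which is one possible convention rather than the paper's stated one.

```python
import math
from statistics import fmean
from typing import Sequence


def dosage_r2(true_genotypes: Sequence[int], dosages: Sequence[float]) -> float:
    """Squared Pearson correlation between true genotypes and imputed dosages.

    true_genotypes are allele counts (0/1/2) from direct genotyping; dosages
    are expected allele counts in [0, 2] from the imputation software.
    Returns NaN when either vector is constant (r is undefined).
    """
    if len(true_genotypes) != len(dosages) or len(dosages) < 2:
        raise ValueError("need two equal-length vectors with at least 2 entries")
    mx, my = fmean(true_genotypes), fmean(dosages)
    sxy = sum((x - mx) * (y - my) for x, y in zip(true_genotypes, dosages))
    sxx = sum((x - mx) ** 2 for x in true_genotypes)
    syy = sum((y - my) ** 2 for y in dosages)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy * sxy / (sxx * syy)


def mean_r2(per_snp: Sequence[float]) -> float:
    """Average r^2 over SNPs, ignoring SNPs where r^2 is undefined."""
    valid = [v for v in per_snp if not math.isnan(v)]
    return fmean(valid) if valid else float("nan")


# Usage: dosages that are a linear rescaling of the truth give r^2 = 1.0
print(dosage_r2([0, 1, 2], [0.5, 1.0, 1.5]))  # 1.0
```

Because _r_² is scale-free, it rewards imputation that ranks individuals correctly even when dosages are systematically shrunk toward the mean, which is typical for rare variants.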

Figure 6. Imputation: new populations, new variants

a, b, Mean _r_² between true and imputed genotype dosage as a function of the number of copies of the minor allele in the reference panel. a, The loss in imputation accuracy when the reference population differs slightly from the target population (CEU imputed into CEU compared with CEU imputed into TSI; YRI imputed into YRI compared with YRI imputed into LWK). b, Imputation accuracy for newly discovered variants (CNPs and ENCODE SNPs).
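These panels plot accuracy against the number of minor-allele copies in the reference panel. A sketch of that grouping, assuming per-SNP _r_² values have already been computed and reference haplotypes are coded 0/1 at each SNP; the record format and the helper names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Sequence, Tuple


def reference_minor_count(ref_alleles: Sequence[int]) -> int:
    """Copies of the minor allele among reference-panel haplotypes (0/1 coded)."""
    count = sum(ref_alleles)
    return min(count, len(ref_alleles) - count)


def mean_r2_by_minor_count(records: Iterable[Tuple[int, float]]) -> Dict[int, float]:
    """Average per-SNP r^2 within groups defined by minor-allele copies.

    Each record is (minor-allele copies in the reference, r^2 for that SNP);
    SNPs with undefined r^2 (NaN) are skipped.
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for copies, r2 in records:
        if math.isnan(r2):
            continue
        sums[copies] += r2
        counts[copies] += 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}


# Hypothetical per-SNP results: (minor-allele copies in the panel, r^2)
records = [(1, 0.2), (1, 0.4), (3, 0.9), (3, float("nan"))]
print(mean_r2_by_minor_count(records))
```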
