A second generation human haplotype map of over 3.1 million SNPs
Nature. 2007 Oct 18;449(7164):851-61.
doi: 10.1038/nature06258.
Kelly A Frazer, Dennis G Ballinger, David R Cox, David A Hinds, Laura L Stuve, Richard A Gibbs, John W Belmont, Andrew Boudreau, Paul Hardenbol, Suzanne M Leal, Shiran Pasternak, David A Wheeler, Thomas D Willis, Fuli Yu, Huanming Yang, Changqing Zeng, Yang Gao, Haoran Hu, Weitao Hu, Chaohua Li, Wei Lin, Siqi Liu, Hao Pan, Xiaoli Tang, Jian Wang, Wei Wang, Jun Yu, Bo Zhang, Qingrun Zhang, Hongbin Zhao, Hui Zhao, Jun Zhou, Stacey B Gabriel, Rachel Barry, Brendan Blumenstiel, Amy Camargo, Matthew Defelice, Maura Faggart, Mary Goyette, Supriya Gupta, Jamie Moore, Huy Nguyen, Robert C Onofrio, Melissa Parkin, Jessica Roy, Erich Stahl, Ellen Winchester, Liuda Ziaugra, David Altshuler, Yan Shen, Zhijian Yao, Wei Huang, Xun Chu, Yungang He, Li Jin, Yangfan Liu, Yayun Shen, Weiwei Sun, Haifeng Wang, Yi Wang, Ying Wang, Xiaoyan Xiong, Liang Xu, Mary M Y Waye, Stephen K W Tsui, Hong Xue, J Tze-Fei Wong, Luana M Galver, Jian-Bing Fan, Kevin Gunderson, Sarah S Murray, Arnold R Oliphant, Mark S Chee, Alexandre Montpetit, Fanny Chagnon, Vincent Ferretti, Martin Leboeuf, Jean-François Olivier, Michael S Phillips, Stéphanie Roumy, Clémentine Sallée, Andrei Verner, Thomas J Hudson, Pui-Yan Kwok, Dongmei Cai, Daniel C Koboldt, Raymond D Miller, Ludmila Pawlikowska, Patricia Taillon-Miller, Ming Xiao, Lap-Chee Tsui, William Mak, You Qiang Song, Paul K H Tam, Yusuke Nakamura, Takahisa Kawaguchi, Takuya Kitamoto, Takashi Morizono, Atsushi Nagashima, Yozo Ohnishi, Akihiro Sekine, Toshihiro Tanaka, Tatsuhiko Tsunoda, Panos Deloukas, Christine P Bird, Marcos Delgado, Emmanouil T Dermitzakis, Rhian Gwilliam, Sarah Hunt, Jonathan Morrison, Don Powell, Barbara E Stranger, Pamela Whittaker, David R Bentley, Mark J Daly, Paul I W de Bakker, Jeff Barrett, Yves R Chretien, Julian Maller, Steve McCarroll, Nick Patterson, Itsik Pe'er, Alkes Price, Shaun Purcell, Daniel J Richter, Pardis Sabeti, Richa Saxena, Stephen F Schaffner, Pak C Sham, Patrick Varilly, David Altshuler, Lincoln D Stein, 
Lalitha Krishnan, Albert Vernon Smith, Marcela K Tello-Ruiz, Gudmundur A Thorisson, Aravinda Chakravarti, Peter E Chen, David J Cutler, Carl S Kashuk, Shin Lin, Gonçalo R Abecasis, Weihua Guan, Yun Li, Heather M Munro, Zhaohui Steve Qin, Daryl J Thomas, Gilean McVean, Adam Auton, Leonardo Bottolo, Niall Cardin, Susana Eyheramendy, Colin Freeman, Jonathan Marchini, Simon Myers, Chris Spencer, Matthew Stephens, Peter Donnelly, Lon R Cardon, Geraldine Clarke, David M Evans, Andrew P Morris, Bruce S Weir, Tatsuhiko Tsunoda, James C Mullikin, Stephen T Sherry, Michael Feolo, Andrew Skol, Houcan Zhang, Changqing Zeng, Hui Zhao, Ichiro Matsuda, Yoshimitsu Fukushima, Darryl R Macer, Eiko Suda, Charles N Rotimi, Clement A Adebamowo, Ike Ajayi, Toyin Aniagwu, Patricia A Marshall, Chibuzor Nkwodimmah, Charmaine D M Royal, Mark F Leppert, Missy Dixon, Andy Peiffer, Renzong Qiu, Alastair Kent, Kazuto Kato, Norio Niikawa, Isaac F Adewole, Bartha M Knoppers, Morris W Foster, Ellen Wright Clayton, Jessica Watkin, Richard A Gibbs, John W Belmont, Donna Muzny, Lynne Nazareth, Erica Sodergren, George M Weinstock, David A Wheeler, Imtaz Yakub, Stacey B Gabriel, Robert C Onofrio, Daniel J Richter, Liuda Ziaugra, Bruce W Birren, Mark J Daly, David Altshuler, Richard K Wilson, Lucinda L Fulton, Jane Rogers, John Burton, Nigel P Carter, Christopher M Clee, Mark Griffiths, Matthew C Jones, Kirsten McLay, Robert W Plumb, Mark T Ross, Sarah K Sims, David L Willey, Zhu Chen, Hua Han, Le Kang, Martin Godbout, John C Wallenburg, Paul L'Archevêque, Guy Bellemare, Koji Saeki, Hongguang Wang, Daochang An, Hongbo Fu, Qing Li, Zhen Wang, Renwu Wang, Arthur L Holden, Lisa D Brooks, Jean E McEwen, Mark S Guyer, Vivian Ota Wang, Jane L Peterson, Michael Shi, Jack Spiegel, Lawrence M Sung, Lynn F Zacharia, Francis S Collins, Karen Kennedy, Ruth Jamieson, John Stewart
- PMID: 17943122
- PMCID: PMC2689609
International HapMap Consortium et al. Nature. 2007.
Abstract
We describe the Phase II HapMap, which characterizes over 3.1 million human single nucleotide polymorphisms (SNPs) genotyped in 270 individuals from four geographically diverse populations and includes 25-35% of common SNP variation in the populations surveyed. The map is estimated to capture untyped common variation with an average maximum r² of between 0.9 and 0.96 depending on population. We demonstrate that the current generation of commercial genome-wide genotyping products captures common Phase II SNPs with an average maximum r² of up to 0.8 in African and up to 0.95 in non-African populations, and that potential gains in power in association studies can be obtained through imputation. These data also reveal novel aspects of the structure of linkage disequilibrium. We show that 10-30% of pairs of individuals within a population share at least one region of extended genetic identity arising from recent ancestry and that up to 1% of all common variants are untaggable, primarily because they lie within recombination hotspots. We show that recombination rates vary systematically around genes and between genes of different function. Finally, we demonstrate increased differentiation at non-synonymous, compared to synonymous, SNPs, resulting from systematic differences in the strength or efficacy of natural selection between populations.
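The "average maximum r²" quoted in the abstract can be illustrated with a minimal sketch. It assumes phased haplotypes with alleles coded 0/1 and the usual pairwise definition of r² (the squared correlation between the alleles at two SNPs); the function names and toy data are hypothetical and are not taken from the paper or its supplementary methods.

```python
from typing import Sequence


def ld_r_squared(hap_a: Sequence[int], hap_b: Sequence[int]) -> float:
    """Pairwise r^2 between two biallelic SNPs from phased haplotypes (alleles coded 0/1)."""
    if len(hap_a) != len(hap_b) or len(hap_a) == 0:
        raise ValueError("haplotype vectors must be non-empty and of equal length")
    n = len(hap_a)
    p_a = sum(hap_a) / n  # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n  # frequency of allele 1 at SNP B
    # frequency of haplotypes carrying allele 1 at both SNPs
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("r^2 is undefined when either SNP is monomorphic")
    d = p_ab - p_a * p_b  # linkage disequilibrium coefficient D
    return d * d / denom


def max_r_squared(untyped: Sequence[int], typed: Sequence[Sequence[int]]) -> float:
    """Best r^2 between one untyped SNP and any SNP in a typed panel."""
    return max(ld_r_squared(untyped, t) for t in typed)


def average_max_r_squared(untyped_snps: Sequence[Sequence[int]],
                          typed: Sequence[Sequence[int]]) -> float:
    """Mean over untyped SNPs of the best r^2 to the typed panel."""
    return sum(max_r_squared(u, typed) for u in untyped_snps) / len(untyped_snps)
```

Under this reading, a SNP counts as well captured when at least one typed SNP is strongly correlated with it, and the summary statistic is the mean of those per-SNP maxima.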
Figures
Figure 1. SNP density in the Phase II HapMap
a, SNP density across the genome. Colours indicate the number of polymorphic SNPs per kb in the consensus data set. Gaps in the assembly are shown as white. b, Example of the fine-scale structure of SNP density for a 100-kb region on chromosome 17 showing Perlegen amplicons (black bars), polymorphic Phase I SNPs in the consensus data set (red triangles) and polymorphic Phase II SNPs in the consensus data set (blue triangles). Note the relatively even spacing of Phase I SNPs. c, The distribution of polymorphic SNPs in the consensus Phase II HapMap data (blue line and left-hand axis) around coding regions. Also shown is the density of SNPs in dbSNP release 125 around genes (red line and right-hand axis). Values were calculated separately 5′ from the coding start site (the left dotted line) and 3′ from the coding end site (right dotted line) and were joined at the median midpoint position of the coding unit (central dotted line).
Figure 2. Haplotype structure and recombination rate estimates from the Phase II HapMap
a, Haplotypes from YRI in a 100 kb region around the β-globin (HBB) gene. SNPs typed in Phase I are shown in dark blue. Additional SNPs in the Phase II HapMap are shown in light blue. Only SNPs for which the derived allele can be unambiguously identified by parsimony (by comparison with an outgroup sequence) are shown (89% of SNPs in the region); the derived allele is shown in colour. b, Recombination rates (lines) and the location of hotspots (horizontal blue bars) estimated for the same region from the Phase I (dark blue) and Phase II HapMap (light blue) data. Also shown are the location of genes within the region (grey bars) and the location of the experimentally verified recombination hotspot, at the 5′ end of the HBB gene (black bar).
Figure 3. The extent of recent co-ancestry among HapMap individuals
a, Three pairs of individuals with varying levels of identity-by-descent (IBD) sharing illustrate the continuum between very close and very distant relatedness and its relation to segmental sharing. The three pairs are: high sharing (NA19130 and NA19192 from YRI; previously identified as second-degree relatives (ref. 3)), moderate sharing (NA06994 and NA12892 from CEU) and low sharing (NA12006 and NA12155 from CEU). Along each chromosome, the probability of sharing at least one chromosome IBD is plotted, based on the HMM method described in Supplementary Text 5. Red sections indicate regions called as segments: in general, the proportion of the genome in segments is similar to each pair's estimated global relatedness. b, The extent of homozygosity on each chromosome for each individual in each analysis panel. Excludes segments <106 kb and chromosome X in males. Asterisk, NA12874, length = 107 Mb. YRI, green; CEU, orange; CHB, blue; JPT, magenta.
Figure 4. Properties of untaggable SNPs
a–e, Properties of the genomic regions surrounding untaggable SNPs in terms of: a, the density of polymorphic SNPs within the consensus data set; b, mean minor allele frequency of polymorphic SNPs; c, maximum r² of SNPs to any others in the Phase II data; d, the density of estimated recombination hotspots (defined from hotspot centres); and e, the estimated mean recombination rate. YRI, green; CEU, orange; CHB+JPT, purple.
Figure 5. Recombination rates around genes
a, The recombination rate, density of recombination-hotspot-associated motifs (all motifs with up to 1 bp different from the consensus CCTCCCTNNCCAC) and G+C content around genes. The blue line indicates the mean. For the recombination rate, grey lines indicate the quartiles of the distribution. Values were calculated separately 5′ from the transcription start site (the first dotted line) and 3′ from the transcription end site (third dotted line) and were joined at the median midpoint position of the transcription unit (central dotted line). Note the sharp drop in recombination rate within the transcription unit, the local increase around the transcription start site and the broad decrease away from the 3′ end of genes. These patterns only partly reflect the distribution of G+C content and the hotspot-associated motif, suggesting that additional factors influence recombination rates around genes. b, Recombination rates within genes of different molecular function. The chart shows the increase or decrease for each category compared to the genome average. P values were estimated by permutation of category; numbers of genes are shown in parentheses.
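The motif criterion in panel a ("up to 1 bp different from the consensus CCTCCCTNNCCAC") can be sketched as a simple sliding-window scan. This is an illustration only: it assumes that N matches any base, that mismatches are counted only at the non-N positions, and that only the forward strand is scanned. The caption does not say how the authors handled strands or ambiguous bases, and the function names are hypothetical.

```python
from typing import List, Tuple

CONSENSUS = "CCTCCCTNNCCAC"  # hotspot-associated motif quoted in the caption


def count_mismatches(window: str, consensus: str = CONSENSUS) -> int:
    """Number of positions where window differs from consensus, ignoring N positions."""
    return sum(1 for base, ref in zip(window, consensus) if ref != "N" and base != ref)


def find_motif_hits(sequence: str,
                    consensus: str = CONSENSUS,
                    max_mismatches: int = 1) -> List[Tuple[int, int]]:
    """Return (start, mismatches) for every forward-strand window within the mismatch limit."""
    seq = sequence.upper()
    k = len(consensus)
    hits = []
    for start in range(len(seq) - k + 1):
        m = count_mismatches(seq[start:start + k], consensus)
        if m <= max_mismatches:
            hits.append((start, m))
    return hits


if __name__ == "__main__":
    # Toy sequence: one exact match at offset 3 and one single-mismatch match at offset 19.
    toy = "AAA" + "CCTCCCTAGCCAC" + "TTT" + "CCTCCCTAGCCAT" + "GG"
    print(find_motif_hits(toy))
```

Dividing the number of hits by the window length would give a motif density of the kind plotted in the figure; scanning the reverse complement as well would be a natural extension.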
Figure 6. Properties of non-synonymous and synonymous SNPs
a, The derived allele frequency (DAF) spectrum in each analysis panel for all SNPs (black), synonymous SNPs (green) and non-synonymous SNPs (red). Note the excess of rare variants for coding sequence SNPs but no excess of high-frequency derived variants. b, Enrichment of non-synonymous SNPs among genic SNPs showing high differentiation. For each of ten classes of derived allele frequency (averaged across analysis panels) the fraction of non-synonymous (red) and synonymous (green) variants in that class that show F_ST > 0.5 is shown. Note the strong enrichment of non-synonymous SNPs among SNPs of moderate to high derived-allele frequency (asterisk, P < 0.05; double asterisk, P < 0.01). c, Lack of enrichment of non-synonymous SNPs among those showing long-range haplotype structure. The integrated extended haplotype homozygosity (iEHH) statistic was calculated for non-synonymous and synonymous SNPs in each analysis panel (YRI, green; CEU, orange; CHB+JPT, purple). For each of ten derived allele frequency classes, the proportion of non-synonymous SNPs among those showing the 5% most extreme statistics (within the allele frequency class) is shown (points). Also shown is the proportion of non-synonymous SNPs among SNPs in the coding sequence for each frequency class (dotted lines). Differences between synonymous and non-synonymous SNPs are tested for using a contingency table test.
Similar articles
- International HapMap Consortium. A haplotype map of the human genome. Nature. 2005 Oct 27;437(7063):1299-320. doi: 10.1038/nature04226. PMID: 16255080. Free PMC article.
- Bonnen PE, Story MD, Ashorn CL, Buchholz TA, Weil MM, Nelson DL. Haplotypes at ATM identify coding-sequence variation and indicate a region of extensive linkage disequilibrium. Am J Hum Genet. 2000 Dec;67(6):1437-51. doi: 10.1086/316908. Epub 2000 Nov 14. PMID: 11078475. Free PMC article.
- De Bakker PI, Graham RR, Altshuler D, Henderson BE, Haiman CA. Transferability of tag SNPs to capture common genetic variation in DNA repair genes across multiple populations. Pac Symp Biocomput. 2006:478-86. PMID: 17094262.
- Musunuru K, Kathiresan S. HapMap and mapping genes for cardiovascular disease. Circ Cardiovasc Genet. 2008 Oct;1(1):66-71. doi: 10.1161/CIRCGENETICS.108.813675. PMID: 20031544. Free PMC article. Review.
- Salisbury BA, Pungliya M, Choi JY, Jiang R, Sun XJ, Stephens JC. SNP and haplotype variation in the human genome. Mutat Res. 2003 May 15;526(1-2):53-61. doi: 10.1016/s0027-5107(03)00014-9. PMID: 12714183. Review.
Cited by
- Jiang Y, Zhang R, Lv H, Li J, Wang M, Chang Y, Lv W, Sheng X, Zhang J, Liu P, Zheng J, Shi M, Liu G. HGPGD: the human gene population genetic difference database. PLoS One. 2013 May 22;8(5):e64150. doi: 10.1371/journal.pone.0064150. Print 2013. PMID: 23717556. Free PMC article.
- Everett BM, Cook NR, Chasman DI, Magnone MC, Bobadilla M, Rifai N, Ridker PM, Pradhan AD. Prospective evaluation of B-type natriuretic peptide concentrations and the risk of type 2 diabetes in women. Clin Chem. 2013 Mar;59(3):557-65. doi: 10.1373/clinchem.2012.194167. Epub 2013 Jan 3. PMID: 23288489. Free PMC article.
- Panagiotou OA, Willer CJ, Hirschhorn JN, Ioannidis JP. The power of meta-analysis in genome-wide association studies. Annu Rev Genomics Hum Genet. 2013;14:441-65. doi: 10.1146/annurev-genom-091212-153520. Epub 2013 May 24. PMID: 23724904. Free PMC article. Review.
- Liedén A, Winge MC, Sääf A, Kockum I, Ekelund E, Rodriguez E, Fölster-Holst R, Franke A, Illig T, Tengvall-Linder M, Baurecht H, Weidinger S, Wahlgren CF, Nordenskjöld M, Bradley M. Genetic variation in the epidermal transglutaminase genes is not associated with atopic dermatitis. PLoS One. 2012;7(11):e49694. doi: 10.1371/journal.pone.0049694. Epub 2012 Nov 26. PMID: 23189155. Free PMC article.
- Colonna V, Pistis G, Bomba L, Mona S, Matullo G, Boano R, Sala C, Viganò F, Torroni A, Achilli A, Hooshiar Kashani B, Malerba G, Gambaro G, Soranzo N, Toniolo D. Small effective population size and genetic homogeneity in the Val Borbera isolate. Eur J Hum Genet. 2013 Jan;21(1):89-94. doi: 10.1038/ejhg.2012.113. Epub 2012 Jun 20. PMID: 22713810. Free PMC article.
References
- The International HapMap Consortium. The International HapMap Project. Nature. 2003;426:789–796.
- Bowcock AM. Genomics: guilt by association. Nature. 2007;447:645–646.
- Altshuler D, Daly M. Guilt beyond a reasonable doubt. Nature Genet. 2007;39:813–815.
Grants and funding
- 077008/WT_/Wellcome Trust/United Kingdom
- 077011/WT_/Wellcome Trust/United Kingdom
- 077046/WT_/Wellcome Trust/United Kingdom
- WT_/Wellcome Trust/United Kingdom
- 081682/WT_/Wellcome Trust/United Kingdom