Improved imputation of common and uncommon SNPs with a new reference set

Nature Genetics volume 44, pages 6–7 (2012)

Genotype imputation is an important statistical technique that uses patterns of linkage disequilibrium observed in a reference set of haplotypes to predict untyped genetic variants in silico1. Currently, the most popular reference sets are the publicly available International HapMap2 and 1000 Genomes data sets3. Although these resources are valuable for imputing a sizeable fraction of common SNPs, they may not be optimal for imputing data for the next generation of genome-wide association studies (GWAS) and SNP arrays, which explore a fraction of uncommon variants.
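
As a rough illustration of the idea of borrowing information from reference haplotypes, the following toy sketch fills in a missing allele by copying from the reference haplotypes that best match the study haplotype at the observed sites. It is a deliberately simplified nearest-haplotype vote, not the hidden Markov model approach used by real imputation software and not the procedure used in this article; the function name `impute_missing` and the tiny 0/1 haplotypes are hypothetical and chosen only for illustration.

```python
from collections import Counter


def impute_missing(study_hap, reference_haps):
    """Fill None entries in study_hap by copying from the closest reference haplotypes.

    Alleles are coded 0/1; None marks an untyped site. Toy example only.
    """
    # Sites where the study haplotype was actually genotyped.
    observed = [i for i, allele in enumerate(study_hap) if allele is not None]

    def mismatches(ref):
        # Number of observed sites at which the reference disagrees with the study haplotype.
        return sum(ref[i] != study_hap[i] for i in observed)

    # Keep every reference haplotype that ties for the fewest mismatches.
    best = min(mismatches(ref) for ref in reference_haps)
    closest = [ref for ref in reference_haps if mismatches(ref) == best]

    imputed = list(study_hap)
    for i, allele in enumerate(study_hap):
        if allele is None:
            # Majority vote among the closest reference haplotypes at the untyped site.
            imputed[i] = Counter(ref[i] for ref in closest).most_common(1)[0][0]
    return imputed


# Hypothetical reference panel: rows are haplotypes, columns are SNPs.
reference_haps = [
    [0, 0, 1, 0, 1],
    [0, 0, 1, 0, 1],
    [1, 1, 0, 1, 0],
    [1, 1, 0, 0, 0],
]
study_hap = [0, 0, None, 0, 1]  # third SNP was not genotyped
result = impute_missing(study_hap, reference_haps)
print(result)
```

In this sketch the untyped SNP inherits the allele carried by the reference haplotypes that agree with the study haplotype elsewhere, which is the intuition behind using linkage disequilibrium. A variant carried by few or no reference haplotypes cannot be recovered this way, which is one reason the size and composition of the reference set matter for uncommon SNPs.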

We have built a new resource for the imputation of SNPs for existing and future GWAS, known as the Division of Cancer Epidemiology and Genetics (DCEG) Reference Set. The data set has genotypes for cancer-free individuals, including 728 of European ancestry from three large prospectively sampled studies4,5,6, 98 African-American individuals from the Prostate, Lung, Colon and Ovary Cancer Screening Trial (PLCO), 74 Chinese individuals from a clinical trial in Shanxi, China (SHNX)7 and 349 individuals from the HapMap Project (Table 1). The final harmonized data set includes 2.8 million autosomal polymorphic SNPs for 1,249 individuals after rigorous quality control metrics were applied (see Supplementary Methods and Supplementary Tables 1 and 2).
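
The sample counts quoted above can be checked with a short tally. This is only an arithmetic sketch using the four group sizes given in the text; the dictionary labels are shorthand introduced here, not names from the article.

```python
# Group sizes as stated in the text; labels are informal shorthand.
samples = {
    "European ancestry (three prospective studies)": 728,
    "African-American (PLCO)": 98,
    "Chinese (SHNX)": 74,
    "HapMap Project": 349,
}

total = sum(samples.values())
print(total)  # 1249, matching the 1,249 individuals reported

# Percentage of the reference set contributed by each group.
shares = {group: round(100 * n / total, 1) for group, n in samples.items()}
for group, pct in shares.items():
    print(f"{group}: {pct}%")
```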

Table 1 Samples included in the DCEG Reference Set

Figure 1: Imputation accuracy for individuals of European ancestry with the DCEG Reference Set and publicly available reference sets.

References

  1. Marchini, J. & Howie, B. Nat. Rev. Genet. 11, 499–511 (2010).
  2. Frazer, K.A. et al. Nature 449, 851–861 (2007).
  3. 1000 Genomes Project Consortium. Nature 467, 1061–1073 (2010).
  4. Prorok, P.C. et al. Control. Clin. Trials 21, 273S–309S (2000).
  5. The ATBC Cancer Prevention Study Group. Ann. Epidemiol. 4, 1–10 (1994).
  6. Calle, E.E. et al. Cancer 94, 2490–2501 (2002).
  7. Ke, L. Int. J. Cancer 102, 271–274 (2002).
  8. Howie, B.N., Donnelly, P. & Marchini, J. PLoS Genet. 5, e1000529 (2009).
  9. Browning, B.L. & Browning, S.R. Am. J. Hum. Genet. 84, 210–223 (2009).
  10. Park, J.H. et al. Nat. Genet. 42, 570–575 (2010).
  11. Kolonel, L.N. et al. Am. J. Epidemiol. 151, 346–357 (2000).

Acknowledgements

The genotyping in the Multiethnic Cohort Study (MEC) was supported by a Department of Defense Breast Cancer Research Program Era of Hope Scholar Award (W81XWH-08-1-0383 to C.A.H.) and an NIH grant (CA132839).

Author information

Authors and Affiliations

  1. Core Genotyping Facility, SAIC-Frederick, National Cancer Institute (NCI)-Frederick, Frederick, Maryland, USA
    Zhaoming Wang, Kevin B Jacobs, Meredith Yeager, Amy Hutchinson & Xiang Deng
  2. Division of Cancer Epidemiology and Genetics, NCI, US National Institutes of Health (NIH), Bethesda, Maryland, USA
    Zhaoming Wang, Kevin B Jacobs, Meredith Yeager, Amy Hutchinson, Joshua Sampson, Nilanjan Chatterjee, Demetrius Albanes, Sonja I Berndt, Charles C Chung, Xiang Deng, Ann W Hsing, Mark P Purdue, Phil Taylor, Margaret Tucker & Stephen J Chanock
  3. Epidemiology Research Program, American Cancer Society, Atlanta, Georgia, USA
    W Ryan Diver, Susan M Gapstur & Lauren R Teras
  4. Department of Preventive Medicine, Keck School of Medicine, University of Southern California Norris Comprehensive Cancer Center, Los Angeles, California, USA
    Christopher A Haiman, Brian E Henderson & Daniel Stram
  5. Department of Chronic Disease Prevention, National Institute for Health and Welfare, Helsinki, Finland
    Jarmo Virtamo
  6. Illumina, San Diego, California, USA
    Michael A Eberle & Jennifer L Stone

Authors

  1. Zhaoming Wang
  2. Kevin B Jacobs
  3. Meredith Yeager
  4. Amy Hutchinson
  5. Joshua Sampson
  6. Nilanjan Chatterjee
  7. Demetrius Albanes
  8. Sonja I Berndt
  9. Charles C Chung
  10. W Ryan Diver
  11. Susan M Gapstur
  12. Lauren R Teras
  13. Christopher A Haiman
  14. Brian E Henderson
  15. Daniel Stram
  16. Xiang Deng
  17. Ann W Hsing
  18. Jarmo Virtamo
  19. Michael A Eberle
  20. Jennifer L Stone
  21. Mark P Purdue
  22. Phil Taylor
  23. Margaret Tucker
  24. Stephen J Chanock

Corresponding author

Correspondence to Stephen J Chanock.

Ethics declarations

Competing interests

The authors declare no competing financial interests.

Cite this article

Wang, Z., Jacobs, K., Yeager, M. et al. Improved imputation of common and uncommon SNPs with a new reference set. Nat Genet 44, 6–7 (2012). https://doi.org/10.1038/ng.1044
