Genes mirror geography within Europe

Nature volume 456, pages 98–101 (2008)

An Addendum to this article was published on 13 November 2008

Abstract

Understanding the genetic structure of human populations is of fundamental interest to medical, forensic and anthropological sciences. Advances in high-throughput genotyping technology have markedly improved our understanding of global patterns of human genetic variation and suggest the potential to use large samples to uncover variation among closely spaced populations [1–5]. Here we characterize genetic variation in a sample of 3,000 European individuals genotyped at over half a million variable DNA sites in the human genome. Despite low average levels of genetic differentiation among Europeans, we find a close correspondence between genetic and geographic distances; indeed, a geographical map of Europe arises naturally as an efficient two-dimensional summary of genetic variation in Europeans. The results emphasize that when mapping the genetic basis of a disease phenotype, spurious associations can arise if genetic structure is not properly accounted for. In addition, the results are relevant to the prospects of genetic ancestry testing [6]; an individual’s DNA can be used to infer their geographic origin with surprising accuracy, often to within a few hundred kilometres.
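To illustrate how a two-dimensional summary of genotypes can recover geography, the following is a minimal sketch on simulated data, not the authors' pipeline. It assumes a toy model in which allele frequencies drift linearly with longitude and latitude; the function names (`simulate_genotypes`, `top_pcs`, `geography_r2`), the gradient size and the sample sizes are all illustrative assumptions.

```python
import numpy as np

def simulate_genotypes(coords, n_snps, rng, slope=0.05):
    """Simulate 0/1/2 genotypes whose allele frequencies vary gently with location.

    Toy assumption: each SNP has a random baseline frequency plus a small
    random linear gradient along the two (standardized) geographic axes.
    """
    z = (coords - coords.mean(axis=0)) / coords.std(axis=0)
    base = rng.uniform(0.2, 0.8, size=n_snps)        # baseline frequency per SNP
    grad = rng.normal(0.0, slope, size=(2, n_snps))  # per-SNP geographic gradient
    freqs = np.clip(base + z @ grad, 0.01, 0.99)     # shape (individuals, SNPs)
    return rng.binomial(2, freqs)

def top_pcs(genotypes, k=2):
    """Return the top-k principal component scores of a genotype matrix."""
    g = genotypes.astype(float)
    g -= g.mean(axis=0)                    # centre each SNP
    sd = g.std(axis=0)
    g = g[:, sd > 0] / sd[sd > 0]          # drop monomorphic SNPs, scale the rest
    u, s, _ = np.linalg.svd(g, full_matrices=False)
    return u[:, :k] * s[:k]

def geography_r2(pcs, coords):
    """R^2 of each geographic axis regressed on the PCs (handles rotation/flips)."""
    X = np.column_stack([np.ones(len(pcs)), pcs])
    beta, *_ = np.linalg.lstsq(X, coords, rcond=None)
    resid = coords - X @ beta
    return 1.0 - resid.var(axis=0) / coords.var(axis=0)

rng = np.random.default_rng(0)
# Hypothetical sampling locations: longitude and latitude drawn uniformly.
coords = np.column_stack([rng.uniform(-10, 30, 300), rng.uniform(36, 60, 300)])
genotypes = simulate_genotypes(coords, n_snps=2000, rng=rng)
pcs = top_pcs(genotypes)
print("R^2 of (longitude, latitude) on PC1-PC2:", geography_r2(pcs, coords))
```

Because principal components are only defined up to rotation and sign, the sketch compares them with geography through a linear regression rather than by matching PC1 to one axis directly.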

References

  1. Jakobsson, M. et al. Genotype, haplotype and copy-number variation in worldwide human populations. Nature 451, 998–1003 (2008)
  2. Li, J. Z. et al. Worldwide human relationships inferred from genome-wide patterns of variation. Science 319, 1100–1104 (2008)
  3. Wellcome Trust Case Control Consortium. Genome-wide association study of 14,000 cases of seven common diseases and 3,000 shared controls. Nature 447, 661–678 (2007)
  4. Tian, C. et al. Analysis and application of European genetic substructure using 300K SNP information. PLoS Genet. 4, e4 (2008)
  5. Price, A. L. et al. Discerning the ancestry of European Americans in genetic association studies. PLoS Genet. 4, e236 (2008)
  6. Shriver, M. D. & Kittles, R. A. Genetic ancestry and the search for personalized genetic histories. Nature Rev. Genet. 5, 611–618 (2004)
  7. Nelson, M. R. et al. The Population Reference Sample (POPRES): a resource for population, disease, and pharmacological genetics research. Am. J. Hum. Genet. (in the press)
  8. Patterson, N., Price, A. & Reich, D. Population structure and eigenanalysis. PLoS Genet. 2, e190 (2006)
  9. Novembre, J. & Stephens, M. Interpreting principal component analyses of spatial population genetic variation. Nature Genet. 40, 646–649 (2008)
  10. Menozzi, P., Piazza, A. & Cavalli-Sforza, L. Synthetic maps of human gene frequencies in Europeans. Science 201, 786–792 (1978)
  11. Campbell, C. D. et al. Demonstrating stratification in a European American population. Nature Genet. 37, 868–872 (2005)
  12. Price, A. L. et al. Principal components analysis corrects for stratification in genome-wide association studies. Nature Genet. 38, 904–909 (2006)
  13. McCarthy, M. I. et al. Genome-wide association studies for complex traits: consensus, uncertainty and challenges. Nature Rev. Genet. 9, 356–369 (2008)
  14. Zhu, X., Zhang, S., Zhao, H. & Cooper, R. S. Association mapping, using a mixture model for complex traits. Genet. Epidemiol. 23, 181–196 (2002)
  15. Weedon, M. N. et al. Genome-wide association analysis identifies 20 loci that influence adult height. Nature Genet. 40, 575–583 (2008)
  16. Lettre, G. et al. Identification of ten loci associated with height highlights new biological pathways in human growth. Nature Genet. 40, 584–591 (2008)
  17. Cavalli-Sforza, L. L., Menozzi, P. & Piazza, A. The History and Geography of Human Genes 292 (Princeton Univ. Press, 1994)
  18. Bauchet, M. et al. Measuring European population stratification with microarray genotype data. Am. J. Hum. Genet. 80, 948–956 (2007)
  19. Weir, B. S. & Cockerham, C. C. Estimating F-statistics for the analysis of population structure. Evolution 38, 1358–1370 (1984)
  20. Eberle, M. A. & Kruglyak, L. An analysis of strategies for discovery of single nucleotide polymorphisms. Genet. Epidemiol. 19, S29–S35 (2000)
  21. Clark, A. G., Hubisz, M. J., Bustamante, C. D., Williamson, S. H. & Nielsen, R. Ascertainment bias in studies of human genome-wide polymorphism. Genome Res. 15, 1496–1502 (2005)
  22. Slatkin, M. Rare alleles as indicators of gene flow. Evolution 39, 53–65 (1985)
  23. Falush, D., Stephens, M. & Pritchard, J. K. Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics 164, 1567–1587 (2003)
  24. Tang, H., Coram, M., Wang, P., Zhu, X. & Risch, N. Reconstructing genetic ancestry blocks in admixed individuals. Am. J. Hum. Genet. 79, 1–12 (2006)
  25. Hellenthal, G., Auton, A. & Falush, D. Inferring human colonization history using a copying model. PLoS Genet. 4, e1000078 (2008)
  26. Kooner, J. et al. Genome-wide scan identifies variation in MLXIPL associated with plasma triglycerides. Nature Genet. 40, 149–151 (2008)
  27. Firmann, M. et al. The CoLaus study: A population-based study to investigate the epidemiology and genetic determinants of cardiovascular risk factors and metabolic syndrome. BMC Cardiovasc. Dis. 8, 6 (2008)
  28. Purcell, S. et al. PLINK: A tool set for whole-genome association and population-based linkage analyses. Am. J. Hum. Genet. 81, 559–575 (2007)

Acknowledgements

We thank J. Kooner and J. Chambers of the LOLIPOP study and G. Waeber, P. Vollenweider, D. Waterworth, J. S. Beckmann, M. Bochud and V. Mooser of the CoLaus study for providing access to their collections. Financial support was provided by the Giorgi-Cavaglieri Foundation (S.B.), the Swiss National Science Foundation (S.B.), US National Science Foundation Postdoctoral Fellowship in Bioinformatics (J.N.), US National Institutes of Health (M.S., C.D.B.) and GlaxoSmithKline (M.R.N.).

Author Contributions M.R.N. coordinated sample collection and genotyping. K.S.K., A.I., J.N. and A.R.B. performed quality control and prepared genotypic and demographic data for further analyses. C.B., M.S., M.R.N., S.B., J.N., T.J., K.B., Z.K., A.R.B. and A.A. all contributed to the design of analyses. J.N., S.B., T.J., K.B. and Z.K. performed PCA analyses. M.S. and J.N. designed and performed assignment-based analyses. T.J. and J.N. performed genome-wide association simulations. J.N., C.B., M.S., M.R.N. and A.A. wrote the paper. All authors discussed the results and commented on the manuscript.

Author information

Authors and Affiliations

  1. Department of Ecology and Evolutionary Biology, Interdepartmental Program in Bioinformatics, University of California–Los Angeles, Los Angeles, California 90095, USA
    John Novembre
  2. Department of Human Genetics
    John Novembre & Matthew Stephens
  3. Department of Statistics, University of Chicago, Chicago, Illinois 60637, USA
    Matthew Stephens
  4. Department of Medical Genetics
    Toby Johnson, Zoltán Kutalik & Sven Bergmann
  5. University Institute for Social and Preventative Medicine, Centre Hospitalier Universitaire Vaudois (CHUV), University of Lausanne, Rue de Bugnon 27 - DGM 328, CH-1005 Lausanne, Switzerland
    Toby Johnson
  6. Swiss Institute of Bioinformatics, Central Administration, Quartier Sorge - Batiment Genopode, 1015 Lausanne, Switzerland
    Toby Johnson, Zoltán Kutalik & Sven Bergmann
  7. Department of Biological Statistics and Computational Biology, Cornell University, Ithaca, New York 14853, USA
    Katarzyna Bryc, Adam R. Boyko, Adam Auton, Amit Indap & Carlos D. Bustamante
  8. GlaxoSmithKline, Research Triangle Park, North Carolina 27709, USA
    Karen S. King & Matthew R. Nelson

Authors

  1. John Novembre
  2. Toby Johnson
  3. Katarzyna Bryc
  4. Zoltán Kutalik
  5. Adam R. Boyko
  6. Adam Auton
  7. Amit Indap
  8. Karen S. King
  9. Sven Bergmann
  10. Matthew R. Nelson
  11. Matthew Stephens
  12. Carlos D. Bustamante

Corresponding author

Correspondence to John Novembre.

Supplementary information

Supplementary Information

This file contains Supplementary Notes, Supplementary Figures 1-6 with legends, and Supplementary Tables 1-5. (PDF 715 kb)

Cite this article

Novembre, J., Johnson, T., Bryc, K. et al. Genes mirror geography within Europe. Nature 456, 98–101 (2008). https://doi.org/10.1038/nature07331

Editorial Summary

Ethnic variation in the genes

The power of the latest massively parallel synthetic DNA sequencing technologies is demonstrated in two major collaborations that shed light on the nature of genomic variation with ethnicity. The first describes the genomic characterization of an individual from the Yoruba ethnic group of west Africa. The second reports a personal genome of a Han Chinese, the group comprising 30% of the world's population. These new resources can now be used in conjunction with the Venter, Watson and NIH reference sequences. A separate study looked at genetic ethnicity on the continental scale, based on data from 1,387 individuals from more than 30 European countries. Overall there was little genetic variation between countries, but the differences that do exist correspond closely to the geographic map. Statistical analysis of the genome data places 50% of the individuals within 310 km of their reported origin. As well as its relevance for testing genetic ancestry, this work has implications for evaluating genome-wide association studies that link genes with diseases.
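A statistic such as "50% of individuals within 310 km" is a median of distances between predicted and reported origins. The following is a minimal sketch of that arithmetic, assuming origins are given as (latitude, longitude) pairs in degrees and using the haversine great-circle distance on a spherical Earth. The helper names and the example coordinates are hypothetical and are not taken from the study.

```python
import math
from statistics import median

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; a spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def median_error_km(predicted, reported):
    """Median distance between predicted and reported (lat, lon) origins."""
    return median(haversine_km(*p, *r) for p, r in zip(predicted, reported))

# Made-up example: three individuals with invented predicted/reported origins.
predicted = [(48.0, 2.0), (52.0, 13.0), (41.0, 12.0)]
reported = [(48.0, 2.0), (53.0, 13.0), (43.0, 12.0)]
print(f"median error: {median_error_km(predicted, reported):.1f} km")
```

By definition, half of the individuals fall within the median distance, which is how a "50% within d km" statement can be read.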