Interpreting principal component analyses of spatial population genetic variation
John Novembre et al. Nat Genet. 2008 May.
Abstract
Nearly 30 years ago, Cavalli-Sforza et al. pioneered the use of principal component analysis (PCA) in population genetics and used PCA to produce maps summarizing human genetic variation across continental regions. They interpreted gradient and wave patterns in these maps as signatures of specific migration events. These interpretations have been controversial, but influential, and the use of PCA has become widespread in analysis of population genetics data. However, the behavior of PCA for genetic data showing continuous spatial variation, such as might exist within human continental groups, has been less well characterized. Here, we find that gradients and waves observed in Cavalli-Sforza et al.'s maps resemble sinusoidal mathematical artifacts that arise generally when PCA is applied to spatial data, implying that the patterns do not necessarily reflect specific migration events. Our findings aid interpretation of PCA results and suggest how PCA can help correct for continuous population structure in association studies.
Figures
Figure 1
Comparison of PC-maps of [3] with theoretical and empirical predictions. The first column shows the theoretical expected PC-maps for a class of models in which genetic similarity decays with geographic distance (see text for details). The second column shows PC-maps for population genetic data simulated with no range expansions, but constant homogeneous migration rate in a 2-dimensional habitat. The columns marked Asia, Europe, and Africa are redrawn from the originals of [3]. Each map is marked by which PC it represents. The order of maps in each of the last three columns was chosen to correspond with the shapes in the first two columns.
Figure 2
Results of PCA applied to data from a one-dimensional habitat. (A) Schematic of the one-dimensional habitat, with circles marking sampling locations and shades of blue marking order along the line. (B) One-dimensional PC-maps (i.e. plots of each PC element against the geographic position of the corresponding sample location). (C) Biplots of PC1 vs. PC2, PC2 vs. PC3, and PC3 vs. PC4. Colors correspond to those in Panel A. In many datasets without spatially referenced samples, the colors and the lines connecting neighboring points would not be observed; here they are shown to aid interpretation.
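The wave-like PC-maps described in the captions above can be illustrated with a minimal numerical sketch. This is not the authors' simulation. It assumes a hypothetical covariance between sampling locations that decays exponentially with distance along a line, centers it as PCA would, and inspects the leading eigenvectors. The function names, the number of sites, and the decay scale are all illustrative assumptions.

```python
import numpy as np

def spatial_pcs(n_sites=50, scale=10.0, n_pcs=4):
    """Leading PCs of a covariance that decays with distance on a line.

    The exponential decay and the parameter values are assumptions made
    for illustration; they are not taken from the paper.
    """
    x = np.arange(n_sites, dtype=float)        # equally spaced sampling locations
    dist = np.abs(x[:, None] - x[None, :])     # pairwise geographic distances
    cov = np.exp(-dist / scale)                # similarity decays with distance
    # Double-center the matrix so the constant vector carries no variance,
    # mirroring the mean removal that PCA performs.
    center = np.eye(n_sites) - np.ones((n_sites, n_sites)) / n_sites
    cov_centered = center @ cov @ center
    eigvals, eigvecs = np.linalg.eigh(cov_centered)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:n_pcs]         # indices of the largest eigenvalues
    return x, eigvals[order], eigvecs[:, order]

def sign_changes(v):
    """Count how often consecutive entries of v switch sign."""
    s = np.sign(v)
    return int(np.sum(s[1:] != s[:-1]))

x, eigvals, pcs = spatial_pcs()
for k in range(pcs.shape[1]):
    print(f"PC{k + 1}: eigenvalue={eigvals[k]:.3f}, "
          f"sign changes={sign_changes(pcs[:, k])}")
```

Plotting each column of `pcs` against `x` gives one-dimensional PC-maps in the sense of panel B. Under these assumptions the leading PC changes sign once along the line, like a gradient, and later PCs oscillate more often, although no migration event was simulated.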
Comment in
- Principal component analysis of genetic data.
Reich D, Price AL, Patterson N. Nat Genet. 2008 May;40(5):491-2. doi: 10.1038/ng0508-491. PMID: 18443580.
References
- Menozzi P, Piazza A, Cavalli-Sforza L. Science. 1978;201:786–792.
- Cavalli-Sforza L, Menozzi P, Piazza A. Science. 1993;259(5095):639–646.
- Cavalli-Sforza LL, Menozzi P, Piazza A. The History and Geography of Human Genes. Princeton University Press; 1994.
- Jobling M, Hurles M, Tyler-Smith C. Human evolutionary genetics. Garland Science; 2004.
- Rendine S, Piazza A, Cavalli-Sforza LL. The American Naturalist. 1986;128:681–706.