Analysis of Population Structure: A Unifying Framework and Novel Methods Based on Sparse Factor Analysis

Figure 2

Illustration of two different ways that African and European individuals could be represented.

In the first (sparse) representation, shown in the first row, the factors (red) each represent the mean allele frequencies of either the African population or the European population. This leads to sparse loadings (blue) for each individual, since the African individuals are loaded only on the factor representing the African population, and likewise for the European individuals. In the second (non-sparse) representation, shown in the second row, each factor is a combination of the African and European mean allele frequencies, and each individual is loaded onto both factors. Note that the two representations are equivalent, as shown by the equations under the table. Whereas SFA and admixture-based models tend to choose the first representation because of their sparse priors and implicit regularization, PCA tends towards the second representation (although the actual factors depend on other features of the data, such as the sample sizes of both groups).

doi: https://doi.org/10.1371/journal.pgen.1001117.g002
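
The equivalence described in the caption can be checked numerically. The following is a minimal sketch, not the figure's actual values: the allele frequencies are randomly generated, the mixing matrix `R` is an arbitrary invertible matrix chosen for illustration, and all variable names are hypothetical. It assumes the expected data are given by the product of the loadings and the factors, and shows that a sparse and a non-sparse factorization can yield the same product.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical mean allele frequencies at 5 SNPs (made-up values).
f_african = rng.uniform(0.05, 0.95, size=5)
f_european = rng.uniform(0.05, 0.95, size=5)

# Sparse representation: each factor is one population's mean
# allele frequencies, and each individual loads on one factor only.
factors_sparse = np.vstack([f_african, f_european])  # shape (2, 5)
loadings_sparse = np.array([
    [1.0, 0.0],  # African individual
    [1.0, 0.0],  # African individual
    [0.0, 1.0],  # European individual
    [0.0, 1.0],  # European individual
])

# Non-sparse representation: mix the factors with an invertible
# matrix R (an arbitrary choice here) and undo the mixing in the
# loadings, so the product is unchanged.
R = np.array([[0.5, 0.5],    # average of the two populations
              [0.5, -0.5]])  # half their difference
factors_dense = R @ factors_sparse
loadings_dense = loadings_sparse @ np.linalg.inv(R)

# Both factorizations reconstruct the same expected matrix.
expected_sparse = loadings_sparse @ factors_sparse
expected_dense = loadings_dense @ factors_dense
same = np.allclose(expected_sparse, expected_dense)

print("Sparse loadings:\n", loadings_sparse)
print("Non-sparse loadings:\n", loadings_dense)
print("Same reconstruction:", same)
```

With this choice of `R`, every individual has nonzero loadings on both mixed factors, whereas the sparse loadings contain exactly one nonzero entry per individual. Which of the two a method returns depends on its priors or constraints, not on the fit to the data, since the fit is identical.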