Sensitive detection of chromosomal segments of distinct ancestry in admixed populations

Alkes L Price et al. PLoS Genet. 2009 Jun.

Abstract

Identifying the ancestry of chromosomal segments in admixed individuals has a wide range of applications, from disease mapping to learning about history. Most methods require the use of unlinked markers, but by using all markers from genome-wide scanning arrays it should in principle be possible to infer the ancestry of even very small segments with exquisite accuracy. We describe a method, HAPMIX, which employs an explicit population genetic model to perform such local ancestry inference based on fine-scale variation data. We show that HAPMIX outperforms other methods, and we explore its utility for inferring ancestry, learning about ancestral populations, and inferring dates of admixture. We validate the method empirically by applying it to populations that have experienced recent and ancient admixture: 935 African Americans from the United States and 29 Mozabites from North Africa. HAPMIX will be of particular utility for mapping disease genes in recently admixed populations, as its accurate estimates of local ancestry permit admixture and case-control association signals to be combined, enabling more powerful tests of association than with either signal alone.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Schematic of the Markov model we use for ancestry inference.

The black lower line represents a chromosomal segment from an admixed individual, carrying a number of typed mutations (black circles). The underlying ancestry is shown in the bottom color bar and reveals an ancestry change from the first population (red) to the second population (blue). The admixed chromosome is modeled as a mosaic of DNA segments copied from two sets of individuals drawn from different reference populations (red and blue horizontal lines, respectively), which are closely related to the progenitor populations of the admixture event. The yellow line shows how the admixed chromosome is constructed from this mosaic. The dotted line above the bottom color bar shows which reference population is being copied from along the chromosome; at most positions this matches the true underlying ancestry, but with occasional “miscopying” from the other population (the blue dotted segment within a red ancestry segment). Switches between the chromosomes being copied from, which represent historical recombinations, are frequent (6 switches), while ancestry changes, which represent recombination since admixture, are much rarer (1 switch). Finally, at most positions the admixed chromosome carries the same allele as the chromosome it is copying from; the exception is one site, shown as a grey circle, which represents mutation or genotyping error. In our inference framework, we observe only the variation data for the admixed and reference individuals: the yellow line and the underlying ancestry must be inferred as the hidden states of an HMM.
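The copying model described in this caption can be illustrated with a toy hidden Markov model. The sketch below is not HAPMIX itself: it treats the admixed chromosome as a single phased haplotype, omits miscopying from the other population, and uses hypothetical names (`local_ancestry_posterior`, `lam` for generations since admixture, `alpha` for the admixture proportion, `rho` for the rate of switching between copied haplotypes, `theta` for the mutation/error probability) together with an assumed exponential form for switch probabilities over genetic distance. Each hidden state is a pair (ancestry, reference haplotype being copied), and the forward-backward algorithm yields the posterior probability of population-1 ancestry at each marker.

```python
import numpy as np

def local_ancestry_posterior(obs, ref0, ref1, gen_dist, lam, alpha, rho, theta):
    """Toy posterior P(ancestry = population 1) at each marker (hypothetical sketch).

    obs       : (L,) 0/1 alleles of one admixed haplotype
    ref0/ref1 : (K0, L) and (K1, L) 0/1 reference haplotypes for the two populations
    gen_dist  : (L-1,) genetic distance between adjacent markers, in Morgans
    lam       : generations since admixture (ancestry switch rate per Morgan)
    alpha     : prior proportion of population-1 ancestry
    rho       : switch rate between copied haplotypes, per Morgan (assumed)
    theta     : probability the observed allele differs from the copied one
    """
    obs = np.asarray(obs)
    refs = np.vstack([ref0, ref1])                       # all reference haplotypes
    K0, K1 = len(ref0), len(ref1)
    anc = np.r_[np.zeros(K0, int), np.ones(K1, int)]     # ancestry label of each state
    size = np.where(anc == 0, K0, K1)                    # panel size for each state
    prior_anc = np.where(anc == 0, 1 - alpha, alpha)     # prior ancestry of each state
    same_anc = anc[:, None] == anc[None, :]
    L, N = len(obs), len(refs)

    def emit(j):
        # Match with the copied haplotype unless a mutation/error occurred.
        return np.where(refs[:, j] == obs[j], 1 - theta, theta)

    def trans(g):
        r, s = np.exp(-lam * g), np.exp(-rho * g)
        T = r * same_anc * ((1 - s) / size[:, None])     # same ancestry, jump to a new haplotype
        T += r * s * np.eye(N)                           # keep copying the same haplotype
        T += (1 - r) * (prior_anc / size)[None, :]       # ancestry event: redraw from prior
        return T                                         # each row sums to 1

    Ts = [trans(g) for g in gen_dist]
    fwd = np.zeros((L, N))
    bwd = np.ones((L, N))
    f = (prior_anc / size) * emit(0)
    fwd[0] = f / f.sum()
    for j in range(1, L):                                # scaled forward pass
        f = (fwd[j - 1] @ Ts[j - 1]) * emit(j)
        fwd[j] = f / f.sum()
    for j in range(L - 2, -1, -1):                       # scaled backward pass
        b = Ts[j] @ (emit(j + 1) * bwd[j + 1])
        bwd[j] = b / b.sum()
    post = fwd * bwd
    post /= post.sum(axis=1, keepdims=True)
    return post[:, anc == 1].sum(axis=1)                 # sum over population-1 states
```

Setting `rho` much larger than `lam` mirrors the caption's contrast between frequent switches of the copied chromosome and rare ancestry changes; a real analysis would also need miscopying, diploid genotypes, and estimated parameters.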

Figure 2. Comparison of ancestry estimates produced by HAPMIX, ANCESTRYMAP, and LAMP-ANC.

(A) Comparison of results for a simulated recently admixed sample on chromosome 1. In each plot, the y-axis denotes the number of European chromosomal copies, and the centromere of the chromosome is blanked out in white. The top plot shows the true number of European copies, while the subsequent labeled plots show the number predicted by each respective method. (B) The same comparison for a real African American individual across chromosome 1, with plots constructed as in (A). The results are visibly similar to those from the simulation.

Figure 3. Accuracy of HAPMIX, ANCESTRYMAP, and LAMP-ANC predictions for various values of λ, the number of generations since admixture.

For each admixture time, results are based on analyzing 20 admixed individuals, simulated using an average genome-wide proportion of 80% African and 20% European ancestry. For each method, we plot the squared correlation between predicted and true number of European copies as a function of λ.
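The accuracy measure in this figure is a squared correlation. As a minimal sketch, assuming each method's prediction at a marker is summarized as the expected number of European copies (computed from probabilities of 0, 1, or 2 copies) and that markers from all individuals are pooled into one vector, r² could be computed as follows; the function names are hypothetical.

```python
import numpy as np

def expected_copies(probs):
    """Collapse per-marker probabilities of 0, 1, 2 European copies into an expected count."""
    probs = np.asarray(probs, float)            # shape (markers, 3)
    return probs @ np.array([0.0, 1.0, 2.0])

def squared_correlation(predicted, true):
    """Squared Pearson correlation between predicted and true copy counts (pooled)."""
    r = np.corrcoef(np.ravel(predicted), np.ravel(true))[0, 1]
    return r ** 2
```

Because r² is insensitive to linear rescaling, it rewards predictions that track the true copy number along the chromosome even if they are slightly biased toward the genome-wide mean.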

Figure 4. Properties of HAPMIX.

(A) For simulated admixed data sets, constructed as described in Materials and Methods using λ = 6 and λ = 100, we plot the r² between the predicted and true number of European chromosomal copies as a function of the number of markers genotyped across the genome. (B) The same as part A, except that the number of markers genotyped is fixed at 500,000 and the number of input chromosomes used to predict ancestry is varied (for full details, see text). (C) Calibration of uncertainty estimates produced by HAPMIX. For the λ = 6 simulations, and for each of x = 0, x = 1, and x = 2, we compare the average probability of x copies of European ancestry predicted by HAPMIX with the true frequency of having x copies of European ancestry, binning the predicted probabilities of x copies into bins of size 0.05. If the method were perfectly calibrated, the results would lie along the line y = x (thin black line). Note that for λ = 6, ancestry is normally inferred with high certainty, and over 98% of data points fall into the two most extreme bins. (D) The same as part C, except using λ = 100. Both of the last two plots show reasonable calibration of HAPMIX.
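The calibration check in parts C and D can be sketched as a binned comparison. This is a minimal illustration, assuming one predicted probability of x copies per marker and a matching boolean for whether the true copy number equals x; the function name and the rule placing a probability of exactly 1.0 in the top bin are assumptions.

```python
import numpy as np

def calibration_curve(prob_x, is_x, n_bins=20):
    """Mean predicted probability vs. observed frequency in each non-empty bin.

    prob_x : predicted probability of x European copies at each marker
    is_x   : True where the true number of European copies equals x
    n_bins : 20 bins of width 0.05, as in the caption
    """
    prob_x = np.asarray(prob_x, float)
    is_x = np.asarray(is_x, bool)
    # Bin index by probability; a probability of exactly 1.0 goes in the top bin.
    bins = np.minimum((prob_x * n_bins).astype(int), n_bins - 1)
    pred, obs = [], []
    for b in range(n_bins):
        mask = bins == b
        if mask.any():
            pred.append(prob_x[mask].mean())   # average predicted probability
            obs.append(is_x[mask].mean())      # empirical frequency of x copies
    return np.array(pred), np.array(obs)
```

Plotting `obs` against `pred` for each x gives points that should fall near y = x for a well-calibrated method.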

Figure 5. Correlation between ancestry proportion and estimated time since admixture in African Americans.

Each grey point shows an estimate of the time λ since admixture corresponding to one of 935 analysed African American individuals (Materials and Methods). The red line shows sliding averages of 20 individuals, binned according to increasing African ancestry proportions.
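The red line's smoothing step can be sketched as a moving average. This assumes, as an illustration only, that individuals are sorted by African ancestry proportion and that each window of 20 consecutive individuals shifts by one; the function name `sliding_average` is hypothetical.

```python
import numpy as np

def sliding_average(ancestry, lam_hat, window=20):
    """Moving averages of ancestry proportion and estimated admixture time.

    Individuals are ordered by increasing ancestry proportion, then each
    window of `window` consecutive individuals is averaged (step of one).
    """
    order = np.argsort(ancestry, kind="stable")
    x = np.asarray(ancestry, float)[order]
    y = np.asarray(lam_hat, float)[order]
    kernel = np.ones(window) / window
    # mode="valid" keeps only windows that contain `window` full individuals
    return np.convolve(x, kernel, mode="valid"), np.convolve(y, kernel, mode="valid")
```

With 935 individuals and a window of 20, this yields 916 points to draw as a curve over the scatter of individual estimates.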

Figure 6. Local ancestry estimates produced by HAPMIX for a simulated anciently admixed sample on chromosome 1, simulated using 80% European and 20% African ancestry, with the admixture occurring 100 generations ago.

As in Figure 2, the top plot shows the true number of African copies on chromosome 1, while the bottom plot shows the number of African copies predicted by HAPMIX.

Figure 7. Local ancestry estimates produced by HAPMIX for three real Mozabite individuals on chromosome 1.

The plots are constructed as for Figure 6 and show HAPMIX estimates of the number of sub-Saharan African copies across chromosome 1 for three individuals chosen for having different genome-wide African ancestries: 20% (top plot), 29% (middle plot), and 75% (bottom plot). The top plot looks similar to Figure 6, while the much longer segments seen in the two individuals with more African ancestry indicate more recent admixture with sub-Saharan Africans.

Figure 8. Principal components analysis of Mozabite, French, and Yoruba samples from the HGDP.
