Reconstructing genetic ancestry blocks in admixed individuals
Hua Tang et al. Am J Hum Genet. 2006 Jul.
Abstract
A chromosome in an individual of recently admixed ancestry resembles a mosaic of chromosomal segments, or ancestry blocks, each derived from a particular ancestral population. We consider the problem of inferring ancestry along the chromosomes of an admixed individual and thereby delineating the ancestry blocks. Using a simple population model, we infer the gene-flow history of each individual. Existing methods are based on a hidden Markov model (HMM); the Markov-hidden Markov model (MHMM) we propose has the advantage of accounting for the background linkage disequilibrium (LD) that exists in ancestral populations. When there are more than two ancestral groups, we allow each ancestral population to admix at a different time in history. We use simulations to illustrate the accuracy of the inferred ancestry and the importance of modeling background LD: ignoring background LD between markers can lead to false inferences of mixed ancestry in an indigenous population. The MHMM makes it possible to identify genomic blocks of a particular ancestry by use of any high-density single-nucleotide-polymorphism panel. One application of our method is to perform admixture mapping without genotyping special panels of ancestry-informative markers.
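The difference between the two models can be sketched as a scaled forward algorithm for one chromosome. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes a single phased haplotype of biallelic markers, one admixture time τ shared by all sources, ancestry "resets" occurring at rate τ per Morgan with the new ancestry drawn from the proportions π, and, for the MHMM variant, known two-marker haplotype frequencies in each ancestral population. The function names and the input layout (`p`, `h`, `dists`) are hypothetical.

```python
import math

def allele_prob(freq1, a):
    """Probability of allele a (0 or 1) given the frequency of allele 1."""
    return freq1 if a == 1 else 1.0 - freq1

def forward_loglik(alleles, dists, tau, pi, p, h=None):
    """Scaled forward algorithm for one phased haplotype (illustrative sketch).

    h is None  -> plain HMM: alleles independent given ancestry.
    h given    -> simplified MHMM: while an ancestral segment continues,
                  allele j is drawn conditional on allele j-1 using
                  two-marker haplotype frequencies (background LD).

    alleles[j]      0/1 allele at marker j
    dists[j]        genetic distance in Morgans between markers j-1 and j
    tau             admixture time in generations (one value for all sources)
    pi[k]           ancestry proportion of population k
    p[k][j]         frequency of allele 1 at marker j in population k
    h[k][j][(a,b)]  frequency of haplotype (a at j-1, b at j) in population k
    """
    K = len(pi)
    alpha = [pi[k] * allele_prob(p[k][0], alleles[0]) for k in range(K)]
    z = sum(alpha)
    loglik = math.log(z)
    alpha = [x / z for x in alpha]
    for j in range(1, len(alleles)):
        r = math.exp(-tau * dists[j])  # P(no ancestry reset between j-1 and j)
        a_prev, a = alleles[j - 1], alleles[j]
        new = []
        for l in range(K):
            marg = allele_prob(p[l][j], a)
            if h is None:
                cond = marg
            else:
                pair = h[l][j]
                cond = pair[(a_prev, a)] / (pair[(a_prev, 0)] + pair[(a_prev, 1)])
            # Either the segment continues (Markov emission), or a reset draws
            # ancestry l from pi and a fresh allele; alpha sums to 1 after scaling.
            new.append(r * alpha[l] * cond + (1.0 - r) * pi[l] * marg)
        z = sum(new)
        loglik += math.log(z)
        alpha = [x / z for x in new]
    return loglik
```

When `h` is built from products of allele frequencies (no LD), both branches return the same likelihood. Under strong background LD, the MHMM can explain a run of correlated alleles within a single ancestry, whereas the HMM has no way to represent that correlation, which is one way unmodeled LD can be misread as mixed ancestry.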
Figures
Figure 1.
Graphical representation of an HMM (a) and an MHMM (b).
Figure 2.
Estimation of two-marker haplotype frequencies. Unphased genotype data in 50 individuals were simulated on the basis of chromosome 22 haplotypes of the CEU individuals genotyped in the HapMap project. Each plot can be viewed as a two-dimensional histogram, in which the _X_-axis represents the true haplotype frequency and the _Y_-axis represents the corresponding estimated frequency. The intensity at each pixel indicates the height of the histogram, that is, the number of marker pairs whose true haplotype frequency is at the _X_-coordinate and whose estimated haplotype frequency is at the _Y_-coordinate. a, Naive haplotype frequency estimates. Both allele frequencies and haplotype frequencies are estimated from a small sample of individuals. b, Augmented haplotype frequency estimates. Haplotype frequencies were estimated from the same set of individuals as in panel a, but allele frequencies were estimated from a larger sample.
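The two estimators can be sketched for a single pair of biallelic markers. The naive estimator is the standard EM algorithm for haplotype frequencies from unphased genotypes, run on the small sample alone. For the augmented estimator, one plausible reading of the caption (an assumption; the exact procedure is not described here) is to fix the allele frequencies `p1` and `p2` from the larger sample and estimate only the disequilibrium coefficient D from the small sample, here by a simple grid search over its admissible range. Genotypes are coded as counts (0, 1, 2) of allele 1 at each marker, and random union of haplotypes is assumed; all function names are hypothetical.

```python
import math
from itertools import product

HAPS = [(0, 0), (0, 1), (1, 0), (1, 1)]  # (allele at marker 1, allele at marker 2)

def compatible_pairs(g):
    """Ordered haplotype pairs whose allele counts match genotype g = (g1, g2)."""
    return [(x, y) for x, y in product(HAPS, repeat=2)
            if x[0] + y[0] == g[0] and x[1] + y[1] == g[1]]

def log_lik(h, genotypes):
    """Log-likelihood of unphased genotypes given haplotype frequencies h."""
    total = 0.0
    for g in genotypes:
        s = sum(h[x] * h[y] for x, y in compatible_pairs(g))
        if s <= 0:
            return float("-inf")
        total += math.log(s)
    return total

def naive_haplotype_freqs(genotypes, n_iter=200):
    """EM estimate of the four haplotype frequencies from the sample alone."""
    h = {x: 0.25 for x in HAPS}
    for _ in range(n_iter):
        counts = {x: 0.0 for x in HAPS}
        for g in genotypes:
            pairs = compatible_pairs(g)
            w = [h[x] * h[y] for x, y in pairs]
            z = sum(w)
            if z == 0:
                continue
            for (x, y), wi in zip(pairs, w):  # E-step: expected haplotype counts
                counts[x] += wi / z
                counts[y] += wi / z
        total = sum(counts.values())
        h = {x: counts[x] / total for x in HAPS}  # M-step
    return h

def augmented_haplotype_freqs(genotypes, p1, p2, grid=2001):
    """Fix allele-1 frequencies p1, p2 (e.g. from a larger panel) and choose
    the D that maximizes the small-sample likelihood."""
    lo = max(-p1 * p2, -(1 - p1) * (1 - p2))
    hi = min(p1 * (1 - p2), (1 - p1) * p2)
    best_h, best_ll = None, float("-inf")
    for i in range(grid):
        d = lo + (hi - lo) * i / (grid - 1)
        h = {(1, 1): p1 * p2 + d, (1, 0): p1 * (1 - p2) - d,
             (0, 1): (1 - p1) * p2 - d, (0, 0): (1 - p1) * (1 - p2) + d}
        ll = log_lik(h, genotypes)
        if ll > best_ll:
            best_h, best_ll = h, ll
    return best_h
```

Because the augmented estimator keeps the marginal allele frequencies fixed, its haplotype frequencies always agree with the larger sample's allele frequencies; only the LD between the two markers is learned from the small sample.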
Figure 3.
Estimated admixing time, τ, of 400 simulated individuals. Red circles represent the MLE under the MHMM; blue triangles represent the MLE under the HMM, using the same genotype data. The true times, 25, 10, and 25, are indicated by a yellow square. Some jitter has been added to the MLEs to aid visualization.
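A maximum-likelihood estimate of the admixing time can be approximated by evaluating the likelihood over a grid of candidate values. The sketch below does this under the plain HMM for one phased haplotype. It assumes a single scalar τ shared by all sources (the figure uses one time per ancestral population, which would make the grid multi-dimensional) and a reset transition model in which ancestry resets at rate τ per Morgan, with the new ancestry drawn from π. Substituting the MHMM's conditional allele probabilities for the emission would give the corresponding MHMM estimate. The names are hypothetical.

```python
import math

def hmm_loglik(alleles, dists, tau, pi, p):
    """Scaled forward log-likelihood of one phased haplotype under the HMM."""
    K = len(pi)

    def emit(k, j):
        return p[k][j] if alleles[j] == 1 else 1.0 - p[k][j]

    alpha = [pi[k] * emit(k, 0) for k in range(K)]
    z = sum(alpha)
    loglik = math.log(z)
    alpha = [x / z for x in alpha]
    for j in range(1, len(alleles)):
        r = math.exp(-tau * dists[j])
        # stay in the same ancestry, or reset to ancestry l with probability pi[l]
        alpha = [(r * alpha[l] + (1.0 - r) * pi[l]) * emit(l, j) for l in range(K)]
        z = sum(alpha)
        loglik += math.log(z)
        alpha = [x / z for x in alpha]
    return loglik

def mle_tau(alleles, dists, pi, p, grid):
    """Grid-search maximum-likelihood estimate of a single admixture time."""
    return max(grid, key=lambda t: hmm_loglik(alleles, dists, t, pi, p))
```

A haplotype whose alleles switch once from one population's profile to the other's pulls the estimate toward a larger τ than a haplotype with no switch, since more frequent resets make the switch more probable.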
Figure 4.
Ancestry for a simulated admixed individual. The _Y_-axis represents the posterior probability that one allele is derived from a specific ancestry; the _X_-axis indicates the physical locations of the markers. Top, True ancestral states. Middle, MHMM estimates. Bottom, HMM estimates.
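Posterior ancestry curves of this kind are typically obtained with the forward-backward algorithm. The sketch below computes, for the plain HMM on one phased haplotype, the posterior probability of each ancestry at each marker. It assumes the same kind of reset transition model (ancestry resets at rate τ per Morgan, new ancestry drawn from π) and is not the authors' code; the function names are hypothetical.

```python
import math

def allele_prob(freq1, a):
    """Probability of allele a (0 or 1) given the frequency of allele 1."""
    return freq1 if a == 1 else 1.0 - freq1

def transition(k, l, r, pi):
    """P(ancestry l at marker j | ancestry k at marker j-1) under the reset model."""
    return r * (1.0 if k == l else 0.0) + (1.0 - r) * pi[l]

def posterior_ancestry(alleles, dists, tau, pi, p):
    """Forward-backward posteriors P(ancestry k at marker j | all alleles)."""
    K, M = len(pi), len(alleles)
    emit = [[allele_prob(p[k][j], alleles[j]) for k in range(K)] for j in range(M)]
    # forward pass, rescaled at each marker to avoid underflow
    a = [pi[k] * emit[0][k] for k in range(K)]
    z = sum(a)
    fwd = [[x / z for x in a]]
    for j in range(1, M):
        r = math.exp(-tau * dists[j])
        prev = fwd[-1]
        a = [emit[j][l] * sum(prev[k] * transition(k, l, r, pi) for k in range(K))
             for l in range(K)]
        z = sum(a)
        fwd.append([x / z for x in a])
    # backward pass; per-marker rescaling cancels when posteriors are normalized
    bwd = [[1.0] * K for _ in range(M)]
    for j in range(M - 2, -1, -1):
        r = math.exp(-tau * dists[j + 1])
        b = [sum(transition(k, l, r, pi) * emit[j + 1][l] * bwd[j + 1][l]
                 for l in range(K)) for k in range(K)]
        z = sum(b)
        bwd[j] = [x / z for x in b]
    post = []
    for j in range(M):
        w = [fwd[j][k] * bwd[j][k] for k in range(K)]
        z = sum(w)
        post.append([x / z for x in w])
    return post
```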
Figure 5.
Comparison of percentage reduction in MSE. Percentage reduction for individual $n$ is defined as $(\mathrm{MSE}^{\mathrm{HMM}}_n - \mathrm{MSE}^{\mathrm{MHMM}}_n)/\mathrm{MSE}^{\mathrm{HMM}}_n$.
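The reduction plotted in this figure is easy to compute once each individual's MSE is available. The caption does not say how the per-individual MSE is formed; the sketch assumes it is the squared difference between the posterior ancestry probabilities and the 0/1 indicator of the true ancestry, summed over ancestries and averaged over markers. Function names are hypothetical.

```python
def individual_mse(posterior, truth):
    """Assumed per-individual MSE: squared error between posterior ancestry
    probabilities and the 0/1 indicator of the true ancestry, summed over
    ancestries and averaged over markers."""
    total = 0.0
    for probs, true_k in zip(posterior, truth):
        total += sum((q - (1.0 if k == true_k else 0.0)) ** 2
                     for k, q in enumerate(probs))
    return total / len(truth)

def mse_reduction(mse_hmm, mse_mhmm):
    """Relative reduction (MSE_HMM - MSE_MHMM) / MSE_HMM; positive favors the MHMM."""
    return (mse_hmm - mse_mhmm) / mse_hmm
```

Multiplying the result by 100 expresses it as a percentage.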
Figure 6.
Estimated ancestry for a Han Chinese individual from Beijing. The _Y_-axis represents the posterior probability that one allele is derived from a specific ancestry; the _X_-axis indicates the physical locations of markers. Markers were sampled at an average spacing of 30 kb (top panels) and 6 kb (middle panels), approximating the densities of a 100K and a 500K SNP chip, respectively; the bottom panels use all HapMap SNPs (average spacing 3 kb). Left panels, the MHMM correctly infers Asian ancestry (yellow) at most markers. Right panels, the HMM assigns considerable probability of European ancestry (blue) or African ancestry (red) in several regions.
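The chip-density comparison is simple arithmetic: assuming a genome length of roughly 3 × 10^9 bp, an average spacing of 30 kb corresponds to about 100,000 markers and 6 kb to about 500,000. The sketch also shows one hypothetical way to thin a sorted list of marker positions toward a target spacing. It enforces a minimum gap rather than an exact average, so the realized spacing is somewhat larger than the target, and it is not necessarily how the markers in this figure were sampled.

```python
GENOME_BP = 3_000_000_000  # assumed approximate human genome length in bp

def approx_chip_size(genome_bp, spacing_bp):
    """Number of markers implied by an average spacing across the genome."""
    return genome_bp // spacing_bp

def thin_markers(positions, spacing_bp):
    """Greedily keep sorted marker positions at least spacing_bp apart."""
    kept = []
    for pos in positions:
        if not kept or pos - kept[-1] >= spacing_bp:
            kept.append(pos)
    return kept
```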
Figure 7.
Estimated ancestry for a simulated individual with asymmetric admixing history. The _Y_-axis represents the posterior probability that one allele is derived from a specific ancestry; the _X_-axis indicates the physical locations of markers. a, True ancestry along the paternal and the maternal chromosomes. The paternal chromosome was generated assuming $\tau = (25, 25, 25)$ and $\pi = (0.4, 0.4, 0.2)$, whereas the maternal chromosome was generated assuming $\tau = (2, 2, 2)$ and $\pi = (0.75, 0.125, 0.125)$. b, Posterior ancestry estimates at the MLE of $\tau$. c, Posterior ancestry estimates under the assumption $\tau = (2, 2, 2)$. d, Posterior ancestry estimates under the assumption $\tau = (50, 50, 50)$.
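The asymmetry between the two chromosomes can be quantified under a simple reset model (an assumption, not necessarily the paper's exact parameterization): ancestry resets at rate τ per Morgan and the new ancestry is drawn from π. At stationarity a reset changes ancestry with probability 1 − Σ π_k², giving 16 expected ancestry changes per Morgan on the paternal chromosome but only 0.8125 on the maternal one. A run of ancestry k ends at rate τ(1 − π_k), so a maternal block of ancestry 1 averages 2 Morgans, compared with 1/15 Morgan on the paternal side. Function names are hypothetical.

```python
def expected_switches_per_morgan(tau, pi):
    """Expected ancestry changes per Morgan: resets occur at rate tau, and a
    reset changes ancestry with probability 1 - sum(pi_k^2) at stationarity."""
    return tau * (1.0 - sum(x * x for x in pi))

def mean_block_length(tau, pi, k):
    """Mean length (Morgans) of a run of ancestry k, which ends at rate tau * (1 - pi_k)."""
    return 1.0 / (tau * (1.0 - pi[k]))

def average_ancestry(pi_a, pi_b):
    """Genome-wide ancestry of a diploid individual, averaging its two chromosomes."""
    return [(a + b) / 2 for a, b in zip(pi_a, pi_b)]

PATERNAL = (25.0, [0.4, 0.4, 0.2])
MATERNAL = (2.0, [0.75, 0.125, 0.125])
```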
Similar articles
- Inferring ancestries efficiently in admixed populations with linkage disequilibrium. Bercovici S, Geiger D. J Comput Biol. 2009 Aug;16(8):1141-50. doi: 10.1089/cmb.2009.0105. PMID: 19645595.
- An ancestry informative marker panel design for individual ancestry estimation of Hispanic population using whole exome sequencing data. Wang LJ, Zhang CW, Su SC, Chen HH, Chiu YC, Lai Z, Bouamar H, Ramirez AG, Cigarroa FG, Sun LZ, Chen Y. BMC Genomics. 2019 Dec 30;20(Suppl 12):1007. doi: 10.1186/s12864-019-6333-6. PMID: 31888480.
- Mapping asthma-associated variants in admixed populations. Mersha TB. Front Genet. 2015 Sep 29;6:292. doi: 10.3389/fgene.2015.00292. eCollection 2015. PMID: 26483834. Review.
- A Continuous Correlated Beta Process Model for Genetic Ancestry in Admixed Populations. Gompert Z. PLoS One. 2016 Mar 11;11(3):e0151047. doi: 10.1371/journal.pone.0151047. eCollection 2016. PMID: 26966908.
- Analyses of genetic ancestry enable key insights for molecular ecology. Gompert Z, Buerkle CA. Mol Ecol. 2013 Nov;22(21):5278-94. doi: 10.1111/mec.12488. Epub 2013 Sep 19. PMID: 24103088. Review.
Cited by
- AncestryGrapher toolkit: Python command-line pipelines to visualize global- and local- ancestry inferences from the RFMIX version 2 software. Lisi A, Campbell MC. Bioinformatics. 2024 Nov 1;40(11):btae616. doi: 10.1093/bioinformatics/btae616. PMID: 39412440.
- Global and Local Ancestry and its Importance: A Review. Goli RC, Chishi KG, Ganguly I, Singh S, Dixit SP, Rathi P, Diwakar V, Sree C C, Limbalkar OM, Sukhija N, Kanaka KK. Curr Genomics. 2024;25(4):237-260. doi: 10.2174/0113892029298909240426094055. Epub 2024 May 9. PMID: 39156729. Review.
- Inference of Locus-Specific Population Mixtures from Linked Genome-Wide Allele Frequencies. Reyna-Blanco CS, Caduff M, Galimberti M, Leuenberger C, Wegmann D. Mol Biol Evol. 2024 Jul 3;41(7):msae137. doi: 10.1093/molbev/msae137. PMID: 38958167.
- Misunderstanding of race as biology has deep negative biological and social consequences. Lujan HL, DiCarlo SE. Exp Physiol. 2024 Aug;109(8):1240-1243. doi: 10.1113/EP091491. Epub 2024 May 3. PMID: 38698766.
- Concurrently mapping quantitative trait loci associations from multiple subspecies within hybrid populations. Warburton CL, Costilla R, Engle BN, Moore SS, Corbet NJ, Fordyce G, McGowan MR, Burns BM, Hayes BJ. Heredity (Edinb). 2023 Dec;131(5-6):350-360. doi: 10.1038/s41437-023-00651-4. Epub 2023 Oct 6. PMID: 37798326.