Reconstructing genetic ancestry blocks in admixed individuals

Hua Tang et al. Am J Hum Genet. 2006 Jul.

Abstract

A chromosome in an individual of recently admixed ancestry resembles a mosaic of chromosomal segments, or ancestry blocks, each derived from a particular ancestral population. We consider the problem of inferring ancestry along the chromosomes in an admixed individual and thereby delineating the ancestry blocks. Using a simple population model, we infer gene-flow history in each individual. Compared with existing methods, which are based on a hidden Markov model, the Markov-hidden Markov model (MHMM) we propose has the advantage of accounting for the background linkage disequilibrium (LD) that exists in ancestral populations. When there are more than two ancestral groups, we allow each ancestral population to admix at a different time in history. We use simulations to illustrate the accuracy of the inferred ancestry as well as the importance of modeling the background LD; not accounting for background LD between markers may mislead us to false inferences about mixed ancestry in an indigenous population. The MHMM makes it possible to identify genomic blocks of a particular ancestry by use of any high-density single-nucleotide-polymorphism panel. One application of our method is to perform admixture mapping without genotyping special ancestry-informative-marker panels.
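For orientation, the plain HMM baseline that the abstract contrasts with the MHMM can be sketched as a forward-backward pass over one haploid chromosome. This is a minimal illustration, not the authors' implementation: the function name `ancestry_posteriors`, the single admixing time `tau`, and the transition form (ancestry is kept with probability exp(-tau * d) over a genetic distance d in Morgans, otherwise redrawn from `pi`) are assumptions that are common in admixture models but are not stated in the abstract. The sketch treats markers as independent given ancestry, which is exactly the background-LD simplification the MHMM is designed to remove.

```python
import math


def ancestry_posteriors(alleles, freqs, pi, tau, dists):
    """Posterior ancestry probabilities for one haploid chromosome (plain HMM).

    alleles: list of 0/1 alleles, one per marker
    freqs:   freqs[k][t] = frequency of allele 1 at marker t in population k
             (assumed strictly between 0 and 1)
    pi:      admixture proportions, summing to 1
    tau:     generations since admixture (a single value; assumption of this sketch)
    dists:   dists[t] = genetic distance in Morgans between markers t-1 and t
             (dists[0] is unused)
    """
    K, T = len(pi), len(alleles)

    def emit(k, t):
        # Probability of the observed allele given ancestry k; ignores background LD.
        p = freqs[k][t]
        return p if alleles[t] == 1 else 1.0 - p

    def trans(i, j, t):
        # Assumed transition: no ancestry switch with prob exp(-tau * d),
        # otherwise the new ancestry is drawn from pi.
        stay = math.exp(-tau * dists[t])
        return (1.0 - stay) * pi[j] + (stay if i == j else 0.0)

    def normalise(row):
        s = sum(row)
        return [x / s for x in row]

    # Forward pass, normalised at each marker for numerical stability.
    fwd = [normalise([pi[k] * emit(k, 0) for k in range(K)])]
    for t in range(1, T):
        fwd.append(normalise([
            emit(j, t) * sum(fwd[t - 1][i] * trans(i, j, t) for i in range(K))
            for j in range(K)
        ]))

    # Backward pass, also normalised at each marker.
    bwd = [[1.0] * K for _ in range(T)]
    for t in range(T - 2, -1, -1):
        bwd[t] = normalise([
            sum(trans(i, j, t + 1) * emit(j, t + 1) * bwd[t + 1][j] for j in range(K))
            for i in range(K)
        ])

    # Posterior at each marker is proportional to forward * backward.
    return [normalise([fwd[t][k] * bwd[t][k] for k in range(K)]) for t in range(T)]


# Toy example with two hypothetical populations and four tightly linked markers.
post = ancestry_posteriors(
    alleles=[1, 1, 1, 1],
    freqs=[[0.9] * 4, [0.1] * 4],
    pi=[0.5, 0.5],
    tau=10,
    dists=[0.0, 0.001, 0.001, 0.001],
)
```

Each row of `post` holds the posterior probability of each ancestry at one marker, which is the quantity plotted on the _Y_-axis of the ancestry figures.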

Figures

Figure 1.

Graphical representation of an HMM (a) and an MHMM (b).

Figure 2.

Two-marker haplotype frequency estimation. Unphased genotype data in 50 individuals were simulated on the basis of chromosome 22 haplotypes of the CEU individuals genotyped in the HapMap project. Each plot can be viewed as a two-dimensional histogram, in which the _X_-axis represents the true haplotype frequency and the _Y_-axis represents the corresponding estimated frequency. The intensity at each pixel indicates the height of the histogram, that is, the number of marker pairs whose true haplotype frequency is at the _X_-coordinate and whose estimated haplotype frequency is at the _Y_-coordinate. a, Naive haplotype frequency estimates. Both allele frequencies and haplotype frequencies are estimated from a small sample of individuals. b, Augmented haplotype frequency estimates. Haplotype frequencies were estimated from the same set of individuals as in panel a, but allele frequencies were estimated from a larger sample.

Figure 3.

Estimated admixing time, τ, of 400 simulated individuals. Red circles represent the MLE under the MHMM; blue triangles represent the MLE under the HMM by use of the same genotype data. True times are 25, 10, and 25, indicated with a yellow square. Some jitter is added to the MLEs to aid visualization.

Figure 4.

Ancestry for a simulated admixed individual. The _Y_-axis represents the posterior probability that one allele is derived from a specific ancestry; the _X_-axis indicates the physical locations of the markers. Top, True ancestral states. Middle, MHMM estimates. Bottom, HMM estimates.

Figure 5.

Comparison of percentage reduction in MSE. Percentage reduction for individual _n_ is defined as $(\mathrm{MSE}^{\mathrm{HMM}}_n - \mathrm{MSE}^{\mathrm{MHMM}}_n)/\mathrm{MSE}^{\mathrm{HMM}}_n$.
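The per-individual reduction defined in this caption is straightforward to compute. The following is a minimal sketch; the function name `mse_reduction` and the example values are hypothetical and are not results from the paper. It returns the ratio exactly as defined, so multiply by 100 to express it as a percentage.

```python
def mse_reduction(mse_hmm, mse_mhmm):
    """Per-individual relative reduction in MSE of the MHMM over the HMM.

    mse_hmm, mse_mhmm: sequences of per-individual MSE values, in the same order.
    Returns (MSE_HMM_n - MSE_MHMM_n) / MSE_HMM_n for each individual n.
    """
    if len(mse_hmm) != len(mse_mhmm):
        raise ValueError("both inputs must cover the same individuals")
    reductions = []
    for hmm, mhmm in zip(mse_hmm, mse_mhmm):
        if hmm == 0:
            # The ratio is undefined when the HMM already has zero error.
            raise ValueError("HMM MSE must be non-zero")
        reductions.append((hmm - mhmm) / hmm)
    return reductions


# Hypothetical values for three individuals, for illustration only.
example = mse_reduction([0.4, 0.2, 0.1], [0.1, 0.2, 0.15])
```

A positive value means the MHMM estimate had lower error for that individual, zero means no difference, and a negative value means the HMM did better.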

Figure 6.

Estimated ancestry for a Han Chinese individual from Beijing. The _Y_-axis represents the posterior probability that one allele is derived from a specific ancestry; the _X_-axis indicates the physical locations of markers. Markers were sampled at an average spacing of 30 kb (top panels), 6 kb (middle panels), and 3 kb (bottom panels), corresponding to the approximate density of a 100K SNP chip, the approximate density of a 500K SNP chip, and all HapMap SNPs, respectively. Left panels, MHMM correctly infers Asian ancestry (yellow) at most markers. Right panels, HMM assigns considerable probability of European ancestry (blue) or African ancestry (red) in several regions.

Figure 7.

Estimated ancestry for a simulated individual with asymmetric admixing history. The _Y_-axis represents the posterior probability that one allele is derived from a specific ancestry; the _X_-axis indicates the physical locations of markers. a, True ancestry along the paternal and the maternal chromosomes. The paternal chromosome was generated assuming τ=(25,25,25) and π=(0.4,0.4,0.2), whereas the maternal chromosome was generated assuming τ=(2,2,2) and π=(0.75,0.125,0.125). b, Posterior ancestry estimates at the MLE of τ. c, Posterior ancestry estimates under the assumption τ=(2,2,2). d, Posterior ancestry estimates under the assumption τ=(50,50,50).

References

Web Resource

    1. SABER, http://www.fhcrc.org/science/labs/tang/
