Inferring demographic history from a spectrum of shared haplotype lengths

Kelley Harris et al. PLoS Genet. 2013 Jun.

Abstract

There has been much recent excitement about the use of genetics to elucidate ancestral history and demography. Whole genome data from humans and other species are revealing complex stories of divergence and admixture that were left undiscovered by previous smaller data sets. A central challenge is to estimate the timing of past admixture and divergence events, for example the time at which Neanderthals exchanged genetic material with humans and the time at which modern humans left Africa. Here, we present a method for using sequence data to jointly estimate the timing and magnitude of past admixture events, along with population divergence times and changes in effective population size. We infer demography from a collection of pairwise sequence alignments by summarizing their length distribution of tracts of identity by state (IBS) and maximizing an analytic composite likelihood derived from a Markovian coalescent approximation. Recent gene flow between populations leaves behind long tracts of identity by descent (IBD), and these tracts give our method power by influencing the distribution of shared IBS tracts. In simulated data, we accurately infer the timing and strength of admixture events, population size changes, and divergence times over a variety of ancient and recent time scales. Using the same technique, we analyze deeply sequenced trio parents from the 1000 Genomes project. The data show evidence of extensive gene flow between Africa and Europe after the time of divergence as well as substructure and gene flow among ancestral hominids. In particular, we infer that recent African-European gene flow and ancient ghost admixture into Europe are both necessary to explain the spectrum of IBS sharing in the trios, rejecting simpler models that contain less population structure.

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. An eight base-pair tract of identity by state (IBS).
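
As a rough illustration of what an IBS tract is, the sketch below scans a pairwise alignment and reports the lengths of maximal runs of identical bases. The function name `ibs_tract_lengths` and the `include_edges` flag are hypothetical, and the handling of runs that touch the ends of the alignment is an assumption rather than something stated in the caption.

```python
from typing import List


def ibs_tract_lengths(seq_a: str, seq_b: str, include_edges: bool = False) -> List[int]:
    """Lengths of maximal runs of identical bases between two aligned sequences.

    By default only runs bounded by a difference on both sides are reported
    (an assumption); set include_edges=True to also count runs that touch
    either end of the alignment.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")

    lengths: List[int] = []
    run = 0                # length of the current run of matching bases
    seen_mismatch = False  # True once a difference has been seen to the left

    for a, b in zip(seq_a, seq_b):
        if a == b:
            run += 1
        else:
            # A difference closes the current run, if there is one.
            if run > 0 and (seen_mismatch or include_edges):
                lengths.append(run)
            run = 0
            seen_mismatch = True

    # The final run is only bounded on the left, so it counts as an edge run.
    if run > 0 and include_edges:
        lengths.append(run)
    return lengths


# Two 12-base sequences that differ at positions 1 and 10,
# leaving one interior eight-base IBS tract.
example_a = "ACGTACGTACGT"
example_b = "ATGTACGTACAT"
interior = ibs_tract_lengths(example_a, example_b)
with_edges = ibs_tract_lengths(example_a, example_b, include_edges=True)
```

Here `interior` contains the single eight-base tract, while `with_edges` also includes the one-base runs at either end of the alignment.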

Figure 2. Spectra of IBS sharing between simulated populations that differ only in admixture time.

Each of the colored tract spectra in Figure 2A was generated from formula image base pairs of sequence alignment simulated with Hudson's MS. The IBS tracts are shared between two populations of constant size 10,000 that diverged 2,000 generations ago, with one haplotype sampled from each population. 5% of the genetic material from one population is the product of a recent admixture pulse from the other population. Figure 2B illustrates the history being simulated. When the admixture occurred less than 1,000 generations ago, it noticeably increases the abundance of long IBS tracts. The gray lines in 2A are theoretical tract abundance predictions, and fit the simulated data extremely well. To smooth out noise in the simulated data, abundances are averaged over intervals with exponentially spaced endpoints formula image.
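
The smoothing step described in this caption can be sketched as follows. The exact endpoints used in the paper are not recoverable from this page, so the sketch assumes integer endpoints that grow by a factor `base` (powers of two by default). The names `exponential_endpoints` and `binned_spectrum` are hypothetical.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def exponential_endpoints(max_len: int, base: float = 2.0) -> List[int]:
    """Integer bin endpoints 1, base, base**2, ... extending past max_len.

    Assumption: geometric spacing with ratio `base`; the paper's own
    endpoints are not given in the caption text.
    """
    if max_len < 1 or base <= 1.0:
        raise ValueError("need max_len >= 1 and base > 1")
    endpoints = [1]
    while endpoints[-1] <= max_len:
        # Always advance by at least one so that every bin is non-empty.
        endpoints.append(max(endpoints[-1] + 1, int(round(endpoints[-1] * base))))
    return endpoints


def binned_spectrum(
    lengths: Sequence[int], total_bp: int, base: float = 2.0
) -> List[Tuple[int, int, float]]:
    """Average tract abundance per tract length and per base pair in each bin.

    Returns (lo, hi, abundance) triples for the half-open bins [lo, hi).
    """
    if not lengths:
        return []
    if min(lengths) < 1 or total_bp < 1:
        raise ValueError("tract lengths and total_bp must be positive")
    endpoints = exponential_endpoints(max(lengths), base)
    counts = [0] * (len(endpoints) - 1)
    for length in lengths:
        counts[bisect_right(endpoints, length) - 1] += 1
    return [
        (lo, hi, count / ((hi - lo) * total_bp))
        for lo, hi, count in zip(endpoints, endpoints[1:], counts)
    ]


# Four tracts observed in a toy alignment of 100 base pairs.
spectrum = binned_spectrum([1, 2, 3, 8], total_bp=100)
```

Dividing each count by the bin width puts short and long bins on the same per-length scale, which is one simple way to average abundances within a bin.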

Figure 3. Shared IBS tracts within bottlenecked populations.

As in Figure 2, each colored spectrum in Figure 3A was generated by using MS to simulate formula image base pairs of pairwise alignment. Both sequences are derived from the population depicted in Figure 3B that underwent a bottleneck from size formula image to size formula image, the duration of the bottleneck being formula image generations. 1,000 generations ago, the population recovered to size 10,000. These bottlenecks leave similar frequencies of very long and very short IBS tracts because they have identical ratios of strength to duration. Relative to the no-bottleneck history, however, they leave distinct signature increases in the abundance of formula imageformula image-base IBS tracts. The gray curves are the expected IBS tract spectra that we predict analytically for each simulated history.

Figure 4. Frequencies of IBS tracts shared between the 1000 Genomes trio parental haplotypes.

Each plot records the number of formula image-base IBS tracts observed per base pair of sequence alignment. The red spectrum records tract frequencies compiled from the entire alignment, while the blue spectra result from 100 repetitions of block bootstrap resampling. A slight upward concavity around formula image base pairs is the signature of the out of Africa bottleneck in Europeans.
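
The block bootstrap mentioned in this caption can be sketched as below. The sketch assumes the alignment has already been cut into blocks and that each tract has been assigned to exactly one block. How the paper sizes its blocks and treats tracts that span a block boundary is not stated here. The function name `block_bootstrap_spectra` is hypothetical.

```python
import random
from collections import Counter
from typing import List, Sequence


def block_bootstrap_spectra(
    block_tracts: Sequence[Sequence[int]], n_reps: int = 100, seed: int = 0
) -> List[Counter]:
    """Resample alignment blocks with replacement and recount tract lengths.

    block_tracts[i] holds the IBS tract lengths assigned to block i
    (assumption: the block assignment was done upstream). Each replicate
    is a Counter mapping tract length to its count in the resampled blocks.
    """
    if not block_tracts:
        raise ValueError("need at least one block")
    rng = random.Random(seed)  # seeded so that replicates are reproducible
    n_blocks = len(block_tracts)
    replicates: List[Counter] = []
    for _ in range(n_reps):
        # Draw as many blocks as the original alignment has, with replacement.
        sample = rng.choices(block_tracts, k=n_blocks)
        spectrum: Counter = Counter()
        for tracts in sample:
            spectrum.update(tracts)
        replicates.append(spectrum)
    return replicates


# Three toy blocks holding two tracts each.
toy_blocks = [[1, 2], [3, 3], [8, 1]]
toy_replicates = block_bootstrap_spectra(toy_blocks, n_reps=100, seed=1)
```

The spread of counts across `toy_replicates` for a given tract length gives a rough sense of sampling noise, in the spirit of the blue spectra in the figure.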

Figure 5. IBS tract lengths in the 1000 Genomes pilot data: trios v. low coverage.

These IBS tract spectra were generated from pairwise alignments of the 1000 Genomes high coverage trio parental haplotypes and the CEU (European) and YRI (Yoruban) low coverage haplotypes, aligning samples within each population and between the two populations. Because they contain more sequencing and phasing errors, the low coverage alignments have an excess of closely spaced SNPs and too few long shared IBS tracts. Despite this, frequencies of tracts between 1 and 100 kb are very similar between the two datasets and diagnostic of population identity.

Figure 6. Mutation and recombination rates within IBS tracts, by tract length.

Figure 6A shows that there is no length class of IBS tracts with a significantly higher or lower recombination rate than the genome-wide average (recombination rates are taken from the deCODE genetic map [53]). In contrast, Figure 6B shows that IBS tracts shorter than 100 base pairs occur in regions with higher rates of human-chimp differences than the genome-wide average. These plots were made using IBS tracts shared between Europeans and Africans, but the results are similar for IBS sharing within each of the populations.

Figure 7. A history inferred from IBS sharing in Europeans and Yorubans.

This is the simplest history we found to satisfactorily explain IBS tract sharing in the 1000 Genomes trio data. It includes ancient ancestral population size changes, an out-of-Africa bottleneck in Europeans, ghost admixture into Europe from an ancestral hominid, and a long period of gene flow between the diverging populations.

Figure 8. Accurate prediction of IBS sharing in the trio data.

The upper left hand panel summarizes IBS tracts shared within the European and Yoruban 1000 Genomes trio parents, as well as IBS tract sharing between the two groups. The remaining three panels compare these real data to data simulated according to the history from Figure 7 with the maximum likelihood parameters from Table 2.

Figure 9. The coalescent with recombination and the sequentially Markov coalescent associate an observed pair of DNA sequences with a history that specifies a time to most recent common ancestry for each base pair.

Polymorphisms are caused by mutation events, while changes in TMRCA are caused by recombination events.
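
To make the link between TMRCA and IBS concrete, here is a minimal worked equation under simplifying assumptions that this caption does not state: an infinite-sites model with a constant per-base, per-generation mutation rate $\mu$, and a stretch of $L$ adjacent bases that all share the same TMRCA $t$ (in generations), so that no recombination has changed the TMRCA within the stretch. The symbols $\mu$, $L$, and $t$ are introduced here for illustration only.

```latex
% Two lineages separated by t generations span a total branch length of 2t.
% Mutations fall on each base as a Poisson process with rate \mu per generation,
% so the number of differences across L bases is Poisson with mean 2 \mu t L.
\[
  \Pr\bigl(\text{all } L \text{ bases are IBS} \mid \text{TMRCA} = t\bigr)
  = e^{-2 \mu t L}
\]
```

Under these assumptions, recently coalesced stretches (small $t$) are far more likely to remain identical over long distances, which is consistent with the abstract's point that recent gene flow leaves behind long shared tracts.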

Figure 10. An IBS tract with three recombination events in its history.

A blue skyline profile represents the hidden coalescence history of this idealized IBS tract. In order to predict the frequency of these tracts in a sequence alignment, we must integrate over the coalescence times formula image as well as the times formula image, formula image, and formula image when recombinations occurred.

References

    1. Slatkin M, Maddison W (1989) A cladistic measure of gene flow inferred from the phylogenies of alleles. Genetics 123: 603–613.
    2. Templeton A (2002) Out of Africa again and again. Nature 416: 45–51.
    3. Tajima F (1983) Evolutionary relationship of DNA sequences in finite populations. Genetics 105: 437–460.
    4. Slatkin M, Hudson R (1991) Pairwise comparisons of mitochondrial DNA sequences in stable and exponentially growing populations. Genetics 129: 555–562.
    5. Wakeley J, Hey J (1997) Estimating ancestral population parameters. Genetics 145: 847–855.
