Inferring human population size and separation history from multiple genome sequences

Stephan Schiffels et al. Nat Genet. 2014 Aug.

Abstract

The availability of complete human genome sequences from populations across the world has given rise to new population genetic inference methods that explicitly model ancestral relationships under recombination and mutation. So far, application of these methods to evolutionary history more recent than 20,000-30,000 years ago and to population separations has been limited. Here we present a new method that overcomes these shortcomings. The multiple sequentially Markovian coalescent (MSMC) analyzes the observed pattern of mutations in multiple individuals, focusing on the first coalescence between any two individuals. Results from applying MSMC to genome sequences from nine populations across the world suggest that the genetic separation of non-African ancestors from African Yoruban ancestors started long before 50,000 years ago and give information about human population history as recent as 2,000 years ago, including the bottleneck in the peopling of the Americas and separations within Africa, East Asia and Europe.

Figures

Figure 1. MSMC locally infers branch lengths and coalescence times from observed mutations

(a) A schematic representation of the model. Local genealogies change along the sequences through recombination events that rejoin branches of the tree, according to the SMC’ model [5, 6]. The pattern of mutations depends on the genealogy, with few mutations on branches with recent coalescences and more mutations on deeper branches. The hidden states of the model are the time to the first coalescence, along with the identity of the two sequences participating in the first coalescence (dark blue). (b) MSMC can locally infer its hidden states, shown by the posterior probability in color shading. In black, we plot the first coalescence time as generated by the simulation. This local inference works well for two, four and eight haplotypes. The more haplotypes are analyzed, the more recent the typical time to the first coalescence event becomes, while the typical segment length increases.
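The last observation in the caption, that adding haplotypes pulls the first coalescence toward the present, can be illustrated with textbook coalescent theory. The following is only a minimal sketch under a constant-size Kingman coalescent, which is a simplification and not the MSMC model itself. The function name and the effective size of 10,000 are hypothetical values chosen for illustration.

```python
from math import comb


def expected_first_coalescence(n_haplotypes, effective_size):
    """Expected generations until the first coalescence among n haplotypes.

    Assumption: constant diploid effective size N, so each of the C(n, 2)
    pairs of lineages coalesces at rate 1 / (2N) per generation and the
    waiting time to the first event has mean 2N / C(n, 2).
    """
    if n_haplotypes < 2:
        raise ValueError("need at least two haplotypes")
    return 2 * effective_size / comb(n_haplotypes, 2)


N = 10_000  # hypothetical effective population size
for m in (2, 4, 8):
    # More haplotypes -> more pairs -> earlier expected first coalescence.
    print(m, round(expected_first_coalescence(m, N), 1))
```

Under these assumptions, going from two to eight haplotypes shortens the expected first coalescence time by a factor of 28, which matches the qualitative shift toward recent times described in the caption.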

Figure 2. Testing MSMC on simulated data

(a) To test the resolution of MSMC applied to two, four and eight haplotypes, we simulated a series of exponential population growths and declines, each changing the population size by a factor of 10. MSMC recovers the resulting zig-zag pattern (on a double-logarithmic plot) over different time ranges, depending on the number of haplotypes. With two haplotypes, MSMC infers the population history from 40kya to 3mya, with four haplotypes from 8kya to 300kya, and with eight haplotypes from 2kya to 50kya. (b) Model estimates from two simulated population splits at 10kya and 100kya. The dashed lines plot the expected relative cross coalescence rate between the two populations before and after the splits. Maximum likelihood estimates are shown in red (four haplotypes) and purple (eight haplotypes). As expected, four haplotypes yield good estimates for the older split, while eight haplotypes give better estimates for the recent split.
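The caption does not define the relative cross coalescence rate. The sketch below assumes the usual MSMC definition: the cross-population coalescence rate divided by the mean of the two within-population rates. It also encodes the expectation for an idealized clean split, where the rate is 0 at times more recent than the split and 1 at older times. The function names and the time grid are hypothetical and serve only as illustration.

```python
def relative_cross_coalescence(lambda_11, lambda_12, lambda_22):
    """Cross rate relative to the mean within-population rate.

    Assumed definition: 2 * lambda_12 / (lambda_11 + lambda_22), where
    lambda_11 and lambda_22 are within-population coalescence rates and
    lambda_12 is the rate between the two populations.
    """
    return 2.0 * lambda_12 / (lambda_11 + lambda_22)


def expected_clean_split_curve(times_kya, split_kya):
    """Idealized expectation for a clean split with no gene flow.

    More recent than the split, lineages from different populations cannot
    coalesce (value 0); at or before the split, the populations behave as
    one (value 1).
    """
    return [0.0 if t < split_kya else 1.0 for t in times_kya]


times = [1, 5, 10, 50, 100, 200]  # hypothetical time grid in kya
print(expected_clean_split_curve(times, 10))
print(expected_clean_split_curve(times, 100))
```

A gradual separation would show up as a curve that moves between 1 and 0 over an extended interval rather than as a step.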

Figure 3. Population Size Inference from whole genome sequences

(a) Population size estimates from four haplotypes (two phased individuals) from each of nine populations. The dashed line was generated from a reduced data set containing only the Native American components of the MXL genomes. Estimates from two haplotypes for CEU and YRI are shown for comparison as dotted lines. (b) Population size estimates from eight haplotypes (four phased individuals) from the same populations as above, except MXL and MKK. Compared with four haplotypes, these estimates resolve more recent times. For comparison, we show the results from four haplotypes for CEU, CHB and YRI as dotted lines. Data for this figure are available in Supplementary Table 5.

Figure 4. Genetic Separation between population pairs

(a) Relative cross coalescence rates in and out of Africa. African/non-African pairs are shown in red, pairs within Africa in purple. (b) Relative cross coalescence rates between populations outside Africa. European/East Asian pairs are shown in blue, Asian/MXL pairs in green, and other non-African pairs in other colors as indicated. The pairs that include MXL are masked to include only the putative Native American components. The most recent population separations are inferred from eight haplotypes, i.e. four haplotypes from each population, as indicated in the legend. (c) Comparison of the African/non-African split with simulations of clean splits. We simulated three scenarios, with split times of 50kya, 100kya and 150kya. The comparison demonstrates that the history of the relative cross coalescence rate between African and non-African ancestors is incompatible with a clean split model, and suggests that the rate decreased progressively from beyond 150kya to approximately 50kya. (d) Schematic representation of population separations. Timings of splits, population separations, gene flow and bottlenecks are shown schematically along a logarithmic time axis. Data for this figure are available in Supplementary Table 5.

References

    1. Behar DM, et al. The dawn of human matrilineal diversity. Am J Hum Genet. 2008;82(5):1130–40.
    2. Fu Q, et al. Complete mitochondrial genomes reveal neolithic expansion into Europe. PLoS One. 2012;7(3):e32473.
    3. Balaresque P, et al. A predominantly neolithic origin for European paternal lineages. PLoS Biol. 2010;8(1):e1000285.
    4. Atkinson QD, Gray RD, Drummond AJ. mtDNA variation predicts population size in humans and reveals a major Southern Asian chapter in human prehistory. Mol Biol Evol. 2008;25(2):468–474.
    5. McVean GAT, Cardin NJ. Approximating the coalescent with recombination. Philos Trans R Soc Lond B Biol Sci. 2005;360(1459):1387–1393.
