Coalescent tree imbalance and a simple test for selective sweeps based on microsatellite variation

Haipeng Li et al. PLoS Comput Biol. 2013.

Abstract

Selective sweeps are at the core of adaptive evolution. We study how the shape of coalescent trees is affected by recent selective sweeps. To do so, we define a coarse-grained measure of tree topology. This measure has appealing analytical properties: its distribution is derived from a uniform distribution, and it is easy to estimate from experimental data. We show how it can be cast into a test for recent selective sweeps using microsatellite markers and present an application to an experimental data set from Plasmodium falciparum.
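The link between tree topology and a uniform distribution can be illustrated with a small simulation. The sketch below is not the authors' statistic or code. It only relies on a standard property of the neutral coalescent: when lineages merge in uniformly random pairs, the number of leaves under one of the two root branches is uniformly distributed on 1, ..., n-1. The function names `root_split` and `root_split_counts`, the sample size and the replicate count are illustrative assumptions.

```python
import random
from collections import Counter


def root_split(n, rng):
    """Leaf counts (left, right) of the two root subtrees of a random
    neutral coalescent topology with n leaves (hypothetical helper)."""
    if n < 2:
        raise ValueError("need at least two sampled lineages")
    sizes = [1] * n  # every lineage starts as a single leaf
    while len(sizes) > 2:
        # Under neutrality each pair of lineages is equally likely to merge.
        i, j = rng.sample(range(len(sizes)), 2)
        merged = sizes[i] + sizes[j]
        for k in sorted((i, j), reverse=True):  # pop the higher index first
            sizes.pop(k)
        sizes.append(merged)
    rng.shuffle(sizes)  # "left" and "right" carry no meaning, so randomize
    return sizes[0], sizes[1]


def root_split_counts(n, replicates, seed=0):
    """Tabulate the left root-subtree size over independent topologies."""
    rng = random.Random(seed)
    return Counter(root_split(n, rng)[0] for _ in range(replicates))


if __name__ == "__main__":
    n, replicates = 10, 9000  # illustrative values, not taken from the paper
    counts = root_split_counts(n, replicates)
    for size in range(1, n):
        # Each size should appear in roughly replicates / (n - 1) trees.
        print(size, counts[size])
```

A strongly unbalanced root split, such as the one left behind when a single lineage recombines away from a swept locus, is therefore no more likely under neutrality than any other split, which is what makes a departure from uniformity usable as a test signal.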

Conflict of interest statement

The authors have declared that no competing interests exist.

Figures

Figure 1. Coalescent trees under recombination and selection.

A: Sketch of a neutral coalescent tree with tree size formula image. B and C: A selective sweep in locus C leads to a tree of low height (formula image small). The selective sweep was initiated by a beneficial mutation at time formula image. At some distance from C, a single lineage (circled branch in C) has “recombined away” leading to the unbalanced tree shown at locus B. Note that tree height between trees B and C changes drastically and that formula image at locus C and formula image at locus B. Multiple recombination events (indicated by the crosses at the bottom line) between loci A and B lead to essentially uncorrelated trees at A and B.

Figure 2. Mean and standard deviation of and for coalescent trees of size .

Shown are the values for formula image independent realizations. formula image-axis: values of formula image (black circles) and formula image (red squares) are determined for the subtrees originating at node formula image, formula image. The solid gray line shows the theoretical expectation according to eq (3).

Figure 3. Correlation across distance.

Correlation based on simulations (formula image replicates) of the statistic formula image of the true tree. Pearson's correlation coefficient is measured between formula image and formula image for pairs of trees at position formula image and position formula image. Three scenarios are compared: standard neutral model with constant population size (green), population bottleneck (blue) and selective sweep (red). Sample size formula image, formula image, formula image and a recombination rate of formula image are assumed. The bottleneck parameters are: formula image, formula image. The selective sweep has a strength of formula image. The selected site is at position formula image. Under standard neutrality, formula image correlation is reached at position formula image cM, corresponding to about formula image bp.

Figure 4. Estimation of .

A: estimation of formula image by formula image. B: estimation of formula image by formula image. First row: standard neutral model. Second row: Selective sweep; estimation of formula image at distance formula image from selected site. Third row: Selective sweep; distance formula image from selected site. Parameters: formula image; formula image (top and bottom row); formula image (middle row); formula image; formula image; formula image.

Figure 5. Profile of and along a recombining chromosome.

Plots in column A show the distribution of formula image, i.e. when the tree topology is known. Plots in column B show the distribution of the estimate formula image when the tree topology is unknown, but estimated from microsatellite polymorphism data. Each boxplot corresponds to one of formula image marker loci located at the positions indicated on the formula image-axis. The region spans formula image kb in total. Symmetric step-wise mutation model with formula image. Other parameters: formula image, formula image and recombination rate per bp formula image (corresponding to 1 cM/Mb). First row: standard neutral model with constant formula image. Second row: bottleneck model with severity formula image and onset formula image. Third row: Selective sweep at locus formula image with formula image which was completed formula image time units ago. For comparison with the theoretical expectation, the leftmost boxplot in each panel shows the standard normal distribution (labeled ‘N’).

Figure 6. Power to detect loci under recent selection by the three tests defined in eqs (14) to (16).

Parameters: level formula image (solid) and formula image (dotted); selection coefficient formula image; time since fixation formula image; sample size formula image; mutation rate formula image; recombination rate formula image. The formula image-axis shows positions to the left (negative values) and right (positive values) of the locus under selection at position formula image. Scale is in cM x formula image, corresponding here to kb.

Figure 7. Traces of selection around a drug resistance locus in Plasmodium.

Results of tests formula image (stars), formula image (circles) and formula image (triangles) applied to a formula image kb region surrounding the pfmdr1 locus in P. falciparum. Shown are significant results on the 5% (open symbols) and 1% (filled symbols) levels. Positions of the examined microsatellite markers are indicated by arrows. Data from .

Grants and funding

HL was supported by the National Key Basic Research Program of China (2012CB316505), the NSFC grants (31172073 and 91131010) and the Bairen Program, and through a grant to TW by the German Research Foundation (DFG-SFB680, www.dfg.de). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
