Allele frequency distribution under recurrent selective sweeps

Yuseob Kim. Genetics. 2006 Mar.

Abstract

The allele frequency of a neutral variant in a population is pushed either upward or downward by directional selection on a linked beneficial mutation ("selective sweeps"). DNA sequences sampled after the fixation of the beneficial allele thus contain an excess of rare neutral alleles. This study investigates the allele frequency distribution under selective sweep models using analytic approximation and simulation. First, given a single selective sweep at a fixed time, I derive an expression for the sampling probabilities of neutral mutants. This solution can be used to estimate the time of the fixation of a beneficial allele from sequence data. Next, I obtain an approximation to mean allele frequencies under recurrent selective sweeps. Under recurrent sweeps, the frequency spectrum is skewed toward rare alleles. However, the excess of high-frequency derived alleles, previously shown to be a signature of single selective sweeps, disappears with recurrent sweeps. It is shown that, using this approximation and multilocus polymorphism data, genomewide parameters of directional selection can be estimated.

Figures

Figure 1.

Sampling probabilities of neutral mutant alleles after a single selective sweep event (n = 8), for various distances from the selective target and times of fixation of the beneficial allele. Analytic prediction is obtained from Equation 2. The simulation result is based on 10^5 coalescent trees for each site. Other parameters: α = 1000, N = 10^5, θ = 0.01.

Figure 2.

(A) Distribution of y at the end of a selective sweep. A two-locus forward simulation using the method in Kim and Stephan (2000) was conducted with N = 10,000, s = 0.05, and r/s = 0.025. (B) Sampling probability for n = 8 was determined by transforming an allele frequency x, randomly chosen from distribution φ(x) = θ/x, to y + (1 − y)x with probability x or to (1 − y)x with probability 1 − x. This procedure was repeated 10^5 times and the resulting distribution of allele frequency was converted to the sampling probability. Columns with light shading are obtained by using a constant y that is the mean of the distribution shown in A [mean = 0.815; (4Ns)^(−r/s) = 0.827]. Using variable y in A results in probabilities shown by columns with dark shading. Solid columns represent the results from coalescent simulation (α = 1000, θ = 0.01, r/s = 0.025, and τ = 0).
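The constant-y version of the transformation in Figure 2B can be sketched numerically. This is a minimal illustration, not the author's procedure: instead of drawing x at random 10^5 times, it integrates the same mixture over φ(x) = θ/x with a midpoint rule, which avoids having to truncate the non-normalizable density near x = 0. The function names `y_approx`, `binom_pmf`, and `sampling_probs`, and the `steps` parameter, are hypothetical.

```python
from math import comb


def y_approx(N, s, r_over_s):
    """Deterministic approximation (4Ns)^(-r/s) quoted in the Figure 2 caption."""
    return (4.0 * N * s) ** (-r_over_s)


def binom_pmf(i, n, p):
    """Probability of drawing i mutant alleles in a sample of n at frequency p."""
    return comb(n, i) * p**i * (1.0 - p) ** (n - i)


def sampling_probs(n, y, theta, steps=20000):
    """Return [P(1), ..., P(n-1)] for a constant y (an assumption of this sketch).

    An allele at frequency x before the sweep moves to y + (1 - y)x with
    probability x, or to (1 - y)x with probability 1 - x. The result is
    averaged over phi(x) = theta / x by midpoint-rule integration on (0, 1).
    """
    h = 1.0 / steps
    probs = []
    for i in range(1, n):
        total = 0.0
        for k in range(steps):
            x = (k + 0.5) * h  # midpoint, so x is never exactly 0
            up = y + (1.0 - y) * x
            down = (1.0 - y) * x
            mix = x * binom_pmf(i, n, up) + (1.0 - x) * binom_pmf(i, n, down)
            total += (theta / x) * mix * h
        probs.append(total)
    return probs


if __name__ == "__main__":
    y = y_approx(N=10_000, s=0.05, r_over_s=0.025)
    print(f"y = {y:.3f}")
    for i, p in enumerate(sampling_probs(n=8, y=y, theta=0.01), start=1):
        print(i, f"{p:.5f}")
```

Setting y = 0 recovers the neutral expectation θ/i, which is a convenient sanity check on the integration.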

Figure 3.

Decay of skewed frequency spectrum after a single selective sweep. The expectations of relative heterozygosity (formula image), Tajima's D (given S = 20), and normalized Fay and Wu's H are plotted along the distance from the selective target (given by r/s) for increasing values of τ.
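Since the formula images are missing from this page, the following sketch uses the standard definition of Tajima's D (Tajima 1989) computed from a site frequency spectrum; it is not necessarily the exact expression used for the figure. The function name `tajimas_d` and its input layout are assumptions. Passing an expected (non-integer) spectrum gives D evaluated at expected values, which only approximates the expectation of D.

```python
from math import sqrt


def tajimas_d(spectrum):
    """Tajima's D from a spectrum where spectrum[i-1] is the number of
    segregating sites carrying i mutant alleles (i = 1..n-1)."""
    n = len(spectrum) + 1
    s = sum(spectrum)  # number of segregating sites, S
    # Mean pairwise difference implied by the spectrum
    pi = sum(
        xi * 2.0 * i * (n - i) / (n * (n - 1))
        for i, xi in enumerate(spectrum, start=1)
    )
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i**2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1 = c1 / a1
    e2 = c2 / (a1**2 + a2)
    return (pi - s / a1) / sqrt(e1 * s + e2 * s * (s - 1))


if __name__ == "__main__":
    n, S = 25, 20
    a1 = sum(1.0 / i for i in range(1, n))
    neutral = [S / (a1 * i) for i in range(1, n)]  # spectrum proportional to 1/i
    singletons = [S] + [0] * (n - 2)               # every variant is a singleton
    print(tajimas_d(neutral), tajimas_d(singletons))
```

A spectrum proportional to 1/i gives D = 0, while a spectrum skewed toward rare alleles gives a negative D.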

Figure 4.

Allele frequency distribution of neutral mutant alleles under recurrent selective sweeps (n = 15, α = 2000, θ = 0.01, M = 2 × 10^5, R = 400). Shaded and solid columns show analytic approximation and coalescent simulation, respectively. (A) Probability of sampling i mutant alleles per nucleotide (i = 1, … , n − 1). (B) Frequency spectrum (frequency distribution conditional on polymorphism at the site). Points connected by shaded lines show the frequency spectrum under the standard neutral model [formula image for i mutants per site].

Figure 5.

Correlation between Tajima's D and relative reduction of genetic variation under recurrent selective sweeps predicted by analytic approximation. It is assumed that sequence data (n = 25) are taken from a 1-kb segment in the middle of the chromosome and θ = 0.01. The sampling probability given by Equation 4 was transformed to the expected level of variation (formula image), where formula image, and to expected Tajima's D, using formula image. Solid curve: α increases from 150 to 5000 with Λ = 4, R = 400. Dashed curves: Λ increases from 1 to 12 with α = 1000 and R = 400, 1000, and 2000. M = 2 × 10^5.

Figure 6.

Contour plots of composite likelihood (Equation 6) calculated for four random sets of multilocus data (n = 15, L = 30, α = 1000, and 4Nλ = 4 × 10^−5). Contour lines were drawn in increments of five downward from the maximum value (included in the open area).

Figure 7.

Distribution of the maximum composite-likelihood estimate of α when the true value of 4Nλ = 4 × 10^−5 is known. The true value of α is 1000. The likelihood was calculated by varying α from 100 to 5000 in increments of 100. Shaded (solid) columns show the distribution of estimates when the likelihood is based on sampling probabilities (frequency spectrum).
