Microsatellites are molecular clocks that support accurate inferences about history

James X Sun et al. Mol Biol Evol. 2009 May.

Abstract

Microsatellite length mutations are often modeled using the generalized stepwise mutation process, which is a type of random walk. If this model is sufficiently accurate, one can estimate the coalescence time between alleles of a locus after a mathematical transformation of the allele lengths. When large-scale microsatellite genotyping first became possible, there was substantial interest in using this approach to make inferences about time and demography, but that interest has waned because it has not been possible to empirically validate the clock by comparing it with data in which the mutation process is well understood. We analyzed data from 783 microsatellite loci in human populations and 292 loci in chimpanzee populations, and compared them with up to one gigabase of aligned sequence data, where the molecular clock based upon nucleotide substitutions is believed to be reliable. We empirically demonstrate a remarkable linearity (r² > 0.95) between the microsatellite average square distance (ASD) statistic and sequence divergence. We demonstrate that microsatellites are accurate molecular clocks for coalescent times of at least 2 million years (My). We apply this insight to confirm that the African populations San, Biaka Pygmy, and Mbuti Pygmy have the deepest coalescent times among populations in the Human Genome Diversity Project (HGDP). Furthermore, we show that microsatellites support unbiased estimates of population differentiation (F_ST) that are less subject to ascertainment bias than single nucleotide polymorphism (SNP) F_ST. These results raise the prospect of using microsatellite data sets to determine parameters of population history. When genotyped along with SNPs, microsatellite data can also be used to correct for SNP ascertainment bias.
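
As a rough illustration of the statistic named in the abstract, the sketch below computes an average square distance between two samples of allele lengths. It assumes the common definition of ASD as the mean squared difference in repeat count over all pairs of alleles, one drawn from each population, averaged over loci. The abstract does not spell out the paper's exact estimator, and the function names and toy repeat counts are hypothetical.

```python
from statistics import fmean

def asd_locus(a, b):
    """Mean squared difference in repeat count over all cross-population allele pairs."""
    return fmean((x - y) ** 2 for x in a for y in b)

def asd_multilocus(pop_a, pop_b):
    """Average the per-locus ASD over loci.

    pop_a and pop_b are lists of per-locus allele-length lists, in the same locus order.
    """
    return fmean(asd_locus(a, b) for a, b in zip(pop_a, pop_b))

# Toy (made-up) repeat counts for two loci in two populations.
pop_a = [[10, 11, 12], [20, 20, 21]]
pop_b = [[12, 13], [20, 22]]
print(asd_multilocus(pop_a, pop_b))
```

Passing the same sample as both arguments gives a within-population value under this definition, although that version also pairs each allele with itself.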

Figures

FIG. 1.—

Microsatellite ASD is linear with sequence divergence. Horizontal axes are sequence divergences measured in substitutions per kb, which we assume to be an accurate gold standard. Vertical axes are microsatellite ASD values. Crosshairs are data with standard errors for each population pair. The linear regression line is shown. For WGS humans (A), the correlation coefficient is r = 0.989 (P = 4.9e−7, 95% CI 0.946–0.998). In the left box are Yoruba versus (top to bottom): European, East Asian, and Yoruba. In the right box are Biaka Pygmy versus (top to bottom): European, Yoruba, and East Asian. For RRS humans (B), r = 0.983 (P = 5.3e−11, 95% CI 0.949–0.995). In the left box are Yoruba versus (top to bottom): European, Australian Aborigine, East Asian, and Yoruba. In the right box are Biaka Pygmy versus European, Yoruba, and East Asian, and San versus Yoruba, European, and East Asian. For chimpanzees (C), r = 0.986 (P = 2.7e−4, 95% CI 0.877–0.999).

FIG. 2.—

Inferred pairwise sequence divergences of HGDP populations. We computed microsatellite ASD for each pair of HGDP populations and then, using the regression from figure 1A, inferred the divergence of each population pair in substitutions per kb. The grayscale intensities display the range of divergences. As shown, San and Pygmy Africans are equidistant from all other populations, suggesting that they have the largest t_MRCA to any other human population.
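
The calibration step described in this caption can be sketched as follows: fit a straight line of ASD against sequence divergence, then invert it to map a new ASD value to substitutions per kb. This is a minimal sketch using ordinary least squares. The caption does not state the paper's fitting procedure, and the calibration points below are invented for illustration, not taken from figure 1.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def infer_divergence(asd, slope, intercept):
    """Invert the calibration line: ASD value -> substitutions per kb."""
    return (asd - intercept) / slope

# Hypothetical calibration points (NOT values from the paper).
divergence_per_kb = [0.8, 1.0, 1.2, 1.4]
asd_values = [9.0, 11.0, 13.0, 15.0]

slope, intercept = fit_line(divergence_per_kb, asd_values)
estimate = infer_divergence(12.0, slope, intercept)
print(slope, intercept, estimate)
```

Inverting the fitted line is only reasonable within the range of divergences covered by the calibration points, which is why the reported linearity matters.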

FIG. 3.—

Microsatellite and SNP F_ST are almost equivalent, with the discrepancy likely due to SNP ascertainment. Horizontal axes are the SNP F_ST. Vertical axes are the microsatellite F_ST. In Panel A are F_ST computed from real HGDP data. There are (53 choose 2) = 1,378 pairwise population comparisons (data points). Circles and plus signs are data for each population pair. The linearity is clear, and the regression lines (not shown) intersect the origin. However, there are two distinct slopes for F_ST > 0.1. In circles are 1,035 (46 non-African populations, choose 2) non-Africans versus non-Africans, with regression line slope = 0.91 and correlation coefficient 0.983 (P < 1e−10, 95% CI 0.982–0.986). In plus signs are Africans versus all populations, with regression line slope = 0.73 and correlation coefficient 0.969 (P < 1e−10, 95% CI 0.962–0.975). In Panel B are simulated data (demographic model in supplementary fig. S2A, Supplementary Material online) with different SNP ascertainment schemes: No ascertainment in circles, ascertaining using two samples from population A (“African”) in dots, ascertaining using two samples from population B (“European”) in crosses, and ascertaining using one sample from each population in plus signs. In Panel C are simulated data (demographic model in supplementary fig. S2B, Supplementary Material online) of four populations resembling Africans, Europeans, East Asians, and Native Americans. We used the European-African ascertainment scheme (see text). In circles are non-Africans versus non-Africans. In plus signs are Africans versus non-Africans. For panels B and C, enough loci were simulated such that standard errors are of negligible magnitude.
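
The comparison counts quoted for Panel A can be checked with a few lines of arithmetic. The number of African populations (7) and the number of comparisons involving at least one African population (343) are not stated in the caption. They are derived here from the 53 and 46 it gives, assuming "Africans versus all populations" includes African-versus-African pairs.

```python
from math import comb

n_total = 53          # HGDP populations, from the caption
n_non_african = 46    # non-African populations, from the caption
n_african = n_total - n_non_african  # derived, not stated in the caption

all_pairs = comb(n_total, 2)
non_african_pairs = comb(n_non_african, 2)
# Pairs with at least one African population:
# African-African pairs plus African-non-African pairs.
african_pairs = comb(n_african, 2) + n_african * n_non_african

print(all_pairs, non_african_pairs, african_pairs)
```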

FIG. 4.—

A unifying view of ASD and microsatellite F_ST. The horizontal axis is interpopulation variance. The vertical axis is intrapopulation variance. Afr = Africans, NA = Native Americans, PI = Pacific Islanders, EA = East Asians, EMC = Europeans, Middle Easterners, and Central South Asians. It is shown (Materials and Methods) that microsatellite F_ST and ASD are functions of these two variances. Lines of constant ASD are dashed lines with negative slope. Lines of constant F_ST are dashed lines with positive slope. The data are (53 choose 2) = 1,378 pairwise HGDP population comparisons. Clearly, this picture segregates populations into distinguishable clusters. Africans versus all are above the thick black line. Non-Africans versus non-Africans are below the line. Distinguishable clusters are demarcated in ovals and squares.
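
One way to see why ASD can be written in terms of a within-population and a between-population term is the following algebraic identity, which holds when ASD is taken as the mean squared difference over all cross-population allele pairs: ASD = Var(A) + Var(B) + (mean(A) − mean(B))². The sketch below checks it numerically on made-up allele lengths. The paper's own definitions of interpopulation and intrapopulation variance, and its F_ST formula, are in its Materials and Methods and are not reproduced here, so this is only an assumption-laden illustration of the general idea.

```python
from statistics import fmean, pvariance

def asd_pairwise(a, b):
    """Mean squared difference over all cross-population allele pairs."""
    return fmean((x - y) ** 2 for x in a for y in b)

def asd_from_moments(a, b):
    """Same quantity from population variances plus the squared mean difference."""
    within = pvariance(a) + pvariance(b)      # spread inside each population
    between = (fmean(a) - fmean(b)) ** 2      # shift between population means
    return within + between

# Toy (made-up) repeat counts at one locus.
a = [10, 11, 12, 14]
b = [12, 13, 13]
print(asd_pairwise(a, b), asd_from_moments(a, b))
```

Because the within and between terms add, holding ASD fixed trades one against the other, which is consistent with the negatively sloped lines of constant ASD described in the caption.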
