Gene genealogy and variance of interpopulational nucleotide differences
Related papers
Gene genealogy and variance of interpopulational nucleotide differences
Genetics
A mathematical theory is developed for computing the probability that m genes sampled from one population (species) and n genes sampled from another are derived from l genes that existed at the time of population splitting. The expected time of divergence between the two most closely related genes sampled from two different populations and the time of divergence (coalescence) of all genes sampled are studied by using this theory. It is shown that the time of divergence between the two most closely related genes can be used as an approximate estimate of the time of population splitting (T) only when T = t/(2N) is small, where t and N are the number of generations and the effective population size, respectively. The variance of Nei and Li's estimate (d) of the number of net nucleotide differences between two populations is also studied. It is shown that the standard error (s_d) of d is larger than the mean when T is small (T << 1). In this case, s_d is reduced considerably by increasing sample size. When T is large (T > 1), however, a large proportion of the variance of …
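The probability described above can be sketched with the standard coalescent. This is a minimal illustration under stated assumptions, not the paper's own derivation: it uses the standard expression (usually attributed to Tavaré, 1984) for the chance g(n, j, t) that n genes have exactly j ancestors t time units back, with time in units of 2N generations as in T = t/(2N). It further assumes that both daughter populations have the same effective size N and exchange no migrants, so their genealogies are independent and the two probabilities simply multiply. The helper names `g` and `p_ancestors` are hypothetical.

```python
from math import exp, factorial


def rising(a, k):
    """Rising factorial a(a+1)...(a+k-1); equals 1 when k == 0."""
    out = 1
    for i in range(k):
        out *= a + i
    return out


def falling(a, k):
    """Falling factorial a(a-1)...(a-k+1); equals 1 when k == 0."""
    out = 1
    for i in range(k):
        out *= a - i
    return out


def g(n, j, t):
    """P(n sampled genes have exactly j ancestors t units back).

    Time is in units of 2N generations, so k lineages coalesce at rate k(k-1)/2.
    """
    if j < 1 or j > n:
        return 0.0
    if t == 0:
        return 1.0 if j == n else 0.0
    total = 0.0
    for k in range(j, n + 1):
        coef = ((2 * k - 1) * (-1) ** (k - j) * rising(j, k - 1) * falling(n, k)
                / (factorial(j) * factorial(k - j) * rising(n, k)))
        total += exp(-k * (k - 1) * t / 2) * coef
    return total


def p_ancestors(m, n, l, T):
    """P(m genes from population 1 and n from population 2 descend from l genes at the split).

    Hypothetical setting: equal sizes N, no migration, T = t/(2N) since the split,
    so the two populations' genealogies are independent.
    """
    return sum(g(m, i, T) * g(n, l - i, T) for i in range(1, m + 1))


if __name__ == "__main__":
    for l in range(2, 6):
        print(l, p_ancestors(3, 2, l, 0.5))
```

Because the two populations coalesce independently, the probabilities over all possible l sum to 1, and for large T nearly all of the mass sits at l = 2 (one ancestor per population). The alternating sum in `g` loses floating-point precision for large samples, so exact rational arithmetic would be a safer choice there.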
Evolution, 2000
Molecular methods as applied to the biogeography of single species (phylogeography) or multiple codistributed species (comparative phylogeography) have been productively and extensively used to elucidate common historical features in the diversification of the Earth's biota. However, only recently have methods for estimating population divergence times or their confidence limits while taking into account the critical effects of genetic polymorphism in ancestral species become available, and earlier methods for doing so are underutilized. We review models that address the crucial distinction between the gene divergence, the parameter that is typically recovered in molecular phylogeographic studies, and the population divergence, which is in most cases the parameter of interest and will almost always postdate the gene divergence. Assuming that population sizes of ancestral species are distributed similarly to those of extant species, we show that phylogeographic studies in vertebrates suggest that divergence of alleles in ancestral species can comprise from less than 10% to over 50% of the total divergence between sister species, suggesting that the problem of ancestral polymorphism in dating population divergence can be substantial. The variance in the number of substitutions (among loci for a given species or among species for a given gene) resulting from the stochastic nature of DNA change is generally smaller than the variance due to substitutions along allelic lines whose coalescence times vary due to genetic drift in the ancestral population. Whereas the former variance can be reduced by further DNA sequencing at a single locus, the latter cannot. 
Contrary to phylogeographic intuition, dating population divergence times when allelic lines have achieved reciprocal monophyly is in some ways more challenging than when they have not, because in the former case the critical data on ancestral population size provided by residual ancestral polymorphism are lost. In that case, differences in coalescence time between species pairs can in principle be explained entirely by differences in ancestral population size, without invoking differences in divergence time. Furthermore, the confidence limits on population divergence times are severely underestimated when those for the number of substitutions per site in the DNA sequences examined are used as a proxy. This uncertainty highlights the importance of multilocus data in estimating population divergence times: multilocus data can in principle distinguish differences in coalescence time (T) that result from differences in population divergence time from those due to differences in ancestral population size, and they will narrow the confidence limits on the estimates. We analyze the contribution of ancestral population size (θ) to T and the effect of uncertainty in θ on estimates of population divergence (τ) for single loci under reciprocal monophyly, using a simple Bayesian extension of Takahata and Satta's and Yang's recent coalescent methods. The confidence limits on τ decrease when the range over which ancestral population size is assumed to be distributed decreases and when τ increases; they generally exclude zero when τ/(4Ne) > 1. We also apply a maximum-likelihood method to several single and multilocus data sets. With multilocus data, the criterion for excluding τ = 0 is roughly that lτ/(4Ne) > 1, where l is the number of loci. Our analyses corroborate recent suggestions that increasing the number of loci is critical to decreasing the uncertainty in estimates of population divergence time.
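The claim that variance from drift in the ancestral population cannot be removed by sequencing more of one locus, but shrinks as loci are added, can be illustrated with a small simulation. This is a hedged sketch, not the authors' Bayesian or maximum-likelihood method: it assumes a diploid ancestral population of known size Ne (so two lineages coalesce after an exponential time with mean 2Ne generations), ignores mutational noise entirely, and estimates the split time by subtracting the known 2Ne from the mean gene divergence across loci. The function names and parameter values are hypothetical.

```python
import random
import statistics


def ancestral_fraction(tau, ne):
    """Expected share of gene divergence that predates the population split.

    Assumes two lineages coalesce in a diploid ancestral population of size ne,
    i.e. after a mean of 2 * ne generations.
    """
    return 2 * ne / (tau + 2 * ne)


def gene_divergence_times(tau, ne, loci, rng):
    """Gene divergence time (generations) at each independent locus:
    split time plus an exponential ancestral coalescence time with mean 2 * ne."""
    return [tau + rng.expovariate(1 / (2 * ne)) for _ in range(loci)]


def tau_estimates(tau, ne, loci, reps, seed=1):
    """Repeat a naive estimate of tau (mean gene divergence minus the known 2 * ne)."""
    rng = random.Random(seed)
    return [
        statistics.fmean(gene_divergence_times(tau, ne, loci, rng)) - 2 * ne
        for _ in range(reps)
    ]


if __name__ == "__main__":
    for loci in (1, 25):
        est = tau_estimates(tau=100_000, ne=50_000, loci=loci, reps=4_000)
        print(loci, round(statistics.fmean(est)), round(statistics.stdev(est)))
```

With Ne = 50,000 and a split 100,000 generations ago, ancestral coalescence accounts for half of the expected gene divergence, and the spread of the single-locus estimate is about 2Ne; averaging over l loci shrinks it roughly as 2Ne/√l. In practice Ne is itself unknown, which is why the abstract stresses joint estimation.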
Molecular Biology and Evolution, 2001
Genetic distances play an important role in estimating the divergence time of bifurcated populations. However, they can be greatly affected by demographic processes, such as migration and population dynamics, which complicate their interpretation. For example, the widely used distance for microsatellite loci, (δμ)², assumes constant population size, no gene flow, and mutation-drift equilibrium. It is shown here that (δμ)² strongly underestimates divergence time if populations are growing and/or connected by gene flow. In recent publications, the average estimate of divergence time between African and non-African populations obtained by using (δμ)² is about 34,000 years, although archaeological data show a much earlier presence of modern humans out of Africa. I introduce a different estimator of population separation time based on microsatellite statistics, T_D, that does not assume mutation-drift equilibrium, is independent of population dynamics in the absence of gene flow, and is robust to weak migration flow for growing populations. However, it requires knowledge of the variance in the number of repeats at the beginning of population separation, V_0. One way to overcome this problem is to find minimal and maximal bounds for the variance and thus obtain the earliest and latest bounds for divergence time (this is not a confidence interval; it simply reflects uncertainty about the value of V_0 in the ancestral population). Another way to avoid the uncertainty is to choose from among present populations a reference whose variation is presumably close to what it might have been in the ancestral population. A different approach for using T_D is to estimate the time difference between adjacent nodes on a phylogenetic population tree.
Using data on variation at autosomal short tandem repeat loci with di-, tri-, and tetranucleotide repeats in worldwide populations, T_D gives an estimate of 57,000 years for the separation of the out-of-Africa branch of modern humans from Africans, based on the value of V_0 in Southern American Indian populations; the earliest bound for this event has been estimated to be about 135,000 years. The data also suggest that the Asian and European populations diverged from each other about 20,000 years after the occurrence of the out-of-Africa branch.
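For comparison with T_D, the conventional distance criticized above can be computed directly. A minimal sketch, assuming the usual definition of (δμ)² (the squared difference in mean repeat number between two populations, averaged over loci) and the single-step stepwise mutation model, under which, with the assumptions listed in the abstract, its expectation is about 2μt for mutation rate μ per locus per generation. T_D itself is not reproduced here; the function names and toy allele sizes are hypothetical.

```python
from statistics import fmean


def delta_mu_sq(pop_x, pop_y):
    """(delta mu)^2: squared difference in mean repeat number, averaged over loci.

    pop_x and pop_y hold one list of allele sizes (in repeat units) per locus,
    with loci in the same order for both populations.
    """
    if len(pop_x) != len(pop_y):
        raise ValueError("both populations must be typed at the same loci")
    return fmean((fmean(ax) - fmean(ay)) ** 2 for ax, ay in zip(pop_x, pop_y))


def naive_split_time(dmu2, mu):
    """Generations since separation if E[(delta mu)^2] = 2 * mu * t holds."""
    return dmu2 / (2 * mu)


# Hypothetical toy data: two loci, a few alleles per population.
pop_x = [[10, 12, 11], [20, 20]]
pop_y = [[14, 13], [22, 18, 20]]
dmu2 = delta_mu_sq(pop_x, pop_y)
print(dmu2, naive_split_time(dmu2, mu=5e-4))
```

In these toy samples the first locus differs by 2.5 repeats in mean size and the second not at all, so (δμ)² = 3.125; at a hypothetical μ = 5 × 10⁻⁴ the naive estimate is about 3,125 generations. Under growth or gene flow, the abstract argues, such a figure would be biased downward.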
… of the Royal …, 2000
In this paper, we derive the expectation of two popular genetic distances under a model of pure population fission allowing for unequal population sizes. Under the model, we show that conventional genetic distances are not proportional to the divergence time and generally overestimate it, owing to unequal genetic drift and to a bottleneck effect at the divergence time. This bias cannot be totally removed even if the present population sizes are known. Instead, we present a method to estimate the divergence times between populations which is based on the average number of nucleotide differences within and between populations. The method simultaneously estimates the divergence time, the ancestral population size and the relative sizes of the derived populations. A simulation study revealed that this method is essentially unbiased and that it leads to better estimates than traditional approaches for a very wide range of parameter values. Simulations also indicated that moderate population growth after divergence has little effect on the estimates of all three parameters. An application of our method to a comparison of human and chimpanzee mitochondrial DNA diversity revealed that common chimpanzees have a significantly larger female population size than humans.
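Both this method and Nei and Li's net difference d start from average pairwise differences within and between samples. The sketch below computes those summaries from aligned sequences using uncorrected p-distances, together with the net difference d = d_XY - (d_X + d_Y)/2. It does not implement the joint estimator of divergence time, ancestral size and relative sizes described above; the function names and sequences are hypothetical. As a check on the idea, if all populations share one constant size N and there is no migration, the between-population mean 2μ(t + 2N) exceeds the within-population mean 4Nμ by exactly 2μt, so E[d] = 2μt.

```python
from itertools import combinations, product
from statistics import fmean


def p_distance(a, b):
    """Proportion of aligned sites at which two sequences differ (no multiple-hit correction)."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    return sum(x != y for x, y in zip(a, b)) / len(a)


def within(seqs):
    """Mean p-distance over all pairs drawn from one population sample."""
    return fmean(p_distance(a, b) for a, b in combinations(seqs, 2))


def between(xs, ys):
    """Mean p-distance over all pairs with one sequence from each sample."""
    return fmean(p_distance(a, b) for a, b in product(xs, ys))


def net_difference(xs, ys):
    """Nei and Li's net difference: d_XY minus the average within-sample difference."""
    return between(xs, ys) - (within(xs) + within(ys)) / 2


# Hypothetical toy alignment: two sequences per population.
xs = ["AAAA", "AAAT"]
ys = ["CCAA", "CCAT"]
print(within(xs), within(ys), between(xs, ys), net_difference(xs, ys))
```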
Mathematical Biosciences, 2014
The main purpose of this paper is to develop a theoretical framework for assessing effective population size and genetic divergence in situations with structured populations that consist of various numbers of more or less interconnected subpopulations. We introduce a general infinite allele model for a diploid, monoecious and subdivided population, with subpopulation sizes varying over time, including local subpopulation extinction and recolonization, bottlenecks, cyclic census size changes or exponential growth. Exact matrix analytic formulas are derived for recursions of predicted (expected) gene identities and gene diversities, identity by descent and coalescence probabilities, and standardized variances of allele frequency change. This enables us to compute and put into a general framework a number of different types of genetically effective population sizes (Ne) including variance, inbreeding, nucleotide diversity, and eigenvalue effective size. General expressions for predicti...
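The recursions for expected gene identity that the paper generalizes with matrices can be seen in their simplest form for a single isolated population. The sketch below covers only that special case, not the paper's subdivided-population framework: it assumes Wright-Fisher sampling of 2N_t genes from N_t diploids in generation t and infinite-allele mutation at rate u per gene per generation, so that F_{t+1} = (1 - u)²[1/(2N_t) + (1 - 1/(2N_t))F_t]. It also recovers an inbreeding effective size from a fluctuating census. Names and numbers are hypothetical.

```python
def identity_path(sizes, f0=0.0, u=0.0):
    """Expected gene identity F_t across generations with census sizes N_t.

    Each step applies F_{t+1} = (1 - u)^2 * (1/(2 N_t) + (1 - 1/(2 N_t)) * F_t).
    """
    f = f0
    path = [f]
    for n in sizes:
        f = (1 - u) ** 2 * (1 / (2 * n) + (1 - 1 / (2 * n)) * f)
        path.append(f)
    return path


def inbreeding_ne(sizes):
    """Constant size that loses heterozygosity (u = 0) at the same overall rate."""
    retained = 1.0
    for n in sizes:
        retained *= 1 - 1 / (2 * n)
    per_generation = retained ** (1 / len(sizes))
    return 1 / (2 * (1 - per_generation))


# Hypothetical census: one bottleneck generation among large ones.
sizes = [1000, 100, 1000, 1000]
harmonic = len(sizes) / sum(1 / n for n in sizes)
print(round(inbreeding_ne(sizes), 1), round(harmonic, 1))
```

For this census the exact inbreeding effective size comes out close to 307, essentially the harmonic mean (about 308) and far below the arithmetic mean of 775: a single small generation dominates.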
Estimating Divergence Times from Molecular Data on Phylogenetic and Population Genetic Timescales
Annual Review of Ecology and Systematics, 2002
Molecular clocks have profoundly influenced modern views on the timing of important events in evolutionary history. We review recent advances in estimating divergence times from molecular data, emphasizing the continuum between processes at the phylogenetic and population genetic scales. On the phylogenetic scale, we address the complexities of DNA sequence evolution as they relate to estimating divergences, focusing on models of nucleotide substitution and problems associated with among-site and among-lineage rate variation. On the population genetic scale, we review advances in the incorporation of ancestral population processes into the estimation of divergence times between recently separated species. Throughout the review we emphasize new statistical methods and the importance of model testing during the process of divergence time estimation.
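As a minimal example of the phylogenetic-scale machinery reviewed here, the sketch below applies the simplest substitution model, Jukes and Cantor (1969), to correct an observed proportion of differing sites for multiple hits, then converts the corrected distance to a time under a strict clock. It deliberately ignores among-site and among-lineage rate variation, the very complications the review emphasizes, and on population timescales it dates gene divergence rather than population divergence. The function names and the rate value are hypothetical placeholders.

```python
import math


def jc69_distance(p):
    """Jukes-Cantor corrected substitutions per site from the observed proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("JC69 distance is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


def strict_clock_time(d, rate):
    """Time since divergence when both lineages accumulate substitutions at `rate` per site per unit time."""
    return d / (2 * rate)


d = jc69_distance(0.10)  # about 0.107 substitutions per site
print(d, strict_clock_time(d, rate=1e-8))
```

A raw p of 0.10 becomes about 0.107 after correction; the gap widens quickly as p approaches the saturation limit of 0.75.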
Genetics, 2003
The effective population sizes of ancestral as well as modern species are important parameters in models of population genetics and human evolution. The commonly used method for estimating ancestral population sizes, based on counting mismatches between the species tree and the inferred gene trees, is highly biased as it ignores uncertainties in gene tree reconstruction. In this article, we develop a Bayes method for simultaneous estimation of the species divergence times and current and ancestral population sizes. The method uses DNA sequence data from multiple loci and extracts information about conflicts among gene tree topologies and coalescent times to estimate ancestral population sizes. The topology of the species tree is assumed known. A Markov chain Monte Carlo algorithm is implemented to integrate over uncertain gene trees and branch lengths (or coalescence times) at each locus as well as species divergence times. The method can handle any species tree and allows different numbers of sequences at different loci. We apply the method to published noncoding DNA sequences from humans and the great apes. There are strong correlations between posterior estimates of speciation times and ancestral population sizes. With the use of an informative prior for the human-chimpanzee divergence date, the population size of the common ancestor of the two species is estimated to be ~20,000, with a 95% credibility interval (8,000, 40,000). Our estimates, however, are affected by model assumptions as well as data quality. We suggest that reliable estimates must await more data and more realistic models.
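The mismatch-counting method criticized above can be written down in a few lines for three species. A hedged sketch, assuming the standard coalescent result that, with an internal branch of t generations between the two speciation events and a diploid ancestral population of N individuals, a gene tree disagrees with the species tree with probability (2/3)·exp(-t/(2N)). Inverting that relation turns an observed mismatch fraction into an estimate of N. Because it treats every reconstructed gene tree as correct, it inherits the bias the authors describe; their Bayesian MCMC method is not reproduced here, and the function names and numbers are hypothetical.

```python
import math


def mismatch_probability(t_internal, n_anc):
    """P(gene-tree topology differs from the species tree) for three species.

    t_internal: generations between the two speciation events.
    n_anc: diploid size of the ancestral population on that internal branch.
    """
    return (2 / 3) * math.exp(-t_internal / (2 * n_anc))


def n_from_mismatch(fraction, t_internal):
    """Moment estimate of n_anc obtained by inverting mismatch_probability."""
    if not 0 < fraction < 2 / 3:
        raise ValueError("mismatch fraction must lie strictly between 0 and 2/3")
    return t_internal / (2 * math.log(2 / (3 * fraction)))


# Hypothetical: a 40,000-generation internal branch and N = 20,000.
p = mismatch_probability(40_000, 20_000)
print(p, n_from_mismatch(p, 40_000))
```

About 24.5% of loci are expected to disagree in this example. The inversion is defined only for mismatch fractions below 2/3, and reconstruction errors that inflate the observed fraction translate directly into an inflated N.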