Performance of a Divergence Time Estimation Method under a Probabilistic Model of Rate Evolution (original) (raw)

Abstract

Rates of molecular evolution vary over time and, hence, among lineages. In contrast, widely used methods for estimating divergence times from molecular sequence data assume constancy of rates. Therefore, methods for estimation of divergence times that incorporate rate variation are attractive. Improvements on a previously proposed Bayesian technique for divergence time estimation are described. New parameterization more effectively captures the phylogenetic structure of rate evolution on a tree. Fossil information and other evidence can now be included in Bayesian analyses in the form of constraints on divergence times. Simulation results demonstrate that the accuracy of divergence time estimation is substantially enhanced when constraints are included.

Introduction

The incompleteness of the fossil record has made DNA and protein sequences the main source of information about some important evolutionary events. This information is typically extracted by assuming that DNA and protein sequences change at a rate that is constant over time. The idea is that fossil evidence from one evolutionary lineage may allow the rate of evolution to be calibrated. After estimation of the rate that is assumed to be common among all lineages, the observed amount of sequence divergence can be translated into time, because amounts of sequence divergence are a product of evolutionary rate and time.

The rate at which sequences change depends on a large number of factors (e.g., natural selection, population size, generation time, mutation rate, and mutation pattern). It is inconceivable that these factors are identical for all evolutionary lineages. In fact, it may be that no two evolutionary lineages share the same rate of molecular evolution.

However, closely related evolutionary lineages typically evolve at similar rates. This is because many factors that affect evolutionary rates are biological, and closely related lineages are biologically similar. Differences in rates between distantly related lineages are more likely to be large because there has been more time for divergence of the factors affecting rates.

Widely used methods for estimating divergence times assume that all evolutionary lineages experience sequence change at identical rates. Recognition of the potential problems with assuming “globally” constant rates on a phylogeny has led to the development of “local” divergence time estimation techniques that allow different portions of a phylogeny to evolve at different rates but force all branches within a particular portion of the phylogeny to evolve at the same rate (e.g., Hasegawa, Kishino, and Yano 1989 ; Rambaut and Bromham 1998 ; Yoder and Yang 2000 ).

Sanderson (1997) proposed a “nonparametric” method for estimating divergence times that assumes neither globally nor locally constant rates of molecular evolution. Instead, the Sanderson method has the attractive property of allowing the rate to evolve over time. The pioneering work of Sanderson was followed with Bayesian estimates of divergence times that were based on models of evolution of the rate of molecular evolution (Thorne, Kishino, and Painter 1998 ; Huelsenbeck, Larget, and Swofford 2000 ; Korber et al. 2000 ).

In this study, we extend the Bayesian techniques for estimating divergence times, explore their behavior via simulation, and investigate the robustness of these techniques to violations of their assumptions. In addition, we assess the impact on divergence time estimation of constraints that can incorporate fossil information into the procedure.

Method

Model of Rate Evolution

Our model approximates a process in which rates change over time in a continuous fashion. By allowing autocorrelation of rates over evolutionary time, the model and its subsequent improvements have the potential to provide an improved statistical fit to sequence data sets. Our approach can be viewed as a careful model of the correlation of rates among nodes and an approximation of the rate trajectory along branches. On any particular branch of the tree, the rate of molecular evolution will change over time, but we make no attempt to explicitly model this. Instead, we allow different branches to have different rates but use a single rate for each branch to approximate the trajectory of rates on that branch.

For simplicity, the rate on a branch is assumed to be the average of the rates at the nodes that begin and end it. This assumption makes it technically correct to interpret our model as having the rate along a branch change in a linear fashion from the node that begins it to the node that ends it. We dislike this interpretation. We view the mean of the rates at the two nodes simply as a reasonable approximation of the average rate on a branch.

Model of Evolution of the Rate of Evolution

Given the logarithm of the rate at a node that begins a branch, we assume that the logarithm of the rate at the node ending the branch has a normal distribution. In one of our earlier implementations, the mean of the normal distribution for the logarithm of the ending rate was set equal to the logarithm of the beginning rate. For many data sets, this arrangement works well. However, when the expectation of the logarithm of the ending rate is equal to the logarithm of the beginning rate, the expectation of the ending rate is actually greater than the beginning rate. This effect on the expected ending rate may be undesirable because it predicts a tendency for rates to be higher at the tips of a tree than at the root. Our current implementation sets the mean of the normal distribution for the logarithm of the ending rate by forcing the expected ending rate to be equal to the beginning rate. The variance of the normal distribution for the logarithm of the ending rate is set equal to the time duration of the branch multiplied by a parameter ν. A large value of ν means that there is little rate autocorrelation, and a low value implies strong rate autocorrelation. The case of ν = 0 is equivalent to the molecular-clock assumption of constant evolutionary rates. Roughly, the evolutionary rate varies with the coefficient of variation √ν_t_ in t time units, when t is small.

In earlier work (Thorne, Kishino, and Painter 1998 ), rates were assigned to branches of a rooted tree rather than to nodes of the tree. The variance of the logarithm of the rate on a daughter branch given the logarithm of the rate on the parental branch was the product of ν and the time separating the midpoints of the parental and daughter branches. A flaw of this earlier parameterization is that the correlation of rates on two sister branches should not depend on the time duration of their parental branch. In the current parameterization, the correlation of rates on two sister branches is independent of the time duration of the parental branch.

Because the hyperparameter ν can have a strong influence on an analysis, we add a gamma-distributed prior for ν as another level to our hierarchical model. The additional level allows flexibility for the Bayesian analysis by incorporating uncertainty regarding ν. We also specify the prior distribution for the rate of molecular evolution at the root node with a gamma distribution. This prior distribution is assumed to be independent of ν and the internal node times on the tree.

The phrase “branch length” is employed to represent an expected amount of sequence change on a branch rather than to represent the time duration of the branch. It is prior information that combines with the branch length information to provide posterior information about node rates and divergence times. On a rooted bifurcating tree topology that relates n taxa, there are n tip nodes and n − 1 internal nodes. Because such a topology has one more node than branches, a unique set of 2_n_ − 1 node rates cannot be identified from the node times and the 2_n_ − 2 branch lengths. We add a restriction to the node rates so that they can be uniquely identified from the node times and the branch lengths. This is done by selecting one of the branches that emanate from the root and forcing the ending rate of this branch to be equal to the rate at its beginning. For all other branches, our model of rate evolution determines the relationship between the beginning and ending rates.

Our reasons for constraining one branch to have identical beginning and ending rates stem from the Metropolis-Hastings algorithm (Metropolis et al. 1953 ; Hastings 1970 ), which we adopt to sample from our posterior distributions of interest. Presence of the constraint greatly improves the ability of the Metropolis-Hastings algorithm to approximate the posterior distribution because the constraint increases the “mixing rate” of the Markov chain on which the Metropolis-Hastings algorithm relies.

Prior for Divergence Times

In earlier work (Thorne, Kishino, and Painter 1998 ), the prior distribution for divergence times was a Yule process (e.g., Karlin and Taylor 1975 , pp. 119–123). The Yule process can be seen as modeling origination of evolutionary lineages due to speciation. A more complicated stochastic process that reflects extinction and taxonomic sampling as well as speciation could also be incorporated (e.g., see the phylogenetic reconstruction method of Yang and Rannala [1997] ). From the standpoint of biologically interpreting the parameters of a prior for divergence times, explicit consideration of speciation and extinction is attractive. However, the processes of speciation and extinction are poorly understood. Further work is needed to characterize the suitability of models for these processes as priors for divergence times.

Our prior, a generalization of the Dirichlet distribution to rooted tree structures, is not intended as a detailed model of the speciation and extinction processes. This prior was selected because it is statistically simple and, more importantly, provides flexibility. As opposed to dictating the divergence time estimates, the prior allows the posterior to be responsive to the data.

Let Ti represent the time of node i. We assume all tip sequences are isolated at time 0. In figure 1 , this means Ti = 0 for i ∈ {0, … , 8}. Times of internal nodes on the rooted tree are measured in terms of time units before the existence of the tips.

For the time from the present to the root, we use a gamma-distributed prior. Given the root time, the prior for times of other internal nodes on a tree can be described with a recursive procedure. This recursive procedure consists of specifying the prior distribution of divergence times for all internal nodes along a path. Successive paths on the rooted tree are selected until all internal nodes on the tree have been part of a path. Each path begins at an internal node with an already-defined prior. Each path ends at a tip node. After an internal node has been included within a path, it is allowed to begin subsequent paths, but it is not otherwise allowed to be included within subsequent paths. The first selected path will begin at the root node. Subsequent selected paths may begin at any node (including the root) that has already been included within a selected path.

The procedure by which paths are selected depends on the number of branches that are traversed by the path. Specifically, the path that traverses the most branches is the one that is selected among all candidate paths. In cases in which multiple paths are equally optimal, our prior for divergence times is designed so that the prior will not depend on which path is arbitrarily selected (see below).

The time duration represented by each selected path is the sum of the time durations of the branches traversed by the path. If the time of the internal node that begins a path is known, then the times of other internal nodes traversed by the path can be determined from the set of proportions of the time durations of each of the branches on the path relative to the total time duration of the path.

Imagine that a selected path includes k branches. The proportion of time represented by branch i relative to the total time represented by the path will be denoted π_i_, where i is the _i_th branch on the path from the initial internal node to the tip at which the path ends. The joint density of these proportions is assumed to follow a Dirichlet distribution. The probability density of a Dirichlet distribution is

formula

where 0 < π_i_ < 1 and α_i_ > 0 for i ∈ {1, … , k}. Also,

formula

In the present case, we force α1 = α2 = … = α_k_ and denote the common value by α. With this restriction, the prior distribution of divergence times is not affected by the order of path selection in cases in which multiple paths are equally optimal under our selection criteria. To date, we have only performed careful analyses for α = 1. Values of α that are <1 might also be of interest because they could induce divergence time priors that would be more characteristic of cases in which adaptive radiations are expected. An equivalent prior distribution for divergence times can be obtained by consecutive samples from beta distributions, but we omit that description here.

As an example, consider the rooted tree in figure 1 . To sample from the divergence time prior for this case, the first step would be to generate a random value from the gamma distribution for _T_16, the time of the root. The path from the root node to node 0 traverses five branches. No other path from the root to a tip includes more branches. Therefore, the path from the root to node 0 could be selected first. This path can be divided into five time intervals because

formula

We can define five proportions from these time intervals: π1 = (_T_16 − _T_14)/_T_16, π2 = (_T_14 − _T_11)/_T_16, π3 = (_T_11 − _T_10)/_T_16, π4 = (_T_10 − _T_9)/_T_16, and π5 = (_T_9 − _T_0)/_T_16. These proportions can be jointly sampled from a Dirichlet distribution. Together with _T_16, these proportions determine _T_14, _T_11, _T_10, and _T_9. The next selected path can be from the root to node 6. Given _T_16, the three proportions, (_T_16 − _T_15)/_T_16, (_T_15 − _T_13)/_T_16, and (_T_13 − _T_6)/_T_16, can be jointly sampled from a Dirichlet distribution to determine _T_15 and _T_13. Because no selected path has yet included node 12, the final path can be from node 14 to node 4. Given _T_14, the proportions (_T_14 − _T_12)/_T_14 and (_T_12 − _T_4)/_T_14 can be jointly sampled to determine _T_12.

Posterior Distribution

In the Bayesian framework, the goal is to study the posterior distribution of the parameters. The data set of aligned sequences will be denoted by X, and we assume that these sequences are related via a known rooted tree topology. Let R be a vector representing node rates, and let T be a vector denoting the times of the internal nodes. The posterior distribution for R, T, and the rate variation parameter ν is

formula

Because the branch lengths B are completely determined by the times T and the rates R in our model,

formula

As described previously (Thorne, Kishino, and Painter 1998 ), we approximate p(X | B) with a multivariate normal distribution that is based on maximum-likelihood estimates of the branch lengths. An outgroup sequence is needed solely to estimate the lengths of the branches that emanate from the ingroup root node. Then, Markov chain Monte Carlo (MCMC) is employed to permit approximation of this posterior distribution.

The details of our MCMC implementation are somewhat tedious, and the general procedure has already been described (Thorne, Kishino, and Painter 1998 ). Therefore, we omit many details here. The basic idea is to have each state of a Markov chain represent specific values of our parameters T, R, and ν. In our implementation, the chain is assigned a randomly selected initial state. Thereafter, new states are randomly proposed and then randomly accepted or rejected. If a proposed state is accepted, it becomes the new state of the Markov chain. If the proposed state is rejected, the Markov chain remains at its current state. The random proposal and acceptance or rejection of proposed states is repeated many times. If the probability of accepting a proposed state is appropriately determined (see Hastings 1970 ), then the stationary distribution of the resulting Markov chain will be the desired posterior distribution.

In our implementation, we cycle through different types of random proposal steps. One type produces a proposed state that is identical to the current state of the Markov chain except that the time of one internal node differs between the two states. Another proposal type is similar, except that it is the rate at a node that differs between the two states. Likewise, in a third proposal type, the two states are identical except for the value of ν. A fourth type generates a proposed state with branch lengths that are identical to the current state by multiplying all node rates of the current state by some randomly selected factor and then dividing all node dates by the same factor. The fifth and sixth proposal types are identical to the fourth except that the multiplication of rates is not done by the fifth type and the division of dates is not performed by the sixth type. Finally, each of these latter three proposal types is also used in a modified version that performs multiplication and/or division operations on only the times and/or the rates of nodes that are descendants of some randomly selected node on the tree.

Constraints Imposed by the Fossil Record

Analysis of an earlier model of rate evolution (see Thorne, Kishino, and Painter 1998 ) yielded successful estimation of the proportion of time since an internal node of interest relative to the time since the root of the tree, but absolute times of divergence were not as well estimated. Even in the absence of the constant-rate assumption of a perfect molecular clock, constraints can assist in calibration of rates of molecular evolution. Fossil or other evidence may indicate that a certain internal node exceeds some age. Also, external evidence may indicate that a certain node must be younger than some age. Constraints are the basis of the not-very-parametric divergence time estimation technique of Sanderson (1997) . We have adopted his idea by incorporating constraints into our method. Let C represent a set of constraints on node times. Our MCMC method is designed to approximate the posterior distribution p(T, R, ν | X, C). In the presence of constraints, equation (2) is modified so that

formula

We note that the prior for divergence times is affected by the constraints. In other words, p(T | C) and p(T) will differ. In our implementation, the prior for the root time in the absence of constraints is a gamma distribution. The inclusion of constraints within a tree means that the time since the root no longer has a gamma distribution. It is important that the prior for divergence times is biologically plausible. Because the effects of constraints on the prior for node times are not easy to calculate explicitly, our practice for analysis of actual data sets has been to specify the mean and variance of the gamma distribution p(T) that would describe the time since the root if there were no constraints on node times. We then assess the effects of the constraints by using MCMC to approximate p(T | C), the prior distribution of divergence times in the presence of the constraints. If the resulting approximation of the prior is biologically implausible, we adjust the gamma distribution until a sufficiently plausible divergence time prior results. This procedure of adjusting the prior until it is biologically plausible is inelegant and indirect. The notion of biological plausibility assumes that the researcher understands how the constraints affect the ranges of plausible divergence times but has not yet refined this prior understanding by examining the sequence data. An alternative, although psychologically somewhat impractical approach, would be to ask the investigator to describe what the prior p(T) would be if the fossil or other information that gave rise to the constraints had never been collected. Further work is needed to achieve more direct prior specification. With constraints, we believe that there is potential for getting good estimates of absolute times of internal nodes. In the future, additional stratigraphic information could also potentially be incorporated into analyses (e.g., Huelsenbeck and Rannala 1997 ).

Simulation Studies

In many ways, the simulations presented here represent best-case scenarios. These scenarios were selected because they illustrate how posterior distributions for divergence times can be impacted by constraints on node times and by prior distributions for the amount of rate fluctuation per unit time. One hundred data sets were simulated and analyzed for each set of conditions that were explored. In each of the next several cases, an outgroup and 16 ingroup sequences were simulated according to the tree topology and node times depicted in figure 2 . The outgroup is shown as the rightmost tip in figure 2 . The root of the entire tree represents time 0.5625 since the root, and the root of the ingroup represents time 0.5. Every simulation began by randomly sampling the rate of evolution at the ingroup root node from a gamma distribution with mean 1 and standard deviation 0.5. Although our MCMC algorithm assumes that one of the two branches emanating from the ingroup node has the same rate at the beginning and end of the branch, this assumption was not made when simulating rates at nodes on the tree. Otherwise, rates were evolved according to our model for rate evolution. In addition, the rate at the tip of the outgroup node was randomly sampled based on the rate at the ingroup root node and our model. The simulated rates, together with the node times shown in figure 2 , determine the branch lengths of the tree. The resulting branch lengths allowed simulated evolution of DNA sequences along the tree. All sequences were of length 1,000 and were generated according to the Jukes-Cantor model (Jukes and Cantor 1969 ).

In all MCMC analyses, the Markov chain was initialized with a randomly selected state. Next, 10,000 proposal cycles were used to “burn in” the chain. Thereafter, the state of the chain was sampled after every 10 cycles until 10,000 samples had been collected. The length of the burn-in period, the number of cycles between samples, and the number of samples were somewhat arbitrary choices that were made because diagnostics applied to a few MCMC analyses of simulated data indicated that these choices provided sufficient mixing of the chain.

In all MCMC analyses, the prior for the rate at the ingroup root node was the same as the distribution from which the rate at the ingroup root node was randomly sampled (i.e., a gamma distribution with a mean of 1 and a standard deviation of 0.5.). In all MCMC analyses performed without constraints on node times, the prior for the ingroup root time was a gamma distribution with a mean equal to the actual ingroup root time of 0.5 and with a standard deviation of 0.25. In all analyses performed in the presence of constraints on node times, the prior for the ingroup root time was the distribution that would result if a gamma distribution with a mean of 0.5 and a standard deviation of 0.25 were forced to conform to the constraints.

Simulation Studies Without Constraints on Node Times

In the first simulation, the difficulty of estimating divergence times in the absence of rate calibration is illustrated. In this case, data were simulated according to a perfect molecular clock. Therefore, rates at all nodes on the tree were identical to the randomly sampled rate at the ingroup root node. To make the situation even more favorable, the MCMC algorithm was forced to assume that ν = 0. Figure 3_A_ represents the prior distribution that was approximated by 100 MCMC runs. The thin lines depict the topology and true node times of the ingroup. There is a short horizontal line centered above or below each internal node. This line shows the average of the 100 different estimates of the posterior mean. With each of the 100 MCMC runs, a 95% prior probability interval for each internal node time was calculated. The average of the upper bounds and the average of the lower bounds for these prior intervals were calculated. These averages determine the limits of the vertical lines that extend above and below each of the averages of the estimated posterior means. Near each internal node on the tree is the number of times out of the 100 simulations in which the true node time was contained within the 95% prior interval. Because figure 3_A_ represents priors rather than posteriors, the same distribution is being approximated in each of the 100 runs of the MCMC routine. Therefore, these numbers should be close to 0 or close to 100 if the approximation is succeeding. Departures from 0 or 100 would be consistent with failure of the Markov chain to converge to the prior distribution in some fraction of the runs. In figure 3_A,_ the average of the prior means is close to the true node times. This is because the simulations were being performed under a best-case scenario. The width of the prior intervals demonstrates that the prior for divergence times was relatively diffuse.

Figure 3_B_ was produced in the same fashion as figure 3_A_ except that figure 3_B_ is based on approximations to posterior distributions of divergence times. Because it is based on posterior distributions and each posterior distribution was approximated from a different simulated data set, the number of times that the true node time is captured within the 95% credibility interval in figure 3_B_ need not be 0 or 100 if the Markov chains are converging. The fact that these numbers are close to 100 is probably attributable to the fact that the true divergence times were equal to the expected values for the prior distributions. Although the credibility intervals generally capture the true node times, the credibility intervals are much wider than is desirable.

Figure 3_C_ is based on the same 100 analyses as figure 3_B_ . In figure 3_C,_ the posterior distributions depict the proportion of time from the tips to each internal node relative to the time from the tips to the root node. Figure 3_C_ shows that proportional times can be well estimated. We believe that the difference between the narrow credibility intervals in figure 3_C_ and the wide intervals in figure 3_B_ illustrates the need for fossil or other external information to assist in the calibration of evolutionary rates.

Simulation Studies with Constraints on Node Times

Analyses depicted in figure 4 were performed under the constraint that one specific node must exceed 0.2 time units in age and must be no older than 0.3 time units in age. These constraints are depicted with horizontal lines that are centered above the constrained node. Figure 4_A_ was produced under the same situation as figure 3_A_ except that the prior distributions of divergence times were forced to be consistent with the two constraints. Figure 4_A_ shows that the prior distributions for divergence times are again diffuse, but nodes that are “close” to the constraints have noticeably more narrow 95% prior intervals.

Figure 4_B_ was produced under the same conditions as figure 3_B_ except that the two constraints were obeyed. The posterior distributions indicate that the molecular clock assumption can produce good estimates of divergence times in the presence of constraints when a molecular clock is true and the analysis assumes a molecular clock.

In figure 4_C_ , robustness to the prior for rate variation per unit time is explored. Data sets were again simulated under a molecular clock. Instead of forcing the MCMC procedure to assume a perfect molecular clock, the prior for the rate variation parameter ν was a gamma distribution with a mean of 2 and a standard deviation of 2. The credibility intervals for this case are slightly wider than those in figure 4_B_ but are far more narrow than those of the prior (see fig. 4_A_ ).

Figure 4_D_ shows the case in which data sets were simulated without a molecular clock but were analyzed by assuming a molecular clock. To simulate data, the value of ν was sampled from a gamma distribution with a mean of 2 and a standard deviation of 2. Divergence date estimates appear to be poor when a clock is assumed but rates are actually highly variable.

Data sets summarized in figure 4_E_ were generated by sampling the value of ν from a gamma distribution with a mean of 2 and a standard deviation of 2. In this scenario, the prior for ν was a gamma distribution with a mean of 2 and a standard deviation of 2. Although the credibility intervals are wide, they are more narrow than those of the prior (fig. 4_A_ ), and the true divergence times are captured more often than in the scenario depicted in figure 4_D_ .

Robustness to the Prior

We emphasize that, except for differences between the true value of ν and the prior, these are all best-case scenarios. Actual performance of divergence time estimation is likely to be substantially worse. For example, the assumed model of nucleotide substitution may differ from the actual model of nucleotide substitution. The impact of this difference on divergence time estimation is difficult to predict.

To show the impact of substantial deviations between the prior for divergence times and the actual times, posterior distributions for additional simulations are summarized in figure 5 . Except for the actual divergence times and the depicted constraints, all features of these scenarios are identical to the case of figure 4_E_ . Figure 5_A_ shows the prior distributions of divergence times for a situation in which much of the lineage splitting occurred near the tips of the trees. Figure 5_B_ summarizes the posterior distributions for this case. Likewise, figure 5_C_ and D, respectively, represent priors and posteriors for a situation in which much of the lineage splitting occurs near the root. Notice that the posteriors tend to be closer to the true values than do the priors. In figure 5_C,_ the frequencies with which certain nodes are captured within the 95% prior interval are intermediate. Inspection of the MCMC output for these cases indicates that these intermediate frequencies are explained by slight fluctuations of the estimated interval endpoints among the 100 simulations.

The effect of the prior for ν on our posterior distributions is of particular concern because we have little information regarding what a biologically reasonable value of ν might be. In the simulations summarized in figure 4_C,_ the mean of the prior for ν was 2, whereas the true value of ν was 0 (i.e., a perfect molecular clock). The average of the 100 estimated posterior means for ν with this scenario was 0.18. In one of the 100 cases, the rate of molecular evolution at the root node and other nodes happened to be so high that branch lengths were large and poorly estimated. Without this one extreme data set, the average of the remaining 99 posterior means for ν was 0.11, and the standard deviation among the means was 0.07. This indicates that, at least for this simulation scenario, the data do substantially influence the posterior for ν. We note that the rate variation parameter ν will be estimated with more precision for data sets with higher numbers of taxa. Based on the relatively accurate estimates of divergence times seen in figure 4_C_ and similar simulation experiments (data not shown), our tentative conclusion is that our procedure will work well when ν has a diffuse prior. Much more extensive simulation will need to be done before this conclusion is more than tentative.

Figure 6 shows a realization of branch lengths that were randomly generated with our model of rate evolution and the node times and topology depicted in figure 2 . To generate these branch lengths, ν was set at exactly 2 and the rate of molecular evolution at the root node was set to 1. Figure 6 is intended to illustrate the degree to which a typical set of branch lengths might deviate from being clocklike for the cases in which simulations allowed rate variation over time. Figure 6 depicts “true” branch lengths, rather than estimated branch lengths. Estimation error is likely to make branch lengths appear even less clocklike than they actually are.

To investigate error that might be attributable to the MCMC approximation procedure, a sequence data set was randomly generated according to the branch lengths depicted in figure 6 . Branch lengths were estimated from these simulated sequence data, and the branch length estimates served as the basis for 100 different MCMC analyses. The MCMC analyses each assumed the prior distributions implemented for the scenarios depicted in figure 4_E_ . In other words, the means of the priors are the values that generated the branch lengths of figure 6 . Each MCMC analysis was initiated at a different random starting point. Because the 100 analyses were performed on the same data, the approximate posteriors should be very similar to one another, and this was found to be the case. For example, the true value of ν for this case was 2. The 100 posterior means of ν had an average value of 1.781, and the standard deviation among these 100 posterior means was 0.026. The lower and upper values of the 95% credibility intervals for ν averaged 0.696 and 3.857 and had respective standard deviations of 0.013 and 0.117. The true value of the root time for this case was 0.5. The average of the 100 posterior means for this time was 0.4609, and the standard deviation among these posterior means was 0.0016. The lower and upper values of the 95% credibility intervals for the root time averaged 0.3398 and 0.6149 and had respective standard deviations of 0.0016 and 0.0035.

Discussion

Huelsenbeck-Larget-Swofford Model for Rate Evolution

There are at least two notable ways in which the Huelsenbeck-Larget-Swofford (HLS; Huelsenbeck, Larget, and Swofford 2000 ) model for rate evolution differs from our own. In the HLS model, rates change in discrete jumps. Our model approximates a continuous process of rate change. Some factors that affect rates might be better modeled with a continuous process, and others with a discrete process. We believe that both approaches warrant further exploration. Hybrids between the continuous and discrete models should also be considered. However, it may be difficult to statistically discriminate between discrete and continuous models solely on the basis of sequence data.

Another notable way in which the HLS approach differs from our own is that the HLS treatment of rate evolution is not approximate, whereas ours is. In our approach, the branch lengths are determined by the node rates and times. In the HLS approach, the branch lengths are not completely determined by the node rates and times. For example, under the HLS model of discrete rate changes, a branch might experience exactly one rate-changing event. With only information on the rate at the node that ends the branch and the rate at the node that begins the branch, we have no information about whether the rate-changing event occurred closer to the end or the beginning of the branch. Because the branch length is affected by the exact time of the rate-changing event with the HLS procedure, the node rates and the node times are not by themselves sufficient to determine the branch lengths.

Our model could be generalized by assigning a probability density p(B | R, T) to the branch lengths for given node rates and times. This differs from our current approach in that branch lengths would not be deterministically specified from the node rates and times. In fact, this sort of generalization of equation (3) could have both the HLS model and our own as special cases. One form of p(B | R, T) would be determined by the HLS Poisson model. However, there is no reason to limit the form of p(B | R, T) to the HLS Poisson model. The probability density p(B | R, T) could be selected so as to reflect any desired combination of discrete and continuous rate change. Recent research (Bickel 2000 ) suggests that other stochastic models for rate evolution should also be explored.

Intraspecific and Interspecific Data

Our dating method was designed with interspecific data sets in mind. For example, this method has recently been employed to address the timescale of eutherian evolution (Cao et al. 2000 ). With interspecific data, it may be reasonable to assume that the rate of evolution of two lineages changes independently following their last common ancestral node. In contrast, this independence assumption is less plausible for intraspecific data. Rates of evolution in a large population may be slow because natural selection can overwhelm genetic drift, whereas rates of evolution in a small population may be large because genetic drift can overwhelm selection. This means that a sudden drop in population size can simultaneously increase the rate of evolution of all lineages within a population.

Conclusions

Bayesian methods have their own advantages and pitfalls. There is reason to be optimistic about Bayesian approaches for estimating divergence dates, but their strengths and weaknesses are poorly characterized. A primary concern is how sensitive the posteriors of parameters are to the priors. As shown here, we are beginning to study the effect of priors on the posterior. The robustness of Bayesian approaches for divergence time estimation needs to be better characterized. Fortunately, robustness can be evaluated both by simulation and by consistency of divergence time estimates with information external to an analysis. Assisted by robustness studies, we expect methods for divergence time estimation to substantially improve in the near future.

Edward Holmes, Reviewing Editor

Abbreviations: HLS, Huelsenbeck/Larget/Swofford; MCMC, Markov chain Monte Carlo.

Keywords: molecular clock phylogeny Markov chain Monte Carlo Metropolis-Hastings algorithm

Address for correspondence and reprints: Hirohisa Kishino, Laboratory of Biometrics, Graduate School of Agriculture and Life Sciences, University of Tokyo, Yayoi 1-1-1, Bunkyo-ku, Tokyo 113-8657, Japan. E-mail: [email protected].

Fig. 1.—Example of a rooted tree topology.

Fig. 2.—A rooted tree topology that was used in simulation studies

Fig. 3.—Performance of divergence time estimation in the absence of constraints on node times. A, Approximate prior distributions of divergence times when data are simulated according to a molecular clock and a molecular clock is assumed when estimating divergence times. B, Approximate posterior distributions of divergence times when data are simulated according to a molecular clock and a molecular clock is assumed when estimating divergence times. C, Approximate posterior distributions of the proportion of time from tips to internal nodes relative to the time from tips to the root when data are simulated according to a molecular clock and a molecular clock is assumed when estimating divergence times

Fig. 4.—Performance of divergence time estimation in the presence of constraints on node times. A, Approximate prior distributions of divergence times when data are simulated according to a molecular clock and a molecular clock is assumed when estimating divergence times. B, Approximate posterior distributions of divergence times when data are simulated according to a molecular clock and a molecular clock is assumed when estimating divergence times. C, Approximate posterior distributions of divergence times when data are simulated according to a molecular clock but a molecular clock is not assumed when estimating divergence times. D, Approximate posterior distributions of divergence times when data are not simulated according to a molecular clock but a molecular clock is assumed when estimating divergence times. E, Approximate posterior distributions of divergence times when data are not simulated according to a molecular clock and a molecular clock is not assumed when estimating divergence times

Fig. 5.—Performance of divergence time estimation when actual node times are uncharacteristic of their prior distributions. A, Approximate prior distributions when branches near the tips of the rooted tree have short time durations. B, Approximate posterior distributions when branches near the tips of the rooted tree have short time durations. C, Approximate prior distributions when branches near the root of the rooted tree have short time durations. D, Approximate posterior distributions when branches near the root of the rooted tree have short time durations

Fig. 6.—An example of branch lengths that result when ν = 2, the rate at the ingroup root node is 1, and evolution occurs along the rooted evolutionary tree of figure 2 . This tree was drawn with version 3.54c of the PHYLIP software package (Felsenstein 1989 )

Kent Holsinger and two anonymous reviewers made helpful suggestions. H.K. was supported by grants 12554037 and BSAR-497 from the Japan Society for the Promotion of Science. J.L.T. was supported by NIH grant GM45344 and NSF grant INT-990934. Software written in the C language that implements these techniques can be obtained from J.L.T. ([email protected]).

literature cited

Bickel, D. R.

2000

. Implications of fluctuations in substitution rates: impact on the uncertainty of branch lengths and relative-rate tests.

J. Mol. Evol.

381

–390.

Cao, Y., M. Fujiwara, M. Nikaido, N. Okada, and M. Hasegawa.

2000

. Interordinal relationships and the timescale of eutherian evolution as inferred from mitochondrial genome data. Gene (in press).

Felsenstein, J.

1989

. PHYLIP—phylogeny inference package (Version 3.2). Cladistics 5:164–166.

Hasegawa, M., H. Kishino, and T. Yano.

1989

. Estimation of branching dates among primates by molecular clocks of nuclear DNA which slowed down in Hominoidea.

J. Hum. Evol.

461

–476.

Hastings, W. K.

1970

. Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57:97–109.

Huelsenbeck, J. P., B. Larget, and D. L. Swofford.

2000

. A compound Poisson process for relaxing the molecular clock. Genetics 154:1879–1892.

Huelsenbeck, J. P., and B. Rannala.

1997

. Maximum likelihood estimation of phylogeny using stratigraphic data. Paleobiology 23:174–180.

Jukes, T. H., and C. R. Cantor.

1969

. Evolution of protein molecules. Pp. 21–32 in H. N. Munro, ed. Mammalian protein metabolism. Academic Press, New York.

Karlin, S., and H. M. Taylor.

1975

. A first course in stochastic processes. 2nd edition. Academic Press, San Diego.

Korber, B., M. Muldoon, J. Theiler, F. Gao, R. Gupta, A. Lapedes, B. H. Hahn, S. Wolinsky, and T. Bhattacharya.

2000

. Timing the ancestor of the HIV-1 pandemic strains. Science 288:1789–1796.

Metropolis, N., A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller.

1953

. Equations of state calculations by fast computing machines.

J. Chem. Phys.

1087

–1092.

Rambaut, A., and L. Bromham.

1998

. Estimating divergence dates from molecular sequences.

Mol. Biol. Evol.

442

–448.

Sanderson, M. J.

1997

. A nonparametric approach to estimating divergence times in the absence of rate constancy.

Mol. Biol. Evol.

1218

–1232.

Thorne, J. L., H. Kishino, and I. S. Painter.

1998

. Estimating the rate of evolution of the rate of molecular evolution.

Mol. Biol. Evol.

1647

–1657.

Yang, Z., and B. Rannala.

1997

. Bayesian phylogenetic inference using DNA sequences: a Markov chain Monte Carlo method.

Mol. Biol. Evol.

717

–724.

Yoder, A. D., and Z. Yang.

2000

. Estimation of primate speciation dates using local molecular clocks.

Mol. Biol. Evol.

1081

–1090.