Potential Applications and Pitfalls of Bayesian Inference of Phylogeny

Bayesian phylogenetic inference using DNA sequences: a Markov Chain Monte Carlo Method

Molecular Biology and Evolution, 1997

An improved Bayesian method is presented for estimating phylogenetic trees using DNA sequence data. The birth-death process with species sampling is used to specify the prior distribution of phylogenies and ancestral speciation times, and the posterior probabilities of phylogenies are used to estimate the maximum posterior probability (MAP) tree. Monte Carlo integration is used to integrate over the ancestral speciation times for particular trees. A Markov Chain Monte Carlo method is used to generate the set of trees with the highest posterior probabilities. Methods are described for an empirical Bayesian analysis, in which estimates of the speciation and extinction rates are used in calculating the posterior probabilities, and a hierarchical Bayesian analysis, in which these parameters are removed from the model by an additional integration. The Markov Chain Monte Carlo method avoids the requirement of our earlier method for calculating MAP trees to sum over all possible topologies (which limited the number of taxa in an analysis to about five). The methods are applied to analyze DNA sequences for nine species of primates, and the MAP tree, which is identical to a maximum-likelihood estimate of topology, has a probability of approximately 95%.
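As a rough illustration of how an MCMC sampler can estimate topology posterior probabilities without summing over every tree, here is a minimal Metropolis sketch. It is not the method of the paper: the three topologies, their unnormalized weights, and the function names are all made up for illustration, and a real analysis would compute each weight from a likelihood and a birth-death prior.

```python
import random
from collections import Counter

# Hypothetical unnormalized posterior weights for the three rooted
# topologies of three taxa; the numbers are invented for illustration.
WEIGHTS = {"((A,B),C)": 6.0, "((A,C),B)": 3.0, "((B,C),A)": 1.0}
TOPOLOGIES = list(WEIGHTS)


def metropolis_topologies(n_steps, seed=0):
    """Estimate topology probabilities with a symmetric-proposal Metropolis chain."""
    rng = random.Random(seed)
    current = rng.choice(TOPOLOGIES)
    counts = Counter()
    for _ in range(n_steps):
        # Propose one of the other topologies uniformly (a symmetric proposal).
        proposal = rng.choice([t for t in TOPOLOGIES if t != current])
        # Accept with probability min(1, weight(proposal) / weight(current)).
        if rng.random() < min(1.0, WEIGHTS[proposal] / WEIGHTS[current]):
            current = proposal
        counts[current] += 1
    # Visit frequencies approximate the normalized posterior probabilities.
    return {t: counts[t] / n_steps for t in TOPOLOGIES}


def map_topology(freqs):
    """Return the topology with the highest estimated posterior probability."""
    return max(freqs, key=freqs.get)


if __name__ == "__main__":
    freqs = metropolis_topologies(100_000, seed=1)
    print(freqs)
    print("MAP topology:", map_topology(freqs))
```

The chain only ever needs the ratio of two weights, so the normalizing sum over all topologies never has to be computed; with the weights above, the visit frequencies should settle near 0.6, 0.3, and 0.1.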

The prior probabilities of phylogenetic trees

2008

Bayesian methods have become among the most popular methods in phylogenetics, but theoretical opposition to this methodology remains. After providing an introduction to Bayesian theory in this context, I attempt to tackle the problem mentioned most often in the literature: the “problem of the priors”, that is, how to assign prior probabilities to tree hypotheses. I first argue that a recent objection, that an appropriate assignment of priors is impossible, is based on a misunderstanding of what ignorance and bias are.

Phylogenetic analyses: A toolbox expanding towards Bayesian methods

International Journal of Plant Genomics, 2008

The reconstruction of phylogenies is becoming an increasingly simple activity. This is mainly due to two reasons: the democratization of computing power and the increased availability of sophisticated yet user-friendly software. This review describes some of the latest additions to the phylogenetic toolbox, along with some of their theoretical and practical limitations. It is shown that Bayesian methods are under heavy development, as they offer the possibility to solve a number of long-standing issues and to integrate several steps of the phylogenetic analyses into a single framework. Specific topics include not only phylogenetic reconstruction, but also the comparison of phylogenies, the detection of adaptive evolution, and the estimation of divergence times between species.

Branch-length prior influences Bayesian posterior probability of phylogeny

2005

The Bayesian method for estimating species phylogenies from molecular sequence data provides an attractive alternative to maximum likelihood with nonparametric bootstrap due to the easy interpretation of posterior probabilities for trees and to availability of efficient computational algorithms. However, for many data sets it produces extremely high posterior probabilities, sometimes for apparently incorrect clades. Here we use both computer simulation and empirical data analysis to examine the effect of the prior model for internal branch lengths. We found that posterior probabilities for trees and clades are sensitive to the prior for internal branch lengths, and priors assuming long internal branches cause high posterior probabilities for trees. In particular, uniform priors with high upper bounds bias Bayesian clade probabilities in favor of extreme values. We discuss possible remedies to the problem, including empirical and full Bayesian methods and subjective procedures suggested in Bayesian hypothesis testing. Our results also suggest that the bootstrap proportion and Bayesian posterior probability are different measures of accuracy, and that the bootstrap proportion, if interpreted as the probability that the clade is true, can be either too liberal or too conservative.

A biologist’s guide to Bayesian phylogenetic analysis

Nature Ecology & Evolution, 2017

Bayesian methods have become very popular in molecular phylogenetics due to the availability of user-friendly software implementing sophisticated models of evolution. However, Bayesian phylogenetic models are complex, and analyses are often carried out using default settings, which may not be appropriate. Here, we summarize the major features of Bayesian phylogenetic inference and discuss Bayesian computation using Markov chain Monte Carlo (MCMC), the diagnosis of an MCMC run, and ways of summarising the MCMC sample. We discuss the specification of the prior, the choice of the substitution model, and partitioning of the data. Finally, we provide a list of common Bayesian phylogenetic software and provide recommendations as to their use.
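One widely used diagnostic for an MCMC run is the effective sample size (ESS) of a sampled parameter. The sketch below is an assumption about what such a diagnostic can look like, not code from any of the listed software: the function names, the truncation rule (stop at the first non-positive autocorrelation), and the AR(1) test chain are illustrative choices.

```python
import random


def autocorrelation(x, lag):
    """Sample autocorrelation of the sequence x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    var = sum((v - mean) ** 2 for v in x) / n
    if var == 0:
        return 0.0
    cov = sum((x[i] - mean) * (x[i + lag] - mean) for i in range(n - lag)) / n
    return cov / var


def effective_sample_size(x, max_lag=200):
    """ESS = N / (1 + 2 * sum of autocorrelations), truncated at the
    first non-positive autocorrelation or at max_lag."""
    n = len(x)
    total = 0.0
    for lag in range(1, min(max_lag, n - 1) + 1):
        rho = autocorrelation(x, lag)
        if rho <= 0:
            break
        total += rho
    return n / (1 + 2 * total)


def ar1_chain(n, phi, seed=0):
    """Toy stand-in for an MCMC trace: x[t] = phi * x[t-1] + Gaussian noise."""
    rng = random.Random(seed)
    chain = [0.0]
    for _ in range(n - 1):
        chain.append(phi * chain[-1] + rng.gauss(0, 1))
    return chain


if __name__ == "__main__":
    print("ESS, independent draws:", effective_sample_size(ar1_chain(3000, 0.0)))
    print("ESS, sticky chain:     ", effective_sample_size(ar1_chain(3000, 0.9)))
```

A strongly autocorrelated trace yields an ESS far below the number of stored samples, which is the usual sign that the chain should be run longer or sampled less often.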

Markov chain Monte Carlo algorithms for the Bayesian analysis of phylogenetic trees

Molecular Biology and Evolution, 1999

We further develop the Bayesian framework for analyzing aligned nucleotide sequence data to reconstruct phylogenies, assess uncertainty in the reconstructions, and perform other statistical inferences. We employ a Markov chain Monte Carlo sampler to sample trees and model parameter values from their joint posterior distribution. All statistical inferences are naturally based on this sample. The sample provides a most-probable tree with posterior probabilities for each clade, information that is qualitatively similar to that for the maximum-likelihood tree with bootstrap proportions and permits further inferences on tree topology, branch lengths, and model parameter values. On moderately large trees, the computational advantage of our method over bootstrapping a maximum-likelihood analysis can be considerable. In an example with 31 taxa, the time expended by our software is orders of magnitude less than that a widely used phylogeny package for bootstrapping maximum likelihood estimation would require to achieve comparable statistical accuracy. While there has been substantial debate over the proper interpretation of bootstrap proportions, Bayesian posterior probabilities clearly and directly quantify uncertainty in questions of biological interest, at least from a Bayesian perspective. Because our tree proposal algorithms are independent of the choice of likelihood function, they could also be used in conjunction with likelihood models more complex than those we have currently implemented.
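The way a posterior sample yields clade probabilities and a most-probable tree can be sketched in a few lines. This is a simplified illustration rather than the authors' software: each sampled tree is represented as a set of its non-trivial clades (an assumption made here to avoid tree parsing), and the four-tree sample is invented.

```python
from collections import Counter


def clade_posteriors(sampled_trees):
    """Posterior probability of each clade: the fraction of sampled trees containing it."""
    counts = Counter()
    for tree in sampled_trees:
        counts.update(set(tree))
    n = len(sampled_trees)
    return {clade: c / n for clade, c in counts.items()}


def most_probable_tree(sampled_trees):
    """Return the most frequently sampled topology and its sample frequency."""
    counts = Counter(frozenset(tree) for tree in sampled_trees)
    tree, count = counts.most_common(1)[0]
    return tree, count / len(sampled_trees)


# Hypothetical posterior sample of four trees on taxa A-D; each tree is
# written as the set of its non-trivial clades.
sample = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("CD")},
    {frozenset("AC"), frozenset("ABC")},
]

if __name__ == "__main__":
    post = clade_posteriors(sample)
    print(post[frozenset("AB")])   # 0.75: clade (A,B) appears in 3 of 4 trees
    tree, prob = most_probable_tree(sample)
    print(sorted("".join(sorted(c)) for c in tree), prob)
```

In practice the sample would come from the MCMC output after discarding burn-in, and the same counting gives both the clade support values and the frequency of the most-probable topology.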

Empirical evaluation of a prior for Bayesian phylogenetic inference

Philosophical Transactions of the Royal Society B: Biological Sciences, 2008

The Bayesian method of phylogenetic inference often produces high posterior probabilities (PPs) for trees or clades, even when the trees are clearly incorrect. The problem appears to be mainly due to large sizes of molecular datasets and to the large-sample properties of Bayesian model selection and its sensitivity to the prior when several of the models under comparison are nearly equally correct (or nearly equally wrong) and are of the same dimension. A previous suggestion to alleviate the problem is to let the internal branch lengths in the tree become increasingly small in the prior with the increase in the data size so that the bifurcating trees are increasingly star-like. In particular, if the internal branch lengths are assigned the exponential prior, the prior mean μ0 should approach zero faster than 1/√n but more slowly than 1/n, where n is the sequence length. This paper examines the usefulness of this data size-dependent prior using a dataset of the mitochondrial protein-coding gen...