Polytomies and Bayesian phylogenetic inference

Markov chain Monte Carlo algorithms for the Bayesian analysis of phylogenetic trees

Molecular Biology and Evolution, 1999

We further develop the Bayesian framework for analyzing aligned nucleotide sequence data to reconstruct phylogenies, assess uncertainty in the reconstructions, and perform other statistical inferences. We employ a Markov chain Monte Carlo sampler to sample trees and model parameter values from their joint posterior distribution. All statistical inferences are naturally based on this sample. The sample provides a most-probable tree with posterior probabilities for each clade, information that is qualitatively similar to that for the maximum-likelihood tree with bootstrap proportions and permits further inferences on tree topology, branch lengths, and model parameter values. On moderately large trees, the computational advantage of our method over bootstrapping a maximum-likelihood analysis can be considerable. In an example with 31 taxa, the time expended by our software is orders of magnitude less than that a widely used phylogeny package for bootstrapping maximum likelihood estimation would require to achieve comparable statistical accuracy. While there has been substantial debate over the proper interpretation of bootstrap proportions, Bayesian posterior probabilities clearly and directly quantify uncertainty in questions of biological interest, at least from a Bayesian perspective. Because our tree proposal algorithms are independent of the choice of likelihood function, they could also be used in conjunction with likelihood models more complex than those we have currently implemented.
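
As an illustration of how clade posterior probabilities can be read off such a sample, here is a minimal sketch. It assumes each sampled tree has already been reduced to the set of clades it contains; the function name `clade_posteriors` and the toy sample are hypothetical and are not taken from the authors' software.

```python
from collections import Counter

def clade_posteriors(sampled_trees):
    """Estimate each clade's posterior probability as its sample frequency.

    sampled_trees: list of trees, each given as a set of clades,
    where a clade is a frozenset of taxon names (an assumed encoding).
    """
    counts = Counter()
    for clades in sampled_trees:
        counts.update(set(clades))  # count each clade once per tree
    n = len(sampled_trees)
    return {clade: c / n for clade, c in counts.items()}

# Toy MCMC sample of four trees on taxa A, B, C (made-up data).
sample = [
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABC")},
    {frozenset("BC"), frozenset("ABC")},
    {frozenset("AB"), frozenset("ABC")},
]
posteriors = clade_posteriors(sample)
```

In this toy sample the clade {A, B} appears in three of the four trees, so its estimated posterior probability is 0.75.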

Modelling heterotachy in phylogenetic inference by reversible-jump Markov chain Monte Carlo

Philosophical Transactions of the Royal Society B: Biological Sciences, 2008

The rate at which a given site in a gene sequence alignment evolves over time may vary. This phenomenon—known as heterotachy—can bias or distort phylogenetic trees inferred from models of sequence evolution that assume rates of evolution are constant. Here, we describe a phylogenetic mixture model designed to accommodate heterotachy. The method sums the likelihood of the data at each site over more than one set of branch lengths on the same tree topology. A branch-length set that is best for one site may differ from the branch-length set that is best for some other site, thereby allowing different sites to have different rates of change throughout the tree. Because rate variation may not be present in all branches, we use a reversible-jump Markov chain Monte Carlo algorithm to identify those branches in which reliable amounts of heterotachy occur. We implement the method in combination with our ‘pattern-heterogeneity’ mixture model, applying it to simulated data and five published d...
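
The summation over branch-length sets described above can be sketched as follows. This is only an illustration under stated assumptions: the per-site likelihoods are supplied as precomputed numbers, and the mixture weights are an assumption of this sketch, since the abstract does not say how the sets are weighted. The name `mixture_log_likelihood` is hypothetical.

```python
import math

def mixture_log_likelihood(site_likelihoods, weights):
    """Log-likelihood of an alignment under a branch-length mixture.

    site_likelihoods[j][i] is the likelihood of site i under branch-length
    set j on a fixed topology (assumed precomputed); weights[j] is the
    assumed mixture weight of set j.
    """
    n_sites = len(site_likelihoods[0])
    total = 0.0
    for i in range(n_sites):
        # Sum the site likelihood over all branch-length sets.
        site = sum(w * comp[i] for w, comp in zip(weights, site_likelihoods))
        total += math.log(site)
    return total

# Two sites, two branch-length sets (made-up numbers): each site is
# better explained by a different set.
site_likelihoods = [[0.2, 0.01], [0.02, 0.1]]
weights = [0.5, 0.5]
log_lik = mixture_log_likelihood(site_likelihoods, weights)
```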

The prior probabilities of phylogenetic trees

2008

Bayesian methods have become among the most popular methods in phylogenetics, but theoretical opposition to this methodology remains. After providing an introduction to Bayesian theory in this context, I attempt to tackle the problem mentioned most often in the literature, the “problem of the priors”: how to assign prior probabilities to tree hypotheses. I first argue that a recent objection, namely that an appropriate assignment of priors is impossible, is based on a misunderstanding of what ignorance and bias are.

Bayesian Phylogenetic Inference via Markov Chain Monte Carlo Methods

Biometrics, 1999

We derive a Markov chain to sample from the posterior distribution for a phylogenetic tree given sequence information from the corresponding set of organisms, a stochastic model for these data, and a prior distribution on the space of trees. A transformation of the tree into a canonical cophenetic matrix form suggests a simple and effective proposal distribution for selecting candidate trees close to the current tree in the chain. We illustrate the algorithm with restriction site data on 9 plant species, then extend to DNA sequences from 32 species of fish. The algorithm mixes well in both examples from random starting trees, generating reproducible estimates and credible sets for the path of evolution.
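
The general idea of sampling trees with a Markov chain can be sketched with a generic Metropolis sampler. This is not the cophenetic-matrix proposal of the paper: it assumes a toy space of three topologies with made-up unnormalized posterior weights (`WEIGHTS`) and a simple symmetric proposal, purely to show the accept/reject step.

```python
import math
import random
from collections import Counter

# Hypothetical toy target: unnormalized posterior weights of three topologies.
WEIGHTS = {"((A,B),C)": 6.0, "((A,C),B)": 3.0, "((B,C),A)": 1.0}

def log_post(tree):
    return math.log(WEIGHTS[tree])

def propose(tree, rng):
    # Symmetric proposal: jump to one of the other two topologies uniformly.
    return rng.choice([t for t in WEIGHTS if t != tree])

def metropolis(start, n_steps, rng):
    state, samples = start, []
    for _ in range(n_steps):
        cand = propose(state, rng)
        # Accept with probability min(1, post(cand) / post(state)).
        accept_prob = math.exp(min(0.0, log_post(cand) - log_post(state)))
        if rng.random() < accept_prob:
            state = cand
        samples.append(state)
    return samples

rng = random.Random(0)
samples = metropolis("((B,C),A)", 20000, rng)
counts = Counter(samples)
freqs = {t: counts[t] / len(samples) for t in WEIGHTS}
```

With enough steps the visit frequencies in `freqs` should approach the normalized weights (0.6, 0.3, and 0.1 in this toy setup), regardless of the starting topology.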

Empirical evaluation of a prior for Bayesian phylogenetic inference

Philosophical Transactions of the Royal Society B: Biological Sciences, 2008

The Bayesian method of phylogenetic inference often produces high posterior probabilities (PPs) for trees or clades, even when the trees are clearly incorrect. The problem appears to be mainly due to large sizes of molecular datasets and to the large-sample properties of Bayesian model selection and its sensitivity to the prior when several of the models under comparison are nearly equally correct (or nearly equally wrong) and are of the same dimension. A previous suggestion to alleviate the problem is to let the internal branch lengths in the tree become increasingly small in the prior with the increase in the data size so that the bifurcating trees are increasingly star-like. In particular, if the internal branch lengths are assigned the exponential prior, the prior mean μ0 should approach zero faster than 1/√n but more slowly than 1/n, where n is the sequence length. This paper examines the usefulness of this data size-dependent prior using a dataset of the mitochondrial protein-coding gen...

Phylogenetic Analyses: A Toolbox Expanding towards Bayesian Methods (Review Article)

The reconstruction of phylogenies is becoming an increasingly simple activity. This is mainly due to two reasons: the democratization of computing power and the increased availability of sophisticated yet user-friendly software. This review describes some of the latest additions to the phylogenetic toolbox, along with some of their theoretical and practical limitations. It is shown that Bayesian methods are under heavy development, as they offer the possibility to solve a number of long-standing issues and to integrate several steps of the phylogenetic analyses into a single framework. Specific topics include not only phylogenetic reconstruction, but also the comparison of phylogenies, the detection of adaptive evolution, and the estimation of divergence times between species.

Assessing confidence in phylogenetic trees: bootstrap versus Markov chain Monte Carlo

2002

Recent implementations of Bayesian approaches are one of the largest advances in phylogenetic tree estimation in the last 10 years. Markov chain Monte Carlo (MCMC) is used in these new approaches to estimate the Bayesian posterior probability for each tree topology of interest. Our goal is to assess the confidence in the estimated tree (particularly in whether prespecified groups are

A biologist’s guide to Bayesian phylogenetic analysis

Nature Ecology & Evolution, 2017

Bayesian methods have become very popular in molecular phylogenetics due to the availability of user-friendly software implementing sophisticated models of evolution. However, Bayesian phylogenetic models are complex, and analyses are often carried out using default settings, which may not be appropriate. Here, we summarize the major features of Bayesian phylogenetic inference and discuss Bayesian computation using Markov chain Monte Carlo (MCMC), the diagnosis of an MCMC run, and ways of summarising the MCMC sample. We discuss the specification of the prior, the choice of the substitution model, and partitioning of the data. Finally, we provide a list of common Bayesian phylogenetic software and provide recommendations as to their use.

Bayesian Phylogenetic Inference Using a Combinatorial Sequential Monte Carlo Method

Journal of the American Statistical Association, 2015

The application of Bayesian methods to large-scale phylogenetics problems is increasingly limited by computational issues, motivating the development of methods that can complement existing Markov chain Monte Carlo (MCMC) schemes. Sequential Monte Carlo (SMC) methods are approximate inference algorithms that have become very popular for time series models. Such methods have been recently developed to address phylogenetic inference problems but currently available techniques are only applicable to a restricted class of phylogenetic tree models compared to MCMC. In this article, we propose an original combinatorial SMC (CSMC) method to approximate posterior phylogenetic tree distributions, which is applicable to a general class of models and can be easily combined with MCMC to infer evolutionary parameters. Our method only relies on the existence of a flexible partially ordered set structure and is more generally applicable to sampling problems on combinatorial spaces. We demonstrate that the proposed CSMC algorithm provides consistent estimates under weak assumptions, is computationally fast, and is additionally easily parallelizable. Supplementary materials for this article are available online.

The Importance of Proper Model Assumption in Bayesian Phylogenetics

Systematic Biology, 2004

We studied the importance of proper model assumption in the context of Bayesian phylogenetics by examining >5,000 Bayesian analyses and six nested models of nucleotide substitution. Model misspecification can strongly bias bipartition posterior probability estimates. These biases were most pronounced when rate heterogeneity was ignored. The type of bias seen at a particular bipartition appeared to be strongly influenced by the lengths of the branches surrounding that bipartition. In the Felsenstein zone, posterior probability estimates of bipartitions were biased when the assumed model was underparameterized but were unbiased when the assumed model was overparameterized. For the inverse Felsenstein zone, however, both underparameterization and overparameterization led to biased bipartition posterior probabilities, although the bias caused by overparameterization was less pronounced and disappeared with increased sequence length. Model parameter estimates were also affected by model misspecification. Underparameterization caused a bias in some parameter estimates, such as branch lengths and the gamma shape parameter, whereas overparameterization caused a decrease in the precision of some parameter estimates. We caution researchers to assure that the most appropriate model is assumed by employing both a priori model choice methods and a posteriori model adequacy tests. [Bayesian phylogenetic inference; convergence; Markov chain Monte Carlo; maximum likelihood; model choice; posterior probability.]