Choosing among partition models in Bayesian phylogenetics
Yu Fan et al. Mol Biol Evol. 2011 Jan.
Abstract
Bayesian phylogenetic analyses often depend on Bayes factors (BFs) to determine the optimal way to partition the data. The marginal likelihoods used to compute BFs, in turn, are most commonly estimated using the harmonic mean (HM) method, which has been shown to be inaccurate. We describe a new, more accurate method for estimating the marginal likelihood of a model and compare it with the HM method on both simulated and empirical data. The new method generalizes our previously described stepping-stone (SS) approach by making use of a reference distribution parameterized using samples from the posterior distribution. This avoids one challenging aspect of the original SS method, namely the need to sample from distributions that are close (in the Kullback-Leibler sense) to the prior. We specifically address the choice of partition models and find that using the HM method can lead to a strong preference for an overpartitioned model. Using simulated data, we show that, in contrast to the HM method and the original SS method, the generalized SS method is strikingly more precise (it yields repeatable BF values for the same data and partition model) and produces BF values that are much more reasonable than those produced by the HM method. Comparisons of the HM and generalized SS methods on an empirical data set demonstrate that the generalized SS method tends to choose simpler partition schemes that are more in line with expectation based on inferred patterns of molecular evolution. The generalized SS method shares with thermodynamic integration the need to sample from a series of distributions in addition to the posterior. Such dedicated path-based Markov chain Monte Carlo analyses appear to be a cost of estimating marginal likelihoods accurately.
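The contrast between the HM estimator and a stepping-stone estimator that uses a posterior-based reference distribution can be illustrated on a toy problem. The sketch below is not the authors' implementation. It assumes a one-parameter normal model (y_i ~ N(mu, 1), mu ~ N(0, tau^2)) chosen only because its marginal likelihood is known exactly and every distribution along the path can be sampled directly, so no MCMC is needed. The evenly spaced beta values and all function names (`make_model`, `harmonic_mean`, `generalized_ss`) are illustrative assumptions.

```python
import math
import random

def log_mean_exp(xs):
    """Numerically stable log of the mean of exp(x) over xs."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs) / len(xs))

def norm_logpdf(x, mean, sd):
    return -0.5 * math.log(2 * math.pi * sd * sd) - 0.5 * ((x - mean) / sd) ** 2

def make_model(y, tau):
    """Toy model (an assumption for illustration): y_i ~ N(mu, 1), mu ~ N(0, tau^2)."""
    n, s1, s2 = len(y), sum(y), sum(v * v for v in y)

    def loglik(mu):
        return -0.5 * n * math.log(2 * math.pi) - 0.5 * (s2 - 2 * mu * s1 + n * mu * mu)

    def logprior(mu):
        return norm_logpdf(mu, 0.0, tau)

    post_prec = n + 1.0 / tau ** 2
    post_mean = s1 / post_prec
    post_sd = post_prec ** -0.5
    # Exact value: log m(y) = log L + log prior - log posterior, at any mu.
    exact = (loglik(post_mean) + logprior(post_mean)
             - norm_logpdf(post_mean, post_mean, post_sd))
    return loglik, logprior, post_mean, post_sd, exact

def harmonic_mean(loglik, post_mean, post_sd, n_samples, rng):
    """HM estimate: minus the log of the posterior mean of 1/likelihood."""
    draws = [rng.gauss(post_mean, post_sd) for _ in range(n_samples)]
    return -log_mean_exp([-loglik(mu) for mu in draws])

def generalized_ss(loglik, logprior, post_mean, post_sd, n_betas, n_samples, rng):
    """Stepping-stone estimate along the path (L * prior)^beta * ref^(1 - beta)."""
    # Reference distribution: a normal fitted to posterior samples.
    draws = [rng.gauss(post_mean, post_sd) for _ in range(n_samples)]
    ref_mean = sum(draws) / n_samples
    ref_sd = math.sqrt(sum((d - ref_mean) ** 2 for d in draws) / (n_samples - 1))
    betas = [k / n_betas for k in range(n_betas + 1)]
    total = 0.0
    for b_prev, b_next in zip(betas[:-1], betas[1:]):
        # In this toy model each path distribution is normal, so sample it exactly.
        prec = b_prev / post_sd ** 2 + (1 - b_prev) / ref_sd ** 2
        mean = (b_prev * post_mean / post_sd ** 2
                + (1 - b_prev) * ref_mean / ref_sd ** 2) / prec
        sd = prec ** -0.5
        terms = []
        for _ in range(n_samples):
            mu = rng.gauss(mean, sd)
            log_ratio = loglik(mu) + logprior(mu) - norm_logpdf(mu, ref_mean, ref_sd)
            terms.append((b_next - b_prev) * log_ratio)
        # Log of the estimated ratio of adjacent normalizing constants.
        total += log_mean_exp(terms)
    return total

rng = random.Random(1)
y = [rng.gauss(1.0, 1.0) for _ in range(50)]  # simulated data, not from the paper
loglik, logprior, pm, psd, exact = make_model(y, tau=10.0)
hm = harmonic_mean(loglik, pm, psd, 2000, rng)
gss = generalized_ss(loglik, logprior, pm, psd, 10, 2000, rng)
print(f"exact={exact:.3f}  HM={hm:.3f}  generalized SS={gss:.3f}")
```

Because the reference distribution is close to the posterior, the ratio averaged at each step is nearly constant, which is why the stepping-stone estimate in this toy setting tends to sit very close to the exact value. In a real phylogenetic analysis each path distribution would have to be sampled by MCMC instead.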
Figures
FIG. 1.
Plots relating the number of sites to twice the natural logarithm of the BF (2log(BF)) in favor of the partitioned model (with two equal-size subsets) over the unpartitioned model for 200 data sets simulated under a diversity of unpartitioned GTR + G models (see text for details). (a) Left: 2log(BF) estimated using the HM method. (b) Middle: 2log(BF) estimated using the original SS method. (c) Right: 2log(BF) estimated using the generalized SS method.
FIG. 2.
Scatterplots showing twice the natural logarithm of the BF (2log(BF)) estimated using two independent analyses started with different pseudorandom number seeds. (a) Left: 2log(BF) estimated using the HM method. (b) Middle: 2log(BF) estimated using the original SS method. (c) Right: 2log(BF) estimated using the generalized SS method.
FIG. 3.
Results of applying the HM and generalized SS methods to the empirical New Zealand cicada data set for four different partitioning schemes: unpartitioned (None), partitioned by gene (Gene, 4 subsets), partitioned by codon (Codon, 3 subsets), and partitioned by both gene and codon (Both, 12 subsets). Error bars represent standard deviations based on 20 independent replicates. The dotted line connects mean log marginal likelihoods estimated using the HM method, and the solid line connects mean log marginal likelihoods estimated using the generalized SS method.
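The quantity plotted in the figures, 2log(BF), is twice the difference between two log marginal likelihoods, and the error bars are standard deviations over independent replicates. A minimal sketch of both calculations follows. The replicate values are made-up placeholders, not results from the paper, and the function and variable names are illustrative assumptions.

```python
import statistics

def two_log_bf(log_ml_1, log_ml_0):
    """Twice the natural log of the Bayes factor for model 1 over model 0."""
    return 2.0 * (log_ml_1 - log_ml_0)

def summarize(replicates):
    """Mean and sample standard deviation of replicate log marginal likelihoods."""
    return statistics.mean(replicates), statistics.stdev(replicates)

# Hypothetical replicate estimates for two partition schemes (placeholders only).
log_ml_unpartitioned = [-1005.2, -1004.8, -1005.1]
log_ml_partitioned = [-1000.3, -999.9, -1000.1]

mean_0, sd_0 = summarize(log_ml_unpartitioned)
mean_1, sd_1 = summarize(log_ml_partitioned)
print(f"unpartitioned: mean={mean_0:.2f}, sd={sd_0:.2f}")
print(f"partitioned:   mean={mean_1:.2f}, sd={sd_1:.2f}")
print(f"2log(BF) in favor of partitioned: {two_log_bf(mean_1, mean_0):.2f}")
```

A positive value favors the first model passed to `two_log_bf`. A small standard deviation across replicates corresponds to the repeatability that the abstract reports for the generalized SS method.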
Similar articles
- Improving marginal likelihood estimation for Bayesian phylogenetic model selection.
Xie W, Lewis PO, Fan Y, Kuo L, Chen MH. Syst Biol. 2011 Mar;60(2):150-60. doi: 10.1093/sysbio/syq085. Epub 2010 Dec 27. PMID: 21187451 Free PMC article.
- Species delimitation using Bayes factors: simulations and application to the Sceloporus scalaris species group (Squamata: Phrynosomatidae).
Grummer JA, Bryson RW Jr, Reeder TW. Syst Biol. 2014 Mar;63(2):119-33. doi: 10.1093/sysbio/syt069. Epub 2013 Nov 20. PMID: 24262383
- Genealogical Working Distributions for Bayesian Model Testing with Phylogenetic Uncertainty.
Baele G, Lemey P, Suchard MA. Syst Biol. 2016 Mar;65(2):250-64. doi: 10.1093/sysbio/syv083. Epub 2015 Nov 1. PMID: 26526428 Free PMC article.
- A biologist's guide to Bayesian phylogenetic analysis.
Nascimento FF, Reis MD, Yang Z. Nat Ecol Evol. 2017 Oct;1(10):1446-1454. doi: 10.1038/s41559-017-0280-x. Epub 2017 Sep 21. PMID: 28983516 Free PMC article. Review.
- Marginal Likelihoods in Phylogenetics: A Review of Methods and Applications.
Oaks JR, Cobb KA, Minin VN, Leaché AD. Syst Biol. 2019 Sep 1;68(5):681-697. doi: 10.1093/sysbio/syz003. PMID: 30668834 Free PMC article. Review.
Cited by
- Mosaic evolution underlies feliform morphological disparity.
Barrett PZ, Hopkins SSB. Proc Biol Sci. 2024 Aug;291(2028):20240756. doi: 10.1098/rspb.2024.0756. Epub 2024 Aug 14. PMID: 39137889
- A Guide to Phylogenomic Inference.
Patané JSL, Martins J Jr, Setubal JC. Methods Mol Biol. 2024;2802:267-345. doi: 10.1007/978-1-0716-3838-5_11. PMID: 38819564
- Emergence and dissemination of equine-like G3P[8] rotavirus A in Brazil between 2015 and 2021.
Gutierrez MB, Arantes I, Bello G, Berto LH, Dutra LH, Kato RB, Fumian TM. Microbiol Spectr. 2024 Apr 2;12(4):e0370923. doi: 10.1128/spectrum.03709-23. Epub 2024 Mar 7. PMID: 38451227 Free PMC article.
- Global phylogenomic diversity of Brucella abortus: spread of a dominant lineage.
Janke NR, Williamson CHD, Drees KP, Suárez-Esquivel M, Allen AR, Ladner JT, Quance CR, Robbe-Austerman S, O'Callaghan D, Whatmore AM, Foster JT. Front Microbiol. 2023 Nov 29;14:1287046. doi: 10.3389/fmicb.2023.1287046. eCollection 2023. PMID: 38094632 Free PMC article.
- Detecting Episodic Evolution through Bayesian Inference of Molecular Clock Models.
Tay JH, Baele G, Duchene S. Mol Biol Evol. 2023 Oct 4;40(10):msad212. doi: 10.1093/molbev/msad212. PMID: 37738550 Free PMC article.