ASTRAL: genome-scale coalescent-based species tree estimation

S Mirarab et al. Bioinformatics. 2014.

Abstract

Motivation: Species trees provide insight into basic biology, including the mechanisms of evolution and how it modifies biomolecular function and structure, biodiversity and co-evolution between genes and species. Yet, gene trees often differ from species trees, creating challenges for species tree estimation. One of the most frequent causes of conflicting topologies between gene trees and species trees is incomplete lineage sorting (ILS), which is modelled by the multi-species coalescent. While many methods have been developed to estimate species trees from multiple genes, some of which have statistical guarantees under the multi-species coalescent model, existing methods are too computationally intensive for use with genome-scale analyses or have been shown to have poor accuracy under some realistic conditions.

Results: We present ASTRAL, a fast method for estimating species trees from multiple genes. ASTRAL is statistically consistent, can run on datasets with thousands of genes and has outstanding accuracy, improving on MP-EST and the population tree from BUCKy, two leading statistically consistent coalescent-based methods. ASTRAL is often more accurate than concatenation using maximum likelihood, except when ILS levels are low or there are too few gene trees.

Availability and implementation: ASTRAL is available in open source form at https://github.com/smirarab/ASTRAL/. Datasets studied in this article are available at http://www.cs.utexas.edu/users/phylo/datasets/astral.

Supplementary information: Supplementary data are available at Bioinformatics online.

© The Author 2014. Published by Oxford University Press.

Figures

Fig. 1.

Species tree estimation error on the default mammalian datasets with 37 genes and 400 genes (half with 500 bp and half with 1000 bp and with 71% mean BS). We show the missing branch rates for estimated species trees computed using summary methods (MRP, MP-EST, greedy, BUCKy-pop and ASTRAL) as well as concatenation using RAxML. Results are shown for running summary methods on maximum likelihood gene trees (bestML) and on the set of all bootstrap replicates from all genes (All BS), as well as the greedy consensus of running summary methods on individual bootstrap replicates from all genes (MLBS). CA-ML is run on the true alignment. Averages and standard errors are shown based on 20 replicates.
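
The missing branch rate reported in these figures can be illustrated with a small sketch. It assumes the usual definition: the fraction of non-trivial bipartitions (internal branches) of the true species tree that are absent from the estimated tree. The nested-tuple tree encoding and the function names (clades, bipartitions, missing_branch_rate) are hypothetical choices made for this illustration and are not taken from ASTRAL or from the article.

```python
def clades(tree):
    """Return (leaf set, list of clade leaf sets) for a nested-tuple tree.

    Leaves are strings; internal nodes are tuples of subtrees.
    """
    if isinstance(tree, str):
        return frozenset([tree]), []
    leaves = frozenset()
    found = []
    for child in tree:
        child_leaves, child_clades = clades(child)
        leaves |= child_leaves
        found.extend(child_clades)
    found.append(leaves)
    return leaves, found


def bipartitions(tree):
    """Non-trivial bipartitions of the tree, treating it as unrooted."""
    all_leaves, found = clades(tree)
    reference = min(all_leaves)  # fixed taxon used to pick a canonical side
    splits = set()
    for clade in found:
        other = all_leaves - clade
        if len(clade) < 2 or len(other) < 2:
            continue  # trivial split: present in every tree on these taxa
        # Store the side that does not contain the reference taxon.
        splits.add(other if reference in clade else clade)
    return splits


def missing_branch_rate(true_tree, estimated_tree):
    """Fraction of true-tree bipartitions missing from the estimated tree."""
    true_splits = bipartitions(true_tree)
    if not true_splits:
        return 0.0
    return len(true_splits - bipartitions(estimated_tree)) / len(true_splits)


# Toy 5-taxon example (assumed data, for illustration only).
true_tree = ((("A", "B"), "C"), ("D", "E"))
estimated_tree = ((("A", "C"), "B"), ("D", "E"))
print(missing_branch_rate(true_tree, estimated_tree))
```

In this toy case the true tree has two internal branches, AB|CDE and ABC|DE, and the estimated tree recovers only the second, so half of the true branches are missing.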

Fig. 2.

Species tree estimation error on the simulated mammalian datasets. We show the missing branch rates for estimated species trees computed using summary methods (MRP, MP-EST, greedy and ASTRAL) as well as CA-ML. Summary methods are run on RAxML bestML gene trees. We also show the performance of summary methods on the true gene trees. Subfigure (A) shows results under default levels of ILS, varying the number of genes and gene tree resolution; (B) shows results under increased ILS levels, varying the number of genes, on both true and estimated gene trees; and (C) shows results on 200 genes, varying the amount of ILS from very low (5× species tree branch lengths) to very high (0.2× species tree branch lengths).

Fig. 3.

Analysis of the Song et al. mammals dataset using ASTRAL and MP-EST. We show the result of applying ASTRAL and MP-EST to 424 gene trees on 37-taxon mammalian species. MP-EST is based on rooted gene trees; ASTRAL is based on unrooted gene trees, and then rooted at the branch leading to the outgroup. Branch support values in black are for both methods, those in red are for ASTRAL and values in blue are for MP-EST. See Supplementary Materials for trees with full resolution.
