RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies (original) (raw)
Journal Article
1Scientific Computing Group, Heidelberg Institute for Theoretical Studies, 69118 Heidelberg and 2Department of Informatics, Institute of Theoretical Informatics, Karlsruhe Institute of Technology, 76128 Karlsruhe, Germany
1Scientific Computing Group, Heidelberg Institute for Theoretical Studies, 69118 Heidelberg and 2Department of Informatics, Institute of Theoretical Informatics, Karlsruhe Institute of Technology, 76128 Karlsruhe, Germany
Search for other works by this author on:
Received:
22 December 2013
Revision received:
14 January 2014
Accepted:
14 January 2014
Published:
21 January 2014
Navbar Search Filter Mobile Enter search term Search
Abstract
Motivation: Phylogenies are increasingly used in all fields of medical and biological research. Moreover, because of the next-generation sequencing revolution, datasets used for conducting phylogenetic analyses grow at an unprecedented pace. RAxML (Randomized Axelerated Maximum Likelihood) is a popular program for phylogenetic analyses of large datasets under maximum likelihood. Since the last RAxML paper in 2006, it has been continuously maintained and extended to accommodate the increasingly growing input datasets and to serve the needs of the user community.
Results: I present some of the most notable new features and extensions of RAxML, such as a substantial extension of substitution models and supported data types, the introduction of SSE3, AVX and AVX2 vector intrinsics, techniques for reducing the memory requirements of the code and a plethora of operations for conducting post-analyses on sets of trees. In addition, an up-to-date 50-page user manual covering all new RAxML options is available.
Availability and implementation: The code is available under GNU GPL at https://github.com/stamatak/standard-RAxML.
Contact: alexandros.stamatakis@h-its.org
Supplementary information: Supplementary data are available at Bioinformatics online.
1 INTRODUCTION
RAxML (Randomized Axelerated Maximum Likelihood) is a popular program for phylogenetic analysis of large datasets under maximum likelihood. Its major strength is a fast maximum likelihood tree search algorithm that returns trees with good likelihood scores. Since the last RAxML paper (Stamatakis, 2006), it has been continuously maintained and extended to accommodate the increasingly growing input datasets and to serve the needs of the user community. In the following, I will present some of the most notable new features and extensions of RAxML.
2 NEW FEATURES
2.1 Bootstrapping and support values
RAxML offers four different ways to obtain bootstrap support. It implements the standard non-parametric bootstrap and also the so-called rapid bootstrap (Stamatakis et al., 2008), which is a standard bootstrap search that relies on algorithmic shortcuts and approximations to speed up the search process.
It also offers an option to calculate the so-called SH-like support values (Guindon et al., 2010). I recently implemented a method that allows for computing RELL (Resampling Estimated Log Likelihoods) bootstrap support as described by Minh et al. (2013).
Apart from this, RAxML also offers a so-called bootstopping option (Pattengale et al., 2010). When this option is used, RAxML will automatically determine how many bootstrap replicates are required to obtain stable support values.
2.2 Models and data types
Apart from DNA and protein data, RAxML now also supports binary, multi-state morphological and RNA secondary structure data. It can correct for ascertainment bias (Lewis, 2001) for all of the above data types. This might be useful not only for morphological data matrices that only contain variable sites but also for alignments of SNPs.
The number of available protein substitution models has been significantly extended and comprises a general time reversible (GTR) model, as well as the computationally more complex LG4M and LG4X models (Le et al., 2012). RAxML can also automatically determine the best-scoring protein substitution model.
Finally, a new option for conducting a maximum likelihood estimate of the base frequencies has become available.
2.3 Parallel versions
RAxML offers a fine-grain parallelization of the likelihood function for multi-core systems via the PThreads-based version and a coarse-grain parallelization of independent tree searches via MPI (Message Passing Interface). It also supports coarse-grain/fine-grain parallelism via the hybrid MPI/PThreads version (Pfeiffer and Stamatakis, 2010).
Note that, for extremely large analyses on supercomputers, using the dedicated sister program ExaML [Exascale Maximum Likelihood (Stamatakis and Aberer, 2013)] is recommended.
2.4 Post-analysis of trees
RAxML offers a plethora of post-analysis functions for sets of trees. Apart from standard statistical significance tests, it offers efficient (and partially parallelized) operations for computing Robinson–Foulds distances, as well as extended majority rule, majority rule and strict consensus trees (Aberer et al., 2010).
Beyond this, it implements a method for identifying the so-called rogue taxa (Pattengale et al., 2011), and I recently implemented options for calculating the TC (Tree Certainty) and IC (Internode Certainty) measures as introduced by Salichos and Rokas (2013).
Finally, there is the new plausibility checker option (Dao et al., 2013) that allows computing the RF distances between a huge phylogeny with tens of thousands of taxa and several smaller more accurate reference phylogenies that contain a strict subset of the taxa in the huge tree. This option can be used to automatically assess the quality of huge trees that can not be inspected by eye.
2.5 Analyzing next-generation sequencing data
RAxML offers two algorithms for preparing and analyzing next-generation sequencing data. A sliding-window approach (unpublished) is available to assess which regions of a gene (e.g. 16S) exhibit strong and stable phylogenetic signal to support decisions about which regions to amplify. Apart from that, RAxML also implements parsimony and maximum likelihood flavors of the evolutionary placement algorithm [EPA (Berger et al., 2011)] that places short reads into a given reference phylogeny obtained from full-length sequences to determine the evolutionary origin of the reads. It also offers placement support statistics for those reads by calculating likelihood weights. This option can also be used to place fossils into a given phylogeny (Berger and Stamatakis, 2010) or to insert different outgroups into the tree a posteriori, that is, after the inference of the ingroup phylogeny.
2.6 Vector intrinsics
RAxML uses manually inserted and optimized x86 vector intrinsics to accelerate the parsimony and likelihood calculations. It supports SSE3, AVX and AVX2 (using fused multiply-add instructions) intrinsics. For a small single-gene DNA alignment using the Γ model of rate heterogeneity, the unvectorized version of RAxML requires 111.5 s, the SSE3 version 84.4 s and the AVX version 66.22 s to complete a simple tree search on an Intel i7-2620 M core running at 2.70 GHz under Ubuntu Linux.
The differences between AVX and AVX2 are less pronounced and are typically below 5% run time improvement.
2.7 Saving memory
Because memory shortage is becoming an issue due to the growing dataset sizes, RAxML implements an option for reducing memory footprints and potentially run times on large phylogenomic datasets with missing data. The memory savings are proportional to the amount of missing data in the alignment (Izquierdo-Carrasco et al., 2011)
2.8 Miscellaneous new options
RAxML offers options to conduct fast and more superficial tree searches on datasets with tens of thousands of taxa. It can also compute marginal ancestral states and offers an algorithm for rooting trees. Furthermore, it implements a sequential, PThreads-parallelized and MPI-parallelized algorithm for computing all quartets or a subset of quartets for a given alignment.
3 USER SUPPORT AND FUTURE WORK
User support is provided via the RAxML Google group at: https://groups.google.com/forum/?hl=en#!forum/raxml. The RAxML source code contains a comprehensive manual and there is a step-by-step tutorial with some basic commands available at http://www.exelixis-lab.org/web/software/raxml/hands_on.html. Further resources are available via the RAxML software page at http://www.exelixis-lab.org/web/software/raxml/
Future work includes the continued maintenance of RAxML, the adaptation to novel computer architectures and the implementation of novel models and datatypes, in particular codon models.
ACKNOWLEDGEMENT
The author thank several colleagues for contributing code to RAxML: Andre J. Aberer, Simon Berger, Alexey Kozlov, Nick Pattengale, Wayne Pfeiffer, Akifumi S. Tanabe, David Dao and Charlie Taylor.
Funding: This work was funded by institutional funding provided by the Heidelberg Institute for Theoretical Studies.
Conflict of Interest: none declared.
REFERENCES
et al.
Parallelized phylogenetic post-analysis on multi-core architectures
,
J. Comput. Sci.
,
2010
, vol.
1
(pg.
107
-
114
)
Accuracy of morphology-based phylogenetic fossil placement under maximum likelihood
,
International Conference on Computer Systems and Applications (AICCSA), 2010 IEEE/ACS
,
2010
New York, USA
IEEE
(pg.
1
-
9
)
et al.
Performance, accuracy, and web server for evolutionary placement of short sequence reads under maximum likelihood
,
Syst. Biol.
,
2011
, vol.
60
(pg.
291
-
302
)
Dao,D. et al. (2013) Automated plausibility analysis of large phyolgenies. Technical report. Karlsruhe Institute of Technology
et al.
New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of phyml 3.0
,
Syst. Biol.
,
2010
, vol.
59
(pg.
307
-
321
)
et al.
Algorithms, data structures, and numerics for likelihood-based phylogenetic inference of huge trees
,
BMC Bioinformatics
,
2011
, vol.
12
pg.
470
et al.
Modeling protein evolution with several amino acid replacement matrices depending on site rates
,
Mol. Biol. Evol.
,
2012
, vol.
29
(pg.
2921
-
2936
)
A likelihood approach to estimating phylogeny from discrete morphological character data
,
Syst. Biol.
,
2001
, vol.
50
(pg.
913
-
925
)
et al.
Ultrafast approximation for phylogenetic bootstrap
,
Mol. Biol Evol.
,
2013
, vol.
30
(pg.
1188
-
1195
)
et al.
How many bootstrap replicates are necessary?
,
J. Comput. Biol.
,
2010
, vol.
17
(pg.
337
-
354
)
et al.
Uncovering hidden phylogenetic consensus in large data sets
,
IEEE/ACM Trans. Comput. Biol. Bioinforma.
,
2011
, vol.
8
(pg.
902
-
911
)
Hybrid mpi/pthreads parallelization of the raxml phylogenetics code
,
International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW), 2010 IEEE
,
2010
New York, USA
IEEE
(pg.
1
-
8
)
Inferring ancient divergences requires genes with strong phylogenetic signals
,
Nature
,
2013
, vol.
497
(pg.
327
-
331
)
Raxml-vi-hpc: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models
,
Bioinformatics
,
2006
, vol.
22
(pg.
2688
-
2690
)
Novel parallelization schemes for large-scale likelihood-based phylogenetic inference
,
IEEE 27th International Symposium on Parallel Distributed Processing (IPDPS), 2013
,
2013
(pg.
1195
-
1204
)
et al.
A rapid bootstrap algorithm for the raxml web servers
,
Syst. Biol.
,
2008
, vol.
57
(pg.
758
-
771
)
Author notes
Associate Editor: Jonathan Wren
© The Author 2014. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
Supplementary data
Citations
Views
Altmetric
Metrics
Total Views 123,008
98,726 Pageviews
24,282 PDF Downloads
Since 11/1/2016
Month: | Total Views: |
---|---|
November 2016 | 73 |
December 2016 | 69 |
January 2017 | 260 |
February 2017 | 432 |
March 2017 | 549 |
April 2017 | 394 |
May 2017 | 446 |
June 2017 | 437 |
July 2017 | 541 |
August 2017 | 492 |
September 2017 | 463 |
October 2017 | 544 |
November 2017 | 515 |
December 2017 | 1,024 |
January 2018 | 1,217 |
February 2018 | 1,093 |
March 2018 | 1,193 |
April 2018 | 1,275 |
May 2018 | 1,509 |
June 2018 | 1,645 |
July 2018 | 1,699 |
August 2018 | 1,496 |
September 2018 | 1,255 |
October 2018 | 1,142 |
November 2018 | 1,772 |
December 2018 | 1,375 |
January 2019 | 1,278 |
February 2019 | 1,328 |
March 2019 | 1,815 |
April 2019 | 1,589 |
May 2019 | 1,898 |
June 2019 | 1,355 |
July 2019 | 1,256 |
August 2019 | 1,122 |
September 2019 | 1,164 |
October 2019 | 1,179 |
November 2019 | 1,039 |
December 2019 | 1,048 |
January 2020 | 1,122 |
February 2020 | 1,193 |
March 2020 | 1,172 |
April 2020 | 896 |
May 2020 | 913 |
June 2020 | 1,340 |
July 2020 | 1,422 |
August 2020 | 1,258 |
September 2020 | 1,232 |
October 2020 | 1,312 |
November 2020 | 1,215 |
December 2020 | 1,173 |
January 2021 | 1,252 |
February 2021 | 1,284 |
March 2021 | 1,799 |
April 2021 | 1,345 |
May 2021 | 1,241 |
June 2021 | 1,259 |
July 2021 | 1,180 |
August 2021 | 1,253 |
September 2021 | 1,282 |
October 2021 | 1,474 |
November 2021 | 1,414 |
December 2021 | 1,327 |
January 2022 | 1,381 |
February 2022 | 1,384 |
March 2022 | 1,782 |
April 2022 | 1,536 |
May 2022 | 1,659 |
June 2022 | 1,548 |
July 2022 | 1,433 |
August 2022 | 1,450 |
September 2022 | 1,528 |
October 2022 | 1,566 |
November 2022 | 1,484 |
December 2022 | 1,361 |
January 2023 | 1,481 |
February 2023 | 1,898 |
March 2023 | 2,304 |
April 2023 | 1,835 |
May 2023 | 1,931 |
June 2023 | 1,509 |
July 2023 | 1,669 |
August 2023 | 1,651 |
September 2023 | 1,675 |
October 2023 | 1,804 |
November 2023 | 1,616 |
December 2023 | 1,568 |
January 2024 | 1,807 |
February 2024 | 1,724 |
March 2024 | 1,775 |
April 2024 | 1,649 |
May 2024 | 1,563 |
June 2024 | 1,303 |
July 2024 | 1,324 |
August 2024 | 1,305 |
September 2024 | 1,413 |
October 2024 | 753 |
×
Email alerts
Citing articles via
More from Oxford Academic