FastME 2.0: A Comprehensive, Accurate, and Fast Distance-Based Phylogeny Inference Program (original) (raw)

Journal Article

1Institut de Biologie Computationnelle, LIRMM, UMR 5506: CNRS & Université de Montpellier, France

Search for other works by this author on:

1Institut de Biologie Computationnelle, LIRMM, UMR 5506: CNRS & Université de Montpellier, France

Search for other works by this author on:

1Institut de Biologie Computationnelle, LIRMM, UMR 5506: CNRS & Université de Montpellier, France

Search for other works by this author on:

Cite

Vincent Lefort, Richard Desper, Olivier Gascuel, FastME 2.0: A Comprehensive, Accurate, and Fast Distance-Based Phylogeny Inference Program, Molecular Biology and Evolution, Volume 32, Issue 10, October 2015, Pages 2798–2800, https://doi.org/10.1093/molbev/msv150
Close

Navbar Search Filter Mobile Enter search term Search

Abstract

FastME provides distance algorithms to infer phylogenies. FastME is based on balanced minimum evolution, which is the very principle of Neighbor Joining (NJ). FastME improves over NJ by performing topological moves using fast, sophisticated algorithms. The first version of FastME only included Nearest Neighbor Interchange. The new 2.0 version also includes Subtree Pruning and Regrafting, while remaining as fast as NJ and providing a number of facilities: Distance estimation for DNA and proteins with various models and options, bootstrapping, and parallel computations. FastME is available using several interfaces: Command-line (to be integrated in pipelines), PHYLIP-like, and a Web server (http://www.atgc-montpellier.fr/fastme/).

Distance algorithms infer phylogenies from matrices of pairwise distances among taxa. These algorithms are fast and have been shown to be fairly accurate using both real and simulated data (e.g., Kuhner and Felsenstein 1994). Moreover, they account for probabilistic modeling of substitutions while estimating evolutionary distances. Even if they are not as accurate as likelihood-based methods, these algorithms are still widely used due to their speed and simplicity, as assessed by the high number of citations for Neighbor Joining (NJ, Saitou and Nei 1987; see also Studier and Keppler 1988): Approximately 2,000 in 2014 (Web of Science).

NJ is a greedy algorithm that builds trees by iterative agglomeration of taxa. Gascuel and Steel (2006) showed that the criterion being minimized by NJ is the balanced version of minimum evolution (BME), which estimates the tree length using Pauplin’s formula (2000). We proposed fast, BME-based algorithms (Desper and Gascuel 2002, 2004) to 1) construct an initial tree using greedy taxon insertion and 2) perform topological moves, namely Nearest Neighbor Interchanges (NNIs), to improve an initial (e.g., NJ) tree. These algorithms were implemented in FastME 1.0 and were shown to improve accuracy substantially in comparison to NJ’s (e.g., Vinh and von Haeseler 2005), while having a similar computational cost. A related NNI-based approach, using profiles of ancestral sequences instead of a distance matrix, was proposed by Price et al. (2009) and implemented in FastTree1. FastME has been developed over the past several years:

Subtree Pruning and Regrafting (SPR) topological moves are available in FastME 2.0. SPR consists of removing a subtree from the initial tree and reinserting this subtree by dividing any of the remaining branches in the initial tree. We thus have O(n2) alternative trees to improve the initial tree, where n is the number of taxa. The best SPR is selected and the procedure is iterated until no more improving SPR is found. SPRs are more powerful than NNIs (with O(n) alternative trees) and have been shown to be useful in a number of contexts and studies (e.g., with maximum-likelihood [ML]-based tree building; Guindon et al. 2010). Our algorithm first precomputes the average distance between every pair of subtrees of the initial topology; this can be achieved in O(n2) time. Then, the criterion value for any new tree obtained by SPR is computed in constant time, meaning that the total cost of the SPR-based tree search is O(kn2)⁠, where k is the number of iterations. As k is usually smaller than n, the computational cost is similar to that of NJ, that is, O(n3)⁠. Experiments with real data (both DNA and proteins) show that a substantial gain is obtained, compared with NJ and NJ+NNIs; the best alternative is FastTree1, which (quickly) infers trees that are less fitted than NJ+SPR’s regarding minimum evolution, but have similar likelihood value with DNA sequences. Details on our SPR algorithm and these experiments are provided in Supplementary Material online.
A number of tree-building algorithms have been added, to infer an initial tree or to improve that tree (or any input tree) with topological moves. These algorithms seek to optimize BME, but also the Ordinary Least Square version of minimum evolution (OLSME; Rzhetsky and Nei 1993), which may be relevant with nonsequence data. These algorithms and their properties are summarized in table 1.
The calculation of evolutionary distance matrices from DNA and protein sequences is also available. For DNA, most models having an analytical solution (e.g., TN93) have been implemented. For protein sequences, we use standard ML-based estimations, combined with a number of rate matrices (e.g., JTT [Jones, Taylor, and Thorton]) to accommodate various data sets (mitochondria, virus, etc.). In both cases, distances can be estimated assuming a continuous gamma distribution of rates across sites with user-defined parameter. Models and options are summarized in table 1.
Bootstrapping and analysis of multiple data sets can be performed within a single run. FastME 2.0 implements Felsenstein’s bootstrap, where pseudo trees are built from resampled alignments and compared with the original tree obtained from the input alignment. Users can also submit a unique file containing multiple alignments (e.g., corresponding to different genes in phylogenomics studies) and launch tree construction for all of them using the same program options.
Bootstrapping is a highly parallelizable task. The same holds for distance estimations. FastME 2.0 provides parallel computing for these two tasks using the OpenMP API. When compiling FastME, users can choose to obtain a mono-thread or a parallel binary. They may then set, on the command line, the number of cores to be used.
FastME 2.0 includes a menu-driven PHYLIP-like interface, and a command-line interface, to be typically integrated in phylogenomics pipelines. A Web server is also available for occasional users. FastME is an open-source C program, with binaries available for the three main operating systems.

Table 1.

Substitution Models and Algorithms Available in FastME 2.0.

Models
Target	Method
DNA	p-distance	General	Analytical formula
RY symmetric
RY
JC69 (Jukes, Mam. Prot. Metab., 1969)
K2P (Kimura, J. Mol. Evol., 1980)
F81 (Felsenstein, J. Mol. Evol., 1981)
F84 (Felsenstein, Evolution, 1984)
TN93 (Tamura, MBE, 1993)
LogDet (Lockhart, MBE, 1994)

Protein	p-distance	General	Analytical formula
F81-like	General	Analytical formula
LG (Le, MBE, 2008)	General	ML estimation
WAG (Whelan, MBE, 2001)	General	ML estimation
JTT (Jones, CABIOS, 1992)	General	ML estimation
Dayhoff (Dayhoff, A. Prot. Seq. Struct., 1978)	General	ML estimation
DCMut (Kosiol, MBE, 2004)	General	ML estimation
CpRev (Adachi, J. Mol. Evol., 2000)	Chloroplast	ML estimation
MtREV (Adachi, J. Mol. Evol., 1996)	Mitochondria	ML estimation
RtREV (Dimmic, J. Mol. Evol., 2002)	Retrovirus	ML estimation
HIVb/w (Nickle, PLoS One, 2007)	HIV	ML estimation
FLU (Dang et al., BMC Evol. Biol., 2010)	Flu	ML estimation

Models
Target	Method
DNA	p-distance	General	Analytical formula
RY symmetric
RY
JC69 (Jukes, Mam. Prot. Metab., 1969)
K2P (Kimura, J. Mol. Evol., 1980)
F81 (Felsenstein, J. Mol. Evol., 1981)
F84 (Felsenstein, Evolution, 1984)
TN93 (Tamura, MBE, 1993)
LogDet (Lockhart, MBE, 1994)

Protein	p-distance	General	Analytical formula
F81-like	General	Analytical formula
LG (Le, MBE, 2008)	General	ML estimation
WAG (Whelan, MBE, 2001)	General	ML estimation
JTT (Jones, CABIOS, 1992)	General	ML estimation
Dayhoff (Dayhoff, A. Prot. Seq. Struct., 1978)	General	ML estimation
DCMut (Kosiol, MBE, 2004)	General	ML estimation
CpRev (Adachi, J. Mol. Evol., 2000)	Chloroplast	ML estimation
MtREV (Adachi, J. Mol. Evol., 1996)	Mitochondria	ML estimation
RtREV (Dimmic, J. Mol. Evol., 2002)	Retrovirus	ML estimation
HIVb/w (Nickle, PLoS One, 2007)	HIV	ML estimation
FLU (Dang et al., BMC Evol. Biol., 2010)	Flu	ML estimation

Algorithms
Optimization Criterion	Method and Complexity
First tree	BME (Desper, J. Comp. Biol., 2002)	BME	Taxon addition O(_n_2)
GME (Desper, J. Comp. Biol., 2002)	OLSME	Taxon addition O(_n_2)
NJ (Saitou, MBE, 1987)	BME	Agglomerative O(_n_3)
UNJ (Gascuel, Math. Hierarchies & Biol., 1997)	OLSME	Agglomerative O(_n_3)
BioNJ (Gascuel, MBE, 1997)	—	Agglomerative O(_n_3)
Topo. moves	BNNI (Desper, J. Comp. Biol., 2002)	BME	NNI O(_kn_2)
FASTNNI (Desper, J. Comp. Biol., 2002)	OLSME	NNI O(_kn_2)
SPR	BME	SPR O(_kn_2)

Algorithms
Optimization Criterion	Method and Complexity
First tree	BME (Desper, J. Comp. Biol., 2002)	BME	Taxon addition O(_n_2)
GME (Desper, J. Comp. Biol., 2002)	OLSME	Taxon addition O(_n_2)
NJ (Saitou, MBE, 1987)	BME	Agglomerative O(_n_3)
UNJ (Gascuel, Math. Hierarchies & Biol., 1997)	OLSME	Agglomerative O(_n_3)
BioNJ (Gascuel, MBE, 1997)	—	Agglomerative O(_n_3)
Topo. moves	BNNI (Desper, J. Comp. Biol., 2002)	BME	NNI O(_kn_2)
FASTNNI (Desper, J. Comp. Biol., 2002)	OLSME	NNI O(_kn_2)
SPR	BME	SPR O(_kn_2)

Note.—All models (except p-distance and LogDet) can be used with a continuous gamma distribution of rates across sites with user-defined parameter (typically 1.0). We distinguish models where a fast analytical formula is available to estimate evolutionary distances, from those (slower) requiring maximization of the likelihood function. For algorithms, we distinguish 1) the criterion being optimized (BME or OLSME) and 2) the construction of a first tree (using iterative taxon addition, or the agglomerative [NJ] scheme) versus the improvement of this initial tree using topological moves (NNIs or SPRs). We display worst case time complexities (as usual); n is the number of taxa and k the number of iterations. With NNIs, k is usually similar to n. With SPRs, k is usually much smaller than n.

Table 1.

Substitution Models and Algorithms Available in FastME 2.0.

Models
Target	Method
DNA	p-distance	General	Analytical formula
RY symmetric
RY
JC69 (Jukes, Mam. Prot. Metab., 1969)
K2P (Kimura, J. Mol. Evol., 1980)
F81 (Felsenstein, J. Mol. Evol., 1981)
F84 (Felsenstein, Evolution, 1984)
TN93 (Tamura, MBE, 1993)
LogDet (Lockhart, MBE, 1994)

Protein	p-distance	General	Analytical formula
F81-like	General	Analytical formula
LG (Le, MBE, 2008)	General	ML estimation
WAG (Whelan, MBE, 2001)	General	ML estimation
JTT (Jones, CABIOS, 1992)	General	ML estimation
Dayhoff (Dayhoff, A. Prot. Seq. Struct., 1978)	General	ML estimation
DCMut (Kosiol, MBE, 2004)	General	ML estimation
CpRev (Adachi, J. Mol. Evol., 2000)	Chloroplast	ML estimation
MtREV (Adachi, J. Mol. Evol., 1996)	Mitochondria	ML estimation
RtREV (Dimmic, J. Mol. Evol., 2002)	Retrovirus	ML estimation
HIVb/w (Nickle, PLoS One, 2007)	HIV	ML estimation
FLU (Dang et al., BMC Evol. Biol., 2010)	Flu	ML estimation

Models
Target	Method
DNA	p-distance	General	Analytical formula
RY symmetric
RY
JC69 (Jukes, Mam. Prot. Metab., 1969)
K2P (Kimura, J. Mol. Evol., 1980)
F81 (Felsenstein, J. Mol. Evol., 1981)
F84 (Felsenstein, Evolution, 1984)
TN93 (Tamura, MBE, 1993)
LogDet (Lockhart, MBE, 1994)

Protein	p-distance	General	Analytical formula
F81-like	General	Analytical formula
LG (Le, MBE, 2008)	General	ML estimation
WAG (Whelan, MBE, 2001)	General	ML estimation
JTT (Jones, CABIOS, 1992)	General	ML estimation
Dayhoff (Dayhoff, A. Prot. Seq. Struct., 1978)	General	ML estimation
DCMut (Kosiol, MBE, 2004)	General	ML estimation
CpRev (Adachi, J. Mol. Evol., 2000)	Chloroplast	ML estimation
MtREV (Adachi, J. Mol. Evol., 1996)	Mitochondria	ML estimation
RtREV (Dimmic, J. Mol. Evol., 2002)	Retrovirus	ML estimation
HIVb/w (Nickle, PLoS One, 2007)	HIV	ML estimation
FLU (Dang et al., BMC Evol. Biol., 2010)	Flu	ML estimation

Algorithms
Optimization Criterion	Method and Complexity
First tree	BME (Desper, J. Comp. Biol., 2002)	BME	Taxon addition O(_n_2)
GME (Desper, J. Comp. Biol., 2002)	OLSME	Taxon addition O(_n_2)
NJ (Saitou, MBE, 1987)	BME	Agglomerative O(_n_3)
UNJ (Gascuel, Math. Hierarchies & Biol., 1997)	OLSME	Agglomerative O(_n_3)
BioNJ (Gascuel, MBE, 1997)	—	Agglomerative O(_n_3)
Topo. moves	BNNI (Desper, J. Comp. Biol., 2002)	BME	NNI O(_kn_2)
FASTNNI (Desper, J. Comp. Biol., 2002)	OLSME	NNI O(_kn_2)
SPR	BME	SPR O(_kn_2)

Algorithms
Optimization Criterion	Method and Complexity
First tree	BME (Desper, J. Comp. Biol., 2002)	BME	Taxon addition O(_n_2)
GME (Desper, J. Comp. Biol., 2002)	OLSME	Taxon addition O(_n_2)
NJ (Saitou, MBE, 1987)	BME	Agglomerative O(_n_3)
UNJ (Gascuel, Math. Hierarchies & Biol., 1997)	OLSME	Agglomerative O(_n_3)
BioNJ (Gascuel, MBE, 1997)	—	Agglomerative O(_n_3)
Topo. moves	BNNI (Desper, J. Comp. Biol., 2002)	BME	NNI O(_kn_2)
FASTNNI (Desper, J. Comp. Biol., 2002)	OLSME	NNI O(_kn_2)
SPR	BME	SPR O(_kn_2)

FastME 2.0 is thus a comprehensive program, including all required tools (numerous algorithms, distance estimation with various models, bootstrapping) to infer phylogenies using a distance approach. Source code, binaries, Web server, user guide, examples, benchmark data sets, etc., are available from http://www.atgc-montpellier.fr/fastme/ (last accessed July 14, 2015).

Acknowledgment

This research was supported by the Institut Français de Bioinformatique (RENABI-IFB, Investissements d’Avenir).

References

2002

Fast and accurate phylogeny reconstruction algorithms based on the minimum-evolution principle

J Comp Biol.

687

–

705

2004

Theoretical foundations of the balanced minimum evolution method of phylogenetic inference and its relationship to weighted least-squares tree fitting

Mol Biol Evol.

587

598

2006

Neighbor-joining revealed

Mol Biol Evol.

1997

–

2000

2010

New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0

Syst Biol.

307

–

321

1994

A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates

Mol Biol Evol.

459

468

2000

Direct calculation of a tree length using a distance matrix

J Mol Evol.

–

2009

FastTree: computing large minimum evolution trees with profiles instead of a distance matrix

Mol Biol Evol.

1641

1650

1993

Theoretical foundation of the minimum-evolution method of phylogenetic inference

Mol Biol Evol.

1073

1095

1987

The neighbor-joining method: a new method for reconstruction of phylogenetic trees

Mol Biol Evol.

406

425

1988

A note on the neighbor-joining algorithm of Saitou and Nei

Mol Biol Evol.

729

731

2005

Shortest triplet clustering: reconstructing large phylogenies using representative sets

BMC Bioinformatics

Author notes

Associate editor: Michael Rosenberg

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com

Supplementary data

Citations

Views

Altmetric

Metrics

Total Views 14,313

11,210 Pageviews

3,103 PDF Downloads

Since 11/1/2016

Month:	Total Views:
November 2016	1
December 2016	1
January 2017	4
February 2017	17
March 2017	70
April 2017	33
May 2017	29
June 2017	28
July 2017	43
August 2017	32
September 2017	31
October 2017	32
November 2017	44
December 2017	73
January 2018	109
February 2018	76
March 2018	90
April 2018	124
May 2018	113
June 2018	102
July 2018	104
August 2018	98
September 2018	65
October 2018	99
November 2018	65
December 2018	79
January 2019	70
February 2019	77
March 2019	108
April 2019	92
May 2019	119
June 2019	123
July 2019	113
August 2019	98
September 2019	95
October 2019	122
November 2019	146
December 2019	109
January 2020	100
February 2020	101
March 2020	100
April 2020	89
May 2020	121
June 2020	115
July 2020	117
August 2020	109
September 2020	157
October 2020	143
November 2020	143
December 2020	158
January 2021	186
February 2021	184
March 2021	235
April 2021	243
May 2021	162
June 2021	155
July 2021	169
August 2021	159
September 2021	206
October 2021	291
November 2021	206
December 2021	147
January 2022	159
February 2022	162
March 2022	241
April 2022	211
May 2022	267
June 2022	184
July 2022	200
August 2022	169
September 2022	187
October 2022	245
November 2022	192
December 2022	189
January 2023	197
February 2023	185
March 2023	313
April 2023	266
May 2023	230
June 2023	172
July 2023	219
August 2023	199
September 2023	165
October 2023	338
November 2023	162
December 2023	248
January 2024	264
February 2024	204
March 2024	262
April 2024	224
May 2024	264
June 2024	197
July 2024	300
August 2024	254
September 2024	232
October 2024	257
November 2024	125

Citations

966 Web of Science

FastME 2.0: A Comprehensive, Accurate, and Fast Distance-Based Phylogeny Inference Program (original) (raw)

Cite

Abstract

Acknowledgment

References

Author notes

Supplementary data

Citations

Views

Altmetric

Email alerts

Email alerts

Citing articles via

Latest

Most Cited

FastME 2.0: A Comprehensive, Accurate, and Fast Distance-Based Phylogeny Inference Program (original) (raw)

Cite

Abstract

Acknowledgment

References

Author notes

Supplementary data

Citations

Views

Altmetric

Email alerts

Email alerts

Citing articles via

Latest

Most Read

Most Cited