FastME 2.0: A Comprehensive, Accurate, and Fast Distance-Based Phylogeny Inference Program

Journal Article

Vincent Lefort, Richard Desper, Olivier Gascuel

Institut de Biologie Computationnelle, LIRMM, UMR 5506: CNRS & Université de Montpellier, France

Vincent Lefort, Richard Desper, Olivier Gascuel, FastME 2.0: A Comprehensive, Accurate, and Fast Distance-Based Phylogeny Inference Program, Molecular Biology and Evolution, Volume 32, Issue 10, October 2015, Pages 2798–2800, https://doi.org/10.1093/molbev/msv150

Abstract

FastME provides distance algorithms to infer phylogenies. FastME is based on balanced minimum evolution, which is the very principle of Neighbor Joining (NJ). FastME improves over NJ by performing topological moves using fast, sophisticated algorithms. The first version of FastME only included Nearest Neighbor Interchange (NNI). The new 2.0 version also includes Subtree Pruning and Regrafting (SPR), while remaining as fast as NJ and providing a number of facilities: distance estimation for DNA and proteins with various models and options, bootstrapping, and parallel computations. FastME is available through several interfaces: a command line (for integration into pipelines), a PHYLIP-like interface, and a Web server (http://www.atgc-montpellier.fr/fastme/).

Distance algorithms infer phylogenies from matrices of pairwise distances among taxa. These algorithms are fast and have been shown to be fairly accurate on both real and simulated data (e.g., Kuhner and Felsenstein 1994). Moreover, they account for probabilistic models of substitution when estimating evolutionary distances. Although they are not as accurate as likelihood-based methods, these algorithms are still widely used because of their speed and simplicity, as attested by the high number of citations of Neighbor Joining (NJ; Saitou and Nei 1987; see also Studier and Keppler 1988): approximately 2,000 in 2014 (Web of Science).
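As an illustration of how a distance algorithm turns a matrix of pairwise distances into a tree, the sketch below implements Saitou and Nei's agglomerative NJ scheme in its plain O(n³) form. It is a minimal teaching version, not FastME's implementation: the function name `neighbor_joining`, the dictionary-of-frozensets matrix layout, and the `node0`, `node1` (and so on) internal-node labels are hypothetical choices.

```python
from itertools import combinations


def neighbor_joining(names, dist):
    """Minimal Neighbor Joining sketch (Saitou and Nei 1987), O(n^3) time.

    names: taxon labels; dist: dict mapping frozenset({a, b}) -> distance.
    Returns the unrooted tree as a list of (node, node, branch_length) edges.
    Internal nodes get hypothetical labels "node0", "node1", ...
    """
    d = dict(dist)
    active = list(names)
    edges = []
    next_id = 0
    while len(active) > 2:
        n = len(active)
        # Row sums r_i over the taxa/clusters that are still active.
        r = {i: sum(d[frozenset((i, k))] for k in active if k != i) for i in active}
        # Agglomerate the pair minimizing Q(i, j) = (n - 2) d_ij - r_i - r_j.
        i, j = min(combinations(active, 2),
                   key=lambda p: (n - 2) * d[frozenset(p)] - r[p[0]] - r[p[1]])
        dij = d[frozenset((i, j))]
        # Branch lengths from the new internal node u to i and to j.
        li = 0.5 * dij + (r[i] - r[j]) / (2 * (n - 2))
        u = f"node{next_id}"
        next_id += 1
        edges += [(u, i, li), (u, j, dij - li)]
        # Reduce the matrix: distance from u to every remaining taxon k.
        for k in active:
            if k not in (i, j):
                d[frozenset((u, k))] = 0.5 * (d[frozenset((i, k))] + d[frozenset((j, k))] - dij)
        active = [k for k in active if k not in (i, j)] + [u]
    a, b = active
    edges.append((a, b, d[frozenset((a, b))]))
    return edges


# Additive distances generated by the tree ((A:1, B:2):5, (C:3, D:4)).
pairs = {("A", "B"): 3, ("C", "D"): 7, ("A", "C"): 9,
         ("A", "D"): 10, ("B", "C"): 10, ("B", "D"): 11}
D = {frozenset(p): v for p, v in pairs.items()}
for edge in neighbor_joining(["A", "B", "C", "D"], D):
    print(edge)
```

On these additive distances the sketch recovers every branch length of the generating tree: 1, 2, 3 and 4 on the leaves and 5 on the internal edge. Ties in Q are broken by the order of `combinations`; here both tied pairs are true cherries, so the result is unaffected.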

NJ is a greedy algorithm that builds trees by iterative agglomeration of taxa. Gascuel and Steel (2006) showed that the criterion minimized by NJ is the balanced version of minimum evolution (BME), which estimates the tree length using Pauplin's formula (Pauplin 2000). We proposed fast BME-based algorithms (Desper and Gascuel 2002, 2004) to 1) construct an initial tree using greedy taxon insertion and 2) perform topological moves, namely Nearest Neighbor Interchanges (NNIs), to improve an initial (e.g., NJ) tree. These algorithms were implemented in FastME 1.0 and were shown to improve accuracy substantially compared with NJ (e.g., Vinh and von Haeseler 2005), at a similar computational cost. A related NNI-based approach, using profiles of ancestral sequences instead of a distance matrix, was proposed by Price et al. (2009) and implemented in FastTree 1. FastME has been developed over the past several years:
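Pauplin's formula gives the BME length of a fixed topology directly from the distance matrix: each distance d_ij is weighted by 2^(1 − τ_ij), where τ_ij is the number of edges on the path between taxa i and j. The minimal sketch below scores the three topologies available for four taxa, which are exactly the alternatives an NNI considers around the single internal edge. The function names and the `u`/`v` internal-node labels are hypothetical, and the brute-force recomputation of the whole sum for each topology is for illustration only; it is not FastME's NNI code.

```python
from collections import deque
from itertools import combinations


def topological_distances(edges, leaves):
    """tau_ij: number of edges on the path between each pair of leaves (BFS)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    tau = {}
    for src in leaves:
        depth = {src: 0}
        queue = deque([src])
        while queue:
            x = queue.popleft()
            for y in adj[x]:
                if y not in depth:
                    depth[y] = depth[x] + 1
                    queue.append(y)
        for dst in leaves:
            if dst != src:
                tau[frozenset((src, dst))] = depth[dst]
    return tau


def bme_length(edges, leaves, d):
    """Pauplin's formula: l(T) = sum over i < j of 2^(1 - tau_ij) * d_ij."""
    tau = topological_distances(edges, leaves)
    return sum(2.0 ** (1 - tau[frozenset(p)]) * d[frozenset(p)]
               for p in combinations(leaves, 2))


def quartet(w, x, y, z):
    """Unrooted four-taxon topology wx|yz; 'u' and 'v' are internal nodes."""
    return [("u", w), ("u", x), ("u", "v"), ("v", y), ("v", z)]


# Additive distances generated by the tree ((A:1, B:2):5, (C:3, D:4)).
pairs = {("A", "B"): 3, ("C", "D"): 7, ("A", "C"): 9,
         ("A", "D"): 10, ("B", "C"): 10, ("B", "D"): 11}
d = {frozenset(p): v for p, v in pairs.items()}
leaves = ["A", "B", "C", "D"]
# The three topologies an NNI can switch between around the internal edge.
candidates = {"AB|CD": quartet("A", "B", "C", "D"),
              "AC|BD": quartet("A", "C", "B", "D"),
              "AD|BC": quartet("A", "D", "B", "C")}
lengths = {name: bme_length(t, leaves, d) for name, t in candidates.items()}
print(lengths, min(lengths, key=lengths.get))
```

The true topology AB|CD receives the smallest BME length, 15, which on these additive distances equals the sum of its branch lengths (1 + 2 + 3 + 4 + 5); both alternatives score 17.5.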

Table 1.

Substitution Models and Algorithms Available in FastME 2.0.

Models

| Data | Model | Target | Method |
|---|---|---|---|
| DNA | p-distance | General | Analytical formula |
| DNA | RY symmetric | General | Analytical formula |
| DNA | RY | General | Analytical formula |
| DNA | JC69 (Jukes, Mam. Prot. Metab., 1969) | General | Analytical formula |
| DNA | K2P (Kimura, J. Mol. Evol., 1980) | General | Analytical formula |
| DNA | F81 (Felsenstein, J. Mol. Evol., 1981) | General | Analytical formula |
| DNA | F84 (Felsenstein, Evolution, 1984) | General | Analytical formula |
| DNA | TN93 (Tamura, MBE, 1993) | General | Analytical formula |
| DNA | LogDet (Lockhart, MBE, 1994) | General | Analytical formula |
| Protein | p-distance | General | Analytical formula |
| Protein | F81-like | General | Analytical formula |
| Protein | LG (Le, MBE, 2008) | General | ML estimation |
| Protein | WAG (Whelan, MBE, 2001) | General | ML estimation |
| Protein | JTT (Jones, CABIOS, 1992) | General | ML estimation |
| Protein | Dayhoff (Dayhoff, A. Prot. Seq. Struct., 1978) | General | ML estimation |
| Protein | DCMut (Kosiol, MBE, 2004) | General | ML estimation |
| Protein | CpRev (Adachi, J. Mol. Evol., 2000) | Chloroplast | ML estimation |
| Protein | MtREV (Adachi, J. Mol. Evol., 1996) | Mitochondria | ML estimation |
| Protein | RtREV (Dimmic, J. Mol. Evol., 2002) | Retrovirus | ML estimation |
| Protein | HIVb/w (Nickle, PLoS One, 2007) | HIV | ML estimation |
| Protein | FLU (Dang et al., BMC Evol. Biol., 2010) | Flu | ML estimation |

Algorithms

| Step | Algorithm | Optimization criterion | Method | Complexity |
|---|---|---|---|---|
| First tree | BME (Desper, J. Comp. Biol., 2002) | BME | Taxon addition | O(n²) |
| First tree | GME (Desper, J. Comp. Biol., 2002) | OLSME | Taxon addition | O(n²) |
| First tree | NJ (Saitou, MBE, 1987) | BME | Agglomerative | O(n³) |
| First tree | UNJ (Gascuel, Math. Hierarchies & Biol., 1997) | OLSME | Agglomerative | O(n³) |
| First tree | BioNJ (Gascuel, MBE, 1997) | | Agglomerative | O(n³) |
| Topological moves | BNNI (Desper, J. Comp. Biol., 2002) | BME | NNI | O(kn²) |
| Topological moves | FASTNNI (Desper, J. Comp. Biol., 2002) | OLSME | NNI | O(kn²) |
| Topological moves | SPR | BME | SPR | O(kn²) |

Note. All models (except p-distance and LogDet) can be used with a continuous gamma distribution of rates across sites, with a user-defined shape parameter (typically 1.0). We distinguish models for which a fast analytical formula is available to estimate evolutionary distances from those requiring (slower) maximization of the likelihood function. For algorithms, we distinguish 1) the criterion being optimized (BME or OLSME) and 2) the construction of a first tree (using iterative taxon addition or the agglomerative [NJ] scheme) versus the improvement of this initial tree using topological moves (NNIs or SPRs). As usual, we display worst-case time complexities; n is the number of taxa and k the number of iterations. With NNIs, k is usually similar to n; with SPRs, k is usually much smaller than n.
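For the models with an analytical formula, the evolutionary distance is a closed-form transformation of the observed proportion of differing sites. As an illustration, the sketch below computes the textbook JC69 correction and its gamma variant, with the shape parameter alpha mentioned in the note. It is not FastME's code: the function names are hypothetical, and the sequences are assumed to be aligned and free of gaps or ambiguous characters.

```python
import math


def p_distance(s1, s2):
    """Proportion of mismatched sites between two aligned, gap-free sequences."""
    if len(s1) != len(s2) or not s1:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(a != b for a, b in zip(s1, s2)) / len(s1)


def jc69_distance(p, alpha=None):
    """JC69 distance from a p-distance, optionally with gamma-distributed rates.

    alpha=None: equal rates, d = -3/4 * ln(1 - 4p/3).
    alpha > 0:  gamma shape parameter, d = 3/4 * alpha * ((1 - 4p/3)^(-1/alpha) - 1).
    """
    x = 1.0 - 4.0 * p / 3.0
    if x <= 0.0:
        raise ValueError("p >= 0.75: the JC69 distance is undefined (saturation)")
    if alpha is None:
        return -0.75 * math.log(x)
    return 0.75 * alpha * (x ** (-1.0 / alpha) - 1.0)


p = p_distance("ACGTACGTAC", "ACGTACGTTT")  # 2 mismatches out of 10 sites
print(p, jc69_distance(p), jc69_distance(p, alpha=1.0))
```

For p = 0.2, the equal-rates distance is about 0.233 and the gamma distance with alpha = 1.0 is about 0.273: rate heterogeneity hides more multiple substitutions at fast-evolving sites, so the correction is stronger. As alpha grows, the gamma formula converges to the equal-rates one.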


FastME 2.0 is thus a comprehensive program, including all required tools (numerous algorithms, distance estimation with various models, bootstrapping) to infer phylogenies using a distance approach. Source code, binaries, Web server, user guide, examples, benchmark data sets, etc., are available from http://www.atgc-montpellier.fr/fastme/ (last accessed July 14, 2015).

Acknowledgment

This research was supported by the Institut Français de Bioinformatique (RENABI-IFB, Investissements d’Avenir).

References

Desper R, Gascuel O. 2002. Fast and accurate phylogeny reconstruction algorithms based on the minimum-evolution principle. J Comput Biol. 9:687–705.

Desper R, Gascuel O. 2004. Theoretical foundations of the balanced minimum evolution method of phylogenetic inference and its relationship to weighted least-squares tree fitting. Mol Biol Evol. 21:587–598.

Gascuel O, Steel M. 2006. Neighbor-joining revealed. Mol Biol Evol. 23:1997–2000.

Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O. 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 59:307–321.

Kuhner MK, Felsenstein J. 1994. A simulation comparison of phylogeny algorithms under equal and unequal evolutionary rates. Mol Biol Evol. 11:459–468.

Pauplin Y. 2000. Direct calculation of a tree length using a distance matrix. J Mol Evol. 51:41–47.

Price MN, Dehal PS, Arkin AP. 2009. FastTree: computing large minimum evolution trees with profiles instead of a distance matrix. Mol Biol Evol. 26:1641–1650.

Rzhetsky A, Nei M. 1993. Theoretical foundation of the minimum-evolution method of phylogenetic inference. Mol Biol Evol. 10:1073–1095.

Saitou N, Nei M. 1987. The neighbor-joining method: a new method for reconstruction of phylogenetic trees. Mol Biol Evol. 4:406–425.

Studier JA, Keppler KJ. 1988. A note on the neighbor-joining algorithm of Saitou and Nei. Mol Biol Evol. 5:729–731.

Vinh LS, von Haeseler A. 2005. Shortest triplet clustering: reconstructing large phylogenies using representative sets. BMC Bioinformatics 6:92.

Author notes

Associate editor: Michael Rosenberg

© The Author 2015. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution.

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
