ML or NJ-MCL? A comparison between two robust phylogenetic methods (original) (raw)

Large-scale gene sequencing gives an opportunity to reconstruct the tree of life and histories of multigene species phylogenies from very large datasets. A primary need for reconstructing large-scale phylogenies is a computationally efficient and accurate method. Current efforts to achieve such a goal include NJ-MCL 2 described by Tamura et al. (2004; 2007), an algorithm based on maximum likelihood (ML) and neighbor joining (NJ) algorithms. Although it has been reported that the NJ-MCL method performs better than the NJ method, studies comparing the accuracy of the ML and NJ-MCL methods are lacking. Here, accuracy of the NJ-MCL and the ML methods are examined. The concatenation approach (by progressive addition of genes) is used in a biologically realistic computer simulation to infer the accuracy of the methods. Simulation results clearly show that although NJ-MCL is computationally efficient and outperforms NJ method, the ML method is clearly much more accurate than the NJ-MCL method. The results encourage the use of the ML algorithm where datasets include up to several hundred species, but for reconstructing grand-scale phylogenies (i.e., where several thousand of taxa are included) NJ-MCL is preferred.