A Subdivision Approach to Maximum Parsimony (original) (raw)
Related papers
On Computing the Maximum Parsimony Score of a Phylogenetic Network
SIAM Journal on Discrete Mathematics, 2015
Phylogenetic networks are used to display the relationship of different species whose evolution is not treelike, which is the case, for instance, in the presence of hybridization events or horizontal gene transfers. Tree inference methods such as Maximum Parsimony need to be modified in order to be applicable to networks. In this paper, we discuss two different definitions of Maximum Parsimony on networks, "hardwired" and "softwired", and examine the complexity of computing them given a network topology and a character. By exploiting a link with the problem Multicut, we show that computing the hardwired parsimony score for 2-state characters is polynomial-time solvable, while for characters with more states this problem becomes NP-hard but is still approximable and fixed parameter tractable in the parsimony score. On the other hand we show that, for the softwired definition, obtaining even weak approximation guarantees is already difficult for binary characters and restricted network topologies, and fixed-parameter tractable algorithms in the parsimony score are unlikely. On the positive side we show that computing the softwired parsimony score is fixed-parameter tractable in the level of the network, a natural parameter describing how tangled reticulate activity is in the network. Finally, we show that both the hardwired and softwired parsimony score can be computed efficiently using Integer Linear Programming. The software has been made freely available.
Refining Phylogenetic Trees Given Additional Data: An Algorithm Based on Parsimony
IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2008
Given a set X of taxa, a phylogenetic X-tree T that is only partially resolved, and a collection of characters on X, we consider the problem of finding a resolution (refinement) of T that minimizes the parsimony score of the given characters. Previous work has shown that this problem has a polynomial time solution provided certain strong constraints are imposed on the input. In this paper, we provide a new algorithm for this problem and show that it is fixed-parameter tractable under more general conditions.
Lecture Notes in Computer Science, 2007
Phylogenies play a major role in representing the interrelationships among biological entities. Many methods for reconstructing and studying such phylogenies have been proposed, almost all of which assume that the underlying history of a given set of species can be represented by a binary tree. Although many biological processes can be effectively modeled and summarized in this fashion, others cannot: recombination, hybrid speciation, and horizontal gene transfer result in networks, rather than trees, of relationships. In a series of papers, we have extended the maximum parsimony (MP) criterion to phylogenetic networks, demonstrated its appropriateness, and established the intractability of the problem of scoring the parsimony of a phylogenetic network. In this work we show the hardness of approximation for the general case of the problem, devise a very fast (linear-time) heuristic algorithm for it, and implement it on simulated as well as biological data.
2007
Phylogenies play a major role in representing the interrelationships among biological entities. Many methods for reconstructing and studying such phylogenies have been proposed, almost all of which assume that the underlying history of a given set of species can be represented by a binary tree. Although many biological processes can be effectively modeled and summarized in this fashion, others cannot: recombination, hybrid speciation, and horizontal gene transfer result in networks, rather than trees, of relationships. In a series of papers, we have extended the maximum parsimony (MP) criterion to phylogenetic networks, demonstrated its appropriateness, and established the intractability of the problem of scoring the parsimony of a phylogenetic network. In this work we show the hardness of approximation for the general case of the problem, devise a very fast (linear-time) heuristic algorithm for it, and implement it on simulated as well as biological data.
Parsimony Score of Phylogenetic Networks: Hardness Results and a Linear-Time Heuristic
IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2000
Phylogenies-the evolutionary histories of groups of organisms-play a major role in representing the interrelationships among biological entities. Many methods for reconstructing and studying such phylogenies have been proposed, almost all of which assume that the underlying history of a given set of species can be represented by a binary tree. Although many biological processes can be effectively modeled and summarized in this fashion, others cannot: recombination, hybrid speciation, and horizontal gene transfer result in networks of relationships rather than trees of relationships. In previous works, we formulated a maximum parsimony (MP) criterion for reconstructing and evaluating phylogenetic networks, and demonstrated its quality on biological as well as synthetic data sets. In this paper, we provide further theoretical results as well as a very fast heuristic algorithm for the MP criterion of phylogenetic networks. In particular, we provide a novel combinatorial definition of phylogenetic networks in terms of "forbidden cycles," and provide detailed hardness and hardness of approximation proofs for the "small" MP problem. We demonstrate the performance of our heuristic in terms of time and accuracy on both biological and synthetic data sets. Finally, we explain the difference between our model and a similar one formulated by Nguyen et al., and describe the implications of this difference on the hardness and approximation results.
Strategies for Phylogenetic Reconstruction - For the Maximum Parsimony Problem
Proceedings of the 9th International Joint Conference on Biomedical Engineering Systems and Technologies, 2016
The phylogenetic reconstruction is considered a central underpinning of diverse field of biology like: ecology, molecular biology and physiology. The main example is modeling patterns and processes of evolution. Maximum Parsimony (MP) is an important approach to solve the phylogenetic reconstruction by minimizing the total number of genetic transformations, under this approach different metaheuristics have been implemented like tabu search, genetic and memetic algorithms to cope with the combinatorial nature of the problem. In this paper we review different strategies that could be added to existing implementations to improve their efficiency and accuracy. First we present two different techniques to evaluate the objective function by using CPU and GPU technology, then we show a Path-Relinking implementation to compare tree topologies and finally we introduces the application of these techniques in a Simulated Annealing algorithm looking for an optimal solution.
Maximum Parsimony on Phylogenetic networks
Algorithms for Molecular Biology: AMB, 2012
Background: Phylogenetic networks are generalizations of phylogenetic trees, that are used to model evolutionary events in various contexts. Several different methods and criteria have been introduced for reconstructing phylogenetic trees. Maximum Parsimony is a character-based approach that infers a phylogenetic tree by minimizing the total number of evolutionary steps required to explain a given set of data assigned on the leaves. Exact solutions for optimizing parsimony scores on phylogenetic trees have been introduced in the past. Results: In this paper, we define the parsimony score on networks as the sum of the substitution costs along all the edges of the network; and show that certain well-known algorithms that calculate the optimum parsimony score on trees, such as Sankoff and Fitch algorithms extend naturally for networks, barring conflicting assignments at the reticulate vertices. We provide heuristics for finding the optimum parsimony scores on networks. Our algorithms can be applied for any cost matrix that may contain unequal substitution costs of transforming between different characters along different edges of the network. We analyzed this for experimental data on 10 leaves or fewer with at most 2 reticulations and found that for almost all networks, the bounds returned by the heuristics matched with the exhaustively determined optimum parsimony scores. Conclusion: The parsimony score we define here does not directly reflect the cost of the best tree in the network that displays the evolution of the character. However, when searching for the most parsimonious network that describes a collection of characters, it becomes necessary to add additional cost considerations to prefer simpler structures, such as trees over networks. The parsimony score on a network that we describe here takes into account the substitution costs along the additional edges incident on each reticulate vertex, in addition to the substitution costs along the other edges which are common to all the branching patterns introduced by the reticulate vertices. Thus the score contains an in-built cost for the number of reticulate vertices in the network, and would provide a criterion that is comparable among all networks. Although the problem of finding the parsimony score on the network is believed to be computationally hard to solve, heuristics such as the ones described here would be beneficial in our efforts to find a most parsimonious network.
Efficient parsimony-based methods for phylogenetic network reconstruction
Bioinformatics, 2007
Motivation: Phylogenies-the evolutionary histories of groups of organisms-play a major role in representing relationships among biological entities. Although many biological processes can be effectively modeled as tree-like relationships, others, such as hybrid speciation, and horizontal gene transfer (HGT) result in networks, rather than trees, of relationships. Hybrid speciation is a significant evolutionary mechanism in plants, fish, and other groups of species. HGT plays a major role in bacterial genome diversification, and is a significant mechanism by which bacteria develop resistance to antibiotics. Maximum parsimony (MP) is one of the most commonly used criteria for phylogenetic tree inference. Roughly speaking, inference based on this criterion seeks the tree that minimizes the amount of evolution. In 1990, Jotun Hein proposed using this criterion for inferring the evolution of sequences subject to recombination. Preliminary results on small synthetic data sets (Nakhleh et al., 2005) demonstrated the criterion's application to phylogenetic network reconstruction in general, and HGT detection in particular. However, the naive algorithms used by the authors are inapplicable to large data sets due to their demanding computational requirements. Further, no rigorous theoretical analysis of computing the criterion was given, nor was it tested on biological data. Results: In this work, we prove that the problem of scoring the parsimony of a phylogenetic network is NP-hard, and provide an improved fixed parameter tractable algorithm for it. Further, we devise efficient heuristics for parsimony-based reconstruction of phylogenetic networks. We test our methods on both synthetic and biological data (rbcL gene in bacteria), and obtain very promising results.
Inferring Phylogenetic Networks by the Maximum Parsimony Criterion: A Case Study
Molecular Biology and Evolution, 2006
Horizontal gene transfer (HGT) may result in genes whose evolutionary histories disagree with each other, as well as with the species tree. In this case, reconciling the species and gene trees results in a network of relationships, known as the phylogenetic network of the set of species. A phylogenetic network that incorporates HGT consists of an underlying species tree that captures vertical inheritance, and a set of edges which model the "horizontal" transfer of genetic material. In a series of papers, Nakhleh and colleagues have recently formulated a maximum parsimony criterion for phylogenetic networks, provided an array of computationally efficient algorithms and heuristics for computing it, and demonstrated its plausibility on simulated data.
A memetic algorithm for phylogenetic reconstruction with maximum parsimony
… , Machine Learning and Data Mining in …, 2009
The Maximum Parsimony problem aims at reconstructing a phylogenetic tree from DNA, RNA or protein sequences while minimizing the number of evolutionary changes. Much work has been devoted by the research community to solve this NP-complete problem and many algorithms and techniques have been devised in order to find high quality solution with reasonable computational resources. In this paper we present a memetic algorithm (implemented in the software Hydra) which is based on an integration of an effective local search operator with a specific topological tree crossover operator. We report computational results on a set of 12 benchmark instances from the literature and demonstrate its effectiveness with respect to one of the most powerful software (TNT). We also study the behavior of the algorithm with respect to some fundamental ingredients.