A practical O(n log^2 n) time algorithm for computing the triplet distance on binary trees

A sub-cubic time algorithm for computing the quartet distance between two general trees

Algorithms for Molecular Biology, 2011

Background: When inferring phylogenetic trees different algorithms may give different trees. To study such effects a measure for the distance between two trees is useful. Quartet distance is one such measure, and is the number of quartet topologies that differ between two trees. Results: We have derived a new algorithm for computing the quartet distance between a pair of general trees, i.e. trees where inner nodes can have any degree ≥ 3. The time and space complexity of our algorithm is sub-cubic in the number of leaves and does not depend on the degree of the inner nodes. This makes it the fastest algorithm so far for computing the quartet distance between general trees independent of the degree of the inner nodes. Conclusions: We have implemented our algorithm and two of the best competitors. Our new algorithm is significantly faster than the competition and seems to run in close to quadratic time in practice.
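
To make the definition concrete, here is a minimal brute-force sketch of the quartet distance. It is not the sub-cubic algorithm from the paper: it checks all quartets of leaves and so takes roughly O(n^4) time. The names `adjacency`, `leaf_distances`, `quartet_topology`, and `quartet_distance` are illustrative assumptions, as is the representation of each unrooted tree as a list of edges. The topology test relies on the four-point condition with unit edge lengths: the pairing with the strictly smallest distance sum gives the split, and three equal sums mean the quartet is unresolved (a star).

```python
from collections import deque
from itertools import combinations


def adjacency(edges):
    """Build an undirected adjacency map from a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def leaf_distances(adj, leaves):
    """Hop distances from every leaf to every node, one BFS per leaf."""
    table = {}
    for source in leaves:
        seen = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                if v not in seen:
                    seen[v] = seen[u] + 1
                    queue.append(v)
        table[source] = seen
    return table


def quartet_topology(dist, a, b, c, d):
    """Return the split induced on {a, b, c, d}, or None for a star quartet."""
    sums = [
        (dist[a][b] + dist[c][d], ((a, b), (c, d))),
        (dist[a][c] + dist[b][d], ((a, c), (b, d))),
        (dist[a][d] + dist[b][c], ((a, d), (b, c))),
    ]
    smallest = min(total for total, _ in sums)
    winners = [split for total, split in sums if total == smallest]
    # A unique minimum identifies the split; a three-way tie is a star.
    return winners[0] if len(winners) == 1 else None


def quartet_distance(edges1, edges2, leaves):
    """Count the quartets of leaves whose topology differs between the trees."""
    d1 = leaf_distances(adjacency(edges1), leaves)
    d2 = leaf_distances(adjacency(edges2), leaves)
    return sum(
        quartet_topology(d1, *q) != quartet_topology(d2, *q)
        for q in combinations(sorted(leaves), 4)
    )
```

Because star quartets are reported as `None`, the sketch also handles inner nodes of degree greater than 3, which is the "general tree" setting the abstract describes.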

Fast computation of distances in a tree

HAL (Le Centre pour la Communication Scientifique Directe), 2020

Computing the distance between two vertices of a tree is an operation that occurs in some pattern recognition problems. When this operation has to be performed thousands of times on millions of trees, the standard linear algorithms, which take O(N) per pair, may become a bottleneck for the overall computation. This note presents a recursive splitting method with a complexity of O(log(N)) per pair in the worst case, and O(1) on average over all pairs, once an O(N log(N)) precomputation has been done on the whole tree. A commented C++ implementation is published as a companion to this note.
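
The trade-off between preprocessing and per-query cost can be illustrated with binary lifting, a standard technique that also gives O(N log N) preprocessing and O(log N) per query. This is a swapped-in technique for illustration, not the recursive splitting method of the note, and it does not achieve the O(1) average. The function names and the `parent` array representation (where `parent[v]` is the parent of vertex `v` and the root's entry is ignored) are assumptions.

```python
from collections import deque


def preprocess(parent, root):
    """Compute depths and the binary-lifting ancestor table in O(N log N)."""
    n = len(parent)
    children = [[] for _ in range(n)]
    for v, p in enumerate(parent):
        if v != root:
            children[p].append(v)

    # Breadth-first traversal from the root to assign depths.
    depth = [0] * n
    queue = deque([root])
    while queue:
        u = queue.popleft()
        for c in children[u]:
            depth[c] = depth[u] + 1
            queue.append(c)

    # up[k][v] is the 2**k-th ancestor of v, clamped at the root.
    levels = max(1, (n - 1).bit_length())
    up = [[root] * n for _ in range(levels)]
    for v, p in enumerate(parent):
        up[0][v] = root if v == root else p
    for k in range(1, levels):
        for v in range(n):
            up[k][v] = up[k - 1][up[k - 1][v]]
    return depth, up


def distance(depth, up, u, v):
    """Number of edges on the path between u and v, in O(log N)."""
    du, dv = depth[u], depth[v]
    if du < dv:
        u, v = v, u
    # Lift the deeper vertex until both are at the same depth.
    diff = abs(du - dv)
    k = 0
    while diff:
        if diff & 1:
            u = up[k][u]
        diff >>= 1
        k += 1
    # Lift both vertices to just below their lowest common ancestor.
    if u != v:
        for k in reversed(range(len(up))):
            if up[k][u] != up[k][v]:
                u, v = up[k][u], up[k][v]
        u = up[0][u]
    return du + dv - 2 * depth[u]
```

The sketch counts edges; for weighted trees one would store the weighted depth of each vertex instead and keep the same lowest-common-ancestor logic.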

FPT-algorithms for computing Gromov-Hausdorff and interleaving distances between trees

ArXiv, 2019

Gromov-Hausdorff (GH) distance is a natural way to measure the distortion between two metric spaces. However, there has been only limited algorithmic development to compute or approximate this distance. We focus on computing the Gromov-Hausdorff distance between two metric trees. Roughly speaking, a metric tree is a metric space that can be realized by the shortest path metric on a tree. Previously, Agarwal et al. showed that even for trees with unit edge length, it is NP-hard to approximate the GH distance between them within a factor of 3. In this paper, we present a fixed-parameter tractable (FPT) algorithm that can approximate the GH distance between two general metric trees within a factor of 14. Interestingly, the development of our algorithm is made possible by a connection between the GH distance for metric trees and the interleaving distance for the so-called merge trees. The merge trees arise in practice naturally as a simple yet meaningful topological summary, and are of ...

Faster Algorithms for Bounded Tree Edit Distance

2021

Tree edit distance is a well-studied measure of dissimilarity between rooted trees with node labels. It can be computed in O(n^3) time [Demaine, Mozes, Rossman, and Weimann, ICALP 2007], and fine-grained hardness results suggest that the weighted version of this problem cannot be solved in truly subcubic time unless the APSP conjecture is false [Bringmann, Gawrychowski, Mozes, and Weimann, SODA 2018]. We consider the unweighted version of tree edit distance, where every insertion, deletion, or relabeling operation has unit cost. Given a parameter k as an upper bound on the distance, the previous fastest algorithm for this problem runs in O(nk^3) time [Touzet, CPM 2005], which improves upon the cubic-time algorithm for k ≪ n^(2/3). In this paper, we give a faster algorithm taking O(nk^2 log n) time, improving both of the previous results for almost the full range of log n ≪ k ≪ n/√(log n). 2012 ACM Subject Classification: Theory of computation → Pattern matching

Fast Computation of the Tree Edit Distance between Unordered Trees Using IP Solvers

Lecture Notes in Computer Science, 2014

We propose a new method for computing the tree edit distance between two unordered trees by problem encoding. Our method transforms an instance of the computation into an instance of some IP problems and solves it by an efficient IP solver. The tree edit distance is defined as the minimum cost of a sequence of edit operations (either substitution, deletion, or insertion) to transform a tree into another one. Although computing it is NP-hard, some encoding techniques have been proposed for computational efficiency. An example is an encoding method using the clique problem. As a new encoding method, we propose to use IP solvers and provide new IP formulations representing the problem of finding the minimum cost mapping between two unordered trees, where the minimum cost exactly coincides with the tree edit distance. There are IP solvers other than that for the clique problem and our method can efficiently compute variations of the tree edit distance by adding additional constraints. Our experimental results with Glycan datasets and the Web log datasets CSLOGS show that our method is much faster than an existing method if input trees have a large degree. We also show that two variations of the tree edit distance could be computed efficiently by IP solvers.

Comparing trees via crossing minimization

Journal of Computer and System Sciences, 2010

We consider the following problem (and variants thereof) that has important applications in the construction and evaluation of phylogenetic trees: Two rooted unordered binary trees with the same number of leaves have to be embedded in two layers in the plane such that the leaves are aligned in two adjacent layers. Additional matching edges between the leaves give a one-to-one correspondence between pairs of leaves of the different trees. Our goal is to find two planar embeddings of the two trees (drawn without crossings) that minimize the number of crossings of the matching edges. We derive both (classical) complexity results and (parameterized) algorithms for this problem (and some variants thereof).
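
The objective being minimized can be sketched for one fixed pair of embeddings. Once both trees are drawn, each embedding fixes a left-to-right order of its leaves, and two matching edges cross exactly when their endpoints appear in opposite relative order on the two layers, so the crossing count is the number of inversions between the two orders. The function name `count_crossings` is hypothetical, and the sketch assumes that matched leaves carry the same label in both trees. It only evaluates a given pair of embeddings; it does not search for the best one.

```python
from itertools import combinations


def count_crossings(top_order, bottom_order):
    """Count crossings among matching edges for two fixed leaf orders."""
    # Position of each leaf on the bottom layer.
    position = {leaf: i for i, leaf in enumerate(bottom_order)}
    # Bottom positions listed in top-layer order; inversions are crossings.
    ranks = [position[leaf] for leaf in top_order]
    return sum(
        1
        for i, j in combinations(range(len(ranks)), 2)
        if ranks[i] > ranks[j]
    )
```

This quadratic count is enough for small examples; a merge-sort based inversion count would bring the evaluation down to O(n log n).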

Computing Refined Buneman Trees in Cubic Time

Lecture Notes in Computer Science, 2003

Reconstructing the evolutionary tree for a set of n species based on pairwise distances between the species is a fundamental problem in bioinformatics. Neighbor joining is a popular distance based tree reconstruction method. It always proposes fully resolved binary trees despite missing evidence in the underlying distance data. Distance based methods based on the theory of Buneman trees and refined Buneman trees avoid this problem by only proposing evolutionary trees whose edges satisfy a number of constraints. These trees might not be fully resolved but there is strong combinatorial evidence for each proposed edge. The currently best algorithm for computing the refined Buneman tree from a given distance measure has a running time of O(n^5) and a space consumption of O(n^4). In this paper, we present an algorithm with running time O(n^3) and space consumption O(n^2).

Fast algorithms for computing the tripartition-based distance between phylogenetic networks

2007

Consider two phylogenetic networks N and N′ of size n. The tripartition-based distance finds the proportion of tripartitions which are not shared by N and N′. This distance is proposed by and is a generalization of Robinson-Foulds distance, which is originally used to compare two phylogenetic trees. This paper gives an O(min{kn log n, n log n + hn})-time algorithm to compute this distance, where h is the number of hybrid nodes in N and N′ while k is the maximum number of hybrid nodes among all biconnected components in N and N′. Note that k << h << n in a phylogenetic network. In addition, we propose algorithms for comparing galled-trees, which are an important, biologically meaningful special case of phylogenetic networks. We give an O(n)-time algorithm for comparing two galled-trees. We also give an O(n + kh)-time algorithm for comparing a galled-tree with another general network, where h and k are the number of hybrid nodes in the latter network and its biggest biconnected component respectively.
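
For orientation, the Robinson-Foulds distance that the tripartition-based distance generalizes can be sketched for rooted trees as follows. This covers only the tree case, not the network algorithms of the paper. The nested-tuple tree representation and the names `clusters` and `rf_distance` are assumptions, and the value returned is the unnormalized count of clusters found in exactly one of the two trees.

```python
def clusters(tree):
    """Collect the leaf set below every internal node of a nested-tuple tree."""
    found = set()

    def walk(node):
        # Anything that is not a tuple is treated as a leaf label.
        if not isinstance(node, tuple):
            return frozenset([node])
        leaves = frozenset().union(*(walk(child) for child in node))
        found.add(leaves)
        return leaves

    walk(tree)
    return found


def rf_distance(tree1, tree2):
    """Number of clusters present in exactly one of the two rooted trees."""
    return len(clusters(tree1) ^ clusters(tree2))
```

When both trees have the same leaf set, the root cluster appears in both and cancels out in the symmetric difference, so it does not contribute to the distance.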

Designing an A* Algorithm for Calculating Edit Distance between Rooted-Unordered Trees

Journal of Computational Biology, 2006

Consequently, there is a need to design metrics and algorithms to compare trees. A natural comparison metric is the "Tree Edit Distance," the number of simple edit (insert/delete) operations needed to transform one tree into the other. Rooted-ordered trees, where the order between the siblings is significant, can be compared in polynomial time. Rooted-unordered trees are used to describe processes or objects where the topology, rather than the order or the identity of each node, is important. For example, in immunology, rooted-unordered trees describe the process of immunoglobulin (antibody) gene diversification in the germinal center over time. Comparing such trees has been proven to be a difficult computational problem that belongs to the set of NP-Complete problems. Comparing two trees can be viewed as a search problem in graphs. A* is a search algorithm that explores the search space in an efficient order. Using a good lower bound estimation of the degree of difference between the two trees, A* can reduce search time dramatically. We have designed and implemented a variant of the A* search algorithm suitable for calculating tree edit distance. We show here that A* is able to perform an edit distance measurement in reasonable time for trees with dozens of nodes.