David Bryant - Academia.edu
Papers by David Bryant
Computational Biology, 2013
Proceedings of the third annual international conference on Computational molecular biology - RECOMB '99, 1999
We present fast new algorithms for phylogenetic reconstruction from distance data or weighted quartets. The methods are conservative: they will only return edges that are well supported by the input data. This approach is not only philosophically attractive; the conservative tree estimate can be used as a basis for further tree refinement or divide-and-conquer algorithms. The capability to process quartet data allows these algorithms to be used in tandem with ordinal or qualitative phylogenetic analysis methods. We provide algorithms for three standard conservative phylogenetic constructions: the Buneman tree, the Refined Buneman tree, and split decomposition. We introduce and exploit combinatorial formalisms involving trees, quartets, and splits, and make particular use of an attractive duality between unrooted trees, splits, and dissimilarities on one hand, and rooted trees, clusters, and similarity measures on the other. Using these techniques, we achieve O(n) improvements in the time complexity of the best previously published algorithms (where n is the number of studied species). Our algorithms will be included in the next edition of the popular SplitsTree software package.
Lecture Notes in Computer Science, 2001
An agreement supertree of a collection of unrooted phylogenetic trees $\{T_1, T_2, \ldots, T_k\}$ with leaf sets $L(T_1), L(T_2), \ldots, L(T_k)$ is an unrooted tree $T$ with leaf set $L(T_1) \cup \cdots \cup L(T_k)$ such that each tree $T_i$ is an induced subtree of $T$. In some cases there may be no possible agreement supertree of a set of trees; in other cases there may be exponentially many. We present polynomial time algorithms for computing an optimal agreement supertree, if one exists, of a bounded number of binary trees. The criteria of optimality can be one of four standard phylogenetic criteria: binary character compatibility; maximum summed quartet weight; ordinary least squares; and minimum evolution. The techniques can be used to search an exponentially large number of trees in polynomial time.
Systematic Biology, 2005
In phylogenetic analyses with combined multigene or multiprotein data sets, accounting for differing evolutionary dynamics at different loci is essential for accurate tree prediction. Existing maximum likelihood (ML) and Bayesian approaches are computationally intensive. We present an alternative approach that is orders of magnitude faster. The method, Distance Rates (DistR), estimates rates based upon distances derived from gene/protein sequence data. Simulation studies indicate that this technique is accurate compared with other methods and robust to missing sequence data. The DistR method was applied to a fungal mitochondrial data set, and the rate estimates compared well to those obtained using existing ML and Bayesian approaches. Inclusion of the protein rates estimated from the DistR method into the ML calculation of trees as a branch length multiplier resulted in a significantly improved fit as measured by the Akaike Information Criterion (AIC). Furthermore, bootstrap support for the ML topology was significantly greater when protein rates were used, and some evident errors in the concatenated ML tree topology (i.e., without protein rates) were corrected.
Philosophical Transactions of the Royal Society B: Biological Sciences, 2010
In this paper we outline two debates about the nature of human cultural history. The first focuses on the extent to which human history is tree-like (its shape), and the second on the unity of that history (its fabric). Proponents of cultural phylogenetics are often accused of assuming that human history has been both highly tree-like and consisting of tightly linked lineages. Critics have pointed out obvious exceptions to these assumptions. Instead of a priori dichotomous disputes about the validity of cultural phylogenetics, we suggest that the debate is better conceptualized as involving positions along continuous dimensions. The challenge for empirical research is, therefore, to determine where particular aspects of culture lie on these dimensions. We discuss the ability of current computational methods derived from evolutionary biology to address these questions. These methods are then used to compare the extent to which lexical evolution is tree-like in different parts of the ...
Molecular Biology and Evolution, 2004
Analyses of 55 individual and 31 concatenated protein data sets encoded in Reclinomonas americana and Marchantia polymorpha mitochondrial genomes revealed that current methods for constructing phylogenetic trees are insufficiently sensitive (or artifact-insensitive) to ascertain the sister of mitochondria among the current sample of eight α-proteobacterial genomes using mitochondrially encoded proteins. However, Rhodospirillum rubrum came as close to mitochondria as any α-proteobacterium investigated. This prompted a search for methods to directly compare eukaryotic genomes to their prokaryotic counterparts to investigate the origin of the mitochondrion and its host from the standpoint of nuclear genes. We examined pairwise amino acid sequence identity in comparisons of 6,214 nuclear protein-coding genes from Saccharomyces cerevisiae to 177,117 proteins encoded in sequenced genomes from 45 eubacteria and 15 archaebacteria. The results reveal that ~75% of yeast genes having homologues among the present prokaryotic sample share greater amino acid sequence identity to eubacterial than to archaebacterial homologues. At high stringency comparisons, only the eubacterial component of the yeast genome is detectable. Our findings indicate that at the levels of overall amino acid sequence identity and gene content, yeast shares a sister-group relationship with eubacteria, not with archaebacteria, in contrast to the current phylogenetic paradigm based on ribosomal RNA. Among eubacteria and archaebacteria, proteobacterial and methanogen genomes, respectively, shared more similarity with the yeast genome than other prokaryotic genomes surveyed.
Molecular Biology and Evolution, 1998
We present fast new algorithms for evaluating trees with respect to least squares and minimum evolution (ME), the most commonly used criteria for inferring phylogenetic trees from distance data. The new algorithms include an optimal $O(N^2)$ time algorithm for calculating the edge (branch or internode) lengths on a tree according to ordinary or unweighted least squares (OLS); an $O(N^3)$ time algorithm for edge lengths under weighted least squares (WLS) including the Fitch-Margoliash method; and an optimal $O(N^4)$ time algorithm for generalized least-squares (GLS) edge lengths (where $N$ is the number of taxa in the tree). The ME criterion is based on the sum of edge lengths. Consequently, the edge lengths algorithms presented here lead directly to $O(N^2)$, $O(N^3)$, and $O(N^4)$ time algorithms for ME under OLS, WLS, and GLS, respectively. All of these algorithms are as fast as or faster than any of those previously published, and the algorithms for OLS and GLS are the fastest possible (with respect to order of computational complexity). A major advantage of our new methods is that they are as well adapted to multifurcating trees as they are to binary trees. An optimal algorithm for determining path lengths from a tree with given edge lengths is also developed. This leads to an optimal $O(N^2)$ algorithm for OLS sums of squares evaluation and corresponding $O(N^3)$ and $O(N^4)$ time algorithms for WLS and GLS sums of squares, respectively. The GLS algorithm is time-optimal if the covariance matrix is already inverted. The speed of each algorithm is assessed analytically; the speed increases we calculate are confirmed by the dramatic speed increases resulting from their implementation in PAUP* 4.0. The new algorithms enable far more extensive tree searches and statistical evaluations (e.g., bootstrap, parametric bootstrap, or jackknife) in the same amount of time. Hopefully, the fast algorithms for WLS and GLS will encourage the use of these criteria for evaluating trees and their edge lengths (e.g., for approximate divergence time estimates), since they should be more statistically efficient than OLS.
Journal of Discrete Algorithms, 2004
Breakpoint phylogeny methods have been shown to be an effective tool for extracting phylogenetic information from gene order data. Currently, the only practical breakpoint phylogeny algorithms for the analysis of large genomes with varied gene content are heuristics with no optimality guarantee. Here we begin to address this lack by deriving lower bounds for the breakpoint median problem and for the more complicated breakpoint phylogeny problem. In both cases we employ Lagrange multipliers and sub-gradient optimization to tighten the bounds. The bounds have been implemented and are available as part of the GOTREE package.
Journal of Computational Biology, 2000
Journal of Classification, 2005
The Neighbor-Joining (NJ) method of Saitou and Nei is the most widely used distance-based method in phylogenetic analysis. Central to the method is the selection criterion, the formula used to choose which pair of objects to amalgamate next. Here we analyze the NJ selection criterion using an axiomatic approach. We show that any selection criterion that is linear, permutation equivariant, statistically consistent and based solely on distance data will give the same trees as those created by NJ.
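The selection criterion discussed in this abstract can be illustrated with a short sketch. The abstract does not quote the formula; the code below assumes the commonly cited form Q(i, j) = (n - 2) d(i, j) - r_i - r_j, where r_i is the i-th row sum of the distance matrix. The function name `nj_select` and the example matrix are hypothetical and only show the selection step, not a full NJ implementation.

```python
from itertools import combinations

def nj_select(d):
    """Return the pair (i, j), i < j, that minimises the assumed NJ criterion
    Q(i, j) = (n - 2) * d[i][j] - r_i - r_j, with r_i the i-th row sum."""
    n = len(d)
    r = [sum(row) for row in d]

    def q(pair):
        i, j = pair
        return (n - 2) * d[i][j] - r[i] - r[j]

    # Ties are broken by the first pair in lexicographic order.
    return min(combinations(range(n), 2), key=q)

# Hypothetical additive distances on the tree ((0,1),(2,3)) with pendant
# edge lengths 1, 4, 1, 4 and an internal edge of length 1.
D = [
    [0, 5, 3, 6],
    [5, 0, 6, 9],
    [3, 6, 0, 5],
    [6, 9, 5, 0],
]

closest = min(combinations(range(4), 2), key=lambda p: D[p[0]][p[1]])
print(closest)        # the closest pair, which is not a cherry of the tree
print(nj_select(D))   # the pair chosen by the NJ criterion
```

In this example the closest pair (0, 2) spans the internal edge, whereas the criterion selects a true cherry, which is the behaviour that makes the choice of selection formula matter.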
European Journal of Combinatorics, 2007
An important procedure in the mathematics of phylogenetic analysis is to associate, to any collection of weighted splits, the metric given by the corresponding linear combination of split metrics. In this note, we study necessary and sufficient conditions for a collection of splits of a given finite set $X$ to give rise to a linearly independent collection of split metrics. In addition, we study collections of splits called affine split systems induced by configurations of lines and points in the plane. These systems not only satisfy the linear-independence condition, but also provide a $\mathbb{Z}$-basis of the $\mathbb{Z}$-lattice $D_{\mathrm{even}}(X \mid \mathbb{Z})$ consisting of all integer-valued symmetric maps $D : X \times X \to \mathbb{Z}$ defined on $X$ that vanish on the diagonal and for which, in addition, $D(x, y) + D(y, z) + D(z, x) \equiv 0 \pmod{2}$ holds for all $x, y, z \in X$. This $\mathbb{Z}$-lattice is generated by all split metrics considered as vectors in the real vector space $D(X \mid \mathbb{R})$ consisting of all real-valued symmetric maps defined on $X$ that vanish on the diagonal, and hence is also an $\mathbb{R}$-basis of that vector space.
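As a small illustration of the first sentence of this abstract, the sketch below builds the map D from a collection of weighted splits: D(x, y) is the total weight of the splits that separate x from y. The representation (each split given by one side A, the other side being the complement in X), the function name `split_metric_sum`, and the example data are assumptions made for this sketch.

```python
from itertools import combinations

def split_metric_sum(X, weighted_splits):
    """Return D as a dict keyed by ordered pairs, where D[x, y] is the sum of
    the weights of all splits A | (X - A) with exactly one of x, y in A."""
    D = {(x, x): 0 for x in X}
    for x, y in combinations(X, 2):
        total = sum(w for A, w in weighted_splits if (x in A) != (y in A))
        D[x, y] = D[y, x] = total
    return D

# Hypothetical example: three integer-weighted splits of X = {a, b, c, d}.
X = ["a", "b", "c", "d"]
splits = [({"a"}, 1), ({"a", "b"}, 2), ({"c"}, 1)]
D = split_metric_sum(X, splits)

# A split separates either none or exactly two of the three pairs in a
# triple, so with integer weights every triple sum is even.
parity_ok = all(
    (D[x, y] + D[y, z] + D[z, x]) % 2 == 0
    for x, y, z in combinations(X, 3)
)
print(D["a", "c"], parity_ok)
```

The parity check mirrors the congruence condition stated in the abstract; it says nothing about linear independence, which is the actual subject of the paper.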
Biophysical Journal, 2011
Ion channels are characterized by inherently stochastic behavior which can be represented by continuous-time Markov models (CTMM). Although methods for collecting data from single ion channels are available, translating a time series of open and closed channels to a CTMM remains a challenge. Bayesian statistics combined with Markov chain Monte Carlo (MCMC) sampling provide means for estimating the rate constants of a CTMM directly from single channel data. In this article, different approaches for the MCMC sampling of Markov models are combined. This method, new to our knowledge, detects overparameterizations and gives more accurate results than existing MCMC methods. Its performance is similar to that of QuB-MIL, which indicates that it also compares well with maximum likelihood estimators. Data collected from an inositol trisphosphate receptor are used to demonstrate how the best model for a given data set can be found in practice.
Applied Mathematics Letters, 1999
We present a polynomial time algorithm for computing the refined Buneman tree, thereby making it applicable for tree reconstruction on large data sets. The refined Buneman tree retains many of the desirable properties of its predecessor, the well-known Buneman tree, but has the practical advantage that it is typically more refined.
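For orientation, here is a brute-force sketch of the test that defines the plain Buneman tree, the predecessor mentioned in this abstract. It is not the polynomial time algorithm for the refined Buneman tree. The definition used is an assumption taken from the general literature rather than from the abstract: a split A|B is kept when its Buneman index, the minimum of the quartet scores over all a, a' in A and b, b' in B, is positive. The function names and the example matrix are hypothetical.

```python
from itertools import product

def quartet_score(d, a, a2, b, b2):
    """Assumed quartet score for a a2 | b b2:
    (min(d[a][b] + d[a2][b2], d[a][b2] + d[a2][b]) - d[a][a2] - d[b][b2]) / 2."""
    cross = min(d[a][b] + d[a2][b2], d[a][b2] + d[a2][b])
    return (cross - d[a][a2] - d[b][b2]) / 2

def buneman_index(d, A, B):
    """Minimum quartet score over all a, a2 in A and b, b2 in B
    (repeated elements allowed). Brute force, O(|A|^2 * |B|^2)."""
    return min(
        quartet_score(d, a, a2, b, b2)
        for a, a2 in product(A, repeat=2)
        for b, b2 in product(B, repeat=2)
    )

# Hypothetical additive distances on the tree ((0,1),(2,3)) with pendant
# edge lengths 1, 4, 1, 4 and an internal edge of length 1.
D = [
    [0, 5, 3, 6],
    [5, 0, 6, 9],
    [3, 6, 0, 5],
    [6, 9, 5, 0],
]

print(buneman_index(D, [0, 1], [2, 3]))  # positive: the split is kept
print(buneman_index(D, [0, 2], [1, 3]))  # negative: the split is rejected
```

For this additive example the index of the true split equals the length of the internal edge, while the conflicting split scores below zero.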
Annals of Combinatorics, 2008
Determining an optimal phylogenetic tree using maximum parsimony, also referred to as the Steiner tree problem in phylogenetics, is NP-hard. Here we provide a new formulation for this problem which leads to an analytical and linear-time solution when the dimensionality (sequence length, or number of characters) is at most two. This new formulation of the problem provides a direct link between the maximum parsimony problem and the maximum compatibility problem via the intersection graph. The solution for the "two character case" has numerous practical applications in phylogenetics, some of which are discussed.
Bioconsensus, 2003
A consensus tree method takes a collection of phylogenetic trees and outputs a single "representative" tree. The first consensus method was proposed by Adams in 1972. Since then a large variety of different methods have been developed, and there has been considerable debate over how they should be used. This paper has two goals. First, we survey the main consensus tree methods used in phylogenetics. Second, we explore, pretty exhaustively, the links between the different methods, producing a classification of consensus tree methods.
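Two of the simplest consensus methods can be sketched in a few lines. The abstract does not list which methods the survey covers, so the choice of strict and majority-rule consensus here is an assumption, as is the encoding: each rooted tree is given as a set of clusters (frozensets of taxa), and the function names are hypothetical.

```python
from collections import Counter

def strict_clusters(trees):
    """Clusters that appear in every input tree."""
    return set.intersection(*(set(t) for t in trees))

def majority_rule_clusters(trees):
    """Clusters that appear in more than half of the input trees."""
    counts = Counter(c for t in trees for c in set(t))
    return {c for c, k in counts.items() if 2 * k > len(trees)}

# Hypothetical input: three rooted trees on taxa a, b, c, d, listing only
# their non-trivial clusters.
ab, bc, abc = frozenset("ab"), frozenset("bc"), frozenset("abc")
trees = [{ab, abc}, {ab, abc}, {bc, abc}]

print(strict_clusters(trees))         # only the cluster {a, b, c}
print(majority_rule_clusters(trees))  # the clusters {a, b} and {a, b, c}
```

The strict result is always a subset of the majority-rule result, which is one example of the kind of link between methods that a classification can record.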