Multiple Structural RNA Alignment with Lagrangian Relaxation

Fast and Accurate Structural RNA Alignment by Progressive Lagrangian Optimization

Lecture Notes in Computer Science, 2005

During the last few years new functionalities of RNA have been discovered, renewing the need for computational tools for their analysis. In this respect, multiple sequence alignment is an essential step in finding structurally conserved regions in related RNA sequences. In contrast to proteins, many classes of functionally related RNA molecules show rather weak sequence conservation but a fairly well conserved secondary structure. Hence, any method that relates RNA sequences in the form of multiple alignments should take structural features into account, which has been verified in recent studies. Progress has been made in developing new structural alignment algorithms; however, current methods are computationally costly or do not have the desired accuracy to make them an everyday tool. In this paper we present a fast, practical, and accurate method for computing multiple structural RNA alignments. The method is based on combining a new pairwise structural alignment method with the popular program T-Coffee. Our pairwise method is based on an integer linear programming (ILP) formulation resulting from a graph-theoretic reformulation of the structural alignment problem. We find provably optimal or near-optimal solutions of the ILP with a Lagrangian approach. Tests on a recently published benchmark set show that our Lagrangian approach outperforms current programs in quality and in the length of the sequences it can align.

An exact mathematical programming approach to multiple RNA sequence-structure alignment

2009

One of the main tasks in computational biology is the computation of alignments of genomic sequences to reveal their commonalities. In the case of DNA or protein sequences, sequence information alone is usually sufficient to compute reliable alignments. RNA molecules, however, build spatial conformations (the secondary structure) that are more conserved than the actual sequence. Hence, computing reliable alignments of RNA molecules has to take the secondary structure into account. We present a novel framework for the computation of exact multiple sequence-structure alignments: we give a graph-theoretic representation of the sequence-structure alignment problem and phrase it as an integer linear program. We identify a class of constraints that make the problem easier to solve and relax the original integer linear program in a Lagrangian manner. Experiments on a recently published benchmark show that our algorithm has performance comparable to more costly dynamic programming algorithm...
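As a reminder of what relaxing an ILP "in a Lagrangian manner" means, here is the generic textbook form. The notation ($c$, $A$, $b$, $X$, $\lambda$) is assumed for illustration and is not taken from the paper, whose specific ILP and choice of relaxed constraints are not given in the abstract.

```latex
\begin{align}
  z^{*} &= \max \{\, c^{\top} x \;:\; A x \le b,\ x \in X \,\}
    && \text{(ILP; $A x \le b$ are the complicating constraints)} \\
  L(\lambda) &= \max \{\, c^{\top} x + \lambda^{\top} (b - A x) \;:\; x \in X \,\}
    && \text{(relaxation for fixed multipliers $\lambda \ge 0$)} \\
  z^{*} &\le L(\lambda) \quad \text{for all } \lambda \ge 0
    && \text{(every multiplier vector yields an upper bound)} \\
  z_{\mathrm{LD}} &= \min_{\lambda \ge 0} L(\lambda)
    && \text{(Lagrangian dual: the tightest such bound)}
\end{align}
```

The bound holds because any $x$ feasible for the ILP satisfies $b - A x \ge 0$, so the penalty term is non-negative. A feasible solution whose value equals the bound $L(\lambda)$ is provably optimal, which is the sense in which such approaches can certify optimal or near-optimal solutions.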

A computational model for RNA multiple structural alignment

2006

This paper addresses the problem of aligning multiple sequences of noncoding RNA (ncRNA) genes. We approach this problem with the biologically motivated paradigm that scoring of ncRNA alignments should be based primarily on secondary structure rather than nucleotide conservation.

Probabilistic structural alignment of RNA sequences

International Conference on Acoustics, Speech, and Signal Processing, 2008

We propose an algorithm for estimating the common secondary structure, alignment, and posterior base pairing probabilities for two RNA sequences. A definition of structural alignment is presented based on a novel concept of matched helical regions that generalizes the common secondary structure and alignment constraints used in prior work. A probabilistic framework for scoring structural alignments is developed based on a pseudo free energy model. Utilizing the model, maximum a posteriori probability estimates of secondary structure and alignment, and a posteriori probabilities for base pairing are computed using an efficient dynamic programming algorithm. Experimental results demonstrate that the proposed method offers significant improvements in structure and alignment prediction accuracy in comparison with single sequence thermodynamic methods for secondary structure prediction and purely sequence based alignment.

Accurate multiple sequence-structure alignment of RNA sequences using combinatorial optimization

BMC Bioinformatics, 2007

Background: The discovery of functional non-coding RNA sequences has led to an increasing interest in algorithms related to RNA analysis. Traditional sequence alignment algorithms, however, fail at computing reliable alignments of low-homology RNA sequences. The spatial conformation of RNA sequences largely determines their function, and therefore RNA alignment algorithms have to take structural information into account. Results: We present a graph-based representation for sequence-structure alignments, which we model as an integer linear program (ILP). We sketch how we compute an optimal or near-optimal solution to the ILP using methods from combinatorial optimization, and present results on a recently published benchmark set for RNA alignments. Conclusion: The implementation of our algorithm yields better alignments in terms of two published scores than the other programs that we tested: This is especially the case with an increasing number of input sequences. Our program LARA is freely available for academic purposes from http://www.planet-lisa.net.

A benchmark of multiple sequence alignment programs upon structural RNAs

Nucleic Acids Research, 2005

To date, few attempts have been made to benchmark alignment algorithms upon nucleic acid sequences. Frequently, sophisticated PAM- or BLOSUM-like models are used to align proteins, yet equivalents are not considered for nucleic acids; instead, rather ad hoc models are generally favoured. Here we systematically test the performance of existing alignment algorithms on structural RNAs. The goals of this work are: (1) To determine conditions where it is appropriate to apply common sequence alignment methods to the structural RNA alignment problem. This indicates where and when researchers should consider augmenting the alignment process with auxiliary information, such as secondary structure. (2) To determine which sequence alignment algorithms perform well under the broadest range of conditions. We find that sequence alignment alone, using the current algorithms, is generally inappropriate below 50-60% sequence identity. Secondly, we note that the probabilistic method ProAlign and the aging Clustal algorithms generally out-perform other sequence-based algorithms under the broadest range of applications.
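To make the 50-60% threshold concrete, the following minimal sketch computes percent identity for two aligned rows. The function name `percent_identity` and the convention used (matches divided by columns in which at least one row has a residue) are assumptions for illustration; several conventions exist, and the abstract does not say which one the benchmark uses.

```python
def percent_identity(row_a: str, row_b: str, gap: str = "-") -> float:
    """Percent identity of two aligned rows (one of several common conventions).

    Assumption: columns where both rows have a gap are ignored, and the
    denominator is the number of remaining columns.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(row_a, row_b):
        if x == gap and y == gap:
            continue  # column carries no information for this pair
        compared += 1
        if x == y:
            matches += 1  # both are residues here, since gap/gap was skipped
    return 100.0 * matches / compared if compared else 0.0


if __name__ == "__main__":
    # Hypothetical toy rows, not benchmark data.
    print(round(percent_identity("GCAU-GC", "GCUUAGC"), 1))
```

Under this convention, a pair scoring below roughly 50-60 would fall in the range where the abstract reports that sequence-only alignment is generally inappropriate.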

Structural RNA alignment by multi-objective optimization

Bioinformatics, 2013

Motivation: The calculation of reliable alignments for structured RNA is still considered an open problem. One approach is the incorporation of secondary structure information into the optimization criteria by using a weighted sum of sequence and structure components as an objective function. As it is not clear how to choose the weighting parameters, we use multi-objective optimization to calculate a set of Pareto-optimal RNA sequence-structure alignments. The solutions in this set then represent all possible trade-offs between the different objectives, independent of any previous weighting. Results: We present a practical multi-objective dynamic programming algorithm, which is a new method for the calculation of the set of Pareto-optimal solutions to the pairwise RNA sequence-structure alignment problem. In selected examples, we show the usefulness of this approach, and its advantages over state-of-the-art single-objective algorithms. Availability and implementation: The source ...
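The notion of Pareto optimality used here can be illustrated with a brute-force filter over candidate score pairs. This is only a sketch of the definition, not the paper's dynamic programming algorithm; the names `dominates` and `pareto_front` and the example scores are hypothetical.

```python
from typing import Iterable, List, Tuple

# Assumed representation: (sequence score, structure score), both maximized.
Score = Tuple[float, float]


def dominates(p: Score, q: Score) -> bool:
    """True if p is at least as good as q in both objectives and differs from q."""
    return p[0] >= q[0] and p[1] >= q[1] and p != q


def pareto_front(points: Iterable[Score]) -> List[Score]:
    """Return the non-dominated points, deduplicated and sorted."""
    pts = list(points)
    front = {p for p in pts if not any(dominates(q, p) for q in pts)}
    return sorted(front)


if __name__ == "__main__":
    # Hypothetical candidate alignments scored on the two objectives.
    candidates = [(10, 2), (8, 5), (7, 5), (3, 9), (2, 1), (8, 5)]
    print(pareto_front(candidates))
```

Each surviving pair is a trade-off that no other candidate beats on both objectives, so any positive weighting of the two scores picks its optimum from this set.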

Fast Pairwise Structural RNA Alignments by Pruning of the Dynamical Programming Matrix

PLoS Computational Biology, 2007

It has become clear that noncoding RNAs (ncRNA) play important roles in cells, and emerging studies indicate that there might be a large number of unknown ncRNAs in mammalian genomes. There exist computational methods that can be used to search for ncRNAs by comparing sequences from different genomes. One main problem with these methods is their computational complexity, and heuristics are therefore employed. Two heuristics are currently very popular: pre-folding and pre-aligning. However, these heuristics are not ideal, as pre-aligning is dependent on sequence similarity that may not be present and pre-folding ignores the comparative information. Here, pruning of the dynamical programming matrix is presented as an alternative novel heuristic constraint. All subalignments that do not exceed a length-dependent minimum score are discarded as the matrix is filled out, thus giving the advantage of providing the constraints dynamically. This has been included in a new implementation of the FOLDALIGN algorithm for pairwise local or global structural alignment of RNA sequences. It is shown that time and memory requirements are dramatically lowered while overall performance is maintained. Furthermore, a new divide and conquer method is introduced to limit the memory requirement during global alignment and backtrack of local alignment. All branch points in the computed RNA structure are found and used to divide the structure into smaller unbranched segments. Each segment is then realigned and backtracked in a normal fashion. Finally, the FOLDALIGN algorithm has also been updated with a better memory implementation and an improved energy model. With these improvements in the algorithm, the FOLDALIGN software package provides the molecular biologist with an efficient and user-friendly tool for searching for new ncRNAs. The software package is available for download at http://foldalign.ku.dk.
Citation: Havgaard JH, Torarinsson E, Gorodkin J (2007) Fast pairwise structural RNA alignments by pruning of the dynamical programming matrix. PLoS Comput Biol 3(10): e193.
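The pruning idea, discarding subalignments that do not exceed a length-dependent minimum score while the matrix is being filled, can be sketched on a much simpler problem. The code below applies it to a plain sequence-only global alignment; it is not the FOLDALIGN recursion, and the linear threshold (`offset + slope * length`), the scoring parameters, and the function name are all assumptions chosen for illustration.

```python
import math
from typing import Tuple


def pruned_global_score(a: str, b: str, match: float = 2.0,
                        mismatch: float = -1.0, gap: float = -2.0,
                        slope: float = -1.0,
                        offset: float = -3.0) -> Tuple[float, int]:
    """Global alignment score with pruning; returns (score, cells kept).

    A cell (i, j) is kept only if its best score exceeds a hypothetical
    length-dependent threshold offset + slope * (i + j). Pruned cells stay
    at -inf, so nothing can be extended from them.
    """
    neg = -math.inf
    n, m = len(a), len(b)
    score = [[neg] * (m + 1) for _ in range(n + 1)]
    score[0][0] = 0.0
    kept = 0
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            best = neg
            if i > 0 and j > 0:
                sub = match if a[i - 1] == b[j - 1] else mismatch
                best = max(best, score[i - 1][j - 1] + sub)
            if i > 0:
                best = max(best, score[i - 1][j] + gap)
            if j > 0:
                best = max(best, score[i][j - 1] + gap)
            if best > offset + slope * (i + j):
                score[i][j] = best  # subalignment survives the pruning
                kept += 1
    return score[n][m], kept


if __name__ == "__main__":
    print(pruned_global_score("GCAU", "GCAU"))
```

In this toy version the pruned cells are still visited; an implementation aiming at the time and memory savings described in the abstract would avoid storing or extending them at all. An overly strict threshold can also prune the optimal path, which is why this is a heuristic.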

Approximation of RNA multiple structural alignment

Journal of Discrete Algorithms, 2011

In the context of non-coding RNA (ncRNA) multiple structural alignment, Davydov and Batzoglou introduced in [7] the problem of finding the largest nested linear graph that occurs in a set G of linear graphs, the so-called Max-NLS problem. This problem generalizes both the longest common subsequence problem and the maximum common homeomorphic subtree problem for rooted ordered trees.
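For orientation, the longest common subsequence problem that Max-NLS generalizes has a standard dynamic program for two strings. The sketch below computes only the LCS length; it is the classical special case, not an algorithm for Max-NLS itself, and the function name is chosen for illustration.

```python
def lcs_length(a: str, b: str) -> int:
    """Length of a longest common subsequence of a and b.

    Uses O(len(a) * len(b)) time and keeps only two rows of the table.
    """
    prev = [0] * (len(b) + 1)
    for x in a:
        cur = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                cur.append(prev[j - 1] + 1)  # extend a common subsequence
            else:
                cur.append(max(prev[j], cur[j - 1]))  # drop x or drop y
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    # Hypothetical toy sequences; "GAUGC" is one common subsequence.
    print(lcs_length("GCAUGC", "GAUCGC"))
```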

Simplicity in RNA Secondary Structure Alignment: Towards biologically plausible alignments

2006

Ribonucleic acid (RNA) molecules contain the genetic information that regulates the functions of organisms. Given two different molecules, a preserved function corresponds to a preserved secondary RNA structure. Hence, RNA secondary-structure comparison is essential in predicting the functions of a newly discovered molecule. In this paper, we discuss our SPRC method for RNA structure comparison. In this work, we developed a novel tree representation of RNA that reflects both its primary and secondary structure, and a tree-alignment algorithm which, given the tree representations of two RNA molecules, produces a sequence of mutations that could transform one RNA molecule into the other. Our SPRC algorithm extends the Zhang-Shasha tree-edit distance calculation algorithm in two ways: first, in addition to the distance, it reports all editing sequences with the same minimum edit cost, and second, it uses a biologically inspired affine cost function. Furthermore, the SPRC method proposes a set of heuristics designed to filter the produced solution set and recommend the simplest editing sequence as the one corresponding to the most biologically correct alignment. Experiments on three 5S rRNA families (archaea, eubacteria, and eukaryota) show that SPRC is very effective in producing biologically meaningful RNA secondary structure alignments.