Using Phylogenetic Markov Trees to Detect Conserved Structure in RNA Multiple Alignments

Using Multiple Alignments and Phylogenetic Trees to Detect RNA Secondary Structure

We describe a statistical method to determine if a pair of columns in a multiple alignment of a homologous family of RNA sequences shows evidence of being base paired. The method makes explicit use of a given phylogenetic tree for the sequences in the alignment. It is tested on a multiple alignment of 16S rRNA sequences with good results.

Introduction and Overview of Methods

Most present techniques for RNA secondary structure prediction are based either on energy minimization or on comparative sequence analysis. Energy minimization methods have had less success on large RNA molecules [1 Jacobson93] [2 Zuker-91] [3 Zuker-84] [4 Tinoco-71], so comparative sequence analysis is the method of choice here [5 Han-93] [6 Le-91]. Until now, comparative sequence methods have either required substantial manual intervention [7 James89] [8 Woese-83], or were more fully automated, but overlooked information about the phylogenetic relationships among the sequences in the RNA multiple ...

NASP: a parallel program for identifying evolutionarily conserved nucleic acid secondary structures from nucleotide sequence alignments

Bioinformatics, 2011

Many natural nucleic acid sequences have evolutionarily conserved secondary structures with diverse biological functions. A reliable computational tool for identifying such structures would be very useful in guiding experimental analyses of their biological functions. NASP (Nucleic Acid Structure Predictor) is a program that takes into account thermodynamic stability, Boltzmann base pair probabilities, alignment uncertainty, covarying sites and evolutionary conservation to identify biologically relevant secondary structures within multiple sequence alignments. Unique to NASP is the consideration of all this information together with a recursive permutation-based approach to progressively identify and list the most conserved probable secondary structures that are likely to have the greatest biological relevance. By focusing on identifying only evolutionarily conserved structures, NASP forgoes the prediction of complete nucleotide folds but outperforms various other secondary structure prediction methods in its ability to selectively identify actual base pairings.

Simplicity in RNA Secondary Structure Alignment: Towards biologically plausible alignments

2006

Ribonucleic acid (RNA) molecules contain the genetic information that regulates the functions of organisms. Given two different molecules, a preserved function corresponds to a preserved secondary RNA structure. Hence, RNA secondary-structure comparison is essential in predicting the functions of a newly discovered molecule. In this paper, we discuss our SPRC method for RNA structure comparison. In this work, we developed a novel tree representation of RNA that reflects both its primary and secondary structure, and a tree-alignment algorithm which, given the tree representations of two RNA molecules, produces a sequence of mutations that could transform one RNA molecule into the other. Our SPRC algorithm extends the Zhang-Shasha tree-edit distance calculation algorithm in two ways: first, in addition to the distance, it reports all editing sequences with the same minimum edit cost, and second, it uses a biologically inspired affine cost function. Furthermore, the SPRC method proposes a set of heuristics designed to filter the produced solution set and recommend the simplest editing sequence, taken to correspond to the most biologically correct alignment. Experiments on three 5S rRNA families (archaea, eubacteria, and eukaryota) show that SPRC is very effective in producing biologically meaningful RNA secondary structure alignments.

Strategies for measuring evolutionary conservation of RNA secondary structures

BMC Bioinformatics, 2008

Background: Evolutionary conservation of RNA secondary structure is a typical feature of many functional non-coding RNAs. Since almost all of the available methods used for prediction and annotation of non-coding RNA genes rely on this evolutionary signature, accurate measures for structural conservation are essential.

PARTS: Probabilistic Alignment for RNA JoinT Secondary Structure Prediction

Nucleic Acids Research, 2008

A novel method is presented for joint prediction of alignment and common secondary structures of two RNA sequences. The joint consideration of common secondary structures and alignment is accomplished by structural alignment over a search space defined by a newly introduced motif called matched helical regions. The matched helical region formulation generalizes previously employed constraints for structural alignment and thereby better accommodates the structural variability within RNA families. A probabilistic model based on pseudo free energies obtained from precomputed base pairing and alignment probabilities is used for scoring structural alignments. Maximum a posteriori (MAP) common secondary structures, sequence alignment and joint posterior probabilities of base pairing are obtained from the model via a dynamic programming algorithm called PARTS. The advantage of the more general structural alignment of PARTS is seen in secondary structure predictions for the RNase P family. For this family, the PARTS MAP predictions of secondary structures and alignment perform significantly better than prior methods that use a more restrictive structural alignment model. For the tRNA and 5S rRNA families, the richer structural alignment model of PARTS does not offer a benefit and the method therefore performs comparably with existing alternatives. For all RNA families studied, the posterior probability estimates obtained from PARTS offer an improvement over posterior probability estimates from a single sequence prediction. When considering the base pairings predicted over a threshold value of confidence, the combination of sensitivity and positive predictive value is better for PARTS than for the single sequence prediction. PARTS source code is available for download under the GNU public license at http://rna.urmc.rochester.edu.
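The evaluation described at the end of this abstract, scoring the base pairs predicted above a confidence threshold by sensitivity and positive predictive value, can be illustrated with a minimal Python sketch. It is not the PARTS code: the function names, the example probabilities, and the requirement that a predicted pair match a known pair exactly are all assumptions made for illustration.

```python
def pairs_above_threshold(pair_probs, threshold):
    """Keep the base pairs whose estimated probability meets the threshold.

    pair_probs maps (i, j) index pairs to probabilities (hypothetical input format).
    """
    return {pair for pair, p in pair_probs.items() if p >= threshold}


def sensitivity_ppv(predicted, known):
    """Return (sensitivity, PPV) for predicted versus known base pairs.

    sensitivity = correctly predicted pairs / known pairs
    PPV         = correctly predicted pairs / predicted pairs
    A pair counts as correct only on an exact (i, j) match (an assumption).
    """
    predicted, known = set(predicted), set(known)
    true_positives = len(predicted & known)
    sensitivity = true_positives / len(known) if known else 0.0
    ppv = true_positives / len(predicted) if predicted else 0.0
    return sensitivity, ppv


# Made-up example: three known pairs, four candidate pairs with probabilities.
known_pairs = {(0, 8), (1, 7), (2, 6)}
pair_probs = {(0, 8): 0.9, (1, 7): 0.8, (2, 6): 0.4, (3, 5): 0.6}

predicted_pairs = pairs_above_threshold(pair_probs, threshold=0.5)
sens, ppv = sensitivity_ppv(predicted_pairs, known_pairs)
print(f"sensitivity={sens:.2f} PPV={ppv:.2f}")
```

Raising the threshold typically trades sensitivity for positive predictive value, which is why the abstract reports the two measures together.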

TurboFold: Iterative probabilistic estimation of secondary structures for multiple RNA sequences

BMC Bioinformatics, 2011

Background: The prediction of secondary structure, i.e. the set of canonical base pairs between nucleotides, is a first step in developing an understanding of the function of an RNA sequence. The most accurate computational methods predict conserved structures for a set of homologous RNA sequences. These methods usually suffer from high computational complexity. In this paper, TurboFold, a novel and efficient method for secondary structure prediction for multiple RNA sequences, is presented. Results: TurboFold takes, as input, a set of homologous RNA sequences and outputs estimates of the base pairing probabilities for each sequence. The base pairing probabilities for a sequence are estimated by combining intrinsic information, derived from the sequence itself via the nearest neighbor thermodynamic model, with extrinsic information, derived from the other sequences in the input set. For a given sequence, the extrinsic information is computed by using pairwise-sequence-alignment-based ...

Alignments of RNA Structures

IEEE/ACM Transactions on Computational Biology and Bioinformatics, 2000

We describe a unifying theoretical framework for comparing RNA structures, which we call the alignment hierarchy. This framework relies on the definition of common supersequences for arc-annotated sequences, and encompasses the main existing models for RNA structure comparison based on trees and arc-annotated sequences with a variety of edit operations. It also gives rise to edit models that have not yet been studied. We provide a thorough analysis of the alignment hierarchy, including a new polynomial time algorithm and an NP-completeness proof. The polynomial time algorithm involves biologically relevant evolutionary operations, such as pairing or unpairing nucleotides. It has been implemented in software called gardenia, which is available at the web server http://bioinfo.lifl.fr/RNA/gardenia.

Using Structural Information in Modeling and Multiple Alignments for Phylogenetics

Phylogenetic studies are increasingly based on structural biological data and on statistical formalization. This leads to the study of improved models and of ways to extract the maximum information from sequence data. In this research, I propose incorporating structural information in two areas that relate to phylogenetic inference: one is to use a spatially dependent substitution model for likelihood calculation in phylogenetic inference; the other is to use a gap distance measure for MSA evaluation. While the first application uses an improved substitution model in phylogenetic inference, the second focuses on the quality of the MSA produced by different alignment procedures.

A phylogenetic approach to RNA structure prediction

Proceedings / ... International Conference on Intelligent Systems for Molecular Biology (ISMB), 1999

Methods based on the Mutual Information statistic (MI methods) predict structure by looking for statistical correlations between sequence positions in a set of aligned sequences. Although MI methods are often quite effective, these methods ignore the underlying phylogenetic relationships of the sequences they analyze. Thus, they cannot distinguish between correlations due to structural interactions, and spurious correlations resulting from phylogenetic history. In this paper, we introduce a method analogous to MI that incorporates phylogenetic information. We show that this method accurately recovers the structures of well-known RNA molecules. We also demonstrate, with both real and simulated data, that this phylogenetically-based method outperforms standard MI methods, and improves the ability to distinguish interacting from non-interacting positions in RNA. This method is flexible, and may be applied to the prediction of protein structure given the appropriate evolutionary model. ...
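For reference, the standard mutual information statistic that this abstract uses as its baseline can be sketched in a few lines of Python. This is the plain MI between two alignment columns, not the phylogenetically informed method the paper introduces; the function names and the toy alignment are assumptions made for illustration, and gaps or pseudocounts are not handled.

```python
from collections import Counter
from math import log2


def column(alignment, i):
    """Return column i of an alignment given as equal-length strings."""
    return [seq[i] for seq in alignment]


def mutual_information(col_x, col_y):
    """Mutual information (in bits) between two alignment columns.

    MI = sum over (a, b) of p(a, b) * log2( p(a, b) / (p(a) * p(b)) ),
    with all probabilities estimated as observed frequencies.
    """
    n = len(col_x)
    freq_x = Counter(col_x)
    freq_y = Counter(col_y)
    freq_xy = Counter(zip(col_x, col_y))
    mi = 0.0
    for (a, b), count in freq_xy.items():
        p_ab = count / n
        mi += p_ab * log2(p_ab / ((freq_x[a] / n) * (freq_y[b] / n)))
    return mi


# Toy alignment: columns 0 and 4 covary as Watson-Crick pairs,
# columns 1 to 3 are perfectly conserved.
alignment = ["GCAUC", "ACAUU", "CCAUG", "UCAUA"]

print(mutual_information(column(alignment, 0), column(alignment, 4)))  # 2.0
print(mutual_information(column(alignment, 1), column(alignment, 2)))  # 0.0
```

Because the sequences are treated as independent samples, a substitution pair inherited by many closely related sequences inflates this score just as repeated independent covariation would, which is the limitation the abstract points out.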

A novel approach to represent and compare RNA secondary structures

Nucleic Acids Research, 2014

Structural information is crucial in ribonucleic acid (RNA) analysis and functional annotation; nevertheless, how to include such structural data is still a debated problem. Dot-bracket notation is the most common and simple representation for RNA secondary structures, but its simplicity also leads to ambiguity that requires further processing steps to resolve. Here we present BEAR (Brand nEw Alphabet for RNA), a new context-aware structural encoding represented by a string of characters. Each character in BEAR encodes a specific secondary structure element (loop, stem, bulge and internal loop) with a specific length. Furthermore, exploiting this informative yet simple encoding in multiple alignments of related RNAs, we captured how much structural variation is tolerated in RNA families and converted it into transition rates among secondary structure elements. This allowed us to compute a substitution matrix for secondary structure elements called MBR (Matrix of BEAR-encoded RNA secondary structures), whose ability to align RNA secondary structures we tested. We propose BEAR and the MBR as powerful resources for RNA secondary structure analysis, comparison and classification, motif finding and phylogeny.
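As background for the dot-bracket notation mentioned in this abstract, the following minimal Python sketch converts a dot-bracket string into a list of base-paired positions. It illustrates only the plain notation, not the BEAR encoding or the MBR matrix; the function name is hypothetical, positions are 0-based, and only the characters "(", ")" and "." are accepted (no pseudoknot brackets).

```python
def dot_bracket_to_pairs(structure):
    """Return the sorted list of (i, j) base pairs in a dot-bracket string.

    "(" opens a pair, ")" closes the most recently opened one, "." is unpaired.
    Raises ValueError on unbalanced brackets or unexpected characters.
    """
    open_positions = []  # stack of indices of unmatched "("
    pairs = []
    for i, ch in enumerate(structure):
        if ch == "(":
            open_positions.append(i)
        elif ch == ")":
            if not open_positions:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((open_positions.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character {ch!r} at position {i}")
    if open_positions:
        raise ValueError(f"unmatched '(' at position {open_positions[-1]}")
    return sorted(pairs)


# A three-pair stem closing a three-nucleotide loop, followed by one unpaired base.
print(dot_bracket_to_pairs("(((...)))."))  # [(0, 8), (1, 7), (2, 6)]
```

The output is a flat list of pairs: whether a position lies in a stem, a hairpin loop, a bulge or an internal loop has to be derived in a further pass, which is the kind of extra processing step the abstract refers to.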