Ensemble-based prediction of RNA secondary structures

Improved RNA secondary structure prediction by maximizing expected pair accuracy

RNA (New York, N.Y.), 2009

Free energy minimization has been the most popular method for RNA secondary structure prediction for decades. It is based on a set of empirical free energy change parameters derived from experiments using a nearest-neighbor model. In this study, a program, MaxExpect, that predicts RNA secondary structure by maximizing the expected base-pair accuracy, is reported. This approach was pioneered in the program CONTRAfold, using pair probabilities predicted with a statistical learning method. Here, a partition function calculation that utilizes the free energy change nearest-neighbor parameters is used to predict base-pair probabilities as well as probabilities of nucleotides being single-stranded. MaxExpect predicts both the optimal structure (having highest expected pair accuracy) and suboptimal structures to serve as alternative hypotheses for the structure. Tested on a large database of different types of RNA, the maximum expected accuracy structures are, on average, of higher a...
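The idea of maximizing expected pair accuracy can be illustrated with a small dynamic program. The sketch below is not the MaxExpect implementation. It assumes that base-pair probabilities are already available as an upper-triangular matrix, that each paired position (i, j) scores 2 * gamma * P(i, j), and that each unpaired position scores its single-stranded probability. The names `max_expected_accuracy`, `gamma`, and `min_loop` are hypothetical, and pseudoknots are not considered.

```python
def max_expected_accuracy(pair_prob, gamma=1.0, min_loop=3):
    """Return (score, pairs) for a nested structure maximizing expected accuracy.

    pair_prob[i][j] (i < j) is the assumed probability that i pairs with j.
    gamma weights paired positions against unpaired ones (an assumption here).
    min_loop is the minimum number of unpaired bases enclosed by a pair.
    """
    n = len(pair_prob)
    if n == 0:
        return 0.0, []

    # Probability that each nucleotide is single-stranded.
    unpaired = [
        max(0.0, 1.0 - sum(pair_prob[min(i, j)][max(i, j)]
                           for j in range(n) if j != i))
        for i in range(n)
    ]

    best = [[0.0] * n for _ in range(n)]   # best[i][j]: optimum for i..j
    back = [[None] * n for _ in range(n)]  # decision taken at (i, j)

    def get(i, j):
        # Empty intervals contribute nothing.
        return best[i][j] if i <= j else 0.0

    for span in range(n):
        for i in range(n - span):
            j = i + span
            # Case 1: position i stays unpaired.
            top, move = get(i + 1, j) + unpaired[i], ("unpaired",)
            # Case 2: position i pairs with some k, splitting the interval.
            for k in range(i + min_loop + 1, j + 1):
                cand = (2.0 * gamma * pair_prob[i][k]
                        + get(i + 1, k - 1) + get(k + 1, j))
                if cand > top:
                    top, move = cand, ("pair", k)
            best[i][j], back[i][j] = top, move

    # Trace back the chosen pairs.
    pairs, stack = [], [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i > j:
            continue
        move = back[i][j]
        if move[0] == "unpaired":
            stack.append((i + 1, j))
        else:
            k = move[1]
            pairs.append((i, k))
            stack.append((i + 1, k - 1))
            stack.append((k + 1, j))
    return best[0][n - 1], sorted(pairs)


# Toy input: a 10-nucleotide sequence with three candidate pairs.
toy = [[0.0] * 10 for _ in range(10)]
toy[0][9], toy[1][8], toy[2][7] = 0.9, 0.8, 0.1
toy_score, toy_pairs = max_expected_accuracy(toy)
```

In this toy input the two high-probability pairs are kept, while the pair with probability 0.1 is dropped because leaving both positions unpaired scores higher. Raising `gamma` shifts the balance toward predicting more pairs.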

CONTRAfold: RNA secondary structure prediction without physics-based models

2006

Motivation: For several decades, free energy minimization methods have been the dominant strategy for single sequence RNA secondary structure prediction. More recently, stochastic context-free grammars (SCFGs) have emerged as an alternative probabilistic methodology for modeling RNA structure. Unlike physics-based methods, which rely on thousands of experimentally-measured thermodynamic parameters, SCFGs use fully-automated statistical learning algorithms to derive model parameters.

Bridging the gap in RNA structure prediction

Current Opinion in Structural Biology, 2007

The field of RNA structure prediction has experienced significant advances in the past several years, thanks to the availability of new experimental data and improved computational methodologies. These methods determine RNA secondary structures and pseudoknots from sequence alignments, thermodynamics-based dynamic programming algorithms, genetic algorithms and combined approaches. Computational RNA three-dimensional modeling uses this information in conjunction with manual manipulation, constraint satisfaction methods, molecular mechanics and molecular dynamics. The ultimate goal of automatically producing RNA three-dimensional models from given secondary and tertiary structure data, however, is still not fully realized. Recent developments in the computational prediction of RNA structure have helped bridge the gap between RNA secondary structure prediction, including pseudoknots, and three-dimensional modeling of RNA.

Efficient parameter estimation for RNA secondary structure prediction

Bioinformatics, 2007

Motivation: Accurate prediction of RNA secondary structure from the base sequence is an unsolved computational challenge. The accuracy of predictions made by free energy minimization is limited by the quality of the energy parameters in the underlying free energy model. The most widely used model, the Turner99 model, has hundreds of parameters, and so a robust parameter estimation scheme should efficiently handle large data sets with thousands of structures. Moreover, the estimation scheme should also be trained using available experimental free energy data in addition to structural data. Results: In this work, we present constraint generation (CG), the first computational approach to RNA free energy parameter estimation that can be efficiently trained on large sets of structural as well as thermodynamic data. Our CG approach employs a novel iterative scheme, whereby the energy values are first computed as the solution to a constrained optimization problem. Then the newly computed energy parameters are used to update the constraints on the optimization function, so as to better optimize the energy parameters in the next iteration. Using our method on biologically sound data, we obtain revised parameters for the Turner99 energy model. We show that by using our new parameters, we obtain significant improvements in prediction accuracy over current state-of-the-art methods.

RNA secondary structure prediction by centroids in a Boltzmann weighted ensemble

RNA (New York, N.Y.), 2005

Prediction of RNA secondary structure by free energy minimization has been the standard for over two decades. Here we describe a novel method that forsakes this paradigm for predictions based on a Boltzmann-weighted structure ensemble. We introduce the notion of a centroid structure as a representative for a set of structures and describe a procedure for its identification. In comparison with the minimum free energy (MFE) structure using diverse types of structural RNAs, the centroid of the ensemble makes 30.0% fewer prediction errors as measured by the positive predictive value (PPV) with marginally improved sensitivity. The Boltzmann ensemble can be separated into a small number (3.2 on average) of clusters. Among the centroids of these clusters, the "best cluster centroid" as determined by comparison to the known structure simultaneously improves PPV by 46.5% and sensitivity by 21.7%. For 58% of the studied sequences for which the MFE structure is outside the cluster cont...
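A centroid of a set of structures can be illustrated briefly. The sketch below assumes that the ensemble is represented by a list of sampled structures, each a set of (i, j) base pairs, and that the centroid is the structure minimizing the total base-pair distance to the samples, which amounts to keeping every pair present in more than half of them. It is an illustration under those assumptions, not the authors' procedure. The function names are hypothetical, and the PPV and sensitivity helpers use the usual definitions based on counts of correctly predicted pairs.

```python
from collections import Counter


def centroid_structure(samples):
    """Keep the base pairs that occur in more than 50% of the samples."""
    counts = Counter(pair for s in samples for pair in set(s))
    threshold = len(samples) / 2.0
    return sorted(p for p, c in counts.items() if c > threshold)


def base_pair_distance(a, b):
    """Number of base pairs present in exactly one of the two structures."""
    return len(set(a) ^ set(b))


def ppv_and_sensitivity(predicted, known):
    """PPV = correct / predicted pairs; sensitivity = correct / known pairs."""
    predicted, known = set(predicted), set(known)
    correct = len(predicted & known)
    ppv = correct / len(predicted) if predicted else 0.0
    sensitivity = correct / len(known) if known else 0.0
    return ppv, sensitivity


# Toy ensemble of four sampled structures (hypothetical data).
toy_samples = [
    {(0, 9), (1, 8)},
    {(0, 9), (1, 8), (2, 7)},
    {(0, 9)},
    {(1, 8), (2, 7)},
]
toy_centroid = centroid_structure(toy_samples)
toy_total_distance = sum(base_pair_distance(toy_centroid, s) for s in toy_samples)
toy_ppv, toy_sens = ppv_and_sensitivity(toy_centroid, {(0, 9), (1, 8), (2, 7)})
```

Here the pair (2, 7) appears in exactly half of the samples and is therefore excluded, which shows why a centroid tends to contain fewer, more confident pairs than any single sampled structure.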

Comparison of RNA Secondary Structure Prediction Tools in Predicting the Structure

Many software applications (GUIs) are available for predicting the secondary structure of single-stranded nucleic acids, such as Mfold, CONTRAfold, IPknot, CompaRNA, and CentroidAlifold. Some use minimum free energy (MFE) algorithms, others use stochastic context-free grammars (SCFGs), which have emerged as an alternative probabilistic methodology for modelling RNA structure, and the rest rely on dynamic programming. In contrast to physics-based methods, which depend on thousands of experimentally measured thermodynamic parameters, SCFGs use fully automated statistical learning algorithms to derive model parameters....

CompaRNA: a server for continuous benchmarking of automated methods for RNA secondary structure prediction

We present a continuous benchmarking approach for the assessment of RNA secondary structure prediction methods implemented in the CompaRNA web server. As of 3 October 2012, the performance of 28 single-sequence and 13 comparative methods has been evaluated on RNA sequences/structures released weekly by the Protein Data Bank. We also provide a static benchmark generated on RNA 2D structures derived from the RNAstrand database. Benchmarks on both data sets offer insight into the relative performance of RNA secondary structure prediction methods on RNAs of different size and with respect to different types of structure. According to our tests, on the average, the most accurate predictions obtained by a comparative approach are generated by CentroidAlifold, MXScarna, RNAalifold and TurboFold. On the average, the most accurate predictions obtained by single-sequence analyses are generated by CentroidFold, ContextFold and IPknot. The best comparative methods typically outperform the best single-sequence methods if an alignment of homologous RNA sequences is available. This article presents the results of our benchmarks as of 3 October 2012, whereas the rankings presented online are continuously updated. We will gladly include new prediction methods and new measures of accuracy in the new editions of CompaRNA benchmarks.

Rich Parameterization Improves RNA Structure Prediction

Journal of Computational Biology, 2011

Motivation. Current approaches to RNA structure prediction range from physics-based methods, which rely on thousands of experimentally-measured thermodynamic parameters, to machine-learning (ML) techniques. While the methods for parameter estimation are successfully shifting toward ML-based approaches, the model parameterizations so far remained fairly constant and all models to date have relatively few parameters. We propose a move to much richer parameterizations.

An improved algorithm for RNA secondary structure prediction

Though not as abundant in known biological processes as proteins, RNA molecules serve as more than mere intermediaries between DNA and proteins, e.g. as catalytic molecules. Furthermore, RNA secondary structure prediction based on free energy rules for stacking and loop formation remains one of the few major breakthroughs in the field of structure prediction. We present a new method to evaluate all possible internal loops of size at most k in an RNA sequence, s, in time $O(k|s|^2)$; this is an improvement from the previously used method that uses time $O(k^2|s|^2)$. For unlimited loop size this improves the overall complexity of evaluating RNA secondary structures from $O(|s|^4)$ to $O(|s|^3)$ and the method applies equally well to finding the optimal structure and calculating the equilibrium partition function. We use our method to examine the soundness of setting k = 30, a commonly

A range of complex probabilistic models for RNA secondary structure prediction that includes the nearest-neighbor model and more

RNA (New York, N.Y.), 2012

The standard approach for single-sequence RNA secondary structure prediction uses a nearest-neighbor thermodynamic model with several thousand experimentally determined energy parameters. An attractive alternative is to use statistical approaches with parameters estimated from growing databases of structural RNAs. Good results have been reported for discriminative statistical methods using complex nearest-neighbor models, including CONTRAfold, Simfold, and ContextFold. Little work has been reported on generative probabilistic models (stochastic context-free grammars [SCFGs]) of comparable complexity, although probabilistic models are generally easier to train and to use. To explore a range of probabilistic models of increasing complexity, and to directly compare probabilistic, thermodynamic, and discriminative approaches, we created TORNADO, a computational tool that can parse a wide spectrum of RNA grammar architectures (including the standard nearest-neighbor model and more) using...