Fully differentiable coarse-grained and all-atom knowledge-based potentials for RNA structure evaluation

All-atom knowledge-based potential for RNA structure prediction and assessment

Bioinformatics/computer Applications in The Biosciences, 2011

Over recent years, the view that RNA simply serves as an information-transfer molecule has changed dramatically. The study of the sequence/structure/function relationships in RNA is becoming more important. As a direct consequence, the total number of experimentally solved RNA structures has dramatically increased and new computer tools for predicting RNA structure from sequence are rapidly emerging. Therefore, new and accurate methods for assessing the accuracy of RNA structure models are clearly needed. Results: Here, we introduce an all-atom knowledge-based potential for the assessment of RNA three-dimensional (3D) structures. We have benchmarked our new potential, called Ribonucleic Acids Statistical Potential (RASP), with two different decoy datasets composed of near-native RNA structures. In one of the benchmark sets, RASP was able to rank the closest model to the X-ray structure as the best and within the top 10 models for ∼93 and ∼95% of decoys, respectively. The average correlation coefficient between model accuracy, calculated as the root mean square deviation and global distance test-total score (GDT-TS) measures of C3′ atoms, and the RASP score was 0.85 and 0.89, respectively. Based on a recently released benchmark dataset that contains hundreds of 3D models for 32 RNA motifs with non-canonical base pairs, the RASP scoring function compared favorably to the ROSETTA FARFAR force field in the selection of accurate models. Finally, using the self-splicing group I intron and the stem-loop IIIc from hepatitis C virus internal ribosome entry site as test cases, we show that RASP is able to discriminate between known structure-destabilizing mutations and compensatory mutations. Availability: RASP can be readily applied to assess all-atom or coarse-grained RNA structures and thus should be of interest to both developers and end-users of RNA structure prediction methods. The computer software and knowledge-based potentials are freely available at
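The idea of a knowledge-based potential can be illustrated with a minimal sketch. It assumes the generic inverse-Boltzmann form E = -kT ln(p_obs / p_ref) over distance bins, which is a common textbook construction and not necessarily the exact RASP formulation. The function names, the pseudocount, the bin width and the example histograms are all hypothetical.

```python
import math

def distance_potential(observed_counts, reference_counts, kT=1.0, pseudocount=1.0):
    """Per-bin score E = -kT * ln(p_obs / p_ref) (generic inverse-Boltzmann form).

    The pseudocount is an assumption added here to avoid log(0) in empty bins.
    """
    if len(observed_counts) != len(reference_counts):
        raise ValueError("histograms must have the same number of bins")
    n_bins = len(observed_counts)
    obs_total = sum(observed_counts) + pseudocount * n_bins
    ref_total = sum(reference_counts) + pseudocount * n_bins
    energies = []
    for obs, ref in zip(observed_counts, reference_counts):
        p_obs = (obs + pseudocount) / obs_total
        p_ref = (ref + pseudocount) / ref_total
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies

def score_model(distances, energies, bin_width=1.0):
    """Sum the bin energies over a model's atom-pair distances.

    Distances that fall outside the tabulated range contribute nothing.
    """
    total = 0.0
    for d in distances:
        idx = int(d // bin_width)
        if 0 <= idx < len(energies):
            total += energies[idx]
    return total

# Hypothetical histograms: the middle bin is enriched in known structures.
energies = distance_potential([10, 40, 10], [20, 20, 20])
model_score = score_model([1.2, 1.7, 0.4], energies)
```

Bins that are over-represented in known structures relative to the reference receive negative (favorable) scores, so a lower total suggests a more native-like model under this simplified scheme.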

Bridging the gap in RNA structure prediction

Current Opinion in Structural Biology, 2007

The field of RNA structure prediction has experienced significant advances in the past several years, thanks to the availability of new experimental data and improved computational methodologies. These methods determine RNA secondary structures and pseudoknots from sequence alignments, thermodynamics-based dynamic programming algorithms, genetic algorithms and combined approaches. Computational RNA three-dimensional modeling uses this information in conjunction with manual manipulation, constraint satisfaction methods, molecular mechanics and molecular dynamics. The ultimate goal of automatically producing RNA three-dimensional models from given secondary and tertiary structure data, however, is still not fully realized. Recent developments in the computational prediction of RNA structure have helped bridge the gap between RNA secondary structure prediction, including pseudoknots, and three-dimensional modeling of RNA.

Using sequence signatures and kink-turn motifs in knowledge-based statistical potentials for RNA structure prediction

Nucleic Acids Research

Kink turns are widely occurring motifs in RNA, located in internal loops and associated with many biological functions including translation, regulation and splicing. The associated sequence pattern, a 3-nt bulge and G-A, A-G base-pairs, generates an angle of ∼50° along the helical axis due to A-minor interactions. The conserved sequence and distinct secondary structures of kink-turns (k-turns) suggest computational folding rules to predict k-turn-like topologies from sequence. Here, we annotate observed k-turn motifs within a non-redundant RNA dataset based on sequence signatures and geometrical features, analyze bending and torsion angles, and determine distinct knowledge-based potentials with and without k-turn motifs. We apply these scoring potentials to our RAGTOP (RNA-As-Graph-Topologies) graph sampling protocol to construct and sample coarse-grained graph representations of RNAs from a given secondary structure. We present graph-sampling results for 35 RNAs, including 12 k-turn and 23 non k-turn internal loops, and compare the results to solved structures and to RAGTOP results without special k-turn potentials. Significant improvements are observed with the updated scoring potentials compared to the k-turn-free potentials. Because k-turns represent a classic example of sequence/structure motif, our study suggests that other such motifs with sequence signatures and unique geometrical features can similarly be utilized for RNA structure prediction and design.
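A bending angle of the kind analyzed above can be computed from the direction vectors of the two helices flanking an internal loop. The sketch below is only an illustration: the function name is hypothetical, the axis vectors are made up, and the convention (angle between the two axis directions) is an assumption that may differ from the one used in the paper.

```python
import math

def bend_angle_degrees(axis_a, axis_b):
    """Angle in degrees between two 3D helical-axis direction vectors."""
    dot = sum(a * b for a, b in zip(axis_a, axis_b))
    norm_a = math.sqrt(sum(a * a for a in axis_a))
    norm_b = math.sqrt(sum(b * b for b in axis_b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("axis vectors must be non-zero")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b)))
    return math.degrees(math.acos(cos_theta))

# Hypothetical axes: the second helix is rotated 50 degrees in the xy-plane.
helix_1 = (1.0, 0.0, 0.0)
helix_2 = (math.cos(math.radians(50.0)), math.sin(math.radians(50.0)), 0.0)
angle = bend_angle_degrees(helix_1, helix_2)
```

Collecting such angles over annotated motifs gives the observed distribution from which an angle-dependent statistical potential could be derived.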

RNA structure prediction: Progress and perspective

Chinese Physics B, 2014

Many recent exciting discoveries have revealed the versatility of RNAs and their importance in a variety of cellular functions which are strongly coupled to RNA structures. To understand the functions of RNAs, some structure prediction models have been developed in recent years. In this review, the progress in computational models for RNA structure prediction is introduced and the distinguishing features of many outstanding algorithms are discussed, emphasizing three-dimensional (3D) structure prediction. A promising coarse-grained model for predicting RNA 3D structure, stability and salt effect is also introduced briefly. Finally, we discuss the major challenges in RNA 3D structure modeling.

Rich Parameterization Improves RNA Structure Prediction

Journal of Computational Biology, 2011

Motivation. Current approaches to RNA structure prediction range from physics-based methods, which rely on thousands of experimentally measured thermodynamic parameters, to machine-learning (ML) techniques. While the methods for parameter estimation are successfully shifting toward ML-based approaches, the model parameterizations have so far remained fairly constant and all models to date have relatively few parameters. We propose a move to much richer parameterizations.

Statistical Potentials for Hairpin and Internal Loops Improve the Accuracy of the Predicted RNA Structure

Journal of Molecular Biology, 2011

Keywords: statistical potentials; RNA folding; comparative analysis; RNA structure; accuracy of the predicted RNA structure

RNA is directly associated with a growing number of functions within the cell. The accurate prediction of different RNA higher-order structures from their nucleic acid sequences will provide insight into their functions and molecular mechanics. We have been determining statistical potentials for a collection of structural elements that is larger than the number of structural elements determined with experimentally determined energy values. The experimentally derived free energies and the statistical potentials for canonical base-pair stacks are analogous, demonstrating that statistical potentials derived from comparative data can be used as an alternative energetic parameter. A new computational infrastructure, RNA Comparative Analysis Database (rCAD), that utilizes a relational database was developed to manipulate and analyze very large sequence alignments and secondary-structure data sets. Using rCAD, we determined a richer set of energetic parameters for RNA fundamental structural elements including hairpin and internal loops. A new version of RNAfold was developed to utilize these statistical potentials. Overall, these new statistical potentials for hairpin and internal loops integrated into the new version of RNAfold demonstrated significant improvements in the prediction accuracy of RNA secondary structure.

Lectures L2.1 RNAComposer: automated high-resolution structure prediction for large RNAs

2012

In contrast to the protein field, a much smaller number of RNA tertiary structures has been assessed by X-ray crystallography, NMR spectroscopy and cryo-EM, and deposited in structural data banks. In view of the rapidly growing access to RNA secondary structures, their 3D structure prediction is in great demand in the RNA community. Only a few programs and web-accessible tools have been proposed for semi-automated and automated prediction of RNA tertiary structure. Automated methods make use of coarse-grained and atomic-level molecular dynamics, internal coordinate space dynamics, fragment assembly and comparative modelling using templates. They vary considerably in terms of the required input data (RNA sequence, secondary structure, conformational data or structural templates), structure prediction quality across different RNA sizes and computation time. Recently we have developed a novel approach for the fully automated RNA 3D structure prediction from the user-defined secon...

RNA Secondary Structure Prediction Via Energy Density Minimization

Lecture Notes in Computer Science, 2006

There is a resurgence of interest in the RNA secondary structure prediction problem (a.k.a. the RNA folding problem) due to the discovery of many new families of non-coding RNAs with a variety of functions. The vast majority of the computational tools for RNA secondary structure prediction are based on free energy minimization. Here the goal is to compute a non-conflicting collection of structural elements such as hairpins, bulges and loops, whose total free energy is as small as possible. Perhaps the most commonly used tool for structure prediction, mfold/RNAfold, is designed to fold a single RNA sequence. More recent methods, such as RNAscf and alifold, were developed to improve the prediction quality of this tool by aiming to minimize the free energy of a number of functionally similar RNA sequences simultaneously. Typically, the (stack) prediction quality of the latter approach improves as the number of sequences to be folded and/or the similarity between the sequences increases. If the number of available RNA sequences to be folded is small, then the predictive power of multiple sequence folding methods can deteriorate to that of the single sequence folding methods or worse. In this paper we show that delocalizing the thermodynamic cost of forming an RNA substructure by considering the energy density of the substructure can significantly improve on secondary structure prediction via free energy minimization. We describe a new algorithm and a software tool that we call Densityfold, which aims to predict the secondary structure of an RNA sequence by minimizing the sum of energy densities of individual substructures. We show that when only one or a small number of input sequences are available, Densityfold can outperform all available alternatives. It is our hope that this approach will help to better understand the process of nucleation that leads to the formation of biologically relevant RNA substructures.
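The difference between minimizing total free energy and minimizing the sum of energy densities can be seen in a toy comparison. The sketch below assumes that the energy density of a substructure is its free energy divided by its length in nucleotides. That normalization, the function names and the numbers are illustrative assumptions, not the actual Densityfold definition or data.

```python
def energy_density(free_energy, length):
    """Free energy of a substructure divided by its length in nucleotides."""
    if length <= 0:
        raise ValueError("substructure length must be positive")
    return free_energy / length

def total_free_energy(substructures):
    """Classical objective: sum of substructure free energies."""
    return sum(energy for energy, _ in substructures)

def total_energy_density(substructures):
    """Density objective: sum of per-substructure energy densities."""
    return sum(energy_density(energy, length) for energy, length in substructures)

def pick_best(candidates, objective):
    """Return the name of the candidate fold with the lowest objective value."""
    return min(candidates, key=lambda name: objective(candidates[name]))

# Hypothetical candidate folds, each a list of (free energy, length) pairs.
candidates = {
    "A": [(-6.0, 12)],             # one long, moderately stable substructure
    "B": [(-4.0, 4), (-1.0, 10)],  # one short, compact and very stable stem
}
best_by_energy = pick_best(candidates, total_free_energy)
best_by_density = pick_best(candidates, total_energy_density)
```

In this made-up case the free energy objective prefers fold A (total of -6.0 against -5.0), while the density objective prefers fold B (-1.1 against -0.5), because it rewards compact, locally stable substructures.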