A comprehensive comparison of comparative RNA structure prediction approaches

Comparative Study

Paul P Gardner et al. BMC Bioinformatics. 2004.

Abstract

Background: An increasing number of researchers have released novel RNA structure analysis and prediction algorithms for comparative approaches to structure prediction. Yet independent benchmarking of these algorithms is rarely performed, although such benchmarking is now common practice for protein-folding, gene-finding and multiple-sequence-alignment algorithms.

Results: Here we evaluate a number of RNA folding algorithms using reliable RNA data-sets and compare their relative performance.

Conclusions: We conclude that comparative data can enhance structure prediction, but structure prediction algorithms vary widely in both sensitivity and selectivity across different sequence lengths and homologies. Furthermore, we outline some directions for future research.

Figures

Figure 1

RNA analysis. Current automated approaches to analysing homologous RNA sequences and structures usually follow one of three "plans". Plan A uses aligned sequences (usually produced by a standard multiple sequence alignment algorithm) to infer a consensus secondary structure from the evolutionary and energetic information contained in an alignment. This is a highly successful approach, but is limited to data-sets with sequence homology high enough for the alignment step to work yet divergent enough for detection of structurally consistent mutations. Plan B employs the "Sankoff algorithm" to simultaneously align and infer a consensus structure. This algorithm requires extreme amounts of memory and time. Plan C aligns RNA structures rather than sequences. This approach can be used in the rare situation where reliable structures are known. Representative algorithms which could be used for each plan are indicated within the figure.
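The "structurally consistent mutations" that Plan A relies on can be illustrated with a minimal sketch. It assumes an alignment given as equal-length strings and a candidate pair of columns, and it simply counts how many sequences show a canonical or G-U wobble pair and how many distinct pair types occur. The function name `column_pair_support` and the "two or more pair types" threshold are hypothetical choices for illustration; they are not the scoring scheme of any algorithm named in the figure.

```python
# Watson-Crick and G-U wobble pairs treated as "structurally consistent".
CANONICAL = {
    ("A", "U"), ("U", "A"),
    ("G", "C"), ("C", "G"),
    ("G", "U"), ("U", "G"),
}


def column_pair_support(alignment, i, j):
    """Summarise evidence that alignment columns i and j form a base pair.

    alignment: non-empty list of equal-length aligned RNA sequences.
    Returns the fraction of sequences with a canonical pair at (i, j),
    the number of distinct canonical pair types seen, and a flag that is
    True when at least two pair types occur (an assumed, simplified
    signal of compensatory mutation).
    """
    if not alignment:
        raise ValueError("alignment must contain at least one sequence")
    observed = [(seq[i].upper(), seq[j].upper()) for seq in alignment]
    pairing = [pair for pair in observed if pair in CANONICAL]
    pair_types = len(set(pairing))
    return {
        "pairing_fraction": len(pairing) / len(observed),
        "pair_types": pair_types,
        "covarying": pair_types >= 2,
    }


if __name__ == "__main__":
    toy_alignment = ["GAAAC", "CAAAG", "AAAAU", "GAAAA"]
    print(column_pair_support(toy_alignment, 0, 4))
```

In this toy alignment the first three sequences pair at columns 0 and 4 with three different pair types, while the fourth does not pair. A data-set that is too conserved would show a single pair type and so carry no covariation signal, which is the limitation the caption describes.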

Figure 2

Alignment consistency. A violation of RNA structural alignment consistency is shown (left), together with a possible correction (right) – see text for details. Note that the inconsistent alignment may maximise sequence similarity, showing 3 mismatches versus 1 mismatch and 2 indels, with the concrete outcome depending on the gap scoring used. Inconsistency is the reason why it is dangerous to align two structures in string representation with a standard sequence alignment algorithm. Inconsistency is hard to detect by eye, and structural alignments in databases are not always free from consistency violations.

Figure 3

Prediction correlation with reality. Matthews correlation coefficient versus the logarithm of the sequence length for a range of different ncRNAs and structure prediction algorithms. Inset A shows accuracies of thermodynamic single-sequence prediction algorithms. Insets B and C show accuracies of comparative methods on the high and medium similarity data-sets respectively.
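As a rough illustration of the quantities plotted here, the sketch below computes sensitivity, selectivity and the Matthews correlation coefficient from a reference and a predicted structure in dot-bracket notation. It uses the standard textbook definitions and assumes that true negatives are all position pairs absent from both structures; the exact counting rules used in the study are not stated in this record, so this is an assumption. The helper names `parse_pairs` and `pair_metrics` are hypothetical.

```python
import math


def parse_pairs(dot_bracket):
    """Return the set of (i, j) base pairs encoded by a dot-bracket string."""
    stack = []
    pairs = set()
    for index, char in enumerate(dot_bracket):
        if char == "(":
            stack.append(index)
        elif char == ")":
            if not stack:
                raise ValueError("unbalanced ')' in structure")
            pairs.add((stack.pop(), index))
    if stack:
        raise ValueError("unbalanced '(' in structure")
    return pairs


def pair_metrics(reference, predicted):
    """Compare two equal-length dot-bracket structures base pair by base pair."""
    if len(reference) != len(predicted):
        raise ValueError("structures must have the same length")
    n = len(reference)
    ref_pairs = parse_pairs(reference)
    pred_pairs = parse_pairs(predicted)

    tp = len(ref_pairs & pred_pairs)   # predicted and in the reference
    fp = len(pred_pairs - ref_pairs)   # predicted but not in the reference
    fn = len(ref_pairs - pred_pairs)   # in the reference but missed
    # Assumption: every other possible position pair is a true negative.
    tn = n * (n - 1) // 2 - tp - fp - fn

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    selectivity = tp / (tp + fp) if (tp + fp) else 0.0
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"sensitivity": sensitivity, "selectivity": selectivity, "mcc": mcc}


if __name__ == "__main__":
    print(pair_metrics("((..))", "(....)"))
```

In the usage line, the prediction recovers one of the two reference pairs and adds no spurious pairs, so selectivity is perfect while sensitivity is halved.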

Figure 4

ROC plots. We use ROC (receiver operating characteristic) plots to simultaneously display both sensitivity and selectivity for plans A, B and C respectively. Accuracies of the MFE methods (MFold, RNAFold and SFold) are shown in each plot to provide a base-line. Points on the line X = Y are as sensitive as they are selective; points below this line indicate greater selectivity, and points above indicate greater sensitivity. Points below the line X = 100 - Y are worse than "random" assignments, assuming base-pairs are independent of each other (which is false for base-pairing). Points in the top right corner are "perfect" predictions. Interestingly, many algorithms form characteristic clusters in these plots. Where the variance is sufficiently small, these have been indicated with a closed curve.
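The two reference lines described in this caption can be read as simple comparisons. The sketch below assumes, as the caption implies, that X is selectivity and Y is sensitivity, both expressed as percentages; the function name `describe_point` and its return format are hypothetical.

```python
def describe_point(selectivity_pct, sensitivity_pct):
    """Locate a point relative to the lines X = Y and X = 100 - Y.

    Assumes X is selectivity and Y is sensitivity, both in percent.
    Returns (balance, worse_than_random).
    """
    x, y = selectivity_pct, sensitivity_pct
    if y > x:
        balance = "more sensitive"   # above the line X = Y
    elif y < x:
        balance = "more selective"   # below the line X = Y
    else:
        balance = "balanced"         # on the line X = Y
    # Below X = 100 - Y means Y < 100 - X.
    worse_than_random = y < 100 - x
    return balance, worse_than_random


if __name__ == "__main__":
    print(describe_point(80, 60))
    print(describe_point(30, 40))
```

A point at selectivity 80 and sensitivity 60 lies below X = Y and above X = 100 - Y, while a point at selectivity 30 and sensitivity 40 lies above X = Y but below the "random" line.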
