Pfold: RNA secondary structure prediction using stochastic context-free grammars

Bjarne Knudsen et al. Nucleic Acids Res. 2003.

Abstract

RNA secondary structures are important in many biological processes and efficient structure prediction can give vital directions for experimental investigations. Many available programs for RNA secondary structure prediction only use a single sequence at a time. This may be sufficient in some applications, but often it is possible to obtain related RNA sequences with conserved secondary structure. These should be included in structural analyses to give improved results. This work presents a practical way of predicting RNA secondary structure that is especially useful when related sequences can be obtained. The method improves a previous algorithm based on an explicit evolutionary model and a probabilistic model of structures. Predictions can be done on a web server at http://www.daimi.au.dk/~compbio/pfold.

Figures

Figure 1

The effect of treating gaps as unknown nucleotides. Only a single column from the alignment is considered with the nucleotides put at the leaves of the phylogenetic tree. The two trees have identical probabilities since leaves with gaps can be removed.
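
The equivalence stated in this caption can be checked numerically. The sketch below is illustrative and is not Pfold's own code: it assumes a Jukes–Cantor substitution model, a uniform root distribution, and a hypothetical nested-list tree encoding, and it scores one alignment column with Felsenstein's pruning recursion. A gap leaf contributes a vector of ones, and because every row of a transition matrix sums to one, that leaf multiplies its parent's likelihood by exactly 1.

```python
import numpy as np

NUC = {"A": 0, "C": 1, "G": 2, "U": 3}

def jc_matrix(t):
    """Jukes-Cantor transition probabilities for branch length t (assumed model)."""
    e = np.exp(-4.0 * t / 3.0)
    return np.full((4, 4), 0.25 * (1.0 - e)) + np.eye(4) * e

def leaf_vector(symbol):
    """Conditional likelihoods at a leaf; a gap is treated as an unknown nucleotide."""
    if symbol == "-":
        return np.ones(4)
    v = np.zeros(4)
    v[NUC[symbol]] = 1.0
    return v

def partial(node):
    """node is a leaf symbol (str) or a list of (child, branch_length) tuples."""
    if isinstance(node, str):
        return leaf_vector(node)
    out = np.ones(4)
    for child, t in node:
        out *= jc_matrix(t) @ partial(child)
    return out

def column_likelihood(tree):
    """Likelihood of one column, assuming a uniform distribution at the root."""
    return float(0.25 * partial(tree).sum())

# Hypothetical column: leaves A, G and one gap; branch lengths are made up.
with_gap = column_likelihood([("A", 0.1), ([("G", 0.2), ("-", 0.3)], 0.15)])
# Same tree with the gap leaf removed.
pruned = column_likelihood([("A", 0.1), ([("G", 0.2)], 0.15)])
```

Under these assumptions `with_gap` and `pruned` agree to floating-point precision, which is the property the figure illustrates.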

Figure 2

A structure prediction for three hypothetical sequences. In the top alignment, gaps are treated as unknown nucleotides. The structure, shown as parentheses, includes pairs between nucleotides and gaps. In the parenthesis notation, matching opening and closing parentheses mark the two positions that form a base pair. In the bottom alignment, the columns with gaps have been left out of the prediction, because <75% of the sequences have nucleotides in these positions.
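
Both ideas in this caption, the parenthesis notation and the 75% column rule, are easy to make concrete. The following is a minimal sketch, not code from Pfold: the function names and the example alignment are hypothetical, and the rule is read as "keep a column when at least 75% of the sequences have a nucleotide there".

```python
def parse_pairs(structure):
    """Return sorted (i, j) index pairs from a parenthesis-notation string."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError("unbalanced ')' at position %d" % i)
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced '(' at position %d" % stack[-1])
    return sorted(pairs)

def kept_columns(alignment, min_fraction=0.75):
    """Indices of columns where at least min_fraction of the rows are not gaps."""
    n = len(alignment)
    keep = []
    for col in range(len(alignment[0])):
        filled = sum(seq[col] != "-" for seq in alignment)
        if filled / n >= min_fraction:
            keep.append(col)
    return keep

# Hypothetical three-sequence alignment (not the one shown in the figure).
alignment = ["GGCA-UGCC", "GGCAAUGCC", "GGC-AUGCC"]
columns = kept_columns(alignment)
pairs = parse_pairs("(((...)))")
```

With three sequences, a single gap leaves only two thirds of the rows filled, so every gapped column falls below the threshold and is dropped.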

Figure 3

In the top alignment, the KH-99 result is given. The bottom alignment shows the structure when all nucleotides have a 1% chance of being any other nucleotide. The result is a longer stem, which includes one non-standard pair.

Figure 4

Prediction of the Klebsiella pneumoniae RNase P RNA structure (5) with the KH-99 method, based on the four-sequence alignment in the work of Knudsen and Hein (12). The left side shows which areas are correctly predicted and the right side shows the reliability of the prediction. Notice the high correlation between the two. Positions 359–366 form a pseudoknot with positions 68–70 and 72–76. Furthermore, positions 84–87 form a pseudoknot with positions 282–285. The algorithm described here does not take pseudoknots into account, which explains why these areas are incorrectly predicted even though some of them appear reliable.

Figure 5

A dot plot made from GCA-tRNA sequences from rat, chicken, mouse and cow (1). The lower left corner represents the beginning of the alignment. Imagining the alignment laid out upwards from here and toward the right, the dots inside the square represent pairing probabilities between positions. The dots outside the square represent probabilities of not pairing. The tRNA structure is clearly visible.

Figure 6

Accuracy as a function of the number of sequences used in the prediction. Crosses are from results using ‘correct’ alignments, while boxes are from ClustalW alignments. Each point represents average results for either all possible combinations of the relevant number of sequences or 50 random combinations, whichever number is lower.
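
The sampling rule in this caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the function name, the fixed seed, and the choice to draw distinct random combinations are all hypothetical, since the caption does not say whether the 50 random combinations were required to be distinct.

```python
import itertools
import math
import random

def sequence_subsets(names, k, limit=50, seed=0):
    """All k-subsets of names if there are at most `limit`, else `limit` random ones."""
    total = math.comb(len(names), k)
    if total <= limit:
        return list(itertools.combinations(names, k))
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    chosen = set()
    while len(chosen) < limit:
        # Sample index sets without replacement and deduplicate via sorted tuples.
        chosen.add(tuple(sorted(rng.sample(range(len(names)), k))))
    return [tuple(names[i] for i in idx) for idx in sorted(chosen)]

small = sequence_subsets(list("abcd"), 2)       # 6 possible pairs, all returned
large = sequence_subsets(list(range(20)), 3)    # 1140 possible triples, 50 sampled
```

Each returned subset would then be aligned and scored, and the accuracies averaged to give one point in the plot.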

Figure 7

Accuracy as a function of pairwise distance between two sequences being analysed. As in Figure 6, crosses are from results using ‘correct’ alignments, while boxes are from ClustalW alignments. The pairs were grouped according to their Jukes–Cantor distances, in the intervals [0;0.2), [0.2;0.4), [0.4;0.6) etc. The points represent average results for 50 random sequence combinations from a specific range of distances. The x-value of a point is the average of the 50 distances.
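
For readers unfamiliar with the grouping in this caption, the Jukes–Cantor distance is the standard correction d = -(3/4) ln(1 - 4p/3), where p is the fraction of differing sites. The sketch below computes it for two aligned sequences and assigns the result to a half-open interval of width 0.2. Skipping sites where either sequence has a gap is an assumption made here; the caption does not say how gaps were handled, and the function names are hypothetical.

```python
import math

def jukes_cantor_distance(a, b):
    """Jukes-Cantor distance between two aligned sequences, ignoring gapped sites."""
    sites = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not sites:
        raise ValueError("no ungapped sites to compare")
    p = sum(x != y for x, y in sites) / len(sites)
    if p >= 0.75:
        return math.inf  # the correction is undefined at saturation
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

def distance_bin(d, width=0.2):
    """Index k of the half-open interval [k*width; (k+1)*width) containing d."""
    # Rounding guards against floating-point error exactly on a bin boundary.
    return math.floor(round(d / width, 9))

d = jukes_cantor_distance("ACGU", "ACGA")  # one difference in four sites
k = distance_bin(d)
```

Here p = 0.25 gives a distance of roughly 0.304, which falls in the interval [0.2;0.4).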

References

1. Sprinzl,M., Horn,C., Brown,M., Ioudovitch,A. and Steinberg,S. (1998) Compilation of tRNA sequences and sequences of tRNA genes. Nucleic Acids Res., 26, 148–153.
2. Wuyts,J., De Rijk,P., Van de Peer,Y., Winkelmans,T. and De Wachter,R. (2001) The European large subunit ribosomal RNA database. Nucleic Acids Res., 29, 175–177.
3. Wuyts,J., Van de Peer,Y., Winkelmans,T. and De Wachter,R. (2002) The European database on small subunit ribosomal RNA. Nucleic Acids Res., 30, 183–185.
4. Zwieb,C., Gorodkin,J., Knudsen,B., Burks,J. and Wower,J. (2003) tmRDB (tmRNA database). Nucleic Acids Res., 31, 446–447.
5. Brown,J.W. (1999) The ribonuclease P database. Nucleic Acids Res., 27, 314.