Statistical prediction of single-stranded regions in RNA secondary structure and application to predicting effective antisense target sites and beyond

Y Ding et al. Nucleic Acids Res. 2001.

Abstract

Single-stranded regions in RNA secondary structure are important for RNA-RNA and RNA-protein interactions. We present a probability profile approach for the prediction of these regions based on a statistical algorithm for sampling RNA secondary structures. For the prediction of phylogenetically-determined single-stranded regions in secondary structures of representative RNA sequences, the probability profile offers substantial improvement over the minimum free energy structure. In designing antisense oligonucleotides, a practical problem is how to select a secondary structure for the target mRNA from the optimal structure(s) and many suboptimal structures with similar free energies. By summarizing the information from a statistical sample of probable secondary structures in a single plot, the probability profile not only presents a solution to this dilemma, but also reveals 'well-determined' single-stranded regions through the assignment of probabilities as measures of confidence in predictions. In antisense application to the rabbit beta-globin mRNA, a significant correlation between hybridization potential predicted by the probability profile and the degree of inhibition of in vitro translation suggests that the probability profile approach is valuable for the identification of effective antisense target sites. Coupling computational design with DNA-RNA array technique provides a rational, efficient framework for antisense oligonucleotide screening. This framework has the potential for high-throughput applications to functional genomics and drug target validation.

Figures

Figure 1

Probability profiles for E. coli tRNA(Ala), with sampling estimates computed from 1000 sampled secondary structures. (A) The probability profiles for single-stranded nucleotides (sequence width W = 1) indicated by the phylogenetic structure (large dots) and by the minimum free energy structure (vertical bars), estimated by the sampling algorithm (short dashed line) and computed by the Vienna RNA package (long dashed line). For the region between C5 and C25, the sampling estimate predicts the phylogenetic structure substantially better than the Vienna RNA package. (B) The probability profiles for single-stranded sequences of four consecutive nucleotides (sequence width W = 4) in E. coli tRNA(Ala) indicated by the phylogenetic structure (large dots) and by the minimum free energy structure (vertical bars), and estimated by the sampling algorithm (dashed line). The W = 4 profile cannot be computed by the Vienna RNA package or other existing algorithms.
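
The sampling estimate described in this caption can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the sampled structures are already available as dot-bracket strings (the sampling algorithm itself is not shown), and the function names `unpaired_mask` and `probability_profile` are hypothetical. For each start position, the estimate is the fraction of sampled structures in which all W consecutive nucleotides are unpaired.

```python
from typing import List


def unpaired_mask(structure: str) -> List[bool]:
    """Return True at each position where a dot-bracket structure is unpaired ('.')."""
    return [ch == "." for ch in structure]


def probability_profile(samples: List[str], width: int = 1) -> List[float]:
    """Estimate, for each start position i, the probability that
    nucleotides i .. i+width-1 are all single-stranded, as the fraction
    of sampled structures in which that window is fully unpaired.

    Assumption: `samples` holds equal-length dot-bracket strings drawn by
    some structure-sampling procedure that is outside this sketch.
    """
    if not samples:
        raise ValueError("need at least one sampled structure")
    n = len(samples[0])
    if any(len(s) != n for s in samples):
        raise ValueError("all sampled structures must have the same length")
    if not 1 <= width <= n:
        raise ValueError("width must be between 1 and the sequence length")

    counts = [0] * (n - width + 1)
    for s in samples:
        run = 0  # length of the current run of unpaired positions
        for i, free in enumerate(unpaired_mask(s)):
            run = run + 1 if free else 0
            if run >= width:
                # the window ending at i and starting at i-width+1 is unpaired
                counts[i - width + 1] += 1
    return [c / len(samples) for c in counts]


# Toy input only; these four structures are made up for illustration.
toy_samples = ["((..))", "......", "(....)", ".(..)."]
profile_w1 = probability_profile(toy_samples, width=1)
profile_w4 = probability_profile(toy_samples, width=4)
```

Passing a single structure, for example a minimum free energy structure, yields a 0/1 indicator profile, which corresponds to the vertical bars in the figure.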

Figure 2

Probability profiles (sequence width W = 4) for other representative RNA sequences, with sampling estimates computed from 1000 sampled secondary structures. For X. laevis oocyte 5S rRNA (A), the large dots present the profile indicated by the phylogenetic structure, the dashed line is the sampling estimate and the vertical bars represent the minimum free energy structure. For E. coli 16S rRNA domain II (B), E. coli RNase P (C) and Group I intron from 26S rRNA of T. thermophila (D), the small solid squares (adjacent squares appear to form line segments) present the profile indicated by the phylogenetic structure, the dashed line is the sampling estimate and the vertical bars represent the minimum free energy structure. For the Tetrahymena Group I intron, a 6 bp double-stranded region called P3 (38) in the phylogenetic structure is not considered here because of the creation of a pseudoknot. The current sampling algorithm needs to be extended to predict certain types of pseudoknots.

Figure 3

The probability profile for single-stranded sequences of four consecutive nucleotides (sequence width W = 4) estimated by 1000 sampled secondary structures (dashed line) and the profile indicated by the minimum free energy structure (vertical bars) for rabbit β-globin mRNA and the experimentally measured inhibition of ASOs in cell-free translation systems. The profile is shown for the region of the first 230 nt that is targeted by the ASOs. The length and binding sites of the ASOs are indicated by horizontal lines with the names of the ASOs centered above or below the lines. These lines also indicate the inhibition of translation through their position on the vertical axis. The vertical axis also shows the probability for the profile with inhibition corresponding to probability multiplied by 100%.
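
The comparison in this figure, between profile values over each ASO binding site and measured inhibition, can be sketched in code. The scoring rule below is an assumption made for illustration: it averages the W = 4 profile over all windows lying inside a binding site, which may differ from the hybridization potential measure used in the paper. The names `site_score` and `pearson` are hypothetical, and no experimental values are reproduced here.

```python
from math import sqrt
from typing import Sequence


def site_score(profile: Sequence[float], start: int, length: int, width: int = 4) -> float:
    """Mean profile value over all windows of `width` nucleotides that lie
    fully inside the binding site [start, start + length), 0-based.

    Assumption: `profile[i]` is the estimated probability that nucleotides
    i .. i+width-1 are all single-stranded.
    """
    last = start + length - width  # start index of the last window inside the site
    if start < 0 or last < start or last >= len(profile):
        raise ValueError("binding site does not fit the profile")
    window = profile[start:last + 1]
    return sum(window) / len(window)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return sxy / sqrt(sxx * syy)
```

In use, one would compute `site_score` for each ASO from its binding site coordinates, then call `pearson` on the list of scores and the list of measured inhibition values for the same ASOs.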

Figure 4

A potential high-throughput antisense framework for functional genomics, drug target validation and elucidation of genetic pathways. Systematic statistical analysis of DNA expression arrays and SNP databases can provide the basis for high-throughput functional analysis. Integration of computational antisense design and oligonucleotide array presents a rational, efficient, high-throughput platform for antisense oligonucleotide screening.

References

    1. Zuker,M. (1989) On finding all suboptimal foldings of an RNA molecule. Science, 244, 48–52.
    2. Zuker,M. (1989) The use of dynamic programming algorithms in RNA secondary structure prediction. In Waterman,M.S. (ed.), Mathematical Methods for DNA Sequences. CRC Press, Boca Raton, FL, pp. 159–184.
    3. Zuker,M. and Stiegler,P. (1981) Optimal computer folding of large RNA sequences using thermodynamics and auxiliary information. Nucleic Acids Res., 9, 133–148.
    4. McCaskill,J.S. (1990) The equilibrium partition function and base pair binding probabilities for RNA secondary structure. Biopolymers, 29, 1105–1119.
    5. Hofacker,I.L., Fontana,W., Stadler,P.F., Bonhöffer,S., Tacker,M. and Schuster,P. (1994) Fast folding and comparison of RNA secondary structures. Monatshefte f. Chemie, 125, 167–188.
