RSEARCH: finding homologs of single structured RNA sequences

Comparative Study

Robert J Klein et al. BMC Bioinformatics. 2003.

Abstract

Background: For many RNA molecules, secondary structure rather than primary sequence is the evolutionarily conserved feature. No programs have yet been published that allow searching a sequence database for homologs of a single RNA molecule on the basis of secondary structure.

Results: We have developed a program, RSEARCH, that takes a single RNA sequence with its secondary structure and utilizes a local alignment algorithm to search a database for homologous RNAs. For this purpose, we have developed a series of base pair and single nucleotide substitution matrices for RNA sequences called RIBOSUM matrices. RSEARCH reports the statistical confidence for each hit as well as the structural alignment of the hit. We show several examples in which RSEARCH outperforms the primary sequence search programs BLAST and SSEARCH. The primary drawback of the program is that it is slow. The C code for RSEARCH is freely available from our lab's website.

Conclusion: RSEARCH outperforms primary sequence programs in finding homologs of structured RNA sequences.

Figures

Figure 1

An example SCFG architecture. The sequence at the top folds into the specified secondary structure. At the bottom, the nodal architecture of the model that would produce this sequence is shown. Shaded triangles represent base pair emitting nodes, and point to the base pair they emit. Open triangles represent single nucleotide emitting nodes, and point to the nucleotide they emit.
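The caption above describes a nodal guide tree built from a sequence and its secondary structure. A minimal sketch of that idea follows, assuming the structure is given in dot-bracket notation. The node names `pair` and `single` mirror the caption; the `bifurcation` and `end` nodes, the nested-tuple representation, and the function names are illustrative assumptions, not taken from the RSEARCH source code.

```python
def pair_table(structure):
    """Map each paired position to its partner; unpaired positions map to None."""
    partner = [None] * len(structure)
    stack = []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            j = stack.pop()
            partner[i], partner[j] = j, i
    if stack:
        raise ValueError("unbalanced structure")
    return partner


def build_nodes(seq, structure):
    """Return a nested-tuple guide tree for seq folded into structure."""
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have equal length")
    partner = pair_table(structure)

    def build(i, j):
        if i > j:
            return ("end",)
        if partner[i] is None:   # unpaired nucleotide at the left edge
            return ("single", seq[i], build(i + 1, j))
        if partner[j] is None:   # unpaired nucleotide at the right edge
            return ("single", seq[j], build(i, j - 1))
        if partner[i] == j:      # the two edges pair with each other
            return ("pair", seq[i] + seq[j], build(i + 1, j - 1))
        k = partner[i]           # two adjacent stems: split into subtrees
        return ("bifurcation", build(i, k), build(k + 1, j))

    return build(0, len(seq) - 1)
```

For a simple hairpin such as `build_nodes("GGAACC", "((..))")`, the tree consists of two pair nodes followed by two single-nucleotide nodes and an end node.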

Figure 2

The two classes of local alignment. Each example shows how the nodal guide tree best aligns to the target sequence. At the bottom is the RSEARCH output for the alignment. On the left is an example of begin locality, while on the right is an example of end locality. The numbers next to the query sequence represent positions relative to the entire query; the numbers next to the target sequence represent positions relative to the subsequence defined in the "Target =" line.

Figure 3

The RIBOSUM85-60 matrix. The 16 × 16 matrix is used to get scores for aligning base pairs. The 4 × 4 matrix is used to get scores for aligning single-stranded regions. Positive scores are shaded.
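To illustrate how the two matrices divide the work, here is a minimal sketch that scores an ungapped alignment: paired positions are looked up in a 16 × 16 table and unpaired positions in a 4 × 4 table. The score values below are made-up placeholders, not the published RIBOSUM85-60 entries, and the function names are hypothetical. The real program performs local alignment with gaps, which this sketch does not attempt.

```python
from itertools import product

BASES = "ACGU"
PAIRS = ["".join(p) for p in product(BASES, repeat=2)]  # the 16 possible base pairs
CANONICAL = {"AU", "UA", "GC", "CG", "GU", "UG"}

# Placeholder 4 x 4 single-nucleotide scores (NOT real RIBOSUM values).
SINGLE = {(a, b): (2 if a == b else -1) for a in BASES for b in BASES}


def _toy_pair_score(q, t):
    """Placeholder rule: reward identical pairs, then compensatory changes."""
    if q == t:
        return 4
    if q in CANONICAL and t in CANONICAL:
        return 2
    return -3


# Placeholder 16 x 16 base-pair scores (NOT real RIBOSUM values).
PAIR = {(q, t): _toy_pair_score(q, t) for q in PAIRS for t in PAIRS}


def paired_positions(structure):
    """Return (i, j) index pairs for a dot-bracket structure."""
    stack, pairs = [], []
    for i, ch in enumerate(structure):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError("unbalanced structure")
    return pairs


def score_ungapped(query, structure, target):
    """Score target against query column by column, using the query's structure."""
    if not (len(query) == len(structure) == len(target)):
        raise ValueError("query, structure and target must have equal length")
    pairs = paired_positions(structure)
    in_pair = {i for pair in pairs for i in pair}
    total = 0
    for i, j in pairs:  # base-paired columns use the 16 x 16 table
        total += PAIR[(query[i] + query[j], target[i] + target[j])]
    for i in range(len(query)):  # single-stranded columns use the 4 x 4 table
        if i not in in_pair:
            total += SINGLE[(query[i], target[i])]
    return total
```

With these placeholder tables, a target that swaps a G-C pair for an A-U pair still scores positively at that pair, which is the behavior a structure-aware matrix is meant to capture.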

Figure 4

RSEARCH statistics. (a) Distribution of scores for a search against random sequences. We searched a database of 10,000 random sequences of 10,000 nucleotides each with a GC composition of 50% using the precursor to the C. elegans miRNA mir-40 as the query [60]. We took the best score found for each of the 10,000 sequences in the database and plotted their distribution. We then calculated the mean and standard deviation and plotted the Gaussian distribution for those values. We also calculated K and λ for the Gumbel distribution and plotted that distribution. (b) Average observed number of hits with E-value less than a cutoff versus reported E-value for searches of various RNase P queries against a database of Archaeal genomes. E-values were computed using partition points of 40% and 60% G+C content.
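As a rough illustration of fitting a Gumbel (extreme value) distribution to a set of best scores, the sketch below estimates the scale parameter λ and a location parameter μ by the method of moments, then converts a score to a P-value. The fitting method is an assumption chosen for brevity; the text above does not say how K and λ were estimated, and the K parameterization is omitted here. All function names are hypothetical.

```python
import math
import statistics

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def fit_gumbel_moments(scores):
    """Estimate Gumbel location mu and scale parameter lam from sample moments.

    For a Gumbel distribution, sd = pi / (lam * sqrt(6)) and
    mean = mu + EULER_GAMMA / lam.
    """
    mean = statistics.fmean(scores)
    sd = statistics.stdev(scores)
    lam = math.pi / (sd * math.sqrt(6))
    mu = mean - EULER_GAMMA / lam
    return mu, lam


def gumbel_pvalue(score, mu, lam):
    """P(S >= score) = 1 - exp(-exp(-lam * (score - mu)))."""
    return -math.expm1(-math.exp(-lam * (score - mu)))
```

Given the per-sequence best scores from a random-database search, `fit_gumbel_moments(best_scores)` returns the fitted parameters, and `gumbel_pvalue(hit_score, mu, lam)` gives the probability of seeing a score at least that high by chance in one such sequence.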

References

    1. Hentze MW, Caughman SW, Casey JL, Koeller DM, Rouault TA, Harford JB, Klausner RD. A model for the structure and functions of iron-responsive elements. Gene. 1988;72:201–8. doi: 10.1016/0378-1119(88)90145-X.
    2. Schlegl J, Gegout V, Schlager B, Hentze MW, Westhof E, Ehresmann C, Ehresmann B, Romby P. Probing the structure of the regulatory region of human transferrin receptor messenger RNA and its interaction with iron regulatory protein-1. RNA. 1997;3:1159–72.
    3. Lambert A, Lescure A, Gautheret D. A survey of metazoan selenocysteine insertion sequences. Biochimie. 2002;84:953–9. doi: 10.1016/S0300-9084(02)01441-4.
    4. Wilting R, Schorling S, Persson BC, Bock A. Selenoprotein synthesis in Archaea: identification of an mRNA element of Methanococcus jannaschii probably directing selenocysteine insertion. J Mol Biol. 1997;266:637–41. doi: 10.1006/jmbi.1996.0812.
    5. Miranda-Rios J, Navarro M, Soberon M. A conserved RNA structure (thi box) is involved in regulation of thiamin biosynthetic gene expression in bacteria. Proc Natl Acad Sci USA. 2001;98:9736–41. doi: 10.1073/pnas.161168098.
