Sno/scaRNAbase: a curated database for small nucleolar RNAs and Cajal body-specific RNAs
Abstract
Small nucleolar RNAs (snoRNAs) and Cajal body-specific RNAs (scaRNAs) are named for their subcellular localization within nucleoli and Cajal bodies (conserved subnuclear organelles present in the nucleoplasm), respectively. They have been found to play important roles in the modification and processing of rRNA, tRNA, snRNA and even mRNA. All snoRNAs fall into two categories, box C/D snoRNAs and box H/ACA snoRNAs, according to their distinct sequence and secondary structure features. Box C/D snoRNAs and box H/ACA snoRNAs mainly function in guiding 2′-O-ribose methylation and pseudouridylation, respectively. ScaRNAs possess both box C/D snoRNA and box H/ACA snoRNA sequence motif features, but guide the modification of snRNAs that are transcribed by RNA polymerase II. Here we present a Web-based sno/scaRNA database, called sno/scaRNAbase, to facilitate sno/scaRNA research by providing a more comprehensive knowledge base. Covering 1979 records derived from 85 organisms for the first time, sno/scaRNAbase not only fills gaps between existing organism-specific sno/scaRNA databases that focus on different sno/scaRNA aspects, but also gives sno/scaRNA scientists an opportunity to adopt a unified nomenclature for sno/scaRNAs. Derived from a systematic literature curation and annotation effort, sno/scaRNAbase provides an easy-to-use gateway to important sno/scaRNA features such as sequence motifs, possible functions, homologues, secondary structures, genomic organization, the chromosome location of sno/scaRNA genes, and more. Approximate searches, in addition to accurate and straightforward searches, make database searching more flexible. A BLAST search engine is implemented so that query sequences can be compared against all sno/scaRNAbase sequences. Thus sno/scaRNAbase serves as a more uniform and friendly platform for sno/scaRNA research. The database is freely available at http://gene.fudan.sh.cn/snoRNAbase.nsf.
INTRODUCTION
Small nucleolar RNAs (snoRNAs) and small Cajal body-specific RNAs (scaRNAs) have been found to play vital roles in rRNA, tRNA, snRNA and even mRNA biogenesis. Since the 1990s, a vast collection of snoRNAs in eukaryotic cells has been found to be involved in rRNA methylation and pseudouridylation (1–3). Later, in 2000, snoRNA homologues in archaea were reported to function in tRNA modification (4). In humans, brain-specific snoRNAs are responsible for guiding the modification of mRNAs (5). In 2001, a new type of modification-guiding small RNA, the Cajal body-specific RNA, was discovered; these RNAs guide the modification of snRNAs (6). Besides their functions in the modification of different RNAs, a small number of snoRNAs, such as U3, U8, U14, E1, E2 and E3, are involved in the cleavage of pre-rRNAs (7,8).
Based on distinct sequence motifs and subcellular locations, sno/scaRNAs fall into three major groups: box C/D snoRNAs, box H/ACA snoRNAs and scaRNAs (6,9–11). Box C/D snoRNAs share two short sequence motifs, box C (AUGAUGA) at the 5′ end and box D (CUGA) at the 3′ end. Two imperfect copies of these boxes, namely box C′ and box D′, have also been found in some box C/D snoRNAs. Immediately upstream of box D and/or D′ is a 10–21 nt antisense element complementary to the targeted RNA (10,12–14). Both the AUGAUGA and CUGA box motifs and the antisense element play essential roles in RNA methylation or processing (9). Within the region of complementarity between a box C/D snoRNA and its targeted RNA, the methylation site always pairs with the fifth nucleotide upstream of box D or box D′ (15,16). Box H/ACA snoRNAs contain two conserved sequence motifs, a box H (ANANNA, where N stands for any nucleotide) and a box ACA (ACANNN), and two stem–loops, one near the 5′ end and one near the 3′ end of the molecule. In the internal loop of one or both stems is an appropriate bipartite guide sequence of 4–10 nt that forms a short snoRNA–rRNA duplex flanking the target site (10,17,18). The pseudouridylation site also obeys a spacing rule: it always appears 14–16 nt upstream of box H or box ACA within the bipartite guide sequence of a box H/ACA snoRNA (17,19). Unlike box C/D and box H/ACA snoRNAs, which are located in the nucleoli, scaRNAs accumulate within the Cajal bodies (conserved subnuclear organelles that are present in the nucleoplasm) (20). Moreover, a scaRNA molecule, such as U92, ACA47, ACA11, U109 and ACA57, can possess both box C/D and box H/ACA sequence motifs (e.g. U85), guiding both the methylation and pseudouridylation of snRNAs (6).
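The box C/D spacing rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not part of sno/scaRNAbase: the function name `methylation_guides`, the fixed guide length of 10 nt (the lower bound of the 10–21 nt range) and the toy sequence are all hypothetical, box D is matched as the exact string CUGA, and only Watson–Crick pairing is considered.

```python
import re

BOX_D = "CUGA"  # box D consensus as quoted in the text
COMPLEMENT = str.maketrans("ACGU", "UGCA")  # Watson-Crick pairs only (assumption)


def methylation_guides(snorna, guide_len=10):
    """For each exact CUGA match, report the antisense element immediately
    upstream and the snoRNA nucleotide five positions upstream of the box.

    guide_len is a hypothetical fixed length; the text gives 10-21 nt.
    """
    hits = []
    for match in re.finditer(BOX_D, snorna):
        box_start = match.start()
        if box_start < guide_len:
            continue  # not enough upstream sequence to hold a guide element
        fifth = box_start - 5  # index of the fifth nucleotide upstream of the box
        hits.append({
            "box_start": box_start,
            "guide": snorna[box_start - guide_len:box_start],
            "fifth_upstream_nt": snorna[fifth],
            # Partner of that nucleotide on the target RNA, i.e. the
            # nucleotide the spacing rule predicts to be methylated.
            "target_nt": snorna[fifth].translate(COMPLEMENT),
        })
    return hits


# Toy sequence (not a real snoRNA): box C, a 15 nt spacer, box D, 2 nt tail.
toy = "AUGAUGA" + "UUACGGCAUCGAAGC" + "CUGA" + "UU"
for hit in methylation_guides(toy):
    print(hit)
```

Because every CUGA occurrence is reported, the same function also picks up candidate box D′ copies; a real motif scan would need to tolerate the imperfect box copies mentioned in the text.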
Sno/scaRNAs show high diversity in sequence, genomic organization and processing pathway across organisms (8,12,21–25). A central and comprehensive knowledge base of sno/scaRNAs will undoubtedly speed up the discovery of sno/scaRNAs and deepen our understanding of their roles. Databases of sno/scaRNAs already exist (26–28), but each focuses on only one or two organisms and on different sno/scaRNA characteristics, which makes it very inconvenient to explore sno/scaRNA features and functions from a comparative genomics point of view. In this paper, we describe a more comprehensive, uniform and curated sno/scaRNA database, sno/scaRNAbase. It contains 1979 sno/scaRNAs derived from 85 organisms, characterized in terms of sequence motifs, homologues, secondary structures, genomic organization, function that is experimentally verified or predicted, the chromosome location of the sno/scaRNA gene, and more. With its unified data format and a use-case-oriented user interface, sno/scaRNAbase allows users to browse and compare major features of known sno/scaRNAs from different organisms. It also provides the scientific community with a platform for finding a unified nomenclature for sno/scaRNAs, whose naming currently does not follow a general logic.
DATABASE CONTENT
Sno/scaRNAbase is a publicly available database of sno/scaRNAs obtained from 85 organisms in which at least one sno/scaRNA has been reported. It has been developed using Lotus Domino Designer 6.0. A total of 1979 sno/scaRNAs have been collected from various sources (the literature, GenBank searches, research group contacts, etc.). More than half of the data are not listed in the currently existing snoRNA databases. All data were further curated by trained biologists to ensure annotation quality. A sortable and searchable bibliography database of 1074 sno/scaRNA references (almost all sno/scaRNA reference publications) was used extensively during this curation process, and is now part of sno/scaRNAbase (Figure 1).
Figure 1.
The schematic illustration of the sno/scaRNAbase.
For each sno/scaRNA, we strive to extract as much information as possible from the sources mentioned above. Besides the regular sequence information (the sequence, the GenBank accession number, alias names, references, etc.), the following important sno/scaRNA features have also been taken into account in the database design: (i) conserved sequence motifs and antisense elements of sno/scaRNA families, (ii) methylation or pseudouridylation sites that a sno/scaRNA guides, (iii) the chromosome location of the sno/scaRNA gene, (iv) genomic organization, (v) function that is experimentally verified or predicted, (vi) other highly similar sequences in sno/scaRNAbase, and (vii) predicted secondary structure.
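To make the list of features concrete, one way to picture a single record is as a flat data structure with one field per feature. The sketch below is purely illustrative: the class name `SnoScaRecord`, every field name and the placeholder values are assumptions, and they do not reflect the actual Lotus Domino document design used by sno/scaRNAbase.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SnoScaRecord:
    """Hypothetical in-memory view of one sno/scaRNAbase record."""
    # Regular sequence information
    name: str
    rna_class: str            # "box C/D snoRNA", "box H/ACA snoRNA" or "scaRNA"
    organism: str
    sequence: str
    genbank_accession: Optional[str] = None
    aliases: List[str] = field(default_factory=list)
    references: List[str] = field(default_factory=list)
    # Features (i)-(vii) from the text
    sequence_motifs: Dict[str, str] = field(default_factory=dict)   # (i)
    antisense_elements: List[str] = field(default_factory=list)     # (i)
    modification_sites: List[str] = field(default_factory=list)     # (ii)
    chromosome_location: Optional[str] = None                       # (iii)
    genomic_organization: Optional[str] = None                      # (iv)
    function: Optional[str] = None                                  # (v)
    function_verified: Optional[bool] = None                        # (v)
    similar_sequences: List[str] = field(default_factory=list)      # (vi)
    secondary_structure: Optional[str] = None                       # (vii)

    @property
    def length(self) -> int:
        """Sequence length in nucleotides."""
        return len(self.sequence)


# Placeholder record; the values are invented for illustration only.
example = SnoScaRecord(
    name="toy-1",
    rna_class="box C/D snoRNA",
    organism="Example organism",
    sequence="AUGAUGAUUACGGCAUCGAAGCCUGAUU",
    sequence_motifs={"box C": "AUGAUGA", "box D": "CUGA"},
)
print(example.name, example.length)
```

Optional fields default to empty values because, as noted in the text, not every feature is available for every sno/scaRNA.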
DATABASE OUTLINE
The sno/scaRNAbase browse/search page
A comprehensive interface was designed for exploring and searching the different types of sno/scaRNAs (Figures 1 and 2). Each record is linked to a detailed sno/scaRNA feature page (see ‘The sno/scaRNA record page’ below). We provide the following sorting pages.
- All by Organism is an overview of all sno/scaRNAs grouped by organism. This view is collapsed by organism by default, and expands when users click the triangle next to an organism name. This allows users to see all available sno/scaRNAs in a given organism.
- Box C/D snoRNAs, box H/ACA snoRNAs and scaRNAs are pages specific to the three types of sno/scaRNAs. These browsing pages provide general information, such as the sno/scaRNA name, references, GenBank entry, sequence length and the organism from which a sno/scaRNA was isolated. Detailed information, including possible functions, sequence motifs and organization, is available by clicking the link associated with a sno/scaRNA name.
- The Search, Home and Help buttons on this interface link to the search form, home page and help page, respectively. The search page, as described below, supports both accurate and approximate searches to make database searching more flexible. Approximate searching also helps identify inconsistencies in the current nomenclature.
Figure 2.
The sno/scaRNAbase browse page. a. Browsing result ordered by organism. b. Sorting buttons. c. Browsing selection I: browsing sno/scaRNAs by organism. d. A collapsing or expanding button. e. Browsing selection II: browsing sno/scaRNAs by three categories. f. Buttons for viewing previous or next pages.
The sno/scaRNAbase search engine
Searching sno/scaRNAbase is straightforward. A full-text search looks for user-defined keywords in any field of any sno/scaRNAbase record. For example, a full-text search for ‘ctga’ will return sno/scaRNAs with ‘ctga’ in the box D or D′ fields. Full-text search is necessary because different sno/scaRNAs show different features and these features are sometimes documented differently in the references, so it is impractical to provide a field-specific search over all record fields through a uniform search form.
To make database searches flexible, and to better track those sno/scaRNAs that are inconsistently documented in the original publications, we provide three options for controlling search results. The first is approximate searching, which either treats a keyword as a root word and retrieves all sno/scaRNAs containing any word derived from this root, or takes into account all words spelled similarly to the keyword and returns any sno/scaRNAs containing these words. The former, called a ‘word variants search’, is necessary because of the presence of multiple copies of sno/scaRNAs and the inconsistency of sno/scaRNA records. When searching for a snoRNA name, it returns the different copies of the snoRNA, which are usually distinguished by a suffix added to the snoRNA name. For example, entering ‘U14’ in the sno/scaRNA name field will return U14.1, U14.2, U14.3, U14.4, etc. The latter, called a ‘fuzzy search’, finds words that are spelled similarly to a keyword. For instance, a full-text search for ‘tgatga’ in Arabidopsis thaliana, with the word variants option selected and the maximum number of records to return set to 100, returns 87 hits, whereas it returns 98 hits if the fuzzy search option is selected. Those 98 hits include snoRNAs containing ‘tgatga’, ‘tgacga’ (e.g. snoR4-2), ‘tgatgg’ (e.g. snoR101), ‘cgatga’ (e.g. snoR27), etc. This is especially useful when searching for sno/scaRNAs with a certain sequence motif. The other two options for controlling search results are Max Number of documents to return, and Show results in order of relevance, newest first (listing the record most recently added to sno/scaRNAbase first) or oldest first (listing the record added earliest first).
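The difference between the two approximate search modes can be illustrated with a small Python sketch. It is not the search engine behind sno/scaRNAbase, whose matching rules are not described here. As stated assumptions, a ‘word variant’ is modelled as any word that starts with the keyword, and a ‘fuzzy’ match as any word of the same length that differs from the keyword at no more than one position; the function names and the word lists are hypothetical.

```python
def word_variants_search(keyword, words):
    """Treat the keyword as a root and keep every word that starts with it.

    Prefix matching is an assumed stand-in for the real variant rules.
    """
    root = keyword.lower()
    return [w for w in words if w.lower().startswith(root)]


def mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def fuzzy_search(keyword, words, max_mismatches=1):
    """Keep words of the same length with at most max_mismatches substitutions."""
    key = keyword.lower()
    return [
        w for w in words
        if len(w) == len(key) and mismatches(w.lower(), key) <= max_mismatches
    ]


# Hypothetical name list: the suffixed copies of U14 are found from the root.
names = ["U14.1", "U14.2", "U3", "U14.3", "U14.4", "U24"]
print(word_variants_search("U14", names))

# The motif variants quoted in the text each differ from 'tgatga' at one position.
motifs = ["tgatga", "tgacga", "tgatgg", "cgatga", "ctgaaa"]
print(fuzzy_search("tgatga", motifs))
```

Plain prefix matching would also return a name such as ‘U140’ for the root ‘U14’, so a production search would need a stricter notion of word variant than this sketch uses.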
The search result page returns not only a list of sno/scaRNAs that are further linked to detailed sno/scaRNA record pages, but also the search string that was used, which is useful for users to refine searches.
The sno/scaRNA record page
An example of a sno/scaRNA record page is shown in Figure 3. Where the data are available, each sno/scaRNA record contains the following information: sno/scaRNA name, other names, class, nucleotide sequence, sequence length, GenBank accession number, PubMed references, the organism from which the sno/scaRNA was isolated, possible homologues, and predicted secondary structure. The secondary structures were calculated by RNAfold (29), which has proved remarkably effective in predicting RNA structures (30). Possible homologues are determined by Blastn. Hits with high similarity (currently E-value < 2e−0.5 and bit score >40 are used as thresholds) indicate, to a certain degree, that the sequences were possibly derived from a common ancestor. To help users understand these relationships, the highly similar sno/scaRNAs, together with the organisms from which they were isolated, are listed in order of descending similarity. In this way, users can analyze different copies of a sno/scaRNA in one organism and its homologues in other organisms. The record page is also linked to GenBank sequence records, PubMed references, and the GenBank taxonomy site.
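The homologue-selection step can be sketched as a filter over BLAST hits followed by a sort. The sketch below is an illustration only: the `BlastHit` type, the function name `possible_homologues`, the hit values and the use of bit score as the ranking key for ‘descending similarity’ are all assumptions. The E-value cutoff is left as a required argument rather than hard-coded, and the cutoff used in the example is arbitrary.

```python
from typing import List, NamedTuple


class BlastHit(NamedTuple):
    """Hypothetical minimal view of one Blastn hit."""
    subject: str
    organism: str
    evalue: float
    bit_score: float


def possible_homologues(hits: List[BlastHit], max_evalue: float,
                        min_bit_score: float = 40.0) -> List[BlastHit]:
    """Keep hits that pass both thresholds, most similar first.

    Ranking by bit score is an assumption about what 'descending
    similarity' means; the bit score cutoff of 40 is taken from the text.
    """
    kept = [
        h for h in hits
        if h.evalue < max_evalue and h.bit_score > min_bit_score
    ]
    return sorted(kept, key=lambda h: h.bit_score, reverse=True)


# Invented hits for illustration only.
hits = [
    BlastHit("hit-B", "organism 2", 1e-6, 48.0),
    BlastHit("hit-A", "organism 1", 1e-20, 95.0),
    BlastHit("hit-C", "organism 3", 0.5, 22.0),
    BlastHit("hit-D", "organism 1", 1e-10, 40.0),  # fails the strict > 40 test
]
for hit in possible_homologues(hits, max_evalue=1e-3):  # illustrative cutoff
    print(hit.subject, hit.organism, hit.bit_score)
```

Keeping the organism alongside each hit mirrors the record page, where copies in the same organism and homologues in other organisms appear in one ranked list.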
Figure 3.
An example of a sno/scaRNA record page. a. Links to home/main pages, the previous/next sno/scaRNA record page, and the help page. b. A selected sno/scaRNA name. c. Possible homologues found in the sno/scaRNAbase. d. A link to the predicted secondary structure.
FUTURE DEVELOPMENTS
Sno/scaRNAbase is a periodically updated database dedicated to understanding sno/scaRNAs. More updated records, as well as more useful links (e.g. GeneCards and Genelinx), will be added to make the sno/scaRNAbase a more comprehensive knowledge base. In addition, we will merge duplicated entries reported from different sources, and plan to add experimentally verified sno/scaRNA secondary structure data. Further, we hope this database helps our explorations of sno/scaRNA functions and facilitates the genetic characterization of novel sno/scaRNAs, especially from the evolutionary point of view.
Acknowledgments
We thank Shanghai R&D Public Service Platform and Natural Science Foundations of China (No. 30300059, 30470356) for financial support. We also thank anonymous referees for their helpful comments and suggestions. Funding to pay the Open Access publication charges for this article was provided by Natural Science Foundations of China (No. 30470356).
Conflict of interest statement. None declared.
REFERENCES
- 1. Bachellerie J.P., Cavaille J., Huttenhofer A. The expanding snoRNA world. Biochimie. 2002;84:775–790. doi: 10.1016/s0300-9084(02)01402-5.
- 2. Kiss T. Small nucleolar RNAs: an abundant group of noncoding RNAs with diverse cellular functions. Cell. 2002;109:145–148. doi: 10.1016/s0092-8674(02)00718-3.
- 3. Lafontaine D.L., Bousquet-Antonelli C., Henry Y., Caizergues-Ferrer M., Tollervey D. The box H + ACA snoRNAs carry Cbf5p, the putative rRNA pseudouridine synthase. Genes Dev. 1998;12:527–537. doi: 10.1101/gad.12.4.527.
- 4. Clouet d'Orval B., Bortolin M.L., Gaspin C., Bachellerie J.P. Box C/D RNA guides for the ribose methylation of archaeal tRNAs. The tRNATrp intron guides the formation of two ribose-methylated nucleosides in the mature tRNATrp. Nucleic Acids Res. 2001;29:4518–4529. doi: 10.1093/nar/29.22.4518.
- 5. Cavaille J., Buiting K., Kiefmann M., Lalande M., Brannan C.I., Horsthemke B., Bachellerie J.P., Brosius J., Huttenhofer A. Identification of brain-specific and imprinted small nucleolar RNA genes exhibiting an unusual genomic organization. Proc. Natl Acad. Sci. USA. 2000;97:14311–14316. doi: 10.1073/pnas.250426397.
- 6. Jady B.E., Kiss T. A small nucleolar guide RNA functions both in 2′-O-ribose methylation and pseudouridylation of the U5 spliceosomal RNA. EMBO J. 2001;20:541–551. doi: 10.1093/emboj/20.3.541.
- 7. Venema J., Vos H.R., Faber A.W., van Venrooij W.J., Raue H.A. Yeast Rrp9p is an evolutionarily conserved U3 snoRNP protein essential for early pre-rRNA processing cleavages and requires box C for its association. RNA. 2000;6:1660–1671. doi: 10.1017/s1355838200001369.
- 8. Grandi P., Rybin V., Bassler J., Petfalski E., Strauss D., Marzioch M., Schafer T., Kuster B., Tschochner H., Tollervey D., et al. 90S pre-ribosomes include the 35S pre-rRNA, the U3 snoRNP, and 40S subunit processing factors but predominantly lack 60S synthesis factors. Mol. Cell. 2002;10:105–115. doi: 10.1016/s1097-2765(02)00579-8.
- 9. Weinstein L.B., Steitz J.A. Guided tours: from precursor snoRNA to functional snoRNP. Curr. Opin. Cell Biol. 1999;11:378–384. doi: 10.1016/S0955-0674(99)80053-2.
- 10. Balakin A.G., Smith L., Fournier M.J. The RNA world of the nucleolus: two major families of small RNAs defined by different box elements with related functions. Cell. 1996;86:823–834. doi: 10.1016/s0092-8674(00)80156-7.
- 11. Darzacq X., Jady B.E., Verheggen C., Kiss A.M., Bertrand E., Kiss T. Cajal body-specific small nuclear RNAs: a novel class of 2′-O-methylation and pseudouridylation guide RNAs. EMBO J. 2002;21:2746–2756. doi: 10.1093/emboj/21.11.2746.
- 12. Bachellerie J.P., Caffarelli J., Qu L.H. The Ribosome: Structure, Function, Antibiotics and Cellular Interactions. Washington, DC: ASM Press; 2000. Nucleotide modifications of eukaryotic rRNAs: the world of small nucleolar RNA guides revisited; pp. 191–203.
- 13. Bachellerie J.P., Cavaille J. Guiding ribose methylation of rRNA. Trends Biochem. Sci. 1997;22:257–261. doi: 10.1016/s0968-0004(97)01057-8.
- 14. Maxwell E.S., Fournier M.J. The small nucleolar RNAs. Annu. Rev. Biochem. 1995;64:897–934. doi: 10.1146/annurev.bi.64.070195.004341.
- 15. Kiss-Laszlo Z., Henry Y., Bachellerie J.P., Caizergues-Ferrer M., Kiss T. Site-specific ribose methylation of preribosomal RNA: a novel function for small nucleolar RNAs. Cell. 1996;85:1077–1088. doi: 10.1016/s0092-8674(00)81308-2.
- 16. Cavaille J., Nicoloso M., Bachellerie J.P. Targeted ribose methylation of RNA in vivo directed by tailored antisense RNA guides. Nature. 1996;383:732–735. doi: 10.1038/383732a0.
- 17. Bortolin M.L., Ganot P., Kiss T. Elements essential for accumulation and function of small nucleolar RNAs directing site-specific pseudouridylation of ribosomal RNAs. EMBO J. 1999;18:457–469. doi: 10.1093/emboj/18.2.457.
- 18. Ganot P., Caizergues-Ferrer M., Kiss T. The family of box ACA small nucleolar RNAs is defined by an evolutionarily conserved secondary structure and ubiquitous sequence elements essential for RNA accumulation. Genes Dev. 1997;11:941–956. doi: 10.1101/gad.11.7.941.
- 19. Ganot P., Bortolin M.L., Kiss T. Site-specific pseudouridine formation in preribosomal RNA is guided by small nucleolar RNAs. Cell. 1997;89:799–809. doi: 10.1016/s0092-8674(00)80263-9.
- 20. Matera A.G. Of coiled bodies, gems, and salmon. J. Cell Biochem. 1998;70:181–192.
- 21. Brown J.W., Echeverria M., Qu L.H. Plant snoRNAs: functional evolution and new modes of gene expression. Trends Plant Sci. 2003;8:42–49. doi: 10.1016/s1360-1385(02)00007-9.
- 22. Dheur S., Vo le T.A., Voisinet-Hakil F., Minet M., Schmitter J.M., Lacroute F., Wyers F., Minvielle-Sebastia L. Pti1p and Ref2p found in association with the mRNA 3′ end formation complex direct snoRNA maturation. EMBO J. 2003;22:2831–2840. doi: 10.1093/emboj/cdg253.
- 23. Rebane A., Tamme R., Laan M., Pata I., Metspalu A. A novel snoRNA (U73) is encoded within the introns of the human and mouse ribosomal protein S3a genes. Gene. 1998;210:255–263. doi: 10.1016/s0378-1119(98)00070-5.
- 24. Runte M., Huttenhofer A., Gross S., Kiefmann M., Horsthemke B., Buiting K. The IC-SNURF-SNRPN transcript serves as a host for multiple small nucleolar RNA species and as an antisense RNA for UBE3A. Hum. Mol. Genetics. 2001;10:2687–2700. doi: 10.1093/hmg/10.23.2687.
- 25. Omer A.D., Lowe T.M., Russell A.G., Ebhardt H., Eddy S.R., Dennis P.P. Homologs of small nucleolar RNAs in Archaea. Science. 2000;288:517–522. doi: 10.1126/science.288.5465.517.
- 26. Samarsky D.A., Fournier M.J. A comprehensive database for the small nucleolar RNAs from Saccharomyces cerevisiae. Nucleic Acids Res. 1999;27:161–164. doi: 10.1093/nar/27.1.161.
- 27. Lestrade L., Weber M.J. snoRNA-LBME-db, a comprehensive database of human H/ACA and C/D box snoRNAs. Nucleic Acids Res. 2006;34:D158–D162. doi: 10.1093/nar/gkj002.
- 28. Brown J.W., Echeverria M., Qu L.H., Lowe T.M., Bachellerie J.P., Huttenhofer A., Kastenmayer J.P., Green P.J., Shaw P., Marshall D.F. Plant snoRNA database. Nucleic Acids Res. 2003;31:432–435. doi: 10.1093/nar/gkg009.
- 29. Hofacker I.L., Fontana W., Stadler P.F., Bonhoeffer L.S., Tacker M., Schuster P. Fast folding and comparison of RNA secondary structures. Monatshefte Fur. Chemie. 1994;125:167–188.
- 30. Zuker M. Calculating nucleic acid secondary structure. Curr. Opin. Struct. Biol. 2000;10:303–310. doi: 10.1016/s0959-440x(00)00088-9.