Non-Coding RNA Analysis Using the Rfam Database
Ioanna Kalvari et al. Curr Protoc Bioinformatics. 2018 Jun.
Abstract
Rfam is a database of non-coding RNA families in which each family is represented by a multiple sequence alignment, a consensus secondary structure, and a covariance model. Using a combination of manual and literature-based curation and a custom software pipeline, Rfam converts descriptions of RNA families found in the scientific literature into computational models that can be used to annotate RNAs belonging to those families in any DNA or RNA sequence. Valuable research outputs that are often locked up in figures and supplementary information files are encapsulated in Rfam entries and made accessible through the Rfam Web site. The data produced by Rfam have a broad application, from genome annotation to providing training sets for algorithm development. This article gives an overview of how to search and navigate the Rfam Web site, and how to annotate sequences with RNA families. The Rfam database is freely available at http://rfam.org. © 2018 by John Wiley & Sons, Inc.
Keywords: Infernal; RNA family; Rfam; genome annotation; non-coding RNA.
Figures
Figure 12.5.1.
Search box on the Rfam home page.
Figure 12.5.2.
Example text search showing riboswitch families with a known 3D structure that are found in one or more Bacillus species. For each family the secondary structure, the number of annotated sequences, and the number of species where the family is found are displayed.
Figure 12.5.3.
Browsing ncRNA clans (left) and motifs (right). For each clan the number of families is shown and the results can be sorted by the number of member families. For each motif the number of families where the motif occurs is shown and motifs can be sorted accordingly.
Figure 12.5.4.
Searching for the human genome (left) and viewing the human genome summary page (right).
Figure 12.5.5.
Searching for ncRNA families found in the human genome.
Figure 12.5.6.
Browsing human snoRNA sequences. The results can be sorted by bit score or E-value (see Guidelines for Understanding Results for more information).
Figure 12.5.7.
Sequence summary page of a human SCARNA2 sequence. The embedded genome browser shows the location of a small Cajal body-specific RNA 2 sequence (RF01268) on chromosome 1.
Figure 12.5.8.
Summary tab for the SAM riboswitch family page (RF00162) showing a Wikipedia article about the family and a list of Rfam clans the family belongs to.
Figure 12.5.9.
The Alignment tab enables viewing and downloading the Seed alignment in several formats.
Figure 12.5.10.
Viewing part of the Seed alignment for SAM riboswitch. The alignment is colored by secondary structure helical regions.
Figure 12.5.11.
A list of sequence regions that belong to a family can be found in the Sequences tab.
Figure 12.5.12.
R-scape secondary structure visualisations for the SAM riboswitch (RF00162) shown in the Secondary structure tab. Two structures are shown: On the left, the R-scape analysis of the current secondary structure in the Rfam Seed alignment. On the right, an R-scape optimised structure predicted using the statistically significant covarying basepairs as folding constraints.
Figure 12.5.13.
Secondary structure of the SAM riboswitch (RF00162) colored by sequence conservation (conserved nucleotides are red, variable nucleotides are blue).
Figure 12.5.14.
R-chie visualisation of the Seed alignment and the consensus secondary structure of the SAM riboswitch (RF00162). Canonical basepairs are shown as blue arcs, and the green alignment columns indicate valid basepairs. This visualisation suggests that the Seed alignment is of reasonable quality.
Figure 12.5.15.
The Structures tab lists the 3D structures from the Protein Data Bank that match the SAM riboswitch Rfam family (RF00162).
Figure 12.5.16.
Sunburst representation of the taxonomic distribution of the SAM riboswitch family (RF00162).
Figure 12.5.17.
Curation tab of the SAM riboswitch family (RF00162) showing the source of the Seed alignment and the secondary structure, the authors of the family, and the parameters used to build the covariance model.
Figure 12.5.18.
Sequence search results showing an alignment of the query sequence (the #SEQ line) to the covariance model of the tRNA family (the #CM line). The secondary structure predicted for the query sequence is shown in the #SS line.
Figure 12.5.19.
Batch sequence search interface.
Figure 12.5.20.
Batch search results in a tabular format showing ncRNA families found in Hepatitis delta virus genotype III (L22063.1).
Figure 12.5.21.
Example output of an SQL query showing Rfam accessions (rfam_acc), sequence accessions (rfamseq_acc), the start and stop coordinates of the ncRNAs relative to the sequence accessions (seq_start and seq_end), and bit score (bit_score, see Background information for more details about bit scores).
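A query returning the columns named in this caption could look like the minimal sketch below. It is not the query used in the article: the table name `full_region` is an assumption about the Rfam schema, the helper names `demo_connection` and `top_hits` are hypothetical, and the rows are made-up placeholders loaded into an in-memory SQLite database so the example runs without network access.

```python
import sqlite3


def demo_connection():
    """Create an in-memory table with the columns named in the caption.

    The table name (full_region) is an assumption, and the rows are
    made-up placeholders, not real Rfam annotations.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE full_region ("
        "rfam_acc TEXT, rfamseq_acc TEXT, "
        "seq_start INTEGER, seq_end INTEGER, bit_score REAL)"
    )
    conn.executemany(
        "INSERT INTO full_region VALUES (?, ?, ?, ?, ?)",
        [
            ("RF00162", "EXAMPLE_A.1", 100, 205, 71.5),
            ("RF00162", "EXAMPLE_B.1", 836, 940, 88.2),
            ("RF00005", "EXAMPLE_A.1", 10, 82, 60.0),
        ],
    )
    return conn


def top_hits(conn, rfam_acc, limit=5):
    """Return hits for one family, highest bit score first."""
    query = (
        "SELECT rfam_acc, rfamseq_acc, seq_start, seq_end, bit_score "
        "FROM full_region "
        "WHERE rfam_acc = ? "
        "ORDER BY bit_score DESC "
        "LIMIT ?"
    )
    return conn.execute(query, (rfam_acc, limit)).fetchall()


if __name__ == "__main__":
    for row in top_hits(demo_connection(), "RF00162"):
        print(row)
```

The same SELECT statement could be sent to a real Rfam database connection once the actual table and column names have been checked against the current schema.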
Figure 12.5.22.
Building an RNA family using Infernal. The Seed alignment is a starting point used to build a covariance model (CM), which is then used to search for more hits in a large sequence database. The hits may be added to the Seed alignment, if necessary. The Full alignment is an alignment of all sequences in a family. Cmbuild, cmsearch, and cmalign are Infernal programs used for building CMs, searching sequence databases, and aligning sequences to the CMs, respectively.
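The workflow in this caption can be sketched as a sequence of Infernal command lines. The sketch below only assembles the commands and does not run them. The function name `infernal_commands` and all file names are hypothetical. The `cmcalibrate` step is not named in the caption; it is included on the assumption that the model should be calibrated before `cmsearch` reports E-values.

```python
from pathlib import Path


def infernal_commands(seed_sto, seq_db, hits_fasta, workdir="."):
    """Assemble Infernal command lines for a seed-to-full-alignment run.

    seed_sto   : Seed alignment in Stockholm format
    seq_db     : sequence database to search (FASTA)
    hits_fasta : sequences of accepted hits (hypothetical file that the
                 user extracts from the search results)
    """
    work = Path(workdir)
    stem = Path(seed_sto).stem
    cm = str(work / f"{stem}.cm")
    tblout = str(work / f"{stem}.tblout")
    full_sto = str(work / f"{stem}.full.sto")
    return [
        # Build a covariance model from the Seed alignment.
        ["cmbuild", cm, seed_sto],
        # Calibrate the model (assumed step, not named in the caption).
        ["cmcalibrate", cm],
        # Search the database and write a tabular hit summary.
        ["cmsearch", "--tblout", tblout, cm, seq_db],
        # Align the accepted hits to the model to get a Full alignment.
        ["cmalign", "-o", full_sto, cm, hits_fasta],
    ]


if __name__ == "__main__":
    for cmd in infernal_commands("SAM.sto", "genomes.fa", "SAM.hits.fa"):
        print(" ".join(cmd))
```

If Infernal is installed, each list can be passed to `subprocess.run(cmd, check=True)` in order. Deciding which hits go into `hits_fasta`, or back into the Seed alignment, remains a manual curation step.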