Non-Coding RNA Analysis Using the Rfam Database

Ioanna Kalvari et al. Curr Protoc Bioinformatics. 2018 Jun.

Abstract

Rfam is a database of non-coding RNA families in which each family is represented by a multiple sequence alignment, a consensus secondary structure, and a covariance model. Using a combination of manual and literature-based curation and a custom software pipeline, Rfam converts descriptions of RNA families found in the scientific literature into computational models that can be used to annotate RNAs belonging to those families in any DNA or RNA sequence. Valuable research outputs that are often locked up in figures and supplementary information files are encapsulated in Rfam entries and made accessible through the Rfam Web site. The data produced by Rfam have broad applications, from genome annotation to providing training sets for algorithm development. This article gives an overview of how to search and navigate the Rfam Web site, and how to annotate sequences with RNA families. The Rfam database is freely available at http://rfam.org. © 2018 by John Wiley & Sons, Inc.

Keywords: Infernal; RNA family; Rfam; genome annotation; non-coding RNA.

Figures

Figure 12.5.1.

Search box on the Rfam home page.

Figure 12.5.2.

Example text search showing riboswitch families with a known 3D structure that are found in one or more Bacillus species. For each family the secondary structure, the number of annotated sequences, and the number of species where the family is found are displayed.

Figure 12.5.3.

Browsing ncRNA clans (left) and motifs (right). For each clan the number of families is shown and the results can be sorted by the number of member families. For each motif the number of families where the motif occurs is shown and motifs can be sorted accordingly.

Figure 12.5.4.

Searching for the human genome (left) and viewing the human genome summary page (right).

Figure 12.5.5.

Searching for ncRNA families found in the human genome.

Figure 12.5.6.

Browsing human snoRNA sequences. The results can be sorted by bit score or E-value (see Guidelines for Understanding Results for more information).
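Once hits are available as structured records, both orderings are simple sorts. The Python sketch below assumes a hypothetical Hit record with seq_acc, bit_score, and evalue fields and uses invented values: a higher bit score and a lower E-value both indicate a stronger match, so the two sorts run in opposite directions.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    # Hypothetical record for one annotated sequence region.
    seq_acc: str
    bit_score: float
    evalue: float


def sort_by_bit_score(hits):
    # Higher bit score means a stronger match: sort descending.
    return sorted(hits, key=lambda h: h.bit_score, reverse=True)


def sort_by_evalue(hits):
    # Lower E-value means a more significant match: sort ascending.
    return sorted(hits, key=lambda h: h.evalue)


if __name__ == "__main__":
    # Invented values for illustration only; not Rfam data.
    hits = [Hit("SEQ_A", 45.2, 1e-8), Hit("SEQ_B", 80.1, 3e-20), Hit("SEQ_C", 30.7, 2e-3)]
    print([h.seq_acc for h in sort_by_bit_score(hits)])
    print([h.seq_acc for h in sort_by_evalue(hits)])
```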

Figure 12.5.7.

Sequence summary page of a human SCARNA2 sequence. The embedded genome browser shows the location of a small Cajal body-specific RNA 2 sequence (RF01268) on chromosome 1.

Figure 12.5.8.

Summary tab for the SAM riboswitch family page (RF00162) showing a Wikipedia article about the family and a list of Rfam clans the family belongs to.

Figure 12.5.9.

The Alignment tab enables viewing and downloading the Seed alignment in several formats.

Figure 12.5.10.

Viewing part of the Seed alignment for SAM riboswitch. The alignment is colored by secondary structure helical regions.
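Rfam distributes Seed alignments in Stockholm format, where each sequence line holds a name and an aligned sequence, the #=GC SS_cons line holds the consensus secondary structure, and // ends the alignment. The following is a minimal Python sketch of a reader for a single alignment: it concatenates interleaved blocks and skips every other annotation line. For real work, an established parser such as Biopython's Bio.AlignIO.read(handle, "stockholm") is the safer choice.

```python
def parse_stockholm(text):
    """Return (sequences, ss_cons) for the first alignment in a Stockholm text.

    Sequences split across interleaved blocks are concatenated in order.
    Annotation lines other than '#=GC SS_cons' are ignored.
    """
    seqs = {}
    ss_cons = ""
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # blank lines separate interleaved blocks
        if line == "//":
            break  # end of the alignment
        parts = line.split()
        if parts[0] == "#=GC" and len(parts) >= 3 and parts[1] == "SS_cons":
            ss_cons += parts[2]  # consensus structure, one block at a time
        elif line.startswith("#"):
            continue  # header and other markup (#=GF, #=GS, #=GR, other #=GC)
        elif len(parts) >= 2:
            name, aligned = parts[0], parts[1]
            seqs[name] = seqs.get(name, "") + aligned
    return seqs, ss_cons
```

Because Python dictionaries keep insertion order, the sequences come back in the order they appear in the file.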

Figure 12.5.11.

A list of sequence regions that belong to a family can be found in the Sequences tab.

Figure 12.5.12.

R-scape secondary structure visualisations for the SAM riboswitch (RF00162) shown in the Secondary structure tab. Two structures are shown: on the left, the R-scape analysis of the current secondary structure in the Rfam Seed alignment; on the right, an R-scape-optimised structure predicted using the statistically significant covarying basepairs as folding constraints.
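Covariation evidence rests on alignment columns whose nucleotides change together. R-scape uses its own covariation statistics and significance testing, which are not reproduced here. As a simpler illustration of the underlying idea only, this Python sketch computes the mutual information between two alignment columns i and j, MI = sum over (a, b) of p(a,b) * log2(p(a,b) / (p(a) * p(b))), in bits. Mutual information rewards any correlated change, whether or not it preserves pairing, which is one reason dedicated tools go further.

```python
import math
from collections import Counter

NUCLEOTIDES = set("ACGU")


def _column(seqs, i):
    # Upper-case and map T to U so DNA and RNA alignments are treated alike.
    return [s[i].upper().replace("T", "U") for s in seqs]


def mutual_information(seqs, i, j):
    """Mutual information (bits) between alignment columns i and j.

    Rows with a gap or any non-ACGU character in either column are skipped.
    """
    pairs = [
        (a, b)
        for a, b in zip(_column(seqs, i), _column(seqs, j))
        if a in NUCLEOTIDES and b in NUCLEOTIDES
    ]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    mi = 0.0
    for (a, b), count in joint.items():
        p_ab = count / n
        mi += p_ab * math.log2(p_ab / ((left[a] / n) * (right[b] / n)))
    return mi
```

Perfectly covarying columns with four equally frequent nucleotides give 2 bits; a column that never varies gives 0 bits against any other column.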

Figure 12.5.13.

Secondary structure of the SAM riboswitch (RF00162) colored by sequence conservation (conserved nucleotides are red, variable nucleotides are blue).
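A conservation coloring of this kind can be approximated from the Seed alignment by scoring each column by the frequency of its most common nucleotide. The Python sketch below is an illustration under stated assumptions, not Rfam's coloring method: gap characters are ignored, all sequences are assumed to be aligned to the same length, and the 0.9 cutoff separating "conserved" (red) from "variable" (blue) is an arbitrary, hypothetical choice.

```python
from collections import Counter

GAPS = set("-.")


def column_conservation(seqs):
    """Frequency of the most common nucleotide in each alignment column.

    Assumes equal-length aligned sequences; gaps are excluded and
    all-gap columns score 0.0.
    """
    scores = []
    for i in range(len(seqs[0])):
        residues = [s[i].upper() for s in seqs if s[i] not in GAPS]
        if not residues:
            scores.append(0.0)
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(residues))
    return scores


def color_for(score, cutoff=0.9):
    # Hypothetical two-color scheme: conserved columns red, others blue.
    return "red" if score >= cutoff else "blue"
```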

Figure 12.5.14.

R-chie visualisation of the Seed alignment and the consensus secondary structure of the SAM riboswitch (RF00162). Canonical basepairs are shown as blue arcs, and the green alignment columns indicate valid basepairs. This visualisation suggests that the Seed alignment is of reasonable quality.
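The check behind such a view can be sketched in two steps: recover the basepairs from the consensus structure string, then count how many sequences form a canonical pair at each pair of columns. This Python sketch assumes the bracket convention used in Stockholm SS_cons lines, where matching <>, (), [], and {} denote pairs and every other character is treated as unpaired (so pseudoknot letter pairs such as Aa are ignored). Counting G-U wobble pairs as canonical alongside Watson-Crick pairs is an assumption about what makes a basepair valid; R-chie's own rules may differ.

```python
OPENERS = {"<": ">", "(": ")", "[": "]", "{": "}"}
CLOSERS = {close: open_ for open_, close in OPENERS.items()}
# Watson-Crick pairs plus G-U wobble (an assumption, see above).
CANONICAL = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def parse_pairs(structure):
    """Return sorted (i, j) basepairs from a bracket-notation structure."""
    stacks = {opener: [] for opener in OPENERS}  # one stack per bracket type
    pairs = []
    for j, ch in enumerate(structure):
        if ch in OPENERS:
            stacks[ch].append(j)
        elif ch in CLOSERS:
            stack = stacks[CLOSERS[ch]]
            if not stack:
                raise ValueError(f"unbalanced {ch!r} at position {j}")
            pairs.append((stack.pop(), j))
    unclosed = [opener for opener, stack in stacks.items() if stack]
    if unclosed:
        raise ValueError(f"unclosed brackets: {unclosed}")
    return sorted(pairs)


def _norm(ch):
    return ch.upper().replace("T", "U")


def canonical_fraction(seqs, structure):
    """For each basepair, the fraction of sequences forming a canonical pair."""
    result = {}
    for i, j in parse_pairs(structure):
        ok = sum((_norm(s[i]), _norm(s[j])) in CANONICAL for s in seqs)
        result[(i, j)] = ok / len(seqs)
    return result
```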

Figure 12.5.15.

The Structures tab lists the 3D structures from the Protein Data Bank that match the SAM riboswitch Rfam family (RF00162).

Figure 12.5.16.

Sunburst representation of the taxonomic distribution of the SAM riboswitch family (RF00162).
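A sunburst plots nested counts: each ring is one level of the taxonomy, and each segment is sized by the number of hits that share that lineage prefix. The aggregation step can be sketched in Python as follows, assuming each hit carries a semicolon-separated lineage string; the lineages in the example are illustrative, not Rfam data.

```python
from collections import Counter


def lineage_prefix_counts(lineages, max_depth=None):
    """Count hits under every lineage prefix (one tuple per sunburst segment)."""
    counts = Counter()
    for lineage in lineages:
        ranks = [r.strip() for r in lineage.split(";") if r.strip()]
        if max_depth is not None:
            ranks = ranks[:max_depth]  # keep only the innermost rings
        for depth in range(1, len(ranks) + 1):
            counts[tuple(ranks[:depth])] += 1
    return counts


if __name__ == "__main__":
    # Illustrative lineages, one per hit.
    example = [
        "Bacteria; Firmicutes; Bacilli",
        "Bacteria; Firmicutes; Clostridia",
        "Bacteria; Proteobacteria",
    ]
    # Print the nested counts as an indented tree.
    for prefix, n in sorted(lineage_prefix_counts(example).items()):
        print("  " * (len(prefix) - 1) + prefix[-1], n)
```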

Figure 12.5.17.

Curation tab of the SAM riboswitch family (RF00162) showing the source of the Seed alignment and the secondary structure, the authors of the family, and the parameters used to build the covariance model.

Figure 12.5.18.

Sequence search results showing an alignment between the query sequence (the #SEQ line) and the covariance model of the tRNA family (the #CM line). The secondary structure predicted for the query sequence is shown in the #SS line.

Figure 12.5.19.

Batch sequence search interface.

Figure 12.5.20.

Batch search results in a tabular format showing ncRNA families found in Hepatitis delta virus genotype III (L22063.1).
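If the tabular results follow the layout of Infernal's --tblout output (an assumption here; compare with the header line of the file you receive), each data line has 17 whitespace-separated fields followed by a free-text description that may contain spaces, so splitting each line at most 17 times recovers every column. The field names in this Python sketch are shorthand chosen for the sketch, mapped onto Infernal's column headers.

```python
FIELDS = [
    "target_name", "target_acc", "query_name", "query_acc", "mdl",
    "mdl_from", "mdl_to", "seq_from", "seq_to", "strand", "trunc",
    "pass", "gc", "bias", "score", "evalue", "inc", "description",
]
INT_FIELDS = ("mdl_from", "mdl_to", "seq_from", "seq_to")
FLOAT_FIELDS = ("gc", "bias", "score", "evalue")


def parse_tblout(text):
    """Parse Infernal-style tabular hit lines into dictionaries."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header/comment lines
        # At most 17 splits: the 18th value is the description, spaces and all.
        values = line.split(None, len(FIELDS) - 1)
        values += [""] * (len(FIELDS) - len(values))  # pad if the description is absent
        hit = dict(zip(FIELDS, values))
        hit["description"] = hit["description"].strip()
        for key in INT_FIELDS:
            hit[key] = int(hit[key])
        for key in FLOAT_FIELDS:
            hit[key] = float(hit[key])
        hits.append(hit)
    return hits
```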

Figure 12.5.21.

Example output of an SQL query showing Rfam accessions (rfam_acc), sequence accessions (rfamseq_acc), the start and stop coordinates of the ncRNAs relative to the sequence accessions (seq_start and seq_end), and bit score (bit_score, see Background information for more details about bit scores).
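Rfam also offers a public MySQL database that can be queried directly. The Python sketch below only reproduces the shape of a query returning the columns in this figure, run against a small in-memory SQLite table built from toy rows. The table name full_region, the rows, and the convention that a reverse-strand hit has seq_start greater than seq_end are assumptions for illustration; check the Rfam documentation for the actual schema and connection details.

```python
import sqlite3

# Same column names as in the figure; the table name is an assumption.
QUERY = """
SELECT rfam_acc, rfamseq_acc, seq_start, seq_end, bit_score
FROM full_region
WHERE rfam_acc = ?
ORDER BY bit_score DESC
"""


def demo_connection():
    """Build an in-memory table mimicking the figure's columns (toy data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE full_region ("
        "rfam_acc TEXT, rfamseq_acc TEXT, seq_start INTEGER, "
        "seq_end INTEGER, bit_score REAL)"
    )
    conn.executemany(
        "INSERT INTO full_region VALUES (?, ?, ?, ?, ?)",
        [
            ("RF00162", "SEQ_A", 100, 207, 95.0),
            ("RF00162", "SEQ_B", 5000, 4890, 88.5),  # hypothetical reverse-strand hit
            ("RF00005", "SEQ_C", 10, 80, 60.2),
        ],
    )
    return conn


def regions_for_family(conn, rfam_acc):
    """Rows for one family, highest bit score first, with a length column appended."""
    rows = conn.execute(QUERY, (rfam_acc,)).fetchall()
    # abs() handles hits stored with seq_start > seq_end.
    return [row + (abs(row[3] - row[2]) + 1,) for row in rows]
```

Against a real MySQL server, the same SELECT would be sent through a MySQL client library, most of which use %s rather than ? as the parameter placeholder.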

Figure 12.5.22.

Building an RNA family using Infernal. The Seed alignment is the starting point for building a covariance model (CM), which is then used to search a large sequence database for further hits. The hits may be added to the Seed alignment if necessary. The Full alignment is an alignment of all sequences in a family. The Infernal programs cmbuild, cmsearch, and cmalign are used for building CMs, searching sequence databases, and aligning sequences to CMs, respectively.
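The loop in this figure maps onto Infernal command lines. This Python sketch only assembles argument lists and runs nothing; all file names are hypothetical placeholders. It includes cmcalibrate, which the figure does not show but which Infernal uses to determine E-value statistics for searches, and it omits the step of extracting the hit sequences into a FASTA file (for example with Easel's esl-sfetch) between cmsearch and cmalign.

```python
import shlex


def infernal_commands(seed="seed.sto", cm="family.cm", db="sequences.fa",
                      hits_table="hits.tbl", hits_fasta="hits.fa",
                      full_alignment="full.sto"):
    """Return the family-building steps as argument lists (placeholder file names)."""
    return [
        ["cmbuild", cm, seed],                              # Seed alignment -> CM
        ["cmcalibrate", cm],                                # E-value parameters
        ["cmsearch", "--tblout", hits_table, cm, db],       # search the database
        ["cmalign", "-o", full_alignment, cm, hits_fasta],  # align hits to the CM
    ]


if __name__ == "__main__":
    for argv in infernal_commands():
        print(shlex.join(argv))
    # To run a step for real (requires Infernal on PATH):
    # subprocess.run(argv, check=True)
```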
