Rfam 13.0: shifting to a genome-centric resource for non-coding RNA families

Ioanna Kalvari et al. Nucleic Acids Res. 2018.

Abstract

The Rfam database is a collection of RNA families in which each family is represented by a multiple sequence alignment, a consensus secondary structure, and a covariance model. In this paper we introduce Rfam release 13.0, which switches to a new genome-centric approach that annotates a non-redundant set of reference genomes with RNA families. We describe new web interface features including faceted text search and R-scape secondary structure visualizations. We discuss a new literature curation workflow and a pipeline for building families based on RNAcentral. There are 236 new families in release 13.0, bringing the total number of families to 2687. The Rfam website is http://rfam.org.
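Each Rfam family pairs a multiple sequence alignment with a consensus secondary structure. In Rfam's Stockholm files this structure is given as a `#=GC SS_cons` line in WUSS notation. The minimal Python sketch below shows how such a line can be turned into a list of paired alignment columns. It assumes a simplified subset of WUSS: only the `()`, `<>`, `[]` and `{}` bracket pairs are handled, and pseudoknot letters (`Aa`, `Bb`, ...) are treated as unpaired. It is an illustration, not Rfam's or Infernal's own parser.

```python
# Minimal sketch: extract base pairs from a consensus secondary structure
# string in a simplified WUSS-like notation (as used in Stockholm SS_cons lines).
# Assumption: only (), <>, [], {} are paired; pseudoknot letters and the
# unpaired symbols . , _ - : ~ are all treated as unpaired columns.

OPENERS = {"(": ")", "<": ">", "[": "]", "{": "}"}
CLOSERS = {close: open_ for open_, close in OPENERS.items()}


def base_pairs(ss_cons: str) -> list[tuple[int, int]]:
    """Return 0-based (i, j) column pairs with i < j, sorted by i."""
    # One stack per bracket type, so different bracket types nest independently.
    stacks = {open_: [] for open_ in OPENERS}
    pairs = []
    for pos, char in enumerate(ss_cons):
        if char in OPENERS:
            stacks[char].append(pos)
        elif char in CLOSERS:
            stack = stacks[CLOSERS[char]]
            if not stack:
                raise ValueError(f"unmatched {char!r} at column {pos}")
            pairs.append((stack.pop(), pos))
    # Any opener left on a stack has no partner.
    for open_, stack in stacks.items():
        if stack:
            raise ValueError(f"unmatched {open_!r} at column {stack[-1]}")
    return sorted(pairs)


if __name__ == "__main__":
    # Hypothetical toy structure, not taken from any Rfam family.
    print(base_pairs("<<<__>>>,((..))"))
    # [(0, 7), (1, 6), (2, 5), (9, 14), (10, 13)]
```

Keeping a separate stack per bracket type mirrors how WUSS uses different bracket styles for different structural levels while still requiring each type to be properly nested.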

© The Author(s) 2017. Published by Oxford University Press on behalf of Nucleic Acids Research.

Figures

Figure 1.

Growth in the number of RNA families grouped by RNA type in major database releases. The "other" group pools RNA types with fewer than 50 families each, such as rRNA, tRNA, snRNA or riboswitches.
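The grouping rule in this caption (types with fewer than 50 families are pooled) can be sketched in a few lines of Python. The input here is a hypothetical list with one RNA-type label per family. The labels and counts in the example are invented for illustration and are not Rfam's actual numbers.

```python
from collections import Counter

# Threshold from the Figure 1 caption: RNA types with fewer than
# 50 families are pooled into a single "other" group.
OTHER_THRESHOLD = 50


def group_rna_types(family_types, threshold=OTHER_THRESHOLD):
    """Count families per RNA type, pooling types below the threshold into "other"."""
    grouped = {}
    for rna_type, n in Counter(family_types).items():
        key = rna_type if n >= threshold else "other"
        grouped[key] = grouped.get(key, 0) + n
    return grouped


if __name__ == "__main__":
    # Hypothetical per-family labels; the counts are invented for illustration.
    toy = ["miRNA"] * 60 + ["tRNA"] * 3 + ["riboswitch"] * 40
    print(group_rna_types(toy))  # {'miRNA': 60, 'other': 43}
```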

Figure 2.

Overview of species annotated with RNA families in Rfam 13.0. The tree is based on NCBI Taxonomy and was generated using iToL (11).

Figure 3.

Browsing Mammalian families and genomes annotated in Rfam. The entries can be filtered using facets and sorted by multiple criteria.
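The facet-and-sort browsing described in this caption follows a common pattern. Records are kept only if they match every selected facet, then ordered by several keys in priority order. The small in-memory Python sketch below illustrates that pattern. The record fields (`accession`, `rna_type`, `num_species`) and the toy entries are hypothetical, and the sketch does not describe how the Rfam website actually implements its search.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FamilyRecord:
    # Hypothetical fields chosen for illustration only.
    accession: str
    rna_type: str
    num_species: int


def facet_filter(records, facets):
    """Keep records whose field value is in the allowed set for every facet (AND across facets)."""
    return [
        r for r in records
        if all(getattr(r, field) in allowed for field, allowed in facets.items())
    ]


def multi_sort(records, keys):
    """Sort by (field, descending) keys; the first key is the most significant.

    Python's sort is stable (also with reverse=True), so sorting from the
    least to the most significant key yields the combined ordering.
    """
    result = list(records)
    for field, descending in reversed(keys):
        result.sort(key=lambda r: getattr(r, field), reverse=descending)
    return result


# Toy entries, not real Rfam families.
TOY_FAMILIES = [
    FamilyRecord("toy-1", "snoRNA", 12),
    FamilyRecord("toy-2", "miRNA", 40),
    FamilyRecord("toy-3", "snoRNA", 40),
    FamilyRecord("toy-4", "riboswitch", 5),
]

if __name__ == "__main__":
    hits = facet_filter(TOY_FAMILIES, {"rna_type": {"snoRNA", "miRNA"}})
    ordered = multi_sort(hits, [("num_species", True), ("accession", False)])
    print([r.accession for r in ordered])  # ['toy-2', 'toy-3', 'toy-1']
```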

Figure 4.

Sequence summary page for the Homo sapiens small Cajal body-specific RNA 2 sequence located on chromosome 1.

Figure 5.

R-scape visualisation of SAM riboswitch (RF00162, http://rfam.org/family/SAM). Left: The current Rfam 13.0 SAM riboswitch seed alignment and consensus secondary structure; 19 of the 27 basepairs in the alignment show statistically significant covariation. Right: The R-scape improved SAM riboswitch seed alignment and consensus secondary structure; 27 out of 36 basepairs show statistically significant covariation. The structures are displayed using R2R (16); significant basepairs, as defined by R-scape, are shown in green. Other colours and markup of the structure diagrams are explained in the legend on the far right.
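The basepair counts in this caption can be compared as proportions. The short Python sketch below only does that arithmetic on the reported numbers. It does not reproduce R-scape's statistical test, which is what decides whether a basepair counts as significantly covarying.

```python
from fractions import Fraction


def covariation_support(significant: int, total: int) -> Fraction:
    """Fraction of consensus basepairs flagged as significantly covarying."""
    if total <= 0 or not 0 <= significant <= total:
        raise ValueError("need total > 0 and 0 <= significant <= total")
    return Fraction(significant, total)


# Counts reported in the Figure 5 caption for the SAM riboswitch (RF00162).
current = covariation_support(19, 27)   # Rfam 13.0 seed structure
improved = covariation_support(27, 36)  # R-scape improved structure

print(f"current: {float(current):.1%}, improved: {float(improved):.1%}")
# current: 70.4%, improved: 75.0%
```

The revised structure gains eight supported basepairs (from 19 to 27). It also raises the supported fraction from about 70% to 75%, so the added basepairs are not just diluting the existing support.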

Figure 6.

Number of papers curated between January 2016 and June 2017 to extract RNA sequences. In total, 660 papers were selected for curation, 260 of which were processed to build 236 new families (some papers were not used for family building for various reasons).

Figure 7.

Annotating RNAcentral with Rfam families. About 1.8% of RNAcentral could be used as a source of new Rfam families.

References

    1. Aken B.L., Achuthan P., Akanni W., Amode M.R., Bernsdorff F., Bhai J., Billis K., Carvalho-Silva D., Cummins C., Clapham P. et al. Ensembl 2017. Nucleic Acids Res. 2017; 45:D635–D642.
    1. Speir M.L., Zweig A.S., Rosenbloom K.R., Raney B.J., Paten B., Nejad P., Lee B.T., Learned K., Karolchik D., Hinrichs A.S. et al. The UCSC Genome Browser database: 2016 update. Nucleic Acids Res. 2016; 44:D717–D725.
    1. Nawrocki E.P., Eddy S.R. Infernal 1.1: 100-fold faster RNA homology searches. Bioinformatics. 2013; 29:2933–2935.
    1. Gene Ontology Consortium. Gene Ontology Consortium: going forward. Nucleic Acids Res. 2015; 43:D1049–D1056.
    1. Eilbeck K., Lewis S.E., Mungall C.J., Yandell M., Stein L., Durbin R., Ashburner M. The Sequence Ontology: a tool for the unification of genome annotations. Genome Biol. 2005; 6:R44.