Genome-wide discovery of structured noncoding RNAs in bacteria

Shira Stav et al. BMC Microbiol. 2019.

Abstract

Background: Structured noncoding RNAs (ncRNAs) play essential roles in many biological processes, such as gene regulation, signaling, RNA processing, and protein synthesis. Riboswitches are among the most common groups of ncRNAs in bacteria. These cis-regulatory, metabolite-binding RNAs are present in many species, where they regulate various metabolic and signaling pathways. Collectively, there are likely to be hundreds of novel riboswitch classes still hidden in the bacterial genomes that have already been sequenced, and potentially thousands of classes distributed among other species in the biosphere. Because the vast majority of these undiscovered classes are proposed to be exceedingly rare, current bioinformatics search techniques are reaching their limits in distinguishing true riboswitch candidates from false positives.

Results: Herein, we apply a computational search pipeline that efficiently identifies the intergenic regions (IGRs) most likely to encode structured ncRNAs. Application of this method to five bacterial genomes yielded nearly 70 novel genetic elements, including 30 novel candidate ncRNA motifs. Among the riboswitch candidates identified is an RNA motif involved in the regulation of thiamin biosynthesis.

Conclusions: Analysis of other genomes will undoubtedly lead to the discovery of many additional novel structured ncRNAs, and provide insight into the range of riboswitches and other kinds of ncRNAs remaining to be discovered in bacteria and archaea.


Conflict of interest statement

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures

Fig. 1

Overview of the search pipeline. Schematic representation of the GC-IGR analysis workflow. a-h Annotations in bold text represent major steps in the analytical pipeline. For a more detailed version of the pipeline, see Additional file 2: Figure S1.

Fig. 2

Plots of the IGRs from the HIMB5 genome sorted by IGR length and GC content. a IGR plot prior to conducting our detailed analyses. The red line marks the boundary between unknown IGRs chosen for further analysis (upper right) and those that were not chosen (lower left). IGRs selected for further analysis are depicted as dark gray triangles, whereas those not chosen are depicted as light gray triangles. b (Top) A portion of the sector of interest from the HIMB5 IGR plot after analysis, updated to remove false IGRs that overlap known ORFs, to annotate IGRs carrying previously known ncRNAs, and to include all novel motifs identified in this study. No changes in the plot occurred outside of the area depicted. (Bottom) List of novel motif candidates. c Summary of the classification of all 47 unknown IGRs from HIMB5 chosen for further analysis. Classifications are organized into five main groups (gray arcs) as annotated, wherein “unknown functions” encompasses categories 1 through 4 (unnamed, LRC, MRC, and HRC), and the remaining groups are derived from category 5 (named), as described in the main text. The number of novel examples classified in each group is provided in the colored boxes. Classifications depicted as partially transparent lack a representative in the sector of interest in this genome. See Additional file 1: Table S1 for additional details regarding novel motifs.
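The selection step in panel a (keeping IGRs in the upper right of the length versus GC plot) can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the boundary is a straight line with hypothetical `slope` and `intercept` values, and the `IGR` class and function names are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class IGR:
    """An intergenic region: an identifier plus its nucleotide sequence (hypothetical type)."""
    name: str
    seq: str


def gc_fraction(seq: str) -> float:
    """Fraction of G+C among unambiguous bases (A, C, G, T or U)."""
    s = seq.upper()
    unambiguous = sum(s.count(b) for b in "ACGTU")
    if unambiguous == 0:
        return 0.0
    return (s.count("G") + s.count("C")) / unambiguous


def select_igrs(igrs, slope, intercept):
    """Return IGRs lying above a straight boundary in (length, GC) space.

    The line gc = slope * length + intercept is an assumed stand-in for the
    red boundary in Fig. 2a; with a negative slope, points above it are the
    long and/or GC-rich IGRs (upper right). Sorted longest first.
    """
    chosen = [
        igr for igr in igrs
        if gc_fraction(igr.seq) > slope * len(igr.seq) + intercept
    ]
    return sorted(chosen, key=lambda igr: len(igr.seq), reverse=True)


# Toy IGRs: short GC-rich, short AT-rich, and long with 50% GC.
igrs = [IGR("a", "GC" * 50), IGR("b", "AT" * 50), IGR("c", "GCAT" * 200)]
print([igr.name for igr in select_igrs(igrs, slope=-0.0005, intercept=0.45)])  # ['c', 'a']
```

A tilted boundary lets long IGRs qualify at a lower GC content than short ones, which matches the upper-right versus lower-left split described for panel a; the actual boundary used in the study is not specified here.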

Fig. 3

Sequence and secondary structure models for several candidate riboswitch classes identified in this study. a-d Four candidate riboswitch classes present in the HIMB5 genome. e A candidate riboswitch class present in the genome of T. lienii. f A candidate riboswitch class present in the genome of B. cicadellinicola. These consensus models were created by comparing all unique sequence representatives for each motif uncovered by homology searches of RefSeq 76 and certain metagenomic databases. See the text for details regarding each of these motifs and for hypotheses regarding their biological functions. Note that another riboswitch candidate, thiS, identified in the genome of C. novyi, is presented in Fig. 4a.

Fig. 4

Structure and genetic context of the thiS motif. a Consensus sequence and secondary structure model for the thiS motif. Annotations are as described for Fig. 3. The P0 stem is predicted to exist if the lower portion of P1 fails to form. b (Top) Distribution of gene associations for the ~700 thiS motif representatives in bacteria. The chart includes the first five genes downstream of each thiS motif representative, for a total of 1922 entries. These genes, which are typically in the thiS operon, are counted individually. (Bottom) Protein products of the genes abbreviated here, when known, catalyze the reaction steps for thiamin biosynthesis depicted in c. c The biosynthetic pathway of TPP in Bacillus subtilis. Acronyms, starting from the top left, are: aminoimidazole ribotide (AIR), hydroxymethyl-pyrimidine (HMP), hydroxymethyl-pyrimidine phosphate (HMP-P), hydroxymethyl-pyrimidine pyrophosphate (HMP-PP), hydroxyethyl-thiazole (HET), hydroxyethyl-thiazole phosphate (HET-P), thiamin monophosphate (TMP), and thiamin pyrophosphate (TPP). TMP (green shaded box) is formed by fusing the two compounds HMP-PP (blue shaded box) and HET-P (gold shaded box). Note that HET-P and TPP can also be synthesized through salvage pathways starting from HET and thiamin, respectively. HMP-PP and HET-P were proposed as the top ligand candidates for the thiS riboswitch candidate. The metabolic scheme is based on one published previously [56].
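The tally in panel b (first five downstream genes per representative, each gene counted individually) can be illustrated with a short counting sketch. The function name and the gene lists below are made up for demonstration and are not data from the study.

```python
from collections import Counter


def gene_association_counts(downstream_genes, max_genes=5):
    """Tally genes found downstream of each motif representative.

    `downstream_genes` holds one list per motif representative, ordered by
    distance from the motif. Only the first `max_genes` entries of each list
    are counted, and every gene counts individually, so one representative
    contributes up to `max_genes` entries to the total.
    """
    counts = Counter()
    for genes in downstream_genes:
        counts.update(genes[:max_genes])
    return counts


# Made-up gene lists for three hypothetical motif representatives.
example = [
    ["thiS", "thiG", "thiF"],
    ["thiC"],
    ["thiS", "thiG", "thiF", "thiO", "thiE", "tenA"],  # tenA is 6th, so it is ignored
]
counts = gene_association_counts(example)
print(sum(counts.values()))  # 9
print(counts["thiS"])        # 2
```

Because representatives can have fewer than five annotated downstream genes, the total number of entries is generally smaller than five times the number of representatives, consistent with 1922 entries for roughly 700 representatives.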

Fig. 5

Reporter gene expression is regulated by the thiS riboswitch candidate. a Sequence and predicted secondary structure of the WT thiS RNA associated with the thiS gene of C. maddingley, which was fused to a β-galactosidase reporter gene (lacZ) and a B. subtilis lysC promoter to drive transcription. The lysC promoter was chosen because it is known to strongly promote transcription without regulation [73]. The encircled 88 designates the number of additional nucleotides between the end of the terminator element and the lacZ reporter gene sequence. Red nucleotides are >97% conserved in the thiS consensus model (Fig. 4a). b Reporter gene expression of WT B. subtilis cells and of cells lacking the coding region for the ThiS protein (ΔthiS) grown in minimal (GMM) liquid medium. c Agar diffusion assay of the ΔthiS B. subtilis strain carrying a WT riboswitch reporter construct. The filter disk was spotted with 10 mM thiamin on a minimal (GMM) agar medium plate containing 100 μg mL−1 X-Gal.
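The ">97% conserved" annotation refers to per-position conservation across the aligned motif representatives. The sketch below shows one simple way to compute such a value from a multiple alignment; the definition used here (fraction of all sequences carrying the most common nucleotide, with gaps counted in the denominator) is an assumption for illustration, and the tools that drew the published consensus models may define conservation differently.

```python
from collections import Counter

GAP_CHARS = "-."


def column_conservation(alignment):
    """Per-column (most common nucleotide, fraction of sequences carrying it).

    Gap characters count toward the denominator (every sequence) but are
    never reported as the consensus nucleotide. Ties between nucleotides are
    broken by first appearance in the column.
    """
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(s) != width for s in alignment):
        raise ValueError("all aligned sequences must have the same length")
    n = len(alignment)
    result = []
    for i in range(width):
        column = Counter(s[i].upper() for s in alignment if s[i] not in GAP_CHARS)
        if not column:
            result.append(("-", 0.0))
            continue
        nucleotide, count = column.most_common(1)[0]
        result.append((nucleotide, count / n))
    return result


def highly_conserved_positions(alignment, threshold=0.97):
    """Zero-based column indices whose consensus nucleotide exceeds `threshold`."""
    return [
        i for i, (_, frac) in enumerate(column_conservation(alignment))
        if frac > threshold
    ]


aln = ["ACGU", "ACGA", "ACG-"]
print(highly_conserved_positions(aln))  # [0, 1, 2]
```

Counting gaps in the denominator means a column cannot look highly conserved just because most sequences lack a nucleotide there.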

Fig. 6

Representatives of various types of predicted structured nucleic acid motifs discovered among five bacterial genomes. a-e Sequence and predicted secondary structure models for representative ‘named’ motifs identified among the five bacterial genomes examined in this study. Extended blue shading in a and b designates possible short ORFs. For the translated WebLogo consensus sequence in a, amino acids in blue, green, and black are hydrophilic, neutral, and hydrophobic, respectively. The two candidate uORFs in b are associated with shikimate metabolism genes, and notable amino acids related to this pathway and encoded by the uORFs include phenylalanine [F] and tyrosine [Y]. The protein-binding candidate in c is depicted with two highlighted pyrimidine-rich sequences that might function as protein-binding sites. In addition to the type I consensus depicted, representatives conforming to a type II consensus (only one hairpin loop, similar to P3) and a type III consensus (terminator stem only) also exist. RBS designates ribosome binding sites. Additional annotations are as described in the legend to Fig. 3. f Comprehensive summary of the fate of the unknown IGRs after analysis of the five bacterial genomes examined in this study. Annotations are as described in the legend to Fig. 2c.
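Panels a and b involve translating possible short ORFs and reading off residue properties (hydrophobicity coloring, aromatic F and Y codons). A minimal sketch of that kind of check follows. It assumes the standard genetic code, scans only the given strand, and assumes the residue groups follow WebLogo's hydrophobicity color scheme; the function names and the toy sequence are hypothetical.

```python
# Standard genetic code in compact form: codons enumerated with bases in UCAG order.
BASES = "UCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

# Residue groups assumed to follow WebLogo's hydrophobicity color scheme
# (blue, green, and black in Fig. 6a).
RESIDUE_CLASSES = {
    **{aa: "hydrophilic" for aa in "RKDENQ"},
    **{aa: "neutral" for aa in "SGHTAP"},
    **{aa: "hydrophobic" for aa in "YVMCLFIW"},
}


def find_short_orfs(seq, min_codons=2):
    """Translate AUG-initiated ORFs that end in a stop codon (given strand only).

    Every AUG is tried as a start, so an in-frame internal AUG also yields a
    shorter, nested peptide. ORFs that run off the end of `seq` are discarded.
    """
    rna = seq.upper().replace("T", "U")
    peptides = []
    for start in range(len(rna) - 2):
        if rna[start:start + 3] != "AUG":
            continue
        peptide = []
        for pos in range(start, len(rna) - 2, 3):
            aa = CODON_TABLE.get(rna[pos:pos + 3], "X")  # X marks an ambiguous codon
            if aa == "*":
                if len(peptide) >= min_codons:
                    peptides.append("".join(peptide))
                break
            peptide.append(aa)
    return peptides


def residue_classes(peptide):
    """Map each residue to its assumed hydrophobicity group."""
    return [RESIDUE_CLASSES.get(aa, "other") for aa in peptide]


def aromatic_count(peptide):
    """Number of phenylalanine (F) and tyrosine (Y) residues."""
    return sum(aa in "FY" for aa in peptide)


peptides = find_short_orfs("GGAUGUUUUACUUCUAAGG")
print(peptides)                     # ['MFYF']
print(aromatic_count(peptides[0]))  # 3
```

A uORF rich in codons for a pathway's end-product amino acids is the kind of feature that would hint at attenuation-style regulation; this sketch only flags such codons and makes no claim about the mechanism for the motifs in panel b.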

References

    1. Breaker RR. Riboswitches and the RNA world. Cold Spring Harb Perspect Biol. 2012;4.
    2. Sherwood AV, Henkin TM. Riboswitch-mediated gene regulation: novel RNA architectures dictate gene expression responses. Annu Rev Microbiol. 2016;70:361–374. doi: 10.1146/annurev-micro-091014-104306.
    3. Peselis A, Serganov A. Themes and variations in riboswitch structure and function. Biochim Biophys Acta. 2014;1839:908–918. doi: 10.1016/j.bbagrm.2014.02.012.
    4. Barrick JE, Corbino KA, Winkler WC, Nahvi A, Mandal M, Collins J, et al. New RNA motifs suggest an expanded scope for riboswitches in bacterial genetic control. Proc Natl Acad Sci U S A. 2004;101:6421–6426. doi: 10.1073/pnas.0308014101.
    5. Weinberg Z, Barrick JE, Yao Z, Roth A, Kim JN, Gore J, et al. Identification of 22 candidate structured RNAs in bacteria using the CMfinder comparative genomics pipeline. Nucleic Acids Res. 2007;35:4809–4819. doi: 10.1093/nar/gkm487.