ConStruct: Improved construction of RNA consensus structures

Andreas Wilm et al. BMC Bioinformatics. 2008.

Abstract

Background: Aligning homologous non-coding RNAs (ncRNAs) correctly in terms of sequence and structure is an unresolved problem, due to both mathematical complexity and imperfect scoring functions. High quality alignments, however, are a prerequisite for most consensus structure prediction approaches, homology searches, and tools for phylogeny inference. Automatically created ncRNA alignments often need manual corrections, yet this manual refinement is tedious and error-prone.

Results: We present an extended version of CONSTRUCT, a semi-automatic, graphical tool for creating RNA alignments that are correct in terms of both consensus sequence and consensus structure. For this purpose, CONSTRUCT combines sequence alignment, thermodynamic data, and various measures of covariation. One important feature is that the user is guided during the alignment correction step by a consensus dotplot, which displays all thermodynamically optimal base pairs and the corresponding covariation. Once the initial alignment is corrected, optimal and suboptimal secondary structures as well as tertiary interactions can be predicted. We demonstrate CONSTRUCT's ability to guide the user in correcting an initial alignment, and show an example of optimal secondary consensus structure prediction on SECIS elements, which are very hard to align. Moreover, we use CONSTRUCT to predict tertiary interactions from sequences of the internal ribosome entry site of CrP-like viruses. In addition, we show that alignments specifically designed for benchmarking can easily be optimized using CONSTRUCT, although they share very little sequence identity.

Conclusion: CONSTRUCT's graphical interface allows for an easy alignment correction based on and guided by predicted and known structural constraints. It combines several algorithms for prediction of secondary consensus structure and even tertiary interactions. The CONSTRUCT package can be downloaded from the URL listed in the Availability and requirements section of this article.

Figures

Figure 1

Flowchart and graphical user interface of ConStruct. Steps are numbered as in the text. The graphical user interface (grey part) shows results of a structural alignment for IRES regions of CrP-like viruses [45]; for a full view and further details of this alignment see Fig. 3. The main windows of CONSTRUCT are the "Consensus Dotplot" and the "Alignment Editor". The top-right triangle in the consensus dotplot shows the thermodynamic base pairing probability of individual sequences (blue/green) and the thermodynamic consensus matrix (red); the horizontal and vertical bars denote gaps. The lower-left triangle shows the MI (as a measure of covariance) normalized by pair entropy, with a threshold of t_CV = 50 % applied. Predicted structures may be displayed in several representations and formats. On the right side, two possible representations are shown. The Circles plot (upper window) shows the consensus structure as predicted by maximum weighted matching (MWM); consensus pairing probability is color-coded from white to red. The crossing arcs represent pseudoknots. Below it, the "Structural Alignment Output" is shown. From top to bottom: ten sequences [with background colors green for loops, red for consensus base pairs, pink for consensus base pair changes (covarying pairs), and white for non-base pairs in paired regions], the consensus sequence, and the consensus structure in bracket-dot notation and character-encoded (with background colors from white to red proportional to sequence conservation and pairing probability, respectively). For an overview of colors used in CONSTRUCT see Table S4 in Additional file 1.
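The covariation measure named in this caption, MI normalized by pair entropy, can be illustrated with a minimal Python sketch. It is not CONSTRUCT's implementation: the function names, the skipping of gapped sequences, and the reading of t_CV as a simple cutoff on the normalized score are all assumptions made for illustration.

```python
import math
from collections import Counter

def entropy(counts):
    """Shannon entropy (in bits) of the observations in a Counter."""
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

def normalized_mi(col_i, col_j, gap="-"):
    """Mutual information of two alignment columns divided by their pair entropy.

    Assumption: sequences with a gap in either column are skipped.
    Returns 0.0 when the pair entropy is zero (fully conserved pair).
    """
    pairs = [(a, b) for a, b in zip(col_i, col_j) if a != gap and b != gap]
    if not pairs:
        return 0.0
    h_i = entropy(Counter(a for a, _ in pairs))
    h_j = entropy(Counter(b for _, b in pairs))
    h_ij = entropy(Counter(pairs))
    if h_ij == 0.0:
        return 0.0
    return (h_i + h_j - h_ij) / h_ij

def covarying_pairs(alignment, t_cv=0.5):
    """Return {(i, j): score} for column pairs whose score is >= t_cv.

    `alignment` is a list of equal-length gapped sequences; treating t_cv
    as a plain cutoff on the normalized score is an assumption.
    """
    columns = list(zip(*alignment))
    result = {}
    for i in range(len(columns)):
        for j in range(i + 1, len(columns)):
            score = normalized_mi(columns[i], columns[j])
            if score >= t_cv:
                result[(i, j)] = score
    return result
```

For the hypothetical alignment `["GAC", "GAC", "CAG", "CAG"]`, only columns 0 and 2 covary perfectly (G-C exchanged for C-G), so `covarying_pairs` keeps that single pair with a score of 1.0.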

Figure 2

Visualization of alignments by ConStruct. An alignment of SECIS elements created by CLUSTALW (A) and after manual optimization/correction using CONSTRUCT (B). In both cases predicted consensus structures and CONSTRUCT's GUI are shown. For an overview of colors used in CONSTRUCT see Table S4 in Additional file 1. Top left: Corresponding drawings of consensus structures (annotated with the consensus sequence) generated by CONSTRUCT; consensus base pairing probability is color-coded from white to red. Top right: Corresponding dotplots: the base pairing probability of individual sequences (dark blue for the selected sequence M_janaschii_sps and green for others) is shown in the top-right triangle of CONSTRUCT's main window; yellow to red dots show the consensus pairing probability; white to light blue bars denote gaps. The lower-left triangle shows the MI normalized by pair entropy with a threshold of t_CV = 30 % in rainbow colors from yellow to red. The cursors in A and B (arrow in the thermodynamics part and black square in the MI part) point to a similar position. Bottom: Corresponding alignment windows. Nucleotides participating in a base pair to which the cursor points in the dotplot are automatically highlighted [colored by pairing probability from p = 0 (black) to p = 1 (red)]. The motif GAA (turquoise background), which is conserved in the internal loop, has been highlighted using the built-in regular expression search. Clicking with a mouse button on position 3 and on position 25 of the last sequence (M_jannaschii_fmfdh_B; see red cursors) in the alignment editor selects this subsequence; clicking once with the left or right mouse button on the double-headed arrow moves the subsequence towards the 5' or 3' end, respectively, by one position; in the top-right dotplot the corresponding base pairs are automatically repositioned.
Similarly, clicking on a 5' and a 3' nucleotide of two different sequences (for an example see blue cursors) selects all corresponding subsequences from the sequence range; if none of the subsequences ends in a gap and all are followed by a gap, the subsequence range is moved towards the gap by clicking on the double-headed arrow.
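The caption mentions highlighting the GAA motif with a regular expression search. How such a search might be made tolerant of alignment gaps can be sketched in Python as follows. This is not a description of CONSTRUCT's own search; the function names and the example sequences are hypothetical.

```python
import re

def gapped_motif_pattern(motif, gap="-"):
    """Compile a regex that matches `motif` even when gap characters
    are interleaved between its nucleotides."""
    gap_run = re.escape(gap) + "*"
    return re.compile(gap_run.join(re.escape(ch) for ch in motif))

def find_motif(alignment, motif):
    """Return {sequence name: [(start, end), ...]} in alignment coordinates.

    `alignment` maps sequence names to gapped sequences of equal length.
    """
    pattern = gapped_motif_pattern(motif)
    return {name: [m.span() for m in pattern.finditer(row)]
            for name, row in alignment.items()}

# Hypothetical two-sequence alignment; the names and rows are made up.
example = {"seq1": "CCG-AAUU", "seq2": "CGAA--UU"}
hits = find_motif(example, "GAA")
```

Reporting hits in alignment coordinates rather than ungapped sequence coordinates keeps them directly comparable across rows, which is what a column-wise highlight needs.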

Figure 3

Alignment of internal ribosomal entry sites (IRES) from CrP-like viruses [45]. Subfigures A-C were created with CONSTRUCT using the RNAALIFOLD covariation score including stacking and the parameters w_TD = 0.5, w_CV = 0.5, t_TD = 0.03, and t_CV = 0.15. Colored bars and labels were added in a graphics program according to the nomenclature given in [45]. The color-coding used is explained in Table S4 in Additional file 1. Sensitivity and specificity are above 90 % compared to the structures given in [47] and [46]; only a few additional, non-contradictory base pairs are falsely predicted, for example those labelled "j" in figure part C. A: Dotplot; note that the base pairs that give rise to the pseudoknots (PK I-III) are present not only in the covariation plot (lower triangle) but also, with low probability, in the thermodynamics plot (upper triangle); i.e., they are part of suboptimal secondary structures. B: Circles plot. C: Structural alignment output.
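The caption lists two weights (w_TD, w_CV) and two thresholds (t_TD, t_CV) but not the formula that uses them. The Python sketch below shows one plausible reading, stated here purely as an assumption: each score is set to zero below its threshold, and the two scores are then combined linearly. The function name and the example values are hypothetical, and CONSTRUCT's actual scoring may differ.

```python
def combine_scores(p_td, s_cv, w_td=0.5, w_cv=0.5, t_td=0.03, t_cv=0.15):
    """Combine thermodynamic pairing probabilities and covariation scores.

    p_td and s_cv map base pair positions (i, j) to values in [0, 1].
    Assumption: values below their threshold count as 0, and the result
    is the weighted sum w_td * td + w_cv * cv. Pairs scoring 0 are dropped.
    """
    combined = {}
    for pair in set(p_td) | set(s_cv):
        td = p_td.get(pair, 0.0)
        cv = s_cv.get(pair, 0.0)
        td = td if td >= t_td else 0.0
        cv = cv if cv >= t_cv else 0.0
        score = w_td * td + w_cv * cv
        if score > 0.0:
            combined[pair] = score
    return combined

# Hypothetical input values, chosen only to exercise both thresholds.
thermo = {(1, 10): 0.9, (2, 9): 0.02, (3, 8): 0.5}
covar = {(1, 10): 0.5, (2, 9): 0.1, (4, 7): 0.4}
consensus = combine_scores(thermo, covar)
```

Under this reading, a pair supported only by covariation, such as (4, 7) above, still receives a nonzero score. That is consistent with the caption's remark that the pseudoknot pairs appear clearly in the covariation plot but only with low probability in the thermodynamics plot.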

Cited by

Advances in RNA structure prediction from sequence: new tools for generating hypotheses about viral RNA structure-function relationships.
Schroeder SJ. J Virol. 2009 Jul;83(13):6326-34. doi: 10.1128/JVI.00251-09. Epub 2009 Apr 15. PMID: 19369331. Free PMC article. Review.
DMfold: A Novel Method to Predict RNA Secondary Structure With Pseudoknots Based on Deep Learning and Improved Base Pair Maximization Principle.
Wang L, Liu Y, Zhong X, Liu H, Lu C, Li C, Zhang H. Front Genet. 2019 Mar 4;10:143. doi: 10.3389/fgene.2019.00143. eCollection 2019. PMID: 30886627. Free PMC article.
Alternative splicing of anciently exonized 5S rRNA regulates plant transcription factor TFIIIA.
Fu Y, Bannach O, Chen H, Teune JH, Schmitz A, Steger G, Xiong L, Barbazuk WB. Genome Res. 2009 May;19(5):913-21. doi: 10.1101/gr.086876.108. Epub 2009 Feb 10. PMID: 19211543. Free PMC article.
Structural Alignment and Covariation Analysis of RNA Sequences.
Tourasse NJ, Darfeuille F. Bio Protoc. 2020 Feb 5;10(3):e3511. doi: 10.21769/BioProtoc.3511. eCollection 2020 Feb 5. PMID: 33654736. Free PMC article.
Conserved Motifs and Domains in Members of Pospiviroidae.
Wüsthoff KP, Steger G. Cells. 2022 Jan 11;11(2):230. doi: 10.3390/cells11020230. PMID: 35053346. Free PMC article.

References

1. Gräf S, Strothmann D, Kurtz S, Steger G. A computational approach to search for non-coding RNAs in large genomic data. In: Nellen W, Hammann C, editors. Small RNAs: Analysis and Regulatory Functions. Nucleic Acids and Molecular Biology Series. Vol. 17. Springer Verlag; 2006. pp. 57–74.
2. Klein R, Eddy S. RSEARCH: finding homologs of single structured RNA sequences. BMC Bioinf. 2003;4:44. doi: 10.1186/1471-2105-4-44.
3. Schöniger M, von Haeseler A. Toward assigning helical regions in alignments of ribosomal RNA and testing the appropriateness of evolutionary models. J Mol Evol. 1999;49:691–698. doi: 10.1007/PL00006590.
4. Wolf M, Achtziger M, Schultz J, Dandekar T, Müller T. Homology modeling revealed more than 20,000 rRNA internal transcribed spacer 2 (ITS2) secondary structures. RNA. 2005;11:1616–1623. doi: 10.1261/rna.2144205.
5. Caetano-Anolles G. Grass evolution inferred from chromosomal rearrangements and geometrical and statistical features in RNA structure. J Mol Evol. 2005;60:635–652. doi: 10.1007/s00239-004-0244-z.