Frequency and isostericity of RNA base pairs

Jesse Stombaugh et al. Nucleic Acids Res. 2009 Apr.

Abstract

Most of the hairpin, internal and junction loops that appear single-stranded in standard RNA secondary structures form recurrent 3D motifs, where non-Watson-Crick base pairs play a central role. Non-Watson-Crick base pairs also play crucial roles in tertiary contacts in structured RNA molecules. We previously classified RNA base pairs geometrically so as to group together those base pairs that are structurally similar (isosteric) and therefore able to substitute for each other by mutation without disrupting the 3D structure. Here, we introduce a quantitative measure of base pair isostericity, the IsoDiscrepancy Index (IDI), to more accurately determine which base pair substitutions can potentially occur in conserved motifs. We extract and classify base pairs from a reduced-redundancy set of RNA 3D structures from the Protein Data Bank (PDB) and calculate centroids (exemplars) for each base combination and geometric base pair type (family). We use the exemplars and IDI values to update our online Basepair Catalog and the Isostericity Matrices (IM) for each base pair family. From the database of base pairs observed in 3D structures we derive base pair occurrence frequencies for each of the 12 geometric base pair families. In order to improve the statistics from the 3D structures, we also derive base pair occurrence frequencies from rRNA sequence alignments.
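
The abstract does not say how the exemplar for each base combination and family is chosen. One plausible reading is a medoid: the observed instance whose summed IDI to all other instances in its group is smallest. This is offered only as an illustrative assumption. The function `pick_exemplar` is hypothetical, and `dist` stands in for whatever IDI implementation is available.

```python
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def pick_exemplar(instances: Sequence[T], dist: Callable[[T, T], float]) -> T:
    """Return the medoid of `instances` under the pairwise distance `dist`.

    ASSUMPTION: the exemplar is taken to be the medoid (smallest summed
    distance to the other instances); `dist` stands in for the IDI.
    """
    if not instances:
        raise ValueError("need at least one base pair instance")

    # Summed distance from a candidate to every instance
    # (its distance to itself contributes 0 for a proper distance).
    def total(candidate: T) -> float:
        return sum(dist(candidate, other) for other in instances)

    # min() keeps the first candidate on ties.
    return min(instances, key=total)


# Toy check with 1D numbers and absolute difference as the "distance".
print(pick_exemplar([1.0, 2.0, 3.0, 10.0, 2.5], lambda a, b: abs(a - b)))  # prints 2.5
```

A medoid, unlike an averaged centroid, is always a real observed base pair, which suits a catalog of exemplar coordinates.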

Figures

Figure 1.

Representation of the three contributions to the IDI, illustrated using non-isosteric base pairs. To calculate the IDI for two base pairs, the bases designated ‘first base’ in each base pair are superposed (bases on the left in each panel) and then the following three quantities are evaluated, normalized and summed: (1) The difference, Δc, in the intra-base pair C1′–C1′ distances, illustrated for two non-isosteric cWW base pairs, AG and AU. (2) The inter-base pair C1′–C1′ distance, t₁, between the C1′ atoms of the second bases of the base pairs, illustrated for the near isosteric cWW AU and AC base pairs. We also calculate the corresponding distance t₂ after first superposing the second bases of the base pairs. (3) The angle, θ, about an axis perpendicular to the base pair plane, required to superpose the second bases, illustrated using non-isosteric cWW AU and cWS AU base pairs. For some pairs of base pairs, a 180° rotation (flip) about an axis in the base pair plane is required to superpose the second bases (case not shown).
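
The caption names the three contributions but not the normalization constants or how t₁ and t₂ are combined. A minimal sketch under stated assumptions: the superpositions are done elsewhere and the caller passes C1′ coordinates already in the fitted frames; the normalizers `scale_c`, `scale_t` and `scale_theta` are hypothetical placeholders, and averaging t₁ and t₂ into one term is also an assumption. The function name `idi_sketch` is illustrative only.

```python
import math
from typing import Tuple

Point = Tuple[float, float, float]
Pair = Tuple[Point, Point]  # (C1' of first base, C1' of second base), in Å


def idi_sketch(pair_a: Pair, pair_b_fit1: Pair, pair_b_fit2: Pair,
               theta_deg: float, scale_c: float = 1.0, scale_t: float = 1.0,
               scale_theta: float = 90.0) -> float:
    """Normalize and sum the three IDI contributions for base pairs A and B.

    pair_b_fit1: B's C1' atoms after superposing B's first base on A's first base.
    pair_b_fit2: B's C1' atoms after superposing B's second base on A's second base.
    theta_deg:   rotation about the base-pair normal that superposes the second
                 bases (computed elsewhere; the 180-degree flip case is ignored).
    scale_*:     HYPOTHETICAL normalizers, not values taken from the paper.
    """
    # (1) Delta_c: difference of intra-pair C1'-C1' distances.
    delta_c = abs(math.dist(*pair_a) - math.dist(*pair_b_fit1))
    # (2) t1: second-base C1' displacement after fitting the first bases;
    #     t2: first-base C1' displacement after fitting the second bases.
    t1 = math.dist(pair_a[1], pair_b_fit1[1])
    t2 = math.dist(pair_a[0], pair_b_fit2[0])
    # (3) theta, folded into the range [0, 180] degrees.
    theta = abs(theta_deg) % 360.0
    theta = min(theta, 360.0 - theta)
    # ASSUMPTION: t1 and t2 are averaged so that three terms are summed.
    return delta_c / scale_c + (t1 + t2) / (2.0 * scale_t) + theta / scale_theta


a = ((0.0, 0.0, 0.0), (10.0, 0.0, 0.0))
b_fit1 = ((0.0, 0.0, 0.0), (8.0, 0.0, 0.0))
b_fit2 = ((2.0, 0.0, 0.0), (10.0, 0.0, 0.0))
print(idi_sketch(a, b_fit1, b_fit2, 45.0))  # prints 4.5
```

Identical base pairs give an IDI of 0, and each geometric discrepancy adds a non-negative term, which is the behavior the histograms in Figure 2 rely on.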

Figure 2.

Histograms of IDIs between sets of identical (upper left), isosteric (upper right), near isosteric (lower left) and non-isosteric (lower right) base pair instances from the 3D structures in the reduced-redundancy dataset having better than 3.0 Å resolution. Upper left: IDIs calculated between identical base pairs (e.g. GC cWW with GC cWW, UA tWH with UA tWH, etc.). Upper right: IDIs between 200 GC cWW and 200 UA cWW pairs. Lower left: IDIs between 200 GC cWW and 200 GU cWW pairs. Lower right: IDIs between 200 GU cWW and 200 UG cWW pairs.

Figure 3.

Part of the 3D structural alignment of E. coli and H. marismortui 23S rRNAs, illustrating structural conservation of a complex motif of Domain I that includes Helix 24. (a) The 3D structural alignment of corresponding base pairs from the E. coli (left) and H. marismortui (right) structures. (b) The annotated 2D structures for E. coli and H. marismortui using the base pair symbols. (c) Stereo view of the E. coli 3D structure, highlighting bases that differ between structures. The base pairs in the alignment and in the 2D and 3D structures are color-coded by geometric base pair family. Letters that correspond to bases which differ between organisms are marked in the secondary structure by a magenta circle and in the 3D structure with thicker lines.

Figure 4.

Histograms of IDIs between actual base pairs in the 3D–3D alignment of E. coli and T. thermophilus 5S, 16S and 23S rRNAs. The IDIs used in these histograms were calculated before the revision of the 3D structures to correct syn-anti errors. The upper-left panel shows the IDI between all aligned base pairs, whether in the same geometric family or not. The base pairs with IDI > 6.0 are discussed in section ‘Base pair discrepancies between aligned positions in the rRNA 3D structural alignments’. The upper-right panel shows the IDI between aligned base pairs that belong to the same geometric family, and the lower panels subdivide these into two cases: those with identical base combinations (lower left) and those with different base combinations (lower right). All IDI values above 6 are placed in the rightmost bin in each histogram.
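
The caption states only that IDI values above 6 go into the rightmost bin; the bin width is not given here. A minimal sketch, assuming a hypothetical bin width of 0.5 and clamping every value at or above 6 into the last bin (`idi_histogram` is an illustrative name):

```python
import math
from typing import Iterable, List


def idi_histogram(values: Iterable[float], bin_width: float = 0.5,
                  cap: float = 6.0) -> List[int]:
    """Count IDI values into equal-width bins covering [0, cap].

    Values at or above `cap` are clamped into the rightmost bin, following the
    caption's rule for IDI > 6. The 0.5 bin width is a HYPOTHETICAL choice.
    """
    n_bins = math.ceil(cap / bin_width)
    counts = [0] * n_bins
    for v in values:
        if v < 0:
            raise ValueError("IDI values are non-negative")
        # Floor division picks the bin; min() clamps large values into the last one.
        counts[min(int(v // bin_width), n_bins - 1)] += 1
    return counts


print(idi_histogram([0.1, 0.6, 5.9, 6.0, 9.3]))  # prints [1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 3]
```

Clamping keeps the x-axis readable while still counting every large discrepancy, so the height of the rightmost bar shows how many aligned pairs are far from isosteric.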

Figure 5.

A graphical summary of the base pair occurrence frequencies within each base pair family, obtained from rRNA sequence data (data from Supplementary Table S8). For cWW, tHH, tWH, tHS, tWS and tSS, one base combination accounts for >50% of instances. The gray boxes in each matrix indicate base combinations that do not form that type of base pair. For example, there is no GG cWW base pair.
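
The per-family summary amounts to normalizing counts within each family and checking whether one base combination exceeds half of the instances. A minimal sketch of that step, assuming per-family counts have already been tallied from the alignments (the tallying itself is not shown); the function names are hypothetical and the counts are toy numbers, not data from Supplementary Table S8.

```python
from typing import Dict, Optional


def family_frequencies(counts: Dict[str, int]) -> Dict[str, float]:
    """Turn raw counts of base combinations in one family into fractions."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no instances observed for this family")
    return {combo: n / total for combo, n in counts.items()}


def dominant_combination(counts: Dict[str, int],
                         threshold: float = 0.5) -> Optional[str]:
    """Return the combination holding more than `threshold` of instances, if any."""
    freqs = family_frequencies(counts)
    combo, share = max(freqs.items(), key=lambda item: item[1])
    return combo if share > threshold else None


# TOY counts for illustration only; not values from Supplementary Table S8.
# Combinations that cannot form the family (gray boxes) are simply absent.
print(dominant_combination({"GC": 60, "CG": 30, "AU": 10}))  # prints GC
print(dominant_combination({"AU": 40, "UA": 35, "GC": 25}))  # prints None
```

Using a strict `>` comparison matches the ">50%" wording, so a combination at exactly half of the instances is not reported as dominant.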

References

    1. Leontis NB, Stombaugh J, Westhof E. The non-Watson-Crick base pairs and their associated isostericity matrices. Nucleic Acids Res. 2002;30:3497–3531.
    2. Leontis NB, Westhof E. Geometric nomenclature and classification of RNA base pairs. RNA. 2001;7:499–512.
    3. Griffiths-Jones S, Bateman A, Marshall M, Khanna A, Eddy SR. Rfam: an RNA family database. Nucleic Acids Res. 2003;31:439–441.
    4. Wuyts J, Perriere G, Van De Peer Y. The European ribosomal RNA database. Nucleic Acids Res. 2004;32:D101–D103.
    5. Deshpande N, Addess KJ, Bluhm WF, Merino-Ott JC, Townsend-Merino W, Zhang Q, Knezevich C, Xie L, Chen L, Feng Z, et al. The RCSB Protein Data Bank: a redesigned query system and relational database based on the mmCIF schema. Nucleic Acids Res. 2005;33:D233–D237.