Protein–RNA interactions: exploring binding patterns with a three-dimensional superposition analysis of high resolution structures

Journal Article

Department of Biochemistry and Cell Biology, Rice University

6100 Main Street, Houston, TX 77005, USA

*To whom correspondence should be addressed.

†The authors wish it to be known that, in their opinion, the first two authors should be regarded as joint First Authors.

§Present address: Department of Epidemiology and Nutrition, Harvard School of Public Health, Boston, MA, USA

Associate Editor: Anna Tramontano


Revision received: 31 August 2006

Published: 11 September 2006

N. Morozova, J. Allers, J. Myers, Y. Shamoo, Protein–RNA interactions: exploring binding patterns with a three-dimensional superposition analysis of high resolution structures, Bioinformatics, Volume 22, Issue 22, November 2006, Pages 2746–2752, https://doi.org/10.1093/bioinformatics/btl470

Abstract

Motivation: The recognition of specific RNA sequences and structures by proteins is critical to our understanding of RNA processing, gene expression and viral replication. The diversity of RNA structures suggests that RNA recognition is substantially different from that of DNA.

Results: The atomic coordinates of 41 protein–RNA complexes have been used to probe composite nucleoside binding pockets that form the structural and chemical underpinnings of base recognition. Composite nucleoside binding pockets were constructed using three-dimensional superpositions of each RNA nucleoside. Unlike protein–DNA interactions, which are dominated by accessibility, RNA recognition frequently occurs in non-canonical and single-strand-like structures that allow interactions from a much wider set of geometries and make fuller use of unique base shapes and hydrogen-bonding ability. Constructing composites that include all van der Waals, hydrogen-bonding, stacking and general non-polar interactions made to a particular nucleoside makes the recognition strategies readily visible. Protein–RNA interactions can result in the formation of a glove-like tight binding pocket around RNA bases, but the size, shape and non-polar binding patterns differ among the RNA bases. We show that adenine can be distinguished from guanine based on the size and shape of the binding pocket and steric exclusion of the guanine N2 exocyclic amino group. The unique shape and hydrogen-bonding pattern of each RNA base allow proteins to make specific interactions through a very small number of contacts, in some cases as few as two.

Availability: The program ENTANGLE is available from Author Webpage

Contact: shamoo@rice.edu

1 INTRODUCTION

The stereochemical principles underlying RNA recognition have been the subject of intense interest in the RNA community. A better understanding of these principles is necessary for continued progress in protein engineering and in the structure–function analysis of protein–RNA interactions (Draper, 1999). Studies of protein–DNA (Jones et al., 1999; Luscombe et al., 2000, 2001; Mandel-Gutfreund et al., 1995; Nadassy et al., 1999; Pabo and Nekludova, 2000) and protein–protein (Ippolito et al., 1990; Jones and Thornton, 1997; Salwinski and Eisenberg, 2003; Xenarios et al., 2000, 2002) interactions using high resolution structures have provided new insights and methodological approaches for the study of protein–RNA interactions (Allers and Shamoo, 2001; Cheng et al., 2003; Cheng and Frankel, 2004; Hermann and Westhof, 1999; Jones et al., 2001; Treger and Westhof, 2001; Walberer et al., 2003). Three-dimensional analysis has been very successful in elucidating protein–DNA interactions but requires a sufficiently large pool of moderately high resolution structures. For example, Luscombe et al. (2001) used 129 protein–DNA structures to identify generic rules of nucleotide recognition by focusing on van der Waals, hydrogen-bonding and water-mediated interactions. With the recent increase in the number of available protein–RNA structures, three-dimensional structural bioinformatic approaches can now provide useful insights into specific RNA recognition. It has already been shown that RNA recognition differs substantially from DNA recognition, with a greater percentage of protein–nucleic acid interactions made to the RNA base edge and sugar than to the phosphate backbone (Allers and Shamoo, 2001; Lejeune et al., 2005). These differences highlight the importance of analyses designed specifically for protein–RNA complexes.

To date, most studies have emphasized the importance of hydrogen bonding in base recognition and have demonstrated that, in principle, these interactions could be largely sufficient for specificity (Allers and Shamoo, 2001; Cheng et al., 2003; Jeong et al., 2003; Treger and Westhof, 2001). Good agreement between predicted and observed protein–RNA hydrogen-bonding interactions has been established by combining molecular dynamics with identification of the proposed hydrogen-bonding stereochemistries in the Protein Data Bank (Cheng et al., 2003; Walberer et al., 2003). In addition, Allers and Shamoo (2001) and Treger and Westhof (2001) showed that protein main chain interactions make up >32% of the possible hydrogen bonds to RNA and suggested that these main chain interactions help form a tight binding pocket. Jeong et al. (2003) examined the direct and water-mediated hydrogen-bonding properties of both amino acids and RNA bases and showed that histidine, arginine, threonine and lysine have the highest propensity to bind RNA, while uracil has the highest propensity to bind protein, followed by adenine, guanine and cytosine. These bioinformatic findings support a model for RNA–protein interactions in which non-helical RNA conformations make bases much more accessible than in canonical A- or B-form helices. The exposed bases provide excellent recognition features for proteins and allow a much richer ensemble of interactions. The composite base binding pockets constructed for this work illustrate how these recognition features are used and reflect the unique properties of folded RNAs.

We used ENTANGLE, a Java-based program developed in this laboratory, to build an interaction database of protein binding pockets around RNA bases from 41 protein–RNA complexes at <2.8 Å resolution. ENTANGLE uses structures in PDB format to identify potential hydrogen-bonding, stacking, electrostatic, non-polar and van der Waals interactions (Allers and Shamoo, 2001). We previously used this program to study only hydrogen-bonding and electrostatic interactions; since then, we have performed three-dimensional superpositions of all interactions made to each nucleoside in the 41 selected structures. Our program differs from AANT, a similar program developed by Hoffman et al. (2004) that identifies how individual amino acids tend to hydrogen bond to specific nucleosides, in that we identify how the entire population of hydrogen-bonding, van der Waals, non-polar and stacking interactions forms base recognition surfaces. Because the pucker of the ribose sugar is difficult to determine at ∼2.8 Å resolution, this study was restricted to the base (Dodson et al., 1996). We excluded symmetry-related crystallographic chains from the collection of structures to avoid biasing the database towards crystal forms that contain multiple copies of the same protein in the asymmetric unit.

Discrimination among nucleosides is a key issue in RNA recognition. This study suggests that complex binding pockets use shape complementation from van der Waals and non-polar contacts to provide a tight binding surface around the base, while critical hydrogen bond donors/acceptors stud the pocket at appropriate positions, allowing different bases to be distinguished. An interesting question in protein recognition of ligands is how proteins distinguish between two very similar structures. Nobeli et al. (2001) studied the molecular recognition and discrimination of adenine and guanine in DNA, focusing on the differences in the hydrogen-bonding patterns of these two bases to proteins. They concluded that the protein environments of these two ligands are significantly different and that substituting adenine for guanine leads to a severe loss of essential hydrogen-bonding interactions. As in DNA, hydrogen bonding to adenine and guanine in RNA shows a distinct pattern. We extended this analysis to cytosine and uracil and observed the same marked difference in hydrogen-bonding patterns. Although the smaller pyrimidine bases could readily fit into purine recognition pockets, they would be less able to satisfy the pattern of hydrogen bond donors/acceptors presented.

Our results show that proteins bind RNA by forming tight pockets of non-polar and van der Waals contacts that can extend to available solvent-exposed surfaces. In many cases, a protein atom is positioned near the adenine C2 or pyrimidine C5 to sterically exclude other bases. In addition to the selection of particular bases through hydrogen-bonding patterns, steric exclusion is a powerful means of base discrimination that is often overlooked in the analysis of protein–RNA interactions.

2 METHODS

2.1 Construction of protein–RNA interaction database

Forty-one non-redundant entries were selected from the Protein Data Bank (PDB). Each entry contained both protein and RNA and had been refined to ∼2.8 Å resolution. The PDB entries and chains selected were: 1HC8 (B & D), 1I6U (A & C), 1G1X (B & E), 1A34, 1A9N (A & Q), 1AV6, 1B23, 1C0A, 1CVJ (A & M), 1G2E, 1URN (A & P), 2UP1, 1A1V, 1B7F (A & P), 1C9S (U & W), 1DI2 (A & C, A & D, A & E), 1DK1, 1DUL, 1EC6 (A & D, B & C), 1EXD, 1TTT (A & D, A & B), 2A8V (B & E), 2BBV (C & N), 2FMT (A & C), 1E7X (A & R), 1QU2, 1MMS (A & C), 1DFU, 1JJ2, 1M5K (A & C), 1KQ2 (B & R), 1KNZ (A & W, B & W), 1JBS (A & C), 1G59 (A & B), 1K8W, 1IL2 (A & C), 1GTN (V & W), 1I5L (E & U, F & U), 1LNG, 1JID and 1HQ1. NMR structures were not included because the accuracy of an NMR ensemble is difficult to express as a coordinate displacement directly comparable to that of the X-ray diffraction studies. As in Allers and Shamoo (2001), when a PDB entry contained multiple copies of the complex in the asymmetric unit, only one copy was used; excluding these copies reduces the likelihood of inferring interaction patterns from a biased dataset. The PDB entries were analyzed with ENTANGLE (Allers and Shamoo, 2001). The analysis was restricted to interactions with the RNA bases; interactions with the phosphate and ribose backbone were therefore deleted from the database. The resulting database was built in MySQL and accessed using the Structured Query Language (SQL) and Java Database Connectivity (JDBC). The 3D plots in this article were made using RIBBONS (Carson, 1991).
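The selection and filtering steps described above can be sketched in a few lines. This is a minimal illustration, not ENTANGLE's code: the `Entry` and `Contact` record types, their field names, the method string "X-RAY DIFFRACTION" and the hypothetical `complex_copy` flag marking the first copy of a complex in the asymmetric unit are all assumptions made for the example. Backbone atoms are recognized by PDB naming conventions: primed ribose atom names (written with `*` in older files) and the phosphate atom names.

```python
from dataclasses import dataclass

# Hypothetical record types for illustration; not ENTANGLE's internal schema.
@dataclass(frozen=True)
class Entry:
    pdb_id: str
    method: str          # experimental method, e.g. "X-RAY DIFFRACTION"
    resolution: float    # resolution in Å
    complex_copy: int    # 1 = first protein–RNA copy in the asymmetric unit

@dataclass(frozen=True)
class Contact:
    pdb_id: str
    rna_atom: str        # PDB atom name on the RNA side, e.g. "N6" or "C1'"
    kind: str            # "hbond", "vdw", "nonpolar", ...

# Phosphate atom names (current and legacy PDB conventions).
PHOSPHATE_ATOMS = {"P", "OP1", "OP2", "OP3", "O1P", "O2P", "O3P"}

def is_base_atom(name: str) -> bool:
    """Base atoms are unprimed; ribose atoms carry ' (or * in older files)."""
    return name not in PHOSPHATE_ATOMS and "'" not in name and "*" not in name

def select_entries(entries, max_resolution=2.8):
    """Keep X-ray entries at or below the cutoff, one copy of each complex."""
    return [e for e in entries
            if e.method == "X-RAY DIFFRACTION"
            and e.resolution <= max_resolution
            and e.complex_copy == 1]

def base_contacts(contacts, kept_ids):
    """Drop contacts from rejected entries and to the phosphate/ribose backbone."""
    return [c for c in contacts
            if c.pdb_id in kept_ids and is_base_atom(c.rna_atom)]

# Toy data with made-up identifiers.
entries = [
    Entry("TOY1", "X-RAY DIFFRACTION", 2.1, 1),
    Entry("TOY1", "X-RAY DIFFRACTION", 2.1, 2),   # second copy in the asymmetric unit
    Entry("TOY2", "X-RAY DIFFRACTION", 3.2, 1),   # resolution worse than the cutoff
    Entry("TOY3", "SOLUTION NMR", float("nan"), 1),
]
kept_ids = {e.pdb_id for e in select_entries(entries)}
contacts = [Contact("TOY1", "N6", "hbond"), Contact("TOY1", "O2'", "hbond"),
            Contact("TOY1", "OP1", "vdw"), Contact("TOY2", "N7", "hbond")]
print(sorted(kept_ids), base_contacts(contacts, kept_ids))
```

Only the contact to the base atom N6 of the retained entry survives; the 2'-OH and phosphate contacts are dropped, mirroring the deletion of backbone interactions described above.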

2.2 ENTANGLE classification of interaction cutoffs

ENTANGLE is a Java program that classifies and sorts potential protein–nucleic acid interactions and has been described previously (Allers and Shamoo, 2001). For this study, ENTANGLE was extended to apply linear transformations to the atomic coordinates of each PDB entry so that all RNA bases in the database are overlapped in the same position and orientation. This permits superposition of all interactions made to a given type of RNA base, whether the bases occupy different positions and orientations within a single PDB entry or come from multiple entries. The hydrogen-bonding cutoffs were deliberately generous to give a comprehensive view of the stereochemical distributions with respect to distance and geometry: donor–acceptor distance ≤3.9 Å; hydrogen–acceptor distance ≤2.5 Å; donor–hydrogen–acceptor angle >90°. ENTANGLE sets the van der Waals interaction distance to the sum of the van der Waals radii of the two atoms plus 0.8 Å. As previously discussed by Allers and Shamoo (2001), the van der Waals radii are those of the AMBER 94 force field (Cornell et al., 1995). Non-polar contacts are defined in ENTANGLE as hydrophobic interactions between atoms that are ≤5 Å apart. The classification and assignment of putative interactions is based entirely on geometry; there is no energetic evaluation or ranking of the interactions. ENTANGLE is available from the Author Webpage. A Linux version of ENTANGLE and our interaction database are also available (shamoo@rice.edu).
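The geometric rules above translate directly into code. The sketch below applies the quoted cutoffs to Cartesian coordinates; it is an illustration, not ENTANGLE's implementation. Two inputs are assumptions: the radii in `VDW_RADIUS` are rounded placeholders rather than the AMBER 94 parameters, and treating only carbon and sulfur as non-polar atoms is our guess at what counts as a hydrophobic contact. Because a pair can satisfy more than one rule (under these placeholder radii, every van der Waals contact between carbons also lies within 5 Å), `contact_labels` returns a set of labels rather than a single category.

```python
import math

# Cutoffs quoted in the text.
MAX_DONOR_ACCEPTOR = 3.9   # Å
MAX_H_ACCEPTOR = 2.5       # Å
MIN_DHA_ANGLE = 90.0       # degrees, donor–hydrogen–acceptor
VDW_SLACK = 0.8            # Å added to the sum of the two radii
MAX_NONPOLAR = 5.0         # Å

# Placeholder radii (Å); substitute the AMBER 94 values for real use.
VDW_RADIUS = {"C": 1.9, "N": 1.8, "O": 1.7, "S": 2.0}
NONPOLAR_ELEMENTS = {"C", "S"}   # assumption about which atoms are hydrophobic

def angle_deg(a, b, c):
    """Angle a-b-c in degrees, with the vertex at b."""
    v1 = [x - y for x, y in zip(a, b)]
    v2 = [x - y for x, y in zip(c, b)]
    cos = sum(p * q for p, q in zip(v1, v2)) / (math.hypot(*v1) * math.hypot(*v2))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos))))

def is_hbond(donor, hydrogen, acceptor):
    """Apply the distance and angle cutoffs for a putative hydrogen bond."""
    return (math.dist(donor, acceptor) <= MAX_DONOR_ACCEPTOR
            and math.dist(hydrogen, acceptor) <= MAX_H_ACCEPTOR
            and angle_deg(donor, hydrogen, acceptor) > MIN_DHA_ANGLE)

def contact_labels(elem1, xyz1, elem2, xyz2):
    """Return every non-hydrogen-bond category the atom pair satisfies."""
    labels = set()
    d = math.dist(xyz1, xyz2)
    if d <= VDW_RADIUS[elem1] + VDW_RADIUS[elem2] + VDW_SLACK:
        labels.add("vdw")
    if elem1 in NONPOLAR_ELEMENTS and elem2 in NONPOLAR_ELEMENTS and d <= MAX_NONPOLAR:
        labels.add("nonpolar")
    return labels

# Linear N-H...O geometry: passes all three hydrogen-bond tests.
print(is_hbond((0, 0, 0), (1.0, 0, 0), (2.9, 0, 0)))
# Bent geometry (angle at H of about 70 degrees): rejected by the angle test.
print(is_hbond((0, 0, 0), (1.0, 0, 0), (0.2, 2.2, 0)))
print(contact_labels("C", (0, 0, 0), "C", (4.0, 0, 0)))
```

Since the text states that classification is purely geometric, the sketch likewise performs no energy evaluation; a real analysis would also need explicit hydrogen positions, which crystal structures at this resolution usually lack and must be modelled.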

3 RESULTS

This work presents an aggregate approach to studying protein–RNA interactions from an ensemble of moderately high resolution structures. Protein–RNA interactions from 41 structures were combined into a single database and used to construct composite nucleoside binding pockets for adenine, guanine, uracil and cytosine (Fig. 1). The strength of representing the data in this way is the relative ease of interpretation: thousands of interactions can be seen to cluster into specific stereochemical geometries that confer specificity through a combination of complementary hydrogen bond donor/acceptor relationships and close-fitting van der Waals interactions. Because RNA molecules can adopt a wider array of conformations than duplex DNA, the interactions to RNA are distributed relatively evenly throughout the volume surrounding the base. The data also indicate which atoms have more of their surface available for interaction with the protein (Tables 1–4, Fig. 2). Since protein–RNA interactions often involve co-folding, the ability to form an RNA recognition pocket is critical for proper RNA recognition. Figure 1 shows the formation of these pockets upon ligand binding.
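The composite pockets depend on expressing every protein atom in a coordinate frame attached to its own base, so that bases from different positions and different entries coincide. The text later notes that three atoms are used to define the base plane; a minimal sketch of that idea, assuming an orthonormal frame with its origin on the first atom, its x axis toward the second and its z axis normal to the plane of all three, is shown below. Which three atoms to use is an assumption, the toy coordinates are not real purine geometry, and ENTANGLE's actual transformation may differ.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit(a):
    n = math.sqrt(dot(a, a))
    return tuple(x / n for x in a)

def base_frame(p1, p2, p3):
    """Origin at p1, x toward p2, z normal to the p1-p2-p3 plane, y = z cross x."""
    x = unit(sub(p2, p1))
    z = unit(cross(sub(p2, p1), sub(p3, p1)))
    y = cross(z, x)
    return p1, (x, y, z)

def to_frame(point, frame):
    """Coordinates of `point` along the frame's three axes."""
    origin, axes = frame
    d = sub(point, origin)
    return tuple(dot(d, axis) for axis in axes)

def superpose(base_atoms, frame_atoms, protein_atoms):
    """Express protein atom coordinates in the frame of one base."""
    frame = base_frame(*(base_atoms[name] for name in frame_atoms))
    return [to_frame(p, frame) for p in protein_atoms]

# Toy coordinates (not real purine geometry): the same base and contact placed
# twice, the second copy rotated 90 degrees about z and translated.
site_a = {"N9": (1, 2, 3), "C4": (2, 2, 3), "C8": (1, 3, 3)}
site_b = {"N9": (10, 10, 10), "C4": (10, 11, 10), "C8": (9, 10, 10)}
print(superpose(site_a, ("N9", "C4", "C8"), [(2, 3, 4)]))
print(superpose(site_b, ("N9", "C4", "C8"), [(9, 11, 11)]))
```

Both calls return the same frame coordinates, which is the property that lets contacts from many structures be pooled into one composite pocket.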

Fig. 1

Composite nucleoside binding pockets produced by superposition of each base from 41 high resolution protein–RNA structures, illustrating the stereochemical interactions responsible for base recognition: (a) hydrogen bond donor and acceptor positions within 2.5 Å of the plane of each base; (b) hydrogen-bonding and van der Waals contacts; (c) van der Waals, non-polar and hydrogen-bonding interactions; (d) van der Waals, non-polar and hydrogen-bonding interactions after a 90° rotation of the base, showing the orthogonal cross section. Atomic interactions are classified by the program ENTANGLE. Bases are shown as ball-and-stick models with nitrogen (blue), oxygen (red), carbon (yellow) and hydrogens used as donors (light blue). Protein atoms interacting with the base are shown as spheres: hydrogen bond donors (blue); hydrogen bond acceptors (red); van der Waals contacts (green); non-polar contacts (gray). The figure was made using RIBBONS (Carson, 1991). There is a marked increase in the number of non-polar contacts made to the adenine C2, and these help define a specific adenine recognition surface in many proteins. Ninety-eight van der Waals contacts to the adenine C2 were observed, more than the number of hydrogen bonds observed for either adenine N6 or N7. Van der Waals and non-polar contacts are relatively homogeneous in distribution except at the point where the glycosidic bond joins the base (arrow).

Fig. 2

Discrimination between adenine and guanine can be achieved through specific hydrogen-bonding contacts, as shown in Figure 1, or by making close contacts to the C2 position of adenine. This figure shows the van der Waals contacts made to adenine (black) and guanine (gray), with guanine shown as a ball-and-stick model. The exocyclic N2 amino group of guanine (arrow) would clash with many of the contacts made in an adenine binding pocket, which would discriminate against guanine binding. Steric exclusion was frequently observed in ribosomal protein–RNA interactions and, in combination with one or two hydrogen bonds, can provide excellent specificity.

Table 1

Purine hydrogen bonds and van der Waals (vdW) contacts listed by atom

Adenine
Atom   vdW   H-bonds
N1     79    24
C2     98    N/A
N3     69    17
C4     40    N/A
C5     53    N/A
C6     37    N/A
N6     84    40
N7     62    10
C8     49    N/A
N9     28    N/A

Guanine
Atom   vdW   H-bonds
N1     26    21
N2     74    86
C2     24    N/A
N3     37    17
C4     22    N/A
C5     21    N/A
C6     19    N/A
O6     67    41
N7     48    29
C8     44    N/A
N9     21    N/A

Table 2

Pyrimidine hydrogen bonds and van der Waals (vdW) contacts listed by atom

Cytosine
Atom   vdW   H-bonds
N1     14    N/A
C2     44    N/A
O2     98    49
N3     42    21
C4     30    N/A
N4     50    26
C5     39    N/A
C6     19    N/A

Uracil
Atom   vdW   H-bonds
N1     25    N/A
C2     42    N/A
O2     74    24
N3     53    17
C4     43    N/A
O4     67    22
C5     44    N/A
C6     24    N/A

Table 3

Purine hydrogen bond and van der Waals (vdW) contact frequencies listed by atom

Adenine
Atom   vdW    H-bonds
N1     0.37   0.11
C2     0.46   N/A
N3     0.32   0.08
C4     0.19   N/A
C5     0.25   N/A
C6     0.17   N/A
N6     0.39   0.19
N7     0.29   0.05
C8     0.23   N/A
N9     0.13   N/A

Guanine
Atom   vdW    H-bonds
N1     0.12   0.09
N2     0.33   0.39
C2     0.11   N/A
N3     0.17   0.03
C4     0.1    N/A
C6     0.09   N/A
O6     0.3    0.18
N7     0.22   0.13
C8     0.2    N/A
N9     0.09   N/A

Table 4

Pyrimidine hydrogen bond and van der Waals (vdW) contact frequencies listed by atom

Cytosine
Atom   vdW    H-bonds
N1     0.08   N/A
C2     0.26   N/A
O2     0.58   0.29
N3     0.25   0.13
C4     0.18   N/A
N4     0.3    0.16
C5     0.23   N/A
C6     0.11   N/A

Uracil
Atom   vdW    H-bonds
N1     0.19   N/A
C2     0.3    N/A
O2     0.54   0.18
N3     0.39   0.16
C4     0.32   N/A
O4     0.49   0.16
C5     0.32   N/A
C6     0.18   N/A

3.1 Overall view from nucleoside binding pocket composites

As described in Methods, we used ENTANGLE to analyze protein–RNA complexes in PDB format and sorted the interactions by type: van der Waals, non-polar and hydrogen bonding. The non-polar contacts were further examined for aromatic base stacking. There were 8469 putative interactions in our database: 6242 non-polar contacts, 1710 van der Waals contacts, 433 hydrogen bonds to all 20 amino acids and 84 base stacking interactions with arginine, phenylalanine, tyrosine, tryptophan and histidine (Table 5).

Table 5

Base- and interaction-specific distribution of interactions in the 41 protein–RNA structures used

Type of interaction            Number of interactions
                               A      G      C      U      All
Non-polar contacts (≤5 Å)      2413   1498   1098   1233   6242
van der Waals                  599    404    336    371    1710
All H-bonds                    91     183    96     63     433
H-bonds (protein donor)        40     107    26     17     190
H-bonds (protein acceptor)     51     76     70     46     243
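The counts in Table 5 can be cross-checked and re-expressed as proportions with a few lines of code. The sketch below transcribes the table by hand; the dictionary keys are our own labels. It verifies that each row's base-specific counts add up to the "All" column and that protein-donor plus protein-acceptor hydrogen bonds equal all hydrogen bonds for every base, then reports the fraction of each base's hydrogen bonds in which the protein acts as donor.

```python
# Table 5 transcribed by hand: counts for A, G, C, U, then the "All" column.
TABLE5 = {
    "nonpolar": (2413, 1498, 1098, 1233, 6242),
    "vdw": (599, 404, 336, 371, 1710),
    "hbond_all": (91, 183, 96, 63, 433),
    "hbond_protein_donor": (40, 107, 26, 17, 190),
    "hbond_protein_acceptor": (51, 76, 70, 46, 243),
}
BASES = ("A", "G", "C", "U")

def check_table(table):
    """Raise AssertionError if rows or hydrogen-bond subtypes fail to add up."""
    for row, counts in table.items():
        assert sum(counts[:4]) == counts[4], f"row {row} does not sum to All"
    donors = table["hbond_protein_donor"]
    acceptors = table["hbond_protein_acceptor"]
    for d, a, total in zip(donors, acceptors, table["hbond_all"]):
        assert d + a == total, "donor + acceptor != all H-bonds"

def protein_donor_fraction(table):
    """Fraction of each base's hydrogen bonds in which the protein is the donor."""
    return {base: table["hbond_protein_donor"][i] / table["hbond_all"][i]
            for i, base in enumerate(BASES)}

check_table(TABLE5)
for base, frac in protein_donor_fraction(TABLE5).items():
    print(f"{base}: {frac:.2f}")
```

On these counts the protein is the donor in more than half of the hydrogen bonds to guanine (107 of 183) but in only about a quarter of those to cytosine (26 of 96) and uracil (17 of 63).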

Plotting the contents of our database showed that protein atoms interacting with RNA can form a tight binding pocket around the RNA base. Figure 1 shows cross-sections of the binding pocket of each RNA base, illustrating how the non-polar, van der Waals and hydrogen-bonding contacts that make up each pocket are arranged in three-dimensional space. Shape complementation of the nucleoside recognition pocket is best seen in cross-sections through the plane of the base showing van der Waals interactions (Fig. 1). The general shape of the pocket is also visible in the non-polar contacts, but these can be misleading because the number of putative contacts scales with the distance cutoff. The interior of the binding pocket consists primarily of van der Waals and non-polar contacts, which are especially abundant above and below the plane of the base, where the donor–acceptor geometry of potential hydrogen-bonding groups would be poor. Non-polar and van der Waals interactions are distributed nearly evenly around the base edge, with a gap where the ribose attaches to the base.
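Once contacts are expressed in a base-centred frame whose z axis is normal to the base plane, the in-plane versus above/below distribution described here can be quantified. The sketch below assumes such coordinates are already available as `(kind, (x, y, z))` pairs and borrows the 2.5 Å slab used for Fig. 1a as its default; the toy contacts are invented for the example.

```python
from collections import Counter

def plane_profile(contacts, slab=2.5):
    """Fraction of each contact type lying within `slab` Å of the base plane (z = 0).

    `contacts` holds (kind, (x, y, z)) pairs in a base-centred frame whose
    z axis is the normal to the base plane.
    """
    totals, in_plane = Counter(), Counter()
    for kind, (_, _, z) in contacts:
        totals[kind] += 1
        if abs(z) <= slab:
            in_plane[kind] += 1
    return {kind: in_plane[kind] / totals[kind] for kind in totals}

# Invented contacts: hydrogen bonds near the plane, packing contacts above/below it.
toy = [
    ("hbond", (3.0, 0.0, 0.2)),
    ("hbond", (0.0, 3.0, -0.4)),
    ("vdw", (0.0, 0.0, 3.5)),
    ("vdw", (1.0, 1.0, 0.5)),
    ("nonpolar", (0.0, 0.0, -4.0)),
]
print(plane_profile(toy))
```

Widening `slab` shows the same cutoff sensitivity noted for the non-polar contacts: the in-plane fraction of any category grows as the slab thickens.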

Amino acids with hydrogen-bonding potential cluster at the donor or acceptor positions of the base and lie in the base plane, where their hydrogen-bonding geometry is optimal. Figure 1 shows that, unlike non-polar and van der Waals interactions, hydrogen bonds are not distributed evenly around the RNA bases. Instead, hydrogen bond donor/acceptor groups cluster in the plane of the base at specific positions, namely the N1 and N6 of adenine, the N2 and O6 of guanine, the O2, N3 and N4 of cytosine and the O2, N3 and O4 of uracil. As shown in Figure 1, protein hydrogen bond donors group into two populations about the uracil O4, consistent with the two lone pairs of the carbonyl oxygen. Protein hydrogen bond acceptors likewise cluster into two populations about the cytosine N4, adenine N6 and guanine N2. The measured hydrogen-bond distances of 2.8–3.0 Å agree well with the average protein–protein hydrogen-bonding distance of 2.8–3.2 Å reported by Ippolito et al. (1990) and with distances observed in protein–DNA interactions (Mandel-Gutfreund et al., 1995, 1998). Stacking interactions between nucleosides and phenylalanine, tyrosine, tryptophan, histidine and arginine were also surveyed; of these, only stacking with phenylalanine, tyrosine and arginine occurred more than 10 times. Although stacking interactions between aromatic amino acid side chains and nucleosides are less common, they are important determinants of recognition by the common RNA recognition motif (RRM) found in many eukaryotic RNA-binding proteins (Maris et al., 2005).
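The two populations of donors around the uracil O4 can be separated by which side of the C4=O4 bond axis each donor falls on. A minimal sketch, assuming coordinates in a base-centred frame with the base in the xy plane; the toy C=O bond length of 1.23 Å and the donor positions at ±60° from the bond axis and 2.9 Å from the oxygen are illustrative values, not measurements from the database.

```python
import math

def carbonyl_side(c_atom, o_atom, donor):
    """Classify a donor by its in-plane position relative to a C=O bond.

    Returns (side, angle, distance): side is "left" or "right" of the C->O
    axis viewed down the +z normal, angle is the in-plane deviation (degrees)
    of the O->donor direction from that axis, and distance is O...donor (Å).
    Coordinates are assumed to lie in a frame where the base is the xy plane.
    """
    ax, ay = o_atom[0] - c_atom[0], o_atom[1] - c_atom[1]
    vx, vy = donor[0] - o_atom[0], donor[1] - o_atom[1]
    angle = math.degrees(math.atan2(ax * vy - ay * vx, ax * vx + ay * vy))
    if angle > 0:
        side = "left"
    elif angle < 0:
        side = "right"
    else:
        side = "on-axis"
    return side, abs(angle), math.dist(o_atom, donor)

# Toy geometry: C4 at the origin, O4 along +x.
c4, o4 = (0.0, 0.0, 0.0), (1.23, 0.0, 0.0)
for sign in (1, -1):
    t = math.radians(60 * sign)
    donor = (1.23 + 2.9 * math.cos(t), 2.9 * math.sin(t), 0.0)
    print(carbonyl_side(c4, o4, donor))
```

Tallying the `side` labels over all donors to a given carbonyl would separate the two populations seen in the composite.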

All these interactions together form the composite nucleoside binding pocket. For any given structure, the number and types of interactions will be a subset of those shown; the composite data in Figure 1 display the full range of strategies available to proteins for RNA recognition.

3.2 Discrimination of purines from pyrimidines

Although purines might be sterically excluded from a pyrimidine binding pocket through close contacts to the pyrimidine C5–C6 edge, few van der Waals interactions contact C5–C6 (Fig. 1), which suggests that discrimination relies on another mechanism. Conversely, close contacts to the purine five-membered ring might be expected to exclude pyrimidines from a purine binding pocket: the glycosidic bond connecting the base to the sugar constrains the base, so a pyrimidine would place its larger six-membered ring where the purine five-membered ring lies, producing many possible steric clashes. This, however, is not observed. It is likely that the position of the glycosidic bond makes it difficult to place a close contact to either the purine C8 or the pyrimidine C6, which may account for the paucity of observed interactions.

In contrast, exclusion of a pyrimidine from a purine binding pocket cannot easily be achieved by shape alone. Both cytosine and uracil can, in principle, fit into a surface able to accommodate the larger purine, but both place a hydrogen bond acceptor carbonyl group into the position where the guanine N2 amino group would be. As shown in Figure 1 and by the statistical analyses of Allers and Shamoo (2001) and Treger and Westhof (2001), the N2 amino group of guanine is contacted more often than expected, which supports a critical role for this interaction in discriminating against pyrimidines. Conversely, an adenine recognition surface can readily discriminate against guanine and pyrimidines by steric exclusion at the guanine N2 or pyrimidine O2 position, as described in the next section.

3.3 Discrimination of adenine from guanine

Structure-based analyses of protein–RNA interactions have confirmed that RNA bases have distinct patterns of hydrogen bonding, much like their DNA counterparts (Allers and Shamoo, 2001; Jones et al., 2001; Treger and Westhof, 2001). Nobeli et al. (2001) concluded that proteins differentiate between adenine and guanine based on where hydrogen bonds cluster within the binding pocket. In addition to placing distinct hydrogen bond donors/acceptors opposite the adenine N1 and N6 or the guanine N1, O6 and N2, many proteins sterically hinder the guanine exocyclic N2 amino group by positioning a non-polar atom in contact with the adenine C2 (Fig. 2). As many as 98 van der Waals interactions to the adenine C2 were identified in this study, more than for any other adenine atom (Table 1). In addition to the guanine N2, the O6 position is also overrepresented in the available protein–RNA interaction databases, suggesting an essential role for this interaction. As shown in Figure 1, the clustering of hydrogen bond donors around the guanine O6 can discriminate against adenine, and in 48% of the interactions studied by Allers and Shamoo (2001) the guanine O6 and N7 are simultaneously contacted by an arginine guanidinium group. Barring a rearrangement of the binding pocket, a close contact to the purine C2 would preclude guanine binding and would serve as an efficient strategy for discriminating against guanine. As shown in Figure 2, steric exclusion of both the guanine N1 proton and the C2 amino group can be used to recognize the correct base.
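The steric argument can be made concrete: place a probe where a guanine N2 would sit on a superposed purine and count the adenine-pocket atoms that come too close to it. The sketch assumes an idealized planar ring in which the exocyclic substituent points radially away from the ring centroid, a C2–N2 bond length of about 1.34 Å and a clash threshold of 2.8 Å; all three are illustrative assumptions, as are the toy coordinates, and the function names are hypothetical.

```python
import math

def exocyclic_probe(ring_atom, ring_centroid, bond_length=1.34):
    """Idealized position of a substituent pointing radially out of a planar ring."""
    d = [a - c for a, c in zip(ring_atom, ring_centroid)]
    n = math.hypot(*d)
    return tuple(a + bond_length * x / n for a, x in zip(ring_atom, d))

def steric_clashes(pocket_atoms, probe, min_sep=2.8):
    """Pocket atoms closer than `min_sep` Å to the probe position."""
    return [p for p in pocket_atoms if math.dist(p, probe) < min_sep]

# Toy adenine frame: six-membered ring centred at the origin, C2 on the +x axis.
ring_centroid = (0.0, 0.0, 0.0)
adenine_c2 = (1.39, 0.0, 0.0)
pocket = [
    (4.4, 0.0, 0.5),    # packs against C2, where a guanine N2 would point
    (0.0, 4.0, 0.0),    # elsewhere on the base edge
    (-3.0, 0.0, 0.0),
]
n2_probe = exocyclic_probe(adenine_c2, ring_centroid)
clashes = steric_clashes(pocket, n2_probe)
print(n2_probe, clashes, "guanine excluded" if clashes else "guanine tolerated")
```

The atom packed against C2 is about 3 Å from that carbon, a normal van der Waals contact for adenine, yet under 2 Å from where the guanine N2 would be, which is the exclusion mechanism illustrated in Fig. 2.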

3.4 Discrimination of cytosine from uracil

A study of the composite pyrimidine recognition pockets also showed strong clustering of complementary hydrogen bond donors and acceptors in the plane of the nucleoside, with a distinctive pattern for each base comparable to that seen for the purines. Unlike the purines, however, cytosine and uracil differ only slightly in volume, and their exocyclic atoms occupy qualitatively identical positions. Whereas guanine-binding pockets are larger than adenine-binding pockets in order to accommodate the guanine N2, the pockets formed around cytosine and uracil are essentially identical in size because the two bases have similar volumes; consequently, we observed no role for steric exclusion in discriminating between the pyrimidines.
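Because cytosine and uracil carry exocyclic groups at the same ring positions, the difference a protein can read lies in the hydrogen-bonding roles at those positions. The sketch below compares the Watson–Crick-edge groups (cytosine O2, N3, N4-H2; uracil O2, N3-H, O4); the dictionary layout and function name are illustrative choices.

```python
# Hydrogen-bonding role of the Watson-Crick-edge groups at ring positions 2, 3, 4.
PYRIMIDINES = {
    "C": {"2": "acceptor", "3": "acceptor", "4": "donor"},     # O2, N3, N4-H2
    "U": {"2": "acceptor", "3": "donor",    "4": "acceptor"},  # O2, N3-H, O4
}

# Ring positions carrying an exocyclic substituent in each base.
EXOCYCLIC = {"C": {"2", "4"}, "U": {"2", "4"}}

def differing_positions(a, b):
    """Ring positions where two bases present different donor/acceptor roles."""
    return sorted(p for p in a if a[p] != b[p])

print(differing_positions(PYRIMIDINES["C"], PYRIMIDINES["U"]))  # ['3', '4']
print(EXOCYCLIC["C"] == EXOCYCLIC["U"])                         # True
```

The roles are inverted at N3 and at position 4 while the exocyclic positions coincide, consistent with discrimination between the pyrimidines relying on hydrogen bonding rather than steric exclusion.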

3.5 Stacking interactions

The stacking interactions of phenylalanine with purines are consistent with their potential energy surfaces (Pearlman and Kim, 1990; Sponer et al., 2001). In roughly half the cases in which a phenylalanine is part of a protein–RNA interface, it participates in a stacking interaction (Fig. 3, Table 6). Guanine has an electropositive field in the plane of the base at the N1/N2 edge that may restrict phenylalanine to positions centered over the electronegative N3, O6 and N7 atoms. In contrast, adenine–phenylalanine interactions are less clustered, reflecting the broader electronegative surface of adenine, where only the N6 exocyclic amino group and the C2 hydrogen are predicted to carry strong electropositive charges. Interestingly, as shown in Figure 3, phenylalanines stacked on purines lie either above or below the base. The composite orientation matrices are calculated using three atoms to define the plane of the base and three to define the plane of the amino acid side chain. Whether the phenylalanine appears 'above' (Fig. 3) or 'below' reflects whether the base adopts a syn or anti conformation, where syn and anti describe the orientation of the base relative to the sugar about the glycosidic bond. Although the stereochemical basis for this is unclear, in all stacking interactions of phenylalanine with purines there is a distinct preference for phenylalanine to stack with guanine in the syn conformation and with adenine in the anti conformation. This pattern recurs throughout the RRM-containing family of proteins, including poly(A)-binding protein, hnRNP A1 and snRNP U1A, but is not observed for stacking with tyrosine (not shown).
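The orientation analysis rests on defining a plane from three atoms and asking on which side of it a side chain lies. The paper does not say which three atoms are used or how 'above' is signed, so this minimal sketch assumes the plane normal follows the right-hand rule over the listed atom order; listing the same atoms in the opposite order flips the label. The toy coordinates are hypothetical.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit_normal(p1, p2, p3):
    """Unit normal of the plane through three atoms (right-hand rule over p1, p2, p3)."""
    n = cross(sub(p2, p1), sub(p3, p1))
    length = math.sqrt(dot(n, n))
    return tuple(x / length for x in n)

def side_of_plane(plane_atoms, point):
    """'above' if point lies on the side the normal points to, 'below' otherwise."""
    d = dot(sub(point, plane_atoms[0]), unit_normal(*plane_atoms))
    if d > 0:
        return "above"
    if d < 0:
        return "below"
    return "in plane"

def interplanar_angle(plane_a, plane_b):
    """Angle between two planes in degrees (0-90), independent of normal sign."""
    c = abs(dot(unit_normal(*plane_a), unit_normal(*plane_b)))
    return math.degrees(math.acos(min(1.0, c)))

# Toy coordinates (angstroms): three base atoms in the z = 0 plane and
# three phenylalanine ring atoms 3.5 A above them.
base = ((0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0))
phe_ring = ((0.0, 0.0, 3.5), (1.0, 0.0, 3.5), (0.0, 1.0, 3.5))
phe_centroid = (0.5, 0.5, 3.5)

print(side_of_plane(base, phe_centroid))  # above
print(interplanar_angle(base, phe_ring))  # 0.0
```

Because the sign depends on atom order, a consistent atom choice for every superposed base is what makes 'above' and 'below' comparable across structures.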
Using structures determined by X-ray crystallography, Myers and Shamoo (2004) found that when either 2-aminopurine or nebularine was placed in the guanine binding pocket of a protein with a phenylalanine stacking interaction, the base flipped from the syn conformation seen for guanine into the anti conformation. These substitutions eliminated a hydrogen bond from a lysine to the O6 position and, perhaps more importantly, altered the electronegative surface of the base. Most examples of base stacking with aromatic side chains come from the RRM family of proteins, with surprisingly few from the much larger collection of other RNA-binding proteins, including the ribosome. This bias of stacking interactions toward RRM-containing proteins may explain why guanine and adenine appear to stack with phenylalanine in an oriented fashion. Although stacking interactions are fairly weak and short-range, they can be important for specificity, as in the case of the mRNA cap-binding protein vp39, which binds N7-methylguanine via precise stacking interactions between a tyrosine and a phenylalanine (Hu et al., 2003).
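Syn and anti are conventionally assigned from the glycosidic torsion angle χ, defined for purines as O4'–C1'–N9–C4. This sketch computes χ with a standard dihedral formula and labels the base syn when −90° < χ < 90° and anti otherwise; that cutoff is a common convention, not a criterion stated by the authors, and the toy coordinates are not taken from any structure.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Torsion angle p0-p1-p2-p3 in degrees, between -180 and 180."""
    b0 = sub(p0, p1)
    b1 = sub(p2, p1)
    b2 = sub(p3, p2)
    norm = math.sqrt(dot(b1, b1))
    b1 = tuple(x / norm for x in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = sub(b0, tuple(dot(b0, b1) * x for x in b1))
    w = sub(b2, tuple(dot(b2, b1) * x for x in b1))
    return math.degrees(math.atan2(dot(cross(b1, v), w), dot(v, w)))

def glycosidic_conformation(o4p, c1p, n9, c4):
    """Return ('syn' or 'anti', chi) for a purine from chi = O4'-C1'-N9-C4."""
    chi = dihedral(o4p, c1p, n9, c4)
    return ("syn" if -90.0 < chi < 90.0 else "anti"), chi

# Toy geometry: only C4 moves between the two cases.
o4p, c1p, n9 = (1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)
print(glycosidic_conformation(o4p, c1p, n9, (-1.0, 1.0, 0.0))[0])  # anti
print(glycosidic_conformation(o4p, c1p, n9, (1.0, 1.0, 0.0))[0])   # syn
```

Applied to real coordinates, a flip such as the guanine-to-2-aminopurine change described above would appear as χ moving from the syn range into the anti range.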

Fig. 3. Base stacking interactions made by phenylalanine. Phenylalanine is the most common aromatic amino acid engaged in stacking (phe = 17; tyr = 11; his = 9; trp = 4 interactions). Phenylalanine stacking interactions that are exclusively above (adenine) or below (guanine) reflect the orientation of the bases in either the anti (adenine) or syn (guanine) conformation.

Table 6. Distribution of stacking interactions

Residue          Adenine  Uracil  Cytosine  Guanine
Phenylalanine    7        4       2         4
Histidine        6        3       0         0
Arginine         13       12      15        3
Tyrosine         5        3       2         1
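The counts in Table 6 can be tallied directly. The row sums reproduce the phe = 17, tyr = 11 and his = 9 totals quoted in the Fig. 3 caption (tryptophan does not appear in the table), and the guanine column shows how rarely arginine stacks on guanine. This is plain bookkeeping on the published numbers; the variable names are illustrative.

```python
# Counts transcribed from Table 6.
TABLE_6 = {
    "Phenylalanine": {"A": 7, "U": 4, "C": 2, "G": 4},
    "Histidine":     {"A": 6, "U": 3, "C": 0, "G": 0},
    "Arginine":      {"A": 13, "U": 12, "C": 15, "G": 3},
    "Tyrosine":      {"A": 5, "U": 3, "C": 2, "G": 1},
}

row_totals = {res: sum(counts.values()) for res, counts in TABLE_6.items()}
col_totals = {b: sum(TABLE_6[res][b] for res in TABLE_6) for b in "AUCG"}
guanine_share = {res: TABLE_6[res]["G"] / row_totals[res] for res in TABLE_6}

print(row_totals)  # {'Phenylalanine': 17, 'Histidine': 9, 'Arginine': 43, 'Tyrosine': 11}
print(col_totals)  # {'A': 31, 'U': 22, 'C': 19, 'G': 8}
print(f"{guanine_share['Arginine']:.1%}")  # 7.0%
```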

Stacking of arginine with adenine, cytosine and uracil is very common and shows a broad distribution of orientations without clustering. In contrast, arginine stacking over the more electropositive guanine is uncommon (Table 6, Fig. 4). It is likely that the stronger hydrogen-bonding interactions to guanine O6 and N7 dominate over the weaker cation–π-like stacking of arginine on the base (Sponer et al., 2001). The orientation of the arginine guanidinium group over the bases is consistent with that of cation–π interactions observed in proteins (Gallivan and Dougherty, 1999). Stacking interactions between amino acid side chains and RNA bases, though few in number, have often been shown to be critical, and our studies suggest that arginine stacking with adenine, cytosine and uracil should also be considered important in RNA recognition.
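Counting interactions as 'stacking' requires an explicit geometric criterion, which this section does not state. This sketch assumes a common style of test: base-ring and side-chain-group centroids within 4.5 Å and planes within 30° of parallel. Both thresholds, the idealized guanidinium coordinates (CZ, NE, NH1, NH2) and the function `is_stacked` are hypothetical, and a fuller analysis would usually also limit the lateral offset between centroids.

```python
import math

def sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def unit_normal(p1, p2, p3):
    n = cross(sub(p2, p1), sub(p3, p1))
    length = math.sqrt(dot(n, n))
    return tuple(x / length for x in n)

def centroid(points):
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))

def is_stacked(base_ring, group_atoms, max_dist=4.5, max_angle=30.0):
    """Hypothetical test: centroids within max_dist angstroms and planes
    (each defined by its first three atoms) within max_angle degrees."""
    d = math.dist(centroid(base_ring), centroid(group_atoms))
    c = abs(dot(unit_normal(*base_ring[:3]), unit_normal(*group_atoms[:3])))
    angle = math.degrees(math.acos(min(1.0, c)))
    return d <= max_dist and angle <= max_angle

# Idealized six-membered ring (radius 1.39 A) in the z = 0 plane.
ring = [(1.39 * math.cos(math.radians(60 * k)),
         1.39 * math.sin(math.radians(60 * k)), 0.0) for k in range(6)]
# Idealized guanidinium parallel to the ring, 3.5 A above its centre.
parallel = [(0.0, 0.0, 3.5), (1.33, 0.0, 3.5),
            (-0.665, 1.152, 3.5), (-0.665, -1.152, 3.5)]
# The same group rotated to lie perpendicular to the ring.
perpendicular = [(0.0, 0.0, 3.5), (1.33, 0.0, 3.5),
                 (-0.665, 0.0, 4.652), (-0.665, 0.0, 2.348)]

print(is_stacked(ring, parallel))       # True
print(is_stacked(ring, perpendicular))  # False
```

A parallel guanidinium over the ring centre passes both tests, consistent with the cation–π-like geometry described above, whereas the same group at the same distance but perpendicular to the base does not.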

Fig. 4. Stacking interactions made between the nucleotides and arginine. Arginine shows a strong preference for stacking with A, C and U. The majority of guanine–arginine interactions are made via hydrogen bonds to the O6 and/or N7 acceptor positions rather than the weaker stacking interaction. The figure was made using RIBBONS (Carson, 1991).

3.6 What is required for specificity?

In general, recognition of any single nucleoside, and discrimination against the others, is achieved through a subset of the hydrogen-bonding, van der Waals and ring-stacking interactions shown in the composites in Figures 1–4. Any of the four nucleosides can be specified by two hydrogen bonds, or perhaps by as few as one hydrogen bond and a well-placed van der Waals/non-polar contact. To discriminate adenine from guanine, the exocyclic N2 amino group of guanine is sterically excluded by close contacts, such as van der Waals interactions, proximal to adenine C2 (Fig. 2). As shown in Figure 2 and Table 1, the number of non-polar contacts made to adenine at the C2 position is notably increased. This motif is seen in many proteins, including poly(A)-binding protein and the spliceosomal protein U1A, and is especially common in ribosomal protein–rRNA interactions. For the pyrimidines, a single hydrogen bond to the 2 and/or 4 position coupled with a steric contact that excludes purines would also provide an excellent means of discrimination. One clear benefit of steric exclusion is that an atomic clash carries a prohibitively high energetic cost and can be produced by potentially any amino acid. Protein atoms involved in steric exclusion are therefore an important determinant of RNA specificity and should be considered as important as the residues that contribute hydrogen-bonding groups to the protein–RNA recognition surface.
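The 'two hydrogen bonds, or one hydrogen bond plus a steric contact' rule can be checked in a deliberately simplified model that describes each base only by the groups on its Watson–Crick edge: 'major' (purine position 6 / pyrimidine position 4), 'middle' (purine N1 / pyrimidine N3) and 'minor' (position 2). The names `EDGE`, `PARTNER` and `accepted` are hypothetical, and the model ignores overall pocket size and Hoogsteen-edge contacts such as guanine N7.

```python
# Role of each Watson-Crick-edge group; "none" marks adenine C2-H.
EDGE = {
    "A": {"major": "donor",    "middle": "acceptor", "minor": "none"},      # N6-H2, N1, C2-H
    "G": {"major": "acceptor", "middle": "donor",    "minor": "donor"},     # O6, N1-H, N2-H2
    "C": {"major": "donor",    "middle": "acceptor", "minor": "acceptor"},  # N4-H2, N3, O2
    "U": {"major": "acceptor", "middle": "donor",    "minor": "acceptor"},  # O4, N3-H, O2
}
PARTNER = {"donor": "acceptor", "acceptor": "donor"}

def accepted(pocket):
    """Bases compatible with every pocket feature.

    pocket maps a position to a protein group ("donor" or "acceptor"),
    which needs the complementary group on the base, or to "steric",
    a close non-polar contact that rejects any exocyclic group there.
    """
    def ok(base):
        for pos, feature in pocket.items():
            group = EDGE[base][pos]
            if feature == "steric":
                if group != "none":
                    return False
            elif group != PARTNER[feature]:
                return False
        return True
    return sorted(b for b in EDGE if ok(b))

print(accepted({"major": "acceptor"}))                    # ['A', 'C']
print(accepted({"major": "acceptor", "minor": "steric"})) # ['A']
print(accepted({"major": "donor", "minor": "acceptor"}))  # ['G']
print(accepted({"major": "acceptor", "minor": "donor"}))  # ['C']
print(accepted({"major": "donor", "minor": "donor"}))     # ['U']
```

A single hydrogen bond at the major position leaves adenine and cytosine indistinguishable; adding a steric contact at the minor position removes cytosine (O2) and leaves adenine alone, while two complementary hydrogen bonds suffice to pick out each of the other bases in this model.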

We thank Dr S. Moran for helpful suggestions and advice. We also thank the Robert A. Welch Foundation (C-1584) and the American Cancer Society (RSG-03-051-01) for support. Funding to pay the Open Access publication charges for this article was provided by the Robert A. Welch Foundation.

Conflict of Interest: none declared.

REFERENCES

Structure-based analysis of protein–RNA interactions using the program ENTANGLE. J. Mol. Biol., 2001, 311, 75–86.

Ribbons 2.0. J. Appl. Crystallogr., 1991, 24, 958–961.

Ab initio interaction energies of hydrogen-bonded amino acid side chain–nucleic acid base interactions. J. Am. Chem. Soc., 2004, 126, 434–435.

Recognition of nucleic acid bases and base-pairs by hydrogen bonding to amino acid side-chains. J. Mol. Biol., 2003, 327, 781–796.

A second generation force field for the simulation of proteins and nucleic acids. J. Am. Chem. Soc., 1995, 117, 5179–5197.

Report of a workshop on the use of statistical validators in protein X-ray crystallography. Acta Cryst., 1996, D52, 228–234.

Themes in RNA–protein recognition. J. Mol. Biol., 1999, 293, 255–270.

Cation–π interactions in structural biology. Proc. Natl Acad. Sci. USA, 1999, 96, 9459–9464.

Non-Watson–Crick base pairs in RNA–protein recognition. Chem. Biol., 1999, 6, R335–R343.

AANT: the Amino Acid–Nucleotide Interaction Database. Nucleic Acids Res., 2004, 32 (Database issue), D174–D181.

Insertion of an N7-methylguanine mRNA cap between two coplanar aromatic residues of a cap-binding protein is fast and selective for a positively charged cap. J. Biol. Chem., 2003, 278, 51515–51520.

Hydrogen bond stereochemistry in protein structure and function. J. Mol. Biol., 1990, 215, 457–471.

Discovering the interaction propensities of amino acids and nucleotides from protein–RNA complexes. Mol. Cells, 2003, 16, 161–167.

Prediction of protein–protein interaction sites using patch analysis. J. Mol. Biol., 1997, 272, 133–143.

Protein–DNA interactions: a structural analysis. J. Mol. Biol., 1999, 287, 877–896.

Protein–RNA interactions: a structural analysis. Nucleic Acids Res., 2001, 29, 943–954.

Protein–nucleic acid recognition: statistical analysis of atomic interactions and influence of DNA structure. Proteins, 2005, 61, 258–271.

An overview of the structures of protein–DNA complexes. Genome Biol., 2000, 1, REVIEWS001.

Amino acid–base interactions: a three-dimensional analysis of protein–DNA interactions at an atomic level. Nucleic Acids Res., 2001, 29, 2860–2874.

Comprehensive analysis of hydrogen bonds in regulatory protein DNA-complexes: in search of common principles. J. Mol. Biol., 1995, 253, 370–382.

A role for CH···O interactions in protein–DNA recognition. J. Mol. Biol., 1998, 277, 1129–1140.

The RNA recognition motif, a plastic RNA-binding platform to regulate post-transcriptional gene expression. FEBS J., 2005, 272, 2118–2131.

Human UP1 as a model for understanding purine recognition in the family of proteins containing the RNA recognition motif (RRM). J. Mol. Biol., 2004, 342, 743–756.

Structural features of protein–nucleic acid recognition sites. Biochemistry, 1999, 38, 1999–2017.

On the molecular discrimination between adenine and guanine by proteins. Nucleic Acids Res., 2001, 29, 4294–4309.

Geometric analysis and comparison of protein–DNA interfaces: why is there no simple code for recognition? J. Mol. Biol., 2000, 301, 597–624.

Atomic charges for DNA constituents derived from single-crystal X-ray diffraction data. J. Mol. Biol., 1990, 211, 171–187.

Computational methods of analysis of protein–protein interactions. Curr. Opin. Struct. Biol., 2003, 13, 377–382.

Electronic properties, hydrogen bonding, stacking, and cation binding of DNA and RNA bases. Biopolymers, 2001, 61, 3–31.

Statistical analysis of atomic contacts at RNA–protein interfaces. J. Mol. Recognit., 2001, 14, 199–214.

Structural diversity and isomorphism of hydrogen-bonded base interactions in nucleic acids. J. Mol. Biol., 2003, 327, 767–780.

DIP: the database of interacting proteins. Nucleic Acids Res., 2000, 28, 289–291.

DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions. Nucleic Acids Res., 2002, 30, 303–305.


© 2006 The Author(s)

This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
