A Structural Explanation for the Recognition of Tyrosine-Based Endocytotic Signals

Author manuscript; available in PMC: 2017 Sep 15.

Abstract

Many cell surface proteins are marked for endocytosis by a cytoplasmic sequence motif, Tyrosine-XX-(hydrophobic residue), which is recognized by the μ2 subunit of AP2 adaptors. Crystal structures of the internalisation signal binding domain of μ2 complexed with the internalisation signals of EGFR and the trans-Golgi network protein TGN38 have been determined at 2.7Å resolution. The signal peptides adopted an extended conformation rather than the expected tight turn. Specificity was conferred by hydrophobic pockets which bind the tyrosine and leucine in the peptide. In the crystal the protein forms dimers, which could increase the strength and specificity of binding to dimeric receptors.


The localization and movement of compartment-specific proteins within the cell are largely achieved through the recognition of short sequence motifs by targeting proteins. One of the most studied processes involving such signal recognition is clathrin-mediated endocytosis, which occurs in vesicle trafficking and the internalisation of nutrient and growth factor receptors when bound to their appropriate cargo molecules (reviewed in (1)). During the internalisation of activated growth factor receptors such as the epidermal growth factor receptor (EGFR) tyrosine kinase (reviewed in (2)), receptors are removed from the cell surface in clathrin-coated vesicles and ultimately directed to the endosome and lysosome, where they are inactivated by proteolytic degradation (3, 4).

The first stage of endocytosis is the formation of a clathrin-coated pit. This involves both the mechanical invagination of a patch of membrane by clathrin as it forms a polyhedral lattice and the preferential sorting of selected transmembrane proteins into the pit by adaptor complexes (APs). At least three similar AP complexes (AP1, AP2, and AP3) have been identified, and appear to be associated with different cell compartments. The APs comprise four types of subunit: two large ~100kDa (α and β2 in AP2), one medium ~50kDa (μ2 in AP2) and one small ~17kDa (σ2 in AP2). AP2 adaptors link the proteins to be endocytosed (via the μ2 subunit) with the nascent clathrin coat (via the α and β2 subunits), and via the α subunit, they recruit the components (such as EPS15, amphiphysin and dynamin) needed to drive and regulate the formation of clathrin-coated vesicles (reviewed in (5) and (6)). The short linear sequence motifs that act as internalisation signals mainly fall into two classes: the first, and most common, contains a critical tyrosine residue and mostly conforms to the consensus sequence YxxØ, where Ø is a bulky hydrophobic residue (Leu, Ile, Met or Phe) (7), and binds directly to μ2 subunits (8); the second is the ‘di-leucine’ motif DxxxLL, which interacts with the β1 subunit of AP1 (9) but may also bind indirectly to the μ subunit via an ‘adaptor’ protein (10, 11).

In order to investigate the nature and selectivity of the binding of YxxØ internalisation signals to APs we have solved the crystal structures to 2.7Å resolution of the signal binding domain of μ2 (residues 158-435) (12) complexed with the internalisation signal peptides from EGFR (FYRALM) (13) and TGN38 (DYQRLN) (14, 15). The protein has an elongated banana-shaped all β-sheet structure. It can be considered as two β-sandwich subdomains (A and B), with subdomain B inserted between strands 6 and 15 of subdomain A, and joined edge to edge such that the convex surface is a continuous 9-stranded mixed β-sheet which runs the whole length of the molecule (see Fig.1).

Figure 1. The structure.

A,B Orthogonal views of μ2 with subdomain A shown in gold, subdomain B in blue and the peptide in magenta. Dotted lines represent disordered loops. The strands of the β-sheet (arrows) are numbered. The two subdomains are linked into a continuous β-sheet through strands 14 and 16/17.

C Sequence alignment of μ2 from rat (Rat), human (Humn), Drosophila (Dros), C. elegans (cElg), Dictyostelium (Dict), Arabidopsis thaliana (Plnt), S. pombe (Spmb), μ1 (AP47) from rat and μ3A (p47A) from rat. Identical residues are shaded red, conserved residues gold and those involved in internalisation signal binding blue.

The two peptides bind in an identical manner to a site on the surface of two parallel β-sheet strands (β1 and β16) in subdomain A (Fig 2). The peptide assumes an extended conformation when bound, not a tight β-turn as has been proposed (16). Hydrophobic pockets for the tyrosine and the Ø residue lie on either side of edge strand β16. These pockets are positioned such that when the side chains of the target peptide are correctly bound, three additional hydrogen bonds are made between the backbone of the peptide and β-strand 16, forming an extra strand on the inner edge of the 9-stranded β-sheet (represented schematically in Fig.2C). A similar mechanism of increased strength of binding through β-strand formation on correct recognition of key side chains has been demonstrated in a number of cases, including the interactions of protein kinases with their substrates (17) and protein phosphatases with their regulatory subunits (18).

Figure 2.

A Stereo view of the binding site for the tyrosine residue in the EGFR internalisation signal FYRALM, showing part of the experimental electron density map, with phases calculated using the peptide complex data as native with the Xe and EMTS derivatives, and solvent flattening with a 70% solvent content. The peptide is represented with magenta bonds, and the residues at the top right with green bonds come from the other subunit in the crystallographic dimer. (Figures drawn with BOBSCRIPT (32))

B Stereo view of the binding site for the TGN38 internalisation signal DYQRLN, in the same view as D. The difference electron density shown was calculated using the model from the FYRALM peptide structure with the peptide removed: density for the arginine in the Y+2 position is clearly visible, packed against Trp421.

The tyrosine residue of the internalisation peptide makes extensive interactions with side chains in its binding pocket. There are hydrophobic interactions between the tyrosine ring and Trp421 and Phe174, as well as stacking on the guanidinium group of Arg423. The hydroxyl group of the tyrosine participates in a network of hydrogen bonds with Asp176, Lys203 (from β2) and again Arg423, explaining why a Phe at this position gives only poor binding (19). As well as contributing directly to the strength of binding via a direct hydrogen bond to the tyrosine OH, Asp176 appears to play an important role in correctly orientating the guanidinium group of Arg423. The critical role of Asp176 is reflected in its absolute conservation among all μ2, μ1 and μ3 sequences (Fig.1C). The other major determinant, as defined by sequence and combinatorial peptide library analysis of internalisation signals, is the presence of a bulky hydrophobic residue at the Y+3 position (7). The binding site for this residue is a cavity lined with aliphatic residues (Fig.2B). The size and flexibility of the side chains within this pocket would allow for the accommodation of any of the residues (Leu, Phe, Met, Ile) that are possible at this position.

Peptide library screening has revealed a preference for an arginine residue at either Y+2 (strong) or Y+1 (weak) (7). In the DYQRLN (TGN38) complex, the arginine forms hydrophobic interactions mainly with Trp421 but also with Ile419 (Fig 2), with its guanidinium group exposed to solvent, and a hydrogen bond between Nε and the carbonyl of Lys420: the favourable hydrophobic interaction outweighs the unfavourable electrostatic interaction with the marked positive potential of the peptide binding surface (Fig.3C and 3D). The FYRALM (EGFR) peptide contains an arginine at the Y+1 position which is not well ordered, implying that it has no significant interaction with μ2. The nature and disposition of the pockets explain why the di-leucine type of internalisation motif is unable to bind to μ2: there would be no residue capable of filling the tyrosine binding pocket. They also indicate that if the low density lipoprotein receptor internalisation signal NPVY does bind weakly to μ2 (7), and not via an adaptor protein, it would have to do so in the reverse orientation, that is, with its Asn residue in the Y+3 pocket.
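The positional nomenclature used above (Y+1, Y+2, Y+3) can be made concrete with a short sequence scan. The following Python sketch is illustrative only and is not part of the original analysis: the function name `find_yxxo` and the field `library_arg_preference` are hypothetical, and the "strong"/"weak" labels simply restate the peptide library preference quoted above; they do not predict binding affinity.

```python
import re

# Ø in the consensus YxxØ: the bulky hydrophobic residues named in the text.
BULKY_HYDROPHOBIC = "LIMF"

# A lookahead is used so that overlapping candidate motifs are all reported.
YXXO = re.compile(r"(?=(Y..[" + BULKY_HYDROPHOBIC + r"]))")


def find_yxxo(seq):
    """Return one record per YxxØ match in a one-letter peptide sequence."""
    hits = []
    for match in YXXO.finditer(seq.upper()):
        motif = match.group(1)
        # Library preference for Arg: Y+2 (strong) or Y+1 (weak), as quoted above.
        if motif[2] == "R":
            arg = "strong"
        elif motif[1] == "R":
            arg = "weak"
        else:
            arg = None
        hits.append({
            "tyr_index": match.start(),  # 0-based position of the tyrosine
            "motif": motif,
            "Y+1": motif[1],
            "Y+2": motif[2],
            "Y+3": motif[3],
            "library_arg_preference": arg,
        })
    return hits


if __name__ == "__main__":
    # The two peptides crystallised in this study, plus NPVY (discussed above).
    for peptide in ("FYRALM", "DYQRLN", "NPVY"):
        print(peptide, find_yxxo(peptide))
```

For FYRALM the scan reports Arg at Y+1 and Leu at Y+3, and for DYQRLN it reports Arg at Y+2 and Leu at Y+3, matching the positions discussed in the text. NPVY gives no match in the forward orientation, which is consistent with the suggestion that it could only bind in reverse.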

Figure 3. The peptide binding site.

A The binding of the tyrosine residue of the internalisation signal peptide is in a hydrophobic pocket created by Phe174, Trp421 and Arg423, with a hydrogen-bonding network between the tyrosine OH and Asp176, Lys203 and Arg423. The structure shown is that of the DYQRLN TGN38 peptide.

B The binding pocket for the bulky hydrophobic residue at Y+3 (Leucine in both peptides) is lined with aliphatic side chains of Leu173, Leu175, Val401, Leu404, Val422 and the aliphatic portion of Lys420. ArgY+2 of the TGN38 peptide is packed against Trp421.

C Schematic representation of the interactions between the internalisation signal of TGN38 and μ2, showing both side chain contacts and the short stretch of β-sheet formed between the peptide and β-strand 16. The peptide is shown with bold lines.

Src homology region 2 (SH2) domains bind similar YxxØ motifs in an extended conformation with the tyrosine phosphorylated (20, 21), but there is no homology either in the structure of the proteins or in their mode of binding. In the case of SH2 domains the specificity and strength of binding to the target peptide arise predominantly from ionic interactions with the phosphate moiety. The structure of the complex demonstrates that if the tyrosine residue were phosphorylated, it would be incapable of binding to μ2, both because the tyrosine pocket is too small and because Asp176 would repel the phosphate. This is supported by data which suggest that phosphorylated peptides will not bind to the μ2 subunit (19) and that phosphotyrosine cannot displace EGFR that is bound to AP2 (22).

The residues involved in signal recognition are conserved in μ2 subunits from all species (Fig.1C). The binding sites in the μ1 subunit of AP1 (AP47) are also very similar, though the change K420P may alter the specificity for the Y+3 residue. In the AP3 homologue (μ3A or p47A) the residues K203 and R423, which in μ2 are involved in binding the tyrosine of the YxxØ motif, are replaced by C and K respectively, which would be expected to reduce the affinity of μ3A for tyrosine signals. The substitutions Leu173→Ala and Leu175→Phe in the Y+3 pocket (Fig.1C) may alter the selectivity for residues at this position. The exchange of W421 in μ2 for a glycine in μ3A would remove the specificity for an arginine at the Y+2 position.

How does the machinery of endocytosis recognize a relatively non-specific signal such as the sequence YxxØ? One possibility arises from the observation that most receptors are internalized as dimers, often induced by ligand binding on the outside of the cell, which could place two internalisation signals adjacent to each other. Recognition of this dimer would increase the avidity of binding relative to the monomer, without necessarily precluding binding of monomeric receptors. In the crystal structure the μ2 molecules form a dimer around a crystallographic twofold axis, placing the internalisation signal peptides close to each other in a large groove (Fig.3). The dimer buries 1100 Å² of accessible surface, which is smaller than most stable dimer interfaces (typically at least 1200 Å²), but μ2 is only a small part of the whole AP2 molecule, and additional interactions may be formed between other subunits of AP2 in a dimer. This provides an attractive explanation for the recognition of dimeric receptors, particularly as peptide binding would favour dimerization, because the peptide contributes 17% of the interface. Dimerization of AP2 complexes has been suggested by the observation that they bind in a 1:1 molar ratio with ligand-activated, and therefore dimeric, EGF receptors (23). Binding of dimeric receptors to AP2 dimers, which in turn bind multimers of clathrin, provides an implicit mechanism for the formation of the clathrin lattice. The position of the peptide binding sites in the groove of the dimer predicts that the internalisation signal must be presented as an accessible region without defined secondary structure, which is in agreement with the observation that EGFR binding to AP2 is increased by the presence of urea (22).

The striking positive electrostatic potential of the μ2 dimer may reflect an ability to interact with negatively charged moieties, including proteins (for example the domain following the internalisation signal in EGFR) or the headgroups of negatively charged phospholipids (for example phosphatidylserine). The planar face shown at the top of Fig.3D would provide a large non-specific ionic interaction with the membrane. This would increase the strength of binding to membrane proteins containing appropriately positioned internalisation signals, in a manner similar to proteins such as Src and HIV-1 Gag (24), and may also contribute to recruiting AP2 complexes to the plasma membrane.

The novel structure of the μ2 subunit of the plasma membrane AP2 complexed with the FYRALM peptide explains the specific binding of YxxØ internalisation motifs, the absolute requirement for the motif to be in an extended β-strand conformation, and for the tyrosine residue to be non-phosphorylated. The dimeric packing of the molecules in the crystal suggests that strength and selectivity of binding of receptors may be enhanced by their binding as dimers to dimeric μ subunits.

Figure 4. The crystallographic dimer.

A, B Orthogonal views of the dimer formed in the crystal, along and perpendicular to the crystallographic twofold axis. The A subdomains are coloured gold and green and the B domains blue and purple.

C, D The surface of the μ2 dimer coloured according to electrostatic surface potential (blue positive, red negative, scale from -30 to +30 kT e⁻¹), in the same view as A and B. The planar face at the top of D may interact with the membrane. (Drawn with GRASP (33))

References and Notes

Table 1. Statistics on data collection and phasing.

Native Xe EMTS FYRALM peptide complex DYQRLN peptide complex
Protein construct 122-435 122-435 122-435 122-435 158-435
Data collection+
Resolution (Å) (outer bin) 3.0 (3.16) 3.0 (3.16) 4.0 (4.22) 2.65 (2.79) 2.70 (2.85)
Rmerge* 0.101 (0.910) 0.079 (0.851) 0.116 (0.302) 0.089 (0.882) 0.101 (1.47)
Completeness(%) 99.9 (99.9) 99.8 (99.8) 99.7 (100) 99.4 (96.7) 98.4 (99.8)
<I/σ(I)> 17.3 (2.9) 25.9 (2.2) 20.2 (7.2) 21.3 (2.1) 23.5 (2.2)
Multiplicity 10.9 (10.6) 10.7 (8.2) 10.4 (10.6) 9.2 (8.1) 15.8 (14.7)
Rmeas 0.106 (0.957) 0.088 (0.985) 0.124 (0.334) 0.094 (0.942) 0.104 (1.52)
Wilson plot B (Å²) 100 85 78
Multiple isomorphous replacement Phasing:
Number of sites 1 8
Rderiv 0.096 0.255
Rcullis: acentric (centric) 0.643 (0.707) 0.662 (0.683)
Phasing power: acentric(centric)** 1.88 (1.19) 2.29 (1.87)
Anomalous phasing power 0.54 2.28
Mean figure of merit: acentric (centric) 0.374 (0.350) 0.187 (0.205)§
Figure of merit after solvent flattening (all data) 0.864 0.849§
Refinement
R (Rfree)†† 0.273 (0.297) 0.282 (0.325)
<B> (Å²) 60 75
Nreflections (Nfree) 19296 (842) 18413 (801)
Natoms (Nwater) 2143 (51) 2143 (50)
rmsd bondlength (Å) 0.010 0.012
rmsd angle distance (Å) 0.038 0.040
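
The footnotes defining the statistics in Table 1 are not reproduced in this copy. As a hedged aid to reading the Rmerge, Rmeas and Multiplicity rows, the Python sketch below uses the conventional definitions of these quantities, which are assumed here rather than taken from the authors' notes: Rmerge = Σ|I − <I>| / ΣI, and Rmeas applies the factor sqrt(n/(n−1)) to each unique reflection measured n times. The function names and the toy intensities are hypothetical.

```python
from math import sqrt


def r_merge(groups):
    """Conventional Rmerge; groups holds repeated intensities per unique reflection."""
    num = den = 0.0
    for obs in groups:
        if len(obs) < 2:
            continue  # a single measurement carries no merging information
        mean = sum(obs) / len(obs)
        num += sum(abs(i - mean) for i in obs)
        den += sum(obs)
    return num / den


def r_meas(groups):
    """Multiplicity-corrected form: each reflection is weighted by sqrt(n/(n-1))."""
    num = den = 0.0
    for obs in groups:
        n = len(obs)
        if n < 2:
            continue
        mean = sum(obs) / n
        num += sqrt(n / (n - 1)) * sum(abs(i - mean) for i in obs)
        den += sum(obs)
    return num / den


def approx_r_meas(rmerge, multiplicity):
    """Rough estimate that treats every reflection as having the mean multiplicity."""
    return rmerge * sqrt(multiplicity / (multiplicity - 1))


if __name__ == "__main__":
    toy = [[100.0, 110.0, 90.0], [50.0, 55.0, 45.0]]  # made-up intensities
    print(r_merge(toy), r_meas(toy))
    # Native column of Table 1: Rmerge 0.101, multiplicity 10.9
    print(round(approx_r_meas(0.101, 10.9), 3))
```

Under the uniform-multiplicity approximation, the native values of Rmerge and multiplicity reproduce the tabulated overall Rmeas of 0.106. The approximation is only rough for columns where the multiplicity varies strongly between reflections.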