Structure of the receptor-binding carboxy-terminal domain of bacteriophage T7 tail fibers

Abstract

The six bacteriophage T7 tail fibers, homo-trimers of gene product 17, are thought to be responsible for the first specific, albeit reversible, attachment to Escherichia coli lipopolysaccharide. The protein trimer forms kinked fibers comprised of an amino-terminal tail-attachment domain, a slender shaft, and a carboxyl-terminal domain composed of several nodules. Previously, we expressed, purified, and crystallized a carboxyl-terminal fragment comprising residues 371–553. Here, we report the structure of this protein trimer, solved using anomalous diffraction and refined at 2 Å resolution. Amino acids 371–447 form a tapered pyramid with a triangular cross-section composed of interlocked β-sheets from each of the three chains. The triangular pyramid domain has three α-helices at its narrow end, which are connected to a carboxyl-terminal three-blade β-propeller tip domain by flexible loops. The monomers of this tip domain each contain an eight-stranded β-sandwich. The exact topology of the β-sandwich fold is novel, but similar to that of knob domains of other viral fibers and the phage Sf6 needle. Several host-range change mutants have been mapped to loops located on the top of this tip domain, suggesting that this surface of the tip domain interacts with receptors on the cell surface.

Keywords: bacterial viruses, caudovirales, crystallography, infection, Podoviridae


Bacteriophages (bacterial viruses or phages) are important biological model systems and, because of the high specificity for their host bacteria, have found application in phage typing, food security, and phage therapy (1). Escherichia coli phage T7 is a member of the Podoviridae family of the Caudovirales (tailed phages) order (2). T7 is composed of an icosahedral capsid with a 20-nm short tail at one of the vertices (3, 4). The capsid is formed by the shell protein gene product (gp) 10 and encloses a DNA of 40 kb. A cylindrical structure composed of gp14, gp15, and gp16 is present inside the capsid (5), attached to the special vertex formed by the connector, a circular dodecamer of gp8 (6). Gp11 and gp12 form the tail; gp13, gp6.7, and gp7.3 have also been shown to be part of the virion and to be necessary for infection, although their location has not been established (7, 8). Although extensive electron microscopy studies have been performed on phage T7 (3–6, 9), crystallographic studies have so far been limited to its nonstructural proteins.

The main portion of the tail is composed of gp12, a large protein of which six copies are present (10); the small gp11 protein is also located in the tail (5). Attached to the tail are six fibers, each containing three copies of the gp17 protein. T7 tail fibers are elongated homo-trimers, which are responsible for initial, reversible, host cell recognition. A second, irreversible, decision-making interaction with the bacterial membrane is presumably mediated by one or more of the tail-tube proteins. DNA transfer into the host is then mediated by an extension formed by gp14–16 (7, 8, 11). Previously, we have reported the production of well-diffracting crystals of the phage-distal carboxyl-terminal domain of gp17 containing a 45-residue purification tag (12). Here we present its structure, solved using four-wavelength anomalous diffraction analysis of a mercury derivative and refined against data collected from crystals of two different forms.

Results

Structure Solution.

Crystals of gp17(371–553) belonging to two different space groups were obtained, _P_2₁2₁2₁ and _C_222₁, and a multiwavelength anomalous diffraction dataset was collected on a crystal of form _P_2₁2₁2₁ derivatized with methylmercury chloride (12). The derivative was not isomorphous to the native crystal and the native dataset was not used in phase determination. Nine heavy atom sites were identified, six of which are located near the two cysteines present in each of the three chains (Cys408 and Cys499). Two sites are near Asp442 (of chains A and B) and one near His433 of chain A. After phasing and solvent flattening at 2.7 Å resolution, a readily interpretable map was obtained in which 527 residues could be automatically traced. This model was used as input in a molecular replacement with the 1.9 Å resolution dataset obtained from the _P_2₁2₁2₁ crystal form. Automatic tracing at this resolution produced a model containing 528 residues, which was completed and corrected manually and to which solvent atoms were added. An intermediate protein model was used to solve the _C_222₁ crystal form structure at 2.0 Å resolution, which was also completed. Both structures were refined to R-factors of 15% and free R-factors of 20% and have few residues in unlikely regions of the Ramachandran plot (Table S1). One amino acid, Gly522, is in a very uncommon conformation in all three chains of both structures (all have the backbone torsion angles ϕ around 165° and ψ around −90°); however, it has convincing electron density in each case and is thus likely forced into this conformation by the rest of the structure.

Overview of the Structure.

Electron micrographs revealed gp17 to be an extended protein, with a proximal rod about 16-nm long and 2 nm in diameter, a sharp kink, and a distal rod about 15-nm long and with a diameter that varies between 3 and 5 nm (Fig. 1_A_) (3). The distal rod can be divided into four “nodules” of unequal size, which were estimated to contain residues 268–365, 366–432, 433–456, and 466–553, respectively. The crystal structure of the gp17(371–553) fragment corresponds to the most distal three of these nodules (Fig. 1). The structure can be divided into two parts: a globular “tip” domain (residues Ala465-Glu553), corresponding to the fourth and last nodule, and a tapered interlocked mainly β-structured pyramid domain (residues Gly371 to Trp454) which, from the electron microscopy analysis, was interpreted as forming the second and third nodules. The pyramid domain contains three short α-helices (one from each monomer) at its thinner end.

Fig. 1.

Crystal structure of bacteriophage T7 gp17(371–553). (A) Composite negatively stained electron microscopy image of the tail-complex of bacteriophage T7. The region of one of the six symmetry-equivalent positions where our structure fits into this complex is boxed (reprinted from ref. 3, with permission from Elsevier, http://www.sciencedirect.com/science/journal/00222836). (B) Ribbon diagram (Left) and space-filled representation (Right) of gp17(371–553). The three monomers are colored red, green, and blue. The termini of the red monomer are indicated. (C) Topology diagram. β-Strands are shown as arrows, α-helices as rectangles. Secondary structure elements are labeled and their start and end residues are indicated for the green monomer, as are the termini. Only one of the three tip domain monomers is shown. Asterisks and hash signs indicate connections in the other monomers that were removed for clarity. (D) Top view of the tip domain with the side chains of residues Ala518, Asp520, and Val544 shown in space-filled representation (labeled in the green monomer only).

When the trimeric structures from the two crystal forms are superimposed, an rms difference of 1.2 Å between their C-α positions is observed (residues 373–553 of all chains were included in the calculation). Amino acids N-terminal to residue 373 show different conformations, probably as a result of the different crystal packing in the two structures. When individual chains are compared, another local difference becomes apparent in the region around β-turn Asp390-Arg393 in one of the chains, which forms a crystal contact in the _C_222₁ crystal form. The tip domains alone (amino acids 465–553) superpose with an rmsd of only 0.2 Å and the pyramid domain (amino acids 373–464) superposes with an rmsd of only 0.5 Å, suggesting the short loop (Val464-Lys466) between the tip and pyramid domains is somewhat flexible. This flexibility may be of importance in the infection process. Adenovirus fiber contains a hinge region between the shaft and head domains (13); mammalian reovirus fiber (σ1 protein) has a hinge region between the last and second-to-last triple β-spiral repeats of the shaft (14). In the case of adenovirus fiber, flexibility has been shown to be important for the infection process, presumably to allow secondary receptor interaction (15). Flexibility of the T7 tail fiber may also be necessary to allow for tail conformational changes during the infection process.
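As an illustration of how an rms difference between C-α positions is defined, the following minimal Python sketch computes the rmsd between two equal-length lists of coordinates. It assumes the two sets have already been optimally superposed and paired residue by residue; the superposition step, the function name `rmsd`, and the toy coordinates are hypothetical and are not taken from the analysis reported here.

```python
import math

def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between paired (x, y, z) positions in Å.

    Assumes both coordinate lists are already superposed and that
    coords_a[i] and coords_b[i] belong to the same residue.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared / len(coords_a))

# Toy example (made-up coordinates): every atom is displaced by 1 Å along z.
form_1 = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
form_2 = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0)]
example_rmsd = rmsd(form_1, form_2)
```

Computing the value separately for each domain after superposing that domain alone, and comparing it with the value for the whole trimer, is the kind of comparison that points to a flexible linker between rigid domains.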

Pyramid Domain.

The pyramid domain is composed of three concave nine-stranded mixed β-sheets stacked against each other, forming a tapered pyramid with a triangular cross-section. The taper is the result of the β-strands at the bottom of the pyramid being longer (up to six residues) than those near the top (down to three residues). Each β-sheet involves strands contributed by all three chains; interactions between β-strands from the same monomer are antiparallel, but interactions between β-strands from neighboring monomers are parallel. Each β-sheet consists of one strand, R, from the first monomer, followed by five strands, STUVW, from the next monomer, and capped by three strands, XYZ, from the third. The loops connecting the β-strands vary from short β-turns to longer loops; all loops are well ordered in both crystal forms and interact extensively with other loops from the same or from neighboring monomers. The center of the domain contains exclusively hydrophobic and aromatic side-chains; each β-strand contributes one or two central side-chains to this core. In the loops and at the beginning and end of the β-strands, interactions are mixed and, apart from hydrophobic interactions, many polar interactions are formed. At the top of the pyramid domain three α-helices are located, one from each protein chain. Leu455 and Leu459 residues from the α-helices project into the center, forming a small hydrophobic core, capped at the top by Phe463.

Carboxyl-Terminal Globular Tip Domain.

The α-helices are connected to the tip domain by a loop (Val464-Lys466) containing both polar and apolar residues. This loop surrounds a solvent-filled central cavity between the tip and pyramid domains. The presence of this solvent-filled cavity is further indication of possible flexibility between the tip and pyramid domains.

The tip domain of each monomer (amino acids 464–553) forms a β-sandwich with the topology shown in Fig. 1_C_. The β-sandwich contains two sheets of four β-strands each, BIDE and CHGF. β-Strands B and C are on the outside of the trimer, E and F on the inside. A search using the program DALI (16) did not turn up β-sandwich domains with the same topology (see below). As in the β-structured pyramid domain, all loops between β-strands are well ordered in both crystal forms and interact extensively with neighboring loops. Interactions between the tip monomers are of mixed nature, involving salt-bridges, hydrogen bonds, solvent molecules, and van der Waals interactions. The end of the protein chain is locked firmly into place by two salt-bridges between Glu551 and Arg508 of a neighboring chain and between Glu553 (the very C-terminal residue) and Lys468 of the other neighboring chain. This position likely prevents attack of the C terminus by proteases and contributes to trimer stability.

Discussion

Phages belonging to the Caudovirales order attach to host bacteria with the end of their tails. Primary, generally reversible, recognition is via tail-spike proteins or tail fibers on the side of the tail; at this point the phage is not yet committed to DNA injection. Positive recognition leads to central tail proteins productively attaching to the host membrane and DNA injection. The tail-spike of the podovirus P22 has been studied extensively in terms of carbohydrate binding, hydrolysis, folding, and assembly (17, 18). Much is also known about assembly and function of the complex fibers of the myovirus T4 (19–21). Siphoviruses like T5 and λ contain less-studied side tail fibers (22, 23) which, like the T4 fibers, do not exhibit receptor-hydrolysis activity. Here we have presented unique high-resolution structural information on a podoviral tail fiber that also does not hydrolyze its receptor.

Stability and Folding of gp17(371–553).

The surface area of a monomer is around 12 × 10³ Å², of which more than 40% is buried upon trimer formation (5.2 × 10³ Å²). The calculated dissociation energy of the trimer is 125 kcal/mol (24), and the trimer contains more than 120 intermonomer hydrogen bonds and nearly 40 potential intermonomer salt-bridges. When the pyramid domain trimer (Gly371-Phe463) is considered, the surface area of a monomer is 7.4 × 10³ Å², of which nearly half (3.6 × 10³ Å²) is buried. The calculated dissociation energy of the pyramid domain trimer is 90 kcal/mol, and it contains more than 90 intermonomer hydrogen bonds and 8 potential intermonomer salt-bridges. The surface area of a tip domain monomer (Val464-Glu553) is 4.7 × 10³ Å², of which 1.4 × 10³ Å² is buried (around 30%). The calculated dissociation energy of the tip domain trimer is 17 kcal/mol, and it contains 27 intermonomer hydrogen bonds and 30 potential intermonomer salt-bridges. The large buried surface area explains the stability of trimeric gp17(371–553), as revealed by its high melting temperature (around 74 °C) (Fig. 2) and the fact that it does not dissociate into monomers in SDS/PAGE, unless previously boiled in SDS-containing buffer (12). Our structure shows there are no disulphide bonds in the C-terminal construct, and the crystallized fragment contains the only two cysteines of the gp17 sequence.
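The buried fractions quoted above follow directly from the surface areas. The short Python sketch below reproduces that arithmetic from the rounded values given in the text; the dictionary keys and the helper name `buried_fraction` are hypothetical labels introduced for illustration, and the results are only as precise as the rounded inputs.

```python
# Monomer surface area and area buried on trimer formation, in Å²,
# as quoted (rounded) in the text: (total, buried).
surface_areas = {
    "whole fragment (371-553)": (12.0e3, 5.2e3),
    "pyramid domain (371-463)": (7.4e3, 3.6e3),
    "tip domain (464-553)": (4.7e3, 1.4e3),
}

def buried_fraction(total_area, buried_area):
    """Fraction of the monomer surface that is buried in the trimer."""
    if total_area <= 0:
        raise ValueError("total surface area must be positive")
    return buried_area / total_area

buried_percent = {
    name: 100.0 * buried_fraction(total, buried)
    for name, (total, buried) in surface_areas.items()
}

for name, percent in buried_percent.items():
    print(f"{name}: {percent:.0f}% buried")
```

The tip domain buries a markedly smaller share of its surface than the pyramid domain, in line with its lower calculated dissociation energy.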

Fig. 2.

Thermal stability assay. The relative fluorescence emission intensity (R) is plotted as a function of the temperature (x axis, °C). A melting temperature of around 74 °C was estimated (midpoint between the baseline and the point with maximum fluorescence intensity).

The C-terminal tip domain is the only domain of the current structures where the monomer has a potentially independent fold and where the three chains do not intertwine. The surface buried between monomers and the calculated dissociation energy are also smaller. This finding suggests gp17 folding may begin with the spontaneous formation of a monomeric carboxyl-terminal β-barrel. Interaction of three β-barrels could then lead to trimer formation. The remaining region of the fiber would then “zip up” to form the intact trimer. This folding pathway would be the same as that proposed for adenovirus fiber (25) and consistent with the observation that amber mutants in the C-terminal domain do not incorporate truncated gp17 in the virions (10). Gp17 does not appear to require specific chaperones for folding like phage T4 fibers do (26).

Comparison with Other Proteins.

When the sequence of phage T7 gp17 is compared with protein sequences in databases, a strong similarity (>85% sequence identity) is observed with gp17 of E. coli phage T3, Yersinia pestis phage PhiA1122, and enterobacteria phage 13a. Sequence similarity is also observed with fibers of phages from other bacterial species (Kluyvera, Salmonella, Pseudomonas, Vibrio, Klebsiella, and others). The phage attachment domain (amino acids 1–150) belongs to domain family PHA00430 and, of the gp17 domains, has the most sequence homologs in the database, consistent with the fact that many T7-like phages have the same mechanism of attaching the fiber to the phage. The tip domain has fewer sequence homologs, only T3, PhiA1122, enterobacteria phages 13a, BA14, 285P, and Yersinia phages Yep-Phi, Berlin, and Yepe, consistent with the well-known fact that many phages have evolved their C-terminal receptor attachment domain to adapt to different hosts by exchange with other phages (27). An illustrative example is E. coli phage K1F, which has a homologous phage-attachment domain but a different C-terminal domain containing endo-_N_-acetylneuraminidase activity to digest the E. coli K1 capsular polysaccharide (28, 29). Only a few phages appear to have evolved by mutation maintaining the same structural framework. Residues 359–524 of T7 gp17 (encompassing all but the final 29 residues of our structure) have been classified in Pfam domain gp37_C, which includes a subset of the highly diverged C-terminal regions of the gp37-type tail-fiber proteins of T4-like phages, the subset found in phages AR1 and Ac3 (30). Thus, amino acid sequence comparisons suggest that the tapered triangular pyramid and part of the tip structure of T7 tail fibers may be structural building blocks used in fibers of phages of the Myoviridae as well as the Podoviridae family.

When searches are conducted either with the whole structure presented here or with separate domains, no structural homologs with the same topology are found in the PDB database. Adenovirus fiber (13), reovirus fibers (14, 31), bacteriophage PRD1 P5 (32), and phage Sf6 cell-penetrating needle (33) all have a β-structured C-terminal domain of similar size and organization as gp17 has, but with a different topology. The Sf6 cell-penetrating needle knob domain (Fig. 3_A_) has the most similar topology, with two small extra β-strands (C1 and C2) and a straight swap of the E and G strands compared with our structure. However, their stalk domains contain only triple β-spiral folds or a triple coiled coil (34, 35), without an intervening β-helical domain, such as the pyramid domain in the present structure. The receptor-binding proteins of lactococcus phages p2 and TP901-1 also have a similar C-terminal domain attached to a short β-helical stalk (36, 37). The bacteriophage T4 short tail-fiber (gp12) and long tail-fiber (gp37) C-terminal domains have different folds, consisting of three intertwined monomers rather than composed of individually folded monomeric domains (21, 38), and T4 fibritin (gpwac) and the phage P22 cell-penetrating needle (gp26) have a much smaller trimerisation domain (39, 40).

Fig. 3.

Structures of the cell-penetrating tail needle knob domain of phage Sf6, PDB-code 3RWN (A), gpV of phage P2, PDB entries 3QR7 and 3QR8 (B), and gp138 of phage phi92, PDB codes 3PQH and 3PQI (C). Iron, calcium, and chloride ions are shown as yellow, gray, and green balls, respectively. The region of gp138 that is topologically the same as amino acids 384–446 of phage T7 gp17 is indicated.

The pyramid domain, with its interlocked trimeric β-helix, also does not have any exact topological equivalent in the structure database. Comparable structures include the triple β-helix regions of phage T4 gp12 (41) and gp5 (42), the K1F endosialidase (43) and endo-_N_-acetylneuraminidase (29), streptococcal (pro)phage HylP1 and HylP2 (44, 45), the phage P22 cell-penetrating needle gp26 (40), and phage Phi29 tail-spike (46). However, the most similar structures are those of phage P2 gpV (Fig. 3_B_) and phage Phi92 gp138 (Fig. 3_C_) (47, 48), which form the tip of the inner tail tube and are the first proteins to pierce the membrane in these myoviruses. Both proteins contain a tapered β-helix, strongly intertwined in the case of gpV and interlocked like gp17 in the case of gp138. Eight β-strands (from sheets STUVW and XYZ) have structural equivalents in gp138. However, both gpV and gp138 contain a small apex domain with a central iron ion instead of the α-helical region, and lack a globular tip domain.

Receptor Binding.

Initial, reversible, binding of phage T7 to bacteria is mediated by the interaction of its six gp17 tail fibers with LPS (7). This interaction is presumably followed by a secondary, irreversible attachment of the tail to an unknown receptor. The relative importance of the two interactions in host-range determination is not known; gp17 may just keep the phage near the bacterial surface to make productive tail interactions with its receptor more likely (i.e., 2D diffusion vs. 3D diffusion). However, Heineman et al. (49) identified two mutants of bacteriophage T7 gp17 involved in host avoidance: an Asp520 to Glu change adapted the phage to avoid E. coli B and a Val544 to Ala change adapted the phage to avoid E. coli K12. Furthermore, Garcia et al. (50) identified a spontaneous host-range change mutant in the highly homologous phage PhiA1122 fiber in this region (Leu523 to Ser, which aligns to Ala518 of bacteriophage T7 gp17). Ala518 and Asp520 are located in the EF-loop and Val544 in the GH-loop of the gp17 tip domain; both loops are located on the top of the tip trimer in our structure (Fig. 1_D_). When the sequence of bacteriophage T7 gp17 is aligned with that of tail fibers of E. coli phage T3, Salmonella enteritidis phage 13a, and Y. pestis phage PhiA1122 (Fig. 4), the main differences are also observed in residues that are located in the four loops at the top of the tip domain (BC-, DE-, FG-, and HI-loops). Taken together, this information strongly suggests that the tip domain of gp17 has an important role in host-range determination, presumably by binding to a specific LPS region that differs between bacterial strains or that may be occluded in some strains and available for interactions in others.

Fig. 4.

Sequence conservation of gp17. (A) Alignment of the sequence of bacteriophage T7 gp17 with its homologs from E. coli phage T3, S. enteritidis phage 13a, and Y. pestis phage PhiA1122. Amino acids present in our structures are shown in bold. Residues that are identical in all four proteins are marked with asterisks. Secondary structure elements identified in our structure are also indicated, β-strands with arrows and α-helices with plus-signs. (B and C) Sequence conservation mapped on the structure. The color scale is from white (absolutely conserved) to black (no conservation). A top view (B) and a side view (C) are shown. Residues that are not conserved, and thus may be important for host range discrimination, are indicated.

Another possible site for receptor interaction is the concave eight-stranded β-sheet of the pyramid domain. Although there is no biochemical or mutation evidence, the concave shape and the presence of several aromatic residues are suggestive. In this region, in the _P_2₁2₁2₁ crystal form, Tyr385, Tyr397, and Tyr413 interact with a trimethylamine-_N_-oxide molecule from the crystallization solution in each of the three monomers, but in the _C_222₁ crystal form Tyr425 interacts with a carbonate ion in two of three monomers. Site-directed mutagenesis experiments combined with competition experiments, identification of the LPS region to which T7 binds, and cocrystallization studies with the relevant LPS fragments will be needed to identify the specific gp17-residues from the tip or pyramid domain responsible for receptor binding. If and how this initial interaction is communicated to the phage to trigger irreversible binding and DNA injection is also an interesting future subject of study.

Conclusion

We have solved the structure of the C-terminal domain of the phage T7 tail fiber, gp17(371–553). The structure contains two domains with hitherto unseen topologies, provides insight into the reason for the stability of the protein, and suggests regions that may be involved in receptor binding. The structural data may also contribute to applications of bacteriophages in, for example, bacteria typing or phage therapy, through the rational design of mutants binding different receptors or the engineering of artificial, chimeric phage fibers.

Materials and Methods

Thermal Stability Assay.

Thermal stability of the protein was measured by following the fluorescence of the dye Sypro orange in the presence of 0.01 mM gp17(371–553) as a function of temperature as described in Dupeux et al. (51). The melting temperature was estimated as the temperature corresponding to the midpoint between the baseline and the point with maximum fluorescence intensity (Fig. 2).
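The midpoint estimate described above can be sketched in a few lines of Python. This is one plausible reading of the procedure, not the authors' actual analysis: it assumes the melting temperature is taken as the temperature at which the fluorescence first rises halfway between a low-temperature baseline and the maximum, with linear interpolation between measured points. The function name `estimate_tm`, the `n_baseline` parameter, and the synthetic curve are hypothetical.

```python
import math

def estimate_tm(temps, fluor, n_baseline=5):
    """Estimate a melting temperature from a thermofluor curve.

    The baseline is the mean of the first n_baseline readings. The returned
    value is the temperature at which the signal first reaches the midpoint
    between that baseline and the maximum reading (linear interpolation).
    """
    if len(temps) != len(fluor) or len(temps) <= n_baseline:
        raise ValueError("need matching lists longer than n_baseline")
    baseline = sum(fluor[:n_baseline]) / n_baseline
    i_max = max(range(len(fluor)), key=lambda i: fluor[i])
    half = baseline + 0.5 * (fluor[i_max] - baseline)
    # Search only the rising part of the curve, up to the maximum.
    for i in range(1, i_max + 1):
        f0, f1 = fluor[i - 1], fluor[i]
        if f0 < half <= f1:
            t0, t1 = temps[i - 1], temps[i]
            return t0 + (half - f0) * (t1 - t0) / (f1 - f0)
    raise ValueError("signal never crosses the half-maximum level")

# Synthetic sigmoidal curve (made-up data) centred on 74 °C, sampled every 1 °C.
temps = [float(t) for t in range(20, 96)]
fluor = [1.0 / (1.0 + math.exp(-(t - 74.0) / 2.0)) for t in temps]
tm = estimate_tm(temps, fluor)
print(f"Estimated Tm: {tm:.1f} °C")
```

Restricting the search to readings before the maximum avoids the falling part of the curve that real thermofluor data often show once the protein aggregates.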

Crystallographic Structure Solution and Refinement.

Crystallogenesis and crystallographic data collection have been described previously (12). Datasets of the _P_2₁2₁2₁ mercury derivative collected at four different wavelengths (peak, inflection point, high energy remote, low energy remote) were input into AUTOSHARP (52). Three mercury sites were identified by the SHELXC/D programs (53) and phases were refined using data between 20.0 and 2.7 Å with AUTOSHARP, which found additional sites and rejected sites, settling on a final list of nine heavy atom sites. Further solvent flattening and histogram matching were done with SOLOMON (54) and automated building proceeded using BUCCANEER (55) at 2.7 Å resolution. This model was used for molecular replacement into the higher resolution, native data using the program MOLREP (56). The model was then input into the ARP-WARP (57) auto-trace mode using the data to 1.9 Å resolution for the _P_2₁2₁2₁ crystal form. Adjustment of the model and addition of extra amino acids and solvent molecules were done with COOT (58). Refinement was done using the REFMAC5 program (59). The final model contains 548 amino acids, 1 poly-ethyleneglycol fragment, 1 Tris molecule, 3 trimethylamine-_N_-oxide molecules, and 754 water molecules. A partially complete model refined against the high-resolution _P_2₁2₁2₁ data was used to solve the _C_222₁ structure by molecular replacement, which was extended and refined as above. This final model contains 552 amino acids (including one amino acid from the purification tag; the rest of the purification tag is apparently disordered), 2 CO₃ molecules, and 910 water molecules. Data statistics were published previously (12); phasing, refinement, and model statistics can be found in Table S1. Validation was performed with MOLPROBITY (60) and protein structure figures were prepared using PYMOL (Schroedinger) and University of California at San Francisco CHIMERA (61).

Supplementary Material

Supporting Information

Acknowledgments

We thank Ana Cuervo and José L. Carrascosa for bacteriophage T7 DNA and introducing us to T7 biology; Antonio L. Llamas-Saiz, José M. Otero, and José-Ignacio Baños-Sanz for diffraction testing of crystals; Silvia Russi and Hassan Belrhali of the European Molecular Biology Laboratory for help with data collection at BM14; Petr Leiman for careful reading of the manuscript; and Florine Dupeux, Martin Rower, Gael Seroul, Delphine Blot, and José A. Marquez for performing the thermofluor and high-throughput crystallization experiments (https://embl.fr/htxlab/). We acknowledge the use of the High-Throughput Crystallization Laboratory at the European Molecular Biology Laboratory Grenoble outstation (France), which receives funding from the European Community’s Seventh Framework Program under Contract 227764. This research was funded by research Grants BFU2008-01588 and BFU2011-24843 from the Spanish Ministry of Science and Innovation; a grant from the Bill and Melinda Gates Foundation through the Grand Challenges Exploration initiative (to M.J.v.R.); and a Formación del Profesorado Universitario predoctoral contract of the Spanish Ministry of Education (to C.G.-D.).

Footnotes

The authors declare no conflict of interest.

This article is a PNAS Direct Submission.

Database deposition: Coordinates and structure factors for the _P_2₁2₁2₁ and _C_222₁ crystal forms have been submitted to the Protein Data Bank, www.pdb.org (PDB ID codes 4A0T and 4A0U, respectively).
