Protein Folding in the Cell: Challenges and Progress

Author manuscript; available in PMC: 2012 Feb 1.

Published in final edited form as: Curr Opin Struct Biol. 2010 Nov 26;21(1):32–41. doi: 10.1016/j.sbi.2010.11.001

Abstract

It is hard to imagine a more extreme contrast than that between the dilute solutions used for in vitro studies of protein folding and the crowded, compartmentalized, sticky, spatially inhomogeneous interior of a cell. This review highlights recent research exploring protein folding in the cell with a focus on issues that are generally not relevant to in vitro studies of protein folding, such as macromolecular crowding, hindered diffusion, co-translational folding, molecular chaperones, and evolutionary pressures. The technical obstacles that must be overcome to characterize protein folding in the cell are driving methodological advances, and we draw attention to several examples, such as fluorescence imaging of folding in cells and genetic screens for in-cell stability.

Introduction

Chris Anfinsen launched the field of protein folding by showing that ribonuclease (specifically, bovine pancreatic ribonuclease A) could refold to an active enzyme after reductive denaturation. Naturally, ribonuclease became emblematic of the fundamental tenet of protein folding—that the primary sequence of a protein specifies an energy landscape and a successful route to the native state at the global energy minimum. Yet ribonuclease folds in vivo during a complex journey through the secretory pathway of the cell. Notably, in its biological folding process, ribonuclease confronts milieux that are densely crowded with macromolecules; it samples the microenvironments of the ribosome tunnel, the translocon, and the ER lumen; it has the opportunity to fold from its N- to C-terminus; and it is not left on its own, but instead is accompanied by lumenal chaperones that facilitate its folding and post-translational modification. As this journey abundantly illustrates, protein folding in the cell confronts many issues that are nonexistent in high dilution refolding experiments. It is thus not surprising that an increasing research effort is being applied to issues and processes involved in cellular protein folding.

This review presents research in the expanding area of protein folding in the cell. We mainly confine our discussion to publications on cellular protein folding that have appeared in the last two years and to areas that have not been recently reviewed. We first describe a number of issues, such as macromolecular crowding and co-translational folding, that arise when considering folding in the cell but are absent in vitro (Figure 1). Next, we describe innovative methods to address these issues, particularly in intact cells where the risks of reductionism are minimized. Lastly, we describe some new papers that highlight the complex biological pressures on protein folding in the cell and how these influence protein evolution. Throughout, space constraints have required us to selectively cite the literature in this exciting area. We apologize to any colleagues whose relevant work is not mentioned. In turn we hope that we provide the interested reader with a sense of the major questions touching in vivo folding and examples of provocative recent papers that address these questions.

Figure 1. Schematic depiction of a protein folding reaction in the cytoplasm of an E. coli cell, showing vividly how different the environment is from dilute in vitro refolding experiments.

The cytoplasmic components are present at their known concentrations. As discussed in this review, features of particular importance to the folding of a protein of interest (in orange) are: the striking extent of volume exclusion due to macromolecular crowding, the presence of molecular chaperones that interact with nascent and incompletely folded proteins (GroEL in green, DnaK in red, and trigger factor in yellow), and the possibility of co-translational folding upon emergence of the polypeptide chain from the ribosome (ribosomal proteins are purple; all RNA is salmon). The cytoplasm image is courtesy of A. Elcock.

Macromolecular crowding

A striking difference between most in vitro folding experiments and the cellular environment is the high concentration of macromolecules, which severely limits the cellular volume accessible to a polypeptide chain. Like many issues related to folding in the cell, determining the effects of crowding on folding presents major technical challenges to both computational and experimental studies; moreover, crowding is generally accompanied by other effects including altered diffusion and weak interactions.

The effects of macromolecular crowding have been discussed in an extensive 2008 review by Zhou, Rivas and Minton [1], and recent computational work in this area was critically reviewed by Adrian Elcock [2]. In general, macromolecular crowding is predicted [3] and found experimentally [4,5] to favor compaction, shifting denatured and intermediate ensembles in a folding reaction away from more extended states. The net effect of crowding on native state stability is as yet not entirely clear [1,6], but recent experimental results suggest that any change in stability is modest [4,7]. McGuffee and Elcock recently reported results of an impressive Brownian dynamics simulation of the E. coli cytoplasm showing that steric and electrostatic potentials alone did not recapitulate experimental results on either protein stability in the cell or diffusion rates, but instead they observed much better fits when they included short-range attractive hydrophobic interactions [8]. Additionally, their calculations showed that the impact of crowding depends on the properties of a given protein and the size differential between it and the surrounding macromolecules, as was also concluded by Christiansen et al. [7].

Macromolecular crowding also affects the viscosity of the cellular environment, and solvent viscosity has been invoked as an important factor in determining folding rates and mechanisms (see for example [9]). In a recent study, Dhar et al. used computational and experimental approaches to study the effects of a model crowder, Ficoll, on activity and folding of phosphoglycerate kinase (PGK), a 412-aa protein with two domains connected by a flexible hinge [10]. The protein became strikingly more active in the presence of Ficoll, apparently because crowding decreased the inter-domain separation. Also, the relaxation rate from temperature jump unfolding experiments showed a maximum at 100 g/L Ficoll. The authors interpreted this result as arising from opposing effects of crowding and viscosity [10].

Hindered mobility and sticky neighbors

Several recent studies show that the cellular environment affects macromolecular motion and provide insight into how. For example, two recent papers have used fluorescence recovery after photobleaching to monitor translational diffusion of GFP constructs in E. coli cells [11,12]. Using fusion constructs consisting of GFP and native E. coli proteins, Kumar et al. found unexpectedly low translational diffusion rates and argued that macromolecular crowding and macromolecular networks in the cytoplasm lead to molecular sieving even for small, 27 kDa, proteins [12]. In contrast, for proteins of less than ~115 kDa Nenninger and coworkers argued that intermolecular interactions were the main contributor to reduced diffusion [11]. Intriguingly, Pielak and co-workers observed significantly slower rotational diffusion for protein solutes when another protein rather than an inert molecule such as Ficoll or Dextran was used as a crowding agent, arguing for greater effects on rotational than translational diffusion from weak interactions [13,14]. The importance of weak interactions mirrors results from McGuffee and Elcock's computational modeling of the E. coli cytoplasm [8]. However, a recent study concludes that hydrodynamic effects reproduce experimental size dependence of protein mobility in cells better than non-specific weak interactions [15]. Despite the numerous studies into motion in the cell, the impact of this altered mobility on folding is as yet uncharted territory.

Vectorial synthesis and roles of mRNA and ribosomes in folding

Newly synthesized polypeptide chains emerge from the ribosome vectorially, allowing their N-terminal portions to sample conformational space before the chain is completely synthesized. Additionally, the earliest environments encountered by a nascent chain are the ribosome tunnel and ribosome-associated chaperones. There have been excellent recent reviews on issues related to co-translational folding including one in this volume [16–19], and we will not duplicate their coverage.

A single domain stabilized by many long-range contacts is not expected to fold until the entire chain is complete, and recent studies of ribosome-bound nascent chains (RNCs) have confirmed this expectation for an SH3 domain by NMR [20] and GFP by observing chromophore maturation [21]. In fact, the NMR studies on SH3 RNCs reveal little or no compaction until the entire chain has exited the ribosomal tunnel [20]. By comparison, RNCs of the larger GFP may populate a more compact state prior to full translation [21].

The interrelationship of translation rate and folding has been discussed in a number of recent reviews [22–25]. Intuitively, slowing translation might allow more time for proper folding, and indeed in a recent study mutant ribosomes with reduced translation rates increased the soluble expression of eukaryotic proteins in E. coli [26]. The messenger RNA sequence can also affect the translation rate either through the use of rare codons [22,25,27–29] or by RNA folding [29,30]. Either of these factors may be changed by synonymous mutations, where mRNA sequences are altered without affecting the encoded amino acid. Increasing the translation rate of the multidomain E. coli protein SufI by synonymously exchanging rare codon clusters for common codons was found to decrease co-translational folding and the production of mature, folded protein [31]. Intriguingly, the mutation that leads to deletion of F508 (ΔF508) in the cystic fibrosis transmembrane conductance regulator (CFTR), the most common mutation linked to cystic fibrosis, also changes the preceding codon for Ile507 [32]. Alteration of the local mRNA structure in the mutant retards translation and presumably impairs folding, increasing co-translational ubiquitination and leading to protein degradation [32,33]. Restoration of the original Ile507 codon in the ΔF508 background significantly increases the amount of mature CFTR in the plasma membrane, demonstrating the potential impact of synonymous mutations on in vivo protein folding and maturation [32,33].

As described in recent reviews [16,17,19,34], the nascent chain can also be influenced by the ribosomal exit tunnel and, for proteins targeted to the bacterial periplasm or the eukaryotic ER, by the environment of the translocon. Specific sequences are known to stall ribosomes [35], and exciting recent cryo-EM structures from the Beckmann laboratory reveal the nature of their interactions with the tunnel [36,37]. The impact of chain conformations within the tunnel on folding and targeting has been described for the E. coli EspP protein [38], and elegant studies by the Deutsch laboratory [39,40] as well as by Johnson, Skach and collaborators [34,41] have demonstrated that membrane protein structure formation can initiate co-translationally within both the ribosome tunnel and the translocon.

Molecular chaperones remodel the in-vivo folding energy landscape

The ability of molecular chaperones to interact with nascent or incompletely folded chains so as to favor successful folding and disfavor aggregation is well established. Yet the impact of chaperones on the folding mechanisms and stabilities of their clients is less clear, despite expanding literature on the functions and substrate-omes of several chaperones (e.g., [42,43]).

Consider the case of arguably the best-studied chaperone, the E. coli chaperonin GroEL and its partner GroES: Based on their own and others’ experimental results, Horwich and co-workers argue that GroEL/ES is a passive ‘Anfinsen cage’ protecting its substrate from aggregation during folding but otherwise having little effect on its folding landscape [44,45]. Conversely, interactions between the substrate and the walls of the GroEL cavity may actively expand and compress substrate proteins [46,47]. Direct examination of complexes between GroEL and substrates shows binding of several GroEL subunits to a single substrate with some distortion in the overall chaperonin conformation [48,49]. This image is consistent with models emerging from recent FRET studies of labeled GroEL substrates [50,51], which implicate a hierarchy of hydrophobically-mediated chaperonin-substrate binding events in remodeling the substrate folding mechanism.

Spatial organization, membranes and compartmentalization

Cellular interiors are highly anisotropic with elaborate and physiologically critical architectures. This subcellular organization plays a major role in folding at several levels: For example, the native structures of membrane proteins are tuned to the diverse microenvironments and two-dimensional character of membranes. Additionally, membranes are barriers, requiring proteins made on one side but destined to perform their functions extracytoplasmically to be translocated across the membrane, in most cases unfolded. For secretory proteins, passage through cellular compartments is highly choreographed with an assembly line of modifying enzymes, chaperones and transport mechanisms. Compartmentalization also opens up the possibility of chemical gradients, e.g. pH or oxidizing potential. Taken together, the spatial organization and compartmentalization of the cellular environment enable folding to occur in temporally optimized steps.

Bacterial proteins destined for the periplasm or the cell exterior can be secreted across membranes either folded (e.g., by the Tat system) or unfolded (e.g., by the Sec system) [52]. Recent work by Ignatova and coworkers reveals that for both E. coli and B. subtilis transcriptomes, proteins targeted to Tat are more likely to have slowly translated regions, which presumably encourage co-translational folding and subsequent translocation [31]. The autotransporter bacterial virulence factors transport their unfolded passenger domain via a channel formed, at least in part, by the C-terminal porin domain, and extracellular folding of the C-terminal portion of the passenger domain drives translocation of the remainder of this domain across the membrane [53,54]. This coupling of folding and membrane translocation represents a distinct mechanistic difference between in vitro and in vivo folding.

Proteins that translocate through the Sec channel in bacteria and eukaryotes are greeted by an array of chaperones and modifying enzymes that alter the folding energy landscape. In addition, they move from an ATP-rich, reducing environment to one that is ATP-poor and oxidizing as they enter either the bacterial periplasm, mitochondrial intermembrane space (IMS) or the eukaryotic ER. We refer interested readers to recent reviews of protein folding in the ER [55], in the mitochondrial IMS [56,57], and in the periplasm [58,59]. Here, we discuss only a few highlights of the recent literature in this active research area.

Exposure of nascent chains to the oxidizing environment of the periplasm or ER lumen enables step-wise disulfide bond formation, fixing the topology of secretory proteins. Not surprisingly, the timing and specificity of disulfide bond formation are integral to their in vivo folding, and this issue has been widely studied in eukaryotic proteins [55]. Recently, Kadokura and Beckwith provided an analogous picture for folding in the bacterial periplasm by trapping folding intermediates of alkaline phosphatase (PhoA) with incomplete disulfide bond formation [60]. They showed that protein disulfide isomerase-facilitated disulfide bond formation in PhoA occurs sequentially as the Cys residues emerge from the translocon [60]. Tapley et al. recently identified another example of coupling folding with the environmental gradient between cytoplasm and periplasm in their discovery of a pH-triggered periplasmic chaperone system [61].

Many proteins that fold in the ER lumen are large with complicated topologies including multiple domains, disulfides, and glycosylation sites, which play roles in specialized in vivo folding mechanisms. A recent study on the influenza membrane glycoprotein neuraminidase [62] found that its interactions with the lectin chaperones calnexin and calreticulin, which bind in a glycosyl-specific manner, help ensure proper folding and oligomerization. In another exciting report, a novel mechanism for the coordinated folding and assembly of heavy and light chains of IgG antibodies was recently unveiled [63,64]. IgG maturation requires that the intrinsically disordered CH1 heavy chain domain undergo isomerization of a specific trans X-Pro bond to cis, disulfide bond formation, and then binding-induced folding upon association with the CL domain of the light chain [63,64]. The unfolded, reduced CH1 domain has a high affinity for the ER-resident Hsp70 chaperone BiP, and so spends its time waiting for the partner light chain in complex with BiP.

Promising new methods for study of in vivo protein folding

The Holy Grail in studies of protein folding in the cell is to directly observe a protein of interest (POI) in intact cells and to characterize its folding, both thermodynamically and kinetically, in situ. Not surprisingly, this has proven exceedingly difficult. Several years ago, Ghaemmaghami and Oas took advantage of E. coli’s urea tolerance to perform in-cell urea titrations of the λ repressor headpiece utilizing a novel hydrogen exchange/mass spectrometry method to assess stability [65]. The resulting analysis showed little change in stability relative to dilute solution, although placing the bacteria in hyperosmotic medium markedly increased stability. Subsequently, our lab combined this in-cell urea titration approach with an in-cell fluorescence folding-reporter system to monitor protein stability [66,67]. We also observed little change in stability but a substantial change in the urea dependence of the apparent equilibrium constant (m-value), not explained by our subsequent in vitro study of the effect of crowding [4]. While promising, these early studies underlined the desirability of approaches to probe in-cell stability that do not rely on harsh denaturants.
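
Urea titrations of this kind are commonly analyzed with a two-state linear extrapolation model, in which the unfolding free energy decreases linearly with denaturant concentration: ΔG([urea]) = ΔG(water) − m[urea]. The short Python sketch below illustrates that relationship only; it is not the analysis used in the cited studies, and the function names and all numerical parameters (the free energy in water, the m-value, the temperature) are hypothetical values chosen for the example.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal/(mol*K)


def unfolding_free_energy(urea_molar, dg_water, m_value):
    """Linear extrapolation: dG_unfold = dG_water - m * [urea], in kcal/mol."""
    return dg_water - m_value * urea_molar


def fraction_folded(urea_molar, dg_water, m_value, temp_kelvin=298.15):
    """Two-state fraction folded, using K_unfold = exp(-dG_unfold / RT)."""
    dg = unfolding_free_energy(urea_molar, dg_water, m_value)
    k_unfold = math.exp(-dg / (R_KCAL * temp_kelvin))
    return 1.0 / (1.0 + k_unfold)


def midpoint(dg_water, m_value):
    """Urea concentration at which folded and unfolded states are equally populated."""
    return dg_water / m_value


if __name__ == "__main__":
    # Hypothetical parameters for illustration, not taken from the cited work.
    dg_water = 4.0  # kcal/mol
    m_value = 1.0   # kcal/(mol*M)
    for urea in (0.0, 2.0, 4.0, 6.0):
        ff = fraction_folded(urea, dg_water, m_value)
        print(f"{urea:.1f} M urea: fraction folded = {ff:.3f}")
```

In this model the m-value sets the steepness of the transition while the free energy in water sets the stability in the absence of denaturant, which is why a titration can show a changed m-value with little change in extrapolated stability.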

Exciting recent work from the Gruebele lab combines temperature-jump perturbation methods with fast relaxation imaging (FReI) to interrogate the in vivo folding landscape of a POI, here a temperature-sensitive variant of phosphoglycerate kinase (tsPGK) [68]. tsPGK was fused at each terminus with a fluorescent protein to form a folding-sensitive FRET reporter system (Figure 2a), similar to earlier work by Philipps et al. [69]. Using localized short laser-initiated temperature jumps to trigger tsPGK unfolding, these researchers measured protein folding kinetics in real time in living cells (Figure 2b) [68]. Tantalizingly, FReI experiments show that diffusion is slow in the cytoplasm and that folding kinetics are significantly different in different regions of the cell (Figure 2c) [68].

Figure 2. Monitoring protein folding kinetics in a living cell using fast relaxation imaging (FReI) [68].

(a) The folding sensor was created by sandwiching a temperature-sensitive POI (here, PGK) between two fluorescent proteins, GFP and mCherry, allowing folding to be monitored by FRET. (b) Image of the fluorescence arising from expression of the fusion protein in a human cell. (c) Map of the refolding dynamics of the fusion protein in the cell shown in (b). The time dependence of the FRET signal following a temperature jump is spatially mapped. Figures 2b and c are courtesy of M. Gruebele.
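
A FRET-based folding sensor of the kind shown in panel (a) relies on the steep distance dependence of energy transfer, given by the Förster relation E = 1/(1 + (r/R0)^6). The Python sketch below is a minimal illustration of that relation; the Förster radius and the donor-acceptor distances are placeholder values assumed for the example, not measurements for the GFP/mCherry construct described here.

```python
def fret_efficiency(distance_nm, forster_radius_nm):
    """Forster relation: E = 1 / (1 + (r / R0)**6)."""
    if distance_nm < 0 or forster_radius_nm <= 0:
        raise ValueError("distance must be >= 0 and Forster radius must be > 0")
    return 1.0 / (1.0 + (distance_nm / forster_radius_nm) ** 6)


if __name__ == "__main__":
    r0 = 5.0  # hypothetical Forster radius in nm
    # Hypothetical end-to-end distances for a compact and an expanded state.
    for label, r in (("compact", 4.0), ("expanded", 7.0)):
        print(f"{label}: r = {r:.1f} nm, E = {fret_efficiency(r, r0):.2f}")
```

Because efficiency falls off with the sixth power of distance, modest changes in the separation of the two fluorescent proteins near R0 produce large changes in signal.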

In-cell NMR, recently reviewed by Pielak [70] and by Ito and Selenko [71], is a potentially powerful approach to study proteins in vivo and to gain insight into their stability and folding mechanisms. Unfortunately, many folded proteins fail to show measurable NMR spectra in the cellular environment, most likely due to hindered rotational diffusion. Despite the inherent obstacles, the Shirakawa group has had impressive success applying NMR to small proteins in eukaryotic cells [72], including interrogating stability by hydrogen exchange. They found significant enhancement in exchange kinetics for ubiquitin inside HeLa cells. Interestingly, a mutant ubiquitin that should bind less well to partner proteins had lower hydrogen exchange rates in-cell than did wild-type ubiquitin, arguing that protein-protein interactions may modulate protein stability in the cell.

Clever use of split reporters, in which folding of the POI is coupled to successful binding and folding of two pieces of a reporter protein (Figure 3), has shed light on in vivo folding properties. This strategy was introduced to screen for soluble, and by inference stable, proteins in earlier work by Thomas and coworkers using a genetic marker [73] and by Waldo and coworkers using recombination of GFP [74] (Figure 3a). This approach has been exploited with Im7 as the POI and β-lactamase as the split reporter (Figure 3b) [75], and developed into more general screens for successful folders in the bacterial periplasm [76] and in yeast expression systems [77]. In the Im7 work, Foit et al. correlated resistance to β-lactam antibiotics with the stability of Im7 variants in the periplasm and selected for variants with enhanced stability [75]. Intriguingly, many stabilizing mutations altered functional residues that would normally be involved in interactions between Im7 and its binding partner colicin E7. These results point to the conflicting pressures on protein sequences placed by multiple evolutionary constraints, and the trade-offs in folding properties required to meet in vivo requirements for function (see next section).

Figure 3. Selection and screening strategies designed to assess protein stability in vivo.

(a) Construct designed such that GFP fluorescence reports on protein solubility, which is correlated to stability [74]. The gene for the POI is C-terminally fused to a β strand from GFP, and a GFP construct missing this strand is co-expressed with the fusion. GFP fluorescence reports on the successful docking of the missing strand onto the truncated GFP. In turn, unstable (or insoluble) POIs with their attached GFP strands are degraded (or inaccessible to recombine with the truncated GFP). (b) Screen designed such that successful periplasmic expression of β-lactamase requires a stably folded POI and hence recombination of two β-lactamase fragments that flank the POI [75]. Here, the level of functional periplasmic β-lactamase, which relies on stability of the POI, is read out as resistance to increasing concentrations of β-lactam antibiotic [75]. As in (a), this correlation is presumably based on the enhanced proteolytic susceptibility of the fusion protein when the POI is unstable. (c) Construct designed such that flexibility of the POI controls transcription of the β-lactamase gene [78,79]. When the POI is flanked by the N-terminal DNA-binding domain of bacteriophage λ, which binds to the λ operator, and the RNA polymerase α subunit, which activates transcription of the β-lactamase gene, flexible POIs lead to greater transcriptional activation and higher antibiotic resistance.

These approaches to in-cell stability use a single flanking reporter and are clearly powerful as selections for enhanced stability. However, they are end-point assays based on the proteolytic lability of the fusion construct when the POI is unstable and therefore cannot readily yield an estimate of folding free energy. A related in vivo screen places the POI between a DNA-binding domain and a transcriptional activation domain [78] (Figure 3c). Only when the POI is sufficiently flexible can the flanking domains effectively activate transcription of a reporter gene. The authors tested streptococcal protein G B1 variants with melting temperatures over a wide range, from 38 to 100 °C. The resulting range of expression levels was remarkably well correlated to the Tm values of the test protein (higher Tm, lower expression), reliably enough to select for stabilizing mutations in a directed evolution experiment [79]. In this intriguing stability reporter system, the read-out may be more directly related to the thermodynamic stability of the POI.

Only the fittest survive: evolutionary implications for folding in the cell

How the competition for fitness at the organismal level imposes evolutionary constraints on protein molecular properties is complex. Attaining the native functional state is biologically essential both because a protein must perform its cellular function(s) and because misfolded or incompletely folded proteins can be deleterious [80]. However, the optimal molecular properties to benefit the organism are not simply stability and activity, and even these may be in conflict, as illustrated in the Im7 work described above [75]. Nature is, in a real sense, carrying out a selection experiment with an array of selective pressures, many of which we have yet to identify.

The relationship between protein folding and evolvability has recently been reviewed by Tokuriki and Tawfik [81], and a stimulating earlier review appeared in 2008 [82]. Thus, we mention only a few particularly relevant papers that speak to the biological pressures on protein folding in the cell.

How do in-vivo folding properties evolve? A key challenge is how an organism copes with ‘weak-link’ protein variants as it moves towards more successfully evolved versions. One strategy exploits chaperones as buffers, enabling the organism to cope with mutated proteins that have impaired folding properties [81,83,84]. In a recent study, Shamoo and coworkers explicitly designed a weak-link challenge and asked how it was overcome: They substituted the normal (and essential) gene for adenylate kinase in the thermophile Geobacillus stearothermophilus by the analogous gene from the mesophile B. subtilis and grew the resulting bacterial population at high temperatures to force competition between mutants [85,86]. Success in the population was achieved by a combination of mutations that led to enhanced activity and improved folding properties [85,86].

While the evolution of a single protein is informative, computational analyses of sequence databases provide a more global view of evolutionary trends. For example, Vendruscolo and collaborators have used physicochemical analysis of proteins to predict cellular properties of proteins such as expression levels [87]. More recently, these researchers found that aggregation propensity and folding propensity tend to be anti-correlated [88].

Stability, function, and evolvability are only part of the equation: In thought-provoking work on p53 from Alan Fersht’s lab, an additional factor emerges in biological requirements on protein folding: It appears that the amount of unfolded protein in the cell may be under a selective pressure because of the constraints on degradation and consequent regulation of the pool of protein available for its transcriptional regulatory activity [89].

Concluding thoughts

Elucidating how protein expression, folding, function, misfolding and aggregation, chaperone interactions, and degradation are balanced dynamically will require aggressive, multidisciplinary, and creative research in the future. Most importantly, a holistic view will be required as we seek to understand protein folding in the cell and to gain and integrate both biophysical and physiological insights. Some aspects of in vivo folding have thus far been the domain of particular fields: for example, how proteins fold as they traverse the secretory pathway in eukaryotes has been a topic of study in cell biology. Also, with few exceptions and for obvious reasons, computational modeling has been applied most extensively to isolated, small proteins. We urge that the challenges of folding in the cell be tackled by teams of investigators, capable of developing the needed arsenal of novel and powerful methods and simultaneously retaining a keen sense of the biological realities imposed on this already complex chemical reaction: the successful formation of a native protein capable of ensuring the fitness of the host organism.

Acknowledgments

We thank Adrian Elcock and Martin Gruebele for generously providing figures, Rob Smock for help with Figure 1, and Dan Hebert for critical reading of the manuscript. Funding from the National Institutes of Health (grants OD000945, GM027616, and GM094848) is gratefully acknowledged.


References