How the "folding funnel" depends on size and structure of proteins? a view from the scoring function perspective

Coupling between Properties of the Protein Shape and the Rate of Protein Folding

PLoS ONE, 2009

There are several important questions on the coupling between properties of the protein shape and the rate of protein folding. We have studied a series of structural descriptors intended for describing protein shapes (the radius of gyration, the radius of cross-section, and the coefficient of compactness) and their possible connection with folding behavior, either rates of folding or the emergence of folding intermediates, and compared them with classical descriptors, protein chain length and contact order. It has been found that when a descriptor is normalized to eliminate the influence of the protein size (the radius of gyration normalized to the radius of gyration of a ball of equal volume, the coefficient of compactness defined as the ratio of the accessible surface area of a protein to that of an ideal ball of equal volume, and relative contact order) it completely loses its ability to predict folding rates. On the other hand, when a descriptor correlates well with protein size (the radius of cross-section and absolute contact order in our consideration) then it correlates well with the logarithm of folding rates and separates two-state folders from multi-state ones reasonably well. The critical control for the performance of new descriptors demonstrated that the radius of cross-section has a somewhat higher predictive power (the correlation coefficient is −0.74) than size alone (the correlation coefficient is −0.65). So, we have shown that the numerical descriptors of the overall shape-geometry of protein structures are one of the important determinants of the protein-folding rate and mechanism.
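As an illustration of two of the descriptors named in this abstract, here is a minimal sketch of the radius of gyration and of absolute and relative contact order. It assumes one coordinate per residue (for example the Cα atom), unweighted masses, and a hypothetical 8 Å contact cutoff. The function names and defaults are illustrative and are not taken from the paper, whose normalized descriptors (ball of equal volume, accessible surface area) are not reproduced here.

```python
import math

def radius_of_gyration(coords):
    """Root-mean-square distance of the points from their centroid."""
    n = len(coords)
    center = [sum(p[k] for p in coords) / n for k in range(3)]
    sq = sum(sum((p[k] - center[k]) ** 2 for k in range(3)) for p in coords)
    return math.sqrt(sq / n)

def contact_orders(coords, cutoff=8.0, min_separation=1):
    """Return (absolute, relative) contact order.

    Absolute contact order is the mean sequence separation |i - j| over all
    contacting residue pairs; relative contact order divides it by the chain
    length. The cutoff and min_separation values are assumptions.
    """
    n = len(coords)
    total_sep = 0
    n_contacts = 0
    for i in range(n):
        for j in range(i + min_separation, n):
            if math.dist(coords[i], coords[j]) <= cutoff:
                total_sep += j - i
                n_contacts += 1
    if n_contacts == 0:
        return 0.0, 0.0
    abs_co = total_sep / n_contacts
    return abs_co, abs_co / n

# Toy input: four residues on a straight line, 3.8 Å apart (not a real protein).
toy_chain = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
rg = radius_of_gyration(toy_chain)
abs_co, rel_co = contact_orders(toy_chain)
```

Because relative contact order is the absolute value divided by chain length, it removes the size dependence that the abstract identifies as the source of predictive power.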

Prediction of protein-folding mechanisms from free-energy landscapes derived from native structures.

Guided by recent experimental results suggesting that protein-folding rates and mechanisms are determined largely by native-state topology, we develop a simple model for protein folding free-energy landscapes based on native-state structures. The configurations considered by the model contain one or two contiguous stretches of residues ordered as in the native structure with all other residues completely disordered; the free energy of each configuration is the difference between the entropic cost of ordering the residues, which depends on the total number of residues ordered and the length of the loop between the two ordered segments, and the favorable attractive interactions, which are taken to be proportional to the total surface area buried by the ordered residues in the native structure. Folding kinetics are modeled by allowing only one residue to become ordered/disordered at a time, and a rigorous and exact method is used to identify free-energy maxima on the lowest free-energy paths connecting the fully disordered and fully ordered configurations. The distribution of structure in these free-energy maxima, which comprise the transition-state ensemble in the model, is reasonably consistent with experimental data on the folding transition state for five of seven proteins studied. Thus, the model appears to capture, at least in part, the basic physics underlying protein folding and the aspects of native-state topology that determine protein-folding mechanisms.
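The search for the highest free-energy point on the lowest free-energy path can be sketched as a bottleneck (minimax) shortest-path problem. The example below is a deliberately reduced illustration, not the authors' method: it allows only one ordered segment rather than one or two, replaces buried surface area with a count of native contacts inside the segment, and uses hypothetical parameters `entropy_cost` and `contact_gain`. All function names are invented for this sketch.

```python
import heapq

def segment_free_energy(i, j, contacts, entropy_cost=1.0, contact_gain=1.5):
    """Free energy of ordering residues i..j-1, all other residues disordered.

    Assumption: a fixed entropic cost per ordered residue, minus a gain per
    native contact (a, b) with both residues inside the ordered segment.
    """
    if j <= i:
        return 0.0  # fully disordered reference state
    buried = sum(1 for a, b in contacts if i <= a and b < j)
    return entropy_cost * (j - i) - contact_gain * buried

def neighbours(state, n):
    """States reachable by ordering or disordering exactly one residue."""
    if state is None:  # fully disordered: any single residue may order first
        return [(i, i + 1) for i in range(n)]
    i, j = state
    out = []
    if i > 0:
        out.append((i - 1, j))
    if j < n:
        out.append((i, j + 1))
    if j - i == 1:
        out.append(None)  # losing the last ordered residue
    else:
        out.extend([(i + 1, j), (i, j - 1)])
    return out

def lowest_barrier(n, contacts, **params):
    """Return (barrier, state at the barrier) on the minimax path from the
    fully disordered state (None) to the fully ordered state (0, n)."""
    def f(state):
        if state is None:
            return 0.0
        return segment_free_energy(state[0], state[1], contacts, **params)

    goal = (0, n)
    counter = 0  # tie-breaker so the heap never compares states directly
    heap = [(f(None), counter, None, None)]
    done = set()
    while heap:
        barrier, _, state, top = heapq.heappop(heap)
        if state in done:
            continue
        done.add(state)
        if state == goal:
            return barrier, top
        for nxt in neighbours(state, n):
            if nxt in done:
                continue
            fn = f(nxt)
            counter += 1
            if fn > barrier:
                heapq.heappush(heap, (fn, counter, nxt, nxt))
            else:
                heapq.heappush(heap, (barrier, counter, nxt, top))
    return None

# Toy six-residue chain with a made-up native contact list.
toy_contacts = [(0, 3), (1, 4), (2, 5), (0, 5), (1, 3)]
barrier, transition_state = lowest_barrier(6, toy_contacts)
```

In this toy case every two-residue segment costs entropy without gaining a contact, so the path must cross such a state; the returned `transition_state` plays the role of one member of the transition-state ensemble described in the abstract.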

De novo and inverse folding predictions of protein structure and dynamics

Journal of Computer-Aided Molecular Design, 1993

In the last two years, the use of simplified models has facilitated major progress in the globular protein folding problem, viz., the prediction of the three-dimensional (3D) structure of a globular protein from its amino acid sequence. A number of groups have addressed the inverse folding problem where one examines the compatibility of a given sequence with a given (and already determined) structure. A comparison of extant inverse protein-folding algorithms is presented, and methodologies for identifying sequences likely to adopt identical folding topologies, even when they lack sequence homology, are described. Extension to produce structural templates or fingerprints from idealized structures is discussed, and for eight-membered β-barrel proteins, it is shown that idealized fingerprints constructed from simple topology diagrams can correctly identify sequences having the appropriate topology. Furthermore, this inverse folding algorithm is generalized to predict elements of supersecondary structure including β-hairpins, helical hairpins and α/β/α fragments. Then, we describe a very high coordination number lattice model that can predict the 3D structure of a number of globular proteins de novo, i.e., using just the amino acid sequence. Applications to sequences designed by DeGrado and co-workers [Biophys. J., 61 (1992) A265] predict folding intermediates, native states and relative stabilities in accord with experiment. The methodology has also been applied to the four-helix bundle designed by Richardson and co-workers [Science, 249 (1990) 884] and a redesigned monomeric version of a naturally occurring four-helix dimer, rop. Based on comparison to the rop dimer, the simulations predict conformations with rms values of 3-4 Å from native. Furthermore, the de novo algorithms can assess the stability of the folds predicted from the inverse algorithm, while the inverse folding algorithms can assess the quality of the de novo models. Thus, the synergism of the de novo and inverse folding algorithm approaches provides a set of complementary tools that will facilitate further progress on the protein-folding problem.

Directionality in protein fold prediction

2010

Background: Ever since the groundbreaking work of Anfinsen et al. in which a denatured protein was found to refold to its native state, it has been frequently stated by the protein fold prediction community that all the information required for protein folding lies in the amino acid sequence. Recent in vitro experiments and in silico computational studies, however, have shown that cotranslation may affect the folding pathway of some proteins, especially those of ancient folds. In this paper aspects of cotranslational folding have been incorporated into a protein structure prediction algorithm by adapting the Rosetta program to fold proteins as the nascent chain elongates. This makes it possible to conduct a pairwise comparison of folding accuracy, by comparing folds created sequentially from each end of the protein. Results: A single main result emerged: in 94% of proteins analyzed, following the sense of translation, from N-terminus to C-terminus, produced better predictions than following the reverse sense of translation, from the C-terminus to N-terminus. Two secondary results emerged. First, this superiority of N-terminus to C-terminus folding was more marked for proteins showing stronger evidence of cotranslation and, second, an algorithm following the sense of translation produced predictions comparable to, and occasionally better than, Rosetta. Conclusions: There is a directionality effect in protein fold prediction. At present, prediction methods appear to be too noisy to take advantage of this effect; as techniques refine, it may be possible to draw benefit from a sequential approach to protein fold prediction.

Probing protein fold space with a simplified model

2008

We probe the stability and near-native energy landscape of protein fold space using powerful conformational sampling methods together with simple reduced models and statistical potentials. Fold space is represented by a set of 280 protein domains spanning all topological classes and having a wide range of lengths (0-300 residues), amino acid composition, and number of secondary structural elements. The degrees of freedom are taken as the loop torsion angles. This choice preserves the native secondary structure but allows the tertiary structure to change. The proteins are represented by three-point-per-residue, three-dimensional models with statistical potentials derived from a knowledge-based study of known protein structures. When this space is sampled by a combination of Parallel Tempering and Equi-Energy Monte Carlo, we find that the three-point model captures the known stability of protein native structures with stable energy basins that are near-native (all-α: 4.77 Å, all-β: 2.93 Å, α/β: 3.09 Å, α+β: 4.89 Å on average, and within 6 Å for 71.41%, 92.85%, 94.29% and 64.28% of the all-α, all-β, α/β and α+β classes, respectively). Denatured structures also occur and these have interesting structural properties that shed light on the different landscape characteristics of α and β folds. We find that α/β proteins with alternating α and β segments (such as the beta-barrel) are more stable than proteins in other fold classes.
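Parallel Tempering, one of the two sampling methods named in this abstract, runs replicas at several temperatures and periodically proposes to exchange configurations between them. The sketch below shows only the standard Metropolis acceptance rule for such an exchange; it assumes energies and temperatures in reduced units (`k_b = 1.0` by default), and the function names are illustrative. It does not cover the Equi-Energy component or the paper's three-point model.

```python
import math
import random

def swap_probability(e_i, e_j, t_i, t_j, k_b=1.0):
    """Acceptance probability for exchanging the configurations of replica i
    (energy e_i, temperature t_i) and replica j (energy e_j, temperature t_j):
    min(1, exp((beta_i - beta_j) * (e_i - e_j)))."""
    delta = (1.0 / (k_b * t_i) - 1.0 / (k_b * t_j)) * (e_i - e_j)
    return 1.0 if delta >= 0 else math.exp(delta)

def attempt_swap(e_i, e_j, t_i, t_j, rng=random):
    """Return True if the proposed exchange is accepted."""
    return rng.random() < swap_probability(e_i, e_j, t_i, t_j)

# A cold replica stuck at higher energy than a hot one always swaps.
p_always = swap_probability(-5.0, -10.0, 1.0, 2.0)
# The reverse situation is accepted only with probability exp(-2.5).
p_rare = swap_probability(-10.0, -5.0, 1.0, 2.0)
```

The asymmetry is what lets low-energy configurations found at high temperature migrate to the cold replicas, where near-native basins are sampled.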

Nature of Driving Force for Protein Folding: A Result From Analyzing the Statistical Potential

Physical Review Letters, 1997

In a statistical approach to protein structure analysis, Miyazawa and Jernigan (MJ) derived a 20 × 20 matrix of inter-residue contact energies between different types of amino acids. Using the method of eigenvalue decomposition, we find that the MJ matrix can be accurately reconstructed from its first two principal component vectors as $M_{ij} = C_0 + C_1 (q_i + q_j) + C_2 q_i q_j$, with constant $C$'s, and 20 $q$ values associated with the 20 amino acids. This regularity is due to hydrophobic interactions and a force of demixing, the latter obeying Hildebrand's solubility theory of simple liquids.
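A matrix of exactly the form given in this abstract lies in the span of the all-ones vector and the vector of q values, so it has rank at most two and is recovered exactly from its two dominant eigencomponents. The sketch below checks this on a synthetic matrix using NumPy. The q values and constants are randomly chosen placeholders, not the published MJ energies or fitted parameters, and the real MJ matrix follows this form only approximately.

```python
import numpy as np

rng = np.random.default_rng(0)
q = rng.normal(size=20)        # hypothetical per-residue q values
c0, c1, c2 = -1.5, 0.8, -2.0   # hypothetical constants
ones = np.ones(20)

# M_ij = C0 + C1 (q_i + q_j) + C2 q_i q_j, built from outer products.
m = (c0 * np.outer(ones, ones)
     + c1 * (np.outer(ones, q) + np.outer(q, ones))
     + c2 * np.outer(q, q))

def rank_k_reconstruction(matrix, k):
    """Rebuild a symmetric matrix from its k eigencomponents of largest
    absolute eigenvalue: sum over kept components of lambda * v v^T."""
    vals, vecs = np.linalg.eigh(matrix)
    keep = np.argsort(np.abs(vals))[-k:]
    return (vecs[:, keep] * vals[keep]) @ vecs[:, keep].T

error_two = np.max(np.abs(rank_k_reconstruction(m, 2) - m))
error_one = np.max(np.abs(rank_k_reconstruction(m, 1) - m))
```

Here `error_two` is at the level of floating-point round-off while `error_one` is not, which is the synthetic counterpart of the statement that two principal components suffice.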

Comparison of two optimization methods to derive energy parameters for protein folding: Perceptron and Z score

Proteins: Structure, Function, and Genetics, 2000

Two methods were proposed recently to derive energy parameters from known native protein conformations and corresponding sets of decoys. One is based on finding, by means of a perceptron learning scheme, energy parameters such that the native conformations have lower energies than the decoys. The second method maximizes the difference between the native energy and the average energy of the decoys, measured in terms of the width of the decoys' energy distribution (Z-score). Whereas the perceptron method is sensitive mainly to "outlier" (i.e. extremal) decoys, the Z-score optimization is governed by the high density regions in decoy-space. We compare the two methods by deriving contact energies for two very different sets of decoys; the first obtained for model lattice proteins and the second by threading. We find that the potentials derived by the two methods are of similar quality and fairly closely related. This finding indicates that standard, naturally occurring sets of decoys are distributed in a way that yields robust energy parameters (that are quite insensitive to the particular method used to derive them). The main practical implication of this finding is that it is not necessary to fine-tune the potential search method to the particular set of decoys used.
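The two criteria compared in this abstract can be illustrated on a toy contact potential. The sketch assumes that the energy is linear in the parameters, E = w · c, where c is a vector of contact counts per contact type. It implements a plain perceptron update that seeks parameters placing every decoy above the native state, and a function that evaluates (but does not optimize) the Z-score. The data, function names and learning rate are invented for illustration and are not from the paper.

```python
import statistics

def energy(w, counts):
    """Linear contact energy: sum over contact types of parameter * count."""
    return sum(wi * ci for wi, ci in zip(w, counts))

def z_score(w, native, decoys):
    """(E_native - mean decoy energy) / standard deviation of decoy energies.
    More negative means the native state is better separated."""
    decoy_e = [energy(w, d) for d in decoys]
    return (energy(w, native) - statistics.mean(decoy_e)) / statistics.pstdev(decoy_e)

def perceptron(native, decoys, max_sweeps=1000, rate=1.0):
    """Find w such that every decoy has higher energy than the native state,
    i.e. w . (c_decoy - c_native) > 0. Returns None if no solution is found
    within max_sweeps."""
    w = [0.0] * len(native)
    diffs = [[d - n for d, n in zip(decoy, native)] for decoy in decoys]
    for _ in range(max_sweeps):
        updated = False
        for x in diffs:
            if energy(w, x) <= 0:  # this decoy is not above the native state
                w = [wi + rate * xi for wi, xi in zip(w, x)]
                updated = True
        if not updated:
            return w
    return None

# Toy contact-count vectors for three contact types (made-up numbers).
toy_native = [3, 1, 0]
toy_decoys = [[1, 2, 1], [2, 0, 2], [0, 3, 1]]
w = perceptron(toy_native, toy_decoys)
z = z_score(w, toy_native, toy_decoys)
```

The perceptron stops as soon as the worst-placed decoys are pushed above the native state, which is why it is driven by extremal decoys, whereas the Z-score depends on the mean and width of the whole decoy energy distribution.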