An entropy criterion to detect minimally frustrated intermediates in native proteins

A Data Base of Minimally Frustrated Alpha-Helical Segments Extracted from Proteins According to an Entropy Criterion

1999

A data base of minimally frustrated alpha-helical segments is defined by filtering a set of 822 non-redundant proteins, which contain 4783 alpha-helical structures. The data base is defined using a neural network-based alpha-helix predictor, whose outputs are rated according to an entropy criterion. A comparison with the presently available experimental results indicates that a subset of the data base contains the experimentally detected initiation sites of protein folding, as well as protein fragments that fold into stable isolated alpha helices. This suggests using the data base (and/or the predictor) to highlight patterns that govern the stability of alpha helices in proteins and the helical behavior of isolated protein fragments.
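To illustrate what rating predictor outputs by entropy could look like, here is a minimal sketch. It assumes that the predictor emits, for each residue, a probability distribution over structural classes, and that segments are kept when the Shannon entropy of those outputs stays below a cutoff over a run of residues. The function names, the threshold, and the minimum run length are hypothetical; the abstract does not specify the actual criterion.

```python
import math


def shannon_entropy(probs):
    """Shannon entropy (in bits) of a discrete probability distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0.0)


def low_entropy_segments(outputs, threshold, min_length):
    """Find runs of confidently predicted residues.

    outputs    -- list of per-residue probability distributions (assumed format)
    threshold  -- entropy cutoff in bits (hypothetical parameter)
    min_length -- shortest run of residues to report (hypothetical parameter)

    Returns (start, end) index pairs, end exclusive.
    """
    segments = []
    start = None
    for i, probs in enumerate(outputs):
        if shannon_entropy(probs) < threshold:
            if start is None:
                start = i  # a low-entropy run begins here
        else:
            if start is not None and i - start >= min_length:
                segments.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(outputs) - start >= min_length:
        segments.append((start, len(outputs)))
    return segments
```

A low entropy means the output distribution is sharply peaked, so the predictor is confident from local sequence alone, which is the intuition behind treating such segments as minimally frustrated.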

Dynamics of the minimally frustrated helices determine the hierarchical folding of small helical proteins

Physical Review E, 2004

In this paper we aim at determining the key residues of small helical proteins in order to build up reduced models of the folding dynamics. We start by arguing that the folding process can be dissected into concurrent fast and slow dynamics. The fast events are the quasi-autonomous coil-to-helix transitions occurring in the minimally frustrated initiation sites of folding in the early stages of the process. The slow processes consist of the docking of the fluctuating helices formed in these critical sites. We show that a neural network devised to predict native secondary structures from sequence can be used to estimate the probabilities of formation of these helical traits as they are embedded in the protein. The resulting probabilities are shown to correlate well with the experimental helicities measured in the same isolated peptides. The relevance of this finding to the hierarchical character of folding is confirmed within the framework of a diffusion-collision-like mechanism. We demonstrate that thermodynamic and topological features of these critical helices allow accurate estimation of the folding times of five proteins that have been kinetically studied. This suggests that these critical helices determine the fundamental events of the whole folding process. A remarkable feature of our model is that not all of the native helices are eligible as critical helices, whereas the whole set of the native helices has been used so far in other reconstructions of the folding mechanism. This stresses that the minimally frustrated helices of these helical proteins comprise the minimal set of determinants of the folding process.

Protein secondary structure: Entropy, correlations and prediction

2003

Motivation: Is protein secondary structure primarily determined by local interactions between residues closely spaced along the amino acid backbone or by non-local tertiary interactions? To answer this question, we measure the entropy densities of primary and secondary structure sequences, and the local inter-sequence mutual information density. Results: We find that the important inter-sequence interactions are short ranged, that correlations between neighboring amino acids are essentially uninformative and that only one-fourth of the total information needed to determine the secondary structure is available from local inter-sequence correlations. These observations support the view that the majority of most proteins fold via a cooperative process where secondary and tertiary structure form concurrently. Moreover, existing single-sequence secondary structure prediction algorithms are almost optimal, and we should not expect a dramatic improvement in prediction accuracy. Availability: Both the data sets and analysis code are freely available from our Web site at http://
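As a rough illustration of the quantity involved, the sketch below computes a naive plug-in estimate of the mutual information between paired symbols, for example the amino acid and the secondary structure label at the same position. This is an assumption-laden simplification: the paper measures entropy densities and a local inter-sequence mutual information density, which presumably requires more careful estimators than this single-site count. The function name and input format are hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits from paired observations.

    xs, ys -- equal-length sequences of hashable symbols, e.g. residues
              and the secondary structure labels aligned to them.
    """
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of (x, y) pairs
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi
```

On small samples the plug-in estimate is biased upward, so any real comparison of the kind the abstract describes would need a bias correction or a large data set.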

Analyzing the effect of homogeneous frustration in protein folding

Proteins

The energy landscape theory has been an invaluable theoretical framework in the understanding of biological processes such as protein folding, oligomerization, and functional transitions. According to the theory, the energy landscape of protein folding is funneled toward the native state, a conformational state that is consistent with the principle of minimal frustration. It has been accepted that real proteins are selected through natural evolution, satisfying the minimum frustration criterion. However, there is evidence that a low degree of frustration accelerates folding. We examined the interplay between topological and energetic protein frustration. We employed a Cα structure-based model for simulations with a controlled nonspecific energetic frustration added to the potential energy function. Thermodynamics and kinetics of a group of 19 proteins are completely characterized as a function of increasing level of energetic frustration. We observed two well-separated groups of proteins: one group where a little frustration enhances folding rates to an optimal value and another where any energetic frustration slows down folding. Protein energetic frustration regimes and their mechanisms are explained by the role of non-native contact interactions in different folding scenarios. These findings strongly correlate with the protein free-energy folding barrier and the absolute contact order parameters. These computational results are corroborated by principal component analysis and partial least squares techniques. One simple theoretical model is proposed as a useful tool for experimentalists to predict the limits of improvements in real proteins.
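The absolute contact order mentioned above is commonly defined as the mean sequence separation between residues in native contact, and the relative contact order divides that by the chain length. The sketch below follows that common definition; how contacts are determined (distance cutoff, atoms considered) is left to the caller and is an assumption here, not something taken from the abstract. The function names are hypothetical.

```python
def absolute_contact_order(contacts):
    """Mean sequence separation |i - j| over native contacts.

    contacts -- iterable of (i, j) residue index pairs in native contact
                (the contact definition itself is assumed, not specified)
    """
    contacts = list(contacts)
    if not contacts:
        raise ValueError("at least one contact is required")
    return sum(abs(i - j) for i, j in contacts) / len(contacts)


def relative_contact_order(contacts, chain_length):
    """Absolute contact order normalized by the number of residues."""
    if chain_length <= 0:
        raise ValueError("chain_length must be positive")
    return absolute_contact_order(contacts) / chain_length
```

For example, contacts at separations 4, 8, and 3 give an absolute contact order of 5.0, and a relative contact order of 0.25 for a 20-residue chain.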

On the Convergence of Protein Structure and Dynamics. Statistical Learning Studies of Pseudo Folding Pathways

Lecture Notes in Computer Science, 2008

Many algorithms that attempt to predict proteins' native structure from sequence need to generate a large set of hypotheses in order to ensure that nearly correct structures are included, leading to the problem of assessing the quality of alternative 3D conformations. This problem has been mostly approached by focusing on the final 3D conformation, with machine learning techniques playing a leading role. We argue in this paper that additional information for recognising native-like structures can be obtained by regarding the final conformation as the result of a generative process reminiscent of the folding process that generates structures in nature. We introduce a coarse representation of protein pseudo-folding based on binary trees and introduce a kernel function for assessing their similarity. Kernel-based analysis techniques empirically demonstrate a significant correlation between information contained in pseudo-folding trees and features of native folds in a large and non-redundant set of proteins.

Neural networks to study invariant features of protein folding

Theoretical Chemistry Accounts: Theory, Computation, and Modeling (Theoretica Chimica Acta), 1999

Protein secondary structures result both from short-range and long-range interactions. Here neural networks are used to implement a procedure to detect regions of the protein backbone where local interactions have an overwhelming effect in determining the formation of stretches in α-helical conformation. Within the framework of a modular view of protein folding we have argued that these structures correspond to the initiation sites of folding. The hypothesis to be tested in this paper is that sequence identity, besides ensuring similarity of the three-dimensional conformation, also entails similar folding mechanisms. In particular, we compare the location and sequence variability of the initiation sites extracted from a set of proteins homologous to horse heart cytochrome c. We present evidence that the initiation sites conserve their position in the aligned sequences and exhibit a more reduced variability in the residue composition than the rest of the protein.

De novo prediction of protein folding pathways and structure using the principle of sequential stabilization

Proceedings of the National Academy of Sciences, 2012

Motivated by the relationship between the folding mechanism and the native structure, we develop a unified approach for predicting folding pathways and tertiary structure using only the primary sequence as input. Simulations begin from a realistic unfolded state devoid of secondary structure and use a chain representation lacking explicit side chains, rendering the simulations many orders of magnitude faster than molecular dynamics simulations. The multiple round nature of the algorithm mimics the authentic folding process and tests the effectiveness of sequential stabilization (SS) as a search strategy wherein 2° structural elements add onto existing structures in a process of progressive learning and stabilization of structure found in prior rounds of folding. Because no a priori knowledge is used, we can identify kinetically significant non-native interactions and intermediates, sometimes generated by only two mutations, while the evolution of contact matrices is often consistent...