Influence of assignment on the prediction of transmembrane helices in protein structures
Related papers
Transmembrane helices predicted at 95% accuracy
1995
We describe a neural network system that predicts the locations of transmembrane helices in integral membrane proteins. By using evolutionary information as input to the network system, the method significantly improved on a previously published neural network prediction method that had been based on single sequence information.
Transmembrane helix prediction: a comparative evaluation and analysis
2005
The prediction of transmembrane (TM) helices plays an important role in the study of membrane proteins, given the relatively small number (0.5% of the PDB) of high-resolution structures for such proteins. We used two datasets (one redundant and one non-redundant) of high-resolution structures of membrane proteins to evaluate and analyse TM helix prediction. The redundant (non-redundant) dataset contains structures of 434 (268) TM helices, from 112 (73) polypeptide chains. Of the 434 helices in the dataset, 20 may be classified as 'half-TM' as they are too short to span a lipid bilayer. We compared 13 TM helix prediction methods, evaluating each method using per segment, per residue and termini scores. Four methods consistently performed well: SPLIT4, TMHMM2, HMMTOP2 and TMAP. However, even the best methods were in error by, on average, about two turns of helix at the TM helix termini. The best and worst case predictions for individual proteins were analysed. In particular, the performance of the various methods and of a consensus prediction method was compared for a number of proteins (e.g. SecY, ClC, KvAP) containing half-TM helices. The difficulties of predicting half-TM helices suggest that current prediction methods successfully embody the two-state model of membrane protein folding, but do not accommodate a third stage in which, e.g., short helices and re-entrant loops fold within a bundle of stable TM helices.
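The three kinds of score named in this abstract (per segment, per residue, termini) can be illustrated with a minimal Python sketch. It is not the evaluation code of the paper: the label alphabet (`H` for a TM residue, `-` otherwise), the function names and the `min_overlap` threshold of 3 residues are all assumptions chosen for illustration.

```python
def segments(labels, tm="H"):
    """Return half-open (start, end) index pairs for each run of `tm` in `labels`."""
    runs, start = [], None
    for i, ch in enumerate(labels):
        if ch == tm and start is None:
            start = i
        elif ch != tm and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(labels)))
    return runs


def overlap(a, b):
    """Number of residues shared by two half-open segments (negative if disjoint)."""
    return min(a[1], b[1]) - max(a[0], b[0])


def per_residue_accuracy(observed, predicted):
    """Fraction of positions whose predicted label equals the observed label."""
    if len(observed) != len(predicted):
        raise ValueError("label strings must have equal length")
    return sum(o == p for o, p in zip(observed, predicted)) / len(observed)


def per_segment_recall(observed, predicted, min_overlap=3):
    """Fraction of observed helices hit by a predicted helix.

    The overlap threshold is an assumption, not a value from the paper.
    """
    obs, pred = segments(observed), segments(predicted)
    if not obs:
        return 0.0
    hits = sum(any(overlap(o, p) >= min_overlap for p in pred) for o in obs)
    return hits / len(obs)


def mean_termini_error(observed, predicted):
    """Mean shift (in residues) of helix ends, over observed helices that
    overlap some predicted helix; None if no helix is matched."""
    pred = segments(predicted)
    errors = []
    for o in segments(observed):
        best = max(pred, key=lambda p: overlap(o, p), default=None)
        if best is None or overlap(o, best) <= 0:
            continue
        errors.append((abs(o[0] - best[0]) + abs(o[1] - best[1])) / 2)
    return sum(errors) / len(errors) if errors else None


observed = "--HHHHHH----HHHH--"
predicted = "----HHHHHH--------"
print(per_residue_accuracy(observed, predicted))
print(per_segment_recall(observed, predicted))
print(mean_termini_error(observed, predicted))
```

In this toy case one of the two observed helices is found, and its ends are each shifted by two residues. Real benchmarks differ in how they define a matched segment, so the threshold should be set to whatever the evaluation being reproduced specifies.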
Method for Predicting Transmembrane Helices in Protein Sequences
International journal of engineering research and technology, 2018
The growing number of protein sequences from genome projects requires theoretical methods to predict transmembrane helical segments (TMHs). Several prediction methods have been reported so far, but they have deficiencies in prediction accuracy and adaptability. Here, a method based on the discrete wavelet transform (DWT) has been developed to predict the number and location of TMHs in membrane proteins. Eighty proteins with known 3D structure, chosen at random from the Mptopo database, are used as the data set (including 325 TMHs). TMH prediction is carried out for these membrane protein sequences and gives satisfactory results. To verify the feasibility of this method, 80 membrane protein sequences are treated as test sets; 308 TMHs can be predicted and the prediction accuracy is 96.3%. Compared with other prediction results, the obtained results indicate that the proposed method has higher prediction accuracy.
Topology Prediction of Helical Transmembrane Proteins: How Far Have We Reached?
Current Protein & Peptide Science, 2010
Transmembrane protein topology prediction methods play important roles in structural biology, because the structure determination of these types of proteins is extremely difficult by the common biophysical, biochemical and molecular biological methods. The need for accurate prediction methods is high, as the number of known membrane protein structures falls far behind the estimated number of these proteins in various genomes. The accuracy of these prediction methods appears to be higher than that of most prediction methods applied to globular proteins; however, it decreases slightly with the increasing number of structures. Unfortunately, most prediction algorithms use common machine learning techniques, and they do not reveal why topologies are predicted with such a high success rate and which biophysical or biochemical properties are important to achieve this level of accuracy. Incorporating topology data determined so far into the prediction methods as constraints helps us to reach even higher prediction accuracy; therefore, collection of such topology data is also an important issue.
Journal of Structural Biology, 2004
Transmembrane proteins make up at least one-fifth of the genome of most organisms and are critical components of key pathways for cell survival and interactions with the environment. The function of helices found at the membrane surface in transmembrane proteins has not been greatly explored, but it is likely that they play an ancillary role to membrane spanning helices and are analogous to the surface active helices of peripheral membrane proteins, being involved in: lipid association, membrane perturbation, transmembrane signal transduction and regulation, and transmembrane helical bundle formation. Due to the difficulties in obtaining high-resolution structural data for this class of proteins, structure-from-sequence predictive methods continue to be developed as a means to obtain structural models for these largely intractable systems. A simple but effective variant of the hydrophobic moment analysis of amino acid sequences is described here as part of a protocol for distinguishing helical sequences that are parallel to or 'horizontal' at the membrane bilayer/aqueous phase interface from helices that are membrane-embedded or located in extra-membranous domains. This protocol when tested on transmembrane spanning protein amino acid sequences not used in its development, was found to be 84-91% accurate when the results were compared to the partition locations in the corresponding structures determined by X-ray crystallography, and 72% accurate in determining which helices lie horizontal or near horizontal at the lipid interface.
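The abstract does not spell out its variant of the hydrophobic moment analysis, so the sketch below shows only the standard helical hydrophobic moment, not the authors' protocol. The choice of the Kyte-Doolittle hydropathy scale, the 100° per-residue angle for an ideal α-helix, the division by window length, and the function names are assumptions made for illustration.

```python
import math

# Kyte-Doolittle hydropathy values (assumed scale; the paper may use another).
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def mean_hydrophobicity(window, scale=KYTE_DOOLITTLE):
    """Average hydropathy of the residues in `window`."""
    return sum(scale[aa] for aa in window) / len(window)


def hydrophobic_moment(window, angle_deg=100.0, scale=KYTE_DOOLITTLE):
    """Per-residue hydrophobic moment of `window`.

    Each residue contributes a vector of length H_n pointing at angle
    n * angle_deg around the helix axis; the moment is the length of the
    vector sum, divided here by the window length.
    """
    delta = math.radians(angle_deg)
    sin_sum = sum(scale[aa] * math.sin(i * delta) for i, aa in enumerate(window))
    cos_sum = sum(scale[aa] * math.cos(i * delta) for i, aa in enumerate(window))
    return math.hypot(sin_sum, cos_sum) / len(window)


uniform = "L" * 18                     # same residue on every face of the helix
amphipathic = "LKKLLKKLLKKLLKKLLK"     # hydrophobic and charged faces alternate
for name, seq in (("uniform", uniform), ("amphipathic", amphipathic)):
    print(name, round(mean_hydrophobicity(seq), 2), round(hydrophobic_moment(seq), 2))
```

A uniformly hydrophobic stretch has high mean hydrophobicity and a moment near zero, which is the pattern expected of a membrane-embedded helix, whereas a helix lying along the interface is expected to show a large moment because one face is hydrophobic and the other polar.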
Protein Science, 2003
Helices in membrane spanning regions are more tightly packed than the helices in soluble proteins. Thus, we introduce a method that uses a simple scale of burial propensity and a new algorithm to predict transmembrane helical (TMH) segments and a positive-inside rule to predict amino-terminal orientation. The method (the topology predictor of transmembrane helical proteins using mean burial propensity [THUMBUP]) correctly predicted the topology of 55 of 73 proteins (or 75%) with known three-dimensional structures (the 3D helix database). This level of accuracy can be reached by MEMSAT 1.8 (a 200-parameter model-recognition method) and a new HMM-based method (a 111-parameter hidden Markov model, UMDHMM TMHP) if they were retrained with the 73-protein database. Thus, a method based on a physicochemical property can provide topology prediction as accurate as those methods based on more complicated statistical models and learning algorithms for the proteins with accurately known structures. Commonly used HMM-based methods and MEMSAT 1.8 were trained with a combination of the partial 3D helix database and a 1D helix database of TMH proteins in which topology information was obtained by gene fusion and other experimental techniques. These methods provide a significantly poorer prediction for the topology of TMH proteins in the 3D helix database. This suggests that the 1D helix database, because of its inaccuracy, should be avoided as either a training or testing database. A Web server of THUMBUP and UMDHMM TMHP is established for academic users at http://www.smbs.buffalo.edu/phys_bio/service.htm. The 3D helix database is also available from the same Web site.
Bioinformatics, 2004
The dearth of structural data on α-helical membrane proteins (MPs) has thus far hampered the development of reliable knowledge-based potentials that can be used for automatic prediction of transmembrane (TM) protein structure. While algorithms for identifying TM segments are available, modeling of the TM domains of α-helical MPs involves assembling the segments into a bundle. This requires the correct assignment of the buried and lipid-exposed faces of the TM domains. Results: A recent increase in the number of crystal structures of α-helical MPs has enabled an analysis of the lipid-exposed surfaces and the interiors of such molecules on the basis of structure, rather than sequence alone. Together with a conservation criterion that is based on previous observations that conserved residues are mostly found in the interior of MPs, the bias of certain residue types to be preferably buried or exposed is proposed as a criterion for predicting the lipid-exposed and interior faces of TMs. Application to known structures demonstrates 80% accuracy of this prediction algorithm. Availability: The algorithm used for the predictions is implemented in the ProperTM Web server
Transmembrane helix (TMH) topology prediction is becoming a focal problem in bioinformatics because the structure of TM proteins is difficult to determine using experimental methods. Therefore, methods that can computationally predict the topology of helical membrane proteins are highly desirable. In this paper we introduce TMHindex, a method for detecting TMH segments using only the amino acid sequence information. Each amino acid in a protein sequence is represented by a Compositional Index, which is deduced from a combination of the difference in amino acid occurrences in TMH and non-TMH segments in training protein sequences and the amino acid composition information. Furthermore, a genetic algorithm was employed to find the optimal threshold value for the separation of TMH segments from non-TMH segments. The method successfully predicted 376 out of the 378 TMH segments in a dataset consisting of 70 test protein sequences. The sensitivity and specificity for classifying each amino acid in every protein sequence in the dataset was 0.901 and 0.865, respectively. To assess the generality of TMHindex, we also tested the approach on another standard 73-protein 3D helix dataset. TMHindex correctly predicted 91.8% of proteins based on TM segments. The level of the accuracy achieved using TMHindex in comparison to other recent approaches for predicting the topology of TM proteins is a strong argument in favor of our proposed method. Availability: The datasets, software together with supplementary materials are available at:
Protein Science, 1996
Previously, we introduced a neural network system predicting locations of transmembrane helices (HTMs) based on evolutionary profiles (PHDhtm, Rost B, Casadio R, Fariselli P, Sander C, 1995, Protein Sci 4:521–533). Here, we describe an improvement and an extension of that system. The improvement is achieved by a dynamic programming-like algorithm that optimizes helices compatible with the neural network output. The extension is the prediction of topology (orientation of first loop region with respect to membrane) by applying to the refined prediction the observation that positively charged residues are more abundant in extra-cytoplasmic regions. Furthermore, we introduce a method to reduce the number of false positives, i.e., proteins falsely predicted with membrane helices. The evaluation of prediction accuracy is based on a cross-validation and a double-blind test set (in total 131 proteins). The final method appears to be more accurate than other methods published: (1) For almost 89% (±3%) of the test proteins, all HTMs are predicted correctly. (2) For more than 86% (±3%) of the proteins, topology is predicted correctly. (3) We define reliability indices that correlate with prediction accuracy: for one half of the proteins, segment accuracy rises to 98%; and for two-thirds, accuracy of topology prediction is 95%. (4) The rate of proteins for which HTMs are predicted falsely is below 2% (±1%). Finally, the method is applied to 1,616 sequences of Haemophilus influenzae. We predict 19% of the genome sequences to contain one or more HTMs. This appears to be lower than what we predicted previously for the yeast VIII chromosome (about 25%).
Structure-based statistical analysis of transmembrane helices
European Biophysics Journal, 2013
Recent advances in determination of the high-resolution structure of membrane proteins now enable analysis of the main features of amino acids in transmembrane (TM) segments in comparison with amino acids in water-soluble helices. In this work, we conducted a large-scale analysis of the prevalent locations of amino acids by using a data set of 170 structures of integral membrane proteins obtained from the MPtopo database and 930 structures of water-soluble helical proteins obtained from the Protein Data Bank. Large hydrophobic amino acids (Leu, Val, Ile, and Phe) plus Gly were clearly prevalent in TM helices whereas polar amino acids (Glu, Lys, Asp, Arg, and Gln) were less frequent in this type of helix. The distribution of amino acids along TM helices was also examined. As expected, hydrophobic and slightly polar amino acids are commonly found in the hydrophobic core of the membrane whereas aromatic (Trp and Tyr), Pro, and the hydrophilic amino acids (Asn, His, and Gln) occur more frequently in the interface regions. Charged amino acids are also statistically prevalent outside the hydrophobic core of the membrane, and whereas acidic amino acids are frequently found at both cytoplasmic and extra-cytoplasmic interfaces, basic amino acids cluster at the cytoplasmic interface. These results strongly support the experimentally demonstrated biased distribution of positively charged amino acids (that is, the so-called positive-inside rule) with structural data.
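The positive-inside rule described here can be turned into a very small orientation predictor, sketched below in Python. This is an illustration rather than any published method: the function names, the half-open `(start, end)` helix coordinates, and the decision to count every Lys and Arg in every loop (real predictors often restrict the count to residues near the membrane) are assumptions.

```python
def loops_from_helices(seq_len, helices):
    """Return the loop regions flanking the TM helices, N-terminus first.

    `helices` is a sorted list of half-open (start, end) index pairs.
    Loops at even positions in the result lie on the same side of the
    membrane as the N-terminus; loops at odd positions lie on the other side.
    """
    loops, prev = [], 0
    for start, end in helices:
        loops.append((prev, start))
        prev = end
    loops.append((prev, seq_len))
    return loops


def count_positive(sequence, region, positive="KR"):
    """Number of Lys/Arg residues inside a half-open region."""
    start, end = region
    return sum(sequence[start:end].count(aa) for aa in positive)


def predict_n_terminus(sequence, helices):
    """Return 'in' (cytoplasmic), 'out', or 'undetermined' for the N-terminus."""
    loops = loops_from_helices(len(sequence), helices)
    n_side = sum(count_positive(sequence, r) for r in loops[0::2])
    other_side = sum(count_positive(sequence, r) for r in loops[1::2])
    if n_side > other_side:
        return "in"      # more positive charge on the N-terminal side
    if other_side > n_side:
        return "out"
    return "undetermined"


# Toy two-helix protein: positively charged termini, acidic connecting loop.
toy = "MKRK" + "L" * 20 + "DDE" + "L" * 20 + "RRKK"
print(predict_n_terminus(toy, [(4, 24), (27, 47)]))
```

The side of the membrane carrying more basic residues is assigned to the cytoplasm, so the toy protein above is predicted with its N-terminus inside.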