PreSSAPro: A software for the prediction of secondary structure by amino acid properties

Comparison of amino acid occurrence and composition for predicting protein folds

2006

Background: Prediction of protein three-dimensional structures from amino acid sequences is a long-standing goal in computational and molecular biology. Successful discrimination of protein folds would help improve the accuracy of protein 3D structure prediction. Results: In this work, we propose a method based on linear discriminant analysis (LDA) for recognizing proteins belonging to 30 different folds, using the occurrence of amino acid residues in a set of 1612 proteins. The method discriminates globular proteins among 30 major folding types with a sensitivity of 37%, which is comparable to or better than other methods in the literature. A web server for predicting the folding type of a protein from its amino acid sequence has been developed and is available at http://granular.com/PROLDA/. Conclusions: Linear discriminant analysis based on amino acid occurrence can successfully recognize protein folds. The method has several advantages: (i) it predicts the folding type of a protein directly, without pairwise comparisons; (ii) it can discriminate folds among a large number of proteins; and (iii) it returns results very quickly. Being simple, it can easily be incorporated into other structure prediction algorithms.
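
To make the occurrence/composition distinction and the LDA step concrete, here is a minimal Python sketch. It assumes that "occurrence" means the raw count of each of the 20 standard residues and "composition" means those counts divided by the number of residues, and it uses a hand-written pooled-covariance LDA (`SimpleLDA`, a hypothetical name) with a small ridge term. It is an illustration of the technique under those assumptions, not the authors' implementation or the web server's interface.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues, one-letter codes


def occurrence(seq: str) -> np.ndarray:
    """Raw count of each standard amino acid in seq; other letters are ignored."""
    return np.array([seq.count(aa) for aa in AMINO_ACIDS], dtype=float)


def composition(seq: str) -> np.ndarray:
    """Occurrence divided by the number of standard residues, so the 20 values sum to 1."""
    counts = occurrence(seq)
    total = counts.sum()
    return counts / total if total > 0 else counts


class SimpleLDA:
    """Minimal linear discriminant analysis with one covariance matrix pooled over all classes."""

    def __init__(self, ridge: float = 1e-3):
        # Small diagonal term: residues absent from the training set give a singular covariance.
        self.ridge = ridge

    def fit(self, X: np.ndarray, y: np.ndarray) -> "SimpleLDA":
        self.classes_ = np.unique(y)
        n, d = X.shape
        self.means_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        self.priors_ = np.array([np.mean(y == c) for c in self.classes_])
        pooled = np.zeros((d, d))
        for c, mu in zip(self.classes_, self.means_):
            diff = X[y == c] - mu
            pooled += diff.T @ diff  # accumulate within-class scatter
        pooled /= max(n - len(self.classes_), 1)
        self.cov_inv_ = np.linalg.inv(pooled + self.ridge * np.eye(d))
        return self

    def decision_function(self, X: np.ndarray) -> np.ndarray:
        # Linear score per class: x^T S^-1 mu_k - 0.5 mu_k^T S^-1 mu_k + log(prior_k)
        w = self.means_ @ self.cov_inv_
        b = -0.5 * np.sum(w * self.means_, axis=1) + np.log(self.priors_)
        return X @ w.T + b

    def predict(self, X: np.ndarray) -> np.ndarray:
        return self.classes_[np.argmax(self.decision_function(X), axis=1)]
```

Because occurrence ignores residue order, a shuffled sequence with the same counts receives the same prediction; this is why such methods need no pairwise alignment. Swapping `occurrence` for `composition` in the feature step is all it takes to compare the two representations.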

A Simple Comparison between Specific Protein Secondary Structure Prediction Tools

Tropical Agricultural Research, 2012

A comparative evaluation of five widely used protein secondary structure prediction programs available on the World Wide Web was carried out. Secondary structure data for ten proteins containing 190 secondary structure motifs were collected from the Protein Data Bank (PDB). The amino acid sequences of these proteins were then analyzed with the GOR, PSIPRED, HNN, PROF, and YASPIN secondary structure prediction tools, and the results were compared with the structural information obtained from the PDB. The study reveals considerable differences between the programs' results. Within the limits of this comparative study, PSIPRED showed the highest prediction accuracy, with 77% accuracy for α-helix prediction and 70% for β-strand prediction. Accuracy also varied with the length of the secondary structure motifs: the highest accuracies were obtained for α-helices 16-20 amino acids long and β-strands 7-9 amino acids long. The results suggest that, among the most frequently used programs available on the World Wide Web, PSIPRED gives the best secondary structure predictions.
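
The length dependence reported above can be measured by first cutting the observed structure into motifs and then scoring each motif by its length. The Python sketch below assumes 3-state strings (H, E, C) and scores residue-level accuracy inside each observed motif; the function names are hypothetical, and the study's own per-motif scoring rule is not stated here and may differ.

```python
from collections import defaultdict
from itertools import groupby


def segments(ss: str, state: str):
    """Yield half-open (start, end) index pairs for maximal runs of `state` in an ss string."""
    pos = 0
    for label, run in groupby(ss):
        length = sum(1 for _ in run)
        if label == state:
            yield pos, pos + length
        pos += length


def accuracy_by_length(observed: str, predicted: str, state: str, bins):
    """Residue-level accuracy inside observed segments of `state`, grouped by segment-length bin.

    bins is a list of inclusive (low, high) length ranges; segments outside every bin are skipped.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    hits, totals = defaultdict(int), defaultdict(int)
    for start, end in segments(observed, state):
        length = end - start
        for low, high in bins:
            if low <= length <= high:
                totals[low, high] += length
                hits[low, high] += sum(p == state for p in predicted[start:end])
                break
    return {key: hits[key] / totals[key] for key in totals}
```

Passing bins such as `[(16, 20)]` with state `"H"`, or `[(7, 9)]` with state `"E"`, gives the kind of length-stratified breakdown described in the abstract.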

Distributions of amino acids suggest that certain residue types more effectively determine protein secondary structure

Journal of Molecular Modeling, 2013

Exponential growth in the number of available protein sequences is unmatched by the slower growth in the number of experimentally determined structures. As a result, efficient and fast protein secondary structure prediction methods are essential for a broad understanding of protein structures. Computational methods that efficiently determine secondary structure can in turn facilitate tertiary structure prediction, since most tertiary methods rely initially on secondary structure predictions. We recently developed a fast learning optimized prediction methodology (FLOPRED) for predicting protein secondary structure (Saraswathi et al. in JMM 18:4275, 2012). Data are generated using knowledge-based potentials combined with structure information from the CATH database. A neural network-based extreme learning machine (ELM) and advanced particle swarm optimization (PSO) are applied to these data to obtain faster convergence and more accurate secondary structure predictions. A five-fold cross-validated testing accuracy of 83.8% and a segment overlap (SOV) score of 78.3% are obtained in this study. Secondary structure predictions and their accuracy are usually reported for three secondary structure elements (α-helix, β-strand and coil), but the results have rarely been analyzed with respect to their constituent amino acids. In this paper, we use the results obtained with FLOPRED to describe in detail how different amino acid types behave in secondary structure prediction. We investigate the influence of the composition, physico-chemical properties and position-specific occurrence preferences of amino acids within secondary structure elements. In addition, we identify correlations between these properties and prediction accuracy. These detailed results suggest several important ways in which secondary structure prediction could be improved, which might in turn lead to improved protein design and engineering.
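
A per-amino-acid analysis of prediction results starts from a simple tally. The Python sketch below assumes three equal-length strings (the sequence, the observed states and the predicted states) and reports, for each residue type, the fraction of its positions predicted correctly. The function name is hypothetical, and the sketch says nothing about how FLOPRED itself produces predictions.

```python
from collections import Counter


def per_residue_accuracy(sequence: str, observed: str, predicted: str) -> dict:
    """Fraction of positions of each amino acid type whose predicted state matches the observed one."""
    if not (len(sequence) == len(observed) == len(predicted)):
        raise ValueError("sequence, observed and predicted must have equal length")
    total, correct = Counter(), Counter()
    for aa, obs, pred in zip(sequence, observed, predicted):
        total[aa] += 1
        correct[aa] += obs == pred  # a bool adds 0 or 1
    return {aa: correct[aa] / total[aa] for aa in sorted(total)}
```

Aggregating this over a whole test set, and optionally splitting it further by observed state, gives accuracy figures that can then be correlated with residue properties such as composition or hydrophobicity.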

Amino acid propensities for secondary structures are influenced by the protein structural class

Biochemical and Biophysical Research Communications, 2006

Amino acid propensities for secondary structures have been used since the 1970s, when Chou and Fasman evaluated them on datasets of a few tens of proteins and developed a method to predict protein secondary structure that is still in use, even though prediction methods have since evolved toward very different approaches with higher reliability. Propensity for a secondary structure is an intrinsic property of an amino acid and is used in developing new algorithms and prediction methods. Our work therefore aimed to determine which protein dataset is best for evaluating amino acid propensities: larger but heterogeneous sets, or smaller but homogeneous ones, i.e., all-α, all-β, and α-β proteins. As a first analysis, we evaluated amino acid propensities for helix, β-strand, and coil in more than 2000 proteins from the PDBselect dataset. With these propensities, secondary structure predictions performed with a method very similar to that of Chou and Fasman gave better results than the original method, which was based on propensities derived from the few tens of X-ray protein structures available in the 1970s. In a refined analysis, we subdivided the PDBselect dataset into three structural classes: all-α, all-β, and α-β proteins. For each class, the amino acid propensities for helix, β-strand, and coil were calculated and used to predict secondary structure elements for proteins of the same class, using resubstitution and jackknife tests. This second round of predictions further improved on the first. Amino acid propensities for secondary structures therefore become more reliable as the protein dataset used to evaluate them becomes more homogeneous. Our results also indicate that all algorithms using secondary structure propensities can still be improved to obtain better predictions.
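
The propensity calculation can be sketched in Python. It assumes the usual Chou-Fasman-style definition, P(a, s) = [n(a, s) / n(s)] / [n(a) / N], where values above 1 mean residue a is enriched in state s, and a 3-state H/E/C alphabet. The prediction step is a deliberately simplified stand-in (window-averaged propensities followed by an argmax), not the Chou-Fasman nucleation and extension rules and not the authors' "very similar" method; all names are hypothetical.

```python
from collections import Counter

STATES = "HEC"  # helix, strand, coil (3-state alphabet assumed)


def propensities(pairs):
    """Chou-Fasman-style propensities from (sequence, ss) training pairs.

    P(a, s) = [n(a, s) / n(s)] / [n(a) / N]
    """
    n_as, n_a, n_s = Counter(), Counter(), Counter()
    total = 0
    for seq, ss in pairs:
        if len(seq) != len(ss):
            raise ValueError("sequence and structure lengths differ")
        for a, s in zip(seq, ss):
            n_as[a, s] += 1
            n_a[a] += 1
            n_s[s] += 1
            total += 1
    return {
        (a, s): (n_as[a, s] / n_s[s]) / (n_a[a] / total)
        for a in n_a for s in STATES if n_s[s] > 0
    }


def predict(seq, prop, half_window=3):
    """Assign each residue the state with the highest window-averaged propensity.

    Simplified stand-in for Chou-Fasman prediction; unseen (residue, state) pairs score 1.0 (neutral).
    """
    out = []
    for i in range(len(seq)):
        window = seq[max(0, i - half_window): i + half_window + 1]
        scores = {s: sum(prop.get((a, s), 1.0) for a in window) / len(window) for s in STATES}
        out.append(max(STATES, key=lambda s: scores[s]))
    return "".join(out)
```

To mimic the class-specific analysis, `propensities` would be called separately on all-α, all-β and α-β training subsets, and each protein predicted with the table for its own class.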

Prediction of secondary structures of proteins using a two-stage method

Computers & Chemical …, 2008

Protein structure determination and prediction have been a focal research subject in the life sciences because protein structure is central to understanding the biological and chemical activities of organisms. Experimental methods for determining protein structures demand sophisticated equipment and are time-consuming. A host of computational methods have been developed to predict the location of secondary structure elements in proteins, complementing experimental results or providing insight into them. However, the prediction accuracies of these methods rarely exceed 70%. In this paper, a novel two-stage method is presented that predicts the location of secondary structure elements in a protein using primary structure data only. In the first stage, the folding type of the protein is determined using a novel classification approach for multi-class problems. The second stage uses data available in the Protein Data Bank to determine the likely locations of secondary structure elements with a probabilistic search algorithm. The average accuracy of the predictions is shown to be 74.1% on a large structure dataset.

Protein Secondary Structure Prediction: A Review of Progress and Directions

Current Bioinformatics

Background: Over the last few decades, the search for a theory of protein folding has grown into a full-fledged research field at the intersection of biology, chemistry and informatics. Despite enormous effort, open questions and challenges remain, such as understanding the rules by which the amino acid sequence determines protein secondary structure. Objective: In this review, we depict the progress of prediction methods over the years and identify sources of improvement. Methods: The protein secondary structure prediction problem is described, followed by a discussion of theoretical limitations, a description of the commonly used data sets and features, and a review of three generations of methods, focusing on the most recent advances. Additionally, methods with available online servers are assessed on an independent data set. Results: State-of-the-art methods currently reach almost 88% accuracy for 3-class prediction and 76.5% for 8-class prediction. Conclusion: This ...
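
The gap between 3-class and 8-class figures partly reflects how the 8 DSSP states are merged into 3. The Python sketch below uses one common reduction (H, G, I to helix; E, B to strand; everything else to coil); this mapping is an assumption, since studies differ on where G, I and B go, and the helper names are hypothetical.

```python
# One common 8-to-3 reduction of DSSP states; studies differ on where G, I and B go.
EIGHT_TO_THREE = {
    "H": "H", "G": "H", "I": "H",  # alpha-, 3-10- and pi-helix -> helix
    "E": "E", "B": "E",            # extended strand and isolated beta-bridge -> strand
    "T": "C", "S": "C", "-": "C", " ": "C", "C": "C",  # turn, bend and loop -> coil
}


def reduce_to_three(dssp: str) -> str:
    """Map an 8-state DSSP string to H/E/C; an unknown symbol raises KeyError."""
    return "".join(EIGHT_TO_THREE[s] for s in dssp)


def accuracy(observed: str, predicted: str) -> float:
    """Percentage of positions where two equal-length state strings agree."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    return 100.0 * sum(o == p for o, p in zip(observed, predicted)) / len(observed)
```

For example, predicting a 3-10 helix (G) as an α-helix (H), or a β-bridge (B) as a strand (E), counts as an error in the 8-state comparison but not after reduction. This is one reason 3-class accuracies run higher than 8-class ones.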

Experimental Evaluation of Protein Secondary Structure Predictors

Lecture Notes in Computer Science, 2009

Understanding protein biological function, which is largely determined by a protein's 3D shape, is a key issue in modern biology. Protein 3D shape is in turn determined by the amino acid sequence. Since direct inspection of such 3D structures is expensive and time-consuming, a number of software techniques have been developed in the last few years that predict a structural model, at either the secondary or the tertiary level, for a given target protein starting from its amino acid sequence. This paper offers a comparison of several available automatic secondary structure prediction tools. The comparison is experimental: two relevant sets of proteins, a non-redundant set of 100 proteins and a 180-protein set taken from the CASP 6 contest, were used as test cases. Comparisons were based on standard quality measures such as Q3 and SOV.
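
The two quality measures named above can be sketched in Python for 3-state strings. Q3 is the percentage of residues with the correct state. The `sov` function follows the 1999 revision of the segment overlap measure (SOV99) as we understand its definition: for each overlapping pair of an observed segment s1 and a predicted segment s2 in the same state, it adds (minov + δ) / maxov × len(s1), with δ = min(maxov − minov, minov, ⌊len(s1)/2⌋, ⌊len(s2)/2⌋). Treat it as a sketch and check it against a reference implementation before reporting numbers; the helper names are hypothetical.

```python
from itertools import groupby


def q3(observed: str, predicted: str) -> float:
    """Q3: percentage of residues whose predicted H/E/C state equals the observed state."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    return 100.0 * sum(o == p for o, p in zip(observed, predicted)) / len(observed)


def _segments(ss: str, state: str):
    """Half-open (start, end) ranges of maximal runs of `state`."""
    out, pos = [], 0
    for label, run in groupby(ss):
        length = sum(1 for _ in run)
        if label == state:
            out.append((pos, pos + length))
        pos += length
    return out


def sov(observed: str, predicted: str, states: str = "HEC") -> float:
    """Segment overlap score in percent, following the SOV99 definition."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("need two non-empty strings of equal length")
    score, norm = 0.0, 0
    for state in states:
        pred_segs = _segments(predicted, state)
        for s1_start, s1_end in _segments(observed, state):
            len1 = s1_end - s1_start
            matched = False
            for s2_start, s2_end in pred_segs:
                minov = min(s1_end, s2_end) - max(s1_start, s2_start)
                if minov <= 0:
                    continue  # segments do not overlap
                matched = True
                maxov = max(s1_end, s2_end) - min(s1_start, s2_start)
                len2 = s2_end - s2_start
                delta = min(maxov - minov, minov, len1 // 2, len2 // 2)
                score += (minov + delta) / maxov * len1
                norm += len1  # observed segment counted once per overlapping prediction
            if not matched:
                norm += len1  # unmatched observed segments still enter the normalization
    return 100.0 * score / norm if norm else 0.0
```

The two measures answer different questions. For the pair `"CCHHHHHHCC"` (observed) and `"CCCHHHHCCC"` (predicted), `q3` gives 80.0 while `sov` gives 100.0, because SOV tolerates small shifts at segment boundaries.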

Coupled prediction of protein secondary and tertiary structure

The strong coupling between secondary and tertiary structure formation in protein folding is neglected in most structure prediction methods. In this work we investigate the extent to which nonlocal interactions in predicted tertiary structures can be used to improve secondary structure prediction. The architecture of a neural network for secondary structure prediction that uses multiple sequence alignments was extended to accept low-resolution nonlocal tertiary structure information as an additional input. Using this modified network together with tertiary structure information from native structures, Q3 prediction accuracy increases by 7-10% on average, and by up to 35% in individual cases, on independent test data. Using tertiary structure information from models generated with the ROSETTA de novo tertiary structure prediction method, Q3 prediction accuracy improves by 4-5% on average for small and medium-sized single-domain proteins. Analysis of proteins whose secondary structure prediction improves particularly strongly with tertiary structure information provides insight into the feedback from tertiary to secondary structure.
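
The abstract does not specify how the low-resolution nonlocal information is encoded. One plausible reading, assumed here purely for illustration, is a per-residue count of nonlocal Cα contacts appended to each residue's profile row before the usual sliding input window is built. The 8 Å cutoff, the minimum sequence separation of 6 and both function names are hypothetical choices, not the paper's.

```python
import numpy as np


def nonlocal_contact_counts(ca_coords: np.ndarray, cutoff: float = 8.0,
                            min_separation: int = 6) -> np.ndarray:
    """For each residue i, count residues j with |i - j| >= min_separation
    whose C-alpha atoms lie within `cutoff` angstroms (assumed contact definition)."""
    diff = ca_coords[:, None, :] - ca_coords[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    idx = np.arange(len(ca_coords))
    sep = np.abs(idx[:, None] - idx[None, :])
    contacts = (dist < cutoff) & (sep >= min_separation)
    return contacts.sum(axis=1)


def window_features(profile: np.ndarray, extra: np.ndarray, half_window: int = 7) -> np.ndarray:
    """Append per-residue extra features to profile rows, then flatten a zero-padded
    sliding window of 2 * half_window + 1 residues into one input vector per residue."""
    n = profile.shape[0]
    per_residue = np.hstack([profile, extra.reshape(n, -1)])
    pad = np.zeros((half_window, per_residue.shape[1]))
    padded = np.vstack([pad, per_residue, pad])
    return np.stack([padded[i:i + 2 * half_window + 1].ravel() for i in range(n)])
```

With a 20-column alignment profile and one contact column, each residue gets a 15 × 21 = 315-dimensional input vector. Native or ROSETTA-model coordinates would simply be two different sources for `ca_coords`.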