Improving the accuracy of protein secondary structure prediction using structural alignment

A simple and fast approach to prediction of protein secondary structure from multiply aligned sequences with accuracy above 70%

Protein Science, 1995

To improve secondary structure predictions in protein sequences, the information residing in multiple sequence alignments of substituted but structurally related proteins is exploited. A database comprised of 70 protein families and a total of 2,500 sequences, some of which were aligned by tertiary structural superpositions, was used to calculate residue exchange weight matrices within α-helical, β-strand, and coil substructures, respectively. Secondary structure predictions were made based on the observed residue substitutions in local regions of the multiple alignments and the largest possible associated exchange weights in each of the three matrix types. Comparison of the observed and predicted secondary structure on a per-residue basis yielded a mean accuracy of 72.2%. Individual α-helix, β-strand, and coil states were respectively predicted at 66.4, 66.7, and 75.8% correctness, representing a well-balanced three-state prediction. The accuracy level, verified by cross-validation through jack-knife tests on all protein families, dropped, on average, to only 70.9%, indicating the rigor of the prediction procedure. On the basis of robustness, conceptual clarity, accuracy, and executable efficiency, the method has considerable advantage, especially with its sole reliance on amino acid substitutions within structurally related proteins.
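
The per-residue comparison described above can be illustrated with a minimal sketch. It assumes observed and predicted structures are given as equal-length strings over the three states H (helix), E (strand) and C (coil); the function name and the toy strings are hypothetical and are not taken from the paper.

```python
from collections import Counter


def three_state_accuracy(observed: str, predicted: str) -> dict:
    """Return overall and per-state per-residue accuracy.

    Per-state accuracy is the fraction of residues observed in a state
    that were also predicted in that state.
    """
    if not observed or len(observed) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    totals = Counter(observed)
    correct = Counter(o for o, p in zip(observed, predicted) if o == p)
    result = {"overall": sum(correct.values()) / len(observed)}
    for state in "HEC":
        if totals[state]:  # skip states that never occur in the observed string
            result[state] = correct[state] / totals[state]
    return result


# Toy data, invented for illustration only.
observed = "HHHHEEEECC"
predicted = "HHHCEEECCC"
scores = three_state_accuracy(observed, predicted)
print(scores)
```

Reporting the three states separately, as the abstract does, shows whether a method is balanced or gains its overall score mainly from one state.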

A Simple Comparison between Specific Protein Secondary Structure Prediction Tools

Tropical Agricultural Research, 2012

A comparative evaluation of five widely used protein secondary structure prediction programs available on the World Wide Web was carried out. Secondary structure data of ten proteins containing 190 secondary structure motifs were collected from the Protein Data Bank (PDB). The amino acid sequences of the proteins were then evaluated using the GOR, PSIPRED, HNN, PROF, and YASPIN secondary structure prediction tools, and the results were compared with the structural information obtained from the PDB. The study reveals considerable differences between the results obtained from each program. Within the limits of this comparative study, PSIPRED showed the highest prediction accuracy, with 77% accuracy in α helix prediction and 70% accuracy in β strand prediction. Furthermore, the level of accuracy varied with the length of the secondary structure motifs. The highest accuracies were obtained for α helices of 16-20 amino acids and β strands of 7-9 amino acids in length. The results suggest that, among the most frequently used software programs available on the World Wide Web, PSIPRED is the tool that gives the best results for secondary structure prediction.

Computational Methods for Protein Secondary Structure Prediction Using Multiple Sequence Alignments

Current Protein & Peptide Science, 2000

Efforts to use computers in predicting the secondary structure of proteins based only on primary structure information started over a quarter century ago [1-3]. Although the results were encouraging initially, the accuracy of the pioneering methods generally did not attain the level required for using predictions of secondary structures reliably in modelling the three-dimensional topology of proteins. During the last decade, however, the introduction of new computational techniques as well as the use of multiple sequence information has led to a dramatic increase in the success rate of prediction methods, such that successful 3D modelling based on predicted secondary structure has become feasible [e.g., Ref 4]. This review is aimed at presenting an overview of the scale of the secondary structure prediction problem and associated pitfalls, as well as the history of the development of computational prediction methods. As recent successful strategies for secondary structure prediction all rely on multiple sequence information, some methods for accurate protein multiple sequence alignments will also be described. While the main focus is on prediction methods for globular proteins, the prediction of trans-membrane segments within membrane proteins will also be briefly summarised. Finally, an integrated iterative approach tying secondary structure prediction and multiple alignment will be introduced [5].

The influence of gapped positions in multiple sequence alignments on secondary structure prediction methods

Computational Biology and Chemistry, 2004

All currently leading protein secondary structure prediction methods use a multiple protein sequence alignment to predict the secondary structure of the top sequence. In most of these methods, prior to prediction, alignment positions showing a gap in the top sequence are deleted, consequently leading to shrinking of the alignment and loss of position-specific information. In this paper we investigate the effect of this removal of information on secondary structure prediction accuracy. To this end, we have designed SymSSP, an algorithm that post-processes the predicted secondary structure of all sequences in a multiple sequence alignment by (i) making use of the alignment's evolutionary information and (ii) reintroducing most of the information that would otherwise be lost. The post-processed information is then given to a new dynamic programming routine that produces an optimally segmented consensus secondary structure for each of the multiple alignment sequences. We have tested our method on the state-of-the-art secondary structure prediction methods PHD, PROFsec, SSPro2 and JNET using the HOMSTRAD database of reference alignments. Our consensus-deriving dynamic programming strategy is consistently better at improving the segmentation quality of the predictions than the commonly used majority voting technique. In addition, we have applied several weighting schemes from the literature to our novel consensus-deriving dynamic programming routine. Finally, we have investigated the level of noise introduced by prediction errors into the consensus and show that predictions of the edges of helices and strands are wrong half the time for all four tested prediction methods.
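
The majority voting baseline mentioned above can be sketched as follows. This is not SymSSP or its dynamic programming routine; it is only a plain column-wise vote, assuming predictions are aligned strings over H, E and C with "-" marking gaps. The function name, the tie-breaking rule and the toy alignment are assumptions made for illustration.

```python
from collections import Counter


def majority_vote(predictions: list[str], gap: str = "-") -> str:
    """Column-wise majority vote over aligned secondary structure strings."""
    if not predictions:
        raise ValueError("at least one prediction is required")
    width = len(predictions[0])
    if any(len(p) != width for p in predictions):
        raise ValueError("all aligned predictions must have the same length")
    consensus = []
    for column in zip(*predictions):
        # Gapped positions (and any unknown symbols) do not vote.
        votes = Counter(s for s in column if s in "HEC")
        if not votes:
            consensus.append(gap)
            continue
        # max() keeps the first maximal item, so ties are broken in the
        # order H, E, C. This tie rule is an arbitrary assumption.
        consensus.append(max("HEC", key=lambda s: votes[s]))
    return "".join(consensus)


# Toy aligned predictions, invented for illustration only.
aligned = ["HHHE-CC", "HHEEECC", "HH-EECC"]
consensus = majority_vote(aligned)
print(consensus)
```

Because each column is decided independently, a vote like this can produce fragmented segments, which is the weakness a segmentation-aware consensus is meant to address.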

PROTEUS2: a web server for comprehensive protein structure prediction and structure-based annotation

Nucleic acids research, 2008

PROTEUS2 is a web server designed to support comprehensive protein structure prediction and structure-based annotation. PROTEUS2 accepts either single sequences (for directed studies) or multiple sequences (for whole proteome annotation) and predicts the secondary and, if possible, tertiary structure of the query protein(s). Unlike most other tools or servers, PROTEUS2 bundles signal peptide identification, transmembrane helix prediction, transmembrane β-strand prediction, secondary structure prediction (for soluble proteins) and homology modeling (i.e. 3D structure generation) into a single prediction pipeline. Using a combination of progressive multi-sequence alignment, structure-based mapping, hidden Markov models, multicomponent neural nets and up-to-date databases of known secondary structure assignments, PROTEUS2 is able to achieve among the highest reported levels of predictive accuracy for signal peptides (Q2 = 94%), membrane spanning helices (Q2 = 87%) and secondary structure (Q3 score of 81.3%). PROTEUS2's homology modeling services also provide high quality 3D models that compare favorably with those generated by SWISS-MODEL and 3D JigSaw (within 0.2 Å RMSD). The average PROTEUS2 prediction takes about 3 min per query sequence. The PROTEUS2 server, along with source code for many of its modules, is accessible at http://wishart.biology.ualberta.ca/proteus2.

Alignments grow, secondary structure prediction improves

Proteins, 2002

Using information from sequence alignments significantly improves protein secondary structure prediction. Typically, more divergent profiles yield better predictions. Recently, various groups have shown that accuracy can be improved significantly by using PSI-BLAST profiles to develop new prediction methods. Here, we focused on the influence of various alignment strategies on two 8-year-old PHD methods. The following results stood out. (i) PHD using pairwise alignments predicts about 72% of all residues correctly in one of the three states: helix, strand, and other. Using larger databases and PSI-BLAST raised accuracy to 75%. (ii) More than 60% of the improvement originated from the growth of current sequence databases; about 20% resulted from detailed changes in the alignment procedure (substitution matrix, thresholds, and gap penalties). Another 20% of the improvement resulted from carefully using iterated PSI-BLAST searches. (iii) It is of interest that we failed to improve prediction accuracy further when attempting to refine the alignment by dynamic programming (MaxHom and ClustalW). (iv) Improvement through family growth appears to saturate at some point. However, most families have not reached this saturation. Hence, we anticipate that prediction accuracy will continue to rise with database growth.
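
As a rough worked example of the figures above, the rise from about 72% to 75% can be split into percentage points per source. The shares below treat "more than 60%" and "about 20%" as exactly 0.60 and 0.20, which is a simplifying assumption; the resulting numbers are only approximate.

```python
# Approximate accuracies quoted in the abstract (percent of residues).
baseline, improved = 72.0, 75.0
gain = improved - baseline  # total improvement in percentage points

# Shares rounded to exact values for illustration (an assumption).
shares = {
    "database growth": 0.60,
    "alignment procedure": 0.20,
    "iterated PSI-BLAST": 0.20,
}

# Percentage points of improvement attributed to each source.
points = {source: round(gain * share, 2) for source, share in shares.items()}
for source, value in points.items():
    print(f"{source}: about {value} percentage points")
```

Under these assumptions, database growth alone accounts for roughly 1.8 of the 3 percentage points gained.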

Tools for Protein Structure Prediction at the bri-shur.com Web Portal

2012

Internet bioinformatics services remain a popular tool for researchers. Here the authors present a recently developed website, http://bri-shur.com, where several tools and pipelines for protein structure prediction are implemented. The prediction of a structure for a particular protein often requires a sensitive and iterative approach, and the website provides an environment for this kind of work. The software used in the services includes both free programs available on the Internet and newly developed algorithms. The service for homology screening of the PDB for a structure template is implemented using an approach that is an alternative to the well-known BLAST algorithm and has some advantages over it. The homology modeling service uses the well-known Nest program. The protein energy estimation service allows the best template to be selected from a set of homologs and adds fold recognition functionality to the environment. The design of the site simplifies several of the most useful bioinformatics routines, thus making them available to a large community of researchers. Services are provided free of charge without registration, and the user's privacy is taken care of.

PreSSAPro: A software for the prediction of secondary structure by amino acid properties

Computational Biology and Chemistry, 2007

PreSSAPro is software, available to the scientific community as a free web service, designed to provide predictions of secondary structures starting from the amino acid sequence of a given protein. Predictions are based on our recently published work on the amino acid propensities for secondary structures in large but non-homogeneous protein data sets, as well as in smaller but homogeneous data sets corresponding to protein structural classes, i.e. all-alpha, all-beta, or alpha-beta proteins. Predictions are improved by the use of propensities evaluated for the right protein class. PreSSAPro predicts the secondary structure according to the right protein class, if known, or gives a multiple prediction with reference to the different structural classes. The comparison of these predictions represents a novel tool to evaluate which sequence regions can assume different secondary structures depending on the structural class assignment, with the aim of identifying proteins able to fold into different conformations. The service is available at the URL http://bioinformatica.isa.cnr.it/PRESSAPRO/.
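
To make the notion of amino acid propensity concrete, here is a minimal sketch using the classic Chou-Fasman-style ratio: the frequency of a residue within a state divided by its frequency in the whole data set. PreSSAPro's exact formula and data sets are not given in the abstract, so this definition, the function name and the toy data are all assumptions.

```python
from collections import Counter


def propensities(sequence: str, states: str) -> dict:
    """Return {(residue, state): propensity} for every observed pair.

    propensity = (n[residue, state] / n[state]) / (n[residue] / n_total)
    A value above 1 means the residue is enriched in that state.
    """
    if not sequence or len(sequence) != len(states):
        raise ValueError("inputs must be non-empty and of equal length")
    total = len(sequence)
    residue_counts = Counter(sequence)
    state_counts = Counter(states)
    pair_counts = Counter(zip(sequence, states))
    return {
        (res, st): (count / state_counts[st]) / (residue_counts[res] / total)
        for (res, st), count in pair_counts.items()
    }


# Toy labelled data, invented for illustration only.
sequence = "AAAAGGGG"
states = "HHHCCCCH"  # H = helix, C = coil
table = propensities(sequence, states)
print(table[("A", "H")], table[("G", "H")])
```

Computing such a table separately for each structural class (all-alpha, all-beta, alpha-beta) would give class-specific propensities of the kind the abstract describes.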

PSSD: Protein Secondary Structure Database

proteins

Protein Secondary Structure Database (PSSD) is a database that incorporates the sequences of secondary structure elements of all proteins whose three-dimensional structures have been determined by experimental methods such as NMR spectroscopy or X-ray crystallography and whose structural data exist in the Brookhaven Protein Data Bank. Dictionary of Secondary Structure of Proteins (DSSP) criteria have been used to define both ends of each structural element. At present PSSD includes 290,709 alpha helices, 418,362 beta strands, 571,176 turns and 118,109 3(10) helices from 21,347 proteins. The following information is given for each entry: (i) PSSD unique ID, (ii) description, (iii) organism source, (iv) author(s), (v) PDB code, (vi) cross references to the PDB, DSSP and Swiss-Prot databanks, (vii) sequence of the secondary structure element, (viii) numbers of the starting and ending amino acids of each element in its corresponding protein chain, (ix) length of the element, (x) the number of the element in its protein chain. A user-friendly interface has been developed for searching the database using different combinations of the fields mentioned above. The facilities provided by this database make structure-sequence analysis studies faster, more reliable and more convenient. The database is currently hosted on the IBB Bioinformatics Center (IBC) server. The interface can be accessed via: http://www.ibc.ut.ac.ir/pssd/.
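
Fields (vii) to (x) above can be illustrated with a minimal sketch that cuts a per-residue DSSP-style string into elements. It assumes the standard DSSP one-letter codes H (alpha helix), E (strand), G (3(10) helix) and T (turn), and uses "-" as a stand-in for all other residues. The function name, the dictionary keys and the toy chain are hypothetical and do not reflect PSSD's actual schema.

```python
from itertools import groupby


def extract_elements(sequence: str, dssp: str, kinds: str = "HEGT") -> list[dict]:
    """Split a chain into secondary structure elements with 1-based positions."""
    if len(sequence) != len(dssp):
        raise ValueError("sequence and DSSP string must have the same length")
    elements = []
    position = 0  # 0-based index of the first residue of the current run
    for code, run in groupby(dssp):
        length = len(list(run))
        if code in kinds:
            elements.append({
                "number": len(elements) + 1,   # ordinal of the element in the chain
                "type": code,
                "start": position + 1,         # 1-based, inclusive
                "end": position + length,      # 1-based, inclusive
                "length": length,
                "sequence": sequence[position:position + length],
            })
        position += length
    return elements


# Toy chain and assignment, invented for illustration only.
chain = "MKTAYIAKQR"
assignment = "-HHHH-EEE-"
elements = extract_elements(chain, assignment)
for element in elements:
    print(element)
```

Grouping consecutive identical codes is the simplest way to define both ends of each element; a real pipeline would read the assignment from DSSP output rather than from a hand-written string.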