GPS 2.0, a tool to predict kinase-specific phosphorylation sites in hierarchy

Yu Xue et al. Mol Cell Proteomics. 2008 Sep.

Abstract

Identification of protein phosphorylation sites with their cognate protein kinases (PKs) is a key step in delineating the molecular dynamics and plasticity underlying a variety of cellular processes. Although nearly 10 kinase-specific prediction programs have been developed, numerous PKs have been casually classified into subgroups without a standard rule. For large-scale predictions, the false positive rate has also never been addressed. In this work, we adopted a well-established rule to classify PKs into a hierarchical structure with four levels: group, family, subfamily, and single PK. In addition, we developed a simple approach to estimate the theoretically maximal false positive rates. The online service and local packages of GPS (Group-based Prediction System) 2.0 were implemented in Java with a modified version of the Group-based Phosphorylation Scoring algorithm. As the first stand-alone software for predicting phosphorylation, GPS 2.0 can predict kinase-specific phosphorylation sites for 408 human PKs in hierarchy. In a large-scale prediction of more than 13,000 mammalian phosphorylation sites, GPS 2.0 performed well and with high accuracy. Using Aurora-B as an example, we also conducted a proteome-wide search and provided a systematic prediction of Aurora-B-specific substrates, including protein-protein interaction information. Thus, GPS 2.0 is a useful tool for predicting protein phosphorylation sites and their cognate kinases and is freely available online.

Figures

Fig. 1.

The training data could be reused several times and included in different PK clusters according to the cognate PK information of each site.
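
The reuse of training data across PK clusters can be illustrated with a small sketch. It assumes that each verified site carries a classification path of up to four levels (group, family, subfamily, single PK) and that the site joins the cluster at every level of that path. The function name `build_clusters` and the kinase labels in the usage note are hypothetical and are not taken from GPS 2.0 itself.

```python
from collections import defaultdict


def build_clusters(sites):
    """Assign each site to every PK cluster along its classification path.

    `sites` is an iterable of (peptide, path) pairs, where `path` is a tuple
    such as (group, family, subfamily, single_pk). This layout is an
    assumption made for illustration, not the GPS 2.0 data format.
    """
    clusters = defaultdict(list)
    for peptide, path in sites:
        # A site annotated down to a single PK is also a positive example
        # for the subfamily, family, and group that contain that PK.
        for depth in range(1, len(path) + 1):
            clusters[path[:depth]].append(peptide)
    return dict(clusters)
```

For example, a site annotated with the hypothetical path `("AGC", "PKA")` would appear in both the `("AGC",)` cluster and the `("AGC", "PKA")` cluster, which is the sense in which one site is reused several times.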

Fig. 2.

The basic idea of the Group-based Phosphorylation Scoring algorithm. The gray dots represent the positive sites. A shorter distance indicates a higher similarity score between two sites. Given a putative PSP(7, 7) peptide, we can calculate its score and then judge whether the given site is a potential phosphorylation site under different thresholds.
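
A minimal sketch of this idea follows. It makes several assumptions that the caption does not state: PSP(7, 7) is read as the window of 7 residues upstream and 7 downstream of the candidate residue, the score of a candidate is taken as its mean similarity to the known positive peptides, and a toy identity score stands in for a real substitution matrix such as BLOSUM62. All function names are hypothetical.

```python
def extract_psp(sequence, pos, up=7, down=7, pad="*"):
    """Return the PSP(up, down) window around the 0-based index `pos`.

    Windows that run past either terminus are padded with `pad` so that
    every peptide has the same length (up + 1 + down).
    """
    left = sequence[max(0, pos - up):pos]
    right = sequence[pos + 1:pos + 1 + down]
    return (pad * (up - len(left)) + left + sequence[pos]
            + right + pad * (down - len(right)))


def toy_similarity(a, b):
    """Toy stand-in for a substitution matrix: 1 for identical residues."""
    return 1 if a == b and a != "*" else 0


def peptide_similarity(p, q, sim=toy_similarity):
    """Sum position-wise similarity between two equal-length peptides."""
    return sum(sim(a, b) for a, b in zip(p, q))


def group_score(candidate, positives, sim=toy_similarity):
    """Mean similarity of `candidate` to the known positive peptides.

    Averaging is an assumption of this sketch.
    """
    total = sum(peptide_similarity(candidate, p, sim) for p in positives)
    return total / len(positives)


def is_predicted_site(candidate, positives, threshold, sim=toy_similarity):
    """Call a site positive when its group score exceeds the threshold."""
    return group_score(candidate, positives, sim) > threshold
```

Passing a function that looks up a real substitution matrix as `sim` would bring the sketch closer to the setting described in the captions; the toy score is used here only to keep the example self-contained.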

Fig. 3.

A simple method of matrix mutation.

Fig. 4.

The construction process of the GPS 2.0 software. The training data were taken from the Phospho.ELM 6.0 database, and all sites with kinase information were retained. These verified sites and their kinases were then separated into a hierarchical structure with four levels: group, family, subfamily, and single PK. The modified version of the Group-based Phosphorylation Scoring algorithm was used, and matrix mutation was applied to improve the robustness of the prediction system. We then set the high, medium, and low thresholds based on the calculated false positive rate (FPR) for each PK cluster. Finally, GPS 2.0 was implemented in Java as the first stand-alone software for computational prediction of phosphorylation sites.
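
The threshold-setting step can be sketched as follows. The sketch assumes that, for each PK cluster, a list of scores from background (presumed negative) peptides is available, and it picks the lowest cutoff whose false positive rate on that background does not exceed a target value. How GPS 2.0 actually estimates the FPR is not described here, and the function names and target rates are hypothetical.

```python
def threshold_for_fpr(background_scores, max_fpr):
    """Return the lowest cutoff whose background FPR is at most `max_fpr`.

    A site is called positive when its score is strictly greater than the
    cutoff, so the FPR is the fraction of background scores above it.
    """
    ranked = sorted(background_scores, reverse=True)
    allowed = int(max_fpr * len(ranked))  # background hits we can tolerate
    if allowed >= len(ranked):
        return float("-inf")  # every background score may pass
    # Scores strictly above ranked[allowed] number at most `allowed`.
    return ranked[allowed]


def set_thresholds(background_scores, target_fprs):
    """Map each stringency label to a cutoff for one PK cluster."""
    return {label: threshold_for_fpr(background_scores, fpr)
            for label, fpr in target_fprs.items()}
```

A call such as `set_thresholds(scores, {"high": 0.1, "medium": 0.2, "low": 0.3})` returns one cutoff per stringency level; the rates in this call are illustrative only.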

Fig. 5.

A screenshot of the GPS 2.0 software. The protein sequence of rat Spinophilin is used as an example, and the prediction results for PKA-specific sites with the medium threshold are shown. DMPK, myotonic dystrophy protein kinase; PKC, protein kinase C; PKG, protein kinase G; RSK, ribosomal S6 kinase; SGK, serum- and glucocorticoid-regulated protein kinase; TKL, tyrosine kinase-like.

Fig. 6.

Comparison of various scoring matrices. Self, self-consistency. The BLOSUM62 matrix was adopted to balance the prediction performance and robustness of GPS 2.0.

Fig. 7.

Prediction performance before and after matrix mutation. We randomly chose 12 PK clusters to compare the performance. In most cases the leave-one-out validation results improved significantly, whereas the self-consistency results were enhanced only moderately. Thus, the process of matrix mutation improved both the performance and the robustness of GPS 2.0. MM, matrix mutation; Self, self-consistency; PKC, protein kinase C; RSK, ribosomal S6 kinase; MAPKAPK, MAPK-activated protein kinase.
