Computational identification of ubiquitylation sites from protein sequences
Chun-Wei Tung et al. BMC Bioinformatics. 2008.
Abstract
Background: Ubiquitylation plays an important role in regulating protein functions. Recently, experimental methods have been developed for the effective identification of ubiquitylation sites. To efficiently explore more undiscovered ubiquitylation sites, this study aims to develop an accurate sequence-based prediction method to identify promising ubiquitylation sites.
Results: We established a ubiquitylation dataset consisting of 157 ubiquitylation sites and 3676 putative non-ubiquitylation sites extracted from 105 proteins in the UbiProt database. This study first evaluates promising sequence-based features and classifiers for the prediction of ubiquitylation sites by assessing three kinds of features (amino acid identity, evolutionary information, and physicochemical properties) and three classifiers (support vector machine, k-nearest neighbor, and Naïve Bayes). Results show that the set of 531 physicochemical properties and the support vector machine (SVM) are the best kind of features and the best classifier, respectively; their combination achieves a prediction accuracy of 72.19% using leave-one-out cross-validation. Consequently, an informative physicochemical property mining algorithm (IPMA) is proposed to select an informative subset of the 531 physicochemical properties. A prediction system, UbiPred, was implemented using an SVM with a feature set of 31 informative physicochemical properties selected by IPMA, which improves the accuracy from 72.19% to 84.44%. To further analyze the informative physicochemical properties, the decision tree method C5.0 was used to acquire if-then rule-based knowledge for predicting ubiquitylation sites. UbiPred can screen promising ubiquitylation sites from putative non-ubiquitylation sites using prediction scores. By applying UbiPred, 23 promising ubiquitylation sites were identified from an independent dataset of 3424 putative non-ubiquitylation sites; these sites were also validated using the obtained prediction rules.
Conclusion: We have proposed an algorithm, IPMA, for mining informative physicochemical properties from protein sequences to build an SVM-based prediction system, UbiPred. UbiPred predicts ubiquitylation sites, each accompanied by a prediction score, to help biologists identify promising sites for experimental verification. UbiPred has been implemented as a web server and is available at http://iclab.life.nctu.edu.tw/ubipred.
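The abstract describes sequence windows (w = 21 in the figure captions) and physicochemical-property features but does not spell out the encoding. The following is a minimal sketch, not the authors' implementation, of how lysine-centered windows might be extracted and reduced to one feature per property. It rests on several assumptions: the feature is taken as the mean property value over the window, the Kyte-Doolittle hydropathy index stands in for one of the 531 properties (which the abstract does not name), and the padding symbol `X` and both function names are hypothetical.

```python
# Kyte-Doolittle hydropathy index, used only as a stand-in for a single
# physicochemical property; the abstract does not list the 531 properties.
KD_HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

PAD = "X"  # hypothetical placeholder for positions past the sequence ends


def lysine_windows(sequence, w=21):
    """Yield (1-based position, peptide) for each lysine, with K at the center."""
    if w % 2 == 0:
        raise ValueError("window size must be odd so the lysine is centered")
    half = w // 2
    padded = PAD * half + sequence + PAD * half
    for i, residue in enumerate(sequence):
        if residue == "K":
            # padded[i + half] is sequence[i], so this slice is centered on K
            yield i + 1, padded[i:i + w]


def mean_property(peptide, scale=KD_HYDROPATHY):
    """Average a property over the peptide, ignoring padding and unknown residues."""
    values = [scale[aa] for aa in peptide if aa in scale]
    return sum(values) / len(values) if values else 0.0
```

Calling `list(lysine_windows(protein_sequence))` produces one candidate peptide per lysine, and applying `mean_property` with a different scale for each property yields a fixed-length feature vector that a classifier such as an SVM could consume.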
Figures
Figure 1
Performance comparisons among various classifiers with the three kinds of features. (a) physicochemical property, (b) amino acid identity, and (c) evolutionary information.
Figure 2
The sequence logo of the 151 positive samples with w = 21. (a) information content and (b) frequency plot.
Figure 3
Performance comparisons between the SVM with informative physicochemical properties (SVM+IPCP) and other compared classifiers.
Figure 4
The best 10-fold cross-validation (10-CV) accuracies of prediction using SVM with the window size 21 for various numbers of features (properties) selected by IPMA from 30 independent runs.
Figure 5
The decision tree derived using C5.0 with the informative physicochemical properties as features, for classification of ubiquitylation sites.
Figure 6
The system flow of the prediction server UbiPred.
Figure 7
Performance comparison, in terms of receiver operating characteristic curves, of SVM with various features: informative physicochemical properties (UbiPred), amino acid identity, evolutionary information, and all physicochemical properties.
Figure 8
The schema for illustrating the training data (302 samples) and the independent dataset (3424 putative non-ubiquitylation sites) using w = 21 as an example.
Figure 9
Histogram of UbiPred prediction scores for the 3424 putative non-ubiquitylation sites in an independent dataset. A site with a score close to 1 has a high possibility of being a ubiquitylation site.
Figure 10
The sequence logo of the 23 peptides of promising ubiquitylation sites with w = 21. (a) Information content and (b) Frequency plot.