Length-dependent prediction of protein intrinsic disorder
Kang Peng et al. BMC Bioinformatics. 2006.
Abstract
Background: Due to the functional importance of intrinsically disordered proteins or protein regions, prediction of intrinsic protein disorder from amino acid sequence has become an area of active research, as witnessed in the 6th experiment on Critical Assessment of Techniques for Protein Structure Prediction (CASP6). Since the initial work by Romero et al. (Identifying disordered regions in proteins from amino acid sequences, IEEE Int. Conf. Neural Netw., 1997), our group has developed several predictors optimized for long disordered regions (>30 residues) with prediction accuracy exceeding 85%. However, these predictors are less successful on short disordered regions (≤30 residues). A probable cause is that the amino acid compositions and sequence properties of disordered regions depend on region length.
Results: We proposed two new predictor models, VSL2-M1 and VSL2-M2, to address this length-dependency problem in prediction of intrinsic protein disorder. These two predictors are similar to the original VSL1 predictor used in the CASP6 experiment. In both models, two specialized predictors were first built and optimized for short (≤30 residues) and long (>30 residues) disordered regions, respectively. A meta predictor was then trained to integrate the specialized predictors into the final predictor model. As the 10-fold cross-validation results showed, the VSL2 predictors achieved well-balanced prediction accuracies of 81% on both short and long disordered regions. Comparisons over the VSL2 training dataset via 10-fold cross-validation and a blind-test set of unrelated recent PDB chains indicated that VSL2 predictors were significantly more accurate than several existing predictors of intrinsic protein disorder.
Conclusion: The VSL2 predictors are applicable to disordered regions of any length and can accurately identify the short disordered regions that are often misclassified by our previous disorder predictors. The success of the VSL2 predictors further confirmed the previously observed differences in amino acid compositions and sequence properties between short and long disordered regions, and justified our approaches for modelling short and long disordered regions separately. The VSL2 predictors are freely accessible for non-commercial use at http://www.ist.temple.edu/disprot/predictorVSL2.php.
Figures
Figure 1
VSL2 predictor architectures. The final prediction for VSL2-M1 is calculated as O_L × O_M + O_S × (1 − O_M), while for VSL2-M2 it is the output of meta predictor M2. The inputs for M2 are the 2 × W_in predictions by VSL2-L and VSL2-S for the neighbouring residues in a window of length W_in. All component predictors are built using classification algorithms that approximate the posterior probability p(c = 1|x), where x is the feature (input) vector and c is the class label.
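The two combination schemes in this caption can be illustrated with a minimal Python sketch. It reads the three symbols in the VSL2-M1 formula as the per-residue outputs of the long-region predictor (VSL2-L), the short-region predictor (VSL2-S), and the meta predictor; that reading, the function names `combine_m1` and `m2_inputs`, and the padding of window positions that fall outside the chain are assumptions made for illustration, not details confirmed by the abstract.

```python
def combine_m1(o_long, o_short, o_meta):
    """Blend per-residue scores as O_L * O_M + O_S * (1 - O_M)."""
    if not (len(o_long) == len(o_short) == len(o_meta)):
        raise ValueError("all three score lists must have the same length")
    return [
        ol * om + os_ * (1.0 - om)
        for ol, os_, om in zip(o_long, o_short, o_meta)
    ]


def m2_inputs(o_long, o_short, center, w_in, pad=0.0):
    """Collect the 2 * w_in inputs for a window-based meta predictor.

    The window covers w_in consecutive positions around `center`.
    Positions outside the chain are filled with `pad`; this edge
    handling is an assumption, not taken from the paper.
    """
    start = center - w_in // 2
    positions = range(start, start + w_in)

    def window(scores):
        return [scores[i] if 0 <= i < len(scores) else pad for i in positions]

    return window(o_long) + window(o_short)


if __name__ == "__main__":
    o_long = [0.9, 0.2, 0.7]
    o_short = [0.4, 0.6, 0.1]
    o_meta = [1.0, 0.0, 0.5]  # weight given to the long-region predictor
    print(combine_m1(o_long, o_short, o_meta))
    print(m2_inputs(o_long, o_short, center=0, w_in=3))
```

When the meta predictor outputs 1 the long-region score is used unchanged, when it outputs 0 the short-region score is used, and intermediate values interpolate between the two.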
Figure 2
Comparison of amino acid compositions between short and long disordered regions. The y-axis represents the difference in amino acid compositions (fractions) from a reference dataset of ordered proteins, Globular-3D. The error bars correspond to one standard deviation estimated using 5,000 bootstrap samples. His-tags and initial methionines were not counted.
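As a rough illustration of how such error bars can be estimated, the sketch below bootstraps the difference between one amino acid's fraction in a set of disordered residues and a fixed reference fraction. The resampling unit (individual residues), the function name `bootstrap_sd`, and the toy data are assumptions; the abstract does not say whether the authors resampled residues, regions, or chains.

```python
import random
import statistics


def bootstrap_sd(residues, aa, reference_fraction, n_boot=5000, seed=0):
    """Standard deviation of (fraction of `aa` - reference) over bootstrap samples.

    `residues` is a list of one-letter amino acid codes. Each bootstrap
    sample draws len(residues) residues with replacement.
    """
    rng = random.Random(seed)
    n = len(residues)
    diffs = []
    for _ in range(n_boot):
        sample = rng.choices(residues, k=n)
        diffs.append(sample.count(aa) / n - reference_fraction)
    return statistics.stdev(diffs)


if __name__ == "__main__":
    # Toy data: 30% serine in a hypothetical set of disordered residues,
    # compared against a hypothetical reference fraction of 0.06.
    disordered = list("S" * 30 + "A" * 70)
    print(bootstrap_sd(disordered, "S", reference_fraction=0.06))
```

Subtracting a constant reference fraction does not change the spread, so the resulting value is the bootstrap standard deviation of the fraction itself; a fuller treatment would also resample the reference dataset.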
Figure 3
Length-dependent prediction accuracies. Per-residue accuracies (sensitivities) are reported on disordered regions from different length ranges.
Figure 4
Representative predictions on two PDB chains. (A) 1REP:C with four short disordered regions at residues 1–14, 50–55, 98–109, and 247–251. (B) 1B70:A with a long disordered region at residues 1–85. These disordered regions are marked as thick line segments. Residues with predictions above 0.5 are interpreted as predicted disordered.
Figure 5
Comparison of receiver operating characteristic (ROC) curves. The ROC curves were plotted using (A) per-chain and (B) per-residue accuracies, by varying the decision thresholds from 0 to 1 in increments of 0.001. The corresponding AUC values were approximated using the trapezoid rule and reported in Table 7.
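The per-residue procedure described in this caption (sweeping the decision threshold from 0 to 1 in steps of 0.001 and integrating with the trapezoid rule) can be sketched as follows. The function names, the strict "score above threshold" convention, and the explicit addition of the (0, 0) and (1, 1) end points are assumptions for illustration; the per-chain variant is not covered.

```python
def roc_points(scores, labels, step=0.001):
    """Return sorted (false positive rate, true positive rate) pairs.

    `labels` uses 1 for disordered and 0 for ordered residues. A residue
    is predicted disordered when its score is strictly above the threshold.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    n_steps = round(1 / step)
    points = {(0.0, 0.0), (1.0, 1.0)}  # end points added explicitly
    for i in range(n_steps + 1):
        t = i / n_steps
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s > t)
        fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s > t)
        points.add((fp / n_neg, tp / n_pos))
    return sorted(points)


def auc_trapezoid(points):
    """Area under the curve by the trapezoid rule over sorted points."""
    area = 0.0
    for (x1, y1), (x2, y2) in zip(points, points[1:]):
        area += (x2 - x1) * (y1 + y2) / 2.0
    return area


if __name__ == "__main__":
    scores = [0.9, 0.1, 0.8, 0.2]
    labels = [1, 1, 0, 0]
    print(auc_trapezoid(roc_points(scores, labels)))
```

This loop costs one pass over the residues per threshold, which is adequate for a sketch; sorting the scores once would be the usual choice for large datasets.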
Figure 6
VSL2 prediction on PDB chain 1YYH:B. The VSL2 prediction (disorder probability) is plotted as a solid blue line. Residues with predictions above 0.5 are interpreted as predicted disordered. The long region of missing electron density (residues 1–54) is marked as a thick red segment. The fourteen short green segments correspond to the α-helices in the seven ANK repeats (two helices for each repeat).
Similar articles
- Optimizing long intrinsic disorder predictors with protein evolutionary information.
Peng K, Vucetic S, Radivojac P, Brown CJ, Dunker AK, Obradovic Z. J Bioinform Comput Biol. 2005 Feb;3(1):35-60. doi: 10.1142/s0219720005000886. PMID: 15751111
- Exploiting heterogeneous sequence properties improves prediction of protein disorder.
Obradovic Z, Peng K, Vucetic S, Radivojac P, Dunker AK. Proteins. 2005;61 Suppl 7:176-182. doi: 10.1002/prot.20735. PMID: 16187360
- FoldUnfold: web server for the prediction of disordered regions in protein chain.
Galzitskaya OV, Garbuzynskiy SO, Lobanov MY. Bioinformatics. 2006 Dec 1;22(23):2948-9. doi: 10.1093/bioinformatics/btl504. Epub 2006 Oct 4. PMID: 17021161
- Natively disordered proteins: functions and predictions.
Romero P, Obradovic Z, Dunker AK. Appl Bioinformatics. 2004;3(2-3):105-13. doi: 10.2165/00822942-200403020-00005. PMID: 15693736 Review.
- Five hierarchical levels of sequence-structure correlation in proteins.
Bystroff C, Shao Y, Yuan X. Appl Bioinformatics. 2004;3(2-3):97-104. doi: 10.2165/00822942-200403020-00004. PMID: 15693735 Review.
Cited by
- The Disorderly Nature of Caliciviruses.
Young VL, McSweeney AM, Edwards MJ, Ward VK. Viruses. 2024 Aug 19;16(8):1324. doi: 10.3390/v16081324. PMID: 39205298 Free PMC article. Review.
- "Off-pore" nucleoporins relocalize heterochromatic breaks through phase separation.
Merigliano C, Ryu T, See CD, Caridi CP, Li X, Butova NL, Reynolds TW, Deng C, Chenoweth DM, Capelson M, Chiolo I. bioRxiv [Preprint]. 2024 Jul 18:2023.12.07.570729. doi: 10.1101/2023.12.07.570729. PMID: 39071440 Free PMC article. Preprint.
- Structural Properties of Rat Intestinal Fatty Acid-Binding Protein with its Dynamics: Insights into Intrinsic Disorder.
Balli OI, Caglayan SI, Uverksy VN, Coskuner-Weber O. Protein Pept Lett. 2024;31(6):458-468. doi: 10.2174/0109298665313811240530055004. PMID: 38910419
- AlphaFold2 modeling and molecular dynamics simulations of an intrinsically disordered protein.
Guo HB, Huntington B, Perminov A, Smith K, Hastings N, Dennis P, Kelley-Loughnane N, Berry R. PLoS One. 2024 May 13;19(5):e0301866. doi: 10.1371/journal.pone.0301866. eCollection 2024. PMID: 38739602 Free PMC article.
- Assessment of Disordered Linker Predictions in the CAID2 Experiment.
Wang K, Hu G, Wu Z, Uversky VN, Kurgan L. Biomolecules. 2024 Feb 28;14(3):287. doi: 10.3390/biom14030287. PMID: 38540707 Free PMC article.