Identifying differences in protein expression levels by spectral counting and feature selection
P C Carvalho et al. Genet Mol Res. 2008.
Abstract
Spectral counting is a strategy to quantify relative protein concentrations in pre-digested protein mixtures analyzed by liquid chromatography online with tandem mass spectrometry. In the present study, we used combinations of normalization and statistical (feature selection) methods on spectral counting data to verify whether we could pinpoint which and how many proteins were differentially expressed when comparing complex protein mixtures. These combinations were evaluated on real, but controlled, experiments (yeast lysates were spiked with protein markers at different concentrations to simulate differences), which were therefore verifiable. The following normalization methods were applied: total signal, Z-normalization, hybrid normalization, and log preprocessing. The feature selection methods were: the Golub index, the Student t-test, a strategy based on the weighting used in a forward-support vector machine (SVM-F) model, and SVM recursive feature elimination. The results showed that Z-normalization combined with SVM-F correctly identified which and how many protein markers were added to the yeast lysates for all different concentrations. The software we used is available at http://pcarvalho.com/patternlab.
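Two of the normalization methods and one of the feature selection methods named in the abstract can be sketched in a few lines. This is a minimal illustration, not the PatternLab implementation: it assumes that total signal normalization divides each spectral count by the sample total, that Z-normalization centers and scales the values of one sample, and that the Golub index is the usual signal-to-noise ratio (difference of class means over the sum of class standard deviations). All function names and the toy counts are hypothetical.

```python
import statistics


def total_signal_normalize(counts):
    """Divide each spectral count in one sample by that sample's total."""
    total = sum(counts)
    if total == 0:
        return [0.0 for _ in counts]
    return [c / total for c in counts]


def z_normalize(values):
    """Center to mean 0 and scale by the sample standard deviation.

    Assumption: applied across the proteins of a single sample.
    """
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mean) / sd for v in values]


def golub_index(group_a, group_b):
    """Signal-to-noise ratio: (mean_a - mean_b) / (sd_a + sd_b)."""
    spread = statistics.stdev(group_a) + statistics.stdev(group_b)
    if spread == 0:
        return 0.0
    return (statistics.mean(group_a) - statistics.mean(group_b)) / spread


def rank_proteins(class_a, class_b):
    """Return protein indices ordered by |Golub index|, largest first.

    class_a and class_b are lists of samples; each sample is a list of
    per-protein values in the same protein order.
    """
    n_proteins = len(class_a[0])
    scores = []
    for j in range(n_proteins):
        a = [sample[j] for sample in class_a]
        b = [sample[j] for sample in class_b]
        scores.append(abs(golub_index(a, b)))
    return sorted(range(n_proteins), key=lambda j: scores[j], reverse=True)


# Hypothetical spectral counts: 3 samples per class, 3 proteins per sample.
raw_a = [[10, 50, 40], [12, 48, 40], [11, 52, 37]]
raw_b = [[30, 40, 30], [33, 37, 30], [31, 41, 28]]

norm_a = [total_signal_normalize(s) for s in raw_a]
norm_b = [total_signal_normalize(s) for s in raw_b]
ranking = rank_proteins(norm_a, norm_b)
```

In this toy data the first protein changes most between the two classes relative to its spread, so it ranks first. Swapping `total_signal_normalize` for `z_normalize` in the two list comprehensions gives the other normalization under the same ranking step.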
Figures
Figure 1
Protein markers were spiked at different concentrations in 15 yeast total cell lysate samples. Each lysate was analyzed by MudPIT (1.2); protein identification was carried out by Pep_Prob (1.3), and the results were post-processed by DTASelect. Three different test sets were then generated. Combinations of normalization / feature selection methods were used to search for the spiked protein markers with different concentrations in each test set (1.4).
Figure 2
Sum of Pscores calculated for each combination of normalization / feature selection method when comparing the different spiked concentrations (legend), with (B) and without (A) log preprocessing. Lower bars indicate better performance. Bar heights were capped at 4. The Pscore is the log10 of the sum of the ranks, minus 1. Note that SVM-F obtains at least one perfect score both with and without log preprocessing. UD stands for “unnormalized” data.
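The Pscore described in the Figure 2 caption can be written out directly. The sketch below reads the caption as log10(sum of ranks) - 1 and assumes, purely for illustration, four spiked markers, in which case the best possible ranks 1 to 4 sum to 10 and give a Pscore of 0. The function names, the number of markers, and the example ranks are hypothetical.

```python
import math


def pscore(marker_ranks):
    """log10 of the sum of the markers' ranks, minus 1."""
    return math.log10(sum(marker_ranks)) - 1


def capped(score, cap=4.0):
    """Limit a score to the plotted maximum (bar heights limited to 4)."""
    return min(score, cap)


# Hypothetical case: four markers ranked in the top four positions.
perfect = pscore([1, 2, 3, 4])

# Hypothetical case: one marker slips to rank 40.
one_miss = pscore([1, 2, 3, 40])
```

Here `perfect` is 0 and `one_miss` is roughly 0.66, which matches the caption's convention that lower values indicate better performance.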
Similar articles
- PatternLab for proteomics: a tool for differential shotgun proteomics.
Carvalho PC, Fischer JS, Chen EI, Yates JR 3rd, Barbosa VC. BMC Bioinformatics. 2008 Jul 21;9:316. doi: 10.1186/1471-2105-9-316. PMID: 18644148 Free PMC article.
- Comparative shotgun proteomics using spectral count data and quasi-likelihood modeling.
Li M, Gray W, Zhang H, Chung CH, Billheimer D, Yarbrough WG, Liebler DC, Shyr Y, Slebos RJ. J Proteome Res. 2010 Aug 6;9(8):4295-305. doi: 10.1021/pr100527g. PMID: 20586475 Free PMC article.
- Targeted Feature Detection for Data-Dependent Shotgun Proteomics.
Weisser H, Choudhary JS. J Proteome Res. 2017 Aug 4;16(8):2964-2974. doi: 10.1021/acs.jproteome.7b00248. Epub 2017 Jul 19. PMID: 28673088 Free PMC article.
- Machine learning methods for predictive proteomics.
Barla A, Jurman G, Riccadonna S, Merler S, Chierici M, Furlanello C. Brief Bioinform. 2008 Mar;9(2):119-28. doi: 10.1093/bib/bbn008. Epub 2008 Feb 29. PMID: 18310105 Review.
- Computational advances of tumor marker selection and sample classification in cancer proteomics.
Tang J, Wang Y, Luo Y, Fu J, Zhang Y, Li Y, Xiao Z, Lou Y, Qiu Y, Zhu F. Comput Struct Biotechnol J. 2020 Jul 17;18:2012-2025. doi: 10.1016/j.csbj.2020.07.009. eCollection 2020. PMID: 32802273 Free PMC article. Review.
Cited by
- Detection and Quantitation of Endogenous Membrane-Bound RAS Proteins and KRAS Mutants in Cancer Cell Lines Using 1D-SDS-PAGE LC-MS2.
Kaczmarczyk JA, Whiteley GR, Blonder J. Methods Mol Biol. 2024;2823:269-289. doi: 10.1007/978-1-0716-3922-1_17. PMID: 39052226
- The effect of obesity on uterine receptivity is mediated by endometrial extracellular vesicles that control human endometrial stromal cell decidualization and trophoblast invasion.
Galio L, Bernet L, Rodriguez Y, Fourcault C, Dieudonné MN, Pinatel H, Henry C, Sérazin V, Fathallah K, Gagneux A, Krupova Z, Vialard F, Santos ED. J Extracell Biol. 2023 Jul 18;2(7):e103. doi: 10.1002/jex2.103. eCollection 2023 Jul. PMID: 38939074 Free PMC article.
- On Comprehensive Mass Spectrometry Data Analysis for Proteome Profiling of Human Blood Samples.
Manchanda S, Meyer M, Li Q, Liang K, Li Y, Kong N. J Healthc Inform Res. 2018 May 22;2(3):305-318. doi: 10.1007/s41666-018-0022-0. eCollection 2018 Sep. PMID: 35415410 Free PMC article.
- Simple, efficient and thorough shotgun proteomic analysis with PatternLab V.
Santos MDM, Lima DB, Fischer JSG, Clasen MA, Kurt LU, Camillo-Andrade AC, Monteiro LC, de Aquino PF, Neves-Ferreira AGC, Valente RH, Trugilho MRO, Brunoro GVF, Souza TACB, Santos RM, Batista M, Gozzo FC, Durán R, Yates JR 3rd, Barbosa VC, Carvalho PC. Nat Protoc. 2022 Jul;17(7):1553-1578. doi: 10.1038/s41596-022-00690-x. Epub 2022 Apr 11. PMID: 35411045 Review.
- Insights into the Structure and Protein Composition of Moorella thermoacetica Spores Formed at Different Temperatures.
Malleck T, Fekraoui F, Bornard I, Henry C, Haudebourg E, Planchon S, Broussolle V. Int J Mol Sci. 2022 Jan 4;23(1):550. doi: 10.3390/ijms23010550. PMID: 35008975 Free PMC article.
Grants and funding
- R01 MH067880/MH/NIMH NIH HHS/United States
- P41 RR011823-13/RR/NCRR NIH HHS/United States
- U19 AI063603/AI/NIAID NIH HHS/United States
- U19 AI063603-02/AI/NIAID NIH HHS/United States
- U19 AI063603-050002/AI/NIAID NIH HHS/United States
- R01 MH067880-06/MH/NIMH NIH HHS/United States
- 5R01 MH067880/MH/NIMH NIH HHS/United States
- P41 RR011823/RR/NCRR NIH HHS/United States
- P41 RR11823-10/RR/NCRR NIH HHS/United States