Identifying differences in protein expression levels by spectral counting and feature selection

P C Carvalho et al. Genet Mol Res. 2008.

Abstract

Spectral counting is a strategy to quantify relative protein concentrations in pre-digested protein mixtures analyzed by liquid chromatography online with tandem mass spectrometry. In the present study, we used combinations of normalization and statistical (feature selection) methods on spectral counting data to verify whether we could pinpoint which and how many proteins were differentially expressed when comparing complex protein mixtures. These combinations were evaluated on real, but controlled, experiments (yeast lysates were spiked with protein markers at different concentrations to simulate differences), which were therefore verifiable. The following normalization methods were applied: total signal, Z-normalization, hybrid normalization, and log preprocessing. The feature selection methods were: the Golub index, the Student t-test, a strategy based on the weighting used in a forward-support vector machine (SVM-F) model, and SVM recursive feature elimination. The results showed that Z-normalization combined with SVM-F correctly identified which and how many protein markers were added to the yeast lysates for all different concentrations. The software we used is available at http://pcarvalho.com/patternlab.
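
As a rough illustration of how a normalization step and a feature selection score can be combined on spectral counts, the sketch below applies three of the normalizations named above and ranks proteins with a Golub-style signal-to-noise index. It is not the PatternLab implementation: the function names, the toy counts, the per-sample direction of the Z-normalization, the `log(1 + x)` offset, and the use of the absolute value in the index are all assumptions made for this example. The SVM-based methods are not sketched.

```python
import math
from statistics import mean, stdev

def total_signal(sample):
    """Divide each spectral count by the sample's total count."""
    total = sum(sample)
    return [c / total for c in sample]

def z_normalize(sample):
    """Center one sample at 0 and scale it to unit standard deviation
    (assumption: normalization is applied per sample, not per protein)."""
    mu, sd = mean(sample), stdev(sample)
    return [(c - mu) / sd for c in sample]

def log_preprocess(sample):
    """log(1 + count); the +1 offset is an assumption to handle zero counts."""
    return [math.log1p(c) for c in sample]

def golub_index(class_a, class_b):
    """Signal-to-noise score for one protein: |mean_a - mean_b| / (sd_a + sd_b)."""
    diff = abs(mean(class_a) - mean(class_b))
    denom = stdev(class_a) + stdev(class_b)
    if denom == 0:
        # Guard for constant values in both classes.
        return math.inf if diff > 0 else 0.0
    return diff / denom

def rank_proteins(samples_a, samples_b):
    """Return protein indices sorted from most to least discriminating."""
    n_proteins = len(samples_a[0])
    scores = []
    for j in range(n_proteins):
        a = [s[j] for s in samples_a]
        b = [s[j] for s in samples_b]
        scores.append(golub_index(a, b))
    return sorted(range(n_proteins), key=lambda j: scores[j], reverse=True)

# Hypothetical counts: rows are samples, columns are proteins.
# Protein index 2 plays the role of a marker spiked at a higher level in class B.
samples_a = [[10, 20, 5, 15], [12, 18, 6, 14], [11, 21, 4, 16]]
samples_b = [[10, 19, 30, 15], [13, 20, 28, 13], [11, 22, 33, 16]]

ranking = rank_proteins(
    [z_normalize(s) for s in samples_a],
    [z_normalize(s) for s in samples_b],
)
print(ranking)  # protein indices, most discriminating first
```

Swapping `z_normalize` for `total_signal` or `log_preprocess` in the final call gives the other combinations for the same toy data.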

Figures

Figure 1

Protein markers were spiked at different concentrations into 15 yeast total cell lysate samples. Each lysate was analyzed by MudPIT (1.2); protein identification was carried out with Pep_Prob (1.3), and the results were post-processed with DTASelect. Three different test sets were then generated. Combinations of normalization and feature selection methods were used to search each test set for the protein markers spiked at different concentrations (1.4).

Figure 2

Sum of Pscores calculated for each combination of normalization and feature selection method when comparing the different spiked concentrations (legend), without (A) and with (B) log preprocessing. Lower bars indicate better performance. The bar heights were limited to 4. The Pscore is calculated by taking the log10 of the sum of the ranks and subtracting 1. Note that SVM-F obtains at least one perfect score both with and without log preprocessing. UD stands for "unnormalized" data.
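
The Pscore described in this caption can be sketched as follows, reading it as log10 of the summed ranks of the true markers, minus 1. Under that reading, four markers ranked 1 through 4 sum to 10 and score exactly 0, which would be a perfect score. The function names and the choice of four markers are assumptions for illustration only.

```python
import math

def marker_ranks(ranking, markers):
    """1-based positions of the known markers within a ranked protein list."""
    position = {protein: i + 1 for i, protein in enumerate(ranking)}
    return [position[m] for m in markers]

def pscore(ranks):
    """log10 of the sum of the marker ranks, minus 1 (lower is better)."""
    return math.log10(sum(ranks)) - 1

# Hypothetical ranking in which all four markers occupy the top four places.
ranking = ["M1", "M2", "M3", "M4", "P5", "P6"]
perfect = pscore(marker_ranks(ranking, ["M1", "M2", "M3", "M4"]))
print(perfect)  # 0.0

# One marker pushed down the list raises the score.
worse = pscore([1, 2, 3, 40])
print(worse > perfect)  # True
```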
