A five-level classification system for proteoform identifications

SPECTRUM – A MATLAB Toolbox for Proteoform Identification from Top-Down Proteomics Data

Scientific Reports

Top-Down Proteomics (TDP) is an emerging proteomics protocol that involves identification, characterization, and quantitation of intact proteins using high-resolution mass spectrometry. TDP has an edge over other proteomics protocols in that it allows for: (i) accurate measurement of intact protein mass, (ii) high sequence coverage, and (iii) enhanced identification of post-translational modifications (PTMs). However, the complexity of TDP spectra poses a significant impediment to protein search and PTM characterization. Furthermore, limited software support is currently available in the form of search algorithms and pipelines. To address this need, we propose 'SPECTRUM', an open-architecture and open-source toolbox for TDP data analysis. Its salient features include: (i) MS2-based intact protein mass tuning, (ii) de novo peptide sequence tag analysis, (iii) propensity-driven PTM characterization, (iv) blind PTM search, (v) spectral comparison, (vi) identification of truncated proteins, (vii) multifactorial coefficient-weighted scoring, and (viii) intuitive graphical user interfaces for accessing these functionalities and visualizing results. We have validated SPECTRUM using published datasets and benchmarked it against salient TDP tools. SPECTRUM provides significantly enhanced protein identification rates (91% to 177%) over its contemporaries. SPECTRUM has been implemented in MATLAB, and is freely available along with its source code and documentation at https://github.com/BIRL/SPECTRUM/.

Mass spectrometry-based proteomics is a well-established technique for protein identification, characterization, and quantitation [1-3]. The conventional Bottom-Up Proteomics (BUP) [4] protocol involves mass spectrometry (MS) analysis of peptides obtained from enzymatic digestion of whole proteins [4,5]. Several software tools such as SEQUEST [6], Mascot [7] and ExPASy tools [8] (FindPept [9] and EasyProt [10]) have been reported for BUP data analysis. However, BUP spectra and their analysis have limited power in: (i) identification of post-translational modifications (PTMs) [2], (ii) sequence coverage [11,12], and (iii) characterization of very small proteins [13].

Recent advancements in proteomics protocols and instrumentation have enabled precise mass measurements of large proteins by employing soft ionization techniques [14] coupled with high-resolution mass analyzers [15]. This has led to the emergence of the Top-Down Proteomics (TDP) [16] protocol, which is becoming increasingly popular for analyzing intact proteins [17,18]. TDP offers enhanced sequence coverage [19] compared to BUP [4], along with improved identification of proteoforms (proteins and their variants) [20,21]. However, the complexity of high-resolution TDP spectral data poses a significant challenge for analysis tools. Current tools for TDP include ProSight PTM [12], ProSight PTM 2.0 [22], MS-Align+ [23], pTop [24], TopPIC [25], and MSPathFinder [26], amongst others.
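The role that intact protein mass and PTM mass shifts play in a top-down search can be illustrated with a small Python sketch. It is not SPECTRUM's implementation (SPECTRUM is written in MATLAB, and its mass-tuning procedure is not described here). The function names, the two-entry PTM table, and the 10 ppm default tolerance are assumptions chosen for illustration; the residue and modification masses are standard monoisotopic values.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565  # one H2O is added for the free N- and C-termini

# Illustrative PTM table (hypothetical, deliberately tiny).
PTM_SHIFT = {"acetyl": 42.010565, "phospho": 79.966331}


def intact_mass(sequence, ptms=()):
    """Theoretical monoisotopic mass of an intact proteoform."""
    backbone = sum(RESIDUE_MASS[aa] for aa in sequence) + WATER
    return backbone + sum(PTM_SHIFT[name] for name in ptms)


def within_tolerance(observed, theoretical, ppm=10.0):
    """True if the observed mass lies within +/- ppm of the theoretical mass."""
    return abs(observed - theoretical) / theoretical * 1e6 <= ppm


if __name__ == "__main__":
    unmodified = intact_mass("GAS")
    phosphorylated = intact_mass("GAS", ptms=["phospho"])
    print(round(unmodified, 4), round(phosphorylated, 4))
    print(within_tolerance(233.1012, unmodified))
```

A candidate whose theoretical mass differs from the observed mass by roughly the mass of a known modification is a natural starting point for PTM characterization. A blind PTM search would instead treat the unexplained difference as an unknown shift.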

Methods, algorithms and tools in computational proteomics: a practical point of view

Proteomics, 2007

Computational MS-based proteomics is an emerging field arising from the demand for high-throughput analysis in numerous large-scale experimental proteomics projects. The review provides a broad overview of a number of computational tools available for data analysis of MS-based proteomics data and gives appropriate literature references to detailed descriptions of algorithms. The review provides, to some extent, discussion of algorithms and methods for peptide and protein identification using MS data, quantitative proteomics, and data storage. The hope is that it will stimulate discussion and further development in computational proteomics. Computational proteomics deserves more scientific attention. There are far fewer computational tools and methods available for proteomics compared to the number of microarray tools, despite the fact that data analysis in proteomics is much more complex than microarray analysis.

R. Matthiesen, Proteomics 2007, 7, 2815-2832.

Tools listed in the review's overview table. The column layout did not survive extraction, so most entries are given in extracted order; a) to e) are the table's footnote markers.

Section headings in the table: Raw data formats; Result formats; MS/MS (Direct Database); 2D-gels; Pipelines; Spectral library search; Proteotypic Peptide search; MS/MS (Tag database); MS/MS (De novo).

Entries: MASCOT [21, 35, 36] a); MAE [37] a); VEMS [38, 39] b); CPAS [40, 41] b); ProFound [42] b); AMASS [43] a); MSinspect [44] b); PRIDE [45-47] b); VEMS [38, 39, 48] b); SALSA [49] a); PEPPeR [50] b); dbVEMS [38, 39] b); Aldente [51] b); Qscore [52] b); OpenMS [53] b); Proteios [54, 55] b); MS-Fit b); PeptideProphet [30] b); Mzmine [56, 57] b); GPMDB [58] b); PeptIdent b); Scope [59] c); SpecArray [60] b); EPIR [61] c); Peplist [60] b); MzXML [62] b); VEMS [38, 39, 48] b); Spider [63] b); PEAKSQ (BSI) c); mzData [64] b); MASCOT [21, 35, 36] a); SILVER [65] b); MSquant (http://msquant.sourceforge.net/) b); Phenyx [31, 34] (GENEBIO) a); MSight [66] b); AnalysisXML [46] b); SEQUEST (Thermo Finnigan) c); RelEx [22] b); ProtXML [67] b); X!tandem [68] b); ASAPratio [69] b); pepXML [67] b); ProbId [70] b); PopITAM [71] b); Flicker [72] b); TPP (tools.proteomecenter.org/TPP.php) b); OMSSA [73] b); Melanie (www.gehealthcare.com) [74-76] a); ProteinScape™ (www.proteinscape.com) c); P-mod [77] a); PDQuest (www.bio-rad.com) c); Scaffold (www.proteomesoftware.com) c); PLGS (Waters) c); DeCyder (www.gehealthcare.com) c); TOPP [78] b); Paragon (ABI) c,d); Delta2D (www.decodon.com) c); VEMS (http://personal.cicbiogune.es/rmatthiesen/) c); Progenesis (www.nonlinear.com) c); PLGS (www.waters.com) c); X! Hunter [58] b); Proteomweaver (www.definiens.com / www.bio-rad.com) c); X! P3 [79] b).

MS/MS (Tag database): GutenTag [80] a); InsPecT [81] b); Popitam [71] b).

MS/MS (De novo): Lutefisk [82, 83] b); PepHMM [32] b); Sherenga [84] c); PepNovo [33] b); Peaks (BSI) c,e).

Guidelines for the next 10 years of proteomics

2006

In the last ten years, the field of proteomics has expanded at a rapid rate. A range of exciting new technologies has been developed and enthusiastically applied to an enormous variety of biological questions. However, the degree of stringency required in proteomic data generation and analysis appears to have been underestimated. As a result, there are likely to be numerous published findings that are of questionable quality, requiring further confirmation and/or validation. This manuscript outlines a number of key issues in proteomic research, including those associated with experimental design, differential display and biomarker discovery, protein identification and analytical incompleteness. In an effort to set a standard that reflects current thinking on the necessary and desirable characteristics of publishable manuscripts in the field, a minimal set of guidelines for proteomics research is then described. These guidelines will serve as a set of criteria which editors of PROTEOMICS will use for assessment of future submissions to the Journal.

ABRF Proteome Informatics Research Group (iPRG) 2016 Study: Inferring Proteoforms from Bottom-up Proteomics Data

Journal of biomolecular techniques : JBT, 2018

This report presents the results from the 2016 Association of Biomolecular Resource Facilities Proteome Informatics Research Group (iPRG) study on proteoform inference and false discovery rate (FDR) estimation from bottom-up proteomics data. For this study, 3 replicate Q Exactive Orbitrap liquid chromatography-tandem mass spectrometry datasets were generated from each of 4 samples spiked with different equimolar mixtures of small recombinant proteins selected to mimic pairs of homologous proteins. Participants were given raw data and a sequence file and asked to identify the proteins and provide estimates of the FDR at the proteoform level. As part of this study, we tested a new submission system with a format validator running on a virtual private server (VPS) and allowed methods to be provided as executable R Markdown or IPython Notebooks. The task was perceived as difficult, and only eight unique submissions were received, although those who participated did well with no one meth...

Differential Proteomics via Probabilistic Peptide Identification Scores

Analytical Chemistry, 2005

Relative quantitation is key to enable differential proteomics and hence answer biological questions by comparing samples. Classical approaches involve stable isotope labeling with/without spiked standards. Although stable isotopes may lead to precise results, their application is not straightforward. In Proteomics, 2004, 4, 2333-2351, we proposed an approach where we summed peptide identification scores to derive a semiquantitative abundance indicator. In this study, we combine such an indicator with a statistical test to detect differentially expressed proteins. We demonstrate the effectiveness of this method by using mixtures of purified proteins and human plasma spiked with proteins at low-nanomolar concentrations. The impact of the number of repeated experiments is discussed, and we show that the statistical test we use performs well with two to three repetitions, whereas a classical t-test would require at least four repetitions to achieve the same performance. Typically, 2.5-5-fold changes are detected with 90-95% confidence in human plasma. The method is finally characterized by deriving estimates of its false positive and negative rates. This new characterization is valid for a wider class of methods such as spectrum sampling (Liu, H.; Sadygov, R. G.; Yates, J. R. III. Anal. Chem. 2004, 76, 4193-4201).
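The summed-score abundance indicator described in this abstract can be sketched in a few lines of Python. This is only an illustration of the idea of summing peptide identification scores per protein. The function names, the input layout of (protein, score) pairs, and the pseudocount used to avoid division by zero are assumptions, and the statistical test used in the paper is not reproduced because the abstract does not specify it.

```python
import math
from collections import defaultdict


def summed_scores(identifications):
    """Sum peptide identification scores per protein.

    `identifications` is an iterable of (protein_id, peptide_score) pairs,
    one pair per identified peptide spectrum (assumed input layout).
    """
    totals = defaultdict(float)
    for protein, score in identifications:
        totals[protein] += score
    return dict(totals)


def log2_ratio(totals_a, totals_b, pseudo=1.0):
    """Per-protein log2 ratio of summed scores between samples A and B.

    The pseudocount is an illustrative choice so that proteins seen in only
    one sample still yield a finite ratio.
    """
    proteins = set(totals_a) | set(totals_b)
    return {
        p: math.log2((totals_a.get(p, 0.0) + pseudo) / (totals_b.get(p, 0.0) + pseudo))
        for p in proteins
    }


if __name__ == "__main__":
    sample_a = summed_scores([("P1", 10.0), ("P1", 5.0), ("P2", 3.0)])
    sample_b = summed_scores([("P1", 3.0), ("P2", 3.0)])
    print(log2_ratio(sample_a, sample_b))
```

With repeated experiments, the per-replicate indicators would feed the statistical test mentioned in the abstract; the ratio above is only a descriptive summary.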