SPECTRUM – A MATLAB Toolbox for Proteoform Identification from Top-Down Proteomics Data

Top-Down Proteomics (TDP) is an emerging proteomics protocol that involves identification, characterization, and quantitation of intact proteins using high-resolution mass spectrometry. TDP has an edge over other proteomics protocols in that it allows for: (i) accurate measurement of intact protein mass, (ii) high sequence coverage, and (iii) enhanced identification of post-translational modifications (PTMs). However, the complexity of TDP spectra poses a significant impediment to protein search and PTM characterization. Furthermore, only limited software support is currently available in the form of search algorithms and pipelines. To address this need, we propose 'SPECTRUM', an open-architecture and open-source toolbox for TDP data analysis. Its salient features include: (i) MS2-based intact protein mass tuning, (ii) de novo peptide sequence tag analysis, (iii) propensity-driven PTM characterization, (iv) blind PTM search, (v) spectral comparison, (vi) identification of truncated proteins, (vii) multifactorial coefficient-weighted scoring, and (viii) intuitive graphical user interfaces for accessing the aforementioned functionalities and visualizing results. We have validated SPECTRUM using published datasets and benchmarked it against salient TDP tools. SPECTRUM provides significantly enhanced protein identification rates (91% to 177%) over its contemporaries. SPECTRUM has been implemented in MATLAB, and is freely available along with its source code and documentation at https://github.com/BIRL/SPECTRUM/.

Mass spectrometry-based proteomics is a well-established technique for protein identification, characterization, and quantitation [1-3]. The conventional Bottom-Up Proteomics (BUP) [4] protocol involves mass spectrometry (MS) analysis of peptides obtained from enzymatic digestion of whole proteins [4,5]. Several software tools such as SEQUEST [6], Mascot [7], and ExPASy tools [8] (FindPept [9] and EasyProt [10]) have been reported for BUP data analysis.

However, BUP spectra and their analysis have limited power in: (i) identification of post-translational modifications (PTMs) [2], (ii) sequence coverage [11,12], and (iii) characterization of very small proteins [13]. Recent advancements in proteomics protocols and instrumentation have enabled precise mass measurements of large proteins by employing soft ionization techniques [14] coupled with high-resolution mass analyzers [15]. This has led to the emergence of the Top-Down Proteomics [16] (TDP) protocol, which is becoming increasingly popular for analyzing intact proteins [17,18]. TDP offers enhanced sequence coverage [19] as compared to BUP [4], along with improved identification of proteoforms (proteins and their variants) [20,21]. However, the complexity of high-resolution TDP spectral data poses a significant challenge for analysis tools. Current tools for TDP include ProSight PTM [12], ProSight PTM 2.0 [22], MS-Align+ [23], pTop [24], TopPIC [25], and MSPathFinder [26], amongst others.
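
To make concrete why an accurate intact protein mass helps in distinguishing proteoforms, the following minimal Python sketch computes the theoretical monoisotopic mass of an unmodified sequence and checks whether the difference from an observed mass matches a known PTM mass shift. It is an illustration only and does not reproduce SPECTRUM's MATLAB implementation or its scoring. The function names `intact_mass` and `candidate_ptms`, the three-entry PTM table, and the 0.02 Da tolerance are assumptions chosen for the example. The residue and modification masses are standard monoisotopic values rounded to five decimal places.

```python
# Monoisotopic residue masses (Da) of the 20 standard amino acids.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one H2O is added for the free N- and C-termini

# Small illustrative table of PTM mass shifts (Da); a real search
# would draw on a much larger modification database.
PTM_SHIFT = {
    "acetylation": 42.01057,
    "phosphorylation": 79.96633,
    "methylation": 14.01565,
}


def intact_mass(sequence):
    """Theoretical monoisotopic mass of an unmodified protein sequence."""
    return sum(RESIDUE_MASS[aa] for aa in sequence) + WATER


def candidate_ptms(sequence, observed_mass, tol_da=0.02):
    """Names of single PTMs whose mass shift explains the mass difference.

    tol_da is a hypothetical absolute tolerance in Da; it is not a
    SPECTRUM parameter.
    """
    delta = observed_mass - intact_mass(sequence)
    return [name for name, shift in PTM_SHIFT.items()
            if abs(delta - shift) <= tol_da]


if __name__ == "__main__":
    seq = "PEPTIDE"  # toy sequence, far shorter than a real intact protein
    observed = intact_mass(seq) + PTM_SHIFT["phosphorylation"]
    print(candidate_ptms(seq, observed))
```

This sketch considers only a single modification and ignores truncations, isotopic envelopes, and charge-state deconvolution, all of which contribute to the spectral complexity described above.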