Advancing Top-down Analysis of the Human Proteome Using a Benchtop Quadrupole-Orbitrap Mass Spectrometer

Luca Fornelli et al. J Proteome Res. 2017.

Abstract

Over the past decade, developments in high-resolution mass spectrometry have enabled the high-throughput analysis of intact proteins from complex proteomes, leading to the identification of thousands of proteoforms. Several previous reports on top-down proteomics (TDP) relied on hybrid ion trap–Fourier transform mass spectrometers combined with data-dependent acquisition strategies. To further reduce TDP to practice, we use a quadrupole-Orbitrap instrument coupled with software for proteoform-dependent data acquisition to identify and characterize nearly 2000 proteoforms from human fibroblasts at a 1% false discovery rate. By combining a 3 m/z isolation window with short transients to improve specificity and signal-to-noise for proteoforms >30 kDa, we demonstrate improved proteome coverage, capturing 439 proteoforms in the 30–60 kDa range. Comparing three different data acquisition strategies led to the identification of many proteoforms not observed in replicate data-dependent experiments. Notably, the data set is reported with updated metrics and tools, including a new viewer and the assignment of permanent proteoform record (PFR) identifiers for inclusion of highly characterized proteoforms (i.e., those with C-scores >40) in a repository curated by the Consortium for Top-Down Proteomics.
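
The 1% false discovery rate is controlled through q-values (Figure 1E). As a generic illustration only, the sketch below uses a simple target-decoy estimate, FDR = decoys / targets among hits at or above a score threshold; the `Hit` class, its fields, and the scores are hypothetical, and this is not necessarily the exact FDR procedure used in this study.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """A hypothetical search hit: a higher score means a better match."""
    proteoform_id: str
    score: float
    is_decoy: bool


def q_values(hits: list[Hit]) -> list[tuple[Hit, float]]:
    """Assign a target-decoy q-value to every hit, best score first.

    The FDR at a threshold is estimated as (#decoys / #targets) among hits
    scoring at or above it; a hit's q-value is the lowest FDR of any
    threshold that still accepts it. Score ties are not treated specially.
    """
    ranked = sorted(hits, key=lambda h: h.score, reverse=True)
    fdrs = []
    targets = decoys = 0
    for hit in ranked:
        if hit.is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))

    # Running minimum from the lowest-scoring hit upward makes q monotone.
    qs = [0.0] * len(fdrs)
    best = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        best = min(best, fdrs[i])
        qs[i] = best
    return list(zip(ranked, qs))


def accepted_targets(hits: list[Hit], fdr: float = 0.01) -> list[Hit]:
    """Target (non-decoy) hits whose q-value is at or below `fdr`."""
    return [h for h, q in q_values(hits) if not h.is_decoy and q <= fdr]


hits = [Hit("PF1", 10, False), Hit("PF2", 9, False), Hit("D1", 8, True),
        Hit("PF3", 7, False), Hit("D2", 6, True)]
print([h.proteoform_id for h in accepted_targets(hits)])  # ['PF1', 'PF2']
```

Taking the running minimum keeps q-values non-increasing as the threshold loosens, so a single cutoff (here 0.01) selects a contiguous block of top-scoring targets.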

Keywords: AUTOPILOT; Orbitrap; data-dependent acquisition; false-discovery rate; gas-phase fractionation; mass spectrometry; medium/high; proteoform; quadrupole; top-down proteomics.

Conflict of interest statement

Notes

The authors declare the following competing financial interest(s): The authors declare a conflict of interest; several are involved in software commercialization. Thermo Fisher Scientific is an Industrial Collaborator of the NRTDP.

All RAW data files, the UniProt formatted text file used for generating the proteoform database, and the three .tdReport files associated with this study are available at http://massive.ucsd.edu/ with the identifier MSV000079913.

Figures

Figure 1

Data acquisition strategies for top-down analysis of human proteins below 60 kDa. (A) Traditional data-dependent high/high experiments, as well as medium/high experiments, start with a broadband MS1 scan to determine the precursors to be fragmented in a data-dependent top-2 fashion. Similarly, the standard version of AUTOPILOT (AP), employed as the first technical replicate in the high/high study, uses an MS1–MS2 scheme by default. (B) The second and third technical replicates of the AUTOPILOT experiment are designed as a SIM march, that is, a series of SIM scans that together cover a 200 m/z window between 700 and 900 m/z. Precursors are selected by online deconvolution of the SIM scans. (C) Selected precursors, whether from Xcalibur data-dependent or AUTOPILOT-driven acquisition, are quadrupole-isolated with a narrow isolation window of 3 m/z units. (D) Selected proteoforms are subjected to HCD activation with dedicated parameters for high- or low-MW proteins. (E) An offline database search associates each proteoform with a C-score and determines its identification confidence through an FDR calculation based on _q_-values. Well-characterized proteoforms are assigned a unique PFR identifier.
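
Panel B describes a SIM march over 700–900 m/z, and the Figure 3 caption states that the 2nd replicate used 50 m/z windows while the 3rd used eight SIM events. The sketch below computes window centers for such a march, assuming equal-width, non-overlapping windows; `sim_march` is a hypothetical helper, not part of AUTOPILOT or Xcalibur, and real instrument methods may overlap adjacent windows.

```python
def sim_march(start_mz: float, end_mz: float, n_windows: int) -> list[tuple[float, float]]:
    """Tile [start_mz, end_mz] with n equal, non-overlapping SIM windows.

    Returns one (center_mz, width_mz) pair per SIM scan.
    """
    if n_windows < 1 or end_mz <= start_mz:
        raise ValueError("need n_windows >= 1 and end_mz > start_mz")
    width = (end_mz - start_mz) / n_windows
    return [(start_mz + width * (i + 0.5), width) for i in range(n_windows)]


# 2nd replicate: 50 m/z windows over 700-900 m/z -> 4 SIM scans
print(sim_march(700, 900, 4))  # [(725.0, 50.0), (775.0, 50.0), (825.0, 50.0), (875.0, 50.0)]
# 3rd replicate: eight SIM events over the same range -> 25 m/z each (if equal width)
print(sim_march(700, 900, 8)[0])  # (712.5, 25.0)
```

Under the equal-width assumption, doubling the number of SIM events halves each window, which trades fewer co-isolated precursors per scan for more scans per cycle.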

Figure 2

Summary of unique proteoforms and accession numbers identified at 1% FDR from 45 total RAW files. (A) Venn diagram of the 393 unique accession numbers identified at a 1% protein-level FDR from 54 LC–MS runs. Note that ~80% of the proteins identified by medium/high experiments were not found by either of the two high/high data acquisition modes. (B) Venn diagram of proteoforms identified at 1% FDR. Approximately 50% of identified proteoforms were shared between the top-2 and AUTOPILOT high/high experiments, whereas the overlap between the <30 kDa and 30–60 kDa portions of the fibroblast proteome interrogated here was low.
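
Overlap figures like those in this Venn diagram reduce to set operations on identifier lists. A minimal sketch, assuming each experiment yields a set of accession numbers; the IDs below are hypothetical placeholders, and the Jaccard index is only one possible definition of "shared" because the caption does not state the denominator.

```python
def unique_fraction(query: set[str], *others: set[str]) -> float:
    """Fraction of IDs in `query` that appear in none of the `others`."""
    if not query:
        return 0.0
    seen_elsewhere = set().union(*others)
    return len(query - seen_elsewhere) / len(query)


def jaccard(a: set[str], b: set[str]) -> float:
    """Shared IDs as a fraction of all IDs observed in either set."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical placeholder accession sets, not data from this study
medium_high = {"A1", "A2", "A3", "A4", "A5"}
top2 = {"A1", "B1"}
autopilot = {"B1", "B2"}
print(unique_fraction(medium_high, top2, autopilot))  # 0.8
print(jaccard(top2, autopilot))
```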

Figure 3

Efficiency of identification of new proteoforms from a single GELFrEE fraction across three technical replicates under Xcalibur data-dependent or AUTOPILOT data acquisition. For GELFrEE fractions 1, 2, and 3, the number of new proteoforms identified in each technical replicate is normalized to the total number of new proteoforms identified in that GELFrEE fraction. (A) For the data-dependent top-2 method, the first technical replicate provides the highest number of new proteoforms in all three fractions, and the method's ability to find new confident proteoforms decreases with each additional technical replicate. (B) In the AUTOPILOT experiments, the SIM march with 50 m/z windows (2nd technical replicate) outperforms the standard AUTOPILOT acquisition based on the MS1–MS2 scheme (1st technical replicate) in two of the three fractions. Conversely, the SIM march composed of eight SIM events (3rd technical replicate) produces the lowest number of newly identified proteoforms.
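
One way to compute the normalized values plotted here is sketched below, assuming replicates are processed in acquisition order and a proteoform counts as "new" only in the first replicate where it appears. The helper name and the proteoform IDs are hypothetical.

```python
def new_fraction_per_replicate(replicates: list[set[str]]) -> list[float]:
    """For one GELFrEE fraction, share of all unique proteoforms first seen in each replicate.

    A proteoform is "new" only in the first replicate (in list order) that
    contains it. The fractions sum to 1 whenever any proteoform is found.
    """
    seen: set[str] = set()
    new_counts = []
    for rep in replicates:
        new = rep - seen
        new_counts.append(len(new))
        seen |= new
    total = len(seen)
    return [n / total if total else 0.0 for n in new_counts]


# Hypothetical proteoform IDs for three technical replicates of one fraction
reps = [{"a", "b", "c", "d"}, {"a", "e", "f"}, {"b", "f", "g"}]
print(new_fraction_per_replicate(reps))  # 4/7, 2/7, 1/7 of the 7 unique proteoforms
```

With a purely data-dependent method, later replicates tend to resample abundant precursors already seen, which is the declining pattern panel A describes.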

Figure 4

Example of a ~41 kDa protein identified from a medium/high experiment (8% GELFrEE fraction 4). (A) The broadband MS1 spectrum obtained using a short transient in the Orbitrap mass analyzer shows a high signal-to-noise ratio for charge states from 32+ to 55+. (B) The graphical fragment map shows that HCD fragmentation primarily sequenced the C-terminal region, leading to a high C-score of 255 for proteoform PFR20440, whose experimental mass matches the theoretical mass within 2.5 ppm. (C) Histogram of the mass distribution of proteoforms identified at 1% FDR in medium/high top-2 experiments; the distribution is centered around 35–40 kDa.
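
The charge-state range and the 2.5 ppm match in this caption follow from two standard formulas: m/z = (M + z·m_p)/z for a protonated ion, and ppm error = (observed − theoretical)/theoretical × 10^6. The sketch below uses a round 41,000 Da as an illustrative mass (not the exact mass of PFR20440) and assumes protonation as the only source of charge.

```python
PROTON_MASS = 1.007276  # Da, mass of a proton


def mz_of(neutral_mass: float, z: int) -> float:
    """m/z of the protonated ion [M + zH]^z+."""
    return (neutral_mass + z * PROTON_MASS) / z


def ppm_error(observed: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


# Illustrative 41,000 Da mass: where the 32+ and 55+ ions would appear
for z in (32, 55):
    print(z, round(mz_of(41_000.0, z), 2))
# 32 1282.26
# 55 746.46

# A 2.5 ppm error on 41,000 Da corresponds to about 0.1 Da
print(ppm_error(41_000.1025, 41_000.0))
```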

Figure 5

C-score distributions for the three experimental setups. Identified proteoforms are binned according to their C-scores. Panels A–C show C-score distributions for data-dependent high/high, AUTOPILOT high/high, and data-dependent medium/high results, respectively. Proteoforms with a C-score below 3 are considered statistically identified but not well characterized. Proteoforms with a C-score between 3 and 40 are defined as partially characterized, because the set of fragment ions used for their identification might also be consistent with the presence of one or more highly similar proteoforms. Finally, proteoforms with a C-score >40 are considered well characterized, and their PFRs are included in a top-down proteoform repository.
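
The three characterization levels above map directly onto score thresholds. A minimal sketch of that binning follows; the caption does not say whether scores of exactly 3 or 40 are inclusive, so treating both as "partially characterized" is an assumption, and the example scores are made up.

```python
from collections import Counter


def characterization_tier(c_score: float) -> str:
    """Map a C-score to the characterization levels described in Figure 5.

    Assumption: scores of exactly 3 and exactly 40 count as partially characterized.
    """
    if c_score > 40:
        return "well characterized"
    if c_score >= 3:
        return "partially characterized"
    return "identified, not well characterized"


def tier_counts(c_scores: list[float]) -> Counter:
    """Number of proteoforms falling into each characterization tier."""
    return Counter(characterization_tier(s) for s in c_scores)


# Hypothetical C-scores; 255 mirrors the PFR20440 example in Figure 4
print(tier_counts([255, 41, 40, 12, 3, 2.5, 0.4]))
```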

Figure 6

Results of Gene Ontology analysis using DAVID Bioinformatics Resources. (A) Top three functional protein groups ranked by _p_-value. Functional groups are based on the list of UniProt accession numbers identified at 1% FDR in medium/high experiments. Note that the UniProt accession numbers of the first two functional groups largely overlap. (B) Mass distribution of the 41 proteoforms corresponding to the eight UniProt accession numbers identified for the glycolysis pathway. (C) Summary of the identified proteoforms of the glycolysis-pathway enzyme L-lactate dehydrogenase (P00338).
