An integrated workflow for crosslinking mass spectrometry

doi: 10.15252/msb.20198994.

Marta L Mendes, Lutz Fischer 1 2, Zhuo A Chen 1, Marta Barbon 3 4, Francis J O'Reilly 1, Sven H Giese 1, Michael Bohlke-Schneider 1, Adam Belsom 1 2, Therese Dau 2, Colin W Combe 2, Martin Graham 2, Markus R Eisele 5, Wolfgang Baumeister 5, Christian Speck 3 4, Juri Rappsilber 1 2

PMID: 31556486
PMCID: PMC6753376

Marta L Mendes et al. Mol Syst Biol. 2019 Sep.

Abstract

We present a concise workflow to enhance the mass spectrometric detection of crosslinked peptides by introducing sequential digestion and the crosslink identification software xiSEARCH. Sequential digestion enhances peptide detection by selective shortening of long tryptic peptides. We demonstrate our simple 12-fraction protocol for crosslinked multi-protein complexes and cell lysates, quantitative analysis, and high-density crosslinking, without requiring specific crosslinker features. This overall approach reveals dynamic protein-protein interaction sites, which are accessible, have fundamental functional relevance and are therefore ideally suited for the development of small molecule inhibitors.

Keywords: crosslinking mass spectrometry; protein-protein interactions; proteomics; software; structural biology.

Conflict of interest statement

The authors declare that they have no conflict of interest.

Figures

Figure 1. Sequential digestion workflow compared to repeated analysis and parallel digestion

A
Sequential digestion workflow. Proteins or protein complexes are crosslinked and digested with trypsin. The sample is then split into four aliquots: one remains digested with trypsin only (T), while the others are sequentially digested with AspN (A), chymotrypsin (C) or GluC (G). Samples are enriched by SEC, and the three high‐MW fractions are analysed by LC‐MS, followed by xiSEARCH and xiFDR analysis.
B
Results of the sequential digestion workflow applied to a synthetic 7‐protein mix, compared to using trypsin alone in four replicates and to parallel digestion with trypsin, AspN, chymotrypsin and GluC. The four‐replicate trypsin experiment shows a large overlap between the four datasets with little gain. Parallel digestions with trypsin, AspN, chymotrypsin and GluC show high complementarity but only moderate gains over trypsin. Sequential digestion shows low overlap between the four datasets and the largest gain in unique residue pairs.
C
Gains of repeated analysis (trypsin only), parallel digestion and sequential digestion for the same data as shown in panel (B).
D
Crosslinked peptides obtained by sequential digestion of a synthetic 7‐protein mix are smaller than their corresponding tryptic peptides. Boxplot ranges represent the 25th (lower hinge) and 75th (upper hinge) percentiles, respectively. Middle line represents the median. Four replicates were analysed for trypsin; one sample each was analysed for sequential digestion and parallel digestion.
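
The "gain" compared in this figure can be read as the number of unique residue pairs that each additional dataset contributes beyond those already observed. The following is a minimal sketch of that bookkeeping, assuming each dataset is available as a set of residue pairs. The function name `cumulative_gain` and the toy residue pairs are hypothetical and are not taken from the paper's data or analysis code.

```python
def cumulative_gain(datasets):
    """Return (name, new_pairs, running_total) for each dataset, in order.

    `datasets` is an ordered list of (name, set_of_residue_pairs).
    Each residue pair should be written in one canonical order so that
    the same link is not counted twice.
    """
    seen = set()
    rows = []
    for name, pairs in datasets:
        new_pairs = pairs - seen      # pairs not observed in any earlier dataset
        seen |= pairs                 # running union of unique residue pairs
        rows.append((name, len(new_pairs), len(seen)))
    return rows


# Toy residue pairs written as ("protein:residue", "protein:residue"); illustrative only.
trypsin = {("A:10", "A:55"), ("A:10", "B:7"), ("B:7", "B:90")}
tryp_aspn = {("A:10", "A:55"), ("C:3", "C:40")}
tryp_gluc = {("C:3", "C:40"), ("A:77", "B:12"), ("B:7", "B:90")}

gain_rows = cumulative_gain([("T", trypsin), ("T+A", tryp_aspn), ("T+G", tryp_gluc)])
for name, gained, total in gain_rows:
    print(f"{name}: +{gained} unique residue pairs (total {total})")
```

Datasets with high overlap add few new pairs, whereas complementary datasets keep increasing the running total.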

Figure EV1. Sequential digestion increases the number of identified unique residue pairs in a seven‐protein mixture

Links per fraction and gain for sequential digestion and for the control experiments, which comprise trypsin alone in four replicates and individual digestions with trypsin, AspN, chymotrypsin and GluC. Trypsin yields the highest number of links per sample, followed by sequential digestion and individual digestions. However, sequential digestion yields the largest number of unique residue pairs when the data are combined.

Figure EV2. Properties of crosslinked peptides (i.e. the two linked peptides are considered together) for a seven‐protein mixture and each digestion condition

Precursor m/z. Sequentially digested peptides are smaller. Boxplot ranges represent the 25th (lower hinge) and 75th (upper hinge) percentiles, respectively. Middle line represents the median. Upper whisker and lower whisker are defined as follows: upper whisker = min(max(x), Q_3 + 1.5 * IQR), lower whisker = max(min(x), Q_1 − 1.5 * IQR) where IQR is the interquartile range (vertical size of the boxes).
Observed charge state. Sequentially digested peptides have lower charge states.
Calculated hydrophobicity.
Observed retention time (RT).
Calculated pI.
Peptide length, of both the observed peptides as part of a crosslink (left) and the number of unique crosslinkable peptides resulting from in silico digestion (right).
Number of missed cleavages. Sequentially digested samples with trypsin + chymotrypsin and trypsin + GluC show more missed cleavages than the other fractions.
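
The whisker definition given for the precursor m/z panel can be written out directly. This is a minimal sketch that follows the formula as stated in the caption. The quartile convention (`method="inclusive"`) and the function name `boxplot_stats` are assumptions, since the caption does not say how the hinges were computed; other conventions give slightly different hinges.

```python
import statistics


def boxplot_stats(values):
    """Hinges, median and whiskers following the caption's whisker definition."""
    x = sorted(values)
    # Quartiles by linear interpolation (assumed convention).
    q1, median, q3 = statistics.quantiles(x, n=4, method="inclusive")
    iqr = q3 - q1
    upper = min(max(x), q3 + 1.5 * iqr)   # upper whisker = min(max(x), Q_3 + 1.5 * IQR)
    lower = max(min(x), q1 - 1.5 * iqr)   # lower whisker = max(min(x), Q_1 - 1.5 * IQR)
    return {"q1": q1, "median": median, "q3": q3, "lower": lower, "upper": upper}


# Toy values with one extreme point; not data from the paper.
stats = boxplot_stats([1, 2, 3, 4, 5, 6, 7, 8, 100])
print(stats)
```

With an extreme value present, the formula caps the whisker at Q_3 + 1.5 * IQR; many plotting libraries instead draw the whisker to the furthest data point inside that limit.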

Data information: For statistical testing, a one‐sided Mann–Whitney U‐test with continuity correction was used (Dataset EV4). All tests were carried out with trypsin as reference. The sign above the trypsin data (> or <) shows the direction of the alternative hypothesis (****: P < 0.0001; −: P > 0.05).
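
For readers unfamiliar with the test named above, the following is a minimal sketch of a one‐sided Mann–Whitney U‐test with continuity correction, using only the Python standard library. It uses the normal approximation with tie correction, which is an assumption: the caption does not state which software or which variant (exact or approximate) was used. The function name and the toy values are hypothetical.

```python
from collections import Counter
from math import sqrt
from statistics import NormalDist


def mann_whitney_u_one_sided(x, y, alternative="greater"):
    """One-sided Mann-Whitney U test (normal approximation, continuity correction).

    alternative="greater": values in x tend to be larger than values in y.
    alternative="less":    values in x tend to be smaller than values in y.
    Returns (U statistic for x, one-sided p-value).
    """
    n1, n2 = len(x), len(y)
    pooled = sorted(list(x) + list(y))
    # Mid-ranks: tied values share the average of the rank positions they occupy.
    rank_of = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2
        i = j
    u1 = sum(rank_of[v] for v in x) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    n = n1 + n2
    tie_term = sum(t**3 - t for t in Counter(pooled).values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        raise ValueError("all values are identical; the test is undefined")
    sd = sqrt(var_u)
    if alternative == "greater":
        z = (u1 - mean_u - 0.5) / sd      # continuity correction towards the mean
        p = 1 - NormalDist().cdf(z)
    else:
        z = (u1 - mean_u + 0.5) / sd
        p = NormalDist().cdf(z)
    return u1, p


# Toy values standing in for a peptide property in two digests; not data from the paper.
reference = [1500, 1700, 1800, 2100]
sequential = [900, 1000, 1200, 1600]
u_stat, p_value = mann_whitney_u_one_sided(reference, sequential, alternative="greater")
print(u_stat, round(p_value, 4))
```

With samples this small an exact test would normally be preferred; the normal approximation is shown only because it makes the continuity correction explicit.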

Figure EV3. Cleavage site protection in a BS3‐crosslinked 26S proteasome sample

A–C
We determined the number of available cleavage sites for each secondary enzyme (A: AspN, B: chymotrypsin, C: GluC) in both the trypsin (Tryp) dataset and the respective sequential dataset (Tryp + AspN/Tryp + Chymo/Tryp + GluC). Boxplots show that the larger the observed peptide, the fewer cleavage sites remain, indicating that large peptides with a higher number of cleavage sites were digested. In turn, smaller peptides contain a higher density of missed cleavage sites, indicating that short length protects peptides from digestion. Boxplot ranges represent the 25th (lower hinge) and 75th (upper hinge) percentiles, respectively. Middle line represents the median.
D
Histogram of intensity‐weighted MS features as detected by MaxQuant for each digest. The sequential digests show a slight shift to lower masses, but most observed masses are between 1,000 and 2,000 Da.
E
Protection of peptides from secondary cleavage, measured as the number of missed‐cleaved peptides in the sequential digest divided by the number of peptides in the trypsin digest with potential cleavage sites for the second enzyme.

Figure EV4. Sequentially digested crosslinked peptides show a bias towards having C‐termini that end in K or R

In a seven‐protein mixture, we see a bias towards tryptic C‐termini compared with tryptic N‐termini, showing that the increase in identifications comes in large part from shortened but still tryptic‐looking peptides, which are easier to identify by LC‐MS/MS.
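
To illustrate how a second protease shortens tryptic peptides while some products keep a K or R C‐terminus, here is a minimal in silico sketch. The cleavage rules are simplified assumptions (trypsin after K/R, chymotrypsin after F/W/Y/L, GluC after E, AspN before D, no missed cleavages and no proline exceptions), and the sequence is invented. It is not the in silico digestion used in the paper.

```python
import re

# Simplified cleavage rules (assumptions): these enzymes cut C-terminal to the listed residues.
C_TERMINAL = {"trypsin": "KR", "chymotrypsin": "FWYL", "gluc": "E"}
# AspN cuts N-terminal to D.
N_TERMINAL = {"aspn": "D"}


def digest(peptide, enzyme):
    """Fully cleave one peptide with one enzyme (no missed cleavages)."""
    if enzyme in C_TERMINAL:
        pattern = f"(?<=[{C_TERMINAL[enzyme]}])"   # split after the residue
    else:
        pattern = f"(?=[{N_TERMINAL[enzyme]}])"    # split before the residue
    return [piece for piece in re.split(pattern, peptide) if piece]


def sequential_digest(protein, first, second):
    """Digest with `first`, then digest every resulting peptide with `second`."""
    return [frag for pep in digest(protein, first) for frag in digest(pep, second)]


protein = "MAGSDLKTEEWFDAVKLLPEYRGS"   # made-up sequence
tryptic = digest(protein, "trypsin")
tryp_aspn = sequential_digest(protein, "trypsin", "aspn")
still_tryptic = [p for p in tryp_aspn if p[-1] in "KR"]

print(tryptic)        # tryptic peptides
print(tryp_aspn)      # shorter products after the second enzyme
print(still_tryptic)  # products that retain a K/R C-terminus
```

In this toy case every tryptic peptide that contains an internal D is split into a shorter C‐terminal product that still ends in K or R and an N‐terminal product that does not.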

Figure 2. Sequential digestion of the affinity‐purified OCCM complex (Saccharomyces cerevisiae)

Residue pairs observed in tryptic peptides are shown in green and those observed in non‐tryptic peptides in orange.

Unique residue pairs mapped to the OCCM complex (PDB 5UDB) and the key link Mcm2‐Orc5 (Mcm2‐850‐Orc5‐369).
The in vitro helicase loading assay demonstrates that an Mcm2 C‐terminal deletion mutant supports complex assembly (lanes 6 and 7) but blocks formation of the final helicase loading product (lanes 8 and 9).
Overexpression analysis of Mcm2‐7ΔC2 shows that this mutant causes dominant lethality, indicating that the C‐terminus of Mcm2 is essential for cell survival.

Figure 3. Sequential digestion of the 26S proteasome from Saccharomyces cerevisiae

Unique residue pairs obtained by Wang et al for the human 26S proteasome (PDB 5GJR).
Unique residue pairs obtained by sequential digestion for the S. cerevisiae 26S proteasome (PDB 4CR2). Sequential digestion returned the highest number of residue pairs so far identified by CLMS for the 26S proteasome. Tryptic residue pairs are represented in green and non‐tryptic in orange.
Long‐distance (blue) and within‐distance (pink) residue pairs were mapped onto one of the states of the proteasome (PDB 4CR2), showing that long‐distance links accumulate in the base of the complex. Residue pairs satisfying other states are represented in yellow. The bar plot shows the distribution of all residue pairs in the complex and confirms that long‐distance links are located mainly in the base.
Unique residue pairs were mapped onto the three states described by Unverdorben et al, showing the rearrangement of Rpn5 relative to Rpt4.
Our data support Rpn1 being translated and rotated to be positioned closer to the AAA‐ATPase.
Structural rearrangements of the AAA‐ATPase‐dependent heterohexameric ring throughout four states for RPT6 and RPT1, mapped onto the four states described by Wehmer et al.
Structural rearrangements of the AAA‐ATPase‐dependent heterohexameric ring throughout four states for RPT4 and RPT6.

Figure 4. Sequential digestion of the 26S proteasome from human cytosol

Comparison of sequential digestion for complex mixtures with previous studies. Sequential digestion returns a higher number of residue pairs with low overlap to published datasets, showing the complementarity of the different approaches. Long‐distance links were determined only within proteins and are comparable for all datasets.
Residue pairs for the TRiC/CCT complex were mapped into the crystal structure and support the rearrangement of the complex reported by Leitner et al (2012a) (PDB 4V94).
Despite the complexity of the sample, we were able to identify the four states of the 26S proteasome showing the flexibility of the AAA‐ATPase‐dependent heterohexameric ring.

Figure EV5. CLMS of human cytosol as a complex mixture using sequential digestion and xiSEARCH

Tryptic links are displayed in green, and non‐tryptic links are displayed in orange.

A
Overlap of the residue pairs obtained with the four digestion conditions. Sequential digestion complements trypsin digestion, increasing the number of identified unique residue pairs.
B
Gain of unique residue pairs by using sequential digestion. The bar represents the number of unique residue pairs gained by adding AspN (blue), chymotrypsin (pink) and GluC (orange) to the digestion with trypsin (green).
C
The unique residue pairs identified for the MCM2‐7 complex were used to generate a protein–protein interaction network and mapped into the crystal structure.
D, E
Protein–protein interaction networks for the TRiC/CCT and COPI complexes were generated from the identified unique residue pairs. The unique residue pairs identified for the COPI complex were mapped into the respective crystal structure.
F
The unique residue pairs identified for the ribosome were used to generate a protein–protein interaction network and mapped into the crystal structure.
G
The unique residue pairs identified for the HSP90‐CDC37‐CDK4 complex were used to generate a protein–protein interaction network and mapped into the crystal structure.
H
The unique residue pairs identified for the 26S proteasome were used to generate a protein–protein interaction network and mapped into the crystal structure.

Figure 5. xiSEARCH

xiSEARCH is an open‐source search engine that takes a peak list as input. Users can define any type of crosslinker, modification, digestion and fragmentation method. The output is a list of matches in .csv format. We use xiFDR to filter results to the desired confidence level.
Comparison of xiSEARCH + xiFDR (Xi), pLink 2 and Kojak (+ PeptideProphet) at 5% residue‐pair FDR. The same trypsin dataset of the 26S proteasome was searched with all three software packages. xiSEARCH was run twice: once giving the same likelihood to matching lysine, serine, threonine and tyrosine (xiSEARCH), as is the case for Kojak and pLink 2, and once giving priority to lysine (Xi*).

Figure EV6. xiSEARCH follows a three‐step approach

First, xiSEARCH tries to identify the peptides with an unknown modification that best explain the spectrum. High‐resolution data help in this step in two ways. Knowing the charge, and therefore the mass, of a fragment makes it possible to predict whether the fragment is linear or crosslinked (Giese et al, 2016). This is important because only linear fragments are used to select peptide candidates. In addition, knowing that a fragment is probably crosslinked, we can invert the crosslinked fragments into their linear counterparts. Together, these two operations de‐convolute a crosslinked spectrum into a spectrum containing almost exclusively linear fragments of two “independent” peptides. The spectrum is also de‐charged and de‐noised (linearised spectrum). To enable fast identification, we create an in‐memory representation of all relationships between primary fragments (e.g. b‐ and y‐ions) and peptides that can be derived from the search database (fragment tree). This fragment tree is then used to identify and quickly score alpha‐peptide candidates in the linearised spectrum while ignoring the precursor mass.
Second, the top‐n alpha‐peptide candidates are forwarded to the beta‐peptide selection. Here, for each alpha‐peptide candidate, we take all peptides that fit the mass gap between the alpha‐peptide candidate plus crosslinker and the precursor mass. The whole list of peptide‐pair candidates is then given a preliminary score.
Third, the top‐m candidate pairs are fully scored and reported.
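
The three steps described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not xiSEARCH's implementation: a plain dictionary stands in for the fragment tree, the input is an idealised, already linearised spectrum of singly charged fragments, the score is a simple count of matched fragments, and BS3 is assumed as the crosslinker. All function names and the example peptides are hypothetical.

```python
from collections import defaultdict

# Monoisotopic residue masses in Da.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER, PROTON = 18.010565, 1.007276
BS3 = 138.06808  # mass added by a BS3 crosslink (crosslinker assumed for this toy)


def peptide_mass(pep):
    return sum(MONO[a] for a in pep) + WATER


def linear_fragments(pep):
    """Singly charged b- and y-ion m/z values of an unmodified linear peptide."""
    frags = []
    for i in range(1, len(pep)):
        frags.append(sum(MONO[a] for a in pep[:i]) + PROTON)           # b ion
        frags.append(sum(MONO[a] for a in pep[i:]) + WATER + PROTON)   # y ion
    return frags


def build_fragment_index(peptides, ndigits=2):
    """Map rounded fragment m/z to the peptides that can produce it (toy fragment tree)."""
    index = defaultdict(set)
    for pep in peptides:
        for mz in linear_fragments(pep):
            index[round(mz, ndigits)].add(pep)
    return index


def search(spectrum, precursor_mass, peptides, top_n=3, top_m=1, tol=0.01):
    index = build_fragment_index(peptides)
    # Step 1: score alpha candidates by matched linear fragments, ignoring precursor mass.
    hits = defaultdict(int)
    for mz in spectrum:
        for pep in index.get(round(mz, 2), ()):
            hits[pep] += 1
    alphas = sorted(hits, key=lambda p: (-hits[p], p))[:top_n]
    # Step 2: beta candidates must fill the mass gap left by alpha plus crosslinker.
    pairs = []
    for alpha in alphas:
        gap = precursor_mass - peptide_mass(alpha) - BS3
        for beta in peptides:
            if abs(peptide_mass(beta) - gap) <= tol:
                pairs.append((hits[alpha] + hits.get(beta, 0), alpha, beta))
    # Step 3: keep the top-m pairs (the toy score from steps 1 and 2 is simply reused).
    pairs.sort(key=lambda t: (-t[0], t[1], t[2]))
    return pairs[:top_m]


database = ["TKDVFR", "LKAEGR", "GSKELR", "AAKWK"]   # made-up peptides
alpha_true, beta_true = "TKDVFR", "LKAEGR"
spectrum = linear_fragments(alpha_true) + linear_fragments(beta_true)[:4]
precursor = peptide_mass(alpha_true) + peptide_mass(beta_true) + BS3
best = search(spectrum, precursor, database)[0]
print(best)
```

The point of the design is that candidate selection in step 1 never needs the precursor mass, so the number of peptide pairs that must be considered in steps 2 and 3 stays small.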
