Lennart Martens - Academia.edu
Papers by Lennart Martens
Maintaining high sensitivity while limiting false positives is a key challenge in peptide identification from mass spectrometry data. Here, we therefore investigate the effects of integrating the machine learning-based post-processor Percolator into our spectral library searching tool COSS. To evaluate the effects of this post-processing, we have used forty data sets from two different projects and have searched these against the NIST and MassIVE spectral libraries. The searching is carried out using two spectral library search tools, COSS and MSPepSearch, with and without Percolator post-processing, and using the sequence database search engine MS-GF+ as a baseline comparator. The addition of the Percolator rescoring step to COSS is effective and results in a substantial improvement in the sensitivity and specificity of the identifications. COSS is freely available as open source under the permissive Apache2 license, and binaries and source code are found at https://github.com/compo...
Missing values are a major issue in quantitative data-dependent mass spectrometry-based proteomics. We therefore present an innovative solution to this key issue by introducing a hurdle model, which is a mixture between a binomial peptide-count component and a peptide intensity-based model component. It enables dramatically enhanced quantification of proteins with many missing values without having to resort to harmful assumptions about missingness. We demonstrate the superior performance of our method by comparing it with state-of-the-art methods in the field.
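The two-part hurdle idea described in this abstract can be sketched as a log-likelihood with a count component and an intensity component; the binomial-plus-normal form and the function below are an illustrative stand-in under simplified assumptions, not the authors' actual model.

```python
import math

def hurdle_loglik(n_observed, n_possible, intensities, p_detect, mu, sigma):
    """Two-part 'hurdle' log-likelihood sketch: a binomial component for
    how many of a protein's peptides were detected at all, plus a normal
    component for the intensities of those that were observed.
    Missing intensities contribute only through the count part, so no
    explicit missingness mechanism has to be assumed for them."""
    # binomial part: detection counts (log binomial coefficient via lgamma)
    ll = (math.lgamma(n_possible + 1) - math.lgamma(n_observed + 1)
          - math.lgamma(n_possible - n_observed + 1)
          + n_observed * math.log(p_detect)
          + (n_possible - n_observed) * math.log(1 - p_detect))
    # intensity part: observed (e.g. log-transformed) values only
    for x in intensities:
        ll += (-0.5 * math.log(2 * math.pi * sigma ** 2)
               - (x - mu) ** 2 / (2 * sigma ** 2))
    return ll
```

A protein with 3 of 10 possible peptides observed then scores its detection rate and its observed intensities jointly, e.g. `hurdle_loglik(3, 10, [20.0, 21.0, 19.5], 0.3, 20.0, 1.0)`.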
Label-free quantitative mass spectrometry-based workflows for differential expression (DE) analysis of proteins impose important challenges on the data analysis due to peptide-specific effects and context-dependent missingness of peptide intensities. Peptide-based workflows, like MSqRob, test for DE directly from peptide intensities and outperform summarization methods, which first aggregate MS1 peptide intensities to protein intensities before DE analysis. However, these peptide-based methods are computationally expensive, often hard to understand for the non-specialised end-user, and do not provide protein summaries, which are important for visualisation or downstream processing. In this work, we therefore evaluate state-of-the-art summarization strategies using a benchmark spike-in dataset and discuss why and when these fail compared to the state-of-the-art peptide-based model, MSqRob. Based on this evaluation, we propose a novel summarization strategy, MSqRob-Sum, which estimates MSqRob’s mod...
Nature Methods, 2017
In shotgun proteomics, identified mass spectra that are deemed irrelevant to the scientific hypothesis are often discarded. Noble (2015) therefore urged researchers to remove irrelevant peptides from the database prior to searching to improve statistical power. Here, however, we argue that both the classical and Noble's revised method produce suboptimal peptide identifications and have problems in controlling the false discovery rate (FDR). Instead, we show that searching for all expected peptides, and removing irrelevant peptides prior to FDR calculation, results in more reliable identifications at a controlled FDR level than the classical strategy, which discards irrelevant peptides after FDR calculation, or Noble's strategy, which discards irrelevant peptides prior to searching.

Introduction. Reliable peptide identification is key to every mass spectrometry-based shotgun proteomics workflow. The growing concern about reproducibility has triggered leading journals to require that all peptide-to-spectrum matches (PSMs) are reported along with an estimate of their statistical confidence. The false discovery rate (FDR), i.e. the expected fraction of incorrect identifications, is a very popular statistic for this purpose. In many experiments, however, researchers want to focus on proteins of particular pathways, or on a few organisms in a metaproteomics sample. Hence, a large fraction of identified peptides are deemed irrelevant to their scientific hypothesis. Considering all PSMs induces an overwhelming multiple testing problem, which leads to few identifications of relevant peptides and thus to underpowered studies. Currently, there is much debate on the optimal search strategy to boost statistical power within this context (e.g. Noble, 2015, and
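The strategy argued for in this abstract can be illustrated with a toy target-decoy sketch; the scores, decoy flags, and relevance flags below are invented for illustration, and `accept_at_fdr` is a minimal stand-in for a real FDR procedure.

```python
# Toy target-decoy sketch of the advocated strategy: search ALL expected
# peptides, drop irrelevant PSMs, and only THEN estimate the FDR over
# what will actually be reported. All values are invented.

def accept_at_fdr(psms, threshold):
    """psms: tuples whose first two fields are (score, is_decoy).
    Returns the target PSMs in the largest score-sorted prefix whose
    decoy-estimated FDR (#decoys / #targets) stays <= threshold."""
    ranked = sorted(psms, key=lambda p: -p[0])
    targets = decoys = cut = 0
    for i, p in enumerate(ranked, 1):
        decoys += p[1]
        targets += not p[1]
        if targets and decoys / targets <= threshold:
            cut = i  # largest prefix still under the FDR cap
    return [p for p in ranked[:cut] if not p[1]]

# (score, is_decoy, is_relevant) -- invented example PSMs
psms = [(9.1, False, True), (8.7, False, False), (8.2, True, False),
        (7.9, False, True), (7.5, False, False), (6.8, False, True),
        (6.4, True, True), (6.1, False, True)]

# Classical strategy: FDR over all PSMs, discard irrelevant hits
# afterwards (the FDR is then no longer controlled on what is reported).
classical = [p for p in accept_at_fdr(psms, 0.5) if p[2]]

# Advocated strategy: restrict to relevant PSMs first, then estimate
# the FDR only over the subset that will be reported.
advocated = accept_at_fdr([p for p in psms if p[2]], 0.5)
```

The deliberately loose 0.5 threshold only keeps this toy example non-empty; real analyses use thresholds such as 0.01.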
Nature Methods, 2016
(Fig. 1d). Both populations of proteins were clearly distinguished. We also performed our analysis on dSTORM data of antibody-labeled clathrin, generated by varying the label density using the same staining procedure as Baumgart et al. (Fig. 1e and Supplementary Fig. 5). The temporal accumulation analysis generated from regions with high labeling densities yielded characteristic curves for clustered proteins that are in good agreement with the analysis of the whole titration data set. Filtering the data using a density-based cluster-analysis algorithm (DBSCAN) highlighted the impact of randomly distributed background localizations on the shape of the curves (Supplementary Figs. 4-6). We further performed simulations based on the labeling efficiency, blinking statistics and background levels that we extracted from experimental dSTORM data. Analysis of these simulations resulted in curves that are in very good agreement with our experimental data (Supplementary Fig. 7). In summary, we demonstrate a simple yet powerful method that directly provides information on protein clustering. The key advantage is that a single SMLM data set suffices, and no additional samples with different labeling densities are required. This is particularly useful for PALM, because adjusting the label density of fusion proteins is difficult and different expression levels might induce variations in the spatial organization of proteins.
Trends in Cell Biology, 2016
Cell migration is central to the development and maintenance of multicellular organisms. Fundamental understanding of cell migration can, for example, direct novel therapeutic strategies to control invasive tumor cells. However, the study of cell migration yields an overabundance of experimental data that require demanding processing and analysis for results extraction. Computational methods and tools have therefore become essential in the quantification and modeling of cell migration data. We review computational approaches for the key tasks in the quantification of in vitro cell migration: image pre-processing, motion estimation and feature extraction. Moreover, we summarize the current state-of-the-art for in silico modeling of cell migration. Finally, we provide a list of available software tools for cell migration to assist researchers in choosing the most appropriate solution for their needs. Computational Cell Migration in a Nutshell Cell migration plays a fundamental role in physiological phenomena including neural development, wound healing, and immune function, as well as in disorders such as neurological diseases, fibrosis, and cancer metastasis [1-8]. Investigation of cell migration is therefore essential for successful intervention in physiological and pathological phenomena [9-12]. A major driver in the advance of cell migration research has been the evolution of instrumentation (microscopes and cameras) and the corresponding development of experimental tools and biological models. Indeed, 2D in vitro assays [13,14] have recently given way to more sophisticated two-and-a-half-dimensional (2.5D) and 3D approaches [15,16] which more faithfully represent the tissue environment. Because in vivo experiments are difficult and costly, in vitro and ex vivo setups are widely used, especially in drug compound and gene screening [17,18].
This review therefore primarily focuses on the quantification of cell migration in in vitro setups, while we refer the reader to specific literature on in vivo work [19-23].
Dagstuhl Reports, 2019
This report documents the program and the outcomes of Dagstuhl Seminar 19351 "Computational Proteomics". The Seminar was originally built around four topics: identification and quantification of DIA data; algorithms for the analysis of protein cross-linking data; creating an online view on complete, browsable proteomes from public data; and detecting interesting biology from proteomics findings. These four topics led to four corresponding breakout sessions, which in turn led to five offshoot breakout sessions. The abstracts presented here first describe the four topic introduction talks, as well as a fifth, cross-cutting topic talk on bringing proteomics data into clinical trials. These talk abstracts are followed by one abstract per breakout session, documenting that breakout's discussion and outcomes. An Executive Summary is also provided, which details the overall seminar structure, the relationship between the breakout sessions and topics, and the most important conclusions for the four topic-derived breakouts.
This report documents the program and the outcomes of Dagstuhl Seminar 21271 "Computational Proteomics". The Seminar, which took place in a hybrid fashion with both local and online participation due to the COVID pandemic, was built around three topics: the rapid uptake of advanced machine learning in proteomics; computational challenges across the various rapidly evolving approaches for structural and top-down proteomics; and the computational analysis of glycoproteomics data. These three topics were the focus of three corresponding breakout sessions, which ran in parallel throughout the seminar. A fourth breakout session was created during the seminar, on the specific topic of creating a Kaggle competition based on proteomics data. The abstracts presented here first describe the three introduction talks, one for each topic. These talk abstracts are followed by one abstract per breakout session, documenting that breakout's discussion and outcomes. An Executive Summary is also provided, which details the overall seminar structure alongside the most important conclusions for the three topic-derived breakouts.
The Dagstuhl Seminar 17421 "Computational Proteomics" discussed in depth the current challenges facing the field of computational proteomics, while at the same time reaching out across the field's borders to engage with other computational omics fields at the joint interfaces. The ramifications of these issues, and possible solutions, were first introduced in short but thought-provoking talks, followed by a plenary discussion to delineate the initial discussion sub-topics. Afterwards, working groups addressed these initial considerations in great detail. Seminar October 15-20, 2017 - http://www.dagstuhl.de/17421. License: Creative Commons BY 3.0 Unported license © Lennart Martens.
Nature Communications, 2021
Metaproteomics has matured into a powerful tool to assess functional interactions in microbial communities. While many metaproteomic workflows are available, the impact of method choice on results remains unclear. Here, we carry out a community-driven, multi-laboratory comparison in metaproteomics: the critical assessment of metaproteome investigation study (CAMPI). Based on well-established workflows, we evaluate the effect of sample preparation, mass spectrometry, and bioinformatic analysis using two samples: a simplified, laboratory-assembled human intestinal model and a human fecal sample. We observe that variability at the peptide level is predominantly due to sample processing workflows, with a smaller contribution of bioinformatic pipelines. These peptide-level differences largely disappear at the protein group level. While differences are observed for predicted community composition, similar functional profiles are obtained across workflows. CAMPI demonstrates the robustness...
With the human Plasma Proteome Project (PPP) pilot phase completed, the largest and most ambitious proteomics experiment to date has reached its first milestone. The correspondingly impressive amount of data that came from this pilot project emphasized the need for a centralized dissemination mechanism and led to the development of a detailed, PPP-specific data gathering infrastructure at the University of Michigan, Ann Arbor, as well as the protein identifications database project at the European Bioinformatics Institute as a general proteomics data repository. One issue that crept up while discussing which data to store for the PPP concerns whether the raw, binary data coming from the mass spectrometers should be stored, or rather the more compact and already significantly processed peak lists. As this debate is not restricted to the PPP but relates to the proteomics community in general, we will attempt to detail the relative merits and caveats associated with centralized storag...
Computational Methods for Mass Spectrometry Proteomics
This position paper advocates and argues the need to offer every young person an education in computer science that allows them to become proficient in informatics. Such proficiency goes beyond mere 'digital literacy', and also implies that young people must be able to think 'computationally'. Computers have become indispensable, both in professional life and in the private sphere. To keep up with technological evolution, it is of great importance that all young people not only learn to use existing technology, but also learn to understand how it works underneath. To steer technological evolution, enough young people must be capable of, and motivated to, create new technology. To realise these goals, computer science education in compulsory schooling must be thoroughly reformed. In primary and secondary education, a foundational computer science curriculum should be included, on which...
Metaproteomics has matured into a powerful tool to assess functional interactions in microbial communities. While many metaproteomic workflows are available, the impact of method choice on results remains unclear. Here, we carried out the first community-driven, multi-laboratory comparison in metaproteomics: the critical assessment of metaproteome investigation study (CAMPI). Based on well-established workflows, we evaluated the effect of sample preparation, mass spectrometry, and bioinformatic analysis using two samples: a simplified, laboratory-assembled human intestinal model and a human fecal sample. We observed that variability at the peptide level was predominantly due to sample processing workflows, with a smaller contribution of bioinformatic pipelines. These peptide-level differences largely disappeared at the protein group level. While differences were observed for predicted community composition, similar functional profiles were obtained across workflows. CAMPI demonstrates ...
Although metaproteomics, the study of the collective proteome of microbial communities, has become increasingly powerful and popular over the past few years, the field has lagged behind in the availability of user-friendly, end-to-end pipelines for data analysis. We therefore describe the connection of two commonly used metaproteomics data processing tools in the field, MetaProteomeAnalyzer and PeptideShaker, to Unipept for downstream analysis. Through these connections, direct end-to-end pipelines are built from database searching to taxonomic and functional annotation.
Motivation: Accurate prediction of liquid chromatographic retention times from small molecule structures is useful for reducing experimental measurements and for improved identification in targeted and untargeted MS. However, different experimental setups (e.g. differences in columns, gradients, solvents, or stationary phase) have given rise to a multitude of prediction models that only predict accurate retention times for a specific experimental setup. In practice this typically results in the fitting of a new predictive model for each specific type of setup, which is not only inefficient but also requires substantial prior data to be accumulated on each such setup. Results: Here we introduce the concept of generalized calibration, which is capable of the straightforward mapping of retention time models between different experimental setups. This concept builds on the database-controlled calibration approach implemented in PredRet, and fits calibration curves on predicted retention time...
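The calibration idea in this abstract can be sketched as fitting a mapping from retention times predicted for one setup onto a few anchor measurements from another; the linear least-squares fit and all numbers below are illustrative stand-ins for the monotone calibration curves used in practice.

```python
import statistics

def fit_linear_calibration(rt_source, rt_target):
    """Least-squares line mapping retention times predicted for one
    LC setup onto times measured on another setup. Real calibration
    (e.g. in PredRet) uses monotone/robust fits; a straight line
    keeps this sketch short."""
    mx, my = statistics.fmean(rt_source), statistics.fmean(rt_target)
    sxx = sum((x - mx) ** 2 for x in rt_source)
    sxy = sum((x - mx) * (y - my) for x, y in zip(rt_source, rt_target))
    slope = sxy / sxx
    intercept = my - slope * mx
    return lambda rt: slope * rt + intercept

# invented anchor compounds measured on both setups
predicted_a = [2.0, 5.0, 8.0, 12.0]   # predictions from a model for setup A
measured_b = [3.1, 7.0, 10.9, 16.1]   # same compounds measured on setup B
calibrate = fit_linear_calibration(predicted_a, measured_b)
```

Any further prediction for setup A can then be mapped onto setup B's time scale, e.g. `calibrate(10.0)`, without training a new model from scratch.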
Protein phosphorylation is a key post-translational modification (PTM) in many biological processes and is associated with human diseases such as cancer and metabolic disorders. The accurate identification, annotation and functional analysis of phosphosites is therefore crucial to understand their various roles. Phosphosites (P-sites) are mainly analysed through phosphoproteomics, which has led to increasing amounts of publicly available phosphoproteomics data. Several resources have been built around the resulting phosphosite information, but these are usually restricted to protein sequence and basic site metadata. What is often missing from these resources, however, is context, including protein structure mapping, experimental provenance information, and biophysical predictions. We therefore developed Scop3P: a comprehensive database of human phosphosites within their full context. Scop3P integrates sequences (UniProtKB/Swiss-Prot), structures (PDB), and uniformly reprocessed phosph...
Spectral similarity searching to identify peptide-derived MS/MS spectra is a promising technique, and different spectrum similarity search tools have therefore been developed. Each of these tools, however, comes with some limitations, mainly due to low processing speed and issues with handling large databases. Furthermore, the number of spectral data formats supported is typically limited, which also creates a barrier to adoption. We have therefore developed COSS (CompOmics Spectral Searching), a new and user-friendly spectral library search tool that relies on a probabilistic scoring function, and that includes decoy spectra generation for result validation. We have benchmarked COSS on three different spectral libraries and compared the results with established spectral search and sequence database search tools. Our comparison showed that COSS identifies more peptides, and is faster than other tools. COSS binaries and source code can be freely downloaded from https://github.com/c...
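The core scoring step of any spectral library search can be sketched as a similarity between binned spectra; COSS itself uses a probabilistic scoring function, so the plain normalized dot product below is only a generic stand-in, with invented peak values.

```python
import math
from collections import defaultdict

def binned(peaks, bin_width=0.5):
    """Collapse (m/z, intensity) peaks into fixed-width m/z bins so two
    spectra with slightly shifted peaks become comparable."""
    bins = defaultdict(float)
    for mz, intensity in peaks:
        bins[int(mz / bin_width)] += intensity
    return bins

def cosine_similarity(query, library, bin_width=0.5):
    """Normalized dot product between two binned spectra, in [0, 1]."""
    q, l = binned(query, bin_width), binned(library, bin_width)
    dot = sum(q[b] * l[b] for b in q.keys() & l.keys())
    norm = (math.sqrt(sum(v * v for v in q.values()))
            * math.sqrt(sum(v * v for v in l.values())))
    return dot / norm if norm else 0.0
```

A search then reduces to scoring a query spectrum against every library (and decoy) spectrum within the precursor mass tolerance and ranking the candidates.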
ABSTRACTMaintaining high sensitivity while limiting false positives is a key challenge in peptide... more ABSTRACTMaintaining high sensitivity while limiting false positives is a key challenge in peptide identification from mass spectrometry data. Here, we therefore investigate the effects of integrating the machine learning-based post-processor Percolator into our spectral library searching tool COSS. To evaluate the effects of this post-processing, we have used forty data sets from two different projects and have searched these against the NIST and MassIVE spectral libraries. The searching is carried out using two spectral library search tools, COSS and MSPepSearch with and without Percolator post-processing, and using sequence database search engine MS-GF+ as a baseline comparator. The addition of the Percolator rescoring step to COSS is effective and results in a substantial improvement in sensitivity and specificity of the identifications. COSS is freely available as open source under the permissive Apache2 license, and binaries and source code are found at https://github.com/compo...
ABSTRACTMissing values are a major issue in quantitative data-dependent mass spectrometry-based p... more ABSTRACTMissing values are a major issue in quantitative data-dependent mass spectrometry-based proteomics. We therefore present an innovative solution to this key issue by introducing a hurdle model, which is a mixture between a binomial peptide count and a peptide intensity-based model component. It enables dramatically enhanced quantification of proteins with many missing values without having to resort to harmful assumptions for missingness. We demonstrate the superior performance of our method by comparing it with state-of-the-art methods in the field.
Label-Free Quantitative mass spectrometry based workflows for differential expression (DE) analys... more Label-Free Quantitative mass spectrometry based workflows for differential expression (DE) analysis of proteins impose important challenges on the data analysis due to peptide-specific effects and context dependent missingness of peptide intensities. Peptide-based workflows, like MSqRob, test for DE directly from peptide intensities and outper-form summarization methods which first aggregate MS1 peptide intensities to protein intensities before DE analysis. However, these methods are computationally expensive, often hard to understand for the non-specialised end-user, and do not provide protein summaries, which are important for visualisation or downstream processing. In this work, we therefore evaluate state-of-the-art summarization strategies using a benchmark spike-in dataset and discuss why and when these fail compared to the state-of-the-art peptide based model, MSqRob. Based on this evaluation, we propose a novel summarization strategy, MSqRob-Sum, which estimates MSqRob’s mod...
Nature Methods, 2017
In shotgun proteomics identified mass spectra that are deemed irrelevant to the scientific hypoth... more In shotgun proteomics identified mass spectra that are deemed irrelevant to the scientific hypothesis are often discarded. Noble (2015) 1 therefore urged researchers to remove irrelevant peptides from the database prior to searching to improve statistical power. We here however, argue that both the classical as well as Noble's revised method produce suboptimal peptide identifications and have problems in controlling the false discovery rate (FDR). Instead, we show that searching for all expected peptides, and removing irrelevant peptides prior to FDR calculation results in more reliable identifications at controlled FDR level than the classical strategy that discards irrelevant peptides post FDR calculation, or than Noble's strategy that discards irrelevant peptides prior to searching. 1 Introduction Reliable peptide identification is key to every mass spectrometry-based shotgun proteomics workflow. The growing concern on reproducibility triggered leading journals to require that all peptideto-spectrum matches (PSMs) are reported along with an estimate of their statistical confidence. The false discovery rate (FDR), i.e. the expected fraction of incorrect identifications, is a very popular statistic for this purpose. In many experiments, however, researchers want to focus on proteins of particular pathways, or few organisms in a metaproteomics sample. Hence, a large fraction of identified peptides are deemed irrelevant for their scientific hypothesis. Considering all PSMs induces an overwhelming multiple testing problem, which leads to few identifications of relevant peptides and thus to underpowered studies. 2 Currently, there is much debate on the optimal search strategy to boost the statistical power within this context (e.g. Noble, 2015 1 and
Nature Methods, 2016
Fig. 1d) 5. Both populations of proteins were clearly distinguished. We also performed our analys... more Fig. 1d) 5. Both populations of proteins were clearly distinguished. We also performed our analysis on dSTORM data of antibody-labeled clathrin, generated by varying the label density using the same staining procedure as Baumgart et al. (Fig. 1e and Supplementary Fig. 5). The temporal accumulation analysis generated from regions with high labeling densities yielded characteristic curves for clustered proteins that are in good agreement with the analysis of the whole titration data set. Filtering the data using a density-based cluster-analysis algorithm (DBSCAN) 7,8 highlighted the impact of randomly distributed background localizations on the shape of the curves (Supplementary Figs. 4-6). We further performed simulations based on the labeling efficiency, blinking statistics and background levels that we extracted from experimental dSTORM data. Analysis of these simulations resulted in curves that are in very good agreement with our experimental data (Supplementary Fig. 7). In summary, we demonstrate a simple yet powerful method that directly provides information on protein clustering. The key advantage is that a single SMLM data set suffices, and no additional samples with different labeling densities are required. This is particularly useful for PALM, because adjusting the label density of fusion proteins is difficult and different expression levels might induce variations in the spatial organization of proteins.
Trends in Cell Biology, 2016
Cell migration is central to the development and maintenance of multicellular organisms. Fundamen... more Cell migration is central to the development and maintenance of multicellular organisms. Fundamental understanding of cell migration can, for example, direct novel therapeutic strategies to control invasive tumor cells. However, the study of cell migration yields an overabundance of experimental data that require demanding processing and analysis for results extraction. Computational methods and tools have therefore become essential in the quantification and modeling of cell migration data. We review computational approaches for the key tasks in the quantification of in vitro cell migration: image pre-processing, motion estimation and feature extraction. Moreover, we summarize the current state-of-the-art for in silico modeling of cell migration. Finally, we provide a list of available software tools for cell migration to assist researchers in choosing the most appropriate solution for their needs. Computational Cell Migration in a Nutshell Cell migration plays a fundamental role in physiological phenomena including neural development, wound healing, and immune function, as well as in disorders such as neurological diseases, fibrosis, and cancer metastasis [1-8]. Investigation of cell migration is therefore essential for successful intervention in physiological and pathological phenomena [9-12]. A major driver in the advance of cell migration research has been the evolution of instrumentation (microscopes and cameras) and the corresponding development of experimental tools and biological models. Indeed, 2D in vitro assays [13,14] have recently given way to more sophisticated two-and-a-half-dimensional (2.5D) and 3D approaches [15,16] which more faithfully represent the tissue environment. Because in vivo experiments are difficult and costly, in vitro and ex vivo setups are widely used, especially in drug compound and gene screening [17,18]. 
This review therefore primarily focuses on the quantification of cell migration in in vitro setups, while we refer the reader to specific literature on in vivo work [19-23].
Dagstuhl Reports, 2019
This report documents the program and the outcomes of Dagstuhl Seminar 19351 "Computational Prote... more This report documents the program and the outcomes of Dagstuhl Seminar 19351 "Computational Proteomics". The Seminar was originally built around four topics, identification and quantification of DIA data; algorithms for the analysis of protein cross-linking data; creating an online view on complete, browsable proteomes from public data; and detecting interesting biology from proteomics findings. These four topics were led to four correpsonding breakout sessions, which in turn led to five offshoot breakout sessions. The abstracts presented here first describe the four topic introduction talks, as well as a fifth, cross-cutting topic talk on bringin proteomics data into clinical trials. These talk abstracts are followed by one abstract each per breakout session, documenting that breakout's discussion and outcomes. An Executive Summary is also provided, which details the overall seminar structure, the relationship between the breakout sessions and topics, and the most important conclusions for the four topic-derived breakouts.
This report documents the program and the outcomes of Dagstuhl Seminar 21271 "Computational Prote... more This report documents the program and the outcomes of Dagstuhl Seminar 21271 "Computational Proteomics". The Seminar, which took place in a hybrid fashion with both local as well as online participation due to the COVID pandemic, was built around three topics: the rapid uptake of advanced machine learning in proteomics; computational challenges across the various rapidlly evolving approaches for structural and top-down proteomics; and the computational analysis of glycoproteomics data. These three topics were the focus of three corresponding breakout sessions, which ran in parallel throughout the seminar. A fourth breakout session was created during the seminar, on the specific topic of creating a Kaggle competition based on proteomics data. The abstracts presented here first describe the three introduction talks, one for each topic. These talk abstracts are then followed by one abstract each per breakout session, documenting that breakout's discussion and outcomes. An Executive Summary is also provided, which details the overall seminar structure alongside the most important conclusions for the three topic-derived breakouts.
The Dagstuhl Seminar 17421 "Computational Proteomics" discussed in-depth the current ch... more The Dagstuhl Seminar 17421 "Computational Proteomics" discussed in-depth the current challenges facing the field of computational proteomics, while at the same time reaching out across the field's borders to engage with other computational omics fields at the joint interfaces. The ramifications of these issues, and possible solutions, were first introduced in short but thought-provoking talks, followed by a plenary discussion to delineate the initial discussion sub-topics. Afterwards, working groups addressed these initial considerations in great detail. Seminar October 15-20, 2017-http://www.dagstuhl.de/17421 License Creative Commons BY 3.0 Unported license © Lennart Martens The Dagstuhl Seminar 17421 "Computational Proteomics" discussed in-depth the current challenges facing the field of computational proteomics, while at the same time reaching out across the field's borders to engage with other computational omics fields at the joint interfaces. The is...
Nature Communications, 2021
Metaproteomics has matured into a powerful tool to assess functional interactions in microbial communities. While many metaproteomic workflows are available, the impact of method choice on results remains unclear. Here, we carry out a community-driven, multi-laboratory comparison in metaproteomics: the critical assessment of metaproteome investigation study (CAMPI). Based on well-established workflows, we evaluate the effect of sample preparation, mass spectrometry, and bioinformatic analysis using two samples: a simplified, laboratory-assembled human intestinal model and a human fecal sample. We observe that variability at the peptide level is predominantly due to sample processing workflows, with a smaller contribution of bioinformatic pipelines. These peptide-level differences largely disappear at the protein group level. While differences are observed for predicted community composition, similar functional profiles are obtained across workflows. CAMPI demonstrates the robustness...
With the human Plasma Proteome Project (PPP) pilot phase completed, the largest and most ambitious proteomics experiment to date has reached its first milestone. The correspondingly impressive amount of data that came from this pilot project emphasized the need for a centralized dissemination mechanism, and led to the development of a detailed, PPP-specific data gathering infrastructure at the University of Michigan, Ann Arbor, as well as the protein identifications database project at the European Bioinformatics Institute as a general proteomics data repository. One issue that cropped up while discussing which data to store for the PPP concerns whether the raw, binary data coming from the mass spectrometers should be stored, or rather the more compact and already significantly processed peak lists. As this debate is not restricted to the PPP but relates to the proteomics community in general, we will attempt to detail the relative merits and caveats associated with centralized storag...
Computational Methods for Mass Spectrometry Proteomics
This position paper advocates and argues for the necessity of offering every young person an education in informatics science that allows them to become "informatics-proficient". Informatics proficiency goes beyond mere "digital literacy", and also implies that young people must be able to think "computationally". Computers have become indispensable, both in professional life and in the private sphere. To keep up with technological evolution, it is of great importance that all young people not only learn to use existing technology, but also learn to understand how it works underneath. To be able to steer technological evolution, enough young people must be capable of, and motivated to, creating new technology. To achieve these objectives, the teaching of informatics in compulsory education needs to be thoroughly reformed. A basic education in informatics science should be included in primary and secondary education, upon which...
Metaproteomics has matured into a powerful tool to assess functional interactions in microbial communities. While many metaproteomic workflows are available, the impact of method choice on results remains unclear. Here, we carried out the first community-driven, multi-laboratory comparison in metaproteomics: the critical assessment of metaproteome investigation study (CAMPI). Based on well-established workflows, we evaluated the effect of sample preparation, mass spectrometry, and bioinformatic analysis using two samples: a simplified, laboratory-assembled human intestinal model and a human fecal sample. We observed that variability at the peptide level was predominantly due to sample processing workflows, with a smaller contribution of bioinformatic pipelines. These peptide-level differences largely disappeared at the protein group level. While differences were observed for predicted community composition, similar functional profiles were obtained across workflows. CAMPI demonstrates ...
Although metaproteomics, the study of the collective proteome of microbial communities, has become increasingly powerful and popular over the past few years, the field has lagged behind on the availability of user-friendly, end-to-end pipelines for data analysis. We therefore describe the connection from two commonly used metaproteomics data processing tools in the field, MetaProteomeAnalyzer and PeptideShaker, to Unipept for downstream analysis. Through these connections, direct end-to-end pipelines are built from database searching to taxonomic and functional annotation.
Motivation: Accurate prediction of liquid chromatographic retention times from small molecule structures is useful for reducing experimental measurements and for improved identification in targeted and untargeted MS. However, different experimental setups (e.g. differences in columns, gradients, solvents, or stationary phase) have given rise to a multitude of prediction models that only predict accurate retention times for a specific experimental setup. In practice, this typically results in the fitting of a new predictive model for each specific type of setup, which is not only inefficient but also requires substantial prior data to be accumulated on each such setup. Results: Here we introduce the concept of generalized calibration, which is capable of the straightforward mapping of retention time models between different experimental setups. This concept builds on the database-controlled calibration approach implemented in PredRet, and fits calibration curves on predicted retention time...
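The calibration idea in this abstract can be sketched in a few lines: fit a curve mapping a model's predicted retention times to the observed times of a handful of calibrant compounds measured on the new setup, then apply that curve to all other predictions. This is a minimal, hypothetical illustration using a low-degree polynomial; the actual method described in the paper builds on PredRet's database-controlled calibration, and all compound values below are invented.

```python
import numpy as np

def fit_calibration(predicted, observed, degree=2):
    """Fit a simple polynomial calibration curve that maps model-predicted
    retention times onto the times observed on a new chromatographic setup.
    (Illustrative stand-in for the calibration curves described in the text.)"""
    coeffs = np.polyfit(predicted, observed, deg=degree)
    return np.poly1d(coeffs)

def calibrate(model_predictions, calib_predicted, calib_observed):
    """Re-use an existing retention time model on a new setup by mapping its
    predictions through a curve fitted on a few shared calibrant compounds."""
    curve = fit_calibration(np.asarray(calib_predicted),
                            np.asarray(calib_observed))
    return curve(np.asarray(model_predictions))

# Hypothetical calibrants measured on the new setup:
calib_pred = [1.0, 3.0, 5.0, 8.0, 12.0]   # model-predicted RT (min)
calib_obs = [2.1, 5.9, 9.8, 15.7, 23.5]   # observed RT on new setup (min)

# Map fresh predictions onto the new setup without retraining the model:
new_rts = calibrate([2.0, 6.0, 10.0], calib_pred, calib_obs)
```

The key design point is that only a small calibrant set is needed per setup, rather than the substantial training data a setup-specific model would require.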
Protein phosphorylation is a key post-translational modification (PTM) in many biological processes, and is associated with human diseases such as cancer and metabolic disorders. The accurate identification, annotation, and functional analysis of phosphosites is therefore crucial to understand their various roles. Phosphosites (P-sites) are mainly analysed through phosphoproteomics, which has led to increasing amounts of publicly available phosphoproteomics data. Several resources have been built around the resulting phosphosite information, but these are usually restricted to protein sequence and basic site metadata. What is often missing from these resources, however, is context, including protein structure mapping, experimental provenance information, and biophysical predictions. We therefore developed Scop3P: a comprehensive database of human phosphosites within their full context. Scop3P integrates sequences (UniProtKB/Swiss-Prot), structures (PDB), and uniformly reprocessed phosph...
Spectral similarity searching to identify peptide-derived MS/MS spectra is a promising technique, and different spectrum similarity search tools have therefore been developed. Each of these tools, however, comes with some limitations, mainly due to low processing speed and issues with handling large databases. Furthermore, the number of spectral data formats supported is typically limited, which also creates a barrier to adoption. We have therefore developed COSS (CompOmics Spectral Searching), a new and user-friendly spectral library search tool that relies on a probabilistic scoring function, and that includes decoy spectra generation for result validation. We have benchmarked COSS on three different spectral libraries and compared the results with established spectral search and sequence database search tools. Our comparison showed that COSS identifies more peptides, and is faster than other tools. COSS binaries and source code can be freely downloaded from https://github.com/c...
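The core operation in spectral library searching is comparing a query MS/MS spectrum against library spectra. A common baseline similarity measure is the normalized dot product (cosine similarity) between intensity vectors after binning peaks on the m/z axis. The sketch below illustrates that baseline only; COSS's actual probabilistic scoring function differs, and the bin width and peak values here are illustrative assumptions.

```python
import math
from collections import defaultdict

def bin_spectrum(peaks, bin_width=0.05):
    """Sum peak intensities into fixed-width m/z bins (hypothetical width)."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[round(mz / bin_width)] += intensity
    return binned

def cosine_similarity(query, library, bin_width=0.05):
    """Normalized dot product between two binned spectra: 1.0 means identical
    relative peak patterns, 0.0 means no shared peaks."""
    a = bin_spectrum(query, bin_width)
    b = bin_spectrum(library, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Invented (m/z, intensity) peak lists for a query and a library spectrum:
query_spec = [(147.11, 50.0), (204.13, 120.0), (361.20, 80.0)]
library_spec = [(147.11, 55.0), (204.14, 100.0), (361.21, 90.0)]
score = cosine_similarity(query_spec, library_spec)
```

Decoy-based validation, as mentioned in the abstract, then works by scoring queries against deliberately perturbed library spectra and using the resulting score distribution to estimate the false discovery rate of the genuine matches.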