ADRIAN L GUTHALS - Academia.edu

Papers by ADRIAN L GUTHALS

The Generating Function Approach for Peptide Identification in Spectral Networks

Journal of Computational Biology: A Journal of Computational Molecular Cell Biology, Jan 25, 2014

Tandem mass (MS/MS) spectrometry has become the method of choice for protein identification and has launched a quest for the identification of every translated protein and peptide. However, computational developments have lagged behind the pace of modern data acquisition protocols and have become a major bottleneck in proteomics analysis of complex samples. As it stands today, attempts to identify MS/MS spectra against large databases (e.g., the human microbiome or 6-frame translation of the human genome) face a search space that is 10-100 times larger than the human proteome, where it becomes increasingly challenging to separate between true and false peptide matches. As a result, the sensitivity of current state-of-the-art database search methods drops by nearly 38% to such low identification rates that almost 90% of all MS/MS spectra are left as unidentified. We address this problem by extending the generating function approach to rigorously compute the joint spectral probability...

Shotgun Protein Sequencing with Meta-contig Assembly

Molecular & Cellular Proteomics, 2012

Full-length de-novo sequencing of unknown proteins such as antibodies or proteins from organisms with unsequenced genomes remains an open problem. Conventional de-novo methods sequence MS/MS spectra individually, which yields poor accuracy and limited sequence length. Given these limitations, current techniques for sequencing unknown proteins rely on hybrid approaches involving de-novo sequencing followed by error-tolerant database search and/or homologous mapping to reconstruct protein sequences. In contrast with current approaches, our approach aggregates and jointly sequences multiple spectra from overlapping peptides. Our Meta Shotgun Protein Sequencing (Meta-SPS) approach assembled unidentified MS/MS spectra into "meta" de-novo sequences up to 97 amino acids in length without any sequence homology steps and while mis-predicting only 1 in 33 amino acids.

Peptide Identification by Tandem Mass Spectrometry with Alternate Fragmentation Modes

Molecular & Cellular Proteomics, 2012

The high-throughput nature of proteomics mass spectrometry is enabled by a productive combination of data acquisition protocols and the computational tools used to interpret the resulting spectra. One of the key components in mainstream protocols is the generation of tandem mass (MS/MS) spectra by peptide fragmentation using collision induced dissociation, the approach currently used in the large majority of proteomics experiments to routinely identify hundreds to thousands of proteins from single mass spectrometry runs. Complementary to these, alternative peptide fragmentation methods such as electron capture/transfer dissociation and higher-energy collision dissociation have consistently achieved significant improvements in the identification of certain classes of peptides, proteins, and post-translational modifications. Recognizing these advantages, mass spectrometry instruments now conveniently support fine-tuned methods that automatically alternate between peptide fragmentation modes for either different types of peptides or for acquisition of multiple MS/MS spectra from each peptide. But although these developments have the potential to substantially improve peptide identification, their routine application requires corresponding adjustments to the software tools and procedures used for automated downstream processing. This review discusses the computational implications of alternative and alternate modes of MS/MS peptide fragmentation and addresses some practical aspects of using such protocols for identification of peptides and post-translational modifications.

Neutron-encoded Signatures Enable Product Ion Annotation From Tandem Mass Spectra

Molecular & Cellular Proteomics, 2013

We report the use of neutron-encoded (NeuCode) stable isotope labeling of amino acids in cell culture for the purpose of C-terminal product ion annotation. Two NeuCode labeling isotopologues of lysine, ¹³C₆¹⁵N₂ and ²H₈, which differ by 36 mDa, were metabolically embedded in a sample proteome, and the resultant labeled proteins were combined, digested, and analyzed via liquid chromatography and mass spectrometry. With MS/MS scan resolving powers of ~50,000 or higher, product ions containing the C terminus (i.e. lysine) appear as a doublet spaced by exactly 36 mDa, whereas N-terminal fragments exist as a single m/z peak. Through theory and experiment, we demonstrate that over 90% of all y-type product ions have detectable doublets. We report on an algorithm that can extract these neutron signatures with high sensitivity and specificity. In other words, of 15,503 y-type product ion peaks, the y-type ion identification algorithm correctly identified 14,552 (93.2%) based on detection of the NeuCode doublet; 6.8% were misclassified (i.e. other ion types that were assigned as y-type products). Searching NeuCode labeled yeast with PepNovo+ resulted in a 34% increase in correct de novo identifications relative to searching through MS/MS only. We use this tool to simplify spectra prior to database searching, to sort unmatched tandem mass spectra for spectral richness, for correlation of co-fragmented ions to their parent precursor, and for de novo sequence identification.

The abbreviations used are: ETD, electron transfer dissociation; FDR, false discovery rate; HCD, high-energy collision dissociation; MS, mass spectrometry; SILAC, stable isotope labeling with amino acids in cell culture.
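The doublet idea in the abstract above can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the function names (`find_doublets`, `split_by_terminus`), the 5 mDa matching tolerance, the assumption that the m/z spacing equals 36 mDa divided by the fragment charge, and the peak values in the usage note are all hypothetical choices made for illustration. Only the 36 mDa mass difference comes from the abstract.

```python
from bisect import bisect_left, bisect_right

# Mass difference between the two lysine isotopologues, as stated in the abstract.
NEUCODE_DELTA_DA = 0.036


def find_doublets(mz_values, charge=1, tol_da=0.005):
    """Return (lower, higher) m/z pairs spaced by the NeuCode delta.

    Assumption: in m/z space the spacing is NEUCODE_DELTA_DA / charge,
    and tol_da is an arbitrary illustrative tolerance.
    """
    peaks = sorted(mz_values)
    spacing = NEUCODE_DELTA_DA / charge
    pairs = []
    for mz in peaks:
        # Binary search for peaks inside the expected partner window.
        lo = bisect_left(peaks, mz + spacing - tol_da)
        hi = bisect_right(peaks, mz + spacing + tol_da)
        for partner in peaks[lo:hi]:
            pairs.append((mz, partner))
    return pairs


def split_by_terminus(mz_values, charge=1, tol_da=0.005):
    """Separate doublet peaks (candidate C-terminal ions) from single peaks."""
    pairs = find_doublets(mz_values, charge, tol_da)
    paired = {mz for pair in pairs for mz in pair}
    singles = [mz for mz in sorted(mz_values) if mz not in paired]
    return pairs, singles
```

For example, calling `split_by_terminus([175.119, 300.155, 300.191, 450.250, 601.300, 601.336])` on these made-up peaks would report the 300.155/300.191 and 601.300/601.336 pairs as candidate C-terminal (lysine-containing) ions and leave 175.119 and 450.250 as single peaks. Real spectra would also need intensity checks and noise handling, which this sketch omits.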

Sequencing-Grade De novo Analysis of MS/MS Triplets (CID/HCD/ETD) From Overlapping Peptides

Journal of Proteome Research, 2013

Full-length de novo sequencing of unknown proteins remains a challenging open problem. Traditional methods that sequence spectra individually are limited by short peptide length, incomplete peptide fragmentation, and ambiguous de novo interpretations. We address these issues by determining consensus sequences for assembled tandem mass (MS/MS) spectra from overlapping peptides (e.g., by using multiple enzymatic digests). We have combined electron-transfer dissociation (ETD) with collision-induced dissociation (CID) and higher-energy collision-induced dissociation (HCD) fragmentation methods to boost interpretation of long, highly charged peptides and take advantage of corroborating b/y/c/z ions in CID/HCD/ETD. Using these strategies, we show that triplet CID/HCD/ETD MS/MS spectra from overlapping peptides yield de novo sequences of average length 70 AA and as long as 200 AA at up to 99% sequencing accuracy.

The spectral networks paradigm in high throughput mass spectrometry

Molecular BioSystems, 2012

High-throughput proteomics is made possible by a combination of modern mass spectrometry instruments capable of generating many millions of tandem mass (MS(2)) spectra on a daily basis and the increasingly sophisticated associated software for their automated identification. Despite the growing accumulation of collections of identified spectra and the regular generation of MS(2) data from related peptides, the mainstream approach for peptide identification is still the nearly two decades old approach of matching one MS(2) spectrum at a time against a database of protein sequences. Moreover, database search tools overwhelmingly continue to require that users guess in advance a small set of 4-6 post-translational modifications that may be present in their data in order to avoid incurring substantial false positive and negative rates. The spectral networks paradigm for analysis of MS(2) spectra differs from the mainstream database search paradigm in three fundamental ways. First, spectral networks are based on matching spectra against other spectra instead of against protein sequences. Second, spectral networks find spectra from related peptides even before considering their possible identifications. Third, spectral networks determine consensus identifications from sets of spectra from related peptides instead of separately attempting to identify one spectrum at a time. Even though spectral networks algorithms are still in their infancy, they have already delivered the longest and most accurate de novo sequences to date, revealed a new route for the discovery of unexpected post-translational modifications and highly-modified peptides, enabled automated sequencing of cyclic non-ribosomal peptides with unknown amino acids and are now defining a novel approach for mapping the entire molecular output of biological systems that is suitable for analysis with tandem mass spectrometry. Here we review the current state of spectral networks algorithms and discuss possible future directions for automated interpretation of spectra from any class of molecules.
