SwedCAD, a Database of Annotated High-Mass Accuracy MS/MS Spectra of Tryptic Peptides
Related papers
Examining Troughs in the Mass Distribution of All Theoretically Possible Tryptic Peptides
Journal of Proteome Research, 2011
This work describes the mass distribution of all theoretically possible tryptic peptides made of 20 amino acids, up to a mass of 3 kDa, with a resolution of 0.001 Da. We characterize the regions between the peaks of the distribution, including gaps (forbidden zones) and sparsely populated areas (quiet zones). We show how the gaps shrink over the mass range and at what point they disappear completely. We demonstrate that peptide compositions in quiet zones are less diverse than those in the peaks of the distribution, and that eliminating certain types of unrealistic compositions can widen the gaps in the distribution. The mass distribution is generated using a parallel implementation of a recursive procedure that enumerates all amino acid compositions. This allows us to enumerate all compositions of tryptic peptides below 3 kDa in 48 minutes using a computer cluster with 12 Intel Xeon X5650 CPUs (72 cores). The results of this work can be used to facilitate protein identification and mass defect labeling in mass spectrometry-based proteomics experiments. Keywords: distribution of peptide masses; forbidden zones; quiet zones; amino acid compositions of all theoretically possible peptides; accurate peptide masses; mass accuracy
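The recursive enumeration mentioned in this abstract can be illustrated with a minimal, serial sketch. It is not the authors' parallel implementation. It assumes that a tryptic composition carries exactly one C-terminal K or R (no missed cleavages), treats L and I as a single isobaric residue, and uses standard monoisotopic residue masses; the function names `enumerate_compositions` and `mass_histogram` are hypothetical.

```python
from collections import Counter

# Standard monoisotopic residue masses in Da. L and I are isobaric,
# so only L is listed (an assumption of this sketch).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "N": 114.04293, "D": 115.02694, "Q": 128.05858, "E": 129.04259,
    "M": 131.04049, "H": 137.05891, "F": 147.06841, "Y": 163.06333,
    "W": 186.07931,
}
# Assumption: exactly one K or R per peptide, at the C-terminus.
C_TERMINAL_MASS = {"K": 128.09496, "R": 156.10111}
WATER = 18.01056


def enumerate_compositions(max_mass, residues=None):
    """Yield (composition, mass) for each tryptic composition up to max_mass.

    A composition is an unordered multiset of residues, written here as a
    string that starts with the C-terminal K or R.
    """
    items = sorted((residues or RESIDUE_MASS).items(), key=lambda kv: kv[1])

    def extend(start, chosen, mass):
        yield "".join(chosen), mass
        # Only indices >= start are tried, so each multiset appears once.
        for i in range(start, len(items)):
            name, residue_mass = items[i]
            if mass + residue_mass > max_mass:
                break  # items are sorted by mass; the rest are heavier
            chosen.append(name)
            yield from extend(i, chosen, mass + residue_mass)
            chosen.pop()

    for terminal, terminal_mass in C_TERMINAL_MASS.items():
        base = WATER + terminal_mass
        if base <= max_mass:
            yield from extend(0, [terminal], base)


def mass_histogram(max_mass, bin_width=0.001):
    """Count compositions per mass bin of the given width (in Da)."""
    return Counter(
        int(mass // bin_width) for _, mass in enumerate_compositions(max_mass)
    )
```

Calling `mass_histogram(500.0)` returns a `Counter` keyed by bin index; bins missing from the result correspond to masses that no composition in this simplified model can reach. The number of compositions grows very quickly with the mass limit, which is presumably why the work described above relies on a parallel implementation.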
A specialised proteomic database for comparing matrix-assisted laser desorption/ionization-time of flight mass spectrometry data of tryptic peptides with corresponding sequence database segments
Proteomics, 2001
We have developed a specialised proteomic database for the analysis of matrix-assisted laser desorption/ionization-time of flight mass spectrometry (MALDI-TOF MS) data derived from tryptic peptides of Sinorhizobium meliloti proteins. This database currently contains the amino acid sequence data of the proteins predicted from the complete chromosome, MALDI-TOF MS data from proteolytic peptides of about 400 tryptically digested proteins, and the results of a search of the MALDI-TOF MS spectra against the chromosomal amino acid sequences. The database made it possible to access and compare the sequences of theoretical tryptic peptides that correspond to MALDI-TOF peaks in the mass spectrum with predicted tryptic peptides from identified proteins that could not be matched to MALDI-TOF peaks. A comparison of the molecular weights, isoelectric points and amino acid compositions of the identified and nonidentified peptides is presented. We also show how the system can assist in the development of an automated scoring function that facilitates and consolidates protein identification.
Bioorganic & Medicinal Chemistry, 2006
Although genome databases have become the key for proteomic analyses, de novo sequencing remains essential for the study of organisms whose genomes have not been completed. In addition, post-translational modifications present a challenge in database searching. Recognition of the b- or y-ion series in a peptide MS/MS spectrum as well as identification of the b1- and y(n-1)-ions can facilitate de novo analyses. Therefore, it is valuable to identify either amino-acid terminus. In previous work, we have demonstrated that peptides modified at the ε-amino group of lysine as a t-butyl peroxycarbamate derivative undergo free radical promoted peptide backbone fragmentation under low-energy collision-induced dissociation (CID) conditions. Here we explore the chemistry of the N-terminal amino group modified as a t-butyl peroxycarbamate. The conversion of N-terminal amines to peroxycarbamates of simple amino acids and peptides was studied with aryl t-butyl peroxycarbonates. ESI-MS/MS analysis of the peroxycarbamate adducts gave evidence of a product ion corresponding to the neutral loss of the N-terminal side chain (R), thus identifying this residue. Further fragmentation (MS3) of product ions formed by N-terminal residue side-chain loss (-R) exhibited an m/z shift of the b-ions equal to the neutral loss of R, therefore labeling the b-ion series. The study was extended to the analysis of a protein tryptic digest where the SALSA algorithm was used to identify spectra containing these neutral losses. The method for N-terminus identification presented here has the potential for improvement of de novo analyses as well as in constraining peptide mass mapping database searches.
Prediction of Missed Cleavage Sites in Tryptic Peptides Aids Protein Identification in Proteomics
Journal of Proteome Research, 2007
Protein identification via peptide mass fingerprinting (PMF) remains a key component of high-throughput proteomics experiments in post-genomic science. Candidate protein identifications are made using bioinformatic tools from peptide peak lists obtained via mass spectrometry (MS). These algorithms rely on several search parameters, including the number of potential uncut peptide bonds matching the primary specificity of the hydrolytic enzyme used in the experiment. Typically, the bioinformatics search tools consider up to one of these "missed cleavages" per peptide, usually after in silico digestion of the proteome by trypsin. Using two distinct, non-redundant datasets of peptides identified via PMF and tandem MS, a simple predictive method based on information theory is presented which is able to identify experimentally defined missed cleavages with up to 90% accuracy from amino acid sequence alone. Using this simple protocol, we are able to "mask" candidate protein databases so that confident missed cleavage sites need not be considered for in silico digestion. We show that this leads to an improvement in database searching, with two different search engines, using the PMF dataset as a test set. In addition, the improved approach is also demonstrated on an independent PMF data set of known proteins which also has corresponding high-quality tandem MS data, validating the protein identifications. This approach has wider applicability for proteomics database searching, and the program for predicting missed cleavages and masking FASTA-formatted protein sequence databases has been made available via http://ispider.smith.man.acuk/MissedCleave
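To make the "missed cleavages" search parameter concrete, here is a minimal sketch of in silico tryptic digestion. It assumes the commonly used rule that trypsin cleaves C-terminal to K or R unless the next residue is P; it does not implement the information-theoretic predictor or the database masking described in the abstract. The function name `tryptic_digest` and its parameters are hypothetical.

```python
import re


def tryptic_digest(sequence, max_missed_cleavages=1):
    """Return tryptic peptides of `sequence`, allowing missed cleavages.

    Assumed rule: cleave after K or R, except when followed by P.
    """
    if not sequence:
        return []
    # Positions just after each cleavable K/R.
    sites = [m.end() for m in re.finditer(r"[KR](?!P)", sequence)]
    # Fragment boundaries; a site at the very end adds no new boundary.
    bounds = [0] + [s for s in sites if s < len(sequence)] + [len(sequence)]
    peptides = []
    for i in range(len(bounds) - 1):
        # Joining k + 1 adjacent fragments gives a peptide with k missed cleavages.
        last = min(i + 2 + max_missed_cleavages, len(bounds))
        for j in range(i + 1, last):
            peptides.append(sequence[bounds[i]:bounds[j]])
    return peptides
```

For example, `tryptic_digest("AKRPGKC", max_missed_cleavages=0)` gives the fully cleaved peptides `AK`, `RPGK` and `C`, while allowing one missed cleavage additionally produces `AKRPGK` and `RPGKC`. Each additional allowed missed cleavage enlarges the candidate list, which is the search-space cost that masking confidently predicted sites aims to reduce.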
Analytical Chemistry, 1991
The formation of multiply charged molecular ions via the field-assisted ion evaporation mechanism during electrospray ionization enables the use of an atmospheric pressure ionization quadrupole mass spectrometer system for characterizing biologically important peptides. The straightforward implementation of high-performance liquid chromatography (HPLC) into this new strategy to determine the molecular weight of tryptic peptides via the pneumatically assisted electrospray (ion spray) interface is presented. Examples utilizing both microbore (1.0 mm) and standard bore (4.6 mm) inside diameter columns are shown for the LC/MS molecular weight determination of tryptic peptides in methionyl-human growth hormone (met-hGH). Injected levels from 50 to 75 pmol of tryptic digest onto 1 mm i.d. HPLC columns provided full-scan LC/MS or LC/MS/MS results without postcolumn splitting of the effluent. When standard 4.6 mm i.d. HPLC columns were used, a 20:1 postcolumn split was utilized, which required from 1 to 5 nmol of injected tryptic digest for full-scan LC/MS or LC/MS/MS results. Collision-induced dissociation (CID) mass spectra resulting from either "infusion" or on-line LC/MS/MS analysis of the abundant doubly charged ions that predominate for tryptic peptides under electrospray conditions provided structurally useful sequence information for met-hGH and human hemoglobin tryptic digests. The slower mass spectrometer scan rate used during infusion of sample provides more accurate mass assignments than on-line LC/MS or LC/MS/MS, but the latter on-line experiments preclude ambiguities caused by matrix or component interferences. However, in some instances very weak CID product ions preclude complete tryptic peptide structural characterization based upon the CID data alone. The on-line LC/MS/MS analysis of the tryptic digest from human hemoglobin normal β-chain provided sufficient overlapping structural information to deduce the sequence of a representative tryptic fragment. This approach provides an effective means of characterizing these biologically important compounds.
Rapid Communications in Mass Spectrometry, 2014
Mass spectrometry has proven to be the most efficient tool for the sequencing of peptides. However, de novo sequencing of novel natural peptides is significantly more challenging than the same procedure applied to tryptic peptides. To reach this goal, it is essential to select the most useful methods of triggering fragmentation and to combine complementary techniques. Comparison of low-energy collision-induced dissociation (CID) and higher energy collision-induced dissociation (HCD) modes for sequencing of natural non-tryptic peptides with disulfide bonds and/or several proline residues in the backbone was achieved using an LTQ FT Ultra Fourier transform ion cyclotron resonance (FTICR) mass spectrometer (Thermo Fisher Scientific, Bremen, Germany) equipped with a 7 T magnet and an LTQ Orbitrap Velos ETD (Thermo Fisher Scientific, Bremen, Germany) instrument. Peptide fractions were obtained by high-performance liquid chromatography (HPLC) separation o...