Fast and accurate database searches with MS-GF+Percolator

J Proteome Res. 2014 Feb 7;13(2):890-7.

doi: 10.1021/pr400937n. Epub 2013 Dec 23.

Viktor Granholm et al. J Proteome Res. 2014.

Abstract

One can interpret fragmentation spectra stemming from peptides in mass-spectrometry-based proteomics experiments using so-called database search engines. Frequently, one also runs post-processors such as Percolator to assess the confidence, infer unique peptides, and increase the number of identifications. A recent search engine, MS-GF+, has shown promising results due to a new and efficient scoring algorithm. However, MS-GF+ provides few statistical estimates for its peptide-spectrum matches (PSMs), which limits their biological interpretation. Here, we enabled Percolator processing of MS-GF+ output and observed an increased number of identified peptides for a wide variety of data sets. In addition, Percolator directly reports p values and false discovery rate estimates, such as q values and posterior error probabilities, for PSMs, peptides, and proteins, functions that are useful to the whole proteomics community.

Conflict of interest statement

The authors declare no conflicting interests.

Figures

Figure 1. The covariation of two MS-GF+ decoy searches

MS-GF+ was run against a reversed decoy database (Decoy 1) and a shuffled decoy database (Decoy 2) generated with Mimic. For each peptide-spectrum match (PSM), the negative logarithms of the E values reported by MS-GF+ in the two searches are plotted against each other.
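The two decoy strategies and the score comparison in Figure 1 can be illustrated with a minimal Python sketch. It assumes protein sequences are plain strings keyed by accession. `reversed_decoy` reverses each sequence; `shuffled_decoy` is a naive per-protein shuffle used only as a stand-in and is explicitly not Mimic, the tool the caption names. `pearson_neg_log_e` is a hypothetical helper that measures how the negative log E values of the same PSMs covary between two searches (base 10 is assumed; the caption does not state the base).

```python
import math
import random
from typing import Dict, List


def reversed_decoy(proteins: Dict[str, str], prefix: str = "DECOY_") -> Dict[str, str]:
    """Return a decoy database in which every protein sequence is reversed."""
    return {prefix + name: seq[::-1] for name, seq in proteins.items()}


def shuffled_decoy(proteins: Dict[str, str], seed: int = 0, prefix: str = "DECOY_") -> Dict[str, str]:
    """Naive shuffled decoy: permute the residues within each protein.

    This is NOT Mimic; it only preserves the amino acid composition of each protein.
    """
    rng = random.Random(seed)
    decoys = {}
    for name, seq in proteins.items():
        residues = list(seq)
        rng.shuffle(residues)
        decoys[prefix + name] = "".join(residues)
    return decoys


def pearson_neg_log_e(e_values_1: List[float], e_values_2: List[float]) -> float:
    """Pearson correlation of -log10(E value) for the same PSMs in two searches."""
    if len(e_values_1) != len(e_values_2) or len(e_values_1) < 2:
        raise ValueError("need two equally long lists with at least two PSMs")
    x = [-math.log10(e) for e in e_values_1]
    y = [-math.log10(e) for e in e_values_2]
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    targets = {"P1": "MKTAYIAKQR", "P2": "GSHMLEDPVK"}
    print(reversed_decoy(targets))
    print(shuffled_decoy(targets, seed=42))
    # Hypothetical E values for four PSMs scored against Decoy 1 and Decoy 2.
    print(pearson_neg_log_e([1e-1, 1e-2, 1e-3, 1e-4], [2e-1, 1e-2, 2e-3, 1e-4]))
```

A correlation close to 1 would mean the two decoy databases rank spectra similarly, which is the kind of covariation the figure visualizes.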

Figure 2. Statistical calibration of null p values from MS-GF+Percolator

Using only incorrect matches from the Orbitrap spectra of ISB18 mix 7, p values were estimated from the Percolator score after post-processing the MS-GF+ results. The y axis shows the p values reported by Percolator, and the x axis a uniform rank from 0 to 1. The black line marks the diagonal x = y, and the gray dashed lines mark x = 2y and x = y/2. A Kolmogorov-Smirnov D value of 0.014 between the reported and ideal p values indicates only a small difference between the two distributions.
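The calibration check in Figure 2 can be sketched in Python under a few stated assumptions. `empirical_p_values` turns scores into p values by using decoy scores as the null distribution (higher score is better; the (count + 1) / (N + 1) convention is one common choice, assumed here rather than taken from Percolator's source). `ks_d_uniform` computes the one-sample Kolmogorov-Smirnov D statistic between a set of p values and the uniform distribution on [0, 1]. Applied to matches known to be incorrect, as with the ISB18 mix, well-calibrated p values should give a small D.

```python
import bisect
from typing import List, Sequence


def empirical_p_values(scores: Sequence[float], decoy_scores: Sequence[float]) -> List[float]:
    """p value of each score = fraction of null (decoy) scores at least as high.

    Uses the (count + 1) / (N + 1) convention so that no p value is exactly 0.
    """
    null = sorted(decoy_scores)
    n = len(null)
    p_values = []
    for s in scores:
        at_least_as_high = n - bisect.bisect_left(null, s)
        p_values.append((at_least_as_high + 1) / (n + 1))
    return p_values


def ks_d_uniform(p_values: Sequence[float]) -> float:
    """One-sample Kolmogorov-Smirnov D between p values and Uniform(0, 1)."""
    ordered = sorted(p_values)
    n = len(ordered)
    d = 0.0
    for i, p in enumerate(ordered, start=1):
        # Compare the empirical CDF just after (i / n) and just before ((i - 1) / n)
        # each observation with the uniform CDF, F(p) = p.
        d = max(d, i / n - p, p - (i - 1) / n)
    return d


if __name__ == "__main__":
    decoys = [0.1, 0.4, 0.2, 0.9, 0.5, 0.3, 0.7, 0.8, 0.6]
    print(empirical_p_values([0.95, 0.5, 0.05], decoys))  # → [0.1, 0.6, 1.0]
    print(ks_d_uniform([0.1, 0.3, 0.5, 0.7, 0.9]))
```

Plotting the sorted p values against uniform ranks gives the diagonal comparison shown in the figure; D is the largest vertical gap between the two cumulative distributions.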

Figure 3. MS-GF+Percolator performance on tryptic datasets

The numbers of accepted unique peptides are plotted as a function of the peptide-level q value threshold. Panel (A) shows the results for the human prostate cancer data, (B) the phosphorylated mouse data, and (C) the TMT-labeled iPRG data. Peptide-level confidence estimates from MS-GF+ were obtained with qvality, as described in Experimental Procedures.
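Curves like those in Figure 3 count the unique peptides accepted at each q value threshold. The Python sketch below shows one simple way to produce such counts from target-decoy results; it is not qvality and not Percolator's exact procedure. It assumes each PSM is a hypothetical `(peptide, score, is_decoy)` tuple with higher scores better, keeps the best-scoring PSM per peptide, estimates the FDR as decoys divided by targets above each score cutoff, and converts those FDRs into monotone q values. Tied scores are not treated specially.

```python
from typing import Dict, List, Sequence, Tuple

Psm = Tuple[str, float, bool]  # (peptide sequence, score, is_decoy); hypothetical layout


def best_psm_per_peptide(psms: Sequence[Psm]) -> List[Psm]:
    """Keep only the highest-scoring PSM for each (peptide, is_decoy) pair."""
    best: Dict[Tuple[str, bool], Psm] = {}
    for psm in psms:
        key = (psm[0], psm[2])
        if key not in best or psm[1] > best[key][1]:
            best[key] = psm
    return list(best.values())


def target_q_values(entries: Sequence[Psm]) -> List[Tuple[str, float]]:
    """Return (peptide, q value) for each target entry, with FDR = decoys / targets."""
    ranked = sorted(entries, key=lambda e: e[1], reverse=True)
    fdrs = []
    targets = decoys = 0
    for _, _, is_decoy in ranked:
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        fdrs.append(decoys / max(targets, 1))
    # A q value is the smallest FDR at which the entry is still accepted, so take
    # the running minimum while walking from the lowest score upwards.
    q_values = [0.0] * len(fdrs)
    running_min = float("inf")
    for i in range(len(fdrs) - 1, -1, -1):
        running_min = min(running_min, fdrs[i])
        q_values[i] = running_min
    return [(e[0], q) for e, q in zip(ranked, q_values) if not e[2]]


def accepted_peptides(psms: Sequence[Psm], thresholds: Sequence[float]) -> List[int]:
    """Number of unique target peptides with q value <= each threshold."""
    peptide_q = target_q_values(best_psm_per_peptide(psms))
    return [sum(1 for _, q in peptide_q if q <= t) for t in thresholds]


if __name__ == "__main__":
    # Hypothetical PSMs; the second PEPTIDEK match is dropped as a weaker duplicate.
    psms = [
        ("PEPTIDEK", 10.0, False), ("PEPTIDEK", 8.0, False), ("AAAK", 9.0, False),
        ("KDIT", 7.0, True), ("LLLR", 6.0, False), ("RLLL", 5.0, True),
    ]
    print(accepted_peptides(psms, [0.01, 0.5]))  # → [2, 3]
```

Evaluating `accepted_peptides` over a fine grid of thresholds yields the x and y values of a curve of this type.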

Figure 4. The performance of MS-GF+Percolator on the yeast data digested by multiple enzymes

The number of accepted unique peptides at different peptide-level q value thresholds for each of the enzymes used for digestion: (A) Arg-C, (B) Asp-N, (C) Glu-C, (D) Lys-C, and (E) trypsin. Peptide-level confidence estimates from MS-GF+ were obtained with qvality, as described in Experimental Procedures.
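To see why the five proteases in Figure 4 produce different peptide populations, the Python sketch below performs a fully specific in-silico digest with no missed cleavages. The cleavage rules in `ENZYMES` are simplified assumptions for illustration, not the settings used in the paper's searches: Glu-C cuts only after E, Asp-N cuts before D, and only trypsin is blocked by a following proline. `digest` is a hypothetical helper, not part of MS-GF+ or Percolator.

```python
from typing import Dict, List, Tuple

# Simplified cleavage rules (assumed for illustration):
#   residues - residues that define the cleavage site
#   c_term   - True: cut after the residue; False: cut before it
#   block_p  - True: no cut when the residue after the cut site is proline
ENZYMES: Dict[str, Tuple[str, bool, bool]] = {
    "Trypsin": ("KR", True, True),
    "Lys-C": ("K", True, False),
    "Arg-C": ("R", True, False),
    "Glu-C": ("E", True, False),
    "Asp-N": ("D", False, False),
}


def digest(sequence: str, enzyme: str, min_length: int = 1) -> List[str]:
    """Fully specific in-silico digest with no missed cleavages."""
    residues, c_term, block_p = ENZYMES[enzyme]
    cuts = [0]
    for i, aa in enumerate(sequence):
        if aa not in residues:
            continue
        pos = i + 1 if c_term else i  # index where the next peptide starts
        if block_p and pos < len(sequence) and sequence[pos] == "P":
            continue
        if 0 < pos < len(sequence):
            cuts.append(pos)
    cuts.append(len(sequence))
    peptides = [sequence[a:b] for a, b in zip(cuts, cuts[1:])]
    return [p for p in peptides if len(p) >= min_length]


if __name__ == "__main__":
    protein = "MKDPEKRPLEGDR"  # hypothetical sequence
    for name in ENZYMES:
        print(name, digest(protein, name))
```

For the example sequence, trypsin gives `MK`, `DPEK`, and `RPLEGDR` (the R before P is skipped), while Asp-N gives `MK`, `DPEKRPLEG`, and `DR`.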
