Prediction of error associated with false-positive rate determination for peptide identification in large-scale proteomics experiments using a combined reverse and forward peptide sequence database strategy

Edward L Huttlin et al. J Proteome Res. 2007 Jan.

Abstract

In recent years, a variety of approaches have been developed using decoy databases to empirically assess the error associated with peptide identifications from large-scale proteomics experiments. We have developed an approach for calculating the expected uncertainty associated with false-positive rate determination using concatenated reverse and forward protein sequence databases. After explaining the theoretical basis of our model, we compare predicted error with the results of experiments characterizing a series of mixtures containing known proteins. In general, results from characterization of known proteins show good agreement with our predictions. Finally, we consider how these approaches may be applied to more complicated data sets, as when peptides are separated by charge state prior to false-positive determination.

Figures

Figure 1. Distribution of Forward Incorrect Identifications

The equation listed above expresses the probability that a particular dataset contains exactly a incorrect forward identifications (where a is a count), given that n reversed peptides were identified. Solving this equation over a range of possible values of a defines a probability distribution. Plotted above is the probability distribution for the case where 3 reversed peptide identifications have been made. The 95% confidence interval is shaded in gray.
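The equation itself is not reproduced in this text, so the following is only a sketch of one plausible form, not the authors' formula. It assumes that every incorrect match is equally likely to land on a forward or a reversed sequence and that no prior knowledge favors any total number of incorrect matches; under those assumptions the count a follows a negative binomial distribution, P(a | n) = C(n + a, a) * (1/2)^(n + a + 1). The function names and the truncation limit are hypothetical choices made for illustration.

```python
from math import comb, sqrt


def forward_incorrect_pmf(a: int, n: int) -> float:
    """P(a incorrect forward IDs | n reversed IDs) under the ASSUMED model:
    each incorrect match hits forward or reversed sequences with probability 1/2."""
    return comb(n + a, a) * 0.5 ** (n + a + 1)


def summarize(n: int, level: float = 0.95):
    """Return (mean, standard deviation, central interval) of the assumed distribution."""
    # Truncate the infinite support far into the tail; the cutoff is an arbitrary choice.
    a_max = 10 * (n + 1) + 50
    probs = [forward_incorrect_pmf(a, n) for a in range(a_max + 1)]

    mean = sum(a * p for a, p in enumerate(probs))
    var = sum((a - mean) ** 2 * p for a, p in enumerate(probs))

    # Central interval: smallest a whose cumulative probability reaches each tail quantile.
    lo_q = (1.0 - level) / 2.0
    hi_q = 1.0 - lo_q
    cum = 0.0
    lower = upper = None
    for a, p in enumerate(probs):
        cum += p
        if lower is None and cum >= lo_q:
            lower = a
        if cum >= hi_q:
            upper = a
            break
    return mean, sqrt(var), (lower, upper)


if __name__ == "__main__":
    mean, sd, interval = summarize(3)
    print(f"n=3: mean={mean:.3f}, sd={sd:.3f}, 95% interval={interval}")
```

Under this assumed model the mean is n + 1 and the variance is 2(n + 1), so small numbers of reversed identifications give wide relative uncertainty.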

Figure 2. Number of Incorrect Forward Peptide Identifications: Predicted versus Observed

Each of several mixtures of control proteins was analyzed via mass spectrometry, and peptides were identified by searching a composite database containing the forward and reversed sequences of all proteins in Arabidopsis, as well as the sequences for the selected control proteins. Numbers of incorrect peptide identifications against both the forward and reversed protein sequences were determined at an estimated 1% false positive rate based on numbers of reversed peptide identifications. The number of reversed peptide identifications was then used to predict the number of forward incorrect peptide identifications, using the equation in Figure 1. Plotted above are the predicted numbers of incorrect forward peptide identifications as a function of the number of incorrect forward peptide identifications that were actually observed. Circles represent the average, while error bars represent ± one standard deviation, as determined by the appropriate probability distribution. White circles represent each of several separate analyses of control proteins, while the black circles represent the peptide identifications from these same analyses, combined in random order to generate datasets of varying sizes. The diagonal line represents the ideal case where the predicted number of incorrect forward peptide identifications exactly equals the observed number of incorrect forward identifications.

Figure 3. Comparison of Predicted versus Actual False Positive Rates

Plotted above is the actual false positive rate as a function of the number of reversed peptide identifications. The solid lines indicate the upper and lower boundaries of the 95% confidence intervals (CI) for 1-100 reversed peptide identifications, assuming an estimated 1% false positive rate. A cross-section for 3 reversed peptide identifications (dotted line) is included as an inset. Also plotted are the actual false positive rates as a function of numbers of reversed peptide identifications at a 1% estimated false positive rate for several analyses of control proteins individually (white circles) and when added in random order to generate datasets of varying sizes (black circles).
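To illustrate how confidence bounds of this kind could be computed, here is a rough sketch built on two assumptions that are not confirmed by the text. First, the number of incorrect forward identifications a, given n reversed identifications, is taken to follow P(a | n) = C(n + a, a) * (1/2)^(n + a + 1), which holds if incorrect matches fall on forward and reversed sequences with equal probability. Second, the estimated false positive rate is taken to be the ratio of reversed to forward identifications, so that a 1% estimate with n reversed hits implies 100n forward identifications. All function names are hypothetical.

```python
from math import comb


def pmf(a: int, n: int) -> float:
    """ASSUMED model: probability of a incorrect forward IDs given n reversed IDs."""
    return comb(n + a, a) * 0.5 ** (n + a + 1)


def central_interval(n: int, level: float = 0.95):
    """Smallest counts at which the cumulative probability reaches each tail quantile."""
    lo_q = (1.0 - level) / 2.0
    hi_q = 1.0 - lo_q
    cum = 0.0
    lower = None
    a = 0
    while True:
        cum += pmf(a, n)
        if lower is None and cum >= lo_q:
            lower = a
        if cum >= hi_q:
            return lower, a
        a += 1


def actual_fpr_bounds(n: int, estimated_fpr: float = 0.01):
    """Bounds on the actual false positive rate for n >= 1 reversed IDs.

    ASSUMPTION: estimated_fpr = reversed IDs / forward IDs, so the number of
    forward identifications is n / estimated_fpr.
    """
    forward_total = n / estimated_fpr
    lower, upper = central_interval(n)
    return lower / forward_total, upper / forward_total


if __name__ == "__main__":
    for n in (1, 3, 10, 100):
        low, high = actual_fpr_bounds(n)
        print(f"n={n:3d}: actual FPR between {low:.4f} and {high:.4f}")
```

Under these assumptions the interval narrows toward the estimated rate as the number of reversed identifications grows, which matches the qualitative behavior the caption describes.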

Cited by

Proteome-wide identification of novel binding partners to the oncogenic fusion gene protein, NPM-ALK, using tandem affinity purification and mass spectrometry.
Wu F, Wang P, Young LC, Lai R, Li L. Am J Pathol. 2009 Feb;174(2):361-70. doi: 10.2353/ajpath.2009.080521. Epub 2009 Jan 8. PMID: 19131589 Free PMC article.
The Presence of Pretreated Lignocellulosic Solids from Birch during Saccharomyces cerevisiae Fermentations Leads to Increased Tolerance to Inhibitors--A Proteomic Study of the Effects.
Koppram R, Mapelli V, Albers E, Olsson L. PLoS One. 2016 Feb 5;11(2):e0148635. doi: 10.1371/journal.pone.0148635. eCollection 2016. PMID: 26849651 Free PMC article.
Informatics strategies for large-scale novel cross-linking analysis.
Anderson GA, Tolic N, Tang X, Zheng C, Bruce JE. J Proteome Res. 2007 Sep;6(9):3412-21. doi: 10.1021/pr070035z. Epub 2007 Aug 3. PMID: 17676784 Free PMC article.
Global topology analysis of pancreatic zymogen granule membrane proteins.
Chen X, Ulintz PJ, Simon ES, Williams JA, Andrews PC. Mol Cell Proteomics. 2008 Dec;7(12):2323-36. doi: 10.1074/mcp.M700575-MCP200. Epub 2008 Aug 4. PMID: 18682380 Free PMC article.
Environmental proteomics, biodiversity statistics and food-web structure.
Gotelli NJ, Ellison AM, Ballif BA. Trends Ecol Evol. 2012 Aug;27(8):436-42. doi: 10.1016/j.tree.2012.03.001. Epub 2012 Mar 27. PMID: 22459246 Free PMC article.
