Prediction of error associated with false-positive rate determination for peptide identification in large-scale proteomics experiments using a combined reverse and forward peptide sequence database strategy
Edward L Huttlin et al. J Proteome Res. 2007 Jan.
Abstract
In recent years, a variety of approaches have been developed using decoy databases to empirically assess the error associated with peptide identifications from large-scale proteomics experiments. We have developed an approach for calculating the expected uncertainty associated with false-positive rate determination using concatenated reverse and forward protein sequence databases. After explaining the theoretical basis of our model, we compare predicted error with the results of experiments characterizing a series of mixtures containing known proteins. In general, results from characterization of known proteins show good agreement with our predictions. Finally, we consider how these approaches may be applied to more complicated data sets, as when peptides are separated by charge state prior to false-positive determination.
Figures
Figure 1. Distribution of Forward Incorrect Identifications
The equation listed above expresses the probability that the number of incorrect forward identifications in a particular dataset equals a, given that n reversed peptides were identified. Solving this equation over a range of possible values of a defines a probability distribution. Plotted above is the probability distribution for the case where 3 reversed peptide identifications have been made. The 95% confidence interval is shaded in gray.
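The equation from Figure 1 is not reproduced in this text. The sketch below is therefore only an illustration under stated assumptions: each incorrect identification is taken to be equally likely to match a forward or a reversed sequence, and the total number of incorrect identifications is given a flat prior. Under those assumptions the number of incorrect forward identifications a, given n reversed identifications, follows a negative binomial distribution. The function names and the equal-tailed interval are hypothetical choices, not necessarily the authors' formulation.

```python
from math import comb


def forward_incorrect_pmf(a: int, n: int) -> float:
    """P(a incorrect forward IDs | n reversed IDs) under the ASSUMED model:
    forward/reversed matches equally likely, flat prior on the total.
    This is a negative binomial with r = n + 1 and p = 1/2."""
    return comb(n + a, a) * 0.5 ** (n + a + 1)


def equal_tailed_interval(n: int, level: float = 0.95, max_a: int = 10_000):
    """Smallest a whose cumulative probability reaches each tail cutoff."""
    tail = (1.0 - level) / 2.0
    cumulative = 0.0
    lower = None
    for a in range(max_a + 1):
        cumulative += forward_incorrect_pmf(a, n)
        if lower is None and cumulative >= tail:
            lower = a
        if cumulative >= 1.0 - tail:
            return lower, a
    raise ValueError("max_a too small for the requested level")


if __name__ == "__main__":
    n_reversed = 3  # the case plotted in Figure 1
    for a in range(10):
        print(a, round(forward_incorrect_pmf(a, n_reversed), 4))
    print("95% interval:", equal_tailed_interval(n_reversed))
```

Under this assumed model the mean of the distribution is n + 1 and its variance is 2(n + 1), so the relative uncertainty shrinks as more reversed identifications are observed.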
Figure 2. Number of Incorrect Forward Peptide Identifications: Predicted versus Observed
Each of several mixtures of control proteins was analyzed via mass spectrometry, and peptides were identified by searching a composite database containing the forward and reversed sequences of all proteins in Arabidopsis, as well as the sequences of the selected control proteins. Numbers of incorrect peptide identifications against both the forward and reversed protein sequences were determined at an estimated 1% false positive rate based on numbers of reversed peptide identifications. The number of reversed peptide identifications was then used to predict the number of incorrect forward peptide identifications, using the equation in Figure 1. Plotted above are the predicted numbers of incorrect forward peptide identifications as a function of the numbers of incorrect forward peptide identifications that were actually observed. Circles represent the average, while error bars represent ± one standard deviation, as determined by the appropriate probability distribution. White circles represent each of several separate analyses of control proteins, while black circles represent the peptide identifications from these same analyses, combined in random order to generate datasets of varying sizes. The diagonal line represents the ideal case where the predicted number of incorrect forward peptide identifications exactly equals the observed number.
Figure 3. Comparison of Predicted versus Actual False Positive Rates
Plotted above is the actual false positive rate as a function of the number of reversed peptide identifications. The solid lines indicate the upper and lower boundaries of the 95% confidence intervals (CI) for 1-100 reversed peptide identifications, assuming an estimated 1% false positive rate. A cross-section for 3 reversed peptide identifications (dotted line) is included as an inset. Also plotted are the actual false positive rates as a function of the number of reversed peptide identifications at a 1% estimated false positive rate for several analyses of control proteins individually (white circles) and when added in random order to generate datasets of varying sizes (black circles).
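As a rough illustration of how confidence bounds like those in Figure 3 could be tabulated, the sketch below makes two assumptions that are not stated in this text. First, the estimated false positive rate is taken to be the number of reversed identifications divided by the number of accepted forward identifications. Second, the number of incorrect forward identifications is taken to follow a negative binomial distribution (forward and reversed matches equally likely, flat prior on the total). All function names are hypothetical.

```python
from math import comb


def forward_incorrect_pmf(a: int, n: int) -> float:
    """ASSUMED model: negative binomial with r = n + 1, p = 1/2."""
    return comb(n + a, a) * 0.5 ** (n + a + 1)


def count_interval(n: int, level: float = 0.95, max_a: int = 10_000):
    """Equal-tailed interval on the number of incorrect forward IDs."""
    tail = (1.0 - level) / 2.0
    cumulative = 0.0
    lower = None
    for a in range(max_a + 1):
        cumulative += forward_incorrect_pmf(a, n)
        if lower is None and cumulative >= tail:
            lower = a
        if cumulative >= 1.0 - tail:
            return lower, a
    raise ValueError("max_a too small for the requested level")


def actual_rate_bounds(n: int, estimated_rate: float = 0.01):
    """Bounds on the actual false positive rate for n reversed IDs.

    ASSUMPTION: estimated_rate = n / forward_total, so the number of
    accepted forward IDs is n / estimated_rate and the actual rate is
    a / forward_total = a * estimated_rate / n.
    """
    lower_count, upper_count = count_interval(n)
    scale = estimated_rate / n
    return lower_count * scale, upper_count * scale


if __name__ == "__main__":
    for n in (1, 3, 10, 30, 100):
        low, high = actual_rate_bounds(n)
        print(f"n={n:3d}  actual rate between {low:.4f} and {high:.4f}")
```

Under these assumptions the interval is wide when only a few reversed identifications are available and narrows as their number grows, which matches the qualitative behavior the caption describes.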