Prediction of error associated with false-positive rate determination for peptide identification in large-scale proteomics experiments using a combined reverse and forward peptide sequence database strategy

Edward L Huttlin et al. J Proteome Res. 2007 Jan.

Abstract

In recent years, a variety of approaches have been developed using decoy databases to empirically assess the error associated with peptide identifications from large-scale proteomics experiments. We have developed an approach for calculating the expected uncertainty associated with false-positive rate determination using concatenated reverse and forward protein sequence databases. After explaining the theoretical basis of our model, we compare predicted error with the results of experiments characterizing a series of mixtures containing known proteins. In general, results from characterization of known proteins show good agreement with our predictions. Finally, we consider how these approaches may be applied to more complicated data sets, as when peptides are separated by charge state prior to false-positive determination.

Figures

Figure 1. Distribution of Forward Incorrect Identifications

The equation listed above expresses the probability that a particular dataset contains a incorrect forward identifications (where a is a count), given that n reversed peptides were identified. When this equation is solved for a range of possible values of a, a probability distribution is defined. Plotted above is the probability distribution for the case where 3 reversed peptide identifications have been made. The 95% confidence interval is shaded in gray.
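The equation itself is not reproduced in this text. As an illustration only, the sketch below assumes one model that is consistent with the caption: every incorrect identification is equally likely to match a forward or a reversed sequence, and no prior preference is placed on a. Under those assumptions the distribution is P(a | n) = C(a + n, n) (1/2)^(a + n + 1). The function names `forward_incorrect_pmf` and `central_interval` are hypothetical, and the equal-tailed interval is one possible reading of "95% confidence interval", not necessarily the authors' construction.

```python
from math import comb


def forward_incorrect_pmf(a: int, n: int) -> float:
    """P(a | n) under the ASSUMED model: C(a+n, n) * (1/2)**(a+n+1).

    a: number of incorrect forward identifications
    n: number of reversed identifications observed
    """
    # Integer arithmetic keeps the ratio exact until the final division.
    return comb(a + n, n) / 2 ** (a + n + 1)


def central_interval(n: int, level: float = 0.95, a_max: int = 2000):
    """Equal-tailed interval (lower, upper) for a, given n (assumed construction)."""
    tail = (1.0 - level) / 2.0
    cumulative = 0.0
    lower = None
    for a in range(a_max + 1):
        cumulative += forward_incorrect_pmf(a, n)
        if lower is None and cumulative > tail:
            lower = a  # first a whose cumulative mass exceeds the lower tail
        if cumulative >= 1.0 - tail:
            return lower, a  # first a that leaves at most `tail` above it
    raise ValueError("a_max is too small for this n and level")


if __name__ == "__main__":
    n_reversed = 3  # the case described in the Figure 1 caption
    for a in range(13):
        print(a, round(forward_incorrect_pmf(a, n_reversed), 4))
    print("interval:", central_interval(n_reversed))
```

Under this assumed model the distribution is a negative binomial, so it is right-skewed for small n, which is why the interval around a small count of reversed identifications is wide relative to the count itself.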

Figure 2. Number of Incorrect Forward Peptide Identifications: Predicted versus Observed

Each of several mixtures of control proteins was analyzed via mass spectrometry, and peptides were identified by searching a composite database containing the forward and reversed sequences of all proteins in Arabidopsis, as well as the sequences of the selected control proteins. Numbers of incorrect peptide identifications against both the forward and reversed protein sequences were determined at an estimated 1% false positive rate based on numbers of reversed peptide identifications. The number of reversed peptide identifications was then used to predict the number of forward incorrect peptide identifications, using the equation in Figure 1. Plotted above are the predicted numbers of incorrect forward peptide identifications as a function of the number of incorrect forward peptide identifications that were actually observed. Circles represent the average, while error bars represent +/- one standard deviation, as determined by the appropriate probability distribution. White circles represent each of several separate analyses of control proteins, while black circles represent the peptide identifications from these same analyses, combined in random order to generate datasets of varying sizes. The diagonal line represents the ideal case where the predicted number of incorrect forward peptide identifications exactly equals the observed number.
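The predicted average and standard deviation described in this caption can be sketched numerically. The Figure 1 equation is not reproduced in this text, so the code assumes, purely for illustration, that P(a | n) = C(a + n, n) (1/2)^(a + n + 1), which follows if each incorrect identification is equally likely to match a forward or a reversed sequence. The function name `predicted_mean_sd` and the truncation point `a_max` are hypothetical choices.

```python
from math import comb, sqrt


def predicted_mean_sd(n: int, a_max: int = 2000):
    """Mean and standard deviation of a given n, by direct summation.

    Uses the ASSUMED distribution P(a | n) = C(a+n, n) * (1/2)**(a+n+1),
    truncated at a_max (the neglected tail is tiny for moderate n).
    """
    mean = 0.0
    second_moment = 0.0
    for a in range(a_max + 1):
        p = comb(a + n, n) / 2 ** (a + n + 1)
        mean += a * p
        second_moment += a * a * p
    return mean, sqrt(second_moment - mean * mean)


if __name__ == "__main__":
    for n in (1, 3, 10, 30, 100):
        mean, sd = predicted_mean_sd(n)
        # Closed forms for this assumed negative binomial: n + 1 and sqrt(2 * (n + 1)).
        print(n, round(mean, 3), round(sd, 3), n + 1, round(sqrt(2 * (n + 1)), 3))
```

Under this assumption the predicted number of incorrect forward identifications grows linearly with n while its standard deviation grows only as the square root, so the relative uncertainty shrinks as more reversed identifications accumulate.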

Figure 3. Comparison of Predicted versus Actual False Positive Rates

Plotted above is the actual false positive rate as a function of the number of reversed peptide identifications. The solid lines indicate the upper and lower boundaries of the 95% confidence intervals (CI) for 1-100 reversed peptide identifications, assuming an estimated 1% false positive rate. A cross-section for 3 reversed peptide identifications (dotted line) is included as an inset. Also plotted are the actual false positive rates as a function of the number of reversed peptide identifications at a 1% estimated false positive rate, for several analyses of control proteins individually (white circles) and when added in random order to generate datasets of varying sizes (black circles).
