An empirical framework for binary interactome mapping

doi: 10.1038/nmeth.1280. Epub 2008 Dec 7.

Jean-François Rual, Alexei Vazquez, Ulrich Stelzl, Irma Lemmens, Tomoko Hirozane-Kishikawa, Tong Hao, Martina Zenkner, Xiaofeng Xin, Kwang-Il Goh, Muhammed A Yildirim, Nicolas Simonis, Kathrin Heinzmann, Fana Gebreab, Julie M Sahalie, Sebiha Cevik, Christophe Simon, Anne-Sophie de Smet, Elizabeth Dann, Alex Smolyar, Arunachalam Vinayagam, Haiyuan Yu, David Szeto, Heather Borick, Amélie Dricot, Niels Klitgord, Ryan R Murray, Chenwei Lin, Maciej Lalowski, Jan Timm, Kirstin Rau, Charles Boone, Pascal Braun, Michael E Cusick, Frederick P Roth, David E Hill, Jan Tavernier, Erich E Wanker, Albert-László Barabási, Marc Vidal

Kavitha Venkatesan et al. Nat Methods. 2009 Jan.

Abstract

Several attempts have been made to systematically map protein-protein interaction, or 'interactome', networks. However, it remains difficult to assess the quality and coverage of existing data sets. Here we describe a framework that uses an empirically-based approach to rigorously dissect quality parameters of currently available human interactome maps. Our results indicate that high-throughput yeast two-hybrid (HT-Y2H) interactions for human proteins are more precise than literature-curated interactions supported by a single publication, suggesting that HT-Y2H is suitable to map a significant portion of the human interactome. We estimate that the human interactome contains approximately 130,000 binary interactions, most of which remain to be mapped. Similar to estimates of DNA sequence data quality and genome size early in the Human Genome Project, estimates of protein interaction data quality and interactome size are crucial to establish the magnitude of the task of comprehensive human interactome mapping and to elucidate a path toward this goal.

Figures

Figure 1. Conceptual framework for interactome mapping

The concepts of “screening completeness” (fraction of all pair-wise protein combinations tested), “assay sensitivity” (fraction of all biophysical interactions identifiable by a given assay), “sampling sensitivity” (fraction of all identifiable interactions that are detected in a single trial) and “precision” (fraction of pairs reported by a given assay that are true positives) can be estimated independently and combined to empirically estimate the size of binary interactomes. PRS: the set of positive reference set interactions; RRS: the random reference set. Solid black lines in a given network graph represent true biophysical interactions present in that network, dashed lines represent true biophysical interactions missing in that network, and solid colored lines represent biophysical artifactual pairs present in that network.
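
One way to read the combination described in this legend is as a scale-up: the true positives among the reported pairs are divided by the fraction of the interactome that the experiment could have detected. The following is a minimal sketch under that assumption, not the authors' published calculation. The function name, argument names, and the numbers in the usage example are hypothetical.

```python
def estimate_interactome_size(n_reported, precision, completeness,
                              assay_sensitivity, sampling_sensitivity):
    """Scale a count of reported pairs up to an estimated interactome size.

    Assumes the four parameters are independent fractions, as the legend
    suggests, and that they combine multiplicatively.
    """
    if not 0.0 <= precision <= 1.0:
        raise ValueError("precision must lie in [0, 1]")
    for name, value in (("completeness", completeness),
                        ("assay_sensitivity", assay_sensitivity),
                        ("sampling_sensitivity", sampling_sensitivity)):
        if not 0.0 < value <= 1.0:
            raise ValueError(f"{name} must lie in (0, 1]")

    # Reported pairs that are expected to be true biophysical interactions.
    true_positives = n_reported * precision
    # Fraction of all true interactions the screen could have reported.
    detected_fraction = completeness * assay_sensitivity * sampling_sensitivity
    return true_positives / detected_fraction


if __name__ == "__main__":
    # Hypothetical inputs chosen only to exercise the function.
    size = estimate_interactome_size(
        n_reported=1000, precision=0.8, completeness=0.1,
        assay_sensitivity=0.2, sampling_sensitivity=0.5)
    print(round(size))
```

Keeping the four fractions as separate arguments mirrors the legend's point that each can be estimated independently before being combined.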

Figure 2. Assay sensitivity and background positive rate of binary interactome mapping assays

(a) How positive reference set interactions were chosen from among the interactions available in the curated literature of low-throughput experimentally derived interactions (LC). (b) How random reference set pairs were chosen from among the possible pairs in our human ORFeome v1.1 clone collection. (c) Distribution of cellular location of proteins making up the positive and random reference sets. (d) Assay sensitivity (fraction of hsPRS-v1 pairs scoring positive) and background positive rate (fraction of hsRRS-v1 pairs scoring positive) of the Y2H-CCSB assay based on varying experimental and scoring conditions, including the use of an alternate protocol (Supplementary Methods). We did not use the results of testing the hsRRS-v1 pairs here to estimate the false discovery rate of the Y2H-CCSB assay due to limited sample size. (e) Assay sensitivity and background positive rate of the MAPPIT assay upon varying experiment-to-control-ratio (ECR) scores (Supplementary Methods). (f) Upper panel: assay sensitivity and background positive rate of Y2H-CCSB and MAPPIT under the specific experimental conditions used (Supplementary Methods). For Y2H-CCSB, the fraction of hsPRS-v1 pairs scoring positive in at least one configuration and in both pair-wise mating experiments is depicted. This condition reflects the assay sensitivity of the specific experimental and scoring conditions of Y2H-CCSB used to generate CCSB-HI1. Lower panel: Venn diagram of hsPRS-v1 pairs scoring positive in the two assays. (g) Results of testing each hsPRS-v1 pair and each hsRRS-v1 pair using Y2H-CCSB and MAPPIT. Blue or yellow shaded squares represent protein pairs scored positive by a given assay.
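
The two quantities defined in panel d are simple fractions over reference sets, and the lower panel of f is a set comparison between two assays. The sketch below illustrates both, assuming each assay's results are stored as a mapping from a protein pair to a boolean score. The function names and the pair identifiers are hypothetical placeholders, not data from the study.

```python
def positive_rate(results):
    """Fraction of tested pairs that scored positive.

    Applied to a positive reference set this gives assay sensitivity;
    applied to a random reference set it gives the background positive rate.
    """
    if not results:
        raise ValueError("no pairs were tested")
    return sum(1 for positive in results.values() if positive) / len(results)


def venn_counts(assay_a, assay_b):
    """Count pairs positive in only one assay or in both.

    Only pairs tested in both assays are compared.
    """
    shared = set(assay_a) & set(assay_b)
    both = sum(1 for p in shared if assay_a[p] and assay_b[p])
    only_a = sum(1 for p in shared if assay_a[p] and not assay_b[p])
    only_b = sum(1 for p in shared if assay_b[p] and not assay_a[p])
    return {"only_a": only_a, "both": both, "only_b": only_b}


if __name__ == "__main__":
    # Hypothetical scores for four placeholder pairs in two assays.
    y2h = {("A", "B"): True, ("C", "D"): False,
           ("E", "F"): True, ("G", "H"): False}
    mappit = {("A", "B"): True, ("C", "D"): True,
              ("E", "F"): False, ("G", "H"): False}
    print(positive_rate(y2h), positive_rate(mappit))
    print(venn_counts(y2h, mappit))
```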

Figure 3. Precision and sampling sensitivity in interactome datasets

(a) Comparison of interactome datasets by comparing the rate of observing a positive by MAPPIT given a positive in the dataset. (b) Interactome datasets were further compared after removing various biases by considering interactions originally derived using full-length (FL) proteins and using Y2H assays. (c) Precision of each tested dataset computed by accounting for the rate of detecting hsRRS-v1 pairs and Y2H-supported hsPRS-v1 pairs by MAPPIT in b. Error bars represent estimated standard deviation of the mean based on a Monte Carlo simulation of scores observed in a given assay. (d and e) Sampling sensitivity and Y2H-CCSB repeat screens. Bars filled with white represent protein pairs uncovered in only one screen and progressively dark shades of blue represent protein pairs reported in increasing number of multiple screens. (d) Data observed in Y2H-CCSB repeat screens indicating the total number of positive pairs reported after one, two, three or four screens. (e) Predicted saturation curve of the number of uncovered interactions against the number of screens for Y2H-CCSB after modeling the data in d and assuming a single isoform per gene in the respective tested spaces.
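
Panel c corrects a raw retest rate using the rates at which the positive and random reference sets score in the same assay. A common way to do this, assumed here rather than taken from the paper, is to treat the retest rate of a dataset as a mixture: precision × (PRS rate) + (1 − precision) × (RRS rate), and solve for precision. The spread is sketched with a simple binomial resampling of each count, which is one possible Monte Carlo scheme and not necessarily the one the authors used. All names and numbers are hypothetical.

```python
import random
import statistics


def precision_from_rates(dataset_rate, prs_rate, rrs_rate):
    """Solve dataset_rate = p * prs_rate + (1 - p) * rrs_rate for p."""
    if prs_rate <= rrs_rate:
        raise ValueError("PRS rate must exceed RRS rate")
    return (dataset_rate - rrs_rate) / (prs_rate - rrs_rate)


def _resample_rate(rng, positives, total):
    """Draw a new positive fraction from a binomial with the observed rate."""
    p = positives / total
    return sum(rng.random() < p for _ in range(total)) / total


def precision_with_spread(dataset, prs, rrs, n_draws=2000, seed=0):
    """Return (point estimate, standard deviation across resampled draws).

    Each of dataset, prs and rrs is a (positives, total) tuple.
    """
    point = precision_from_rates(dataset[0] / dataset[1],
                                 prs[0] / prs[1], rrs[0] / rrs[1])
    rng = random.Random(seed)
    draws = []
    for _ in range(n_draws):
        try:
            draws.append(precision_from_rates(_resample_rate(rng, *dataset),
                                              _resample_rate(rng, *prs),
                                              _resample_rate(rng, *rrs)))
        except ValueError:
            continue  # skip draws where the correction is undefined
    spread = statistics.stdev(draws) if len(draws) > 1 else float("nan")
    return point, spread


if __name__ == "__main__":
    # Hypothetical counts: (pairs scoring positive, pairs tested).
    print(precision_with_spread((20, 100), (30, 100), (1, 100)))
```

The point estimate is not clamped, so noisy inputs can yield values slightly outside [0, 1]; whether to clamp is a reporting choice left open here.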

Figure 4. Correlation of interacting pairs for shared functional annotation

Correlation of interacting pairs in CCSB-HI1 and MDC-HI1 interactome maps for specific shared Gene Ontology functional annotations. P-values indicate the probability of observing such a correlation by chance (compare black bars to white bars) computed using Fisher’s exact test. Analysis was performed on MDC-HI1 and CCSB-HI1 interactions reported using full-length ORFs.
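
Fisher's exact test on a 2×2 table can be computed directly from the hypergeometric distribution. The sketch below assumes a one-sided test for enrichment and a table whose rows are interacting versus non-interacting pairs and whose columns are pairs that share versus do not share an annotation. The legend does not state the sidedness or the table layout, so both are assumptions, and the counts in the usage example are hypothetical.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    a: interacting pairs sharing an annotation
    b: interacting pairs not sharing it
    c: non-interacting pairs sharing it
    d: non-interacting pairs not sharing it
    Returns the probability of observing a or more in the top-left cell
    when all row and column totals are held fixed.
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    k_max = min(row1, col1)
    # Sum hypergeometric probabilities over tables at least as extreme.
    tail = sum(comb(row1, k) * comb(row2, col1 - k)
               for k in range(a, k_max + 1))
    return tail / comb(n, col1)


if __name__ == "__main__":
    # Hypothetical counts chosen only to exercise the function.
    print(fisher_exact_greater(3, 1, 1, 3))
```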
