Discovering pathways by orienting edges in protein interaction networks

Anthony Gitter et al. Nucleic Acids Res. 2011 Mar.

Abstract

Modern experimental technology enables the identification of the sensory proteins that interact with the cells' environment or various pathogens. Expression and knockdown studies can determine the downstream effects of these interactions. However, when attempting to reconstruct the signaling networks and pathways between these sources and targets, one faces a substantial challenge. Although pathways are directed, high-throughput protein interaction data are undirected. In order to utilize the available data, we need methods that can orient protein interaction edges and discover high-confidence pathways that explain the observed experimental outcomes. We formalize the orientation problem in weighted protein interaction graphs as an optimization problem and present three approximation algorithms based on either weighted Boolean satisfiability solvers or probabilistic assignments. We use these algorithms to identify pathways in yeast. Our approach recovers twice as many known signaling cascades as a recent unoriented signaling pathway prediction technique and over 13 times as many as an existing network orientation algorithm. The discovered paths match several known signaling pathways and suggest new mechanisms that are not currently present in signaling databases. For some pathways, including the pheromone signaling pathway and the high-osmolarity glycerol pathway, our method suggests interesting and novel components that extend current annotations.

Figures

Figure 1.

An example of the MAX-DI-CUT to MEO transformation. The MEO graph has the same vertices as the MAX-DI-CUT graph plus an additional center vertex, to which all other vertices are connected. The MAX-DI-CUT edges are used to define the MEO source–target pairs.
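The construction described in this caption can be sketched in Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `max_di_cut_to_meo`, the `center` label, and the choice to give the MEO graph only the edges that join each vertex to the center are all hypothetical details that the caption does not spell out.

```python
from typing import FrozenSet, List, Sequence, Tuple

Vertex = str


def max_di_cut_to_meo(
    vertices: Sequence[Vertex],
    directed_edges: Sequence[Tuple[Vertex, Vertex]],
    center: Vertex = "center",  # hypothetical label for the added vertex
) -> Tuple[List[Vertex], List[FrozenSet[Vertex]], List[Tuple[Vertex, Vertex]]]:
    """Build an MEO-style instance from a MAX-DI-CUT-style instance.

    Returns (MEO vertices, undirected MEO edges, source-target pairs).
    Assumption: the MEO graph contains only vertex-to-center edges.
    """
    if center in vertices:
        raise ValueError("center label must not clash with an existing vertex")

    # Same vertices as the MAX-DI-CUT graph, plus one extra center vertex.
    meo_vertices = list(vertices) + [center]

    # Every original vertex is connected to the center by an undirected edge.
    meo_edges = [frozenset((v, center)) for v in vertices]

    # Each directed edge (u, v) becomes a source-target pair (u, v).
    source_target_pairs = [(u, v) for (u, v) in directed_edges]

    return meo_vertices, meo_edges, source_target_pairs


example_vertices = ["a", "b", "c"]
example_edges = [("a", "b"), ("b", "c"), ("c", "a")]
meo_v, meo_e, pairs = max_di_cut_to_meo(example_vertices, example_edges)
```

In this sketch the only route between any source and target is the two-edge path through the center vertex, which is what makes the orientation of each vertex-to-center edge the sole decision to be made.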

Figure 2.

Mapping an orientation of the MEO instance back to a directed cut. An orientation in the MEO problem uniquely defines a cut in the MAX-DI-CUT instance. The number of satisfied paths in the MEO instance is identical to the number of directed edges from A to B.
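The correspondence stated in this caption can be checked on a small example. The sketch below is illustrative only and rests on assumptions the caption does not state: that every source–target path has the form source, center, target; that A is the set of vertices whose edge is oriented toward the center and B the set whose edge is oriented away from it; and that the names `orientation_to_cut`, `count_satisfied_paths`, and `count_cut_edges` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Vertex = str
# Orientation of each vertex-to-center edge: "in" means vertex -> center,
# "out" means center -> vertex (an assumed encoding for this sketch).
Orientation = Dict[Vertex, str]


def orientation_to_cut(orientation: Orientation) -> Tuple[Set[Vertex], Set[Vertex]]:
    """Map an orientation to a cut (A, B) of the original vertices."""
    side_a = {v for v, d in orientation.items() if d == "in"}
    side_b = {v for v, d in orientation.items() if d == "out"}
    return side_a, side_b


def count_satisfied_paths(
    orientation: Orientation, pairs: List[Tuple[Vertex, Vertex]]
) -> int:
    """Count pairs (s, t) whose path s -> center -> t is fully directed."""
    return sum(
        1 for s, t in pairs if orientation[s] == "in" and orientation[t] == "out"
    )


def count_cut_edges(
    side_a: Set[Vertex], side_b: Set[Vertex], edges: List[Tuple[Vertex, Vertex]]
) -> int:
    """Count directed edges that leave A and enter B."""
    return sum(1 for u, v in edges if u in side_a and v in side_b)


# Directed edges of a small MAX-DI-CUT-style graph double as the pairs.
edges = [("a", "b"), ("b", "c"), ("c", "a")]
orientation = {"a": "in", "b": "out", "c": "in"}

side_a, side_b = orientation_to_cut(orientation)
satisfied = count_satisfied_paths(orientation, edges)
cut_size = count_cut_edges(side_a, side_b, edges)
assert satisfied == cut_size
```

Under these assumptions a pair (s, t) is satisfied exactly when s lies in A and t lies in B, so the two counts agree for any orientation, not just the one shown.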

Figure 3.

Fraction of the objective function upper bound achieved on instances with simulated sources and targets. After local search, all approximation algorithms perform much better than the MAX-k-CSP theoretical guarantee on instances with simulated source–target pairs and find orientations whose objective function values are virtually indistinguishable. The number of undirected paths includes all paths from a source to a target before the network is oriented. The y-axis plots the ratio achieved by each algorithm, which is the score of the orientation returned by the algorithm divided by the upper bound on the optimal objective function value. For each instance, there are six points (one for each algorithm with and without local search) that have the same x-coordinate, the number of undirected paths, and different y-coordinates, the ratios achieved. Instances have been ordered along the x-axis by the number of distinct source–target paths in the network before orientation, which is a coarse indication of the difficulty of the instance.

Figure 4.

The top-ranked pathways discovered by the random orientation plus local search algorithm. Solid edges were present in the gold standard and dashed edges were absent or oriented in the opposite direction. (A) Pathways that are completely contained within a known gold standard pathway. (B) Pathways that partially overlap a gold standard path but contain new edges as well. (C) Pathways that do not have any edges in common with our set of gold standard pathways. Images were generated with Cytoscape (http://www.cytoscape.org/) and do not contain all of the top-ranked paths per category but rather a highly overlapping subset.
