GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods

Thomas Schaffter, Daniel Marbach, Dario Floreano*

1Laboratory of Intelligent Systems, Ecole Polytechnique Fédérale de Lausanne (EPFL), Lausanne, Switzerland, 2MIT Computer Science and Artificial Intelligence Laboratory, Cambridge and 3Broad Institute of MIT and Harvard, Cambridge, MA, USA

*To whom correspondence should be addressed.

Revision received: 11 June 2011

Thomas Schaffter, Daniel Marbach, Dario Floreano, GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods, Bioinformatics, Volume 27, Issue 16, August 2011, Pages 2263–2270, https://doi.org/10.1093/bioinformatics/btr373

Abstract

Motivation: Over the last decade, numerous methods have been developed for inference of regulatory networks from gene expression data. However, accurate and systematic evaluation of these methods is hampered by the difficulty of constructing adequate benchmarks and the lack of tools for a differentiated analysis of network predictions on such benchmarks.

Results: Here, we describe a novel and comprehensive method for in silico benchmark generation and performance profiling of network inference methods, available to the community as open-source software called GeneNetWeaver (GNW). In addition to generating detailed dynamical models of gene regulatory networks to be used as benchmarks, GNW provides a network motif analysis that reveals systematic prediction errors, thereby indicating potential ways of improving inference methods. The accuracy of network inference methods is evaluated using standard metrics such as precision-recall and receiver operating characteristic curves. We show how GNW can be used to assess the performance and identify the strengths and weaknesses of six inference methods. Furthermore, we used GNW to provide the international Dialogue for Reverse Engineering Assessments and Methods (DREAM) competition with three network inference challenges (DREAM3, DREAM4 and DREAM5).

Availability: GNW is available at http://gnw.sourceforge.net along with its Java source code, user manual and supporting data.

Supplementary information: Supplementary data are available at Bioinformatics online.

Contact: dario.floreano@epfl.ch

1 INTRODUCTION

A challenging issue in systems biology is the development of computational tools for the reverse engineering of gene regulatory networks from quantitative experimental data. Over the last decade, high-throughput assays for mRNA expression have opened the door to the inference of regulatory networks by allowing simultaneous measurements of the expression levels of thousands of genes. Technologies such as spotted microarrays (Davis et al., 1995) and oligonucleotide chips (Lockhart et al., 1996) have enabled genome-wide quantification of differential gene expression profiles and, more recently, short read sequencing technologies such as RNA-seq (Mortazavi et al., 2008) have provided more precise quantification of mRNA levels.

Researchers have proposed a plethora of methods for reverse engineering the complex network of interactions between the genes and their RNA and protein products (also called regulatory program) from spatial and temporal high-throughput gene expression data (Bansal et al., 2007). Regulatory networks are often represented as directed, signed graphs in which nodes represent genes or transcription factors (TFs). In this context, edges correspond to enhancing or inhibitory regulations that affect gene transcription rates. Network inference methods rely on various computational approaches such as correlation (Rice et al., 2005), mutual information (MI) (Faith et al., 2007; Margolin et al., 2006), ordinary differential equations (ODE) models (Äijö and Lähdesmäki, 2009; Bonneau et al., 2006), Bayesian networks (Yu et al., 2004) or hybrid algorithms (Yip et al., 2010).

Numerous methods have been developed for inference of gene regulatory networks; however, relatively little effort has been put into evaluating the performance of those methods on adequate benchmarks. So far, three main strategies have been proposed to generate benchmark networks. A first strategy consists of evaluating network predictions made by reverse engineering algorithms on well-studied in vivo pathways from model organisms (Gama-Castro et al., 2011; Kim et al., 2003). However, those networks are incomplete maps of the physical interactions in the cell that are responsible for cellular functions, and using them as benchmarks therefore introduces errors into the evaluation of network predictions. Another strategy consists of genetically engineering synthetic in vivo networks (Camacho and Collins, 2009; Cantone et al., 2009). The main drawback of this strategy is that only a few small networks are available. Yet another strategy consists of developing in silico gene regulatory networks that can be simulated to produce artificial gene expression data. The simulation of in silico networks has the advantage of being fast, easily reproducible and less expensive than biological experiments. A few instances of small in silico networks with handcrafted topologies (Kremling et al., 2004) have been proposed as benchmarks for reverse engineering algorithms. More recently, several generators have been developed to automate the construction of in silico regulatory networks including up to thousands of genes to be used as benchmark networks for reverse engineering algorithms (Di Camillo et al., 2009; Mendes et al., 2003; Van den Bulcke et al., 2006).

Benchmark generators such as AGN (Mendes et al., 2003) aim to produce in silico gene networks exhibiting topological properties observed in biological networks using Erdős–Rényi, Watts–Strogatz (small-world) or Barabási–Albert (scale-free) random graph models. However, the structures generated using random graphs capture only a few of the structural properties of gene regulatory networks (Van den Bulcke et al., 2006) and generally do not display important properties such as modularity (Ravasz et al., 2002) or occurrences of network motifs, which are statistically overrepresented regulatory patterns in biological networks (Shen-Orr et al., 2002). Instead of constructing more complex random structures based on graph theory, which may be difficult to justify (Mendes et al., 2003), SynTReN (Van den Bulcke et al., 2006) and ReTRN (Li et al., 2009) generate network structures by extracting parts of known in vivo regulatory network structures. This approach has the advantage of capturing several structural properties observed in in vivo network structures (Van den Bulcke et al., 2006).

In order to produce gene expression data, the generated structures must be endowed with dynamical models of gene regulation. Systems of non-linear ODEs are widely used (Hache et al., 2009; Roy et al., 2008), but other approaches exist (Di Camillo et al., 2009). ODE systems describe continuously the levels of gene products and the rates of the reactions taking place in the network models, where biological processes that have not yet been fully characterized are abstracted. Because current high-throughput technologies do not allow the monitoring of protein expression as microarrays do for RNA (Di Camillo et al., 2009), some benchmark generators consider mRNA as a proxy for protein expression and thus do not model translation independently of transcription (Li et al., 2009; Van den Bulcke et al., 2006). Protein expression, however, does not correlate perfectly with mRNA expression in real biological systems, due in part to the different degradation rates of mRNA and protein products (Belle et al., 2006). RENCO (Roy et al., 2008), GeNGe (Hache et al., 2009) and GRENDEL (Haynes and Brent, 2009) are examples of available benchmark generators that consider both transcription and translation in their dynamical models.

Here, we describe a method for in silico benchmark generation and performance profiling of network inference methods, available to the community as open-source software called GeneNetWeaver (GNW) (Fig. 1). GNW has an intuitive graphical user interface that makes the generation and simulation of gene network models as simple as a few clicks. Network topologies are generated by extracting modules from known in vivo gene regulatory network structures such as those of E.coli (Gama-Castro et al., 2011) and S.cerevisiae (Kim et al., 2003). These structures are then endowed with detailed dynamical models of gene regulation, including both transcription and translation processes, using a thermodynamic approach accounting for both independent and synergistic interactions (Ackers et al., 1982). Expression data can be generated either deterministically or stochastically to model molecular noise in the dynamics of the networks, and experimental noise can be added using a model of noise observed in microarrays (Stolovitzky et al., 2005). Different types of in vivo experimental procedures, such as wild-type, knockout (null-mutant), knockdown (heterozygous) and multifactorial perturbations, can be reproduced by the software. In addition, a unique feature of GNW is the systematic and comparative evaluation of predictions by different inference methods, which none of the existing benchmark generators provide. GNW performs an exhaustive network motif analysis for a set of network predictions, which often reveals systematic prediction errors, thereby indicating potential ways of improving network reconstruction. The accuracy of network inference is also assessed using standard metrics such as precision–recall (PR) and receiver operating characteristic (ROC) curves.


Fig. 1.

Benchmarking and performance assessment of network inference methods using GNW. (A) In silico gene networks are obtained by extracting subnetwork structures from known transcriptional networks (Escherichia coli, Saccharomyces cerevisiae, etc.) before being endowed with detailed dynamical models of gene regulation accounting for both transcription and translation, independent and synergistic interactions, as well as molecular and measurement noise. (B) In silico gene networks are simulated to produce steady-state and time-series expression data for a variety of experiments such as wild-type, knockout, knockdown and multifactorial perturbation experiments. (C) Inference methods are asked to predict structures of in silico benchmark networks from gene expression data. (D) From network prediction files, GNW performs a network motif analysis which often reveals systematic prediction errors, thereby indicating potential ways of network reconstruction improvements. It also automatically generates comprehensive reports including standard metrics such as PR and ROC curves.

Furthermore, we show how GNW can be used to generate in silico benchmark suites to assess the performance and identify strengths and weaknesses of six network inference methods. We also show how the performance of those inference methods is affected by the structural properties and the size of the gene regulatory networks to infer, and how GNW can help to identify the most informative type of gene expression data to provide to a given inference method. Finally, we assess the performance of those six inference methods on the network inference challenge that we provided to the international DREAM4 (Dialogue for Reverse Engineering Assessments and Methods) competition.

2 METHODS

2.1 Topology

Instead of using random graph models, which are known to only partly capture the structural properties of biological networks (Van den Bulcke et al., 2006), we generate network structures by extracting modules from known biological interaction networks such as those of E.coli (Gama-Castro et al., 2011) and S.cerevisiae (Kim et al., 2003) (the source networks). Our approach is based on the extraction of modules, that is, groups of genes that are more highly connected than expected in a random network (Marbach et al., 2009). We have shown that the topological modules extracted using our method correlate with functional modules of the source networks (Marbach et al., 2009). Hence, the obtained network structures are meaningful targets for reverse engineering algorithms because, in practice, one typically tries to infer the structure of a set of functionally related genes.
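To make the extraction principle concrete, the following Python sketch grows a module around a seed gene by greedily adding the outside node with the most links to the current module. It is a simplified illustration, not the exact GNW algorithm (which is described in Marbach et al., 2009); the use of networkx and all identifiers are assumptions of the example.

    import random
    import networkx as nx

    def extract_module(source_net: nx.DiGraph, size: int, seed_gene=None):
        """Greedy module extraction sketch: grow a subnetwork around a seed gene
        by repeatedly adding the outside node with the most links to the module."""
        undirected = source_net.to_undirected()
        module = {seed_gene or random.choice(list(undirected.nodes))}
        while len(module) < size:
            # candidate nodes adjacent to the current module
            frontier = {n for m in module for n in undirected.neighbors(m)} - module
            if not frontier:
                break  # the connected component is exhausted
            # pick the candidate with the most connections into the module
            best = max(frontier,
                       key=lambda n: sum(1 for m in module if undirected.has_edge(n, m)))
            module.add(best)
        return source_net.subgraph(module).copy()  # keep edge directions and signs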

2.2 Dynamical model

Network topologies are endowed with detailed dynamical models of gene regulation. Both transcription and translation are modeled using a standard thermodynamic approach (Ackers et al., 1982) allowing for both independent (‘additive') and synergistic (‘multiplicative') regulatory interactions. For each gene i of a network, the rate of change of mRNA concentration F_i^RNA and the rate of change of protein concentration F_i^Prot are described by

F_i^{RNA} = \frac{dx_i}{dt} = m_i f_i(\mathbf{y}) - \lambda_i^{RNA} x_i,   (1)

F_i^{Prot} = \frac{dy_i}{dt} = r_i x_i - \lambda_i^{Prot} y_i,   (2)

where m_i is the maximum transcription rate, r_i the translation rate, \lambda_i^{RNA} and \lambda_i^{Prot} the mRNA and protein degradation rates, and x and y are vectors containing all mRNA and protein concentration levels, respectively. f_i(\cdot) is the activation function of gene i, which computes the relative activation of the gene, between 0 (the gene is shut off) and 1 (the gene is maximally activated), given the protein or TF concentrations y. A more detailed description of the activation function used is given by Marbach et al. (2010). Note that our approach conserves the nature of the gene interactions (enhancing or inhibitory) of the imported or extracted network structures.
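For illustration, Equations (1) and (2) can be integrated numerically as in the following minimal Python sketch (assuming NumPy and SciPy); the Hill-type activation function and the parameter values are placeholders for a toy two-gene network, not the thermodynamic model actually used by GNW.

    import numpy as np
    from scipy.integrate import solve_ivp

    # Toy parameters for two genes: gene 0 activates gene 1 (illustrative values).
    m = np.array([1.0, 1.0])         # maximum transcription rates m_i
    r = np.array([0.8, 0.8])         # translation rates r_i
    lam_rna = np.array([0.5, 0.5])   # mRNA degradation rates
    lam_prot = np.array([0.3, 0.3])  # protein degradation rates

    def activation(y):
        """Placeholder activation f_i(y) in [0, 1]: gene 0 constitutive, gene 1
        activated by protein 0 through a Hill function (not GNW's thermodynamic model)."""
        hill = y[0] ** 2 / (y[0] ** 2 + 0.5 ** 2)
        return np.array([1.0, hill])

    def rates(t, state):
        x, y = state[:2], state[2:]            # mRNA and protein concentrations
        dx = m * activation(y) - lam_rna * x   # Equation (1)
        dy = r * x - lam_prot * y              # Equation (2)
        return np.concatenate([dx, dy])

    sol = solve_ivp(rates, (0.0, 50.0), y0=np.zeros(4), max_step=0.1)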

The integration of the system of equations defined by (1) and (2) results in noiseless mRNA and protein concentration levels, respectively x_i(t) and y_i(t) for gene i. In living cells, molecular noise originates from thermal fluctuations and noisy processes such as transcription and translation (Becskei and Serrano, 2000). Hence, random fluctuations affect concentration levels of mRNA and protein, whose expression can be viewed as a stochastic process (Gardner and Collins, 2000). Both F_i^RNA and F_i^Prot are of the form

F(X_t) = V(X_t) - D(X_t),   (3)

where V(X_t) is the production and D(X_t) the degradation term. The corresponding chemical Langevin equation (CLE) (Gillespie, 2000) we use to model molecular noise in transcription and translation processes is

\frac{dX_t}{dt} = V(X_t) - D(X_t) + c\left[\sqrt{V(X_t)}\,\eta_v(t) + \sqrt{D(X_t)}\,\eta_d(t)\right],   (4)

where \eta_v and \eta_d are independent Gaussian white-noise processes (Gillespie, 2000) and c is a multiplicative constant controlling the amplitude of the molecular noise. For each gene i, we use the Stratonovich scheme and the Milstein method to integrate two equations of the form of (4), one describing the rate of change of mRNA concentration and one the rate of change of protein concentration (Schaffter, 2010).

This model is derived from stochastic kinetics and the underlying assumptions are discussed by Gillespie (2000). Note that, according to this model, a gene that is not activated (V(X t) close to zero) has a very low level of noise (leakage) and it cannot suddenly have a very high transcription rate due to noise. In contrast, a gene that is activated has a higher level of noise (which may be interpreted as transcriptional bursts, for instance).
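To illustrate how an equation of the form of (4) can be simulated, the sketch below uses the simple Euler–Maruyama scheme for a single species; GNW itself integrates the equations with the Milstein method under the Stratonovich interpretation, so this is an illustrative simplification, and the production and degradation functions are placeholders.

    import numpy as np

    def simulate_cle(production, degradation, x0, t_end, dt=1e-3, c=0.05, rng=None):
        """Euler-Maruyama sketch of the chemical Langevin equation (4):
        dX = [V(X) - D(X)] dt + c*sqrt(V(X)) dW_v + c*sqrt(D(X)) dW_d."""
        rng = rng or np.random.default_rng()
        steps = int(t_end / dt)
        x = np.empty(steps + 1)
        x[0] = x0
        for k in range(steps):
            v, d = production(x[k]), degradation(x[k])
            dw_v, dw_d = rng.normal(0.0, np.sqrt(dt), size=2)  # independent Wiener increments
            step = (v - d) * dt + c * (np.sqrt(v) * dw_v + np.sqrt(d) * dw_d)
            x[k + 1] = max(x[k] + step, 0.0)  # concentrations stay non-negative
        return x

    # Example: constant production, linear degradation (placeholder kinetics).
    trajectory = simulate_cle(lambda x: 1.0, lambda x: 0.5 * x, x0=0.0, t_end=20.0)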

The measurement noise depends on the technology used to monitor gene expression concentrations (Stolovitzky et al., 2005) and is modeled here independently of the molecular noise. GNW implements Gaussian and log-normal models of experimental noise as well as a model of noise observed in microarrays (Stolovitzky et al., 2005).

2.3 Synthetic expression datasets

The next step in generating in silico benchmark networks consists of simulating the generated in silico regulatory networks to produce synthetic gene expression datasets. Available experiments in GNW include wild-type, knockout (null-mutant), knockdown (heterozygous) and multifactorial perturbation experiments. Custom perturbations can also be specified. Experiments can be simulated as steady states and/or time series with a user-defined duration and number of measurement points.
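As an illustration of how knockout and knockdown experiments can be mimicked in a model of the form of (1) and (2), the sketch below perturbs the vector of maximum transcription rates: a knockout sets the rate of the perturbed gene to zero and a knockdown halves it, as described in Section 3.4. The function and variable names are assumptions of the example.

    import numpy as np

    def perturb_max_transcription(m, gene, kind):
        """Return a perturbed copy of the maximum transcription rates m_i:
        a knockout sets m_gene to 0, a knockdown halves it."""
        m_pert = np.array(m, dtype=float)
        if kind == "knockout":
            m_pert[gene] = 0.0
        elif kind == "knockdown":
            m_pert[gene] *= 0.5
        else:
            raise ValueError("kind must be 'knockout' or 'knockdown'")
        return m_pert

    # Example: systematic single-gene knockouts for a three-gene network.
    wild_type_m = np.array([1.0, 0.8, 1.2])
    knockouts = [perturb_max_transcription(wild_type_m, g, "knockout") for g in range(3)]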

2.4 Evaluation of network inference methods

We not only provide researchers with a method for generating in silico gene network models to be used as benchmarks for reverse engineering algorithms, but also tools to facilitate the evaluation of network predictions. From a set of predictions from one or several inference methods, GNW automatically generates a comprehensive report including the result of a network motif analysis, where the performance of inference methods is profiled on local connectivity patterns. The network motif analysis often reveals systematic prediction errors, thereby indicating potential ways of improving network reconstruction (Marbach et al., 2010). Furthermore, PR and ROC curves are evaluated for each network prediction (Prill et al., 2010). The relation between ROC and PR curves is discussed by Davis and Goadrich (2006).
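A minimal sketch of this evaluation step is given below, assuming scikit-learn: given a gold-standard edge set and predicted edge confidences, it computes the area under the ROC curve and average precision as a stand-in for the area under the PR curve. GNW's own report is more comprehensive (it also includes the motif analysis); the data layout used here is an assumption.

    import numpy as np
    from sklearn.metrics import roc_auc_score, average_precision_score

    def auroc_aupr(gold_edges, predictions, genes):
        """gold_edges: set of (regulator, target) pairs; predictions: dict mapping
        (regulator, target) -> confidence score. Unpredicted edges get score 0."""
        pairs = [(i, j) for i in genes for j in genes if i != j]
        y_true = np.array([pair in gold_edges for pair in pairs], dtype=int)
        y_score = np.array([predictions.get(pair, 0.0) for pair in pairs])
        return roc_auc_score(y_true, y_score), average_precision_score(y_true, y_score)

    # Toy example with three genes.
    gold = {("G1", "G2"), ("G2", "G3")}
    pred = {("G1", "G2"): 0.9, ("G2", "G3"): 0.4, ("G1", "G3"): 0.6}
    print(auroc_aupr(gold, pred, ["G1", "G2", "G3"]))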

3 RESULTS

We assessed the performance of six inference methods to illustrate benchmarking and performance profiling of network inference methods using GNW (Table 1). We first describe how to generate suitable network benchmark suites for the testing of various hypotheses. Specifically, we designed benchmark suites to show how the performance of inference methods is affected by different sizes and structural properties of regulatory networks. In addition, we show how GNW can help to identify the most informative type of gene expression data that a given inference method could use to achieve the best possible reconstruction from in vivo experiments. Finally, we introduce the DREAM4 Network Inference Challenge we generated, which has been used to assess the performance of many inference methods (Klamt et al., 2010; Menéndez et al., 2010).

Table 1.

Gene network inference methods evaluated using GNW

ARACNE2 and CLR are two of the most widely used inference methods. The following methods have been the best performer or co-best performer in at least one DREAM challenge: Yip et al. (DREAM3 In Silico Challenges Size 10, 50 and 100), Pinna et al. (DREAM4 In Silico Challenge Size 100) and Huynh-Thu et al. (DREAM4 In Silico Challenge, multifactorial).


3.1 Generation of network benchmark suites

We generated several network benchmark suites using the approach described in Section 2. Each benchmark suite is composed of several in silico regulatory networks (the so-called gold standards or target networks). Figure 2A shows one gold standard extracted from a regulatory network of the yeast S.cerevisiae (Kim et al., 2003). The extracted structures have been endowed with stochastic dynamical models of gene regulation accounting for molecular noise in transcription and translation processes.


Fig. 2.

Generation and simulation of in silico gene network models using GNW. (A) Network structure containing 100 genes and extracted from a regulatory network in yeast. (B) Effects of both molecular and measurement noise on gene expression data. (Top) The integration of the ODE model defined in (1) and (2) leads to noiseless gene expression. (Middle) Molecular noise is introduced by replacing Equations (1) and (2) with stochastic differential equations (SDEs) defined in (4). (Bottom) Superposition of both molecular and experimental noise.

The dynamical models of gene regulation were then simulated to reproduce wild-type, knockout, knockdown and multifactorial perturbation experiments. Figure 2B illustrates the evolution of mRNA concentration levels without noise, when only molecular noise is introduced, and with both molecular and experimental noise. We generated the following benchmark suites:

Benchmark suite A: 20 networks of 500 genes for each condition, extracted from the E.coli and yeast source networks, used to study the effect of structural properties on inference performance (Section 3.2).

Benchmark suite B: 20 networks for each of the sizes 100, 200 and 500 genes, half extracted from E.coli and half from yeast, used to study the effect of network size (Section 3.3).

Benchmark suite C: 20 networks of 100 genes each, simulated to produce steady-state data for systematic knockout and knockdown experiments as well as 100 multifactorial perturbation experiments, used to compare the informativeness of different types of expression data (Section 3.4).

At least half of the genes included in each gold standard are regulators, i.e. genes that regulate the mRNA production of at least one other gene. This avoids structures in which many genes do not regulate any other gene (out-degree = 0). We used the default parameter values proposed by GNW to simulate the gene expression experiments (Supplementary Material).

3.2 Effect of network structural properties on inference method performance

The performance of network inference methods may strongly vary depending on the structural properties of the target networks. Figure 3 shows systematic errors made by each inference method on four three-node motifs overrepresented in the in vivo regulatory network structures of E.coli and yeast (Marbach et al., 2009), and therefore in the gold standard structures we generated.


Fig. 3.

Systematic errors made by network inference methods in predicting network motifs. GNW analyses 13 configurations of three-node motifs, including fan-out, fan-in, cascade and feed-forward loop (FFL) motifs, which are overrepresented motifs in the E.coli and yeast regulatory networks. The first column displays the network motifs to infer and the additional columns show the systematic errors made by each inference method when trying to infer the corresponding network motif.

Z-score, Pinna et al. and Yip et al. have different error profiles than CLR, ARACNE2 (both based on mutual information) and GENIE3, which systematically make false positive errors between Genes 2 and 3 when predicting fan-out motifs. Note that ARACNE2 seems to make fewer errors on that particular motif only because the gene interactions present in the gold standards are in general less reliably identified than with CLR or GENIE3, independently of the network motif considered. On the other hand, Z-score, Pinna et al. and Yip et al. are strongly affected by cascade motifs, where these methods systematically predict false positive interactions between Genes 1 and 3.
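To make the motif vocabulary concrete, the sketch below classifies gene triplets of a directed network into the four motif types discussed here (fan-out, fan-in, cascade and feed-forward loop). It is a simplified enumeration assuming networkx and ignores the remaining three-node configurations analysed by GNW.

    import networkx as nx

    def count_basic_motifs(net: nx.DiGraph):
        """Count fan-out (A->B, A->C), fan-in (A->C, B->C), cascade (A->B->C)
        and feed-forward loop (A->B->C plus A->C) instances in a directed graph."""
        counts = {"fan_out": 0, "fan_in": 0, "cascade": 0, "ffl": 0}
        for a in net:
            succ = list(net.successors(a))
            counts["fan_out"] += len(succ) * (len(succ) - 1) // 2  # pairs of targets of a
            pred = list(net.predecessors(a))
            counts["fan_in"] += len(pred) * (len(pred) - 1) // 2   # pairs of regulators of a
            for b in succ:
                for c in net.successors(b):
                    if c == a or c == b:
                        continue
                    if net.has_edge(a, c):
                        counts["ffl"] += 1        # chain A->B->C closed by A->C
                    else:
                        counts["cascade"] += 1    # open chain A->B->C
        return counts

    # Toy example: a single feed-forward loop.
    g = nx.DiGraph([("A", "B"), ("B", "C"), ("A", "C")])
    print(count_basic_motifs(g))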

Inference methods perform differently when used to predict the structure of regulatory networks with particular structural properties. To quantify this, we evaluated the selected inference methods (Table 1) against the benchmark suite A described in Section 3.1. Figure 4 shows the AUROC and AUPR values obtained by those methods when applied to infer E.coli and yeast network structures from knockout expression data.


Fig. 4.

Effect of structural properties of target networks on performance of inference methods. The 20 benchmark networks containing 500 genes each have been generated for each condition using GNW (benchmark suite A, see Section 3.1). The inference methods have been applied to predict the directed structure of each benchmark network from knockout expression data and the corresponding AUROC and AUPR values have been evaluated. Methods strongly impeded by the cascade motif (Z-score, Pinna et al. and Yip et al.), as shown in Figure 3, exhibit a performance degradation on yeast because the yeast structure is composed of more cascade motifs than the E.coli network structure.

The AUROC and AUPR values obtained by Z-score, Pinna et al. and Yip et al. on yeast gold standards are significantly lower than on E.coli benchmark networks (Mann–Whitney U-test, P<0.01). The performance degradation observed on yeast is due to the fact that these methods make systematic errors in predicting cascade motifs, and structures extracted from yeast contain more cascade motifs than E.coli structures (data not shown). We observe a linear correlation between the number of cascade motifs to predict in a regulatory network and the AUROC and AUPR values obtained by Z-score, Pinna et al. and Yip et al. (Pearson's correlation, −0.703 ≤ r ≤ −0.552, P<0.05). ARACNE2, CLR and GENIE3 are less affected by the cascade motif (Fig. 3).
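The statistical comparisons reported in this section can be reproduced in a few lines, as sketched below with SciPy; the AUROC values and cascade counts shown are illustrative placeholders, not the actual results.

    import numpy as np
    from scipy.stats import mannwhitneyu, pearsonr

    # Illustrative AUROC values for one method on E. coli versus yeast gold standards.
    auroc_ecoli = np.array([0.82, 0.79, 0.85, 0.81, 0.80])
    auroc_yeast = np.array([0.70, 0.73, 0.68, 0.71, 0.69])
    u_stat, p_value = mannwhitneyu(auroc_ecoli, auroc_yeast, alternative="two-sided")

    # Correlation between the number of cascade motifs and the AUROC per network.
    cascade_counts = np.array([120, 95, 160, 110, 140, 180, 200, 90, 130, 150])
    auroc_values = np.array([0.80, 0.82, 0.73, 0.81, 0.76, 0.71, 0.69, 0.83, 0.78, 0.74])
    r, p = pearsonr(cascade_counts, auroc_values)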

Interestingly, Figure 3 also shows that Z-score and Pinna et al. exhibit very similar error profiles. Z-score is one of the simplest inference methods (Prill et al., 2010), yet it has relatively high accuracy in predicting network structures from knockout steady states. Pinna et al. first performs a Z-score analysis followed by a refinement stage, which aims to suppress the errors made by Z-score on cascade motifs (Pinna et al., 2010). Figure 3 does not show any noticeable difference between Z-score and Pinna et al. This is confirmed by the fact that the AUROC and AUPR values for Z-score and Pinna et al. are not significantly different (Mann–Whitney U-test, P>0.05).
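For reference, the Z-score approach discussed here can be sketched as follows: from a matrix of knockout steady states, the confidence of an edge from gene j to gene i is the absolute z-score of gene i's expression in the knockout of gene j, relative to gene i's expression across all experiments. This is a simplified rendering of the method evaluated by Prill et al. (2010); the matrix layout is an assumption.

    import numpy as np

    def zscore_inference(ko_matrix):
        """ko_matrix[k, i]: steady-state expression of gene i when gene k is knocked out.
        Returns a score matrix where scores[j, i] ranks the putative edge j -> i."""
        mean = ko_matrix.mean(axis=0)        # per-gene mean over all knockout experiments
        std = ko_matrix.std(axis=0) + 1e-12  # avoid division by zero
        scores = np.abs((ko_matrix - mean) / std)
        np.fill_diagonal(scores, 0.0)        # ignore self-interactions
        return scores

    # Toy example: three genes, three single-gene knockout experiments.
    ko = np.array([[0.0, 0.2, 1.1],
                   [1.0, 0.0, 0.3],
                   [1.0, 1.2, 0.0]])
    print(zscore_inference(ko))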

3.3 Effect of network size on inference method performance

We are interested in how the performance of inference methods scales with the size of the regulatory networks to reconstruct. Using GNW, it is very simple to generate in silico benchmark networks of size N<M, where M is the size of the source network used (e.g. E.coli or yeast). Here, we used the benchmark suite B described in Section 3.1, where each benchmark network has been simulated using the above methodology to produce knockout gene expression data. Figure 5 shows the performance of the inference methods listed in Table 1 when applied to infer regulatory networks containing 100, 200 and 500 genes.


Fig. 5.

Performance assessment of inference methods on GNW-generated in silico benchmark networks of size 100, 200 and 500 genes. The 20 benchmark networks have been generated for each condition (benchmark suite B, see Section 3.1). The inference methods have been applied to predict the directed structures of benchmark networks from knockout expression data and the corresponding AUROC and AUPR values have been evaluated. We observed that the performance of the inference methods decreases as the size of the regulatory networks increases.

CLR has both AUROC and AUPR values significantly higher than those obtained by ARACNE2 for gold standards of size 100, 200 and 500 (Mann–Whitney U-test, P<0.01). Leaving ARACNE2 aside, the AUROC values of the five remaining methods are comparable. However, we identified three methods with relatively high AUPR values: Z-score and the methods developed by Pinna et al. and Yip et al. The AUROC and AUPR values obtained by Z-score and Pinna et al. are significantly higher than those of Yip et al., and this holds for every gold standard size (Mann–Whitney U-test, P<0.05). Also, Z-score, Pinna et al. and Yip et al. have high AUPR variances because they are strongly affected by cascade motifs (Fig. 3), which are more frequent in gold standards extracted from yeast than from E.coli (each condition in benchmark suite B is composed of 20 gold standards, half extracted from E.coli and half from yeast).

Figure 5 shows that the AUPR values of the inference methods decrease as the sizes of the gold standards increase. The reason is that the connectivity density of the regulatory networks is higher for smaller networks. The higher the connectivity density, the easier it is for each of the six inference methods to obtain a high AUPR value (Pearson's correlation, 0.383 ≤ r ≤ 0.839, P<0.01).

3.4 Design of in vivo gene expression experiments

A given inference method may require a very specific type of expression data in order to enable accurate network reconstruction. We show that in silico benchmark networks also have the ability to support the design of suitable in vivo gene expression experiments, which are typically time consuming and expensive (Haynes and Brent, 2009). The benchmark suite C described in Section 3.1 comprises 20 in silico networks of 100 genes each, which we simulated using GNW to produce steady-state data for systematic knockout and knockdown experiments, as well as 100 multifactorial perturbation experiments. Figure 6 shows the AUROC and AUPR values obtained by the inference methods reviewed here (Table 1).


Fig. 6.

Identification of the most informative type of gene expression data required by a given inference method using in silico benchmark networks. Knockout (ko), knockdown (kd) and multifactorial (mf) perturbations were applied on 20 gold standards to generate three datasets, each containing 100 measured steady states (benchmark suite C, see Section 3.1). Note that Z-score, Pinna et al. and Yip et al. are not applicable to the multifactorial data.

The most accurate network reconstructions are obtained by GENIE3, Z-score and the methods developed by Pinna et al. and Yip et al. on knockout data. Knockout experiments are very informative because they provide network responses to individual and large perturbations (genes are ‘deleted'). Knockdown expression data, where the maximum transcription rate of genes is halved, are less informative than knockout data and thus lead to less accurate network reconstructions. Figure 6 shows that ARACNE2 obtained AUROC and AUPR values comparable to those of CLR and GENIE3 when using multifactorial perturbation data. In addition, we considered providing knockout, knockdown and multifactorial perturbation data together to ARACNE2, CLR and GENIE3. We observed that the AUROC and AUPR values obtained were slightly higher than when providing the three expression datasets individually (data not shown). We also successively added 100, 200, 300 and 400 additional multifactorial perturbations; however, the AUROC and AUPR values did not improve significantly for any of the methods (Mann–Whitney U-test at the P<0.05 level). Furthermore, it has been shown using GNW and time-series data that the accuracy of inference methods reaches a saturation point after a specific data size (Vijender et al., 2010). This reveals that simply adding more expression data does not necessarily improve performance.

3.5 DREAM Network inference challenges

We have used GNW to generate the target networks for three international competitions on gene network reverse engineering: DREAM3 (2008), DREAM4 (2009) and DREAM5 (2010). Participants of the DREAM4 In Silico challenge were asked to provide network predictions for two subchallenges made of networks of size 10 and 100, respectively. Each subchallenge was composed of five in silico gene networks (two extracted from E.coli and three from yeast), which have been simulated to produce steady-state wild-type, knockout, knockdown and multifactorial perturbation experiments. In addition, time-series data have been made available.

For each subchallenge, network predictions made by participating teams have been evaluated by computing P-values, which indicate the probability that random lists of genetic interaction predictions would be of the same or better quality (Prill et al., 2010). The overall score used for ranking the methods applied to the DREAM4 In Silico Challenge is a negative log-transformed P-value given by

OS = -\frac{1}{2}\log_{10}(p_1 \cdot p_2),   (5)

where p_1 and p_2 are, respectively, the geometric means of the AUPR P-values and the AUROC P-values taken over the five networks. Thus, larger scores indicate smaller P-values, hence better predictions. Figure 7 compares the overall scores of the inference methods reviewed here (Table 1) to those obtained by the participating methods in the DREAM4 In Silico Size 100 Challenge.
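A minimal sketch of the score defined in (5), assuming the per-network P-values are already available (the numbers below are illustrative only):

    import numpy as np

    def dream_overall_score(aupr_pvalues, auroc_pvalues):
        """Overall score of Equation (5): negative log of the product of the geometric
        means of the AUPR and AUROC P-values over the benchmark networks."""
        p1 = np.exp(np.mean(np.log(aupr_pvalues)))   # geometric mean of AUPR P-values
        p2 = np.exp(np.mean(np.log(auroc_pvalues)))  # geometric mean of AUROC P-values
        return -0.5 * np.log10(p1 * p2)

    # Illustrative P-values for five networks.
    print(dream_overall_score([1e-30, 1e-25, 1e-28, 1e-32, 1e-27],
                              [1e-40, 1e-35, 1e-38, 1e-42, 1e-37]))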


Fig. 7.

Performance assessment of the inference methods listed in Table 1 on the DREAM4 In Silico Size 100 Challenge. Methods are ranked according to the geometric means of AUPR P-values and AUROC P-values taken over five networks. Pinna et al. was the best performer in that challenge, hence the first two bars both correspond to the overall score of Pinna et al. Typically, inference methods accept different types of gene expression data as input; each method reviewed here has been fed with the maximum amount of accepted expression data.

The most accurate reconstruction of the five gene networks of 100 genes was achieved by Pinna et al. (2010). They participated in the DREAM4 In Silico Size 100 Challenge, in which their method was the best performer (OS = 71.589). Hence, the first two bars in Figure 7 both correspond to the score of Pinna et al. As shown in Section 3.2, the AUROC and AUPR values obtained by Pinna et al. are not significantly higher than those obtained using the original Z-score method. This can be explained by the fact that transitive causal effects are almost always weaker than the direct effects. We expect that if many amplifying cascades occur, the refinement stage introduced by Pinna et al. (2010) will enable more reliable network predictions than Z-score alone.

It is also interesting to note that the method of Yip et al. was the best performer on all DREAM3 In Silico Challenges of size 10, 50 and 100 genes, which we also provided. Yet, it would have ranked only 7th on the DREAM4 size 100 challenge (OS = 57.079). While the original algorithm is composed of several batches using both steady-state and time-series data, Yip et al. only used the first batch to build a noise model from knockout steady-state data (Yip et al., 2010). The 7th rank in DREAM4 can be partly explained by the fact that Yip et al. made a strong assumption of Gaussian measurement noise, which was correct for DREAM3 but is no longer valid in DREAM4. Indeed, in DREAM4 we modeled molecular noise in addition to a model of experimental noise observed in microarrays (Stolovitzky et al., 2005).

4 DISCUSSION

We propose a comprehensive and powerful framework for in silico benchmark generation and performance profiling of network inference methods. We implemented this framework as an open-source tool called GeneNetWeaver (GNW). Biologically plausible network structures are generated by extracting modules from known biological interaction networks such as those of E.coli and the yeast S.cerevisiae. Network structures are then endowed with detailed dynamical models of gene regulation describing both transcription and translation processes. Transcriptional regulation is modeled using a thermodynamic approach accounting for both independent (‘additive') and synergistic (‘multiplicative') interactions. In addition, our models account for stochastic molecular noise as well as experimental noise observed in microarrays. The generated in silico benchmark networks can be simulated in GNW to reproduce wild-type, knockout (null-mutant), knockdown (heterozygous) and multifactorial perturbation gene expression experiments. As an example application, we have used GNW to generate the target networks for three international competitions on gene network reverse engineering: DREAM3 (2008), DREAM4 (2009) and DREAM5 (2010). In total, 91 teams have submitted over 900 network predictions on GNW-generated networks, making GNW one of the benchmark generators most widely used by the community.

In contrast to previously proposed benchmark generators, GNW also integrates tools for systematic evaluation of the predictions from inference methods on benchmark networks. A unique feature of GNW is the ability to perform a network motif analysis from a set of network predictions and their corresponding benchmark networks. The network motif analysis reveals systematic prediction errors made by inference methods on specific network motifs, thereby indicating potential ways of improving network reconstruction. The accuracy of network inference is assessed using standard metrics such as PR and ROC curves.

We have used GNW to generate in silico benchmark suites to assess the performance and identify the strengths and weaknesses of six network inference methods. We show that Z-score and the inference methods developed by Pinna et al. and Yip et al. make more accurate network predictions than the two widely used methods ARACNE2 and CLR, apparently because the former methods target the inference of causal relationships between genes. However, ARACNE2 and CLR do not require systematic knockout gene expression data, which are not always available in practice, to infer undirected networks. Indeed, ARACNE2, CLR and GENIE3 can be applied to infer regulatory networks even if no systematic knockout or knockdown experiments are provided. Furthermore, our results show that at some point simply giving more expression data to inference methods does not necessarily improve performance. Therefore, the integration of additional information about the target regulatory networks should be considered, for instance using prior knowledge about the network structures.

The novelty of GNW is that it additionally provides a unique network motif analysis, which we used to show that the structural properties of the target regulatory networks affect the performance of inference methods. We observed that the performance of Z-score and of the methods developed by Pinna et al. and Yip et al. is impeded by the presence of cascade motifs in the target networks. Accordingly, we show that those methods make significantly less accurate network predictions on the yeast S.cerevisiae, whose structure includes more cascade motifs than the E.coli transcriptional network. Finally, we also provide evidence that in silico benchmark networks can be used to identify the most informative type of gene expression data that a given inference method could use to achieve the best possible reconstruction from in vivo experiments.

ACKNOWLEDGEMENT

The authors would like to express their thanks to Gilles Roulet for his collaboration in software development, and Steffen Wischmann, Peter Dürr and Pradeep Fernando for their careful reading and suggestions on the article.

Funding: This work is supported by the SystemsX.ch initiative (WingX project) to T.S.; Swiss National Science Foundation (200021-112060) to D.M. and (200021-127143) to D.F.

Conflict of Interest: none declared.

REFERENCES

Ackers et al. (1982) Quantitative model for gene regulation by lambda phage repressor. Proc. Natl Acad. Sci. USA, 79, 1129.

Äijö and Lähdesmäki (2009) Learning gene regulatory networks from gene expression measurements using non-parametric molecular kinetics. Bioinformatics, 25, 2937.

Bansal et al. (2007) How to infer gene networks from expression profiles. Mol. Syst. Biol., 3, 78.

Becskei and Serrano (2000) Engineering stability in gene networks by autoregulation. Nature, 405, 590-593.

Belle et al. (2006) Quantification of protein half-lives in the budding yeast proteome. Proc. Natl Acad. Sci. USA, 103, 13004.

Bonneau et al. (2006) The Inferelator: an algorithm for learning parsimonious regulatory networks from systems-biology data sets de novo. Genome Biol., 7, R36.

Camacho and Collins (2009) Systems biology strikes gold. Cell, 137, 24.

Cantone et al. (2009) A yeast synthetic network for in vivo assessment of reverse-engineering and modeling approaches. Cell, 137, 172-181.

Davis and Goadrich (2006) The relationship between precision-recall and ROC curves. In Proceedings of the 23rd International Conference on Machine Learning. ACM, New York, NY, USA, pp. 233-240.

Davis et al. (1995) Quantitative monitoring of gene expression patterns with a complementary DNA microarray. Science, 270, 467.

Di Camillo et al. (2009) A gene network simulator to assess reverse engineering algorithms. Ann. N Y Acad. Sci., 1158, 125-142.

Faith et al. (2007) Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol., 5, e8.

Gama-Castro et al. (2011) RegulonDB version 7.0: transcriptional regulation of Escherichia coli K-12 integrated within genetic sensory response units (Gensor Units). Nucleic Acids Res., 39 (Suppl. 1), D98.

Gardner and Collins (2000) Neutralizing noise in gene networks. Nature, 405, 520-521.

Gillespie (2000) The chemical Langevin equation. J. Chem. Phys., 113, 297.

Hache et al. (2009) GeNGe: systematic generation of gene regulatory networks. Bioinformatics, 25, 1205.

Haynes and Brent (2009) Benchmarking regulatory network reconstruction with GRENDEL. Bioinformatics, 25, 801.

Huynh-Thu et al. (2010) Inferring regulatory networks from expression data using tree-based methods. PLoS One, 5, e12776.

Kim et al. (2003) Inferring gene networks from time series microarray data using dynamic Bayesian networks. Brief. Bioinformatics, 4, 228.

Klamt et al. (2010) TRANSWESD: inferring cellular networks with transitive reduction. Bioinformatics, 26, 2160.

Kremling et al. (2004) A benchmark for methods in reverse engineering and model discrimination: problem formulation and solutions. Genome Res., 14, 1773.

Li et al. (2009) ReTRN: a retriever of real transcriptional regulatory network and expression data for evaluating structure learning algorithm. Genomics, 94, 349-354.

Lockhart et al. (1996) Expression monitoring by hybridization to high-density oligonucleotide arrays. Nat. Biotechnol., 14, 1675.

Marbach et al. (2009) Generating realistic in silico gene networks for performance assessment of reverse engineering methods. J. Comput. Biol., 16, 229-239.

Marbach et al. (2010) Revealing strengths and weaknesses of methods for gene network inference. Proc. Natl Acad. Sci. USA, 107, 6286-6291.

Margolin et al. (2006) ARACNE: an algorithm for the reconstruction of gene regulatory networks in a mammalian cellular context. BMC Bioinformatics, 7 (Suppl. 1), S7.

Mendes et al. (2003) Artificial gene networks for objective comparison of analysis algorithms. Bioinformatics, 19 (Suppl. 2), ii122-ii129.

Menéndez et al. (2010) Gene regulatory networks from multifactorial perturbations using graphical lasso: application to the DREAM4 challenge. PLoS One, 5, e14147.

Mortazavi et al. (2008) Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat. Methods, 5, 621-628.

Pinna et al. (2010) From knockouts to networks: establishing direct cause-effect relationships through graph analysis. PLoS One, 5, 218-223.

Prill et al. (2010) Towards a rigorous assessment of systems biology models: the DREAM3 challenges. PLoS One, 5, e9202.

Ravasz et al. (2002) Hierarchical organization of modularity in metabolic networks. Science, 297, 1551.

Rice et al. (2005) Reconstructing biological networks using conditional correlation analysis. Bioinformatics, 21, 765.

Roy et al. (2008) A system for generating transcription regulatory networks with combinatorial control of transcription. Bioinformatics, 24, 1318.

Schaffter (2010) Numerical integration of SDEs: a short tutorial. Technical Report LIS-REPORT-2010-001, Swiss Federal Institute of Technology in Lausanne (EPFL).

Shen-Orr et al. (2002) Network motifs in the transcriptional regulation network of Escherichia coli. Nat. Genet., 31, 64-68.

Stolovitzky et al. (2005) Statistical analysis of MPSS measurements: application to the study of LPS-activated macrophage gene expression. Proc. Natl Acad. Sci. USA, 102, 1402.

Van den Bulcke et al. (2006) SynTReN: a generator of synthetic gene expression data for design and analysis of structure learning algorithms. BMC Bioinformatics, 7, 43.

Vijender et al. (2010) Time lagged information theoretic approaches to the reverse engineering of gene regulatory networks. BMC Bioinformatics, 11, S19.

Yip et al. (2010) Improved reconstruction of in silico gene regulatory networks by integrating knockout and perturbation data. PLoS One, 5.

Yu et al. (2004) Advances to Bayesian network inference for generating causal networks from observational biological data. Bioinformatics, 20, 3594-3603.

Author notes

Associate Editor: John Quackenbush

© The Author 2011. Published by Oxford University Press. All rights reserved. For Permissions, please email: journals.permissions@oup.com
