Robust data-driven incorporation of prior knowledge into the inference of dynamic regulatory networks

Alex Greenfield et al. Bioinformatics. 2013.

Abstract

Motivation: Inferring global regulatory networks (GRNs) from genome-wide data is a computational challenge central to the field of systems biology. Although the primary data currently used to infer GRNs consist of gene expression and proteomics measurements, there is a growing abundance of alternate data types that can reveal regulatory interactions, e.g. ChIP-Chip, literature-derived interactions, protein-protein interactions. GRN inference requires the development of integrative methods capable of using these alternate data as priors on the GRN structure. Each source of structure priors has its unique biases and inherent potential errors; thus, GRN methods using these data must be robust to noisy inputs.

Results: We developed two methods for incorporating structure priors into GRN inference. Both methods [Modified Elastic Net (MEN) and Bayesian Best Subset Regression (BBSR)] extend the previously described Inferelator framework, enabling the use of prior information. We test our methods on one synthetic and two bacterial datasets, and show that both MEN and BBSR infer accurate GRNs even when the structure prior used has significant amounts of error (>90% erroneous interactions). We find that BBSR outperforms MEN at inferring GRNs from expression data and noisy structure priors.

Availability and implementation: Code, datasets and networks presented in this article are available at http://bonneaulab.bio.nyu.edu/software.html.

Supplementary information: Supplementary data are available at Bioinformatics online.

Figures

Fig. 1.

Method flow chart. Our method takes as input an expression dataset. To build a mechanistic model of gene expression, we create time-lagged response and design variables, such that the expression of the TF is time-lagged with respect to the expression of the target. We then resample the response and design matrices, running model selection (using either MEN or BBSR) for each resample. This generates an ensemble of networks, which we rank combine into one final network
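The flow in Fig. 1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the lag is a fixed number of samples, the resampling is a plain bootstrap over time-point pairs, and `stand_in_scores` (absolute lagged covariance) is only a placeholder for the MEN or BBSR model selection step, which the caption does not specify. Rank combination is taken here to mean averaging each edge's rank across resamples.

```python
import random


def time_lagged_pairs(series, lag=1):
    """Pair each design row (time t) with its response row (time t + lag).

    series: list of dicts mapping gene name -> expression, ordered in time.
    """
    if lag < 1 or lag >= len(series):
        raise ValueError("lag must be between 1 and len(series) - 1")
    return series[:-lag], series[lag:]


def stand_in_scores(design, response, tfs, targets):
    """Hypothetical placeholder for MEN/BBSR: absolute lagged covariance."""
    n = len(design)
    scores = {}
    for tf in tfs:
        mean_x = sum(row[tf] for row in design) / n
        for tg in targets:
            if tg == tf:
                continue
            mean_y = sum(row[tg] for row in response) / n
            cov = sum((d[tf] - mean_x) * (r[tg] - mean_y)
                      for d, r in zip(design, response)) / n
            scores[(tf, tg)] = abs(cov)
    return scores


def resample(design, response, rng):
    """Bootstrap row indices with replacement, keeping pairs aligned."""
    n = len(design)
    idx = [rng.randrange(n) for _ in range(n)]
    return [design[i] for i in idx], [response[i] for i in idx]


def rank_combine(score_tables):
    """Order edges by their mean rank across resamples (rank 1 = best).

    Assumes every table scores the same set of edges.
    """
    totals = {}
    for scores in score_tables:
        ordered = sorted(scores, key=scores.get, reverse=True)
        for rank, edge in enumerate(ordered, start=1):
            totals[edge] = totals.get(edge, 0) + rank
    return sorted(totals, key=lambda e: totals[e] / len(score_tables))


def infer_network(series, tfs, targets, n_resamples=20, seed=0):
    """Run the resample -> score -> rank-combine loop on one time series."""
    rng = random.Random(seed)
    design, response = time_lagged_pairs(series)
    tables = []
    for _ in range(n_resamples):
        d, r = resample(design, response, rng)
        tables.append(stand_in_scores(d, r, tfs, targets))
    return rank_combine(tables)
```

Calling `infer_network(series, tfs, targets)` returns TF-target edges ordered from most to least supported. Swapping `stand_in_scores` for a real model selection routine leaves the rest of the loop unchanged.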

Fig. 2.

Effect of weight parameter on performance. We use all GSIs as the set of PKIs, and evaluate performance (in terms of AUPR) against the set of GSIs. We evaluate this performance for a variety of choices of the weight parameter for both methods

Fig. 3.

Incorporation of prior interactions is data driven. For all three datasets, we used all GSIs as PKIs. Here, we display the distribution of time-lagged correlation of predicted TF-target pairs at a recall level of formula image (higher ranked, blue), and low ranked interactions that are in the gold standard (lower ranked, red). Note that high ranked interactions are less likely to have low absolute time-lagged correlation, and the low ranked GSIs are centred around 0
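As a rough illustration of the quantity plotted in Fig. 3, the sketch below computes a time-lagged correlation between a TF and a target. It assumes evenly spaced samples, a fixed integer lag, and a plain Pearson correlation; the paper's exact definition may differ, and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


def time_lagged_correlation(tf, target, lag=1):
    """Correlate TF expression at time t with target expression at t + lag."""
    if lag < 1 or lag >= len(tf) or len(tf) != len(target):
        raise ValueError("need equal-length series and 1 <= lag < length")
    return pearson(tf[:-lag], target[lag:])
```

A value near 0 means the expression data give little support to the edge, which matches the caption's observation that low-ranked gold-standard interactions are centred around 0.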

Fig. 4.

Performance change on the leave-out set. PKIs were sampled randomly from 20%, 40%, 60% and 80% of the GSIs in five repetitions. We define the leave-out set as the set of GSIs that are not PKIs. Here, we compare the AUPR of the leave-out set when using PKIs (_y_-axis) to the AUPR when not using PKIs (_x_-axis). Points above the line indicate a performance increase when PKIs are used
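The leave-out evaluation in Fig. 4 can be sketched as follows, reading GSIs as the gold-standard interactions and PKIs as the interactions supplied as priors. Two assumptions are made here that the caption does not confirm: AUPR is approximated by average precision (mean precision at each rank where a positive is recovered), and PKIs are dropped from the ranked list before scoring so that only the leave-out set is assessed. The function names are hypothetical.

```python
def aupr(ranked_edges, positives):
    """Average precision of a ranked edge list against a set of positives."""
    positives = set(positives)
    if not positives:
        raise ValueError("need at least one positive edge")
    hits = 0
    total = 0.0
    for k, edge in enumerate(ranked_edges, start=1):
        if edge in positives:
            hits += 1
            total += hits / k  # precision at this recall step
    return total / len(positives)


def leave_out_aupr(ranked_edges, gold_standard, priors):
    """Score only gold-standard edges that were not given as priors."""
    priors = set(priors)
    leave_out = set(gold_standard) - priors
    remaining = [e for e in ranked_edges if e not in priors]
    return aupr(remaining, leave_out)
```

Comparing `leave_out_aupr` for a ranking inferred with priors against the same score for a ranking inferred without them gives one point of the scatter described in the caption.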

Fig. 5.

Robustness to incorrect prior information. For each dataset, we considered half of the GSIs as TPIs, and added varying numbers of FPIs that were not GSIs. We show the AUPR of both methods for multiple choices of the respective weight parameters, as well as methods that do not use any PKIs (horizontal lines). Additionally, we show the performance of a naive interaction ranking method, which places all PKIs at the top of the list (gray bars)
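The naive baseline in Fig. 5 and the notion of prior error can be sketched as below, reading TPIs and FPIs as the true and false prior interactions. The caption says only that all PKIs are placed at the top of the list; ordering edges by a data-driven score within each group is an assumption of this sketch, and the function names are hypothetical.

```python
def naive_ranking(scores, priors):
    """Place every prior edge first, then all other edges.

    scores: dict mapping edge -> data-driven score (higher is better).
    Within each group, edges are ordered by score (an assumption).
    """
    priors = set(priors)
    by_score = sorted(scores, key=scores.get, reverse=True)
    top = [e for e in by_score if e in priors]
    rest = [e for e in by_score if e not in priors]
    return top + rest


def prior_error_rate(n_true, n_false):
    """Fraction of prior interactions that are not in the gold standard."""
    if n_true + n_false == 0:
        raise ValueError("prior set is empty")
    return n_false / (n_true + n_false)
```

For example, a prior built from 10 TPIs and 100 FPIs has an error rate of 100/110, about 91%, which falls in the ">90% erroneous interactions" regime mentioned in the abstract. A naive ranking trusts all of those edges equally, which is why it serves as a reference point for methods that weigh priors against the expression data.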
