Using Bayesian network inference algorithms to recover molecular genetic regulatory networks

Recent advances in high-throughput molecular biology have motivated the use of network inference algorithms in bioinformatics to predict causal models of molecular networks from correlational data. However, it is extremely difficult to evaluate the effectiveness of these algorithms because we possess neither knowledge of the correct biological networks nor the ability to experimentally validate the hundreds of predicted gene interactions within a reasonable amount of time. Here, we apply a new approach developed by Smith et al. (2002) that tests the ability of network inference algorithms to accurately and efficiently recover network structures from gene expression data sampled from a simulated biological pathway whose structure is known a priori. We simulated a genetic regulatory network and used the resulting sampled data to test variations in the design of a Bayesian network inference algorithm, as well as variations in the total quantity of available data, the length of the sampling interval, the method of data discretization, and the presence of interpolated data between observed data points. We also extended the inference algorithm with a heuristic influence score that infers the strength and sign of regulation (up or down) between genes. In these experiments, our inference algorithm worked best when presented with data discretized into three categories, when using a greedy search algorithm with random restarts, and when evaluating networks using the BDe scoring metric. Under these conditions, the algorithm was both accurate and efficient in recovering the simulated molecular network when the sampled data sets were large. With smaller, more biologically realistic amounts of sampled data, the algorithm worked best when interpolated data were included, but it had difficulty recovering relationships for genes with more than one regulatory influence.
These results suggest that network inference algorithms and sampling methods must be carefully designed and tested before they can be used to recover biological genetic pathways, especially in the context of highly limited quantities of data.
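
To make two of the preprocessing choices above concrete, the following is a minimal sketch of inserting interpolated points between observed samples and then discretizing an expression time series into three categories. The abstract does not say which interpolation or discretization methods were used, so linear interpolation and tercile cut points are assumptions made here for illustration only, and the names `interpolate` and `discretize3` are hypothetical.

```python
from typing import List, Sequence


def interpolate(series: Sequence[float], extra: int) -> List[float]:
    """Insert `extra` linearly interpolated points between each observed pair.

    Assumption: linear interpolation; the source does not name a method.
    """
    if not series:
        return []
    out: List[float] = []
    for a, b in zip(series, series[1:]):
        out.append(a)
        for k in range(1, extra + 1):
            out.append(a + (b - a) * k / (extra + 1))
    out.append(series[-1])
    return out


def discretize3(series: Sequence[float]) -> List[int]:
    """Map each value to 0 (low), 1 (middle) or 2 (high).

    Assumption: cut points are taken at the terciles of the sorted values.
    """
    if not series:
        return []
    ranked = sorted(series)
    n = len(ranked)
    lo_cut = ranked[n // 3]
    hi_cut = ranked[(2 * n) // 3]
    return [0 if x < lo_cut else (1 if x < hi_cut else 2) for x in series]


# Toy expression values for one gene at four observed time points.
observed = [0.0, 3.0, 6.0, 3.0]
dense = interpolate(observed, extra=2)  # two interpolated points per interval
levels = discretize3(dense)
```

Here `levels` is the kind of three-category sequence that a discrete Bayesian network scoring metric could consume. Other discretization schemes, such as fixed-width intervals, would be equally reasonable choices under the same interface.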