Bayesian Models for Integrative Genomics

Advances in Statistical Bioinformatics: Models and Integrative Inference for High-Throughput Data, Kim-Anh Do, Zhaohui Steve Qin and Marina Vannucci (Eds.). Cambridge University Press, pp. 272-291.

Bayesian methods have found many successful applications in genomics. Methods that employ variable selection have been particularly successful, as they can handle situations where the number of measured variables is much greater than the number of observations. Here we describe Bayesian variable selection models for integrative genomics. We first consider models that incorporate external biological information into the analysis of experimental data, in particular gene expression data. We address linear settings, including regression and classification models, and mixture models, including clustering and discriminant analysis. We then focus on Bayesian models that achieve an even deeper level of integration by combining experimental data from different platforms with prior knowledge. In particular, we look at graphical models that integrate gene expression data with microRNA expression data. All of the modeling settings we consider exploit variable selection techniques and use prior constructions that incorporate biological knowledge about structural dependencies among the variables.
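
One common way to encode structural dependencies among the variables in the prior is a Markov random field (Ising-type) prior on the binary selection indicators gamma, in which a known gene network G raises the prior probability of selecting a gene whose neighbors are already selected. The abstract does not state the exact form used; the sketch below assumes log p(gamma) = d * sum(gamma) + e * gamma' G gamma (up to a constant), and the names `mrf_log_prior`, `conditional_inclusion_prob`, the values of d and e, and the toy three-gene network are all hypothetical.

```python
import numpy as np

def mrf_log_prior(gamma, G, d, e):
    """Unnormalized log prior of a 0/1 selection vector under an assumed Ising-type MRF prior:
    log p(gamma) = d * sum(gamma) + e * gamma' G gamma  (+ constant)."""
    gamma = np.asarray(gamma, dtype=float)
    return d * gamma.sum() + e * gamma @ G @ gamma

def conditional_inclusion_prob(gamma, j, G, d, e):
    """P(gamma_j = 1 | all other gamma_k), computed from the log-prior difference."""
    g1 = np.array(gamma, dtype=float)
    g1[j] = 1.0
    g0 = g1.copy()
    g0[j] = 0.0
    delta = mrf_log_prior(g1, G, d, e) - mrf_log_prior(g0, G, d, e)
    return 1.0 / (1.0 + np.exp(-delta))

# Hypothetical 3-gene pathway 0 - 1 - 2 (symmetric adjacency matrix, zero diagonal).
G = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
d, e = -2.0, 0.5  # d < 0 favors sparse models; e > 0 favors selecting neighbors together

p_with_neighbor = conditional_inclusion_prob([1, 1, 0], 2, G, d, e)     # gene 1 selected
p_without_neighbor = conditional_inclusion_prob([1, 0, 0], 2, G, d, e)  # gene 1 not selected
```

Because G is symmetric, each selected neighbor adds 2e to the log odds of inclusion, so in this toy setting gene 2's prior inclusion probability rises from about 0.12 to about 0.27 once gene 1 is selected.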

Bayesian Models for Variable Selection that Incorporate Biological Information (with discussion)

Variable selection has been the focus of much research in recent years. Bayesian methods have found many successful applications, particularly in situations where the number of measured variables can be much greater than the number of observations. One such example is the analysis of genomic data. In this paper we first review Bayesian variable selection methods for linear settings, including regression and classification models. We focus in particular on recent prior constructions that have been used for the analysis of genomic data and briefly describe two novel applications that integrate different sources of biological information into the analysis of experimental data. Next, we address variable selection in a different modeling context, namely mixture models. We address both clustering and discriminant analysis settings and conclude with an application to gene expression data from patients affected by leukemia.
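
Bayesian variable selection in the linear setting rests on a binary vector gamma that indicates which predictors enter the model, explored by a stochastic search. As one concrete instance, assumed here for illustration, the sketch below pairs Zellner's g-prior (using its standard closed-form marginal likelihood with a flat prior on the intercept and p(sigma^2) proportional to 1/sigma^2) with an independent Bernoulli(w) prior on gamma, and explores the model space with a Metropolis add/delete move. This is not necessarily the prior or sampler reviewed in the paper (stochastic search variable selection in the sense of George and McCulloch uses a continuous mixture prior instead); the function names, g = n, w = 0.1 and the simulated data are hypothetical.

```python
import numpy as np

def log_marginal_g_prior(y, X, gamma, g):
    """Log marginal likelihood (up to a constant) of model gamma under Zellner's g-prior:
    (n-1-p_gamma)/2 * log(1+g) - (n-1)/2 * log(1 + g*(1 - R^2_gamma))."""
    n = len(y)
    yc = y - y.mean()
    idx = np.flatnonzero(gamma)
    if idx.size == 0:
        r2 = 0.0
    else:
        Xg = X[:, idx] - X[:, idx].mean(axis=0)       # center selected predictors
        beta, *_ = np.linalg.lstsq(Xg, yc, rcond=None)
        resid = yc - Xg @ beta
        r2 = 1.0 - (resid @ resid) / (yc @ yc)
    return 0.5 * (n - 1 - idx.size) * np.log1p(g) - 0.5 * (n - 1) * np.log1p(g * (1.0 - r2))

def stochastic_search(y, X, n_iter=2000, burn_in=500, g=None, w=0.1, seed=0):
    """Metropolis add/delete search over gamma; returns posterior inclusion probabilities."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    g = float(n) if g is None else g

    def log_post(gm):
        k = gm.sum()  # model size; Bernoulli(w) prior on each indicator
        return log_marginal_g_prior(y, X, gm, g) + k * np.log(w) + (p - k) * np.log(1.0 - w)

    gamma = np.zeros(p, dtype=int)
    current = log_post(gamma)
    counts = np.zeros(p)
    for it in range(n_iter):
        j = rng.integers(p)            # pick one predictor uniformly at random
        proposal = gamma.copy()
        proposal[j] = 1 - proposal[j]  # add it if absent, delete it if present (symmetric move)
        cand = log_post(proposal)
        if np.log(rng.random()) < cand - current:
            gamma, current = proposal, cand
        if it >= burn_in:
            counts += gamma
    return counts / (n_iter - burn_in)

# Simulated data: only columns 0 and 3 affect the response.
rng = np.random.default_rng(1)
X = rng.standard_normal((100, 10))
y = 2.0 * X[:, 0] - 1.5 * X[:, 3] + rng.standard_normal(100)
ppi = stochastic_search(y, X)
```

Posterior inclusion probabilities are estimated as the fraction of post-burn-in iterations in which each indicator equals 1; in this simulation the two true predictors should be selected in nearly every iteration, while the noise columns should rarely be.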

INCORPORATING BIOLOGICAL INFORMATION INTO LINEAR MODELS: A BAYESIAN APPROACH TO THE SELECTION OF PATHWAYS AND GENES

2011

The vast amount of biological knowledge accumulated over the years has allowed researchers to identify various biochemical interactions and define different families of pathways. There is increasing interest in identifying pathways and pathway elements involved in particular biological processes. Drug discovery efforts, for example, focus on identifying biomarkers as well as pathways related to a disease. We propose a Bayesian model that addresses this question by incorporating information on pathways and gene networks into the analysis of DNA microarray data. This information is used to define pathway summaries, specify prior distributions, and structure the MCMC moves used to fit the model. We illustrate the method with an application to gene expression data with censored survival outcomes. In addition to identifying markers that would otherwise have been missed and improving prediction accuracy, integrating existing biological knowledge into the analysis provides a better understanding of the underlying molecular processes.
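
The abstract says pathway information is used to define pathway summaries but does not say how they are computed. A common choice, assumed here purely for illustration, is the first principal component score of the standardized expression of each pathway's member genes; such summaries can then enter the model alongside individual genes. The function `pathway_summaries`, the pathway dictionary and the simulated data are hypothetical.

```python
import numpy as np

def pathway_summaries(expr, pathways):
    """Summarize each pathway by the first principal component score of its genes.

    expr     : (n_samples, n_genes) expression matrix
    pathways : dict mapping pathway name -> list of gene column indices
    Returns a dict mapping pathway name -> (n_samples,) score vector.
    The sign of a principal component is arbitrary.
    """
    Z = (expr - expr.mean(axis=0)) / expr.std(axis=0, ddof=1)  # standardize each gene
    summaries = {}
    for name, genes in pathways.items():
        sub = Z[:, genes]
        U, s, _ = np.linalg.svd(sub, full_matrices=False)
        summaries[name] = U[:, 0] * s[0]  # projection of the samples onto the first PC
    return summaries

# Simulated data: genes 0-4 share a latent pathway activity, genes 5-9 are pure noise.
rng = np.random.default_rng(0)
activity = rng.standard_normal(50)
expr = rng.standard_normal((50, 10))
expr[:, :5] = activity[:, None] + 0.3 * rng.standard_normal((50, 5))
summ = pathway_summaries(expr, {"A": [0, 1, 2, 3, 4], "B": [5, 6, 7, 8, 9]})
```

Standardizing first keeps a single high-variance gene from dominating the summary; for pathway "A" the score tracks the shared latent activity closely, up to sign.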

A Bayesian Graphical Modeling Approach to MicroRNA Regulatory Network Inference

In this project I construct a statistical procedure to infer a biological network of very high dimensionality, in which microRNAs (small RNAs) are thought to down-regulate mRNAs, also called target genes. From a statistical point of view, I address this problem by building a network that represents the biological regulatory system, indicating which microRNA regulates which gene. In particular, the method provides a novel graphical modeling approach that includes constraints on the regression coefficients to account for the down-regulatory effect of the network. This approach can select individual connections in the network, unlike previous methods in the Bayesian variable selection literature, which only allow the selection of covariates (microRNAs) that affect either all of the genes or none of them. The main challenge of this project is the dimensionality of the data. The stochastic search variable selection algorithm efficiently explores the space of all possible networks and finds, for each gene, which microRNAs have high posterior probability of down-regulating it. To aid selection, I also propose a new prior formulation that integrates different sources of data by exploiting information from sequence and structure analyses. Because many sources of information are integrated, the model can also determine which information is consistent with the data via posterior inference on the parameters defined in the data integration prior. The proposed method is general and can easily be applied to other types of network inference that integrate multiple data sources.
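
Two ingredients of the microRNA model can be sketched directly: a data-integration prior that turns external scores for each gene-microRNA pair into a prior edge-inclusion probability, and a sign constraint under which a selected edge always carries a non-positive regression coefficient (down-regulation). The abstract does not give the functional forms; the logistic link, the negative half-normal slab, and all names and values below (`edge_prior_probs`, `eta0`, `eta`, `tau`, the toy scores) are assumptions for illustration. Because the indicators are indexed by gene and microRNA, a microRNA can be linked to some genes and not to others, which is the edge-level selection described above.

```python
import numpy as np

def edge_prior_probs(scores, eta0, eta):
    """Prior probability that microRNA m regulates gene g (assumed logistic link).

    scores : (n_genes, n_mirnas, K) external scores per pair, e.g. sequence- and
             structure-based target predictions
    eta0   : intercept controlling baseline sparsity
    eta    : (K,) weights; posterior inference on these would indicate which
             data source is consistent with the expression data
    """
    linear = eta0 + scores @ eta          # shape (n_genes, n_mirnas)
    return 1.0 / (1.0 + np.exp(-linear))

def log_network_prior(gamma, probs):
    """Log prior of a 0/1 edge matrix gamma with independent Bernoulli edges."""
    return np.sum(gamma * np.log(probs) + (1 - gamma) * np.log1p(-probs))

def draw_downregulating_coefs(gamma, tau, rng):
    """Coefficient prior draw: 0 for absent edges, negative half-normal(tau) otherwise."""
    slab = -np.abs(rng.normal(0.0, tau, size=gamma.shape))
    return np.where(gamma == 1, slab, 0.0)

# Toy example: 4 genes, 3 microRNAs, K = 2 hypothetical score sources.
rng = np.random.default_rng(0)
scores = rng.uniform(0.0, 1.0, size=(4, 3, 2))
probs = edge_prior_probs(scores, eta0=-2.0, eta=np.array([3.0, 0.5]))
gamma = (rng.uniform(size=probs.shape) < probs).astype(int)  # one prior draw of the network
beta = draw_downregulating_coefs(gamma, tau=1.0, rng=rng)
```

In a full sampler these pieces would enter the posterior for each gene's regression on the microRNAs; the sketch only shows how external scores shape the prior and how the sign constraint restricts the coefficients.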
