Modeling cumulative biological phenomena with Suppes-Bayes causal networks

Quantifying cancer progression with conjunctive Bayesian networks

Bioinformatics, 2009

MOTIVATION: Cancer is an evolutionary process characterized by accumulating mutations. However, the precise timing and the order of genetic alterations that drive tumor progression remain enigmatic. RESULTS: We present a specific probabilistic graphical model for the accumulation of mutations and their interdependencies. The Bayesian network models cancer progression by an explicit unobservable accumulation process in time that is separated from the observable but error-prone detection of mutations. Model parameters are estimated by an Expectation-Maximization algorithm and the underlying interaction graph is obtained by a simulated annealing procedure. Applying this method to cytogenetic data for different cancer types, we find multiple complex oncogenetic pathways deviating substantially from simplified models, such as linear pathways or trees. We further demonstrate how the inferred progression dynamics can be used to improve genetics-based survival predictions, which could support diagnostics and prognosis. AVAILABILITY: The software package ct-cbn is available under a GPL license on the web site cbg.ethz.ch/software/ct-cbn.
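The conjunctive constraint behind this class of models can be illustrated with a small sketch: a mutation may appear in a genotype only if all of its parent mutations are already present. The event names A, B, C and the helper functions below are hypothetical and are not taken from ct-cbn; the sketch covers only the noise-free compatibility check, not the timing model, the EM estimation or the simulated annealing search.

```python
from itertools import combinations

def is_compatible(genotype, parents):
    """Return True if every mutated event has all of its parents mutated too.

    genotype: set of event names that have occurred.
    parents:  dict mapping each event to the set of events it requires
              (hypothetical toy structure, not the ct-cbn input format).
    """
    return all(parents.get(event, set()) <= genotype for event in genotype)

def compatible_genotypes(parents):
    """Enumerate all genotypes allowed by the conjunctive constraints."""
    events = sorted(parents)
    allowed = []
    for size in range(len(events) + 1):
        for combo in combinations(events, size):
            if is_compatible(set(combo), parents):
                allowed.append(set(combo))
    return allowed

# Toy example: C can only occur after both A and B have occurred.
toy_parents = {"A": set(), "B": set(), "C": {"A", "B"}}
allowed = compatible_genotypes(toy_parents)
```

In this toy structure the genotype {A, C} is excluded because C lacks its parent B, whereas a tree or linear model could not express the requirement that both A and B precede C.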

Inferring Tree Causal Models of Cancer Progression with Probability Raising

PLoS ONE, 2014

Existing techniques to reconstruct tree models of progression for accumulative processes, such as cancer, seek to estimate causation by combining correlation and a frequentist notion of temporal priority. In this paper, we define a novel theoretical framework called CAPRESE (CAncer PRogression Extraction with Single Edges) to reconstruct such models based on the notion of probabilistic causation defined by Suppes. We consider a general reconstruction setting complicated by the presence of noise in the data due to biological variation, as well as experimental or measurement errors. To improve tolerance to noise we define and use a shrinkage-like estimator. We prove the correctness of our algorithm by showing asymptotic convergence to the correct tree under mild constraints on the level of noise. Moreover, on synthetic data, we show that our approach outperforms the state-of-the-art, that it is efficient even with a relatively small number of samples and that its performance quickly converges to its asymptote as the number of samples increases. For real cancer datasets obtained with different technologies, we highlight biologically significant differences in the progressions inferred with respect to other competing techniques and we also show how to validate conjectured biological relations with progression models.
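As a rough illustration of the two ingredients named in this abstract, the sketch below checks temporal priority (the candidate cause is observed more often than the effect) and probability raising (the effect is more likely when the cause is present than when it is absent) on a binary sample-by-event matrix. It assumes noise-free data and plain frequency estimates; the function name and the toy matrix are hypothetical, and the shrinkage-like estimator and the tree reconstruction of CAPRESE are not reproduced here.

```python
import numpy as np

def suppes_conditions(data, a, b):
    """Check Suppes-style conditions for a candidate edge a -> b.

    data: 2-D binary array, rows are samples, columns are events.
    a, b: column indices of the candidate cause and effect.
    Returns (temporal_priority, probability_raising) as booleans.
    """
    cause = data[:, a].astype(bool)
    effect = data[:, b].astype(bool)

    # Temporal priority, estimated from marginal frequencies: P(a) > P(b).
    temporal_priority = cause.mean() > effect.mean()

    # Probability raising: P(b | a) > P(b | not a).
    # If either conditioning set is empty the condition is left as False.
    if cause.any() and (~cause).any():
        probability_raising = effect[cause].mean() > effect[~cause].mean()
    else:
        probability_raising = False

    return bool(temporal_priority), bool(probability_raising)

# Toy data (made up for illustration): event 1 only occurs together with event 0.
toy = np.array([[1, 1], [1, 1], [1, 0], [1, 0],
                [1, 1], [0, 0], [0, 0], [0, 0]])
forward = suppes_conditions(toy, 0, 1)
backward = suppes_conditions(toy, 1, 0)
```

On this toy matrix both conditions hold for the edge 0 -> 1, while the reverse edge fails temporal priority even though it still raises probability, which shows why the two conditions are used together.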

Inferring causal models of cancer progression with a shrinkage estimator and probability raising

Existing techniques to reconstruct tree models of progression for accumulative processes, such as cancer, seek to estimate causation by combining correlation and a frequentist notion of temporal priority. In this paper, we define a novel theoretical framework called CAPRESE (CAncer PRogression Extraction with Single Edges) to reconstruct such models based on the notion of probabilistic causation defined by Suppes. We consider a general reconstruction setting complicated by the presence of noise in the data due to biological variation, as well as experimental or measurement errors. To improve tolerance to noise we define and use a shrinkage-like estimator. We prove the correctness of our algorithm by showing asymptotic convergence to the correct tree under mild constraints on the level of noise. Moreover, on synthetic data, we show that our approach outperforms the state-of-the-art, that it is efficient even with a relatively small number of samples and that its performance quickly converges to its asymptote as the number of samples increases. For real cancer datasets obtained with different technologies, we highlight biologically significant differences in the progressions inferred with respect to other competing techniques and we also show how to validate conjectured biological relations with progression models.

Application of Bayesian networks for inferring cause–effect relations from gene expression profiles of cancer versus normal cells

Mathematical Biosciences, 2007

The paper is devoted to two questions: whether distinction of causes versus effects of neoplasia leaves a signature in the cancer versus normal gene expression profiles and whether roles of genes, "causes" or "effects", can be inferred from repeated measurements of gene expressions. We model joint probability distributions of logarithms of gene expressions with the use of Bayesian networks (BN). Fitting our models to real data confirms that our BN models have the ability to explain some aspects of observational evidence from DNA microarray experiments. Effects of neoplastic transformation are well seen among genes with the highest power to differentiate between normal and cancer cells. Likelihoods of BNs depend on the biological role of selected genes, defined by Gene Ontology. Also, predictions of our BN models are coherent with the set of putative causes and effects constructed based on our data set of papillary thyroid cancer.

Tumor-specific Causal Inference (TCI): A Bayesian Method for Identifying Causative Genome Alterations within Individual Tumors

Precision medicine for cancer involves identifying and targeting the somatic genome alterations (SGAs) that drive the development of an individual tumor. Much of current efforts at finding driver SGAs have involved identifying the genes that are mutated more frequently than expected among a collection of tumors. When these population-derived driver genes are altered (perhaps in particular ways) in a given tumor, they are posited as driver genes for that tumor. In this technical report, we introduce an alternative approach for identifying causative SGAs, also known as “drivers”, by inferring causal relationships between SGAs and molecular phenotypes at the individual tumor level. Our tumor-specific causal inference (TCI) algorithm uses a Bayesian method to identify the SGAs in a given tumor that have a high probability of regulating transcriptomic changes observed in that specific tumor. Thus, the method is focused on identifying the tumor specific SGAs that are causing expression ch...

Efficient inference of cancer progression models

2014

We devise a novel inference algorithm to effectively solve the cancer progression model reconstruction problem. Our empirical analysis of the accuracy and convergence rate of our algorithm, CAncer PRogression Inference (CAPRI), shows that it outperforms the state-of-the-art algorithms addressing similar problems.

Restricted-derestricted dynamic Bayesian Network inference of transcriptional regulatory relationships among genes in cancer

Understanding transcriptional regulatory relationships among genes is important for gaining etiological insights into diseases such as cancer. To this end, high-throughput biological data have been generated through advancements in a variety of technologies. These rely on computational approaches to discover underlying structures in such data. Among these computational approaches, Bayesian networks (BNs) stand out because their probabilistic nature enables them to manage randomness in the dynamics of gene regulation and experimental data. Feedback loops inherent in networks of regulatory relationships are more tractable when enhancements to BNs are applied to them. Here, we propose Restricted-Derestricted dynamic BNs with a novel search technique, Restricted-Derestricted Greedy Method, for such tasks. This approach relies on the Restricted-Derestricted Greedy search technique to infer transcriptional regulatory networks in two phases: restricted inference and derestricted inference. An application of this approach to real data sets reveals that it performs favourably compared to other well-performing dynamic BN approaches in terms of recovering true relationships among genes. In addition, it provides a balance between searching for optimal networks and keeping biologically relevant regulatory interactions among variables.

CAPRI: Efficient Inference of Cancer Progression Models from Cross-sectional Data

Bioinformatics, 2014

We devise a novel inference algorithm to effectively solve the cancer progression model reconstruction problem. Our empirical analysis of the accuracy and convergence rate of our algorithm, CAncer PRogression Inference (CAPRI), shows that it outperforms the state-of-the-art algorithms addressing similar problems. Motivation: Several cancer-related genomic data have become available (e.g., The Cancer Genome Atlas, TCGA) typically involving hundreds of patients. At present, most of these data are aggregated in a cross-sectional fashion providing all measurements at the time of diagnosis. Our goal is to infer cancer “progression” models from such data. These models are represented as directed acyclic graphs (DAGs) of collections of “selectivity” relations, where a mutation in a gene A “selects” for a later mutation in a gene B. Gaining insight into the structure of such progressions has the potential to improve both the stratification of patients and personalized therapy choices. Results: The CAPRI algorithm relies on a scoring method based on a probabilistic theory developed by Suppes, coupled with bootstrap and maximum likelihood inference. The resulting algorithm is efficient, achieves high accuracy, and has good complexity, also in terms of convergence properties. CAPRI performs especially well in the presence of noise in the data, and with limited sample sizes. Moreover, CAPRI, in contrast to other approaches, robustly reconstructs different types of confluent trajectories despite irregularities in the data. We also report on an ongoing investigation using CAPRI to study atypical Chronic Myeloid Leukemia, in which we uncovered nontrivial selectivity relations and exclusivity patterns among key genomic events.

Integrative Bayesian Network Analysis of Genomic Data

Rapid development of genome-wide profiling technologies has made it possible to conduct integrative analysis on genomic data from multiple platforms. In this study, we develop a novel integrative Bayesian network approach to investigate the relationships between genetic and epigenetic alterations as well as how these mutations affect a patient’s clinical outcome. We take a Bayesian network approach that admits a convenient decomposition of the joint distribution into local distributions. Exploiting the prior biological knowledge about regulatory mechanisms, we model each local distribution as linear regressions. This allows us to analyze multi-platform genome-wide data in a computationally efficient manner. We illustrate the performance of our approach through simulation studies. Our methods are motivated by and applied to a multi-platform glioblastoma dataset, from which we reveal several biologically relevant relationships that have been validated in the literature as well as new genes that could potentially be novel biomarkers for cancer progression.
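The decomposition mentioned in this abstract can be written out as follows. The factorization over parent sets is the standard Bayesian network identity; the Gaussian noise term and the symbols $\beta_{jk}$ and $\sigma_j^2$ are assumptions made here for illustration, since the abstract states only that each local distribution is modeled as a linear regression.

```latex
% Joint distribution factorized over the network's parent sets pa(j)
p(x_1, \dots, x_p) \;=\; \prod_{j=1}^{p} p\bigl(x_j \mid x_{\mathrm{pa}(j)}\bigr)

% One possible local model (assumed Gaussian linear regression)
x_j \mid x_{\mathrm{pa}(j)} \;\sim\; \mathcal{N}\Bigl(\beta_{j0} + \sum_{k \in \mathrm{pa}(j)} \beta_{jk}\, x_k,\; \sigma_j^2\Bigr)
```

Under this reading, each node is fitted by a separate regression on its parents, which is what makes the analysis of multi-platform data computationally manageable.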

Bayesian Pathway Analysis of Cancer Microarray Data

PLoS ONE, 2014

High Throughput Biological Data (HTBD) requires detailed analysis methods and, from a life science perspective, these analysis results make most sense when interpreted within the context of biological pathways. Bayesian Networks (BNs) capture both linear and nonlinear interactions and handle stochastic events in a probabilistic framework accounting for noise, making them viable candidates for HTBD analysis. We have recently proposed an approach, called Bayesian Pathway Analysis (BPA), for analyzing HTBD using BNs in which known biological pathways are modeled as BNs and pathways that best explain the given HTBD are found. BPA uses the fold change information to obtain an input matrix to score each pathway modeled as a BN. Scoring is achieved using the Bayesian-Dirichlet Equivalent method and significance is assessed by randomization via bootstrapping of the columns of the input matrix. In this study, we improve on the BPA system by optimizing the steps involved in "Data Preprocessing and Discretization", "Scoring", "Significance Assessment", and "Software and Web Application". We tested the improved system on synthetic data sets and achieved over 98% accuracy in identifying the active pathways. The overall approach was applied to real cancer microarray data sets in order to investigate the pathways that are commonly active in different cancer types. We compared our findings on the real data sets with a relevant approach called the Signaling Pathway Impact Analysis (SPIA).