Non-linearity of Metabolic Pathways Critically Influences the Choice of Machine Learning Model

Machine learning methods for metabolic pathway prediction

BMC Bioinformatics, 2010

A key challenge in systems biology is the reconstruction of an organism's metabolic network from its genome sequence. One strategy for addressing this problem is to predict which metabolic pathways, from a reference database of known pathways, are present in the organism, based on the annotated genome of the organism.

Stochastic simulation and modelling of metabolic networks in a machine learning framework

Simulation Modelling Practice and Theory, 2011

Metabolomics is becoming an increasingly important field. The fundamental task in this area is to measure and interpret complex time- and condition-dependent parameters, such as the activity or flux of metabolites and their concentration in cells, tissue elements and other biosamples. The careful study of all these elements has led to important insights into the functioning of metabolism. Recently, however, there has been growing interest in an integrated approach to studying biological systems. This is the main goal of Systems Biology, where a combined investigation of several components of a biological system is thought to produce a thorough understanding of such systems. Biological circuits are complex to model and simulate, and many efforts are being made to develop models that can handle their intrinsic complexity. A significant part of biological networks still remains unknown, even though recent technological developments allow simultaneous acquisition of many metabolite measurements. Metabolic networks are not only structurally complex but also behave in a stochastic fashion. Therefore, it is necessary to express structure and handle uncertainty in order to capture the complete dynamics of these networks. In this paper we describe how stochastic modeling and simulation can be performed in a symbolic-statistical machine learning (ML) framework. We show that symbolic ML deals with structural and relational complexity, while statistical ML provides principled approaches to uncertainty modeling. Learning is used to analyze traces of biochemical reactions and to model the dynamics through parameter learning, while inference is used to produce stochastic simulations of the network.
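As a rough illustration of what a stochastic simulation of a reaction looks like, the sketch below uses the standard Gillespie-style sampling scheme for a single irreversible conversion of substrate into product. This is not the symbolic-statistical framework the abstract describes; the function name `simulate_conversion`, the rate constant and the molecule count are all assumptions chosen for the example.

```python
import random


def simulate_conversion(n_substrate, rate_constant, seed=0):
    """Simulate S -> P one reaction event at a time (Gillespie-style sketch).

    Returns a trace of (time, substrate_count, product_count) tuples.
    All parameters are illustrative and not taken from the paper.
    """
    rng = random.Random(seed)
    time, substrate, product = 0.0, n_substrate, 0
    trace = [(time, substrate, product)]
    while substrate > 0:
        # The propensity of the only reaction is proportional to the
        # number of substrate molecules still available.
        propensity = rate_constant * substrate
        # Waiting time to the next event is exponentially distributed.
        time += rng.expovariate(propensity)
        substrate -= 1
        product += 1
        trace.append((time, substrate, product))
    return trace


# Hypothetical run: 20 substrate molecules, rate constant 0.5 per time unit.
trace = simulate_conversion(n_substrate=20, rate_constant=0.5)
for t, s, p in trace[:3]:
    print(f"t={t:.3f}  S={s}  P={p}")
```

Each run with a different seed gives a different trace, which is the kind of event-by-event data that parameter learning could be applied to.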

Evaluating the Potential of Applying Machine Learning Tools to Metabolic Pathway Optimization

2020

Successful engineering of a microbial host for efficient production of a target product from a given substrate can be viewed as an extensive optimization task. Such a task involves the selection of high-activity enzymes as well as their gene expression regulatory control elements (i.e., promoters and ribosome binding sites). Finally, there is also the need to tune the expression of multiple genes along a heterologous pathway to relieve constraints from rate-limiting steps and to help reduce the metabolic burden on cells from unnecessary over-expression of high-activity enzymes. While the aforementioned tasks could be performed through combinatorial experiments, such an approach incurs significant cost, time and effort, a handicap that can be relieved by applying modern machine learning tools. Such tools could attempt to predict high-activity enzymes from sequence, but they are currently most usefully applied in classifying strong promoters from weaker ones as well as combinatoria...

Reconstruction of metabolic pathways by combining probabilistic graphical model-based and knowledge-based methods

BMC proceedings, 2014

Automatic reconstruction of metabolic pathways for an organism from genomics and transcriptomics data has been a challenging and important problem in bioinformatics. Traditionally, known reference pathways can be mapped onto organism-specific ones based on the organism's genome annotation and protein homology. However, this simple knowledge-based mapping method might produce incomplete pathways and generally cannot predict unknown new relations and reactions. In contrast, ab initio metabolic network construction methods can predict novel reactions and interactions, but their accuracy tends to be low, leading to many false positives. Here we combine existing pathway knowledge and a new ab initio Bayesian probabilistic graphical model in a novel fashion to improve automatic reconstruction of metabolic networks. Specifically, we built a knowledge database containing known, individual gene / protein interactions and metabolic reactions extracted from existing reference pathways. Known...

A new algorithm for Predicting Metabolic Pathways

The reconstruction of the metabolic network of an organism based on its genome sequence is a key challenge in systems biology. The aim of the work described here is to develop a new algorithm to predict pathway classes and individual pathways for a previously unknown query molecule. The main idea is to use a dense graph in which compounds are represented as vertices and enzymes as edges, with edge weights assigned according to previously known pathways. A shortest-path algorithm is applied for each missing enzyme in a pathway. A pathway is considered to belong to an organism if the total cost between the initial and final compound is higher than a threshold. Validation experiments show that the suggested algorithm is capable of classifying more than 90% of pathways correctly.
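The graph formulation above can be sketched in a few lines. This is a minimal illustration using Dijkstra's algorithm, not the authors' implementation: the abstract does not say which shortest-path algorithm is used or how the weights are derived, so the compound names, enzyme labels, weights and threshold below are all hypothetical.

```python
import heapq


def shortest_path_cost(graph, source, target):
    """Dijkstra's algorithm on a compound graph.

    `graph` maps a compound to a list of (next_compound, enzyme, weight)
    tuples, so compounds are vertices and enzymes label the edges.
    Returns the minimal total weight, or infinity if no path exists.
    """
    best = {source: 0.0}
    queue = [(0.0, source)]
    while queue:
        cost, compound = heapq.heappop(queue)
        if compound == target:
            return cost
        if cost > best.get(compound, float("inf")):
            continue  # stale queue entry
        for neighbor, _enzyme, weight in graph.get(compound, []):
            new_cost = cost + weight
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(queue, (new_cost, neighbor))
    return float("inf")


# Hypothetical toy network; names and weights are made up for illustration.
graph = {
    "A": [("B", "E1", 1.0)],
    "B": [("C", "E2", 0.5), ("D", "E3", 2.0)],
    "C": [("F", "E4", 1.5)],
    "D": [("F", "E5", 0.2)],
}

cost = shortest_path_cost(graph, "A", "F")
threshold = 2.0  # assumed value
# The comparison direction follows the abstract's wording ("higher than").
pathway_present = cost > threshold
print(cost, pathway_present)
```

In this toy network the route A, B, C, F has total weight 3.0 and the route via D has 3.2, so the reported cost is 3.0.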

Evaluation of regression models in metabolic physiology: predicting fluxes from isotopic data without knowledge of the pathway

Metabolomics, 2006

This study explores the ability of regression models, with no knowledge of the underlying physiology, to estimate physiological parameters relevant for metabolism and endocrinology. Four regression models were compared: multiple linear regression (MLR), principal component regression (PCR), partial least-squares regression (PLS) and regression using artificial neural networks (ANN). The pathway of mammalian gluconeogenesis was analyzed using [U-13C]glucose as the tracer. A set of data was simulated by randomly selecting physiologically appropriate metabolic fluxes for the 9 steps of this pathway as independent variables. The isotope labeling patterns of key intermediates in the pathway were then calculated for each set of fluxes, yielding 29 dependent variables. Two thousand sets were created, allowing independent training and test data. Regression models were asked to predict the nine fluxes, given only the 29 isotopomers. For large training sets (>50) the artificial neural network model was superior, capturing 95% of the variability in the gluconeogenic flux, whereas the three linear models captured only 75%. This reflects the ability of neural networks to capture the inherent non-linearities of the metabolic system. The effects of error in the variables and of the addition of random variables to the data set were considered. Model sensitivities were used to find the isotopomers that most influenced the predicted flux values. These studies provide the first test of multivariate regression models for the analysis of isotopomer flux data. They provide insight for metabolomics and the future of isotopic tracers in metabolic research where the underlying physiology is complex or unknown.
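The gap between linear and non-linear models reported above can be made concrete with a toy calculation. The sketch below is not the study's data or its models: it fits a one-variable least-squares line to a purely quadratic relationship and then repeats the fit with a squared feature, which stands in here for a learner (such as a neural network) that can represent the non-linearity. The fraction of variability captured is taken to mean the coefficient of determination R², which is an assumption about the abstract's wording.

```python
def r_squared_simple_fit(features, targets):
    """Fit y = a + b*x by ordinary least squares and return R^2."""
    n = len(features)
    mean_x = sum(features) / n
    mean_y = sum(targets) / n
    sxx = sum((x - mean_x) ** 2 for x in features)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(features, targets))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual and total sums of squares give the variance captured.
    ss_res = sum((y - (intercept + slope * x)) ** 2
                 for x, y in zip(features, targets))
    ss_tot = sum((y - mean_y) ** 2 for y in targets)
    return 1.0 - ss_res / ss_tot


# Made-up data: the response depends on the square of the input.
xs = [-2.0, -1.0, 0.0, 1.0, 2.0]
ys = [x * x for x in xs]

r2_linear = r_squared_simple_fit(xs, ys)                      # raw input
r2_nonlinear = r_squared_simple_fit([x * x for x in xs], ys)  # squared feature

print(f"linear feature:  R^2 = {r2_linear:.2f}")
print(f"squared feature: R^2 = {r2_nonlinear:.2f}")
```

On this symmetric data the straight line explains none of the variance, while the model given the non-linear feature explains all of it. Real flux data would fall somewhere between these extremes.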

Combining Probabilistic Graphical Model-based and Knowledge-based Methods for Automatic Reconstruction of Metabolic Pathways

Background: Automatic reconstruction of metabolic pathways for an organism from genomics and transcriptomics data has been a challenging and important problem in bioinformatics. Traditionally, known reference pathways can be mapped onto organism-specific ones based on the organism's genome annotation and protein homology. However, this simple knowledge-based mapping method might produce incomplete pathways and generally cannot predict unknown new relations and reactions. In contrast, ab initio metabolic network construction methods can predict novel reactions and interactions, but their accuracy tends to be low, leading to many false positives. Results: Here we combine existing pathway knowledge and ab initio Bayesian probabilistic graphical models in a novel fashion to improve automatic reconstruction of metabolic networks. Specifically, we built a knowledge database containing known gene / protein interactions and metabolic reactions extracted from existing reference pathways. Kno...

Automated refinement and inference of analytical models for metabolic networks

Physical Biology, 2011

The reverse engineering of metabolic networks from experimental data is traditionally a labor-intensive task requiring a priori systems knowledge. Using a proven model as a test system, we demonstrate an automated method to simplify this process by modifying an existing or related model--suggesting nonlinear terms and structural modifications--or even constructing a new model that agrees with the system's time series

Towards Elucidating Regulatory Structure of Metabolic Networks for Dynamic Modeling

2020

The ability to understand and manipulate metabolism is of great value in the chemical industry, as it opens the door to engineering organisms to make valuable small-molecule chemicals and intermediates. However, even simple organisms like bacteria and yeast have extremely complex metabolic networks, consisting of typically well-characterized stoichiometric relationships and often poorly characterized regulatory relationships. We have recently developed a framework for constraint-based dynamic modeling of metabolic networks, but one of the outstanding challenges in applying this framework is the need for better ways to infer the regulatory network structure in cases where only stoichiometry, not regulatory structure, is known. We will discuss the applications of machine learning relevant to developing a predictive understanding of cellular metabolism, including the use of data from systems-scale measurement of small molecules (known as metabolomics) coupled with inferred or explicitly...