A protocol for generating a high-quality genome-scale metabolic reconstruction

Ines Thiele et al. Nat Protoc. 2010 Jan.

Abstract

Network reconstructions are a common denominator in systems biology. Bottom-up metabolic network reconstructions have been developed over the last 10 years. These reconstructions represent structured knowledge bases that abstract pertinent information on the biochemical transformations taking place within specific target organisms. The conversion of a reconstruction into a mathematical format facilitates a myriad of computational biological studies, including evaluation of network content, hypothesis testing and generation, analysis of phenotypic characteristics and metabolic engineering. To date, genome-scale metabolic reconstructions for more than 30 organisms have been published and this number is expected to increase rapidly. However, these reconstructions differ in quality and coverage, which may limit their predictive potential and their use as knowledge bases. Here we present a comprehensive protocol describing each step necessary to build a high-quality genome-scale metabolic reconstruction, as well as the common trials and tribulations. Therefore, this protocol provides a helpful manual for all stages of the reconstruction process.

Figures

Figure 1. Overview of the procedure to iteratively reconstruct metabolic networks

In particular, stages 2 to 4 are iterated until model predictions are similar to the phenotypic characteristics of the target organism and/or all experimental data available for comparison are exhausted.

Figure 2. Refinement of reconstruction content

The draft reconstruction is converted into a curated reconstruction by re-evaluation of the content. In particular, the metabolic reactions, obtained from biochemical databases or the literature, need to be tested for mass and charge balance. Many resources omit protons and water. Furthermore, adjusting metabolites to a particular pH may change their charged formulae and thus may require correction of the network reaction. For instance, the glucokinase reaction obtained from KEGG is not mass- and charge-balanced when the charged metabolite formulae at pH 7.2 are considered. The right-hand side (RHS) is missing an H and the charge is unbalanced. Adding a proton to the RHS balances both sides of the equation in terms of protons and electrons/charge. Abbreviations: glc – D-glucose, g6p – D-glucose-6-phosphate, atp – adenosine-triphosphate, adp – adenosine-diphosphate, H+ – proton, CS – confidence level.
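
The balancing check described for glucokinase can be sketched in a few lines of Python. This is a minimal illustration, not the protocol's own tooling: the charged formulae and charges below are assumptions following common database conventions for a pH near 7.2 (they are not copied from the figure), and the names `METABOLITES`, `parse_formula` and `imbalance` are hypothetical.

```python
import re

# Charged formulae and charges near pH 7.2. These values are assumptions
# based on common database conventions, not values taken from the figure.
METABOLITES = {
    "glc": ("C6H12O6", 0),
    "atp": ("C10H12N5O13P3", -4),
    "g6p": ("C6H11O9P", -2),
    "adp": ("C10H12N5O10P2", -3),
    "h": ("H", 1),
}


def parse_formula(formula):
    """Return {element: count} for a simple formula such as 'C6H12O6'."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def imbalance(reaction):
    """Return (unbalanced elements, net charge) for {metabolite: coefficient}.

    Substrates carry negative coefficients, products positive ones.
    """
    elements, charge = {}, 0
    for met, coeff in reaction.items():
        formula, met_charge = METABOLITES[met]
        charge += coeff * met_charge
        for element, count in parse_formula(formula).items():
            elements[element] = elements.get(element, 0) + coeff * count
    # Keep only the elements whose totals do not cancel out.
    return {e: n for e, n in elements.items() if n != 0}, charge


# glc + atp -> g6p + adp, written without the proton.
as_retrieved = {"glc": -1, "atp": -1, "g6p": 1, "adp": 1}
# Same reaction with one proton added to the right-hand side.
with_proton = dict(as_retrieved, h=1)

print(imbalance(as_retrieved))  # ({'H': -1}, -1)
print(imbalance(with_proton))   # ({}, 0)
```

With these assumed formulae, the reaction without the proton is short of one H and one unit of charge on the product side, and adding H+ removes both discrepancies.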

Figure 3

List of functional groups, their charge formula and the corresponding pKa.

Figure 4. Examples of network evaluation

The network evaluation and debugging stage (stage 4) includes various QC/QA tests, some of which are illustrated in this figure. For instance, mass- and charge-balancing of the network reactions is crucial to ensure that the model and the cell or organism have similar properties. A standard test for most metabolic reconstructions is to verify that each biomass precursor, which makes up a new cell, can be produced by the model in different growth conditions (e.g., minimal medium, different carbon sources, etc.). Other QC/QA tests may include the capability to secrete certain metabolites given a particular growth condition. At the end of this stage, the model should have properties similar to those of the cell, and error cases can be used to systematically refine the model and thus the reconstruction content.

Figure 5. Gene-protein-reaction (GPR) associations

Examples of GPR associations and their representation in Boolean format are shown for E. coli.
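
As a rough illustration of how a Boolean GPR rule can be used, the sketch below evaluates whether a reaction remains available after a set of genes is deleted. The rule syntax (`and` for enzyme complexes, `or` for isozymes), the gene identifiers `g1` to `g3` and the function name `reaction_active` are assumptions made for this example and are not taken from the figure.

```python
import re


def reaction_active(gpr, deleted_genes):
    """Evaluate a Boolean GPR rule given a set of deleted gene identifiers.

    'and' joins subunits of a complex (all are required);
    'or' joins isozymes (any one is sufficient).
    """
    def gene_state(match):
        token = match.group(0)
        if token in ("and", "or"):
            return token
        return "False" if token in deleted_genes else "True"

    # Replace every gene identifier by its Boolean state, keep the operators.
    expression = re.sub(r"[A-Za-z_]\w*", gene_state, gpr)
    # The expression now contains only True/False, and/or and parentheses.
    return eval(expression, {"__builtins__": {}}, {})


# Hypothetical rules: an isozyme pair, a two-subunit complex, and a mix.
print(reaction_active("g1 or g2", {"g1"}))
print(reaction_active("g1 and g2", {"g1"}))
print(reaction_active("(g1 and g2) or g3", {"g1", "g3"}))
```

Deleting one isozyme leaves the reaction available, whereas deleting one subunit of a complex removes it.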

Figure 6. Growth-associated maintenance (GAM) and non-growth-associated maintenance (NGAM)

The best way to obtain accurate information regarding the GAM and NGAM is by plotting growth data obtained from chemostat growth experiments. GAM and NGAM can be directly read from the plot.
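
One way to read the two values from such a plot is a straight-line fit. The sketch below assumes the common linear maintenance model, in which the ATP requirement (mmol/gDW/h) is plotted against the growth rate (1/h), the slope is taken as GAM and the y-intercept as NGAM. That interpretation, the function name `fit_line` and the data points (made up for illustration, not measurements) are all assumptions of this example.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up chemostat points: dilution (growth) rate and ATP requirement.
growth_rate = [0.1, 0.2, 0.3, 0.4]     # 1/h
atp_rate = [14.0, 20.0, 26.0, 32.0]    # mmol ATP/gDW/h

gam, ngam = fit_line(growth_rate, atp_rate)
print(f"GAM  ~ {gam:.1f} mmol ATP/gDW (slope)")
print(f"NGAM ~ {ngam:.1f} mmol ATP/gDW/h (intercept)")
```

If the plot shows substrate uptake rather than ATP, the fitted values would first need to be converted into ATP equivalents.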

Figure 7. Conversion of reconstruction into a condition-specific model

This conversion requires three main steps. 1. The first step involves the mathematical representation of the network reaction list by a stoichiometric matrix, S. The columns of S correspond to the network reactions, while the rows represent the network metabolites. The substrates in a reaction are defined to have a negative coefficient, while products have a positive coefficient. Each metabolite participating in a reaction has a non-zero entry in the corresponding column of S. 2. Now that the reconstruction is in a computer-readable format, the system boundaries need to be defined. In particular, this means that for every metabolite that can be consumed or secreted by the target cell, a so-called exchange reaction needs to be added to the reconstruction. The exchange reactions can be employed in later simulations to define, for example, environmental conditions (e.g., carbon source). 3. As a last step, constraints are added to the reconstruction, thus converting it into a condition-specific model. Mass conservation is a basic physical law. All steady states can thus be described by S·v = 0, where v is a vector of reaction fluxes. Adding further constraints such as thermodynamics (reaction directionality), enzyme capacity or regulation (i.e., presence or absence of an enzyme) to the model will lead to a smaller, more confined set of feasible steady-state flux solutions.
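
The three steps can be sketched on a toy network. Everything below is an assumption made for illustration: the two-metabolite network, the reaction names, the flux bounds, and the choice to write exchange reactions as one-directional sources and sinks (many tools instead write them as reversible reactions with signed flux). The helper names `build_s_matrix` and `is_feasible` are hypothetical.

```python
# Step 1 and 2: reaction list including exchange reactions (toy example).
REACTIONS = {
    "EX_A": {"A": 1},           # exchange reaction supplying A
    "v1": {"A": -1, "B": 1},    # A -> B
    "EX_B": {"B": -1},          # exchange reaction removing B
}
# Step 3: constraints as (lower bound, upper bound) on each flux.
BOUNDS = {"EX_A": (0, 10), "v1": (0, 1000), "EX_B": (0, 1000)}


def build_s_matrix(reactions):
    """Rows are metabolites, columns are reactions; substrates are negative."""
    metabolites = sorted({m for stoich in reactions.values() for m in stoich})
    rxn_ids = list(reactions)
    s = [[reactions[r].get(m, 0) for r in rxn_ids] for m in metabolites]
    return metabolites, rxn_ids, s


def is_feasible(s, rxn_ids, bounds, v, tol=1e-9):
    """Check S.v = 0 and lower bound <= v <= upper bound for a flux vector v."""
    balanced = all(abs(sum(c * f for c, f in zip(row, v))) <= tol for row in s)
    bounded = all(bounds[r][0] <= f <= bounds[r][1] for r, f in zip(rxn_ids, v))
    return balanced and bounded


metabolites, rxn_ids, S = build_s_matrix(REACTIONS)
print(metabolites, rxn_ids, S)
print(is_feasible(S, rxn_ids, BOUNDS, [5, 5, 5]))     # balanced and within bounds
print(is_feasible(S, rxn_ids, BOUNDS, [5, 3, 5]))     # A accumulates, B is depleted
print(is_feasible(S, rxn_ids, BOUNDS, [20, 20, 20]))  # exceeds the uptake bound
```

Tightening the bound on `EX_A` shrinks the set of flux vectors that pass the check, which is the sense in which added constraints confine the feasible steady-state solutions.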

Figure 8. Gap analysis

The gap analysis includes the identification and the tentative filling of network gaps. A. While many dead-end metabolites that create network gaps can be connected to the network by re-evaluating genomic and experimental data, some dead-end metabolites will remain in the refined, curated reconstruction. These dead-end metabolites fall into two groups, depending on which type of reaction could connect them to the remaining network: knowledge gaps and scope gaps. While knowledge gaps represent missing biochemical knowledge for the target organism, scope gaps involve reactions and cellular processes that are currently not accounted for in the metabolic reconstruction (e.g., DNA methylation). B. There are at least two approaches to identify gaps in the reconstruction. In the connectivity-based approach, one can count the non-zero entries in each row of the S matrix and identify those metabolites that are only produced or only consumed. In the example, metabolite D is only produced by reaction v3 and the S matrix contains only one entry in the row corresponding to metabolite D. A second approach is based on model functionality: the model's capability to carry flux through every network reaction is tested. This approach identifies blocked reactions, which are directly or indirectly associated with one or more dead-end metabolites. In the example shown, the connectivity-based approach would not identify metabolite E as a dead-end metabolite, as it is both produced and consumed in the network. However, testing for flux through reactions containing E will show that reactions v3 and b3 cannot carry any flux in this model. C. Two sample cases are shown that address the question of whether or not to fill a gap.
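
The connectivity-based approach can be sketched as a scan over the reaction list, which is equivalent to inspecting the signs of the entries in each row of S. The network below is a hypothetical example rather than the one drawn in the figure, all reactions are assumed to be irreversible, and the function name `dead_end_metabolites` is made up for this sketch. The functionality-based approach needs an optimization solver and is not shown.

```python
# Hypothetical irreversible network; D is produced but never consumed.
REACTIONS = {
    "b1": {"A": 1},             # uptake of A
    "v1": {"A": -1, "B": 1},    # A -> B
    "v2": {"B": -1, "C": 1},    # B -> C
    "v3": {"B": -1, "D": 1},    # B -> D
    "b2": {"C": -1},            # secretion of C
}


def dead_end_metabolites(reactions):
    """Return metabolites that are only produced or only consumed.

    Assumes every reaction is irreversible, so a positive coefficient
    always means production and a negative one always means consumption.
    """
    produced, consumed = set(), set()
    for stoich in reactions.values():
        for met, coeff in stoich.items():
            if coeff > 0:
                produced.add(met)
            elif coeff < 0:
                consumed.add(met)
    report = {}
    for met in sorted(produced | consumed):
        if met not in consumed:
            report[met] = "only produced"
        elif met not in produced:
            report[met] = "only consumed"
    return report


print(dead_end_metabolites(REACTIONS))  # {'D': 'only produced'}
```

A metabolite that passes this scan can still sit on a blocked pathway, which is why the flux-based test described in panel B is a useful complement.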

Figure 9. In silico gene essentiality study as a network evaluation tool

While agreement of gene essentiality between experimental and in silico data is very helpful to validate the reconstruction content and model setup, analysis of the inconsistencies will enable discovery of new biological knowledge.
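
A simple way to organize such a comparison is to sort genes into agreements and the two kinds of inconsistency. The sketch below assumes that predictions and experimental calls are available as Boolean flags per gene (True meaning essential). The gene identifiers, the category labels and the function name `compare_essentiality` are hypothetical.

```python
def compare_essentiality(predicted, observed):
    """Group genes by agreement between in silico and experimental calls.

    Both arguments map gene identifier -> True if essential.
    Only genes present in both data sets are compared.
    """
    groups = {
        "agree_essential": [],
        "agree_nonessential": [],
        "essential_in_silico_only": [],
        "essential_in_vivo_only": [],
    }
    for gene in sorted(predicted.keys() & observed.keys()):
        if predicted[gene] and observed[gene]:
            groups["agree_essential"].append(gene)
        elif not predicted[gene] and not observed[gene]:
            groups["agree_nonessential"].append(gene)
        elif predicted[gene]:
            groups["essential_in_silico_only"].append(gene)
        else:
            groups["essential_in_vivo_only"].append(gene)
    return groups


# Made-up calls for four hypothetical genes.
predicted = {"g1": True, "g2": False, "g3": True, "g4": False}
observed = {"g1": True, "g2": False, "g3": False, "g4": True}

for label, genes in compare_essentiality(predicted, observed).items():
    print(label, genes)
```

The two inconsistent groups are the candidates for follow-up: they may point to missing or wrong reconstruction content, or to biology that is not yet described.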

Figure 10. Components of the model structure in Matlab

The reconstruction is imported into Matlab (Step 39). The entire reconstruction content is stored in a structure array. The screenshot illustrates the main fields contained in the model structure. The information is stored in subarrays within these fields. Note that the order of the reactions and metabolites corresponds to the order of the columns and rows in the S matrix, respectively.

Figure 11. Flow chart to calculate the fractional contribution of a precursor to the biomass reaction

This approach can be used for amino acids, nucleotide triphosphates (ATP, GTP, CTP, UTP), and deoxy-nucleotide triphosphates (dATP, dGTP, dCTP, dTTP). The steps are illustrated for L-alanine (Ala). (A) The fractional contribution of alanine to the proteome is obtained from experimental data or estimated from the genome sequence. (B) To convert the molar percentage into weight of alanine per mole protein, the molar percentage is multiplied by the molecular weight of alanine. Note that the polymerization of amino acids leads to the loss of a water molecule, which needs to be considered when calculating the molecular weight. Once the weight of amino acid per mole protein is obtained for all amino acids, the values are summed to obtain the weight of protein per mole protein. (C) The weight of alanine per mole protein is converted into weight of alanine per weight of protein by dividing by the sum of all amino acids' weights. (D) Finally, the weight of alanine per weight of protein is multiplied by the cellular content of protein (see Figure 13A) and divided by its molecular weight to obtain the moles of alanine per cell dry weight. Multiplying this molar contribution by a factor of 1000 results in a final unit of mmol alanine per gram dry weight.
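
Steps A to D can be written out as a short calculation. The sketch below is an illustration under stated assumptions: it uses a made-up protein composed of only three amino acids, a made-up protein content of 0.55 g per gram dry weight, and it applies the same water-corrected molecular weight in steps B and D. The function name `biomass_coefficients` is hypothetical.

```python
WATER = 18.015  # g/mol, lost per amino acid on polymerization


def biomass_coefficients(mole_fraction, mol_weight, protein_content):
    """Return mmol of each amino acid per gram dry weight.

    mole_fraction:   {amino acid: molar fraction in the proteome} (step A)
    mol_weight:      {amino acid: molecular weight of the free acid, g/mol}
    protein_content: g protein per g dry weight
    """
    residue_weight = {aa: mol_weight[aa] - WATER for aa in mole_fraction}
    # (B) weight of each amino acid per mole protein, and their sum.
    weight_per_mol = {aa: mole_fraction[aa] * residue_weight[aa] for aa in mole_fraction}
    total_weight = sum(weight_per_mol.values())
    coefficients = {}
    for aa in mole_fraction:
        # (C) weight of amino acid per weight of protein.
        weight_fraction = weight_per_mol[aa] / total_weight
        # (D) mol per gram dry weight, then converted to mmol.
        mol_per_gdw = weight_fraction * protein_content / residue_weight[aa]
        coefficients[aa] = 1000 * mol_per_gdw
    return coefficients


# Illustrative composition only; a real proteome has all 20 amino acids.
MOLE_FRACTION = {"ala": 0.5, "gly": 0.3, "ser": 0.2}
MOL_WEIGHT = {"ala": 89.09, "gly": 75.07, "ser": 105.09}

coefficients = biomass_coefficients(MOLE_FRACTION, MOL_WEIGHT, protein_content=0.55)
for aa, value in coefficients.items():
    print(f"{aa}: {value:.3f} mmol/gDW")
```

A useful sanity check is that the coefficients, converted back to grams with the residue weights, add up to the assumed protein content.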

Figure 12. Determination of the content of the soluble pool

The conversion into mmol/gDW and g/gDW is shown for the different kinds of available information (literature, measurements or database entries). The value in the purple box corresponds to the stoichiometric coefficient of the precursor in the biomass reaction. (a) Information was obtained from the Cybercell Database (CCDB, see Table 1 for the link).

Figure 13. Determination of growth-associated maintenance (GAM) cost

A. Calculation of growth-associated maintenance cost. B. Sample calculation for E. coli. The energy necessary for the synthesis of the macromolecules from the building blocks was obtained from Table 5 – 6 of Chapter 3 in Neidhardt et al. The coefficients cP, cD and cR were calculated as the total energy necessary for the macromolecules divided by the total number of building blocks (see Neidhardt et al.).

Figure 14. Flow chart on debugging network reactions that cannot carry flux

‘rxn’ stands for reaction. ‘conf’ stands for confidence score. ‘met’ stands for metabolite.
