Historical development of Escherichia coli genome-scale models. Development of existing and potential future genome-scale models (both metabolic, shown in orange, and metabolic and macromolecular expression (ME) models shown in blue) of E. coli. The genome-scale metabolic model of E. coli first appeared in the early 2000s. An increasing scope of biological functions has been incorporated into the model, leading to various generations of the metabolic models as new discoveries were made. In the early 2010s, ME models that incorporate transcription and translation mechanisms emerged. Multiple efforts followed to improve and expand the ME model. Going into the 2020s, extensions of stress response modules have been added to ME models. Future directions involve incorporation of the sensome [G] to form the StressMe model, and the inclusion of toxins, biosynthetic gene clusters and cell cycle. Ovals indicate models, and boxes represent data incorporated to generate the models. According to the naming convention for network reconstructions, model names consist of an ‘i’ for in silico followed by the initials of the person(s) who built the model, and the number of open reading frames accounted for in the reconstruction. Figure taken from [11].

Genome-scale network reconstructions are built from curated and systematized knowledge that enables them to quantitatively describe genotype–phenotype relationships [1] [2]. In the case of metabolism, by mapping the annotated genome sequence to a metabolic knowledge base such as the Kyoto Encyclopedia of Genes and Genomes (KEGG) [3], one can reconstruct a metabolic network composed of all known metabolic reactions. Such genome-scale reconstructed networks represent organized and systematized knowledge-bases that have multiple uses, including conversion into computational models that interpret and predict phenotypic states and the consequences of environmental and genetic perturbations.

The metabolic network in a genome-scale metabolic reconstruction can be converted into a mathematical format — a stoichiometric matrix (S matrix) — where the columns represent reactions, rows represent metabolites, and each entry is the corresponding coefficient of a particular metabolite in a reaction. The resulting genome-scale model (GEM) is a mathematical representation of the reconstructed network that facilitates computation and prediction of multi-scale phenotypes through the optimization of an objective function of interest. These models have been developed for a wide variety of organisms, including iterative updates for critical organisms such as E. coli that continually improve the predictive performance of these models (Figure 1).

Flux balance analysis (FBA) is the most widely used approach to characterize GEMs [4]. GEMs can simulate metabolic flux states of the reconstructed network while incorporating multiple constraints to ensure the solution identified by FBA is physiologically relevant and compliant with governing constraints; such as the metabolic network topology represented by the S matrix, a steady-state assumption (for example, the internal metabolites must be produced and consumed in a flux-balanced manner), and other limits on nutrient uptake rates, enzyme capacities, and protein/gene expression profiles. The S matrix and the objective function define a system of linear equations that can be solved given the imposed constraints, resulting in a solution space (that is, a space where all feasible phenotypic states exist). FBA can identify a single or multiple optimal flux distributions that optimize the objective function in the solution space. FBA and many other GEM analysis methods are available through the COBRApy package in Python [5] or the COBRA Toolbox in MATLAB [6].

GEMs have been successfully implemented for a wide range of applications, including understanding microorganisms, metabolic engineering, drug development, prediction of enzyme functions, understanding microbial community interactions and human disease. These genome-scale models now enable us to develop pan-genome analyses [7] that provide mechanistic insights, detail the selection pressures on proteome allocation [8], and address stress phenotypes [9] [10].


Genome-scale metabolic modeling is continually being developed as a part of the Genome Design program managed by Dr. Daniel Zielinski, a Project Scientist in the Systems Biology Research Group. The Genome Design group is developing computational modeling and machine learning technologies to predict microbial phenotypes.


Citations/Further Reading

  1. Thiele I, Palsson BO. A protocol for generating a high-quality genome-scale metabolic reconstruction. Nat Protoc. 2010;5: 93–121.
  2. Norsigian CJ, Fang X, Seif Y, Monk JM, Palsson BO. A workflow for generating multi-strain genome-scale metabolic models of prokaryotes. Nat Protoc. 2020;15: 1–14.
  3. Kanehisa M, Furumichi M, Tanabe M, Sato Y, Morishima K. KEGG: new perspectives on genomes, pathways, diseases and drugs. Nucleic Acids Res. 2017;45: D353–D361.
  4. Orth JD, Thiele I, Palsson BØ. What is flux balance analysis? Nat Biotechnol. 2010;28: 245–248.
  5. Ebrahim A, Lerman JA, Palsson BO, Hyduke DR. COBRApy: COnstraints-Based Reconstruction and Analysis for Python. BMC Syst Biol. 2013;7: 74.
  6. Heirendt L, Arreckx S, Pfau T, Mendoza SN, Richelle A, Heinken A, et al. Creation and analysis of biochemical constraint-based models using the COBRA Toolbox v.3.0. Nat Protoc. 2019;14: 639–702.
  7. Monk JM, Charusanti P, Aziz RK, Lerman JA, Premyodhin N, Orth JD, et al. Genome-scale metabolic reconstructions of multiple Escherichia coli strains highlight strain-specific adaptations to nutritional environments. Proc Natl Acad Sci U S A. 2013;110: 20338–20343.
  8. Lloyd CJ, Ebrahim A, Yang L, King ZA, Catoiu E, O’Brien EJ, et al. COBRAme: A computational framework for genome-scale models of metabolism and gene expression. PLoS Comput Biol. 2018;14: e1006302.
  9. Du B, Olson CA, Sastry AV, Fang X, Phaneuf PV, Chen K, et al. Adaptive laboratory evolution of Escherichia coli under acid stress. Microbiology. 2020;166: 141–148.
  10. Chen K, Gao Y, Mih N, O’Brien EJ, Yang L, Palsson BO. Thermosensitivity of growth is determined by chaperone-mediated proteome reallocation. Proc Natl Acad Sci U S A. 2017;114: 11548–11553.
  11. Fang X, Lloyd CJ, Palsson BO. Reconstructing organisms in silico: genome-scale models and their emerging applications. Nat Rev Microbiol. 2020;18: 731–743.