The Pennsylvania State University
The Graduate School
College of Engineering
NONLINEAR OPTIMIZATION TECHNIQUES FOR GENOME-SCALE FLUX
ELUCIDATION AND KINETIC PARAMETERIZATION
A Dissertation in
Chemical Engineering
by
Saratram Gopalakrishnan
© 2019 Saratram Gopalakrishnan
Submitted in Partial Fulfillment
of the Requirements
for the Degree of
Doctor of Philosophy
August 2019
The dissertation of Saratram Gopalakrishnan was reviewed and approved* by the
following:
Costas D. Maranas
Donald B. Broughton Professor of Chemical Engineering
Dissertation Advisor
Chair of Committee
Phillip Savage
Walter L. Robb Family Department Head Chair of CHE
Kristen Fichthorn
Merrell Fenske Professor of Chemical Engineering
Professor of Physics
Andrew Patterson
Tombros Early Career Professor
Associate Professor of Molecular Toxicology
Associate Professor of Biochemistry & Molecular Biology
*Signatures are on file in the Graduate School
ABSTRACT
Modeling metabolism elucidates the relationship between the genetic state, the
environment, and the phenotype of an organism which provides insights into its biological
objectives and informs metabolic engineering strategies. Steady-state metabolism is
typically modeled using stoichiometric frameworks based on Flux Balance Analysis (FBA)
which cannot capture the effect of intracellular metabolite concentrations, enzyme
abundances, and regulatory effects. This often results in over-prediction of fluxes or
prediction of metabolic states that are not physiologically relevant. A first step towards the
construction of predictive metabolic models is the inclusion of kinetic descriptions for
metabolic reactions that enable the model to faithfully capture the influence of metabolite
concentrations. Since kinetic information generally cannot be imported from databases
such as BRENDA due to paucity of organism-specific kinetic descriptions or differences
in assay conditions, in vivo kinetic parameters must be estimated using metabolic flux and
intracellular concentration data. This thesis details the development of nonlinear
regression-based tools to first elucidate genome-scale metabolic fluxes for all intracellular
reactions using 13C-Metabolic Flux Analysis (13C-MFA) and then use this fluxomic data
to construct a large-scale kinetic model of metabolism that recapitulates the effects of
single gene-deletions.
Metabolic models used in 13C-MFA generally include a limited number of reactions
primarily from central metabolism. They typically omit degradation pathways, complete
cofactor balances, and atom transition contributions for reactions outside central
metabolism. Scaling up 13C-MFA to the genome-scale first requires the construction of a
genome-scale carbon mapping model that accurately traces the path of all carbon atoms
through the various intracellular reactions. Two mapping models, imEco726 and imSyn617,
are constructed for E. coli and Synechocystis PCC 6803, respectively. imEco726 is
deployed for steady-state flux elucidation in E. coli to reveal the expansion in flux ranges
relative to a core metabolic model due to the inclusion of redundant carbon paths and
elucidate the loss of information arising from projecting fluxes elucidated from core
models onto expanded models for subsequent analyses such as strain design and kinetic
modeling. imSyn617 is deployed for flux elucidation in Synechocystis using transient
labeling data to uncover the role of novel bifurcated pathway topologies central to
maximizing the routing of carbons towards growth.
Finally, K-FIT, a decomposition-based approach for estimating kinetic parameters given
steady-state fluxomic data, is introduced. K-FIT offers orders of magnitude improvements
in CPU time over meta-heuristic based approaches. The speed-up is mostly due to the
efficient identification of steady-state fluxes using a fixed-point iteration scheme that
iterates between two linear sub-problems, thereby largely bypassing the computationally
expensive numerical integration steps. The applicability of this approach to large-scale
models is demonstrated by parameterizing an expanded kinetic model for E. coli (307
reactions and 258 metabolites) using fluxomic data for six mutants to explain the role of
flux rerouting through energy metabolism to meet biosynthetic ATP and NADPH
demands. The speed-up afforded by K-FIT is transformational as it enables follow-up
robustness of inference analyses and optimal design of experiments that can inform
metabolic engineering strategies.
TABLE OF CONTENTS
LIST OF FIGURES
LIST OF TABLES
ACKNOWLEDGEMENTS
Chapter 1
Introduction
1.1. Modeling metabolism
1.2. Requirements for constructing predictive models of metabolism
1.3. Flux elucidation using 13C-MFA
1.4. Construction of kinetic models of metabolism
1.5. Aim and outline of the thesis
Chapter 2
13C Metabolic flux analysis at the genome-scale
2.1. Introduction
2.2. Methods
2.2.1. Genome-scale atom mapping model
2.2.2. Flux estimation procedure
2.2.3. Confidence intervals
2.3. Results
2.3.1. Active EMU network
2.3.2. Flux identifiability and statistical validity of the model
2.3.3. Flux and range estimation at the genome-scale
2.4. Discussion
Chapter 3
Elucidation of photoautotrophic carbon flux topology in Synechocystis PCC 6803 using genome-scale carbon mapping models
3.1. Introduction
3.2. Methods
3.2.1. Construction of imSyn617
3.2.2. Algorithmic procedure for flux estimation based on least-squares minimization
3.3. Results
3.3.1. New carbon paths covered by mapping model imSyn617
3.3.2. Comparison of elucidated fluxes between using imSyn617 and core mapping models
3.3.3. New insights on carbon paths gained using imSyn617
3.4. Discussion
Chapter 4
K-FIT: An accelerated kinetic parameterization algorithm using steady-state fluxomic data
4.1. Introduction
4.2. Methods
4.2.1. Kinetic parameterization using K-FIT
4.2.2. Construction of the expanded kinetic model for E. coli, k-ecoli307
4.3. Results
4.3.1. The K-FIT algorithm
4.3.2. Benchmarking K-FIT against Ensemble Modeling
4.3.3. Parameterization of a kinetic model (k-ecoli307) for E. coli with near-genome-scale coverage
4.4. Discussion
Chapter 5
Summary and future work
5.1. Summary
5.2. Completed and ongoing research
5.3. Future directions
Appendix A
Flux elucidation at isotopic steady-state
A.1. Predicting labeling patterns
A.2. Least-squares NLP
A.3. Implementation
A.4. Estimation of confidence intervals
Appendix B
Flux elucidation procedure for isotopic instationary MFA
B.1. Least-squares NLP for flux and pool size estimation
B.2. Dynamic EMU balances and simulation of labeling distributions
B.3. An improved algorithm for simulating labeling dynamics and sensitivities
Appendix C
Mathematical description of K-FIT
C.1. Overview of elementary step decomposition
C.2. Nonlinear least-squares regression-based procedure for kinetic parameterization
C.3. K-SOLVE: Anchoring kinetic parameters to the WT flux distributions
C.4. SSF-Evaluator: Evaluation of steady-state fluxes for the mutant networks using the kinetic parameter assignments of K-SOLVE
C.4.1. Fixed-point iteration (FPI)
C.4.2. Newton’s method for accelerating convergence
C.4.3. Richardson’s Extrapolation when 𝑱 becomes singular
C.4.4. Integration of FPI, Newton’s method, and semi-implicit integration into a single pipeline
C.5. NLP problem K-FIT
C.6. K-UPDATE procedure that checks for convergence and updates kinetic parameters using the approximate gradient and Hessian of 𝜙
C.7. Algorithmic description of K-FIT
References
LIST OF FIGURES
Figure 1.1: A toy reaction network example for MFA.
Figure 1.2: Isotopomers, cumomers, and EMUs for metabolite A.
Figure 2.1: Comparison of prediction of experimentally observed amino acid MS data by the core model and the GSM model
Figure 2.2: Comparison of fluxes elucidated using 2-13C-glucose with the core model and GSM model...
Figure 2.3: Resolution of energy metabolism in core model and GSM model.
Figure 2.4: Loss of information flux ranges are estimated using FVA with core model-based MFA derived flux ranges as constraints...
Figure 2.5: Flux distribution comparison for core model and GSM model using 5-13C glucose tracer...
Figure 3.1: Representation of central metabolism in Synechocystis
Figure 3.2: Carbon incorporation paths and conserved moiety cycling in imSyn617
Figure 3.3: Recapitulation of experimentally observed labeling distributions
Figure 3.4: Flux ranges for central metabolism in Synechocystis
Figure 3.5: Bifurcated topology in the photorespiratory pathway and the TCA cycle
Figure 3.6: Recapitulation of labeling dynamics of CBB intermediates
Figure 3.7: Carbon positional shifts in upper glycolysis of Synechocystis
Figure 3.8: F-Test on the oxidative pentose phosphate pathway
Figure 3.9: F-Test on Transaldolase
Figure 4.1: Overview of the core loop of the K-FIT algorithm
Figure 4.2: Flux distribution through central metabolism in k-ecoli307
Figure 4.3: Uncertainty in estimation of Michaelis-Menten parameters in k-ecoli307
Figure 4.4: Overview of the K-FIT algorithm showing the flow of information between various components
Figure 4.5: Test models used for benchmarking the performance of K-FIT against GA-based EM procedure
Figure 4.6: Uncertainty in estimation of kinetic parameters and WT enzyme fractions
Figure B.1: Flux balance for EMU M23
LIST OF TABLES
Table 1.1: Reaction stoichiometry and atom mapping for toy network.
Table 2.1: χ² degrees of freedom for the core model and the genome-scale model.
Table 2.2: Additional suggested MS measurements for resolving various alternate routes.
Table 4.1: Comparison of product yields predicted by k-ecoli307 against experimental yields.
Table B.1: Four types of reaction classes impacting EMU balances.
Table C.1: List of elementary steps describing the catalytic mechanism and regulation of enzyme activity.
Table C.2: Elementary step decomposition for various reactions.
ACKNOWLEDGEMENTS
Completing the requirements for a PhD degree requires the professional and personal contribution
of many people. First and foremost, I would like to extend my most sincere gratitude to my advisor
Dr. Costas Maranas who guided me through every step of the way, taught me to perform high
quality research and communicate the work in a coherent manner. The long-term goals of this
project have been achievable largely due to his broad and deep scientific knowledge and expert
student advising style. His strong emphasis on communication of research in addition to his
guidance on application of correct research methodology has helped shape me into the researcher I
am today. Words cannot express the crucial role played by the constant encouragement and
unwavering faith from my parents Gopal and Kala in the successful completion of my PhD. My
brother Saran deserves a special mention for providing an outsider’s perspective to scientific
research and bringing to my attention the importance of recording failed experiments. I would also
like to thank Rajib Saha, Akhil Kumar, Anupam Chowdhury, Ali Khodayari, Satyakam Dash, Ratul
Chowdhury, Shyam Srinivasan, John Hendry, Thomas Mueller, and the other members of Dr.
Maranas’ research group for all the engaging discussions, research and otherwise, on an almost
daily basis. Finally, I would like to thank Achyut, Gaurav Kumar, Sandeep, and Arpan Sircar for
providing a weekly reminder that a world outside of research also exists.
Chapter 1
Introduction
1.1. Modeling metabolism
Metabolism is a complex network of biochemical reactions that fuels growth and
homeostasis in all organisms and determines their phenotypes. By leveraging pathways of
enzyme-catalyzed reactions, metabolism enables the production of antibiotics,
nutraceuticals, and biopolymers at ambient conditions (temperature and pressure) that
would typically be economically unviable using traditional chemical processes.
Quantification of metabolism provides insights into driving forces behind cellular
physiology and is required to study the pathophysiology of non-infectious diseases,
identify efficient intervention strategies to aid drug discovery, and suggest engineering
strategies to increase the production of high-value chemicals. Owing to its complexity, it
is desirable to study metabolism with the aid of predictive mathematical models which
provide insights into pathway usage. Furthermore, with advancements in gene-editing
technologies, the emphasis falls on predictive models to inform decisions on metabolic
engineering and accelerate build-design-test cycles for the construction of engineered
organisms capable of carrying out specialized functions.
Generally, metabolism is modeled at metabolic steady-state conditions where the
concentrations of intracellular metabolites, enzymes, and the other cellular components are
unchanging. If 𝐼 = {1,2, … ,𝑀} is the set of metabolites, 𝐽 = {1,2, … ,𝑁} is the set of
reactions in a metabolic model, 𝑣𝑗 is the metabolic flux (reaction rate per cell) through
reaction 𝑗 ∈ 𝐽, and 𝑆𝑖𝑗 is the stoichiometric coefficient for metabolite 𝑖 ∈ 𝐼 in reaction 𝑗 ∈
𝐽, conservation of mass across any metabolite 𝑖 at pseudo-steady-state is defined as:
\[ \sum_{j=1}^{N} S_{ij} v_j = 0, \quad \forall i \in I \]
Since the number of metabolites is typically less than the number of reactions in the
metabolic model, this system is underdetermined, and the above equality defines the set of
all feasible metabolic flux distributions attainable by the metabolic model. In reality, 𝑣𝑗 is a function of
enzyme concentration and metabolite concentrations that represents the kinetic rate law for
the enzyme-catalyzed reaction 𝑗. However, since all concentrations are unchanging at
metabolic steady-state, concentrations and kinetic constants are lumped together into the
quantity 𝑣𝑗 in the stoichiometric framework.
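As a minimal sketch of the steady-state mass balance above, the following Python snippet evaluates the balance residual for every metabolite in a small network. The five-reaction network, its flux values, and the function names are hypothetical illustrations and are not taken from this thesis.

```python
from typing import List


def mass_balance_residuals(S: List[List[float]], v: List[float]) -> List[float]:
    """Return sum_j S[i][j] * v[j] for each metabolite i (zero at steady state)."""
    return [sum(s_ij * v_j for s_ij, v_j in zip(row, v)) for row in S]


def is_steady_state(S: List[List[float]], v: List[float], tol: float = 1e-9) -> bool:
    """True when every metabolite balance closes within the tolerance."""
    return all(abs(r) <= tol for r in mass_balance_residuals(S, v))


# Hypothetical toy network (rows: metabolites A, B, C; columns: reactions R1..R5)
#   R1: -> A,  R2: A -> B,  R3: A -> C,  R4: B ->,  R5: C ->
S = [
    [1, -1, -1, 0, 0],   # A
    [0, 1, 0, -1, 0],    # B
    [0, 0, 1, 0, -1],    # C
]

v_a = [10.0, 6.0, 4.0, 6.0, 4.0]    # one feasible flux distribution
v_b = [10.0, 3.0, 7.0, 3.0, 7.0]    # a different, equally feasible distribution
v_bad = [10.0, 6.0, 4.0, 5.0, 4.0]  # violates the balance on metabolite B

print(is_steady_state(S, v_a), is_steady_state(S, v_b), is_steady_state(S, v_bad))
```

Because this toy network has three metabolites and five reactions, more than one flux vector satisfies the balances, which illustrates why stoichiometry alone does not pin down a unique flux distribution.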
Flux prediction in the stoichiometric framework is generally performed using Flux Balance
Analysis (FBA) (Varma and Palsson, 1994) which elucidates fluxes by maximizing a
“biological objective” such as growth rate. The corresponding flux ranges are elucidated
using Flux Variability Analysis (FVA) (Mahadevan and Schilling, 2003) to account for the
fact that FBA solves an underdetermined system of linear algebraic equations which can
have alternate solutions. Large flux ranges reported by FVA have motivated the
development of more data-driven approaches such as 13C-Metabolic Flux Analysis (13C-
MFA) which traces metabolism using a stable-isotope tracer (usually 13C) and elucidates
fluxes by leveraging the property that different pathways rearrange the carbon backbone of
intracellular metabolites in different ways. Although 13C-MFA affords higher accuracy
in flux elucidation, it requires as much as two minutes to elucidate all fluxes in central
metabolism with high precision. In comparison, FBA predicts fluxes by solving a linear
programming (LP) problem and is able to report fluxes for the same metabolic network
within a fraction of a second, albeit with very low precision. Although 13C-MFA is able
to provide more meaningful insights, it must be noted that it is an analysis tool and has no
predictive capabilities whatsoever. For a standard model organism such as E. coli, FBA
predicts that the maximum biomass yield is 93 gDW/mol-glucose which is 17% higher
than the actual biomass yield of 79.2 gDW/mol-glucose for the wild-type (WT) strain of
E. coli grown in M9 minimal media under aerobic conditions with glucose as the sole
carbon source (Feist et al., 2007). This is because FBA cannot capture acetate secretion by
E. coli which is driven by regulation of enzyme activities and reaction kinetics.
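For reference, the FBA and FVA problems described above can be written in their commonly used textbook form, shown here as a sketch. The bound symbols v^LB and v^UB, the optimal objective value v*_biomass, and the optimality fraction γ are notational assumptions introduced for illustration and are not defined elsewhere in this chapter.

```latex
\begin{aligned}
\textbf{FBA:}\quad & \max_{v}\; v_{\mathrm{biomass}} \\
\text{s.t.}\quad & \sum_{j=1}^{N} S_{ij} v_j = 0, && \forall i \in I \\
& v_j^{LB} \le v_j \le v_j^{UB}, && \forall j \in J \\[8pt]
\textbf{FVA (for each } k \in J\textbf{):}\quad & \max_{v} \,/\, \min_{v}\; v_k \\
\text{s.t.}\quad & \sum_{j=1}^{N} S_{ij} v_j = 0, && \forall i \in I \\
& v_j^{LB} \le v_j \le v_j^{UB}, && \forall j \in J \\
& v_{\mathrm{biomass}} \ge \gamma \, v_{\mathrm{biomass}}^{*}, && 0 \le \gamma \le 1
\end{aligned}
```

Both problems are linear programs, which is consistent with the sub-second solution times noted above. The biomass yield comparison quoted above follows from 93 / 79.2 ≈ 1.17, that is, an over-prediction of about 17%.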
Several strain design algorithms have been designed in the stoichiometric framework such
as Optknock (Burgard et al., 2003), RobustKnock (Tepper and Shlomi, 2010), BiMOMA
(Kim et al., 2011), and OptForce (Ranganathan et al., 2010). These approaches have
successfully aided the construction of glutamate and succinate overproducing strains in E.
coli (Kim et al., 2011), hydrogen overproduction in Clostridium acetobutylicum and
Methylobacterium extorquens (Pharkya et al., 2004), glycerol overproduction in
Saccharomyces cerevisiae (Patil et al., 2005), fatty acid production in E. coli (Ranganathan
et al., 2012; Xu et al., 2011) and overproduction of flavonoid precursor shikimate in
Saccharomyces cerevisiae (Suastegui et al., 2017). However, these frameworks are unable
to identify interventions to regulatory interactions such as the ptsG knockout for succinate
overproduction in E. coli (Chowdhury et al., 2015b) and engineering the transcription
regulator FapR for malonyl-CoA overproduction in E. coli (Xu et al., 2014). Limitations
with stoichiometric frameworks stem from the limited ability to capture the effects of
metabolomic fluctuations, proteomic fluctuations, enzyme saturation, and allosteric
regulation of enzyme activity (Chowdhury et al., 2015a; Saa and Nielsen, 2017).
Furthermore, stoichiometric frameworks afford limited support for integration of
transcriptomic, proteomic, and metabolomic datasets (Machado and Herrgard, 2014; Tian
and Reed, 2018) as changes in gene expression do not translate to proportional changes in
metabolic fluxes and the lack of kinetic descriptions precludes the investigation of model
dynamics, limiting assessment to metabolic steady-states only.
1.2. Requirements for constructing predictive models of metabolism
In response to these limitations, it is of interest to construct models of metabolism that can
explain and support the integration of metabolomic, proteomic, and transcriptomic data in
addition to fluxomic data. These facets of metabolism are captured by kinetic models that
relate fluxes to both metabolite concentrations and enzyme abundances. However, unlike
stoichiometric models that are constructed using genome annotations, biomass
composition measurements, and experimental yield measurements, construction of a
kinetic model is substantially more demanding in terms of data requirements and
appropriate kinetic parameter identification. The key requirements for parameterizing a
kinetic model include (i) precise metabolic flux distributions (generally obtained using
13C-MFA) in multiple genetic and/or environmentally perturbed conditions, (ii) an
appropriate rate law formalism that relates fluxes to metabolite concentrations, and (iii) an
efficient procedure for identifying the optimal kinetic parameters that recapitulate the
available experimental data. These requirements are further elaborated in the following
subsections.
1.3. Flux elucidation using 13C-MFA
The objective of 13C-MFA is to identify, using nonlinear least-squares regression, a flux
distribution that recapitulates the labeling distribution of intracellular metabolites measured
using NMR spectroscopy or Mass Spectrometry. The primary requirements for
performing 13C-MFA at the genome-scale are (i) the availability of a well curated GSM
model, (ii) availability of a curated atom mapping model, (iii) availability of partial
positional labeling information, and (iv) a simulation platform for predicting intracellular
metabolite labeling distributions given flux distributions. For well-studied model
organisms such as E. coli, curated GSM models that faithfully capture the dispensability
of the reactions in the model and the yields of biomass and metabolic byproducts are
available and can be reliably used for flux elucidation. Such models do not contain non-
native pathways and accurately predict the functionality of existing metabolic pathways.
This is typically quantified using the sensitivity and specificity metrics that represent the
fraction of correctly predicted viable and lethal mutants, respectively, given a set of gene
knockout data. Specificity is lowered when the model fails to recapitulate the essentiality
of any particular reaction due to the presence of alternate pathways providing the same
functionality and sensitivity is lowered when the model incorrectly identifies a non-
essential gene as essential (Zomorrodi and Maranas, 2010) due to missing reactions in the
model. Generally, specificity can be improved by accounting for condition-specific
expression using frameworks such as R-GPRs (Nazem-Bokaee et al., 2016) and sensitivity
can be improved by looking for alternate genes capable of performing the same function
using a bidirectional BLAST search (Zomorrodi and Maranas, 2010). Models with low
sensitivity are often more difficult to resolve as the issue stems from incomplete and/or
incorrect gene annotation as well as uncharacterized side reactions catalyzed by already
annotated enzymes. Models for well characterized organisms such as E. coli and
Synechocystis PCC 6803 typically have a sensitivity and specificity >80% and can be safely
used for 13C-MFA. On the other hand, models for less characterized organisms such as Clostridium
thermocellum (Xiong et al., 2018) must be deployed with caution for flux elucidation using
13C-MFA.
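The sensitivity and specificity metrics described above can be sketched as follows, using the definitions given in the text (sensitivity as the fraction of viable mutants predicted viable, specificity as the fraction of lethal mutants predicted lethal). The knockout outcomes in the example are made-up numbers for illustration only, and the function name is hypothetical.

```python
from typing import Iterable, Tuple


def sensitivity_specificity(outcomes: Iterable[Tuple[bool, bool]]) -> Tuple[float, float]:
    """Compute (sensitivity, specificity) from knockout outcomes.

    Each outcome is (observed_viable, predicted_viable) for one mutant.
    """
    viable_correct = viable_wrong = lethal_correct = lethal_wrong = 0
    for observed_viable, predicted_viable in outcomes:
        if observed_viable:
            if predicted_viable:
                viable_correct += 1
            else:
                viable_wrong += 1   # non-essential gene called essential
        else:
            if predicted_viable:
                lethal_wrong += 1   # essential gene missed by the model
            else:
                lethal_correct += 1
    if viable_correct + viable_wrong == 0 or lethal_correct + lethal_wrong == 0:
        raise ValueError("need at least one viable and one lethal mutant")
    sensitivity = viable_correct / (viable_correct + viable_wrong)
    specificity = lethal_correct / (lethal_correct + lethal_wrong)
    return sensitivity, specificity


# Made-up data: 6 viable mutants (5 predicted correctly), 4 lethal (3 correct)
example = (
    [(True, True)] * 5 + [(True, False)] * 1
    + [(False, False)] * 3 + [(False, True)] * 1
)
sens, spec = sensitivity_specificity(example)
print(f"sensitivity = {sens:.3f}, specificity = {spec:.3f}")
```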
Atom mapping information for central metabolism remains conserved across all species
and is largely readily available. Atom mapping for peripheral metabolism can be obtained
from online databases such as MetaCyc (Caspi et al., 2014), KEGG (Tanabe and Kanehisa,
2012) and MetRxn (Kumar et al., 2012). When information is unavailable from databases,
automated mapping algorithms such as MCS (Chen et al., 2013), PMCD (Jochum et al.,
1980), EC (Morgan, 1965), MWED (Latendresse et al., 2012), and CLCA (Kumar and
Maranas, 2014) may be used to infer plausible mappings. Care must be taken to account
for organism-specific pathways and promiscuity of enzyme activity. For example, S.
cerevisiae contains yeast-specific pathways such as the α-aminoadipate pathway for lysine
biosynthesis (Xu et al., 2006) and Synechococcus elongatus UTEX 2973 contains the
phosphoketolase pathway which must be included in their respective atom mapping
models. In addition, characterization of promiscuity of enzyme activity has added novel
metabolic reactions such as the riboneogenesis pathway (Clasquin et al., 2011) for which
atom mapping remains poorly established. For such reactions, mapping algorithms based
on graph theory are available (Latendresse et al., 2012). In particular, the recent CLCA
algorithm has been shown to be faster and more accurate in generating reaction atom maps
compared to previous algorithms due to the constraints imposed by chemical and
stereochemical properties of reactions (Kumar and Maranas, 2014). Complex chemical entities
and incorrect determination of alternate reaction maps necessitate manual inspection of the
generated maps. Computational mapping algorithms generally rely on
SMILES notation (Latendresse et al., 2012) or graph invariance numbers (Weininger et al.,
1989), which are often very different from IUPAC numbering schemes. The limited
availability of inter-nomenclature conversion tools further complicates the inspection and
correction of data, often requiring additional visual support provided in MetaCyc
(Latendresse et al., 2012) and MetRxn databases (Kumar et al., 2012). Atom mapping is
represented using a string of identifiers such that the position of the atom identifier on the
reactants side maps to the position of the identifier on the products side. For the example
network shown in Figure 1.1, the atom mapping for all the relevant reactions is shown in
Table 1.1.
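The string-based representation of atom mappings can be illustrated with a short sketch that carries a labeling pattern from reactants to products. The reaction, metabolite names, and mapping strings below are hypothetical and do not correspond to Figure 1.1 or Table 1.1; labeling patterns are written as strings of "1" (labeled) and "0" (unlabeled), which is also an assumption made for this example.

```python
from typing import Dict


def propagate_labels(
    reactant_maps: Dict[str, str],
    product_maps: Dict[str, str],
    reactant_labels: Dict[str, str],
) -> Dict[str, str]:
    """Carry per-atom labels from reactants to products via atom identifiers."""
    atom_state: Dict[str, str] = {}
    for metabolite, atom_ids in reactant_maps.items():
        labels = reactant_labels[metabolite]
        if len(labels) != len(atom_ids):
            raise ValueError(f"label length mismatch for {metabolite}")
        for atom_id, state in zip(atom_ids, labels):
            atom_state[atom_id] = state
    # Each product atom inherits the state of the reactant atom with the same id
    return {
        metabolite: "".join(atom_state[atom_id] for atom_id in atom_ids)
        for metabolite, atom_ids in product_maps.items()
    }


# Hypothetical reaction: A (abc) + B (de) -> C (bcd) + D (ae)
reactant_maps = {"A": "abc", "B": "de"}
product_maps = {"C": "bcd", "D": "ae"}

# A labeled at its first carbon, B labeled at its second carbon
product_labels = propagate_labels(reactant_maps, product_maps, {"A": "100", "B": "01"})
print(product_labels)
```

In this example both labeled carbons (atoms a and e) end up in product D, while product C is fully unlabeled, which is the kind of pathway-specific rearrangement that 13C-MFA exploits.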
Once the stoichiometric model and the corresponding atom mapping model are available,
the next step in flux elucidation is the setting up of a framework to predict labeling
distributions when fluxes are known. A number of frameworks are available for this
purpose that operate on the concept of isotopomers. Isotopomers are the exhaustive set of
all possible configurations of labeled and unlabeled atoms for a given metabolite. For the
three-carbon metabolite 𝐴, the set and description of isotopomers are shown in Figure 1.2a.
For a molecule containing 𝑛 possibly labeled atoms, 2^𝑛 isotopomers can be defined.
Therefore, for the three-carbon metabolite 𝐴, eight isotopomers exist. The [2^𝑛 × 1] vector
of fractional abundances of the isotopomers is known as an isotopomer distribution vector
(IDV). Since stable isotopes of atoms cannot be created or destroyed in a biochemical
reaction network, conservation of mass across all isotopomers can be enforced using a
system of balance equations. This system of nonlinear algebraic equations forms the
Isotopomer framework for flux elucidation (Schmidt et al., 1997) from which IDVs can be
calculated either by numerical integration or by solving the system of algebraic equations
using Newton’s method.
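The isotopomer count can be checked with a minimal enumeration sketch. Isotopomers are encoded here as strings of "0" (unlabeled) and "1" (labeled), and the uniform IDV is a placeholder chosen only to show the data structure; both are assumptions made for illustration.

```python
from itertools import product
from typing import Dict, List


def enumerate_isotopomers(n_atoms: int) -> List[str]:
    """List all 2**n_atoms labeling configurations as strings of '0' and '1'."""
    return ["".join(bits) for bits in product("01", repeat=n_atoms)]


# Three-carbon metabolite: 2**3 = 8 isotopomers
isotopomers_A = enumerate_isotopomers(3)

# Placeholder IDV: equal fractional abundance for every isotopomer
idv_A: Dict[str, float] = {iso: 1.0 / len(isotopomers_A) for iso in isotopomers_A}

print(len(isotopomers_A), isotopomers_A)
```

The fractional abundances in an IDV sum to one, and the list doubles in length with each additional atom.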
It is important to note that experimental measurements using NMR spectroscopy or Mass
Spectrometry (MS) do not usually provide information on positional enrichment of atoms.
Instead, they detect the total number of labeled atoms per molecule based on mass shifts
arising from incorporation of heavier isotopes. Isotopomers that differ in the number of
labeled atoms only are termed mass isotopomers. For a molecule containing 𝑛 atoms, 𝑛 +
1 mass isotopomers can exist. The [(𝑛 + 1) × 1] vector containing the fractional
abundances of the mass isotopomers is known as the mass isotopomer distribution vector
(MDV) and is assembled from the isotopomers, as shown for metabolite 𝐴 in Figure 1.2a.
Once IDVs are computed in the isotopomer framework, the corresponding MDVs for the
metabolites whose labeling distribution is quantified by NMR or MS are assembled.
Finally, the fluxes that minimize the deviation between predicted and measured MDVs
are identified.
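The assembly of an MDV from an IDV amounts to summing the isotopomer fractions that share the same number of labeled atoms. A minimal sketch follows; the numerical IDV is hypothetical and the function name idv_to_mdv is an assumption.

```python
from itertools import product

import numpy as np


def idv_to_mdv(idv: np.ndarray, n_atoms: int) -> np.ndarray:
    """Sum isotopomer fractions by number of labeled atoms (M+0 ... M+n)."""
    patterns = list(product((0, 1), repeat=n_atoms))
    if len(idv) != len(patterns):
        raise ValueError("IDV must contain 2**n_atoms entries")
    mdv = np.zeros(n_atoms + 1)
    for pattern, fraction in zip(patterns, idv):
        mdv[sum(pattern)] += fraction
    return mdv


# Hypothetical IDV for metabolite A, ordered 000, 001, 010, 011, 100, 101, 110, 111.
idv_A = np.array([0.50, 0.10, 0.10, 0.05, 0.10, 0.05, 0.05, 0.05])
mdv_A = idv_to_mdv(idv_A, n_atoms=3)
```

Eight isotopomer fractions collapse to four mass isotopomer fractions here, which illustrates the positional information that is lost when only mass shifts are measured.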
Since the number of isotopomers scales exponentially with the number of atoms, the isotopomer
framework is often intractable for large models, particularly since a system of nonlinear equations must be
solved repeatedly to compute IDVs and MDVs. To partially alleviate this difficulty, the
concept of a cumulative isotopomer, or cumomer, was proposed (Wiechert et al., 1999).
The cumomers for metabolite 𝐴 are shown in Figure 1.2b. Note that the number of
isotopomers is equal to the number of cumomers for any metabolite and a linear mapping
can be established between isotopomers and cumomers. As such, recasting the isotopomer
balances in the cumomer space does not reduce the number of unknown variables.
However, recasting the equations in the cumomer space allows the equations to be
decomposed into a cascaded system of equations based on the fact that cumomer weights
are directionally coupled. This means that, for a given cumomer weight, a mass balance
equation depends only on cumomers of the same or lower weight, and never
on a cumomer of higher weight. This reduces the cumomer framework to a system of linear
equations in unknown cumomers of a specified weight when fluxes are specified. The
ability to solve for cumomers as a cascaded system of linear algebraic equations
dramatically lowers the computation time required to solve for MDVs thereby allowing
networks as large as a core metabolic model to be analyzed using the cumomer framework
(Wiechert et al., 1997) while also opening up the possibility of performing local statistical
analyses on inferred fluxes in the form of confidence intervals.
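The linear mapping between isotopomers and cumomers can be sketched as follows, using the usual definition of a cumomer as the summed abundance of all isotopomers that are labeled at a given set of positions, with the cumomer weight equal to the number of positions in that set. The IDV values and the function name idv_to_cumomers are hypothetical.

```python
from itertools import product


def idv_to_cumomers(idv, n_atoms):
    """Map an IDV to cumomer fractions.

    Each cumomer is keyed by a tuple of flags: 1 means the position must be
    labeled, 0 means the position may be labeled or unlabeled.
    """
    patterns = list(product((0, 1), repeat=n_atoms))
    cumomers = {}
    for required in patterns:
        cumomers[required] = sum(
            fraction
            for pattern, fraction in zip(patterns, idv)
            if all(p >= r for p, r in zip(pattern, required))
        )
    return cumomers


# Hypothetical IDV for a three-carbon metabolite, ordered 000, 001, ..., 111.
idv_A = [0.50, 0.10, 0.10, 0.05, 0.10, 0.05, 0.05, 0.05]
cumomers_A = idv_to_cumomers(idv_A, n_atoms=3)

# Cumomer weight = number of positions required to be labeled.
weights = {key: sum(key) for key in cumomers_A}
```

The weight-0 cumomer is always 1, and there are as many cumomers as isotopomers, consistent with the statement that the change of variables does not reduce the number of unknowns.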
While the computational advance offered by the cumomer framework is noteworthy, it does
not address the main limitation of the isotopomer framework, namely its poor scalability
and limited applicability to more descriptive models of metabolism.
In response to this limitation, the elementary metabolite units (EMU) framework
(Antoniewicz et al., 2007) was introduced. The EMU method introduces two major
modifications to the cumomer framework to improve tractability. First, EMUs group
together isotopomers so that they now represent MDVs for the specified EMU. This allows
EMUs to operate in the MDV space directly as opposed to the isotopomer space. As an
example, the grouping of isotopomers to EMUs is shown in Figure 1.2c. Following this,
the EMU method employs a depth-first search algorithm to identify the minimum number
of EMU balances required to simulate the MDV of a measured metabolite fragment, further
reducing the number of relevant balance equations to be solved. These improvements are
implemented while retaining the cascaded problem structure of the cumomer
framework. Overall, this contributes to a 95% reduction in the number of balance equations
for a central metabolic model of E. coli that also contains amino acid pathways. Although
the EMU framework remains tractable and provides a substantial speed-up in flux
elucidation for a central metabolic model, tractability at the genome scale has never been
demonstrated, so it is unclear whether 13C-assisted flux elucidation can even
be applied to larger metabolic models directly in their current form.
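One property that makes working directly in the MDV space convenient is not spelled out above but is standard in the EMU literature (Antoniewicz et al., 2007): when two EMUs condense into a product EMU, the product MDV is the convolution of the precursor MDVs. The sketch below applies this to reaction 𝑣7 of Table 1.1, 𝐶(𝑎𝑏) + 𝐹(𝑐) → 𝐷(𝑎𝑏𝑐). The numerical MDVs are hypothetical.

```python
import numpy as np

# Hypothetical precursor MDVs (entries are M+0, M+1, ... fractions).
mdv_C12 = np.array([0.7, 0.2, 0.1])   # two-carbon EMU of C
mdv_F1 = np.array([0.9, 0.1])         # one-carbon EMU of F

# MDV of the three-carbon product EMU of D formed by condensation in v7.
mdv_D123 = np.convolve(mdv_C12, mdv_F1)
```

The result has four entries (M+0 to M+3) and still sums to one, so no isotopomer-level bookkeeping is needed to propagate labeling through the condensation.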
1.4. Construction of kinetic models of metabolism
Kinetic parameters are identified by solving a nonlinear least-squares regression problem
that recapitulates experimentally measured temporal concentration profiles (Jahan et al.,
2016) or steady-state fluxes and metabolite concentrations (Khodayari et al., 2014) in
response to genetic and environmental perturbations. Steady-state fluxes are generally
elucidated using 13C-MFA when available. Otherwise, the WT flux distribution is sampled
using FBA and kinetic parameters are identified that recapitulate yields of products such
as biomass, ethanol, and acetate under multiple mutant conditions (Dash et al., 2017).
Flexibility exists in the choice of modeling framework relating fluxes to metabolite
concentrations and selection of optimization method used for estimation of kinetic
parameters. Bottom-up approaches are avoided because organism-specific
information is lacking and kinetic data are generated at conditions that differ from in vivo growth
conditions, leading to haphazard data integration and construction of models that are either
unstable or have poor predictive capabilities. Limitations with bottom-up approaches have
motivated the development of various data-driven kinetic parameterization frameworks
such as ORACLE (Miskovic and Hatzimanikatis, 2010) which expresses rate laws using a
log-linear formalism (Hatzimanikatis and Bailey, 1997), MASS models (Jamshidi and
Palsson, 2008) expressing fluxes using mass-action kinetics, Ensemble Modeling (EM)
(Tran et al., 2008) relating fluxes to metabolite concentrations using mass-action kinetics
in conjunction with elementary-step decomposition of the mechanism of enzyme catalysis,
and GRASP (Saa and Nielsen, 2015) which uses the general Monod-Wyman-Changeux
formalism (Monod et al., 1965) within a Bayesian paradigm. Besides these, models
employing Michaelis-Menten and Hill kinetic formalisms have also been parameterized
(Chassagnole et al., 2002; Srinivasan et al., 2018). Of all these frameworks, only ORACLE
and GRASP can provide a reliable estimate of local sensitivity of fluxes to metabolite
concentration fluctuations as they support easy computation of thermodynamically feasible
elasticities and control coefficients. Furthermore, GRASP is also able to provide a
distribution of kinetic parameters explaining the available experimental data as it is cast
within a Bayesian statistical framework. However, the predictive capabilities of ORACLE
are limited to the vicinity of the reference state about which linearization is performed and
GRASP has limited scalability due to insufficient sampling of higher dimensional kinetic
spaces using Monte-Carlo-based sampling techniques. The MASS framework assumes that
all reactions follow generalized mass-action kinetics. Since it does not account for enzyme
saturation effects, good predictive capabilities are limited to the substrate-limited regime
(Du et al., 2016). Of all the frameworks, MASS represents the kinetic model using the
fewest number of parameters and is therefore the most scalable (Saa and Nielsen, 2017).
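The restriction of mass-action kinetics to the substrate-limited regime can be illustrated by comparing a first-order mass-action rate with a Michaelis-Menten rate that shares the same low-substrate slope. This is a sketch under assumed, purely illustrative parameter values; the function names are hypothetical.

```python
def mass_action_rate(k: float, s: float) -> float:
    """First-order mass-action rate, with no saturation."""
    return k * s


def michaelis_menten_rate(vmax: float, km: float, s: float) -> float:
    """Irreversible Michaelis-Menten rate, saturating at vmax."""
    return vmax * s / (km + s)


# Assumed illustrative parameters; k is chosen to match the low-substrate limit.
vmax, km = 10.0, 1.0
k = vmax / km

s_low, s_high = 0.01, 100.0   # substrate far below and far above km
ratio_low = mass_action_rate(k, s_low) / michaelis_menten_rate(vmax, km, s_low)
ratio_high = mass_action_rate(k, s_high) / michaelis_menten_rate(vmax, km, s_high)
```

For these rate laws the ratio equals 1 + s/km, so the two agree within about 1% at s = 0.01 km, while the mass-action rate overshoots roughly 100-fold at s = 100 km.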
Mechanistic frameworks such as EM and Michaelis-Menten-based formalisms represent
conservation of mass across metabolites using a system of ODEs, thereby requiring an
ODE solver (Hoops et al., 2006; Tran et al., 2008) for steady-state evaluation. Compared
to Michaelis-Menten formalisms, EM offers a tractable framework for relating fluxes,
enzyme abundances, metabolite concentrations, and kinetic parameters through specified
mechanisms which decompose into systems of bilinear equations. This allows easy
insertion/deletion of regulatory components without the need for reformulating the kinetic
rate-law expression. The main limitation with mechanistic frameworks is that due to the
large dynamic range of kinetic parameters, the system of ODEs can be stiff, thereby
rendering integration computationally expensive and susceptible to failure. Gradient
calculations must be performed using forward sensitivity analysis (Raue et al., 2013) as the
use of finite difference approximations for functions that are the solution to a system of
ODEs is computationally expensive, inefficient, and inaccurate (Frohlich et al., 2017).
Forward sensitivity analysis has very poor scalability and can require the solution of over
100,000 ODEs for a model with only 1,000 kinetic parameters. Owing to these limitations,
the use of metaheuristic approaches such as genetic algorithm (GA) (Khodayari et al.,
2014) and particle swarm optimization (Millard et al., 2017) for traversal of the feasible
solution space is favored. Metaheuristic algorithms suffer from two key limitations: (i) the
exponential increase in the number of function evaluations required to adequately sample
the kinetic space upon model scale-up, and (ii) the inability to confirm optimality of a
reported solution due to the exclusion of gradient evaluations. The exclusion of gradient
calculations also prevents the evaluation of local sensitivities and any follow-up
calculations on uncertainty of estimated kinetic parameters. Although the GRASP
framework is compatible with the EM formalism, the poor scalability of the underlying
Monte-Carlo approach limits its application in uncertainty analysis of larger models. This
motivates the development of a kinetic parameterization framework that overcomes
difficulties associated with numerical integration and allows convenient calculation of
sensitivities to improve compatibility with local optimization solvers.
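The scaling of forward sensitivity analysis quoted above can be made concrete with a short count. Forward sensitivity analysis integrates the state equations together with one sensitivity equation for every state-parameter pair. The model size of 100 state variables used below is an assumption chosen for illustration; only the figure of 1,000 kinetic parameters comes from the text.

```python
def forward_sensitivity_ode_count(n_states: int, n_params: int) -> int:
    """Number of ODEs integrated in forward sensitivity analysis.

    The n_states state equations are augmented with one sensitivity equation
    per (state, parameter) pair.
    """
    return n_states + n_states * n_params


# Assumed illustrative model size: 100 state variables, 1,000 kinetic parameters.
n_odes = forward_sensitivity_ode_count(n_states=100, n_params=1000)
```

Under this assumption the augmented system contains 100,100 ODEs, in line with the order of magnitude stated above, and the count grows with the product of model size and parameter number.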
1.5. Aim and outline of the thesis
The objective of this thesis is to develop computational tools based on nonlinear
optimization to construct large-scale predictive models of metabolism integrated with
kinetic descriptions for fluxes using 13C labeling data. First, tools for flux elucidation and
confidence interval estimation at the genome scale are described. Following this, a novel
decomposition-based algorithm for accelerated and reproducible parameterization of
kinetic models is introduced. The thesis is organized as follows:
• Chapter 2 details the application of isotopic steady-state 13C-MFA to a genome-
scale metabolic network of E. coli. Metabolic models used in 13C metabolic flux
analysis generally include a limited number of reactions primarily from central
metabolism. They typically omit degradation pathways, complete cofactor
balances, and atom transition contributions for reactions outside central
metabolism. This chapter addresses the impact on prediction fidelity of scaling up
mapping models to the genome scale. To this end, the genome-scale metabolic
mapping model (GSMM) imEco726 is constructed using the iAF1260 model as a basis,
after eliminating reactions guaranteed not to carry flux based on growth and
fermentation data for a minimal glucose growth medium. This chapter discusses
the role of stoichiometric flux coupling in the resolution of metabolic fluxes at the
genome scale and quantifies the loss of information associated with mapping fluxes from
MFA on a core model to a GSM model.
• Chapter 3 describes the scale-up of existing algorithms for isotopic instationary
MFA to genome-scale models and demonstrates an application for flux elucidation
in Synechocystis PCC 6803. Completeness and accuracy of metabolic mapping
models impact the reliability of flux estimation in photoautotrophic systems. In
this chapter, metabolic fluxes under photoautotrophic growth conditions in the
widely-used cyanobacterium Synechocystis PCC 6803 are quantified by re-
analyzing an existing dataset using genome-scale isotopic instationary 13C-
Metabolic Flux Analysis (INST-MFA). Flux elucidation using the genome-scale
carbon mapping model reveals a qualitatively different solution relative to that
predicted by a core model and identifies a novel bifurcated pathway topology that
enables maximum carbon routing towards biomass. Flux prediction departures
from the ones obtained with the core model demonstrate the importance of
constructing mapping models with global coverage to reliably glean new biological
insights using labeled substrates.
• Chapter 4 introduces a novel decomposition-based algorithm for estimation of
kinetic parameters using available fluxomic data. Parameterization of organism-
level kinetic models that faithfully reproduce the effect of different genetic or
environmental perturbations remains an open challenge due to the intractability of
existing algorithms. This chapter introduces K-FIT, an accelerated kinetic
parameterization workflow that leverages a novel decomposition approach to
identify steady-state fluxes in response to genetic perturbations followed by a
gradient-based update of kinetic parameters until predictions simultaneously agree
with the metabolic flux data for all perturbed metabolic networks. The applicability
of this approach to large-scale models is demonstrated by parameterizing an
expanded kinetic model for E. coli (307 reactions and 258 metabolites) using
fluxomic data for six mutants. The 1,000-fold speed-up afforded by K-FIT is
transformational as it enables follow-up robustness of inference analyses and
optimal design of experiments that can inform metabolic engineering strategies.
• Chapter 5 summarizes the accomplishments of this thesis, details some of the
successful follow-up work it has enabled, and discusses possible future directions
in the field of metabolic modeling.
Figure 1.1: A toy reaction network example for MFA
Figure 1.2: Isotopomers, cumomers, and EMUs for metabolite 𝐴. (a) Isotopomers and
the grouping of isotopomers into mass isotopomers. (b) Grouping of isotopomers into
cumomers. (c) Grouping of isotopomers into EMUs. The solid black circles represent
labeled atoms whereas the white circles represent unlabeled atoms.
Table 1.1: Reaction stoichiometry and atom mapping for toy network
Reaction    Stoichiometry    Atom mapping
𝑣1          𝐴 → 𝐵            𝐴(𝑎𝑏𝑐) → 𝐵(𝑎𝑏𝑐)
𝑣2          𝐵 → 𝐷            𝐵(𝑎𝑏𝑐) → 𝐷(𝑐𝑏𝑎)
𝑣3          𝐷 → 𝐵            𝐷(𝑎𝑏𝑐) → 𝐵(𝑐𝑏𝑎)
𝑣4          𝐵 → 𝐷            𝐵(𝑎𝑏𝑐) → 𝐷(𝑎𝑏𝑐)
𝑣5          𝐵 → 𝐸 + 𝐶        𝐵(𝑎𝑏𝑐) → 𝐸(𝑎) + 𝐶(𝑏𝑐)
𝑣6          2𝐶 → 𝐷 + 𝐹       𝐶(𝑎𝑏) + 𝐶(𝑎𝑏) → 𝐷(𝑎𝑏𝑎) + 𝐹(𝑏)
𝑣7          𝐶 + 𝐹 → 𝐷        𝐶(𝑎𝑏) + 𝐹(𝑐) → 𝐷(𝑎𝑏𝑐)
Chapter 2
13C Metabolic flux analysis at the genome-scale
This chapter has been previously published in modified form in Metabolic Engineering
(Saratram Gopalakrishnan and Costas D. Maranas. 13C Metabolic flux analysis at the
genome-scale. Metabolic Engineering 32(2015): 12-22.)
2.1. Introduction
Cellular metabolism is a direct indicator of a cell's physiological state (Nielsen, 2003).
Estimation of fluxes using 13C metabolic flux analysis involves solving a nonlinear least-
squares problem for the flux distribution capable of matching experimentally measured
labeling patterns of analyzed metabolites (Zomorrodi et al., 2012), typically amino acids
and fatty acids. Labeling patterns given a flux distribution can be predicted by relating the
target labeling patterns to input tracers and a flux distribution using a system of algebraic
equations. This can be achieved by decomposing the network using various frameworks
such as isotopomers (Schmidt et al., 1997), cumomers (Wiechert et al., 1999), or the EMU
method (Antoniewicz et al., 2007) all of which are based on the atom mapping matrix
(AMM) concept (Zupke and Stephanopoulos, 1994). The computational complexity arises
from the fact that the number of equations scales super-linearly with network size. Network
decomposition using the isotopomer approach results in 4,612 unknown mass isotopomers
involving bilinear terms for a complete central metabolic network of E. coli (Antoniewicz
et al., 2007). Efforts in the last decade have focused on reducing complexity and proposing
better algorithms to solve for the mass isotopomer distributions (MIDs) (Wiechert and de
Graaf, 1996; Wiechert et al., 1997). For instance, the EMU method reduces the number
of isotopomer variables from 4,612 to 310 for a central metabolic network of E. coli. Sub-
networks can be simplified further using the Dulmage-Mendelsohn decomposition to
improve the speed of estimation (Young et al., 2008). A variety of optimization approaches
(Schmidt et al., 1999) have been used to infer the metabolic fluxes that minimize the sum
of squared residuals, while the statistical significance of the estimated flux distribution is
evaluated using the χ2 test (Pazman, 1993). Due to the limited number of analyzed
metabolites and inherent measurement error, flux ranges rather than unique values are
obtained for the metabolic fluxes using either linearized statistics (Mollney et al., 1999),
grid search, or non-linear statistics (Antoniewicz et al., 2006). All of these approaches are
iterative, requiring repeated solution of the least-squares minimization problem, which
places an additional computational burden.
The general practice is for 13C-MFA models to include only a skeletal representation of
central metabolism comprised of the EMP pathway, PPP, TCA cycle, glyoxylate shunt,
and the ED pathway. Important pathways such as serine and arginine degradation are
typically absent. 13C MFA has been used extensively to elucidate the metabolic properties
of knockout strains (Flores et al., 2002; Hua et al., 2003; Shimizu, 2004; Usui et al., 2012;
Zhao and Shimizu, 2003), identify metabolic bottlenecks (Antoniewicz et al., 2007), and
even confirm the activity of various pathways (Crown et al., 2011; You et al., 2014; Young
et al., 2011). Earlier E. coli mapping models (Shimizu, 2004; Zhao and Shimizu, 2003) did
not use a biomass equation; instead, they included specific drains proportional to the
specific growth rate (Hua et al., 2003) to account for biomass formation (Holms, 1996).
Newer models use a defined biomass equation obtained from macromolecular composition
of a wild-type strain (Antoniewicz et al., 2007). While the use of a biomass equation in
MFA constrains metabolism by imposing specific requirements on macromolecule
precursors from central metabolism, it neglects the contribution of the soluble pool and the
energetic requirements that are part of a genome-scale model. The use of cofactor balances
in MFA, limited to newer prokaryotic models (Bonarius et al., 1998; van Gulik and
Heijnen, 1995), can sharpen the reaction bounds involved in energy metabolism. However,
such MFA models assume that the cell is purely biosynthetic neglecting possibly active
pathways such as gluconeogenesis and amino acid degradation. MFA models for
organisms such as Synechocystis (Young et al., 2011), CHO cells (Ahn and Antoniewicz,
2011), and hybridoma cell lines (Murphy et al., 2013) have so far omitted cofactor
balances. Nevertheless, it has been previously shown that neglecting potentially active
reactions that contribute to cofactor balances can alter the estimated flux ranges using 13C
MFA (Bonarius et al., 1998). The key advantage of using a genome-scale model in MFA
is that it represents the totality of reactions that can be carried out by the organism avoiding
any biases introduced by lumping reactions or omitting pathways pre-judged as
non-functional. An earlier MFA study on a larger metabolic model of E. coli (Suthers et al.,
2007) proposed the possibility of the integration of a number of non-central pathways, and
found that nearly half of the fluxes were fixed by stoichiometry alone. Here we take the
next step by making use of a genome-scale model for flux elucidation thereby avoiding
any pre-conceived assumptions about which pathways should be active or inactive.
Mapping models used for MFA typically include fewer than 10% of the reactions contained
within a genome-scale model. Flux ranges obtained using 13C MFA have been used
extensively to test the validity of genome-scale models (Chen et al., 2011; Dash et al.,
2014; Saha et al., 2012). However, this transfers the assumptions used in the construction
of MFA models to the GSM model, thereby providing a solution space which may be more
constrained than what the labeling data supports. On the other hand, GSMs are generally
analyzed using methods such as Flux Balance Analysis (Varma and Palsson, 1994), Flux
Variability Analysis (Mahadevan and Schilling, 2003), and MOMA (Segre et al., 2002).
Often, the predicted metabolic phenotypic space is quite large with split ratios and cycles
poorly resolved. 13C MFA at a genome-scale holds the promise of resolving split ratios
and cycles while avoiding making any assumptions about which pathways should be active.
As a result, it can identify the activity of all degradation pathways which are generally
neglected by existing mapping models, impose detailed cofactor balances, generate
unbiased confidence intervals for all fluxes within the network, provide insight into which
fluxes can or cannot be resolved using 13C labeling data (i.e., identifiability problem),
maintain consistency with a comprehensive biomass equation describing metabolite
demands for macromolecule biosynthesis, soluble pool, and experimentally measured
energy demands, and even accurately predict the impact of genetic modifications which
are essentially unresolvable by constraint-based modeling techniques (Copeland et al.,
2012).
Successful 13C-MFA at a genome-scale requires a reliable GSM model along with detailed
atom maps of every reaction in the network. Atom mapping information for central
metabolism reactions is readily available from biochemistry textbooks. For other pathways,
online databases such as KEGG (Latendresse et al., 2012), MetaCyc (Korner and
Apostolakis, 2008), or MetRxn (Kumar et al., 2012) are useful resources. MetRxn includes
reaction mapping information for over 27,000 reactions generated using a novel sub-
structure search algorithm known as Canonical Labeling for Clique Approximation
(CLCA) (Kumar and Maranas, 2014) which offers improved accuracy and memory
utilization over existing heuristic algorithms. The approach utilizes number theory to
generate unique ids for each atom followed by a maximum common substructure search.
The MetRxn database contains atom mapping information for reactions from 112
metabolic models including iAF1260 directly downloadable from
http://www.metrxn.che.psu.edu/.
In this study we estimate flux ranges for E. coli using both a core mapping
model (Leighty and Antoniewicz, 2013) and the genome-scale model iAF1260 (Feist et
al., 2007), with measured fluxes and 13C labeling data as constraints. The GSM model is
refined further by imposing measured extracellular fluxes as constraints and then
performing flux variability analysis (FVA) to identify the part of the network that can carry
non-zero fluxes. Subsequently, active sub-networks of different size are generated by
decomposing the network using the EMU algorithm (Antoniewicz et al., 2007). The fluxes
are estimated by solving a least-squares problem that minimizes the sum of
squared differences between predicted and experimentally observed
metabolite labeling patterns. The 95% confidence intervals are generated by varying the fluxes
individually until the minimum sum of squares exceeds a pre-defined threshold. We
demonstrate how a combination of the constraints involved in FBA and 13C-MFA can be
used in a concerted manner to effectively resolve fluxes through key branch points such as
the oxidative pentose phosphate pathway, the Entner-Doudoroff pathway, and the
glyoxylate shunt. Our results allude to the possibility of coexistence of anabolic and
catabolic reactions and bypasses resulting in expanded ranges for many reactions which
were previously reported to be precisely inferred. They also shed light on the inability of
MFA alone to resolve alternate pathways and energy metabolism when the entirety of
metabolic reactions implied by the GSM model is used. Surprisingly, we found that results
are largely insensitive to biomass composition fluctuations as the experimental error in the
labeling data is the dominant source of prediction uncertainty. The impact of using a core
model for MFA is quantitatively assessed by contrasting the corresponding flux ranges. In
addition, the loss of information when fluxes derived from MFA in the core metabolic
model are directly ported on a GSM model is assessed and discussed.
2.2. Methods
2.2.1. Genome-scale atom mapping model
The genome-scale metabolic model of E. coli (Feist et al., 2007) consisting of 2,382
reactions and 1,670 metabolites was pruned by eliminating reactions incapable of carrying
flux for the bioprocess data measured by Leighty and Antoniewicz (2013).
The model was further simplified by manually eliminating thermodynamically infeasible
cycles disjoint from the metabolic network. The resultant model has 697 reactions (29
reversible reactions) and 595 metabolites with glucose as the sole carbon source. Examples
of reactions eliminated from the model include uptake systems of other carbon sources,
beta-oxidation, and nucleotide salvage pathways, in agreement with Suthers et al.
(2007). Atom mapping information for the core model was obtained from the study
conducted by Leighty and Antoniewicz (2013). Atom mapping information
for the genome-scale model was obtained using the CLCA algorithm (Kumar and Maranas,
2014).
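The pruning step, in which reactions unable to carry flux are removed, can be illustrated with flux variability analysis on a toy network. The sketch below is not the procedure or the model used in this study: the four-reaction network, the bounds, and the function name flux_variability are hypothetical, and SciPy's linprog stands in for whatever LP solver was actually used.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy network (not the E. coli model):
#   r0: -> A (uptake, fixed at 10), r1: A -> B, r2: B -> (export), r3: A -> C
# C has no consuming reaction, so r3 cannot carry flux at steady state.
S = np.array([
    [1.0, -1.0,  0.0, -1.0],   # A
    [0.0,  1.0, -1.0,  0.0],   # B
    [0.0,  0.0,  0.0,  1.0],   # C
])
bounds = [(10, 10), (0, 1000), (0, 1000), (0, 1000)]


def flux_variability(S, bounds):
    """Minimize and maximize each flux subject to S v = 0 and the flux bounds."""
    n_met, n_rxn = S.shape
    ranges = []
    for j in range(n_rxn):
        c = np.zeros(n_rxn)
        c[j] = 1.0
        lo = linprog(c, A_eq=S, b_eq=np.zeros(n_met), bounds=bounds, method="highs")
        hi = linprog(-c, A_eq=S, b_eq=np.zeros(n_met), bounds=bounds, method="highs")
        if lo.status != 0 or hi.status != 0:
            raise RuntimeError(f"LP failed for reaction {j}")
        ranges.append((lo.fun, -hi.fun))
    return ranges


ranges = flux_variability(S, bounds)
# Reactions whose minimum and maximum flux are both zero are candidates for removal.
blocked = [j for j, (lo, hi) in enumerate(ranges) if abs(lo) < 1e-9 and abs(hi) < 1e-9]
```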
2.2.2. Flux estimation procedure
Network decomposition was accomplished using the EMU algorithm that relates the
labeling pattern of the input tracer and a flux distribution to a labeling pattern of all
analyzed intracellular metabolites. Fluxes were estimated by solving a non-linear least
squares problem described in detail in Appendix A. This problem minimizes the variance-
weighted sum of the squares of differences between the predicted and experimentally
observed labeling patterns for 18 fragments from 10 different intracellular amino acids
subject to flux non-negativity. Glucose labeled at the second carbon with 99.5% purity was
used as the tracer input in the analyzed dataset (Leighty and Antoniewicz, 2013) as it was
found to best resolve oxidative PPP. All other carbon atoms were assumed to contain the
heavy isotope of carbon equal to the natural abundance. The least squares objective
function 𝜑 (see Appendix A) depends on the subset of fluxes (𝒘) present in the EMU
model which is a subset of all fluxes (𝒗) in the S-matrix of the metabolic model. Since the
system of component balance equations describing the metabolic network is
underdetermined, the set of fluxes 𝒗 can be expressed in terms of the free fluxes 𝒖 by
means of a null-space decomposition (Antoniewicz et al., 2006). Consequently, the set of
fluxes describing the EMU model 𝒘 and the objective function 𝜑 can also be expressed in
terms of the subset of free fluxes 𝒖. This allows for the estimation of all the fluxes within
the metabolic network by the resolution of the free fluxes 𝒖. The problem described in
Appendix A was solved using the fmincon function from the optimization toolbox of
MATLAB. A user-supplied Hessian matrix was provided for the interior point algorithm
(Byrd et al., 2000; Byrd et al., 1999; Waltz et al., 2006) of fmincon using the procedure
introduced by Antoniewicz et al. (2006). Given the nonconvex nature
of the objective function, the problem was solved 100 times and the best solution was
selected as the optimal flux distribution candidate. This solution was selected as the optimal
flux distribution for further analysis only if it satisfied two criteria: the optimization
problem converged to the same solution at least 70 times out of 100 runs and the obtained
flux distribution was unaffected by local perturbations. All fluxes (mmol/dmol-glc) are
reported using 100 mmol of glucose uptake per gram dry cell weight as the basis. Fluxes
were also estimated with the amino acid MS data obtained using glucose labeled at the fifth
carbon to clearly identify loss of resolution due to model scale-up.
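The null-space parameterization of the fluxes can be sketched as follows. The stoichiometric matrix is a hypothetical toy example, and scipy.linalg.null_space returns an orthonormal basis, so the entries of 𝒖 here are coordinates in that basis rather than a chosen subset of physical fluxes as in Antoniewicz et al. (2006).

```python
import numpy as np
from scipy.linalg import null_space

# Hypothetical stoichiometric matrix: 2 balanced metabolites, 4 fluxes.
S = np.array([
    [1.0, -1.0, -1.0,  0.0],
    [0.0,  1.0,  0.0, -1.0],
])

N = null_space(S)      # columns span every v that satisfies S v = 0
n_free = N.shape[1]    # number of free variables u

# Any choice of u yields a flux vector that satisfies the balances exactly.
u = np.array([0.3, -1.2])
v = N @ u
```

Optimizing over 𝒖 instead of 𝒗 removes the equality constraints from the least-squares problem. In this parameterization flux non-negativity would have to be imposed as the linear inequality N𝒖 ≥ 0.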
2.2.3. Confidence intervals
The underdetermined nature of the metabolic network could result in multiple flux
distributions with the same labeling pattern. Furthermore, metabolite labeling
measurements are inherently noisy introducing error in the data that further contributes to
metabolic flux inference uncertainty. To this end, we estimated the lower and upper bounds
of the 95% confidence interval of each flux such that the sum of squares of residuals (SSR)
is within 3.84 of the minimum SSR. The value 3.84 corresponds to the 𝜒² statistic for a
p-value of 0.05 and one degree of freedom. The lower and upper bounds were estimated
using an iterative procedure (Antoniewicz et al., 2006) where every flux 𝑣𝑗 is successively
varied up (or down for lower bound) and a new best flux distribution is re-calculated. The
upper (or lower) bound for 𝑣𝑗 defining the 95% confidence interval corresponds to the
value that renders the difference between the re-calculated and original SSRs equal to
3.84. Since the genome-scale model contains a large number of measurement-coupled
reactions, a flux coupling analysis (Burgard et al., 2004) is performed to identify the list of
all reactions coupled to an extracellular measurement (i.e., 411 out of a total of 697
reactions). These reactions are assigned a range consistent with the extracellular
measurement variance to which they are fully coupled. In addition, a flux coupling analysis
between every reaction pair in the metabolic model further reduces the number of reactions
whose range needs to be estimated based on the procedure described above. We identified
250 coupled reaction pairs in the EMU-balanced network implying that there were only
186 remaining reaction fluxes whose confidence levels needed to be directly assessed.
Additional flux range reduction was achieved by successively performing FVA
(Mahadevan and Schilling, 2003) using the obtained 95% confidence level ranges as flux
bounds. Technical details of the implementation of 13C-MFA for GSM models are
given in Appendix A.
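The threshold-based confidence interval described above can be sketched on a toy problem in which one flux is fixed, the remaining flux is re-optimized, and the bound is the value at which the SSR increase reaches the 𝜒² threshold. The linear two-flux model, its data, and all names below are hypothetical stand-ins for the EMU-based least-squares problem.

```python
import numpy as np
from scipy.optimize import brentq, minimize_scalar
from scipy.stats import chi2

# Hypothetical toy problem: two fluxes v = (v1, v2) and three measurements
# y = A v + noise with standard deviation sigma (not data from this study).
A = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
y = np.array([1.0, 2.0, 3.1])
sigma = 0.1


def ssr(v):
    """Variance-weighted sum of squared residuals."""
    r = (A @ v - y) / sigma
    return float(r @ r)


v_best, *_ = np.linalg.lstsq(A / sigma, y / sigma, rcond=None)
ssr_min = ssr(v_best)
threshold = chi2.ppf(0.95, df=1)   # about 3.84


def profile_excess(v1):
    """SSR increase beyond the threshold when v1 is fixed and v2 is re-optimized."""
    refit = minimize_scalar(lambda v2: ssr(np.array([v1, v2])))
    return refit.fun - ssr_min - threshold


# Bounds of the 95% confidence interval for v1 (search width of 5 is arbitrary).
upper = brentq(profile_excess, v_best[0], v_best[0] + 5.0)
lower = brentq(profile_excess, v_best[0] - 5.0, v_best[0])
```

For this linear toy model the interval is symmetric about the best fit. In the nonlinear flux estimation problem the two bounds generally differ, which is why each direction is searched separately.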
2.3. Results
2.3.1. Active EMU network
Decomposition of the genome-scale network using the EMU algorithm resulted in EMU
sub-networks of sizes 1 through 9 (i.e., number of carbons in the EMU fragment). The
network consisted of 1,400 balanced EMUs and 3,526 EMU reactions, spanning 432 out
of the 726 fluxes in the GSM model. Of the 3,526 EMU reactions, 1,405 reactions were
duplicates, contributed by redundant mappings. In comparison, the core model consists of
310 balanced EMUs and 863 EMU reactions with 181 duplicates, spanning 80 out of the
100 fluxes in the model. It is interesting to note that a nine-fold scaling up of the mapping
model resulted in only a five-fold increase in the number of EMUs, a four-fold increase in
the number of EMU reactions, and a three-fold increase in the number of unique EMU
reactions. This moderate increase is because only 256 out of 595 metabolites (present in
426 out of 726 fluxes) are required to predict the experimentally observed labeling patterns.
65% of all the fluxes involved in EMU balances are from central metabolism and amino
acid metabolism. The remaining 35% result from the contributions of cofactor
biosynthesis, lipid biosynthesis, and nucleotide biosynthesis, accounting for novel carbon
transformations absent in the core model. In comparison, the core EMU network includes
all reactions from central metabolism and a limited number of reactions from amino acid
metabolism including serine hydroxymethyltransferase (SHMT), glycine cleavage system,
and threonine aldolase. The GSMM model sheds light on novel carbon transformations,
alternate atom mapping pathways and redundant atom maps, which provide the means to
explain the experimentally measured labeling patterns better and highlight the role of
assumptions involved in flux estimation using the core model.
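The fold increases quoted above follow directly from the reported counts, as the short calculation below shows. All numbers are taken from this section; only the variable names are introduced here.

```python
# Counts reported in the text for the genome-scale (GSM) and core mapping models.
gsm = {"emus": 1400, "emu_reactions": 3526, "duplicates": 1405}
core = {"emus": 310, "emu_reactions": 863, "duplicates": 181}

# Unique EMU reactions = all EMU reactions minus duplicates.
unique_gsm = gsm["emu_reactions"] - gsm["duplicates"]
unique_core = core["emu_reactions"] - core["duplicates"]

fold_emus = gsm["emus"] / core["emus"]
fold_reactions = gsm["emu_reactions"] / core["emu_reactions"]
fold_unique = unique_gsm / unique_core
```

The ratios are approximately 4.5, 4.1, and 3.1, which round to the five-fold, four-fold, and three-fold increases stated in the text.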
The novel carbon transformations allowed by the pathways outside central and amino acid
metabolism in the GSM model stem from the production of molecules such as CO2,
glycolaldehyde, and formate, which are eventually recycled into central metabolism. CO2 is
produced as a by-product of the synthesis of several cofactors and trace metabolites such
as coenzyme A, thiamine pyrophosphate, heme, pyridoxal phosphate, menaquinol 8, and
NAD. CO2 is also produced by the decarboxylation of phosphatidylserine to
phosphatidylethanolamine. The core model accounts for CO2 production and consumption only
within the central metabolic reactions. Similarly, formate is absent from the core model,
whereas in the GSM model it is produced by the degradation of formyl-tetrahydrofolate and
by the biosynthesis of tetrahydrofolate, riboflavin, and thiamine pyrophosphate.
Glycolaldehyde, which is also absent from the core model, is produced as a by-product of
tetrahydrofolate biosynthesis in the GSM model. The non-zero lower bounds of the FVA flux
ranges for these reactions provide quantitative evidence that these often-ignored
transformations play a role in explaining the observed labeling data.
The GSMM model also traces alternate routes of existing pathways in the core model. For
example, the production of succinate from α-ketoglutarate occurs only through the TCA
cycle in the core model, whereas the GSMM allows for two additional routes: the
degradation of glutamate through the γ-aminobutyrate pathway and the degradation of
arginine. The key difference between these three pathways is the energy output. The TCA
cycle route produces one NADH and one ATP, whereas the γ-aminobutyrate pathway route
produces only one NADH and the arginine pathway produces one NADH but requires one
ATP to recycle the consumed acetyl-CoA. Because the atom transitions for succinate are
identical for all three pathways, 13C-MFA cannot resolve the fluxes among them, and the
core model therefore arbitrarily assigns the entire flux towards succinate to the TCA
cycle. Resolution between the three parallel pathways can be achieved
only after cofactor balances are included. A similar diversity of metabolic routes implied
by the GSM model can be seen with the production of pyruvate from succinate. The core
model only includes the pathway involving malic enzyme whereas the GSM model
accounts for an additional route through propionyl-CoA with an identical energy output
implying non-identifiability between the two alternatives. Such alternate routes with
identical atom mapping are ubiquitous in GSM models, so fluxes that appear well resolved in
the core model, misleadingly so, become non-identifiable or poorly resolved at the genome
scale.
Multiple metabolic reactions are sometimes described by exactly the same EMU reaction
leading to non-identifiability. We identified 195 such instances among the 726 reactions in
the GSM model. The complete list of EMU reactions consisted of 122 duplicate reactions,
85 triplicate reactions, 27 quadruplicate reactions, and 48 EMU reactions describing
identically five or more reactions from the genome-scale model. The source of this
redundancy can be traced back to three factors: (i) isozymes with different cofactors, (ii)
alternate reactions facilitating the same atom transfer, and (iii) group transfer reactions such
as transaminases. For example, isozymes of malic enzyme catalyze the oxidation of malate
to pyruvate using either NAD or NADP as a cofactor. Thus, the EMU reaction describing
the atom transfer reaction from malate to pyruvate is identical for both isozymes. The
elongation of the fatty acid chain releases one CO2 from malonyl-ACP in the first step of
the cycle. As a result, the corresponding EMU reaction occurs eleven times accounting for
all the fatty-acid chain elongation reactions. The conversion of glutamate to α-ketoglutarate
occurs through the transaminase reactions for which glutamate is the amino group donor.
The GSM model contains 18 different aminotransferase reactions thereby resulting in
multiple EMU reactions producing α-ketoglutarate from glutamate. Similarly, the EMU
reaction describing the carbon transfer from ATP to ADP (or AMP) arises in an identical
manner for up to 94 reactions in the GSM model. Therefore, GSM reactions that map to the
same EMU reaction cannot be resolved individually by MFA alone, as only the sum of their
fluxes is constrained.
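The consequence of redundant mappings can be sketched with a toy example. The reaction
names, EMU signatures, and flux values below are hypothetical and serve only to show that
two different flux splits among reactions sharing one EMU reaction yield the same lumped
flux, which is all that the labeling data can constrain.

```python
from collections import defaultdict

# Hypothetical toy data: each metabolic reaction is paired with the EMU
# reaction (atom-transfer signature) it contributes. Names are illustrative.
reaction_to_emu = {
    "ME_NAD": "mal[1,2,3] -> pyr[1,2,3]",
    "ME_NADP": "mal[1,2,3] -> pyr[1,2,3]",
    "ASPTA": "glu[1,2,3,4,5] -> akg[1,2,3,4,5]",
    "ALATA": "glu[1,2,3,4,5] -> akg[1,2,3,4,5]",
    "PGI": "g6p[1,2,3,4,5,6] -> f6p[1,2,3,4,5,6]",
}


def lump_by_emu_reaction(mapping):
    """Group metabolic reactions that share an identical EMU reaction."""
    groups = defaultdict(list)
    for reaction, emu_reaction in mapping.items():
        groups[emu_reaction].append(reaction)
    return dict(groups)


def lumped_fluxes(groups, fluxes):
    """Sum the fluxes in each group; labeling data constrain only these sums."""
    return {emu: sum(fluxes[r] for r in members) for emu, members in groups.items()}


groups = lump_by_emu_reaction(reaction_to_emu)

# Two different flux splits between the redundant reactions give the same
# lumped fluxes, so the labeling data alone cannot tell them apart.
split_a = {"ME_NAD": 5.0, "ME_NADP": 0.0, "ASPTA": 3.0, "ALATA": 1.0, "PGI": 70.0}
split_b = {"ME_NAD": 1.0, "ME_NADP": 4.0, "ASPTA": 2.0, "ALATA": 2.0, "PGI": 70.0}
print(lumped_fluxes(groups, split_a) == lumped_fluxes(groups, split_b))
```

Separating the members of such a group requires information beyond atom transitions,
such as cofactor balances.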
2.3.2. Flux identifiability and statistical validity of the model
Because the EMU network spans a much smaller fraction of the GSM model, the number
of 𝜒2 degrees of freedom (DOF) for the regression model differs significantly depending
on whether it is defined with respect to the entire GSM model or with respect to the EMU
network alone. Table 2.1 shows the comparison of the 𝜒2 degrees of freedom for the core
model and the GSM model. The number of DOF is defined as the difference between the
number of data points (measured fluxes and metabolite mass fractions) and the number of
free variables in the network (EMU and GSM networks). Statistical significance of
inference requires that the estimated minimum sum of squares of residuals (SSR) be within
an expected range determined by the confidence level and the number of DOF for the
model.
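For concreteness, the SSR can be sketched as a variance-weighted sum over all data points.
The form below is the one commonly used in 13C-MFA and is assumed here rather than taken
from this chapter; the mass fractions and standard deviations are hypothetical.

```python
def weighted_ssr(measured, predicted, std_dev):
    """Variance-weighted sum of squared residuals between data and model predictions."""
    if not (len(measured) == len(predicted) == len(std_dev)):
        raise ValueError("all inputs must have the same length")
    return sum(((m - p) / s) ** 2 for m, p, s in zip(measured, predicted, std_dev))


# Hypothetical mass fractions for one fragment (M+0, M+1, M+2) and an assumed
# measurement standard deviation; none of these numbers come from the study.
measured = [0.60, 0.30, 0.10]
predicted = [0.59, 0.32, 0.09]
std_dev = [0.01, 0.01, 0.01]

ssr = weighted_ssr(measured, predicted, std_dev)
print(round(ssr, 2))
```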
For the given set of labeling data, the core model has 55 DOF. The core model contains 20
reactions (20% of all fluxes in the metabolic network) that do not participate in EMU
balances, of which 15 are fully coupled to an extracellular flux measurement. Since the
number of variables outside the EMU balances included in the least-squares regression
model is very small, the impact on the expected sum of squares of residuals (SSR) is
minimal, so it is safe to use the entire metabolic network as the regression model. With a
positive DOF, the core model seldom encounters issues with the statistical validity of the
estimated fluxes. In contrast, a cursory analysis gives a DOF of -30 for the GSM model.
This is because the number of reactions not accounted for by EMU balances is considerably
larger for the GSM model, causing the number of free fluxes in the entire metabolic
network to exceed the number of available data points. A negative DOF for the GSM model
would imply over-fitting and a lack of statistical significance. However, the EMU network
spans only about 60% of the metabolic network. Analyzing the EMU network associated
with the GSM model revealed that the regression model has 27 DOF. This positive DOF
arises from a 40% reduction in the number of variables (i.e., free fluxes)
and a large increase in the number of fluxes coupled to an extracellular flux measurement.
It was found that 256 out of the 595 balanced metabolites were involved in EMU balances.
Of these, 214 metabolites were completely balanced by reactions involved in EMU
balances, meaning that they did not feed into non-EMU pathways. Of the 42 metabolites
feeding into other pathways, 28 of them were consumed by measurement-coupled
pathways only, indicating that only 14 metabolites required additional equality constraints
to be considered balanced EMU metabolites. Therefore, the actual EMU model contains
439 reactions and 242 balanced metabolites. Further reduction of the model by elimination
of measurement-coupled reactions revealed that there are only 99 free fluxes. In
comparison, the entire metabolic network has 274 free fluxes. Therefore, 13C-MFA can
be performed on this model if there are at least 99 data points. Since the data set used in
this analysis contains 126 data points, the least-squares fitting approach can be safely
applied to obtain statistically acceptable fits. Also shown in Table 2.1 is the maximum
allowed SSR for an accepted fit with 95% confidence. An increase in the number of
variables causes a reduction in the degrees of freedom, as a result of which the maximum
allowed SSR for 95% confidence is reduced for the GSM model.
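The DOF bookkeeping and the resulting SSR acceptance range can be sketched as follows. This
is a minimal illustration under stated assumptions: the chi-square quantile is computed
with the Wilson-Hilferty approximation (chosen so that only the Python standard library is
needed), and a two-sided 95% range is assumed. Neither choice is claimed to be the
procedure used to generate Table 2.1.

```python
from statistics import NormalDist


def degrees_of_freedom(n_data_points, n_free_fluxes):
    """DOF of the regression: data points minus free variables."""
    return n_data_points - n_free_fluxes


def chi2_quantile(p, dof):
    """Approximate chi-square quantile (Wilson-Hilferty); not an exact inverse CDF."""
    if dof <= 0:
        raise ValueError("chi-square quantile requires a positive DOF")
    z = NormalDist().inv_cdf(p)
    c = 2.0 / (9.0 * dof)
    return dof * (1.0 - c + z * c ** 0.5) ** 3


def ssr_acceptance_range(dof, confidence=0.95):
    """Two-sided range in which the minimized SSR is expected to fall."""
    tail = (1.0 - confidence) / 2.0
    return chi2_quantile(tail, dof), chi2_quantile(1.0 - tail, dof)


# Counts reported in the text for the EMU-based regression model.
dof_emu = degrees_of_freedom(n_data_points=126, n_free_fluxes=99)
lower, upper = ssr_acceptance_range(dof_emu)
print(dof_emu, round(lower, 1), round(upper, 1))
```

A non-positive DOF raises an error in this sketch, mirroring the "Not defined" entry in
Table 2.1 for the free-flux-based count.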
2.3.3. Flux and range estimation at the genome-scale
Flux elucidation using the GSMM model predicts the experimentally observed MS data
better than the core model (Figure 2.1). This improved prediction is attributed to the
improved prediction of alanine and valine MS measurements with minimal changes to the
quality of prediction of other labeling patterns. Improvements for alanine and valine are
due to changes in the estimated fluxes in central metabolism (Figure 2.2b, 2.2c, 2.2d and
2.2e). The inclusion of alternate pathways, new carbon mapping information, and complete
metabolite and cofactor balances in the GSMM model results in significant changes in the
PPP and wider flux ranges for reactions in glycolysis and the TCA cycle. The ED pathway
and glyoxylate shunt flux ranges remained similar in both the core model and GSM model.
Identical trends were observed upon analysis with a glucose tracer labeled at the fifth
carbon (Figure 2.5).
Among the glycolytic reactions (Figure 2.2a and 2.2b), flux through PGI was unaffected.
However, the remaining reactions had expanded flux ranges due to the inclusion of
gluconeogenesis and alternate pathways of pyruvate metabolism. An unaffected PGI flux
range indicates that 13C-MFA using a GSMM model is capable of resolving the
glycolysis/PPP split ratio despite the large increase in the number of reactions. Expanded
flux ranges for TPI, GAPD, PGM, and ENO arise from the presence of an alternate pathway
from dihydroxyacetone phosphate to pyruvate through methylglyoxal. However, the
alternate methylglyoxal pathway involves a different energy balance yielding no ATP and
less NADH than the EMP pathway, thus limiting its upper bound to 20 mmol/dmol-glc. As
a consequence of this, the glycolytic reactions have a non-zero lower bound of 65
mmol/dmol-glc. Both the lower and upper bounds of PYK are altered significantly. The
lower bound of PYK drops to 5 mmol/dmol-glc. Two factors contribute to this reduction:
(i) the availability of the phosphotransferase system (PTS) for glucose uptake as an
alternative to PYK, and (ii) a significant flux through the anaplerotic reaction PPC which
serves to replenish TCA metabolites. The inferred non-zero lower bound for PYK suggests
that the alternate pathways (methylglyoxal and PTS) can only carry a fraction of the flux
in lower glycolysis. The upper bound of PYK increases to 141 mmol/dmol-glc due to the
presence of a futile cycle with PPS (carrying a maximum flux of 20 mmol/dmol-glc)
resulting in the hydrolysis of one ATP per unit flux through this cycle. A similar effect was
also observed with the phosphorylation of glucose where G6PP can carry at most 20
mmol/dmol-glc of flux, thus increasing the upper bound of the PTS and HK reactions by
the same amount. The impact of gluconeogenesis is manifested in the flux range of PFK.
A decreased lower bound of this reaction compared to the core model is due to the reduced
contribution of the non-oxidative PPP towards fructose-6-phosphate production. The
increased upper bound of this reaction is due to the activity of the FBP reaction from
gluconeogenesis. Because the upper bound of the FBP reaction is limited to 8% of the total
glucose input to the network, the upper bound of PFK increases by the same amount to
account for this futile cycle, which is fully resolved in the GSMM.
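The relation between the bounds of a forward reaction and its reverse partner in a futile
cycle can be illustrated with a short calculation. In the sketch below, the net-flux range
is a placeholder and not a value from this study; only the cap of 8 on FBP follows the
text (8% of a glucose uptake normalized to 100). The sketch also assumes that the net flux
and the cycling flux vary independently.

```python
def forward_bounds_with_futile_cycle(net_lower, net_upper, reverse_upper):
    """Bounds on a forward flux when net = forward - reverse and 0 <= reverse <= reverse_upper.

    The forward flux equals net + reverse, so its lower bound is reached with
    no cycling and its upper bound with the reverse reaction at its cap.
    """
    if net_lower > net_upper or reverse_upper < 0:
        raise ValueError("inconsistent bounds")
    return net_lower, net_upper + reverse_upper


# Placeholder net-flux range for the PFK/FBP pair (hypothetical numbers).
net_lower, net_upper = 60.0, 70.0
fbp_upper = 8.0  # 8% of glucose uptake, with uptake normalized to 100

pfk_lower, pfk_upper = forward_bounds_with_futile_cycle(net_lower, net_upper, fbp_upper)
print(pfk_lower, pfk_upper)
```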
Among the monophosphate shunts (PPP and ED pathways; Figure 2.2a and 2.2e), the G6PDH
and GND reactions showed a small difference between the core and GSMM models, which
accounts for the increased glucose-6-phosphate demand for glycogen synthesis in the
genome-scale model. In contrast, a significant shift was observed in the non-oxidative PPP
reactions TKT1, TKT2, and TALA. Both the lower and upper bounds of these reactions
were reduced by 2.5 mmol/dmol-glc, implying reduced carbon flux through this
pathway. This is a consequence of increased drains for biomass components R5P, S7P, and
E4P in accordance with the biomass composition in the GSM model. Specifically, S7P was
diverted towards lipopolysaccharide biosynthesis, R5P was shunted towards
tetrahydrofolate biosynthesis and nucleotides, and E4P was used in pyridoxal phosphate
and aromatic amino acid synthesis. Under the stated experimental growth condition, these
drains amounted to 1%, 0.2%, and 0.04% of the total glucose uptake for S7P, R5P, and
E4P, respectively. In addition, some Ru5P was diverted for lipopolysaccharide
biosynthesis amounting to a drain of 1% of the total glucose uptake. In contrast, the core
model only contains drains for R5P for nucleotide biosynthesis and E4P for aromatic amino
acid biosynthesis thereby predicting higher fluxes through the non-oxidative branch of the
PPP. These differences arise from the fact that the biomass equation for the core model
neglects the soluble pool and other cell wall components, which constitute up to 12% of
the cell dry weight (Long and Antoniewicz, 2014).
Loss of flux identifiability when scaling up to the GSMM was manifested in the TCA cycle
and associated fluxes (Figure 2.2a and 2.2d). This inability to resolve fluxes was primarily
due to the presence of various alternate pathways between metabolites. Flux through PDH
was lower due to a required flux through pyruvate oxidase (POX), which converts pyruvate
directly to acetate. In fact, because acetyl-CoA can be produced entirely through the ACK
reaction, PDH can be bypassed completely. The lower bound of AKGDH decreased
to zero due to the presence of multiple alternate pathways between glutamate and succinate.
The conversion of glutamate to succinate via γ-aminobutyrate and γ-glutamylsuccinate
showed similar flux ranges as AKGDH indicating the inability of 13C-MFA to resolve
between these alternative pathways. Another pathway contributing to the expanded
AKGDH range was the degradation of arginine which had a non-zero lower bound to
account for the production of the biomass components putrescine and spermidine. Expansion
of the FUM and MDH flux ranges was due to the presence of amino group transfer
mechanisms in the arginine and purine biosynthetic pathways, which remove the amino
group from aspartate to produce fumarate. The lower bound of MDH was as low as zero
due to the presence of an alternate MDH (MQO) which uses ubiquinone as the electron
acceptor. Despite the presence of wide ranges at the individual flux level, the sum total of
all alternate pathways for a particular reaction resulted in similar bounds as in the core
model. The glyoxylate shunt was equally well resolved in the GSMM model as in the core
one but with an expanded repertoire of functions. The production of glycolaldehyde as a
by-product of tetrahydrofolate biosynthesis and the eventual conversion of glycolaldehyde
to glyoxylate resulted in a partial glyoxylate shunt activity in which MALS was active with
a non-zero lower bound.
Reactions that were insensitive to the 13C labels and thus outside the EMU balances are
indirectly resolved by component balance constraints imposed by the flux ranges of
EMU-resolved reactions. Such reactions include a majority of cofactor biosynthesis pathways,
lipid biosynthesis, pyrimidine biosynthesis, and energy metabolism. Figure 2.3 shows the
flux ranges corresponding to oxidative phosphorylation, NADH transhydrogenase, and
total free ATP within the network. Oxidative phosphorylation was well resolved as the
oxygen uptake limits total flux through this pathway. The presence of pathways that
convert NADPH to NADH at the expense of ATP results in a finite upper bound for the
transhydrogenase due to ATP limits. A negative lower bound for this reaction indicates
that the direction of this reaction could not be resolved by 13C-MFA. The lower bound of
ATPM is 8.39, which matches exactly the non-growth associated ATP maintenance
requirement in iAF1260. The upper bound of available ATP predicted using the GSMM
model was far lower than that of the core model because the GSM model globally
accounts for all ATP requirements, whereas the core model accounts only for quantifiable
ATP costs (i.e., polymerization and biosynthesis of macromolecules), which amount to
only 56% of the total growth-associated requirement (Feist et al., 2007). A by-product of
the well-resolved ATP balance is that the activity of futile cycles is highly constrained
and can thus be resolved by FVA.
Reaction flux resolution by 13C-MFA requires that there exist distinct atom transition
profiles between alternatives. This property affects the resolvability of gluconeogenesis,
ED pathway and the glyoxylate shunt. The flux through gluconeogenesis can be estimated
by the intersection of the ranges of three reactions: PPS, FBP, and G6PP, which reverse
the glycolytic reactions PYK, PFK, and HK, respectively. FBP alone provides a sharp
estimate for gluconeogenesis as it is the only well-resolved flux of the three. G6PP is
unresolvable despite the fact that glucose-6-phosphate and glucose have distinct labeling
patterns because the labeling pattern of intracellular glucose is not measured. In addition,
an inactive malic enzyme results in identical phosphoenolpyruvate and pyruvate labeling
patterns, thereby rendering PPS unresolvable. The reversibility of TPI and FBA results in
altered labeling patterns of fructose-1,6-bisphosphate compared to fructose-6-phosphate.
An active FBP would alter the labeling pattern of all metabolites within the PPP, thereby
affecting the observed labeling patterns of downstream amino acids such as glycine, serine,
alanine, valine, leucine, and isoleucine. This property aids in the resolution of FBP and
thus facilitates the resolution of gluconeogenesis to within 8% of the total glucose uptake.
No information regarding the activity of gluconeogenesis can be gleaned using only the
core model. The ED pathway is equally well resolved using the GSMM as in the core
model due to the fact that it produces a pyruvate molecule with a different carbon atom
arrangement compared to glycolysis. This directly impacts the predicted labeling of alanine
and the branched chain amino acids derived from pyruvate. Similarly, flux through the
glyoxylate shunt produces a differently labeled aspartate when compared to the
conventional TCA pathway.
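The requirement of distinct atom transitions can be illustrated by comparing the carbon
mappings from glucose to pyruvate in glycolysis (EMP) and the ED pathway. The sketch below
uses the standard textbook mappings and, purely for illustration, a glucose tracer labeled
at the first carbon, which is not one of the tracers used in this study. It considers the
two pathways in isolation and ignores all other reactions.

```python
# Carbon atom transitions from glucose (C1..C6) to the two pyruvate molecules
# formed by each pathway. Each tuple lists, for pyruvate C1, C2, C3, the
# glucose carbon it originates from (standard textbook mappings).
ATOM_MAPS = {
    "EMP": [(3, 2, 1), (4, 5, 6)],
    "ED": [(1, 2, 3), (4, 5, 6)],
}


def pyruvate_labeling(pathway, labeled_glucose_carbons):
    """Return the 13C pattern (1 = labeled) of each pyruvate made by a pathway."""
    labeled = set(labeled_glucose_carbons)
    return [
        tuple(1 if carbon in labeled else 0 for carbon in mapping)
        for mapping in ATOM_MAPS[pathway]
    ]


# Illustrative tracer choice (not one of the tracers used in this study):
# glucose labeled at C1 only.
emp = pyruvate_labeling("EMP", [1])
ed = pyruvate_labeling("ED", [1])
print("EMP:", emp)
print("ED: ", ed)
```

In this simplified picture the label ends up at C3 of pyruvate through the EMP pathway and
at C1 through the ED pathway, so the two routes leave different signatures in
pyruvate-derived amino acids.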
A closer analysis of the obtained flux ranges provided insight into their sensitivity to
biomass composition. Perturbations to drains from central metabolism lie within the
estimated 95% confidence intervals of the fluxes, rendering flux predictions insensitive
to perturbed biomass composition provided that the perturbation does not exceed 10%. It
was found that the size of the obtained flux
ranges was primarily due to errors in extracellular flux measurements. To confirm this, we
re-estimated the fluxes and flux ranges for the same network while allowing a 10% change
to each biomass component individually while maintaining the cell dry weight. The
absence of any significant flux range shift, along with a proportional increase in the
drain of the perturbed biomass component from central metabolism, confirmed our
hypothesis that flux ranges are insensitive to biomass composition perturbation given
this data set. Even though fluxes of biomass components changed in
proportion to the perturbation, any additional impact on central metabolism was minimal.
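One way to perturb a single biomass component while maintaining the cell dry weight is
sketched below. The composition is hypothetical, and rescaling all remaining components
proportionally is an assumption of this sketch; the text does not specify how the
remaining components were adjusted.

```python
def perturb_biomass(mass_fractions, component, relative_change):
    """Scale one biomass component and rescale the others so the total is unchanged.

    mass_fractions maps component name to its fraction of cell dry weight.
    """
    total = sum(mass_fractions.values())
    new_value = mass_fractions[component] * (1.0 + relative_change)
    remaining_old = total - mass_fractions[component]
    remaining_new = total - new_value
    if remaining_new < 0 or remaining_old <= 0:
        raise ValueError("perturbation is too large to preserve the total")
    scale = remaining_new / remaining_old
    return {
        name: new_value if name == component else value * scale
        for name, value in mass_fractions.items()
    }


# Hypothetical composition (fractions of cell dry weight), for illustration only.
composition = {"protein": 0.55, "RNA": 0.20, "DNA": 0.03, "lipid": 0.09, "other": 0.13}
perturbed = perturb_biomass(composition, "protein", 0.10)
print(perturbed)
```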
As described earlier a common practice is to perform MFA using a core model and then
project the identified flux ranges onto a genome-scale model. We performed the same two-
stage implementation and compared the results with MFA using the full GSMM model.
We found that the use of the core model for MFA generates flux ranges that, upon mapping
onto the GSM model, propagate all assumptions made during the construction of the core
model. For as many as 90% of the reactions in the GSM model, flux ranges are more
restricted than when the full GSMM is used for MFA. Figure 2.4 shows the distribution of
flux range
contraction upon imposing flux ranges derived for the core model to obtain GSM flux
ranges using FVA. For more than half of the reactions in the GSM model, ranges are more
than halved compared to the correct elucidation using the GSMM model. Notably, there
are 295 reactions whose estimated range upon projection to the GSM model was 54%
narrower than supported by the data. This demonstrates in a quantitative manner the fact
that assumptions made during the core model construction propagate onto the GSM model
leading to possibly erroneous conclusions about reaction flux identifiability. Overly tight
flux ranges predicted using the core model often shut down alternate pathways, which
confounds reaction essentiality prediction. For example, acetate kinase is essential based
on predictions using core model MFA, whereas GSMM-based MFA correctly predicts the
reaction to be non-essential (Baba et al., 2006). Acetate production can be taken
over by the POX reaction which generates ATP by transferring electrons from pyruvate to
the electron transport chain while producing acetate and carbon dioxide. In contrast, for
about 10% of reactions MFA using the GSMM model leads to tighter flux resolution
compared to the core model. These reactions include energy balance reactions such as
oxidative phosphorylation which requires network-wide resolution of redox balances for
proper resolution. This quantitatively demonstrates the significance of describing
metabolism at the genome-scale and the feasibility of inferring fluxes using MFA at a
genome-scale.
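The range contraction summarized in Figure 2.4 can be expressed per reaction as the
fraction by which the projected range is narrower than the range obtained with the full
GSMM. The sketch below assumes this definition, and the reaction names and flux ranges
are hypothetical.

```python
def range_contraction(full_range, projected_range):
    """Percent by which the projected flux range is narrower than the full-GSMM range."""
    full_width = full_range[1] - full_range[0]
    projected_width = projected_range[1] - projected_range[0]
    if full_width <= 0:
        raise ValueError("the reference range must have positive width")
    return 100.0 * (1.0 - projected_width / full_width)


# Hypothetical flux ranges (lower, upper) in mmol/dmol-glucose.
ranges = {
    "RXN_A": {"gsmm": (0.0, 40.0), "projected": (10.0, 20.0)},
    "RXN_B": {"gsmm": (5.0, 25.0), "projected": (5.0, 25.0)},
}

contraction = {
    name: range_contraction(r["gsmm"], r["projected"]) for name, r in ranges.items()
}
print(contraction)
```

A value of zero means the projection preserved the full range, whereas values above 50%
correspond to ranges that are more than halved.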
2.4. Discussion
In this chapter, we have applied the framework of 13C-MFA to perform flux and range
elucidation using a genome-scale model. Using available extracellular flux measurement
data, we eliminated blocked reactions and those incapable of carrying flux in the iAF1260
GSM model of E. coli using FVA. The resulting GSM model contained 697 reactions and
595 metabolites with 29 reversible reactions. The corresponding atom mapping model was
generated by integrating the atom mapping information for every reaction in this GSM
model using the CLCA algorithm. Finally, the network was decomposed using the EMU
algorithm to evaluate mass isotopomer distributions (MIDs) of target metabolite fragments
for a given flux distribution and thereby iteratively obtain the optimal flux
distribution that best explains
experimentally observed GCMS labeling data. In order to account for degeneracy within
the metabolic network and the inherent error associated with experimental data, we also
evaluated the 95% confidence intervals associated with each flux. Given the
computationally intensive nature of the 13C-MFA procedure, we modified the confidence
interval estimation procedure to decrease the computational time by up to 67% by using
flux coupling analysis to identify the minimal set of fluxes whose 95% confidence
intervals need to be evaluated. We also redefined the χ2 degrees of
freedom to better describe the structural properties of the genome-scale EMU model and
found that our obtained optimal flux distribution was statistically acceptable.
We found that the GSM model is able to produce a better fit compared to the core model
owing to improved prediction of alanine and valine MS data. While the overall flux
distribution remained similar to that of the core model, the comprehensive biomass
equation used in the GSM model resulted in a shifted PPP range. The presence of
gluconeogenesis along with glycolysis created futile cycles for three key reactions, of
which only PFK could be resolved properly using 13C-MFA. The availability of the
methylglyoxal pathway as an alternative to lower glycolysis resulted in a reduction of the
lower bound for TPI, GAPD, PGK, and ENO. PYK experienced significant bound
expansion in the GSM model compared to the core model due to the availability of the PTS
mechanism for glucose uptake as an alternate pathway facilitating the conversion of
phosphoenolpyruvate to pyruvate. The reduced flux through PDH was found to be due to
the activity of POX, which is known to be a key reaction in the growth of E. coli under
aerobic conditions (Abdel-Hamid et al., 2001; Li et al., 2007). The expansion in the TCA
cycle ranges was due to increased glutamate synthesis for biomass production, availability
of arginine and glutamate degradation as alternatives to AKGDH, and the presence of an
additional amino group donation mechanism with aspartate as the donor and fumarate as
the product. Since both evaluated tracer schemes produced consistent results, it is evident
that the inability to resolve certain pathways is not tracer-specific (Figure 2.2 and Figure
2.5), but a property of the GSMM model, indicating the need for additional experimental
data to completely resolve all fluxes in the metabolic network. We found that energy
metabolism was quite well resolved with the amount of free ATP being greatly limited
thereby facilitating the resolution of futile cycles. The transhydrogenase reaction, on the
other hand, was poorly resolved, with much uncertainty in its predicted direction. Finally,
we found that utilization of the bounds estimated using the core model resulted in much
smaller flux ranges in as many as 90% of the reactions due to the fact that these bounds
carry with them the assumptions involved in the creation of the core model. The source of
this reduction was found to be the inactivation of alternate pathways such as glutamate
degradation, and futile cycles such as gluconeogenesis, occurring due to tighter mass
balance constraints imposed by the flux ranges estimated using the core model. While such
assumptions do accelerate flux computations by elimination of variables and provide
information on total sum of fluxes through all alternate routes for a given transformation,
they fail to account for flexibility within the network, thus having an adverse impact on
secondary inferences from the estimated flux ranges. As such, this study highlights the
need for the use of more comprehensive models for flux elucidation so as to obtain a better
agreement with experimentally observed data and improved quality of inference. The flux
range expansion cannot be resolved by optimal tracer design alone because it stems from
the presence of alternate pathways with identical overall atom transitions. This loss of
resolution will persist even if every branch point were to be resolved perfectly (e.g., by
COMPLETE-MFA). Therefore, flux and range estimation using alternate glucose-based
tracer schemes (Crown et al., 2015) or simultaneous fitting of multiple data sets (Leighty
and Antoniewicz, 2013) would be unable to resolve this ambiguity in alternate pathway
activity, as confirmed by the evaluation of multiple tracer schemes in this study.
A major challenge with MFA using GSM models is the increase in computation time
associated with the vast increase in the number of variables. An effective way to address
this issue would be to simplify the genome-scale MFA model to the size of a (near) core
model. The strategy here is to decrease the number of variables while retaining all the
information regarding novel mappings and alternate pathways contained within the GSM
model. An elementary reduction method for simplification of linear pathways at the EMU
network level has already been proposed (Antoniewicz et al., 2007), but not yet
implemented. Analysis of the EMU model has already revealed that only 60% of all the
reactions contained within the GSM model can be resolved using 13C-MFA. We have also
seen that the actual regression model described by EMU balances has only 99 free
variables. As such, the decoupling of the EMU network from the entire GSM model alone
can result in a 63% reduction in the number of free fluxes. Further reduction of EMU
networks can be achieved by elimination of linear pathways, so as to obtain the simplest
possible EMU network of each size. The reduction of model size will help alleviate some
of the complexity associated with inclusion of intracellular compartments.
Better resolution of fluxes using a GSM model requires an optimal tracer design and the
maximum set of extracellular flux measurements. It has already been shown that no single
tracer is sufficient to resolve all the fluxes within a metabolic network, and that different
tracers promote the resolution of specific branch points (Leighty and Antoniewicz, 2013).
While MFA using multiple datasets does work effectively on a core model, it faces the
same problems associated with scale-up at the genome-scale. This can be overcome by
designing an isotope-labeling experiment using an optimal set of tracers and GCMS
measurements, which can reliably resolve all the key branch points in the metabolic
network. The basis for an optimal tracer design has already been proposed in the form of
EMU basis vectors (Crown and Antoniewicz, 2012), which decouples substrate labeling
from the fluxes in the model. The availability of MS measurements for other metabolites
besides amino acids will further improve the resolution of poorly resolved fluxes in the
GSM model. MS measurements of fatty acids (Yoo et al., 2008) and intracellular
metabolites (Luo et al., 2007; Metallo et al., 2012) have already been utilized to infer flux
distributions. Regarding the flux elucidation presented in this chapter, Table 2.2 provides
a candidate list of metabolites that, if measured, would resolve alternate routes. This is
based
on the idea that, if the labeling distribution of a metabolite unique to a given alternate
pathway is different from the labeling distribution at the start of the experiment then the
pathway carries flux and the steady-state flux through the pathway can be evaluated using
an isotopic non-stationary flux analysis procedure for E. coli as described previously (Noh
et al., 2007; Young et al., 2008). Noh et al. (2007) used a rapid sampling approach
followed by methanol quenching to obtain multiple isotopic non-stationary data points
before attainment of isotopic steady state. Established protocols for pool size measurement
of methylglyoxal (Girgis et al., 2012) and γ-aminobutyrate (O'Byrne et al., 2011) are
already available for resolving flux through their corresponding pathways.
Incorporation of these additional measurements can further sharpen the estimated flux
ranges to provide a better resolution of metabolism. The estimated flux ranges can be
refined further with the availability of a more complete set of extracellular flux
measurements to include secretion profiles of additional products, such as succinate and
other organic acids, so as to close the carbon balance for the given growth condition. This
set of measurements will reduce the solution space by constraining the flux through various
production pathways, thereby further simplifying the task of identifying the optimal flux
distribution. A complete set of measurements required for maximum flux elucidation can
be obtained using a formulation such as OptMeas (Chang et al., 2008).
The extent of resolution obtained using a GSM model for E. coli points to the possibility
of more reliable flux elucidation and biologically relevant inferences in more complex
systems such as yeast, plants, and mammalian systems with compartmentalized
metabolism. Application of genome-scale MFA to such systems will allow the use of
closed cofactor balances without the risk of altering the actual flux distribution predicted
using the flux estimation procedure. This will enable identifying metabolic bottlenecks
leading to more informed metabolic engineering interventions that improve the yield of
target products. It is important to note that flux resolution using 13C-MFA ultimately
depends on how well the organism’s genome is annotated, the complexity of the underlying
EMU network, and the quality of experimental data used for flux estimation.
Table 2.1: 𝜒2 degrees of freedom for the core model and the genome-scale model.
Statistical significance requires that the number of degrees of freedom be positive as a 𝜒2
value is defined only for positive integers. The difference in the number of degrees of
freedom estimated using free fluxes and based on the EMU network points to an inherent
flaw in using the free fluxes as the number of model parameters.
Model                                        Degrees of Freedom   Maximum SSRES
Core model                                   55                   96
Genome-scale model (based on free fluxes)    -30                  Not defined
Genome-scale model (based on EMU network)    27                   44
Table 2.2: Additional suggested MS measurements for resolving various alternate
routes.
Central Metabolic pathway   Alternate Pathway       Metabolite measurement candidate   Measurement Type
Lower glycolysis            Methylglyoxal pathway   methylglyoxal                      Time-course MS with known pool size
TCA cycle                   Arginine degradation    γ-aminobutyrate                    Time-course MS with known pool size
HEX1-G6PP futile cycle      G6PP                    Intracellular glucose              Steady-state MS
Figure 2.1: Comparison of prediction of experimentally observed amino acid MS data
by the core model (green bars) and the GSM model (brown bars).
Figure 2.2: Comparison of fluxes elucidated using 2-13C-glucose with the core model
and GSM model. (a) Schematic representation of all reactions and metabolites involved in
central metabolism of E. coli. Comparison of flux ranges (in mmol/dmol-glucose) using
core model (green bars) and GSM model (brown bars) for (b) glycolysis and
gluconeogenesis, (c) anaplerotic reactions and glyoxylate shunt, (d) TCA cycle, (e) PPP
and ED pathway.
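(a) [Pathway schematic, image not reproduced. Reaction labels: PGI, PFK, FBA, TPI, G6PDH,
GND, GAPD, PGK/PGM, ENO, PYK, PDH, ACONT, IDH, AKGDH, SUCOAS, SDH, FUM, MDH, CS, RPI,
RPE, TKT2, TALA, TKT1, EDA/EDD, PPC, PPCK, ME, ICL, MALS. Metabolite labels: glc-D, g6p,
f6p, fdp, dhap, g3p, 6pg, ru5p, xu5p, r5p, s7p, e4p, 3pg, 2pg, pep, pyr, accoa, cit, icit,
akg, succ, fum, mal, succoa, oaa, glyox.]
(b)-(e) [Flux range bar charts, images not reproduced.]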
Figure 2.3: Resolution of energy metabolism in core model (green bars) and GSM
model (brown bars).
Figure 2.4: Loss of information expressed as % bound contraction of flux ranges for
every reaction in the GSM model when flux ranges are estimated using FVA with core
model-based MFA derived flux ranges as constraints.
Figure 2.5: Flux distribution comparison for core model and GSM model using 5-13C
glucose tracer. Comparison of flux ranges (in mmol/dmol-glucose) using core model
(green bars) and GSM model (brown bars) for (a) glycolysis and gluconeogenesis, (b)
Pentose phosphate pathway, (c) anaplerotic reactions and glyoxylate shunt, (d) TCA cycle,
(e) energy metabolism. The reaction nomenclature is described in Figure 2.2a.
(a)-(e) [Flux range bar charts, images not reproduced.]
Chapter 3
Elucidation of photoautotrophic carbon flux topology in Synechocystis
PCC 6803 using genome-scale carbon mapping models
This chapter has been previously published in modified form in Metabolic Engineering
(Saratram Gopalakrishnan, Himadri B. Pakrasi, and Costas D. Maranas. Elucidation of
photoautotrophic carbon flux topology in Synechocystis PCC 6803 using genome-scale
carbon mapping models. Metabolic Engineering 47(2018): 190-199.)
3.1. Introduction
Metabolic engineering of photosynthetic organisms is aimed at the sustainable
bioconversion of abundant and inexpensive substrates such as sunlight and CO2 into
valuable products such as biomass (Maurino and Weber, 2013), biofuels (Atsumi et al.,
2009), and secondary metabolites (Giuliano, 2014). The efficacy of metabolic engineering
interventions is evaluated by measuring internal fluxes via 13C-Metabolic flux analysis
(13C-MFA) methods (Metallo et al., 2009; Sauer, 2006; Tang et al., 2009). These methods
determine the intracellular flux distributions consistent with experimentally measured
metabolite labeling distributions given a stable-isotope-labeled input carbon substrate
(Zomorrodi et al., 2012). Because CO2 is the only carbon substrate in cyanobacterial
photoautotrophic metabolism, metabolite labeling distributions become uniform
under isotopic steady state (Shastri and Morgan, 2007). As a consequence, flux
elucidation under photoautotrophic conditions requires transient labeling experiments and
aims to recapitulate metabolite labeling dynamics in addition to steady-state labeling
distributions (Young et al., 2008). While this approach presents the opportunity to address
key questions pertinent to cyanobacterial metabolism such as (i) completion of the TCA
cycle, (ii) utilization of the photorespiratory pathway, (iii) existence of the glyoxylate
shunt, and (iv) carbon fixation efficiency, experimental and computational challenges have
so far restricted wide applicability resulting in a limited number of isotopic instationary
MFA (INST-MFA) studies. These include a demonstration of sub-optimal carbon
incorporation in Synechocystis PCC 6803 (hereafter Synechocystis) (Young et al., 2011)
and assessment of TCA cycle functionality (Xiong et al., 2015) using a simplified central
metabolic model. Other studies aimed at capturing the metabolic response to nitrogen
depletion (Hasunuma et al., 2013) and essentiality of the photorespiratory pathway (Huege
et al., 2011) only obtained split ratios using fractional labeling and turnover of metabolites
as opposed to network-wide fluxes. Targeted flux ratio elucidation in any labeling
experiment is vulnerable to errors arising from distal influences (Gopalakrishnan and
Maranas, 2015a; McCloskey et al., 2016b; Suthers et al., 2007). Moreover, ignoring pre-
existing (unlabeled) carbon pools upon reaction lumping in core metabolic models causes
the artificial acceleration of labeling dynamics leading to significant disagreements
between model predictions and experimental data (Noh and Wiechert, 2011). Furthermore,
pool sizes are often not measured despite being co-estimated with fluxes, resulting in poor
resolution of most metabolite pool sizes. These factors are likely to bias the analysis of
labeling data using core metabolic models, motivating the re-analysis of metabolite
labeling dynamics obtained during transient labeling experiments using a genome-scale
metabolic mapping (GSMM) model.
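The effect of reaction lumping on labeling dynamics can be illustrated with a minimal sketch, assuming a hypothetical linear pathway in which the tracer passes through an intermediate pool before reaching the measured pool. The flux v, the pool sizes, the step size, and the function name are illustrative assumptions and are not taken from this study. Removing the intermediate pool (lumping) makes the measured pool label faster.

```python
def simulate_enrichment(v, pools, t_end, dt=1e-3, tracer=1.0):
    """Forward-Euler simulation of 13C enrichment in a linear chain of pools.

    Each pool i obeys c_i * dx_i/dt = v * (x_{i-1} - x_i), where x_0 is the
    tracer enrichment. Returns the enrichment of the last pool at t_end.
    """
    x = [0.0] * len(pools)          # all pools start unlabeled
    steps = int(round(t_end / dt))
    for _ in range(steps):
        upstream = tracer
        new_x = []
        for xi, ci in zip(x, pools):
            new_x.append(xi + dt * (v / ci) * (upstream - xi))
            upstream = xi           # the old value feeds the next pool
        x = new_x
    return x[-1]

# Hypothetical values: unit flux, pools of unit size, enrichment read at t = 1.
full = simulate_enrichment(v=1.0, pools=[1.0, 1.0], t_end=1.0)   # intermediate pool kept
lumped = simulate_enrichment(v=1.0, pools=[1.0], t_end=1.0)      # intermediate pool lumped away
```

Under these assumptions the lumped pathway reaches roughly 63% enrichment at t = 1, whereas the pathway that retains the intermediate pool reaches only about 26%, which is the kind of artificial acceleration described above.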
The accuracy of flux estimation using a GSMM model is contingent on the curation of the
base genome-scale metabolic (GSM) model. The GSM model for Synechocystis, iSyn731
(Saha et al., 2012), accurately predicts 95% of the available gene (non)essentiality data,
which is better than the prediction capability of the iAF1260 model for E. coli (Zomorrodi
and Maranas, 2010). This safeguards against incorrect flux inference arising from omission
or overly permissive inclusion of reactions in the model (Gopalakrishnan and Maranas,
2015a). With the availability of a curated GSM model and transient metabolite labeling
distributions (Shastri and Morgan, 2007), (i) the construction of a genome-scale metabolic
mapping (GSMM) model and (ii) scalability of existing algorithms become the bottlenecks
for successful flux elucidation at the genome-scale (Gopalakrishnan and Maranas, 2015b).
In addition to the carbon paths contained within core models (Abernathy et al., 2017;
Alagesan et al., 2013; Feng et al., 2010; Yang et al., 2002a, b, c; You et al., 2014; Young
et al., 2011; Zhang and Bryant, 2011), the GSMM model affords expanded pathway
coverage to include glyoxylate metabolism, completion of the TCA cycle, and recycling of
by-products of peripheral metabolism such as CO2, formate, glycolate, and acetate. While
directly tracing the reaction mechanism is the most reliable source of atom mapping data,
mechanisms are not available for most reactions, thus requiring the use of computational
procedures such as MCS (Chen et al., 2013), PMCD (Jochum et al., 1980), EC (Morgan,
1965), MWED (Latendresse et al., 2012), and CLCA (Kumar and Maranas, 2014) to infer
plausible mappings. Simulation of labeling distributions for a given flux distribution is
performed via integration of a system of ordinary differential equations (ODEs) (Noh et
al., 2006; Young et al., 2008) upon decomposition of the mapping model using frameworks
such as cumomers (Wiechert et al., 1999) or Elementary Metabolite Units (EMUs)
(Antoniewicz et al., 2007). Fluxes are estimated as the solution of a non-linear least-squares
fitting problem that minimizes the deviation of predicted intracellular metabolite labeling
distributions and dynamics from experimental data. Since the analytical solution for the
system of ODEs describing labeling dynamics is not tractable, the ODEs must be integrated
numerically. Memory requirements limit the use of available integration packages, thus
requiring the development of customized integrators. The state-of-the-art algorithm (Young
et al., 2008) utilizes an exponential integrator in conjunction with a first-order hold
equivalent. When expressed in state-space form, the solution to these equations involves
the computation of the exponential of a matrix, which scales poorly with network size and
motivates the more efficient algorithm developed in this study.
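As a minimal sketch of the first-order hold idea, consider the scalar analog dx/dt = a·x + b·u(t), where the input u is assumed to vary linearly between sampling instants. The closed-form coefficients below play the role that the matrix exponential and its integrals play in the full system. The function names and numerical values are illustrative assumptions. This is neither the algorithm of Young et al. (2008) nor the one developed in this chapter.

```python
import math

def foh_coefficients(a, b, T):
    """Exact discretization of dx/dt = a*x + b*u for piecewise-linear u.

    Returns (phi, gamma0, gamma1) such that
        x[k+1] = phi*x[k] + gamma0*u[k] + gamma1*(u[k+1] - u[k]).
    Assumes a != 0.
    """
    phi = math.exp(a * T)
    gamma0 = b * (phi - 1.0) / a
    gamma1 = b * (phi - 1.0 - a * T) / (a * a * T)
    return phi, gamma0, gamma1

def simulate_foh(a, b, u, T, x0=0.0):
    """Propagate the state over samples u[0], u[1], ... spaced T apart."""
    phi, g0, g1 = foh_coefficients(a, b, T)
    x = x0
    for uk, uk1 in zip(u, u[1:]):
        # The update uses u[k+1], which is why the hold is called non-causal.
        x = phi * x + g0 * uk + g1 * (uk1 - uk)
    return x

# Ramp input u(t) = t sampled every 0.1 time units up to t = 1.
T = 0.1
u = [k * T for k in range(11)]
x_end = simulate_foh(a=-2.0, b=1.0, u=u, T=T)
# For a ramp input the hold is exact: x(1) = 0.25*exp(-2) + 0.25.
```

In the matrix case the scalar exponential becomes a matrix exponential of the state transition matrix, which is the step identified above as the scalability bottleneck.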
In this chapter, genome-scale INST-MFA is performed to glean insights into the metabolic
map of photoautotrophically grown Synechocystis. A GSMM model imSyn617 for
Synechocystis is constructed based on the corresponding GSM model iSyn731 (Saha et al.,
2012) to enable flux elucidation using previously measured metabolite labeling dynamics
(Young et al., 2011). The set of active reactions under photoautotrophic growth conditions
is identified by performing Flux Variability Analysis (Mahadevan and Schilling, 2003)
upon constraining the model with experimentally measured growth and product yields
(Young et al., 2011) for growth with bicarbonate as the sole carbon source. The GSMM
imSyn617 is constructed to encompass all active reactions involved in carbon balances.
Reaction mapping information is assembled from imEco726 (Gopalakrishnan and
Maranas, 2015a), reaction mechanisms and the CLCA algorithm (Kumar and Maranas,
2014). imSyn617 is deployed for genome-scale INST-MFA to uncover novel insights into
the biology of photoautotrophic growth of Synechocystis using the published labeling data
for 15 metabolites from central metabolism (Young et al., 2011). We infer that only 88%
of the assimilated bicarbonate is fixed via the Calvin-Benson-Bassham (CBB) cycle while
the rest is fixed by phosphoenolpyruvate carboxylase (PPC) but eventually off-gassed as
CO2 through malic enzyme, the TCA cycle, and peripheral metabolic pathways. We
confirmed that there is no flux through the oxidative pentose phosphate pathway and that
regeneration of pentose phosphates occurs through the transaldolase reaction. With no flux
through pyruvate kinase, pyruvate is synthesized indirectly from phosphoenolpyruvate
(PEP) via PPC and malic enzyme. Trace flux is observed from α-ketoglutarate (AKG) to
succinate indicating dispensability of the lower TCA cycle during photoautotrophic
growth. Moreover, the oxygenase reaction of RuBisCO is found to be the primary source
of glycine with serine being synthesized directly from 3-phosphoglycerate (3PG). These
modalities result in a bifurcated topology of the TCA cycle reactions and serine metabolism
enabling maximal conversion of RuBisCO fixed CO2 to biomass. This analysis confirmed
that maximization of biomass yield from fixed carbons explains the allocation of fluxes in
the metabolic network in Synechocystis as supported by experimental findings.
3.2. Methods
3.2.1. Construction of imSyn617
The GSM model for Synechocystis (iSyn731) (Saha et al., 2012) was simplified using Flux
Variability Analysis (FVA) (Mahadevan and Schilling, 2003) under photoautotrophic
conditions using bicarbonate as the sole carbon source to eliminate all reactions incapable
of carrying flux. The feasible solution space was constrained using growth rate and organic
acids yield (Young et al., 2011). Photon fluxes were calculated based on experimental
lighting conditions described earlier (Nogales et al., 2012). Thermodynamically infeasible
cycles (Schellenberger et al., 2011) in the form of isles (Wiechert and Wurzel, 2001) were
manually eliminated to further reduce the size of the metabolic model. The phosphoserine
pathway was included in accordance with recent genome annotation updates (Klemke et
al., 2015). The recently proposed Entner-Doudoroff pathway was excluded from the
metabolic model based on its dispensability under photoautotrophic growth conditions and
a lack of pathway characterization using tracer experiments (Chen et al., 2016). The
phototrophic growth model for Synechocystis contains 729 reactions and 679 metabolites.
The GSMM model imSyn617 was constructed for Synechocystis starting from the existing
GSMM for E. coli, imEco726 (Gopalakrishnan and Maranas, 2015a). Carbon mappings for
498 reactions were obtained directly from imEco726, spanning glycolysis, pentose
phosphate pathway, TCA cycle, biosynthesis of all amino acids except glycine and serine,
synthesis of palmitate and stearate, nucleotide biosynthesis, and the synthesis of cofactors:
NAD, tetrahydrofolate, and riboflavin. Of the 109 originally unmapped reactions, 96
reactions spanning carbon fixation, photorespiration, glyoxylate metabolism, glycolipid
and polyunsaturated fatty acid synthesis, and porphyrin biosynthetic pathways generated
metabolites recycled in central metabolism. Atom mapping for 68 unmapped reactions was
constructed from the reaction mechanism of each reaction (Supplementary File 1).
Mapping for the remaining 41 reactions including spontaneous reactions and those without
available mechanisms was obtained using the CLCA algorithm (Kumar and Maranas,
2014; Kumar et al., 2012). Alternate mappings were generated for 49 reactions based on
the presence of nine symmetric metabolites in the cyanobacterial models. Carbon
rearrangements within the triose phosphates were identified by carbon path tracing using
an EMU-based depth-first search algorithm. All carbon atoms (single and bonded) are
represented using their corresponding EMU following a carbon numbering scheme
consistent with the IUPAC convention. The metabolic model and the corresponding atom
mapping model are made available in Supplementary File 1.
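Carbon path tracing of this kind can be sketched as a depth-first search over atom transitions. The example below is a simplified illustration on a hypothetical network with made-up metabolites A, B, C, and D. It traces individual atoms rather than full EMUs, and the data structure and function name are assumptions, not the implementation used in this work.

```python
def trace_atom(transitions, start):
    """Depth-first search for every (metabolite, position) reachable from start.

    transitions maps a source atom (metabolite, position) to the list of
    product atoms it can become in one reaction step.
    """
    visited = set()
    stack = [start]
    while stack:
        atom = stack.pop()
        if atom in visited:
            continue
        visited.add(atom)
        stack.extend(transitions.get(atom, []))
    return visited

# Hypothetical atom transitions (positions are 1-based carbon numbers):
#   A (3 carbons) -> B (3 carbons), with carbons 1 and 3 swapped
#   B -> C (2 carbons: B1, B2) + D (1 carbon: B3)
#   C1 re-enters A at position 2 through a recycling step (co-substrates omitted)
transitions = {
    ("A", 1): [("B", 3)],
    ("A", 2): [("B", 2)],
    ("A", 3): [("B", 1)],
    ("B", 1): [("C", 1)],
    ("B", 2): [("C", 2)],
    ("B", 3): [("D", 1)],
    ("C", 1): [("A", 2)],
}

fate_of_a3 = trace_atom(transitions, ("A", 3))
```

In this toy network the third carbon of A can reach B1, C1, A2, B2, and C2 but never D1, which is how a search of this type exposes rearrangements and recycling loops.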
3.2.2. Algorithmic procedure for flux estimation based on least-squares
minimization
Flux and range estimation following EMU decomposition (Antoniewicz et al., 2007) of the
mapping model was performed as described earlier (Gopalakrishnan and Maranas, 2015a).
The labeling dynamics of 15 central metabolites, spanning sugar phosphates, glycolytic
intermediates, and organic acids, utilized for flux estimation was obtained from a previous
study (Young et al., 2011) with Synechocystis grown under photoautotrophic growth
conditions and 50% 13C-bicarbonate tracer. Model decomposition resulted in the
identification of 851 EMUs, 156 free fluxes (Wiechert et al., 1999), and 204 pool sizes.
Metabolite labeling dynamics was modeled using a system of 8.4 × 10⁵ simultaneous
ODEs relating metabolite labeling distributions 𝑿(𝑡) to the initial labeling state, 𝑿(0), the
carbon tracer, and the system state transition matrix, 𝑭, containing fluxes, 𝒗, and pool sizes,
𝒄. This system of ODEs simulates 2,311 EMU mass fractions and their sensitivity to 367
fitted parameters. The system of equations in continuous time domain was converted to
discrete time domain using the procedure described in Appendix B. The mathematical
expressions for the transition matrices, 𝜱, 𝜞, and 𝜴, in terms of 𝑭 were obtained by
solving the system of ODEs after applying a non-causal first-order hold equivalent
(Franklin et al., 1997) as opposed to the previously described state-space form method
(Young et al., 2008) so as to improve the scalability of the flux estimation procedure. This
resulted in a 7% and 48% reduction in computation time for the core model and the GSM
model, respectively. The significance of this improvement is anticipated to increase with
model size. The NLP was solved using a modified Levenberg-Marquardt algorithm
(Madsen et al., 2004) equipped to handle linear inequality constraints (Gill et al., 1984).
The NLP was solved with 100 randomized initial feasible flux distributions and the best
solution was chosen for confidence interval calculations owing to the non-convex nature
of the objective function. The quality of the obtained flux distribution was evaluated using
a 𝜒2 goodness of fit test to ensure statistical significance of the obtained results. 95%
confidence intervals were determined as described earlier (Antoniewicz et al., 2006;
Gopalakrishnan and Maranas, 2015a). All fluxes, expressed in mmol/dmol bicarbonate
uptake (BCU), are normalized to 100 mmol/gdw-hr HCO3- uptake.
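The multi-start strategy described above can be sketched on a toy problem. The example assumes a hypothetical one-parameter labeling model m(t) = 1 - exp(-k·t) fitted to synthetic noise-free data, uses a basic Levenberg-Marquardt damping rule, and enforces bounds by simple clipping rather than the linear inequality handling of Gill et al. (1984). All names, bounds, and the use of 20 rather than 100 starts are illustrative assumptions.

```python
import math
import random

def ssr_and_derivs(k, times, data):
    """Sum of squared residuals, J^T r, and J^T J for m(t) = 1 - exp(-k*t)."""
    ssr = g = h = 0.0
    for t, y in zip(times, data):
        e = math.exp(-k * t)
        r = (1.0 - e) - y       # residual
        j = t * e               # d(residual)/dk
        ssr += r * r
        g += j * r
        h += j * j
    return ssr, g, h

def levenberg_marquardt(k0, times, data, lo=1e-3, hi=10.0, max_iter=200):
    """Damped Gauss-Newton iterations; bounds are enforced by clipping."""
    k, lam = k0, 1e-3
    ssr, g, h = ssr_and_derivs(k, times, data)
    for _ in range(max_iter):
        cand = min(hi, max(lo, k - g / (h + lam)))
        if abs(cand - k) < 1e-12:
            break
        new_ssr, new_g, new_h = ssr_and_derivs(cand, times, data)
        if new_ssr < ssr:       # accept the step and relax the damping
            k, ssr, g, h = cand, new_ssr, new_g, new_h
            lam /= 10.0
        else:                   # reject the step and increase the damping
            lam *= 10.0
    return k, ssr

def multi_start_fit(times, data, n_starts=20, seed=0):
    """Run the local solver from random starting points; keep the best fit."""
    rng = random.Random(seed)
    fits = [levenberg_marquardt(rng.uniform(0.01, 5.0), times, data)
            for _ in range(n_starts)]
    return min(fits, key=lambda fit: fit[1])

# Synthetic, noise-free data generated with a hypothetical turnover rate of 0.7.
times = [0.5, 1.0, 2.0, 4.0, 8.0]
data = [1.0 - math.exp(-0.7 * t) for t in times]
best_k, best_ssr = multi_start_fit(times, data)
```

Because a non-convex objective can trap a local solver, the best of several restarts is retained. In this toy problem every start is expected to recover the generating rate.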
3.3. Results
This section highlights the novel carbon paths included upon scale-up to a GSMM model
and their role in facilitating prediction departures from flux distributions obtained using
core models. In addition, flux topologies of pathways absent in the core model are
elucidated and their biological implications are discussed.
3.3.1. New carbon paths covered by mapping model imSyn617
Expansion of pathway coverage in the GSMM model of Synechocystis to include
glyoxylate, amino acid, lipid, and peripheral metabolism contributes to 18 novel carbon
paths not captured by the core model (Figure 3.1). These novel paths arise from new carbon
skeleton rearrangements, conserved group recycling, and new mechanisms for CO2
incorporation in Synechocystis. Three alternate routes to lower glycolysis are traced
through methylglyoxal synthesis, photorespiration, and serine metabolism with identical
atom mapping. Two paths via arginine degradation and the GABA shunt are present with
atom transitions identical to the lower TCA cycle, indicating the presence of a TCA-like
carbon skeleton rearrangement despite the unresolved completion of the TCA cycle
(Steinhauser et al., 2012; Yu et al., 2013; Zhang and Bryant, 2011). Carbon recycling from
peripheral metabolism occurs via acetate, formate and CO2. The condensation of the
methyl group from S-adenosyl methionine and the δ-carbon of glutamate (GLU-5) in the
adenosylcobalamin pathway produces acetate, which is either metabolized via the TCA
cycle or channeled into lipid production. Formate is produced during the biosynthesis of
tetrahydrofolate (THF), riboflavin, and thiamin pyrophosphate, whereas CO2 is generated
as a byproduct of porphyrin, terpenoid, and pyridoxal phosphate biosynthetic pathways.
Formate and CO2 are also produced as the end products of glyoxylate oxidation via oxalate.
Formate is oxidized to CO2 via formate dehydrogenase, which is eventually reincorporated
via RuBisCO and the anaplerotic PPC reaction. The tetrahydrofolate pathway also
generates glycolate, which feeds into the photorespiratory pathway. Since Synechocystis
lacks PEP carboxykinase (PPCK) activity to drive carbon flow from the TCA cycle to
lower glycolysis, CO2 incorporated via PPC is routed to TCA cycle-derived metabolites
only. CO2 is also incorporated via glycine dehydrogenase (GLYDH) in glyoxylate and
glycine metabolism in which glycine is synthesized by condensation of one CO2 and a
methenyl group donated by methenyl-THF. This reaction, in conjunction with flux through
the photorespiratory pathway contributes to three novel carbon backbone arrangements
possibly unique to cyanobacteria (Figure 3.2a). It is important to note that these
mechanisms both incorporate and off-gas CO2 atoms of different origin resulting in
alteration of labeling distributions of various intracellular metabolites similar to 13C
labeling dilution effects seen during aeration of cell cultures (Leighty and Antoniewicz,
2012). However, it appears that net carbon fixation is performed by RuBisCO alone. In
addition to carbon skeleton rearrangements, the mapping model reveals the existence of
pathways facilitating conserved moiety cycling (E4P and G3P), which are capable of
delaying 13C incorporation (Figure 3.2b). The comprehensive inventory of carbon paths
contained within the GSMM model provides the means for better recapitulation of
experimental data and accurate flux elucidation with a high level of detail.
3.3.2. Comparison of elucidated fluxes between using imSyn617 and core mapping
models
The simulated labeling distributions are in much better agreement with experimental data
when fitted using imSyn617 (Sum of Squares of Residuals, SSRES = 511.4; Degrees of
freedom, DOF = 556) compared to the core model (Young et al., 2011) (SSRES = 684,
DOF = 697). The statistical significance of the reduction in SSRES using imSyn617 was
assessed using an F-test. The F-test provides a way of testing whether the improvement in
fit upon model expansion is due not to an increased number of parameters but rather to
better capturing of labeled carbon routes through metabolism. The F-statistic is calculated
to be 1.335 (p = 0.012). This value confirms the statistical significance of the additional
parameters introduced in imSyn617 at a confidence level of 95%. Furthermore, the
substantially different flux distribution elucidated using imSyn617 associated with the
reduced SSRES captures a statistically significant new optimum in the least squares
objective function. The improved fit is attributed to the better recapitulation of the labeling
dynamics of PEP-167, 3PG-185, and RuBP-309 fragments indicated by a reduction in
SSRES (Figure 3.3). Because the experimentally measured metabolite labeling distribution
and dynamics are inconsistent with the sole action of the CBB cycle (Figures 3.6 and 3.7),
flux datasets inferred by both models employ compensatory mechanisms to delay the mass
shift associated with 13C incorporation. Simplification of reactions in the core model via
lumping of linear pathways may accelerate metabolite labeling dynamics (Noh and
Wiechert, 2011). As a result, the core model derives unlabeled carbons from glycogen
degradation in conjunction with flux through the oxidative pentose phosphate pathway to
delay turnover of metabolite pools (Figure 3.8). In contrast, imSyn617 does not lump
reactions and further delays metabolite pool turnover by favoring carbon paths involving
conserved moieties (Figures 3.2 and 3.4), thereby affording a reduction in deviation from
experimental data using imSyn617 (Figure 3.6). An immediate consequence of this flux
redistribution is the dispensability of flux through the oxidative PP pathway (Figure 3.4)
according to imSyn617. The biomass formation reaction in the core model is approximated
using precursors from central metabolism (Shastri and Morgan, 2007) whereas imSyn617
mirrors completely the biomass equation of iSyn731 parameterized using experimental
measurements (Nogales et al., 2012; Saha et al., 2012). This results in significant
differences in metabolite drains between the two models. Overall, differences in labeling
dynamics and stoichiometry associated with biomass metabolite drains contribute to stark
shifts in estimated central metabolic flux ranges upon scale-up from a core to a genome-
scale mapping model (Figure 3.4).
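The F-statistic quoted above can be approximately reproduced from the reported SSRES and DOF values, assuming the standard form of the F-statistic for comparing a reduced model against an expanded one. The function name is hypothetical, and the exact formula used in this work is not stated in the text.

```python
def nested_f_statistic(ssr_reduced, dof_reduced, ssr_full, dof_full):
    """F-statistic for comparing a reduced model against an expanded one.

    F = [(SSR_reduced - SSR_full) / (DOF_reduced - DOF_full)] / (SSR_full / DOF_full)
    """
    extra_params = dof_reduced - dof_full
    numerator = (ssr_reduced - ssr_full) / extra_params
    denominator = ssr_full / dof_full
    return numerator / denominator

# Values reported in the text: core model versus imSyn617.
f_stat = nested_f_statistic(ssr_reduced=684.0, dof_reduced=697,
                            ssr_full=511.4, dof_full=556)
print(round(f_stat, 2))  # prints 1.33
```

The result is close to the reported value of 1.335. The small difference presumably reflects rounding of the reported SSRES values.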
Flux elucidation using a GSMM model reveals that only 88% of the assimilated
bicarbonate is fixed by RuBisCO (Figure 3.4) compared to 120% in the core model. The
increased flux through RuBisCO predicted by the core model is attributed to a 16
mmol/dmol bicarbonate uptake (BCU) flux through G6PDH, resulting in a futile cycle
between the CBB cycle and the PP pathway. This futile cycle is shown to be inactive when
using imSyn617. This is consistent with experimentally verified dispensability of this
pathway inferred from unimpaired growth of the Synechocystis zwf mutant under
photoautotrophic growth conditions (Scanlan et al., 1995). As a consequence of this, an
83% reduction in the flux through PGI is seen using imSyn617 compared to the core model
with the only purpose of generating G6P for glycogen and glycolipid synthesis. imSyn617
leverages the E4P recycling mechanism (Figure 3.2b) to delay metabolite labeling
dynamics leading to a two-fold increase in flux through SBA and SBP reactions and a flux
of 37 mmol/dmol BCU through TAL in imSyn617 compared to the core model. The use
of this pathway for regeneration of pentose sugar phosphates results in no flux through
FBA and FBP reactions. Note that Synechocystis contains two FBAs: CI-FBA with higher
reactivity for fructose-bisphosphate and CII-FBA with higher reactivity for sedoheptulose-
bisphosphate. CI-FBA has been shown to be non-essential during photoautotrophic growth
of Synechocystis (Nakahara et al., 2003). It has been suggested that over 90% of the FBA
in Synechocystis under photoautotrophic growth conditions is CII-FBA (Liang and
Lindblad, 2016), consistent with the fact that CII-FBA has a 3-fold higher transcriptomic
abundance (Saha et al., 2016) and a 39-fold higher proteomic abundance (Takabayashi et
al., 2013) compared to CI-FBA. These findings support an inactive CI-FBA which results
in the absence of flux through the fructose bisphosphate aldolase and fructose-
bisphosphatase reactions. As a consequence of this, hexose phosphates for glycogen
synthesis can only be synthesized via the TAL reaction. This increased flux through TAL
is consistent with the experimentally observed higher expression levels of the tal gene
during photoautotrophic growth phase in Synechocystis (Kucho et al., 2005). In order to
assess the impact of the higher flux through the TAL reaction on the quality of fit, flux
elucidation was performed using imSyn617 upon constraining the flux through TAL to
zero. Removal of the TAL reaction redirects carbon flux through the FBA/FBP route and
the oxidative pentose phosphate pathway. Since the TAL reaction participates in a cycle
involving a conserved E4P moiety, flux through this cycle delays the 13C incorporation into
sugar phosphate intermediates in the CBB cycle. This delay is not possible via the
FBA/FBP route and therefore increases the SSRES to 541 due to poorly recapitulated
labeling dynamics of R5P229 and RuBP309 fragments (Figure 3.9). In comparison, the
core model uses the traditional CBB pathway for pentose phosphate regeneration, resulting
in a flux of 58 mmol/dmol BCU through FBA and FBP reactions while the directionality
of TAL remains unresolved.
While both models assume the same biomass macromolecular compositions, differences
in precursor drains result in significant flux range shifts downstream of carbon fixation in
the core and imSyn617 models. In particular, lipids are traced indirectly only through free
fatty acids in the core model as opposed to a complete coverage of glycolipids, di- and
triacylglycerols (DAGs and TAGs), phospholipids, and sulfoquinovosyl DAGs in
imSyn617. This results in a reduced acetyl-CoA demand and an increased DHAP demand
for biomass production in imSyn617. As a consequence, a 50%, 89%, and 67% reduction
in flux is predicted through lower glycolysis, PK, and PDH reactions, respectively. In
addition to this, the core model uses the glyoxylate shunt as a secondary source of
glyoxylate, thereby enabling completion of the TCA cycle without including the AKGDH
reaction. In contrast, the glyoxylate shunt is excluded from iSyn731 as this pathway is
shown to be absent in Synechocystis (Thiel et al., 2017) resulting in glyoxylate production
only in the photorespiratory pathway. Furthermore, iSyn731 accounts for multiple avenues
for the completion of the TCA cycle via AKGDH and its alternate routes and captures
glycine and serine metabolism, thereby elucidating parts of the metabolic network not
captured by the core model.
3.3.3. New insight on carbon paths gained using imSyn617
The overall carbon balance reveals that 86% of the assimilated bicarbonate is channeled
towards biomass production, 12% is ultimately off-gassed as CO2 and the remaining 2% is
distributed between organic acids and glycogen storage. 602 reactions are resolved with a
flux range narrower than 10 mmol/dmol BCU. 407 reactions are identified to be growth-
coupled. These flux ranges were compared to flux ranges generated using FVA upon
constraining the bounds of substrate uptake and product yields with MFA-derived lower
and upper bounds. The superior flux resolution afforded by INST-MFA compared to
FVA alone is attributed to the unambiguous elucidation of fluxes across all branch-points
such as CBB cycle/photorespiration, glycolysis/PP pathway, anaplerotic reactions, and the
TCA cycle. In addition to this, futile cycles involving central metabolic reactions such as
PK, GAK, and PFK are well resolved with zero flux using INST-MFA compared to FVA
based on their contribution to carbon skeleton rearrangement and impact on metabolite
labeling dynamics. 61 reactions outside the purview of the EMU model are poorly resolved
by both INST-MFA and FVA. These include reactions from energy metabolism such as
cyclic and non-cyclic photophosphorylation, Mehler reaction, and oxidative
phosphorylation, and reactions facilitating reversible transfer of reducing equivalents
between various carriers such as ferredoxin, NAD+, and NADP+, including glutamate
dehydrogenase, the glutamine synthetase/glutamate:oxoglutarate aminotransferase system,
and isozymes of glyceraldehyde-3-phosphate dehydrogenase. The expanded pathway
coverage in iSyn731 provides insights into carbon flows through various pathways not
modeled in the core model such as aspartate, glutamate, glycine, and serine metabolism
and reveals the existence of pathway topologies supporting carbon conversion to biomass
with near 100% efficiency.
Glycine and serine metabolism exhibits a bifurcated topology involving reactions from the
photorespiratory pathway, the phosphoserine pathway, and SHMT (Figure 3.5a). Flux
through the carboxylation and oxygenation reactions of RuBisCO is partitioned in a 90:10
ratio with 9.7 mmol/dmol BCU of flux entering the photorespiratory pathway (Figure 3.4).
Oxygenation of RuBP produces one molecule of 3PG and one molecule of 2PGLYC, which
is oxidized to glyoxylate in the photorespiratory pathway (Figure 3.5a). Since no oxidation
of glyoxylate to formate or CO2 occurs, all of the 2PGLYC synthesized via oxidation of
RuBP is converted to glycine. Absence of flux through glyoxylate oxidation is supported
by experimentally observed insignificant 13C incorporation into oxalate (Young et al.,
2011). 3PG is converted to serine via the phosphoserine (PHSER) pathway similar to E.
coli. Glycine is also produced from serine via the SHMT reaction. Since Synechocystis
does not accumulate or secrete glycine, and no glycine degradation occurs via GLYDH, it
is exclusively utilized for biomass production, as a result of which, the glycine producing
branch of the photorespiratory pathway is identified to be growth-coupled. Moreover, the
SHMT reaction is identified to be the sole source of the one-carbon pool carried by
tetrahydrofolate. A trace flux is observed through glycerate indicating that the second half
of the photorespiratory pathway is inactive causing the phosphoserine pathway to be
growth-coupled. This flux distribution results in a unique bifurcated topology achieving
complete carbon conversion of RuBP to glycine and serine with no losses in the form of
CO2. Furthermore, cysteine is also synthesized from serine and completely routed to
biomass as there is no flux through the cysteine-degrading mercaptopyruvate pathway. The
overall topology of this pathway allows glycine and serine biosynthesis from bicarbonate
with a 100% carbon conversion efficiency while reinforcing the essentiality of the
glycolate pathway in Synechocystis (Eisenhut et al., 2008). This observation is in contrast
to the linear pathway proposed in earlier GSM models of Synechocystis (Knoop et al.,
2010) and affords a higher 13C enrichment of serine than glycine, consistent with
experimental observations (Huege et al., 2007; Young et al., 2011).
The genome-scale mapping model imSyn617 achieves an unambiguous resolution of
fluxes around the pyruvate node (Figure 3.4). Pyruvate synthesis occurs indirectly from
PEP via the anaplerotic PPC and ME reactions due to the inactivity of PK, methylglyoxal,
and serine degradation pathways. In addition to this, no flux is seen through the PPS
reaction indicating unidirectional flux from glycolysis to the TCA cycle, thereby localizing
any CO2 incorporated via PPC to the TCA cycle only. Acetyl-CoA is produced via the
PDH reaction for lipid synthesis and TCA metabolism. Absence of flux through all
alternate routes connecting AKG and succinate in conjunction with the lack of a glyoxylate
shunt (Thiel et al., 2017; Varman et al., 2013) renders the TCA cycle incomplete with a
bifurcated topology incapable of completely oxidizing acetyl-CoA (Figure 3.4b). As a
consequence, all reactions of the TCA cycle are identified to be growth-coupled as
Synechocystis does not produce any organic acids as byproducts of photoautotrophic
metabolism (Young et al., 2011). Fumarate is not synthesized directly via the TCA cycle
but is instead generated as a byproduct of arginine and purine biosynthetic pathways. This
fumarate serves as a precursor for succinate required for growth, while the excess fumarate
is converted to malate via fumarate hydratase.
3.4. Discussion
In this chapter, genome-scale INST-MFA is applied to elucidate photoautotrophic
metabolism in Synechocystis. Reactions capable of carrying flux in iSyn731 (Saha et al.,
2012) are identified via FVA using extracellular flux measurement data (Young et al.,
2011). The corresponding GSMM model imSyn617 includes all carbon-balanced reactions.
Atom mapping for reactions shared with E. coli is derived from imEco726 (Gopalakrishnan
and Maranas, 2015a) and the remaining reactions are mapped using the CLCA algorithm
or based on reaction mechanism when available. A customized algorithm is developed with
improved scalability and memory efficiency leading to a 48% reduction per iteration in the
computational time required to simulate metabolite labeling dynamics in larger
networks. INST-MFA is performed to identify a suitable flux distribution accurately
recapitulating the labeling distribution and dynamics of 15 central metabolites obtained
during photoautotrophic growth of Synechocystis with 50% 13C-labeled bicarbonate as the
tracer (Young et al., 2011). In response to degeneracy in the metabolic network and
experimental errors, 95% confidence intervals were also determined using the established
procedure (Antoniewicz et al., 2006; Gopalakrishnan and Maranas, 2015a) to identify flux
ranges for all reactions.
Upon evaluating the significance of the improved recapitulation afforded by imSyn617
using the F-test, the F-statistic is 1.335 (p = 0.012). In comparison, the corresponding F-
statistic for scale-up in E. coli was 0.152 (p = 0.999) indicating that the core model accounts
for the carbon paths necessary to recapitulate the labeling data used in that study
(Gopalakrishnan and Maranas, 2015a). The increased uncertainty of flux estimation was
attributed to the inclusion of alternate paths with identical atom mapping information. In
contrast, the statistical significance associated with model scale-up in this study implies
that unique and often surprising insights into the carbon flows under phototrophic growth
are obtained by the re-analysis of an existing dataset using a detailed description of the
entirety of metabolism in Synechocystis. Flux elucidation of photoautotrophic growth of
Synechocystis using imSyn617 reveals that Synechocystis deploys a carbon efficient
metabolism enabling maximal conversion of fixed carbons to biomass precursors with
minimal production of organic acids and glycogen. This is in contrast to heterotrophic
bacteria such as E. coli where 35% of the taken-up glucose is secreted as acetate (Sandberg
et al., 2016) resulting in a 30% biomass yield loss from the theoretical maximum biomass
yield (Feist et al., 2007). The flux ranges estimated in this study provide a comprehensive
set of essential and dispensable metabolic reactions in Synechocystis under
photoautotrophic growth conditions to serve as a guideline for editing photosynthetic
prokaryotic genomes. The estimated flux ranges reveal that net carbon fixation accounts
for only 88% of the assimilated bicarbonate. The remaining 12% is fixed by PPC, but is
subsequently oxidized to CO2 via malic enzyme, TCA cycle, and peripheral metabolic
reactions. These carbons are not recycled by the CBB cycle and are therefore off-gassed.
This inability to recycle these carbons via the CBB cycle is identified as a target to improve
upon in photosynthetic carbon fixation. It is unclear from this analysis whether this is
caused by a rate-limiting enzyme in the CBB cycle or a paucity of available NADPH and
ATP as the fluxes through the photosynthetic light reactions and oxidative phosphorylation
are poorly resolved by INST-MFA. Resolution of these reactions requires knowledge of
the spectral composition of the light source and photon flux partitioning between
photosystems I and II to distinguish ATP production via non-cyclic and cyclic
photophosphorylation. When combined with the measurement of net oxygen evolution
rate, these measurements will allow accurate elucidation of fluxes through the
photosynthetic light reactions and oxidative phosphorylation. This will enable resolution
of NADPH production and provide insights into the biological impact of a mandatory flux
through malic enzyme, consistent with experimentally verified essentiality of this gene
(Bricker et al., 2004). Unlike in E. coli (Gopalakrishnan and Maranas, 2015a), here
alternate routes to lower glycolysis and TCA cycle are extremely well resolved based on
differences in metabolite labeling dynamics, thereby demonstrating the superior capability
of INST-MFA in resolving pathways with similar atom transitions and establishing the
dispensability of the lower TCA cycle under photoautotrophic growth conditions.
The introduced algorithmic procedure (Appendix B) for performing flux elucidation at a
genome-scale offers a 48% reduction in computation time which will grow with larger
models. As this scheme employs an exponential integrator, a moderate level of stiffness
can still be handled when pool sizes exceed $10^{-4}$ mmol/gdw. Stiffness in INST-MFA
models arises from a degeneracy in pathway labeling dynamics due to the inclusion of
more pool size parameters than necessary to recapitulate experimentally observed labeling
distributions. As a consequence of this, the confidence interval estimation procedure will
fail to compute an upper bound for many metabolite turnover rates (defined as the ratio of
flux through a metabolite to its pool size). Since fluxes are scaled to bicarbonate uptake
and bounded by stoichiometric mass balance constraints, the uncertainties in the estimation
of metabolite turnover rates will be reflected in uncertainties in pool size resolution but
do not affect flux confidence interval calculation as long as the solution lies outside the
stiff regions. Due to this, pool size ranges are not computed in this study. This would
require the development of a higher order implicit method for ODE integration so as to
ensure accuracy and stability of the procedure. It has been previously reported that
channeling plays a key role in explaining the observed metabolite labeling dynamics
(Huege et al., 2007; Young et al., 2011). The presence of substrate channeling was
hypothesized based on the existence of segregated metabolite pools inferred from dilution
parameters (Young et al., 2011). Consistent with earlier findings, up to 10% of the 3PG
and F6P pools are found to be metabolically inactive with no segregation of the PEP pool,
alluding to the presence of a channeling mechanism from 3PG to PEP. Quantification of
pool sizes will provide detailed insights into substrate channeling mechanisms arising from
CBB cycle enzyme co-localization similar to that seen in plant chloroplasts (Anderson and
Carol, 2004; Anderson et al., 2005; Suss et al., 1993). In conjunction with the
bifunctionality of the SBPase enzyme (Yan and Xu, 2008), this would explain the preference
for TAL-SBPase-SBA route for the regeneration of pentose sugar phosphates as opposed
to the conventional TPI-FBA-FBPase pathway despite the lack of an energetic advantage.
Nevertheless, the flux estimation algorithm always converged outside the stiff regions in
the solution space, indicating that the obtained flux ranges are not confounded by stiffness
of the system of ODEs describing metabolite labeling dynamics in GSM models. imSyn617
coupled with customized integrators enables the elucidation of fluxes with a global
coverage and high statistical confidence by re-analyzing already available labeling
datasets. This newly reached scope and fidelity in flux elucidation promises both to enhance
kinetic model parameterization (Khodayari and Maranas, 2016) and to facilitate the use
of strain design algorithms such as OptForce (Ranganathan et al., 2010) and k-OptForce
(Chowdhury et al., 2014).
Figure 3.1. Representation of central metabolism in Synechocystis. The reactions
exclusive to the core model and the GSM model are indicated in orange and green,
respectively. Metabolite drains for biomass formation and peripheral metabolism are
indicated in dashed arrows with GSM-specific interactions indicated in green. Completion
of the TCA cycle (AKGDH) is indicated using a dashed green arrow to represent the
existence of alternate routes between this pair of metabolites.
Figure 3.2. Carbon incorporation paths and conserved moiety cycling in imSyn617. (a)
CO2 reincorporation via photorespiration. Solid black circles represent reincorporated CO2
atoms. Reversible glycine degradation is the primary carbon scrambling reaction in this
pathway allowing the incorporation of degraded glyoxylate carbons as well as substrate
bicarbonate to generate three unique carbon arrangement patterns of 3PG. (b) Recycling of
conserved moieties within central metabolism. The conserved E4P moiety generated due
to the interaction between TAL from the non-oxidative PP pathway and SBA and SBPase
from the regeneration phase of the CBB cycle is indicated in red whereas the conserved
triose phosphate moiety recycled between the serine biosynthetic pathway,
photorespiration, and lower glycolysis is indicated in blue.
[Panel (a) image not reproduced; node labels recoverable from the panel: RuBP, CO2, FOR, GLX, OXL, GLY, SER, 3PG, MEETHF. Panel (b) image not reproduced.]
Figure 3.3. Recapitulation of experimentally observed labeling distribution and
dynamics expressed in terms of variance-weighted sum of squares of residuals (SSRES)
using the core model (orange bars) and the GSMM model (green bars) of Synechocystis.
Fragments with an SSRES difference exceeding 25 are indicated using a black box.
Figure 3.4. Flux ranges (expressed in mmol/dmol bicarbonate uptake (BCU)) of central
metabolic reactions in Synechocystis during photoautotrophic growth predicted using a
core model (orange bars) and a GSMM model (green bars). The bars represent the range
of flux from its lower bound to its upper bound. The reaction names on the y-axis are
consistent with the nomenclature used in Figure 3.1. Excluded reactions in each model are
assumed to carry no flux.
Figure 3.5. Bifurcated topology of the photorespiratory pathway (a) and the TCA cycle
(b). Flux (in mmol/dmol BCU) through each reaction is specified in blue. Arrows indicate
direction of flux. Reaction abbreviations are consistent with Figure 3.1. The dashed arrows
represent metabolite drains for biomass production.
[Panels (a) and (b) not reproduced.]
Figure 3.6. Recapitulation of labeling dynamics of CBB intermediates. Fit quantified
by the standard deviation-weighted residuals for the mass isotopomers of (a) PEP-167, (b)
3PGA-185, and (c) RUBP-309 at various time points for the core model (black bars) and
the GSM model (gray bars).
[Panels (a), (b), and (c) not reproduced.]
Figure 3.7. Carbon positional shifts in Synechocystis due to scrambling in upper
glycolysis, PPP, and the Calvin Cycle. (A) Carbon paths mapping positions C1 and C2 of
RuBP (RuBP-1,2) to 3PG via PPP and the Calvin cycle depicted as EMU reactions. (B)
Positional shifts of glucose carbon positions C2 and C3 upon flux through the PPP.
[Panels A and B not reproduced.]
Figure 3.8. F-Test on the oxidative pentose phosphate pathway. (a) Recapitulation of
experimentally observed labeling distribution and dynamics expressed in terms of
variance-weighted sum of squares of residuals (SSRES) using the core model of
Synechocystis when the oxidative pentose phosphate pathway is allowed to carry flux (gray
bars) and when it is constrained to carry no flux (black bars). The total SSRES increases
from 684 to 742. A statistically significant reduction in SSRES is seen upon permitting
flux through the oxidative pentose phosphate pathway (F = 59.1, p = $5 \times 10^{-14}$). However,
both flux distributions are statistically acceptable. (b) Comparison of labeling dynamics of
RuBP309 fragment when the oxidative pentose phosphate pathway is active (gray bars)
and when it is constrained to carry no flux (black bars). The increase in SSRES arises from
an acceleration in labeling dynamics based on a reduction in the unlabeled fraction upon
constraining the flux through the oxidative pentose phosphate pathway to zero.
[Panels (a) and (b) not reproduced.]
Figure 3.9. F-Test on Transaldolase. Recapitulation of experimentally observed
labeling distribution and dynamics expressed in terms of variance-weighted sum of squares
of residuals (SSRES) using imSyn617 when the transaldolase (TAL) reaction is allowed to
carry flux (gray bars) and when it is constrained to carry no flux (black bars). The total
SSRES increases from 511 to 541. A statistically significant reduction in SSRES is seen
upon permitting flux through the TAL reaction (F = 16.32, p = $6.1 \times 10^{-5}$). However, both
flux distributions are statistically acceptable. Upon constraining the flux through TAL to
zero, carbon flux is diverted through the conventional FBA/FBPase route, accompanied by
a reduction of flux through the SBA reaction.
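The F-statistics quoted in the captions above can be illustrated with a minimal sketch. It assumes the standard extra-sum-of-squares F-test for nested models, which the text does not spell out; the function name `f_statistic` and the value `dof_full = 278` are hypothetical and are not reported in the source.

```python
def f_statistic(ssr_restricted, ssr_full, extra_params, dof_full):
    """Extra-sum-of-squares F-statistic for two nested models.

    ssr_restricted: SSRES of the constrained model (e.g., a flux fixed to zero)
    ssr_full:       SSRES of the unconstrained model
    extra_params:   number of parameters freed in the unconstrained model
    dof_full:       residual degrees of freedom of the unconstrained model
    """
    if ssr_full <= 0 or extra_params <= 0 or dof_full <= 0:
        raise ValueError("ssr_full, extra_params, and dof_full must be positive")
    return ((ssr_restricted - ssr_full) / extra_params) / (ssr_full / dof_full)


# SSRES values taken from the Figure 3.9 caption (TAL constrained vs. free).
# dof_full = 278 is an assumed, illustrative value; the text does not report it.
f_tal = f_statistic(ssr_restricted=541.0, ssr_full=511.0,
                    extra_params=1, dof_full=278)
print(f"F = {f_tal:.2f}")
```

Converting the statistic to a p-value additionally requires the cumulative F distribution, which is omitted here to keep the sketch dependency-free.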
Chapter 4
K-FIT: An accelerated kinetic parameterization algorithm using steady-
state fluxomic data
4.1. Introduction
The pressing need for the rapid development of truly predictive models of metabolism to
accelerate build-design-test cycles for metabolic engineering has been widely reported
(Cheng and Alper, 2014; Dromms and Styczynski, 2012; Long et al., 2015). Advances in
synthetic biology (Chae et al., 2017; Cho et al., 2018; Stovicek et al., 2017) have alleviated
the challenge of genome editing, placing the onus on the decision of what genetic
modifications to carry out in metabolic engineering projects. There already exist a number
of strain design algorithms that operate on genome-scale stoichiometric descriptions of
metabolism including OptKnock (Burgard et al., 2003), RobustKnock (Tepper and Shlomi,
2010), BiMOMA (Kim et al., 2011), and OptForce (Ranganathan et al., 2010) which have
been successfully applied to engineer glutamate and succinate overproducing strains in E.
coli (Kim et al., 2011), fatty acid production in E. coli (Ranganathan et al., 2012; Xu et al.,
2011) and overproduction of flavonoid precursor shikimate in Saccharomyces cerevisiae
(Suastegui et al., 2017).
Because stoichiometric models do not directly capture the effect of metabolite
concentration changes, protein level fluctuations, enzyme saturation, or allosteric
regulation of enzymatic activity (Chowdhury et al., 2015a; Saa and Nielsen, 2017), strain
design algorithms cannot identify interventions to allosteric and transcriptional regulations
such as the ptsG knockout for succinate overproduction in E. coli (Chowdhury et al.,
2015b) or upregulation of the transcription regulator FapR for malonyl-CoA
overproduction in E. coli (Xu et al., 2014). Kinetic models of metabolism can alleviate
these shortcomings by quantitatively describing the relationship between fluxes, enzyme
levels, and metabolite concentrations based on mechanistic and/or approximate rate law
formalisms. This allows kinetic models to trace the effect of allosteric regulation and
enzyme level changes (van Eunen et al., 2012), assess metabolic changes in response to
altered carbon sources (Kotte et al., 2010), resolve the accessibility of metabolic steady-
states (Lafontaine Rivera et al., 2017), and predict metabolic changes in response to drug
interventions (Frohlich et al., 2018).
The promise of superior product yield prediction offered by kinetic models by tracking
both enzyme levels and metabolite concentrations through metabolism comes at the
expense of substantially increased experimental data requirements and complexity in
model assembly, parameterization and interpretation of results. Construction of a kinetic
model requires knowledge across all reactions of (i) the mechanism of enzyme catalysis,
(ii) the effect of regulators (activators and allosteric inhibitors), and (iii) a network model
that is elementally balanced accurately reflecting the reaction stoichiometry. The catalytic
mechanism of the enzymes can be obtained from literature or from kinetic model
repositories such as KiMoSys (Costa et al., 2014) and information about effectors can be
obtained from databases such as BRENDA (Placzek et al., 2017) and SABIO-RK (Wittig
et al., 2012). Although it is tempting to also rely on database entries from BRENDA for
obtaining enzyme kinetic parameter values derived largely from in vitro enzyme assays,
limited availability of organism-specific data and differences between in vivo and in vitro
assay conditions lead to the haphazard integration of heterogeneous datasets into the same
kinetic model. Even when sufficient in vitro derived kinetic information is available to
construct a kinetic model, meaningful results are not necessarily achieved (Teusink et al.,
2000).
Therefore, in vivo kinetic parameters must be estimated by solving a nonlinear
programming (NLP) problem that recapitulates experimentally measured temporal
concentration profiles (Jahan et al., 2016) or steady-state fluxes (Khodayari et al., 2014) in
WT and genetic and/or environmentally perturbed mutants. The computational difficulties
arising from non-convexity in this NLP have been alleviated in the past by linearization of
nonlinear rate laws around a reference steady-state using a log-linear formalism
(Hatzimanikatis and Bailey, 1997) which forms the basis for the ORACLE framework
(Miskovic and Hatzimanikatis, 2010). However, linearization about a reference state limits
the predictive capabilities to the vicinity of the reference state (Saa and Nielsen, 2017).
Other mechanistic frameworks bypassing linearization such as Ensemble Modeling (EM)
(Tran et al., 2008) relate fluxes to metabolite concentrations using mass-action kinetics in
conjunction with elementary-step decomposition of the mechanism of enzyme catalysis.
Conservation of mass across metabolites and enzymes is decomposed into systems of
bilinear algebraic equations, allowing convenient insertion/deletion of regulatory
components. Simulation of concentration dynamics and identification of steady-state
fluxes requires integration of the system of ODEs representing conservation of mass across
metabolites (Hoops et al., 2006; Tran et al., 2008). In addition to these, many kinetic
models based on Michaelis-Menten and Hill kinetic formalisms have also been constructed
(Chassagnole et al., 2002; Srinivasan et al., 2018).
Unfortunately, the non-convexity in the NLP arising from enforcing conservation of mass
across all species limits the direct use of local optimization solvers such as MINOS
(Murtagh and Saunders, 1978), CONOPT (Drud, 1985) or fmincon within MATLAB™.
Instead, metaheuristic approaches such as genetic algorithms (GA) (Khodayari et al.,
2014) and particle swarm optimization (Millard et al., 2017) have been used in the past for
the traversal of kinetic parameter solution space. Meta-heuristic algorithms rarely saturate
the kinetic parameter space with function evaluations and require over 50,000 hours on a
high-performance computing cluster to parameterize a near-genome-scale kinetic model
containing 5,239 kinetic parameters (Khodayari and Maranas, 2016). More importantly,
they cannot confirm optimality of a reported solution due to the lack of gradient evaluations
which must be calculated using the computationally expensive forward sensitivity analysis
(Raue et al., 2013). The lack of efficient gradient estimation also prevents the evaluation
of local sensitivities along with any follow-up investigations on kinetic parameter
uncertainty. Although recasting the problem within a Bayesian framework such as GRASP
(Saa and Nielsen, 2015) permits the computation of confidence intervals, applicability of
GRASP to large kinetic models is limited by the poor scalability of the underlying Monte-
Carlo-based sampling methods within the Bayesian paradigm (Saa and Nielsen, 2017).
Thus, long parameterization times stemming from poor scalability of existing
parameterization frameworks ultimately preclude the efficient computation of local
sensitivities of steady-state fluxes to kinetic parameters. As a result of this, accurate
confidence intervals for estimated kinetic parameters cannot be readily calculated, and
insights into (i) the robustness of resolution of kinetic parameters given mutant flux
datasets, (ii) kinetic parameter confidence levels, and (iii) need for follow up measurements
to improve prediction cannot be gleaned. In response to these challenges we put forth the
K-FIT algorithm, a decomposition-based approach for parameterization of kinetic models
using steady-state fluxomic and/or metabolomic data collected for multiple perturbation
mutants. K-FIT builds upon the concept of Ensemble Modeling (EM) by anchoring
concentrations and kinetic parameters to a reference strain but unlike earlier efforts
employing a genetic algorithm (Khodayari et al., 2014) to parameterize the model, K-FIT
achieves many orders of magnitude improvement in efficiency by relying on a customized
decomposition approach. K-FIT was first benchmarked against EM for three test kinetic
models of increasing size ranging from 100 to 953 kinetic parameters to demonstrate the
increase in computational savings with model size. K-FIT remained tractable for even a
large kinetic model containing 307 reactions, 258 metabolites, and 2,407 kinetic
parameters parameterized with 1,728 steady-state fluxes from six single gene-deletion
mutants determined using 13C-metabolic flux analysis (13C-MFA) (Long et al., 2018).
The parameterization was carried out 100 times with random initializations and was
completed within 48 hours of computation time. The best solution was recovered 44 out of
100 times providing confidence that convergence to the true optimum was indeed achieved.
The kinetic model k-ecoli307 accurately recapitulated fluxes to within 15 mmol/gdw-h of
the values reported by 13C-MFA while also predicting fluctuations in glucose uptake in
response to genetic perturbation and flux rerouting through energy metabolism to meet
biosynthetic NADPH demands. The yield predictions of acetate, lactate, and malate for
engineered strains were found to be within 30% of the experimental yield for metabolites
derived from central metabolism. The presented algorithm includes local sensitivity
calculations for computation of gradients which ensures optimality of obtained solutions.
This feature will enable follow-up calculations on uncertainty in parameter estimations and
control coefficients to aid efficient design of experiments, improve prediction fidelity of
kinetic models, and inform metabolic engineering strategies.
4.2. Methods
4.2.1. Kinetic parameterization using K-FIT
K-FIT is a gradient-based kinetic parameterization algorithm that minimizes the least-
squares objective function representing the weighted squared deviation between predicted
and measured steady-state metabolic fluxes (and possibly metabolite concentrations)
across multiple genetic perturbation mutants. The full mathematical description for the K-
FIT algorithm is provided in the supplementary methods. The least-squares NLP is solved
using the Levenberg-Marquardt algorithm (Madsen et al., 2004) in conjunction with the
active-set method for enforcing linear inequality constraints (Gill et al., 1984). K-FIT is
encoded and implemented in MATLAB™ and run on an Intel-i7 (4-core processor,
2.6GHz, 12GB RAM) computer. K-FIT is tested using kinetic models at different size
scales. The full source-code is made available on GitHub.
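The weighted least-squares objective and the Levenberg-Marquardt iteration named above can be sketched on a toy problem. This is an illustration under stated assumptions, not the K-FIT implementation: the one-parameter flux model `predicted_flux`, the synthetic data, and all function names are hypothetical, and the active-set treatment of inequality constraints is reduced to rejecting non-positive trial values.

```python
def predicted_flux(k, s):
    """Hypothetical one-parameter saturation model used only for illustration."""
    return s / (k + s)


def weighted_ssr(k, s_vals, v_meas, sigmas):
    """Sum of squared deviations, each scaled by its measurement standard deviation."""
    return sum(((predicted_flux(k, s) - v) / sd) ** 2
               for s, v, sd in zip(s_vals, v_meas, sigmas))


def fit_levenberg_marquardt(s_vals, v_meas, sigmas, k0=1.0, damping=1e-2,
                            grad_tol=1e-10, max_iter=200):
    """Scalar Levenberg-Marquardt: damped Gauss-Newton steps with adaptive damping."""
    k = k0
    ssr = weighted_ssr(k, s_vals, v_meas, sigmas)
    for _ in range(max_iter):
        grad = 0.0  # J^T r for the weighted residuals
        curv = 0.0  # J^T J (Gauss-Newton curvature)
        for s, v, sd in zip(s_vals, v_meas, sigmas):
            resid = (predicted_flux(k, s) - v) / sd
            jac = (-s / (k + s) ** 2) / sd
            grad += jac * resid
            curv += jac * jac
        if abs(grad) < grad_tol:
            break  # stationary point of the objective
        k_trial = k - grad / (curv + damping)
        # Keep the parameter positive (stand-in for the bound constraints).
        ssr_trial = (weighted_ssr(k_trial, s_vals, v_meas, sigmas)
                     if k_trial > 0 else float("inf"))
        if ssr_trial < ssr:
            k, ssr = k_trial, ssr_trial
            damping /= 10.0  # trust the quadratic model more
        else:
            damping *= 10.0  # fall back toward gradient descent
    return k, ssr


# Synthetic, noise-free "measurements" generated with k = 2.0 (assumed value).
s_vals = [0.5, 1.0, 2.0, 4.0, 8.0]
v_meas = [predicted_flux(2.0, s) for s in s_vals]
sigmas = [0.05] * len(s_vals)
k_fit, ssr_fit = fit_levenberg_marquardt(s_vals, v_meas, sigmas)
print(k_fit, ssr_fit)
```

The damping term interpolates between a Gauss-Newton step (small damping) and a short gradient step (large damping), which is the property that makes the method robust far from the optimum.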
Computation of standard deviations for estimated kinetic parameters was performed using
linear regression tools applied to a local linearization of mutant fluxes using Taylor series
expansion (Wiechert et al., 1997). Briefly, the covariance matrix 𝑪 is computed by
inverting the Hessian 𝑯 computed by K-UPDATE. When the linear approximation holds,
the diagonal of the covariance matrix represents the estimation variance of kinetic
parameters. The approximate standard deviation of kinetic parameter 𝑘𝑝 (𝜎𝑝) is evaluated
as 𝜎𝑝 = √𝐶𝑝𝑝. The approximate confidence interval is computed as 𝑘𝑝 ± 𝜎𝑝.
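As a minimal sketch of this calculation, the following inverts a small Hessian and reads the standard deviations off the diagonal of the resulting covariance matrix. The 2 x 2 Hessian, the parameter estimates, and the helper names are hypothetical; a real model would use a general linear-algebra routine rather than the closed-form 2 x 2 inverse.

```python
import math


def invert_2x2(matrix):
    """Closed-form inverse of a 2 x 2 matrix (sufficient for this illustration)."""
    (a, b), (c, d) = matrix
    det = a * d - b * c
    if det == 0.0:
        raise ValueError("Hessian is singular; parameters are not identifiable")
    return [[d / det, -b / det], [-c / det, a / det]]


def parameter_std_devs(hessian):
    """sigma_p = sqrt(C_pp), with C the inverse of the Hessian."""
    covariance = invert_2x2(hessian)
    return [math.sqrt(covariance[p][p]) for p in range(len(covariance))]


# Hypothetical Hessian at the optimum and hypothetical parameter estimates.
hessian = [[4.0, 1.0], [1.0, 2.0]]
k_est = [7.5, 1.25]
sigmas = parameter_std_devs(hessian)
intervals = [(k - s, k + s) for k, s in zip(k_est, sigmas)]
print(sigmas, intervals)
```

A singular or near-singular Hessian signals that some parameters are not resolved by the data, in which case the linear approximation and the resulting intervals should not be trusted.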
4.2.2. Construction of the expanded kinetic model of E. coli, k-ecoli307
The expanded metabolic model is constructed by de-lumping the central and peripheral
metabolic pathways in the core model (Foster et al., 2019 (Under Review)) based on the
reported biomass composition (Neidhardt and Curtiss, 1996). The expanded model
contains 307 reactions and 258 metabolites. Atom mapping for the additional reactions
was obtained from the previously published genome-scale carbon mapping model for E.
coli (Gopalakrishnan and Maranas, 2015a). The amino acid labeling data for flux
elucidation was obtained from the published work by Long et al. (2018). Metabolic fluxes
and 95% confidence intervals were elucidated using 13C-metabolic flux analysis as
described earlier (Antoniewicz et al., 2006; Gopalakrishnan and Maranas, 2015a). The
mechanism and allosteric regulation of enzyme-catalyzed reactions in the model were
obtained from k-ecoli457, the near-genome-scale kinetic model for E. coli (Khodayari and
Maranas, 2016). The standard deviation $\sigma_j$ corresponding to the estimated flux $V_j$ to be
used as a weighting factor in the K-FIT algorithm is computed from the lower and upper
bounds of the confidence interval, $V_j^{LB}$ and $V_j^{UB}$, reported by 13C-MFA as
$\sigma_j = (V_j^{UB} - V_j^{LB}) / 3.92$.
Computed kinetic parameters were also packaged into Michaelis-Menten parameters as
described earlier (Khodayari and Maranas, 2016).
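The divisor 3.92 equals 2 x 1.96, the full width of a 95% confidence interval in units of standard deviation under a normal approximation. A minimal sketch of the conversion follows; the function name and the example bounds are hypothetical.

```python
def sigma_from_ci95(lower_bound, upper_bound):
    """Standard deviation implied by a 95% confidence interval.

    Assumes normally distributed error, so the interval spans 2 * 1.96 = 3.92
    standard deviations.
    """
    if upper_bound < lower_bound:
        raise ValueError("upper_bound must not be smaller than lower_bound")
    return (upper_bound - lower_bound) / 3.92


# Hypothetical flux confidence interval (same units as the flux itself).
sigma_j = sigma_from_ci95(lower_bound=8.0, upper_bound=12.0)
print(sigma_j)
```

Fluxes with wide intervals therefore receive small weights in the least-squares objective, so poorly resolved measurements constrain the fit less.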
4.3. Results
In this section, first a schematic representation of the workflow of K-FIT (see Figure 4.1) is
described. The performance of K-FIT is next benchmarked against Ensemble Modeling
(EM) (Khodayari et al., 2014) using three test kinetic models to assess the impact of model
scale-up on the computational savings afforded by K-FIT. The applicability of K-FIT to
near genome-scale models is then demonstrated using an expanded kinetic model for E.
coli (k-ecoli307) containing 307 reactions, 258 metabolites, and 2,407 kinetic parameters
parameterized using 13C-amino acid labeling data in six single gene-deletion mutants
(Long et al., 2018). Confidence intervals for all estimated elementary kinetic and
Michaelis-Menten parameters are estimated by leveraging the gradient calculations
embedded within K-FIT. The predictive capability of k-ecoli307 is then assessed by
comparing predicted product yields against experimentally measured yields in six over-
producing strains.
4.3.1. The K-FIT Algorithm
K-FIT is a gradient-based kinetic parameter estimation algorithm using steady-state flux
measurements from multiple genetic perturbation mutants. The schematic workflow for K-
FIT is shown in Figure 4.1 and Figure 4.4. Reaction fluxes are related to metabolite
concentrations using mass-action kinetics after decomposition of the enzyme catalytic
mechanism into elementary steps (see Appendix C for the detailed procedure for
elementary step decomposition). Conservation of mass across enzyme complexes and
metabolites is therefore expressed as a system of bilinear equations. This formalism was
chosen because it is mechanistically sound, obeys mass conservation laws and is inherently
thermodynamically feasible (Saa and Nielsen, 2017) while at the same time allows for easy
integration of allosteric regulation without the need to derive cumbersome nonlinear rate
laws. The K-FIT algorithm iteratively applies the following three steps till convergence is
reached: (i) K-SOLVE, (ii) Steady-State Flux Evaluator (SSF-Evaluator), and (iii) K-
UPDATE.
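To make the elementary-step formalism concrete, the following sketch evaluates mass-action fluxes for a hypothetical uni-uni mechanism, E + S <-> ES <-> E + P, with enzyme forms expressed as fractions of total enzyme. The mechanism, rate constants, and concentrations are illustrative assumptions rather than values from the models described here.

```python
def uni_uni_elementary_fluxes(k1f, k1r, k2f, k2r, s, p):
    """Steady-state enzyme fractions and net elementary fluxes.

    Mechanism: E + S <-> ES (k1f, k1r), ES <-> E + P (k2f, k2r).
    Enzyme fractions satisfy e + es = 1.
    """
    # Balance on ES: (k1f*s + k2r*p) * e = (k1r + k2f) * es, with e = 1 - es.
    formation = k1f * s + k2r * p
    es = formation / (formation + k1r + k2f)
    e = 1.0 - es
    # Each elementary flux is bilinear in (enzyme fraction, metabolite level).
    v_bind = k1f * e * s - k1r * es      # net flux of E + S <-> ES
    v_release = k2f * es - k2r * e * p   # net flux of ES <-> E + P
    return e, es, v_bind, v_release


# Hypothetical rate constants and normalized metabolite concentrations.
e, es, v_bind, v_release = uni_uni_elementary_fluxes(
    k1f=10.0, k1r=1.0, k2f=5.0, k2r=0.5, s=1.0, p=2.0)
print(e, es, v_bind, v_release)
```

At steady state both elementary steps carry the same net flux, which is the reaction flux. An allosteric inhibitor could be added as one more binding step without deriving a new closed-form rate law.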
The objective of K-SOLVE is to anchor kinetic parameters to the reference state (WT
network) as described by Tran et al. (Tran et al., 2008). K-SOLVE uses as input the reverse
fluxes of all elementary steps and the enzyme fractions for the WT metabolic network. It
also removes the measured WT fluxes from the sum of the least squares objective function.
It then uses the resultant equalities, WT elementary fluxes and WT enzyme fractions to
satisfy all remaining degrees of freedom and assign unique values to the kinetic parameters
so that they inherently satisfy mass balances across metabolites and enzymes in the WT
network. This is an important consideration as mass balances are not always satisfied under
metabolic steady-state for an arbitrary assignment of values to the kinetic parameters.
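The anchoring idea can be sketched for a single hypothetical uni-uni reaction, E + S <-> ES <-> E + P. The sketch assumes, following the Ensemble Modeling convention, that metabolite concentrations are normalized to 1 in the WT reference state; the function name and numerical inputs are illustrative and do not reproduce the K-SOLVE code.

```python
def anchor_uni_uni_parameters(v_net, v1_rev, v2_rev, e, es):
    """Back-calculate elementary rate constants from WT reference quantities.

    v_net:  measured WT net flux through the reaction
    v1_rev: reverse flux of step E + S <-> ES (free variable)
    v2_rev: reverse flux of step ES <-> E + P (free variable)
    e, es:  WT enzyme fractions (free variables, e + es = 1)
    Metabolite concentrations are assumed normalized to 1 in the WT state.
    """
    if min(e, es) <= 0 or abs(e + es - 1.0) > 1e-12:
        raise ValueError("enzyme fractions must be positive and sum to 1")
    # Forward flux of every step = net flux + reverse flux, so each step
    # carries exactly v_net and the WT mass balances hold by construction.
    v1_fwd = v_net + v1_rev
    v2_fwd = v_net + v2_rev
    return {
        "k1f": v1_fwd / e,   # v1_fwd = k1f * e * [S], with [S] = 1
        "k1r": v1_rev / es,  # v1_rev = k1r * es
        "k2f": v2_fwd / es,  # v2_fwd = k2f * es
        "k2r": v2_rev / e,   # v2_rev = k2r * e * [P], with [P] = 1
    }


# Hypothetical WT reference values.
k = anchor_uni_uni_parameters(v_net=2.0, v1_rev=1.0, v2_rev=0.5, e=0.4, es=0.6)
net_step1 = k["k1f"] * 0.4 - k["k1r"] * 0.6
net_step2 = k["k2f"] * 0.6 - k["k2r"] * 0.4
print(k, net_step1, net_step2)
```

Because the optimizer varies reverse fluxes and enzyme fractions rather than rate constants directly, every candidate parameter set reproduces the WT steady state exactly.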
Kinetic parameters anchored by K-SOLVE are then used by SSF-Evaluator to compute the
steady-state fluxes and concentrations across all mutants one at a time. The system of
bilinear algebraic equations in metabolite and enzyme complex concentration resulting
from the elementary step decomposition of enzyme catalysis is decomposed into two sub-
problems. The first bilinear sub-problem representing conservation of mass across enzyme
complexes, is reduced to a system of linear algebraic equations in enzyme complex
concentrations when the metabolite concentrations are specified. The second bilinear sub-
problem describing conservation of mass across all metabolites reduces to a system of
linear algebraic equations in metabolite concentrations when the concentrations of enzyme
complexes are specified. By iterating between these two linear sub-problems, the steady-
state enzyme levels and metabolite concentrations in each mutant are identified.
Convergence is achieved when the concentrations of enzyme complexes and metabolites
remain almost unchanged between successive iterations. This iterative scheme therefore
enables the direct evaluation of steady-state fluxes without the need to integrate any ODEs
and contributes to the speed-up of the kinetic parameterization process.
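The alternation between the two linear sub-problems can be sketched on a toy network: a metabolite M supplied at a constant influx and consumed by one enzyme through E + M <-> EM -> E + P. The network, the rate constants, and the function name are hypothetical, and each linear solve is written out in closed form because the toy system is so small.

```python
def steady_state_by_alternating_solves(k1, k2, k3, v_in, m0=1.0,
                                       tol=1e-12, max_iter=10_000):
    """Find the steady state without integrating ODEs.

    k1, k2: binding and unbinding constants of E + M <-> EM
    k3:     catalytic constant of EM -> E + P
    v_in:   constant influx of M (must be below the capacity k3)
    """
    if not 0.0 < v_in < k3:
        raise ValueError("a steady state requires 0 < v_in < k3")
    m = m0
    for _ in range(max_iter):
        # Sub-problem 1 (enzyme balance), linear in (e, em) for fixed m:
        #   e + em = 1 and k1*m*e - (k2 + k3)*em = 0
        em = k1 * m / (k1 * m + k2 + k3)
        e = 1.0 - em
        # Sub-problem 2 (metabolite balance), linear in m for fixed (e, em):
        #   v_in - k1*e*m + k2*em = 0
        m_new = (v_in + k2 * em) / (k1 * e)
        converged = abs(m_new - m) <= tol * max(1.0, abs(m_new))
        m = m_new
        if converged:
            break
    else:
        raise RuntimeError("alternating solves did not converge")
    em = k1 * m / (k1 * m + k2 + k3)
    return m, 1.0 - em, em, k3 * em


# Hypothetical parameters; the analytical steady state is m = 0.4, flux = 2.0.
m_ss, e_ss, em_ss, flux_ss = steady_state_by_alternating_solves(
    k1=10.0, k2=1.0, k3=5.0, v_in=2.0)
print(m_ss, e_ss, em_ss, flux_ss)
```

For this toy system the iteration is a contraction whenever the influx is below the enzyme capacity, so it converges from any positive starting concentration. Larger networks replace the closed-form expressions with sparse linear solves.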
The calculated fluxes for all mutants are compared against the corresponding measured
fluxes and the sum of squared residuals (SSR), the first-, and second-order gradients are
computed by K-UPDATE. The WT reverse elementary fluxes and WT enzyme fractions
are then updated using a Newton step and the core loop of K-FIT is repeated until the
minimum deviation of predicted fluxes from experimental measurements is reached. The
local sensitivity of fluxes with respect to kinetic parameters for gradient calculation can
now readily be computed by solving a system of linear equations (see supplementary
methods) as opposed to having to perform costly forward sensitivity analysis. This enables
K-FIT to confirm that any reported solution is indeed optimal while also allowing for the
assembly of the covariance matrix from which approximate confidence intervals for the
estimated kinetic parameters can be efficiently calculated.
4.3.2. Benchmarking K-FIT against Ensemble Modeling
The computational performance of the K-FIT algorithm was first compared against
solution with a genetic algorithm (GA) operating on a population of models constructed
using the Ensemble Modeling (EM) approach. Three test models of increasing sizes were
used to assess the impact of model size on parameter estimation speed and solution
reproducibility. The first small model containing 14 reactions, 11 metabolites, and 100
kinetic parameters was adapted from the three glycolytic pathways in E. coli and was
parameterized using flux distributions from four single gene-deletion mutants (Figure
4.5a). The second medium-sized kinetic model containing 33 reactions, 28 metabolites,
and 235 kinetic parameters was adapted from a previous study (Greene et al., 2017) and
was parameterized using flux distributions from seven single gene-deletion mutants
(Figure 4.5b). The third test model describing carbon flows through central and amino acid
metabolism was adapted from the model developed by Foster et al. (2019, under
review). This model (Figure 4.5c) contains 108 reactions, 65 metabolites, and 953 kinetic
parameters and was parameterized using flux distributions from seven single gene-deletion
mutants.
Kinetic parameters were estimated using K-FIT in 9 minutes, 30 minutes, and 4 hours for
the three models, respectively. In contrast, EM required 60 hours, 726 hours, and 4,278
hours, respectively, to parameterize the same three models. Upon switching from GA to
K-FIT, the computational speed-up increased from several hundred-fold for the first model
to over 1,000-fold for the core kinetic model. This dramatic reduction in parameterization time arises from
the (largely) integration-free steady-state flux evaluation using the SSF-Evaluator step and
the fact that K-FIT traverses the variable space in a highly economical manner (i.e., Newton
steps) requiring fewer than 500 steady-state flux evaluations to identify the optimal
solution. In contrast, the GA approach relies on iterative recombination and mutation of
kinetic parameter vectors requiring as many as 20,000 steady-state flux evaluations before
finding the same solution (Khodayari et al., 2014) though without confirming optimality.
SSF-Evaluator evaluated steady-state fluxes, on average, in 0.42, 1.13, and 6.12 seconds,
for the three models, respectively, whereas, numerical integration required 3.5, 120, and
440 seconds, respectively. Bypassing integration also enables SSF-Evaluator to handle stiff
systems of ODEs arising from the large dynamic range of kinetic parameters and ensures
that steady-state fluxes are always within a mass imbalance of just 0.001 mol%. It is
important to note that Newton’s Method can only guarantee convergence to a local and not
necessarily the global minimum of SSR. As a safeguard against failure to reach the true
minimum, K-FIT was run 100 times from random initial points. For the
three test models K-FIT exhibited a best solution recovery of 98%, 93%, and 60%,
respectively. This high solution reproducibility provides confidence that K-FIT is able to
consistently converge to the lowest SSR of 0 for the small model, 8.9 for the medium-sized
model, and 1.3 for the core model, respectively. Notably, no alternate optima in the vicinity
of the best solution (within an SSR of 100) were detected for any of the three models,
implying that the best SSR minimization solution is the only good kinetic parameterization
candidate.
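The speed-up ratios can be recomputed directly from the timings quoted in this section. The short script below only restates those numbers; the dictionary names are illustrative.

```python
# Total parameterization time, converted to hours
kfit_hours = {"small": 9 / 60, "medium": 30 / 60, "core": 4.0}
em_hours = {"small": 60.0, "medium": 726.0, "core": 4278.0}

# Average time per steady-state flux evaluation, in seconds
ssf_seconds = {"small": 0.42, "medium": 1.13, "core": 6.12}
ode_seconds = {"small": 3.5, "medium": 120.0, "core": 440.0}

param_speedup = {m: em_hours[m] / kfit_hours[m] for m in kfit_hours}
eval_speedup = {m: ode_seconds[m] / ssf_seconds[m] for m in ssf_seconds}

for m in kfit_hours:
    print(f"{m:>6}: parameterization {param_speedup[m]:8.1f}x, "
          f"single evaluation {eval_speedup[m]:6.1f}x")
```

The gap between the per-evaluation and overall ratios reflects the second factor named in the text: K-FIT needs far fewer steady-state flux evaluations than the GA.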
The inherent ability of K-FIT to quickly calculate local sensitivities of predicted fluxes to
metabolite concentrations was leveraged to confirm that the steady-state concentrations
reported by SSF-Evaluator are stable. To this end, the eigenvalues of the Jacobian
matrix (Greene et al., 2017) at metabolic steady-state were calculated to confirm that the
real parts of all eigenvalues were strictly negative for all iterations and problems solved.
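A stability check of this kind can be sketched in a few lines. The example assumes NumPy is available and uses a hypothetical two-variable toy system (E + S <-> ES -> E + P with constant influx of S and state x = (S, ES)); the Jacobian entries and rate constants are illustrative and not taken from any of the models in this chapter.

```python
import numpy as np


def is_locally_stable(jacobian):
    """Return (stable, eigenvalues); stable means all real parts are strictly negative."""
    eigenvalues = np.linalg.eigvals(np.asarray(jacobian, dtype=float))
    return bool(np.all(eigenvalues.real < 0.0)), eigenvalues


# Toy system: dS/dt  = v_in - k1*S*(1 - ES) + k2*ES
#             dES/dt = k1*S*(1 - ES) - (k2 + k3)*ES
k1, k2, k3 = 10.0, 1.0, 5.0
s, es = 0.4, 0.4  # steady state of the toy system for v_in = 2.0

jac = np.array([
    [-k1 * (1.0 - es), k1 * s + k2],          # d(dS/dt)/dS,  d(dS/dt)/dES
    [k1 * (1.0 - es), -(k1 * s + k2 + k3)],   # d(dES/dt)/dS, d(dES/dt)/dES
])

stable, eigs = is_locally_stable(jac)
print("stable:", stable, "eigenvalues:", eigs)
```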
The confidence intervals for the inferred kinetic parameters revealed that elementary
kinetic parameters were generally unresolved in all three models. For the first model, only
ten elementary kinetic parameters were resolved with a standard deviation less than 10%.
For the medium and core kinetic models, 86 and 90 kinetic parameters, respectively, were resolved with a
standard deviation less than 10%. In order to investigate the origin of these wide confidence
intervals, we computed the confidence intervals for the enzyme fractions and elementary
fluxes in the WT strain that serve to anchor kinetic parameters in the K-SOLVE step. We
found that all elementary fluxes in all three models were estimated with an uncertainty of
less than 10%. In fact, the average standard deviation in the estimation of elementary fluxes
for the small, medium-sized, and core models was only 1.7%, 2.8%, and 0.97%,
respectively. However, the corresponding average standard deviation for enzyme fractions
for the three models was higher with values of 0.64, 0.27, and 1.81 mol/mol-total enzyme,
respectively. For the three models, only 4, 53, and 52 enzyme fractions were resolved with
a standard deviation less than 0.1 mol/mol-total enzyme. Since enzyme fractions are
bounded between zero and one, this implies that enzyme fractions are generally poorly
resolved for all three models. The uncertainty in the estimation of enzyme fractions
propagates to the aggregated kinetic parameters resulting in the observed wide confidence
intervals. The better resolution of enzyme fractions for larger models can be traced back to
the availability of flux data in more mutants for the medium and core models (seven and
six mutants, respectively) compared to only four mutants for the small kinetic model.
4.3.3. Parameterization of a kinetic model (k-ecoli307) for E. coli with near-genome-wide coverage
Following the application of K-FIT to three test models of increasing size, K-FIT was
deployed for the parameterization of k-ecoli307, an E. coli kinetic model with near-
genome-wide coverage similar to k-ecoli457 (Khodayari and Maranas, 2016). The
expanded model containing 307 reactions, 259 metabolites and 2,407 kinetic parameters
encompasses central metabolism, expanded amino acid, fatty acid, and nucleotide
pathways, and lumped pathways for peptidoglycan biosynthesis. Compared to k-ecoli457,
this model lacks the pathways for anaerobic metabolism and secretion of organic acids as
it was parameterized using data from aerobic growth only. Flux data for six single gene-
deletion mutants were computed using 13C-Metabolic Flux Analysis (13C-MFA) to
recapitulate the measured labeling distributions of 10 proteinogenic amino acids and two
sugar phosphates from cells grown with 1,2-13C-glucose as the carbon tracer (Long et al., 2018). This
provided a total of 1,728 MFA-determined fluxes for kinetic parameterization from the six
mutants. All 69 substrate-level regulatory interactions for 26 reactions in the expanded
model were transferred from k-ecoli457. Complete cofactor balances were not included in
k-ecoli307. Instead, ATP was modeled as an energy sink replenished from the hydrolysis
of a single phosphate group, and NADH and NADPH balances were modeled as electron
pairs transferred. This simplification is necessary to allow the total pool of ATP, NADH
and NADPH to fluctuate across mutants, which would be otherwise impossible due to
metabolite pool dependencies introduced by cofactor recycling. The expanded model was
parameterized using flux distributions from the six mutants Δpgi, Δgnd, Δzwf, Δeda, Δedd,
and Δfbp with 288 fitted fluxes per mutant. Of the 307 reactions, 100 fluxes were inferred using 13C labeling
data whereas 207 reactions were growth-coupled. Parameterization using the K-FIT
algorithm was completed in 48 hours on an Intel i7 computer (4-core processor, 2.6 GHz, 12 GB
RAM) with a minimum SSR of 131 and a solution reproducibility of 44%.
The recapitulation of the experimentally measured fluxes by K-FIT for the six mutants is
shown in Figure 4.2. All predicted fluxes were within 15 mmol/gdw-h of their
corresponding flux reported by 13C-MFA. This corresponds to a maximum deviation of
only 10% from the experimentally determined fluxes. Flux distributions for Δeda, Δedd,
and Δfbp mutants were largely unchanged from WT (Figures 4.2a, 4.2b, and 4.2c), indicating
the dispensability of the corresponding genes. In contrast, carbon flux was significantly
rerouted in response to the knockout of the pgi, zwf, and gnd genes. Glucose uptake remained
similar to WT for the Δzwf mutant but was routed completely via the EMP pathway (Figure
4.2d). The non-oxidative pentose phosphate pathway (TKT and TAL reactions) operated
in reverse to generate ribose-5-phosphate for nucleotide biosynthesis. Glucose catabolism
solely via the EMP pathway increased acetate and biomass production by 10%. The
expanded model also revealed that the loss of NADPH production via the oxidative pentose
phosphate pathway was compensated by a 90% increase in the flux through the
transhydrogenase reaction in the Δzwf strain. Glucose uptake for the Δgnd mutant
decreased by less than 10% compared to WT (Figure 4.2e). Δgnd was the only strain with
a measurable flux through the ED pathway, with 24.9 mmol/gdw-h routed through the
EDD and EDA reactions. Similar to Δzwf, the reversal of flux through the non-oxidative
pentose phosphate pathway generated the required ribose-5-phosphate for nucleotide
biosynthesis.
Of the remaining mutants, Δpgi involved the most significant flux rerouting relative to WT.
Glucose uptake was reduced by 75% compared to WT resulting in a 70% reduction in
growth rate (Figure 4.2f). Flux redirection through the glyoxylate shunt and reduction of
acetate secretion improved carbon routing towards biomass precursors, thereby increasing
the biomass yield by 22% compared to WT. Interestingly, the ED pathway was found to
carry only 2 mmol/gdw-h of flux with almost all of the carbon being metabolized via the
pentose phosphate pathway (see Figure 4.2f). In addition, an 80% reduction in flux through
glycolysis along with the absence of acetate secretion lowers overall glycolytic ATP
production. This loss is compensated by the reversal of flux through the transhydrogenase
reaction relative to WT to enable oxidation of excess NADPH generated by the oxidative
pentose phosphate pathway. k-ecoli307 captured the non-competitive inhibition of the
EDA reaction by glyceraldehyde-3-phosphate, which limits flux through the ED pathway in all
strains but Δgnd. In the Δgnd mutant, a 37-fold increase in the concentration of 6-
phosphogluconate provided the necessary driving force to overcome this product
inhibition, thereby allowing a flux of 24.9 mmol/gdw-h through the ED pathway. In
contrast, in the Δpgi mutant a two-fold increase in the concentration of glyceraldehyde-3-
phosphate maintained the inhibition of the ED pathway, which could not be overcome by a
40% increase in the concentration of 6-phosphogluconate (Hoque et al., 2011).
A total of 2,501 Km and Vmax values were subsequently assembled using the estimated
elementary kinetic parameters. Contrary to what was previously thought, the total number of
Michaelis-Menten parameters exceeds the number of elementary kinetic parameters. This
can be traced back to the fact that the number of elementary kinetic parameters per reaction
increases linearly with the number of participating species (reactants and products) (see
Appendix C.1) as opposed to quadratic scaling with Michaelis-Menten parameters
(Cleland, 1963). The number of Michaelis-Menten parameters is always two less than the
number of elementary kinetic parameters for a reversible uni-uni reaction mechanism
(Cleland, 1963), is one less than the number of elementary kinetic parameters for a bi-uni
or uni-bi reaction and always exceeds the number of elementary kinetic parameters for bi-
bi and higher order reaction mechanisms (Cleland, 1963). Since the vast majority of
reactions in k-ecoli307 are uni-bi or bi-uni type reactions involving cofactors, the number
of Michaelis-Menten parameters is underestimated in this study due to simplifications in
cofactor metabolism. If the reaction description in k-ecoli307 were expanded to account
for all cofactor forms, protons, water, and phosphate groups, the number of Michaelis-
Menten parameters would be much higher. The larger number of Michaelis-Menten
parameters compared to elementary kinetic parameters would explain the reported
correlations within Michaelis-Menten parameters leading to multicollinearity (Heijnen and
Verheijen, 2013).
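The assembly of Michaelis-Menten parameters from elementary rate constants can be illustrated for a reversible uni-uni reaction. The sketch below assumes the three-step mechanism E + A <-> EA <-> EP <-> E + P with six elementary constants (hypothetical names k1 to k6, odd indices forward), which is consistent with the statement that the uni-uni case has two fewer Michaelis-Menten parameters than elementary parameters. The closed-form expressions are the standard steady-state results for this mechanism, and the rate constants are arbitrary. NumPy is assumed to be available.

```python
import numpy as np


def mm_from_elementary(k1, k2, k3, k4, k5, k6, e_total=1.0):
    """Map six elementary constants onto four Michaelis-Menten parameters."""
    num = k2 * k4 + k2 * k5 + k3 * k5
    return {
        "Vmax_f": k3 * k5 * e_total / (k3 + k4 + k5),
        "Vmax_r": k2 * k4 * e_total / (k2 + k3 + k4),
        "Km_A": num / (k1 * (k3 + k4 + k5)),
        "Km_P": num / (k6 * (k2 + k3 + k4)),
    }


def elementary_rate(k, a, p, e_total=1.0):
    """Net rate from the steady-state enzyme balances, which are linear in (E, EA, EP)."""
    k1, k2, k3, k4, k5, k6 = k
    lhs = np.array([
        [k1 * a, -(k2 + k3), k4],   # d[EA]/dt = 0
        [k6 * p, k3, -(k4 + k5)],   # d[EP]/dt = 0
        [1.0, 1.0, 1.0],            # enzyme conservation
    ])
    e, ea, ep = np.linalg.solve(lhs, np.array([0.0, 0.0, e_total]))
    return k5 * ep - k6 * p * e


def mm_rate(mm, a, p):
    """Reversible uni-uni Michaelis-Menten rate law."""
    num = mm["Vmax_f"] * a / mm["Km_A"] - mm["Vmax_r"] * p / mm["Km_P"]
    return num / (1.0 + a / mm["Km_A"] + p / mm["Km_P"])


k = (10.0, 1.0, 5.0, 2.0, 4.0, 3.0)
mm = mm_from_elementary(*k)
v_elementary = elementary_rate(k, a=0.5, p=0.2)
v_mm = mm_rate(mm, a=0.5, p=0.2)
print(mm, v_elementary, v_mm)
```

Both descriptions give the same rate, and the four Michaelis-Menten parameters satisfy the Haldane relation Keq = Vmax_f*Km_P / (Vmax_r*Km_A) = k1*k3*k5 / (k2*k4*k6).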
The standard deviation for the Michaelis-Menten parameters was calculated by
propagating the corresponding uncertainty of the elementary kinetic parameters (Figure
4.3). Only 321 of the 570 Vmax parameters were resolved with a standard deviation of less
than 20%. 181 Vmax values had a standard deviation exceeding 100% and were deemed
unresolved. Similarly, only 1,424 Km values were resolved with a standard deviation under
20% and 443 Km parameters had standard deviation exceeding 100%. As expected, the
estimation uncertainty for the elementary kinetic parameters was propagated to Km and
Vmax (Figure 4.6a). Interestingly, the fraction of very well-resolved (standard deviation <
1%) Michaelis-Menten parameters was 60% whereas only 28% of the elementary kinetic
parameters were well resolved. This was because the nonlinear mapping of elementary kinetic
parameters onto Michaelis-Menten parameters resulted in 1,500 Michaelis-Menten
parameters assuming a value under 10 and thus having a narrow confidence interval despite
the uncertainty propagation. As was the case with the three test models, we found that all
1,129 reverse elementary fluxes in the WT network were resolved with a standard deviation
less than 1% whereas the resolution of enzyme fractions exhibited the same trend as
elementary kinetic parameters. This indicates that the wide confidence intervals for
elementary and Michaelis-Menten kinetic parameters can be traced back to the inability to pin
down WT enzyme fractions. Unlike with the test models, the average standard deviation
for enzyme fractions was only 0.04 mol/mol-total enzyme with 793 of the 933 enzyme
fractions being resolved with a standard deviation of less than 0.1 mol/mol-total enzyme.
On the other hand, 333 elementary kinetic parameters were poorly resolved because
the reactant enzyme complex corresponding to each of these parameters was
estimated to have an abundance less than 0.1 mol/mol-total enzyme, leading to a larger
relative uncertainty of estimation. In contrast, 50 of 69 kinetic parameters for inhibition of
enzyme catalysis were resolved with a standard deviation of less than 10%. Well-resolved
inhibition kinetic parameters include the inhibition of the oxidative pentose phosphate
pathway by NADPH, product inhibition of the ED pathway by glyceraldehyde-3-
phosphate, and product inhibition of cis-Aconitase by isocitrate. The narrow confidence
interval also places a non-zero lower bound on these inhibition constants implying the
essentiality of regulatory interactions in k-ecoli307 to explain the available experimental
flux datasets. It is important to note that the approach used for computing confidence
intervals generally places an upper bound on the confidence interval as it does not take into
account the nonlinear structure of the kinetic model. Accurate confidence intervals can be
computed using profile-likelihood approaches (Antoniewicz et al., 2006) and generally
result in narrower confidence intervals as commonly seen in 13C-MFA.
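The propagation of uncertainty from elementary kinetic parameters to a derived Michaelis-Menten parameter can be sketched with first-order (delta-method) error propagation. This is an assumption about the form of the calculation rather than a transcription of it. The example uses a hypothetical irreversible mechanism with Km = (k2 + k3)/k1, an invented diagonal covariance matrix, and a central-difference gradient. NumPy is assumed to be available.

```python
import numpy as np


def propagate_std(func, k, cov_k, rel_step=1e-6):
    """First-order standard deviation of func(k) given the covariance of k."""
    k = np.asarray(k, dtype=float)
    grad = np.zeros_like(k)
    for i in range(k.size):
        h = rel_step * max(abs(k[i]), 1.0)
        up, down = k.copy(), k.copy()
        up[i] += h
        down[i] -= h
        grad[i] = (func(up) - func(down)) / (2.0 * h)  # central difference
    variance = grad @ np.asarray(cov_k, dtype=float) @ grad
    return float(np.sqrt(variance)), grad


def km_irreversible(k):
    # Km of the hypothetical mechanism E + S <-> ES -> E + P
    k1, k2, k3 = k
    return (k2 + k3) / k1


k_hat = [10.0, 1.0, 5.0]
cov_k = np.diag([1.0, 0.01, 0.25])  # invented variances of k1, k2, k3
km_std, grad = propagate_std(km_irreversible, k_hat, cov_k)
print(f"Km = {km_irreversible(k_hat):.3f} +/- {km_std:.4f}")
```

Because the estimate is linear in the covariance, it ignores the curvature of the mapping, which is the limitation of local linear statistics noted above.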
The predictive capability of the model was evaluated by comparing the model prediction
of product yields in engineered strains with the corresponding experimental yield. The
genetic perturbation mutants considered for evaluation of predictive capability were not
included in the training dataset for kinetic parameterization. Of the six over-producing
strains evaluated, the kinetic model successfully predicted the yields of acetate, malate, and
lactate to within 30% of the reported experimental yield (Table 4.1). This indicates that
the genetic perturbations in the training dataset for parameterization and the regulatory
structure of the expanded kinetic model are sufficient to explain the phenotypic response of
E. coli to perturbations in the EMP pathway. The yield predictions for acetate and malate
were superior to those by k-ecoli457 due to the fact that both the training dataset for
parameterization of the expanded model and cultivation of the engineered strains were at
the same mid-exponential growth phase, whereas, the training dataset for k-ecoli457 was
generated during late exponential growth phase. The transcriptomic and fluxomic
differences between these two growth conditions limit the carbon flux through acetate
metabolism in the late exponential growth phase (Ishii et al., 2007). Unlike predictions for
core metabolism, product yields originating from peripheral metabolism were poorly
predicted by both models. This was because fluxes through peripheral metabolism are
growth-coupled in both models which limits both flux through these pathways as well as
flux variability across different mutant conditions and adversely impacts the prediction
fidelity of both kinetic models.
4.4. Discussion
This chapter details the development of K-FIT, an accelerated kinetic parameterization
algorithm based on steady-state fluxomic data. The K-FIT algorithm estimates kinetic
parameters by solving a nonlinear least-squares minimization problem to recapitulate
experimentally measured steady-state metabolite concentrations and fluxes using an
iterative loop composed of three steps: K-SOLVE, SSF-Evaluator, and K-UPDATE. The
computational savings afforded by bypassing ODE integration improve the parameterization
speed of K-FIT by over three orders of magnitude compared to the GA-based EM
procedure for a core model of metabolism containing 953 kinetic parameters. We anticipate
that these savings would become even more pronounced for larger models. The
parallelizable architecture of SSF-Evaluator improves the scalability of the procedure while
allowing compatibility with GPU-based computing architectures, which can afford significant
improvements in computation speed. The iterative scheme used in SSF-Evaluator is
inherently numerically stable, which allows it to handle stiff systems of equations with ease
while permitting reliable calculation of first- and second-order gradients. Furthermore, the
ability to calculate gradients enables local statistical analysis of inferred kinetic parameters.
The poor resolution of elementary kinetic parameters arises from the propagation of
relative uncertainty in the estimation of enzyme fractions. While enzyme fractions were
generally poorly resolved in the three test models, 85% of the enzyme fractions in k-
ecoli307 were resolved with a standard deviation of less than 0.1 mol/mol-total enzyme. It
is important to note that this uncertainty calculation is based on local linear statistics and
thus places an upper bound on uncertainty estimates. In order to obtain better estimates
of uncertainty, accurate confidence intervals (Antoniewicz et al., 2006) will have to be
constructed that account for the nonlinear relationships between elementary fluxes,
enzyme fractions, metabolite concentrations, and kinetic parameters. Accurate confidence
intervals will provide insights into resolvability of kinetic parameters for the set of
experimental data and enable the identification of informative mutants (Zomorrodi et al.,
2013) and design of experiments (Banga and Balsa-Canto, 2008) to pin down the poorly
resolved kinetic parameters. Furthermore, using accurate confidence intervals, additional
insights into reaction reversibility and importance of regulatory interactions can be
gleaned. Currently, the statistical significance of regulatory interactions can only be
evaluated using frameworks such as SIMMER (Hackett et al., 2016).
The applicability of K-FIT to large-scale models was demonstrated using an expanded
kinetic model of E. coli containing 307 reactions, 258 metabolites, and 2,407 kinetic
parameters, parameterized using fluxes elucidated using 13C-MFA. In order to avoid any
error propagation arising from flux projection from simpler models, a recently developed
two-step computational pipeline (Foster et al., 2019, under review) was used for kinetic
parameterization using 13C-labeling data. First, fluxes were elucidated for the expanded
model in the WT and six single gene-deletion strains using 13C-MFA. The elucidated
fluxes were then used to parameterize the kinetic model corresponding to the same
stoichiometric model. Although the expanded kinetic model recapitulated the fluxes better
than a core model for E. coli, product yield predictions in engineered strains did not differ
significantly compared to those predicted by the core model. This was traced back to a lack
of variability in fluxes through peripheral pathways across mutants due to growth coupling.
Since most amino acids are not catabolized by E. coli, reliable parameterization of these
pathways requires model expansion to include amino acid pool turnover by protein synthesis and
degradation. Additional fluxomic and metabolomic data from overproducing strains will
also be required to capture the link between genetic perturbations and increased flux
through peripheral metabolism as the WT strain of E. coli does not secrete any amino acids
during the mid-exponential growth phase.
Overall, this procedure highlights the data-demanding nature of the kinetic
parameterization problem. Although kinetic parameterization was performed using only
steady-state flux data, steady-state metabolite concentration data can also be included in the SSR
objective function. In all studies, enzyme levels were assumed to remain the same in
mutants as in WT, with the exception of enzymes associated with knocked-out genes,
whose levels were set to zero. Nevertheless, K-FIT allows for enzyme levels for the
mutants to be pre-specified if the information is known a priori. Ideally, one would want
to integrate allosteric with transcriptional regulation so that the enzyme concentrations in
the mutant networks can be related to the altered metabolite concentrations (Fuhrer et al.,
2017). This would ultimately enable the integration of mutant network data generated
under both genetic and environmental perturbations and improve the model's predictive capabilities.
Furthermore, the local sensitivities of fluxes and metabolite concentrations with respect to
kinetic parameters map directly to elasticity coefficients used in metabolic control analysis.
They can thus be used to calculate flux and concentration control coefficients at minimal
additional cost to inform metabolic engineering strategies.
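The step from elasticities to control coefficients can be sketched with the standard matrix relations of metabolic control analysis, C^x = -(N eps)^-1 N and C^J = I + eps C^x, where N is the stoichiometric matrix and eps = dv/dx holds the unscaled elasticities. The example is a hypothetical two-step pathway with invented elasticity values, not a calculation on k-ecoli307, and it assumes N has full row rank (no conserved metabolite pools). NumPy is assumed to be available.

```python
import numpy as np


def control_coefficients(stoich, elasticity):
    """Unscaled concentration and flux control coefficients."""
    n = np.asarray(stoich, dtype=float)       # metabolites x reactions
    eps = np.asarray(elasticity, dtype=float)  # reactions x metabolites
    conc_cc = -np.linalg.solve(n @ eps, n)         # C^x = -(N eps)^-1 N
    flux_cc = np.eye(n.shape[1]) + eps @ conc_cc   # C^J = I + eps C^x
    return conc_cc, flux_cc


# Hypothetical pathway X0 -> S -> X1 with one internal metabolite S.
stoich = [[1.0, -1.0]]          # reaction 1 makes S, reaction 2 consumes it
elasticity = [[-0.5], [1.5]]    # dv1/dS < 0 (product inhibition), dv2/dS > 0

conc_cc, flux_cc = control_coefficients(stoich, elasticity)
print("concentration control coefficients:", conc_cc)
print("flux control coefficients:", flux_cc)
```

In this example control over the pathway flux is shared 0.75 to 0.25 between the two steps, and each row of the flux control matrix sums to one because both reactions carry the same steady-state flux.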
Table 4.1: Comparison of predicted product yields (mol/mol glucose) with
experimental yields in engineered over-producing strains of E. coli. The experimental
yields and predictions by k-ecoli457 were obtained from previously published data
(Khodayari and Maranas, 2016).
Product       Perturbed Enzyme      Predicted Yield   Predicted Yield   Experimental Yield
                                    (k-ecoli307)      (k-ecoli457)
Acetate       0.1x RPI              0.93              0.2               0.75
L-Valine      0.1x THRD             0.03              0.02              0.34
Lactate       0x ACKr               1.4               1.11              1.13
Malate        0.3x PTA; 10x PPCK    0.16              0.84              0.15
Artemisinin   2x PDH                0.17              0.03              0.38
Naringenin    2x ACCOAC             0.026             0.012             0.008
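The relative prediction errors implied by Table 4.1 can be recomputed as follows. The values are copied from the table; the assignment of the first yield column to the expanded model (k-ecoli307) is an assumption based on the surrounding text, and the variable names are illustrative.

```python
# (product, predicted by expanded model, predicted by k-ecoli457, experimental)
table_4_1 = [
    ("Acetate", 0.93, 0.2, 0.75),
    ("L-Valine", 0.03, 0.02, 0.34),
    ("Lactate", 1.4, 1.11, 1.13),
    ("Malate", 0.16, 0.84, 0.15),
    ("Artemisinin", 0.17, 0.03, 0.38),
    ("Naringenin", 0.026, 0.012, 0.008),
]


def relative_error(predicted, experimental):
    return abs(predicted - experimental) / experimental


errors = {name: relative_error(pred, exp) for name, pred, _, exp in table_4_1}
within_30 = [name for name, err in errors.items() if err <= 0.30]

for name, err in errors.items():
    print(f"{name:<12} relative error = {100 * err:6.1f}%")
print("within 30% of experiment:", within_30)
```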
Figure 4.1: Overview of the core loop of the K-FIT algorithm
Figure 4.2: Flux distribution through central metabolism of the expanded model for E.
coli in (a) Δeda, (b) Δedd, (c) Δfbp, (d) Δzwf, (e) Δgnd, and (f) Δpgi mutant strains.
Reactions representing metabolite flows between central and peripheral metabolism are
indicated using green arrows. Fluxes elucidated using 13C-MFA are shown in green and
the corresponding flux prediction by the expanded kinetic model is shown in brown.
Reactions corresponding to the knocked-out genes in each mutant strain are indicated using
red arrows. Flux measurements for PFK and FBP were not fitted due to poor resolution
by 13C-MFA.
[Panels (a) to (f): flux maps of central metabolism (EMP, pentose phosphate, ED, TCA cycle, and glyoxylate shunt reactions) for the six mutant strains, annotated with 13C-MFA and model-predicted flux values.]
Figure 4.3: Uncertainty in estimation of Michaelis-Menten kinetic parameters (Km and
Vmax) in k-ecoli307. The width of the confidence interval refers to the standard deviation
of the estimated kinetic parameter determined from the covariance matrix.
Figure 4.4: Overview of the K-FIT algorithm showing the flow of information between
various components.
Figure 4.5: Test models used to benchmark the performance of K-FIT against GA-
based EM procedure. (a) Small model containing 14 reactions and 11 metabolites. (b)
Medium-sized model containing 33 reactions and 28 metabolites. (c) Core model
containing 108 reactions and 65 metabolites. Reactions knocked out in the single gene-
deletion mutants are indicated using a red X.
Figure 4.6: Uncertainty in estimation of (a) elementary kinetic parameters and (b) WT
enzyme fractions in k-ecoli307. Width of the confidence interval refers to the standard
deviation computed from the covariance matrix.
Chapter 5
Summary and future work
5.1. Summary
This thesis introduces three important tools that enable the construction and deployment of
large-scale predictive kinetic models of metabolism. Identification of kinetic parameters
for a reaction requires the knowledge of (a) flux through the reaction under different
conditions, and (b) concentration of the various species involved in the reaction (reactants,
products, activators, and inhibitors). Since in vivo fluxes are not directly measurable,
indirect approaches must be applied. The most reliable approach involves tracing carbons,
hydrogens, and oxygens from a nutrient source (usually a carbon source such as glucose in
heterotrophs or CO2 in photoautotrophs) to various intracellular metabolites using a stable
isotope such as 13C, 2H, or 18O. Pathway-specific bond breaks and bond formations alter
the labeling distributions of downstream metabolites, and the relative contributions of
various pathways can be estimated using nonlinear regression techniques. This technique
was initially applied to small network models comprising central metabolism only due
to the high computational cost associated with large-scale models, the assumed
intractability of existing modeling frameworks (Choi and Antoniewicz, 2019), the assumed
sufficiency of core metabolic models, and limited availability of reaction atom mapping
information for peripheral metabolic pathways.
In order to elucidate fluxes at the genome-scale in E. coli, first, an atom mapping model,
imEco726, providing a comprehensive inventory of carbon paths was constructed using
the CLCA algorithm (Kumar and Maranas, 2014) and manually curated. Tractability of the
EMU algorithm for the genome-scale model was confirmed based on the fact that a ten-
fold increase in the number of reactions only resulted in a five-fold increase in the number
of EMUs (Gopalakrishnan and Maranas, 2015a). The major computational bottleneck was
identified to be the construction of accurate confidence intervals for the estimated fluxes.
Since the computation of confidence intervals for a single flux in imEco726 takes as much
as 30 minutes, the total computation time can reach 10 days on an HPC cluster. In
order to reduce this computation time, an efficient algorithm was developed that leverages the
topological features of the stoichiometric network to identify the minimum set
of fluxes for which confidence intervals must be computed. This algorithm identifies all the
reactions that are resolved using 13C data and are not coupled to an external flux
measurement. The number of fluxes for which confidence intervals must be constructed was
reduced by nearly 75%, allowing all confidence intervals to be completed in just 3 days.
The confidence intervals for the remaining fluxes are computed using FVA (Mahadevan
and Schilling, 2003). The tools developed as a part of this study enabled the assessment of
the caveats associated with the practice of projecting fluxes elucidated using a core model
onto larger networks for downstream applications such as OptForce (Ranganathan et al.,
2010) and kinetic parameterization. Loss of feasible solutions associated with
simplifications in the core model propagated to the GSM model upon flux projection
resulting in an average 56% reduction in the width of confidence intervals for 90% of all
reactions in the GSM model. This propagation of simplifications reveals the dangers
associated with flux projection and reaffirms the need for direct flux elucidation using
expanded and comprehensive metabolic and mapping models.
For isotopic instationary MFA, the computational bottleneck was identified to be the
simulation of metabolite labeling dynamics. To this end, the existing exponential
integration scheme was improved upon by deriving analytical update formulae for the
transition matrices as opposed to numerical computation to accelerate the simulation of
labeling dynamics while decreasing memory requirements. This enabled both the
simulation of metabolite labeling dynamics as well as forward sensitivity analysis using
larger networks to predict metabolite labeling distributions and sensitivity to intracellular
fluxes at various time points. The use of this new algorithm decreased the time required for
ODE integration by as much as 48%. The improved algorithm was deployed to elucidate
fluxes in Synechocystis PCC 6803 under photoautotrophic growth conditions. Flux
elucidation revealed three key insights. First, carbon flux distribution in Synechocystis
supported maximum carbon routing towards biomass with minimal routing towards
byproducts such as organic acids and glycogen. Second, Synechocystis is unable to recycle
fixed CO2 that is oxidized in anabolic reactions and must therefore rely on bifurcated
pathway topologies in the TCA cycle and serine metabolism to minimize loss of fixed
carbon. Finally, Synechocystis employs an unconventional pathway for regeneration of
pentose phosphates in the CBB cycle using the TAL bypass as an alternative to FBPase.
Having developed the tools for reliable flux elucidation at the genome-scale, the next
requirement for the construction of predictive models of metabolism is an efficient
algorithm for the identification of kinetic parameters corresponding to all enzyme-
catalyzed reactions. In response to the long computation times that preclude any follow-up
statistical inference of estimated kinetic parameters, a novel decomposition-based kinetic
parameterization algorithm, K-FIT, was developed. Using a two-pronged strategy, K-FIT
achieves a 1,000-fold speed-up in parameterization. First, K-FIT bypasses numerical
integration for elucidation of steady-state metabolite concentrations by solving a system of
bilinear algebraic equations using a fixed point iteration scheme to iterate between two
smaller linearized sub-problems until steady-state is found. The computed steady-state
concentrations are then used to evaluate steady-state fluxes and local sensitivities of steady-
state fluxes to kinetic parameters. Using this information, the lack-of-fit from experimental
data and the first- and second-order gradients are computed to indirectly update kinetic
parameters. By traversing the feasible kinetic parameter space using steps informed by
gradient information, an optimal solution is usually found within 500 iterations,
contributing to the computational speed-up relative to the currently used meta-heuristic
methods such as GA or particle-swarm optimization. The applicability of the K-FIT
algorithm to large-scale kinetic models was then demonstrated by parameterizing a near-
genome-scale kinetic model for E. coli, k-ecoli307, with fluxes elucidated using 13C-MFA
in six single gene-deletion mutants.
5.2. Completed and ongoing research
A key requirement for flux elucidation using 13C-MFA is the construction of curated
genome-scale carbon mapping models. As a part of this thesis, mapping models for two
model organisms were constructed: imEco726 for E. coli and imSyn617 for Synechocystis
PCC 6803. imSyn617 served as the template for the construction of imSyu593, the
mapping model for the fast-growing cyanobacterium Synechococcus elongatus UTEX
2973 (Hendry et al., 2019). imSyu593 built upon imSyn617 by adding the phosphoketolase
pathway and reactions from calomide biosynthesis allowing E4P recycling from peripheral
metabolism. Flux elucidation revealed that Synechococcus favored the use of the
phosphoketolase pathway over pyruvate dehydrogenase for acetyl-CoA production. In
addition to this, bifurcated topology in serine metabolism was not observed and the
photorespiratory pathway was complete due to the ability to re-fix carbons oxidized in
anabolic metabolism. The ability to reincorporate oxidized carbons minimized carbon loss
in the form of CO2 and allowed Synechococcus to achieve a near-perfect routing of all
carbons towards biomass production. This, in conjunction with faster CO2 uptake and
higher light tolerance facilitated faster growth in Synechococcus compared to
Synechocystis.
The ability to trace carbons through peripheral metabolism opens up the possibility of
including metabolite labeling distributions from peripheral metabolism to infer flux
distributions within central metabolism. These additional measurements include labeling
distributions of ATP, ADP, AMP, NADP, Coenzyme-A, glucosamine, and N-
acetylglucosamine. Some of these metabolites have already been used for flux elucidation
using an expanded metabolic network for E. coli (McCloskey et al., 2016a) to improve the
precision of flux estimates and resolve exchange fluxes. Currently, carbon mapping
information from imEco726 is being used to elucidate fluxes in cellobiose-grown Clostridium
thermocellum using a combination of amino acid labeling data measured using GC-MS
and LC-MS-derived labeling distributions for central metabolites, ATP, coenzyme-A, and
N-acetyl-glucosamine.
More recently, flux elucidation using 13C-MFA and kinetic parameterization using K-FIT
have been combined into a streamlined computational pipeline for the construction of a
core kinetic model for E. coli (k-ecoli74) using amino acid labeling distributions measured
from seven single gene-deletion mutants from upper glycolysis (Foster et al., 2019 (Under
Review)). Kinetic parameterization was carried out in two stages. First, fluxes and
confidence intervals for all reactions from central metabolism in E. coli at isotopic steady-
state were elucidated using the techniques and mapping model established in Chapter 2.
Following this, kinetic parameterization of the same stoichiometric model was performed
using K-FIT as described in Chapter 4 of this thesis. In addition to constructing a predictive
kinetic model for central metabolism in E. coli, this study also assessed the adverse effects
of flux projection on accuracy of kinetic parameterization and its predictive capabilities.
Finally, this study also demonstrated that accurate in silico emulation and training of
kinetic model using data derived from conditions reflecting the growth conditions of
engineered strains contributes to better agreement of model predictions with
experimentally measured product yields in untrained genetic conditions.
5.3. Future directions
Being able to quickly parameterize kinetic models using K-FIT now opens up the
possibility of computing accurate confidence intervals within reasonable time. The current
practice with kinetic parameterization involves reporting the best solution with the lowest
SSR without performing any goodness-of-fit tests on the regressed parameters. Chapter 4
reports that the poor resolvability of kinetic parameters can be traced back to large relative
uncertainty corresponding to the predicted enzyme complex concentrations. Since the
accuracy of the calculated standard deviations hinges on the validity of the linearization
approximation (i.e., close to the optimum), it is important to construct accurate confidence
intervals that provide a clearer picture of resolvability of enzyme fractions and elementary
fluxes, which is required for designing meaningful experiments to resolve the unresolved
parameters. A meaningful next step would be the development of a framework that can
also report the variance of predicted fluxes in various mutant conditions given the
uncertainty in kinetic parameter estimation so that the model predictions are more
informative than a mere point estimate as is currently reported.
Currently, kinetic models assume that the total enzyme concentrations do not fluctuate
between different mutants. While this is generally true with the assessed gene-deletion
mutants, transcriptional changes have been widely reported in environmental perturbations
(Fuhrer et al., 2017). To this end, it would be of interest to construct a statistical model that
relates the total enzyme abundance to intracellular metabolite concentrations and global
transcription regulators that can capture the transcriptional differences arising from
changes to environmental conditions. Currently, the constructed kinetic models have good
predictive capabilities only in the growth conditions in which they are trained. The ability
to capture proteomic fluctuations will extend the predictive capabilities of trained kinetic
models to other growth conditions such as late-exponential growth phase, stationary phase,
as well as anaerobic metabolism, which are of interest for industrial production of valuable
chemicals such as succinate and 2,3-butanediol. In addition to improving predictions, the
ability to predict proteomic fluctuations will enable kinetic models to reaffirm the enzyme
cost-minimization hypothesis (Noor et al., 2016) that establishes a link between protein
cost and thermodynamics and can be an important factor in determining pathway usage in
engineered organisms.
The state of the art in computational strain design is the k-OptForce algorithm which
combines a kinetic description of central metabolism with a stoichiometric description of
peripheral metabolism. The separation of kinetic and stoichiometric description is
implemented due to limited availability of kinetic descriptions for peripheral metabolism.
As a result of this, regulatory interactions within peripheral metabolism such as feedback
inhibitions in fatty acid biosynthesis, shikimate pathway, and purine biosynthesis are not
modeled by k-OptForce. Expanding the scope of the component kinetic model to include
regulatory interactions from peripheral metabolism will also expand the repertoire of
meaningful interventions that can be identified by k-OptForce.
Appendix A
Flux elucidation at isotopic steady-state
A.1. Predicting labeling patterns
Decomposing the network using the EMU algorithm provides an exhaustive list of
metabolite fragments and reactions involved in predicting the labeling pattern of target
metabolite fragments for a given tracer input and flux distribution. The mass balance for a
reaction within the EMU network at isotopic and metabolic steady state shown below is
described as:
$$\sum_{i} v_i M_{1,2}^{i} - \left(\sum_{i} v_i\right) M_{1,2} = 0 \tag{1}$$
If M3 is a substrate to the network, then the above equation can be re-written as,
$$\sum_{i=1,2} v_i M_{1,2}^{i} - \left(\sum_{i} v_i\right) M_{1,2} = -v_3 M_{1,2}^{3} \tag{2}$$
Based on equations (1) and (2), the mass balance for all the reactions of a particular EMU
size can be expressed as:
𝑨𝑿 = 𝑩𝒀 (3)
where X and Y represent the vectors of balanced and input EMUs, respectively, and A
and B are the corresponding coefficient matrices, which are functions of fluxes. Since A is
a square matrix, X can be obtained by inverting A and multiplying the inverse with the r.h.s. of
equation (3). The set of target metabolite fragments, x, is a subset of X, and their
corresponding mass isotopomer distributions (MIDs) can be obtained by solving equation
(3). The MIDs estimated above need to be corrected for uncorrected pool dilutions, and
additional label dilution arising from sparged CO2 (Leighty and Antoniewicz, 2012, 2013).
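As an illustration of solving equation (3), the following is a minimal sketch on a hypothetical two-EMU toy network (the fluxes, the EMU names B1, C1, A1, S1, and the helper `solve_emu_block` are assumptions for illustration and are not taken from imEco726). It uses a linear solve rather than an explicit inverse, which gives the same X.

```python
import numpy as np

def solve_emu_block(A, B, Y):
    """Solve A X = B Y for the balanced-EMU MIDs X (one row per EMU)."""
    # np.linalg.solve factorizes A instead of forming an explicit inverse.
    return np.linalg.solve(A, B @ Y)

# Hypothetical fluxes for a toy network.
v1, v2, v3, v4 = 3.0, 6.0, 2.0, 1.0

# Balanced size-1 EMUs X = [B1, C1]; input EMUs Y = [A1, S1].
# B1 balance: v1*A1 + v4*S1 + v3*C1 - (v1 + v3 + v4)*B1 = 0
# C1 balance: v2*B1 - v2*C1 = 0
A = np.array([[-(v1 + v3 + v4), v3],
              [v2, -v2]])
B = np.array([[-v1, -v4],
              [0.0, 0.0]])

# Rows are MIDs [M+0, M+1]: A1 is fully labeled, S1 is unlabeled.
Y = np.array([[0.0, 1.0],
              [1.0, 0.0]])

X = solve_emu_block(A, B, Y)
print(X)  # each row is the MID of one balanced EMU
```

In this toy case both balanced EMUs inherit a 3:1 mixture of the labeled and unlabeled inputs, and every row of X sums to one.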
A.2. Least-Squares NLP
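$$\min_{v} \; \sum_{i=1}^{N} \left( \frac{x(v)_i^{p} - x_i^{m}}{\sigma_i} \right)^{2}$$
$$\text{s.t.} \quad \mathbf{S} \cdot \mathbf{v} = 0$$
$$v_j^{LB} \le v_j \le v_j^{UB}$$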
In the above formulation,
$x(v)_i^{p}$ is the predicted labeling pattern of fragment i for a given flux distribution, v.
$x_i^{m}$ is the experimentally measured labeling pattern for fragment i.
$\sigma_i$ is the standard error of measurement for fragment i.
S is the stoichiometry matrix.
$v_j^{LB}$ is the lower bound on flux $v_j$.
$v_j^{UB}$ is the upper bound on flux $v_j$.
A.3. Implementation
The equality constraint in the above formulation can be eliminated by expressing the vector
of fluxes, 𝒗, in terms of the set of free fluxes, 𝒖, using:
𝒗 = 𝑵.𝒖
Where, 𝑵 is the rational basis for the null space of 𝑺.
Since a non-negativity constraint is imposed on all fluxes, the equality constraint can be
replaced with the inequality constraint:
𝑵.𝒖 ≥ 0
If bounds for 𝒗 are available, the above inequality constraint can be modified to account
for the lower and upper bounds, 𝒗𝐿𝐵, and 𝒗𝑈𝐵:
𝑵.𝒖 ≥ 𝒗𝐿𝐵
𝑵.𝒖 ≤ 𝒗𝑈𝐵
This transforms the NLP problem to:
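$$\min_{u} \; \sum_{i=1}^{N} \left( \frac{x(\mathbf{u})_i^{p} - x_i^{m}}{\sigma_i} \right)^{2}$$
$$\text{s.t.} \quad \mathbf{N} \cdot \mathbf{u} \ge \mathbf{v}^{LB}$$
$$\mathbf{N} \cdot \mathbf{u} \le \mathbf{v}^{UB}$$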
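The null-space parameterization above can be sketched as follows. This is a minimal illustration, assuming a hypothetical four-reaction toy network and a hand-written helper `rational_null_space` that builds the rational basis from the reduced row echelon form of S using exact fractions; it is not the implementation used in this work.

```python
from fractions import Fraction

def rational_null_space(S):
    """Return (basis, free): null-space basis vectors of S built from its
    reduced row echelon form, and the column indices acting as free fluxes."""
    rows = [[Fraction(x) for x in row] for row in S]
    n_rows, n_cols = len(rows), len(rows[0])
    pivots, r = [], 0
    for c in range(n_cols):
        if r == n_rows:
            break
        pivot = next((i for i in range(r, n_rows) if rows[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column, so it is a free flux
        rows[r], rows[pivot] = rows[pivot], rows[r]
        pv = rows[r][c]
        rows[r] = [x / pv for x in rows[r]]
        for i in range(n_rows):
            if i != r and rows[i][c] != 0:
                f = rows[i][c]
                rows[i] = [a - f * b for a, b in zip(rows[i], rows[r])]
        pivots.append(c)
        r += 1
    free = [c for c in range(n_cols) if c not in pivots]
    basis = []
    for f in free:
        vec = [Fraction(0)] * n_cols
        vec[f] = Fraction(1)
        for row_idx, p in enumerate(pivots):
            vec[p] = -rows[row_idx][f]
        basis.append(vec)
    return basis, free

# Hypothetical network: v1: -> A, v2: A -> B, v3: A -> out, v4: B -> out.
S = [[1, -1, -1, 0],
     [0, 1, 0, -1]]
basis, free = rational_null_space(S)

# N has one column per basis vector, so that v = N.u
N = [[vec[i] for vec in basis] for i in range(4)]
u = [2, 3]  # values of the free fluxes (v3, v4)
v = [sum(N[i][k] * u[k] for k in range(len(u))) for i in range(4)]

# The bound constraints become N.u >= v_LB and N.u <= v_UB.
v_LB, v_UB = [0, 0, 0, 0], [10, 10, 10, 10]
feasible = all(lo <= vi <= hi for lo, vi, hi in zip(v_LB, v, v_UB))
print(free, v, feasible)
```

Any choice of u yields a flux vector that satisfies S.v = 0 by construction, so only the bound inequalities remain as constraints.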
The above minimization problem can be solved using the fmincon function within the
Optimization Toolbox in MATLAB™. Among the different algorithm options, the
interior-point algorithm accepts a user-supplied Hessian, which can be computed using the
first-order Taylor series expansion (Antoniewicz et al., 2006) of the objective function to
yield:
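$$\mathbf{H} = \left(\frac{d\mathbf{x}}{d\mathbf{u}}\right)^{T} \mathbf{W}^{-1} \left(\frac{d\mathbf{x}}{d\mathbf{u}}\right)$$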
Where W is the covariance matrix serving as a weighting matrix for the least-squares
minimization problem.
The derivative of the estimated MIDs with respect to the free fluxes can be estimated using
the following equation.
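$$\left(\frac{d\mathbf{x}}{d\mathbf{u}}\right)^{T} = \left(\frac{d\mathbf{x}}{d\mathbf{v}}\right)^{T} \left(\frac{d\mathbf{v}}{d\mathbf{u}}\right)$$
$$\left(\frac{d\mathbf{x}}{d\mathbf{u}}\right)^{T} = \left(\frac{d\mathbf{x}}{d\mathbf{v}}\right)^{T} \mathbf{N}$$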
To obtain the derivative of MIDs with respect to all the fluxes within the network, we have
to differentiate equation (3), and rearrange it to obtain the following expression.
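$$\frac{d\mathbf{X}}{d\mathbf{v}} = \mathbf{A}^{-1} \left( \frac{d\mathbf{B}}{d\mathbf{v}} \mathbf{Y} + \mathbf{B} \frac{d\mathbf{Y}}{d\mathbf{v}} - \frac{d\mathbf{A}}{d\mathbf{v}} \mathbf{X} \right)$$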
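Differentiating A X = B Y with the product rule gives (dA/dv) X + A (dX/dv) = (dB/dv) Y + B (dY/dv). The sketch below checks the resulting sensitivity against central finite differences on a hypothetical toy block in which the inputs Y are constant substrate MIDs, so the B (dY/dv) term is zero. The network, fluxes, and function names are illustrative assumptions.

```python
import numpy as np

# Constant input MIDs (rows [M+0, M+1]), so dY/dv = 0 in this toy case.
Y = np.array([[0.0, 1.0],
              [1.0, 0.0]])

def build(v):
    """A(v) and B(v) for the toy block; both are linear in the fluxes."""
    v1, v2, v3, v4 = v
    A = np.array([[-(v1 + v3 + v4), v3],
                  [v2, -v2]])
    B = np.array([[-v1, -v4],
                  [0.0, 0.0]])
    return A, B

def solve_X(v):
    A, B = build(v)
    return np.linalg.solve(A, B @ Y)

def dX_dv(v, k):
    """Analytic sensitivity of X to flux v[k] from differentiating A X = B Y."""
    A, _ = build(v)
    X = solve_X(v)
    # A and B are linear and homogeneous in v, so evaluating build() at the
    # k-th unit vector returns dA/dv_k and dB/dv_k.
    dA, dB = build(np.eye(4)[k])
    return np.linalg.solve(A, dB @ Y - dA @ X)

def dX_dv_fd(v, k, h=1e-6):
    """Central finite-difference estimate of the same sensitivity."""
    e = np.eye(4)[k]
    return (solve_X(v + h * e) - solve_X(v - h * e)) / (2 * h)

v = np.array([3.0, 6.0, 2.0, 1.0])  # hypothetical fluxes
analytic = [dX_dv(v, k) for k in range(4)]
numeric = [dX_dv_fd(v, k) for k in range(4)]
print(analytic[0])
```

The comparison only agrees when the (dA/dv) X term is included, since A depends on the fluxes.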
A.4. Estimation of confidence intervals
With the EMU network spanning a much smaller portion of the overall metabolic network,
the original procedure for estimation of confidence intervals was modified to enhance the
speed of estimation. While the range estimation procedure (Antoniewicz et al., 2006) of an
individual flux was not modified, the set of fluxes whose ranges need to be directly
determined was reduced based on flux coupling properties. The procedure is as follows:
Step 1: Define the initial set of fluxes (v) as the union of the sets of fluxes
involved in EMU balances and those corresponding to an extracellular
measured flux.
Step 2: Identify all fluxes coupled to an extracellular measurement and
eliminate these fluxes from v.
Step 3: For each flux, vi, within v, identify and eliminate fluxes, vj, that are
fully coupled to vi.
Step 4: Estimate the 95% confidence interval for all the fluxes remaining in v.
Step 5: Using the estimated confidence intervals as flux bounds, perform an
FVA to estimate all the other flux ranges within the metabolic network.
While performing FVA, only the net flux of reversible reactions was considered due to the
fact that exchange fluxes which are not involved in EMU balances cannot be resolved by
13C-MFA.
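Step 5 of the procedure above can be sketched with a generic linear-programming solver. The example below is a minimal illustration, assuming SciPy's `linprog` and a hypothetical four-reaction network in which the confidence intervals of two fluxes have already been estimated; the function name `fva` and all numerical values are assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def fva(S, lb, ub):
    """Flux variability analysis: minimize and maximize each flux subject to
    S.v = 0 and lb <= v <= ub. Returns an [n_v x 2] array of flux ranges."""
    n = S.shape[1]
    b_eq = np.zeros(S.shape[0])
    bounds = list(zip(lb, ub))
    ranges = np.zeros((n, 2))
    for j in range(n):
        c = np.zeros(n)
        c[j] = 1.0
        lo = linprog(c, A_eq=S, b_eq=b_eq, bounds=bounds, method="highs")
        hi = linprog(-c, A_eq=S, b_eq=b_eq, bounds=bounds, method="highs")
        ranges[j] = lo.fun, -hi.fun
    return ranges

# Hypothetical network: v1: -> A, v2: A -> B, v3: A -> out, v4: B -> out.
S = np.array([[1.0, -1.0, -1.0, 0.0],
              [0.0, 1.0, 0.0, -1.0]])

# Confidence intervals estimated for v1 and v3 are used as bounds;
# v2 and v4 are left wide open and resolved by FVA.
lb = [9.0, 0.0, 2.0, 0.0]
ub = [11.0, 100.0, 3.0, 100.0]

ranges = fva(S, lb, ub)
print(ranges)
```

Here the ranges of v2 and v4 follow from the mass balances alone once the intervals of v1 and v3 are fixed, which is the property the reduced procedure relies on.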
Appendix B
Flux elucidation procedure for isotopic instationary MFA
B.1. Least-squares NLP for flux and pool size estimation
Cellular growth with a 13C-labeled substrate results in the incorporation of labeled atoms
into various downstream metabolites causing the synthesis of molecules with different
masses based on the extent of 13C-incorporation. These mass shifts are quantified using
NMR spectroscopy or mass spectrometry (MS) following separation of metabolites using
chromatography (GC or LC). During MS, metabolites can be fragmented as a consequence
of electron impact thus providing information about labeling distributions of both the
complete metabolite as well as its fragments. These measured fragments (including a
partial or whole metabolite) are represented as mass-isotopomer distribution vectors
(MDVs) which are row vectors of the fractional abundance of molecules of various masses
according to their 13C labeling distribution. They are denoted as $\mathbf{x}_i^{meas}$ and have a
measurement variance $\boldsymbol{\Sigma}_i$. A transient labeling experiment involves sampling metabolites
at various time points during the isotopic instationary period due to which the labeling
distributions depend on both flux distribution (v) at metabolic steady-state as well as
intracellular metabolite pool sizes (c) (Noh et al., 2006). The objective of 13C-MFA is to
identify a suitable flux distribution and pool sizes consistent with $\mathbf{x}_i^{meas}$. While the solution
to the forward problem of estimating labeling distribution with known fluxes and pool sizes
can be obtained easily, either by solving a system of algebraic equations in case of isotopic
steady-state or a system of ordinary differential equations (ODEs) under transient labeling
conditions, the inverse problem is nonlinear and non-convex. As a result of this, fluxes and
pool sizes at metabolic steady-state must be obtained as the solution of a variance-weighted
least-squares non-linear programming (NLP) problem that minimizes the sum of squared
residuals (SSRES), i.e., the sum of squared deviations of predicted metabolite
labeling distributions ($\mathbf{x}_i^{pred}$) from the corresponding experimental measurements. Note
that the procedure for estimating $\mathbf{x}_i^{pred}$ given a flux distribution and pool sizes is described
in the next subsection. In addition to labeling distributions, extracellular flux measurements
such as substrate uptake rate, growth rate, and product yields can also be measured
(corresponding to $v_j^{meas}$) and can be included in SSRES.
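$$\min_{v,c} \; SSRES = \sum_{i=1}^{P} \left(\mathbf{x}_i^{pred}(\mathbf{v},\mathbf{c}) - \mathbf{x}_i^{meas}\right) \mathbf{W}_i \left(\mathbf{x}_i^{pred}(\mathbf{v},\mathbf{c}) - \mathbf{x}_i^{meas}\right)^{T} + \sum_{j=1}^{Q} \left( \frac{v_j^{pred} - v_j^{meas}}{\sigma_j} \right)^{2}$$
$$\text{s.t.} \quad \mathbf{S} \cdot \mathbf{v} = 0$$
$$\mathbf{v}^{LB} \le \mathbf{v} \le \mathbf{v}^{UB}$$
$$\mathbf{c} \ge 0$$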
The following quantities participate in formulation SSRES with $n_v$ fluxes and $n_c$
metabolite pool sizes:
P is the number of metabolite fragments whose labeling distribution is quantified by MS.
Q is the number of extracellular fluxes (substrate uptake, growth rate, product yields)
measured.
$\mathbf{v}$ is an $[n_v \times 1]$ vector of metabolic fluxes with reversible reactions decomposed into
separate forward and backward reactions.
$\mathbf{c}$ is an $[n_c \times 1]$ vector of pool sizes.
$\mathbf{x}_i^{meas}$ is the $[1 \times (k+1)]$ experimentally measured labeling distribution vector of
fragment i containing k carbons. $\mathbf{x}_i^{meas}$ contains (k+1) columns to account for the fact that
fragment i can contain from zero to k labeled carbons.
$\mathbf{x}_i^{pred}(\mathbf{v},\mathbf{c})$ corresponds to the predicted labeling distribution vector of fragment i.
$\mathbf{x}_i^{pred}(\mathbf{v},\mathbf{c})$ has the same dimensions as $\mathbf{x}_i^{meas}$. It is related implicitly to
intracellular fluxes v and pool sizes c, and the procedure for calculating labeling
distributions for a given flux distribution and metabolite pool sizes is described in the next
subsection.
$\mathbf{W}_i$ is a $[(k+1) \times (k+1)]$ diagonal matrix of weights equal to $\boldsymbol{\Sigma}_i^{-1}$.
$v_j^{meas}$ corresponds to measured extracellular fluxes and product yields with standard
deviation $\sigma_j$.
$v_j^{pred}$ are the predicted (calculated) extracellular fluxes.
S is the stoichiometry matrix.
$\mathbf{v}^{LB}$ and $\mathbf{v}^{UB}$ denote the lower and upper bounds on fluxes $\mathbf{v}$, respectively, obtained using
FVA.
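The SSRES objective defined above can be evaluated as in the following minimal sketch. The function name `ssres`, the single three-mass fragment, and all numerical values are hypothetical and only illustrate how the labeling and flux terms are combined.

```python
import numpy as np

def ssres(x_pred, x_meas, variances, v_pred, v_meas, sigma):
    """Variance-weighted sum of squared residuals over P fragments and Q fluxes.

    x_pred, x_meas, variances: lists of length P; entry i holds the (k+1)
    values of fragment i (MDV entries or their measurement variances).
    v_pred, v_meas, sigma: length-Q sequences for the extracellular fluxes."""
    total = 0.0
    for xp, xm, var in zip(x_pred, x_meas, variances):
        r = np.asarray(xp) - np.asarray(xm)
        W = np.diag(1.0 / np.asarray(var))  # W_i is the inverse of Sigma_i
        total += r @ W @ r
    z = (np.asarray(v_pred) - np.asarray(v_meas)) / np.asarray(sigma)
    return total + np.sum(z ** 2)

# One hypothetical two-carbon fragment (three mass isotopomers) and one flux.
x_pred = [[0.50, 0.30, 0.20]]
x_meas = [[0.52, 0.29, 0.19]]
variances = [[1e-4, 1e-4, 1e-4]]   # standard deviation of 0.01 per entry
value = ssres(x_pred, x_meas, variances, v_pred=[10.2], v_meas=[10.0], sigma=[0.1])
print(value)
```

The labeling residuals contribute 6 and the flux residual contributes 4 in this example, for a total of about 10.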
The above NLP structure is similar to the NLP formulation used for steady-state MFA
(Antoniewicz et al., 2006; Gopalakrishnan and Maranas, 2015a). The equality constraints
in the above NLP ($\mathbf{S} \cdot \mathbf{v} = 0$) can be transformed into inequality constraints capturing the
flux bounds ($\mathbf{v}^{LB}$ and $\mathbf{v}^{UB}$) using a null-space projection representing $n_v$ fluxes v in terms
of $n_u$ free fluxes u (Wiechert et al., 1997). This enables a reduction in the number of
dimensions in the search space (Antoniewicz et al., 2006; Gopalakrishnan and Maranas,
2015a).
𝒗 = 𝑵.𝒖 (1)
N is an $[n_v \times n_u]$ matrix whose columns represent the basis for the null space of S derived
from the reduced row echelon form of S. The number of columns in N corresponds to the
number of degrees of freedom of the null space of S. As a consequence, the independent
variables comprising $n_u$ free fluxes and $n_c$ pool sizes can be combined into an $[n_p \times 1]$
vector of parameters p such that $\mathbf{p} = [\mathbf{u}^{T} | \mathbf{c}^{T}]^{T}$. The least-squares NLP is now minimized
over $n_p$ parameters and can be re-written as:
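$$\min_{\mathbf{p}} \; SSRES = \sum_{i=1}^{P} \left(\mathbf{x}_i^{pred}(\mathbf{p}) - \mathbf{x}_i^{meas}\right) \mathbf{W}_i \left(\mathbf{x}_i^{pred}(\mathbf{p}) - \mathbf{x}_i^{meas}\right)^{T} + \sum_{j=1}^{Q} \left( \frac{v_j^{pred}(\mathbf{p}) - v_j^{meas}}{\sigma_j} \right)^{2}$$
$$\text{s.t.} \quad \mathbf{N} \cdot \mathbf{u} \ge \mathbf{v}^{LB}$$
$$\mathbf{N} \cdot \mathbf{u} \le \mathbf{v}^{UB}$$
$$\mathbf{c} \ge 0$$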
In vector notation, SSRES is represented as:
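$$SSRES = \left(\mathbf{x}^{pred}(\mathbf{p}) - \mathbf{x}^{meas}\right) \mathbf{W} \left(\mathbf{x}^{pred}(\mathbf{p}) - \mathbf{x}^{meas}\right)^{T}$$
In the above equation, $\mathbf{x}^{pred}(\mathbf{p})$ and $\mathbf{x}^{meas}$ are assembled as follows:
$$\mathbf{x}^{pred}(\mathbf{p}) = \left[ \mathbf{x}_1^{pred}(\mathbf{p}) \,|\, \mathbf{x}_2^{pred}(\mathbf{p}) \,|\, \dots \,|\, \mathbf{x}_P^{pred}(\mathbf{p}) \,|\, v_1^{pred}(\mathbf{p}) \,|\, v_2^{pred}(\mathbf{p}) \,|\, \dots \,|\, v_Q^{pred}(\mathbf{p}) \right],$$
$$\mathbf{x}^{meas} = \left[ \mathbf{x}_1^{meas} \,|\, \mathbf{x}_2^{meas} \,|\, \dots \,|\, \mathbf{x}_P^{meas} \,|\, v_1^{meas} \,|\, v_2^{meas} \,|\, \dots \,|\, v_Q^{meas} \right],$$
W is the combined $[n_m \times n_m]$ diagonal matrix of weights equal to the inverse of the
variance associated with the $n_m$ measurements contained in $\mathbf{x}^{meas}$.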
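As described earlier (Antoniewicz et al., 2006), a first-order Taylor series expansion of
$\mathbf{x}^{pred}(\mathbf{p})$ can be performed to obtain a quadratic approximation for SSRES so that the step
direction can be computed as described in Equation (2):
$$\Delta\mathbf{p} = -\mathbf{H}^{-1}\mathbf{J} \tag{2}$$
In the above equation, J is an $[n_p \times 1]$ vector representing the approximate gradient of
SSRES and H is an $[n_p \times n_p]$ matrix corresponding to the approximate Hessian of
SSRES. J and H are related to predicted values ($\mathbf{x}^{pred}$) and their sensitivity to parameters
p ($\partial\mathbf{x}^{pred}/\partial\mathbf{p}$) by Equations (3) and (4):
$$\mathbf{J} = \left(\frac{\partial\mathbf{x}^{pred}}{\partial\mathbf{p}}\right) \mathbf{W} \left(\mathbf{x}^{pred} - \mathbf{x}^{meas}\right)^{T} \tag{3}$$
$$\mathbf{H} = \left(\frac{\partial\mathbf{x}^{pred}}{\partial\mathbf{p}}\right) \mathbf{W} \left(\frac{\partial\mathbf{x}^{pred}}{\partial\mathbf{p}}\right)^{T} \tag{4}$$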
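Equations (2) to (4) can be illustrated with a small numerical sketch. It assumes a hypothetical linear model x_pred(p) = p M, for which the sensitivity matrix is simply M and a single step lands on the weighted least-squares optimum; the function name and all values are illustrative only.

```python
import numpy as np

def gauss_newton_step(D, W, residual):
    """Step direction from Equations (2)-(4).

    D: [n_p x n_m] sensitivity matrix, W: [n_m x n_m] weight matrix,
    residual: length-n_m vector x_pred - x_meas."""
    J = D @ W @ residual             # approximate gradient, Equation (3)
    H = D @ W @ D.T                  # approximate Hessian, Equation (4)
    return -np.linalg.solve(H, J)    # step direction, Equation (2)

# Hypothetical linear model with two parameters and three measurements.
M = np.array([[1.0, 0.0, 1.0],
              [0.0, 1.0, 1.0]])
x_meas = np.array([1.0, 2.0, 3.3])
W = np.diag([1.0, 1.0, 4.0])

p0 = np.zeros(2)
dp = gauss_newton_step(M, W, p0 @ M - x_meas)
p1 = p0 + dp

# For a linear model the gradient vanishes after one step.
grad_after = M @ W @ (p1 @ M - x_meas)
print(p1, grad_after)
```

For the nonlinear labeling model the same step is recomputed at every iteration with updated sensitivities.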
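Note that Equations (3) and (4) require the computation of the sensitivities $\partial\mathbf{x}^{pred}/\partial\mathbf{p}$ in
addition to $\mathbf{x}^{pred}$. $\partial\mathbf{x}^{pred}/\partial\mathbf{p}$ is an $[n_p \times n_m]$ matrix corresponding to the sensitivities of P predicted
MDVs ($\mathbf{x}_i^{pred}(\mathbf{p})$) and Q extracellular fluxes ($v_j^{pred}$) and is assembled as shown in
Equation (5):
$$\frac{\partial\mathbf{x}^{pred}}{\partial\mathbf{p}} = \left[ \frac{\partial\mathbf{x}_1^{pred}}{\partial\mathbf{p}} \,\middle|\, \frac{\partial\mathbf{x}_2^{pred}}{\partial\mathbf{p}} \,\middle|\, \dots \,\middle|\, \frac{\partial\mathbf{x}_P^{pred}}{\partial\mathbf{p}} \,\middle|\, \frac{\partial v_1^{pred}}{\partial\mathbf{p}} \,\middle|\, \frac{\partial v_2^{pred}}{\partial\mathbf{p}} \,\middle|\, \dots \,\middle|\, \frac{\partial v_Q^{pred}}{\partial\mathbf{p}} \right] \tag{5}$$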
Under the imposed metabolic steady-state conditions, the sensitivity of $v_i^{pred}$ with respect to
parameters p is constant. Since kinetic parameters are not invoked in the INST-MFA
modeling framework, free fluxes u and pool sizes c are treated as independent fitted
parameters. Therefore, metabolic fluxes $\mathbf{v}$ are insensitive to changes in pool sizes $\mathbf{c}$. The
sensitivity of the calculated fluxes ($v_i^{pred}$) with respect to parameters u and c is calculated
as:
$$\frac{\partial v_i^{pred}}{\partial\mathbf{c}} = 0$$
$$\frac{\partial v_i^{pred}}{\partial\mathbf{u}} = \mathbf{N}_k^{T}$$
$\mathbf{N}_k$ is a $[1 \times n_u]$ vector derived from the kth row of N relating the predicted flux $v_i^{pred}$ to
the free fluxes u.
The aggregate parameter sensitivity matrix is assembled as follows:
$$\frac{\partial v_i^{pred}}{\partial\mathbf{p}} = \left[ \left(\frac{\partial v_i^{pred}}{\partial\mathbf{u}}\right)^{T} \,\middle|\, \left(\frac{\partial v_i^{pred}}{\partial\mathbf{c}}\right)^{T} \right]^{T} \tag{6}$$
$\partial v_i^{pred}/\partial\mathbf{p}$ has the dimensions $[n_p \times 1]$. Unlike predicted fluxes, the sensitivity of the predicted
metabolite labeling distributions ($\partial\mathbf{x}_i^{pred}/\partial\mathbf{p}$) depends on the labeling dynamics due to
instationary isotopic conditions and must be co-estimated with metabolite labeling
distributions ($\mathbf{x}_i^{pred}$) as described in the next subsection.
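The flux sensitivity in Equation (6) amounts to one row of N padded with zeros for the pool sizes. A minimal sketch follows, assuming a hypothetical null-space matrix with two free fluxes and three pool sizes; the helper name `flux_sensitivity` is illustrative.

```python
import numpy as np

def flux_sensitivity(N, k, n_c):
    """Sensitivity of the k-th flux to p = [u | c], returned as a length-n_p
    vector: the k-th row of N for the free fluxes, zeros for the pool sizes."""
    return np.concatenate([N[k, :], np.zeros(n_c)])

# Hypothetical null-space basis for four fluxes and two free fluxes.
N = np.array([[1.0, 1.0],
              [0.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])

sens = flux_sensitivity(N, k=0, n_c=3)
print(sens)
```

Because N is constant at metabolic steady state, this vector needs to be assembled only once per measured flux.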
B.2. Dynamic EMU balances and simulation of labeling distributions
$\mathbf{x}_i^{meas}$ and $\mathbf{x}_i^{pred}$ represent the measured and predicted 13C labeling distributions for a
subset of all the carbon atoms of a particular measured metabolite. A subset of atoms of
any metabolite is termed an Elementary Metabolite Unit (EMU) (Antoniewicz et al., 2007).
For example, for an MS fragment of a three-carbon metabolite M comprising carbons at
positions 2 and 3, $\mathbf{x}_i^{meas}$ encodes the measured MDV whereas $\mathbf{x}_i^{pred}$ describes the MDV
of the corresponding EMU $\mathbf{M}_{2,3}$. The predicted labeling distribution of $\mathbf{M}_{2,3}$ depends on
the labeling distribution of all externally provided substrates. Therefore, the atoms
represented by EMU $\mathbf{M}_{2,3}$ must be traced back to each external substrate through all the
paths afforded by the carbon mapping model using algorithms such as EMU decomposition
(Antoniewicz et al., 2007). Any intracellular metabolite M can be produced by one of four
possible reaction types shown in Table B.1. Consider the flux balance across EMU $\mathbf{M}_{2,3}$ of
size 2 (indicating the number of carbons contained in the EMU) as shown in Figure B.1 with
reactions $v_1$, $v_2$, $v_3$, and $v_4$ generating $\mathbf{M}_{2,3}$ from EMUs $\mathbf{P}_{2,3}$, $\mathbf{Q}_{2,3}$, $\mathbf{R}_{2,3}$, and the
convolution of $\mathbf{D}_2$ and $\mathbf{E}_1$, respectively. Convolution of EMUs (Reaction type 4 in Table
B.1) arises from the formation of a bond between two EMUs (i.e., $\mathbf{D}_2$ and $\mathbf{E}_1$) of a smaller
size than $\mathbf{M}_{2,3}$. The corresponding MDV convolution is described by Equation (6). Note
that metabolite R (Reaction type 3) denotes an externally provided substrate.
$$\mathbf{D}_2 = [a \quad (1-a)]; \qquad \mathbf{E}_1 = [b \quad (1-b)]$$
$$\mathbf{D}_2 * \mathbf{E}_1 = [a b \quad a(1-b) + (1-a)b \quad (1-a)(1-b)] \tag{6}$$
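The MDV convolution above is an ordinary discrete convolution of the two abundance vectors. The following minimal sketch uses NumPy's `convolve` with hypothetical values for a and b.

```python
import numpy as np

def convolve_mdv(d, e):
    """MDV of the EMU formed by bonding two smaller EMUs."""
    return np.convolve(d, e)

a, b = 0.9, 0.5                  # hypothetical first MDV entries
D2 = np.array([a, 1 - a])        # size-1 EMU of D
E1 = np.array([b, 1 - b])        # size-1 EMU of E

M23 = convolve_mdv(D2, E1)
print(M23)  # [a*b, a*(1-b) + (1-a)*b, (1-a)*(1-b)]
```

The result has one more entry than either input combined would suggest individually (three entries for a size-2 EMU) and still sums to one.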
Table B.1. Four types of reaction classes impacting EMU balances. Reaction v1
involves no rearrangement of the carbon skeleton of the reactant P. Reaction v2 involves
breaking of the C-C bond between carbons 3 and 4 of Q. Reaction v3 is an uptake reaction
for the external substrate R. Reaction v4 involves a bond formation between the second
carbon of D and the single-carbon E giving rise to the convolution term described in
Equation (6).
Reaction Types                       | Example
v1: P (abc) → M (abc)                | Enolase: 2PG → PEP
v2: Q (abcd) → M (abc) + S1 (d)      | Malic Enzyme: Mal → Pyr + CO2
v3: R (abc) → M (abc)                | Glucose uptake via PTS: Gluc + PEP → G6P + Pyr
v4: D (ab) + E (c) → M (abc)         | SHMT: Gly + MEETHF → Ser + THF
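Figure B.1. Flux balance for EMU $\mathbf{M}_{2,3}$. $\mathbf{M}_{2,3}$ is produced by four separate reactions
(one from each class) as described in Table B.1.
The labeling dynamics of EMU $\mathbf{M}_{2,3}$ can be expressed using the following relation:
$$C_M \frac{d\mathbf{M}_{2,3}}{dt} = \begin{bmatrix} v_1 & v_2 & v_3 & v_4 & -(v_1 + v_2 + v_3 + v_4) \end{bmatrix} \begin{bmatrix} \mathbf{P}_{2,3} \\ \mathbf{Q}_{2,3} \\ \mathbf{R}_{2,3} \\ \mathbf{D}_2 * \mathbf{E}_1 \\ \mathbf{M}_{2,3} \end{bmatrix} \tag{7}$$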
Here, $C_M$ denotes the pool size of metabolite M. $\mathbf{R}_{2,3}$ has a constant labeling distribution
since R is an externally supplied substrate (such as glucose or CO2). A characteristic feature
of the EMU method is that the labeling distributions of smaller-sized EMUs are unaffected
by the EMUs of larger size (Antoniewicz et al., 2007). As a result, $\mathbf{R}_{2,3}$ and $\mathbf{D}_2 * \mathbf{E}_1$ can
be separated from the remaining EMUs in Equation (7) as they are unaffected by the
labeling dynamics of the other size 2 EMUs:
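$$C_M \frac{d\mathbf{M}_{2,3}}{dt} = \begin{bmatrix} v_1 & v_2 & -(v_1 + v_2 + v_3 + v_4) \end{bmatrix} \begin{bmatrix} \mathbf{P}_{2,3} \\ \mathbf{Q}_{2,3} \\ \mathbf{M}_{2,3} \end{bmatrix} + \begin{bmatrix} v_3 & v_4 \end{bmatrix} \begin{bmatrix} \mathbf{R}_{2,3} \\ \mathbf{D}_2 * \mathbf{E}_1 \end{bmatrix} \tag{8}$$
Equation (8) reveals that $\mathbf{M}_{2,3}$ depends on $\mathbf{P}_{2,3}$ and $\mathbf{Q}_{2,3}$ which must be traced back to
external substrates and smaller EMU convolutions in a similar fashion resulting in an EMU
network of size 2. This results in two sets of EMUs: $\mathbf{X}_2$ containing $\mathbf{P}_{2,3}$, $\mathbf{Q}_{2,3}$, and $\mathbf{M}_{2,3}$,
connected to each other by the coefficient matrix $\mathbf{A}_2 = [v_1 \quad v_2 \quad -(v_1 + v_2 + v_3 + v_4)]$,
and $\mathbf{Y}_2$ containing the EMUs and convolutions unaffected by $\mathbf{X}_2$ and related to $\mathbf{X}_2$ via the
coefficient matrix $\mathbf{B}_2 = [v_3 \quad v_4]$. Equation (8) can thus be re-expressed in a general form
as:
$$C_M \frac{d\mathbf{M}_{2,3}}{dt} = \mathbf{A}_2 \mathbf{X}_2 + \mathbf{B}_2 \mathbf{Y}_2 \tag{9}$$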
Although unaffected by size 2 EMUs, $\mathbf{D}_2$ and $\mathbf{E}_1$ are not of constant MDV as metabolites
D and E are not externally supplied substrates. Therefore, a size 1 EMU network must be
constructed in a similar manner as the size 2 network to trace back $\mathbf{D}_2$ and $\mathbf{E}_1$ to the
substrate R resulting in EMU balances similar to Equation (8). Since convolution terms
require at least two atoms, a size 1 network will have no such terms. A dynamic balance such as
Equation (9) can be extended to all balanced metabolites in the EMU model of a particular
size n and conforms to the following mathematical structure (Young et al., 2008):
$$\mathbf{C}_n \frac{d\mathbf{X}_n}{dt} = \mathbf{A}_n \mathbf{X}_n + \mathbf{B}_n \mathbf{Y}_n \tag{10}$$
In this mathematical description, n corresponds to the size of the EMU network. If k
intracellular metabolite EMUs, m extracellular substrate EMUs, and q convolution terms
are contained within the size n EMU network we have:
$\mathbf{X}_n$ is a $[k \times (n+1)]$ matrix describing the labeling distribution of the k size-n EMUs
similar to $\mathbf{X}_2$ in Equation (9).
$\mathbf{Y}_n$ is an $[(m+q) \times (n+1)]$ matrix encoding the labeling distribution of extracellular
substrate EMUs and convolution terms similar to $\mathbf{Y}_2$ in Equation (9).
$\mathbf{A}_n$ is a $[k \times k]$ matrix representing the connectivity between the k EMUs in the size n
network similar to $\mathbf{A}_2$ in Equation (9).
$\mathbf{B}_n$ is a $[k \times (m+q)]$ matrix capturing the connectivity between the extracellular
substrate EMUs and convolution terms and the k EMUs in the size n network similar to $\mathbf{B}_2$
in Equation (9). All the elements of $\mathbf{A}_n$ and $\mathbf{B}_n$ are linear functions of fluxes as shown in
Equation (8).
$\mathbf{C}_n$ is a $[k \times k]$ diagonal matrix capturing the pool sizes of the metabolites corresponding
to EMUs, $\mathbf{X}_n$.
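Equation (10) can be rewritten as:
$$\frac{d\mathbf{X}_n}{dt} = \mathbf{F}_n \mathbf{X}_n + \mathbf{G}_n \tag{11}$$
New parameters $\mathbf{F}_n$ and $\mathbf{G}_n$ are defined as:
$$\mathbf{F}_n = \mathbf{C}_n^{-1} \mathbf{A}_n, \qquad \mathbf{G}_n = \mathbf{C}_n^{-1} \mathbf{B}_n \mathbf{Y}_n \tag{12}$$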
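The construction of F and G in Equation (12) can be sketched for a hypothetical two-EMU block with constant inputs. All fluxes, pool sizes, and names below are illustrative assumptions; note that in this appendix the input term enters with a positive sign, so the isotopic steady state satisfies F X + G = 0.

```python
import numpy as np

# Hypothetical fluxes and pool sizes for two balanced size-1 EMUs.
v1, v2, v3, v4 = 3.0, 6.0, 2.0, 1.0
A = np.array([[-(v1 + v3 + v4), v3],
              [v2, -v2]])
B = np.array([[v1, v4],
              [0.0, 0.0]])
Y = np.array([[0.0, 1.0],      # fully labeled input EMU, rows are [M+0, M+1]
              [1.0, 0.0]])     # unlabeled input EMU
C = np.diag([1.0, 2.0])        # pool sizes of the two balanced metabolites

F = np.linalg.solve(C, A)      # F = C^-1 A
G = np.linalg.solve(C, B @ Y)  # G = C^-1 B Y

def rhs(X):
    """Right-hand side of Equation (11): dX/dt = F X + G."""
    return F @ X + G

# At isotopic steady state dX/dt = 0, so X_ss = -F^-1 G.
X_ss = -np.linalg.solve(F, G)
print(X_ss, np.linalg.eigvals(F))
```

The pool sizes rescale F and G and therefore the speed of labeling, but they cancel out of the steady-state labeling distribution.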
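Evaluation of the step direction described in Equations (2)-(4) for the least-squares NLP
problem (Section B.1) requires knowledge of the sensitivity of the predicted labeling
distributions with respect to parameters p (i.e., $\partial\mathbf{x}_i^{pred}/\partial\mathbf{p}$), which can be obtained by
differentiating Equation (11) with respect to p such that:
$$\frac{d}{dt}\left(\frac{\partial\mathbf{X}_n}{\partial\mathbf{p}}\right) = \mathbf{F}_n \frac{\partial\mathbf{X}_n}{\partial\mathbf{p}} + \frac{\partial\mathbf{F}_n}{\partial\mathbf{p}} \mathbf{X}_n + \frac{\partial\mathbf{G}_n}{\partial\mathbf{p}} \tag{13}$$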
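The overall system is thus represented by the following system of Equations:
$$\frac{d\mathbf{X}_n}{dt} = \mathbf{F}_n \mathbf{X}_n + \mathbf{G}_n \tag{14}$$
$$\frac{d}{dt}\left(\frac{\partial\mathbf{X}_n}{\partial\mathbf{p}}\right) = \mathbf{F}_n \frac{\partial\mathbf{X}_n}{\partial\mathbf{p}} + \mathbf{H}_n \tag{15}$$
$\mathbf{H}_n$ consists of all the terms unaffected by changes in $\partial\mathbf{X}_n/\partial\mathbf{p}$ and can be expressed as:
$$\mathbf{H}_n = \frac{\partial\mathbf{F}_n}{\partial\mathbf{p}} \mathbf{X}_n + \frac{\partial\mathbf{G}_n}{\partial\mathbf{p}}$$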
For a system of equations with $n_p$ parameters, the sensitivities $\partial\mathbf{X}_n/\partial p_i$ are independent of
$\partial\mathbf{X}_n/\partial p_j$ but are dependent on $\mathbf{X}_n$ (see Eq. 13). As a result of this, Equations (14) and (15) must be
solved simultaneously for labeling distributions and sensitivities to free fluxes and pool
sizes. At the start of the isotope labeling experiment, all atoms are assumed to have a 13C
enrichment equal to natural abundance of 13C. Solving Equation (13) at the isotopic steady-state
conditions prior to the start of the labeling experiment confirms that $\partial\mathbf{X}_n/\partial\mathbf{p}$ is zero at t =
0. Labeling distributions are sampled at various time points $(t_1, t_2, \dots, t_n)$ after the
introduction of the tracer. As a result of this, Equations (14) and (15) must be integrated
between the required time intervals $([t_1, t_2], [t_2, t_3], \dots, [t_{n-1}, t_n])$ to extract the relevant
$\mathbf{x}_i^{pred}$ and the corresponding sensitivities $\partial\mathbf{x}_i^{pred}/\partial\mathbf{p}$ at $(t_1, t_2, \dots, t_n)$.
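The analytical solution to the above system of equations is not available due to the presence
of non-linear EMU convolution terms. An approximate analytical solution for $\mathbf{X}_n$ at any
future time point $(t_0 + \Delta t)$ as a function of the labeling distribution at a previous time point
$\mathbf{X}_n(t_0)$, $\mathbf{F}_n$, $\mathbf{G}_n$, and the time interval $\Delta t$ can be obtained by solving the system of ODEs
described by Equations (14) and (15) using the integrating factor approach described
earlier (Young et al., 2008):
$$\mathbf{X}_n(t_0 + \Delta t) = e^{\mathbf{F}_n \Delta t} \mathbf{X}_n(t_0) + \int_{0}^{\Delta t} e^{\mathbf{F}_n (\Delta t - \tau)} \mathbf{G}_n(t_0 + \tau) \, d\tau \tag{16}$$
$$\frac{\partial\mathbf{X}_n}{\partial\mathbf{p}}(t_0 + \Delta t) = e^{\mathbf{F}_n \Delta t} \frac{\partial\mathbf{X}_n}{\partial\mathbf{p}}(t_0) + \int_{0}^{\Delta t} e^{\mathbf{F}_n (\Delta t - \tau)} \mathbf{H}_n(t_0 + \tau) \, d\tau \tag{17}$$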
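The current state-of-the-art procedure (Young et al., 2008) discretizes Equations (14)
and (15) using a non-causal first-order hold equivalent to compute the one-step solution
represented by Equation (18), where the transition matrices $\boldsymbol{\Phi}_n$, $\boldsymbol{\Gamma}_n$, and $\boldsymbol{\Omega}_n$ are
calculated using Equation (19):
$$\mathbf{X}_n(t_0 + \Delta t) = \boldsymbol{\Phi}_n \mathbf{X}_n(t_0) + \boldsymbol{\Gamma}_n \mathbf{G}_n(t_0) + \boldsymbol{\Omega}_n \left(\mathbf{G}_n(t_0 + \Delta t) - \mathbf{G}_n(t_0)\right)$$
$$\frac{\partial\mathbf{X}_n}{\partial\mathbf{p}}(t_0 + \Delta t) = \boldsymbol{\Phi}_n \frac{\partial\mathbf{X}_n}{\partial\mathbf{p}}(t_0) + \boldsymbol{\Gamma}_n \mathbf{H}_n(t_0) + \boldsymbol{\Omega}_n \left(\mathbf{H}_n(t_0 + \Delta t) - \mathbf{H}_n(t_0)\right) \tag{18}$$
$$\begin{bmatrix} \boldsymbol{\Phi}_n & \boldsymbol{\Gamma}_n & \boldsymbol{\Omega}_n \\ \mathbf{0} & \mathbf{I} & \mathbf{I} \\ \mathbf{0} & \mathbf{0} & \mathbf{I} \end{bmatrix} = \exp\left( \begin{bmatrix} \mathbf{F}_n \Delta t & \mathbf{I} \Delta t & \mathbf{0} \\ \mathbf{0} & \mathbf{0} & \mathbf{I} \\ \mathbf{0} & \mathbf{0} & \mathbf{0} \end{bmatrix} \right) \tag{19}$$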
B.3. An improved algorithm for simulating labeling dynamics and sensitivities
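Here we propose a faster and memory-efficient approach to compute the transition matrices
by discretizing the partial analytical solution represented by Equations (16) and (17) in
order to obtain analytical expressions for the transition matrices $\boldsymbol{\Phi}_n$, $\boldsymbol{\Gamma}_n$, and $\boldsymbol{\Omega}_n$ in terms
of $\mathbf{F}_n$. $\mathbf{G}_n$ and $\mathbf{H}_n$ are linearized in the interval $[t_0, t_0 + \Delta t]$ using the computable quantities
$\mathbf{G}_n(t_0)$, $\mathbf{G}_n(t_0 + \Delta t)$, $\mathbf{H}_n(t_0)$, and $\mathbf{H}_n(t_0 + \Delta t)$ using a non-causal first-order hold
equivalent (Franklin et al., 1997) so that $\mathbf{G}_n(t_0 + \tau)$ and $\mathbf{H}_n(t_0 + \tau)$ at any time $t_0 + \tau$
between $t_0$ and $t_0 + \Delta t$ can be expressed as:
$$\mathbf{G}_n(t_0 + \tau) = \mathbf{G}_n(t_0) + \frac{\tau}{\Delta t} \left(\mathbf{G}_n(t_0 + \Delta t) - \mathbf{G}_n(t_0)\right) \tag{20}$$
$$\mathbf{H}_n(t_0 + \tau) = \mathbf{H}_n(t_0) + \frac{\tau}{\Delta t} \left(\mathbf{H}_n(t_0 + \Delta t) - \mathbf{H}_n(t_0)\right) \tag{21}$$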
Upon substituting Equations (20) and (21) in Equations (16) and (17) and integrating, we
get:
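\[
\begin{aligned}
\boldsymbol{X}_n(t_0 + \Delta t) &= \boldsymbol{\Phi}_n \boldsymbol{X}_n(t_0) + \boldsymbol{\Gamma}_n \boldsymbol{G}_n(t_0) + \boldsymbol{\Omega}_n\left(\boldsymbol{G}_n(t_0 + \Delta t) - \boldsymbol{G}_n(t_0)\right) \\
\frac{\partial \boldsymbol{X}_n}{\partial \boldsymbol{p}}(t_0 + \Delta t) &= \boldsymbol{\Phi}_n \frac{\partial \boldsymbol{X}_n}{\partial \boldsymbol{p}}(t_0) + \boldsymbol{\Gamma}_n \boldsymbol{H}_n(t_0) + \boldsymbol{\Omega}_n\left(\boldsymbol{H}_n(t_0 + \Delta t) - \boldsymbol{H}_n(t_0)\right)
\end{aligned}
\tag{22}
\]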
Note that matrices 𝚽𝑛, 𝚪𝑛, and 𝛀𝑛 are recast as:
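\[
\begin{aligned}
\boldsymbol{\Phi}_n &= e^{\boldsymbol{F}_n \Delta t} \\
\boldsymbol{\Gamma}_n &= \left(e^{\boldsymbol{F}_n \Delta t} - \boldsymbol{I}\right)\boldsymbol{F}_n^{-1} \\
\boldsymbol{\Omega}_n &= \left[\left(e^{\boldsymbol{F}_n \Delta t} - \boldsymbol{I}\right)\left(\boldsymbol{F}_n \Delta t\right)^{-1} - \boldsymbol{I}\right]\boldsymbol{F}_n^{-1}
\end{aligned}
\tag{23}
\]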
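As a worked check on the expressions for 𝚪𝑛 and 𝛀𝑛, the two integrals that arise when the linear interpolation of Equations (20) and (21) is substituted into Equations (16) and (17) can be evaluated in closed form. This is a sketch that assumes 𝑭𝑛 is invertible, so that it commutes with its own exponential and inverse; the second line follows from integration by parts.

```latex
\begin{aligned}
\int_0^{\Delta t} e^{\boldsymbol{F}_n(\Delta t-\tau)}\,d\tau
  &= \left(e^{\boldsymbol{F}_n\Delta t}-\boldsymbol{I}\right)\boldsymbol{F}_n^{-1}
   = \boldsymbol{\Gamma}_n, \\
\frac{1}{\Delta t}\int_0^{\Delta t} e^{\boldsymbol{F}_n(\Delta t-\tau)}\,\tau\,d\tau
  &= \left[\left(e^{\boldsymbol{F}_n\Delta t}-\boldsymbol{I}\right)
     \left(\boldsymbol{F}_n\Delta t\right)^{-1}-\boldsymbol{I}\right]\boldsymbol{F}_n^{-1}
   = \boldsymbol{\Omega}_n .
\end{aligned}
```

The first integral multiplies the constant term 𝑮𝑛(𝑡0) (or 𝑯𝑛(𝑡0)), and the second multiplies the increment over the interval.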
Equation (22) represents a time-discretized form of the ODEs defined by Equations (14)
and (15). Matrix 𝚽𝒏 captures the non-linear coupling between labeling distributions,
fluxes and pool sizes. Since all the eigenvalues of 𝐅𝒏 are negative (Anderson, 1983), 𝒆𝑭𝑛𝑡
eventually vanishes implying that the product 𝚪𝑛𝐆𝑛 contains the labeling distributions at
isotopic steady-state. While the end result of both approaches is the same, the size of the
matrix for which the exponential is computed in Equation (23) is 1/9th the size of the
matrix in Equation (19). This size reduction reduces memory requirements while
accelerating the process of matrix exponential evaluation, which is the computational
bottleneck in this algorithm, thus improving scalability and enabling INST-MFA using a
genome-scale mapping model.
The matrix exponential can be approximated using different approaches such as Taylor
series expansion or Padé approximation (Moler and Van Loan, 2003). Because the Padé
approximation is accurate only when the matrix elements are small, the matrix 𝑭𝑛 must first
be rescaled. The matrix exponential is then evaluated as:
\[
e^{\boldsymbol{F}_n} = \left(e^{\boldsymbol{F}_n/s}\right)^{s} \tag{24}
\]
\[
s = 2^{q} \tag{25}
\]
Here 𝑞 is a positive integer chosen such that the maximum absolute value of any element
in the matrix 𝑭𝑛/𝑠 is less than 0.5 (Golub and Van Loan, 1996). The exponential of 𝑭𝑛/𝑠 is
evaluated using the Padé approximation and is then squared 𝑞 times to obtain the
exponential of 𝑭𝑛. Integration of the ODEs described by Equation (22) can be accomplished
using adaptive step-size and error control methods such as adaptive Runge-Kutta
or Richardson extrapolation integrators (Press et al., 2007a).
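The scaling-and-squaring idea and the two routes to the transition matrices can be compared numerically. The following is a minimal NumPy sketch, not the implementation used in this work: it swaps the Padé approximant for a truncated Taylor series, allows 𝑞 = 0 when the matrix is already small, and uses a hypothetical 3 × 3 matrix `F` and step `dt` chosen only for illustration. The function names are likewise assumptions.

```python
import numpy as np


def expm_scaling_squaring(A, terms=18):
    """Approximate exp(A) by scaling, a truncated Taylor series, and squaring."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    # Choose q so that every element of A / 2**q has magnitude below 0.5.
    q = 0
    while np.max(np.abs(A)) / 2**q >= 0.5:
        q += 1
    scaled = A / 2**q
    # Truncated Taylor series of exp(scaled); it stands in for a Pade approximant.
    result = np.eye(n)
    term = np.eye(n)
    for j in range(1, terms + 1):
        term = term @ scaled / j
        result = result + term
    # Square q times: exp(A) = exp(A / 2**q) ** (2**q).
    for _ in range(q):
        result = result @ result
    return result


def transition_matrices_direct(F, dt):
    """Phi, Gamma, Omega from the closed-form expressions of Equation (23)."""
    I = np.eye(F.shape[0])
    F_inv = np.linalg.inv(F)
    Phi = expm_scaling_squaring(F * dt)
    Gamma = (Phi - I) @ F_inv
    Omega = ((Phi - I) @ np.linalg.inv(F * dt) - I) @ F_inv
    return Phi, Gamma, Omega


def transition_matrices_block(F, dt):
    """Phi, Gamma, Omega from the exponential of the 3n x 3n block matrix."""
    n = F.shape[0]
    I, Z = np.eye(n), np.zeros((n, n))
    M = np.block([[F * dt, I * dt, Z], [Z, Z, I], [Z, Z, Z]])
    big = expm_scaling_squaring(M)
    return big[:n, :n], big[:n, n:2 * n], big[:n, 2 * n:]


# Hypothetical stable matrix (all eigenvalues have negative real part).
F = np.array([[-2.0, 0.5, 0.0], [1.0, -1.5, 0.3], [0.0, 0.8, -1.0]])
dt = 0.1
Phi, Gamma, Omega = transition_matrices_direct(F, dt)
Phi_b, Gamma_b, Omega_b = transition_matrices_block(F, dt)
print(np.max(np.abs(Omega - Omega_b)))
```

Both routes return the same matrices up to round-off, but the direct route exponentiates an n × n matrix rather than a 3n × 3n one.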
Appendix C
Mathematical description of K-FIT
C.1. Overview of elementary reaction step decomposition
The complex catalytic mechanism of enzyme catalysis can be decomposed into a series of
elementary steps that are modeled using mass action kinetics. Each elementary step is
treated as reversible with one forward and one reverse elementary reaction. Each
elementary reaction is associated with one of three types of events: (i) binding of one
metabolite with an enzyme complex, (ii) release of one metabolite from an enzyme
complex, or (iii) conversion of the enzyme-reactant complex to the enzyme-product
complex. The flux through each elementary reaction is termed elementary flux and is
related to the concentration of metabolites and enzyme complexes using mass-action
kinetics. The following example details the decomposition of an enzyme-catalyzed
reaction into elementary steps and establishes the basic terms used in the kinetic
parameterization algorithm.
Consider the conversion of a metabolite A to B catalyzed by an enzyme E, regulated by a
non-competitive inhibitor C and an activator D. The reaction mechanism can be
decomposed into six elementary steps as shown in Table C.1. The set of elementary steps 𝐿
is defined as 𝐿 = {1,2,3,4,5,6}. Elementary steps 1, 2, and 3 describe the conversion of A
to B and are therefore termed catalytic elementary steps. Elementary steps 4 and 5 model
the inhibition of enzyme catalysis by metabolite C, and step 6 denotes the activation of the
inactive enzyme form for catalysis by metabolite D. Steps 4, 5, and 6 do not participate
in the reaction; instead they regulate enzyme function and are thus referred to as regulatory
elementary steps. The set of catalytic elementary steps, denoted by 𝐿𝑐𝑎𝑡, is defined here as
𝐿𝑐𝑎𝑡 = {1,2,3}. The corresponding set of regulatory elementary steps 𝐿𝑟𝑒𝑔 is defined as
𝐿𝑟𝑒𝑔 = {4,5,6}. From Table C.1, we see that the number of unique enzyme complexes
formed over the course of the reaction is equal to the number of elementary steps required
to model the catalytic and regulatory functions of the enzyme.
Table C.1. List of elementary steps describing the catalytic mechanism and regulation
of enzyme 𝐸

| Type of elementary step | Elementary step # | Elementary step | Forward / reverse rate constants | Description of elementary step |
|---|---|---|---|---|
| Catalytic | 1 | 𝐴 + 𝐸 ⇌ 𝐸𝐴 | 𝑘̂1 / 𝑘̂2 | Reactant binding to free enzyme |
| Catalytic | 2 | 𝐸𝐴 ⇌ 𝐸𝐵 | 𝑘̂3 / 𝑘̂4 | Conversion of reactant to product |
| Catalytic | 3 | 𝐸𝐵 ⇌ 𝐸 + 𝐵 | 𝑘̂5 / 𝑘̂6 | Product release from bound complex |
| Regulatory | 4 | 𝐶 + 𝐸 ⇌ 𝐸𝐶 | 𝑘̂7 / 𝑘̂8 | Inhibition of free enzyme |
| Regulatory | 5 | 𝐶 + 𝐸𝐴 ⇌ 𝐸𝐴𝐶 | 𝑘̂9 / 𝑘̂10 | Inhibition of enzyme-substrate complex |
| Regulatory | 6 | 𝐷 + 𝐸∗ ⇌ 𝐸 | 𝑘̂11 / 𝑘̂12 | Activation of inactive enzyme form |

Each elementary step is modeled to be reversible with two separate elementary reactions
in the forward and reverse directions. Thus, an enzyme-catalyzed reaction that decomposes
into 𝑛𝐿 elementary steps will involve 𝑛𝑃 = 2𝑛𝐿 elementary reactions. The index of any
elementary step 𝑙 ∈ 𝐿 is related to the corresponding indices of its forward and reverse
elementary reactions (𝑓𝑤𝑑 and 𝑟𝑒𝑣, respectively) as follows:
\[
fwd = 2l - 1 \tag{1}
\]
\[
rev = 2l \tag{2}
\]
Based on this, the set of elementary reactions is defined as 𝑃 = {𝑝|𝑝 = 1,2, … ,2𝑛𝐿}, which
contains an alternating sequence of forward and reverse elementary reactions. Each
elementary reaction is associated with its own kinetic rate constant 𝑘̂𝑝 ∀𝑝 ∈ 𝑃. We define
[𝐴], [𝐵], [𝐶], and [𝐷] to be the concentrations of metabolites 𝐴, 𝐵, 𝐶, and 𝐷, respectively,
and [𝐸∗], [𝐸], [𝐸𝐴], and [𝐸𝐵] to denote the concentrations of the un-activated enzyme 𝐸∗,
active free enzyme 𝐸, substrate-bound complex 𝐸𝐴, and product-bound complex 𝐸𝐵,
respectively. [𝐸𝐶] and [𝐸𝐴𝐶] denote concentrations of the inhibitor-bound complexes 𝐸𝐶
and 𝐸𝐴𝐶, respectively. As stated earlier, the flux through an elementary reaction is referred
to as an elementary flux. For the example reaction, the elementary fluxes through the twelve
elementary reactions, 𝑣𝑝 ∀𝑝 ∈ 𝑃, can be computed by expressing the rate of each
elementary reaction using mass-action kinetics as described by Tran et al. (2008), as shown
in Equations (3):
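\[
\begin{aligned}
v_1 &= \hat{k}_1[A][E] & v_2 &= \hat{k}_2[EA] \\
v_3 &= \hat{k}_3[EA] & v_4 &= \hat{k}_4[EB] \\
v_5 &= \hat{k}_5[EB] & v_6 &= \hat{k}_6[B][E] \\
v_7 &= \hat{k}_7[C][E] & v_8 &= \hat{k}_8[EC] \\
v_9 &= \hat{k}_9[C][EA] & v_{10} &= \hat{k}_{10}[EAC] \\
v_{11} &= \hat{k}_{11}[D][E^*] & v_{12} &= \hat{k}_{12}[E]
\end{aligned}
\tag{3}
\]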
Consistent with the convention introduced by Tran et al. (2008), the concentrations of
metabolites 𝐴, 𝐵, 𝐶, and 𝐷 are normalized with respect to their concentrations in the
Wild-Type (WT) strain, [𝐴𝑊𝑇], [𝐵𝑊𝑇], [𝐶𝑊𝑇], and [𝐷𝑊𝑇], respectively. The
corresponding relative concentrations 𝑎, 𝑏, 𝑐, and 𝑑 are defined as:
\[
\begin{aligned}
a &= [A]/[A_{WT}] \\
b &= [B]/[B_{WT}] \\
c &= [C]/[C_{WT}] \\
d &= [D]/[D_{WT}]
\end{aligned}
\tag{4}
\]
The total concentration [𝐸0] of the enzyme catalyzing the conversion of 𝐴 to 𝐵 is related
to the concentration of various enzyme forms/complexes as:
[𝐸0] = [𝐸] + [𝐸𝐴] + [𝐸𝐵] + [𝐸𝐶] + [𝐸𝐴𝐶] + [𝐸∗] (5)
Enzyme fractions are defined as the fractional abundance of each enzyme form relative to
the total enzyme [𝐸0].
\[
\begin{aligned}
e &= [E]/[E_0] & e_c &= [EC]/[E_0] \\
e_a &= [EA]/[E_0] & e_{ac} &= [EAC]/[E_0] \\
e_b &= [EB]/[E_0] & e^* &= [E^*]/[E_0]
\end{aligned}
\tag{6}
\]
Metabolite and total enzyme concentrations in the WT strain are often unavailable and are
therefore lumped together with the kinetic rate constants, yielding the following aggregated
kinetic parameters:
\[
\begin{aligned}
k_1 &= \hat{k}_1[A_{WT}][E_0] & k_2 &= \hat{k}_2[E_0] \\
k_3 &= \hat{k}_3[E_0] & k_4 &= \hat{k}_4[E_0] \\
k_5 &= \hat{k}_5[E_0] & k_6 &= \hat{k}_6[B_{WT}][E_0] \\
k_7 &= \hat{k}_7[C_{WT}][E_0] & k_8 &= \hat{k}_8[E_0] \\
k_9 &= \hat{k}_9[C_{WT}][E_0] & k_{10} &= \hat{k}_{10}[E_0] \\
k_{11} &= \hat{k}_{11}[D_{WT}][E_0] & k_{12} &= \hat{k}_{12}[E_0]
\end{aligned}
\tag{7}
\]
Upon substituting the definitions from Equations (4), (6), and (7) in Equations (3),
expressions are derived for all fluxes as a function of aggregated kinetic parameters,
relative metabolite concentrations, and fractional enzyme abundances:
\[
\begin{aligned}
v_1 &= k_1 a e & v_2 &= k_2 e_a \\
v_3 &= k_3 e_a & v_4 &= k_4 e_b \\
v_5 &= k_5 e_b & v_6 &= k_6 b e \\
v_7 &= k_7 c e & v_8 &= k_8 e_c \\
v_9 &= k_9 c e_a & v_{10} &= k_{10} e_{ac} \\
v_{11} &= k_{11} d e^* & v_{12} &= k_{12} e
\end{aligned}
\tag{8}
\]
Conservation of mass across all enzyme fractions at pseudo-steady-state yields the
following linear equalities:
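\[
\frac{de}{dt} = v_2 + v_5 + v_8 + v_{11} - v_1 - v_6 - v_7 - v_{12} = 0 \tag{9}
\]
\[
\frac{de_a}{dt} = v_1 + v_4 + v_{10} - v_2 - v_3 - v_9 = 0 \tag{10}
\]
\[
\frac{de_b}{dt} = v_3 + v_6 - v_4 - v_5 = 0 \tag{11}
\]
\[
\frac{de_c}{dt} = v_7 - v_8 = 0 \tag{12}
\]
\[
\frac{de_{ac}}{dt} = v_9 - v_{10} = 0 \tag{13}
\]
\[
\frac{de^*}{dt} = v_{12} - v_{11} = 0 \tag{14}
\]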
Upon substituting the flux expressions from Equations (8) in Equations (9) - (14), an
[𝑛𝐿 × 𝑛𝐿] square system of linear algebraic equations with the enzyme fractions as the only
variables is obtained assuming that the relative metabolite concentrations (𝑎, 𝑏, 𝑐, and 𝑑)
and kinetic parameters (𝑘𝑝∀𝑝 ∈ 𝑃) are specified.
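\[
\frac{de}{dt} = k_2 e_a + k_5 e_b + k_8 e_c + k_{11} d e^* - (k_1 a + k_6 b + k_7 c + k_{12})e = 0 \tag{15}
\]
\[
\frac{de_a}{dt} = k_1 a e + k_4 e_b + k_{10} e_{ac} - (k_2 + k_3 + k_9 c)e_a = 0 \tag{16}
\]
\[
\frac{de_b}{dt} = k_3 e_a + k_6 b e - (k_4 + k_5)e_b = 0 \tag{17}
\]
\[
\frac{de_c}{dt} = k_7 c e - k_8 e_c = 0 \tag{18}
\]
\[
\frac{de_{ac}}{dt} = k_9 c e_a - k_{10} e_{ac} = 0 \tag{19}
\]
\[
\frac{de^*}{dt} = k_{12} e - k_{11} d e^* = 0 \tag{20}
\]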
Note that Equation (15) can be reconstituted as a linear combination of Equations (16) -
(20) because the free enzyme must be regenerated at the end of the catalytic cycle to
maintain steady-state. This results in a rank-deficiency in this system of equations that can
be rectified by appending Equation (21) which ensures that the total enzyme concentration
is maintained constant at metabolic steady-state. Equation (21) is obtained by substituting
Equations (6) in Equation (5):
\[
e + e_a + e_b + e_c + e_{ac} + e^* = 1 \tag{21}
\]
Equation (21) replaces Equation (15) resulting in an [𝑛𝐿 × 𝑛𝐿] system of equations of full-
rank for computing enzyme fractions given kinetic parameters and relative metabolite
concentrations. This means that given WT-normalized concentrations 𝑎, 𝑏, 𝑐, and 𝑑, and
kinetic parameters 𝑘𝑝∀𝑝 ∈ 𝑃, solving the system of linear equations yields a unique
assignment for the enzyme fractions 𝑒, 𝑒𝑎, 𝑒𝑏, 𝑒𝑐, 𝑒𝑎𝑐 and 𝑒∗. Fluxes through the
elementary reactions are computed by substituting the newly computed enzyme fractions
in Equations (8). Using the mapping of elementary flux indices to elementary step indices
described in Equations (1) and (2), the net flux through any elementary step 𝑙 =
{1,2,3,4,5,6} can be recovered as follows:
\[
v_l^{(net)} = v_{2l-1} - v_{2l} \tag{22}
\]
The net flux through all the catalytic steps (𝑙 = {1,2,3}) is equal to the net flux through the
overall reaction 𝑉.
\[
v_l^{(net)} = V \quad l = \{1,2,3\} \tag{23}
\]
From the steady-state conditions on the “dead-end” complexes formed via substrate-level
regulation (see Equations (12), (13), and (14)), it can be derived that the net flux through
the regulatory elementary steps is always equal to zero.
\[
v_l^{(net)} = 0 \quad l = \{4,5,6\} \tag{24}
\]
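The calculation described above for the example enzyme can be sketched in a few lines of NumPy. This is an illustrative sketch only: the aggregated kinetic parameters `k[1]` to `k[12]` and the relative concentrations are hypothetical values, the unknowns are ordered as (𝑒, 𝑒𝑎, 𝑒𝑏, 𝑒𝑐, 𝑒𝑎𝑐, 𝑒∗), and the normalization row sums all six enzyme fractions in line with Equation (5).

```python
import numpy as np

# Hypothetical aggregated kinetic parameters k1..k12 (index 0 is unused padding).
k = np.array([0.0, 5.0, 1.0, 4.0, 2.0, 6.0, 0.5, 1.5, 3.0, 0.8, 2.5, 7.0, 1.2])
a, b, c, d = 1.0, 1.0, 1.0, 1.0  # relative metabolite concentrations (WT values)

# Rows: pseudo-steady-state balances (16)-(20) plus the normalization condition.
# Columns: unknowns ordered as [e, ea, eb, ec, eac, e_star].
A = np.array([
    [k[1] * a, -(k[2] + k[3] + k[9] * c), k[4], 0.0, k[10], 0.0],  # balance on EA
    [k[6] * b, k[3], -(k[4] + k[5]), 0.0, 0.0, 0.0],               # balance on EB
    [k[7] * c, 0.0, 0.0, -k[8], 0.0, 0.0],                         # balance on EC
    [0.0, k[9] * c, 0.0, 0.0, -k[10], 0.0],                        # balance on EAC
    [k[12], 0.0, 0.0, 0.0, 0.0, -k[11] * d],                       # balance on E*
    [1.0, 1.0, 1.0, 1.0, 1.0, 1.0],                                # fractions sum to 1
])
rhs = np.array([0.0, 0.0, 0.0, 0.0, 0.0, 1.0])
fractions = np.linalg.solve(A, rhs)
e, ea, eb, ec, eac, e_star = fractions

# Elementary fluxes from the mass-action expressions of Equations (8).
v = np.zeros(13)
v[1], v[2] = k[1] * a * e, k[2] * ea
v[3], v[4] = k[3] * ea, k[4] * eb
v[5], v[6] = k[5] * eb, k[6] * b * e
v[7], v[8] = k[7] * c * e, k[8] * ec
v[9], v[10] = k[9] * c * ea, k[10] * eac
v[11], v[12] = k[11] * d * e_star, k[12] * e

# Net flux of step l is v[2l-1] - v[2l]; steps 1-3 are catalytic, 4-6 regulatory.
v_net = np.array([v[2 * l - 1] - v[2 * l] for l in range(1, 7)])
V = v_net[2]
print("enzyme fractions:", fractions)
print("net fluxes:", v_net)
```

With any positive parameter set, the three catalytic net fluxes coincide and the three regulatory net fluxes vanish, which is the content of Equations (23) and (24).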
The automated calculation of the net flux through a reaction given elementary kinetic
parameters and relative metabolite concentrations is facilitated by deriving generalized
expressions for Equations (8) - (14) using the following quantities:
𝒗 is the [𝑛𝑃 × 1] vector of elementary fluxes whose elements 𝑣𝑝 denote the flux through
elementary reaction 𝑝 ∈ 𝑃
𝒆 is the [𝑛𝐿 × 1] vector of enzyme fractions whose elements 𝑒𝑙 represent the fractional
abundance of enzyme complex 𝑙 ∈ 𝐿
𝐼 = {𝑖|𝑖 = 1,2, … , 𝑛𝑀} is the set of all metabolites. In the above example 𝑛𝑀 = 4.
𝒔 is the [𝑛𝑀 × 1] vector of relative metabolite concentrations whose elements 𝑠𝑖 represent
the fold-change in concentration of metabolite 𝑖 ∈ 𝐼 relative to WT.
𝑬 is the enzyme complex stoichiometry matrix of dimensions [𝑛𝐿 × 𝑛𝑃] whose elements
𝐸𝑙𝑝 represent the stoichiometric coefficient of enzyme complex 𝑙 ∈ 𝐿 in elementary
reaction 𝑝 ∈ 𝑃
𝑬 is defined as follows for the above example:
Note that all the elements in 𝑬 can assume only a value of -1, 0, or 1 and that 𝑬 has exactly
one negative and one positive entry per column. This is because, by definition, elementary
reactions operate on a single enzyme form (either metabolite-bound or free) which is
converted into another form but never destroyed. In contrast, the same enzyme form can
participate in multiple elementary reactions, and there exists at least one elementary
reaction that consumes it (entry of -1) and at least one that produces it (entry of 1).
𝑺 is the metabolite stoichiometry matrix of dimensions [𝑛𝑀 × 𝑛𝑃] whose elements 𝑆𝑖𝑝
represent the stoichiometric coefficient of metabolite 𝑖 ∈ 𝐼 in elementary reaction 𝑝 ∈ 𝑃
𝑺 is defined as follows for the above example:
As was the case for matrix 𝑬, all elements of 𝑺 are equal to either -1, 0 or 1. In addition, 𝑺
has at most one non-zero entry per column. This is because an elementary reaction can
represent only a single binding, release, or catalysis event (Saa and Nielsen, 2017).
Catalytic elementary reactions do not involve metabolites, whereas binding and release
events either consume or produce a metabolite, respectively.
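The structural properties of 𝑬 and 𝑺 stated above can be checked by assembling both matrices from the elementary steps of Table C.1. The sketch below is an illustration under assumptions: the row ordering of enzyme forms (E, EA, EB, EC, EAC, E*) and metabolites (A, B, C, D) is a choice made here, and the tuple encoding of each step is hypothetical.

```python
import numpy as np

complexes = ["E", "EA", "EB", "EC", "EAC", "E*"]   # assumed row order of E
metabolites = ["A", "B", "C", "D"]                  # assumed row order of S

# Forward direction of each elementary step:
# (enzyme form consumed, enzyme form produced, metabolite bound, metabolite released)
steps = [
    ("E", "EA", "A", None),     # step 1: A + E <-> EA
    ("EA", "EB", None, None),   # step 2: EA <-> EB
    ("EB", "E", None, "B"),     # step 3: EB <-> E + B
    ("E", "EC", "C", None),     # step 4: C + E <-> EC
    ("EA", "EAC", "C", None),   # step 5: C + EA <-> EAC
    ("E*", "E", "D", None),     # step 6: D + E* <-> E
]

ci = {name: idx for idx, name in enumerate(complexes)}
mi = {name: idx for idx, name in enumerate(metabolites)}
E = np.zeros((len(complexes), 2 * len(steps)), dtype=int)
S = np.zeros((len(metabolites), 2 * len(steps)), dtype=int)

for l, (src, dst, bound, released) in enumerate(steps):
    fwd, rev = 2 * l, 2 * l + 1  # zero-based columns of reactions 2l-1 and 2l
    E[ci[src], fwd], E[ci[dst], fwd] = -1, 1
    E[ci[dst], rev], E[ci[src], rev] = -1, 1
    if bound is not None:        # binding consumes the metabolite going forward
        S[mi[bound], fwd], S[mi[bound], rev] = -1, 1
    if released is not None:     # release produces the metabolite going forward
        S[mi[released], fwd], S[mi[released], rev] = 1, -1

print(E)
print(S)
```

Each column of `E` holds exactly one -1 and one +1, and each column of `S` holds at most one non-zero entry, consistent with the description above.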
The flux 𝑣𝑝 through elementary reaction 𝑝 is related to the concentrations of
metabolites and enzyme complexes using mass-action kinetics (Khodayari and Maranas,
2016):
\[
v_p = k_p \left(\prod_{l \,\mid\, E_{lp}<0} e_l\right)\left(\prod_{i \,\mid\, S_{ip}\le 0} s_i^{-S_{ip}}\right) \quad \forall p \in P \tag{25}
\]
In Equation (25), the product operator $\prod_{l \mid E_{lp}<0} e_l$ serves to identify the only reactant
enzyme complex participating in elementary reaction 𝑝. Recall that matrix 𝑬 has a single
element equal to -1 per column. Likewise, the product operator $\prod_{i \mid S_{ip}\le 0} s_i^{-S_{ip}}$ serves
to identify the only reactant metabolite (if any) in elementary reaction 𝑝. Recall that matrix
𝑺 has at most one non-zero element per column, equal to -1 or 1. Elementary reactions
representing catalysis or product release do not involve a metabolite on the reactant side,
so the metabolite term equals one (a zero exponent). Elementary reactions modeling binding
of a metabolite to an enzyme complex always involve a single reacting metabolite, which
yields an exponent of 1 (the negative of the stoichiometric coefficient of -1). This implies
that in Equation (25) the exponent on the metabolite concentration is always equal to either
0 or 1. Equation (25) thus captures either a linear relation between 𝑣𝑝 and 𝑒𝑙 when there is
no participating metabolite or a bilinear relation when a metabolite is a co-reactant in the
elementary reaction. The kinetic parameter 𝑘𝑝 for elementary reaction 𝑝 ∈ 𝑃 is a lumped
parameter expressed as the product of the kinetic rate constant 𝑘̂𝑝, the total enzyme
concentration [𝐸0], and the metabolite concentration in the WT as described by Equation (7).
Conservation of mass across the 𝑙𝑡ℎ enzyme complex is mathematically represented as:
\[
\frac{de_l}{dt} = \sum_{p=1}^{n_P} E_{lp} v_p \quad \forall l \in L \tag{26}
\]
At pseudo-steady-state Equation (26) simplifies to:
\[
\sum_{p=1}^{n_P} E_{lp} v_p = 0 \quad \forall l \in L \tag{27}
\]
The net flux through the 𝑙𝑡ℎ elementary step ($v_l^{(net)}$) is computed as the difference between
the fluxes through the corresponding forward and reverse elementary reactions as described
by Equation (22). The net flux through all catalytic elementary steps is equal to the net
overall flux through the reaction. As a convention, we use the net flux through the last
catalytic elementary step as the indicator of the net flux (𝑉) through the overall
reaction. The index of this step is stored in the set 𝐿(𝑛𝑒𝑡), which is 𝐿(𝑛𝑒𝑡) = {3} for the above
example. The mapping of the last catalytic step to the net flux through the overall
reaction is accomplished using a [1 × 𝑛𝐿] indicator vector 𝑵 whose elements are as
follows:
\[
N_l = \begin{cases} 1, & \text{if } l \in L^{(net)} \\ 0, & \text{otherwise} \end{cases} \tag{28}
\]
In reference to the above example, 𝑵 is a [1 × 6] vector defined as 𝑵 =
[0 0 1 0 0 0]. The net flux (𝑉) through the overall reaction is recovered by the
summation operator in Equation (29). Only a single term in the sum is non-zero.
\[
V = \sum_{l=1}^{n_L} N_l v_l^{(net)} \tag{29}
\]
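The generalized expressions in Equations (25) to (29) can be exercised on the simplest case, a Uni-Uni mechanism with enzyme forms (E, EA, EB) and metabolites (A, B). The sketch below is illustrative rather than the implementation used in this work: the function names, the row ordering, and all numerical values are assumptions. Because the elementary fluxes are linear in the enzyme fractions once 𝒔 is fixed, they are written as 𝒗 = 𝑴𝒆 and the steady state is obtained from one linear solve.

```python
import numpy as np


def metabolite_term(s, S, p):
    """Product over metabolites with S_ip <= 0 of s_i ** (-S_ip), as in Equation (25)."""
    mask = S[:, p] <= 0
    return np.prod(s[mask] ** (-S[mask, p]))


def enzyme_fractions(k, s, E, S):
    """Solve the pseudo-steady-state balances with one row replaced by sum(e) = 1."""
    n_L, n_P = E.shape
    M = np.zeros((n_P, n_L))  # v = M @ e for fixed k and s
    for p in range(n_P):
        reactant = np.flatnonzero(E[:, p] < 0)[0]  # the single reactant enzyme form
        M[p, reactant] = k[p] * metabolite_term(s, S, p)
    A = E @ M                 # Equation (27): E v = 0
    A[0, :] = 1.0             # replace the redundant free-enzyme balance
    rhs = np.zeros(n_L)
    rhs[0] = 1.0
    return np.linalg.solve(A, rhs), M


# Uni-Uni mechanism: E + A <-> EA, EA <-> EB, EB <-> E + B
E = np.array([[-1, 1, 0, 0, 1, -1],
              [1, -1, -1, 1, 0, 0],
              [0, 0, 1, -1, -1, 1]])
S = np.array([[-1, 1, 0, 0, 0, 0],
              [0, 0, 0, 0, 1, -1]])
N = np.array([0, 0, 1])                      # last catalytic step carries V
k = np.array([5.0, 1.0, 4.0, 2.0, 6.0, 0.5])  # hypothetical kinetic parameters
s = np.array([2.0, 0.5])                     # hypothetical relative concentrations

e, M = enzyme_fractions(k, s, E, S)
v = M @ e                                    # elementary fluxes, Equation (25)
v_net = v[0::2] - v[1::2]                    # Equation (22)
V = N @ v_net                                # Equation (29)
print(e, v_net, V)
```

Only one term of the sum in Equation (29) is non-zero, so `V` equals the net flux of the last catalytic step, which in turn matches the other catalytic steps at steady state.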
Even though the above treatment refers to a reversible uni-molecular reaction with
non-competitive inhibition, the same concepts can be generalized to any ordered or ping-pong
mechanism of enzyme catalysis involving 𝑛𝑠𝑢𝑏𝑠 substrates, 𝑛𝑝𝑑𝑡 products, activators,
competitive inhibitors, and uncompetitive inhibitors. Examples of elementary step
decomposition for various reaction mechanisms are shown in Table C.2. The above definitions
and concepts form the foundation for the K-FIT procedure for estimating kinetic
parameters given flux distributions.
Table C.2. Elementary step decomposition for various reactions

| Reaction | Reaction mechanism | Elementary step decomposition | Example from central metabolism |
|---|---|---|---|
| 𝐴 ⇌ 𝐵 | Uni-Uni | 𝐸 + 𝐴 ⇌ 𝐸𝐴; 𝐸𝐴 ⇌ 𝐸𝐵; 𝐸𝐵 ⇌ 𝐸 + 𝐵 | Phosphoglucose isomerase |
| 𝐴 ⇌ 𝐵 + 𝐶 | Uni-Bi | 𝐸 + 𝐴 ⇌ 𝐸𝐴; 𝐸𝐴 ⇌ 𝐸𝐵𝐶; 𝐸𝐵𝐶 ⇌ 𝐸𝐶 + 𝐵; 𝐸𝐶 ⇌ 𝐸 + 𝐶 | Fructose bisphosphate aldolase |
| 𝐴 + 𝐵 ⇌ 𝐶 + 𝐷 | Ordered Bi-Bi | 𝐸 + 𝐴 ⇌ 𝐸𝐴; 𝐸𝐴 + 𝐵 ⇌ 𝐸𝐴𝐵; 𝐸𝐴𝐵 ⇌ 𝐸𝐶𝐷; 𝐸𝐶𝐷 ⇌ 𝐸𝐷 + 𝐶; 𝐸𝐷 ⇌ 𝐸 + 𝐷 | Phosphoglycerate kinase |
| 𝐴 + 𝐵 ⇌ 𝐶 + 𝐷 | Bi-substrate Ping-Pong | 𝐸 + 𝐴 ⇌ 𝐸𝐴; 𝐸𝐴 ⇌ 𝐸𝐶; 𝐸𝐶 ⇌ 𝐸∗ + 𝐶; 𝐸∗ + 𝐵 ⇌ 𝐸∗𝐵; 𝐸∗𝐵 ⇌ 𝐸𝐷; 𝐸𝐷 ⇌ 𝐸 + 𝐷 | Transketolase |

C.2. Nonlinear least-squares regression-based procedure for kinetic
parameterization
The K-FIT kinetic parameterization procedure is designed to make use of steady-state flux
measurements for multiple genetic perturbations to parameterize a single kinetic model of
metabolism. Kinetic parameter values are estimated by solving a least-squares problem
that minimizes the deviation between predicted and experimentally measured steady-state
flux distributions across all perturbed networks. The formal description of this least-
squares optimization problem requires the definition of the following sets, parameters,
variables and constraints:
Sets
Set of metabolites 𝐼 = {𝑖|𝑖 = 1,2, … , 𝑛𝑀}
Set of reactions 𝐽 = {𝑗|𝑗 = 1,2, … , 𝑛𝑅}
Set of elementary steps 𝐿 = {𝑙|𝑙 = 1,2, … , 𝑛𝐿}
$L_j^{cat} \subseteq L$ is the subset of all catalytic elementary steps for reaction 𝑗
$L_j^{reg} \subset L$ is the subset of all regulatory elementary steps for reaction 𝑗
Set of elementary reactions 𝑃 = {𝑝|𝑝 = 1,2, … , 𝑛𝑃}
Set of perturbation mutants 𝐶 = {𝑐|𝑐 = 1,2, … , 𝑛𝐶} with 𝑐 = 1 denoting the wild-type (WT, or
reference) network
$J_c^{meas} \subseteq J$ is the subset of all reactions with available flux measurements in perturbation
mutant 𝑐 ∈ 𝐶. The cardinality of $J_c^{meas}$ is $n_c^{meas}$.
Parameters
𝑺 is the metabolite stoichiometry matrix of dimensions [𝑛𝑀 × 𝑛𝑃] whose elements 𝑆𝑖𝑝
represent the stoichiometric coefficient of metabolite 𝑖 ∈ 𝐼 in elementary reaction 𝑝 ∈ 𝑃
𝑬 is the enzyme complex stoichiometry matrix of dimensions [𝑛𝐿 × 𝑛𝑃] whose elements
𝐸𝑙𝑝 represent the stoichiometric coefficient of enzyme complex (or free enzyme) 𝑙 ∈ 𝐿 in
elementary reaction 𝑝 ∈ 𝑃
$\boldsymbol{V}_c^{(meas)}$ is the [$n_c^{meas}$ × 1] vector of flux measurements in mutant 𝑐 ∈ 𝐶 whose elements
$V_{j,c}^{(meas)}$ represent the measured flux through reaction $j \in J_c^{meas}$ with standard deviation
$\sigma_{j,c}^{(meas)}$
$\boldsymbol{L}^{(net)}$ is the [𝑛𝑅 × 1] net flux mapping vector whose elements ($L_j^{(net)}$) store the index of
the last catalytic elementary step 𝑙 ∈ 𝐿 that quantifies the net flux through the overall
reaction 𝑗 ∈ 𝐽.
Variables
𝒌 is the [𝑛𝑃 × 1] vector of kinetic parameters whose elements 𝑘𝑝 denote the kinetic
parameter for elementary reaction 𝑝 ∈ 𝑃
𝒔 is the [𝑛𝑀 × 𝑛𝐶] matrix of relative metabolite concentrations whose elements $s_{i,c}$
represent the fold-change in concentration of metabolite 𝑖 ∈ 𝐼 in mutant 𝑐 ∈ 𝐶 relative to
WT. The 𝑐𝑡ℎ column, representing the [𝑛𝑀 × 1] vector of relative metabolite concentrations
in mutant 𝑐 ∈ 𝐶, is denoted as 𝒔𝑐.
𝒆 is the [𝑛𝐿 × 𝑛𝐶] matrix of enzyme fractions whose elements $e_{l,c}$ represent the fractional
abundance of enzyme complex 𝑙 ∈ 𝐿 in mutant 𝑐 ∈ 𝐶. The 𝑐𝑡ℎ column, representing the
[𝑛𝐿 × 1] vector of enzyme fractions for mutant 𝑐 ∈ 𝐶, is denoted as 𝒆𝑐. The number of
enzyme complexes is equal to the number of elementary steps as discussed earlier.
𝒗 is the [𝑛𝑃 × 𝑛𝐶] matrix of elementary fluxes whose elements $v_{p,c}$ denote the flux through
elementary reaction 𝑝 ∈ 𝑃 in mutant 𝑐 ∈ 𝐶. The 𝑐𝑡ℎ column, representing the [𝑛𝑃 × 1]
vector of elementary fluxes in mutant 𝑐 ∈ 𝐶, is denoted as 𝒗𝑐.
$\boldsymbol{v}^{(net)}$ is the [𝑛𝐿 × 𝑛𝐶] matrix of net elementary fluxes whose elements $v_{l,c}^{(net)}$ represent the
net flux through elementary step 𝑙 ∈ 𝐿 in mutant 𝑐 ∈ 𝐶. The 𝑐𝑡ℎ column, representing the
[𝑛𝐿 × 1] vector of net elementary fluxes in mutant 𝑐 ∈ 𝐶, is denoted as $\boldsymbol{v}_c^{(net)}$.
𝑽 is the [𝑛𝑅 × 𝑛𝐶] matrix of reaction fluxes whose elements $V_{j,c}$ denote the flux through
reaction 𝑗 ∈ 𝐽 in mutant 𝑐 ∈ 𝐶. The 𝑐𝑡ℎ column, representing the [𝑛𝑅 × 1] vector of
reaction fluxes in mutant 𝑐 ∈ 𝐶, is denoted as 𝑽𝑐.
In addition to these variable declarations the following three matrices are defined:
𝑹 is an [𝑛𝑅 × 𝑛𝐿] grouping matrix that indicates which enzyme complexes 𝑙 ∈ 𝐿 participate
in reaction 𝑗 ∈ 𝐽. It is defined as:
\[
R_{jl} = \begin{cases} 1, & \text{if } l \in \{L_j^{cat} \cup L_j^{reg}\} \\ 0, & \text{otherwise} \end{cases}
\]
𝑵 is an [𝑛𝑅 × 𝑛𝐿] indicator matrix that is used to map the net flux through elementary steps 𝑙
to the flux through the overall reaction 𝑗 ∈ 𝐽. Based on the convention established in
Section C.1 (Equation (28)), the last catalytic step serves as a measure of flux
through the overall reaction. It is defined as:
\[
N_{jl} = \begin{cases} 1, & \text{if } l = L_j^{(net)} \\ 0, & \text{otherwise} \end{cases}
\]
𝒁 is an [𝑛𝑅 × 𝑛𝐶] indicator matrix that maps the abundance of the enzyme catalyzing
reaction 𝑗 in mutant 𝑐 ∈ 𝐶 relative to its abundance in the WT strain. It is defined as:
\[
Z_{j,c} = \begin{cases} 0, & \text{if reaction } j \in J \text{ is eliminated under condition } c \in C \\ 1, & \text{otherwise} \end{cases}
\]
The definition of matrix 𝒁 implies that the mutant networks are derived by eliminating one
or more reactions from the metabolic network of the reference strain. This definition can
be generalized to incorporate other genetic perturbations such as over-expression and
down-regulation of gene expression. In the absence of proteomic data in mutant strains, we
assume that the enzymes maintain levels as in the WT.
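The three matrices can be assembled directly from the step sets. The following sketch uses a hypothetical toy network with two reactions, seven elementary steps, and two strains (WT plus one knockout); the variable names `cat_steps`, `reg_steps`, and `knockouts` are illustrative and not taken from the K-FIT code.

```python
import numpy as np

# Toy network (zero-based indices): reaction 0 has three catalytic steps;
# reaction 1 has three catalytic steps and one regulatory step.
cat_steps = [[0, 1, 2], [3, 4, 5]]
reg_steps = [[], [6]]
n_R, n_L, n_C = 2, 7, 2
knockouts = {1: [1]}  # mutant index -> reactions eliminated in that mutant

R = np.zeros((n_R, n_L), dtype=int)
N = np.zeros((n_R, n_L), dtype=int)
for j in range(n_R):
    R[j, cat_steps[j] + reg_steps[j]] = 1   # every enzyme form of reaction j
    N[j, cat_steps[j][-1]] = 1              # last catalytic step carries V_j

Z = np.ones((n_R, n_C), dtype=int)          # column 0 is the WT strain
for c, removed in knockouts.items():
    Z[removed, c] = 0

# Hypothetical WT enzyme fractions; each reaction's fractions sum to one,
# so R @ e_wt reproduces the WT column of Z.
e_wt = np.array([0.5, 0.3, 0.2, 0.4, 0.3, 0.2, 0.1])
print(R @ e_wt, Z[:, 0])
```

Setting an entry of `Z` to a value other than 0 or 1 would be one way to represent over-expression or down-regulation, as suggested by the generalization mentioned above.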
Least-squares minimization problem P1
Using the definitions introduced above the least-squares minimization problem for kinetic
parameterization is formulated for the general case as the following nonlinear optimization
problem:
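\[
\min_{\boldsymbol{k},\boldsymbol{e},\boldsymbol{s},\boldsymbol{v},\boldsymbol{V}} \; \phi = \sum_{c=1}^{n_C} \sum_{j \in J_c^{meas}} \left(\frac{V_{jc} - V_{jc}^{(meas)}}{\sigma_{jc}}\right)^{2}
\]
subject to:
\[
v_{p,c} = k_p \left(\prod_{l \,\mid\, E_{lp}<0} e_{l,c}\right)\left(\prod_{i \,\mid\, S_{ip}\le 0} s_{i,c}^{-S_{ip}}\right) \quad \forall p \in P,\ \forall c \in C \tag{30}
\]
\[
v_{l,c}^{(net)} = v_{(2l-1),c} - v_{2l,c} \quad \forall l \in L,\ \forall c \in C \tag{31}
\]
\[
\sum_{p=1}^{n_P} E_{lp} v_{p,c} = 0 \quad \forall l \in L,\ \forall c \in C \tag{32}
\]
\[
\sum_{l=1}^{n_L} R_{jl} e_{l,c} = Z_{jc} \quad \forall j \in J,\ \forall c \in C \tag{33}
\]
\[
\sum_{p=1}^{n_P} S_{ip} v_{p,c} = 0 \quad \forall i \in I,\ \forall c \in C \tag{34}
\]
\[
V_{j,c} = \sum_{l=1}^{n_L} N_{jl} v_{l,c}^{(net)} \quad \forall j \in J,\ \forall c \in C \tag{35}
\]
\[
s_{i,c} \ge 0 \quad \forall i \in I,\ \forall c \in C \tag{36}
\]
\[
s_{i,1} = 1 \quad \forall i \in I \tag{37}
\]
\[
0 \le e_{l,c} \le 1 \quad \forall l \in L,\ \forall c \in C \tag{38}
\]
\[
k_p \ge 0 \quad \forall p \in P \tag{39}
\]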
Equation (30) in the above formulation represents the rate law for any elementary reaction
governed by mass-action kinetics. It is a generalized form of Equation (25) accounting for
reaction rates across all mutants 𝑐 ∈ 𝐶. As discussed before, the role of the product
operators is to select the single enzyme complex and (possibly) metabolite participating in
the elementary reaction rate equation. Therefore, Equation (30) involves either a bilinear
term (product of enzyme fraction times a relative metabolite concentration) or linear term
(enzyme fraction term) in the right-hand side. Equations (32) and (34) enforce conservation
of mass across all enzyme complexes and metabolites, respectively. Equation (32) is an
extension of Equation (27) to include enzyme complex balances across all mutants.
Equation (33) ensures that the total amount of the enzyme in all of its forms catalyzing
reaction 𝑗 remains constant. It is a generalization of Equations (21) to account for enzyme
presence or absence in different mutants. Thus, the presence or absence of reaction 𝑗 in
mutant 𝑐 is captured by Equation (33). Equation (31) computes the net flux through any
elementary step based on the mapping of elementary reactions to elementary steps
established with Equations (1), (2) and (22). Equation (35) links the net flux through the
elementary steps (i.e., last catalytic step) of a reaction to the overall flux through reaction
𝑗. Equation (38) ensures that the enzyme fractional abundances are bounded between zero
and one. Equation (36) and (39) enforce non-negativity of relative metabolite
concentrations and kinetic parameters, respectively. Since all metabolite concentrations are
normalized with respect to the corresponding concentrations in the WT strain as described
in Equations (4), Equation (37) sets all relative concentrations for the WT strain (𝑐 = 1)
equal to one.
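For concreteness, the objective function 𝜙 of problem P1 can be evaluated as follows. This is a small sketch with made-up numbers: the boolean mask `measured` plays the role of the sets of measured reactions per mutant, and the array layout (reactions by mutants) is an assumption made here.

```python
import numpy as np


def weighted_ssr(V, V_meas, sigma, measured):
    """Sum over mutants and measured reactions of ((V - V_meas) / sigma) ** 2."""
    residuals = (V[measured] - V_meas[measured]) / sigma[measured]
    return float(np.sum(residuals ** 2))


# Two reactions (rows) in two strains (columns); reaction 1 is unmeasured in strain 1.
V = np.array([[10.0, 8.0], [5.0, 0.0]])            # predicted fluxes
V_meas = np.array([[9.0, 8.5], [5.5, np.nan]])     # measured fluxes
sigma = np.array([[1.0, 0.5], [0.5, np.nan]])      # measurement standard deviations
measured = np.array([[True, True], [True, False]])

phi = weighted_ssr(V, V_meas, sigma, measured)
print(phi)  # three unit-sized residuals, so phi = 3.0
```

Masking before dividing keeps unmeasured entries (stored here as NaN) out of the sum.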
Equation (30), involving (at most) bilinear terms, is the only set of nonlinear constraints in
NLP problem P1. This constraint renders the optimization formulation nonconvex, making
even the identification of a feasible point challenging, let alone convergence to the optimum
value. Therefore, any attempt to solve problem P1 using an off-the-shelf NLP solver such
as MINOS (Murtagh and Saunders, 1978), CONOPT (Drud, 1985), or fmincon from the
Optimization Toolbox in MATLAB™ is unlikely to succeed due to difficulties in
maintaining feasibility and progressively reduced step-length in the line-search.
Conceptually, this can be remedied by integrating Equations (32) and (34) to steady-state
after substituting the expression for elementary flux from Equation (30). However, this
tends to be rather time consuming (i.e., order of minutes) due to the stiffness of the
differential equations and the loss of accuracy arising from taking large time steps.
Furthermore, the inability to integrate Equations (32) and (34) to steady-state for some sets
of kinetic parameters results in the premature termination of any gradient-based
optimization algorithm. Therefore, past efforts in kinetic parameterization have relied on
meta-heuristic optimization algorithms such as genetic algorithms (Khodayari et al., 2014)
and particle swarm optimization (Millard et al., 2017). The lack of gradient information in
this class of methods limits efficient traversal of the kinetic space in search of an acceptable
solution, which may or may not be optimal or even near-optimal for the least-squares
objective function. This computational inefficiency in performing kinetic model
parameterization prevents any follow-up calculations to assess uncertainties in kinetic
parameters due to experimental errors or internal kinetic parameter dependencies. It is also
one of the contributing factors that have so far held back the parameterization of
large-scale kinetic models and their wide application in strain design.
Faced with these challenges, we put forth a customized procedure that can reliably identify
optimal or near-optimal kinetic model parameterizations while achieving orders of magnitude
improvement in computational time over stochastic approaches. The following subsections
describe strategies to transform problem P1 into a sequence of easier-to-solve
subproblems. These strategies form the basis of the kinetic parameterization
algorithm, K-FIT. K-FIT allows for the efficient solution of NLP problem P1 using three
main tasks/procedures:
I. Procedure K-SOLVE anchors kinetic parameters 𝑘𝑝 to the specified steady-
state flux distribution in the WT network 𝑽1 such that conservation of
mass across metabolites (Equation (34)), the pseudo-steady-state condition across all
enzyme complexes (Equation (32)), and normalization of metabolite concentrations
(Equation (37)) are simultaneously satisfied for the WT network. This is
accomplished by rearranging Equation (30) to express 𝒌 as a function of the WT
enzyme fractions 𝒆1 and the fluxes through the reverse elementary reactions 𝒗𝑟 ⊂ 𝒗1
while maintaining the relative metabolite concentrations 𝑠𝑖,1 = 1 ∀𝑖 ∈ 𝐼 (i.e., 𝒌 =
𝑓(𝒗𝑟 , 𝒆1)).
II. Procedure SSF-Evaluator computes the steady-state fluxes 𝑽𝑐 and relative
metabolite concentrations 𝒔𝑐 across all mutants (𝑐 > 1) using the kinetic
parameters 𝒌 computed in procedure K-SOLVE. Procedure SSF-Evaluator
decomposes the system of bilinear equations in 𝒔𝑐 and 𝒆𝑐 defined by Equations
(30), (32), (33), and (34) into two blocks of equations representing conservation of
mass across enzyme complexes and metabolites, respectively. The bilinear
equations become linear when either 𝒔𝑐 or 𝒆𝑐 is specified. When 𝒔𝑐 is
specified, Equations (32) and (33) form an exactly determined [𝑛𝐿 × 𝑛𝐿] system of
linear algebraic equations in 𝒆𝑐. Similarly, Equation (34) represents an exactly
determined [𝑛𝑀 × 𝑛𝑀] system of linear algebraic equations in 𝒔𝑐 when 𝒆𝑐 is
specified. SSF-Evaluator iterates between these two blocks, initially using a fixed-
point iteration (FPI) scheme (or Newton / Richardson extrapolation if needed), until
a steady-state is found. This strategy allows for the direct evaluation of both fluxes
and concentrations across all mutants that automatically satisfy all the nonlinear
equality constraints from problem P1 and leaves only linear (in)equalities in the
constraint set.
III. Procedure K-UPDATE computes the sensitivity of the net flux through all
reactions 𝑽 to WT enzyme fractions 𝒆1 and reverse elementary fluxes 𝒗𝑟, which is
then used to compute the approximate gradient 𝑮 and the approximate Hessian 𝑯 for the
objective function 𝜙. 𝑮 and 𝑯 are then used to check for optimality and update 𝒆1
and 𝒗𝑟 using a Newton step if optimality is not achieved. The updated values for
𝒆1 and 𝒗𝑟 are then fed to the K-SOLVE procedure, which evaluates updated kinetic
parameters 𝒌, and the calculation sequence described above is repeated.
The mathematical details and implementation of all the component subroutines of K-FIT
are described in the following subsections.
C.3. K-SOLVE: Anchoring kinetic parameters to the WT flux distribution
K-SOLVE computes a set of kinetic parameters 𝒌 that satisfy Equations (30) - (39) for the
WT network (𝑐 = 1) when the WT flux distribution 𝑽1, enzyme fractions 𝒆1 and non-
negative elementary fluxes 𝒗1 are specified. This anchoring is required because
conservation of mass across all enzyme fractions, mass balance across metabolites, and
normalization of metabolite concentrations may not be simultaneously satisfied. To
demonstrate this, we recast Equations (32), (33), and (34) after substituting the expression
for elementary fluxes in terms of mass-action kinetics described in Equation (30) and
setting 𝑠𝑖,1 = 1∀𝑖 ∈ 𝐼 based on Equation (37).
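\[
\sum_{l=1}^{n_L} R_{jl} e_{l,1} = 1 \quad \forall j \in J \tag{40}
\]
\[
\sum_{p=1}^{n_P} E_{lp} k_p \left(\prod_{l' \,\mid\, E_{l'p}<0} e_{l',1}\right) = 0 \quad \forall l \in L \tag{41}
\]
\[
\sum_{p=1}^{n_P} S_{ip} k_p \left(\prod_{l \,\mid\, E_{lp}<0} e_{l,1}\right) = 0 \quad \forall i \in I \tag{42}
\]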
Equations (40), (41), and (42) form an overdetermined system of (𝑛𝐿 + 𝑛𝑀) linear
algebraic equations in 𝑛𝐿 unknown enzyme fractions 𝒆1 when kinetic parameters 𝒌 are
specified. For arbitrary values of 𝒌, this system of equations will likely be infeasible,
indicating that not all possible values of the kinetic parameters 𝒌 simultaneously satisfy
conservation of mass across all metabolites and enzyme complexes. This necessitates the
development of the K-SOLVE procedure, which derives a link between 𝒌 and 𝒆1 so that
conservation of mass is always satisfied. This is achieved by rearranging Equation (30) for
the WT network and exploiting the property that the product term containing relative
metabolite concentrations, $\prod_{i \mid S_{ip}\le 0} s_{i,1}^{-S_{ip}}$, will always be equal to one because the
metabolite concentrations are scaled with respect to WT (i.e., 𝑠𝑖,1 = 1):
\[
k_p = v_{p,1} \left(\prod_{l \,\mid\, E_{lp}<0} e_{l,1}\right)^{-1} \quad \forall p \in P \tag{43}
\]
Note that Equation (43) reveals that 𝑘𝑝 can be uniquely determined when both 𝒗1 and 𝒆1
are specified. Of these variables, 𝒆1 is bounded between 0 and 1, and further constrained
by the following relations:
\[
\sum_{l=1}^{n_L} R_{jl} e_{l,1} = 1 \quad \forall j \in J \tag{40}
\]
\[
0 \le e_{l,1} \le 1 \quad \forall l \in L \tag{44}
\]
Equation (43) relates 2𝑛𝐿 kinetic parameters to 𝑛𝐿 enzyme fractions and 2𝑛𝐿 elementary
fluxes. Enzyme fractions are further constrained by Equation (40) and bounded as shown
in Equation (44), implying that there exist multiple value assignments for the 2𝑛𝐿
elementary fluxes that could yield the same 𝑘𝑝 values. The assignment of
values for the elementary fluxes 𝑣𝑝 is therefore not unique, and there exist unsatisfied degrees of
freedom because only a subset of 𝑣𝑝 are independent variables. The reason for this dependency
is the presence of pairs of forward and reverse fluxes that can assume infinitely many
combinations of values with the same net flux $v_l^{(net)}$ through the elementary step. We
extract an independent subset of 𝒗1 by arbitrarily selecting the reverse fluxes as the
independent variables and expressing the forward fluxes as a function of the reverse and net
fluxes. This requires the definition of two separate [𝑛𝐿 × 1] vectors 𝒗𝑓 and 𝒗𝑟 denoting
fluxes through forward and reverse elementary reactions, respectively, in the WT strain.
Elements of vectors 𝒗𝑓 and 𝒗𝑟 are mapped to the [2𝑛𝐿 × 1] vector of elementary fluxes in
the WT network (𝒗1) using Equations (45) and (46).
$$v_{f,l} = v_{2l-1,1} \qquad (45)$$
$$v_{r,l} = v_{2l,1} \qquad (46)$$
Because the net flux through an elementary step, $v_{l,1}^{(net)}$, is the difference between the
forward and reverse elementary fluxes, we obtain
$$v_{f,l} = v_{l,1}^{(net)} + v_{r,l} \qquad \forall l \in L \qquad (47)$$
The net flux through all elementary steps of an enzyme-catalyzed reaction in the WT strain
is related to the net flux through the reaction in the WT (𝑐 = 1) by Equations (48) and (49).
$$v_{l,1}^{(net)} = V_{j,1} \qquad \forall l \in L_j^{cat},\ \forall j \in J \qquad (48)$$
$$v_{l,1}^{(net)} = 0 \qquad \forall l \in L_j^{reg},\ \forall j \in J \qquad (49)$$
When 𝑽1 is specified, the values of the net fluxes $v_l^{(net)}$ through all elementary steps
(both catalytic and regulatory) can be recovered from Equations (48) and (49). These
values can then be substituted into Equation (47) to calculate 𝒗𝑓 for a given assignment of
values to the independent variables 𝒗𝑟. Since vector 𝑽1 stores the steady-state fluxes in the
WT, equality constraints representing conservation of mass across metabolites in the WT
in problem P1 are inherently satisfied.
Therefore, when $\boldsymbol{v}_1^{(net)}$, 𝒗𝑟, and 𝒆1 are specified, a unique set of kinetic parameters 𝒌 can
be obtained by solving the following [𝑛𝑃 × 𝑛𝑃] system of linear algebraic equations.
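$$\begin{aligned} v_{r,l} + v_l^{(net)} &= k_{2l-1}\left(\prod_{l':\,E_{l'p}<0} e_{l',1}\right) \\ v_{r,l} &= k_{2l}\left(\prod_{l':\,E_{l'p}<0} e_{l',1}\right) \end{aligned} \qquad \forall p \in P,\ \forall l \in L,\ \forall l' \in L \qquad (50)$$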
Note that the rate law expressions in Equation (50) are derived by setting the relative
metabolite concentrations in Equation (30) for WT to one. The vector of kinetic parameters
𝒌 is recovered through the following explicit relations (Equation (51)):
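$$\begin{aligned} k_{2l-1} &= \left(v_{r,l} + v_l^{(net)}\right)\left(\prod_{l':\,E_{l'p}<0} e_{l',1}\right)^{-1} \\ k_{2l} &= v_{r,l}\left(\prod_{l':\,E_{l'p}<0} e_{l',1}\right)^{-1} \end{aligned} \qquad \forall p \in P,\ \forall l \in L,\ \forall l' \in L \qquad (51)$$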
Values assumed by 𝒆1 and 𝒗𝑟 are constrained by the following (in)equalities:
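$$\sum_{l=1}^{n_L} R_{jl}\, e_{l,1} = 1 \qquad \forall j \in J \qquad (40)$$
$$0 \le e_{l,1} \le 1 \qquad \forall l \in L \qquad (44)$$
$$v_{r,l} \ge 0 \qquad \forall l \in L \qquad (52)$$
$$v_{r,l} + v_{l,1}^{(net)} \ge 0 \qquad \forall l \in L \qquad (53)$$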
Since all elementary fluxes and enzyme fractions are non-negative, non-negativity of the
kinetic parameters 𝒌 computed in Equation (51) is always guaranteed. The steps for
computing this feasible set of kinetic parameters are provided in the following algorithmic
description for K-SOLVE. K-SOLVE accepts WT enzyme fractions 𝒆1 and reverse
elementary fluxes 𝒗𝑟 as inputs and returns kinetic parameters 𝒌 as the output.
Algorithm procedure K-SOLVE
Begin
Specify and fix flux distribution in the WT strain 𝑽1.
Specify and fix 𝒆1 and 𝒗𝑟 satisfying Equations (40), (44), (52), and (53).
Set $v_{l,1}^{(net)}$ to $V_{j,1}$ $\forall l \in L_j^{cat}$, $\forall j \in J$
Set $v_{l,1}^{(net)}$ to 0 $\forall l \in L_j^{reg}$, $\forall j \in J$
Compute kinetic parameters 𝒌 by substituting 𝒆1, 𝒗𝑟, and $\boldsymbol{v}_1^{(net)}$ in Equation (51)
return 𝒌
end
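The K-SOLVE steps can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function name `k_solve`, the array layout (0-based column 2l of `E` is the forward direction of elementary step l and column 2l+1 its reverse), and the one-enzyme toy mechanism are all hypothetical. The sketch evaluates the enzyme-complex product of Equation (51) for every elementary step at once.

```python
import numpy as np

def k_solve(E, e1, v_r, v_net):
    """Sketch of K-SOLVE: kinetic parameters anchored to the WT state (Eq. 51).

    E     : (n_L, 2*n_L) enzyme-complex stoichiometry of the elementary steps
            (assumed layout: column 2l forward, column 2l+1 reverse, 0-based).
    e1    : (n_L,) WT enzyme-complex fractions.
    v_r   : (n_L,) reverse elementary fluxes (independent variables).
    v_net : (n_L,) net flux through each elementary step (Eqs. 48-49).
    """
    E = np.asarray(E, dtype=float)
    e1 = np.asarray(e1, dtype=float)
    v_r = np.asarray(v_r, dtype=float)
    v_net = np.asarray(v_net, dtype=float)

    v_f = v_net + v_r                                  # Eq. (47)
    if np.any(v_r < 0) or np.any(v_f < 0):             # Eqs. (52)-(53)
        raise ValueError("elementary fluxes must be non-negative")
    if np.any(e1 <= 0) or np.any(e1 > 1):              # Eq. (44), strictly positive here
        raise ValueError("enzyme fractions must lie in (0, 1]")

    # Interleave forward and reverse fluxes into the 2*n_L vector v_1.
    v1 = np.empty(2 * v_r.size)
    v1[0::2] = v_f
    v1[1::2] = v_r

    # Product of the fractions of all complexes consumed by each step (E_lp < 0).
    enzyme_term = np.prod(np.where(E < 0, e1[:, None], 1.0), axis=0)
    return v1 / enzyme_term                            # Eq. (51)

# Hypothetical one-enzyme mechanism with complexes (E, ES):
#   step 0: E + S <-> ES      step 1: ES <-> E + P
E_toy = np.array([[-1.0,  1.0,  1.0, -1.0],    # free enzyme E
                  [ 1.0, -1.0, -1.0,  1.0]])   # complex ES
e1_toy = np.array([0.4, 0.6])       # fractions sum to one (Eq. 40)
v_net_toy = np.array([1.0, 1.0])    # both steps carry the WT reaction flux
v_r_toy = np.array([0.5, 0.2])      # chosen reverse fluxes

k = k_solve(E_toy, e1_toy, v_r_toy, v_net_toy)
print(k)
```

Because the metabolite term equals one in the WT, multiplying each returned parameter by its enzyme term reproduces the WT elementary fluxes exactly, which is the anchoring property K-SOLVE relies on.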
C.4. SSF-Evaluator: Evaluation of steady-state fluxes for the mutant networks
using the kinetic parameter assignments of K-SOLVE
Having computed a set of kinetic parameters 𝒌 satisfying Equations (30) - (39) for the WT
strain (𝑐 = 1) using K-SOLVE, the objective of SSF-Evaluator is to compute the flux
distributions in the mutant strains. Typically, this is achieved by integrating the ODEs
describing conservation of mass across all metabolites and enzyme complexes. To
circumvent the unreliability and high computational cost associated with numerical
integration, we put forth a decomposition-based approach that leverages the bilinear
structure of the underlying system of equations. In this section, we derive updating
formulae for the metabolite concentrations (Equations (57), (59), and (72)) in response to
the altered enzyme concentrations compared to WT in the mutant networks (see Equation
(33)). These update formulae are then fed into the SSF-Evaluator procedure that evaluates
fluxes and metabolite concentrations in mutants when the kinetic parameters 𝒌 are
provided.
Substituting the expression for 𝑣𝑝,𝑐 from Equation (30) into Equations (32) and (34) poses
the metabolite and enzyme mass balances as functions of enzyme fractions 𝒆𝑐 and relative
metabolite concentrations 𝒔𝑐 across all mutant networks 𝑐 ∈ 𝐶, yielding Equations (54) and
(55), respectively:
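$$\sum_{p=1}^{n_P} E_{lp}\, k_p \left(\prod_{l':\,E_{l'p}<0} e_{l',c}\right)\left(\prod_{i:\,S_{ip}\le 0} s_{i,c}^{-S_{ip}}\right) = 0 \qquad \forall l \in L,\ \forall c \in C \qquad (54)$$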
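$$\sum_{p=1}^{n_P} S_{ip}\, k_p \left(\prod_{l:\,E_{lp}<0} e_{l,c}\right)\left(\prod_{i':\,S_{i'p}\le 0} s_{i',c}^{-S_{i'p}}\right) = 0 \qquad \forall i \in I,\ \forall c \in C \qquad (55)$$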
Equations (54) and (55) must be supplemented by Equation (33) that imposes that the sum
of the fractional abundance of all enzyme complexes of a particular enzyme must be equal
to the fold-change in the total enzyme level relative to WT. Thus, for every mutant network
𝑐, the enzyme fractions 𝒆𝑐 encode any changes to enzyme level by means of upregulation,
downregulation or absence as described by Equation (33).
$$\sum_{l=1}^{n_L} R_{jl}\, e_{l,c} = Z_{jc} \qquad \forall j \in J,\ \forall c \in C \qquad (33)$$
Equations (33) and (54) form a [𝑛𝐿 × 𝑛𝐿] system of linear algebraic equations of full rank
in 𝒆𝑐 that can efficiently be solved for the fractional enzyme complex abundances in all
mutant networks given the values for the relative metabolite concentrations 𝒔𝑐 and kinetic
parameters 𝒌 (Briggs and Haldane, 1925). It is important to note that the steady-state
enzyme fractions 𝒆𝑐 encode any changes to enzyme presence in mutant 𝑐 through Equation
(33).
Elementary binding and release steps bind only one reactant or release only one product at
a time. This ensures that the only possible exponent for the metabolite concentration term
in Equations (54) and (55) is equal to one. Equation (55) therefore simplifies to a system
of linear algebraic equations in 𝒔𝑐 when 𝒆𝑐 is specified and can be recast as:
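$$\sum_{p \in P:\,S_{ip}>0} S_{ip}\, k_p \left(\prod_{l:\,E_{lp}<0} e_{l,c}\right) + \sum_{p \in P:\,S_{ip}<0} S_{ip}\, k_p \left(\prod_{l:\,E_{lp}<0} e_{l,c}\right) s_{i,c} = 0 \qquad \forall i \in I,\ \forall c \in C \qquad (56)$$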
The relative metabolite concentrations can then be directly calculated from the following
explicit expression:
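$$s_{i,c} = -\,\frac{\displaystyle\sum_{p \in P:\,S_{ip}>0} S_{ip}\, k_p \left(\prod_{l:\,E_{lp}<0} e_{l,c}\right)}{\displaystyle\sum_{p \in P:\,S_{ip}<0} S_{ip}\, k_p \left(\prod_{l:\,E_{lp}<0} e_{l,c}\right)} \qquad \forall i \in I,\ \forall c \in C \qquad (57)$$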
Equation (57) relates relative metabolite concentrations 𝒔𝑐 to enzyme fractions 𝒆𝑐 at
metabolic steady-state for a given set of elementary step kinetic parameters 𝒌. When 𝒔𝑐
and 𝒆𝑐 do not represent steady-state relative metabolite concentrations and enzyme
fractions, the left-hand side of Equation (56) quantifies the mass imbalance of metabolite
𝑖 in network 𝑐 as shown in Equation (58).
$$\frac{d s_{i,c}}{dt} = \sum_{p \in P:\,S_{ip}>0} S_{ip}\, k_p \left(\prod_{l:\,E_{lp}<0} e_{l,c}\right) + \sum_{p \in P:\,S_{ip}<0} S_{ip}\, k_p \left(\prod_{l:\,E_{lp}<0} e_{l,c}\right) s_{i,c} \qquad \forall i \in I,\ \forall c \in C \qquad (58)$$
C.4.1. Fixed-Point Iteration (FPI)
In summary, the enzyme fractions 𝒆𝑐 can be computed from the kinetic parameters 𝒌 and
metabolite concentrations 𝒔𝑐 by solving the system of linear equations (33) and (54) and in
turn the computed enzyme fractions 𝒆𝑐 can be used to update metabolite concentrations 𝒔𝑐.
This establishes the following fixed-point iteration (FPI) procedure to solve for the
unknown concentrations 𝒔𝑐 and enzyme fractions 𝒆𝑐 given kinetic parameters 𝒌:
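Algorithmic Implementation of FPI
Begin
Specify and fix 𝒌
Set $stol := 10^{-6}$, $iter := 1$
Initialize $s_{i,c}^{(0)} := 1$, $\forall i \in I$ and $c \in C$
Compute $\boldsymbol{e}_c^{(iter)}$ by solving Equations (33) and (54) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(0)}$
Compute $\boldsymbol{s}_c^{(iter)}$ by solving Equation (57) with $\boldsymbol{e}_c := \boldsymbol{e}_c^{(iter)}$
Compute $d\boldsymbol{s}_c/dt$ by solving Equation (58) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(iter)}$ and $\boldsymbol{e}_c := \boldsymbol{e}_c^{(iter)}$
While $\|d\boldsymbol{s}_c/dt\|_\infty > stol$ and $\|\boldsymbol{s}_c^{(iter)} - \boldsymbol{s}_c^{(iter-1)}\|_\infty > 10^{-4}$
    $iter := iter + 1$
    Compute $\boldsymbol{e}_c^{(iter)}$ by solving Equations (33) and (54) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(iter-1)}$
    Compute $\boldsymbol{s}_c^{(iter)}$ by solving Equation (57) with $\boldsymbol{e}_c := \boldsymbol{e}_c^{(iter)}$
    Compute $d\boldsymbol{s}_c/dt$ by solving Equation (58) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(iter)}$ and $\boldsymbol{e}_c := \boldsymbol{e}_c^{(iter)}$
return $\boldsymbol{s}_c^{(FPI)} := \boldsymbol{s}_c^{(iter)}$ and $\boldsymbol{e}_c^{(FPI)} := \boldsymbol{e}_c^{(iter)}$
end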
It is important to note that the FPI algorithm has linear convergence which causes the
method to slow down as we approach metabolic steady-state. This can be accelerated by
switching to Newton’s method which has quadratic convergence. We switch to Newton’s
method when either the mass imbalance is within the specified threshold of 𝑠𝑡𝑜𝑙 or the
progress towards steady-state becomes too slow. This happens when the change in
metabolite concentrations between iterations falls below a pre-specified threshold of 10−4.
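A minimal Python sketch of the fixed-point iteration is given here for a hypothetical two-enzyme chain A(ext) → B → C(ext), in which B is the only balanced metabolite and the external metabolites are held at a relative concentration of one (absorbed into 𝒌). All function names, the array layout, the parameter values, and the use of a least-squares solve for the stacked Equations (33) and (54) are assumptions made for illustration, and unit exponents on metabolite terms are assumed throughout. The sketch iterates only until the mass imbalance of Equation (58) falls below `stol` and omits the switch to Newton's method.

```python
import numpy as np

def flux(k, E, S, e, s):
    """Mass-action elementary fluxes (Eq. 30), assuming unit exponents."""
    enz = np.prod(np.where(E < 0, e[:, None], 1.0), axis=0)
    met = np.prod(np.where(S < 0, s[:, None], 1.0), axis=0)
    return k * enz * met

def solve_enzymes(k, E, S, R, Z, s):
    """Solve the linear system of Eqs. (33) and (54) for e at fixed s."""
    met = np.prod(np.where(S < 0, s[:, None], 1.0), axis=0)
    # A[l, l'] sums E_lp * k_p * met_p over steps p that consume complex l'.
    A = (E * (k * met)) @ (E < 0).T.astype(float)
    M = np.vstack([A, R])                       # balances stacked on Eq. (33)
    b = np.concatenate([np.zeros(E.shape[0]), Z])
    return np.linalg.lstsq(M, b, rcond=None)[0]

def update_metabolites(k, E, S, e):
    """Explicit update of Eq. (57) at fixed e.

    Assumes every balanced metabolite is consumed by at least one step.
    """
    a = k * np.prod(np.where(E < 0, e[:, None], 1.0), axis=0)
    produced = np.clip(S, 0, None) @ a
    consumed = -np.clip(S, None, 0) @ a
    return produced / consumed

def fixed_point_iteration(k, E, S, R, Z, stol=1e-6, max_iter=500):
    s = np.ones(S.shape[0])                     # start from the WT scaling s = 1
    for _ in range(max_iter):
        e = solve_enzymes(k, E, S, R, Z, s)
        dsdt = S @ flux(k, E, S, e, s)          # mass imbalance, Eq. (58)
        if np.max(np.abs(dsdt)) <= stol:
            break
        s = update_metabolites(k, E, S, e)
    return s, e, dsdt

# Complexes: E1, E1A, E2, E2B. Steps (forward, reverse) per elementary reaction:
# E1+A<->E1A, E1A<->E1+B, E2+B<->E2B, E2B<->E2+C.
E = np.array([[-1, 1, 1, -1, 0, 0, 0, 0],
              [1, -1, -1, 1, 0, 0, 0, 0],
              [0, 0, 0, 0, -1, 1, 1, -1],
              [0, 0, 0, 0, 1, -1, -1, 1]], dtype=float)
S = np.array([[0, 0, 1, -1, -1, 1, 0, 0]], dtype=float)   # metabolite B only
R = np.array([[1, 1, 0, 0], [0, 0, 1, 1]], dtype=float)   # complex-to-enzyme map
k = np.array([2, 1, 1, 0, 1, 0, 1, 0], dtype=float)       # hypothetical values
Z = np.array([0.5, 1.0])      # mutant: enzyme 1 halved relative to WT

s_ss, e_ss, resid = fixed_point_iteration(k, E, S, R, Z)
print(s_ss, e_ss, resid)
```

For these parameter values the update reduces to s ← 0.25 (1 + s), so the iterates contract linearly towards s = 1/3 with a factor of 0.25 per iteration. This is the linear convergence that motivates switching to Newton's method near the steady state.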
C.4.2. Newton’s method for accelerating convergence
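Let $\boldsymbol{s}_c^{(FPI)}$ and $\boldsymbol{e}_c^{(FPI)}$ be current iterates that do not represent steady-state relative metabolite
concentrations and enzyme fractions, respectively. They can be used as starting points for
Newton’s method, where relative metabolite concentrations are updated in the $n$-th iteration
as:
$$\boldsymbol{s}_c^{(n+1)} = \boldsymbol{s}_c^{(n)} - \left(\frac{\partial}{\partial \boldsymbol{s}_c}\left(\frac{d\boldsymbol{s}_c}{dt}\right)\right)^{-1} \frac{d\boldsymbol{s}_c}{dt} \qquad (59)$$
In Equation (59), $d\boldsymbol{s}_c/dt$ is computed using Equation (58). The quantity $\partial(d\boldsymbol{s}_c/dt)/\partial \boldsymbol{s}_c$ represents
the Jacobian 𝑱 of the function $d\boldsymbol{s}_c/dt$ described in Equation (58), which can be recast in terms of
elementary fluxes as:
$$\frac{d s_{i,c}}{dt} = \sum_{p=1}^{P} S_{ip}\, v_{p,c} = 0 \qquad \forall i \in I,\ \forall c \in C \qquad (60)$$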
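Differentiating Equation (60) with respect to 𝒔𝑐 yields the Jacobian 𝑱:
$$J_{ii',c} = \frac{\partial}{\partial s_{i',c}}\left(\frac{d s_{i,c}}{dt}\right) = \sum_{p=1}^{P} S_{ip} \left(\frac{\partial v_{p,c}}{\partial s_{i',c}}\right) \qquad \forall i \in I,\ \forall c \in C,\ \forall i' \in I \qquad (61)$$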
Recall that 𝑣𝑝𝑐 is related to kinetic parameters 𝒌, enzyme fractions 𝒆𝑐, and relative
metabolite concentrations 𝒔𝑐 using the mass-action kinetics of Equation (30). The
sensitivity of 𝑣𝑝𝑐 to the relative metabolite concentrations is obtained by differentiating
Equation (30) with respect to 𝒔𝑐:
$$v_{p,c} = k_p \left(\prod_{l:\,E_{lp}<0} e_{l,c}\right)\left(\prod_{i:\,S_{ip}<0} s_{i,c}^{-S_{ip}}\right) \qquad \forall p \in P,\ \forall c \in C \qquad (30)$$
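$$\frac{\partial v_{p,c}}{\partial s_{i',c}} = \sum_{l:\,E_{lp}<0} k_p \left( \left(\prod_{i:\,S_{ip}<0} s_{i,c}^{-S_{ip}}\right) \frac{\partial}{\partial s_{i',c}}\left(\prod_{l:\,E_{lp}<0} e_{l,c}\right) + \left(\prod_{l:\,E_{lp}<0} e_{l,c}\right) \frac{\partial}{\partial s_{i',c}}\left(\prod_{i:\,S_{ip}<0} s_{i,c}^{-S_{ip}}\right) \right) \qquad \forall p \in P,\ \forall c \in C,\ \forall i \in I,\ \forall i' \in I \qquad (62)$$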
Since only one enzyme complex and (at most) one metabolite participates in any
elementary reaction, the derivatives in Equation (62) can be simplified as:
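$$\frac{\partial}{\partial s_{i',c}}\left(\prod_{l:\,E_{lp}<0} e_{l,c}\right) = -\sum_{l:\,E_{lp}\le 0} E_{lp} \left(\frac{\partial e_{l,c}}{\partial s_{i',c}}\right) \qquad \forall i' \in I,\ \forall p \in P,\ \forall c \in C \qquad (63)$$
$$\frac{\partial}{\partial s_{i',c}}\left(\prod_{i:\,S_{ip}\le 0} s_{i,c}^{-S_{ip}}\right) = -\sum_{i:\,S_{ip}\le 0} S_{ip} \left(\frac{\partial s_{i,c}}{\partial s_{i',c}}\right) \qquad \forall i' \in I,\ \forall p \in P,\ \forall c \in C \qquad (64)$$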
Equation (62) can therefore be simplified by substituting the expressions for the derivatives
in Equations (63) and (64) as:
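$$\frac{\partial v_{p,c}}{\partial s_{i',c}} = -k_p \left( \left(\prod_{i:\,S_{ip}<0} s_{i,c}^{-S_{ip}}\right) \left(\sum_{l:\,E_{lp}\le 0} E_{lp} \frac{\partial e_{l,c}}{\partial s_{i',c}}\right) + \left(\prod_{l:\,E_{lp}<0} e_{l,c}\right) \left(\sum_{i:\,S_{ip}\le 0} S_{ip} \frac{\partial s_{i,c}}{\partial s_{i',c}}\right) \right) \qquad \forall p \in P,\ \forall c \in C,\ \forall i' \in I \qquad (65)$$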
The partial derivatives $\partial e_{l,c}/\partial s_{i',c}$ must be computed to quantify the sensitivity of elementary
fluxes to substrate concentrations. This is achieved by differentiating Equations (33) and
(54) with respect to $s_{i',c}$:
$$\sum_{l=1}^{n_L} R_{jl}\, \frac{\partial e_{l,c}}{\partial s_{i',c}} = 0 \qquad \forall j \in J,\ \forall c \in C,\ \forall i' \in I \qquad (66)$$
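$$\sum_{p=1}^{P} \left( E_{lp}\, k_p \left(\prod_{i:\,S_{ip}\le 0} s_{i,c}\right) \sum_{l':\,E_{l'p}<0} \left(\frac{\partial e_{l',c}}{\partial s_{i',c}}\right) \right) - \sum_{p=1}^{P} \left( S_{i'p}\, E_{lp}\, k_p \left(\prod_{l':\,E_{l'p}<0} e_{l',c}\right) \right) = 0 \qquad \forall l \in L,\ \forall c \in C,\ \forall i' \in I \qquad (67)$$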
$\partial e_{l,c}/\partial s_{i',c}$ is computed by solving an exactly determined [𝑛𝐿 × 𝑛𝐿] system of linear algebraic
equations formed by Equations (66) and (67). The computed $\partial e_{l,c}/\partial s_{i',c}$ is then substituted in
Equation (65) to compute $\partial v_{p,c}/\partial s_{i',c}$, which is subsequently substituted in Equation (61) to
compute all elements in the Jacobian 𝑱. Having computed 𝑱, metabolite concentrations can
be updated using Equation (59) until the steady-state concentrations are reached or 𝑱
becomes singular. An alternative updating scheme for when 𝑱 becomes singular is detailed
in the following subsection. The following algorithm details the steps involved in the
identification of steady-state metabolite concentrations using Newton’s method.
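Algorithmic Implementation of Newton’s Method
Begin
Specify and fix 𝒌
Set $stol := 10^{-6}$, $iter := 1$
Initialize $s_{i,c}^{(iter)} := s_{i,c}^{(FPI)}$, $\forall i \in I$ and $c \in C$
Compute $\boldsymbol{e}_c^{(iter)}$ by solving Equations (33) and (54) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(iter)}$
Compute $d\boldsymbol{s}_c/dt$ by substituting $\boldsymbol{s}_c := \boldsymbol{s}_c^{(iter)}$ and $\boldsymbol{e}_c := \boldsymbol{e}_c^{(iter)}$ into Equation (58)
Compute $\partial \boldsymbol{e}_c/\partial \boldsymbol{s}$ by solving Equations (66) and (67) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(iter)}$ and $\boldsymbol{e}_c := \boldsymbol{e}_c^{(iter)}$
Compute $\partial \boldsymbol{v}_c/\partial \boldsymbol{s}$ by substituting $\partial \boldsymbol{e}_c/\partial \boldsymbol{s}$ into Equation (65)
Compute 𝑱 by substituting $\partial \boldsymbol{v}_c/\partial \boldsymbol{s}$ into Equation (61)
While ($\|d\boldsymbol{s}_c/dt\|_\infty > stol$) and 𝑱 is not singular
    $iter := iter + 1$
    Update $\boldsymbol{s}_c^{(iter)}$ by substituting $\partial(d\boldsymbol{s}_c/dt)/\partial \boldsymbol{s}_c = \boldsymbol{J}$ and $\boldsymbol{s}_c^{(n)} = \boldsymbol{s}_c^{(iter-1)}$ into Equation (59)
    Compute $\boldsymbol{e}_c^{(iter)}$ by solving Equations (33) and (54) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(iter)}$
    Compute $d\boldsymbol{s}_c/dt$ by substituting $\boldsymbol{s}_c := \boldsymbol{s}_c^{(iter)}$ and $\boldsymbol{e}_c := \boldsymbol{e}_c^{(iter)}$ into Equation (58)
    Compute $\partial \boldsymbol{e}_c/\partial \boldsymbol{s}$ by solving Equations (66) and (67) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(iter)}$ and $\boldsymbol{e}_c := \boldsymbol{e}_c^{(iter)}$
    Compute $\partial \boldsymbol{v}_c/\partial \boldsymbol{s}$ by substituting $\partial \boldsymbol{e}_c/\partial \boldsymbol{s}$ into Equation (65)
    Compute 𝑱 by substituting $\partial \boldsymbol{v}_c/\partial \boldsymbol{s}$ into Equation (61)
return $\boldsymbol{s}_c^{(NM)} := \boldsymbol{s}_c^{(iter)}$ and $\boldsymbol{e}_c^{(NM)} := \boldsymbol{e}_c^{(iter)}$
end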
On average, we find that 𝑱 becomes singular in only approximately 5% of all mutant flux
evaluations using SSF-Evaluator, thus requiring a different updating formula.
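The Newton update of Equation (59), together with the exit on a singular Jacobian, can be sketched as follows. This is an illustrative Python sketch, not the implementation used in this work. The residual `f` and Jacobian `jac` are supplied by the caller, whereas the procedure above assembles 𝑱 from Equations (61) to (67). Singularity is flagged through a condition-number threshold `cond_max`, which is an assumption, and the scalar toy residual is hypothetical.

```python
import numpy as np

def newton_steady_state(f, jac, s0, stol=1e-6, max_iter=50, cond_max=1e12):
    """Newton iteration s <- s - J^{-1} f(s) (Eq. 59).

    Returns the final iterate and a status string:
    "converged", "singular" (ill-conditioned J), or "max_iter".
    """
    s = np.asarray(s0, dtype=float).copy()
    for _ in range(max_iter):
        r = f(s)
        if np.max(np.abs(r)) <= stol:          # mass imbalance within stol
            return s, "converged"
        J = jac(s)
        # Treat a numerically singular Jacobian as a failure so the caller
        # can fall back to another update scheme.
        if not np.all(np.isfinite(J)) or np.linalg.cond(J) > cond_max:
            return s, "singular"
        s = s - np.linalg.solve(J, r)
    return s, "max_iter"

# Hypothetical scalar residual with steady state at s = 1/3:
# production 0.25, saturable consumption s / (1 + s).
def f_toy(s):
    return np.array([0.25 - s[0] / (1.0 + s[0])])

def jac_toy(s):
    return np.array([[-1.0 / (1.0 + s[0]) ** 2]])

s_nm, status = newton_steady_state(f_toy, jac_toy, s0=[0.5])
print(s_nm, status)
```

Starting from s = 0.5, the toy problem reaches the tolerance in a handful of iterations, illustrating the quadratic convergence that motivates the switch from FPI.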
C.4.3. Richardson’s Extrapolation when J becomes singular
If singularity of the Jacobian is detected, we switch to a semi-implicit first-order
integrator (Press et al., 2007b) using Richardson’s extrapolation, initializing the relative
metabolite concentrations at the current point ($\boldsymbol{s}_c^{(NM)}$). The update formula for the
metabolite concentrations (Equation (72)) is derived using the following procedure. The
initial value problem described by Equation (60) can be expressed in matrix form as:
$$\frac{d\boldsymbol{s}_c}{dt} = \boldsymbol{S}\,\boldsymbol{v}_c = \boldsymbol{f}(\boldsymbol{s}_c) \qquad \forall c \in C \qquad (68)$$
Equation (68) is integrated starting from the initial condition $\boldsymbol{s}(0) = \boldsymbol{s}_c^{(NM)}$, where $\boldsymbol{s}_c^{(NM)}$
is the vector of relative metabolite concentrations when Newton’s method fails (𝑱 becomes
singular). We use the implicit Euler method to update substrate concentrations 𝒔 upon
taking a time step of ℎ. This is due to the stiffness of the system of equations, which precludes
the use of a less costly explicit method. The update formula for the $n$-th iteration is:
$$\frac{\boldsymbol{s}_c^{(n+1)} - \boldsymbol{s}_c^{(n)}}{h} = \boldsymbol{f}\left(\boldsymbol{s}_c^{(n+1)}\right) \qquad \forall c \in C \qquad (69)$$
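Since $\boldsymbol{s}_c^{(n+1)}$ is unknown, $\boldsymbol{f}(\boldsymbol{s}_c^{(n+1)})$ cannot be evaluated a priori and must be approximated
using Taylor series expansion.
$$\boldsymbol{f}\left(\boldsymbol{s}_c^{(n+1)}\right) = \boldsymbol{f}\left(\boldsymbol{s}_c^{(n)}\right) + \frac{\partial \boldsymbol{f}}{\partial \boldsymbol{s}_c}\left(\boldsymbol{s}_c^{(n+1)} - \boldsymbol{s}_c^{(n)}\right) \qquad \forall c \in C \qquad (70)$$
Equation (70) is substituted back in Equation (69) to yield:
$$\boldsymbol{s}_c^{(n+1)} - \boldsymbol{s}_c^{(n)} = h\,\boldsymbol{f}\left(\boldsymbol{s}_c^{(n)}\right) + h\,\frac{\partial \boldsymbol{f}}{\partial \boldsymbol{s}_c}\left(\boldsymbol{s}_c^{(n+1)} - \boldsymbol{s}_c^{(n)}\right) \qquad \forall c \in C \qquad (71)$$
Equation (71) is rearranged to obtain the semi-implicit update formula for 𝒔𝑐:
$$\boldsymbol{s}_c^{(n+1)} = \boldsymbol{s}_c^{(n)} + \left(\boldsymbol{I} - h\,\frac{\partial \boldsymbol{f}}{\partial \boldsymbol{s}_c}\right)^{-1} h\,\boldsymbol{f}\left(\boldsymbol{s}_c^{(n)}\right) \qquad \forall c \in C \qquad (72)$$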
$\partial \boldsymbol{f}/\partial \boldsymbol{s}_c$ in Equation (72) is the Jacobian matrix 𝑱 also present in Equation (61) and is calculated
as described earlier. Equation (72) is integrated using the error-controlled integration
algorithm, Richardson extrapolation, until either the time step ℎ exceeds a maximum time
step of $h_{max}$ or the desired threshold on Equation (68) is reached ($\|d\boldsymbol{s}_c/dt\|_\infty \le stol$). If ℎ
exceeds $h_{max}$, Newton’s method is reinitialized using concentrations at the termination
point of the semi-implicit integration procedure ($\boldsymbol{s}_c^{(INT)}$) and solved until $\|d\boldsymbol{s}_c/dt\|_\infty \le stol$ is
achieved.
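Algorithmic Implementation of Semi-implicit integration using Richardson’s extrapolation
Begin
Specify and fix 𝒌
Set $stol := 10^{-6}$, $iter := 1$, $h := 2 \times 10^{-6}$, $h_{max} := 10^{10}$, $tol := 10^{-4}$
Initialize $s_{i,c}^{(iter)} := s_{i,c}^{(NM)}$, $\forall i \in I$ and $c \in C$
Compute $\boldsymbol{e}_c^{(iter)}$ by solving Equations (33) and (54) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(iter)}$
Compute $d\boldsymbol{s}_c/dt$ by substituting $\boldsymbol{s}_c := \boldsymbol{s}_c^{(iter)}$ and $\boldsymbol{e}_c := \boldsymbol{e}_c^{(iter)}$ into Equation (58)
Compute 𝑱 by solving Equations (61), (65), (66) and (67) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(iter)}$ and $\boldsymbol{e}_c := \boldsymbol{e}_c^{(iter)}$
Set $iter := iter + 1$
While ($\|d\boldsymbol{s}_c/dt\|_\infty > stol$) and $h < h_{max}$
    Compute $\boldsymbol{s}_c^{(n)}$ by substituting $\boldsymbol{s}_c^{(n-1)} := \boldsymbol{s}_c^{(iter-1)}$, $\boldsymbol{f}(\boldsymbol{s}_c^{(n-1)}) := d\boldsymbol{s}_c/dt$, $\partial \boldsymbol{f}/\partial \boldsymbol{s}_c := \boldsymbol{J}$, and $h := h$ into Equation (72)
    Set $\boldsymbol{s}_c^{(one\text{-}step)} := \boldsymbol{s}_c^{(n)}$
    Compute $\boldsymbol{s}_c^{(n)}$ by substituting $\boldsymbol{s}_c^{(n-1)} := \boldsymbol{s}_c^{(iter-1)}$, $\boldsymbol{f}(\boldsymbol{s}_c^{(n-1)}) := d\boldsymbol{s}_c/dt$, $\partial \boldsymbol{f}/\partial \boldsymbol{s}_c := \boldsymbol{J}$, and $h := h/2$ into Equation (72).
    Compute $\boldsymbol{e}_c^{(iter)}$ by solving Equations (33) and (54) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(n)}$
    Compute $d\boldsymbol{s}_c/dt$ by substituting $\boldsymbol{s}_c := \boldsymbol{s}_c^{(n)}$ and $\boldsymbol{e}_c := \boldsymbol{e}_c^{(iter)}$ into Equation (58).
    Compute 𝑱 by solving Equations (61), (65), (66) and (67) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(n)}$ and $\boldsymbol{e}_c := \boldsymbol{e}_c^{(iter)}$
    Compute $\boldsymbol{s}_c^{(n)}$ by substituting $\boldsymbol{s}_c^{(n-1)} := \boldsymbol{s}_c^{(n)}$, $\boldsymbol{f}(\boldsymbol{s}_c^{(n-1)}) := d\boldsymbol{s}_c/dt$, $\partial \boldsymbol{f}/\partial \boldsymbol{s}_c := \boldsymbol{J}$, and $h := h/2$ into Equation (72).
    Set $\boldsymbol{s}_c^{(two\text{-}step)} := \boldsymbol{s}_c^{(n)}$
    if ($\|\boldsymbol{s}_c^{(two\text{-}step)} - \boldsymbol{s}_c^{(one\text{-}step)}\|_\infty < tol$)
        Set $\boldsymbol{s}_c^{(iter)} := \boldsymbol{s}_c^{(two\text{-}step)}$
        Compute $\boldsymbol{e}_c^{(iter)}$ by solving Equations (33) and (54) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(iter)}$
        Compute $d\boldsymbol{s}_c/dt$ by substituting $\boldsymbol{s}_c := \boldsymbol{s}_c^{(iter)}$ and $\boldsymbol{e}_c := \boldsymbol{e}_c^{(iter)}$ into Equation (58).
        Compute 𝑱 by solving Equations (61), (65), (66) and (67) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(iter)}$ and $\boldsymbol{e}_c := \boldsymbol{e}_c^{(iter)}$
        Set $h := h \times \sqrt{tol} \,/\, \sqrt{\|\boldsymbol{s}_c^{(two\text{-}step)} - \boldsymbol{s}_c^{(one\text{-}step)}\|_\infty}$
        Set $iter := iter + 1$
    else
        Set $h := h/2$
return $\boldsymbol{s}_c^{(INT)} := \boldsymbol{s}_c^{(iter-1)}$ and $\boldsymbol{e}_c^{(INT)} := \boldsymbol{e}_c^{(iter-1)}$
end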
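The step-doubling logic of the semi-implicit integrator can be illustrated with the following Python sketch. It is a simplified sketch under stated assumptions rather than the implementation used in this work: `f` and `jac` are caller-supplied callables standing in for Equations (58) and (61), the cap of 10 on step growth and the `max_steps` guard are safeguards added here that are not part of the procedure above, and the scalar toy problem is hypothetical.

```python
import numpy as np

def semi_implicit_step(f_val, J, s, h):
    """One semi-implicit Euler step of size h (Eq. 72)."""
    return s + np.linalg.solve(np.eye(s.size) - h * J, h * f_val)

def integrate_to_steady_state(f, jac, s0, stol=1e-6, h=2e-6, hmax=1e10,
                              tol=1e-4, max_steps=10000):
    """Semi-implicit integration with step-doubling error control."""
    s = np.asarray(s0, dtype=float).copy()
    for _ in range(max_steps):
        fs = f(s)
        if np.max(np.abs(fs)) <= stol or h >= hmax:
            break
        Js = jac(s)
        one_step = semi_implicit_step(fs, Js, s, h)        # one step of size h
        half = semi_implicit_step(fs, Js, s, h / 2)        # two steps of size h/2
        two_step = semi_implicit_step(f(half), jac(half), half, h / 2)
        err = np.max(np.abs(two_step - one_step))
        if err < tol:
            s = two_step                                   # accept the finer solution
            # Grow the step as in the procedure above; the cap of 10 is an
            # added safeguard against a vanishing error estimate.
            h *= min(np.sqrt(tol / max(err, 1e-300)), 10.0)
        else:
            h /= 2                                         # reject and retry
    return s, h

# Hypothetical scalar system ds/dt = 0.25 - s / (1 + s), steady state s = 1/3.
def f_toy(s):
    return np.array([0.25 - s[0] / (1.0 + s[0])])

def jac_toy(s):
    return np.array([[-1.0 / (1.0 + s[0]) ** 2]])

s_int, h_final = integrate_to_steady_state(f_toy, jac_toy, s0=[1.0])
print(s_int, h_final)
```

The accepted step size grows as the residual shrinks, so most of the work is spent early in the trajectory. Near the steady state, a large step makes Equation (72) behave like a Newton update.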
For the large-scale kinetic model (k-ecoli307) parameterized in this study (see Results
section in the main manuscript), the average computation time required to evaluate steady-
state fluxes in mutants by FPI, Newton’s method, and semi-implicit integration was 10
seconds, 4 seconds, and 37 seconds, respectively. In contrast, steady-state flux evaluation
using numerical integration alone required over 6 minutes to achieve the same mass
imbalance of $10^{-3}$ mol%. CPU times are reported for an Intel i7 (4-core
processor, 2.6 GHz, 12 GB RAM) computer using a single-core implementation.
C.4.4. Integration of FPI, Newton’s method, and semi-implicit integration into a
single pipeline
The three separate methods of updating metabolite concentrations, (i) FPI, (ii) Newton’s
method, and (iii) semi-implicit integration, together with their switching criteria, are integrated into the
SSF-Evaluator procedure. SSF-Evaluator initially solves for steady-state concentrations
using FPI and switches to Newton’s method when the change in metabolite concentrations
between successive iterations falls below a pre-specified threshold of $10^{-4}$. Newton’s
method fails when the Jacobian 𝑱 becomes singular, which prompts the switch to semi-
implicit integration using Richardson’s extrapolation. The following summarizes in detail
the algorithmic steps involved:
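Algorithmic Implementation of Steady-State Flux Estimator (SSF-Evaluator)
Begin
Specify and fix 𝒌
Specify 𝑚𝑢𝑡𝑎𝑛𝑡 𝑐
Set $stol := 10^{-6}$
Initialize $s_{i,c}^{(init)} := 1$, $\forall i \in I$
Compute $\boldsymbol{e}_c^{(init)}$ by solving Equations (33) and (54) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(init)}$
Compute $d\boldsymbol{s}_c/dt$ by substituting $\boldsymbol{s}_c := \boldsymbol{s}_c^{(init)}$ and $\boldsymbol{e}_c := \boldsymbol{e}_c^{(init)}$ into Equation (58).
While ($\|d\boldsymbol{s}_c/dt\|_\infty > stol$)
    Compute $\boldsymbol{s}_c^{(FPI)}$ using the FPI algorithm with $\boldsymbol{s}_c^{(0)} = \boldsymbol{s}_c^{(init)}$
    Compute $\boldsymbol{e}_c^{(FPI)}$ by solving Equations (33) and (54) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(FPI)}$
    Compute $d\boldsymbol{s}_c/dt$ by substituting $\boldsymbol{s}_c := \boldsymbol{s}_c^{(FPI)}$ and $\boldsymbol{e}_c := \boldsymbol{e}_c^{(FPI)}$ into Equation (58).
    if ($\|d\boldsymbol{s}_c/dt\|_\infty \le stol$)
        Set $\boldsymbol{s}_c^{(SS)} := \boldsymbol{s}_c^{(FPI)}$
    else
        Set $\boldsymbol{s}_c^{(INT)} := \boldsymbol{s}_c^{(FPI)}$
        while ($\|d\boldsymbol{s}_c/dt\|_\infty > stol$)
            Compute $\boldsymbol{s}_c^{(NM)}$ using Newton’s method with $\boldsymbol{s}_c^{(0)} = \boldsymbol{s}_c^{(INT)}$
            Compute $\boldsymbol{e}_c^{(NM)}$ by solving Equations (33) and (54) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(NM)}$
            Compute $d\boldsymbol{s}_c/dt$ by substituting $\boldsymbol{s}_c := \boldsymbol{s}_c^{(NM)}$ and $\boldsymbol{e}_c := \boldsymbol{e}_c^{(NM)}$ into Equation (58).
            if ($\|d\boldsymbol{s}_c/dt\|_\infty > stol$)
                Compute $\boldsymbol{s}_c^{(INT)}$ using semi-implicit integration with $\boldsymbol{s}_c^{(0)} = \boldsymbol{s}_c^{(NM)}$
                Compute $\boldsymbol{e}_c^{(INT)}$ by solving Equations (33) and (54) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(INT)}$
                Compute $d\boldsymbol{s}_c/dt$ by substituting $\boldsymbol{s}_c := \boldsymbol{s}_c^{(INT)}$ and $\boldsymbol{e}_c := \boldsymbol{e}_c^{(INT)}$ into Equation (58).
                if ($\|d\boldsymbol{s}_c/dt\|_\infty \le stol$)
                    Set $\boldsymbol{s}_c^{(SS)} := \boldsymbol{s}_c^{(INT)}$
            else
                Set $\boldsymbol{s}_c^{(SS)} := \boldsymbol{s}_c^{(NM)}$
Compute $\boldsymbol{e}_c^{(SS)}$ by solving Equations (33) and (54) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(SS)}$
Compute steady-state fluxes $\boldsymbol{V}_c^{(SS)}$ by solving Equations (30), (31), and (35) with $\boldsymbol{s}_c := \boldsymbol{s}_c^{(SS)}$ and $\boldsymbol{e}_c := \boldsymbol{e}_c^{(SS)}$
return $\boldsymbol{V}_c^{(SS)}$, $\boldsymbol{s}_c^{(SS)}$ and $\boldsymbol{e}_c^{(SS)}$
end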
Overall, SSF-Evaluator provides an integrated procedure for calculating steady-state
relative metabolite concentrations and enzyme fractions across all mutant networks given
a set of kinetic parameters bypassing integration in almost all cases. Steady-state
elementary fluxes are then computed by substituting the known 𝒌, 𝒔(𝑆𝑆) and 𝒆(𝑆𝑆) into
Equation (30). Elementary fluxes are then related to the net flux through the reaction using
Equations (31) and (35).
It is important to note that SSF-Evaluator is parallelizable across all mutant networks as
reaction fluxes in any particular mutant are independent of metabolite concentrations and
enzyme abundances in any other mutant. Based on this, K-SOLVE and SSF-Evaluator
generate steady-state reaction fluxes 𝑽(𝑆𝑆), relative metabolite concentrations 𝒔(𝑆𝑆), and
enzyme fractions 𝒆(𝑆𝑆) across all mutants given enzyme fractions 𝒆1 and reverse
elementary fluxes 𝒗𝑟 for the WT.
C.5. NLP problem K-FIT
K-SOLVE allows for the calculation of the kinetic parameters as a function of the enzyme
fractions and reverse elementary fluxes in WT. The SSF-Evaluator procedure, in turn,
allows for the calculation of the relative metabolite concentrations and enzyme fractions
using as input the kinetic parameters estimated by K-SOLVE. This implies that metabolic
fluxes 𝑽𝑐 in the mutant networks can be expressed as implicit functions of 𝒆1 and 𝒗𝑟.
Executing procedures K-SOLVE and SSF-Evaluator allows for the calculation of the value
of these implicit functions 𝑽𝑐 = 𝑽𝑐(𝒆1, 𝒗𝑟).
This means that NLP problem P1 can be recast as the following NLP problem with only
linear constraints, described below. No equality constraints describing conservation of mass
across all metabolites and enzyme complexes in the WT need to be explicitly imposed
within K-FIT as they are implicitly enforced by K-SOLVE, which limits kinetic parameter
values to only those that simultaneously satisfy conservation of mass (for both enzymes
and metabolites) and concentration scaling with respect to WT. By propagating the
calculated 𝒌, SSF-Evaluator identifies fluxes 𝑽𝑐 across all mutants that automatically
satisfy conservation of mass constraints. The objective function 𝜙 as defined in K-FIT
below includes the sum of squared errors for steady-state fluxes in the mutant
networks only. Nevertheless, metabolite concentration measurements for the mutant networks,
whenever available, can be added to the objective function in a similar manner.
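$$\min_{\boldsymbol{e}_1,\,\boldsymbol{v}_r}\ \phi(\boldsymbol{e}_1, \boldsymbol{v}_r) = \sum_{c=2}^{C}\ \sum_{j \in J_c^{(meas)}} \left(\frac{V_{jc}(\boldsymbol{e}_1, \boldsymbol{v}_r) - v_{jc}^{(meas)}}{\sigma_{jc}}\right)^2$$
Subject to:
$$v_{l,1}^{(net)} = V_{j,1} \qquad \forall l \in L_j^{cat},\ \forall j \in J \qquad (48)$$
$$v_{l,1}^{(net)} = 0 \qquad \forall l \in L_j^{reg},\ \forall j \in J \qquad (49)$$
$$\sum_{l=1}^{n_L} R_{jl}\, e_{l,1} = 1 \qquad \forall j \in J \qquad (40)$$
$$0 \le e_{l,1} \le 1 \qquad \forall l \in L \qquad (44)$$
$$v_{r,l} \ge 0 \qquad \forall l \in L \qquad (52)$$
$$v_{l,1}^{(net)} + v_{r,l} \ge 0 \qquad \forall l \in L \qquad (53)$$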
Since all constraints in formulation K-FIT are linear, K-FIT can efficiently be solved using
a gradient-based method that requires as inputs the first- and second-order gradients of the
objective function with respect to the variables 𝒆1 and 𝒗𝑟 to construct the update formula
and check for convergence. The expressions that relate the approximate gradient and
Hessian to the sensitivity of the predicted steady-state fluxes can be derived by constructing
a quadratic approximation for the objective function 𝜙. The following procedure describes
the construction of the quadratic approximation of 𝜙 used to update 𝒆1 and 𝒗𝑟 at each
iteration of K-FIT.
C.6. K-UPDATE procedure that checks for convergence and updates kinetic
parameters using the approximate gradient and Hessian of 𝝓
The variables 𝒆1 and 𝒗𝑟 are first assembled for convenience into a single [2𝑛𝐿 × 1] vector
𝒙:
$$\boldsymbol{x} = \left[(\boldsymbol{e}_1)^T \,\middle|\, (\boldsymbol{v}_r)^T\right]^T \qquad (73)$$
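The objective function 𝜙 and the flux through reaction 𝑗 in mutant 𝑐 are recast as implicit
functions of 𝒙 as 𝜙(𝒙) and 𝑉𝑗,𝑐(𝒙). The objective function is expressed in vector form as
$$\phi(\boldsymbol{x}) = \left(\boldsymbol{V}(\boldsymbol{x}) - \boldsymbol{V}^{(meas)}\right)^T \boldsymbol{W}^{-1} \left(\boldsymbol{V}(\boldsymbol{x}) - \boldsymbol{V}^{(meas)}\right) \qquad (74)$$
𝑽(𝒙) is the [$n_{meas}$ × 1] vector of the calculated steady-state fluxes in mutants, where
$$n_{meas} = \sum_{c} \left|J_c^{(meas)}\right|$$
is the total number of measured fluxes (the sum of the cardinalities of $J_c^{(meas)}$).
$\boldsymbol{V}^{(meas)}$ is the [$n_{meas}$ × 1] vector of measured fluxes.
𝑾 is the [$n_{meas}$ × $n_{meas}$] diagonal matrix storing the variance of the flux measurements,
thus
$$W_{ii} = \sigma_i^{2} \qquad \forall i \in \{1, 2, \ldots, n_{meas}\}$$
Upon defining the residual $\boldsymbol{r}(\boldsymbol{x}) = \boldsymbol{V}(\boldsymbol{x}) - \boldsymbol{V}^{(meas)}$, the objective function is expressed
more compactly as
$$\phi(\boldsymbol{x}) = \left(\boldsymbol{r}(\boldsymbol{x})\right)^T \boldsymbol{W}^{-1} \boldsymbol{r}(\boldsymbol{x}) \qquad (75)$$
For a small perturbation ∆𝒙 to the parameter vector 𝒙, the objective function at 𝒙 + ∆𝒙
becomes equal to
$$\phi(\boldsymbol{x} + \Delta\boldsymbol{x}) = \left(\boldsymbol{r}(\boldsymbol{x} + \Delta\boldsymbol{x})\right)^T \boldsymbol{W}^{-1} \boldsymbol{r}(\boldsymbol{x} + \Delta\boldsymbol{x}) \qquad (76)$$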
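Equation (75) is identical to the least squares representation of isotope tracer-based flux
elucidation using 13C-MFA (Antoniewicz et al., 2006). A popular and successful solution
strategy involves constructing a quadratic approximation of the objective function
described by Equation (76). Using Taylor series expansion, 𝒓(𝒙 + ∆𝒙) is linearized about 𝒙
as described by Antoniewicz et al. (2006):
$$\boldsymbol{r}(\boldsymbol{x} + \Delta\boldsymbol{x}) = \boldsymbol{r}(\boldsymbol{x}) + \frac{\partial \boldsymbol{r}}{\partial \boldsymbol{x}}\,\Delta\boldsymbol{x} \qquad (77)$$
$\partial \boldsymbol{r}/\partial \boldsymbol{x}$ is the [$n_{meas}$ × 2𝑛𝐿] matrix representing the local sensitivity of 𝒓(𝒙) with respect to 𝒙.
𝜙(𝒙 + ∆𝒙) is computed by substituting Equation (77) in Equation (76), yielding:
$$\begin{aligned} \phi(\boldsymbol{x} + \Delta\boldsymbol{x}) &= \left(\boldsymbol{r}(\boldsymbol{x} + \Delta\boldsymbol{x})\right)^T \boldsymbol{W}^{-1} \boldsymbol{r}(\boldsymbol{x} + \Delta\boldsymbol{x}) \\ &= \left(\boldsymbol{r}(\boldsymbol{x}) + \frac{\partial \boldsymbol{r}}{\partial \boldsymbol{x}}\Delta\boldsymbol{x}\right)^T \boldsymbol{W}^{-1} \left(\boldsymbol{r}(\boldsymbol{x}) + \frac{\partial \boldsymbol{r}}{\partial \boldsymbol{x}}\Delta\boldsymbol{x}\right) \\ &= \left(\boldsymbol{r}(\boldsymbol{x})\right)^T \boldsymbol{W}^{-1} \boldsymbol{r}(\boldsymbol{x}) + 2(\Delta\boldsymbol{x})^T \left(\frac{\partial \boldsymbol{r}}{\partial \boldsymbol{x}}\right)^T \boldsymbol{W}^{-1} \boldsymbol{r}(\boldsymbol{x}) + (\Delta\boldsymbol{x})^T \left(\frac{\partial \boldsymbol{r}}{\partial \boldsymbol{x}}\right)^T \boldsymbol{W}^{-1} \frac{\partial \boldsymbol{r}}{\partial \boldsymbol{x}}\,\Delta\boldsymbol{x} \end{aligned} \qquad (78)$$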
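The approximate gradient 𝑮 and the approximate Hessian 𝑯 are defined using Equation
(79).
$$\boldsymbol{G} = \left(\frac{\partial \boldsymbol{r}}{\partial \boldsymbol{x}}\right)^T \boldsymbol{W}^{-1} \boldsymbol{r}(\boldsymbol{x}), \qquad \boldsymbol{H} = \left(\frac{\partial \boldsymbol{r}}{\partial \boldsymbol{x}}\right)^T \boldsymbol{W}^{-1} \frac{\partial \boldsymbol{r}}{\partial \boldsymbol{x}} \qquad (79)$$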
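Upon replacing the relevant terms in Equation (78) using the definitions of the objective
function 𝜙(𝒙) from Equation (75) and the approximate gradient and Hessian from
Equation (79), Equation (78) simplifies to
$$\phi(\boldsymbol{x} + \Delta\boldsymbol{x}) = \phi(\boldsymbol{x}) + 2\,\Delta\boldsymbol{x}^T \boldsymbol{G} + \Delta\boldsymbol{x}^T \boldsymbol{H}\,\Delta\boldsymbol{x} \qquad (80)$$
Equation (80) is the local quadratic approximation (Antoniewicz et al., 2006) of the
objective function 𝜙(𝒙). In the above expression, $\boldsymbol{G} = (\partial \boldsymbol{r}/\partial \boldsymbol{x})^T \boldsymbol{W}^{-1} \boldsymbol{r}(\boldsymbol{x})$ and
$\boldsymbol{H} = (\partial \boldsymbol{r}/\partial \boldsymbol{x})^T \boldsymbol{W}^{-1} (\partial \boldsymbol{r}/\partial \boldsymbol{x})$ are the approximate gradient and Hessian, respectively. Upon subtracting
Equation (75) from Equation (80) we obtain:
$$\Delta\phi = \phi(\boldsymbol{x} + \Delta\boldsymbol{x}) - \phi(\boldsymbol{x}) = 2\,\Delta\boldsymbol{x}^T \boldsymbol{G} + \Delta\boldsymbol{x}^T \boldsymbol{H}\,\Delta\boldsymbol{x} \qquad (81)$$
A stationary point (i.e., local minimum) for the (approximated) objective function is
reached when $d(\Delta\phi)/d(\Delta\boldsymbol{x}) = 0$, which yields:
$$\Delta\boldsymbol{x} = -\boldsymbol{H}^{-1}\boldsymbol{G} \qquad (82)$$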
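Equation (82) computes the unconstrained search direction at each iteration. Note that $\partial \boldsymbol{r}/\partial \boldsymbol{x}$
is needed to update 𝒙. Because the residual vector 𝒓(𝒙) only contains steady-state fluxes,
$\partial \boldsymbol{r}/\partial \boldsymbol{x}$ is assembled using the sensitivity of fluxes to 𝒙 based on the chain rule:
$$\frac{\partial \boldsymbol{v}_c}{\partial \boldsymbol{x}} = \frac{\partial \boldsymbol{v}_c}{\partial \boldsymbol{k}}\,\frac{\partial \boldsymbol{k}}{\partial \boldsymbol{x}} \qquad (83)$$
$\partial \boldsymbol{v}/\partial \boldsymbol{k}$ is computed by differentiating Equation (30) with respect to 𝒌 to yield: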
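$$\frac{\partial v_{p,c}}{\partial \boldsymbol{k}} = k_p \left( \left(\prod_{i:\,S_{ip}\le 0} s_{i,c}^{-S_{ip}}\right) \frac{\partial}{\partial \boldsymbol{k}}\left(\prod_{l:\,E_{lp}<0} e_{l,c}\right) + \left(\prod_{l:\,E_{lp}<0} e_{l,c}\right) \frac{\partial}{\partial \boldsymbol{k}}\left(\prod_{i:\,S_{ip}\le 0} s_{i,c}^{-S_{ip}}\right) \right) + \left(\prod_{i:\,S_{ip}\le 0} s_{i,c}^{-S_{ip}}\right)\left(\prod_{l:\,E_{lp}<0} e_{l,c}\right) \frac{\partial k_p}{\partial \boldsymbol{k}} \qquad \forall p \in P,\ \forall c \in C,\ \forall i \in I \qquad (84)$$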
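In Equation (84), both $\partial \boldsymbol{e}_c/\partial \boldsymbol{k}$ and $\partial \boldsymbol{s}_c/\partial \boldsymbol{k}$ are unknown. They can be inferred by solving the system
of linear algebraic equations formed by differentiating Equations (33), (54), and (56),
respectively, with respect to 𝒌 as follows:
$$\sum_{l=1}^{n_L} R_{jl}\, \frac{\partial e_{l,c}}{\partial \boldsymbol{k}} = 0 \qquad \forall c \in C,\ \forall j \in J \qquad (85)$$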
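$$\sum_{p=1}^{P} E_{lp} \left( k_p \left( \left(\prod_{i:\,S_{ip}\le 0} s_{i,c}^{-S_{ip}}\right) \frac{\partial}{\partial \boldsymbol{k}}\left(\prod_{l':\,E_{l'p}<0} e_{l',c}\right) + \left(\prod_{l':\,E_{l'p}<0} e_{l',c}\right) \frac{\partial}{\partial \boldsymbol{k}}\left(\prod_{i:\,S_{ip}\le 0} s_{i,c}^{-S_{ip}}\right) \right) + \left(\prod_{i:\,S_{ip}\le 0} s_{i,c}^{-S_{ip}}\right)\left(\prod_{l':\,E_{l'p}<0} e_{l',c}\right) \frac{\partial k_p}{\partial \boldsymbol{k}} \right) = 0 \qquad \forall l \in L,\ \forall c \in C,\ \forall i \in I \qquad (86)$$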
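$$\sum_{p=1}^{P} S_{ip} \left( k_p \left( \left(\prod_{i':\,S_{i'p}\le 0} s_{i',c}^{-S_{i'p}}\right) \frac{\partial}{\partial \boldsymbol{k}}\left(\prod_{l:\,E_{lp}<0} e_{l,c}\right) + \left(\prod_{l:\,E_{lp}<0} e_{l,c}\right) \frac{\partial}{\partial \boldsymbol{k}}\left(\prod_{i':\,S_{i'p}\le 0} s_{i',c}^{-S_{i'p}}\right) \right) + \left(\prod_{i':\,S_{i'p}\le 0} s_{i',c}^{-S_{i'p}}\right)\left(\prod_{l:\,E_{lp}<0} e_{l,c}\right) \frac{\partial k_p}{\partial \boldsymbol{k}} \right) = 0 \qquad \forall l \in L,\ \forall c \in C,\ \forall i \in I \qquad (87)$$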
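The partial derivatives of the product operators, $\frac{\partial}{\partial \boldsymbol{k}}\left(\prod_{l:\,E_{lp}<0} e_{l,c}\right)$ and $\frac{\partial}{\partial \boldsymbol{k}}\left(\prod_{i:\,S_{ip}\le 0} s_{i,c}^{-S_{ip}}\right)$, can be
expressed in summation form as shown in Equations (63) and (64). Equations (85), (86),
and (87) therefore form a [($n_L + n_M$) × ($n_L + n_M$)] system of linear algebraic equations
that can be solved to obtain $\partial \boldsymbol{e}_c/\partial \boldsymbol{k}$ and $\partial \boldsymbol{s}_c/\partial \boldsymbol{k}$ when 𝒌, 𝒆𝑐, and 𝒔𝑐 are specified.
$\partial \boldsymbol{v}_c/\partial \boldsymbol{k}$ is calculated by substituting $\partial \boldsymbol{e}_c/\partial \boldsymbol{k}$ and $\partial \boldsymbol{s}_c/\partial \boldsymbol{k}$ into Equation (84). Because 𝒙 contains both WT enzyme
fractions and elementary fluxes, $\partial \boldsymbol{k}/\partial \boldsymbol{x}$ is calculated by differentiating Equation (50)
with respect to 𝒙 using the product rule to yield: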
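$$\begin{aligned} \frac{\partial}{\partial \boldsymbol{x}}\left(v_{r,l} + v_{l,1}^{(net)}\right) &= \frac{\partial k_{2l-1}}{\partial \boldsymbol{x}} \left(\prod_{l':\,E_{l'p}<0} e_{l',1}\right) + k_{2l-1}\, \frac{\partial}{\partial \boldsymbol{x}}\left(\prod_{l':\,E_{l'p}<0} e_{l',1}^{-E_{l'p}}\right) \\ \frac{\partial v_{r,l}}{\partial \boldsymbol{x}} &= \frac{\partial k_{2l}}{\partial \boldsymbol{x}} \left(\prod_{l':\,E_{l'p}<0} e_{l',1}\right) + k_{2l}\, \frac{\partial}{\partial \boldsymbol{x}}\left(\prod_{l':\,E_{l'p}<0} e_{l',1}^{-E_{l'p}}\right) \end{aligned} \qquad \forall p \in P,\ \forall l \in L \qquad (88)$$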
Solution of the [𝑛𝑃 × 𝑛𝑃] square system of linear algebraic equations formed by Equation
(88) yields $\partial \boldsymbol{k}/\partial \boldsymbol{x}$. Flux sensitivities can be obtained by substituting $\partial \boldsymbol{k}/\partial \boldsymbol{x}$ in Equation (83).
Having computed the sensitivity of elementary fluxes, the sensitivity of all net reaction
fluxes is calculated by substituting $\partial \boldsymbol{v}/\partial \boldsymbol{x}$ in Equations (89) and (90), which are obtained by
differentiating Equations (31) and (35) with respect to 𝒙 as shown below.
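$$\frac{\partial v_{l,c}^{(net)}}{\partial \boldsymbol{x}} = \frac{\partial v_{(2l-1),c}}{\partial \boldsymbol{x}} - \frac{\partial v_{2l,c}}{\partial \boldsymbol{x}} \qquad \forall l \in L,\ \forall c \in C \qquad (89)$$
$$\frac{\partial V_{j,c}}{\partial \boldsymbol{x}} = \sum_{l=1}^{n_L} N_{jl}\, \frac{\partial v_{l,c}^{(net)}}{\partial \boldsymbol{x}} \qquad \forall j \in J,\ \forall c \in C \qquad (90)$$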
The sequence of steps to be followed to compute the approximate gradients of the objective
function and update the variables 𝒙 in every iteration is described by the algorithmic
procedure for K-UPDATE.
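Algorithmic procedure K-UPDATE
begin
Specify and fix 𝒌, 𝒆, 𝒔, and 𝑽 computed by K-SOLVE and SSF-Evaluator
Specify measured fluxes $\boldsymbol{V}^{(meas)}$ and the weighting matrix 𝑾
Specify list of 𝑚𝑢𝑡𝑎𝑛𝑡𝑠
Compute $\partial \boldsymbol{k}/\partial \boldsymbol{x}$ by solving the [𝑛𝑃 × 𝑛𝑃] system of linear Equation (88) using $\boldsymbol{e}_1 := \boldsymbol{e}_1^{(SS)}$ and $\boldsymbol{k} := \boldsymbol{k}$
for all mutants:
    Calculate sensitivities $\partial \boldsymbol{s}_c^{(SS)}/\partial \boldsymbol{k}$ and $\partial \boldsymbol{e}_c^{(SS)}/\partial \boldsymbol{k}$ by solving the [($n_L + n_M$) × ($n_L + n_M$)] system of linear Equations (85), (86), and (87) using $\boldsymbol{e}_c := \boldsymbol{e}_c^{(SS)}$ and $\boldsymbol{s}_c := \boldsymbol{s}_c^{(SS)}$
    Calculate $\partial \boldsymbol{v}_c^{(SS)}/\partial \boldsymbol{k}$ by substituting $\partial \boldsymbol{s}_c^{(SS)}/\partial \boldsymbol{k}$ and $\partial \boldsymbol{e}_c^{(SS)}/\partial \boldsymbol{k}$ in Equation (84).
    Calculate $\partial \boldsymbol{v}_c^{(SS)}/\partial \boldsymbol{x}$ by substituting $\partial \boldsymbol{v}_c^{(SS)}/\partial \boldsymbol{k}$ and $\partial \boldsymbol{k}/\partial \boldsymbol{x}$ into Equation (83).
    Calculate $\partial \boldsymbol{v}_c^{(net)}/\partial \boldsymbol{x}$ by substituting $\partial \boldsymbol{v}_c^{(SS)}/\partial \boldsymbol{x}$ into Equation (89).
    Calculate $\partial \boldsymbol{V}_c/\partial \boldsymbol{x}$ by substituting $\partial \boldsymbol{v}_c^{(net)}/\partial \boldsymbol{x}$ into Equation (90).
Assemble the residual vector 𝒓(𝒙) from 𝑽 and $\boldsymbol{V}^{(meas)}$
Assemble the sensitivity matrix $\partial \boldsymbol{r}/\partial \boldsymbol{x}$ from $\partial \boldsymbol{V}/\partial \boldsymbol{x}$
Compute the objective function 𝜙 by substituting 𝒓(𝒙) and 𝑾 into Equation (75).
Compute the approximate gradient 𝑮 and the approximate Hessian 𝑯 by substituting 𝒓(𝒙), $\partial \boldsymbol{r}/\partial \boldsymbol{x}$, and 𝑾 into Equation (79)
return 𝜙, 𝑮, and 𝑯
end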
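The final assembly steps of K-UPDATE, together with the search direction of Equation (82), can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function name `gauss_newton_step` is hypothetical, 𝑾 is taken to be the diagonal matrix of measurement variances, and the sensitivity matrix is supplied directly by the caller rather than assembled from Equations (83) to (90). The toy residual is linear in 𝒙, so a single step lands on the weighted least-squares minimum.

```python
import numpy as np

def gauss_newton_step(r, dr_dx, variances):
    """Objective (Eq. 75), approximate gradient and Hessian (Eq. 79),
    and the unconstrained search direction (Eq. 82)."""
    W_inv = np.diag(1.0 / np.asarray(variances, dtype=float))
    phi = r @ W_inv @ r                  # weighted sum of squared residuals
    G = dr_dx.T @ W_inv @ r              # approximate gradient
    H = dr_dx.T @ W_inv @ dr_dx          # approximate (Gauss-Newton) Hessian
    dx = -np.linalg.solve(H, G)          # assumes H is non-singular
    return phi, G, H, dx

# Hypothetical linear model: predicted "fluxes" A @ x, measurements b.
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])
b = np.array([1.0, 2.0, 3.5])
variances = np.array([1.0, 1.0, 0.25])   # sigma_i^2 for each measurement

def residual(x):
    return A @ x - b

x0 = np.zeros(2)
phi0, G0, H0, dx = gauss_newton_step(residual(x0), A, variances)
x1 = x0 + dx
phi1, G1, H1, _ = gauss_newton_step(residual(x1), A, variances)
print(x1, phi0, phi1)
```

In K-FIT the residual is nonlinear in 𝒙, so the step is recomputed at every iteration. The sketch also ignores the linear constraints of Equations (40), (44), (52), and (53), which a full implementation would have to respect.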
C.7. Algorithmic description of K-FIT
The procedures K-SOLVE, SSF-Evaluator, and K-UPDATE are integrated into the
algorithm K-FIT as described below. Briefly, WT enzyme fractions 𝒆1 and reverse
elementary fluxes 𝒗𝑟 satisfying (in)equalities in Equations (40), (44), (52), and (53) are
randomly initialized. For convenience, we combine the operations of K-SOLVE and SSF-
Evaluator into a single algorithm, FLUXSOLVE, which predicts steady-state fluxes in mutant
networks given 𝒆1 and 𝒗𝑟. In the first step of FLUXSOLVE, kinetic parameters 𝒌 anchored
to WT steady-state fluxes 𝑽1 are computed from 𝒆1 and 𝒗𝑟 using procedure K-SOLVE.
The kinetic parameters are then used to evaluate steady-state fluxes in mutant networks
using procedure SSF-Evaluator. Having computed steady-state fluxes in mutants 𝑽,
relative metabolite concentrations 𝒔, and enzyme fractions 𝒆, the objective function 𝜙 and
its approximate gradient 𝑮 and Hessian 𝑯 are computed using procedure K-UPDATE. 𝑮
and 𝑯 are used to check for convergence and update 𝒆1 and 𝒗𝑟 if optimality is not achieved.
The algorithmic description of FLUXSOLVE is provided below:
Algorithmic description of FLUXSOLVE
begin
Specify and fix WT enzyme fractions 𝒆1 and reverse elementary fluxes 𝒗𝑟.
Specify and fix the WT steady-state flux distribution 𝑽1.
Specify the list of 𝑚𝑢𝑡𝑎𝑛𝑡𝑠
Compute anchored kinetic parameters 𝒌 using Procedure K-SOLVE with the specified 𝒆1,
𝒗𝑟, and 𝑽1.
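for all mutants
    Compute steady-state fluxes $\boldsymbol{V}_c^{(SS)}$, relative metabolite concentrations $\boldsymbol{s}_c^{(SS)}$, and
    enzyme fractions $\boldsymbol{e}_c^{(SS)}$ in 𝑚𝑢𝑡𝑎𝑛𝑡 𝑐 ∈ 𝐶 using SSF-Evaluator with kinetic
    parameters 𝒌
    Set $\boldsymbol{V}_c := \boldsymbol{V}_c^{(SS)}$, $\boldsymbol{s}_c := \boldsymbol{s}_c^{(SS)}$, and $\boldsymbol{e}_c := \boldsymbol{e}_c^{(SS)}$
return 𝑽, 𝒔, and 𝒆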
end
The overall workflow for the K-FIT algorithm combining procedures FLUXSOLVE and
K-UPDATE is described below and is also pictorially shown in Figure 4.4:
Overall algorithmic procedure K-FIT
begin
Specify and fix WT flux distribution 𝑽1, measured fluxes $\boldsymbol{V}^{(meas)}$, variance 𝑾,
set of mutants, 𝑥𝑡𝑜𝑙 and 𝑔𝑡𝑜𝑙
Randomly initialize 𝒙 satisfying constraints in Equations (40), (44), (52), and (53).
Set $stol := 10^{-6}$
Using FLUXSOLVE and inputs 𝒙, evaluate initial steady-state fluxes 𝑽, relative
metabolite concentrations 𝒔, and enzyme fractions 𝒆.
Evaluate the initial value of the objective function 𝜙(𝒙) and its approximate gradient 𝑮 and Hessian 𝑯
using K-UPDATE.
Set 𝑑𝑜𝑛𝑒 ≔ 𝑓𝑎𝑙𝑠𝑒
Set $\boldsymbol{x}_{best} := \boldsymbol{x}$, $\phi_{best} := \phi(\boldsymbol{x})$
while (not 𝑑𝑜𝑛𝑒)
    Compute ∆𝒙 using Equation (82)
    if ($\|\Delta\boldsymbol{x}\|_\infty \le xtol$) or ($\|\boldsymbol{G}\|_\infty \le gtol$)
        Set 𝑑𝑜𝑛𝑒 ≔ 𝑡𝑟𝑢𝑒
    else
        Update $\boldsymbol{x} := \boldsymbol{x}_{best} + \Delta\boldsymbol{x}$
        Using FLUXSOLVE and inputs 𝒙, evaluate steady-state fluxes 𝑽,
        relative metabolite concentrations 𝒔, and enzyme fractions 𝒆.
        Evaluate the objective function 𝜙(𝒙) and its approximate
        gradient 𝑮 and Hessian 𝑯 using K-UPDATE.
        if $\phi(\boldsymbol{x}) < \phi_{best}$
            Update $\boldsymbol{x}_{best} := \boldsymbol{x}$, $\phi_{best} := \phi(\boldsymbol{x})$
return $\boldsymbol{x}_{best}$, $\phi_{best}$
end
References
Abdel-Hamid, A.M., Attwood, M.M., and Guest, J.R. (2001). Pyruvate oxidase contributes to the
aerobic growth efficiency of Escherichia coli. Microbiology 147, 1483-1498.
Abernathy, M.H., Yu, J., Ma, F., Liberton, M., Ungerer, J., Hollinshead, W.D., Gopalakrishnan,
S., He, L., Maranas, C.D., Pakrasi, H.B., et al. (2017). Deciphering cyanobacterial
phenotypes for fast photoautotrophic growth via isotopically nonstationary metabolic flux
analysis. Biotechnology for Biofuels 10, 273.
Ahn, W.S., and Antoniewicz, M.R. (2011). Metabolic flux analysis of CHO cells at growth and
non-growth phases using isotopic tracers and mass spectrometry. Metabolic engineering
13, 598-609.
Alagesan, S., Gaudana, S.B., Sinha, A., and Wangikar, P.P. (2013). Metabolic flux analysis of
Cyanothece sp. ATCC 51142 under mixotrophic conditions. Photosynth Res 118, 191-
198.
Anderson, D.H. (1983). Compartmental Modeling and Tracer Kinetics. (Springer-Verlag Berlin
Heidelberg).
Anderson, L.E., and Carol, A.A. (2004). Enzyme co-localization with rubisco in pea leaf
chloroplasts. Photosynth Res 82, 49-58.
Anderson, L.E., Gatla, N., and Carol, A.A. (2005). Enzyme co-localization in pea leaf
chloroplasts: glyceraldehyde-3-P dehydrogenase, triose-P isomerase, aldolase and
sedoheptulose bisphosphatase. Photosynth Res 83, 317-328.
Antoniewicz, M.R., Kelleher, J.K., and Stephanopoulos, G. (2006). Determination of confidence
intervals of metabolic fluxes estimated from stable isotope measurements. Metabolic
engineering 8, 324-337.
Antoniewicz, M.R., Kelleher, J.K., and Stephanopoulos, G. (2007). Elementary metabolite units
(EMU): a novel framework for modeling isotopic distributions. Metabolic engineering 9,
68-86.
Atsumi, S., Higashide, W., and Liao, J.C. (2009). Direct photosynthetic recycling of carbon
dioxide to isobutyraldehyde. Nat Biotechnol 27, 1177-1180.
Baba, T., Ara, T., Hasegawa, M., Takai, Y., Okumura, Y., Baba, M., Datsenko, K.A., Tomita, M.,
Wanner, B.L., and Mori, H. (2006). Construction of Escherichia coli K-12 in-frame,
single-gene knockout mutants: the Keio collection. Molecular systems biology 2,
2006.0008.
Banga, J.R., and Balsa-Canto, E. (2008). Parameter estimation and optimal experimental design.
Essays Biochem 45, 195-209.
Bonarius, H.P., Timmerarends, B., de Gooijer, C.D., and Tramper, J. (1998). Metabolite-
balancing techniques vs. 13C tracer experiments to determine metabolic fluxes in
hybridoma cells. Biotechnol Bioeng 58, 258-262.
Bricker, T.M., Zhang, S., Laborde, S.M., Mayer, P.R., 3rd, Frankel, L.K., and Moroney, J.V.
(2004). The malic enzyme is required for optimal photoautotrophic growth of
Synechocystis sp. strain PCC 6803 under continuous light but not under a diurnal light
regimen. Journal of bacteriology 186, 8144-8148.
Briggs, G.E., and Haldane, J.B. (1925). A Note on the Kinetics of Enzyme Action. Biochem J 19,
338-339.
Burgard, A.P., Nikolaev, E.V., Schilling, C.H., and Maranas, C.D. (2004). Flux coupling analysis
of genome-scale metabolic network reconstructions. Genome Res 14, 301-312.
Burgard, A.P., Pharkya, P., and Maranas, C.D. (2003). Optknock: a bilevel programming
framework for identifying gene knockout strategies for microbial strain optimization.
Biotechnol Bioeng 84, 647-657.
Byrd, R.H., Gilbert, J.C., and Nocedal, J. (2000). A trust region method based on interior point
techniques for nonlinear programming. Math. Program. 89, 149-185.
Byrd, R.H., Hribar, M.E., and Nocedal, J. (1999). An Interior Point Algorithm for Large-Scale
Nonlinear Programming. SIAM J. on Optimization 9, 877-900.
Caspi, R., Altman, T., Billington, R., Dreher, K., Foerster, H., Fulcher, C.A., Holland, T.A.,
Keseler, I.M., Kothari, A., Kubo, A., et al. (2014). The MetaCyc database of metabolic
pathways and enzymes and the BioCyc collection of Pathway/Genome Databases.
Nucleic Acids Res 42, D459-471.
Chae, T.U., Choi, S.Y., Kim, J.W., Ko, Y.S., and Lee, S.Y. (2017). Recent advances in systems
metabolic engineering tools and strategies. Current opinion in biotechnology 47, 67-82.
Chang, Y., Suthers, P.F., and Maranas, C.D. (2008). Identification of optimal measurement sets
for complete flux elucidation in metabolic flux analysis experiments. Biotechnol Bioeng
100, 1039-1049.
Chassagnole, C., Noisommit-Rizzi, N., Schmid, J.W., Mauch, K., and Reuss, M. (2002).
Dynamic modeling of the central carbon metabolism of Escherichia coli. Biotechnol
Bioeng 79, 53-73.
Chen, W.L., Chen, D.Z., and Taylor, K.T. (2013). Automatic reaction mapping and reaction
center detection. Wiley Interdisciplinary Reviews: Computational Molecular Science 3,
560-593.
Chen, X., Alonso, A.P., Allen, D.K., Reed, J.L., and Shachar-Hill, Y. (2011). Synergy between
(13)C-metabolic flux analysis and flux balance analysis for understanding metabolic
adaptation to anaerobiosis in E. coli. Metabolic engineering 13, 38-48.
Chen, X., Schreiber, K., Appel, J., Makowka, A., Fähnrich, B., Roettger, M., Hajirezaei, M.R.,
Sönnichsen, F.D., Schönheit, P., Martin, W.F., et al. (2016). The Entner–Doudoroff
pathway is an overlooked glycolytic route in cyanobacteria and plants. Proceedings of the
National Academy of Sciences 113, 5441-5446.
Cheng, J.K., and Alper, H.S. (2014). The genome editing toolbox: a spectrum of approaches for
targeted modification. Current opinion in biotechnology 30, 87-94.
Cho, S., Shin, J., and Cho, B.K. (2018). Applications of CRISPR/Cas System to Bacterial
Metabolic Engineering. Int J Mol Sci 19.
Choi, J., and Antoniewicz, M.R. (2019). Tandem Mass Spectrometry for (13)C Metabolic Flux
Analysis: Methods and Algorithms Based on EMU Framework. Front Microbiol 10, 31.
Chowdhury, A., Khodayari, A., and Maranas, C.D. (2015a). Improving prediction fidelity of
cellular metabolism with kinetic descriptions. Current opinion in biotechnology 36, 57-
64.
Chowdhury, A., Zomorrodi, A.R., and Maranas, C.D. (2014). k-OptForce: integrating kinetics
with flux balance analysis for strain design. PLoS Comput Biol 10, e1003487.
Chowdhury, A., Zomorrodi, A.R., and Maranas, C.D. (2015b). Bilevel optimization techniques in
computational strain design. Computers & Chemical Engineering 72, 363-372.
Clasquin, M.F., Melamud, E., Singer, A., Gooding, J.R., Xu, X., Dong, A., Cui, H., Campagna,
S.R., Savchenko, A., Yakunin, A.F., et al. (2011). Riboneogenesis in yeast. Cell 145,
969-980.
Cleland, W.W. (1963). The kinetics of enzyme-catalyzed reactions with two or more substrates or
products: I. Nomenclature and rate equations. Biochimica et Biophysica Acta (BBA) -
Specialized Section on Enzymological Subjects 67, 104-137.
Copeland, W.B., Bartley, B.A., Chandran, D., Galdzicki, M., Kim, K.H., Sleight, S.C., Maranas,
C.D., and Sauro, H.M. (2012). Computational tools for metabolic engineering. Metabolic
engineering 14, 270-280.
Costa, R.S., Verissimo, A., and Vinga, S. (2014). KiMoSys: a web-based repository of
experimental data for KInetic MOdels of biological SYStems. BMC systems biology 8,
85.
Crown, S.B., and Antoniewicz, M.R. (2012). Selection of tracers for 13C-metabolic flux analysis
using elementary metabolite units (EMU) basis vector methodology. Metabolic
engineering 14, 150-161.
Crown, S.B., Indurthi, D.C., Ahn, W.S., Choi, J., Papoutsakis, E.T., and Antoniewicz, M.R.
(2011). Resolving the TCA cycle and pentose-phosphate pathway of Clostridium
acetobutylicum ATCC 824: Isotopomer analysis, in vitro activities and expression
analysis. Biotechnol J 6, 300-305.
Crown, S.B., Long, C.P., and Antoniewicz, M.R. (2015). Integrated 13C-metabolic flux analysis
of 14 parallel labeling experiments in Escherichia coli. Metabolic engineering 28, 151-
158.
Dash, S., Khodayari, A., Zhou, J., Holwerda, E.K., Olson, D.G., Lynd, L.R., and Maranas, C.D.
(2017). Development of a core Clostridium thermocellum kinetic metabolic model
consistent with multiple genetic perturbations. Biotechnol Biofuels 10, 108.
Dash, S., Mueller, T.J., Venkataramanan, K.P., Papoutsakis, E.T., and Maranas, C.D. (2014).
Capturing the response of Clostridium acetobutylicum to chemical stressors using a
regulated genome-scale metabolic model. Biotechnol Biofuels 7, 144.
Dromms, R.A., and Styczynski, M.P. (2012). Systematic applications of metabolomics in
metabolic engineering. Metabolites 2, 1090-1122.
Drud, A. (1985). CONOPT: A GRG code for large sparse dynamic nonlinear optimization
problems. Math. Program. 31, 153-191.
Du, B., Zielinski, D.C., Kavvas, E.S., Drager, A., Tan, J., Zhang, Z., Ruggiero, K.E.,
Arzumanyan, G.A., and Palsson, B.O. (2016). Evaluation of rate law approximations in
bottom-up kinetic models of metabolism. BMC systems biology 10, 40.
Eisenhut, M., Ruth, W., Haimovich, M., Bauwe, H., Kaplan, A., and Hagemann, M. (2008). The
photorespiratory glycolate metabolism is essential for cyanobacteria and might have been
conveyed endosymbiontically to plants. Proc Natl Acad Sci U S A 105, 17199-17204.
Feist, A.M., Henry, C.S., Reed, J.L., Krummenacker, M., Joyce, A.R., Karp, P.D., Broadbelt,
L.J., Hatzimanikatis, V., and Palsson, B.O. (2007). A genome-scale metabolic
reconstruction for Escherichia coli K-12 MG1655 that accounts for 1260 ORFs and
thermodynamic information. Molecular systems biology 3, 121.
Feng, X., Bandyopadhyay, A., Berla, B., Page, L., Wu, B., Pakrasi, H.B., and Tang, Y.J. (2010).
Mixotrophic and photoheterotrophic metabolism in Cyanothece sp. ATCC 51142 under
continuous light. Microbiology 156, 2566-2574.
Flores, S., Gosset, G., Flores, N., de Graaf, A.A., and Bolivar, F. (2002). Analysis of carbon
metabolism in Escherichia coli strains with an inactive phosphotransferase system by
(13)C labeling and NMR spectroscopy. Metabolic engineering 4, 124-137.
Foster, C.J., Gopalakrishnan, S., Antoniewicz, M.R., and Maranas, C.D. (2019 (Under Review)).
From E. coli mutant 13C labeling data to a core kinetic model: A kinetic model
parameterization pipeline.
Franklin, G.F., Powell, D.J., and Workman, M.L. (1997). Digital Control of Dynamic Systems
(3rd Edition). (Boston, MA, USA: Addison-Wesley Longman Publishing Co., Inc.).
Frohlich, F., Kaltenbacher, B., Theis, F.J., and Hasenauer, J. (2017). Scalable Parameter
Estimation for Genome-Scale Biochemical Reaction Networks. PLoS Comput Biol 13,
e1005331.
Frohlich, F., Kessler, T., Weindl, D., Shadrin, A., Schmiester, L., Hache, H., Muradyan, A.,
Schutte, M., Lim, J.H., Heinig, M., et al. (2018). Efficient Parameter Estimation Enables
the Prediction of Drug Response Using a Mechanistic Pan-Cancer Pathway Model. Cell
Syst 7, 567-579 e566.
Fuhrer, T., Zampieri, M., Sevin, D.C., Sauer, U., and Zamboni, N. (2017). Genomewide
landscape of gene-metabolome associations in Escherichia coli. Molecular systems
biology 13, 907.
Gill, P.E., Murray, W., and Wright, M.H. (1984). Practical Optimization. (London: Academic
Press).
Girgis, H.S., Harris, K., and Tavazoie, S. (2012). Large mutational target size for rapid
emergence of bacterial persistence. Proceedings of the National Academy of Sciences of
the United States of America 109, 12740-12745.
Giuliano, G. (2014). Plant carotenoids: genomics meets multi-gene engineering. Curr Opin Plant
Biol 19, 111-117.
Golub, G.H., and Van Loan, C.F. (1996). Matrix computations (3rd ed.). (Johns Hopkins University
Press).
Gopalakrishnan, S., and Maranas, C.D. (2015a). 13C metabolic flux analysis at a genome-scale.
Metabolic engineering 32, 12-22.
Gopalakrishnan, S., and Maranas, C.D. (2015b). Achieving Metabolic Flux Analysis for S.
cerevisiae at a Genome-Scale: Challenges, Requirements, and Considerations.
Metabolites 5, 521-535.
Greene, J.L., Waechter, A., Tyo, K.E.J., and Broadbelt, L.J. (2017). Acceleration Strategies to
Enhance Metabolic Ensemble Modeling Performance. Biophysical journal 113, 1150-
1162.
Hackett, S.R., Zanotelli, V.R., Xu, W., Goya, J., Park, J.O., Perlman, D.H., Gibney, P.A.,
Botstein, D., Storey, J.D., and Rabinowitz, J.D. (2016). Systems-level analysis of
mechanisms regulating yeast metabolic flux. Science 354.
Hasunuma, T., Kikuyama, F., Matsuda, M., Aikawa, S., Izumi, Y., and Kondo, A. (2013).
Dynamic metabolic profiling of cyanobacterial glycogen biosynthesis under conditions of
nitrate depletion. J Exp Bot 64, 2943-2954.
Hatzimanikatis, V., and Bailey, J.E. (1997). Effects of spatiotemporal variations on metabolic
control: approximate analysis using (log)linear kinetic models. Biotechnol Bioeng 54, 91-
104.
Heijnen, J.J., and Verheijen, P.J. (2013). Parameter identification of in vivo kinetic models:
limitations and challenges. Biotechnol J 8, 768-775.
Hendry, J.I., Gopalakrishnan, S., Ungerer, J., Pakrasi, H.B., Tang, Y.J., and Maranas, C.D.
(2019). Genome-Scale Fluxome of Synechococcus elongatus UTEX 2973 Using
Transient (13)C-Labeling Data. Plant Physiol 179, 761-769.
Holms, H. (1996). Flux analysis and control of the central metabolic pathways in Escherichia
coli. FEMS microbiology reviews 19, 85-116.
Hoops, S., Sahle, S., Gauges, R., Lee, C., Pahle, J., Simus, N., Singhal, M., Xu, L., Mendes, P.,
and Kummer, U. (2006). COPASI--a COmplex PAthway SImulator. Bioinformatics 22,
3067-3074.
Hoque, M.A., Fard, A.T., Rahman, M., Alattas, O., Akazawa, K., and Merican, A.F. (2011).
Comparison of dynamic responses of cellular metabolites in Escherichia coli to pulse
addition of substrates. Biologia 66, 954.
Hua, Q., Yang, C., Baba, T., Mori, H., and Shimizu, K. (2003). Responses of the central
metabolism in Escherichia coli to phosphoglucose isomerase and glucose-6-phosphate
dehydrogenase knockouts. Journal of bacteriology 185, 7053-7067.
Huege, J., Goetze, J., Schwarz, D., Bauwe, H., Hagemann, M., and Kopka, J. (2011). Modulation
of the major paths of carbon in photorespiratory mutants of synechocystis. PLoS One 6,
e16278.
Huege, J., Sulpice, R., Gibon, Y., Lisec, J., Koehl, K., and Kopka, J. (2007). GC-EI-TOF-MS
analysis of in vivo carbon-partitioning into soluble metabolite pools of higher plants by
monitoring isotope dilution after 13CO2 labelling. Phytochemistry 68, 2258-2272.
Ishii, N., Nakahigashi, K., Baba, T., Robert, M., Soga, T., Kanai, A., Hirasawa, T., Naba, M.,
Hirai, K., Hoque, A., et al. (2007). Multiple high-throughput analyses monitor the
response of E. coli to perturbations. Science 316, 593-597.
Jahan, N., Maeda, K., Matsuoka, Y., Sugimoto, Y., and Kurata, H. (2016). Development of an
accurate kinetic model for the central carbon metabolism of Escherichia coli. Microbial
cell factories 15, 112.
Jamshidi, N., and Palsson, B.O. (2008). Formulating genome-scale kinetic models in the post-
genome era. Mol Syst Biol 4, 171.
Jochum, C., Gasteiger, J., and Ugi, I. (1980). The Principle of Minimum Chemical Distance
(PMCD). Angewandte Chemie International Edition in English 19, 495-505.
Khodayari, A., and Maranas, C.D. (2016). A genome-scale Escherichia coli kinetic metabolic
model k-ecoli457 satisfying flux data for multiple mutant strains. Nat Commun 7, 13806.
Khodayari, A., Zomorrodi, A.R., Liao, J.C., and Maranas, C.D. (2014). A kinetic model of
Escherichia coli core metabolism satisfying multiple sets of mutant flux data. Metabolic
engineering 25, 50-62.
Kim, J., Reed, J.L., and Maravelias, C.T. (2011). Large-scale bi-level strain design approaches
and mixed-integer programming solution techniques. PLoS One 6, e24162.
Klemke, F., Baier, A., Knoop, H., Kern, R., Jablonsky, J., Beyer, G., Volkmer, T., Steuer, R.,
Lockau, W., and Hagemann, M. (2015). Identification of the light-independent
phosphoserine pathway as an additional source of serine in the cyanobacterium
Synechocystis sp. PCC 6803. Microbiology 161, 1050-1060.
Knoop, H., Zilliges, Y., Lockau, W., and Steuer, R. (2010). The metabolic network of
Synechocystis sp. PCC 6803: systemic properties of autotrophic growth. Plant Physiol
154, 410-422.
Korner, R., and Apostolakis, J. (2008). Automatic determination of reaction mappings and
reaction center information. 1. The imaginary transition state energy approach. J Chem
Inf Model 48, 1181-1189.
Kotte, O., Zaugg, J.B., and Heinemann, M. (2010). Bacterial adaptation through distributed
sensing of metabolic fluxes. Molecular systems biology 6, 355.
Kucho, K., Okamoto, K., Tsuchiya, Y., Nomura, S., Nango, M., Kanehisa, M., and Ishiura, M.
(2005). Global analysis of circadian expression in the cyanobacterium Synechocystis sp.
strain PCC 6803. Journal of bacteriology 187, 2190-2199.
Kumar, A., and Maranas, C.D. (2014). CLCA: maximum common molecular substructure queries
within the MetRxn database. J Chem Inf Model 54, 3417-3438.
Kumar, A., Suthers, P.F., and Maranas, C.D. (2012). MetRxn: a knowledgebase of metabolites
and reactions spanning metabolic models and databases. BMC bioinformatics 13, 6.
Lafontaine Rivera, J.G., Theisen, M.K., Chen, P.W., and Liao, J.C. (2017). Kinetically accessible
yield (KAY) for redirection of metabolism to produce exo-metabolites. Metabolic
engineering 41, 144-151.
Latendresse, M., Malerich, J.P., Travers, M., and Karp, P.D. (2012). Accurate atom-mapping
computation for biochemical reactions. J Chem Inf Model 52, 2970-2982.
Leighty, R.W., and Antoniewicz, M.R. (2012). Parallel labeling experiments with [U-
13C]glucose validate E. coli metabolic network model for 13C metabolic flux analysis.
Metabolic engineering 14, 533-541.
Leighty, R.W., and Antoniewicz, M.R. (2013). COMPLETE-MFA: complementary parallel
labeling experiments technique for metabolic flux analysis. Metabolic engineering 20,
49-55.
Li, M., Yao, S., and Shimizu, K. (2007). Effect of poxB gene knockout on metabolism in
Escherichia coli based on growth characteristics and enzyme activities. World J
Microbiol Biotechnol 23, 573-580.
Liang, F., and Lindblad, P. (2016). Effects of overexpressing photosynthetic carbon flux control
enzymes in the cyanobacterium Synechocystis PCC 6803. Metabolic engineering 38, 56-
64.
Long, C.P., and Antoniewicz, M.R. (2014). Quantifying biomass composition by gas
chromatography/mass spectrometry. Analytical chemistry 86, 9423-9427.
Long, C.P., Gonzalez, J.E., Feist, A.M., Palsson, B.O., and Antoniewicz, M.R. (2018). Dissecting
the genetic and metabolic mechanisms of adaptation to the knockout of a major metabolic
enzyme in Escherichia coli. Proc Natl Acad Sci U S A 115, 222-227.
Long, M.R., Ong, W.K., and Reed, J.L. (2015). Computational methods in metabolic engineering
for strain design. Current opinion in biotechnology 34, 135-141.
Luo, B., Groenke, K., Takors, R., Wandrey, C., and Oldiges, M. (2007). Simultaneous
determination of multiple intracellular metabolites in glycolysis, pentose phosphate
pathway and tricarboxylic acid cycle by liquid chromatography-mass spectrometry. J
Chromatogr A 1147, 153-164.
Machado, D., and Herrgard, M. (2014). Systematic evaluation of methods for integration of
transcriptomic data into constraint-based models of metabolism. PLoS Comput Biol 10,
e1003580.
Madsen, K., Nielsen, H.B., and Tingleff, O. (2004). Methods for Non-Linear Least Squares
Problems (2nd Edition). (Kongens Lyngby: Technical University of Denmark).
Mahadevan, R., and Schilling, C.H. (2003). The effects of alternate optimal solutions in
constraint-based genome-scale metabolic models. Metabolic engineering 5, 264-276.
Maurino, V.G., and Weber, A.P. (2013). Engineering photosynthesis in plants and synthetic
microorganisms. J Exp Bot 64, 743-751.
McCloskey, D., Young, J.D., Xu, S., Palsson, B.O., and Feist, A.M. (2016a). MID Max: LC-
MS/MS Method for Measuring the Precursor and Product Mass Isotopomer Distributions
of Metabolic Intermediates and Cofactors for Metabolic Flux Analysis Applications.
Analytical chemistry 88, 1362-1370.
McCloskey, D., Young, J.D., Xu, S., Palsson, B.O., and Feist, A.M. (2016b). Modeling Method
for Increased Precision and Scope of Directly Measurable Fluxes at a Genome-Scale.
Analytical chemistry 88, 3844-3852.
Metallo, C.M., Gameiro, P.A., Bell, E.L., Mattaini, K.R., Yang, J., Hiller, K., Jewell, C.M.,
Johnson, Z.R., Irvine, D.J., Guarente, L., et al. (2012). Reductive glutamine metabolism
by IDH1 mediates lipogenesis under hypoxia. Nature 481, 380-384.
Metallo, C.M., Walther, J.L., and Stephanopoulos, G. (2009). Evaluation of 13C isotopic tracers
for metabolic flux analysis in mammalian cells. Journal of biotechnology 144, 167-174.
Millard, P., Smallbone, K., and Mendes, P. (2017). Metabolic regulation is sufficient for global
and robust coordination of glucose uptake, catabolism, energy production and growth in
Escherichia coli. PLoS Comput Biol 13, e1005396.
Miskovic, L., and Hatzimanikatis, V. (2010). Production of biofuels and biochemicals: in need of
an ORACLE. Trends Biotechnol 28, 391-397.
Moler, C., and Van Loan, C. (2003). Nineteen Dubious Ways to Compute the Exponential of a
Matrix, Twenty-Five Years Later. SIAM Review 45, 3-49.
Mollney, M., Wiechert, W., Kownatzki, D., and de Graaf, A.A. (1999). Bidirectional reaction
steps in metabolic networks: IV. Optimal design of isotopomer labeling experiments.
Biotechnology and bioengineering 66, 86-103.
Monod, J., Wyman, J., and Changeux, J.P. (1965). On the Nature of Allosteric Transitions: A
Plausible Model. J Mol Biol 12, 88-118.
Morgan, H.L. (1965). The Generation of a Unique Machine Description for Chemical Structures-
A Technique Developed at Chemical Abstracts Service. Journal of Chemical
Documentation 5, 107-113.
Murphy, T.A., Dang, C.V., and Young, J.D. (2013). Isotopically nonstationary 13C flux analysis
of Myc-induced metabolic reprogramming in B-cells. Metabolic engineering 15, 206-
217.
Murtagh, B.A., and Saunders, M.A. (1978). Large-scale linearly constrained optimization. Math.
Program. 14, 41-72.
Nakahara, K., Yamamoto, H., Miyake, C., and Yokota, A. (2003). Purification and
characterization of class-I and class-II fructose-1,6-bisphosphate aldolases from the
cyanobacterium Synechocystis sp. PCC 6803. Plant Cell Physiol 44, 326-333.
Nazem-Bokaee, H., Gopalakrishnan, S., Ferry, J.G., Wood, T.K., and Maranas, C.D. (2016).
Assessing methanotrophy and carbon fixation for biofuel production by Methanosarcina
acetivorans. Microbial cell factories 15, 10.
Neidhardt, F.C., and Curtiss, R. (1996). Escherichia coli and Salmonella: cellular and molecular
biology.
Nielsen, J. (2003). It is all about metabolic fluxes. Journal of bacteriology 185, 7031-7035.
Nogales, J., Gudmundsson, S., Knight, E.M., Palsson, B.O., and Thiele, I. (2012). Detailing the
optimality of photosynthesis in cyanobacteria through systems biology analysis. Proc
Natl Acad Sci U S A 109, 2678-2683.
Noh, K., Gronke, K., Luo, B., Takors, R., Oldiges, M., and Wiechert, W. (2007). Metabolic flux
analysis at ultra short time scale: isotopically non-stationary 13C labeling experiments.
Journal of biotechnology 129, 249-267.
Noh, K., Wahl, A., and Wiechert, W. (2006). Computational tools for isotopically instationary
13C labeling experiments under metabolic steady state conditions. Metabolic engineering
8, 554-577.
Noh, K., and Wiechert, W. (2011). The benefits of being transient: isotope-based metabolic flux
analysis at the short time scale. Applied microbiology and biotechnology 91, 1247-1265.
Noor, E., Flamholz, A., Bar-Even, A., Davidi, D., Milo, R., and Liebermeister, W. (2016). The
Protein Cost of Metabolic Fluxes: Prediction from Enzymatic Rate Laws and Cost
Minimization. PLoS Comput Biol 12, e1005167.
O'Byrne, C.P., Feehily, C., Ham, R., and Karatzas, K.A. (2011). A modified rapid enzymatic
microtiter plate assay for the quantification of intracellular gamma-aminobutyric acid and
succinate semialdehyde in bacterial cells. J Microbiol Methods 84, 137-139.
Patil, K.R., Rocha, I., Forster, J., and Nielsen, J. (2005). Evolutionary programming as a platform
for in silico metabolic engineering. BMC Bioinformatics 6, 308.
Pazman, A. (1993). Nonlinear statistical models.
Pharkya, P., Burgard, A.P., and Maranas, C.D. (2004). OptStrain: a computational framework for
redesign of microbial production systems. Genome Res 14, 2367-2376.
Placzek, S., Schomburg, I., Chang, A., Jeske, L., Ulbrich, M., Tillack, J., and Schomburg, D.
(2017). BRENDA in 2017: new perspectives and new tools in BRENDA. Nucleic Acids
Res 45, D380-D388.
Press, W.H., Teukolsky, S.A., Vetterling, W.T., and Flannery, B.P. (2007a). Numerical Recipes:
The Art of Scientific Computing (3rd Edition). (Cambridge University Press).
Press, W.H., Teukolsky, S.A., Vetterling, W.T., and Flannery, B.P. (2007b). Numerical Recipes:
The Art of Scientific Computing (3rd Edition). (Cambridge University Press).
Ranganathan, S., Suthers, P.F., and Maranas, C.D. (2010). OptForce: an optimization procedure
for identifying all genetic manipulations leading to targeted overproductions. PLoS
Comput Biol 6, e1000744.
Ranganathan, S., Tee, T.W., Chowdhury, A., Zomorrodi, A.R., Yoon, J.M., Fu, Y., Shanks, J.V.,
and Maranas, C.D. (2012). An integrated computational and experimental study for
overproducing fatty acids in Escherichia coli. Metabolic engineering 14, 687-704.
Raue, A., Schilling, M., Bachmann, J., Matteson, A., Schelker, M., Kaschek, D., Hug, S., Kreutz,
C., Harms, B.D., Theis, F.J., et al. (2013). Lessons learned from quantitative dynamical
modeling in systems biology. PLoS One 8, e74335.
Saa, P., and Nielsen, L.K. (2015). A general framework for thermodynamically consistent
parameterization and efficient sampling of enzymatic reactions. PLoS Comput Biol 11,
e1004195.
Saa, P.A., and Nielsen, L.K. (2017). Formulation, construction and analysis of kinetic models of
metabolism: A review of modelling frameworks. Biotechnol Adv 35, 981-1003.
Saha, R., Liu, D., Hoynes-O'Connor, A., Liberton, M., Yu, J., Bhattacharyya-Pakrasi, M.,
Balassy, A., Zhang, F., Moon, T.S., Maranas, C.D., et al. (2016). Diurnal Regulation of
Cellular Processes in the Cyanobacterium Synechocystis sp. Strain PCC 6803: Insights
from Transcriptomic, Fluxomic, and Physiological Analyses. MBio 7.
Saha, R., Verseput, A.T., Berla, B.M., Mueller, T.J., Pakrasi, H.B., and Maranas, C.D. (2012).
Reconstruction and comparison of the metabolic potential of cyanobacteria Cyanothece
sp. ATCC 51142 and Synechocystis sp. PCC 6803. PLoS One 7, e48285.
Sandberg, T.E., Long, C.P., Gonzalez, J.E., Feist, A.M., Antoniewicz, M.R., and Palsson, B.O.
(2016). Evolution of E. coli on [U-13C]Glucose Reveals a Negligible Isotopic Influence
on Metabolism and Physiology. PLoS One 11, e0151130.
Sauer, U. (2006). Metabolic networks in motion: 13C-based flux analysis. Molecular systems
biology 2, 62.
Scanlan, D.J., Sundaram, S., Newman, J., Mann, N.H., and Carr, N.G. (1995). Characterization of
a zwf mutant of Synechococcus sp. strain PCC 7942. Journal of bacteriology 177, 2550-
2553.
Schellenberger, J., Lewis, N.E., and Palsson, B.O. (2011). Elimination of thermodynamically
infeasible loops in steady-state metabolic models. Biophysical journal 100, 544-553.
Schmidt, K., Carlsen, M., Nielsen, J., and Villadsen, J. (1997). Modeling isotopomer distributions
in biochemical networks using isotopomer mapping matrices. Biotechnol Bioeng 55, 831-
840.
Schmidt, K., Nielsen, J., and Villadsen, J. (1999). Quantitative analysis of metabolic fluxes in
Escherichia coli, using two-dimensional NMR spectroscopy and complete isotopomer
models. Journal of biotechnology 71, 175-189.
Segre, D., Vitkup, D., and Church, G.M. (2002). Analysis of optimality in natural and perturbed
metabolic networks. Proc Natl Acad Sci U S A 99, 15112-15117.
Shastri, A.A., and Morgan, J.A. (2007). A transient isotopic labeling methodology for 13C
metabolic flux analysis of photoautotrophic microorganisms. Phytochemistry 68, 2302-
2312.
Shimizu, K. (2004). Metabolic flux analysis based on 13C-labeling experiments and integration
of the information with gene and protein expression patterns. Advances in biochemical
engineering/biotechnology 91, 1-49.
Srinivasan, S., Cluett, W.R., and Mahadevan, R. (2018). Model-based design of bistable cell
factories for metabolic engineering. Bioinformatics 34, 1363-1371.
Steinhauser, D., Fernie, A.R., and Araujo, W.L. (2012). Unusual cyanobacterial TCA cycles: not
broken just different. Trends Plant Sci 17, 503-509.
Stovicek, V., Holkenbrink, C., and Borodina, I. (2017). CRISPR/Cas system for yeast genome
engineering: advances and applications. FEMS Yeast Res 17.
Suastegui, M., Yu Ng, C., Chowdhury, A., Sun, W., Cao, M., House, E., Maranas, C.D., and
Shao, Z. (2017). Multilevel engineering of the upstream module of aromatic amino acid
biosynthesis in Saccharomyces cerevisiae for high production of polymer and drug
precursors. Metabolic engineering 42, 134-144.
Suss, K.H., Arkona, C., Manteuffel, R., and Adler, K. (1993). Calvin cycle multienzyme
complexes are bound to chloroplast thylakoid membranes of higher plants in situ. Proc
Natl Acad Sci U S A 90, 5514-5518.
Suthers, P.F., Burgard, A.P., Dasika, M.S., Nowroozi, F., Van Dien, S., Keasling, J.D., and
Maranas, C.D. (2007). Metabolic flux elucidation for large-scale models using 13C
labeled isotopes. Metabolic engineering 9, 387-405.
Takabayashi, A., Kadoya, R., Kuwano, M., Kurihara, K., Ito, H., Tanaka, R., and Tanaka, A.
(2013). Protein co-migration database (PCoM-DB) for Arabidopsis thylakoids and
Synechocystis cells. Springerplus 2, 148.
Tanabe, M., and Kanehisa, M. (2012). Using the KEGG database resource. Curr Protoc
Bioinformatics Chapter 1, Unit1 12.
Tang, Y.J., Martin, H.G., Myers, S., Rodriguez, S., Baidoo, E.E., and Keasling, J.D. (2009).
Advances in analysis of microbial metabolic fluxes via (13)C isotopic labeling. Mass
Spectrom Rev 28, 362-375.
Tepper, N., and Shlomi, T. (2010). Predicting metabolic engineering knockout strategies for
chemical production: accounting for competing pathways. Bioinformatics 26, 536-543.
Teusink, B., Passarge, J., Reijenga, C.A., Esgalhado, E., van der Weijden, C.C., Schepper, M.,
Walsh, M.C., Bakker, B.M., van Dam, K., Westerhoff, H.V., et al. (2000). Can yeast
glycolysis be understood in terms of in vitro kinetics of the constituent enzymes? Testing
biochemistry. Eur J Biochem 267, 5313-5329.
Thiel, K., Vuorio, E., Aro, E.M., and Kallio, P.T. (2017). The effect of enhanced acetate influx on
Synechocystis sp. PCC 6803 metabolism. Microbial cell factories 16, 21.
Tian, M., and Reed, J.L. (2018). Integrating proteomic or transcriptomic data into metabolic
models using linear bound flux balance analysis. Bioinformatics 34, 3882-3888.
Tran, L.M., Rizk, M.L., and Liao, J.C. (2008). Ensemble modeling of metabolic networks.
Biophysical journal 95, 5606-5617.
Usui, Y., Hirasawa, T., Furusawa, C., Shirai, T., Yamamoto, N., Mori, H., and Shimizu, H.
(2012). Investigating the effects of perturbations to pgi and eno gene expression on
central carbon metabolism in Escherichia coli using (13)C metabolic flux analysis.
Microbial cell factories 11, 87.
van Eunen, K., Kiewiet, J.A., Westerhoff, H.V., and Bakker, B.M. (2012). Testing biochemistry
revisited: how in vivo metabolism can be understood from in vitro enzyme kinetics. PLoS
Comput Biol 8, e1002483.
van Gulik, W.M., and Heijnen, J.J. (1995). A metabolic network stoichiometry analysis of
microbial growth and product formation. Biotechnol Bioeng 48, 681-698.
Varma, A., and Palsson, B.O. (1994). Stoichiometric flux balance models quantitatively predict
growth and metabolic by-product secretion in wild-type Escherichia coli W3110. Applied
and environmental microbiology 60, 3724-3731.
Varman, A.M., Yu, Y., You, L., and Tang, Y.J. (2013). Photoautotrophic production of D-lactic
acid in an engineered cyanobacterium. Microbial cell factories 12, 117.
Waltz, R.A., Morales, J.L., Nocedal, J., and Orban, D. (2006). An interior algorithm for nonlinear
optimization that combines line search and trust region steps. Math. Program. 107, 391-
408.
Weininger, D., Weininger, A., and Weininger, J.L. (1989). SMILES. 2. Algorithm for generation
of unique SMILES notation. Journal of Chemical Information and Computer Sciences 29,
97-101.
Wiechert, W., and de Graaf, A.A. (1996). In vivo stationary flux analysis by 13C labeling
experiments. Advances in biochemical engineering/biotechnology 54, 109-154.
Wiechert, W., Mollney, M., Isermann, N., Wurzel, M., and de Graaf, A.A. (1999). Bidirectional
reaction steps in metabolic networks: III. Explicit solution and analysis of isotopomer
labeling systems. Biotechnol Bioeng 66, 69-85.
Wiechert, W., Siefke, C., de Graaf, A.A., and Marx, A. (1997). Bidirectional reaction steps in
metabolic networks: II. Flux estimation and statistical analysis. Biotechnology and
bioengineering 55, 118-135.
Wiechert, W., and Wurzel, M. (2001). Metabolic isotopomer labeling systems. Part I: global
dynamic behavior. Math Biosci 169, 173-205.
Wittig, U., Kania, R., Golebiewski, M., Rey, M., Shi, L., Jong, L., Algaa, E., Weidemann, A.,
Sauer-Danzwith, H., Mir, S., et al. (2012). SABIO-RK--database for biochemical reaction
kinetics. Nucleic Acids Res 40, D790-796.
Xiong, W., Lo, J., Chou, K.J., Wu, C., Magnusson, L., Dong, T., and Maness, P. (2018). Isotope-
Assisted Metabolite Analysis Sheds Light on Central Carbon Metabolism of a Model
Cellulolytic Bacterium Clostridium thermocellum. Front Microbiol 9, 1947.
Xiong, W., Morgan, J.A., Ungerer, J., Wang, B., Maness, P.-C., and Yu, J. (2015). The plasticity
of cyanobacterial metabolism supports direct CO2 conversion to ethylene. Nature Plants
1, 15053.
Xu, H., Andi, B., Qian, J., West, A.H., and Cook, P.F. (2006). The alpha-aminoadipate pathway
for lysine biosynthesis in fungi. Cell Biochem Biophys 46, 43-64.
Xu, P., Li, L., Zhang, F., Stephanopoulos, G., and Koffas, M. (2014). Improving fatty acids
production by engineering dynamic pathway regulation and metabolic control. Proc Natl
Acad Sci U S A 111, 11299-11304.
Xu, P., Ranganathan, S., Fowler, Z.L., Maranas, C.D., and Koffas, M.A. (2011). Genome-scale
metabolic network modeling results in minimal interventions that cooperatively force
carbon flux towards malonyl-CoA. Metabolic engineering 13, 578-587.
Yan, C., and Xu, X. (2008). Bifunctional enzyme FBPase/SBPase is essential for
photoautotrophic growth in cyanobacterium Synechocystis sp. PCC 6803. Progress in
Natural Science 18, 149-153.
Yang, C., Hua, Q., and Shimizu, K. (2002a). Integration of the information from gene expression
and metabolic fluxes for the analysis of the regulatory mechanisms in Synechocystis.
Applied microbiology and biotechnology 58, 813-822.
Yang, C., Hua, Q., and Shimizu, K. (2002b). Metabolic flux analysis in Synechocystis using
isotope distribution from 13C-labeled glucose. Metabolic engineering 4, 202-216.
Yang, C., Hua, Q., and Shimizu, K. (2002c). Quantitative analysis of intracellular metabolic
fluxes using GC-MS and two-dimensional NMR spectroscopy. Journal of bioscience and
bioengineering 93, 78-87.
Yoo, H., Antoniewicz, M.R., Stephanopoulos, G., and Kelleher, J.K. (2008). Quantifying
reductive carboxylation flux of glutamine to lipid in a brown adipocyte cell line. The
Journal of biological chemistry 283, 20621-20627.
You, L., Berla, B., He, L., Pakrasi, H.B., and Tang, Y.J. (2014). 13C-MFA delineates the
photomixotrophic metabolism of Synechocystis sp. PCC 6803 under light- and carbon-
sufficient conditions. Biotechnol J 9, 684-692.
Young, J.D., Shastri, A.A., Stephanopoulos, G., and Morgan, J.A. (2011). Mapping
photoautotrophic metabolism with isotopically nonstationary (13)C flux analysis.
Metabolic engineering 13, 656-665.
Young, J.D., Walther, J.L., Antoniewicz, M.R., Yoo, H., and Stephanopoulos, G. (2008). An
elementary metabolite unit (EMU) based method of isotopically nonstationary flux
analysis. Biotechnol Bioeng 99, 686-699.
Yu, Y., You, L., Liu, D., Hollinshead, W., Tang, Y.J., and Zhang, F. (2013). Development of
Synechocystis sp. PCC 6803 as a phototrophic cell factory. Mar Drugs 11, 2894-2916.
Zhang, S., and Bryant, D.A. (2011). The tricarboxylic acid cycle in cyanobacteria. Science 334,
1551-1553.
Zhao, J., and Shimizu, K. (2003). Metabolic flux analysis of Escherichia coli K12 grown on 13C-
labeled acetate and glucose using GC-MS and powerful flux calculation method. Journal
of biotechnology 101, 101-117.
Zomorrodi, A.R., Lafontaine Rivera, J.G., Liao, J.C., and Maranas, C.D. (2013). Optimization-
driven identification of genetic perturbations accelerates the convergence of model
parameters in ensemble modeling of metabolic networks. Biotechnol J 8, 1090-1104.
Zomorrodi, A.R., and Maranas, C.D. (2010). Improving the iMM904 S. cerevisiae metabolic
model using essentiality and synthetic lethality data. BMC systems biology 4, 178.
Zomorrodi, A.R., Suthers, P.F., Ranganathan, S., and Maranas, C.D. (2012). Mathematical
optimization applications in metabolic networks. Metabolic engineering 14, 672-686.
Zupke, C., and Stephanopoulos, G. (1994). Modeling of Isotope Distributions and Intracellular
Fluxes in Metabolic Networks Using Atom Mapping Matrixes. Biotechnology progress
10, 489-498.
VITA
SARATRAM GOPALAKRISHNAN
EDUCATION
Sep 2013 - Mar 2019: PhD in Chemical Engineering, The Pennsylvania State University
Sep 2011 - May 2013: MSE in Chemical and Biomolecular Engineering, Johns Hopkins University
Aug 2007 - May 2011: BE in Biotechnology, Manipal University
HONORS AND AWARDS
1. McWhirter Graduate Fellowship, The Pennsylvania State University, 2013
2. Best Candidacy Award, McWhirter Graduate Research Symposium, Sep 2014
3. Best Paper Award, McWhirter Graduate Research Symposium, Sep 2016
SELECT PUBLICATIONS
1. Gopalakrishnan, S., & Maranas, C. D. (2015a). 13C metabolic flux analysis at a
genome-scale. Metab Eng, 32, 12-22.
2. Soo, V. W., McAnulty, M. J., Tripathi, A., Zhu, F., Zhang, L., Hatzakis, E., . . .
Gopalakrishnan, S., . . . Wood, T. K. (2016). Reversing methanogenesis to capture
methane for liquid biofuel precursors. Microb Cell Fact, 15(1), 11.
3. Nazem-Bokaee, H., Gopalakrishnan, S., Ferry, J. G., Wood, T. K., & Maranas, C.
D. (2016). Assessing methanotrophy and carbon fixation for biofuel production by
Methanosarcina acetivorans. Microb Cell Fact, 15(1), 10.
4. Abernathy, M. H., Yu, J., Ma, F., Liberton, M., Ungerer, J., Hollinshead, W. D., . . .
Gopalakrishnan, S., . . . Tang, Y. J. (2017). Deciphering cyanobacterial phenotypes
for fast photoautotrophic growth via isotopically nonstationary metabolic flux
analysis. Biotechnol Biofuels, 10, 273.
5. Gopalakrishnan, S., Pakrasi, H. B., & Maranas, C. D. (2018). Elucidation of
photoautotrophic carbon flux topology in Synechocystis PCC 6803 using genome-
scale carbon mapping models. Metab Eng, 47, 190-199.
6. Hendry, J.I., Gopalakrishnan, S., Ungerer, J., Pakrasi, H.B., Tang, Y.J., and
Maranas, C.D. (2019). Genome-Scale Fluxome of Synechococcus elongatus UTEX
2973 Using Transient (13)C-Labeling Data. Plant Physiol 179, 761-769.