Thesis: Approximation of a Cell Cycle Model


Approximation of a cell cycle model

Master Thesis

Chaiyut Thanukaew

M. Sc. Course Automation and Robotics

Technical University of Dortmund

Dept. of Biochemical and Chemical Engineering

Chair of Process Dynamics and Operations

First Supervisor: Prof. Dr.-Ing. Sebastian Engell

Second Supervisor: Dipl.-Ing. Tobias Claus Neymann



Declaration

I hereby declare and confirm that the master thesis

Approximation of a cell cycle model

is entirely the result of my own work except where otherwise indicated.

Dortmund, May 15th, 2009 Signature



Abstract

Models of biological systems are typically large and complex. They are written on the basis of hypotheses about molecular mechanisms. The dynamics of the system are described by a set of first-order nonlinear ODEs whose kinetic parameters are used to tune the responses, so their solutions can be computed systematically. This makes such models well suited to model reduction by a genetic algorithm (GA). The GA maps each state of the model to a binary bit indicating whether that state is included in or excluded from a feasible reduced model. A predefined number of feasible reduced models is initially generated at random. They are subsequently developed through the evolutionary processes of selection and reproduction, so that better reduced models (measured in terms of a cost value) are expected to be found. These processes are repeated until a desired solution is found. However, there are some difficulties in applying the GA to reduce a biological system, e.g. the resulting reduced model must have a reasonable interpretation with respect to physical reality. A good example in this work is that the reduced cell cycle model of budding yeast must preserve the mass state; otherwise it is not viable, because a yeast cell without mass does not make any sense. It is hoped that the results obtained from applying the GA to reduce the yeast cell cycle model in this work will be of benefit to interested readers.



Acknowledgements

This task could not have been completed without the full support and valuable comments of my supervisor, Dipl.-Ing. Tobias Neymann. I really appreciate his generous help during the working period, and I would also like to thank Dipl.-Inform. Tomas Tometski for his good suggestions regarding the GA. I sincerely thank Prof. Dr.-Ing. Sebastian Engell for the opportunity to work with the department.

A million thanks to my friends, cousins, brothers and sisters, and other relatives in my country for their ready support and encouragement. They not only share my bliss, but also stand beside me when problems arise. Thanks to all my friends in Germany for your kind welcome and warm hospitality from the beginning to the end.

Finally, without the love and dedication of my parents, I could not have reached this point. All the footprints they have left behind give me hints for creating my own successful paths. Whenever I am discouraged, they always say, "Where there is a will, there is a way."



Contents

List of Figures

List of Tables

1 Introduction

2 Cell Cycle of Budding Yeast
2.1 Cell Cycle
2.2 Mathematical Model of Budding Yeast

3 Genetic Algorithm (GA)
3.1 Introduction to Genetic Algorithms
3.2 Individual Representation
3.3 Objective Function
3.4 Selection Methods
3.5 Genetic Operations
3.5.1 Crossover
3.5.2 Mutation
3.6 Replacement Strategy
3.7 Advantages and Limitations of the GA

4 Implementation Methods
4.1 Jacobian-based Local Refinement (JLR)
4.2 Constant Input Response (CIR)
4.3 Proposed Methods
4.3.1 Representation of the Given Problem and Objective Function
4.3.2 Implementation Using a Genetic Algorithm
4.4 Application to a Cell Cycle Model of Budding Yeast
4.4.1 Reduced Model: Comparison between Substitution of an Eliminated State by Zero- and Mean-valued Constants
4.4.2 Reduced Model with Parameter Estimation

5 Results and Discussion
5.1 Jacobian Index Table
5.2 Constant Input Response (CIR)
5.3 Preconditions for Generating Feasible Candidates in the First Generation
5.3.1 Overview of Initialization of the GA
5.3.2 Preconditions for Initialization of the GA
5.4 Reduced Model with Zero- and Mean-valued Substitution
5.4.1 Overview and Setting Simulation Parameters
5.4.2 Results
5.4.3 Discussion regarding the GA's Viewpoint
5.4.4 Discussion regarding the Cell Cycle Model
5.5 Reduced Model with Parameter Estimation
5.5.1 Overview and Setting Simulation Parameters
5.5.2 Results and Discussion

6 Conclusion and Future Works

Bibliography

Appendix A

Appendix B

Appendix C



List of Figures

2.1: Cell cycle of budding yeast
2.2: Consensus diagram of the budding yeast
3.1: A GA cycle
3.2: The pseudo-code of the conventional genetic algorithm
3.3: Example of individuals with binary encoding
3.4: Example of a map for the TSP where each circle represents a city and all connected transitions represent a feasible route
3.5: Example of individuals with permutation encoding
3.6: Roulette-wheel selection
3.7: Situation before ranking (graph of fitness proportion)
3.8: Situation after ranking (graph of order numbers)
3.9: One-point crossover
3.10: Uniform crossover
3.11: Mutation
4.1: Pseudo-code to find the CIR
4.2: Example of an ODE system containing several state variables in which a problem arises when substitution of an eliminated state with zero is applied
4.3: Original index vector and index vector with parameter estimation
5.1: Pseudo-code to find the Jacobian Index Table
5.2: Original mass state and mass states with different constants added to Sic1
5.3: CIR of the full model generated using 28 values of added constant inputs
5.4: CIR of the full model generated using 10 values of added constant inputs obtained from the median-valued substitution
5.5: Results of the GA for 2. Reduced model with mean-valued substitution: Case #1
5.8: Cell cycle of the original full model
5.9: Cell cycle of the reduced model S1 from Table 5.8
5.10: Cell cycle of the reduced model S2 from Table 5.8
5.11: (a) Histogram of the parameter Kez (original value = 0.3)
(b) Histogram of the parameter Kez2 (original value = 0.2)
(c) Histogram of the threshold of Esp1 (original value = 0.1)
(d) Histogram of the threshold of ORI, BUD and SPN (original value = 1)



List of Tables

2.1: Equations
5.1: Jacobian Index Table (Excluding the Diagonal Elements) of the Cell Cycle Model of Budding Yeast
5.2: Added constant inputs and median values for each group
5.3: Mean values of all states in the ODE system
5.4: Status of the model by resetting each bit of the index s to 0; (a) with zero substitution, and (b) with mean-valued substitution
5.5: Parameter settings of the GA (substitution of an eliminated state with zero)
5.6: Parameter settings of the GA (substitution of an eliminated state with a mean-valued constant)
5.7: Results of the GA comparing the reduced models with zero- and mean-valued substitution
5.8: Resulting reduced models from several simulations
5.9: Parameter settings of the GA with parameter estimation
5.10: Results of the GA with parameter estimation



Chapter 1

Introduction

To gain deep insight into a complex system, a mathematical model is needed to describe what is going on in the system. The model may be based on first principles, on data (black box), or on a combination of both approaches. The most important objective is to obtain an accurate model which explains the input-output relationship, or the mechanisms by which the system behaves and by which its internal components react to each other and to perturbing signals.

Models, especially of biological systems, can be large and complex and comprise many state variables, e.g. a cell cycle model of budding yeast or a model of the mammalian circadian clock. In these models, the responses of a cell to internal and external signals are controlled by networks of proteins. Mathematical modeling based on biochemical rate equations provides a precise and reliable tool for explaining what happens within the cell and the cell network. To make such models easier to handle, a mathematical technique called model reduction is applied. Model reduction offers several advantages, for instance lowering the computational effort, making an infeasible simulation feasible, and reducing data transfer in online applications.

Biological systems are at work in every moment of daily life, for instance cells within human bodies, animal cells, plant cells and tiny organisms like yeast. The sequence of events from the growth of a cell until it divides into two daughter cells is called a cell cycle. To study what happens inside a cell and the relationships among its internal components during the cell cycle, many control mechanisms have to be considered. Finding an accurate model of the cell cycle has been a challenging task for researchers for decades. By studying a comparable organism in detail, we may grasp, for example, how cancer cells copy themselves so rapidly. That is why studying the budding yeast cell cycle model plays a significant role and is worth gaining insight into.

Yeast cells, simple single-celled eukaryotes, undergo a cell division cycle similar to that of human cells. There are many yeast species, and some are crucial as model organisms in modern cell biology research. They are, moreover, the most thoroughly researched eukaryotic microorganisms. Researchers have used them to gather information about the biology of other eukaryotic cells and ultimately of human beings. Chapter 2 goes into more detail on the cell cycle model of budding yeast.

Nonlinear systems occur in nature more commonly than linear ones. One of the greatest difficulties in solving nonlinear systems is that it is generally not possible to combine known solutions into new solutions. In linear problems, independent solutions can be used to construct general solutions through the superposition principle. For this reason, the problems posed by nonlinear differential equations are extremely diverse, and methods of solution or analysis are problem dependent.
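
The failure of superposition can be checked numerically. The following minimal sketch uses two toy equations chosen purely for illustration (they are assumptions and not part of the cell cycle model): the linear equation x' = -x and the nonlinear equation x' = -x^2. The helper `residual` is a hypothetical name; it measures how far a candidate function is from satisfying the equation at one time point.

```python
import math

def residual(f, rhs, t, h=1e-6):
    """Return f'(t) - rhs(f(t)); a value near zero means f solves x' = rhs(x) at t."""
    dfdt = (f(t + h) - f(t - h)) / (2.0 * h)  # central difference approximation
    return dfdt - rhs(f(t))

# Linear toy equation x' = -x with two known solutions.
def linear_rhs(x):
    return -x

def lin1(t):
    return math.exp(-t)

def lin2(t):
    return 2.0 * math.exp(-t)

def lin_sum(t):
    return lin1(t) + lin2(t)

# Nonlinear toy equation x' = -x^2 with two known solutions x = 1 / (t + c).
def nonlinear_rhs(x):
    return -x * x

def non1(t):
    return 1.0 / (1.0 + t)

def non2(t):
    return 1.0 / (2.0 + t)

def non_sum(t):
    return non1(t) + non2(t)

t0 = 0.5
res_linear_sum = residual(lin_sum, linear_rhs, t0)        # close to zero
res_nonlinear_sum = residual(non_sum, nonlinear_rhs, t0)  # clearly nonzero
```

For the nonlinear equation the residual of the sum equals the cross term 2·x1·x2, which is exactly the part that superposition ignores.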

To obtain a less complicated model than a given original model, model reduction is an important tool in many areas of research such as combustion, chemical plants and biological modeling. Models written in those areas are, of course, mainly nonlinear. There are methods for model reduction other than the GA, for example proper orthogonal decomposition as a model order reduction (MOR) technique, lumping similar state variables together, eliminating states that are insensitive to parametric perturbation, and leaving out redundant state variables. A model reduction technique should be chosen that suits the given problem. The most recent approaches to nonlinear model reduction use mathematical programming techniques in which state variables are removed from the model without seriously degrading its accuracy.

In this work, we apply an optimization technique, the genetic algorithm (GA), to reduce the order of the cell cycle model of budding yeast (model reduction). The algorithm takes its ideas from biological evolution: selection, recombination and mutation are used to breed the next generation, in the hope that breeding yields better and better candidates, called parents, for breeding again in subsequent generations. The GA has been known for decades; however, it has become widely used only in recent years, because fast and low-cost computers have become widely available (K. F. Man et al., 1999). Model reduction by the GA is suited to problems which require searching a huge number of possible solutions. Another advantage of the GA is that it is easily implemented for parallel computation, so the computation time may decrease significantly. Further details of the GA are discussed in Chapter 3. The GA is, however, not guaranteed to yield globally optimal solutions for the given problem. Its advantage, rather, is that it returns satisfactory results within the expected time.

In this work, we apply the GA to the cell cycle model of budding yeast, a set of first-order nonlinear ordinary differential equations. The goal is to remove as many states as possible while preserving some key characteristics of the original model (normally its responses to certain signals). Furthermore, a few key restrictions must be respected by the reduced model, such as preserving the mass state (a cell cycle model without mass does not make any sense) and satisfying the viability criteria, which reflect the physical reality of the budding yeast cell cycle. The full implementation is discussed in Chapter 4: Implementation Methods.
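
The idea of marking each state as kept or eliminated with one bit can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the three-state system `toy_rhs` is not the yeast model, the names `reduced_rhs` and `euler` are hypothetical, eliminated states are replaced by a fixed constant (zero here), and a simple explicit Euler scheme stands in for a proper ODE solver.

```python
def reduced_rhs(full_rhs, index, constants):
    """Wrap full_rhs so that states with index bit 0 are held at fixed constants."""
    def rhs(t, x):
        # Eliminated states are replaced by their substitution constants.
        x_eff = [xi if keep else c for xi, keep, c in zip(x, index, constants)]
        dx = full_rhs(t, x_eff)
        # Eliminated states no longer evolve.
        return [dxi if keep else 0.0 for dxi, keep in zip(dx, index)]
    return rhs

def euler(rhs, x0, t_end, dt):
    """Integrate x' = rhs(t, x) with explicit Euler steps and return the final state."""
    x, t = list(x0), 0.0
    for _ in range(int(round(t_end / dt))):
        dx = rhs(t, x)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
        t += dt
    return x

def toy_rhs(t, x):
    """Hypothetical 3-state system; the third state decays fast and couples weakly."""
    a, b, c = x
    return [-a + 0.01 * c, a - b, -5.0 * c]

index = [1, 1, 0]            # bit 1 = state kept, bit 0 = state eliminated
constants = [0.0, 0.0, 0.0]  # zero-valued substitution for eliminated states
x0 = [1.0, 0.0, 0.2]

full = euler(toy_rhs, x0, 5.0, 0.001)
x0_reduced = [xi if keep else c for xi, keep, c in zip(x0, index, constants)]
reduced = euler(reduced_rhs(toy_rhs, index, constants), x0_reduced, 5.0, 0.001)

# Cost: squared deviation of the kept states at the final time.
cost = sum((f - r) ** 2 for f, r, keep in zip(full, reduced, index) if keep)
```

A small cost indicates that the eliminated state had little influence on the kept states, which is the kind of candidate a reduction method would favour.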



Chapter 2

Cell Cycle of Budding Yeast

In this chapter, the fundamentals of the cell cycle of budding yeast and its mathematical model are discussed. The fundamentals are explained first for two reasons: first, to become familiar with the model we are going to work with, and second, to understand the crucial criteria in the model.

2.1 Cell Cycle

Biological systems are at work in every moment of daily life, for instance cells within human bodies, animals, plants and microorganisms like yeast. The sequence of events from the growth of a newborn cell until it divides into two cells is called a cell cycle. Budding yeast cells, simple single-celled eukaryotes, have been studied for years, and the knowledge obtained from them has proven beneficial for understanding cell proliferation in multicellular plants and animals.

The molecular machinery of eukaryotic* cell cycle control is known in more detail for budding yeast, Saccharomyces cerevisiae, than for any other organism (K. C. Chen et al., 2000). In eukaryotic cells, the cell cycle or cell-division cycle is the process by which a cell divides to form two new cells. The process comprises not only division but also replication. A growing mother cell replicates all its essential components, e.g. DNA, and later divides them more or less equally between two daughter cells. Each daughter cell thus contains the machinery and information required to repeat the cell-division process again and again.

The eukaryotic cell cycle is divided into 4 phases: G1, S, G2 and M phase. These phases are defined as follows:

- G1 or Gap1 phase: the cell grows.
- S or DNA synthesis phase: the cell makes copies of its chromosomes. Each chromosome now consists of two sister chromatids.
- G2 or Gap2 phase: the cell checks the duplicated chromosomes and gets ready to divide.
- M or Mitosis phase: the cell separates the copied chromosomes to form two full sets (mitosis) and the cell divides into two new cells.

We can say that, in the cell cycle, S and M are the two important phases, separated by two waiting/checking gaps, G1 and G2.

*A eukaryote is an organism whose cells have a visible nucleus, i.e. all cellular organisms except prokaryotes, while a prokaryote is a cell whose DNA is not contained within a nucleus, e.g. bacteria and blue-green algae, and whose cell cycle occurs via a process termed binary fission.



Figure 2.1: Cell cycle of budding yeast (L. Calzone et al., 2009)

The figure shows the cell cycle of budding yeast. After division, a single daughter yeast cell grows in the G1 phase until it becomes big enough at the start position; then the cell starts budding. In the S phase, the most important genetic component, the DNA within the chromosomes, must be accurately replicated. After the replicated chromosomes have been checked in the G2 phase, the cell is ready for segregation if there is no mistake. In the M phase, the cell carries out chromosome separation and cell division (cytokinesis). The process finally yields two approximately equal daughter cells. However, the cell cycle may not be completed if certain conditions fail, e.g. if the cell cannot reach the critical size or there is DNA damage in the G1 phase, or if the DNA is damaged, the DNA is not replicated, or the chromosomes are not aligned in the M phase.

Mitosis plays a key role in the cell cycle. It is further divided into 4 stages: prophase, metaphase, anaphase and telophase. Mitosis is, basically, the process in cell division by which the nucleus divides, resulting in two new nuclei, each of which contains a complete copy of the parental chromosomes.

The next question is how we know which mechanisms take place at what time in a cell cycle. The answer involves certain elements that act as triggers or reference levels indicating which stage is in operation. In the budding yeast system, cyclins are accepted as the regulatory substances responsible for this role. A cyclin is a protein active in regulating the cell cycle. Its concentration typically fluctuates because of synthesis and degradation at specific points during the cell cycle, and it regulates the cycle by binding to its partner, a cyclin-dependent kinase (Cdk). The name cyclin thus reflects the fact that its concentration varies in a cyclical fashion during the cell cycle.



observed behavior of the chemical reaction system. If a set of rate constants can be found for which the solutions fit the observations, then the mechanism is provisionally confirmed (depending on further experimental investigations). If not, inconsistencies will identify aspects of the mechanism that require revision and further testing. Although a mechanism can be disproved if it is inconsistent with well-established facts, it can never be proved correct because new observations may force modifications and additions (K. C. Chen et al., 2004).

So we can say that the budding yeast cell cycle model of K. C. Chen et al., 2004 was built by trial and error, with manually tuned kinetic parameters, to obtain an appropriate mathematical model. The molecular mechanisms of the budding yeast model are represented by a set of 36 nonlinear ordinary differential equations plus 25 algebraic equations.

The consensus diagram of the budding yeast shown below, its explanation, and the subsequent equations in Table 2.1 are taken from K. C. Chen et al., 2004 (the equations, parameters and initial conditions are shown in full in Appendix B). Further details can be obtained from the reference. Viability rules are integrated into the model in order to ensure that the model describes the biology and some mutants correctly, not as a general test for budding yeast models.

Figure 2.2: Consensus model of the budding yeast (K. C. Chen et al., 2004)

The figure shows the consensus model of the cell cycle control mechanism in budding yeast. The diagram should be read from bottom left toward top right. In the wiring diagram, Cln2 stands for Cln1 and 2, Clb5 for Clb5 and 6, and Clb2 for Clb1 and 2; furthermore, the kinase partner of the cyclins, Cdc28, is not shown explicitly. There is an excess of Cdc28 and it combines promptly with cyclins as soon as they are synthesized. Newborn daughter cells must reach a critical size; then sufficient Cln3 and Bck2 activate the transcription factors MBF and SBF, which induce the synthesis of two cyclins, Cln2 and Clb5. Cln2 is basically responsible for bud emergence and Clb5 for initiating DNA synthesis. Clb5-dependent kinase activity is not immediately active because the G1-phase cell is full of cyclin-dependent kinase inhibitors (CKI; namely, Sic1 and Cdc6). After the CKIs are phosphorylated by Cln2/Cdc28, they are rapidly degraded by SCF, releasing Clb5/Cdc28 to do its job. A fourth class of mitotic cyclins, denoted Clb2, are out of the picture in G1 because their transcription factor Mcm1 is inactive, their degradation pathway Cdh1/APC is active, and their stoichiometric inhibitors CKI are abundant. Cln2- and Clb5-dependent kinases remove CKI and inactivate Cdh1, allowing Clb2 to accumulate, after some delay, as it activates its own transcription factor, Mcm1. Clb2/Cdc28 turns off SBF and MBF. (Clb5/Cdc28 is probably the other down-regulator of MBF.) As Clb2/Cdc28 drives the cell into mitosis, it also sets the stage for exit from mitosis by stimulating the synthesis of Cdc20 and by phosphorylating components of the APC (see text for details). Meanwhile, Cdc20/APC is kept inactive by the Mad2-dependent checkpoint signal responsive to unattached chromosomes. When the replicated chromosomes are attached, active Cdc20/APC initiates mitotic exit. First, it degrades Pds1, releasing Esp1, a protease involved in sister chromatid separation. It also degrades Clb5 and partially Clb2, lowering their potency on Cdh1 inactivation. In this model, Cdc20/APC promotes degradation of a phosphatase (PPX) that has been keeping Net1 in its unphosphorylated form, which binds with Cdc14. As the attached chromosomes are properly aligned on the metaphase spindle, Tem1 is activated, which in turn activates Cdc15 (the endpoint of the MEN signal-transduction pathway in the model). When Net1 gets phosphorylated by Cdc15, it releases its hold on Cdc14. Cdc14 (a phosphatase) then does battle against the cyclin-dependent kinases: activating Cdh1, stabilizing CKIs, and activating Swi5 (the transcription factor for CKIs). In this manner, Cdc14 returns the cell to G1 phase (no cyclins, abundant CKIs, and active Cdh1).

A cell cycle may not be completed if some mechanisms are abnormal. When the model is simulated with the equations, parameter values and initial conditions in Appendix B (only the ODEs are shown in Table 2.1), the simulated model is considered viable if the following rules are fulfilled:

(A) The model must execute the following events in order; otherwise the model is considered inviable.

1. Origin re-licensing (due to a drop in [Clb2]+[Clb5] below Kez2),
2. Origin activation (due to a subsequent rise in [Clb2]+[Clb5], causing [ORI] to increase above 1),
3. Spindle alignment (due to a rise in [Clb2], causing [SPN] to increase above 1),
4. Esp1 activation ([Esp1] increasing above 0.1, due to Pds1 proteolysis at anaphase), and
5. [Clb2] dropping below a threshold Kez to trigger nuclear division.

(B) The model is inviable if division occurs in an "unbudded cell" (i.e. when [BUD] never reaches 1 in the cycle).

(C) The cell cycle should be stable such that the root mean square deviation of all variables is 10.

These viability criteria are crucial since, in later tasks, feasible reduced models must comply with them; otherwise they are considered to have failed. Note that viability criterion (C) is not used in our viability determination because it serves to assure the stability of the whole model.
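
Rules (A) and (B) lend themselves to a simple programmatic check. The sketch below is only one possible way to encode them and makes several assumptions: the simulation is assumed to have already produced a log of detected events for one cycle, the event labels and the function name `is_viable` are hypothetical, and rule (C) is left out, as in the text.

```python
from typing import Sequence

# Illustrative labels for the five events of rule (A), in the required order.
REQUIRED_ORDER = (
    "origin_relicensing",
    "origin_activation",
    "spindle_alignment",
    "esp1_activation",
    "nuclear_division",
)

def is_viable(event_log: Sequence[str], max_bud: float) -> bool:
    """Check rules (A) and (B) for one simulated cell cycle.

    event_log: names of the detected events in the order they occurred.
    max_bud:   largest value reached by [BUD] during the cycle.
    """
    # Rule (A): the five events must occur, each once, in the prescribed order.
    observed = tuple(e for e in event_log if e in REQUIRED_ORDER)
    in_order = observed == REQUIRED_ORDER
    # Rule (B): division in an unbudded cell ([BUD] never reaches 1) is inviable.
    budded = max_bud >= 1.0
    return in_order and budded

good_cycle = list(REQUIRED_ORDER)
bad_cycle = ["origin_activation", "origin_relicensing",
             "spindle_alignment", "esp1_activation", "nuclear_division"]
```

With these inputs, `is_viable(good_cycle, 1.2)` accepts the cycle, while a swapped event order or a bud value that stays below 1 rejects it.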



------------------------------------------------------------------------------------------------------------------------
Table 2.1: Equations
------------------------------------------------------------------------------------------------------------------------

[Table 2.1: the 36 ordinary differential equations of the budding yeast cell cycle model (K. C. Chen et al., 2004). The full equations, parameters and initial conditions are given in Appendix B.]

------------------------------------------------------------------------------------------------------------------------

Reset rules: When [Clb2] drops below Kez, we reset [BUD] and [SPN] to zero, and divide the mass between daughter cell and mother cell as follows: mass → f · mass for the daughter, and mass → (1 − f) · mass for the mother, with f = exp(−kg · D), where D = (1.026/kg) − 32 is the observed daughter cell cycle time as a function of growth rate. When [Clb2] + [Clb5] drops below Kez2, [ORI] is reset to 0.

Flags: Bud emergence when [BUD] = 1, start of DNA synthesis when [ORI] = 1, chromosome alignment on spindle completed when [SPN] = 1.
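
The mass division in the reset rules can be worked through numerically. This is a minimal sketch under stated assumptions: the exponent of f is taken to carry a minus sign, so that f lies between 0 and 1; the growth rate `kg_example`, corresponding to a mass doubling time of 90 minutes, is an example value chosen here and is not taken from the parameter table; the function names are hypothetical.

```python
import math

def daughter_fraction(kg: float) -> float:
    """Fraction f of the mass given to the daughter cell at division."""
    d = 1.026 / kg - 32.0     # observed daughter cell cycle time D
    return math.exp(-kg * d)  # f = exp(-kg * D), assumed sign convention

def divide_mass(mass: float, kg: float):
    """Return (daughter mass, mother mass) according to the reset rule."""
    f = daughter_fraction(kg)
    return f * mass, (1.0 - f) * mass

# Assumed example growth rate: mass doubling time of 90 minutes.
kg_example = math.log(2.0) / 90.0
daughter, mother = divide_mass(2.0, kg_example)
```

Because kg · D = 1.026 − 32 · kg, the fraction f depends only weakly on the growth rate, and the daughter always receives somewhat less than half of the mass for growth rates in this range.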

------------------------------------------------------------------------------------------------------------------------



Chapter 3

Genetic Algorithm (GA)

The Genetic Algorithm (GA) is a well-known search and optimization methodology that is very useful in many applications in a variety of fields, for instance in engineering and science. The main reason for this success is undoubtedly the advances in solid-state microelectronics that led to the proliferation of widely available, low-cost and fast computers (K. F. Man et al., 1999). The GA is based on the Darwinian principle of natural selection, the survival of the fittest individuals. It was first introduced in 1975 by Holland (M. Sriniva and L.M. Paynaik, 1994) as a computational procedure called the Simple Genetic Algorithm (SGA). The GA is not mathematically oriented; instead, it possesses an intrinsic flexibility and freedom to choose desirable optima according to designs and specifications. The GA can also handle problems that are nonlinear, constrained or discrete, or that would otherwise require nearly unbounded computation time.

As the area of genetic algorithms is very wide, it is not possible to cover everything in these pages. The purpose of this part is to give the foundations of the GA in order to appreciate its abilities in problem solving. For a specific problem, the GA may be designed in other ways in order to solve the problem appropriately and efficiently. As normally defined, the term Genetic Algorithm (GA) means using binary bit strings to encode candidate solutions, while other techniques in the family are defined slightly differently, e.g. Evolutionary Strategy (ES), which is used to optimize real-valued parameters; Evolutionary Programming (EP), which differs from the GA and ES in that there is no exchange of genes between a parent pair, every individual is considered a parent and only the mutation operator is used; and Genetic Programming (GP), in which the individuals are represented by a tree syntax (J. C. G. Narbona, 2006).

3.1 Introduction to Genetic Algorithms

The basic principles of the GA were first proposed by Holland (M. Sriniva and L.M. Paynaik, 1994). A series of publications and reports has become available since then. The GA is inspired by the mechanism of natural selection, where stronger individuals are likely to be the winners in a competitive environment. Through consecutive generations of genetic evolution, fitter and fitter individuals are found. Over time, the fittest individuals, which correspond to the optimal solutions of a given problem, emerge. In other words, the GA keeps improving the individuals until a defined objective value is achieved, a set number of iterations is reached, or no fitter individual has been found for some time.

The GA presumes that a feasible solution of any problem is an individual and can be represented by a set of parameters; the process of encoding a feasible solution is called representation. These parameters are regarded as the genes of a chromosome and can be constructed from binary bit strings. A positive value, generally known as a fitness or cost value, is used to reflect the degree of goodness of an individual for a given problem. It is obtained by evaluating the cost function and may be defined so that a better individual has a higher cost value or so that a better individual has a lower cost value, depending on the problem. The cost value allows each candidate to be evaluated quantitatively.

Initially, in the first generation, a number of individuals are randomly generated as candidate solutions for a given problem. According to the fitness determination, individuals can be distinguished from one another by their different cost values. To breed new individuals for a subsequent generation, the individuals with higher cost values (taking a higher value to mean a fitter individual), acting as the parents, should get more chances to be selected than those with lower cost values. In other words, the probability of each individual being chosen is proportional to its cost value. This process is known as selection. However, the GA itself is based on random operations, so there is no guarantee that an individual survives exactly in proportion to its fitness.
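
Selection with a probability proportional to the cost value can be sketched as follows. This is a minimal illustration, assuming that a higher value means a fitter individual and that all values are non-negative with a positive sum; the function names are hypothetical.

```python
import random
from typing import List, Sequence

def selection_probabilities(fitness: Sequence[float]) -> List[float]:
    """Probability of each individual being chosen, proportional to its fitness."""
    total = sum(fitness)
    return [f / total for f in fitness]

def roulette_select(population: Sequence, fitness: Sequence[float],
                    rng: random.Random):
    """Pick one individual by spinning a wheel whose slots are sized by fitness."""
    pick = rng.random() * sum(fitness)  # position of the pointer on the wheel
    running = 0.0
    for individual, f in zip(population, fitness):
        running += f
        if pick < running:
            return individual
    return population[-1]  # guard against floating-point round-off

rng = random.Random(42)
population = ["A", "B", "C"]
fitness = [1.0, 3.0, 6.0]
probabilities = selection_probabilities(fitness)
parent = roulette_select(population, fitness, rng)
```

In this example individual "C" holds 60 % of the wheel, yet any single draw may still return "A" or "B", which is the randomness referred to above.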

A pair of randomly selected individuals is called the parents and plays a key role in the breeding mechanism. To breed, the chosen parents undergo the genetic operations called crossover and mutation. In crossover, two parents produce two offspring such that each offspring takes some of its gene content from the father and the rest from the mother. When the crossover is finished, random changes are introduced to the gene content with predetermined probabilities, causing additional variation. These changes are called mutation.

The expectation of the GA is that the average fitness of the population will increase for each

generation and, by repeating these processes: selection, crossover and mutation, for some number of

generations, (very) good solutions to the problem should be discovered.

The processes applied to the selected individuals of a prior generation improve the newborn individuals only by chance. However, a number of the newborn individuals should be better than the previous ones, since they are not generated arbitrarily but are created according to the strategies of selection, crossover and mutation. The algorithm yields a satisfactory solution in the end if it keeps producing better and better individuals, i.e. ones that solve the problem more completely or more efficiently. Otherwise, the algorithm is aborted if the individuals have not improved for some period or a predefined number of iterations is reached.

Figure 3.1: a GA cycle (K. F. Man et al., 1999)



The GA has proven itself to be a powerful and successful problem-solving strategy. It has been applied to a wide variety of fields, e.g. music generation, genetic synthesis, VLSI technology, strategy planning and machine learning (M. Sriniva and L.M. Paynaik, 1994).

As depicted in Figure 3.1, the algorithm starts at the block Population, where the GA randomly generates individuals until the set population size is reached. The encoded individuals (genotypes) are transformed back into the original problem context (phenotypes) in order to determine their fitness values from the objective function. Then, the selection chooses the individuals that become parents according to a predetermined strategy. Each pair of parents undergoes the genetic operations, crossover and mutation, to yield the required number of offspring. The fitness values of the offspring are then calculated in turn. Lastly, the population is replaced by the offspring according to a chosen replacement strategy.
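The cycle described above can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the implementation used in this work: the function name `run_ga`, the parameter values, the fitness-proportionate selection, the single-point crossover, the elitism of one and the toy fitness (the number of 1-bits) are all chosen here for demonstration.

```python
import random

def run_ga(fitness, n_bits, pop_size=20, generations=50, p_mut=0.05, seed=0):
    """Minimal generational GA on bit strings (illustrative sketch only)."""
    rng = random.Random(seed)
    # Population: randomly generated genotypes
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        scores = [fitness(ind) for ind in pop]           # evaluation
        offspring = [max(pop, key=fitness)[:]]           # elitism of one (assumption)
        # a small offset keeps every selection weight positive
        weights = [s + 1e-9 for s in scores]
        while len(offspring) < pop_size:
            p1, p2 = rng.choices(pop, weights=weights, k=2)   # selection
            cut = rng.randint(1, n_bits - 1)                  # single-point crossover
            child = p1[:cut] + p2[cut:]
            child = [1 - b if rng.random() < p_mut else b for b in child]  # mutation
            offspring.append(child)
        pop = offspring                                   # replacement
    return max(pop, key=fitness)

# Toy problem: maximise the number of 1-bits in a 16-bit string.
best = run_ga(fitness=sum, n_bits=16)
print(best, sum(best))
```

Because the best individual is always copied into the next generation, the best fitness in this sketch can never decrease from one generation to the next.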

The pseudo-code of the conventional genetic algorithm is shown in Figure 3.2.

Evolutionary Algorithm ()
{
    // start with an initial time
    t := 0;
    // initialize a usually random population of individuals
    init_population P(t);
    // evaluate fitness of all initial individuals of population
    evaluate P(t);
    // evolution cycle
    while not terminated do
        // increase the time counter
        t := t + 1;
        // select a sub-population for offspring production
        P'(t) := select_parent P(t);
        // recombine the genes of selected parents
        recombine P'(t);
        // perturb the mated population stochastically
        mutate P'(t);
        // evaluate its new fitness
        evaluate P'(t);
        // select the survivors from actual fitness
        P := survive P, P'(t);
    od
}

Figure 3.2: The pseudo-code of the conventional genetic algorithm (K. F. Man et al., 1999)

3.2 Individual Representation

The most crucial fundamental of the EA structure is the encoding mechanism used to represent feasible solutions for a given problem. Objects forming feasible solutions within the original problem context are called phenotypes; their encodings, the individuals within the GA, are called genotypes. The representation step specifies the mapping from the phenotypes onto a set of genotypes.

Problems differ from one another, and the coding of the individuals that represent candidate solutions varies with the nature of the problem. So far, the bit-string representation has been the classic method used by the GA because of its simplicity and traceability. There are, however, other representations that suit certain problems better than binary bit strings, e.g. the order-based representation, which is particularly useful for problems in which the order of an individual's contents has to be determined; one of the best-known problems of this kind is the Traveling Salesman Problem (TSP).

After a proper representation has been chosen for the problem, the GA operations, crossover and mutation, also have to be developed on the basis of this representation. Consider, for example, the TSP (a salesman needs to minimize the total distance travelled while visiting all n cities). The conventional crossover, in which the genetic content of an offspring consists of one part taken from its mother and another from its father, will not work, because a permutation encoding is normally used for the TSP, and the conventional crossover would cause an element to appear more than once within the offspring.
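The problem can be made concrete with a small sketch. The two nine-city routes, the cut position and the helper names below are hypothetical and serve only as an illustration.

```python
def one_point_crossover(mother, father, cut):
    """Conventional crossover: head from the mother, tail from the father."""
    return mother[:cut] + father[cut:]

def is_valid_tour(route, n_cities):
    """A tour is valid only if every city 1..n_cities appears exactly once."""
    return sorted(route) == list(range(1, n_cities + 1))

mother = [1, 5, 3, 2, 6, 4, 7, 9, 8]
father = [8, 5, 6, 7, 2, 3, 1, 4, 9]

child = one_point_crossover(mother, father, cut=4)
print(child)                       # [1, 5, 3, 2, 2, 3, 1, 4, 9]
print(is_valid_tour(child, 9))     # False: cities 1, 2 and 3 appear twice
```

Both parents are valid permutations, but the offspring visits some cities twice and others not at all, which is why permutation encodings need their own crossover operators.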

A few representation methods for the GA and other optimization techniques are given below. There are many more encoding methods. In every case, however, the encoded result must be in a form that a computer can process, since the GA is a population-based technique that searches for optimal (satisfying) results in a huge search space, and the computer carries most of the computational effort.

Representation methods and examples

1. Encoding the solution as binary strings: Binary encoding is the most common for the GA. A feasible solution is an array in which each element is a binary number, 0 or 1. One example of binary encoding is the Knapsack Problem: there are things with a given value and size, and the objective is to maximize the value of the things in the knapsack without exceeding the knapsack's capacity. In the encoding of a feasible solution, each bit indicates whether the corresponding thing is in the knapsack or not, e.g. Figure 3.3.

Figure 3.3: Example of individuals with binary encoding

2. Encoding the solution as the permutation order: For some specific problems, each position of a fixed-length array represents some particular aspect of a feasible solution. It might be an integer or a decimal number. One of the most classic instances in which a feasible solution is encoded as a permutation order is the Traveling Salesman Problem (TSP). For a given number of cities and the distances between each departure city and destination city, the salesman needs to visit all of them while minimizing the total travelling distance. Thus, the sequence of cities is encoded as a fixed-length array, and different sequences (routes) return different distances.

The example map of 10 cities for the TSP is depicted in Figure 3.4. Two different individuals with permutation encoding are shown in Figure 3.5. If they represent feasible individuals for the TSP, Individual A means that the salesman travels first from city 1 to city 5 and so on (1 → 5 → 3 → 2 → 6 → 4 → 7 → 9 → 8), whereas Individual B travels from city 8 to city 5 and so on (8 → 5 → 6 → 7 → 2 → 3 → 1 → 4 → 9). The two individuals therefore return different distances, which shows which route is shorter.

Figure 3.4: Example of a map for the TSP, where each circle represents a city and each connecting transition represents a feasible route

Figure 3.5: Example of individuals with permutation encoding

3. Encoding the solutions as array of real-values: Every individual is an array of real values. Value encoding is very good for some special problems. On the other hand, this encoding often makes it necessary to develop new crossover and mutation operators specific to the problem. One example is finding the weights of a neural network. Assume there is a basic neural network that consists of one input layer and one output layer, connected through one hidden layer, so that the output is a function of the input. The weights that have to be found to train the network for the wanted output can be encoded as real values.
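The three encodings above can be illustrated with a short sketch. All numbers (item values and sizes, the capacity, the city coordinates, the weights) are made up for illustration, and the tour is assumed to be closed, i.e. the salesman returns to the first city.

```python
import math

# 1. Binary encoding (knapsack): bit i says whether thing i is packed.
values, sizes, capacity = [10, 4, 7, 3], [5, 2, 4, 1], 8
genome = [1, 0, 0, 1]
packed_size = sum(s for s, bit in zip(sizes, genome) if bit)
packed_value = sum(v for v, bit in zip(values, genome) if bit)
feasible = packed_size <= capacity
print(packed_size, packed_value, feasible)      # 6 13 True

# 2. Permutation encoding (TSP): the genome is the visiting order.
coords = {1: (0, 0), 2: (3, 0), 3: (3, 4), 4: (0, 4)}

def tour_length(route):
    """Length of the closed tour that visits the cities in the given order."""
    legs = zip(route, route[1:] + route[:1])
    return sum(math.dist(coords[a], coords[b]) for a, b in legs)

print(tour_length([1, 2, 3, 4]))                # 14.0
print(tour_length([1, 3, 2, 4]))                # 18.0

# 3. Real-valued encoding: the genome is directly a list of weights.
weights = [0.8, -1.2, 0.3]

def neuron_output(inputs):
    """Output of a single tanh neuron whose weights are the genome."""
    return math.tanh(sum(w * x for w, x in zip(weights, inputs)))
```

In each case the genotype is a flat array, and only the decoding step (the phenotype) knows what the entries mean.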

3.3 Objective Function

An objective function is a measuring mechanism used to evaluate how good a feasible solution is. It is a very important link between the GA and the system concerned. Each candidate goes through the same evaluation process individually, and the resulting value varies from one candidate to another.

However, a proper objective function must be written with care. The objective function must evaluate all generated individuals by the same standard; furthermore, there must be no conflict that casts doubt on the quality of any individual.

To make this point clear, consider an individual that is applied to the problem and for which a fitness value is then calculated to reflect its performance. Once all individuals have been tested, the relative ability of this individual can be identified. Assume, for example, that there are 4 individuals in each generation; the fitness value of individual a is calculated, as are those of the others. Then all fitness values in this generation are normalized, and the relative ability of individual a is given by its normalized portion. When this evaluation process is applied to the whole population in the GA by the same standard, the ability of individual a can be identified.
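As a small worked example of this normalization, with raw fitness values that are invented purely for illustration:

```python
# Hypothetical raw fitness values of the 4 individuals in one generation
fitness = {"a": 4.0, "b": 2.0, "c": 1.0, "d": 1.0}

total = sum(fitness.values())
share = {name: value / total for name, value in fitness.items()}

print(share["a"])   # 0.5: individual a holds half of the total fitness
```

The normalized portions sum to one, so they can be compared directly across individuals evaluated by the same objective function.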

3.4 Selection Methods

To generate good offspring, a good parent selection mechanism is important. Selection is the process that determines the number of trials for which a particular individual is used in reproduction. The selection may behave in a deterministic or in a randomized manner, depending on the algorithm chosen and its application-dependent implementation. Moreover, elitist selection may be incorporated. The chance of selecting an individual as a parent should be directly proportional to the number of offspring it produces. There are many techniques that the EA can use to select the individuals to be copied into the next generation; some of the most common methods are listed below.

Selection methods

1. Fitness-proportionate selection or Roulette-wheel selection: Fitter individuals are more likely, but not certain, to be selected. Conceptually, this can be presented as a game of roulette: each individual gets a slice of the wheel, but fitter ones get larger slices than less fit ones. The wheel is then spun, and each time the individual that owns the section on which it lands is chosen.

Figure 3.6: Roulette-wheel selection

However, the roulette-wheel selection has some drawbacks. If an outstanding individual dominates most of the roulette wheel, the generated offspring are likely to converge to the region of the search space around this individual. In the opposite case, if the fitness values of all individuals on the wheel are close to each other, the selection pressure, i.e. the preference for parents with a high fitness over parents with a low one, is low. Therefore, the generated offspring do not improve much.

2. Rank selection: The individuals in the population are sorted and each is assigned a numerical rank based on its fitness; the selection is based on this ranking rather than on the absolute difference in fitness. The fitness assigned to each individual depends only on its position in the ranking, not on the actual objective value. The advantage of this method is that it can prevent very fit individuals from gaining dominance early at the expense of less fit ones. Such early dominance would reduce the population's genetic diversity and might hinder attempts to find satisfying solutions in a limited number of generations.

Figure 3.7 shows the situation before ranking for a case with 4 individuals in which Individual#1 is very fit and thus dominates most of the area in the fitness-proportion pie chart. The result after ranking is depicted in Figure 3.8.

Figure 3.7: Situation before ranking (graph of fitness proportion)

Figure 3.8: Situation after ranking (graph of order numbers)

3. Tournament selection: Subgroups of individuals are chosen from the larger population, and the members of each subgroup compete against each other. Only one individual from each subgroup is chosen to reproduce.

4. Stochastic universal selection: The individuals are assigned to contiguous sections of a line exactly as in the roulette-wheel selection. In contrast to the roulette wheel, a number of pointers are distributed evenly over the line. The number of individuals to be selected (n) corresponds to the number of pointers. The distance between the pointers is then 1/n, and the position of the first pointer is given by a randomly generated number in the range [0, 1/n]. The individuals whose segments include the pointers are selected.

5. Elitist selection: A few of the fittest members of each generation are guaranteed to be selected. Normally, the GA does not use pure elitism; rather, a chosen selection strategy is combined with elitism to assure convergence of the solution when no better offspring is introduced in a succeeding generation.
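The selection methods listed above can be sketched as follows. The function names and the fitness values are hypothetical, fitness is assumed to be positive with larger values being better, and the sketch is meant as an illustration rather than as the operators used later in this work.

```python
import random

def roulette(fitness, rng):
    """Fitness-proportionate selection: return the index of one individual."""
    spin = rng.uniform(0, sum(fitness))
    running = 0.0
    for i, f in enumerate(fitness):
        running += f
        if spin <= running:
            return i
    return len(fitness) - 1            # guard against rounding errors

def linear_rank_weights(fitness):
    """Rank selection: the worst individual gets weight 1, the best gets N."""
    order = sorted(range(len(fitness)), key=lambda i: fitness[i])
    weights = [0] * len(fitness)
    for rank, i in enumerate(order, start=1):
        weights[i] = rank
    return weights

def tournament(fitness, k, rng):
    """Pick k random contestants; the fittest of them wins."""
    contestants = rng.sample(range(len(fitness)), k)
    return max(contestants, key=lambda i: fitness[i])

def stochastic_universal(fitness, n, rng):
    """n evenly spaced pointers (distance 1/n) over the normalised line."""
    total = sum(fitness)
    start = rng.uniform(0, 1 / n)      # position of the first pointer
    chosen, i, running = [], 0, fitness[0] / total
    for pointer in (start + j / n for j in range(n)):
        while pointer > running and i < len(fitness) - 1:
            i += 1
            running += fitness[i] / total
        chosen.append(i)
    return chosen

fit = [70.0, 15.0, 10.0, 5.0]          # individual 0 dominates the wheel
print(linear_rank_weights(fit))        # [4, 3, 2, 1]
print(stochastic_universal(fit, 4, random.Random(1)))
```

With these values the dominant individual owns 70 % of the roulette wheel, but only 4/10 of the selection weight after ranking, which is the effect shown in Figures 3.7 and 3.8.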

3.5 Genetic Operations

Crossover and mutation are the two basic operators of the GA. The performance of the GA also depends on appropriately chosen crossover and mutation schemes. The type and implementation of the genetic operators depend on the encoding method and on the type of problem.

3.5.1 Crossover

The crossover occurs in the GA when two parents are selected for reproduction. It is sometimes called recombination. The crossover combines the attributes of two chosen individuals and yields two new offspring in which, by chance, the good attributes of the mother and of the father combine, so that their fitness values are better than those of their parents.

Some common crossover methods are the following:

1. Single point, two-point and multi-point crossover: The single-point crossover is the most common crossover method and was inspired by biological processes. The crossover point is randomly generated and then applied to the two selected parents. The first offspring copies the part before the crossover point from Parent#1 and appends the part after the crossover point from Parent#2. The second offspring is built analogously, with the roles of the parents exchanged.


Figure 3.9: One point crossover (diagram showing Parent #1, Parent #2, the crossover point, Offspring #1 and Offspring #2)

Similar to the single-point crossover, the two-point crossover is done by randomly choosing 2 crossover points. The multi-point crossover can be implemented analogously. The single-point and two-point crossovers are thus special cases of the multi-point one.

However, the single-point crossover has a major drawback in some situations, as shown in K. F. Man et al., 1999. For example, assume that there are two high-performance schemes:

S1 = 1 0 1 * * * * 1

S2 = * * * * 1 1 * *

There are two individuals in the population, I1 and I2, matched by S1 and S2 respectively:

I1 = 1 0 1 1 0 0 0 1

I2 = 0 1 1 0 1 1 0 0

If the single-point crossover is performed, it is impossible to obtain an individual that is matched by the following scheme (S3), because the first scheme (S1) will be destroyed.

S3 = 1 0 1 * 1 1 * 1

A multi-point crossover can overcome this problem, which greatly improves the performance of offspring generation.

Assume that the two-point crossover is used instead of the single-point crossover in this case and is applied to I1 and I2 with the crossover points marked by | below. The resulting offspring are I3 and I4, of which I3 is matched by S3.

I1 = 1 0 1 1 | 0 0 | 0 1

I2 = 0 1 1 0 | 1 1 | 0 0

I3 = 1 0 1 1 1 1 0 1

I4 = 0 1 1 0 0 0 0 0
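The example can be checked with a short sketch. The helper names `two_point_crossover` and `matches` are introduced here for illustration only; the bit strings are those given above.

```python
def two_point_crossover(p1, p2, a, b):
    """Swap the segment [a, b) between two parents of equal length."""
    c1 = p1[:a] + p2[a:b] + p1[b:]
    c2 = p2[:a] + p1[a:b] + p2[b:]
    return c1, c2

def matches(individual, scheme):
    """True if the individual agrees with every fixed position of the scheme."""
    return all(s == "*" or s == bit for s, bit in zip(scheme, individual))

I1, I2 = "10110001", "01101100"
S3 = "101*11*1"

I3, I4 = two_point_crossover(I1, I2, 4, 6)
print(I3, I4)              # 10111101 01100000
print(matches(I3, S3))     # True

# Every possible single-point crossover of I1 and I2, in both parent orders:
single = [child
          for cut in range(1, len(I1))
          for child in (I1[:cut] + I2[cut:], I2[:cut] + I1[cut:])]
print(any(matches(child, S3) for child in single))   # False
```

The exhaustive check confirms the argument: S3 needs the first three bits and the last bit of I1 together with the middle bits of I2, which no single cut can deliver.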



2. Uniform crossover: This crossover method generates the offspring based on a randomly generated crossover mask. The operation is demonstrated in Figure 3.10. A mask bit of 1 means that the corresponding bits of Parent#1 and Parent#2 are swapped; otherwise there is no swapping.

Figure 3.10: Uniform crossover (diagram showing Parent #1, Parent #2, the mask, Offspring #1 and Offspring #2)

The resultant offspring contain a mixture of bit contents from each parent. The number of effective crossing points is not fixed, but it is L/2 on average (where L is the bit length of each individual).
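A minimal sketch of the mask-based swap is given below. The parents and the mask are hypothetical values chosen so that the origin of every offspring bit is easy to see; in practice the mask would be drawn at random.

```python
def uniform_crossover(p1, p2, mask):
    """Swap the parents' bits at every position where the mask bit is 1."""
    c1 = [b if m else a for a, b, m in zip(p1, p2, mask)]
    c2 = [a if m else b for a, b, m in zip(p1, p2, mask)]
    return c1, c2

p1 = [1, 1, 1, 1, 1, 1]
p2 = [0, 0, 0, 0, 0, 0]
mask = [1, 0, 0, 1, 1, 0]

c1, c2 = uniform_crossover(p1, p2, mask)
print(c1)   # [0, 1, 1, 0, 0, 1]
print(c2)   # [1, 0, 0, 1, 1, 0]
```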

3.5.2 Mutation

After crossover, the offspring are subjected to mutation. For a binary string, mutation means flipping the mutated bits to the opposite values, i.e. changing a bit from 0 to 1 or vice versa. Basically, the bits of a binary string are mutated independently. The purpose of mutation is to keep candidate solutions from converging into local minima. For example, suppose all the strings in a population have converged to a 0 at a given position and the optimal solution has a 1 at that position; then crossover cannot regenerate a 1 at that position, while a mutation could. By simply changing some elements to new random values according to a mutation probability, mutation raises the genetic diversity of the candidates in the population. Usually, the mutation rate is small.

Figure 3.11: Mutation

The mutation concept was originally designed only for binary-represented individuals. In K. F. Man et al., 1999, the concept is adapted and also applied to real-number individuals, e.g. a random mutation is designed as:

$$g = g + \psi(\mu, \sigma) \tag{3.1}$$

where $g$ is the real-number element in the individual, $\psi$ is a random function (which may be a Gaussian or normal distribution), and $\mu$, $\sigma$ are respectively the mean and variance of the random function.
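Both kinds of mutation can be sketched as follows. The function names and the per-gene mutation probability `p_mut` are assumptions made for illustration; note also that Python's `random.gauss` expects the standard deviation as its second argument, not the variance.

```python
import random

def bit_flip_mutation(bits, p_mut, rng):
    """Flip each bit independently with probability p_mut."""
    return [1 - b if rng.random() < p_mut else b for b in bits]

def gaussian_mutation(genes, p_mut, mu, sigma, rng):
    """Real-valued mutation in the spirit of (3.1): g <- g + psi(mu, sigma).

    psi is taken to be a Gaussian here; sigma is its standard deviation.
    """
    return [g + rng.gauss(mu, sigma) if rng.random() < p_mut else g
            for g in genes]

rng = random.Random(0)
print(bit_flip_mutation([0, 1, 1, 0, 1, 0, 0, 1], 0.1, rng))
print(gaussian_mutation([0.5, -1.2, 3.0], 0.5, 0.0, 0.1, rng))
```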



3.6 Replacement Strategy

According to Figure 3.1, once the generation of the sub-population (offspring) is completed, the old generation has to be replaced, and several strategies have been proposed for this replacement. In the case of generational replacement, all individuals in the present generation are replaced by the new offspring. A population of size N therefore needs N offspring for replacement under this strategy. However, this strategy might fail to generate offspring better than the best current individual. So, it is usually combined with an elitist strategy in which one or a few of the best individuals are copied into the subsequent generation. The elitism may increase the speed at which an outstanding individual dominates the population, but it appears to improve the performance on average.

In another variant of the generational replacement, not all of the offspring are used for the replacement. Only some offspring (usually the better ones) are used to replace individuals in the population.

Since producing a large number of offspring consumes a lot of computation in each generational cycle, another scheme is to generate only a small number of offspring. Normally, the worst individuals are replaced when new offspring are put into the population. A direct replacement of the parents by their corresponding offspring may also be adopted. Another option is to replace the oldest individuals, i.e. those that have remained in the population for a long time. However, this might discard the best long-lived individual.
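Two of these strategies can be sketched as follows. The function names are hypothetical, a larger fitness is assumed to be better, and the individuals are plain numbers only to keep the example readable.

```python
def generational_with_elitism(population, offspring, fitness, n_elite=1):
    """Replace the population by offspring, but keep the n_elite best parents."""
    elite = sorted(population, key=fitness, reverse=True)[:n_elite]
    best_offspring = sorted(offspring, key=fitness, reverse=True)
    return elite + best_offspring[:len(population) - n_elite]

def replace_worst(population, offspring, fitness):
    """Steady-state variant: a few offspring overwrite the worst individuals."""
    n_kept = len(population) - len(offspring)
    kept = sorted(population, key=fitness, reverse=True)[:n_kept]
    return kept + list(offspring)

identity = lambda x: x     # toy fitness: the individual is its own fitness
print(generational_with_elitism([5, 1, 3], [2, 4, 0], identity))   # [5, 4, 2]
print(replace_worst([5, 1, 3], [2], identity))                     # [5, 3, 2]
```

In both variants the population size stays constant; they differ only in how many offspring have to be produced per cycle.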

3.7 Advantages and Limitations of the GA

There are many features that make the GA popular for solving optimization problems.

Firstly, the GA itself is not a mathematics-oriented optimization strategy, even though mathematical formulations are sometimes required, e.g. to determine a cost function. Thus, it is not hard to comprehend the algorithm and later apply it to a given problem.

Secondly, the candidate solutions of a given problem can easily be encoded as binary bit strings (the ordinary GA). Therefore, no specific knowledge of the problem is needed for the encoding.

Since the candidate solutions in the GA are distinct from one another, parallel computation can be used to search the candidate solutions simultaneously. Hence, the computational time is significantly reduced.

Lastly, the GA can be used to solve multi-objective problems. In the real world, the problems to be optimized usually cannot be stated in terms of a single value to be minimized or maximized. Instead, they are expressed in terms of multiple objectives, which normally involve tradeoffs such that one objective cannot be increased without decreasing another. The GA can search for the best result in the multi-objective scenario by simply choosing some feasible solutions in the search space and then developing them by selection, crossover and mutation in order to achieve better solutions.



In contrast, the GA has some limitations.

From the point of view of representation, the ordinary GA does not suit a problem whose solution is better defined as a string of real values; finding the weight values for the nodes in a hidden layer of a neural network is a good example (Chapter 3.2). Therefore, other optimization techniques based on the GA have been introduced, for instance Evolutionary Strategy (ES), Evolutionary Programming (EP) and Genetic Programming (GP), which are mentioned earlier in this chapter.

The most important limitation is that the GA does not guarantee that the optimal solution to the problem will be found. Instead, it returns a satisfying solution within predetermined limits, e.g. when a set number of simulations is reached, when no better solution has been found for some time, or when a result with a predefined cost value is obtained.

Other limitations concern the choice of parameter values and of the GA mechanisms applied to a specific problem. For example, if one individual is much fitter than the others, the offspring generated in the subsequent generation will be dominated by this individual; the candidate solutions then converge prematurely and may get stuck in a local minimum.



Chapter 4

Implementation Methods

In this chapter, the Jacobian-based Local Refinement (JLR) and the Constant Input Response (CIR) are discussed first, since they are later incorporated into the proposed methods. The proposed methods show how the material of the previous chapters is combined in order to reduce the given model.

4.1 Jacobian-based Local Refinement (JLR)

Mathematically, the Jacobian is shorthand for the Jacobian matrix. The Jacobian matrix is the matrix of all first-order partial derivatives of a vector-valued function.

Suppose a function F is given by m real-valued component functions, y1(x1, x2, ..., xn), ..., ym(x1, x2, ..., xn). The partial derivatives of all these functions (if they exist) can be written in an m-by-n matrix, the Jacobian matrix J of F, as follows:

$$J = \begin{bmatrix} \dfrac{\partial y_1}{\partial x_1} & \cdots & \dfrac{\partial y_1}{\partial x_n} \\ \vdots & \ddots & \vdots \\ \dfrac{\partial y_m}{\partial x_1} & \cdots & \dfrac{\partial y_m}{\partial x_n} \end{bmatrix}$$

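Example: the Jacobian matrix of the function F(x1, x2, x3, x4) with the components

$$\begin{aligned} \dot{x}_1 &= x_1 \\ \dot{x}_2 &= 7 x_4 \\ \dot{x}_3 &= 4 x_2^2 + 2 x_3 \\ \dot{x}_4 &= x_3 \sin(x_1) + 3 x_4 \end{aligned}$$

is

$$J_F(x_1, x_2, x_3, x_4) = \begin{bmatrix} \dfrac{\partial \dot{x}_1}{\partial x_1} & \dfrac{\partial \dot{x}_1}{\partial x_2} & \dfrac{\partial \dot{x}_1}{\partial x_3} & \dfrac{\partial \dot{x}_1}{\partial x_4} \\ \dfrac{\partial \dot{x}_2}{\partial x_1} & \dfrac{\partial \dot{x}_2}{\partial x_2} & \dfrac{\partial \dot{x}_2}{\partial x_3} & \dfrac{\partial \dot{x}_2}{\partial x_4} \\ \dfrac{\partial \dot{x}_3}{\partial x_1} & \dfrac{\partial \dot{x}_3}{\partial x_2} & \dfrac{\partial \dot{x}_3}{\partial x_3} & \dfrac{\partial \dot{x}_3}{\partial x_4} \\ \dfrac{\partial \dot{x}_4}{\partial x_1} & \dfrac{\partial \dot{x}_4}{\partial x_2} & \dfrac{\partial \dot{x}_4}{\partial x_3} & \dfrac{\partial \dot{x}_4}{\partial x_4} \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 0 & 7 \\ 0 & 8 x_2 & 2 & 0 \\ x_3 \cos(x_1) & 0 & \sin(x_1) & 3 \end{bmatrix}$$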

The Jacobian of a vector-valued function also carries information about the relationships among the state variables of the system. Thus, we can apply the result in the initial steps of model reduction in order to reduce the subsequent computational effort. As this method is taken from S. R. Tayler et al., 2008, we tentatively call it Jacobian-based Local Refinement (JLR).

The JLR can remove some redundant computations, depending on the user's decision: some states can be left out of the calculation if it turns out that their presence or absence has no effect on the whole system.

In the example above, the Jacobian shows that:

- the first state ($\dot{x}_1$) is solely a function of itself; it is called an input-only state because its solution is used as the input for other states, and

- the second state ($\dot{x}_2$) is solely a function of the variable x4; it is called an output-only state, meaning that this state depends completely on variables taken from other states.

In model reduction, these two types of state, input-only and output-only, are important. We can substitute an input-only state by a constant whose value is the solution of that state, and an output-only state can be discarded from the reduced model since it can be obtained indirectly from other states. In other words, these states are extraneous from the reduced model's viewpoint.
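The classification can be reproduced numerically with a small sketch. It takes the example system as x1' = x1, x2' = 7 x4, x3' = 4 x2^2 + 2 x3 and x4' = x3 sin(x1) + 3 x4 (the plus signs are an assumption), approximates the Jacobian by central differences at one arbitrary test point, and labels the states exactly as described in the text. The function names, the test point and the tolerance are hypothetical, and a single test point can in principle hide a dependence that happens to vanish there.

```python
import math

def f(x):
    """Right-hand side of the example system, x = [x1, x2, x3, x4]."""
    x1, x2, x3, x4 = x
    return [x1, 7 * x4, 4 * x2**2 + 2 * x3, x3 * math.sin(x1) + 3 * x4]

def numerical_jacobian(func, x, h=1e-6):
    """Central-difference approximation of J[i][j] = d f_i / d x_j."""
    n = len(x)
    columns = []
    for j in range(n):
        up, down = list(x), list(x)
        up[j] += h
        down[j] -= h
        columns.append([(a - b) / (2 * h) for a, b in zip(func(up), func(down))])
    return [[columns[j][i] for j in range(n)] for i in range(n)]

def classify(J, tol=1e-8):
    """Label each state by the non-zero pattern of its Jacobian row."""
    labels = {}
    for i, row in enumerate(J):
        depends_on = {j for j, value in enumerate(row) if abs(value) > tol}
        if depends_on == {i}:
            labels[i + 1] = "input-only"     # function of itself only
        elif depends_on and i not in depends_on:
            labels[i + 1] = "output-only"    # function of other states only
        else:
            labels[i + 1] = "neither"
    return labels

J = numerical_jacobian(f, [0.7, 1.3, -0.4, 2.1])
print(classify(J))
# {1: 'input-only', 2: 'output-only', 3: 'neither', 4: 'neither'}
```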

4.2 Constant Input Response (CIR)

The basic idea of this task is taken from "Oscillator Model Reduction Preserving the Phase Response: Application to the Circadian Clock" by F. J. Doyle et al., 2008. In circadian clock models, the phase response curve (PRC) is widely used as an evaluation criterion for the models when the light input is changed in both its duration and its acuteness. F. J. Doyle et al., 2008, employ a so-called Parametric Impulse Phase Response Curve (pIPRC), which measures the phase shift of a reduced system relative to the original system when the light parameter is varied. Roughly speaking, the pIPRC is the phase difference between a feasible reduced system and the original system when the light input is changed. The pIPRC enters as one of the two major terms in the total cost function of F. J. Doyle et al., 2008.

Thus, we try to find a comparable response by varying parameters of interest in the cell cycle model of budding yeast in the manner of a pulse signal and then observing how the whole system reacts. Unfortunately, initial experiments showed that the cell cycle model of budding yeast does not possess this phase-difference property at all, since it is not a limit cycle oscillator model. We therefore introduce a new term called Constant Input Response (CIR) in order to evaluate the sensitivity of the feasible reduced models compared with that of the original model.

A constant input signal is added directly to the state Sic1 (state 5) in the cell cycle model of budding yeast (K. C. Chen et al., 2004). As Sic1 is a regular cyclin subunit, a so-called stoichiometric CDK inhibitor, directly adding a constant signal would physically mean that we add some substance in order to boost some activities of the cell cycle, e.g. going faster into the G1-phase. How can we observe the response of the budding yeast system after the constant signal has been added to the Sic1 state? The easiest answer is to measure how the mass physically reacts to this perturbation. The differential equation of Sic1 (4.1) below shows how the constant is added to the state.

$$\begin{aligned} \frac{d[\mathrm{Sic1}]}{dt} = {} & (k'_{s,c1} + k''_{s,c1}[\mathrm{Swi5}]) + (V_{d,b2} + k_{di,b2})[\mathrm{C2}] + (V_{d,b5} + k_{di,b5})[\mathrm{C5}] + k_{pp,c1}[\mathrm{Cdc14}][\mathrm{Sic1P}] \\ & - (k_{as,b2}[\mathrm{Clb2}] + k_{as,b5}[\mathrm{Clb5}] + V_{kp,c1})[\mathrm{Sic1}] + K_{added} \end{aligned} \tag{4.1}$$

The constant $K_{added}$ is increased from zero up to the level at which the whole model is no longer oscillatory or fails the viability criteria. The responses of the mass to the different values of $K_{added}$ are used to plot the CIR.

The CIR is therefore defined as follows: for each added constant input, the CIR is the amplitude and the time at which the mass state becomes stable, while the whole model must also comply with the viability criteria.

Since the CIR is a series of time-amplitude points, each of which corresponds to one applied constant input, it can be plotted in two dimensions with the Recovery Time on the x-axis and the Recovery Amplitude on the y-axis. The result of applying the CIR to the full model can be found in Chapter 5.2.
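The collection of the CIR points can be sketched as a simple sweep. Everything specific in this sketch is an assumption: `simulate` is a hypothetical callback standing in for a simulation of the cell cycle model with a given added constant, and `toy_model` is an invented stand-in with no biological meaning, used only to make the sketch runnable.

```python
def collect_cir(simulate, k_values):
    """Sweep the added constant and collect (recovery time, recovery amplitude).

    simulate(k) is a hypothetical callback that returns a dict with the keys
    'oscillatory', 'viable', 'time' and 'amplitude' for the added constant k.
    """
    points = []
    for k in k_values:
        result = simulate(k)
        if not (result["oscillatory"] and result["viable"]):
            break                      # stop at the first failing input level
        points.append((result["time"], result["amplitude"]))
    return points

def toy_model(k):
    """Invented stand-in for the yeast model, for illustration only."""
    return {"oscillatory": k < 0.3, "viable": True,
            "time": 100.0 + 50.0 * k, "amplitude": 1.0 + 2.0 * k}

cir = collect_cir(toy_model, [0.0, 0.1, 0.2, 0.3, 0.4])
print(len(cir))    # 3: the sweep stops at k = 0.3
print(cir[0])      # (100.0, 1.0)
```

Each collected pair is one point of the CIR plot, with the recovery time on the x-axis and the recovery amplitude on the y-axis.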

The pseudo-code to find the CIR of a given stable and viable model is as follows:

Repeat
    Vary the constant-added input
    Measure amplitude and time of the first peak of the mass state becoming stable
Until the tested model is not oscillatory or not viable

Figure 4.1: Pseudo-code to find the CIR

4.3 Proposed Methods

To investigate the mechanisms involved in the cell cycle of budding yeast, we would like to minimize the number of states in the original model. The reduced model will have a closed-form expression as a system of ODEs; moreover, the reduced model must retain the biochemical interpretation of the state vector and the response to specific input signals of the full model.

The proposed method for model reduction returns an optimized reduced model with a minimal number of states while preserving a key characteristic, the constant input response (CIR). Thus, the problem is formulated as the minimization of a cost that depends on the number of states and the CIR-associated error.

4.3.1 Representation of the Given Problem and Objective function

The cell cycle model of budding yeast consists of 36 ODEs and 25 algebraic equations (K. C. Chen et al., 2004). Considering only the ODE system, the first main objective is to reduce the number of ODE states. To achieve that objective, a kind of representative index is needed that indicates which states are included in or excluded from the reduced model. Firstly, consider the full model, which is represented by

$$\dot{x} = f(x, p) \tag{4.2}$$

Where:

$\dot{x}$ = dynamics (time derivatives) of the state variables of the system,

x = vector of state variables, and

p = vector of parameters.

Meanwhile, the reduced system requires one more variable that indicates the existing states. We rewrite the reduced ODE system as

$$\dot{x} = f(x, s, p) \tag{4.3}$$

Where: s = index vector in which $s_i$ is 1 if the i-th state is included in the model (and 0 if it is excluded).

The number of states present in the model is $\sum_{i=1}^{N_f} s_i$, where $N_f$ is the number of states in the full model. The idea of this representation of the reduced system is taken from S. R. Tayler et al., 2008.

We want to find the reduced model by minimizing the number of states while preserving the shape of a particular CIR. The cost of a reduced model is calculated from its size and its error (measured by its ability to reproduce the CIR). Writing the cost as $\Phi$, the minimal reduced model $\dot{x} = f(x, s, p)$ is defined such that its solution $x^*$ minimizes the cost, i.e.

$$\Phi(x^*) = \min_{s \in S} \Phi(x) \tag{4.4}$$

Where: S is the set of all vectors of length $N_f$ whose entries are 0 or 1, and x is the solution of $\dot{x} = f(x, s, p)$.

The cost function, written here as $\Phi$, is defined to be negative if the reduced model fails to be periodic or violates any of the viability constraints; such a model is regarded as unusable. Otherwise, the cost is non-negative, i.e.

$$\Phi(x) = \begin{cases} -1, & x \text{ is not oscillatory or inviable,} \\ 1 \cdot \varepsilon_{N} + 1 \cdot \varepsilon_{CIR}, & x \text{ is oscillatory and viable,} \end{cases} \tag{4.5}$$

where $\varepsilon_{N}$ denotes the term for the number of states and $\varepsilon_{CIR}$ the CIR-associated error term.

We weight the two terms equally (each multiplied by 1), since they are independent: a good value in one term does not imply a good value in the other, e.g. a reduced model with more states left is likely to reproduce the CIR better than one with fewer states.



with the system's Jacobian matrix, which eliminates some extraneous states before the GA operates, in order to reduce the computational cost.

Each individual in a genetic algorithm is defined by its genome, an array of genes encoding the solution. We map each gene to a decision variable, meaning that there are Nf binary entries in the genome.

The implementation of the GA starts by randomly generating an initial population P(0) of size Nc. However, some conditions prevent the initial population from being generated completely at random (the details are shown in 5.3 Preconditions of generating a feasible candidate the first generation). For example, the mass and the other states used to determine the viability criteria must exist, and the further preconditions must be fulfilled, before the initial population is created.

Then, for each subsequent generation, we create $N_c$ new individuals, or offspring, all with non-negative cost. We use an elitist strategy in which the few best individuals of the previous generation are copied into the current generation; here we use an elitism of two, meaning that the two best individuals are copied unconditionally from the preceding generation. Elitism ensures that the best solution found so far is never lost, so the best cost cannot get worse from one generation to the next. The remaining offspring in generation i are created using the genetic operators selection, crossover and mutation according to the following algorithm:

1. Select parents p1 and p2 from P(i - 1).
2. Create offspring c using uniform crossover with p1 and p2.
3. Mutate the genes in offspring c.
4. Remove the states of offspring c deemed extraneous by the system Jacobian.
5. Compute the cost for offspring c.
6. If the cost is non-negative, add offspring c to P(i).

Because the individuals are likely to have fitness values close to each other (theoretically, all lie in the range [0, 2]), linear ranking selection is a more suitable choice than the other selection strategies. We therefore use a linear ranking selection operator: each parent is chosen with a probability proportional to its fitness rank (i.e. the fittest parent is the most likely to be chosen).

For reproduction, we use uniform crossover: each gene in the genome of offspring c is taken from parent p1 with probability 0.5 and otherwise from parent p2. Then we mutate each gene with probability 0.1, where a mutation simply flips the state index, e.g. from inclusion to exclusion of that state or vice versa.
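The three genetic operators described above can be sketched as follows. This is a minimal illustration, not the MATLAB code used in this work; all function names are hypothetical, and the ranking weights (n for the best individual down to 1 for the worst) are one simple way to make the selection probability proportional to the fitness rank.

```python
import random
from typing import List, Sequence


def linear_rank_weights(costs: Sequence[float]) -> List[float]:
    """Selection weights proportional to fitness rank (lowest cost = best)."""
    n = len(costs)
    order = sorted(range(n), key=lambda i: costs[i])  # indices, best first
    weights = [0.0] * n
    for position, i in enumerate(order):
        weights[i] = float(n - position)              # best gets n, worst gets 1
    return weights


def select_parent(population, costs, rng: random.Random):
    """Draw one parent by linear ranking selection."""
    return rng.choices(population, weights=linear_rank_weights(costs), k=1)[0]


def uniform_crossover(p1, p2, rng: random.Random) -> List[int]:
    """Each gene comes from p1 with probability 0.5, otherwise from p2."""
    return [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]


def mutate(genome, rng: random.Random, rate: float = 0.1) -> List[int]:
    """Flip each bit (inclusion <-> exclusion) independently with probability rate."""
    return [1 - g if rng.random() < rate else g for g in genome]
```

Passing a seeded `random.Random` object to every operator keeps a run reproducible, which is convenient when two substitution methods are to be compared on equal terms.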

Because the individual configuration is chosen randomly, some states may be extraneous and can be removed by the Jacobian-based local refinement method (JLR), which reduces the computational effort. The important point in the JLR is to determine which states are crucial to both the full and the reduced model, and whether the input-only and output-only states can be left out. Once this has been determined, the JLR index table must be calculated before the GA is started.

After offspring c has undergone the local refinement, we determine its fitness by evaluating its cost function. If the cost is negative, the offspring is discarded and does not count toward the required $N_c$. We run the algorithm until the maximal number of generations is reached, which is 25 generations here.
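The generation loop as a whole can be sketched as below. It is a simplified illustration under several assumptions: the name `run_ga` and its parameters are hypothetical, the Jacobian-based refinement step is omitted, the cost function is supplied by the caller, and a `max_tries` safeguard (not part of the described method) prevents an endless loop if feasible offspring are rare.

```python
import random
from typing import Callable, List

Genome = List[int]


def run_ga(cost: Callable[[Genome], float], initial: List[Genome],
           generations: int = 25, elite: int = 2, mutation_rate: float = 0.1,
           seed: int = 0, max_tries: int = 10_000) -> Genome:
    """Elitist GA over binary genomes; offspring with negative cost are discarded."""
    rng = random.Random(seed)
    n_c = len(initial)                                  # population size N_c
    population = sorted(initial, key=cost)              # best (lowest cost) first
    for _ in range(generations):
        n = len(population)
        rank_weights = [n - i for i in range(n)]        # linear ranking: best gets n
        offspring = [g[:] for g in population[:elite]]  # elitism: copy the best unchanged
        tries = 0
        while len(offspring) < n_c and tries < max_tries:
            tries += 1
            p1, p2 = rng.choices(population, weights=rank_weights, k=2)
            # uniform crossover followed by bit-flip mutation
            child = [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]
            child = [1 - g if rng.random() < mutation_rate else g for g in child]
            if cost(child) >= 0:                        # negative cost: not counted
                offspring.append(child)
        population = sorted(offspring, key=cost)
    return population[0]
```

Because the two best individuals are always carried over, the cost of the returned genome can never be worse than the best cost in the initial population.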



4.4 Application to a Cell Cycle of Budding Yeast

The cell cycle model of budding yeast proposed by K. C. Chen et al., 2004, comprises 36 ODEs and 25 auxiliary algebraic equations, but only the 36 main ODEs need to be reduced. As a string of 36 binary digits is employed to represent each candidate reduced model, the total number of possible reduced models is exactly $2^{36}$ (approximately 69 billion) when no constraint is imposed. The GA is well suited to finding a satisfactory solution in such a huge search space.

4.4.1 Reduced Model: Comparison between Substitution of the Eliminated State by Zero and by a Mean-valued Constant

The experiment was first designed to find the optimal result, i.e. the reduced model with the lowest number of states that still preserves the CIR curve. In the reference work, Oscillator Model Reduction Preserving the Phase Response by S. R. Taylor et al., 2008, an excluded state is substituted by zero, meaning that the discarded state is cut away from the model entirely or, in other words, does not affect anything.

When this is applied to the cell cycle of budding yeast, however, one can ask which kind of substitution should be used. The substitution method must not introduce additional constraints into the model, and the reduced model obtained with a given substitution method should have the same properties as the original model.

There are two feasible substitution methods: 1. substitution of the eliminated state by zero, and 2. substitution of the eliminated state by its mean-valued constant. We therefore compare the best solutions obtained from the final reduced models. The method that returns the solution with the lower cost is adopted for the further experiments.

Without giving a mathematical proof, we can foresee a case in method 1 (substitution of the eliminated state by zero) that causes a problem for model reduction. Assume an ODE system consisting of several states in which the dynamics of a state variable m are a function of m itself multiplied by another state variable n. Assume further that the state n can be reduced, so that its dynamics as well as its variable are substituted by zero. As a consequence, the right-hand side of the equation for m also becomes zero, so m can no longer evolve, which is not intended.

$$\frac{dm}{dt} = m \cdot n, \qquad \frac{dn}{dt} = k \cdot n$$

Figure 4.2: Example of an ODE system containing several state variables in which a problem arises when an eliminated state is substituted with zero
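The effect can be illustrated numerically with the m equation of this example. The sketch below is an assumption-laden toy: the function name `simulate_m`, the step size, the initial value and the mean value 0.5 used for n are all hypothetical and chosen only for illustration.

```python
def simulate_m(n_value: float, m0: float = 1.0,
               dt: float = 0.001, steps: int = 1000) -> float:
    """Explicit Euler integration of dm/dt = m * n with n frozen at n_value."""
    m = m0
    for _ in range(steps):
        m += dt * m * n_value
    return m


# n replaced by zero: the right-hand side vanishes and m never leaves m0.
m_with_zero = simulate_m(0.0)

# n replaced by a hypothetical mean value of 0.5: m still evolves.
m_with_mean = simulate_m(0.5)
```

With the zero substitution the state m is frozen at its initial value, whereas the mean-valued substitution keeps the coupling alive, which is the motivation for comparing the two methods.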



Chapter 5

Results and Discussion

In this chapter, the simulation results are shown and discussed. First, the Jacobian-based Local Refinement (JLR) is calculated and its result is kept as an index table for later use. Then the Constant Input Response (CIR) of the full model is shown; it is used to calculate the total cost value of each feasible solution from the GA. Before the GA is applied, some preconditions are introduced, e.g. some states must be fixed so that feasible randomly generated individuals can be found for the first generation of the GA. After that, the GA answers the question of which type of substitution of the eliminated states is better. The resulting reduced model is discussed with regard to its numerical solution and the wiring diagram of the cell cycle presented in Chapter 2. Finally, the GA with parameter estimation is simulated.

5.1 Jacobian Index Table

In order to apply the Jacobian to the ODE system, the algebraic equations must first be substituted into the ODEs, because the equations must contain only variables that belong to the system itself.

The Jacobian index table is calculated first, since it facilitates the Jacobian-based Local Refinement (JLR) afterward. To create the table, we do not pay attention to the value of each element of the Jacobian; we only care which elements are non-zero and which are zero.

The code for this task is written in MATLAB; the pseudo-code to find the Jacobian index table of the model is shown in Figure 5.1.

// Pseudo-code for Jacobian Index Table
Substitute the algebraic equations into the ODE system
Define the variable vector
Calculate the Jacobian J of the ODE system symbolically
Substitute non-zero elements of the Jacobian result matrix with 1, otherwise with 0
Calculate the Jacobian index table by subtraction of all diagonal elements: $\tilde{J} = J - \operatorname{diag}(J)$

Figure 5.1: Pseudo-code to find the Jacobian Index Table

The result, represented as the Jacobian index table, is shown in Table 5.1. As presented in the pseudo-code, an element 1 means that the entry at that position of the Jacobian matrix is non-zero, whereas an element 0 means that the entry is zero.
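The index table can also be approximated without a symbolic toolbox. The sketch below is not the symbolic MATLAB procedure of Figure 5.1 but a numerical stand-in: it perturbs one state at a time and records which other equations change. The name `jacobian_index_table` and the toy system are hypothetical. The table is indexed as table[j][i] (state j appears in equation i), which is an assumption chosen so that zero rows and zero columns can be read as in the discussion of Table 5.1. A numerical probe at a single point may miss a dependency that happens to vanish there, so the symbolic route remains the safer one.

```python
from typing import Callable, List, Sequence


def jacobian_index_table(f: Callable[[Sequence[float]], Sequence[float]],
                         x0: Sequence[float], eps: float = 1e-6) -> List[List[int]]:
    """table[j][i] = 1 if equation i depends on state j (j != i), else 0."""
    n = len(x0)
    base = f(x0)
    table = [[0] * n for _ in range(n)]
    for j in range(n):
        x = list(x0)
        x[j] += eps                        # perturb state j only
        perturbed = f(x)
        for i in range(n):
            if i != j and perturbed[i] != base[i]:
                table[j][i] = 1            # off-diagonal dependency found
    return table


def toy_system(x: Sequence[float]) -> List[float]:
    """Hypothetical 3-state right-hand side, used only for illustration."""
    return [0.1 * x[0],                    # depends on itself only
            x[0] - x[1] * x[2],            # depends on states 1, 2 and 3
            x[1] - 0.5 * x[2]]             # depends on states 2 and 3


toy_table = jacobian_index_table(toy_system, [1.0, 2.0, 3.0])
```

For the toy system the first column of `toy_table` is zero, because the first equation depends on no other state, while no row is zero, because every state appears in at least one other equation.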



The result shows that the states 34, 35 and 36 are output-only states, since all elements in their respective rows are zero. They do not affect the rest of the model because they do not feed any signal back into it, so they could be ignored. However, the viability criteria require that the events triggered by the states 33, 34, 35 and 36 (Esp1, ORI, BUD and SPN, respectively) occur in the correct sequence. We therefore keep the states 34, 35 and 36 for checking the viability but do not count them toward the size of the reduced model. (The state 33, Esp1, is mentioned here because it is involved in the viability criteria; according to Table 5.1, however, it is not an output-only state.) The ordering of the events in the viability criteria is discussed in Chapter 2.

The states 22, 24, 26 and 28 are input-only states, since all elements in their respective columns are zero. Input-only states can in principle be substituted by their steady-state solutions. If the state 22, 26 or 28 remains in the reduced model in the end, we do not count it toward the model size. Notice that the state 24 is an exception because it is affected by reset conditions. To make this point clear, consider two groups of equations: (a) the equations of the states 22, 26 and 28, and (b) the equation of the state 24:

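State 22: $$\frac{d[\mathrm{Cdh1}]_T}{dt} = k_{s,cdh} - k_{d,cdh} \cdot [\mathrm{Cdh1}]_T$$

State 26: $$\frac{d[\mathrm{Cdc14}]_T}{dt} = k_{s,14} - k_{d,14} \cdot [\mathrm{Cdc14}]_T$$

State 28: $$\frac{d[\mathrm{Net1}]_T}{dt} = k_{s,net} - k_{d,net} \cdot [\mathrm{Net1}]_T.$$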

For the state 22, the constants are $k_{s,cdh}$ = 0.01 and $k_{d,cdh}$ = 0.01, and the initial value of $[\mathrm{Cdh1}]_T$ is 1. Its solution therefore equals the constant 1 for all time. The states 26 and 28 can be explained analogously; they also yield constant solutions.
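The constant solution can be verified directly. Writing the equation of the state 22 in the generic form below (the symbols $x$ and $x_0$ are introduced here only for this check), the linear ODE has a closed-form solution:

$$\frac{dx}{dt} = k_s - k_d \, x, \quad x(0) = x_0 \quad \Longrightarrow \quad x(t) = \frac{k_s}{k_d} + \left( x_0 - \frac{k_s}{k_d} \right) e^{-k_d t}.$$

With $k_s = k_d = 0.01$ the steady state is $k_s / k_d = 1$, and since $x_0 = 1$ the exponential term vanishes, so $x(t) = 1$ for all $t$. The state starts exactly at its steady state and never leaves it.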

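The equation of the state 24 is:

State 24: $$\frac{d[\mathrm{Tem1}]}{dt} = \frac{k_{lte1} \cdot ([\mathrm{Tem1}]_T - [\mathrm{Tem1}])}{J_{a,tem} + [\mathrm{Tem1}]_T - [\mathrm{Tem1}]} - \frac{k_{bub2} \cdot [\mathrm{Tem1}]}{J_{i,tem} + [\mathrm{Tem1}]}.$$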

Unlike the states in the first group, it is not obvious that the state 24 is an input-only state, but the result in Table 5.1 confirms it. However, it can neither be substituted by a constant nor be discarded because of the reset conditions: $k_{bub2}$ = 1 (for [ORI] > 1 and [SPN] < 1) or 0.2 (otherwise), and $k_{lte1}$ = 1 (for [SPN] > 1 and [Clb2] > $K_{ez}$) or 0.1 (otherwise). These values are taken from Table 2: Parameter values and initial conditions in Appendix B.

Which of these input-only states can be left out depends further on the substitution method used. With the mean-valued substitution, the states 22, 26 and 28 can all be omitted from the reduced model, because it makes no difference whether they are present: their mean values are the same as their numerical solutions.

With the substitution of eliminated states by zero, however, these states cannot all be left out initially. Leaving them out causes problems; e.g. removing only the state 24 from the full system makes it unstable (not oscillatory) and inviable (as demonstrated in Section 5.3). Based on this simulation result, we decided to keep all input-only states for the later steps in the scenario of substitution by zero.




The mass state (1) is also an input-only state, but it is a crucial state in the model: first, the model does not work properly if the mass is missing and, second, the model does not make physical sense if there is no mass in the budding yeast system.

The state 4 must be preserved because it triggers the mass resetting event. As stated in the reset rules, the mass is reset when Clb2 falls below $K_{ez}$; this state is therefore also crucial and we must ensure that it is present.

Finally, the state 5 (Sic1), which is involved in adding the constant input, must be preserved. Otherwise, adding a constant input does not cause any change to the system.
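The reading of the index table used in this section can be sketched as a small helper. The function name `classify_states` and the 4-state toy table are hypothetical; the convention (a zero row marks an output-only state, a zero column marks an input-only state, diagonal already removed) is taken from the discussion above.

```python
from typing import List, Tuple


def classify_states(table: List[List[int]]) -> Tuple[List[int], List[int]]:
    """Return the 1-based indices of (output_only, input_only) states.

    A zero row means the state feeds no other equation (output-only);
    a zero column means the state's equation depends on no other state
    (input-only).
    """
    n = len(table)
    output_only = [j + 1 for j in range(n) if not any(table[j])]
    input_only = [i + 1 for i in range(n)
                  if not any(table[j][i] for j in range(n))]
    return output_only, input_only


# Hypothetical 4-state table: state 1 drives state 2, states 2 and 3
# interact, and state 4 is driven by state 3 but feeds nothing back.
toy_table = [
    [0, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 0, 0],
]
output_only, input_only = classify_states(toy_table)
```

In the toy table, state 4 is output-only and state 1 is input-only. As the mass state and the state 24 show, this classification is only a first filter: whether a state may actually be removed still depends on the reset conditions and the viability criteria.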


Table 5.1: Jacobian Index Table (Excluding the Diagonal Elements) of the Cell Cycle Model of Budding Yeast

maximal value supported by the full model even if there is a reduced model which could stand a

higher value.

The CIR is one of the crucial terms in the total cost function. A reduced model whose CIR closely resembles that of the full model yields a good value of the total cost function. The lower the CIR term, the better the reduced model (from the point of view of the CIR only). A reduced model with a very low CIR term responds to the different added constant inputs almost exactly as the full model does.

Figure 5.3: CIR of the full model generated from using 28 values of added constant inputs

Table 5.2: Added constant inputs and median values for each group


Figure 5.4: CIR of the full model generated from using 10 values of added constant inputs obtained from the median-valued substitution

5.3 Preconditions for Generating Feasible Candidates in the First Generation

When we first tried to apply the GA to the cell cycle model, we encountered an obstacle: no feasible individual could be found for the first generation. We therefore add some preconditions before generating the individuals in order to overcome this difficulty. The preconditions, however, differ between the two cases of substitution for the eliminated states.

5.3.1 Overview of Initialization of the GA

The feasible candidates are randomly generated in the first generation of the GA. However, in the simulation experiments we could not find any candidate after several hours of running, because not even an oscillatory candidate occurred. Why does this problem exist? To answer this question, let us clarify the method we employ to generate the candidates in the first generation.

In the first generation, the code generates a candidate represented by a 36-by-1 matrix whose elements are binary numbers; each element is obtained by rounding a number drawn from a uniform distribution in MATLAB. This means that, on average, half of the generated elements are 1s and the other half are 0s after rounding.

The initial individuals in the first generation are required to be stable and viable. Is the random generation process described above appropriate? If there are constraints, e.g. the viability criteria mentioned in Chapter 2, that prevent the process from creating a feasible candidate arbitrarily, how can those constraints be found? There must be some state variables without which the model is not oscillatory. In order to reduce the computational effort of the initialization, we fix some states. Which states are fixed, and for what reasons, is explained in the next section.
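Random generation with fixed states can be sketched as follows. The function names are hypothetical, and the fixed states used in the example (1, 4 and 5, the states that Section 5.1 requires to be preserved) are chosen only for illustration. The second helper shows why unconstrained generation is so unlikely to succeed: each state is kept with probability 0.5, so k required states are all present with probability $0.5^k$.

```python
import random
from typing import Iterable, List


def random_candidate(n_states: int, fixed: Iterable[int],
                     rng: random.Random) -> List[int]:
    """Random 0/1 vector; states listed in fixed (1-based) are always kept."""
    s = [round(rng.random()) for _ in range(n_states)]  # rounding a uniform draw
    for idx in fixed:
        s[idx - 1] = 1                                  # force the fixed state in
    return s


def prob_all_present(k: int) -> float:
    """Chance that an unconstrained random candidate keeps k given states."""
    return 0.5 ** k


candidate = random_candidate(36, fixed=(1, 4, 5), rng=random.Random(42))
p_nine = prob_all_present(9)   # 1/512, i.e. about 0.2 %
```

If, for example, nine particular states must all be present (as reported for zero substitution in Section 5.3.2), only about one unconstrained random candidate in 512 satisfies even this necessary condition, which is consistent with the long unsuccessful search described above.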

5.3.2 Preconditions for Initialization of the GA

In order to test for the crucial states, the index vector s representing a candidate is manually applied to the full model with only one state (bit) set to 0 at a time. The experiments are done for two cases: 1. zero substitution and 2. mean-valued substitution. The results are shown in Figure 5.5. Note that zero substitution means substituting an eliminated state with zero, and mean-valued substitution means substituting an eliminated state with its mean value, which is shown in Table 5.3.
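The one-at-a-time test can be sketched as a simple loop. The function name `single_knockout_screen` is hypothetical, and the feasibility check is passed in as a callable; in this work that check would be a simulation of the model with the chosen substitution method followed by the oscillation and viability tests.

```python
from typing import Callable, List


def single_knockout_screen(n_states: int,
                           is_feasible: Callable[[List[int]], bool]) -> List[int]:
    """Return the 1-based indices of states whose individual removal breaks the model."""
    essential = []
    for k in range(n_states):
        s = [1] * n_states
        s[k] = 0                      # remove exactly one state from the full model
        if not is_feasible(s):
            essential.append(k + 1)
    return essential


def stub_feasible(s: List[int]) -> bool:
    """Stand-in for a real simulation: pretend only states 1 and 4 are essential."""
    return s[0] == 1 and s[3] == 1


essential_states = single_knockout_screen(6, stub_feasible)
```

The screen needs only one simulation per state (36 for the full model), but it says nothing about combinations of removed states, which are left to the GA.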

In case 1, zero substitution, the results show that none of the states 1, 4, 20, 21, 24, 25, 26, 27 and 36 can be left out individually; otherwise the system becomes non-oscillatory. The combined cases are not tested, because the GA itself should find out which combinations are useful. In the following, all restrictions to generate feasible candidates in the first generation for the case of zero substituti
