Speeding up the Gillespie algorithm

Speeding up the Gillespie algorithm Colin Gillespie School of Mathematics & Statistics

description

A review of the techniques used to make the Gillespie algorithm computationally efficient.

Transcript of Speeding up the Gillespie algorithm

Page 1: Speeding up the Gillespie algorithm

Speeding up the Gillespie algorithm

Colin Gillespie

School of Mathematics & Statistics

Page 2: Speeding up the Gillespie algorithm

Outline

1. Brief description of stochastic kinetic models

2. Gillespie’s direct method

   - Different Gillespie!

3. Discussion

Page 3: Speeding up the Gillespie algorithm

Stochastic kinetic models

Suppose we have:

- N species: $X_1, X_2, \ldots, X_N$

- M reactions: $R_1, R_2, \ldots, R_M$

- In a “typical” model, $M = 3N$.

Reaction $R_i$ takes the form

$$R_i:\; u_{i1} X_1 + \cdots + u_{iN} X_N \xrightarrow{c_i} v_{i1} X_1 + \cdots + v_{iN} X_N .$$

The effect of reaction $i$ on species $j$ is to change $X_j$ by an amount $v_{ij} - u_{ij}$.

Page 4: Speeding up the Gillespie algorithm

Mass action kinetics

Example zeroth-order reaction: if reaction $R_i$ has the form

$$R_i:\; \emptyset \xrightarrow{c_i} X_k$$

then the rate at which this reaction occurs is

$$h_i(x) = c_i .$$

The effect of this reaction is

$$x_k \to x_k + 1 .$$

Page 5: Speeding up the Gillespie algorithm

Mass action kinetics

Example first-order reaction: if reaction $R_i$ has the form

$$R_i:\; X_j \xrightarrow{c_i} 2X_j$$

then the rate at which this reaction occurs is

$$h_i(x) = c_i x_j ,$$

where $x_j$ is the number of molecules of $X_j$ at time $t$. The effect of this reaction is

$$x_j \to x_j + 1 .$$

Page 6: Speeding up the Gillespie algorithm

Mass action kinetics

Example second-order reaction: if reaction $R_i$ has the form

$$R_i:\; X_j + X_k \xrightarrow{c_i} X_k$$

then the rate at which this reaction occurs is

$$h_i(x) = c_i x_j x_k .$$

The effect of this reaction is

$$x_j \to x_j - 1 .$$

There is no overall effect on $X_k$. For example, $X_k$ could be an enzyme.

Page 7: Speeding up the Gillespie algorithm

Lotka-Volterra model

$$R_1: X_1 \to 2X_1, \qquad R_2: X_1 + X_2 \to 2X_2, \qquad R_3: X_2 \to \emptyset$$

So $R_1$ and $R_3$ are first-order reactions and $R_2$ is a second-order reaction.
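As a rough illustration of how this model could be held in code, the sketch below stores the reactant and product coefficients, derives the net change $v - u$ for each reaction, and evaluates the mass-action hazards. The names `RATES`, `REACTANTS`, `PRODUCTS`, `STOICH` and `hazards` are made up for this example, and the rate constants are assumed values, not taken from the slides.

```python
# Rate constants are assumed for illustration; the slides do not give values.
RATES = (1.0, 0.005, 0.6)

# One entry per reaction: coefficients for the species (X1, X2).
REACTANTS = [(1, 0), (1, 1), (0, 1)]   # u for R1, R2, R3
PRODUCTS = [(2, 0), (0, 2), (0, 0)]    # v for R1, R2, R3

# Net effect of each reaction on each species: v - u.
STOICH = [tuple(v - u for u, v in zip(us, vs))
          for us, vs in zip(REACTANTS, PRODUCTS)]


def hazards(x, c=RATES):
    """Mass-action hazards (h1, h2, h3) at state x = (x1, x2)."""
    x1, x2 = x
    return [c[0] * x1,        # R1: first order in X1
            c[1] * x1 * x2,   # R2: second order
            c[2] * x2]        # R3: first order in X2
```

Each row of `STOICH` is the change applied to $(x_1, x_2)$ when the corresponding reaction fires.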

Page 8: Speeding up the Gillespie algorithm

The Gillespie algorithm

- (Dan) Gillespie has developed a number of algorithms. The “Gillespie algorithm” refers to his 1977 Journal of Physical Chemistry paper (cited ∼1800 times)

- Kendall’s 1950 paper “An artificial realisation of a simple birth and death process” simulated a simple model using a table of random numbers (cited ∼ not very often)

Page 10: Speeding up the Gillespie algorithm

“... premature optimisation is the root of all evil”

Donald Knuth

Page 11: Speeding up the Gillespie algorithm

The direct method

1. Initialisation: set the initial conditions, reaction constants and random number generators

2. Propensities update: update each of the $M$ hazard functions, $h_i(x)$

3. Propensities total: calculate the total hazard $h_0 = \sum_{i=1}^{M} h_i(x)$

4. Reaction time: $\tau = -\ln[U(0, 1)]/h_0$ and $t = t + \tau$

5. Reaction selection: a reaction is chosen with probability proportional to its hazard

6. Reaction execution: update the species

7. Iteration: if the simulation time is exceeded, stop; otherwise go back to step 2

Typically there are a large number of iterations.
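The seven steps can be sketched in a few lines. This is the plain, unoptimised version that the rest of the talk tries to speed up; the function name `direct_method` and its arguments are assumptions made for this example, not code from any of the packages mentioned.

```python
import math
import random


def direct_method(x0, stoich, hazard_fn, t_max, rng=None):
    """Simulate one path with the (unoptimised) direct method.

    x0        -- initial species counts
    stoich    -- stoich[i] is the net change in each species when reaction i fires
    hazard_fn -- maps the state x to the list of M hazards
    """
    rng = rng or random.Random()
    t, x = 0.0, list(x0)                          # step 1: initialisation
    path = [(t, tuple(x))]
    while True:
        h = hazard_fn(x)                          # step 2: update all M hazards
        h0 = sum(h)                               # step 3: total hazard
        if h0 <= 0.0:                             # nothing can fire any more
            break
        # step 4: 1 - random() lies in (0, 1], so the log is always defined
        tau = -math.log(1.0 - rng.random()) / h0
        if t + tau > t_max:                       # step 7: final time exceeded
            break
        t += tau
        target, cumulative, mu = rng.random() * h0, 0.0, 0
        for mu, h_mu in enumerate(h):             # step 5: linear search
            cumulative += h_mu
            if cumulative > target:
                break
        for j, change in enumerate(stoich[mu]):   # step 6: update species
            x[j] += change
        path.append((t, tuple(x)))
    return path


# Toy usage: one species decaying at rate 1 per molecule.
decay_path = direct_method([10], [(-1,)], lambda x: [1.0 * x[0]], t_max=5.0)
```

Steps 2, 3, 5 and 6 are each a full pass over the reactions or species here, which is where the cost per iteration grows with model size.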

Page 12: Speeding up the Gillespie algorithm

The Gillespie slow down

As the number of reactions (and species) increases, the time taken by a single iteration also increases.

Example

In the next few slides we will consider a toy model:

$$X_i \xrightarrow{c_i} \emptyset, \quad i = 1, \ldots, N,$$

where $N = M = 600$, $x_i(0) = 1000$, $c_i = 1$ and the final time is $T = 30$. So

$$h_i(x) = c_i x_i .$$

Page 13: Speeding up the Gillespie algorithm

The Gillespie algorithm

- When we discuss this algorithm we are thinking about software which reads in a description of your model in SBML (say), and runs stochastic simulations

- Examples: COPASI, CellDesigner, gillespie2

Page 14: Speeding up the Gillespie algorithm

Step 2: Propensities update

- At each iteration we update each of the $M$ hazards. That is, we calculate $h_i(x)$ for $i = 1, \ldots, M$. This is $O(M)$

- However, after a single reaction has occurred we only need to update the hazards that have changed

Toy example

- If reaction 1 occurs,

$$R_1: X_1 \xrightarrow{c_1} \emptyset,$$

  only species $X_1$ is changed

- The only hazard that depends on $X_1$ is $h_1$, the hazard of $R_1$

Page 16: Speeding up the Gillespie algorithm

Dependency graphs

- Construct a dependency graph for the hazards

- For the toy model the graph just contains $M = 600$ independent nodes

[Figure: isolated nodes R1, R2, ..., RM with no edges between them]

Page 20: Speeding up the Gillespie algorithm

Lotka-Volterra model

$$R_1: X_1 \to 2X_1, \qquad R_2: X_1 + X_2 \to 2X_2, \qquad R_3: X_2 \to \emptyset$$

[Figure: dependency trees. R1 points to R1 and R2; R2 points to R1, R2 and R3; R3 points to R2 and R3]
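One way such a graph could be built automatically is sketched below: reaction $i$ points to every reaction whose hazard involves a species that $i$ changes. The function name `dependency_graph` and the input layout (sets of species indices, dictionaries of net changes) are assumptions for this example; species and reactions are numbered from 0.

```python
def dependency_graph(reactants, stoich):
    """Map each reaction to the reactions whose hazards must be recomputed.

    reactants[i] -- set of species indices that hazard i depends on
    stoich[i]    -- dict {species index: net change} for reaction i
    """
    graph = {}
    for i, change in enumerate(stoich):
        # Species whose count actually moves when reaction i fires.
        changed = {j for j, delta in change.items() if delta != 0}
        # Any hazard that reads one of those species is now out of date.
        graph[i] = [k for k, needs in enumerate(reactants) if needs & changed]
    return graph


# Lotka-Volterra: X1, X2 are species 0, 1 and R1, R2, R3 are reactions 0, 1, 2.
lv_reactants = [{0}, {0, 1}, {1}]
lv_stoich = [{0: +1}, {0: -1, 1: +1}, {1: -1}]
lv_graph = dependency_graph(lv_reactants, lv_stoich)
```

For Lotka-Volterra this gives R1 → {R1, R2}, R2 → {R1, R2, R3} and R3 → {R2, R3}. The graph is built once, before the simulation starts, and only looked up afterwards.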

Page 21: Speeding up the Gillespie algorithm

Directed graph

- Equivalently, we could represent the dependency graph as a directed graph

[Figure: directed graph on nodes R1, R2 and R3, with arrows in both directions between R1 and R2 and between R2 and R3]

Page 25: Speeding up the Gillespie algorithm

Dependency graph

- So instead of updating all $M$ hazards, we only need to update $D$ propensities, where $D$ is the number of hazards that depend on the reaction that fired. Usually $D < 6$

- However, constructing and traversing the graph also takes time

- So we would only implement this data structure if $M > 10$

Page 26: Speeding up the Gillespie algorithm

Step 3: Propensities total

- At each iteration we combine all $M$ hazards, which is $O(M)$:

$$h_0(x) = \sum_{i=1}^{M} h_i(x) .$$

- However, after a single reaction has occurred we only need to account for the hazards that have changed

- If we have used a dependency graph for the reaction network then we can

  - subtract the old hazard values from $h_0$
  - add the new hazard values to $h_0$

Page 28: Speeding up the Gillespie algorithm

Step 3: Propensities total

Toy model

- If reaction $R_i$ fires, then

$$h_0^{\text{new}} = h_0^{\text{old}} - h_i^{\text{old}} + h_i^{\text{new}}$$

- One addition and one subtraction instead of 600 additions
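For the toy model the running-total update might look like the sketch below. The helper name `fire_and_update` is hypothetical, and the code relies on the toy model's property that hazard $i$ is the only one that depends on $x_i$.

```python
def fire_and_update(i, x, hazards, h0, rate):
    """Fire reaction i of the toy model X_i -> 0 and update h0 incrementally.

    Only hazard i depends on x[i], so one subtraction and one addition
    replace a fresh sum over all M hazards.
    """
    x[i] -= 1
    old = hazards[i]
    hazards[i] = rate * x[i]          # recompute the single stale hazard
    return h0 - old + hazards[i]      # h0_new = h0_old - h_i_old + h_i_new


# Toy model from the slides: M = 600, x_i(0) = 1000, c_i = 1.
M = 600
x = [1000] * M
hazards = [1.0 * xi for xi in x]
h0 = sum(hazards)                     # full O(M) sum, done once at the start
h0 = fire_and_update(0, x, hazards, h0, rate=1.0)
```

A running total like this accumulates floating-point rounding error over a long run, so it may be worth recomputing the full sum from time to time.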

Page 29: Speeding up the Gillespie algorithm

Step 4: Reaction time

Reaction time: $\tau = -\ln[U(0, 1)]/h_0$. As the number of reactions and species increases, the cost of this step stays constant.

- For the toy model, we spend about 3% of computer time executing this step

- You could generate the random numbers on a separate thread (on a multicore machine) to save a small amount of time

Page 30: Speeding up the Gillespie algorithm

Step 5: Reaction selection

- We choose a reaction with probability proportional to its propensity. That is, we search for the $\mu$ that satisfies

$$\sum_{i=1}^{\mu} h_i(x) > U \times h_0(x) > \sum_{i=1}^{\mu-1} h_i(x),$$

  where $U \sim U(0, 1)$

- This is $O(M)$

- The key to reducing this bottleneck is noting that in most systems some reactions occur more often than others. The model system is multi-scale.

- To speed up this step, we order the $h_i$ by size, largest first

Page 32: Speeding up the Gillespie algorithm

Step 5: Reaction selection

Consider the following piece of R code:

    ## u is a vector of U(0, 1) random numbers
    for (i in 1:length(u)) {
      ## 90% of the draws fail all three tests and reach the final else
      if (u[i] < 0.01)
        x = 1
      else if (u[i] < 0.05)
        x = 2
      else if (u[i] < 0.1)
        x = 3
      else
        x = 4
    }

Calling this piece of code $10^7$ times takes about 34 seconds.

Page 33: Speeding up the Gillespie algorithm

Step 5: Reaction selection

Now let's just reverse the order of the if statements:

    for (i in 1:length(u)) {
      ## 90% of the draws now stop at the first test
      if (u[i] < 0.9)
        x = 1
      else if (u[i] < 0.95)
        x = 2
      else if (u[i] < 0.99)
        x = 3
      else
        x = 4
    }

Calling this piece of code $10^7$ times takes about 15 seconds, roughly 44% of the original run time (a reduction of around 56%).
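A quick calculation shows why the ordering matters. The sketch below counts the expected number of `if` tests per draw for each chain; the function name `expected_comparisons` is made up for this example. It gives about 2.94 tests per draw for the first ordering and about 1.15 for the second. The measured timings shrink by less than that ratio, presumably because the loop and the assignment cost the same in both versions.

```python
def expected_comparisons(thresholds):
    """Expected number of `if` tests per draw for an if / else-if chain.

    thresholds -- increasing cut-offs tested in order against u ~ U(0, 1);
                  the final `else` needs no extra test.
    """
    expected, previous = 0.0, 0.0
    for tests, cut in enumerate(thresholds, start=1):
        # A draw in (previous, cut] is resolved after `tests` comparisons.
        expected += tests * (cut - previous)
        previous = cut
    # Draws above the last cut-off have failed every test in the chain.
    expected += len(thresholds) * (1.0 - previous)
    return expected


slow = expected_comparisons([0.01, 0.05, 0.1])    # first ordering
fast = expected_comparisons([0.9, 0.95, 0.99])    # reversed ordering
```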

Page 34: Speeding up the Gillespie algorithm

Step 5: Reaction selection

- In the previous example it was obvious how we should order the if statements, since we were generating a random number from a static distribution

- In the reaction selection step, the distribution is a function of time

- The optimal ordering depends on the current time

Coding

- If you are reading in an SBML file, you don’t have a bunch of pre-written if statements

- Instead, we will have two vectors: order and hazards

  - hazards: a vector of length $M$ containing the current values of $h_i(x)$
  - order: a vector of length $M$ containing integers indicating the order in which we read the hazards vector

Page 37: Speeding up the Gillespie algorithm

Lotka-Volterra model

$$R_1: X_1 \to 2X_1, \qquad R_2: X_1 + X_2 \to 2X_2, \qquad R_3: X_2 \to \emptyset$$

[Figure: hazard rate (axis 0 to 3000) plotted against time (axis 0 to 30) for the Lotka-Volterra model]

Page 41: Speeding up the Gillespie algorithm

Optimised direct method

Solution 1: Cao et al., 2004

- Run a few presimulations for a short period of time, $t <$ max-time

- Reorder your hazard vector according to the presimulations

- Run your main simulation

Lotka-Volterra

Using the standard parameters from Boys, Wilkinson & Kirkwood, in a typical simulation reactions $R_1$, $R_2$ and $R_3$ occur in roughly equal amounts.

Page 42: Speeding up the Gillespie algorithm

Optimised direct method

Solution 1: Cao et al., 2004

Disadvantages

- Clearly doing presimulations isn’t great

  - How long should you simulate for?
  - Presimulations will be time consuming

- The order of reactions is fixed, so at some points in the simulation the order may be sub-optimal.
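The reordering idea can be sketched as follows, assuming the presimulations have already produced a count of how often each reaction fired. The names `presimulation_order` and `select_reaction` are hypothetical, and this is an illustration of a fixed search order rather than the procedure from Cao et al.

```python
def presimulation_order(counts):
    """Reaction indices sorted by firing count, most frequent first.

    counts[i] -- number of times reaction i fired during the presimulations
    """
    return sorted(range(len(counts)), key=lambda i: counts[i], reverse=True)


def select_reaction(order, hazards, target):
    """Linear search through the hazards in the given order.

    target -- U * h0 with U ~ U(0, 1)
    Returns the chosen reaction and the number of hazards inspected.
    """
    cumulative = 0.0
    for depth, i in enumerate(order, start=1):
        cumulative += hazards[i]
        if cumulative > target:
            return i, depth
    return order[-1], len(order)      # guard against floating-point round-off


order = presimulation_order([5, 120, 40, 900])
```

The order is computed once and then left alone, which is exactly the second disadvantage listed above.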

Page 43: Speeding up the Gillespie algorithm

Sorting direct method

Solution 2: McCollum et al., 2006

- Each time a reaction is executed, it is moved up one place in the reaction vector

- Similar to a bubble sort

Example: 5 reactions

    Step               Reaction vector
    Execute R4         R1 R2 R3 R4 R5
    Swap R4 with R3    R1 R2 R4 R3 R5
    Execute R5         R1 R2 R4 R3 R5
    Swap R5 with R3    R1 R2 R4 R5 R3
    Execute R4         R1 R2 R4 R5 R3
    Swap R4 with R2    R1 R4 R2 R5 R3
    Execute R5         R1 R4 R2 R5 R3
    Swap R5 with R2    R1 R4 R5 R2 R3

27/1

Page 52: Speeding up the Gillespie algorithm

Sorting direct method

Solution 2: McCollum et al., 2006

- The swapping effectively reduces the search depth for a reaction the next time it’s executed

- Only requires a swap of two memory addresses, so there is very little overhead

- Handles sharp changes in propensity, such as on/off behaviour in switches

- Easy to code

- Reduces the search to $O(S)$, where $S$ is the search depth
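A minimal sketch of the swap, using the order and hazards vectors described earlier. The function name `select_and_promote` is an assumption for this example; reactions are numbered from 0, so R1 to R5 are 0 to 4.

```python
def select_and_promote(order, hazards, target):
    """Pick a reaction by linear search, then move it up one place.

    order   -- list of reaction indices giving the current search order
    hazards -- hazards[i] is the current hazard of reaction i
    target  -- U * h0 with U ~ U(0, 1)
    """
    cumulative, position = 0.0, len(order) - 1
    for pos, i in enumerate(order):
        cumulative += hazards[i]
        if cumulative > target:
            position = pos
            break
    chosen = order[position]
    if position > 0:
        # Swap with the neighbour in front: a single exchange of two entries.
        order[position - 1], order[position] = chosen, order[position - 1]
    return chosen


# Replay the five-reaction example: R4, R5, R4, R5 fire in turn.
order = [0, 1, 2, 3, 4]
for fired in (3, 4, 3, 4):
    indicator = [1.0 if i == fired else 0.0 for i in range(5)]  # forces `fired`
    select_and_promote(order, indicator, 0.5)
```

After the four firings `order` corresponds to R1 R4 R5 R2 R3, the final state in the example.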

Page 53: Speeding up the Gillespie algorithm

Binary searches

- Binary search: Li & Petzold, Tech. Report, 2006

- Composition and rejection scheme: Slepoy et al., J. Chem. Phys., 2008

- I suspect these methods are only useful for very large systems

[Figure: values between 0.00 and 0.12 plotted against reactions]
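To give a flavour of the binary search idea, the sketch below keeps the hazards at the leaves of a complete binary tree whose internal nodes hold partial sums, so that both selection and a single hazard update take $O(\log M)$ steps. This is a generic sum tree written for illustration; the class name `HazardTree` is made up and it is not claimed to match the scheme in the cited report.

```python
class HazardTree:
    """Complete binary tree of partial sums with the hazards at the leaves."""

    def __init__(self, hazards):
        self.m = len(hazards)
        self.size = 1
        while self.size < self.m:          # pad up to a power of two
            self.size *= 2
        self.tree = [0.0] * (2 * self.size)
        self.tree[self.size:self.size + self.m] = hazards
        for node in range(self.size - 1, 0, -1):
            self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]

    def total(self):
        return self.tree[1]                # h0 sits at the root

    def update(self, i, value):
        """Set hazard i and repair the sums on the path back to the root."""
        node = self.size + i
        self.tree[node] = value
        node //= 2
        while node >= 1:
            self.tree[node] = self.tree[2 * node] + self.tree[2 * node + 1]
            node //= 2

    def select(self, target):
        """Return the reaction whose cumulative interval contains target."""
        node = 1
        while node < self.size:
            left = self.tree[2 * node]
            if target < left:
                node = 2 * node            # answer lies in the left subtree
            else:
                target -= left             # skip everything on the left
                node = 2 * node + 1
        # min() guards against round-off pushing us onto a padding leaf.
        return min(node - self.size, self.m - 1)


tree = HazardTree([1.0, 2.0, 3.0, 4.0])
chosen = tree.select(3.5)                  # target = U * h0 in a simulation
```

The extra bookkeeping only pays off when $M$ is large enough for $\log M$ to beat a short linear search.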

Page 60: Speeding up the Gillespie algorithm

Step 6: Reaction execution

- After a reaction has fired, update the species

- Naively, we could update all species after a reaction has fired:

$$x = x + S^{(j)},$$

  where $S^{(j)} = v^{(j)} - u^{(j)}$ denotes the $j$th column of the stoichiometry matrix $S$. This operation would be $O(N)$

- However, $S$ is almost certainly sparse. In the toy model, we have

$$R_1: X_1 \to \emptyset,$$

  so $S^{(1)} = (-1, 0, 0, 0, \ldots, 0, 0, 0)'$

Page 63: Speeding up the Gillespie algorithm

Sparse vectors

- Instead we use compressed column format for storage

- For each column in the stoichiometry matrix we have two vectors:

  1. A vector of the non-zero values
  2. A vector of indices for the non-zero values

Toy model

- So

$$S^{(1)} = (-1, 0, 0, 0, \ldots, 0, 0, 0)'$$

  would be represented as

$$V_1 = (-1) \quad\text{and}\quad C_1 = (1)$$

Page 64: Speeding up the Gillespie algorithm

Lotka-Volterra system

For the Lotka-Volterra reaction

$$R_2: X_1 + X_2 \to 2X_2$$

we have the stoichiometry matrix column

$$S^{(2)} = (-1, 1)',$$

which would be represented as

$$V_2 = (-1, 1) \quad\text{and}\quad C_2 = (1, 2)$$
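With the column stored as a pair of vectors, the species update touches only the non-zero entries. A small sketch, where the function name `apply_reaction` is hypothetical and indices are 0-based, so $C_2 = (1, 2)$ becomes `(0, 1)`:

```python
def apply_reaction(x, values, indices):
    """Add one sparse stoichiometry column to the state vector in place.

    values  -- the non-zero entries of the column (V_j in the slides)
    indices -- the species each entry applies to (C_j), 0-based here
    """
    for value, index in zip(values, indices):
        x[index] += value


# Lotka-Volterra R2: V2 = (-1, 1), C2 = (1, 2) in the slides' 1-based numbering.
V2, C2 = (-1, 1), (0, 1)
state = [50, 100]
apply_reaction(state, V2, C2)     # X1 loses one molecule, X2 gains one
```

The cost is proportional to the number of non-zero entries in the column rather than to $N$.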

Page 66: Speeding up the Gillespie algorithm

Discussion

- The Gillespie algorithm is a fairly easy method to implement, but we can achieve impressive increases in execution speed with efficient data structures

- In fact, “clever programming” can turn an obviously slow algorithm into a faster, more efficient method

  - Gibson and Bruck did this with Gillespie’s first reaction method
  - Topic of my next talk

Page 68: Speeding up the Gillespie algorithm

Discussion

- This highlights that it can be very difficult to carry out speed comparisons of different algorithms.

  - What do we mean when we measure the speed of an algorithm?
  - We need to be sure that the slowness of an algorithm isn’t down to bad programming

- Likelihood-free techniques require millions of simulator calls. It is crucial that you have an efficient simulator.

However,

“... premature optimisation is the root of all evil”

Donald Knuth

Page 69: Speeding up the Gillespie algorithm

Further Reading

Gillespie, D., 1977. Exact stochastic simulation of coupled chemical reactions. The Journal of Physical Chemistry.

Kendall, D. G., 1950. An artificial realisation of a simple birth and death process. Journal of the Royal Statistical Society, B.

McCollum, J. M., Peterson, G. D., Cox, C. D., Simpson, M. L., Samatova, N. F., 2006. The sorting direct method for stochastic simulation of biochemical systems with varying reaction execution behavior. Computational Biology and Chemistry.

Slepoy, A., Thompson, A. P., Plimpton, S. J., 2008. A constant-time kinetic Monte Carlo algorithm for simulation of large biochemical reaction networks. The Journal of Chemical Physics.