Dynamical Analysis of Complex Systems 7CCMCS04 · 2011-05-05

Dynamical Analysis of Complex Systems
7CCMCS04
A Annibale
Department of Mathematics, King's College London
January 2011


Contents

Introduction and Overview
  0.1 Brownian Motion
    0.1.1 Einstein's explanation
    0.1.2 Langevin's equation
  0.2 Fokker-Planck equation
  0.3 Birth-Death processes

1 Probability theory
  1.1 Probability density and moments
  1.2 Transformation of variable
  1.3 Characteristic Function and Cumulants
  1.4 The Gaussian distribution
  1.5 Generalization to several random variables
  1.6 The multivariate Gaussian distribution
  1.7 Examples and exercises

2 Master equation
  2.1 Stochastic processes
    2.1.1 Stationary processes
    2.1.2 Ergodic processes
  2.2 Markov processes and the Chapman-Kolmogorov equation
  2.3 The Master equation
    2.3.1 Steady state
    2.3.2 Convergence to equilibrium
    2.3.3 Mean-field equation
  2.4 Examples and exercises
    2.4.1 The Wiener-Levy process
    2.4.2 The Ornstein-Uhlenbeck process
    2.4.3 The random walk in one dimension

3 Markov Chains and One-step processes
  3.1 Markov Chains
    3.1.1 Eigenvalues and eigenvectors of stochastic matrices
    3.1.2 Definitions based on accessibility of states
    3.1.3 Convergence to a stationary state
    3.1.4 Time-reversal and detailed balance
  3.2 Examples and exercises on Markov chains
    3.2.1 Two-state Markov chain
    3.2.2 Three-state chain
  3.3 One-step processes
    3.3.1 The Poisson process
  3.4 Examples and exercises on One-step processes
    3.4.1 The decay process
    3.4.2 Birth-death process
    3.4.3 Chemical reaction
    3.4.4 Continuous-time random walk
    3.4.5 Diffusion-annihilation on complex networks

4 Topics in equilibrium and nonequilibrium statistical mechanics
  4.1 Equilibrium
  4.2 Probability distributions in dynamical systems
  4.3 Detailed balance
  4.4 Time-dependent correlations
  4.5 Wiener-Khintchine theorem
    4.5.1 White noise
  4.6 Fluctuation-Dissipation theorem
    4.6.1 Linear response
    4.6.2 The fluctuation-dissipation theorem
    4.6.3 Power absorption

5 Fokker-Planck
  5.1 Kramers-Moyal expansion
    5.1.1 The jump moments
    5.1.2 Expression for the multivariate case
    5.1.3 Deterministic processes: Liouville equation
  5.2 Fokker-Planck as a continuity equation
  5.3 Macroscopic equation
  5.4 Examples of Fokker-Planck equations

6 Langevin equation
  6.1 Langevin equation for one variable
  6.2 Langevin equation for Brownian motion
    6.2.1 Mean-square displacement
    6.2.2 Stationary velocity distribution function
    6.2.3 Conditional probability
  6.3 Colored noise
    6.3.1 Derivation of the Fokker-Planck equation
  6.4 Quasilinear Langevin equation
  6.5 Non-linear Langevin equation
    6.5.1 Definition of the stochastic integral
  6.6 The Kramers-Moyal coefficients for the Langevin equation
    6.6.1 Multivariate case
  6.7 Diffusion in phase space: Klein-Kramers equation
    6.7.1 The strong friction limit: Smoluchowski equation
  6.8 Relaxation times
  6.9 Fluctuation-dissipation relations for the Langevin equation
    6.9.1 Fluctuation-dissipation violation

7 Applications to complex systems
  7.1 Neural network models
  7.2 Microscopic dynamics in probabilistic form
    7.2.1 Parallel dynamics
    7.2.2 Sequential dynamics
  7.3 Macroscopic dynamics in probabilistic form
    7.3.1 Sequential dynamics
    7.3.2 Parallel dynamics
    7.3.3 General formalism
  7.4 Detailed balance
    7.4.1 Interaction symmetry

8 Generating functional analysis
  8.1 General
  8.2 Generating function
  8.3 Langevin dynamics
  8.4 A simple example: the harmonic oscillator
  8.5 Quenched disorder

A Kramers-Kronig relations


Introduction and Overview

A complete description of the dynamical evolution of a macroscopic system (N interacting particles in a box, or N interacting objects on a lattice) would in principle require solving all the microscopic equations of the system. We cannot generally do this; instead we use a stochastic description, i.e. we describe the system by macroscopic variables which fluctuate in time. Fluctuations, generally arising from the system's interactions with its environment (the "thermal bath"), the finite number of the system's discrete constituents, etc., constitute a background of "noise" which imposes limitations on the possible accuracy of measurements of physical observables. These generally give rise to data exhibiting a well-defined deterministic trend, which is reproducible, with fluctuations around it, which are not.

In this course we study methods to determine the gross deterministic behaviour, and the statistics of the fluctuations around it, for model systems that evolve in time under the effect of a process which is too complicated to be described deterministically and can therefore only be described probabilistically. We start by giving a semi-historical outline of how a phenomenological theory of fluctuating phenomena arose, and introduce some of the concepts and methods we shall be learning about.

0.1 Brownian Motion

The observation that, when suspended in water, small pollen grains are found to be in a very animated and irregular state of motion was first investigated by the Scottish botanist Robert Brown in 1827; as a result, the phenomenon took on the name of Brownian motion. This phenomenon reveals very clearly the statistical fluctuations which occur in a system in thermal equilibrium, and it can serve as a prototype problem whose analysis provides considerable insight into the mechanisms responsible for the existence of fluctuations and the dissipation of energy. Brownian motion is also of great practical interest because such fluctuations constitute a background of "noise" which imposes limitations on the possible accuracy of delicate physical measurements. The first convincing explanation of Brownian motion was given by Einstein in 1905.

0.1.1 Einstein’s explanation

Einstein assumed that Brownian motion is caused by frequent impacts on the pollen grains by molecules of the liquid in which they are suspended, and that the motion of these molecules is so complicated that its effect on a pollen grain can only be described probabilistically, in terms of frequent statistically independent impacts.

In Einstein's model, each individual particle executes a motion which is independent of the motion of all other particles, and the movements of the same particle in different time intervals are independent processes, as long as these intervals are not chosen too small. In other words, it is assumed that impacts happen at times 0, τ, 2τ, · · ·, with τ very small compared to the observable time intervals, but nevertheless so large that in two successive time intervals τ the movements executed by the particle can be thought of as events which are independent of each other.

For simplicity, consider the case of Brownian motion in one dimension, i.e. along the X-axis. It is assumed that when an impact occurs, the probability that the pollen particle receives a displacement between ∆ and ∆ + d∆ is φ(∆)d∆, so that ∫_{−∞}^{∞} d∆ φ(∆) = 1. It is also assumed that the probability density φ is even, i.e. φ(∆) = φ(−∆), from which it follows that ∫_{−∞}^{∞} d∆ φ(∆) ∆^{2n+1} = 0 ∀ n = 0, 1, 2, · · ·.

Now suppose that f(x, t)dx is the probability that the particle lies between x and x + dx at time t, so that ∫_{−∞}^{∞} dx f(x, t) = 1; the probability that the particle lies between x and x + dx at time t + τ is

f(x, t + τ)dx = Σ_∆ Prob(particle at x + ∆ at time t and receives a kick −∆ in the time gap τ)
             = dx ∫_{−∞}^{∞} d∆ f(x + ∆, t) φ(−∆) = dx ∫_{−∞}^{∞} d∆ f(x + ∆, t) φ(∆)    (1)

Since τ is small,

f(x, t + τ) = f(x, t) + τ ∂f/∂t

Furthermore, we expand f(x + ∆, t) in powers of ∆:

f(x + ∆, t) = f(x, t) + ∆ ∂f/∂x + (1/2) ∆² ∂²f/∂x² + · · ·    (2)

Using this series under the integral we get

f(x, t) + τ ∂f/∂t = f(x, t) ∫_{−∞}^{∞} d∆ φ(∆) + ∂f/∂x ∫_{−∞}^{∞} d∆ φ(∆) ∆ + (1/2) ∂²f/∂x² ∫_{−∞}^{∞} d∆ φ(∆) ∆² + · · ·

Terms in which ∆ has an odd exponent vanish because φ is even. Of the remaining terms, each is very small compared with the previous one, since φ differs from zero only for very small values of ∆, so we obtain

∂f/∂t = D_E ∂²f/∂x²    (3)

where we have used the normalization of φ and set

D_E = (1/2τ) ∫_{−∞}^{∞} d∆ φ(∆) ∆²

This is known as the differential equation of diffusion, with D_E the diffusion coefficient; its solution, assuming the particle is at x = 0 at time t = 0, is

f(x, t) = (1/√(4πD_E t)) e^{−x²/4D_E t}    (4)


(Note that f(x, t) is normalized, so that it is indeed a probability density).
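As a quick numerical aside (not part of the original derivation), one can check (4) directly: integrating the Gaussian on a grid should return unit normalization and a second moment 〈x²〉 = 2D_E t. The values of D_E and t below are arbitrary illustrative choices.

```python
import math

def f(x, t, D=0.5):
    # Gaussian solution (4) of the diffusion equation (3)
    return math.exp(-x * x / (4 * D * t)) / math.sqrt(4 * math.pi * D * t)

def trapz(g, a, b, n=20000):
    # composite trapezoidal quadrature of g on [a, b]
    h = (b - a) / n
    s = 0.5 * (g(a) + g(b))
    for i in range(1, n):
        s += g(a + i * h)
    return s * h

D, t = 0.5, 2.0
norm = trapz(lambda x: f(x, t, D), -50.0, 50.0)
msd = trapz(lambda x: x * x * f(x, t, D), -50.0, 50.0)
print(norm, msd)  # norm ≈ 1, msd ≈ 2*D*t = 2
```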

The result (4) is obtained by Fourier transform methods: we note that equation (3) is solved by functions of the form

f(x, t) = ∫ (dξ/√(2π)) F(ξ) e^{−ixξ − tD_E ξ²}

with F(ξ) an arbitrary function of ξ. Let f(x, 0) = h(x). Then

F(ξ) = ∫ (dx/√(2π)) h(x) e^{ixξ}

If the particle is at x = 0 at t = 0, one has h(x) = δ(x) and F(ξ) = 1/√(2π); f(x, t) follows by solving the Gaussian integral

f(x, t) = ∫ (dξ/2π) e^{−ixξ − tD_E ξ²} = (1/√(4πD_E t)) e^{−x²/4D_E t}

The displacement along the X-axis that a particle experiences on average leads to Einstein's much acclaimed relation for the diffusion coefficient

λ_x = √〈x²〉 = √(2D_E t)
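Einstein's picture of independent impacts can be reproduced with a minimal simulation (an illustrative sketch; the kick interval τ, kick width, and particle number are hypothetical values): each particle receives independent Gaussian kicks every τ, and the ensemble mean-square displacement should grow as 2D_E t with D_E equal to the second moment of φ over 2τ.

```python
import random

random.seed(1)
tau, sigma = 0.01, 0.1            # kick interval and kick width (hypothetical)
D_E = sigma ** 2 / (2 * tau)      # second moment of phi divided by 2*tau
n_steps, n_particles = 1000, 2000

msd = 0.0
for _ in range(n_particles):
    x = 0.0
    for _ in range(n_steps):
        x += random.gauss(0.0, sigma)   # one independent impact drawn from phi
    msd += x * x
msd /= n_particles

t = n_steps * tau
print(msd, 2 * D_E * t)   # the two values should agree to a few percent
```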

Einstein's derivation is based on a discrete-time assumption, namely that impacts occur at times 0, τ, 2τ, . . ., and the resulting equation must be regarded as an approximation in which τ is considered so small that t may be treated as continuous. This description of Brownian motion contains many major concepts which are central to the dynamical analysis of stochastic processes:

• The Chapman-Kolmogorov equation occurs in (1), where the probability of the particle being at x at time t + τ is given by the sum over all possible "pushes" ∆ of the probability of being at x + ∆ at time t, multiplied by the probability of receiving such a push. This is based on the independence of the pushes from the previous history, i.e. on the Markovian property of the process.

• Kramers-Moyal and similar expansions are essentially the approximation used in (2), which effectively replaces a process whose sample paths need not be continuous with one whose paths are continuous.

• Equation (3) is a special case of the Fokker-Planck equation, an equation for the probability distribution of finding the system in a certain state, which describes a large class of very interesting stochastic processes in which the system has a continuous sample path.

The pollen grain's position, if thought of as obeying a probabilistic law given by solving the diffusion equation, where t is continuous, can be written as x(t), where x(t) is a continuous but random function of time. This leads us to consider the possibility of describing the dynamics of the system in a more direct probabilistic way, in which one has a random or stochastic differential equation for the path. This procedure was initiated by Langevin.


0.1.2 Langevin’s equation

A particle of mass m and radius a, suspended in a fluid with viscosity η, experiences a viscous drag force in the X-direction

f_s = −αv = −6πηav

which represents the average macroscopic effect of the irregular impacts of the molecules of the fluid on the particle. The momentum of the particle is transferred to the molecules of the fluid, and if the mass of the particle is large enough that the velocity acquired through thermal fluctuations is negligible, the dynamical equation of the particle is

m dv/dt = −αv    (5)

and the velocity of the particle decreases to zero with relaxation time τ = m/α, according to v(t) = v(0)e^{−t/τ}.

If the mass is small, the thermal velocity may be observable, and then the evolution of the particle cannot be described by (5). Nevertheless, if the mass of the particle is still large compared to that of the molecules, one expects (5) to remain approximately valid; however, it must be modified. The modification consists in adding a fluctuating force on the RHS of (5): the total force exerted by the molecules on the small particle is decomposed into a continuous damping (average) force f_s and a fluctuating force f_r(t) mimicking the deviation of the fluid molecules from their average behaviour. The resulting dynamical equation of the particle is

m dv/dt = −αv + f_r(t)    (6)

The force f_r(t) is a stochastic or random force, the properties of which are given only on average.

Where does the stochastic force arise? If we were to treat Brownian motion exactly, we should solve the coupled equations of motion for all the molecules of the fluid and for the small particle. Because of the large number of molecules in the fluid (of order 10²³), however, we cannot generally solve these coupled equations. Furthermore, since we do not know the initial values for all the molecules of the fluid, we cannot calculate the exact motion of the small particle immersed in it. If we were to use another system (particle and fluid) identical to the first except for the initial values of the fluid, a different motion of the small particle would result. As is usually done in thermodynamics, we consider an ensemble of such systems (a Gibbs ensemble). The force f_r(t) then varies from system to system, and the only thing we can do is consider averages of this force over the ensemble.

Introducing the fluctuating force per unit mass Γ(t) = f_r(t)/m and setting γ = α/m, we get from (6)

dv/dt + γv = Γ(t)    (7)

which is a stochastic differential equation, i.e. a differential equation with a random term Γ (the Langevin force), and hence whose solution is, in some sense, a random function. Each solution represents a different trajectory; however, measurable results can be derived by using general properties of Γ.
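Equation (7) can be integrated numerically with the simplest (Euler-Maruyama) scheme. The sketch below assumes, anticipating the correlator introduced later in this section, a delta-correlated Langevin force with 〈Γ(t)Γ(t′)〉 = 2qδ(t − t′); the values of γ and q are illustrative. The stationary variance of v should then approach q/γ.

```python
import math, random

random.seed(0)
gamma, q = 1.0, 2.0               # friction and noise strength (illustrative)
dt, n_steps, n_paths = 1e-3, 5000, 500

v2 = 0.0
for _ in range(n_paths):
    v = 0.0
    for _ in range(n_steps):
        # Euler-Maruyama step: dv = -gamma*v*dt + sqrt(2*q*dt) * N(0, 1)
        v += -gamma * v * dt + math.sqrt(2 * q * dt) * random.gauss(0.0, 1.0)
    v2 += v * v
v2 /= n_paths
print(v2, q / gamma)   # the sample variance should be close to q/gamma
```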


Since the Langevin equation contains a frictional force −αv, it implies the existence of a process whereby the energy associated with the coordinate x of the particle is dissipated in the course of time to the other degrees of freedom (e.g. to the molecules of the liquid surrounding the particle). This is in agreement with our macroscopic experience of frictional forces. Nevertheless, one deals here with an example of an interesting and conceptually difficult general problem. The microscopic equations governing the motion of the combined system particle+fluid do not involve any frictional forces. The total energy is conserved and the motion is reversible (if the sign of time were reversed, the equations of motion would be unchanged and all particles would classically retrace their paths in time). But if one focuses attention on the massive particle, its interaction with the fluid (the large system) can be adequately described by equations of motion involving frictional forces. There is thus dissipation of energy from the particle to the heat reservoir (the fluid), and the motion of the particle is not reversible. It is beyond the scope of this course to understand how this situation comes about and how the modified equations of motion for the particle are derived from the microscopic equations. We shall assume the validity of Langevin's equation as an adequate phenomenological description of Brownian motion and illustrate how it can be applied to the calculation of quantities of physical interest.

In the remainder of this section we derive the diffusion coefficient of Brownian motion. First we rewrite (7) in terms of x, using v = dx/dt, and multiply it by x, thus obtaining

(1/2) d²(x²)/dt² − (dx/dt)² = −(γ/2) d(x²)/dt + Γx    (8)

We assume that the average over the ensemble of Γ(t) is zero

〈Γ(t)〉 = 0

because the equation of motion for the average velocity should be given by (5). Normally, one also assumes that the collisions of different molecules of the fluid with the particle are approximately independent, so that the average of the product of two Langevin forces at different times is zero for time differences t′ − t larger than the duration τ₀ of a collision, i.e.

〈Γ(t)Γ(t′)〉 = 0 for |t − t′| ≥ τ₀

Usually the duration τ₀ of a collision is much smaller than the relaxation time τ of the velocity of the small particle. We may therefore take the limit τ₀ → 0 as a reasonable approximation, giving

〈Γ(t)Γ(t′)〉 = 2Dδ(t − t′)    (9)

where D can be determined from energy equipartition, as we will show shortly. If the mean value of the fluctuating force Γ always vanishes, irrespective of the value of v or x, then 〈xΓ〉 = 〈x〉〈Γ〉 = 0, because of the irregularity of Γ and assuming independence of x and Γ, i.e. assuming that the irregularities of x as a function of time do not somehow conspire to be always in the same direction as those of Γ.

If the particle is treated as a molecule of the liquid, in equilibrium at temperature T, its average kinetic energy is given by the equipartition law (in one dimension)

(1/2) m〈v²〉 = (1/2) kT


where k is Boltzmann's constant and T the absolute temperature. Averaging (8) over a large number of different particles and using equipartition of energy, one has

d²〈x²〉/dt² + γ d〈x²〉/dt = 2kT/m    (10)

where the term 〈xΓ〉 has been set equal to zero. The solution is given by (C is an integration constant)

d〈x²〉/dt = 2kT/(γm) + Ce^{−γt}    (11)

Langevin estimated that the decaying exponential approaches zero with a time constant of order 10⁻⁸ s, so that d〈x²〉/dt rapidly enters a constant regime, d〈x²〉/dt = 2kT/(γm). One further integration in this asymptotic regime then yields

〈x²〉 − 〈x₀²〉 ≃ (2kT/γm) t    (12)

and the particle behaves like a diffusing particle executing a random walk, with 〈x²〉 ∝ t. Indeed, equation (12) corresponds to Einstein's relation, once we identify

D_E = kT/(γm)

Einstein's condition of independence of the displacements ∆ at different times is equivalent to Langevin's assumption 〈Γx〉 = 0. Note that Einstein's relation for the diffusion coefficient,

D_E = lim_{t→∞} (〈x²〉 − 〈x₀²〉)/(2t) = kT/(γm) = RT/(6N_A πηa)

where R is the perfect gas constant and N_A the Avogadro number, allows the determination of the Avogadro number (a microscopic quantity) from experimentally accessible macroscopic quantities. It thus provides an unambiguous link between the microscopic and macroscopic levels of description in physics, and can be considered a "proof" of the existence of atoms.

The picture of a Brownian particle immersed in a fluid is typical of a variety of problems, even when there are no particles. For instance, this is the case whenever a single slow or heavy degree of freedom interacts, in a more or less irregular random fashion, with many fast or light degrees of freedom, which play the role of the bath. Thus the general concept of fluctuations describable by Langevin and Fokker-Planck equations has been developed very extensively, in a very wide range of situations.

The advantages of a continuous description turn out to be very significant, since only very few parameters are required, essentially the coefficients of the derivatives

∫_{−∞}^{∞} d∆ φ(∆) ∆,   ∫_{−∞}^{∞} d∆ φ(∆) ∆²    (13)

It is a rare problem (mechanical oscillators, fluctuations in electrical circuits, chemical reactions, dynamics of dipoles and spins, escape over metastable barriers, etc.) which cannot be specified, in at least some degrees of freedom, by the corresponding Fokker-Planck equation (allowing both coefficients (13) to depend on x and t, and in a space of an appropriate number of dimensions), or equivalently, by augmenting a deterministic differential equation with some fluctuating force or field, as in Langevin's approach.

0.2 Fokker-Planck equation

The general Fokker-Planck equation for one variable x has the form

∂f(x, t)/∂t = [ −(∂/∂x) D^(1)(x) + (∂²/∂x²) D^(2)(x) ] f(x, t).

D^(2)(x) > 0 is called the diffusion coefficient and D^(1)(x) the drift coefficient. The diffusion and drift coefficients may also depend on time. Mathematically, this is a linear second-order partial differential equation of parabolic type.
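A minimal finite-difference sketch of this equation (an illustration with hypothetical coefficients, not a general solver): taking D^(1)(x) = −γx and constant D^(2) gives the Ornstein-Uhlenbeck process, whose stationary variance is D^(2)/γ, so an explicit scheme started from a narrow peak should relax towards that value.

```python
# Explicit finite-difference integration of the one-variable Fokker-Planck
# equation with the illustrative choice D1(x) = -gamma*x, D2 constant
# (Ornstein-Uhlenbeck process; stationary variance D2/gamma).
gamma, D2 = 1.0, 0.5
L, n = 5.0, 201                    # domain [-L, L], number of grid points
dx = 2 * L / (n - 1)
dt = 0.2 * dx * dx / D2            # time step inside the explicit stability limit
xs = [-L + i * dx for i in range(n)]

f = [0.0] * n
f[n // 2] = 1.0 / dx               # narrow normalized peak at x = 0

for _ in range(4000):
    g = f[:]
    for i in range(1, n - 1):
        # drift term: d/dx [D1(x) f], central differences
        drift = ((-gamma * xs[i + 1]) * g[i + 1]
                 - (-gamma * xs[i - 1]) * g[i - 1]) / (2 * dx)
        # diffusion term: D2 * d^2 f / dx^2
        diff = D2 * (g[i + 1] - 2 * g[i] + g[i - 1]) / (dx * dx)
        f[i] = g[i] + dt * (-drift + diff)
    f[0] = f[-1] = 0.0             # density is negligible at the far boundaries

var = sum(x * x * fi for x, fi in zip(xs, f)) * dx
print(var, D2 / gamma)             # variance relaxes towards D2/gamma = 0.5
```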

A generalization to N variables x₁ · · · x_N has the form

∂f(x, t)/∂t = [ −Σ_{i=1}^{N} (∂/∂x_i) D^(1)_i(x) + Σ_{i,j=1}^{N} (∂²/∂x_i∂x_j) D^(2)_{ij}(x) ] f(x, t).    (14)

The Fokker-Planck equation is an equation for the distribution function of the N macroscopic variables x.

For a deterministic treatment we neglect the fluctuations of the macroscopic variables. For the FP equation this means that we neglect the diffusion term. Equation (14) is then equivalent to the system of differential equations

dx_i/dt = D^(1)_i(x₁, · · · , x_N),  i = 1, · · · , N

for the N macrovariables x.

0.3 Birth-Death processes

A wide range of phenomena can be modelled by birth-death processes. An entertaining model is the predator-prey system (Lotka-Volterra, 1925/26), consisting of two species, one of which preys on the other, the latter being supplied with an inexhaustible food supply. Let X symbolize the prey, Y the predator and A the food of the prey. The process under consideration may be

X +A→ 2X

Y +X → 2Y

Y → B (death by natural causes) (15)

It is easy to guess model differential equations for x and y, the numbers of X and Y:

dx/dt = k1 a x − k2 x y
dy/dt = k2 x y − k3 y    (16)
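These equations are easy to integrate numerically; a small RK4 sketch (parameter values are illustrative) also monitors the well-known conserved quantity V = k2 x − k3 ln x + k2 y − k1a ln y of the Lotka-Volterra flow, which makes the closed orbits around the marginally stable fixed point explicit.

```python
import math

def deriv(state, k1a=1.0, k2=0.5, k3=1.0):
    # Lotka-Volterra rate equations (16); parameter values are hypothetical
    x, y = state
    return (k1a * x - k2 * x * y, k2 * x * y - k3 * y)

def rk4_step(state, h):
    # one classical Runge-Kutta step of size h
    def shift(s, d, c):
        return (s[0] + c * d[0], s[1] + c * d[1])
    d1 = deriv(state)
    d2 = deriv(shift(state, d1, h / 2))
    d3 = deriv(shift(state, d2, h / 2))
    d4 = deriv(shift(state, d3, h))
    return (state[0] + h * (d1[0] + 2 * d2[0] + 2 * d3[0] + d4[0]) / 6,
            state[1] + h * (d1[1] + 2 * d2[1] + 2 * d3[1] + d4[1]) / 6)

def V(state, k1a=1.0, k2=0.5, k3=1.0):
    # conserved quantity of the Lotka-Volterra flow
    x, y = state
    return k2 * x - k3 * math.log(x) + k2 * y - k1a * math.log(y)

state = (3.0, 1.0)
v0 = V(state)
for _ in range(10000):
    state = rk4_step(state, 0.01)
print(state, V(state) - v0)   # V stays (numerically) constant along the orbit
```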


Fixed points are a1 = (0, 0)^t and a2 = (k3/k2, a k1/k2)^t. a1 is unstable, a2 is marginally stable. This leads to oscillatory behaviour in the predator and prey populations, which could be exploited economically (e.g. hare-lynx: the price of lynx fur is high when the number of lynx is low!). Of course, realistic systems do not follow the solutions of the differential equations exactly, but fluctuate about these curves. One must include these fluctuations, and the easiest way to do so is by means of a birth-death master equation. We assume a probability distribution P(x, y, t) for the numbers of individuals at time t, and that in an infinitesimal time interval ∆t the following transition probability laws hold:

Prob(x→ x+ 1, y → y) = k1ax∆t

Prob(x→ x− 1, y → y + 1) = k2xy∆t

Prob(x→ x, y → y − 1) = k3y∆t

Prob(x→ x, y → y) = 1− (k1ax+ k2xy + k3y)∆t (17)

So far we have simply replaced the rate laws by probability laws. Next, we employ the Chapman-Kolmogorov equation, i.e. we write the probability at time t + ∆t as the sum over all previous states of their probabilities multiplied by the probability of a transition to the state (x, y). Letting ∆t → 0,

[P(x, y, t + ∆t) − P(x, y, t)]/∆t → ∂P(x, y, t)/∂t = k1 a(x − 1)P(x − 1, y, t)
    + k2(x + 1)(y − 1)P(x + 1, y − 1, t) + k3(y + 1)P(x, y + 1, t)
    − (k1 ax + k2 xy + k3 y)P(x, y, t)    (18)

This model has wide application (systems of molecules, photons, biological systems, etc.). The transition probabilities can be chosen with much greater detail than the simple multiplicative choice above (this is just the most elementary). Solutions of (18) determine both the gross deterministic motion and the fluctuations; the latter are typically of the magnitude of the square root of the number of individuals involved. A theory can be developed which deals with a wide range of models in this category, and there is a close connection between this kind of theory and that of stochastic differential equations.
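A direct way to sample trajectories consistent with the transition rates (17) is a Gillespie-type simulation: draw an exponential waiting time from the total rate, then pick one of the three reactions with probability proportional to its rate. Rate constants and initial populations below are hypothetical, chosen at the fixed point of the rate equations so that only the fluctuations drive the dynamics.

```python
import math, random

random.seed(2)
k1a, k2, k3 = 10.0, 0.01, 10.0     # hypothetical rate constants
x, y = 1000, 1000                  # prey and predator numbers at the fixed point
t, t_end = 0.0, 2.0
while t < t_end and x > 0 and y > 0:
    r1, r2, r3 = k1a * x, k2 * x * y, k3 * y   # rates of the three reactions (15)
    rtot = r1 + r2 + r3
    t += -math.log(1.0 - random.random()) / rtot   # exponential waiting time
    u = random.random() * rtot
    if u < r1:
        x += 1                     # X + A -> 2X
    elif u < r1 + r2:
        x, y = x - 1, y + 1        # Y + X -> 2Y
    else:
        y -= 1                     # Y -> B
print(t, x, y)
```

With populations of this size, the trajectory fluctuates about the deterministic fixed point with excursions of order √N, as stated above.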


Chapter 1

Probability theory

1.1 Probability density and moments

We call X a stochastic variable if the number X cannot be predicted. By repeating the experiment N times (N realizations) we obtain N numbers

X₁, X₂, ..., X_N    (1.1)

Instead of repeating the experiment with one system N times, one may also think of having an ensemble of N identical systems and making one experiment on each system. Whereas the numbers X₁, ..., X_N cannot be predicted, some averages may be predicted for N → ∞ and should give the same value for identical systems. The simplest average is

〈X〉 = lim_{N→∞} (1/N)(X₁ + ... + X_N)    (1.2)

A general average is

〈f(X)〉 = lim_{N→∞} (1/N)(f(X₁) + ... + f(X_N))    (1.3)

where f is some arbitrary function.

If f(X) = θ(x − X), where θ is the step function defined as

θ(x) = 1 for x ≥ 0,  θ(x) = 0 for x < 0    (1.4)

and if M is the number of experiments (realizations) in which the random variable X is less than or equal to x, then

〈θ(x − X)〉 = lim_{N→∞} (1/N)(θ(x − X₁) + ... + θ(x − X_N)) = lim_{N→∞} M/N ≡ p(X ≤ x)    (1.5)

i.e. in the limit N → ∞ the relative frequency M/N is the probability that the random variable X is less than or equal to x.
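The frequency interpretation (1.5) is easy to illustrate numerically (an aside; the uniform distribution is an arbitrary illustrative choice): counting the realizations with X ≤ x reproduces the cumulative probability.

```python
import random

random.seed(3)
N = 100000
x = 0.3                      # evaluation point; X uniform on [0, 1), so p(X <= x) = x
M = sum(1 for _ in range(N) if random.random() <= x)   # counts theta(x - X) = 1
print(M / N)                 # relative frequency M/N, close to 0.3
```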


The probability density function P_X(x) of the random variable X is defined, assuming that p is differentiable, as

P_X(x) = \frac{d}{dx}\, p(X \le x) = \frac{d}{dx}\, \langle \theta(x - X) \rangle = \langle \delta(x - X) \rangle    (1.6)

where we introduced the Dirac δ as the derivative of the step function. Hence, the probability of finding the stochastic variable X in the interval x ≤ X ≤ x + dx is

p(X \le x + dx) - p(X \le x) = dx\, \frac{d}{dx}\, p(X \le x) = P_X(x)\, dx    (1.7)

Normalization requires \int dx\, P_X(x) = 1. For discrete random variables, p jumps at the discrete values x_n, and P_X(x) then consists of a sum of δ functions

P_X(x) = \sum_n p_n\, \delta(x - x_n)    (1.8)

where p_n is the probability of finding the discrete value x_n. The statistical properties of the random variable X are completely determined by the probability density P_X(x). It is easy to form averages on the basis of the given probability density P_X(x):

\langle f(X) \rangle = \Big\langle \int dx\, \delta(x - X)\, f(x) \Big\rangle = \int dx\, P_X(x)\, f(x)    (1.9)

In particular, 〈X^m〉 ≡ µ_m is called the m-th moment of X, and µ_1 the average or mean. Also,

\sigma^2 = \langle (X - \langle X \rangle)^2 \rangle = \mu_2 - \mu_1^2    (1.10)

is called the variance, which is the square of the standard deviation σ.

Exercise - Show that the Cauchy distribution

P(x) = \frac{\gamma}{\pi\,[(x - a)^2 + \gamma^2]}, \qquad -\infty < x < \infty    (1.11)

does not have finite variance.

1.2 Transformation of variable

Let the continuous variable X be mapped into the new variable Y = f(X). The probability that Y has a value between y and y + dy is

P_Y(y)\, dy = \int_{y < f(x) < y + dy} dx\, P_X(x)    (1.12)

An equivalent formula is

P_Y(y) = \int dx\, P_X(x)\, \delta[f(x) - y]    (1.13)

The last integral is easily evaluated: if x_n = f_n^{-1}(y) is the n-th simple root of f(x) − y = 0, then

P_Y(y) = \sum_n \int dx\, P_X(x)\, \delta(x - x_n)\, \left|\frac{df}{dx}\right|^{-1}_{x = x_n} = \sum_n P_X(x_n)\, |f'(x_n)|^{-1}    (1.14)
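Formula (1.14) can be tested by Monte Carlo. In the sketch below (an illustration, not from the notes) X is uniform on (0, 1) and Y = X², so the single root of f(x) − y = 0 in the range is x = √y, with |f′(x)| = 2x, giving P_Y(y) = 1/(2√y); the bin location and width are arbitrary.

```python
import random

# Empirical density of Y = X^2 near y0, compared with P_Y(y0) = 1/(2*sqrt(y0)).
rng = random.Random(42)
N = 200_000
y0, dy = 0.25, 0.02                       # small bin around y0
count = sum(1 for _ in range(N)
            if abs(rng.random() ** 2 - y0) < dy / 2)
density = count / (N * dy)                # fraction of samples per unit y
exact = 1.0 / (2.0 * y0 ** 0.5)           # prediction of (1.14)
print(round(density, 2), exact)
```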


A familiar example is the derivation, from the one-dimensional Maxwell distribution for X = v,

P(v) = \sqrt{\frac{m}{2\pi k T}}\, \exp\!\left(-\frac{m v^2}{2 k T}\right)    (1.15)

of the Boltzmann probability density for the energy E = m v^2/2,

P(E) = \frac{1}{\sqrt{\pi k T E}}\, \exp\!\left(-\frac{E}{k T}\right)    (1.16)

Exercise Show the derivation of the above result.

1.3 Characteristic Function and Cumulants

The characteristic function of a stochastic variable X is defined by the average of exp(ikX), namely

G_X(k) = \langle e^{ikX} \rangle = \int dx\, P_X(x)\, e^{ikx}    (1.17)

This is merely the Fourier transform of P_X(x), and it can naturally be inverted to recover the probability distribution

P_X(x) = \frac{1}{2\pi} \int_{-\infty}^{\infty} dk\, \exp(-ikx)\, G_X(k)    (1.18)

The function G_X(k) provides an alternative complete characterization of the probability distribution. It exists for all real k and has the properties

G(0) = 1; \qquad |G(k)| \le 1    (1.19)

By expanding the exponential in the integrand of (1.17) and interchanging the order of the resulting series and the integral, one gets

G(k) = \sum_{m=0}^{\infty} \frac{(ik)^m}{m!} \int dx\, x^m P_X(x) = \sum_{m=0}^{\infty} \frac{(ik)^m}{m!}\, \mu_m    (1.20)

Therefore, one finds that GX(k) is the moment generating function, in the sense that

\mu_m = (-i)^m \frac{\partial^m}{\partial k^m}\, G_X(k)\Big|_{k=0}    (1.21)

The same function G_X(k) also serves to generate the cumulants κ_m = 〈〈X^m〉〉, which are defined by

\log G(k) = \sum_{m=1}^{\infty} \frac{(ik)^m}{m!}\, \kappa_m    (1.22)

Note that, since P_X(x) is normalised, \log G(0) = 0, so the m = 0 term vanishes and the above series begins at m = 1. The cumulants are combinations of the moments. For the first four cumulants, for example, one has

\kappa_1 = \mu_1
\kappa_2 = \mu_2 - \mu_1^2 = \sigma^2
\kappa_3 = \mu_3 - 3\mu_2\mu_1 + 2\mu_1^3
\kappa_4 = \mu_4 - 4\mu_3\mu_1 - 3\mu_2^2 + 12\mu_2\mu_1^2 - 6\mu_1^4    (1.23)
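The formulas (1.23) can be sanity-checked on a distribution whose cumulants are known in closed form. A convenient choice (an illustration, not from the notes) is the Poisson distribution with parameter λ, all of whose cumulants equal λ; its raw moments below are standard facts.

```python
# Raw moments of Poisson(lam); every cumulant of this distribution equals lam.
lam = 2.0
m1 = lam
m2 = lam + lam**2
m3 = lam + 3*lam**2 + lam**3
m4 = lam + 7*lam**2 + 6*lam**3 + lam**4

# Cumulants via (1.23); each should come out equal to lam.
k1 = m1
k2 = m2 - m1**2
k3 = m3 - 3*m2*m1 + 2*m1**3
k4 = m4 - 4*m3*m1 - 3*m2**2 + 12*m2*m1**2 - 6*m1**4
print(k1, k2, k3, k4)   # all equal to lam
```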


We finally mention that there exists a general expression for κ_m in terms of the determinant of an m × m matrix constructed with the moments µ_i, i = 1, ..., m:

\kappa_m = (-1)^{m-1} \det \begin{pmatrix}
\mu_1 & 1 & 0 & 0 & 0 & \dots \\
\mu_2 & \mu_1 & 1 & 0 & 0 & \dots \\
\mu_3 & \mu_2 & \binom{2}{1}\mu_1 & 1 & 0 & \dots \\
\mu_4 & \mu_3 & \binom{3}{1}\mu_2 & \binom{3}{2}\mu_1 & 1 & \dots \\
\mu_5 & \mu_4 & \binom{4}{1}\mu_3 & \binom{4}{2}\mu_2 & \binom{4}{3}\mu_1 & \dots \\
\dots & \dots & \dots & \dots & \dots & \dots
\end{pmatrix}    (1.24)

1.4 The Gaussian distribution

The general form of the Gaussian distribution for one variable is

P(x) = C\, e^{-\frac{1}{2} A x^2 - B x}    (1.25)

where A is a positive constant that determines the width, B determines the position of the peak, and C is a normalization constant

C = \left(\frac{A}{2\pi}\right)^{1/2} e^{-B^2/2A}    (1.26)

It is often convenient to express the parameters A, B in terms of the average µ_1 = −B/A and the variance σ² = 1/A:

P(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left[-\frac{(x - \mu_1)^2}{2\sigma^2}\right]    (1.27)

The characteristic function is

G(k) = \langle e^{ikx} \rangle = \frac{1}{\sqrt{2\pi\sigma^2}} \int dx\, \exp\!\left[-\frac{(x - \mu_1)^2}{2\sigma^2} + ikx\right]

= \frac{1}{\sqrt{2\pi\sigma^2}} \int dx\, \exp\!\left[-\frac{x^2 - 2(\mu_1 + ik\sigma^2)x + \mu_1^2}{2\sigma^2}\right]

= e^{i\mu_1 k - \frac{1}{2}\sigma^2 k^2}    (1.28)

The use of cumulants is particularly suited for this distribution since

\kappa_1 = \mu_1, \quad \kappa_2 = \sigma^2, \quad \kappa_3 = \kappa_4 = \dots = 0    (1.29)
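Result (1.28) is easy to verify by Monte Carlo: averaging e^{ikX} over Gaussian samples should reproduce exp(iµ_1 k − σ²k²/2). The parameter values in this sketch (not from the notes) are arbitrary.

```python
import cmath
import random

# Monte Carlo estimate of G(k) = <exp(i k X)> for X ~ N(mu, sigma^2),
# compared with the closed form (1.28).
rng = random.Random(7)
mu, sigma, k = 0.5, 1.5, 0.8
N = 200_000
G_mc = sum(cmath.exp(1j * k * rng.gauss(mu, sigma))
           for _ in range(N)) / N
G_exact = cmath.exp(1j * mu * k - 0.5 * sigma**2 * k**2)
print(abs(G_mc - G_exact))   # small sampling error
```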

1.5 Generalization to several random variables

Generally we may have n random variables X_1, ..., X_n. Similarly to (1.6), we may introduce the n-dimensional distribution function

P_{X_1,...,X_n}(x_1, ..., x_n) \equiv P_n(x_1, ..., x_n) = \langle \delta(x_1 - X_1) \cdots \delta(x_n - X_n) \rangle    (1.30)


Averages can be calculated as

\langle f(X_1, ..., X_n) \rangle = \int dx_1 \cdots \int dx_n\, f(x_1, ..., x_n)\, P_n(x_1, ..., x_n)    (1.31)

The probability density of s < n random variables (the marginal distribution) is obtained by integration over the other variables:

P_s(x_1, ..., x_s) = \int dx_{s+1} \cdots \int dx_n\, P_n(x_1, ..., x_s, x_{s+1}, ..., x_n)    (1.32)

Alternatively, one may consider those realizations of the n random variables X_1, ..., X_n where the first s random variables take the fixed values x_1, ..., x_s, and consider the joint probability of the remaining variables X_{s+1}, ..., X_n. This is called the conditional probability, and it is denoted by

P_{n-s|s}(x_{s+1}, ..., x_n \mid x_1, ..., x_s)    (1.33)

By Bayes’ rule,

P_n(x_1, ..., x_n) = P_{n-s|s}(x_{s+1}, ..., x_n \mid x_1, ..., x_s)\, P_s(x_1, ..., x_s)    (1.34)

and we may express the conditional probability density in terms of Pn only

P_{n-s|s}(x_{s+1}, ..., x_n \mid x_1, ..., x_s) = \frac{P_n(x_1, ..., x_n)}{\int dx_{s+1} \cdots \int dx_n\, P_n(x_1, ..., x_n)}    (1.35)

Note that the normalization of the conditional probability follows from the normalization of the joint and marginal probabilities.

For multivariate probability distributions, the moments are defined by

\langle X_1^{m_1} \cdots X_n^{m_n} \rangle = \int dx_1 \cdots dx_n\, x_1^{m_1} \cdots x_n^{m_n}\, P_n(x_1, ..., x_n)    (1.36)

while the characteristic (moment generating) function depends on n auxiliary variables \mathbf{k} = (k_1, ..., k_n):

G(\mathbf{k}) = \langle \exp[i(k_1 X_1 + \dots + k_n X_n)] \rangle = \sum \frac{(ik_1)^{m_1} \cdots (ik_n)^{m_n}}{m_1! \cdots m_n!}\, \langle X_1^{m_1} \cdots X_n^{m_n} \rangle    (1.37)

Similarly, the cumulants of the multivariate distribution are defined in terms of the coefficients of the expansion of \ln G as

\ln G(\mathbf{k}) = {\sum}' \frac{(ik_1)^{m_1} \cdots (ik_n)^{m_n}}{m_1! \cdots m_n!}\, \langle\langle X_1^{m_1} \cdots X_n^{m_n} \rangle\rangle    (1.38)

where the prime indicates the absence of the term with all the m_i simultaneously vanishing (by the normalization of P_n).


1.6 The multivariate Gaussian distribution

A collection of n random variables X_1, ..., X_n is said to have a zero-mean (joint) Gaussian probability distribution if

P_n(x_1, ..., x_n) = C \exp\!\left(-\frac{1}{2} \sum_{ij}^{n} A_{ij}\, x_i x_j\right)    (1.39)

with positive-definite symmetric matrix A and C determined from normalization as

C = \sqrt{\frac{\det A}{(2\pi)^n}}    (1.40)

In vector notation

P(\mathbf{x}) = \sqrt{\frac{\det A}{(2\pi)^n}}\, e^{-\frac{1}{2} \mathbf{x} \cdot A \mathbf{x}}    (1.41)

The symmetry of the distribution implies 〈x_i〉 = 0, and the covariances are

\langle x_i x_j \rangle = (A^{-1})_{ij}    (1.42)

It follows that a zero-mean Gaussian distribution is fully determined by its covariances. Writing the covariance matrix as C = A^{-1}, one can represent the distribution in the form

P(\mathbf{x}) = \frac{1}{(2\pi)^{n/2} (\det C)^{1/2}}\, e^{-\frac{1}{2} \mathbf{x} \cdot C^{-1} \mathbf{x}}    (1.43)

One can immediately work out the characteristic function

G(\mathbf{k}) = \langle e^{i \mathbf{k} \cdot \mathbf{x}} \rangle = e^{-\frac{1}{2} \mathbf{k} \cdot C \mathbf{k}}    (1.44)

Thus the characteristic function of a Gaussian distribution is again Gaussian.

These results generalize easily to the case of nonzero means by setting \mathbf{y} = \mathbf{x} + \boldsymbol{\mu}. The probability distribution of \mathbf{y} is then

P(\mathbf{y}) = \frac{1}{(2\pi)^{n/2} (\det C)^{1/2}}\, e^{-\frac{1}{2} (\mathbf{y} - \boldsymbol{\mu}) \cdot C^{-1} (\mathbf{y} - \boldsymbol{\mu})}    (1.45)

with 〈y_i〉 = µ_i and 〈y_i y_j〉 = µ_i µ_j + C_{ij}, i.e. C_{ij} = 〈〈y_i y_j〉〉. The corresponding characteristic function is

G(\mathbf{k}) = \langle e^{i \mathbf{k} \cdot \mathbf{y}} \rangle = e^{i \mathbf{k} \cdot \boldsymbol{\mu}}\, \langle e^{i \mathbf{k} \cdot \mathbf{x}} \rangle = e^{i \mathbf{k} \cdot \boldsymbol{\mu} - \frac{1}{2} \mathbf{k} \cdot C \mathbf{k}}    (1.46)

and hence again of a Gaussian form.

It follows that a Gaussian distribution is fully determined by the averages of the variables and their covariance matrix. If the variables are uncorrelated then A^{-1} is diagonal, so A is diagonal and the variables are also independent (i.e. the joint probability distribution factorizes). This independence can always be achieved by a transformation of variables.
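Relation (1.42) can be checked numerically by sampling a zero-mean Gaussian with a given 2 × 2 matrix A and comparing sample covariances with A^{-1}. The sketch below (the matrix entries are illustrative) samples via the Cholesky factorization of the covariance matrix, a standard construction not discussed in the notes.

```python
import random

# Check <x_i x_j> = (A^{-1})_{ij} for a 2x2 symmetric positive-definite A.
a, b, c = 2.0, -0.6, 1.0                            # A = [[a, b], [b, c]]
det = a * c - b * b
Cinv = [[c / det, -b / det], [-b / det, a / det]]   # covariance C = A^{-1}

# Cholesky factor L of C (C = L L^T); then x = L z with z standard normal.
l11 = Cinv[0][0] ** 0.5
l21 = Cinv[1][0] / l11
l22 = (Cinv[1][1] - l21 ** 2) ** 0.5

rng = random.Random(3)
N = 100_000
s11 = s12 = s22 = 0.0
for _ in range(N):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    x1, x2 = l11 * z1, l21 * z1 + l22 * z2
    s11 += x1 * x1; s12 += x1 * x2; s22 += x2 * x2
print(round(s11 / N, 2), round(s12 / N, 2), round(s22 / N, 2))
```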


The moments of a multivariate Gaussian distribution with zero mean have a remarkable property. Consider (1.41). We rewrite its characteristic function as

G(\mathbf{k}) = \prod_{pq} e^{\frac{1}{2} (ik_q) C_{qp} (ik_p)} = \prod_{pq} e^{\frac{1}{2} (ik_p)(ik_q) \langle x_p x_q \rangle} = \prod_{pq} \left[1 + \frac{1}{2}\, (ik_p)(ik_q) \langle x_p x_q \rangle + \dots \right]    (1.47)

To find the moments 〈x_i x_j x_k ...〉 one has to collect the terms proportional to k_i k_j k_k .... We note that only terms with an even number of factors k show up in (1.47). As a consequence, all odd moments of a zero-mean Gaussian distribution vanish. Hence, we restrict ourselves to the case of an even number of factors in 〈...〉. We suppose that the subscripts of the factors are different from each other, so that the dots in (1.47) cannot contribute. Hence, the only way in which k_i k_j k_k ... occurs is as the result of multiplying suitable pairs k_p k_q. On the other hand, each product of pairs k_p k_q that makes up k_i k_j k_k ... does occur. The result is

\langle x_i x_j x_k \cdots \rangle = \sum \langle x_p x_q \rangle \langle x_u x_v \rangle \cdots    (1.48)

The subscripts p, q, u, v, ... are the same as i, j, k, ... taken two by two. The summation extends over all different ways in which i, j, k, ... can be subdivided into pairs. The factor 1/2 in (1.47) cancels because the product contains each pair twice. For instance,

\langle x_1 x_2 x_3 x_4 \rangle = \langle x_1 x_2 \rangle \langle x_3 x_4 \rangle + \langle x_1 x_3 \rangle \langle x_2 x_4 \rangle + \langle x_1 x_4 \rangle \langle x_2 x_3 \rangle    (1.49)

This is a realization of Wick's theorem in classical systems.
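Wick's theorem can be probed by Monte Carlo. For a zero-mean Gaussian pair with unit variances, (1.49) with x_3 = x_1 and x_4 = x_2 gives 〈x_1² x_2²〉 = 〈x_1²〉〈x_2²〉 + 2〈x_1 x_2〉². The sketch below (the correlation value is arbitrary) checks this identity on samples:

```python
import random

# Monte Carlo check of <x1^2 x2^2> = <x1^2><x2^2> + 2<x1 x2>^2
# for a zero-mean Gaussian pair with correlation rho.
rho = 0.6
rng = random.Random(11)
N = 400_000
s4 = s11 = s22 = s12 = 0.0
for _ in range(N):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    x1 = z1
    x2 = rho * z1 + (1 - rho * rho) ** 0.5 * z2   # corr(x1, x2) = rho
    s4 += x1 * x1 * x2 * x2
    s11 += x1 * x1; s22 += x2 * x2; s12 += x1 * x2
lhs = s4 / N
rhs = (s11 / N) * (s22 / N) + 2 * (s12 / N) ** 2  # Wick pairing
print(round(lhs, 2), round(rhs, 2))
```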

1.7 Examples and exercises

To illustrate the definitions given for multivariate distributions, let us compute them for a simple two-variable Gaussian distribution

P_2(x_1, x_2) = \frac{\sqrt{1 - \lambda^2}}{2\pi\sigma^2} \exp\!\left[-\frac{1}{2\sigma^2}\,(x_1^2 - 2\lambda x_1 x_2 + x_2^2)\right]    (1.50)

where the parameter −1 < λ < 1 ensures that the quadratic form in the exponent is positive definite. To fix ideas, one may interpret P_2(x_1, x_2) as the Boltzmann distribution of two harmonic oscillators coupled by a potential term ∝ λ x_1 x_2.

(a) Verify that the normalization factor takes this value by direct integration, or by comparing our distribution with the multidimensional Gaussian distribution (1.41), where

A = \frac{1}{\sigma^2} \begin{pmatrix} 1 & -\lambda \\ -\lambda & 1 \end{pmatrix}    (1.51)

(b) Verify that the marginal probability of the individual variables (for one of them, since they are identical), defined by P_1(x_1) = \int dx_2\, P_2(x_1, x_2), is

P_1(x_1) = \frac{1}{\sqrt{2\pi\sigma_\lambda^2}} \exp\!\left(-\frac{x_1^2}{2\sigma_\lambda^2}\right), \qquad \sigma_\lambda^2 = \frac{\sigma^2}{1 - \lambda^2}    (1.52)


We see that the marginal distribution depends on λ, which results in a modified variance. To see that σ_λ² is indeed the variance 〈〈x_1²〉〉 = 〈x_1²〉 − 〈x_1〉², note that 〈x_1^m〉 can be obtained from the marginal distribution alone (this is a general result):

\langle x_1^m \rangle = \int dx_1\, dx_2\, x_1^m\, P_2(x_1, x_2) = \int dx_1\, x_1^m\, P_1(x_1)    (1.53)

Then, inspecting the marginal distribution, verify that

\langle x_1 \rangle = 0, \qquad \langle x_2 \rangle = 0    (1.54)

\langle x_1^2 \rangle = \sigma_\lambda^2, \qquad \langle x_2^2 \rangle = \sigma_\lambda^2    (1.55)

(c) To complete the calculation of the moments up to second order, we need the covariance of x_1 and x_2: 〈〈x_1 x_2〉〉 = 〈x_1 x_2〉 − 〈x_1〉〈x_2〉, which reduces to calculating 〈x_1 x_2〉. Show that

\langle x_1 x_2 \rangle = \frac{\lambda}{1 - \lambda^2}\, \sigma^2    (1.56)

(d) It is convenient to calculate the normalised covariance, or correlation coefficient,

\gamma = \frac{\langle x_1 x_2 \rangle}{\sqrt{\langle x_1^2 \rangle \langle x_2^2 \rangle}}    (1.57)

which is merely given by λ. Therefore the parameter λ in the distribution is a measure of how correlated the variables x_1 and x_2 are. Note that in the limit λ → 0 the variables are not correlated at all and the distribution factorizes:

P_2\big|_{\lambda=0} = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-x_1^2/2\sigma^2}\, \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-x_2^2/2\sigma^2}    (1.58)

In the limit λ → 1 the variables are maximally correlated, γ = 1. The distribution becomes a function of x_1 − x_2, so it is favoured that x_1 and x_2 take similar values:

P_2\big|_{\lambda \to 1} \to e^{-(x_1 - x_2)^2 / 2\sigma^2}    (1.59)

We can now interpret the increase of the variance with λ: the correlation between the variables allows them to take arbitrarily large values, with the only restriction that their difference be small.

(e) By using Bayes' rule P_{1|1}(x_1|x_2) = P_2(x_1, x_2)/P_1(x_2), show that

P_{1|1}(x_1|x_2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left[-\frac{1}{2\sigma^2}\,(x_1 - \lambda x_2)^2\right]    (1.60)

Then, at λ = 0 (no correlation) the values taken by x_1 are independent of x_2, while for λ → 1 they are centered around those taken by x_2, and hence strongly conditioned by them.
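The results of exercises (b)-(d) can be checked by direct sampling. The sketch below (parameter values are arbitrary) draws pairs with the covariance structure implied by (1.52) and (1.56) and verifies the variance σ_λ², the covariance λσ²/(1 − λ²), and the correlation coefficient γ = λ:

```python
import random

# Monte Carlo check of (1.52), (1.56), (1.57) for the distribution (1.50).
sigma, lam = 1.0, 0.5
sl2 = sigma**2 / (1 - lam**2)          # sigma_lambda^2 of (1.52)
rng = random.Random(5)
N = 200_000
s11 = s22 = s12 = 0.0
for _ in range(N):
    z1, z2 = rng.gauss(0, 1), rng.gauss(0, 1)
    x1 = sl2**0.5 * z1
    x2 = sl2**0.5 * (lam * z1 + (1 - lam**2)**0.5 * z2)
    s11 += x1 * x1; s22 += x2 * x2; s12 += x1 * x2
gamma = (s12 / N) / ((s11 / N) * (s22 / N))**0.5   # correlation coefficient
print(round(s11 / N, 2), round(s12 / N, 2), round(gamma, 2))
```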


Chapter 2

Master equation

Now that we have reviewed some of the basic ideas of probability theory, we will begin to study how probability distributions evolve in time. In this chapter we will derive equations for the evolution of probability distributions for processes in which most of the memory effects in the evolution can be neglected (Markov processes), and we shall apply these equations to a variety of problems involving discrete stochastic variables. The level at which we discuss the evolution of probability distributions in this chapter is semiphenomenological, because we do not concern ourselves with the underlying dynamics of the stochastic variables. We approximate the dynamics by a judicious choice of the transition matrix.

The equation which governs the evolution of the probability distribution for Markov processes is the master equation. It is one of the most important equations in statistical physics because of its almost universal applicability. As a system of stochastic variables evolves, there are transitions between various values of the stochastic variables. Through transitions, the probability of finding the system in a given state changes until the system reaches a final equilibrium steady state, in which transitions cannot cause further changes in the probability distribution (it can happen that the system never reaches a steady state, but we will be concerned mostly with the cases for which it does). To derive the master equation we must assume that the probability for each transition depends only on the preceding step and not on any previous history (this assumption applies to many systems). If the transitions between values of the stochastic variable only occur in small steps, then the master equation reduces approximately to a partial differential equation for the probability density (one example of such a partial differential equation is the Fokker-Planck equation, which we will derive in Chapter 5).

2.1 Stochastic processes

A stochastic process is the time evolution of a stochastic variable. If X is the stochastic variable, the stochastic process is X(t). A stochastic variable is defined by specifying the set of possible values, called the "range" or "set of states", and the probability distribution over this set. The set can be discrete (e.g. the number of electrons arrived at an anode up to a certain time) or continuous (e.g. the velocity of a Brownian particle). If the set is multidimensional the stochastic variable is a vector (e.g. the three velocity components of a Brownian particle).

Stochastic processes are often encountered in Statistical Mechanics. The latter studies systems of large numbers of particles, so precise calculations cannot be made. Instead, average values are measured and fluctuations about the average values are calculated by means of probability theory. An archetypal example of a stochastic process is Brownian motion, i.e. the motion of a heavy colloidal particle immersed in a fluid made up of light particles. The stochastic variable X in this case may be the position or the velocity of the particle.

A standard procedure in thermodynamics is to regard the system as an instance of an ensemble of systems, i.e. a collection of systems each leading to a number X which depends on time. Although the outcome of one system cannot be precisely predicted, we assume that ensemble averages exist and can be calculated. For the fixed time t_1 we may therefore define a probability density by

P_1(x_1, t_1) = \langle \delta(x_1 - X(t_1)) \rangle    (2.1)

where the bracket 〈...〉 indicates the ensemble average. Similarly, we can define the probability density P_n

P_n(x_1, t_1; \dots; x_n, t_n) = \langle \delta(x_1 - X(t_1)) \cdots \delta(x_n - X(t_n)) \rangle    (2.2)

If we know the infinite hierarchy of probability densities

P_1(x_1, t_1), \quad P_2(x_1, t_1; x_2, t_2), \quad P_3(x_1, t_1; x_2, t_2; x_3, t_3), \;\dots    (2.3)

we know completely the time dependence of the process described by the random variable X(t). The hierarchy of functions P_n obeys the following consistency conditions:

1. P_n \ge 0;

2. P_n does not change on interchange of two pairs (x_k, t_k) and (x_\ell, t_\ell);

3. \int dx_n\, P_n(x_1, t_1; \dots; x_{n-1}, t_{n-1}; x_n, t_n) = P_{n-1}(x_1, t_1; \dots; x_{n-1}, t_{n-1});

4. \int dx_1\, P_1(x_1, t_1) = 1.

We can obtain any average from the probability density, as in Sec. 1.5. The only new feature is that these average values are no longer just numbers, but are now functions of the t values specifying which random variables are being averaged. For example, the average of X at time t_1 is given by

\langle X(t_1) \rangle = \int dx_1\, P_1(x_1, t_1)\, x_1    (2.4)

More generally, take n values t_1, t_2, ..., t_n for the time variable (not necessarily different) and form the n-th moment

\langle X(t_1) X(t_2) \cdots X(t_n) \rangle = \int dx_1\, dx_2 \cdots dx_n\, P_n(x_n, t_n; \dots; x_2, t_2; x_1, t_1)\, x_1 x_2 \cdots x_n    (2.5)

Of particular interest is the autocorrelation function

\kappa(t_1, t_2) = \langle\langle X(t_1) X(t_2) \rangle\rangle = \langle X(t_1) X(t_2) \rangle - \langle X(t_1) \rangle \langle X(t_2) \rangle    (2.6)

For t_1 = t_2 it reduces to the time-dependent variance 〈〈X²(t)〉〉 = σ²(t). The prefix auto is used because X(t_1), X(t_2), ... belong to the same random process. If two random processes, say X(t) and Y(t), are involved in some problem, then expectations such as 〈X(t_1) Y(t_2)〉 may arise. The resulting function of t_1 and t_2 is called the correlation function.


The generalization of the concept of a characteristic function to stochastic processes is the characteristic functional. Introduce an arbitrary auxiliary test function k(t); then the characteristic or moment generating functional is defined as the following functional of k(t):

G([k]) = \Big\langle \exp\Big[\, i \int_{-\infty}^{\infty} dt\, k(t)\, X(t) \Big] \Big\rangle    (2.7)

The notation G([k]) emphasizes that G depends on the whole function k. Expanding in powers of k gives

G([k]) = \sum_{m=0}^{\infty} \frac{i^m}{m!} \int dt_1 \cdots dt_m\, k(t_1) \cdots k(t_m)\, \langle X(t_1) \cdots X(t_m) \rangle    (2.8)

Thus each moment of the joint distribution of X(t_1), X(t_2), ... can be found as the coefficient of the term with k(t_1) k(t_2) ... in this expression. Similarly, the cumulants can be found from

\log G([k]) = \sum_{m=1}^{\infty} \frac{i^m}{m!} \int dt_1 \cdots dt_m\, k(t_1) \cdots k(t_m)\, \langle\langle X(t_1) \cdots X(t_m) \rangle\rangle    (2.9)

As done in Sec. 1.5, we may attribute fixed values x_1, ..., x_s to the random variable X at the times t_1, ..., t_s, and consider the joint probability for the variable X to assume the values x_{s+1}, ..., x_n at the later times t_{s+1}, ..., t_n, under the condition that at the times t_1, ..., t_s it had the sharp values x_1, ..., x_s:

P(x_{s+1}, t_{s+1}; \dots; x_n, t_n \mid x_s, t_s; \dots; x_1, t_1) = \langle \delta(x_n - X(t_n)) \cdots \delta(x_{s+1} - X(t_{s+1})) \rangle_{X(t_s) = x_s, \dots, X(t_1) = x_1},
\qquad t_n > t_{n-1} > \dots > t_1    (2.10)

Again

P_{n-s|s}(x_{s+1}, t_{s+1}; \dots; x_n, t_n \mid x_s, t_s; \dots; x_1, t_1) = \frac{P_n(x_n, t_n; \dots; x_1, t_1)}{\int dx_{s+1} \cdots \int dx_n\, P_n(x_n, t_n; \dots; x_1, t_1)}    (2.11)

2.1.1 Stationary processes

A process is stationary if all the P_n are unchanged under the replacement of t_i by t_i + T (T arbitrary):

P_n(x_1, t_1; x_2, t_2; \dots; x_n, t_n) = P_n(x_1, t_1 + T; x_2, t_2 + T; \dots; x_n, t_n + T)    (2.12)

It then follows that P_1 cannot depend on t, and that all the joint probability distributions P_n, n ≥ 2, can only depend on the time differences t_k − t_{k−1} of the times which occur in them. As a consequence, the mean value of a stationary stochastic process is independent of time (i.e. 〈X(t)〉 is constant), and all moments depend only on time differences and are not affected by a shift in time:

\langle X(t_1) X(t_2) \cdots X(t_n) \rangle = \langle X(t_1 + T) X(t_2 + T) \cdots X(t_n + T) \rangle    (2.13)

Note that the time independence of P_1 (or of the average 〈X〉) is a necessary but by no means sufficient condition for the stationarity of the process. All physical processes in equilibrium are stationary. For zero-average stationary processes the correlation function is

\kappa(\tau) = \langle X(t) X(t + \tau) \rangle = \langle X(0) X(\tau) \rangle = \langle X(-\tau) X(0) \rangle = \kappa(-\tau)    (2.14)

Often there exists a constant τ_C such that κ(τ) ≃ 0 for τ > τ_C. One calls τ_C the autocorrelation time of the stationary stochastic process.
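A minimal discrete-time illustration (not from the notes) is the AR(1) chain X_{t+1} = a X_t + noise, which is stationary with κ(τ)/κ(0) = a^τ, i.e. exponential decay with autocorrelation time τ_C = −1/ln a. The parameter a = 0.8 below is arbitrary.

```python
import random

# Stationary AR(1) process: autocorrelation decays as a**tau.
a = 0.8
rng = random.Random(2)
T = 200_000
x, xs = 0.0, []
for _ in range(T):
    x = a * x + rng.gauss(0, 1)
    xs.append(x)

def kappa(tau):
    """Time-average estimate of <X(t) X(t+tau)> (ergodicity assumed)."""
    n = T - tau
    return sum(xs[i] * xs[i + tau] for i in range(n)) / n

print(round(kappa(1) / kappa(0), 2))   # ~ a
```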


2.1.2 Ergodic processes

When dealing with stochastic processes, what is usually available is the time average of a given measurable quantity X(t) which fluctuates with time. However, what we have considered above are ensemble averages, in which we repeat the same measurement for many different copies of our system and compute averages 〈...〉. For very many processes, i.e. ergodic processes, the ensemble average is equivalent to the time average over an arbitrarily large time T, which is then allowed to become infinite.

To arrive at a better formulation, consider Brownian motion. One may actually observe a large number of Brownian particles and average the result; that means that one really has a physical realization of the ensemble (provided the particles do not interact). However, more often one observes one and the same particle on successive time intervals. The result will be the same if one assumes that trajectories that lie time intervals apart are statistically independent. In practice one simply observes the trajectory of a single particle during a long time. The idea is that the irregularly varying function may be cut into a collection of long time intervals and that this collection can serve as the ensemble that defines the stochastic process. The condition for this "self-averaging" to work is that the behaviour of the function during one interval does not affect the behaviour during the next time interval. If this is so, the time average equals the ensemble average and the process is called ergodic. This justification by means of self-averaging applies to stationary cases, because one may then choose the time intervals as long as one wishes. If the state of the system changes with time, one must choose the interval long enough to smooth out the rapid fluctuations, but short compared to the overall change. The basic assumption for ergodic processes is that such an intermediate interval exists.

2.2 Markov processes and the Chapman-Kolmogorov equation

A Markov process is defined by the following relation, which is called the Markov property

P_{1|n-1}(x_n, t_n \mid x_{n-1}, t_{n-1}; \dots; x_1, t_1) = P_{1|1}(x_n, t_n \mid x_{n-1}, t_{n-1}), \qquad t_1 < t_2 < \dots < t_n    (2.15)

The Markov property expresses that for a Markov process the probability of transition from the value x_{n−1} that the stochastic variable X takes at t_{n−1} to the value x_n taken at t_n depends only on the value of X at the time t_{n−1}, and not on the previous history of the system. In other words, if the present state of the system is known, we can determine the probability of any future state without reference to the past. We say that Markov processes have "short" memory. P_{1|1} is called the transition probability. The conditional probability P_{1|1} must satisfy the following properties:

P_{1|1} \ge 0 \quad \text{(non-negative)}    (2.16)

\int dx_2\, P_{1|1}(x_2, t_2 \mid x_1, t_1) = 1 \quad \text{(at } t_2 \text{ the particle must be somewhere)}    (2.17)

P_1(x_2, t_2) = \int dx_1\, P_{1|1}(x_2, t_2 \mid x_1, t_1)\, P_1(x_1, t_1)    (2.18)

Property (2.18) follows from Bayes' relation P_2(x_2, t_2; x_1, t_1) = P_{1|1}(x_2, t_2|x_1, t_1) P_1(x_1, t_1) when integrated over x_1, using the fact that P_1(x_2, t_2) = \int dx_1\, P_2(x_1, t_1; x_2, t_2) is the marginal probability distribution of P_2 with respect to x_2.


For a Markov process the joint probabilities for n ≥ 3 are all expressed in terms of P_1 and P_{1|1}. For n = 3 we have

P_3(x_1, t_1; x_2, t_2; x_3, t_3) = P_2(x_1, t_1; x_2, t_2)\, P_{1|2}(x_3, t_3 \mid x_1, t_1; x_2, t_2)
= P_1(x_1, t_1)\, P_{1|1}(x_2, t_2 \mid x_1, t_1)\, P_{1|1}(x_3, t_3 \mid x_2, t_2)    (2.19)

For general n

P_n(x_1, t_1; \dots; x_n, t_n) = P_{1|1}(x_n, t_n \mid x_{n-1}, t_{n-1}) \cdots P_{1|1}(x_2, t_2 \mid x_1, t_1)\, P_1(x_1, t_1)    (2.20)

Taking relation (2.19), integrating it over x_2 and dividing both sides by P_1 gives us the Chapman-Kolmogorov (C-K) equation (t_1 < t_2 < t_3):

P_{1|1}(x_3, t_3 \mid x_1, t_1) = \int dx_2\, P_{1|1}(x_3, t_3 \mid x_2, t_2)\, P_{1|1}(x_2, t_2 \mid x_1, t_1)    (2.21)

This equation states that a process starting at t_1 with value x_1 reaches x_3 at t_3 via any one of the possible values x_2 at the intermediate time t_2.

Because Markov processes are fully specified by P_1 and P_{1|1} (the whole hierarchy P_n can be constructed from them), a Markov process is stationary if P_1 is independent of time and the transition probability P_{1|1}(x_2, t_2|x_1, t_1) depends only on the time interval t_2 − t_1, i.e.

P_{1|1}(x_2, t_2 \mid x_1, t_1) = P_{1|1}(x_2, \tau \mid x_1, 0), \qquad \tau = t_2 - t_1    (2.22)

A homogeneous process is a non-stationary Markov process for which the transition probability P_{1|1} only depends on time differences, i.e. it is given by (2.22).

For stationary and homogeneous processes, a special notation is used for the transition probability:

P_{1|1}(x_2, t_2 \mid x_1, t_1) = T_\tau(x_2 \mid x_1), \qquad \tau = t_2 - t_1    (2.23)

and the C-K equation reads (τ, τ ′ > 0)

T_{\tau + \tau'}(x_3 \mid x_1) = \int dx_2\, T_{\tau'}(x_3 \mid x_2)\, T_\tau(x_2 \mid x_1)    (2.24)

Exercise Explain why equation (2.21) does not hold for non-Markovian processes.
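For a homogeneous process on a discrete state space observed at unit time steps, (2.24) reduces to matrix multiplication of transition matrices, which can be checked directly. The two-state matrix below is illustrative.

```python
# Chapman-Kolmogorov for a two-state homogeneous Markov chain:
# the tau-step transition matrix is T**tau, and (2.24) becomes
# T**(tau + tau') = T**tau' . T**tau.
T = [[0.9, 0.2],
     [0.1, 0.8]]          # T[x2][x1]: columns sum to 1

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def power(M, n):
    R = [[1.0, 0.0], [0.0, 1.0]]   # identity
    for _ in range(n):
        R = matmul(M, R)
    return R

lhs = power(T, 5)                          # tau + tau' = 3 + 2
rhs = matmul(power(T, 2), power(T, 3))     # T**2 . T**3
print(lhs[0][0], rhs[0][0])
```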

2.3 The Master equation

The Chapman-Kolmogorov equation for Markov processes is not of much assistance when one searches for solutions of a given problem, because it is essentially a property of the solution. However, it can be cast into a more useful form, the master equation. This is a differential equation for the transition probability. In order to derive it, one first needs to ascertain how the transition probability behaves for short time differences.


Firstly, on inspecting the Chapman-Kolmogorov equation for equal time arguments one finds the natural result

P(x_3, t_3 \mid x_1, t) = \int dx_2\, P(x_3, t_3 \mid x_2, t)\, P(x_2, t \mid x_1, t) \;\;\Rightarrow\;\; P(x_2, t \mid x_1, t) = \delta(x_2 - x_1)    (2.25)

which is the zeroth-order term in the short-time behaviour of P(x′, t′|x, t), i.e. for t′ − t small. Keeping this in mind, we assume that for small τ the transition probability P(x_2, t + τ|x_1, t) can be Taylor expanded as follows:

P(x_2, t + \tau \mid x_1, t) = \delta(x_1 - x_2) + \tau\, W_t(x_2 \mid x_1) + O(\tau^2)    (2.26)

The delta function expresses that the probability to stay in the same state after a time interval equal to zero is one, whereas the probability to change state after a time interval equal to zero is zero. W_t(x_2|x_1) is the time derivative of the transition probability at τ = 0 and is interpreted as the transition probability per unit time from x_1 to x_2 at time t.

This expression must satisfy the normalization property: the integral over x_2 must equal one. In order for this to happen, the above form must be corrected as follows:

P(x_2, t + \tau \mid x_1, t) = \left[1 - \tau\, a^{(0)}(x_1, t)\right] \delta(x_1 - x_2) + \tau\, W_t(x_2 \mid x_1) + O(\tau^2)    (2.27)

where

a^{(0)}(x_1, t) = \int dx_2\, W_t(x_2 \mid x_1)    (2.28)

The delta function has been corrected by the coefficient 1 − τ \int dx_2\, W(x_2|x_1), which corresponds to the probability that no transition has taken place at all.

From the definition of conditional probability and marginals we have

P_2(x_1, t_1; x_2, t_2) = P_1(x_1, t_1)\, P_{1|1}(x_2, t_2 \mid x_1, t_1)    (2.29)

and

P_1(x_2, t_2) = \int dx_1\, P_1(x_1, t_1)\, P_{1|1}(x_2, t_2 \mid x_1, t_1)    (2.30)

In order to find a differential equation for the probability density P_1(x, t), we rewrite the above equation in the form

P_1(x, t + \tau) = \int dx'\, P_1(x', t)\, P_{1|1}(x, t + \tau \mid x', t)    (2.31)

The time derivative of P1(x, t) is

\frac{\partial P_1(x, t)}{\partial t} = \lim_{\tau \to 0} \frac{P_1(x, t + \tau) - P_1(x, t)}{\tau}    (2.32)

and therefore to take the time derivative of (2.31) we need to evaluate the quantity

\lim_{\tau \to 0} \frac{P_{1|1}(x, t + \tau \mid x', t) - P_{1|1}(x, t \mid x', t)}{\tau} = W(x \mid x') - \delta(x - x') \int dx''\, W(x'' \mid x')    (2.33)

where in the last equality we used (2.27).


Inserting into (2.32) gives us the differential form of the C-K equation, which is called the master equation:

\partial_t P_1(x, t) = \int dx' \left[ W(x \mid x')\, P_1(x', t) - W(x' \mid x)\, P_1(x, t) \right]    (2.34)

This gives the rate of change of the probability density P_1(x, t) due to transitions into the state x from all other states x′, and transitions out of the state x into all other states x′. There are many applications of the master equation and we shall consider some of them in the subsequent sections. Note that in the master equation t enters linearly in the first derivative: the master equation is not invariant under time reversal. This equation describes the irreversible behaviour of the system.

Given a Markov process X(t), defined by P_1(x_1, t_1) and P_{1|1}(x_2, t_2|x_1, t_1), we can always extract a sub-ensemble of X(t), X^\star(t), characterised by P_1^\star(x_1, t_1) = P(x_1, t_1|x_0, t_0) and P_{1|1}^\star(x_2, t_2|x_1, t_1) = P_{1|1}(x_2, t_2|x_1, t_1), i.e. by taking the sharp value x_0 at t_0, since P_1^\star(x_1, t_0) = \delta(x_1 - x_0). So the master equation is an equation for all transition probabilities P_{1|1}(x, t|x_0, t_0).

If the range of X is a discrete set of states with labels n, the equation reduces to

\frac{d p_n(t)}{dt} = \sum_{n'} \left[ W_{nn'}\, p_{n'}(t) - W_{n'n}\, p_n(t) \right]    (2.35)

The physical meaning of the master equation is now clear: it is an equation of motion for the probability p_n of a state n. It is essentially a gain-loss equation: the first term is the gain for the probability of state n due to transitions from other states n′, and the second term is the loss due to transitions into other states n′.
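The gain-loss structure of (2.35) can be integrated directly. The sketch below (the three-state rates are illustrative) uses a simple Euler step; probability stays normalized because the gain and loss terms cancel in the sum over n, and the distribution relaxes towards a steady state.

```python
# Euler integration of the discrete master equation (2.35), three states.
W = [[0.0, 1.0, 0.5],
     [0.2, 0.0, 1.0],
     [0.3, 0.4, 0.0]]     # W[n][m]: transition rate m -> n (illustrative)

p = [1.0, 0.0, 0.0]       # start with all probability in state 0
dt = 0.001
for _ in range(30_000):   # integrate up to t = 30
    dp = [sum(W[n][m] * p[m] - W[m][n] * p[n] for m in range(3))
          for n in range(3)]
    p = [p[n] + dt * dp[n] for n in range(3)]

print([round(v, 3) for v in p], round(sum(p), 6))
```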

Since W(x|x′)τ is the transition probability in a short time interval τ, it can be computed, for the system under study, by means of any available method valid for short times. Then the master equation serves to determine the evolution of the system over long time periods, at the expense of assuming the Markov property.

The master equation can readily be extended to the case of a multicomponent Markov process X_i(t), i = 1, 2, ..., N, on noting that the Chapman-Kolmogorov equation is valid as it stands on merely replacing x by \mathbf{x} = (x_1, ..., x_N).

2.3.1 Steady state

For a stationary or steady-state solution, the left-hand side of the master equation equals zero. Therefore the total number of transitions per unit time into state n must balance the total number of transitions per unit time out of state n, i.e. we have the balance condition for the steady state

∑_{n'} W_{nn'} p_{n'} = ( ∑_{n'} W_{n'n} ) p_n    (2.36)

One has detailed balance if each individual transition is balanced, i.e. if the number of transitions per unit time from state n' into state n balances the number of transitions per unit time from n to n',

Wnn′pn′ = Wn′npn (2.37)


When detailed balance holds, the stationary distribution is often called an equilibrium distribution. Note however that detailed balance is a necessary but not sufficient condition for thermodynamic equilibrium. Detailed balance in thermodynamic equilibrium follows from microscopic time reversibility. However, it may also hold in some non-equilibrium cases.

2.3.2 Convergence to equilibrium

Here we study the conditions under which the dynamics given by the master equation (2.34) converges to a unique stationary (equilibrium) distribution

p_n(∞) = (1/Z) e^{−βH(n)}    (2.38)

We show that if the transition rates are in detailed balance with a candidate distribution of the form (2.38), then this is a sufficient, although not necessary, condition for the probability p_n(t) to converge to p_n(∞). If, in addition, the process is ergodic, p_n(∞) is the unique stationary (equilibrium) distribution and the process always evolves to p_n(∞) = e^{−βH(n)}/Z.

The physically most intuitive convergence proofs are based on constructing a Lyapunov func-tion F (t). Here we choose the Kullback-Leibler divergence between the candidate stationarydistribution pn(∞) and the instantaneous distribution pn(t),

F(t) = ∑_n p_n(t) ln[ p_n(t)/p_n(∞) ] = ∑_n p_n(t) [ ln p_n(t) + H(n) ] + ln Z    (2.39)

(here and below we set β = 1 and use ∑_n p_n(t) = 1; the additive constant ln Z does not affect dF/dt and will be dropped in what follows)

which obeys F(t) ≥ 0 for all t (Gibbs' inequality), with equality only if the measures p_n(t) and p_n(∞) are identical. Using the master equation we have

dF/dt = ∑_n [ ln p_n(t) + H(n) ] dp_n(t)/dt

      = ∑_{nn'} [ ln p_n(t) + H(n) ] [ W_{nn'} p_{n'}(t) − W_{n'n} p_n(t) ]

      = (1/2) ∑_{nn'} [ (ln p_n(t) + H(n)) − (ln p_{n'}(t) + H(n')) ] [ W_{nn'} p_{n'}(t) − W_{n'n} p_n(t) ]

      = (1/2) ∑_{nn'} [ (ln p_n(t) + H(n)) − (ln p_{n'}(t) + H(n')) ]
              × [ W_{nn'} e^{−H(n')} e^{H(n')+ln p_{n'}(t)} − W_{n'n} e^{−H(n)} e^{H(n)+ln p_n(t)} ]    (2.40)

(in the first line the contribution ∑_n dp_n/dt, arising from differentiating p_n ln p_n, vanishes by normalization; the third line follows by antisymmetrizing the sum over n and n')

We now use the detailed balance condition W_{nn'} e^{−H(n')} = W_{n'n} e^{−H(n)}:

dF/dt = −(1/2) ∑_{n'n} W_{n'n} e^{−H(n)} [ (ln p_n(t) + H(n)) − (ln p_{n'}(t) + H(n')) ]
              × [ e^{H(n)+ln p_n(t)} − e^{H(n')+ln p_{n'}(t)} ] ≤ 0    (2.41)

The last step follows from the general identity (e^x − e^y)(x − y) ≥ 0 for all (x, y), with equality only if x = y. Since F(t) is bounded from below, it is indeed a Lyapunov function for the process.


The distance between p_n(t) and the measure p_n(∞) thus decreases monotonically until dF/dt = 0, i.e. until p_n(t) has reached a point p_n such that

e^{H(n)} p_n = e^{H(n')} p_{n'}  or  W_{nn'} = 0    (2.42)

So either p_n = χ(n) e^{−H(n)}, with χ(n) = χ(n') for all n, n', or W_{nn'} = 0 for all n, n'. If we denote by S_n the set of states that are dynamically accessible from n, then for all n' ∈ S_n we need to have χ(n) = χ(n'). One such stationary solution p_n is the equilibrium measure, corresponding to χ(n) = Z^{−1} for all n. If our process is ergodic, i.e. if S_n coincides with the full range of n, then it is the only such distribution satisfying (2.42) and our process must always evolve towards the Boltzmann distribution p_n(∞).
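The Lyapunov argument can be checked numerically. The sketch below (an invented four-state system with Metropolis-type rates, which satisfy detailed balance with the Boltzmann measure at β = 1) integrates the master equation and verifies that F(t) never increases and that p_n(t) relaxes to e^{−H(n)}/Z:

```python
import numpy as np

rng = np.random.default_rng(1)

# Invented 4-state system with energies H(n) and Metropolis-type rates
# W[n, m] = min(1, exp(H[m] - H[n])) for the transition m -> n; these are in
# detailed balance with the Boltzmann measure (beta = 1).
H = rng.uniform(0.0, 2.0, size=4)
W = np.minimum(1.0, np.exp(H[None, :] - H[:, None]))
np.fill_diagonal(W, 0.0)

p_eq = np.exp(-H) / np.exp(-H).sum()                          # p_n(inf) of (2.38)
assert np.allclose(W * p_eq[None, :], (W * p_eq[None, :]).T)  # detailed balance

def kl(p, q):
    """Kullback-Leibler divergence, the Lyapunov function F of (2.39)."""
    return float(np.sum(p * np.log(p / q)))

p = np.array([1.0, 0.0, 0.0, 0.0]) + 1e-12   # sharp start, regularised for the log
p /= p.sum()
dt = 1e-3
F_prev = kl(p, p_eq)
for _ in range(50_000):
    p += dt * (W @ p - W.sum(axis=0) * p)    # one Euler step of the master equation
    F = kl(p, p_eq)
    assert F <= F_prev + 1e-12               # Lyapunov property: F never increases
    F_prev = F
print(F)  # ~0: p_n(t) has relaxed to the Boltzmann distribution
```

The discrete Euler map here is itself a stochastic kernel in detailed balance with p_eq, so F decreases at every step, not just in the continuous-time limit.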

2.3.3 Mean-field equation

From the master equation one can derive the dynamical equations for the averages of a Markov stochastic process. Let X be a physical quantity with Markov character. The master equation determines its probability distribution at all t > 0. In ordinary macroscopic physics, however, one ignores fluctuations and treats X as if it were a non-stochastic, single-valued quantity. The evolution of this quantity is then described by a deterministic differential equation, called the mean-field equation. As the master equation determines the entire probability distribution, it must be possible to derive the mean-field equation from it as an approximation, valid when fluctuations are negligible. Let us first write the equation for the time evolution of an arbitrary function ⟨f(X)⟩:

d⟨f(X)⟩/dt = ∫ f(x) ∂P_1(x,t)/∂t dx = ∫∫ f(x) [ W(x|x') P_1(x',t) − W(x'|x) P_1(x,t) ] dx dx'

           = ∫∫ [ f(x') − f(x) ] W(x'|x) P_1(x,t) dx dx'    (2.43)

On applying this to f(X) = X one has

d⟨X⟩/dt = ∫ dx a_1(x,t) P_1(x,t) = ⟨a_1(X,t)⟩    (2.44)

with

a_ν(x,t) = ∫ dx' (x' − x)^ν W_t(x'|x),  ν = 0, 1, ...    (2.45)

Note that when a_1 is a linear function of x, one has ⟨a_1(X,t)⟩ = a_1(⟨X⟩,t), whence the equation for the time evolution of ⟨X(t)⟩ is the same as

d⟨X⟩/dt = a_1(⟨X⟩,t)    (2.46)

which is an ordinary differential equation and can be identified with the mean-field equation for the system.

If however a_1(x) is not a linear function of x, (2.44) is not a closed equation for ⟨X⟩: higher-order moments enter as well. The evolution of ⟨X⟩ in the course of time is therefore not determined by ⟨X⟩ itself, but is influenced by the fluctuations around this average (variance σ²). So for non-linear a_1 we need an equation for the variance as well. Similarly, using f(X) = X²,

d⟨X²⟩/dt = ∫ dx dx' (x'² − x²) W(x'|x) P_1(x,t)

         = ∫ dx dx' [ (x' − x)² + 2x(x' − x) ] W(x'|x) P_1(x,t)

         = ⟨a_2(X)⟩ + 2⟨X a_1(X)⟩    (2.47)

This is identical with

dσ²/dt = ⟨a_2(X)⟩ + 2⟨(X − ⟨X⟩) a_1(X)⟩    (2.48)

However, if a2 is a non-linear function of X, the equation involves even higher order moments.
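The failure of the closure for non-linear a_1 can be seen in a small numerical experiment. The sketch below (an invented pair-annihilation-type process, in which state n decays to n−1 at the non-linear rate n(n−1), so a_1(n) = −n(n−1)) integrates the full master equation and compares the exact mean with the mean-field closure (2.46):

```python
import numpy as np

# Invented pair-annihilation-type process: state n decays to n-1 at the
# non-linear rate n(n-1), so a1(n) = -n(n-1) and (2.44) does not close.
N = 30
n_vals = np.arange(N + 1)
rate = n_vals * (n_vals - 1.0)       # rate of the one-step transition n -> n-1

p = np.zeros(N + 1); p[N] = 1.0      # start sharply at n = N
dt, T = 1e-4, 1.0
for _ in range(int(T / dt)):
    flow = rate * p                  # probability flux n -> n-1
    p[:-1] += dt * flow[1:]
    p -= dt * flow
m_exact = float(n_vals @ p)          # exact <X(T)> from the master equation

# Mean-field closure (2.46): dm/dt = a1(m) = -m(m-1), same initial condition.
m = float(N)
for _ in range(int(T / dt)):
    m += dt * (-m * (m - 1.0))

print(m_exact, m)  # the two differ: fluctuations influence the true mean
```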

2.4 Examples and exercises

The following two examples of Markov processes are of fundamental importance.

2.4.1 The Wiener-Levy process

The Wiener-Levy process was originally introduced to describe the behaviour of the position of a free Brownian particle in one dimension. At the same time, it plays a central role in the rigorous foundation of stochastic differential equations. The Wiener-Levy process is defined in the range −∞ < x < ∞ through (t_2 > t_1 > 0)

P_{1|1}(x_2, t_2 | x_1, t_1) = 1/√(2π(t_2 − t_1)) exp[ −(x_2 − x_1)² / 2(t_2 − t_1) ]    (2.49)

with

P_1(x_1, 0) = δ(x_1)    (2.50)

Then, the probability density for t1 > 0 is, according to (2.30),

P_1(x_1, t_1) = (1/√(2πt_1)) exp( −x_1²/2t_1 )    (2.51)

This is a non-stationary (P_1 depends on t_1) Gaussian process. (A process is called a Gaussian process if all its P_n are multivariate Gaussian distributions. In that case all cumulants κ_m beyond m = 2 are zero. Gaussian processes are often used to approximate physical processes where higher-order cumulants can be assumed negligible.)

Exercise Show that

〈X(t1)〉 = 0 and 〈X(t1)X(t2)〉 = min(t1, t2) (2.52)
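The exercise can be checked by sampling (numerically rather than by proof; the times t_1, t_2 and the step size below are arbitrary choices). Wiener paths are built from independent Gaussian increments:

```python
import numpy as np

rng = np.random.default_rng(2)

# Sample Wiener paths from independent Gaussian increments:
# X(t + dt) - X(t) ~ N(0, dt), with X(0) = 0.
n_paths, n_steps, dt = 50_000, 100, 0.01
increments = rng.normal(0.0, np.sqrt(dt), size=(n_paths, n_steps))
X = np.cumsum(increments, axis=1)      # X[:, k] is X(t) at t = (k + 1) dt

t1, t2 = 0.3, 0.8                      # arbitrary illustrative times
i1, i2 = round(t1 / dt) - 1, round(t2 / dt) - 1
print(X[:, i1].mean())                 # ~0
print((X[:, i1] * X[:, i2]).mean())    # ~min(t1, t2) = 0.3
```

The covariance min(t_1, t_2) appears because X(t_2) = X(t_1) + (increment independent of X(t_1)) for t_2 > t_1.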

2.4.2 The Ornstein-Uhlenbeck process

The Ornstein-Uhlenbeck process was constructed to describe the behaviour of the velocity of a free Brownian particle in one dimension. It also describes the position of an overdamped particle in a harmonic potential. It is defined by (τ = t_2 − t_1 > 0)

P_1(x_1) = (1/√(2π)) exp( −x_1²/2 )

P_{1|1}(x_2, t_2 | x_1, t_1) = 1/√(2π(1 − e^{−2τ})) exp[ −(x_2 − x_1 e^{−τ})² / 2(1 − e^{−2τ}) ]    (2.53)

The O-U process is stationary, Gaussian and Markovian. According to Doob's theorem, it is essentially the only process with these three properties.

(a) The Gaussian property is clear for P_1. By using P_2(x_2, t_2; x_1, t_1) = P_1(x_1) P_{1|1}(x_2, t_2|x_1, t_1), show that P_2(x_2, t_2; x_1, t_1) can be identified with the bivariate Gaussian distribution (1.50) with the following parameters

λ = e^{−τ},  σ² = 1 − e^{−2τ}    (2.54)

with the particularity that σ² = 1 − λ² in this case.

(b) Show that the O-U process has an exponential autocorrelation function

⟨X(t_1)X(t_2)⟩ = e^{−τ}    (2.55)

The evolution with time of the distribution P_2(x_2, t_2; x_1, t_1), seen as describing the velocity of a Brownian particle, has a clear meaning. For short time differences the velocity is strongly correlated with itself: λ ≃ 1 and the variance σ² of the transition probability shrinks to zero. As time elapses, λ decreases; for long time differences λ ≃ 0, the velocity has lost all memory of its value at the initial time due to the collisions, and hence P_2(x_2, t_2; x_1, t_1) factorizes: the two values are completely uncorrelated.
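The transition probability (2.53) can be sampled exactly. A sketch (parameters chosen arbitrarily) verifying stationarity and the exponential autocorrelation (2.55):

```python
import numpy as np

rng = np.random.default_rng(3)

# Exact one-step update implied by (2.53) (unit variance, time in units of 1/gamma):
# X(t + dt) = e^{-dt} X(t) + sqrt(1 - e^{-2 dt}) xi,  with xi ~ N(0, 1).
n_paths, n_steps, dt = 100_000, 60, 0.05   # arbitrary illustrative parameters
lam, sig = np.exp(-dt), np.sqrt(1.0 - np.exp(-2 * dt))

X = rng.normal(size=n_paths)               # start in the stationary law P_1
X0 = X.copy()
for _ in range(n_steps):
    X = lam * X + sig * rng.normal(size=n_paths)

tau = n_steps * dt                         # elapsed time tau = 3
print((X0 * X).mean())                     # ~exp(-tau), the autocorrelation (2.55)
print(X.var())                             # ~1: the process stays stationary
```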

Proof of Doob's theorem Next we show that if X(t) is a stationary Gaussian Markov process, then X(t) is Ornstein-Uhlenbeck.

Let X(t) be a stationary Gaussian Markov process. By shifting and rescaling we can alwaysensure that P1(x) is Gaussian with zero mean and unit variance, i.e.

P_1(x_1) = (1/√(2π)) exp( −x_1²/2 )    (2.56)

The transition probability is Gaussian and has therefore the general form

P_{1|1}(x_2, t_2 | x_1, t_1) = D e^{−(1/2)(A x_2² + 2B x_1 x_2 + C x_1²)}    (2.57)

where A, B, C, D are functions of τ. Normalization yields D = √(A/2π) and C = B²/A, so

P_{1|1}(x_2, t_2 | x_1, t_1) = √(A/2π) e^{−(A/2)(x_2 + B x_1/A)²}    (2.58)

Using ∫ dx_1 P_{1|1}(x_2, t_2 | x_1, t_1) P_1(x_1, t_1) = P_1(x_2, t_2) gives in addition B² = A(A − 1). The one remaining unknown parameter A can be expressed in terms of the (equally unknown) correlation function using

κ(t_2 − t_1) = ∫ dx_1 dx_2 x_1 x_2 P_{1|1}(x_2, t_2 | x_1, t_1) P_1(x_1)    (2.59)


which yields A = (1 − κ²(t_2 − t_1))^{−1}. Hence,

P_{1|1}(x_2, t_2 | x_1, t_1) = 1/√(2π(1 − κ²(t_2 − t_1))) exp[ −(x_2 − κ(t_2 − t_1) x_1)² / 2(1 − κ²(t_2 − t_1)) ]    (2.60)

Now take a third time t3 and use (2.21) in (2.59)

κ(t_3 − t_1) = ∫ dx_3 x_3 ∫ dx_2 P_{1|1}(x_3, t_3 | x_2, t_2) ∫ dx_1 x_1 P_{1|1}(x_2, t_2 | x_1, t_1) P_1(x_1)

             = κ(t_3 − t_2) ∫ dx_1 dx_2 x_2 x_1 P_{1|1}(x_2, t_2 | x_1, t_1) P_1(x_1) = κ(t_3 − t_2) κ(t_2 − t_1)    (2.61)

This functional relation for κ(τ), namely κ(τ_1 + τ_2) = κ(τ_1) κ(τ_2), shows that

κ(τ) = e^{−γτ}    (2.62)

Substituting this function into (2.60) gives (2.53) (with time measured in units of 1/γ) and completes the proof.

Exercise Prove, using the generating functional or otherwise, that if X(t) is stationary, Gaussian (with zero mean and unit variance) and has an exponential correlation function κ(τ) = κ(0) e^{−γτ}, then X(t) is Ornstein-Uhlenbeck and hence Markovian.

2.4.3 The random walk in one dimension

A man moves along a line, taking, at random, steps to the left or to the right with equal probability. The steps are of length d, so that his position can take on only the values nd, where n is an integer.

The problem can be defined in two ways, leading to discrete-time and continuous-time random walks. The discrete-time version, which is more traditional, allows the walker to take steps at times sτ (s integer), at which time he must step either left or right with equal probability, so that

P_{1|1}(nd, sτ | md, (s−1)τ) = (1/2) δ_{n,m−1} + (1/2) δ_{n,m+1}    (2.63)

Exercise Show that in the limit of small jumps, and continuous time, the random walk isdescribed by a Wiener process.

Answer We want to find an equation for the evolution of the probability density of finding the walker at a given distance after a given time has elapsed, and show that it is solved by the Wiener process.

From (2.30) we have

P_1(nd, sτ) = ∑_{n'} P_{1|1}(nd, sτ | n'd, (s−1)τ) P_1(n'd, (s−1)τ)

            = (1/2) P_1((n−1)d, (s−1)τ) + (1/2) P_1((n+1)d, (s−1)τ)    (2.64)


and therefore

P_1(nd, sτ) − P_1(nd, (s−1)τ) = (1/2) [ P_1((n−1)d, (s−1)τ) + P_1((n+1)d, (s−1)τ) − 2 P_1(nd, (s−1)τ) ]    (2.65)

We rewrite equation (2.65) as

[ P_1(nd, sτ) − P_1(nd, (s−1)τ) ] / τ
   = { [ P_1((n−1)d, (s−1)τ) + P_1((n+1)d, (s−1)τ) − 2 P_1(nd, (s−1)τ) ] / 2d² } · (d²/τ)    (2.66)

If we now set nd = x, (s−1)τ = t and take the limit d → 0, τ → 0 in such a way that D = d²/2τ remains finite, we obtain

∂P_1(x,t)/∂t = D lim_{d→0} [ ( P_1(x−d,t) − P_1(x,t) )/d² + ( P_1(x+d,t) − P_1(x,t) )/d² ]

             = D lim_{d→0} (1/d) [ ∂P_1(x,t)/∂x − ∂P_1(x−d,t)/∂x ]

             = D lim_{d→0} ∂/∂x [ ( P_1(x,t) − P_1(x−d,t) )/d ]

             = D ∂²P_1(x,t)/∂x²    (2.67)

So we have obtained the Fokker-Planck equation for diffusion

∂P_1(x,t)/∂t = D ∂²P_1(x,t)/∂x²    (2.68)

Let us now solve equation (2.68) assuming that, initially, P_1(x, 0) = δ(x), where δ(x) is the Dirac delta function. We first introduce the Fourier transform of P_1(x, t):

P_1(k, t) = ∫_{−∞}^{∞} dx P_1(x, t) e^{ikx}    (2.69)

Then the diffusion equation takes the form

∂P_1(k,t)/∂t = −D k² P_1(k,t)    (2.70)

We can solve this to obtain

P_1(k, t) = P_1(k, 0) e^{−Dk²t}    (2.71)

The solution contains an arbitrary function P_1(k, 0), which can be determined from the initial condition as P_1(k, 0) = 1. We can now take the inverse transform to obtain

P_{1|1}(x, t | 0, 0) = (1/2π) ∫_{−∞}^{∞} dk e^{−ikx} e^{−Dk²t} = (1/√(4πDt)) e^{−x²/4Dt}    (2.72)

This gives the probability of finding the particle at point x at time t if it started at x = 0 at time t = 0, and corresponds to the Wiener process. It is interesting to obtain the equations of motion of the moments ⟨x(t)⟩ and ⟨x²(t)⟩. The moment

⟨x(t)⟩ = ∫_{−∞}^{∞} dx x P_1(x, t)    (2.73)


gives the average position of the particle as a function of time. If we multiply (2.68) by x and integrate over x, carrying out an integration by parts and using the boundary conditions P_1(±∞, t) = 0, we find

d⟨x(t)⟩/dt = 0    (2.74)

Thus, the average position of the particle does not change with time. Let us now consider the moment ⟨x²(t)⟩. If we multiply (2.68) by x² and integrate over x we find, by performing two integrations by parts and using the boundary conditions P_1(±∞, t) = 0 and ∂P_1(±∞, t)/∂x = 0,

d⟨x²(t)⟩/dt = 2D    (2.75)

or

⟨x²(t)⟩ = 2Dt    (2.76)

This is characteristic of a diffusion process. The moment ⟨x²(t)⟩ is a measure of the width of P_1(x, t) at time t. We see that P_1(x, t) spreads with time, as we expect.
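The moment equations (2.74) and (2.76) can be checked directly on the discrete walk itself (with d = τ = 1, so D = 1/2):

```python
import numpy as np

rng = np.random.default_rng(4)

# Discrete random walk with step length d and time step tau, so D = d^2/(2 tau).
n_walkers, n_steps = 20_000, 400
d, tau = 1.0, 1.0
D = d * d / (2 * tau)

steps = rng.choice([-d, d], size=(n_walkers, n_steps))
x = steps.sum(axis=1)                  # positions at time t = n_steps * tau

t = n_steps * tau
print(x.mean())                        # ~0, cf. (2.74)
print((x ** 2).mean())                 # ~2 D t = 400, cf. (2.76)
```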


Chapter 3

Markov Chains and One-step processes

One of the simplest applications of the master equation is to Markov chains. These are processes which involve transitions between discrete stochastic variables at discrete times.

It is sometimes possible to solve the master equation if we can find the eigenvectors of the transition matrix. Since this matrix is not symmetric in general, the solution involves the introduction of left and right eigenvectors. We shall obtain the general form of the solution for the case of homogeneous Markov processes and write it in terms of the left and right eigenvectors of the transition matrix.

Another simple application of the master equation is to one-step processes. These are continuous-time Markov processes on a discrete state space whose transitions occur only between adjacent states. For some one-step processes the master equation can be solved exactly by using a generating function (similar to the characteristic function). One can obtain a partial differential equation for the generating function which may be solved in simple cases.

3.1 Markov Chains

One of the simplest examples of a Markov process is the Markov chain. This involves transitions between values of a discrete stochastic variable X occurring at discrete times. We limit ourselves for simplicity to Markov chains with a finite state space S. This is not essential, but removes distracting technical complications. Let us assume that X can take on values in S = (1, 2, ..., N) and that time is measured in discrete steps t = sτ, where τ is a fundamental time interval. Then for one step from t_1 = mτ to t_2 = nτ (n > m), equation (2.18) can be written as

P_1(i, n) = ∑_{j=1}^{N} P_1(j, m) P_{1|1}(i, n | j, m)    (3.1)

The theory of Markov chains is most highly developed for homogeneous chains and we shall mostly be concerned with these. A Markov chain is said to be homogeneous when the probabilities P_{1|1}(i, n|j, m) depend on the time interval n − m but not on the time m. For such a chain


we define the n-step transition probabilities

Q^{(n)}_{ji} = P_{1|1}(i, m+n | j, m) = P_{1|1}(i, n | j, 0)    (3.2)

Of particular importance are the one-step transition probabilities, which we write simply as Q_{ji} (they depend only on the states and not on the times),

Q_{ji} = Q^{(1)}_{ji} = P_{1|1}(i, m+1 | j, m)    (3.3)

Since the system has to move to some state from any state j we have, for all j

∑_{k=1}^{N} Q_{jk} = 1    (3.4)

The matrix Q of transition probabilities

    | Q_{11}  Q_{12}  ... |
Q = | Q_{21}  Q_{22}  ... |    (3.5)
    |   ...     ...       |

is called a stochastic matrix, the defining property of a stochastic matrix being that its elements are non-negative and that all its row sums are unity. Note that we use the convention that Q_{ji} denotes the probability of a transition from state j (row suffix) to state i (column suffix).

Often one is interested in the probability that after n steps the Markov chain is in a given state.Next we show that this problem reduces to calculating entries in the n-th power of the transitionmatrix Q. Let us introduce the vector of state occupation probabilities at time n,

p^{(n)} = (p^{(n)}_1, p^{(n)}_2, ..., p^{(n)}_N)    (3.6)

whose components are defined as

p^{(n)}_j = P_1(j, n)    (3.7)

We can write equation (2.18), for one step from time n − 1 to time n, as

p^{(n)}_i = ∑_{j=1}^{N} p^{(n−1)}_j Q_{ji}    (3.8)

In matrix notation

p^{(n)} = p^{(n−1)} Q    (3.9)

and on iteration we obtain

p^{(n)} = p^{(0)} Q^n    (3.10)

with Q^0 being the identity matrix, i.e. Q^0_{ij} = δ_{ij}. Consequently, the process can be described completely by giving its initial probability vector p^{(0)} and the transition matrix Q. The probability of the system taking a specific path of states is

P_T(i_0, 0; ...; i_T, T) = p^{(0)}_{i_0} ∏_{n=1}^{T} Q_{i_{n−1}, i_n}    (3.11)
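Equations (3.10) and (3.11) translate directly into a few lines of code. A minimal sketch for an invented 3-state chain:

```python
import numpy as np

# Invented 3-state chain; each row of Q sums to one (Q[j, i] = prob of j -> i).
Q = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])
p0 = np.array([1.0, 0.0, 0.0])

# Occupation probabilities after n steps, eq. (3.10): p(n) = p(0) Q^n.
p5 = p0 @ np.linalg.matrix_power(Q, 5)
print(p5, p5.sum())                    # a normalised probability vector

# Probability of the specific path 0 -> 1 -> 1 -> 2, eq. (3.11).
path = [0, 1, 1, 2]
prob = p0[path[0]] * np.prod([Q[a, b] for a, b in zip(path, path[1:])])
print(prob)                            # Q[0,1] Q[1,1] Q[1,2] = 0.3 * 0.6 * 0.3
```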


If the system starts in a given state j, then p^{(0)}_k = δ_{kj} and

p^{(n)}_k = (Q^n)_{jk}    (3.12)

From the fact that Q^{m+n} = Q^m Q^n we have

Q^{(m+n)}_{jk} = ∑_{ℓ=1}^{N} Q^{(m)}_{jℓ} Q^{(n)}_{ℓk}    (3.13)

which is the Chapman-Kolmogorov equation for a homogeneous Markov chain.

In the non-homogeneous case the transition probability

P_{1|1}(i, s | j, r)  (s > r)    (3.14)

will depend on both r and s. In this case we write

Q_{ji}(r, s) = P_{1|1}(i, s | j, r)    (3.15)

In particular the one-step transition probabilities Qji(r, r+1) will depend on the time r and wewill have a sequence of stochastic matrices

Q(r) = [ Q_{ji}(r, r+1) ]  (r = 0, 1, ...)    (3.16)

and the Chapman-Kolmogorov equation reads

Q_{jk}(r, t) = ∑_ℓ Q_{jℓ}(r, s) Q_{ℓk}(s, t)  (r < s < t)    (3.17)

We define a stationary distribution of the Markov chain (3.9) to be a vector π = (π_1, ..., π_N) such that

π = πQ  and  π_i ≥ 0 ∀ i ∈ S,  ∑_j π_j = 1    (3.18)

Thus π is a left eigenvector of the stochastic matrix, with eigenvalue λ = 1 and with non-negativeentries, and represents a time-independent solution of the Markov chain.

For later reference, we note that the fact that all the row sums of a stochastic matrix are unitycan be expressed in matrix notation by

QI = I (3.19)

with I = (1, ..., 1). So I is always a right eigenvector of Q, with eigenvalue λ = 1. Pre-multiplyingeach side by Q we have Q2I = QI = I and in general

Q^n I = I    (3.20)

so Q^n is also a stochastic matrix, for each integer n. Finally, we mention that stochastic matrices map normalized probabilities onto normalized probabilities, as they preserve the sign and normalization of the state probabilities. (Prove this statement as an exercise.)


3.1.1 Eigenvalues and eigenvectors of stochastic matrices

We have seen that calculating the probability that the Markov chain is in a given state after n steps normally involves calculating entries in the n-th power of the transition matrix. This is best done by using a spectral representation of the transition matrix, i.e. a decomposition of the matrix based on its eigenvalues and eigenvectors. For p^{(n)} to remain well-defined as n → ∞, it is vital that the eigenvalues are sufficiently small. We next present some facts about eigenvectors and eigenvalues of stochastic matrices.

The matrix Q is in general not a symmetric matrix, therefore the right and left eigenvectors willbe different. We write the left and the right eigenvector problems as

Q x^{(j)} = λ_j x^{(j)}    (3.21)

y^{(i)} Q = λ_i y^{(i)}    (3.22)

with x^{(j)}, y^{(i)} ∈ ℂ^N and λ_i, λ_j ∈ ℂ (since Q need not be symmetric, eigenvalues and eigenvectors need not be real-valued). Much can be extracted from the two defining properties Q_{ik} ≥ 0 ∀ (i, k) ∈ S² and ∑_k Q_{ik} = 1 ∀ i ∈ S alone. Note that eigenvectors (x, y) of Q need not be probabilities in the sense of the p^{(n)}, as they could have negative or complex entries.

• The spectra of left- and right- eigenvalues of Q are identical.

proof: Equations (3.21) and (3.22) give

det[Q − λ_R I] = 0    (3.23)

det[Q^T − λ_L I] = 0    (3.24)

Rewriting gives

0 = det[Q^T − λ_L I] = det[(Q − λ_L I)^T] = det[Q − λ_L I]    (3.25)

So (3.23) and (3.25) are the same characteristic equation, and the left and right spectra coincide.

• Right and left eigenvectors are biorthogonal.

proof: Pre-multiplying (3.21) by y^{(i)}, post-multiplying (3.22) by x^{(j)} and subtracting, we find

(λ_i − λ_j) y^{(i)} x^{(j)} = 0    (3.26)

so y^{(i)} x^{(j)} = 0 if λ_i ≠ λ_j, while y^{(i)} x^{(i)} ≠ 0. We can scale these vectors so that ∑_m y^{(i)}_m x^{(j)}_m = δ_{ij} and (equivalently) ∑_i y^{(i)}_m x^{(i)}_n = δ_{mn}.

• One can expand the matrix Q in terms of its left and right eigenvectors. From (3.22), ∑_n y^{(i)}_n Q_{nm} = λ_i y^{(i)}_m; if we multiply by x^{(i)}_ℓ and sum over i, we obtain

Q_{ℓm} = ∑_i λ_i x^{(i)}_ℓ y^{(i)}_m    (3.27)

From Q^n x^{(j)} = λ_j^n x^{(j)} and y^{(i)} Q^n = λ_i^n y^{(i)} one also has

(Q^n)_{ℓm} = ∑_i λ_i^n x^{(i)}_ℓ y^{(i)}_m    (3.28)
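The spectral formula (3.28) is easily verified numerically. A sketch for an invented 3-state stochastic matrix, using the columns of the eigenvector matrix as right eigenvectors and the rows of its inverse as the biorthogonal left eigenvectors:

```python
import numpy as np

# Numerical check of (3.28) for an invented 3-state stochastic matrix.
Q = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])

lam, X = np.linalg.eig(Q)         # columns of X: right eigenvectors, Q x = lambda x
Y = np.linalg.inv(X)              # rows of Y: the biorthogonal left eigenvectors

n = 7
Qn_spectral = (X * lam ** n) @ Y  # sum_i lambda_i^n x^(i) y^(i)
Qn_direct = np.linalg.matrix_power(Q, n)
print(np.abs(Qn_spectral - Qn_direct).max())  # ~0
```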


• All eigenvalues λ of a stochastic matrix Q obey

yQ = λy, y ≠ 0  ⇒  |λ| ≤ 1    (3.29)

proof: Consider the left eigenvalue equation, λ y_i = ∑_j Q_{ji} y_j, take absolute values of both sides, sum over i ∈ S, and use the triangle inequality:

|λ| ∑_i |y_i| = ∑_i | ∑_j Q_{ji} y_j | ≤ ∑_i ∑_j |Q_{ji} y_j| = ∑_i ∑_j Q_{ji} |y_j| = ∑_j |y_j|

Since y ≠ 0 we know that ∑_i |y_i| ≠ 0, and hence |λ| ≤ 1.

• Since Q has a right eigenvector with eigenvalue λ = 1, namely (1, ..., 1), due to ∑_j Q_{ij} = 1 ∀ i ∈ S, λ = 1 is always an eigenvalue. Therefore there must also exist at least one left eigenvector with λ = 1.

3.1.2 Definitions based on accessibility of states

• Regular (or ergodic) Markov chain

∃ n ≥ 0 : (Q^n)_{ij} > 0 ∀ i, j    (3.30)

Note that if the above holds for some n ≥ 0, it will hold for any n' ≥ n. Thus, irrespective of the initial state i, after a finite number of iterations there is a non-zero probability for the system to be in any state.

• Existence of paths

i → j : ∃ n ≥ 0 such that (Q^n)_{ij} > 0    (3.31)

i ↛ j : ∄ n ≥ 0 such that (Q^n)_{ij} > 0    (3.32)

• Communicating states

i ↔ j : ∃ n, m ≥ 0 such that (Q^n)_{ij} > 0 and (Q^m)_{ji} > 0    (3.33)

Given sufficient time we can always get from i to j and from j to i.

• Closed set

Any set C ⊆ S of states such that ∀ i ∈ C, ∀ j ∉ C : i ↛ j    (3.34)

So no state inside C can ever reach any state outside C. In other words, (Q^n)_{ij} = 0 for all n ≥ 0 if i ∈ C, j ∉ C.

• Absorbing state

A closed set with just one element. So if i is an absorbing state,

Q_{ij} = 0 ∀ j ≠ i;  Q_{ii} = 1    (3.35)

and one can never leave this state. In other words, a Markov chain has an absorbing state if one or more rows of the transition matrix contain all zeros except for the diagonal element, which must be one.


• Irreducible set of states

This is any set C ⊆ S of states such that

∀ i, j ∈ C : i ↔ j    (3.36)

All states in an irreducible set are connected to each other, in that one can go from any state in C to any other state in C in a finite number of steps.

• Irreducible Markov chain

A Markov chain with the property that the complete set of states S is itself irreducible,

∀ i, j ∈ S : i ↔ j

Equivalently, one can go from any state in S to any other state in S in a finite number of steps (a transition matrix with absorbing states cannot be irreducible). Note that all regular Markov chains are irreducible, while not all irreducible Markov chains are regular. An example of an irreducible chain which is not regular is provided by

Q = | 0  1 |    (3.37)
    | 1  0 |

We have Q^{2m+1} = Q and Q^{2m} = I. However, there is no n for which Q^n has all entries non-zero.

3.1.3 Convergence to a stationary state

One often wishes to know the state of the system after a large number of transitions, n → ∞. Does the system retain some memory of its initial state p^{(0)}, or does it after many transitions proceed to some unique final state, independent of the initial one? We shall see that the behaviour of the probability vector p^{(n)} for large n depends on the structure of the transition matrix Q.

A fundamental theorem, the Perron-Frobenius theorem, states that a real square matrix with positive entries has a unique largest eigenvalue and that the corresponding eigenvector has strictly positive components. As a consequence, if Q is the stochastic matrix of a regular Markov chain, then it has only one eigenvalue of modulus one, namely λ_1 = 1. The right eigenvector associated with the unit eigenvalue is x^{(1)} = (1, ..., 1), and we denote the left eigenvector associated with it by y^{(1)} = π (we will show shortly that π has all its components non-negative and is normalized to one, so it is the stationary distribution, hence the notation). Then (3.28) can be rewritten as

(Q^n)_{ℓm} = x^{(1)}_ℓ y^{(1)}_m + ∑_{j≠1} λ_j^n x^{(j)}_ℓ y^{(j)}_m = π_m + ∑_{j≠1} λ_j^n x^{(j)}_ℓ y^{(j)}_m    (3.38)

Since |λ_j| < 1 for all j ≠ 1, we can take the limit n → ∞ to obtain

lim_{n→∞} (Q^n)_{ℓm} = π_m    (3.39)

This has many useful consequences:


• The sequence of powers of Q (Q, Q², Q³, ...) tends to a limiting transition matrix M, all of whose rows equal π = (π_1, ..., π_N); since M is itself a stochastic matrix (Q^n being a stochastic matrix for any n ≥ 0), the π_i are real and obey π_i ≥ 0 and ∑_i π_i = 1:

lim_{n→∞} Q^n = M = | π_1  π_2  ...  π_N |
                    | π_1  π_2  ...  π_N |    (3.40)
                    |  ...             ... |
                    | π_1  π_2  ...  π_N |

• It is easy to show that QM = M and π = p(0)M.

proof: Consider for example the case N = 2. Then

QM = | Q_{11}  Q_{12} | | π_1  π_2 |  =  | π_1 ∑_j Q_{1j}   π_2 ∑_j Q_{1j} |  =  M    (3.41)
     | Q_{21}  Q_{22} | | π_1  π_2 |     | π_1 ∑_j Q_{2j}   π_2 ∑_j Q_{2j} |

and

p^{(0)} M = (p^{(0)}_1, p^{(0)}_2) | π_1  π_2 |  =  (π_1, π_2) = π    (3.42)
                                   | π_1  π_2 |

because the rows of a stochastic matrix sum to one in (3.41), and the components of p^{(0)} sum to one in (3.42).

• The solution of the Markov chain p(n) will also converge to the (unique) fixed probabilityvector π

lim_{n→∞} p^{(n)}_i = lim_{n→∞} ∑_j p^{(0)}_j (Q^n)_{ji} = ∑_j p^{(0)}_j M_{ji} ≡ π_i    (3.43)

• The limiting vector π = lim_{n→∞} p^{(n)} is the stationary state of the Markov chain, i.e.

π = πQ    (3.44)

proof: This follows from

π = lim_{n→∞} p^{(0)} Q^n = lim_{n→∞} p^{(0)} Q^{n+1} = πQ    (3.45)

We say that the distribution π is invariant for Q, or that π is the steady-state vector of the Markov chain (in such a solution the probability to find any state does not change with time).

Note that the stationary solution to which the Markov chain converges is unique (independent of the choice made for the p^{(0)}_i), because all rows of the matrix M are identical.
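The convergence results above can be illustrated numerically for an invented regular chain (the matrix powers converge to M of (3.40), and every initial distribution is mapped onto the same π):

```python
import numpy as np

# Invented regular chain: all entries positive, so Q^n converges to M of (3.40).
Q = np.array([[0.5, 0.3, 0.2],
              [0.1, 0.6, 0.3],
              [0.4, 0.4, 0.2]])

M = np.linalg.matrix_power(Q, 50)
pi = M[0]                              # every row of M approximates pi
print(M)                               # all rows (nearly) identical

print(np.abs(pi @ Q - pi).max())       # ~0: pi is invariant, eq. (3.44)
for p0 in (np.array([1.0, 0.0, 0.0]), np.array([0.2, 0.5, 0.3])):
    print(p0 @ M)                      # ~pi, independent of the initial state
```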

3.1.4 Time-reversal and detailed balance

For Markov chains, the past and future are independent given the present. This property is symmetrical in time and suggests looking at Markov chains with time running backwards. On the other hand, convergence to stationarity shows behaviour which is asymmetrical in time (a highly organised state such as a point mass decays to a disorganised one, the invariant distribution; this is an example of entropy increasing). It suggests that if we want complete time symmetry we must begin at stationarity. A Markov chain at stationarity, run backwards, is again a Markov chain. The transition matrix may however be different. If, in addition, the Markov chain satisfies detailed balance, then the direct and reverse chains have the same transition matrix and the chain is called reversible.

• defn: probability current in Markov chains

The net probability current J_{i→j} from any state i ∈ S to any state j ∈ S, in the stationary state π = (π_1, ..., π_N) of a Markov chain characterized by the stochastic matrix Q = [Q_{ij}] and state space S, is defined as

J_{i→j} = π_i Q_{ij} − π_j Q_{ji}    (3.46)

consequences, conventions

1. The current is by definition anti-symmetric under permutation of i and j:

J_{j→i} = π_j Q_{ji} − π_i Q_{ij} = −[ π_i Q_{ij} − π_j Q_{ji} ] = −J_{i→j}

2. Imagine that the chain represents a random walk of a particle, in a stationary state. Since π_i is the probability to find the particle in state i, and Q_{ij} the likelihood that it subsequently moves from i to j, π_i Q_{ij} is the probability that we observe the particle moving from i to j. With multiple particles it would be proportional to the number of observed moves from i to j. Thus J_{i→j} represents the net balance of observed transitions between i and j in the stationary state; hence the term 'current'. If J_{i→j} > 0 there are more transitions i → j than j → i; if J_{i→j} < 0 there are more transitions j → i than i → j.

3. Conservation of probability implies that the sum over all currents is always zero:

∑_{ij} J_{i→j} = ∑_{ij} [ π_i Q_{ij} − π_j Q_{ji} ] = ∑_i π_i ( ∑_j Q_{ij} ) − ∑_j π_j ( ∑_i Q_{ji} ) = ∑_i π_i − ∑_j π_j = 1 − 1 = 0

• defn: detailed balance

A stochastic matrix Q and a measure π are said to be in detailed balance if

π_i Q_{ij} = π_j Q_{ji}  ∀ i, j    (3.47)

If Q and π are in detailed balance, then π is invariant for Q.

(πQ)_j = ∑_i π_i Q_{ij} = ∑_i π_j Q_{ji} = π_j    (3.48)

consequences, conventions


1. Let X_n be a Markov chain with transition matrix Q and stationary distribution π. We say that X_n is reversible if X_{N−n} is also a Markov chain with invariant distribution π and transition matrix Q. This is equivalent to saying that Q and π are in detailed balance.

2. We see that detailed balance represents the special case where all individual currentsin the stationary state are zero: Ji→j = πiQij − πjQji = 0 for all i, j ∈ S.

3. Detailed balance is a stronger condition than stationarity. All regular Markov chainswith detailed balance have by definition a unique stationary state, but not all regularMarkov chains with stationary states obey detailed balance.

4. Markov chains used to model closed physical many-particle systems with noise areusually of the detailed balance type, as a result of the invariance of Newton’s laws ofmotion under time reversal t→ −t.

Example Consider the transition matrix

    |  0   2/3  1/3 |
Q = | 1/3   0   2/3 |    (3.49)
    | 2/3  1/3   0  |

The invariant distribution π is found from π = πQ as π = (1/3, 1/3, 1/3). Clearly, π_i Q_{ij} ≠ π_j Q_{ji}, because Q is not symmetric. Therefore the chain is not reversible.
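The example can be verified numerically: the sketch below checks that π is invariant while the stationary currents (3.46) are non-zero, so the chain supports a circulating probability flow:

```python
import numpy as np

# The cyclic chain of (3.49): stationary but not reversible.
Q = np.array([[0.0, 2/3, 1/3],
              [1/3, 0.0, 2/3],
              [2/3, 1/3, 0.0]])
pi = np.full(3, 1/3)

print(np.abs(pi @ Q - pi).max())       # ~0: pi is invariant, eq. (3.18)

# Stationary currents J_{i->j} = pi_i Q_{ij} - pi_j Q_{ji}, eq. (3.46):
J = pi[:, None] * Q - pi[None, :] * Q.T
print(J)                               # non-zero: detailed balance is violated
print(J.sum())                         # 0: currents always sum to zero
```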

3.2 Examples and exercises on Markov chains

3.2.1 Two-state Markov chain

Let us consider in some detail a Markov chain with two states. This is the simplest non-trivial state space. The most general two-state chain has a transition matrix of the form

Q = | 1−α  α   |
    | β    1−β |   (3.50)

In order to calculate the time-dependent probabilities p(n) for given initial probabilities p(0), it is often useful to use the diagonal or spectral representation of Q. Suppose that Q has distinct eigenvalues λ_1, λ_2. Then it is a standard result of matrix theory that we can find a 2×2 matrix U such that

Q = U | λ_1  0   |
      | 0    λ_2 | U^{−1}   (3.51)

where the columns q_1, q_2 of U are solutions of the equations

Q q_i = λ_i q_i   (i = 1, 2)   (3.52)

Hence we have

Q^n = U | λ_1^n  0     |
        | 0      λ_2^n | U^{−1}   (3.53)


Let us therefore carry out this representation for the two-state Markov chain. The eigenvalues of Q are the solutions of the characteristic equation

|Q − λI| = 0   (3.54)

Hence, λ_1 = 1 and λ_2 = 1 − α − β (and λ_1 ≠ λ_2, provided α + β ≠ 0). Note also that the eigenvalue λ_2 is in modulus less than unity unless α + β = 0 or α + β = 2. The corresponding eigenvectors are

λ_1 = 1 :   (1−α)x^1_1 + α x^1_2 = x^1_1,   β x^1_1 + (1−β)x^1_2 = x^1_2   ⇒ (x^1_1, x^1_2) = (1, 1)   (3.55)

λ_2 = 1−α−β :   (1−α)x^2_1 + α x^2_2 = (1−α−β)x^2_1,   β x^2_1 + (1−β)x^2_2 = (1−α−β)x^2_2   ⇒ (x^2_1, x^2_2) = (α, −β)   (3.56)

We may then take

U = | 1  α  |          U^{−1} = 1/(α+β) | β  α  |
    | 1  −β |   so                      | 1  −1 |   (3.57)

Hence,

Q = U | 1  0     |
      | 0  1−α−β | U^{−1}   (3.58)

Note that the rows of U^{−1} are given by the left eigenvectors (y^1_1, y^1_2) = (β, α) and (y^2_1, y^2_2) = (1, −1), associated to λ_1 = 1 and λ_2 = 1−α−β, respectively. Also, left and right eigenvectors are biorthogonal, as they should be. We can express the solution in terms of the eigenvectors of Q. One way to do that is to normalize the eigenvectors so that x^(i)·y^(j) = δ_{ij}, i.e. introduce e^(1) = x^(1)/(x^(1)·y^(1)) and e^(2) = x^(2)/(x^(2)·y^(2)), and expand (Q^n)_{ij} = ∑_{ℓ=1}^2 λ_ℓ^n e^(ℓ)_i y^(ℓ)_j. Alternatively, we can perform a matrix multiplication

Q^n = 1/(α+β) | 1  α  | | 1  0          | | β  α  |
              | 1  −β | | 0  (1−α−β)^n  | | 1  −1 |

    = 1/(α+β) | β  α | + (1−α−β)^n/(α+β) | α   −α |
              | β  α |                    | −β  β  |   (3.59)

As we said, one question that often arises is whether after a sufficiently long period of time the system settles down to a condition of statistical equilibrium in which the state occupation probabilities are independent of the initial condition. This is so if the matrix is regular, i.e. for α ≠ 0, β ≠ 0 and α + β ≠ 2. The equilibrium distribution is found from π = πQ:

απ_1 − βπ_2 = 0,   −απ_1 + βπ_2 = 0   (3.60)

Clearly, |I − Q| = 0, so one of the equations is redundant and another equation is required to fix π uniquely. The extra equation is provided by the normalization condition π_1 + π_2 = 1 for a probability distribution, so

π_1 = β/(α+β),   π_2 = α/(α+β)   (3.61)


The first term in (3.59) is constant and is seen to be

| π_1  π_2 |
| π_1  π_2 |   (3.62)

while the second term is a transient term and tends to zero rapidly as n increases, since |1 − α − β| < 1. Thus, as n → ∞,

Q^n → | π_1  π_2 |
      | π_1  π_2 |   (3.63)

and

p(n) → p(0) | π_1  π_2 | = (π_1, π_2) = π   (3.64)
            | π_1  π_2 |

In the case of two-state Markov chains we may use an alternative route to calculate the n-step transition probability from a given state i to another state j. We may exploit the relation Q^{n+1} = Q^n Q to get a recurrence equation for Q^(n)_{ij}. As an example, we write Q^(n+1)_{11} as

Q^(n+1)_{11} = Q^(n)_{12} β + Q^(n)_{11} (1−α)   (3.65)

We also know that Q^(n)_{11} + Q^(n)_{12} = 1, so by eliminating Q^(n)_{12} we get a recurrence relation for Q^(n)_{11}:

Q^(n+1)_{11} = β + (1−α−β) Q^(n)_{11},   Q^(0)_{11} = 1   (3.66)

This equation has the form of the recurrence relation

x_{n+1} = a x_n + b   (3.67)

The way one solves such a relation is by looking first for a constant solution x_n = x. Then x = ax + b, so provided a ≠ 1 we have x = b/(1−a). Now y_n = x_n − b/(1−a) satisfies y_{n+1} = a y_n, so y_n = a^n y_0. Thus the general solution for a ≠ 1 is given by

x_n = A a^n + b/(1−a)   (3.68)

where A = x_0 − b/(1−a) is constant. When a = 1 the general solution is obviously

x_n = x_0 + nb   (3.69)

Thus equation (3.66) has a unique solution

Q^(n)_{11} = β/(α+β) + α/(α+β) (1−α−β)^n   for α + β > 0,
Q^(n)_{11} = 1                              for α + β = 0   (3.70)
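Both the spectral representation (3.59) and the closed form (3.70) can be checked against direct matrix powers; a minimal sketch with illustrative values α = 0.3, β = 0.5 (any 0 < α + β < 2 works):

```python
import numpy as np

alpha, beta = 0.3, 0.5
Q = np.array([[1 - alpha, alpha],
              [beta, 1 - beta]])

def Q11_formula(n):
    # Eq. (3.70): Q^(n)_{11} = beta/(a+b) + alpha/(a+b) * (1 - a - b)^n
    s = alpha + beta
    return beta / s + (alpha / s) * (1 - alpha - beta) ** n

for n in range(10):
    assert np.isclose(np.linalg.matrix_power(Q, n)[0, 0], Q11_formula(n))

# Long-time limit (3.63): both rows of Q^n converge to pi = (beta, alpha)/(alpha+beta)
pi = np.array([beta, alpha]) / (alpha + beta)
assert np.allclose(np.linalg.matrix_power(Q, 50), np.vstack([pi, pi]))
```

The second assertion is the numerical counterpart of the transient term (1−α−β)^n dying out.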

Exercise (Virus Mutation) - Suppose a virus can exist in N different strains and in each generation either stays the same or with a probability α mutates to another strain, which is chosen at random. What is the probability that the strain in the n-th generation is the same as that in the 0-th?

Answer:   1/N + (1 − 1/N)(1 − αN/(N−1))^n   (3.71)
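The answer (3.71) follows from lumping the N-strain chain into a two-state one ("same strain as at generation 0" vs "different"); it can be checked against powers of the full N × N transition matrix. A sketch with illustrative values N = 4, α = 0.2:

```python
import numpy as np

N, alpha = 4, 0.2
# Stay with prob 1 - alpha; mutate to each of the other N-1 strains uniformly.
Q = np.full((N, N), alpha / (N - 1))
np.fill_diagonal(Q, 1 - alpha)

def p_same(n):
    # Eq. (3.71)
    return 1/N + (1 - 1/N) * (1 - alpha * N / (N - 1)) ** n

for n in range(10):
    assert np.isclose(np.linalg.matrix_power(Q, n)[0, 0], p_same(n))
```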

Exercise (Marbles in urns) - There are two white marbles in urn A and four red marbles in urn B which can be interchanged. At each step of the process a marble is selected at random from each urn and the two marbles selected are interchanged.

• Find the transition matrix Q, and obtain its eigenvalues and its left and right eigenvectors.

• What is the probability that there are two red marbles in urn A after three steps? And after many steps?

• Express the probability vector p(n) in terms of the left and right eigenvectors for this problem.

3.2.2 Three-state chain

Consider the three-state chain with transition matrix

Q = | 0    1    0   |
    | 0    1/2  1/2 |
    | 1/2  0    1/2 |   (3.72)

The problem is to find a general formula for Q^(n)_{11} and the invariant distribution. To find the invariant distribution we solve the vector equation π = πQ. This yields two independent scalar equations. The extra equation π_1 + π_2 + π_3 = 1 is needed to fix π uniquely and we find π = (1/5, 2/5, 2/5). To calculate Q^(n)_{11} we first compute the eigenvalues of Q from the characteristic equation and we get

λ_1 = 1,   λ_{2,3} = ± i/2   (3.73)

Having distinct eigenvalues, Q is diagonalizable, that is, for some invertible matrix U we get

Q = U | 1  0    0    |
      | 0  i/2  0    |
      | 0  0    −i/2 | U^{−1}   (3.74)

and hence

Q^n = U | 1  0        0         |
        | 0  (i/2)^n  0         |
        | 0  0        (−i/2)^n  | U^{−1}   (3.75)

which forces Q^(n)_{11} to have the form

Q^(n)_{11} = a + b (i/2)^n + c (−i/2)^n   (3.76)

for some constants a, b, c. The answer we want is real so it makes sense to rewrite Q^(n)_{11} in the form

Q^(n)_{11} = α + (1/2)^n [ β cos(nπ/2) + γ sin(nπ/2) ]   (3.77)


for constants α, β, γ. The first few values of Q^(n)_{11} are easy to obtain, so we get equations for α, β, γ:

1 = Q^(0)_{11} = α + β   (3.78)
0 = Q^(1)_{11} = α + γ/2   (3.79)
0 = Q^(2)_{11} = α − β/4   (3.80)

so α = 1/5, β = 4/5, γ = −2/5 and

Q^(n)_{11} = 1/5 + (1/2)^n [ (4/5) cos(nπ/2) − (2/5) sin(nπ/2) ]   (3.81)

Alternatively, we could have used the fact that Q^(n)_{ij} → π_j as n → ∞, for all i, j, to identify a = 1/5, as

lim_{n→∞} Q^(n)_{11} = π_1 = 1/5   (3.82)
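The closed form (3.81) can again be checked against direct matrix powers of Q from (3.72); a minimal NumPy sketch:

```python
import numpy as np

Q = np.array([[0, 1, 0],
              [0, 1/2, 1/2],
              [1/2, 0, 1/2]])

def Q11(n):
    # Eq. (3.81); the (1/2)^n factor is |i/2|^n, the trig part the phase of (±i)^n.
    return 1/5 + (1/2)**n * (4/5 * np.cos(n * np.pi / 2)
                             - 2/5 * np.sin(n * np.pi / 2))

for n in range(12):
    assert np.isclose(np.linalg.matrix_power(Q, n)[0, 0], Q11(n))
```

As n grows the oscillating term decays like 2^{−n} and Q^(n)_{11} approaches π_1 = 1/5, in agreement with (3.82).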

More generally, the following method may in principle be applied to find the formula for Q^(n)_{ij} for any M-state chain and any states i, j:

1. Compute the eigenvalues λ_1, ..., λ_M of Q by solving the characteristic equation.

2. If the eigenvalues are distinct then Q^(n)_{ij} has the form

Q^(n)_{ij} = a_1 λ_1^n + ... + a_M λ_M^n   (3.83)

for some constants a_1, ..., a_M (depending on i, j). If an eigenvalue λ is repeated (once, say) then the general form includes the term (a + bn)λ^n.

3. Complex conjugate eigenvalues are best written using sine and cosine.

Exercise (Random walk on a graph) - Consider a graph of N nodes with connectivity matrix c. We denote the local degrees k_i = ∑_j c_ij. A random walk on this graph is a Markov chain with transition matrix

Q_{ij} = 1/k_i   if c_ij = 1,
Q_{ij} = 0       if c_ij = 0   (3.84)

(a) Justify (3.84).

(b) Show that random walks on graphs are reversible Markov chains, with stationary distribution π = (π_1, ..., π_N), π_i = k_i / ∑_j k_j.
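Part (b) is easy to verify numerically: π_i Q_{ij} = c_ij / ∑_ℓ k_ℓ is symmetric in i, j, so detailed balance holds. A sketch on a small illustrative graph:

```python
import numpy as np

# Symmetric adjacency matrix of a small illustrative 4-node graph.
c = np.array([[0, 1, 1, 0],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [0, 1, 0, 0]])
k = c.sum(axis=1)        # local degrees k_i
Q = c / k[:, None]       # Eq. (3.84): Q_ij = c_ij / k_i

pi = k / k.sum()         # pi_i = k_i / sum_j k_j
assert np.allclose(pi @ Q, pi)                             # stationarity
assert np.allclose(pi[:, None] * Q, (pi[:, None] * Q).T)   # detailed balance
```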

Exercise A mouse lives in a house of three rooms, A, B, C. There are three doors between rooms A and B, two doors between rooms B and C, and one door between rooms A and C, as shown in figure (3.1).


Figure 3.1: Mouse’s house

At regular time intervals, a door in the room occupied by the mouse is opened at random, and the mouse is trained to change room each time. After the mouse changes room the door is closed.

(a) Find the transition matrix Q, i.e. the probabilities for the mouse to move from one room to another.

(b) Approximately what fraction of its time will the mouse spend in each room?

(c) If the mouse wants to spend the same fraction of its time in each room, it cannot make a transition each time a door is opened, but sometimes it will need to reject the proposed move and stay in the room it is occupying when the door is opened (see Fig. 3.2). In other words, we must introduce a probability A(Y|X) that the mouse accepts the proposed move from room X to room Y (and changes room) and a probability 1 − A(Y|X) that the mouse rejects the move (and stays in X).

Figure 3.2: New rules

We call M : X → Y the elementary move by which the mouse changes room, so Y = M(X). There is only a limited set D_X of moves M which can act on room X. Let us denote by |D_X| the number of moves that can act on state X (i.e. the number of doors through which the mouse can escape room X). Also, all the moves M are invertible, i.e. for each move M ∈ D_X there exists a unique M^{−1} such that M^{−1}[M(X)] = M[M^{−1}(X)] = X. We can write the transition probability to go from room X to room Y as

Q_{XY} = ∑_{M∈D_X} p(M|X) [ A(M(X)|X) δ_{Y,M(X)} + [1 − A(M(X)|X)] δ_{Y,X} ]   (3.85)

where p(M|X) is the probability that move M is proposed to the mouse when it is in room X. Since each time moves are drawn at random and with equal probabilities from those that are allowed to act (each door is opened with the same probability), we have

p(M|X) = 1/|D_X|   (3.86)

Working out the detailed balance condition, upon writing the equilibrium state π = (1/3, 1/3, 1/3), leads to the following condition for the acceptance probabilities:

(∀ X, ∀ M ∈ D_X)   p(M|X) A(M(X)|X) = p(M^{−1}|M(X)) A(X|M(X))   (3.87)

Find the transition matrix Q.

Hint: Acceptance probabilities of the form

A(Y|X) = |D_X| / (|D_X| + |D_Y|)   (3.88)

satisfy the relation

(1/|D_X|) A(Y|X) = (1/|D_Y|) A(X|Y)   (3.89)

The choice of acceptance probability is not unique. Equation (3.88) corresponds to the Glauber choice

A(Y|X) = P(M|Y) / [P(M|X) + P(M|Y)]   (3.90)

Another popular choice is the Metropolis one

A(Y|X) = min{ 1, P(M|Y)/P(M|X) }   (3.91)


3.3 One-step processes

In the previous section we considered processes in discrete time. Now we turn to continuous-time processes with discrete state space. The methods to be used will be similar, with transitions occurring between times i and i + 1 now replaced by the ones occurring in a small time interval (t, t + ∆t].

An important family of Markov processes in continuous time and discrete state space are the generation-recombination or birth-death processes, which in short we call one-step processes. For these processes the range consists of integers n and only jumps between adjacent states are permitted. We define g_n = W(n+1|n, t) as the probability per unit time for a jump from state n to state n+1, and r_n = W(n−1|n, t) as the probability per unit time for a jump from n to n−1. Therefore, for an infinitesimal time interval ∆t, the following transition probability laws hold:

P_{1|1}(n+1, t+∆t|n, t) = g_n ∆t + O(∆t²)
P_{1|1}(n−1, t+∆t|n, t) = r_n ∆t + O(∆t²)
P_{1|1}(n, t+∆t|n, t) = 1 − (g_n + r_n)∆t + O(∆t²)
P_{1|1}(n+k, t+∆t|n, t) = O(∆t²)   ∀ |k| > 1   (3.92)

We can easily write the Master equation for the evolution of the probability distribution p_n(t) that the system is in state n at time t. The probability of finding the system in state n at time t + ∆t is given by the sum of the probabilities of all previous states multiplied by the probability of transition to state n. For small ∆t,

p_n(t+∆t) = [r_{n+1} p_{n+1}(t) + g_{n−1} p_{n−1}(t)]∆t + [1 − (g_n + r_n)∆t] p_n(t)   (3.93)

Letting ∆t → 0 we find the master equation for one-step processes

ṗ_n = lim_{∆t→0} [p_n(t+∆t) − p_n(t)]/∆t = r_{n+1} p_{n+1} + g_{n−1} p_{n−1} − (r_n + g_n) p_n   (3.94)

One-step processes occur e.g. in birth and death of individuals, generation and recombination processes of charge carriers, absorption or emission of photons or particles, and arrivals and departures of customers. The name does not imply that it is impossible for n to jump by two or more units in a time ∆t, but only that the probability for this to happen is O(∆t²).

If n = 0 is a boundary, equation (3.94) is meaningless for n = 0 and has to be replaced with

ṗ_0 = r_1 p_1 − g_0 p_0   (3.95)

Alternatively, it is possible to declare (3.94) valid for n = 0 with the added definition r_0 = g_{−1} = 0. Similarly, an upper boundary N requires

ṗ_N = g_{N−1} p_{N−1} − r_N p_N   (3.96)

or r_{N+1} = g_N = 0.
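The master equation (3.94), with the boundary conventions just described built into the rate vectors, is straightforward to integrate numerically. A minimal sketch (the symmetric-walk rates and the Euler step size are illustrative choices):

```python
import numpy as np

def one_step_generator(g, r):
    """Generator W of (3.94): W[m, n] is the rate n -> m; columns sum to zero,
    so probability is conserved. Pass g[-1] = r[0] = 0 for closed boundaries."""
    N = len(g)
    W = np.zeros((N, N))
    for n in range(N):
        if n + 1 < N:
            W[n + 1, n] = g[n]       # gain: jump n -> n+1 at rate g_n
        if n > 0:
            W[n - 1, n] = r[n]       # gain: jump n -> n-1 at rate r_n
        W[n, n] = -(g[n] + r[n])     # loss: -(g_n + r_n) p_n
    return W

# Symmetric random walk on n = 0..10 with unit rates, cut off at the boundaries.
N = 11
g = np.ones(N); g[-1] = 0.0
r = np.ones(N); r[0] = 0.0
W = one_step_generator(g, r)

p = np.zeros(N); p[5] = 1.0          # start at n = 5
dt = 1e-3
for _ in range(2000):                # Euler-integrate dp/dt = W p up to t = 2
    p = p + dt * (W @ p)

assert np.isclose(p.sum(), 1.0)      # probability is conserved
```

Because each column of W sums to zero, ∑_n p_n is conserved exactly even by the crude Euler step.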

One-step processes can be subdivided into the following categories, depending on the properties of the coefficients r_n and g_n:


• Linear, if the coefficients are linear functions of n

• Non-linear, if the coefficients are non-linear functions of n

• Random-walks, if the coefficients are constant

3.3.1 The Poisson process

The Poisson process is a random walk over n = 0, 1, ... with steps to the right only, occurring at random times. It is defined by

r_n = 0,   g_n = q,   p_n(0) = δ_{n,0}   (3.97)

i.e. no recombination exists and q is constant. The Kronecker delta merely expresses that at time zero no events have yet occurred: the probability of zero events equals one, and the probability of one or more events equals zero.

The Poisson process gives the probability of n independent events having occurred by time t > 0. These events could be, for example, the tunneling of electrons through a single barrier (shot noise). If we indicate by N(t) the number of events occurring between the initial time t = 0 and time t, each sample function N(t) takes only integer values n = 0, 1, ... and is a succession of steps of unit height at random moments. These time points constitute a random set of dots on the time axis and their number between any two times t_1 and t_2 is distributed according to the Poissonian distribution. Since the events are independent, there is a probability q∆t of a step occurring in (t, t+∆t] regardless of what happened before, so N(t) is Markov. The Master equation for the Poisson process has the form

ṗ_n = q(p_{n−1} − p_n)   (3.98)

Master equations depending on discrete stochastic variables are most easily solved through the use of the generating function

F(z, t) = ∑_{n=0}^∞ p_n(t) z^n   (3.99)

Various moments of the stochastic variable n are obtained by taking derivatives of F(z, t) with respect to z and letting z → 1. For example

⟨n⟩ = lim_{z→1} ∂F(z,t)/∂z = ∑_{n=0}^∞ n p_n(t)   (3.100)

and

⟨n²⟩ − ⟨n⟩ = lim_{z→1} ∂²F(z,t)/∂z² = ∑_{n=0}^∞ (n² − n) p_n(t)   (3.101)

An alternative form of the last two equations is often convenient:

⟨n⟩ = lim_{z→1} ∂ log F(z,t)/∂z,   ⟨n²⟩ − ⟨n⟩² − ⟨n⟩ = lim_{z→1} ∂² log F(z,t)/∂z²   (3.102)


Multiplying the master equation by z^n and summing over n we get

∂F(z,t)/∂t = ∑_n z^n ṗ_n(t) = ∑_n z^n q(p_{n−1} − p_n) = qz ∑_{n=1}^∞ z^{n−1} p_{n−1}(t) − q ∑_{n=0}^∞ z^n p_n(t)   (3.103)

so

∂F(z,t)/∂t = q(z − 1) F(z,t)   (3.104)

and the solution

F(z,t) = F(z,0) e^{q(z−1)t}   (3.105)

contains an arbitrary function F(z,0) which is determined from the initial conditions. If we take as our initial distribution p_n(0) = δ_{n,0}, this is sufficient because, due to invariance under a shift in n, this also covers the case p_n(0) = δ_{n,m} for every m, and by suitable superposition of these any initial distribution can be reproduced. We then get F(z,0) = 1 and we find p_n(t) by expanding F(z,t) in powers of z:

F(z,t) = e^{q(z−1)t} = e^{qzt} e^{−qt} = ∑_{n=0}^∞ [(qzt)^n / n!] e^{−qt} = ∑_{n=0}^∞ z^n p_n(t)   (3.106)

where

p_n(t) = (qt)^n e^{−qt} / n!   (3.107)
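The Poisson result (3.107) can be confirmed by directly Euler-integrating the master equation (3.98) on a truncated state space; a minimal sketch with illustrative values q = 1, t = 2:

```python
import numpy as np
from math import factorial

q, t_final, dt = 1.0, 2.0, 1e-4
N = 40                               # truncate the state space (qt = 2 << N)
p = np.zeros(N); p[0] = 1.0          # initial condition p_n(0) = delta_{n,0}

for _ in range(int(t_final / dt)):
    dp = np.empty_like(p)
    dp[0] = -q * p[0]
    dp[1:] = q * (p[:-1] - p[1:])    # dp_n/dt = q (p_{n-1} - p_n)
    p = p + dt * dp

# Compare with p_n(t) = (qt)^n e^{-qt} / n!
exact = np.array([(q * t_final)**n * np.exp(-q * t_final) / factorial(n)
                  for n in range(N)])
assert np.allclose(p, exact, atol=1e-3)
```

The mean of the integrated distribution also reproduces ⟨n⟩ = qt, as follows from (3.100) and (3.105).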

We now show that the time intervals between successive events (times at which unit steps occur) are (independently) distributed with the exponential probability density function q e^{−qt}. Set t_0 as the new time origin. An event may or may not have occurred at t_0. Any property of the process after t_0 is independent of what happened at t_0 or before. Let t_0 + t be the time at which the first event occurs after t_0. We calculate the probability distribution of t. Define P(T) = Prob(t > T).

P(T+∆T) = Prob(t > T+∆T) = Prob(t > T and no event occurs in (t_0+T, t_0+T+∆T])
        = Prob(t > T) Prob(no event occurs in (t_0+T, t_0+T+∆T] | t > T)
        = Prob(t > T) Prob(no event occurs in (t_0+T, t_0+T+∆T])
        = P(T)(1 − q∆T + O(∆T²))   (3.108)

or

P′(T) = −q P(T)   →   P(T) = e^{−qT}   (3.109)

where we used

P(0) = Prob(t > 0) = 1   (3.110)

So the cumulative distribution of the waiting time is

Prob(t ≤ T) = 1 − P(T) = 1 − e^{−qT}   (3.111)

The probability density W follows as

W(T) = d/dT [1 − P(T)] = q e^{−qT}   (3.112)

Because ⟨t⟩ = q^{−1}, it follows that the transition probability per unit time in the master equation equals the inverse of the expected waiting time for the occurrence of an event.


3.4 Examples and exercises on One step processes

3.4.1 The decay process

An example of a linear one-step process is the decay process.

Exercise Consider a piece of radioactive material. The number of active nuclei surviving at time t > 0, N(t), is a non-stationary Markov process. Find the time evolution of the average number of surviving nuclei.

Answer: Let p_n(t) be the probability that there are n surviving nuclei at time t. If γ is the decay probability per unit time for one nucleus, the transition probability from m to n in a short time ∆t is

P_{1|1}(n, t+∆t|m, t) = δ_{m,n}(1 − γm∆t) + δ_{m−1,n} γm∆t + O(∆t²)   (3.113)

which leads to the master equation

ṗ_n(t) = γ(n+1) p_{n+1}(t) − γn p_n(t)   (3.114)

We now apply a device for linear Master equations which consists in multiplying both sides of (3.114) by n and summing over n, thus obtaining

∑_{n=0}^∞ n ṗ_n = γ ∑_{n=0}^∞ n(n+1) p_{n+1} − γ ∑_{n=0}^∞ n² p_n = γ ∑_{n=0}^∞ (n−1)n p_n − γ ∑_{n=0}^∞ n² p_n = −γ ∑_{n=0}^∞ n p_n   (3.115)

Thus we have found the mean-field equation for N(t)

d⟨N(t)⟩/dt = −γ⟨N(t)⟩   (3.116)

Solving the above equation with ⟨N(0)⟩ = n_0 gives

⟨N(t)⟩ = n_0 e^{−γt}   (3.117)
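Since the nuclei are independent and each has an exponential lifetime with rate γ (the waiting-time result of Section 3.3.1), the exponential decay (3.117) can be checked by a simple Monte Carlo sketch; γ, n_0, t and the seed are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(0)
gamma, n0, t = 0.5, 10000, 2.0

# Each nucleus decays at rate gamma: lifetimes are exponential with mean 1/gamma.
lifetimes = rng.exponential(1 / gamma, size=n0)
survivors = np.sum(lifetimes > t)

expected = n0 * np.exp(-gamma * t)   # Eq. (3.117)
assert abs(survivors - expected) < 5 * np.sqrt(expected)
```

The sample count of survivors fluctuates around n_0 e^{−γt} with binomial fluctuations of order √(n_0 e^{−γt}).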

3.4.2 Birth-death process

Linear and non-linear birth-death processes occur very commonly in chemistry and population dynamics. Let us consider a population of m bacteria at time t such that

• The probability of a bacterium dying in time t → t + ∆t is r_m ∆t

• The probability of a bacterium being created in time t → t + ∆t is g_m ∆t

• The probability of no change in the number of bacteria in time t → t + ∆t is 1 − (g_m + r_m)∆t

• The probability of more than one birth or death in time t → t + ∆t is zero.


Exercise Obtain a partial differential equation for the generating function.

Answer: We can write

P_{1|1}(n, t+∆t|m, t) = [1 − (g_m + r_m)∆t] δ_{n,m} + (g_m δ_{n,m+1} + r_m δ_{n,m−1})∆t   (3.118)

Let us now assume that the probability of a birth or death is proportional to the number of bacteria present, so that g_m = mg and r_m = mr. We obtain the master equation for the linear birth-death process

∂P_1(n,t)/∂t = (n−1)g P_1(n−1, t) + (n+1)r P_1(n+1, t) − n(g + r) P_1(n, t)   (3.119)

It is linear because the coefficients on the right-hand side are linear in n. The solution can again be found by using the generating function

F(z, t) = ∑_{n=0}^∞ p_n(t) z^n   (3.120)

In terms of F the master equation reads

∂F/∂t = (z−1)(gz − r) ∂F/∂z   (3.121)

which is a first-order linear partial differential equation and may be solved by using the method of characteristics.

which is a first-order linear partial differential equation and may be solved by using the methodsof characteristics.

3.4.3 Chemical reaction

An example of a nonlinear one-step process is a chemical reaction of the form

X → 2X   with rate k   (3.122)
X ← 2X   with rate k′   (3.123)

In this case, for state n, the generation probability per unit time is g_n = kn. For the reverse reaction, one can assemble pairs of molecules of X in n(n−1) ways, thus r_n = n(n−1)k′.

Exercise Show that the Master equation is

ṗ_n = k′ n(n+1) p_{n+1} + k(n−1) p_{n−1} − kn p_n − k′ n(n−1) p_n   (3.124)

and show that the equation for the average ⟨n⟩ is not closed (due to nonlinearity).

3.4.4 Continuous time random walk

Here we present a continuous-time formulation of the random walk, which can be described by a master equation. A walker takes steps left or right with a probability per unit time q, which means that the walker waits at each point for a variable time.


For the unbounded symmetrical random walk r_n = g_n = q. The transition probability from m to n in a short time ∆t is

P(n, t+∆t|m, t) = q∆t δ_{n,m+1} + q∆t δ_{n,m−1} + (1 − 2q∆t) δ_{n,m}   (3.125)

and the master equation results:

∂p_n(t)/∂t = q p_{n−1}(t) + q p_{n+1}(t) − 2q p_n(t)   (3.126)

The constant q can be absorbed into the time unit. This simple example often suffices to illustrate more complicated processes. In particular it contains the essential features of a physical diffusion process.

Exercise Show, by using the generating function, that the solution of (3.126) is

p_n(t) = e^{−2qt} ∑_{l=0}^∞ (qt)^{2l+|n|} / [l!(l+|n|)!] = e^{−2qt} I_{|n|}(2qt)   (3.127)

where I_n is the n-th modified Bessel function. Verify that ⟨n(t)⟩ = 0 and ⟨n²(t)⟩ = 2qt, typical of diffusion processes.
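The diffusive moments ⟨n⟩ = 0 and ⟨n²⟩ = 2qt can be checked by Euler-integrating (3.126) on a lattice window wide enough that essentially no probability reaches the edges; a minimal sketch with q = 1:

```python
import numpy as np

q, t_final, dt = 1.0, 1.0, 1e-4
L = 30                                   # window n = -L..L; edge mass negligible
n = np.arange(-L, L + 1)
p = np.zeros(n.shape, dtype=float)
p[L] = 1.0                               # start at n = 0

for _ in range(int(t_final / dt)):
    # dp_n/dt = q (p_{n-1} + p_{n+1} - 2 p_n); np.roll wraps at the edges,
    # which is harmless here since the edge probabilities are ~0.
    p = p + dt * q * (np.roll(p, 1) + np.roll(p, -1) - 2 * p)

assert np.isclose(p.sum(), 1.0)
assert np.isclose((n * p).sum(), 0.0, atol=1e-8)      # <n> = 0 by symmetry
assert np.isclose((n**2 * p).sum(), 2 * q * t_final, atol=1e-2)  # <n^2> = 2qt
```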

3.4.5 Diffusion-annihilation on complex networks

Consider the diffusion-annihilation process A + A → 0 on a complex network of size N, fully defined by the adjacency matrix c_ij, which takes the value c_ij = 1 if vertices i and j are connected by an edge, and 0 otherwise. The network is undirected, so c_ij = c_ji. We denote

k_i = ∑_j c_ij   the degree of node i
p(k) = (1/N) ∑_i δ_{k,k_i}   the degree distribution
⟨k⟩ = ∑_k k p(k)   the average degree
W(k, k′) = ∑_{ij} c_ij δ_{k,k_i} δ_{k′,k_j} / (N⟨k⟩)   the degree correlation   (3.128)

Each vertex in the network can host at most one A particle and the dynamics of the process is defined as follows: each particle jumps at a certain rate λ to a randomly chosen nearest neighbour. If the latter is empty the particle fills it, leaving the first vertex empty. If the nearest neighbour is occupied the two particles annihilate, leaving both vertices empty.

Exercise Let n_i(t) be a binary random variable taking value 0 if vertex i is empty and 1 if vertex i is occupied. The state at time t is completely defined by the state vector n(t) = (n_1(t), n_2(t), ..., n_N(t)).

(a) Assuming that the time evolution of the particles follows a Poisson process, show that the evolution of n_i(t) after a time increment dt can be expressed as

n_i(t+dt) = n_i(t) θ(dt) + [1 − n_i(t)] σ(dt)

where θ(dt) and σ(dt) are binary random variables taking values

θ(dt) = 0 with probability λdt [1 + ∑_j c_ij n_j(t)/k_j],   and 1 otherwise

and

σ(dt) = 1 with probability λdt ∑_j c_ij n_j(t)/k_j,   and 0 otherwise

(b) Find an equation for ⟨n_i(t+dt)⟩|_{n(t)}, i.e. for the average evolution of the system, conditioned on the knowledge of its state at the previous time step. By taking the average over all the possible configurations n, derive the following master equation for the density ρ_i(t) = ⟨n_i(t)⟩ of A particles at site i:

dρ_i(t)/dt = −ρ_i(t) + ∑_j c_ij (1/k_j) [ρ_j(t) − 2ρ_ij(t)]

where ρ_ij(t) = ⟨n_i(t) n_j(t)⟩.

(c) Assume now that all the vertices with the same degree are statistically equivalent, i.e.

ρ_i(t) ≡ ρ_k(t)   ∀ i such that ∑_ℓ c_iℓ = k
ρ_ij(t) ≡ ρ_{kk′}(t)   ∀ i, j such that ∑_ℓ c_iℓ = k and ∑_ℓ c_jℓ = k′

Show that the density ρ_k(t) of A particles on nodes of degree k evolves according to

dρ_k(t)/dt = −ρ_k(t) + ⟨k⟩/p(k) ∑_{k′} [ρ_{k′}(t) − 2ρ_{kk′}(t)] W(k, k′)/k′


Chapter 4

Topics in equilibrium and nonequilibrium statistical mechanics

Here we set up the machinery for a microscopic probabilistic description of matter, i.e. lay the foundations of statistical mechanics. We shall consider classical systems with a large number of degrees of freedom, such as N interacting particles in a box or N interacting objects on a lattice. The motion of such objects is described by Newton’s laws or, equivalently, by Hamiltonian dynamics. In 3 dimensions, such a system is specified by 6N independent position and momentum coordinates. In the 6N-dimensional phase space the state of the system is given by a single point, which moves according to Hamiltonian dynamics as the state of the system changes. If we are given a real N-particle system, we never know exactly what its state is. We only know with a certain probability that it is one of the points in the phase space. Thus, the state point can be regarded as a stochastic variable and we can assign a probability distribution to the points in phase space in accordance with our knowledge of the state of the system.

We can then view the phase space as filled with a probability fluid which flows according to Hamiltonian dynamics. In this way we obtain a connection between the mechanical description and a probabilistic description, and the problem of finding an equation of motion for the probability density reduces to a problem in fluid mechanics. For Hamiltonian systems, the phase space probability distribution evolves according to the Liouville equation.

The N-body probability density for a classical system contains much more information about the system than we would ever need or want. In practice, the main use of the probability density is to find expectation values or correlation functions for various observables, since those are what we measure experimentally and what we deal with in thermodynamics. As a consequence of the discrete nature of matter, the system receives random “kicks” from density fluctuations continually, even when it is in thermodynamic equilibrium, so systems in equilibrium undergo fluctuations about their equilibrium state. Information about these fluctuations is contained in the dynamic correlation functions of the equilibrium system, and in the power spectrum of the equilibrium fluctuations. The power spectrum is a quantity often measured in experiments. The time reversal invariance of the underlying Newtonian dynamics of the various degrees of freedom puts constraints on the behaviour of the dynamic equilibrium fluctuations and the time-dependent correlation functions that characterize these fluctuations.



Systems which are out of equilibrium generally return to the equilibrium state through a variety of processes which may or may not be coupled to one another. Generally there are a few degrees of freedom that decay back to equilibrium very slowly. Fluctuations about the equilibrium state decay on average according to the same macroscopic laws that govern the decay of a non-equilibrium system to the equilibrium state. If we can probe equilibrium fluctuations we have a means of probing the decay processes in a system. Linear response theory provides a tool for probing equilibrium fluctuations by applying a weak external field which couples to the system. The system responds to the field in a manner that depends entirely on the spectrum of equilibrium fluctuations. The response to the dynamic field is measured by the susceptibility. The fluctuation-dissipation theorem links the susceptibility to the correlation function for equilibrium fluctuations. The spectrum of equilibrium fluctuations determines the rate of absorption of energy from the external field.

4.1 Equilibrium

Consider an isolated system whose energy is specified to lie in a narrow range. Let Ω be thenumber of states accessible to the system. In equilibrium this system is equally likely to befound in any of these states (postulate of equal a priori probability).

If a removal of constraints results in increasing the number of states accessible to the system, Ω_f > Ω_i, then immediately after the constraints are removed the system occupies only a fraction P = Ω_i/Ω_f of the Ω_f states now accessible. This is not an equilibrium situation. If Ω_f ≫ Ω_i then there is a pronounced tendency for the situation to change in time until the systems in the ensemble are distributed equally over all the possible Ω_f states (an equilibrium distribution of systems over accessible states is attained). In this case equilibrium does not prevail at all stages of the process and the process is irreversible (the removal of constraints cannot restore the initial situation). Irreversible thermodynamics comprises the phenomenological study of the approach to equilibrium and the behaviour of, and conditions for, a steady state in a system close to equilibrium.

4.2 Probability distributions in dynamical systems

The state of a closed classical system with 3N degrees of freedom (for example, N particles in a three-dimensional box) is specified by a set of 6N independent real variables X^N = (q_1, ..., q_N, p_1, ..., p_N), where q_j and p_j are the position and momentum of the j-th particle. The time evolution of (q_i, p_i) is governed by the Hamiltonian H(X^N, t):

q̇_i = ∂H/∂p_i,   ṗ_i = −∂H/∂q_i   (4.1)

If the state X^N is known at time t, it is completely determined for any other time (the motion is deterministic, so it is uniquely determined from the initial conditions). If the Hamiltonian does not depend explicitly on time, i.e. H = H(X^N), the system is conservative and the energy is conserved. Let Γ be the 6N-dimensional phase space. The state vector X^N is a point in Γ.


As the state evolves in time, X^N traces out a trajectory in Γ (different trajectories do not cross because the system is deterministic).

When we deal with real systems we can never specify the state of the system exactly. There is always some uncertainty in the initial conditions. So we consider X^N as a stochastic variable and introduce a probability density ρ(X^N, t) on the phase space, where ρ(X^N, t) dX^N is the probability that the state point X^N lies in the volume element [X^N, X^N + dX^N] at time t (dX^N = dq_1 ... dq_N dp_1 ... dp_N). We introduce a picture of Γ filled with a fluid of state points. Each point of the fluid is assigned a probability in accordance with our knowledge of the system and carries this probability for all time (probability is conserved). The change in our knowledge corresponds to the fluid flowing. State points must always lie somewhere, so

\[ \int dX_N \, \rho(X_N, t) = 1 \qquad (4.2) \]

If at some time there is only a small uncertainty in the state of the system, the probability density will be sharply peaked in the region where the state is known to be located and zero elsewhere. As time passes, either the probability remains sharply peaked (although the peak region can move) and we do not lose knowledge about the state of the system, or it spreads out and becomes rather uniformly distributed (all knowledge is lost). The probability behaves like a fluid in phase space, so we shall use fluid mechanics to obtain equations for probability densities.

Consider a small volume V0 at a fixed location in Γ. Since probability is conserved, the total decrease of probability in V0 per unit time is entirely due to the flow of probability through the surface S of V0:

\[ \frac{dP(V_0)}{dt} = \frac{\partial}{\partial t} \int_{V_0} dX_N \, \rho(X_N, t) = -\oint_S dS_N \cdot \rho(X_N, t)\, \dot X_N \qquad (4.3) \]

By using Gauss' theorem we get

\[ \frac{\partial}{\partial t} \int_{V_0} dX_N \, \rho(X_N, t) = -\int_{V_0} dX_N \, \nabla_{X_N} \cdot \left( \rho(X_N, t)\, \dot X_N \right) \qquad (4.4) \]

with

\[ \nabla_{X_N} = \left( \frac{\partial}{\partial q_1}, ..., \frac{\partial}{\partial q_N}, \frac{\partial}{\partial p_1}, ..., \frac{\partial}{\partial p_N} \right) \qquad (4.5) \]

Thus we obtain the continuity equation

\[ \partial_t \rho(X_N, t) + \nabla_{X_N} \cdot \left( \rho(X_N, t)\, \dot X_N \right) = 0 \qquad (4.6) \]

which we rewrite as

\[ \frac{\partial \rho}{\partial t} + \sum_{i=1}^{N} \left( \frac{\partial}{\partial q_i} (\rho\, \dot q_i) + \frac{\partial}{\partial p_i} (\rho\, \dot p_i) \right) = 0 \qquad (4.7) \]

The second term on the LHS evaluates to

\[ \dot q_i \frac{\partial \rho}{\partial q_i} + \dot p_i \frac{\partial \rho}{\partial p_i} \qquad (4.8) \]


because

\[ \rho \left( \frac{\partial \dot q_i}{\partial q_i} + \frac{\partial \dot p_i}{\partial p_i} \right) = \rho \left( \frac{\partial^2 H}{\partial q_i \partial p_i} - \frac{\partial^2 H}{\partial p_i \partial q_i} \right) = 0 \qquad (4.9) \]

so finally

\[ \frac{\partial \rho}{\partial t} + \sum_{i=1}^{N} \left( \dot q_i \frac{\partial \rho}{\partial q_i} + \dot p_i \frac{\partial \rho}{\partial p_i} \right) = 0 \qquad (4.10) \]

Note that this equation gives the time rate of change of ρ(XN, t) at a fixed point in phase space. We can rewrite the continuity equation in terms of the operator

\[ \mathcal{L} = \sum_{i=1}^{N} \left( \frac{\partial H}{\partial p_i} \frac{\partial}{\partial q_i} - \frac{\partial H}{\partial q_i} \frac{\partial}{\partial p_i} \right) \qquad (4.11) \]

as

\[ \frac{\partial \rho}{\partial t} = -\mathcal{L} \rho \qquad (4.12) \]

or, by introducing the Liouville operator \( \hat L = -i\mathcal{L} \), as

\[ i \frac{\partial \rho}{\partial t} = \hat L \rho \qquad (4.13) \]

which is called the Liouville equation. The Liouville operator is a Hermitian differential operator. If we know the probability distribution at time t = 0, the probability density at time t is

\[ \rho(X_N, t) = e^{-i \hat L t}\, \rho(X_N, 0) \qquad (4.14) \]

A probability density which remains constant (stationary) in time satisfies \( \mathcal{L}\rho_{stat} = 0 \). The Liouville equation implies that the volume of a region of phase space which evolves according to Hamilton's equations is conserved. Following Gibbs, the probability density is often interpreted in terms of the concept of an ensemble of systems. If we look at each system at a given time, it will be represented by a point in the 6N-dimensional phase space; the density of points representing an ensemble of n systems in phase space is given by nρ(XN, t). Equation (4.14) gives the general equation for the evolution of the probability density of a classical dynamical system. However, it differs in an important respect from the Master equation we derived in the previous chapter. The Liouville operator is Hermitian, so it has real eigenvalues; the solutions (4.14) will therefore oscillate and not decay to a unique equilibrium state. Furthermore, if we reverse time we do not change the equation of motion for the probability density, since the Liouville operator \( \hat L \) changes sign under time reversal. This is different from the Master equation and the corresponding Fokker-Planck equation, which change into new equations under time reversal. Equation (4.14) does not admit an irreversible decay of the system to a unique equilibrium state, as we commonly observe in nature. The problem of obtaining irreversible decay to equilibrium from the Liouville equation is one of the central problems of statistical physics and is beyond the scope of this course.

4.3 Detailed balance

In this section we prove detailed balance for closed, isolated, classical systems, i.e. for systems which can be described in terms of a Hamiltonian, for which energy is a constant of motion (so that the trajectories in Γ are confined to single energy shells) and for which no exchange of matter takes place (so that the set of microscopic variables is fixed).

The law of microscopic reversibility states that in equilibrium a process and its reverse occur with the same frequency. Consider a classical system with N degrees of freedom; a microstate (configuration) is represented by a point in the 2N-dimensional phase space. Assume that the phase point moves in the course of time t from X = (q, p) to Xt = (q', p') under Hamiltonian dynamics. Hamiltonian systems are time-reversal invariant, so reversal of all momenta will cause the system to retrace its steps: if the system starts in (q', −p'), a time t later it will be in (q, −p). The time-reversal operation conserves the volume in Γ, so P(q', p', t|q, p, 0) dq dp = P(q, −p, t|q', −p', 0) dq' dp'.

Let Y(q, p) be a set of macroscopic observables. Let A be the set of microscopic configurations (phase points) such that Y(q, p) = y, and let B be the set of phase points for which Y(q, p) = y'. Assume the Y are even functions of the momenta, Y(q, −p) = Y(q, p). Then

\[ P(y', t|y, 0)\, P(y, 0) = \int_{Y(q,p)=y} dq\, dp \int_{Y(q',p')=y'} dq'\, dp'\, P(q', p', t|q, p, 0)\, P(q, p, 0) \]
\[ = \int_{Y(q,p)=y} dq\, dp \int_{Y(q',p')=y'} dq'\, dp'\, P(q, -p, t|q', -p', 0)\, P(q, p, 0) \qquad (4.15) \]

In equilibrium, by the postulate of equal a priori probability, P(q, p) = P(q', p') = 1/Ω(E), where Ω(E) is the volume of the energy shell, so

\[ P(y', t|y, 0)\, P(y, 0) = \int_{Y(q',-p')=y'} dq'\, dp' \int_{Y(q,-p)=y} dq\, dp\, P(q, -p, t|q', -p', 0)\, P(q', p', 0) \]
\[ = P(y, t|y', 0)\, P(y', 0) \qquad (4.16) \]

In a discrete state space,

\[ W_{nn'}\, p_{n'} = W_{n'n}\, p_n \qquad \forall\, n, n' \qquad (4.17) \]

Detailed balance holds for closed, isolated, classical systems in equilibrium provided that

• the Hamiltonian is an even function of all momenta (if H is not even in the momenta, e.g. because of a magnetic field, we could proceed as before, but we would have to reverse the field as well);

• the observable Y is an even function of all momenta.

For systems which are not in equilibrium we cannot apply the law of microscopic reversibility, and detailed balance does not hold. However, a system can be brought to a steady state by outside influences (gradients maintained by reservoirs). In this case the weaker condition

\[ \sum_{n'} W_{nn'}\, p_{n'} = \sum_{n'} W_{n'n}\, p_n \qquad (4.18) \]

holds, and the master equation then gives

\[ \frac{dp_n}{dt} = \sum_{n'} \left( W_{nn'}\, p_{n'} - W_{n'n}\, p_n \right) = 0 \qquad (4.19) \]
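These conditions are easy to check numerically. The sketch below is an illustrative example, not from the notes: the three-state system, its energies, and the Metropolis-like rates are all assumptions. It builds rates satisfying detailed balance (4.17) with respect to a Boltzmann distribution and verifies that the stationarity condition (4.19) follows.

```python
import numpy as np

# Hypothetical 3-state example: W[n, m] is the rate for the jump m -> n,
# chosen to satisfy detailed balance with a Boltzmann distribution p_n.
E = np.array([0.0, 1.0, 2.0])            # assumed energies, beta = 1
p = np.exp(-E); p /= p.sum()             # equilibrium distribution
W = np.zeros((3, 3))
for n in range(3):
    for m in range(3):
        if n != m:
            # Metropolis-like rates: W_{nm} p_m = W_{mn} p_n by construction
            W[n, m] = min(1.0, p[n] / p[m])

# Detailed balance (4.17): W_{nn'} p_{n'} = W_{n'n} p_n for every pair
assert np.allclose(W * p[None, :], (W * p[None, :]).T)

# Master equation (4.19): dp_n/dt = sum_{n'} (W_{nn'} p_{n'} - W_{n'n} p_n) = 0
dpdt = W @ p - W.sum(axis=0) * p
assert np.allclose(dpdt, 0.0)
```

Any rates satisfying (4.17) make every term of the sum in (4.19) vanish separately, which is a stronger statement than stationarity (4.18) alone.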


4.4 Time dependent correlations

Microscopic reversibility implies that the time-dependent correlation functions of the mesoscopic fluctuations x = (x_1, ..., x_N), with x_i = X_i − X_i^0, of the state variables X_i about their equilibrium values X_i^0, obey the relation

\[ \langle x_i(0)\, x_j(t) \rangle = \langle x_i(t)\, x_j(0) \rangle \qquad (4.20) \]

In fact, we have for the correlation matrix

\[ C(t) = \langle x(0)\, x(t) \rangle = \int dx\, dx'\, x\, x'\, P(x, 0)\, P(x', t|x, 0) \]
\[ = \int dx\, dx'\, x\, x'\, P(x', 0)\, P(x, t|x', 0) = \langle x(t)\, x(0) \rangle \qquad (4.21) \]

so C(t) = C^T(t). For systems governed by a stationary distribution (such as systems in equilibrium) we also have

\[ C(t) = \langle x(t)\, x(0) \rangle = \langle x(t+T)\, x(T) \rangle = \langle x(0)\, x(-t) \rangle = \langle x(-t)\, x(0) \rangle^T = C^T(-t) \qquad (4.22) \]

where in the last line we set T = −t. Microscopic reversibility and stationarity therefore imply

\[ C(t) = C(-t) \qquad (4.23) \]

4.5 Wiener-Khintchine theorem

The Wiener-Khintchine theorem relates the correlation matrix of the time-dependent fluctuations of a quantity X(t) to its spectral density matrix, for ergodic systems with stationary distribution functions. The spectral density matrix of a fluctuating quantity X(t) is defined as

\[ S(\omega) = \lim_{T \to \infty} \frac{1}{T}\, |X(\omega; T)|^2 \quad \text{where} \quad X(\omega; T) = \int_{-T}^{T} dt\, e^{-i\omega t}\, X(t; T) \qquad (4.24) \]

and

\[ X(t; T) = \begin{cases} X(t) & |t| < T \\ 0 & |t| \ge T \end{cases} \qquad (4.25) \]

Because X(t) is real,

\[ |X(\omega; T)|^2 = X^*(\omega; T)\, X(\omega; T) = X(-\omega; T)\, X(\omega; T) \qquad (4.26) \]

so

\[ S(\omega) = \lim_{T \to \infty} \frac{1}{T} \int_{-T}^{T} dt\, e^{-i\omega t}\, X(t; T) \int_{-T}^{T} dt'\, e^{i\omega t'}\, X(t'; T) \]
\[ = \lim_{T \to \infty} \frac{1}{T} \int_{-T}^{T} dt \int_{-T-t}^{T-t} d\tau\, e^{i\omega \tau}\, X(t; T)\, X(t+\tau; T) \]
\[ = \lim_{T \to \infty} \frac{1}{T} \int_{-\infty}^{\infty} d\tau\, e^{i\omega \tau} \int_{-\infty}^{\infty} dt\, X(t; T)\, X(t+\tau; T) \qquad (4.27) \]

Page 65: Dynamical Analysis of Complex Systems 7CCMCS04 · 2011-05-05 · contains many major concepts which are central of dynamical analysis of stochastic processes •Chapman-Kolmogorov

4.5. WIENER-KHINTCHINE THEOREM 65

If we now invoke the ergodic theorem, we can equate the time average to the ensemble average and thus express the correlation function of the fluctuating quantity X(t) as

\[ C(\tau) = \langle X(t)\, X(t+\tau) \rangle = \lim_{T \to \infty} \frac{1}{T} \int_{-T}^{T} dt\, X(t)\, X(t+\tau) = \lim_{T \to \infty} \frac{1}{T} \int_{-\infty}^{\infty} dt\, X(t; T)\, X(t+\tau; T) \qquad (4.28) \]

We can then relate the spectral density matrix to the correlation matrix (Wiener-Khintchine theorem):

\[ S(\omega) = \int_{-\infty}^{\infty} d\tau\, e^{i\omega \tau}\, C(\tau) \qquad (4.29) \]

From time-reversal invariance C(τ) = C(−τ), so

\[ S(\omega) = \int d\tau\, \cos(\omega \tau)\, C(\tau) = S(-\omega) \qquad (4.30) \]

We conclude that the spectral density matrix S(ω) is a real and even function of ω. This implies that

\[ C(\tau) = \frac{1}{2\pi} \int d\omega\, e^{-i\omega \tau}\, S(\omega) = \frac{1}{2\pi} \int d\omega\, \cos(\omega \tau)\, S(\omega) \qquad (4.31) \]

4.5.1 White noise

The term white noise is applied, in analogy with the case of light, to any fluctuating quantity X with δ-correlation

〈X(t)X(t′)〉 = δ(t− t′) (4.32)

because the spectral density is then independent of the frequency ω:

\[ S(\omega) = \frac{1}{2\pi} \int_{-\infty}^{\infty} \delta(\tau)\, e^{-i\omega \tau}\, d\tau = \frac{1}{2\pi} \]

In the case of light, the frequencies correspond to the different colors, and if we perceive light as white it is found that in practice all colors are present in equal proportions (the optical spectrum of white light is thus flat), at least within the visible range. If the spectral density depends on the frequency, one uses the term colored noise.

White noise cannot actually exist (one simple demonstration is the analogue of the ultraviolet catastrophe: a flat spectrum carries infinite total power). However, models with an almost flat S(ω) are very useful and much used. A typical one is the Lorentzian

\[ S(\omega) = \frac{1}{\pi\, (\omega^2 \tau_C^2 + 1)} \]

which is flat provided ω ≪ τ_C^{-1}. The Fourier transform can be explicitly evaluated in this case (by residues) to give

\[ C(\tau) = \frac{1}{\tau_C}\, e^{-|\tau|/\tau_C} \]

so that the autocorrelation function vanishes for τ ≫ τ_C, which is called the correlation time of the fluctuating quantity X. Thus the delta-function correlation appears as an idealization, valid only on sufficiently long time scales.
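This picture can be explored with a simple simulation (a sketch, not from the notes; the Ornstein-Uhlenbeck-type update rule below is an assumed model of colored noise with correlation time τ_C). The autocorrelation estimated as a time average decays as e^{−τ/τ_C}:

```python
import numpy as np

rng = np.random.default_rng(0)
tau_c, dt, n = 1.0, 0.01, 1_000_000

# Exact discrete-time update of an Ornstein-Uhlenbeck process with
# correlation time tau_c and unit stationary variance (colored noise)
a = np.exp(-dt / tau_c)
kicks = rng.normal(size=n) * np.sqrt(1 - a**2)
x = np.empty(n)
x[0] = rng.normal()
for i in range(1, n):
    x[i] = a * x[i - 1] + kicks[i]

def corr(lag):
    """Time-average estimate of C(lag*dt) = <X(t) X(t + lag*dt)>."""
    return np.mean(x[: n - lag] * x[lag:]) if lag else np.mean(x * x)

taus = np.array([0.0, 0.5, 1.0, 2.0])
C = np.array([corr(int(round(tau / dt))) for tau in taus])
assert np.allclose(C / C[0], np.exp(-taus / tau_c), atol=0.05)
```

On frequencies ω ≪ 1/τ_C such a process is indistinguishable from white noise, which is exactly the sense in which the δ-correlation is a long-time idealization.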

Page 66: Dynamical Analysis of Complex Systems 7CCMCS04 · 2011-05-05 · contains many major concepts which are central of dynamical analysis of stochastic processes •Chapman-Kolmogorov

66CHAPTER 4. TOPICS IN EQUILIBRIUM AND NONEQUILIBRIUM STATISTICAL MECHANICS

4.6 Fluctuation-Dissipation theorem

In the next few sections we examine the relaxation toward equilibrium of a system that has been brought out of equilibrium by an external perturbation. The main result is that for small deviations from equilibrium this relaxation is described by equilibrium time correlation functions (Onsager regression law), i.e. the response of a system in thermodynamic equilibrium to a small applied force is the same as its response to a spontaneous fluctuation. It is therefore possible to probe the equilibrium fluctuations by applying a weak external field which couples to particles in the medium but is too weak to affect the medium. The system will respond to the field and absorb energy from it in a manner which depends entirely on the spectrum of the equilibrium fluctuations. The fluctuation-dissipation theorem connects the irreversible dissipation of energy into heat to the reversible thermal fluctuations at thermodynamic equilibrium.

It was Einstein who observed, in 1905, in his paper on Brownian motion, that the same random forces that cause the erratic motion of a particle in Brownian motion would also cause drag if the particle were pulled through the fluid. In other words, the fluctuations of the particle at rest have the same origin as the dissipative friction force one must do work against when perturbing the system in a particular direction.

4.6.1 Linear response

In this section we study the small deviations from equilibrium driven by a small perturbation applied to the system, when the equilibrium situation is described by an unperturbed classical Hamiltonian H0(q, p). Here (q, p) is shorthand for the full set of canonical variables (q1, ..., qN, p1, ..., pN), N being the number of degrees of freedom. The equilibrium average of a classical dynamical variable A(q, p) is

\[ \langle A \rangle_{eq} = \int dq\, dp\, A(q, p)\, P_{eq}(q, p) \qquad (4.33) \]

where the probability density Peq is the normalized Boltzmann weight

\[ P_{eq} = \frac{e^{-\beta H_0(q, p)}}{Z_0}, \quad \text{with} \quad Z_0 = \int dq\, dp\, e^{-\beta H_0} \qquad (4.34) \]

Let us perturb the Hamiltonian H0 by applying a constant perturbation coupled to the observable B, such that the energy changes from H0 to H1 = H0 + δH, with δH = −B f.

f is often called the (external) field (or force, from the forced harmonic oscillator, where perturbations take the form δH = −f x). Let ⟨...⟩ denote the average computed with the perturbed Hamiltonian. We define the variation ⟨δA⟩ of the response ⟨A⟩ of the system,

\[ \langle \delta A \rangle = \langle A \rangle - \langle A \rangle_{eq} = \frac{\int dq\, dp\, A\, e^{-\beta(H_0 + \delta H)}}{\int dq\, dp\, e^{-\beta(H_0 + \delta H)}} - \frac{\int dq\, dp\, A\, e^{-\beta H_0}}{\int dq\, dp\, e^{-\beta H_0}} \qquad (4.35) \]

to a variation of the force f. Since we are interested in small deviations from the equilibrium Hamiltonian H0, i.e. δH small, we can restrict to the limit f → 0 and expand e^{−βδH} = 1 − βδH + ... = 1 + βBf + .... Truncating the expansion to linear order in the force f, we get the linear response

\[ \langle \delta A \rangle = \frac{\int dq\, dp\, A\, e^{-\beta H_0} (1 - \beta \delta H)}{Z_0} \left( 1 + \beta\, \frac{\int dq\, dp\, \delta H\, e^{-\beta H_0}}{Z_0} \right) - \langle A \rangle_{eq} \]
\[ = -\beta \langle A\, \delta H \rangle_{eq} + \beta \langle A \rangle_{eq} \langle \delta H \rangle_{eq} = \beta f \langle A B \rangle_{eq} - \beta f \langle A \rangle_{eq} \langle B \rangle_{eq} = \beta f\, \langle\langle A B \rangle\rangle_{eq} \qquad (4.36) \]
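As a quick numerical illustration (a sketch with an assumed toy Hamiltonian, not part of the notes): take H0 = x²/2 with β = 1 and B = A = x. The perturbed weight e^{−x²/2 + fx} is, by completing the square, a unit-variance Gaussian of mean f, so (4.36) predicts ⟨δx⟩ = βf⟨⟨x x⟩⟩_eq = f.

```python
import numpy as np

rng = np.random.default_rng(1)
beta, f, n = 1.0, 0.2, 1_000_000

# Unperturbed ensemble for H0 = x**2/2 at beta = 1: x ~ N(0, 1)
x0 = rng.normal(0.0, 1.0, size=n)
# Perturbed ensemble for H1 = x**2/2 - f*x: completing the square gives N(f, 1)
x1 = rng.normal(f, 1.0, size=n)

lhs = x1.mean() - x0.mean()          # <delta A> measured directly
rhs = beta * f * np.var(x0)          # beta f <<A B>>_eq from eq. (4.36)
assert abs(lhs - rhs) < 0.01
```

The two estimates agree to within Monte Carlo error, confirming that the deviation is linear in f with the connected correlation as the susceptibility.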

The deviation from equilibrium is linearly related to the perturbation. So far we considered H1 time-independent. Now assume that the perturbation f has been on for an infinite time, during which the system equilibrated with the perturbed Hamiltonian, i.e. with the equilibrium probability distribution

\[ P_1 = \frac{e^{-\beta H_1(q, p)}}{\int dq\, dp\, e^{-\beta H_1(q, p)}} \qquad (4.37) \]

and that it is switched off at t = 0 (this introduces an explicit time dependence). We are interested in measurements on the system for t > 0, as it relaxes to equilibrium. Let

\[ H = \begin{cases} H_0 + \delta H \equiv H_1 & t < 0 \\ H_0 & t \ge 0 \end{cases} \qquad (4.38) \]

The expectation value of A(t) calculated over the nonequilibrium ensemble is given by the average over all possible dynamical paths originating from initial configurations at t = 0, weighted with their equilibrium distribution P1. It is convenient to write A(t) = A(t; q, p), where q = q(0) and p = p(0), as a function of the initial conditions. The change in the measured quantity, for βf ≪ 1, is

\[ \langle \delta A(t) \rangle = \frac{\int dq\, dp\, A(t; q, p)\, e^{-\beta H_0(q,p) + \beta B(0) f}}{\int dq\, dp\, e^{-\beta H_0(q,p) + \beta B(0) f}} - \frac{\int dq\, dp\, A(t; q, p)\, e^{-\beta H_0(q,p)}}{\int dq\, dp\, e^{-\beta H_0(q,p)}} \]
\[ = \beta f \left( \langle A(t) B(0) \rangle_{eq} - \langle A(t) \rangle_{eq} \langle B(0) \rangle_{eq} \right) = \beta f\, C_{AB}(t) \qquad (4.39) \]

where we have defined the Kubo function

\[ C_{AB}(t) = \langle\langle A(t)\, B(0) \rangle\rangle_{eq} \qquad (4.40) \]

Note that C_AB(t) = C_BA(−t), because in the absence of the force the system is invariant under time shifts: ⟨⟨A(t)B(0)⟩⟩_eq = ⟨⟨A(0)B(−t)⟩⟩_eq. Equation (4.39) is normally referred to as the Onsager regression law.

For general time dependent perturbing forces f we write the linear response as

\[ \langle \delta A(t) \rangle = \int_{-\infty}^{t} dt'\, R_{AB}(t, t')\, f(t') \qquad (4.41) \]

where we have introduced the retarded Green function

\[ R_{AB}(t, t') = \frac{\delta \langle A(t) \rangle}{\delta f(t')} \qquad (4.42) \]


which gives the impulse response function, i.e. the response of ⟨A(t)⟩ at time t to a small impulse at time t' in the field f(t) = f δ(t − t') conjugate to B.

Rather often, results are presented in the frequency domain. One defines the Fourier transform and its inverse as

\[ A(\omega) = \int dt\, e^{i\omega t}\, A(t), \qquad A(t) = \int \frac{d\omega}{2\pi}\, e^{-i\omega t}\, A(\omega) \qquad (4.43) \]

The linear susceptibility χ_AB(ω) is the Fourier transform of (4.42). The function R_AB and its Fourier transform have the following properties:

• Because of the stationarity of the unperturbed system,

\[ R_{AB}(t, t') = R_{AB}(t - t') \qquad (4.44) \]

• Since the response must be causal (it cannot precede the force that causes it, i.e. perturbations cannot propagate backwards in time),

\[ R_{AB}(t - t') = 0 \quad \text{for } t < t' \qquad (4.45) \]

• Finally, the response is real, so

\[ \chi_{AB}(\omega) = \chi^*_{AB}(-\omega) \qquad (4.46) \]

Thus, if we introduce the real and imaginary part of the Fourier component of the response

\[ \chi_{AB}(\omega) = \chi'_{AB}(\omega) + i \chi''_{AB}(\omega), \qquad \chi'_{AB}(\omega) = \mathrm{Re}\, \chi_{AB}(\omega), \quad \chi''_{AB}(\omega) = \mathrm{Im}\, \chi_{AB}(\omega) \qquad (4.47) \]

we immediately get

\[ \chi'_{AB}(\omega) = \chi'_{AB}(-\omega), \qquad \chi''_{AB}(\omega) = -\chi''_{AB}(-\omega) \qquad (4.48) \]

i.e. the real part of the susceptibility is an even function of ω, while the imaginary part is odd.

One can also find relations between the real and the imaginary part of the susceptibility, the Kramers-Kronig relations, which are a direct consequence of the causality of the response.

Using the Fourier representation, the convolution (4.41) is transformed into a product

\[ \langle \delta A(\omega) \rangle = R_{AB}(\omega)\, f(\omega) \qquad (4.49) \]

and we find that a force of a given frequency can only excite a response at the same frequency. This is no longer true if the response depends nonlinearly on the force (nonlinear response).

For the constant external perturbation switched off at t = 0, i.e. f(t) = f θ(−t), equation (4.41) specializes to

\[ \langle \delta A(t) \rangle = f \int_{-\infty}^{0} dt'\, R_{AB}(t - t') = f \int_{t}^{\infty} d\tau\, R_{AB}(\tau) \qquad (4.50) \]


Differentiating (4.50) with respect to time and using (4.39) then gives

\[ R_{AB}(t) = \begin{cases} -\beta\, \dfrac{d}{dt} C_{AB}(t) & t \ge 0 \\ 0 & t < 0 \end{cases} \qquad (4.51) \]

or

\[ R_{AB}(t) = -\beta\, \frac{d}{dt} C_{AB}(t)\, \theta(t) \qquad (4.52) \]
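The consistency of (4.50)-(4.52) can be verified on a concrete correlation function. The sketch below is an illustrative check (the exponential Kubo function C(t) = e^{−t/T} is an assumption): integrating R(τ) = −β C'(τ) from t to ∞ recovers ⟨δA(t)⟩ = βf C(t), as the Onsager regression law (4.39) requires.

```python
import sympy as sp

t, tau, T, beta, f = sp.symbols('t tau T beta f', positive=True)

C = sp.exp(-t / T)                 # assumed Kubo function with decay time T
R = -beta * sp.diff(C, t)          # response function (4.52) for t >= 0

# Relaxation after switching off a constant force, eq. (4.50):
deltaA = f * sp.integrate(R.subs(t, tau), (tau, t, sp.oo))

# Onsager regression (4.39): <delta A(t)> = beta f C(t)
assert sp.simplify(deltaA - beta * f * C) == 0
```

The check uses C(∞) = 0; a correlation with a nonzero long-time plateau would contribute a constant offset to the relaxation.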

4.6.2 The fluctuation-dissipation theorem

Now we restrict to a single dynamical variable A, and write simply R for R_AA, so that (4.52) reads

\[ R(t) = -\beta\, \theta(t)\, \frac{d}{dt} C(t) \qquad (4.53) \]

The fluctuation-dissipation theorem (FDT) provides a relation between the response function and the equilibrium autocorrelation function:

\[ \frac{\partial \chi}{\partial t} = R(t) = -\frac{1}{T}\, \frac{\partial C(t)}{\partial t} \qquad (4.54) \]

(the Boltzmann constant has been set to one here, so β = 1/T).

Often, the FDT is written in terms of the two-time dynamical quantities R(t, t') and C(t, t') = ⟨A(t)A(t')⟩, where t' is normally thought of as a waiting time measured from the preparation of the system. In equilibrium, due to time-translation invariance, C(t, t') = C(t − t') and R(t, t') = R(t − t'), and the FDT takes the form

\[ R(t, t') = \beta\, \frac{\partial}{\partial t'} C(t - t')\, \theta(t - t') \qquad (4.55) \]

The FDT can also be expressed in the frequency domain, where it relates the imaginary part of the response to oscillatory perturbations to the spectral density of the autocorrelation function:

\[ \chi''(\omega) = \mathrm{Im}\, \chi(\omega) = \int_{-\infty}^{\infty} dt\, R(t)\, \sin(\omega t) \]
\[ = -\beta \int_{0}^{\infty} dt\, \frac{d}{dt} \langle\langle \delta A(0)\, \delta A(t) \rangle\rangle\, \sin(\omega t) \]
\[ = \beta \int_{0}^{\infty} dt\, \langle\langle \delta A(0)\, \delta A(t) \rangle\rangle\, \omega \cos(\omega t) = \frac{\beta \omega}{2}\, S(\omega) \qquad (4.56) \]

where we have used the definition of the Fourier transform, the fluctuation theorem (4.39), integration by parts, the autocorrelation property C(t) = C(−t) and the Wiener-Khintchine theorem (4.31). Hence,

\[ C(t) = \frac{1}{\beta} \int \frac{d\omega}{\pi}\, \frac{\chi''(\omega)}{\omega}\, e^{-i\omega t} = \frac{1}{\beta} \int \frac{d\omega}{\pi}\, \frac{\chi''(\omega)}{\omega}\, \cos(\omega t) \qquad (4.57) \]

where we used χ''(ω) = −χ''(−ω). The results (4.56) and (4.57) apply to correlations and responses of two dynamical variables A, B that have the same symmetry under time reversal, i.e. such that ⟨δA(0)δB(t)⟩ = ⟨δA(t)δB(0)⟩, so that C(t) = C(−t) at stationarity.
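Relation (4.56) can be checked numerically on a concrete case (an illustrative sketch, with the assumed correlation C(t) = e^{−|t|/τ} and β = 1): χ''(ω) computed from R(t) = −β Ċ(t) θ(t) should equal βω S(ω)/2, where S(ω) = ∫ e^{iωt} C(t) dt = 2τ/(1 + ω²τ²).

```python
import numpy as np

beta, tau = 1.0, 1.0
t, dt = np.linspace(0.0, 60.0, 600001, retstep=True)
C = np.exp(-t / tau)                  # assumed equilibrium autocorrelation
R = -beta * np.gradient(C, t)         # response R(t) = -beta dC/dt for t >= 0

for w in (0.3, 1.0, 3.0):
    chi2 = np.sum(R * np.sin(w * t)) * dt   # chi''(w), first line of (4.56)
    S = 2 * tau / (1 + (w * tau) ** 2)      # spectral density of e^{-|t|/tau}
    assert abs(chi2 - beta * w * S / 2) < 1e-3
```

For this exponential correlation both sides equal βωτ/(1 + ω²τ²), so the agreement is limited only by discretization error.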

We will show in the next section that the imaginary part of the response is linked to dissipation.


4.6.3 Power absorption

The work per unit time done by an external force f(t) to change the state of the system x by an amount dx is

\[ \frac{dW}{dt} = f(t)\, \frac{dx}{dt} \qquad (4.58) \]

The power absorbed by the system equals the average rate at which work is done on the system

\[ P(t) = \left\langle \frac{dW}{dt} \right\rangle = f(t)\, \frac{d}{dt} \langle x \rangle = f(t)\, \frac{d}{dt} \int dt'\, R(t - t')\, f(t') \qquad (4.59) \]

If we rewrite the right hand side in terms of Fourier transforms f(ω) and χ(ω) we obtain

\[ P(t) = f(t)\, \frac{d}{dt} \int \frac{d\omega}{2\pi}\, \chi(\omega)\, f(\omega)\, e^{-i\omega t} \]
\[ = -i f(t) \int \frac{d\omega}{2\pi}\, \omega\, \chi(\omega)\, f(\omega)\, e^{-i\omega t} \]
\[ = -i \int \frac{d\omega\, d\omega'}{4\pi^2}\, e^{-i(\omega + \omega') t}\, f(\omega')\, \chi(\omega)\, f(\omega)\, \omega \qquad (4.60) \]

We can now compute the power absorbed and the total energy absorbed for various types of external forces.

Delta function force

Let us assume that a delta-function force f(t) = f δ(t) is applied at t = 0. Then

\[ f(\omega) = f \qquad (4.61) \]

Substituting into (4.60), we obtain

\[ P(t) = -i f^2 \int \frac{d\omega\, d\omega'}{4\pi^2}\, e^{-i(\omega + \omega') t}\, \chi(\omega)\, \omega = -i f^2\, \delta(t) \int \frac{d\omega}{2\pi}\, e^{-i\omega t}\, \chi(\omega)\, \omega \qquad (4.62) \]

We can find the total energy absorbed by integrating over all times

\[ W_{abs} = \int_{-\infty}^{\infty} dt\, P(t) = -i f^2\, \frac{1}{2\pi} \int_{-\infty}^{\infty} d\omega\, \omega\, \chi(\omega) \qquad (4.63) \]

Since χ'(ω) = χ'(−ω) and χ''(−ω) = −χ''(ω), only the imaginary part contributes, as it should, since the total energy absorbed must be a real quantity:

\[ W_{abs} = f^2\, \frac{1}{2\pi} \int_{-\infty}^{\infty} d\omega\, \omega\, \chi''(\omega) \qquad (4.64) \]


Oscillating force

Now let us consider a monochromatic oscillating force with pulsation ω0

\[ f(t) = f \cos(\omega_0 t) = f\, \frac{e^{-i\omega_0 t} + e^{i\omega_0 t}}{2} \qquad (4.65) \]

so that

\[ f(\omega) = \pi f \left[ \delta(\omega - \omega_0) + \delta(\omega + \omega_0) \right] \qquad (4.66) \]

One has

\[ \langle x(t) \rangle = f \left[ \chi'(\omega_0) \cos(\omega_0 t) + \chi''(\omega_0) \sin(\omega_0 t) \right] \qquad (4.67) \]

From experience with the forced harmonic oscillator, we know that dissipation is governed by the part of the response which is in quadrature with the driving force (i.e. out of phase by π/2), so the dissipative part of the response is controlled by the imaginary part χ''(ω0) of χ(ω0), whereas its in-phase, reversible part is controlled by the real part χ'(ω0). Using the Fourier transform we obtain

\[ P(t) = -\frac{i f^2}{4} \left[ \omega_0\, \chi(\omega_0) \left( e^{-2i\omega_0 t} + 1 \right) - \omega_0\, \chi(-\omega_0) \left( e^{2i\omega_0 t} + 1 \right) \right] \qquad (4.68) \]

The average over a period T = π/ω0 gives

\[ W_{abs} = -\frac{i f^2 \omega_0^2}{4\pi} \int_0^{\pi/\omega_0} dt \left[ \chi(\omega_0) \left( e^{-2i\omega_0 t} + 1 \right) - \chi(-\omega_0) \left( e^{2i\omega_0 t} + 1 \right) \right] = -\frac{i f^2 \omega_0}{4} \left[ \chi(\omega_0) - \chi(-\omega_0) \right] \qquad (4.69) \]

Using the symmetry properties of χ′(ω) and χ′′(ω) we get

\[ W_{abs} = \frac{f^2 \omega_0}{2}\, \chi''(\omega_0) \qquad (4.70) \]

So the average power absorbed depends on the imaginary part of the response function. In principle, the average power absorbed can be measured, and therefore χ''(ω) can be measured. Finally, since the fluctuation-dissipation theorem relates χ''(ω) to the spectral density of equilibrium fluctuations, it is possible to probe equilibrium fluctuations by applying a weak external field to the system.

Exercise — Consider a forced one-dimensional harmonic oscillator with mass m, natural frequency ω0 and damping constant γ:

\[ \ddot x(t) + \frac{\gamma}{m}\, \dot x + \omega_0^2\, x = \frac{f(t)}{m} \qquad (4.71) \]

The equilibrium position of the oscillator in the absence of the force (i.e. for f(t) = 0) is x = 0. Assume the average displacement from the equilibrium position in the presence of the force f(t) is

\[ \langle x(t) \rangle = \int dt'\, R(t - t')\, f(t') \qquad (4.72) \]

where R(t) is the response function. Define the dynamical susceptibility χ(ω) as the Fourier transform of the response function,

\[ R(t) = \int \frac{d\omega}{2\pi}\, \chi(\omega)\, e^{-i\omega t} \qquad (4.73) \]


(a) Write the explicit expressions for the real and imaginary parts of χ(ω), χ'(ω) and χ''(ω) respectively.

(b) Find the location of the poles ω1 and ω2 of χ(ω) in the complex ω plane.

(c) Find the response function R(t) in the cases γ/m < 2ω0 (underdamped) and γ/m > 2ω0 (overdamped).

(d) Starting from the work per unit time done by the external force on the oscillator,

\[ \frac{dW}{dt} = f(t)\, \dot x(t) \qquad (4.74) \]

and taking a periodic force

\[ f(t) = f \cos(\omega_0 t) \qquad (4.75) \]

show that the instantaneous power absorbed by the medium is

\[ P(t) = -\frac{i f^2}{4} \left[ \omega_0\, \chi(\omega_0) \left( e^{-2i\omega_0 t} + 1 \right) - \omega_0\, \chi(-\omega_0) \left( e^{2i\omega_0 t} + 1 \right) \right] \qquad (4.76) \]

(e) Show that the time average over a period T = π/ω0 is

\[ W_{abs} = \frac{f^2 \omega_0}{2}\, \chi''(\omega_0) \qquad (4.77) \]

(f) By invoking the fluctuation-dissipation theorem, find an expression for the correlation function ⟨x x⟩_eq of the fluctuations about the equilibrium position in the absence of the external force.


Chapter 5

Fokker Planck

5.1 Kramers-Moyal expansion

When x is a continuous variable and changes in x take place in small jumps, the Kramers-Moyal expansion of the master equation casts this integro-differential equation into the form of a differential equation of infinite order. This is not in itself easier to handle, but under certain conditions one may break the expansion off after a suitable number of terms. When this is done after the second-order terms, one gets a partial differential equation of second order for P1(x, t) called the Fokker-Planck equation.

Let us first express the transition probability as a function of the jump size ξ = x − x': W(x|x') = W(x − x'|x') ≡ W(ξ|x'). The master equation then reads

\[ \partial_t P_1(x, t) = \int d\xi \left[ W(\xi|x - \xi)\, P_1(x - \xi, t) - W(-\xi|x)\, P_1(x, t) \right] \]
\[ = \int d\xi\, W(\xi|x - \xi)\, P_1(x - \xi, t) - P_1(x, t) \int d\xi\, W(-\xi|x) \qquad (5.1) \]

where the sign change associated with the change of variable x' → ξ = x − x' is absorbed in the integration boundaries.¹

If we now assume that changes in the coordinate x take place only in small jumps, the transition probability per unit time W(x|x') decreases rapidly as |x − x'| increases, i.e. W(ξ|x − ξ) is a sharply peaked function of ξ but varies slowly enough with x. A second assumption is that P1(x, t) itself also varies slowly with x. Now expand W(ξ|x − ξ) P1(x − ξ, t) in powers of ξ:

\[ \partial_t P_1(x, t) = \int d\xi\, W(\xi|x)\, P_1(x, t) + \int d\xi \sum_{m=1}^{\infty} \frac{(-1)^m}{m!}\, \xi^m\, \frac{\partial^m}{\partial x^m} \left[ W(\xi|x)\, P_1(x, t) \right] - P_1(x, t) \int d\xi\, W(-\xi|x) \qquad (5.2) \]

¹ By considering a symmetric integration interval extending from −∞ to +∞,

\[ \int_{-\infty}^{\infty} dx'\, f(x') = -\int_{x+\infty}^{x-\infty} d\xi\, f(x - \xi) = \int_{-\infty}^{\infty} d\xi\, f(x - \xi) \]

Moreover, since finite integration limits would incorporate an additional dependence on x, we shall restrict ourselves to problems for which the boundary is irrelevant.


The first and the last term on the RHS cancel, and we are left with

\[ \frac{\partial P_1(x, t)}{\partial t} = \sum_{m=1}^{\infty} \frac{(-1)^m}{m!}\, \frac{\partial^m}{\partial x^m} \left[ \int d\xi\, \xi^m\, W(\xi|x) \right] P_1(x, t) \qquad (5.3) \]

Finally, on introducing the jump moments

\[ a^{(m)}(x) = \int d\xi\, \xi^m\, W(\xi|x) \qquad (5.4) \]

one gets the Kramers-Moyal expansion of the master equation

\[ \frac{\partial P_1(x, t)}{\partial t} = \sum_{m=1}^{\infty} \frac{(-1)^m}{m!}\, \frac{\partial^m}{\partial x^m} \left[ a^{(m)}(x, t)\, P_1(x, t) \right] \qquad (5.5) \]

Formally, equation (5.5) is identical to the master equation and is therefore not easier to deal with, but it suggests that one may break off the expansion after a suitable number of terms. For instance, there may be situations where, for m > 2, a^{(m)}(x, t) is identically zero or negligible. In this case one is left with

\[ \frac{\partial P_1(x, t)}{\partial t} = -\frac{\partial}{\partial x} \left[ a^{(1)}(x, t)\, P_1(x, t) \right] + \frac{1}{2}\, \frac{\partial^2}{\partial x^2} \left[ a^{(2)}(x, t)\, P_1(x, t) \right] \qquad (5.6) \]

which is the celebrated Fokker-Planck equation. The first term is called the drift or transport term and the second the diffusion term, while a^{(1)}(x, t) and a^{(2)}(x, t) are the drift and diffusion "coefficients". By construction the Fokker-Planck equation is always a linear equation in P1; hence the adjective "linear" is available for use in a different sense. We shall call a Fokker-Planck equation linear if the drift coefficient is a linear function of x, a^{(1)} = A0 + A1 x, and the diffusion coefficient is constant.

It is worth recalling that, like the master equation, the Fokker-Planck equation applies to the conditional probability P*1(x, t) = P1|1(x, t|x0, t0) of every subprocess that can be extracted from a Markov stochastic process by imposing the initial condition P*1(x, t0) = δ(x − x0).

It is important to note that in many physical cases the size of the jumps is determined by some discrete finite parameter, and then the assumptions inherent in the derivation of the Fokker-Planck equation do not apply.

5.1.1 The jump moments

The main use of the FP equation is as an approximate description of any Markov process whose individual jumps are small. Although the FP equation still cannot be solved explicitly except in a few special cases, it can be set up for any actual stochastic process with a minimum of knowledge about the underlying mechanism: it does not require knowledge of the entire kernel W(ξ|x) but merely of the functions a^{(1)} and a^{(2)}.

In order to calculate a^{(m)}(x, t) we use the relation (2.26) between W and the transition probability for short time differences. Firstly, we write W(x'|x) = W(ξ|x) with x' = x + ξ and rewrite the jump moments as

\[ a^{(m)}(x, t) = \int dx'\, (x' - x)^m\, W(x'|x) \qquad (5.7) \]


We introduce the quantity

\[ M^{(m)}(x; \Delta t, t) = \int dx'\, (x' - x)^m\, P(x', t + \Delta t\,|\,x, t) \qquad (m \ge 1) \qquad (5.8) \]

which is the average of [X(t + Δt) − X(t)]^m with sharp initial value X(t) = x (a conditional average). Then, using the short-time transition probability (2.26), one can write

\[ M^{(m)}(x; \Delta t, t) = \int dx'\, (x' - x)^m \left\{ \delta(x' - x) \left[ 1 - a^{(0)}(x, t)\, \Delta t \right] + W(x'|x)\, \Delta t \right\} + O(\Delta t^2) \]
\[ = \Delta t \int dx'\, (x' - x)^m\, W(x'|x) + O(\Delta t^2) = a^{(m)}(x, t)\, \Delta t + O(\Delta t^2) \qquad (5.9) \]

Therefore one can calculate the jump moments from the derivatives of the conditional averages as follows:

\[ a^{(m)}(x, t) = \left. \frac{\partial}{\partial \Delta t}\, M^{(m)}(x; \Delta t, t) \right|_{\Delta t = 0} \qquad (5.10) \]

Finally, on writing

\[ M^{(m)}(x; \Delta t, t) = \left\langle [X(t + \Delta t) - X(t)]^m \right\rangle \big|_{X(t) = x} \qquad (5.11) \]

one can alternatively express the jump moments as

\[ a^{(m)}(x, t) = \lim_{\Delta t \to 0} \frac{1}{\Delta t} \left\langle [X(t + \Delta t) - X(t)]^m \right\rangle_{X(t) = x} \qquad (5.12) \]

Therefore, setting up a Fokker-Planck equation only requires computing the average change ⟨Δx⟩ and its mean square ⟨(Δx)²⟩ to first order in Δt. To this end the actual equation of motion needs to be solved only over a short interval Δt, which can be done by some perturbation theory. The Fokker-Planck equation then serves to find the long-time behaviour. In Chapter 6, which is devoted to the Langevin equation, we shall calculate the corresponding jump moments in terms of the short-time conditional averages by means of this formula.
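This recipe can be tried out on synthetic data (a sketch with an assumed overdamped Langevin process dX = −kX dt + (2D)^{1/2} dW, not from the notes). The short-time conditional moments (5.12) recover the drift a^{(1)}(x) = −kx and the diffusion a^{(2)} = 2D from the trajectory alone:

```python
import numpy as np

rng = np.random.default_rng(3)
k, D, dt, n = 1.0, 0.5, 1e-3, 1_000_000

# Euler-Maruyama simulation of dX = -k X dt + sqrt(2 D) dW
x = np.empty(n)
x[0] = 0.0
noise = np.sqrt(2 * D * dt) * rng.normal(size=n - 1)
for i in range(1, n):
    x[i] = x[i - 1] - k * x[i - 1] * dt + noise[i - 1]

dx_ = np.diff(x)
# Drift: <dx | x>/dt = -k x, so a linear fit of dx against x gives slope -k
slope = np.sum(x[:-1] * dx_) / np.sum(x[:-1] ** 2) / dt
# Diffusion: <dx**2 | x>/dt -> a2 = 2D as dt -> 0
a2 = np.mean(dx_ ** 2) / dt

assert abs(slope + k) < 0.15     # a1(x) = -k x
assert abs(a2 - 2 * D) < 0.05    # a2(x) = 2D
```

The estimates carry O(Δt) bias and O(1/√n) statistical error, which is why (5.12) is stated as a Δt → 0 limit.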

5.1.2 Expression for the multivariate case

The above formulae can be extended to a multi-component Markov process Xi(t), i = 1, 2, ..., N. The Kramers-Moyal expansion is in this case given by the multivariate Taylor expansion

\[ \frac{\partial P(x, t)}{\partial t} = \sum_{m=1}^{\infty} \frac{(-1)^m}{m!} \sum_{j_1 ... j_m} \frac{\partial^m}{\partial x_{j_1} \cdots \partial x_{j_m}} \left[ a^{(m)}_{j_1 ... j_m}(x, t)\, P(x, t) \right] \qquad (5.13) \]

while the Fokker-Planck equation is then given by

\[ \frac{\partial P(x, t)}{\partial t} = -\sum_i \frac{\partial}{\partial x_i} \left[ a^{(1)}_i(x, t)\, P(x, t) \right] + \frac{1}{2} \sum_{ij} \frac{\partial^2}{\partial x_i \partial x_j} \left[ a^{(2)}_{ij}(x, t)\, P(x, t) \right] \qquad (5.14) \]

The jump moments are given by the natural generalization of (5.4),

\[ a^{(m)}_{j_1 ... j_m}(x) = \int dx'\, (x'_{j_1} - x_{j_1}) \cdots (x'_{j_m} - x_{j_m})\, W(x'|x) \qquad (5.15) \]

and can be calculated by means of the corresponding generalization of (5.12),

\[ a^{(m)}_{j_1 ... j_m}(x, t) = \lim_{\Delta t \to 0} \frac{1}{\Delta t} \left\langle \prod_{\mu=1}^{m} \left[ X_{j_\mu}(t + \Delta t) - X_{j_\mu}(t) \right] \right\rangle_{X(t) = x} \qquad (5.16) \]


5.1.3 Deterministic processes: Liouville equation

Neglecting the fluctuations means, for the Fokker-Planck equation, neglecting the diffusion term. In this case the equation reduces to the Liouville equation

\[ \frac{\partial P(x, t|y, t')}{\partial t} = -\sum_i \frac{\partial}{\partial x_i} \left[ a^{(1)}_i(x, t)\, P(x, t|y, t') \right] \qquad (5.17) \]

which occurs in classical mechanics. This equation describes a completely deterministic motion, i.e. it is equivalent to the system of differential equations

\[ \frac{dx(t)}{dt} = a^{(1)}(x(t), t) \qquad (5.18) \]

In fact, if a system is governed by (5.18), it follows from Section 4.2 that the equation of motion for the probability P(x, t) to find the system in state x is

\[ \frac{\partial P}{\partial t} = -\sum_i \frac{\partial}{\partial x_i} \left( \dot x_i\, P \right) = -\sum_i \frac{\partial}{\partial x_i} \left( a^{(1)}_i\, P \right) \qquad (5.19) \]

It is easy to check that if z(y,t) is the solution of (5.18) with initial condition z(y,t') = y, then the solution to (5.17) with the initial condition

p(x, t′|y, t′) = δ(x− y) (5.20)

is

p(x,t|y,t') = \delta[x - z(y,t)]   (5.21)

Thus, if the particle is in a well-defined initial position y at time t', it stays on the trajectory obtained by solving the ordinary differential equation.

Exercise Prove this result by direct substitution.

Exercise Consider a particle moving under the effect of the Hamiltonian H(x,p) = p²/2m + V(x). Write the equation of motion for the probability density to find the system in a region of phase space (x,p).
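As a numerical illustration of this result, the sketch below (assuming, purely for illustration, the linear drift a^(1)(x) = −x, which is not an example taken from these notes) integrates the characteristic equation (5.18) and checks that a particle starting at y follows z(y,t) = y e^{−t}; the delta peak in (5.21) simply rides along this deterministic trajectory.

```python
import math

def a1(x, t):
    # hypothetical linear drift a^(1)(x,t) = -x (illustrative choice)
    return -x

def characteristic(y, t, n=20000):
    """Integrate dz/dt = a1(z,t) from z(0) = y with RK4 steps."""
    z, dt = y, t / n
    for i in range(n):
        s = i * dt
        k1 = a1(z, s)
        k2 = a1(z + 0.5 * dt * k1, s + 0.5 * dt)
        k3 = a1(z + 0.5 * dt * k2, s + 0.5 * dt)
        k4 = a1(z + dt * k3, s + dt)
        z += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return z

# For a1 = -x the trajectory is z(y,t) = y e^{-t}; the Liouville solution
# P(x,t|y,0) = delta(x - z(y,t)) remains a point mass on this curve.
z_num = characteristic(2.0, 1.0)
z_exact = 2.0 * math.exp(-1.0)
```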

5.2 Fokker-Planck as a continuity equation

Equation (5.6) may be written as a continuity equation for the probability density

\frac{\partial P}{\partial t} = -\frac{\partial J}{\partial x}   (5.22)

where J(x, t) is a probability flux or current

J(x,t) = \left[ a^{(1)}(x,t) - \frac{1}{2} \frac{\partial}{\partial x} a^{(2)}(x,t) \right] P(x,t)   (5.23)

If the probability current vanishes at the boundaries x = xmin and x = xmax, (5.22) guarantees that the normalization is preserved at all times

\int_{x_{min}}^{x_{max}} dx\, P(x,t) = \text{const}   (5.24)


For natural boundary conditions xmin = −∞ and xmax = +∞, P(x,t) and the probability current (5.23) also vanish at x = ±∞.

For a stationary process the probability current must be constant; hence, if it vanishes somewhere, it must be zero everywhere. With natural boundary conditions the probability current must therefore be zero, and we obtain the stationary solution from

J = a^{(1)} P_s(x) - \frac{1}{2} \frac{\partial}{\partial x} a^{(2)} P_s(x) = 0   (5.25)

which we can rewrite as

-\frac{\partial}{\partial x}\left[ a^{(2)} P_s \right] + \frac{2 a^{(1)}}{a^{(2)}} \left[ a^{(2)} P_s \right] = 0   (5.26)

Upon integration we obtain

P_s(x) = \frac{N_0}{a^{(2)}}\, e^{2 \int^x dx'\, a^{(1)}/a^{(2)}} = N_0\, e^{-\phi(x)}   (5.27)

where we have defined the potential φ(x) as

\phi(x) = \log a^{(2)} - 2 \int^x dx'\, \frac{a^{(1)}}{a^{(2)}}   (5.28)

and N0 is an integration constant which ensures normalization of Ps(x), i.e.

P_s(x) = \frac{e^{-\phi(x)}}{\int dx\, e^{-\phi(x)}}   (5.29)

Note that it is not always possible to find an integrable solution. For a linear Fokker-Planck equation (a^{(1)} = A_0 + A_1 x, constant a^{(2)}) with A_1 < 0, the stationary solution (5.27) is Gaussian. In fact, the stationary Markov process determined by the linear Fokker-Planck equation is the Ornstein-Uhlenbeck process.
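The construction (5.27)–(5.29) can be checked numerically. The sketch below (with the assumed, illustrative choice a^(1)(x) = −x and a^(2) = 2D) evaluates φ(x) from (5.28) by quadrature, normalizes e^{−φ} as in (5.29), and compares with the expected Gaussian of variance D:

```python
import math

D = 0.5
def a1(x): return -x         # linear drift: A0 = 0, A1 = -1 (illustrative)
def a2(x): return 2.0 * D    # constant diffusion coefficient

def phi(x, n=400):
    """phi(x) = log a2(x) - 2 * int_0^x a1/a2 dx'  (eq. 5.28, midpoint rule)."""
    h = x / n
    integral = sum(a1((i + 0.5) * h) / a2((i + 0.5) * h) for i in range(n)) * h
    return math.log(a2(x)) - 2.0 * integral

dx = 0.01
xs = [i * dx for i in range(-500, 501)]
w = [math.exp(-phi(x)) for x in xs]
Z = sum(w) * dx                         # normalization integral (eq. 5.29)
Ps = [v / Z for v in w]

# for this linear case the stationary solution is Gaussian with variance D:
exact = [math.exp(-x * x / (2 * D)) / math.sqrt(2 * math.pi * D) for x in xs]
err = max(abs(p - e) for p, e in zip(Ps, exact))
```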

Nonstationary solutions of the FP equation are more difficult to obtain. A general expression for the nonstationary solution can be found only for special drift and diffusion coefficients.

5.3 Macroscopic equation

If we multiply (5.6) by x and integrate over x we find

\partial_t \langle x \rangle = \langle a^{(1)}(x) \rangle   (5.30)

where we carried out integrations by parts and assumed that P and its derivatives go to zero as x → ±∞. Similarly, one can get an equation for the fluctuations

\partial_t \langle x^2 \rangle = 2 \langle x\, a^{(1)}(x,t) \rangle + \langle a^{(2)}(x,t) \rangle   (5.31)

One can use the macroscopic equation (5.30) as an alternative, more phenomenological way to get the drift and diffusion coefficients in the case of a linear Fokker-Planck equation.


If the equation is linear one has 〈a^{(1)}(x)〉 = a^{(1)}(〈x〉), and one obtains a differential equation for 〈x〉

\partial_t \langle x \rangle = a^{(1)}(\langle x \rangle)   (5.32)

This equation is identified with the macroscopic equation of motion of the system, which is typically known. So one can get a^{(1)} from the knowledge of the macroscopic behaviour, and subsequently find a^{(2)} by identifying (5.27) with the equilibrium distribution obtained from statistical mechanics.

5.4 Examples of Fokker-Planck equations

Wiener process

In Einstein's explanation of Brownian motion, he arrived at the following diffusion equation for the position

\frac{\partial P}{\partial t} = D \frac{\partial^2 P}{\partial x^2}   (5.33)

Comparing with (5.6), this is a Fokker-Planck equation with vanishing drift coefficient a^{(1)} = 0 (because no forces act on the particle, the net drift is zero) and constant diffusion coefficient a^{(2)}(x) = 2D (because the surrounding medium is homogeneous; otherwise D = D(x)). The solution of this equation for initial condition P(x,0) = δ(x) was (4), which corresponds to the Wiener-Lévy process (2.49).
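A quick numerical sanity check (a minimal sketch; the parameter values are arbitrary): sampling many trajectories built from Gaussian increments of variance 2D dt reproduces the linear spreading 〈x²(t)〉 = 2Dt characteristic of the solution of (5.33).

```python
import math
import random

random.seed(0)
D, dt, steps, trials = 1.0, 0.01, 100, 5000   # illustrative parameters
t = steps * dt
sigma = math.sqrt(2.0 * D * dt)               # std of one increment

msd = 0.0
for _ in range(trials):
    x = 0.0
    for _ in range(steps):
        x += sigma * random.gauss(0.0, 1.0)
    msd += x * x
msd /= trials                                  # should approach 2*D*t = 2.0
```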

Klein-Kramers equations

The true diffusion equation, in phase space (x,v), for a Brownian particle of mass m moving in a potential V(x), with arbitrary damping coefficient γ, is

\frac{\partial P}{\partial t} = -v \frac{\partial P}{\partial x} + \frac{\partial}{\partial v} \left[ \left( \frac{\gamma}{m} v - \frac{F(x)}{m} \right) P \right] + \frac{\gamma k_B T}{m^2} \frac{\partial^2 P}{\partial v^2}   (5.34)

where F(x) = −dV/dx. We shall give a proof of (5.34) in the context of Langevin equations (Section 6.7).

Ornstein-Uhlenbeck process

We have stated without proof that the Ornstein-Uhlenbeck process describes the time evolution of the transition probability of the velocity of a free Brownian particle. We shall demonstrate this by solving the equation for the marginal distribution of v obtained from the no-potential limit of (5.34). Integrating (5.34) over x and using \int dx\, \partial_x P(x,v,t) = 0 (since P(x = ±∞, v, t) = 0), we obtain the equation for the marginal probability P_V(v,t) = \int dx\, P(x,v,t)

\frac{\partial P}{\partial t} = \frac{\gamma}{m} \left( \frac{\partial}{\partial v}(vP) + \frac{kT}{m} \frac{\partial^2 P}{\partial v^2} \right)   (5.35)

where in what follows we rescale γ to absorb the factor m, i.e. γ/m → γ. This is a FP equation with linear drift coefficient and constant diffusion coefficient, and it is solved by the transition probability of the O-U process (2.53).


To demonstrate the usefulness of the FP equation we calculate the stationary distribution function for the Brownian motion process described by equation (5.35). We immediately get

J = \left( -\gamma v - \frac{\gamma k T}{m} \frac{\partial}{\partial v} \right) P_s = 0   (5.36)

which leads to the Maxwell distribution

P_s(v) = \sqrt{\frac{m}{2\pi k T}}\, \exp\left( -\frac{m v^2}{2 k T} \right)   (5.37)

Next we turn to the nonstationary solution of (5.35) subject to the initial condition

P (v, t0|v0, t0) = δ(v − v0) (5.38)

The solution is best found by making a Fourier transform with respect to v, i.e.

P(v,t|v',t') = \frac{1}{2\pi} \int dk\, P(k,t|v',t')\, e^{-ikv}   (5.39)

If we multiply the Fokker-Planck equation by e^{ikv} and integrate over v, carrying out a few integrations by parts and setting to zero the contributions from the boundaries, we get an equation for the Fourier transform (effectively one replaces ∂/∂v with −ik and v with −i∂/∂k)

\frac{\partial P}{\partial t} = -\frac{\gamma}{m} k \frac{\partial P}{\partial k} - \frac{\gamma k_B T}{m^2} k^2 P   (5.40)

which is simpler than (5.35) because only first-order derivatives with respect to k occur. Because of (5.38), the initial condition for the Fourier transform is

P(k, t_0|v_0, t_0) = e^{i k v_0}   (5.41)

We set τ = m/γ and D = k_B T/m, rewrite the first-order equation (5.40) as

\tau \frac{\partial P}{\partial t} + k \frac{\partial P}{\partial k} = -D k^2 P   (5.42)

and solve it using the method of characteristics.² The subsidiary system is

\frac{dt}{\tau} = \frac{dk}{k} = -\frac{dP}{D k^2 P}   (5.45)

²Briefly, if we have a differential equation of the form

P \frac{\partial f}{\partial x} + Q \frac{\partial f}{\partial y} = R   (5.43)

and u(x,y,f) = a and v(x,y,f) = b are two solutions of the subsidiary system

\frac{dx}{P} = \frac{dy}{Q} = \frac{df}{R}   (5.44)

then the general solution of the original equation is an arbitrary function of u and v, h(u,v) = 0.


Two integrals are easily obtained considering the pairs (t, k) and (k, P):

\frac{dt}{\tau} = \frac{dk}{k} \;\rightarrow\; k = a\, e^{(t-t_0)/\tau} \;\rightarrow\; k\, e^{-(t-t_0)/\tau} = a   (5.46)

-Dk\, dk = \frac{dP}{P} \;\rightarrow\; -\frac{1}{2} D k^2 = \ln P + c \;\rightarrow\; e^{D k^2/2}\, P = b   (5.47)

Then the solution h(u,v) = 0 can be solved for v as v = φ(u), with φ still an arbitrary function,

P(k,t|v_0,t_0)\, e^{D k^2/2} = \phi\left( k\, e^{-(t-t_0)/\tau} \right)   (5.48)

which is to be determined from initial conditions

P(k,t_0|v_0,t_0)\, e^{D k^2/2} \equiv e^{i k v_0 + D k^2/2} = \phi(k)   (5.49)

leading to the desired general solution of (5.40)

P(k,t|v_0,t_0) = e^{-D k^2/2}\, \phi\left( k\, e^{-(t-t_0)/\tau} \right) = \exp\left[ i k v_0\, e^{-(t-t_0)/\tau} - \frac{D k^2}{2} \left( 1 - e^{-2(t-t_0)/\tau} \right) \right]   (5.50)

which is the characteristic function of a Gaussian distribution with µ = v_0\, e^{-(t-t_0)/\tau} and σ² = D(1 − e^{-2(t-t_0)/\tau}). By performing the integral in (5.39) we finally obtain the probability distribution solving (5.42)

P(v,t|v_0,t_0) = \frac{1}{\sqrt{2\pi D \left(1 - e^{-2(t-t_0)/\tau}\right)}} \exp\left[ -\frac{\left(v - v_0\, e^{-(t-t_0)/\tau}\right)^2}{2 D \left(1 - e^{-2(t-t_0)/\tau}\right)} \right]   (5.51)

and by inserting the original parameters we obtain, for the original equation for P_V,

P(v,t|v_0,t_0) = \sqrt{\frac{m}{2\pi k_B T \left(1 - e^{-2\gamma(t-t_0)}\right)}} \exp\left[ -\frac{m \left(v - v_0\, e^{-\gamma(t-t_0)}\right)^2}{2 k_B T \left(1 - e^{-2\gamma(t-t_0)}\right)} \right]   (5.52)

In the limit γ → 0 we recover the result for the Wiener process. For positive γ and large time differences γ(t − t_0) ≫ 1 the solution crosses over to the stationary Maxwell distribution

P_s(v) = \sqrt{\frac{m}{2\pi k_B T}}\, e^{-m v^2/2 k_B T}   (5.53)

At any arbitrary time it is a Gaussian distribution with mean value 〈v(t)〉 = v_0\, e^{-\gamma t}. For γ ≤ 0 no stationary solution exists.
The O-U process may equally well be described by a linear Langevin equation with Gaussian Langevin forces.
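One can verify by finite differences that the characteristic-function solution (5.50) indeed satisfies the first-order equation (5.42). The sketch below checks this at an arbitrary point (the values of τ, D, v0 are illustrative assumptions):

```python
import cmath
import math

tau, D, v0, t0 = 1.0, 0.5, 1.2, 0.0      # illustrative parameters

def Pk(k, t):
    """Characteristic-function solution, eq. (5.50)."""
    e = math.exp(-(t - t0) / tau)
    return cmath.exp(1j * k * v0 * e - 0.5 * D * k * k * (1.0 - e * e))

# check  tau * P_t + k * P_k = -D k^2 P   (eq. 5.42)  by central differences
k, t, h = 0.7, 0.9, 1e-5
P_t = (Pk(k, t + h) - Pk(k, t - h)) / (2 * h)
P_k = (Pk(k + h, t) - Pk(k - h, t)) / (2 * h)
residual = abs(tau * P_t + k * P_k + D * k * k * Pk(k, t))
```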


Chapter 6

Langevin equation

In this chapter we investigate the solution of the Langevin equation for simple processes. We solve the Langevin equation for Brownian motion and show how quantities of physical interest can be calculated. We also show how a Fokker-Planck equation can be set up starting from the Langevin equation. For the non-linear Langevin equation some difficulties arise when writing the Fokker-Planck equation, leading to the Ito-Stratonovich dilemma.

6.1 Langevin equation for one variable

The Langevin equation for one variable is a differential equation of the form

\frac{dy}{dt} = A(y,t) + B(y,t)\, \Gamma(t)   (6.1)

where Γ(t) is a given stochastic process. The terms A(y,t) and B(y,t)Γ(t) are often referred to as the drift and diffusion terms, respectively. Due to the presence of Γ(t), equation (6.1) is a stochastic differential equation, i.e. a differential equation comprising terms which are random functions of time, with given stochastic properties. To solve a Langevin equation then means to determine the statistical properties of the process y(t), which is defined once an initial condition is supplied.

The choice of Γ which renders y(t) a Markov process is that of white noise¹

〈Γ(t)〉 = 0

〈Γ(t)Γ(t′)〉 = 2Dδ(t− t′) (6.2)

Since (6.1) is a first-order differential equation, for each sample function (realization) of the noise Γ(t) it determines y(t) uniquely when an initial condition y(t_0) is given. In addition, the values of the fluctuating term at different times are statistically independent, due to the δ-correlated nature of the Langevin force Γ(t). Therefore, the values of Γ(t) at previous times t < t_0 cannot influence the conditional probability at later times t > t_0. From these arguments the Markovian character of the solution of the Langevin equation follows.

¹We say that Γ(t) is a white-noise force because its spectral density, which by the Wiener-Khintchine theorem is the Fourier transform of the correlation function, is independent of the frequency ω.


Since Γ(t) is usually the result of a large number of independent processes (e.g. collisions of a Brownian particle with the molecules of the fluid), it is customary to assume that, in addition to (6.2), Γ(t) is Gaussian. In other words, one assumes that the random variables Γ_i = Γ(t_i) are distributed according to the Gaussian distribution function

P(\Gamma_1, \ldots, \Gamma_n) = \frac{1}{(2\pi)^{n/2} \sqrt{\det C}} \exp\left( -\frac{1}{2} \sum_{ij} \Gamma_i\, (C^{-1})_{ij}\, \Gamma_j \right)   (6.3)

with zero mean (i.e. 〈Γ_i〉 = 0) and C_{ij} = 〈Γ_iΓ_j〉. It then follows that the higher-order correlation functions of Γ(t) are obtained from the second-order ones by (see Sec. 1.6)

\langle \Gamma(t_1)\Gamma(t_2) \cdots \Gamma(t_{2n-1}) \rangle = 0   (6.4)

\langle \Gamma(t_1)\Gamma(t_2) \cdots \Gamma(t_{2n}) \rangle = \sum_P \langle \Gamma(t_{i_1})\Gamma(t_{i_2}) \rangle \langle \Gamma(t_{i_3})\Gamma(t_{i_4}) \rangle \cdots \langle \Gamma(t_{i_{2n-1}})\Gamma(t_{i_{2n}}) \rangle   (6.5)

where the sum runs over those permutations which lead to different expressions for C_{i_1 i_2} C_{i_3 i_4} \cdots C_{i_{2n-1} i_{2n}}. There are (2n)!/(2^n n!) of these, because there are overall (2n)! permutations of the 2n times t_i, but the result is invariant under interchange of the two times in each of the n correlation functions and under permutation of the n correlation functions.

It should be noted that for singular correlations like (6.2) the distribution function (6.3) makes sense only if the C_{ij} are kept finite, with the singular limit taken at the end. For instance, we may use the representation

\delta_\epsilon(t) = \begin{cases} 1/\epsilon & -\epsilon/2 < t < \epsilon/2 \\ 0 & \text{elsewhere} \end{cases}   (6.6)

and then consider the limit ǫ→ 0.

6.2 Langevin equation for Brownian motion

Consider the Langevin equation

\dot{v} + \gamma v = \Gamma(t)   (6.7)

with noise correlation function

〈Γ(t)Γ(t′)〉 = 2Dδ(t− t′) (6.8)

Because of the linearity of (6.7), it is sufficient to know the two-time correlation (6.8) of the Langevin force to calculate the two-time correlation function of the velocity 〈v(t_1)v(t_2)〉. If one is interested in multi-time correlation functions of the velocity (or if the equation is not linear), one needs multi-time correlation functions of the Langevin force.

We now proceed to solve equation (6.7) for an initial condition describing a sharp value at t = 0, say v(0) = v_0:

v(t) = v_0\, e^{-\gamma t} + \int_0^t \Gamma(t')\, e^{-\gamma(t - t')}\, dt'   (6.9)


By using (6.8) we obtain for the correlation function of the velocity

\langle v(t_1) v(t_2) \rangle = v_0^2\, e^{-\gamma(t_1+t_2)} + \int_0^{t_1}\!\! \int_0^{t_2} e^{-\gamma(t_1+t_2-t_1'-t_2')}\, 2D\, \delta(t_1' - t_2')\, dt_1'\, dt_2'
= v_0^2\, e^{-\gamma(t_1+t_2)} + \int_0^{\min(t_1,t_2)} e^{-\gamma(t_1+t_2-2t_1')}\, 2D\, dt_1'
= v_0^2\, e^{-\gamma(t_1+t_2)} + \frac{D}{\gamma} \left( e^{-\gamma|t_1-t_2|} - e^{-\gamma(t_1+t_2)} \right)   (6.10)

For large times t_1, t_2 ≫ 1/γ, the velocity correlation function is independent of the initial velocity v_0 and is only a function of the time difference t_1 − t_2

\langle v(t_1) v(t_2) \rangle = \frac{D}{\gamma}\, e^{-\gamma|t_1 - t_2|}   (6.11)

The unknown constant D can now be identified by employing the fact that in the stationary state the average energy of the Brownian particle,

\langle E \rangle = \frac{1}{2} m \langle v^2(t) \rangle = \frac{m D}{2\gamma},

must have the known thermal value 〈E〉 = kT/2 given by the equipartition law of classical statistical mechanics. Hence

D = γkT/m. (6.12)

This equation relates D, the size of the fluctuating term, to the damping constant γ. It is the simplest form of the general fluctuation-dissipation theorem: the noise is fully expressed in terms of the macroscopic damping constant together with the temperature. Just as for the Einstein relation, one obtains an identity because one knows the size of the equilibrium fluctuations from ordinary equilibrium statistical mechanics. The physical picture is that the random kicks tend to spread out v, while the damping term tries to bring v back to zero; the balance between these two opposing tendencies is the equilibrium distribution.

In practice, the fluctuation-dissipation theorem tells us that in equilibrium, wherever there is damping there must be fluctuations. They are small from the macroscopic point of view because of the factor kT, but can be observed when magnified. For instance, the velocity fluctuations of a Brownian particle are not seen directly, but they build up a mean-square displacement which can be observed under a microscope.

6.2.1 Mean-square displacement

For the Brownian particle it is difficult to measure the velocity correlation function. It is much easier to study the mean-square value of its displacement:

\langle (x(t) - x_0)^2 \rangle = \Big\langle \Big[ \int_0^t v(t_1)\, dt_1 \Big]^2 \Big\rangle = \int_0^t dt_1 \int_0^t dt_2\, \langle v(t_1) v(t_2) \rangle
= \left( v_0^2 - \frac{D}{\gamma} \right) \frac{(1 - e^{-\gamma t})^2}{\gamma^2} + \frac{2D}{\gamma^2}\, t - \frac{2D}{\gamma^3} \left( 1 - e^{-\gamma t} \right)   (6.13)

[Note: were we to start with an initial velocity distributed according to the stationary state, i.e. 〈v_0^2〉 = D/γ, rather than with the sharp velocity v_0, the first term on the RHS would vanish.] Note two


interesting limiting cases. For large times, i.e. t ≫ 1/γ, the leading term is

\langle (x(t) - x_0)^2 \rangle = \frac{2D}{\gamma^2}\, t = 2 D_E\, t \qquad \text{with} \qquad D_E = \frac{D}{\gamma^2} = \frac{kT}{m\gamma}

and the particle behaves like a diffusing particle executing a random walk, with Einstein diffusion constant D_E. For t ≪ 1/γ one observes ballistic behaviour, 〈[x(t) − x_0]²〉 = v_0² t², i.e. during a short time interval the particle behaves as if it were a free particle moving with the constant thermal velocity v = (kT/m)^{1/2}. Langevin's derivation is more general than Einstein's since it also yields the short-time dynamics.
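The two regimes can be read off directly from (6.13). The short sketch below evaluates the closed form at one short and one long time (the parameter values are illustrative) and compares with the ballistic and diffusive predictions:

```python
import math

gamma, D, v0 = 1.0, 0.5, 0.8             # illustrative parameters

def msd(t):
    """Mean-square displacement (6.13) for sharp initial velocity v0."""
    e = math.exp(-gamma * t)
    return ((v0**2 - D / gamma) * (1.0 - e)**2 / gamma**2
            + 2.0 * D / gamma**2 * t
            - 2.0 * D / gamma**3 * (1.0 - e))

t_short, t_long = 1e-4, 200.0
ballistic = v0**2 * t_short**2            # expected for t << 1/gamma
diffusive = 2.0 * D / gamma**2 * t_long   # leading term for t >> 1/gamma

short_ratio = msd(t_short) / ballistic    # both ratios should be close to 1
long_ratio = msd(t_long) / diffusive
```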

6.2.2 Stationary velocity distribution function

In order to derive the stationary velocity distribution, we first calculate all the moments 〈v^{2n}〉, from which the characteristic function is determined, and then calculate its Fourier transform to obtain the distribution function.

In the stationary state, i.e. for large times, (6.9) becomes (setting t − t' = τ and, because of the decaying factor e^{-\gamma\tau}, extending the range of integration to ∞)

v(t) = \int_0^\infty d\tau\, e^{-\gamma\tau}\, \Gamma(t - \tau)   (6.14)

Assuming the Langevin force is Gaussian we have

\langle v^{2n+1}(t) \rangle = 0

\langle v^{2n}(t) \rangle = \int_0^\infty \!\!\cdots\! \int_0^\infty d\tau_1 \cdots d\tau_{2n}\, e^{-\gamma(\tau_1 + \ldots + \tau_{2n})}\, \langle \Gamma(t-\tau_1) \cdots \Gamma(t-\tau_{2n}) \rangle
= \frac{(2n)!}{2^n n!} \left[ \int_0^\infty \!\!\int_0^\infty d\tau_1\, d\tau_2\, e^{-\gamma(\tau_1+\tau_2)}\, 2D\, \delta(\tau_1 - \tau_2) \right]^n
= \frac{(2n)!}{2^n n!} \left( \frac{D}{\gamma} \right)^n   (6.15)

The characteristic function becomes

G(k) = 1 + \sum_{n=1}^{\infty} (ik)^n \langle v^n(t) \rangle / n! = \sum_{n=0}^{\infty} (ik)^{2n} \langle v^{2n}(t) \rangle / (2n)! = \exp\left( -\frac{k^2 D}{2\gamma} \right)   (6.16)

and the distribution function is therefore given by

P(v) = \frac{1}{2\pi} \int_{-\infty}^{\infty} G(k)\, e^{-ikv}\, dk = \sqrt{\frac{m}{2\pi k T}}\, \exp\left( -\frac{m v^2}{2 k T} \right)   (6.17)

i.e. the stationary distribution is the Maxwell distribution. Indeed, one sees from (6.14) that v(t) is a linear combination of all values that Γ(t') takes for 0 ≤ t' ≤ t. Since the joint distribution of all the Γ(t') is Gaussian, it follows that the value v at time t has a Gaussian distribution. By the same argument, the joint distribution of v(t_1), v(t_2), ... is Gaussian.


6.2.3 Conditional probability

We now wish to calculate the conditional probability P(v,t|v_0,t_0) of finding the particle with velocity v at time t, knowing that it had velocity v_0 at time t_0. We shall take for simplicity t_0 = 0 and write P(v,t|v_0,t_0) = P(v,t|v_0). First we show that, because Γ(t) is a Gaussian process, the integral in equation (6.9) is Gaussian. We therefore conclude that the variable U = v − v_0\, e^{-\gamma t} is a Gaussian random process. Let us divide the interval [0,t] into N small intervals of length ǫ = t/N, N ≫ 1, with t_n = nǫ, and define the random variable B^n_ǫ by

B^n_\epsilon = \int_{t_n}^{t_n + \epsilon} dt'\, \Gamma(t')   (6.18)

The statistics of B^n_ǫ are independent of n due to time translation invariance. We note that 〈B^n_ǫ〉 = 0 and that, from (6.8),

\langle B^n_\epsilon B^m_\epsilon \rangle = \int\!\!\int dt'\, dt''\, \langle \Gamma(t')\Gamma(t'') \rangle = 2D\epsilon\, \delta_{nm}   (6.19)

From (6.9) we have

U = v - v_0\, e^{-\gamma t} = e^{-\gamma t} \int_0^t dt'\, e^{\gamma t'}\, \Gamma(t')
= e^{-\gamma t} \sum_{n=0}^{N-1} \int_{t_n}^{t_n+\epsilon} dt'\, e^{\gamma t'}\, \Gamma(t') \simeq e^{-\gamma t} \sum_{n=0}^{N-1} e^{\gamma t_n} \int_{t_n}^{t_n+\epsilon} dt'\, \Gamma(t')
= e^{-\gamma t} \sum_{n=0}^{N-1} e^{\gamma t_n}\, B^n_\epsilon   (6.20)

which is a Riemann approximation to the integral in (6.9). This shows that U is the sum of a large number of independent variables. From the central limit theorem, the probability distribution of U is Gaussian with mean 〈U〉 = 0 and variance 〈U²〉 given by

\langle U^2 \rangle = e^{-2\gamma t} \sum_{m,n=0}^{N-1} e^{\gamma t_n + \gamma t_m} \langle B^n_\epsilon B^m_\epsilon \rangle = 2D\epsilon\, e^{-2\gamma t} \sum_{n=0}^{N-1} e^{2\gamma t_n} \simeq 2D\, e^{-2\gamma t} \int_0^t dt'\, e^{2\gamma t'}
= \frac{D}{\gamma} \left( 1 - e^{-2\gamma t} \right) = \gamma D_E \left( 1 - e^{-2\gamma t} \right)   (6.21)

This gives the probability distribution of U , or equivalently of v

P(v,t|v_0) = \frac{1}{\sqrt{2\pi \gamma D_E \left(1 - e^{-2\gamma t}\right)}} \exp\left[ -\frac{\left(v - v_0\, e^{-\gamma t}\right)^2}{2 \gamma D_E \left(1 - e^{-2\gamma t}\right)} \right]   (6.22)

It is instructive to look at the short and long time limits. In the long time limit t ≫ 1/γ, one reaches an equilibrium situation governed by a Boltzmann distribution

P(v,t|v_0) \to \frac{1}{\sqrt{2\pi \gamma D_E}} \exp\left( -\frac{v^2}{2\gamma D_E} \right) = \sqrt{\frac{m}{2\pi k T}}\, \exp\left( -\frac{m v^2}{2 k T} \right)   (6.23)


In the limit t ≪ 1/γ,

P(v,t|v_0) \to \frac{1}{\sqrt{4\pi D t}} \exp\left( -\frac{(v - v_0)^2}{4 D t} \right)   (6.24)

which shows that the short time limit is dominated by diffusion, 〈(v − v_0)²〉 ∼ 2Dt. As one can write 〈|v(t+ǫ) − v(t)|〉 ∝ √ǫ, one sees that the trajectory v(t) is a continuous but non-differentiable function of t. The stochastic process whose transition probability we have just derived is often called the Ornstein-Uhlenbeck process, or just the O-U process.

The conditional distribution (6.22) determines uniquely all the higher-order distribution functions because the process is Markov.
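The result (6.22) can also be checked by direct simulation: integrating (6.7) with a simple Euler scheme and looking at the endpoint statistics reproduces the conditional mean v_0 e^{−γt} and variance γD_E(1 − e^{−2γt}). A sketch, with arbitrarily chosen γ, D and v_0:

```python
import math
import random

random.seed(1)
gamma, D, v0 = 1.0, 0.5, 2.0             # illustrative parameters
dt, steps, trials = 0.002, 500, 3000     # evolve to t = 1
t = dt * steps
sigma = math.sqrt(2.0 * D * dt)          # noise per step, <Gamma Gamma> = 2D delta

finals = []
for _ in range(trials):
    v = v0
    for _ in range(steps):
        v += -gamma * v * dt + sigma * random.gauss(0.0, 1.0)
    finals.append(v)

mean = sum(finals) / trials
var = sum((x - mean) ** 2 for x in finals) / trials
mean_exact = v0 * math.exp(-gamma * t)                        # from (6.22)
var_exact = (D / gamma) * (1.0 - math.exp(-2.0 * gamma * t))  # = gamma*D_E*(1-e^{-2 gamma t})
```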

6.3 Colored noise

The process described by the Langevin equation (6.7) with δ-correlated Langevin force is a Markov process, i.e. its conditional probability at time t_n only depends on the value of the stochastic variable at the next earlier time t_{n−1}.

The Markovian property is destroyed if Γ(t) is no longer δ-correlated. For example, the process described by

\dot{y} = h(y) + \Gamma(t)   (6.25)

with a Gaussian distributed force Γ which has correlation

\langle \Gamma(t_1)\Gamma(t_2) \rangle = \frac{D}{\gamma}\, e^{-\gamma|t_1 - t_2|}   (6.26)

is no longer a Markov process for finite γ (although Γ itself is still a Markov process). However, if we introduce an additional variable η, then a Markov process results for the two-dimensional process

\dot{y} = h(y) + \eta
\dot{\eta} = -\gamma \eta + \Gamma(t)   (6.27)

with δ-correlated noise 〈Γ(t_1)Γ(t_2)〉 = 2Dδ(t_1 − t_2). Because of (6.7, 6.8, 6.11) it is easily seen that these are equivalent to (6.25, 6.26). Thus, by introducing new random variables, non-Markovian processes may be reduced to Markovian ones.
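The embedding can be checked numerically: simulating only the η-equation of (6.27) (the parameter values below are illustrative) and measuring its stationary autocorrelation recovers (D/γ)e^{−γs}, i.e. exactly the colored correlation (6.26):

```python
import math
import random

random.seed(4)
gamma, D = 2.0, 1.0                       # illustrative parameters
dt, burn, n = 0.005, 4000, 400000
sigma = math.sqrt(2.0 * D * dt)

# Euler integration of eta' = -gamma*eta + Gamma(t); discard the transient
eta, buf = 0.0, []
for i in range(burn + n):
    eta += -gamma * eta * dt + sigma * random.gauss(0.0, 1.0)
    if i >= burn:
        buf.append(eta)

s = 0.5                                    # correlation lag
lag = int(s / dt)
corr = sum(buf[i] * buf[i + lag] for i in range(len(buf) - lag)) / (len(buf) - lag)
corr_exact = (D / gamma) * math.exp(-gamma * s)   # eq. (6.26)
```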

6.3.1 Derivation of the Fokker-Planck equation

Consider Brownian motion in the absence of external forces. We are now interested in the time dependence of the probability P(v,t) that the particle's velocity at time t lies between v and v + dv. One expects that this probability does not depend on the entire past history of the particle, but that it is determined once one knows that v = v_0 at some earlier time t_0 (we make a Markov assumption). Hence, one can write P more explicitly as a conditional probability which depends on v_0 and t_0 as parameters, P(v,t|v_0,t_0). Since nothing in this problem depends on the origin from which time is measured (i.e. the problem is time translation invariant), P must depend only on the time difference τ = t − t_0. Thus one can simply write P(v,τ|v_0). If τ → 0 one knows that v = v_0, so

P(v, \tau | v_0) \to \delta(v - v_0)   (6.28)


On the other hand, if τ → ∞ the particle must come to equilibrium with the surrounding medium at temperature T, irrespective of its past history. Hence P becomes independent of v_0 and of the time, and it must reduce to the canonical Maxwell-Boltzmann distribution

P(v, \tau | v_0) \to \left( \frac{\beta m}{2\pi} \right)^{1/2} e^{-\beta m v^2/2}, \qquad \beta = 1/kT   (6.29)

We can calculate the moments D(1) and D(2)

D^{(1)} = \lim_{\tau\to 0} \frac{1}{\tau} \langle \Delta v \rangle_{v(t)=v}, \qquad D^{(2)} = \lim_{\tau\to 0} \frac{1}{\tau} \langle (\Delta v)^2 \rangle_{v(t)=v}   (6.30)

from the Langevin equation for the Brownian particle (cf. Section 6.2)

\frac{dv}{dt} = -\gamma v + \Gamma(t)   (6.31)

\langle \Gamma(t)\Gamma(t') \rangle = \frac{2\gamma k T}{m}\, \delta(t - t')   (6.32)

By integrating the equation of motion over a small time τ we get

v(t+\tau) - v(t) = -\gamma \int_t^{t+\tau} dt'\, v(t') + \int_t^{t+\tau} dt'\, \Gamma(t') \simeq -\gamma v(t)\, \tau + \int_t^{t+\tau} dt'\, \Gamma(t')   (6.33)

Taking the ensemble average at fixed v(t) = v

\langle \Delta v \rangle = \langle v(t+\tau) - v(t) \rangle_{v(t)=v} = -\gamma v \tau   (6.34)

which leads to

D^{(1)} = -\gamma v   (6.35)

Squaring (6.33) and taking the average we also get

\langle (\Delta v)^2 \rangle_{v(t)=v} = \gamma^2 v^2 \tau^2 + \int_t^{t+\tau}\!\! \int_t^{t+\tau} dt'\, dt''\, \langle \Gamma(t')\Gamma(t'') \rangle = \gamma^2 v^2 \tau^2 + \frac{2 k T \gamma}{m}\, \tau   (6.36)

where we have used 〈Γ(t)〉 = 0. This yields

D^{(2)} = \frac{2 k T \gamma}{m}   (6.37)

The Fokker-Planck equation for the Brownian particle is

\frac{\partial P}{\partial \tau} = \gamma \frac{\partial}{\partial v}(v P) + \frac{\gamma k T}{m} \frac{\partial^2 P}{\partial v^2}   (6.38)


This is the FP equation for the O-U process that we have treated in Sec. 5.4.
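The limiting procedure (6.30) can be mimicked numerically: launching many trajectories from a fixed v, propagating them over a short τ, and dividing the conditional averages by τ returns D^(1) = −γv and D^(2) = 2kTγ/m. A sketch, writing Dn for the combination kTγ/m (the parameter values are illustrative):

```python
import math
import random

random.seed(2)
gamma, Dn = 1.5, 0.7                  # Dn stands for k*T*gamma/m (assumed values)
v_fix, tau, sub, trials = 0.9, 0.01, 10, 40000
dt = tau / sub
sigma = math.sqrt(2.0 * Dn * dt)

s1 = s2 = 0.0
for _ in range(trials):
    v = v_fix
    for _ in range(sub):
        v += -gamma * v * dt + sigma * random.gauss(0.0, 1.0)
    dv = v - v_fix
    s1 += dv
    s2 += dv * dv

D1 = s1 / trials / tau                # should approach -gamma * v_fix  (6.35)
D2 = s2 / trials / tau                # should approach 2 * Dn          (6.37)
```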

Exercise Show that the property of Γ(t) of being Gaussian white noise is expressed by the following identity for its characteristic functional

\Big\langle \exp\Big[ i \int dt\, k(t)\, \Gamma(t) \Big] \Big\rangle = \exp\Big[ -D \int dt\, k^2(t) \Big]   (6.39)

Exercise Prove that

W(t) = \int_0^t dt'\, \Gamma(t')   (6.40)

is the Wiener process.

Answer: Let us formally write dW/dt = Γ(t) and see what the properties of the W(t) so defined are. On integrating over an interval τ we get

w(\tau) = \Delta W(t) = W(t+\tau) - W(t) = \int_t^{t+\tau} ds\, \Gamma(s)   (6.41)

Let us show that w(τ) is indeed the Wiener process. Firstly, w(τ) is Gaussian because Γ(t) is. Furthermore, it is easy to show that

w(0) = 0, \qquad \langle w(\tau) \rangle = 0, \qquad \langle w(\tau_1) w(\tau_2) \rangle = 2D \min(\tau_1, \tau_2)   (6.42)
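A simulation sketch of the covariance claim (with illustrative D and times): sampling W(t) as a cumulative sum of independent Gaussian increments of variance 2D dt gives 〈W(t_1)W(t_2)〉 ≈ 2D min(t_1, t_2).

```python
import math
import random

random.seed(3)
D, dt, n, trials = 0.5, 0.01, 200, 10000
i1, i2 = 80, 150                       # record W at t1 = 0.8 and t2 = 1.5
sigma = math.sqrt(2.0 * D * dt)

cov = 0.0
for _ in range(trials):
    w = w1 = w2 = 0.0
    for i in range(1, n + 1):
        w += sigma * random.gauss(0.0, 1.0)
        if i == i1:
            w1 = w
        if i == i2:
            w2 = w
    cov += w1 * w2
cov /= trials
cov_exact = 2.0 * D * min(i1 * dt, i2 * dt)   # = 0.8
```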

6.4 Quasilinear Langevin equation

Suppose one has a physical system with a nonlinear equation of motion \dot{y} = h(y), and following the Langevin approach one adds a Langevin term to describe the fluctuations in the system

\dot{y} = A(y) + \Gamma(t), \qquad \langle \Gamma(t)\Gamma(t') \rangle = 2D\, \delta(t - t')   (6.43)

We call this equation “quasilinear” to express the fact that the coefficient of Γ(t) is still a constant. Although its solution cannot be given explicitly, it can still be argued that it is equivalent to the quasilinear Fokker-Planck equation

\frac{\partial P(y,t)}{\partial t} = -\frac{\partial}{\partial y} A(y) P + D \frac{\partial^2 P}{\partial y^2}   (6.44)

First, it is clear that for each sample function Γ, equation (6.43) uniquely determines y(t) when y(0) is given. Since the values of Γ at different times are stochastically independent, it follows that y(t) is Markovian. Hence it obeys a Master equation, which may be rewritten in the Kramers-Moyal form. Now we have to calculate the first and second order jump moments (5.4).

An exact consequence of (6.43) is

\Delta y = \int_t^{t+\Delta t} dt'\, A(y(t')) + \int_t^{t+\Delta t} dt'\, \Gamma(t')   (6.45)


Hence the average with fixed y is

\langle \Delta y \rangle_{y(t)=y} = A(y)\, \Delta t + O((\Delta t)^2)   (6.46)

That yields the first term in (6.44). Next

\langle (\Delta y)^2 \rangle_{y(t)=y} = \Big\langle \Big( \int_t^{t+\Delta t} dt'\, A(y(t')) \Big)^2 \Big\rangle + 2 \int_t^{t+\Delta t} dt' \int_t^{t+\Delta t} dt''\, \langle A(y(t'))\, \Gamma(t'') \rangle
+ \int_t^{t+\Delta t} dt' \int_t^{t+\Delta t} dt''\, \langle \Gamma(t')\Gamma(t'') \rangle   (6.47)

The first term is O((∆t)²) and therefore does not contribute to a^{(2)}. The last line equals 2D∆t as before and agrees with (6.44). In the second line, expand A(y(t')):

2 A(y(t))\, \Delta t \int_t^{t+\Delta t} dt''\, \langle \Gamma(t'') \rangle + 2 A'(y(t)) \int_t^{t+\Delta t} dt' \int_t^{t+\Delta t} dt''\, \langle [y(t') - y(t)]\, \Gamma(t'') \rangle + \ldots   (6.48)

The first term vanishes and the second one is O((∆t)²), because it is a double integral and does not contain delta functions. By similar arguments one sees that the dots are of higher order in ∆t, as are 〈(∆y)^n〉 for n > 2. This proves the equivalence of (6.43) and (6.44).

6.5 Non-linear Langevin equation

For one stochastic variable y, the general Langevin equation has the form

\dot{y} = A(y,t) + B(y,t)\, \Gamma(t)   (6.49)

The Langevin force is again assumed Gaussian with zero mean and δ correlation function

〈Γ(t)〉 = 0, 〈Γ(t)Γ(t′)〉 = 2Dδ(t− t′) (6.50)

For time-independent functions A and B, with B ≠ 0, one can always reduce the equation with multiplicative noise

\dot{y} = A(y) + B(y)\, \Gamma(t)   (6.51)

to an equation with additive noise by a simple transformation of variable

\eta = \int^y \frac{dy'}{B(y')} \equiv f(y), \qquad y = f^{-1}(\eta)   (6.52)

One can then rewrite (6.51) in terms of η

\dot{\eta} = A_1(\eta) + \Gamma(t)   (6.53)

with additive noise force and new drift given by

A1(η) = A(y)/B(y) = A(f−1(η))/B(f−1(η)) (6.54)

Page 90: Dynamical Analysis of Complex Systems 7CCMCS04 · 2011-05-05 · contains many major concepts which are central of dynamical analysis of stochastic processes •Chapman-Kolmogorov

90 CHAPTER 6. LANGEVIN EQUATION

which is a “quasilinear” equation: A_1 is a non-linear function of η, but the coefficient of Γ(t) is still a constant. This has been shown above to be equivalent to the FP equation for the probability density P_1 of the stochastic variable η

\frac{\partial P_1(\eta,t)}{\partial t} = -\frac{\partial}{\partial \eta} A_1(\eta) P_1 + D \frac{\partial^2 P_1}{\partial \eta^2}   (6.55)

where the connection between P_1(η,t) and P(y,t) follows from (1.14) as

P_1(\eta) = P(y)\, B(y).   (6.56)

Transforming back to the original y we have

\frac{\partial P(y,t)}{\partial t} = -\frac{\partial}{\partial y} A(y) P + D \frac{\partial}{\partial y} B(y) \frac{\partial}{\partial y} B(y) P   (6.57)

where we used

\frac{\partial}{\partial \eta} = \frac{\partial y}{\partial \eta} \frac{\partial}{\partial y} = B(y) \frac{\partial}{\partial y}   (6.58)

Thus the Langevin equation (6.1) is equivalent to the Fokker-Planck equation (6.57), or alternatively

\frac{\partial P(y,t)}{\partial t} = -\frac{\partial}{\partial y} \left[ A(y) + D B(y) B'(y) \right] P + D \frac{\partial^2}{\partial y^2} [B(y)]^2 P   (6.59)

where we used

B(y) \frac{\partial}{\partial y} [B(y) P] + B(y) B'(y) P = \frac{\partial}{\partial y} [B^2 P]   (6.60)

However, there is a problem concerning (6.49). This equation, as it stands, has no well-defined meaning. To see this, note that Γ(t), because it has no correlation time, can be visualized as a sequence of delta peaks arriving at random times t_i. According to (6.49), each delta function in Γ(t) causes a jump in y(t). Hence the value of y at the time the delta function arrives is undetermined, and therefore so is the value of B(y). The equation does not specify whether one should insert in B(y) the value of y before the jump, after the jump, or perhaps the mean of these two values. These different options lead to different Fokker-Planck equations.
One cannot answer these questions from a purely mathematical point of view; prescriptions, like the Ito or Stratonovich definitions, are needed to determine B(y) unambiguously.

Stratonovich opted for the mean value. He reads (6.49) as

y(t+\Delta t) - y(t) = A(y(t))\, \Delta t + B\left( \frac{y(t) + y(t+\Delta t)}{2} \right) \int_t^{t+\Delta t} dt'\, \Gamma(t')   (6.61)

It can be proved that this choice leads to (6.59). This shows that our naive use of the transformation of variables amounted to opting for the Stratonovich interpretation.

Ito opted for the value of y before the arrival of the delta peak. He reads (6.49) as

y(t+\Delta t) - y(t) = A(y(t))\, \Delta t + B(y(t)) \int_t^{t+\Delta t} dt'\, \Gamma(t')   (6.62)


and proved that this is equivalent to

\frac{\partial P(y,t)}{\partial t} = -\frac{\partial}{\partial y} [A(y) P] + D \frac{\partial^2}{\partial y^2} [B(y)]^2 P   (6.63)

Apparently this interpretation is not compatible with the familiar way of transforming variables: new transformation laws have to be formulated. We note that (6.63) and (6.59) differ in the drift coefficient; the difference between the two coefficients is sometimes called “spurious drift”.
The Stratonovich integral is the natural choice when Γ is interpreted as a real noise (not a white noise) with a finite correlation time, which is allowed to become infinitesimally small only after measurable quantities are calculated.
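The spurious drift can be seen in a simulation. The sketch below uses the illustrative SDE dy = −y dt + y dW with 〈dW²〉 = 2D dt (an assumed example, not taken from the notes): the Euler scheme converges to the Ito solution, 〈y〉 = y_0 e^{−t}, while the Heun predictor-corrector scheme converges to the Stratonovich solution, whose Ito-equivalent drift is −y + DBB' = (D − 1)y, giving 〈y〉 = y_0 e^{(D−1)t}.

```python
import math
import random

random.seed(6)
D, dt, steps, trials, y0 = 0.5, 0.005, 200, 5000, 1.0
t = dt * steps                      # = 1.0
sigma = math.sqrt(2.0 * D * dt)

def a(y): return -y                 # drift A(y)
def b(y): return y                  # multiplicative noise coefficient B(y)

ito_sum = strat_sum = 0.0
for _ in range(trials):
    yi = ys = y0
    for _ in range(steps):
        dW = sigma * random.gauss(0.0, 1.0)
        # Euler-Maruyama step -> Ito interpretation
        yi = yi + a(yi) * dt + b(yi) * dW
        dW = sigma * random.gauss(0.0, 1.0)
        # Heun (predictor-corrector) step -> Stratonovich interpretation
        yp = ys + a(ys) * dt + b(ys) * dW
        ys = ys + 0.5 * (a(ys) + a(yp)) * dt + 0.5 * (b(ys) + b(yp)) * dW
    ito_sum += yi
    strat_sum += ys

ito_mean = ito_sum / trials         # expect y0 * e^{-t}
strat_mean = strat_sum / trials     # expect y0 * e^{(D-1)t}
```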

6.5.1 Definition of the Stochastic integral

The previous Section calls for the introduction of stochastic integration. Suppose f(t) is an arbitrary function of time and W(t) is the Wiener process. We define the stochastic integral

I = \int_{t_0}^{t} f(t')\, dW(t')   (6.64)

as a kind of Riemann-Stieltjes integral. Namely, we divide the interval [t_0, t] into n subintervals by means of partitioning points

t_0 \le t_1 \le \ldots \le t_{n-1} \le t

and define intermediate points τ_i such that

t_{i-1} \le \tau_i \le t_i

The stochastic integral is defined as the mean-square limit² of the partial sum

S_n = \sum_{i=1}^{n} f(\tau_i) \left[ W(t_i) - W(t_{i-1}) \right]   (6.65)

It is heuristically quite easy to see that in general the integral defined as the limit of S_n depends on the particular choice of the intermediate points τ_i.

Exercise - Show that if we take, for example, f(t) = W (t) and choose for all i

τi = αti + (1− α)ti−1 (0 < α < 1) (6.66)

we get 〈S_n〉 = \sum_{i=1}^{n} (t_i - t_{i-1})\, \alpha = (t - t_0)\, \alpha.

The choice of intermediate points characterised by α = 0, i.e. τ_i = t_{i-1}, defines the Ito stochastic integral of the function f(t):

I(I) =

∫ t

t0f(t′)dW (t′) = ms− lim

n→∞

n∑

i=1

f(ti)[W (ti)−W (ti−1)] (6.67)

²We say that the mean square limit of Xn as n → ∞ is X, and write ms-lim_{n→∞} Xn = X, if lim_{n→∞}〈(Xn − X)²〉 = 0.


The following alternative definition of the integral I is due to Stratonovich:

I^(S) = ∫_{t0}^t f(t′) dW(t′) = ms-lim_{n→∞} Σ_{i=1}^n ½[f(ti) + f(ti−1)][W(ti) − W(ti−1)] (6.68)

Exercise - Consider, as an example, f(t) = W(t). Calculate I = ∫_{t0}^t W(t′) dW(t′) using the Ito and Stratonovich rules.

(a) Show that Ito's rule gives

I^(I) = ½[W(t)² − W(t0)² − (t − t0)]

Note the difference with ordinary integration, in which the term (t − t0) would be absent. The reason for this is that W(t + τ) − W(t) is of order √τ, so, in contrast with ordinary integration, terms of second order in ∆W do not vanish on taking the limit.

(b) Show that the anomalous term (t − t0) does not occur in the Stratonovich integral

I^(S) = ½[W(t)² − W(t0)²] (6.69)
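The difference between the two conventions is easy to see numerically. The following sketch (an illustrative Python script, not part of the original notes; step number and seed are arbitrary) samples one Wiener path and evaluates the Ito partial sum (6.67) and the Stratonovich partial sum (6.68) for f = W. Their difference is ½Σ(∆W)², which converges in mean square to (t − t0)/2, the anomalous term above.

```python
import random

random.seed(0)
n, t0, t = 100_000, 0.0, 1.0
dt = (t - t0) / n

# Sample one Wiener path: W(ti) = W(ti-1) + N(0, dt) increments
W = [0.0]
for _ in range(n):
    W.append(W[-1] + random.gauss(0.0, dt ** 0.5))

# Partial sums (6.67) and (6.68) for f = W
ito   = sum(W[i - 1] * (W[i] - W[i - 1]) for i in range(1, n + 1))
strat = sum(0.5 * (W[i] + W[i - 1]) * (W[i] - W[i - 1]) for i in range(1, n + 1))

# The Stratonovich sum telescopes to (W(t)^2 - W(t0)^2)/2 exactly;
# the Ito sum differs by approximately -(t - t0)/2
print(strat - ito)                 # close to 0.5
print(strat - 0.5 * W[-1] ** 2)    # zero up to rounding
```

Note that the Stratonovich sum reproduces the ordinary calculus result path by path, while the Ito sum carries the anomalous −(t − t0)/2 with only O(√dt) fluctuations around it.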

6.6 The Kramers-Moyal coefficients for the Langevin equation

Here we show a general method to construct a Fokker-Planck equation for Markov processes obeying equation (6.1) with Gaussian delta-correlated noise, directly in terms of the coefficients appearing in the equation of motion. Let us calculate the successive coefficients occurring in the Kramers-Moyal expansion

a^(n)(x, t) = lim_{τ→0} (1/τ) 〈[y(t + τ) − x]^n〉_{y(t)=x} (6.70)

where y(t + τ) (τ > 0) is a solution of (6.49) which at time t has the sharp value y(t) = x. We first cast the differential equation (6.49) in integral form:

y(t + τ) − x = ∫_t^{t+τ} dt′ A(y(t′), t′) + ∫_t^{t+τ} dt′ B(y(t′), t′)Γ(t′) (6.71)

and assume that A and B can be expanded about y(t′) = x:

A(y(t′), t′) = A(x, t′) + A′(x, t′)(y(t′) − x) + ...
B(y(t′), t′) = B(x, t′) + B′(x, t′)(y(t′) − x) + ... (6.72)

where the primes denote partial derivatives with respect to y, evaluated at the initial point x. Inserting these expansions in (6.71) we have

y(t + τ) − x = ∫_t^{t+τ} dt′ [A(x, t′) + B(x, t′)Γ(t′)] + ∫_t^{t+τ} dt′ [A′(x, t′) + B′(x, t′)Γ(t′)](y(t′) − x) + ... (6.73)


For y(t′) − x in the integrand we iterate (6.73), thus obtaining

y(t + τ) − x = ∫_t^{t+τ} dt′ [A(x, t′) + B(x, t′)Γ(t′)]
+ ∫_t^{t+τ} dt′ [A′(x, t′) + B′(x, t′)Γ(t′)] ∫_t^{t′} dt″ [A(x, t″) + B(x, t″)Γ(t″)] + ... (6.74)

If we now take the average of (6.74) for fixed y(t) = x and use (6.50), we have

〈y(t + τ) − x〉 = ∫_t^{t+τ} dt′ A(x, t′) + ∫_t^{t+τ} dt′ ∫_t^{t′} dt″ A(x, t″)A′(x, t′)
+ ∫_t^{t+τ} dt′ ∫_t^{t′} dt″ B(x, t″)B′(x, t′) 2Dδ(t′ − t″) + ... (6.75)

An important point of definition arises here. It frequently occurs in the study of stochastic differential equations that, in integrals involving delta functions, the argument of the delta function equals either the upper or the lower limit of integration; that is, we find integrals like

I1 = ∫_{t1}^{t2} dt f(t)δ(t − t1) or I2 = ∫_{t1}^{t2} dt f(t)δ(t − t2) (6.76)

Various conventions can be made concerning the values of these integrals. If one counts all the weight of a delta function at the lower limit of an integral and none of the weight at the upper limit, this yields I1 = f(t1) and I2 = 0, and amounts to choosing the Ito convention, where increments "point to the future":

∫_{t0}^{t1} f(t)Γ(t)dt = ∫_{t0}^{t1} f(t)dW(t) = lim_{n→∞} Σ_{i=1}^n f(ti−1)[W(ti) − W(ti−1)]

(note that the interpretation of the integral of Γ(t) in terms of the Wiener process W(t) is used). If the delta function has half its weight counted at each limit of the integral, this yields I1 = ½f(t1) and I2 = ½f(t2), and amounts to choosing the Stratonovich convention of stochastic integration

∫_{t0}^{t1} f[x(t), t]dW(t) = lim_{n→∞} Σ_{i=1}^n f(½[x(ti) + x(ti−1)], ti−1)[W(ti) − W(ti−1)]

Here we use the Stratonovich convention and get

〈y(t + τ) − x〉 = ∫_t^{t+τ} dt′ A(x, t′) + ∫_t^{t+τ} dt′ ∫_t^{t′} dt″ A(x, t″)A′(x, t′) + D ∫_t^{t+τ} dt′ B(x, t′)B′(x, t′) + ... (6.77)

This is equivalent to taking for the δ function any representation δ_ε(t) symmetric around the origin, for example (6.6), and finally letting ε → 0. As a result, e.g., I2 = f(t2) ∫_{t1−t2}^0 dτ δ_ε(τ) = f(t2) ∫_{−ε/2}^0 dτ (1/ε) = ½f(t2). Finally, on taking the limit τ → 0 and considering that only terms of order τ contribute to a^(1)(x, t) = lim_{τ→0} (1/τ)〈y(t + τ) − y(t)〉|_{y(t)=x}, we get

a^(1)(x, t) = A(x, t) + D B(x, t) ∂B(x, t)/∂x (6.78)

Note that the terms not written down in (6.74) vanish in the limit τ → 0, because integrals not containing the Langevin force are O(τⁿ), where n is the number of integrals, whereas integrals containing the Langevin force either vanish, if the number of integrals is odd, or are of order O(τ^{n/2})


if the number n of integrals is even. Using these arguments for the higher order coefficients, we conclude that

a^(2)(x, t) = lim_{τ→0} (1/τ) ∫_t^{t+τ} dt′ ∫_t^{t+τ} dt″ B(x, t′)B(x, t″) 2Dδ(t′ − t″) = 2DB²(x, t) (6.79)

whereas all the coefficients

a^(n) = 0 ∀ n ≥ 3. (6.80)

One can see that the term a^(1) contains, in addition to the deterministic drift A(x, t), a term which is called the spurious or noise-induced drift

a^(1)_spurious(x, t) = D B(x, t) ∂B(x, t)/∂x (6.81)

This arises from the fact that during a change of Γ(t) also y(t) changes, and therefore 〈B(y(t), t)Γ(t)〉 is no longer zero. In conclusion, for Markov stochastic processes determined by the Langevin equation (6.49) with δ-correlated Gaussian-distributed Langevin forces, all KM coefficients vanish for n ≥ 3. Therefore, the probability distribution obeys the Fokker-Planck equation

∂P(y, t)/∂t = −∂/∂y {[A(y, t) + D B(y, t) ∂B(y, t)/∂y] P} + D ∂²/∂y² [B²(y, t)P] (6.82)
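The noise-induced drift can be observed directly in simulation. The sketch below (illustrative Python; the choice A = 0, B(y) = y and all parameter values are ad hoc, not from the notes) integrates dy/dt = B(y)Γ(t) with a Heun predictor-corrector step, which converges to the Stratonovich interpretation. The ensemble mean then grows as exp(Dt), exactly what the spurious drift D B B′ = Dy predicts, even though the equation has no deterministic drift term.

```python
import math, random

random.seed(1)
D, dt, steps, paths = 0.5, 0.01, 100, 5000
t_final = steps * dt  # total time = 1.0

total = 0.0
for _ in range(paths):
    y = 1.0
    for _ in range(steps):
        dW = random.gauss(0.0, math.sqrt(2 * D * dt))  # <dW^2> = 2D dt
        y_pred = y + y * dW                  # Euler predictor
        y += 0.5 * (y + y_pred) * dW         # Heun corrector -> Stratonovich
    total += y

mean = total / paths
print(mean, math.exp(D * t_final))  # ensemble mean drifts up, ~ exp(D t)
```

With a plain Euler-Maruyama step (Ito interpretation) the mean would instead stay at its initial value, since the Ito average of B(y)dW vanishes.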

6.6.1 Multivariate case

The Langevin equation for a multi-component process y has the form

dyi/dt = Ai(y, t) + Σ_k Bik(y, t)Γk(t) (6.83)

where the Γk are white noise terms with

〈Γk(t)〉 = 0
〈Γk(t)Γℓ(t′)〉 = 2Dδkℓ δ(t − t′) (6.84)

The generalizations of (6.78, 6.79, 6.80) to the multivariate case are

a^(1)_i(y, t) = Ai(y, t) + D Σ_{jk} Bjk(y, t) ∂Bik(y, t)/∂yj (6.85)

a^(2)_ij(y, t) = 2D Σ_k Bik(y, t)Bjk(y, t) (6.86)

a^(m)_{j1,...,jm}(y, t) = 0, for m ≥ 3 (6.87)

The corresponding Fokker-Planck equation is

∂P/∂t = −Σ_{i=1}^N ∂/∂yi {[Ai(y, t) + D Σ_{j,k=1}^N Bjk(y, t) ∂Bik(y, t)/∂yj] P}
+ D Σ_{ij} ∂²/∂yi∂yj {[Σ_k Bik(y, t)Bjk(y, t)] P} (6.88)

which is entirely determined by the coefficients of the Langevin equation.


6.7 Diffusion in phase space: Klein-Kramers equation

Let us consider the following generalisation of the Langevin equation (6.7), in order to account for the presence of an external potential V(x):

m d²x/dt² = −γ dx/dt − V′(x) + Γ(t) (6.89)

This is simply Newton's equation augmented by the fluctuating force, and may describe the motion in one dimension of a colloidal particle in the presence of a potential V(x). We can rewrite the second order differential equation as a pair of first order differential equations:

ẋ = v
v̇ = −V′(x)/m − (γ/m)v + (1/m)Γ(t) (6.90)

The state of the particle is described by its position x and velocity v. To make contact with Hamiltonian systems, we may regard the particle as a Hamiltonian dynamical system in which the momentum equation is augmented by a friction and a noise term.

Then, comparing with the multivariate Langevin equation (6.83), we identify Γx(t) = 0 and Γv(t) = Γ(t), as well as

Ax = v,  Bxx = 0,  Bxv = 0
Av = −(1/m)(γv + V′),  Bvx = 0,  Bvv = 1/m (6.91)

Inserting these results in the general Fokker-Planck equation (6.88) one gets

∂P/∂t = −v ∂P/∂x − ∂/∂v {[−(1/m)(γv + V′(x))] P} + (D/m²) ∂²P/∂v² (6.92)

Gathering the Hamiltonian terms and identifying D = γκBT, we finally find the famous Klein-Kramers equation

∂P/∂t = −v ∂P/∂x + (V′/m) ∂P/∂v + (γ/m)[∂/∂v v + (κBT/m) ∂²/∂v²] P (6.93)

Probability flow

It is instructive to derive the Klein-Kramers equation as the equation of motion for the probability density in phase space, along the lines of Section 4.2. The probability to find the Brownian particle in the interval x → x + dx and v → v + dv, at time t, is ρ(x, v, t)dxdv. As explained in Sec. 4.2, we may view the probability distribution as a fluid whose density at point (x, v) is given by ρ(x, v, t). Then, applying standard fluid dynamics, we find the following laws for the probability flow:

∂ρ(t)/∂t = −∂(ẋρ(t))/∂x − ∂(v̇ρ(t))/∂v
= −ρ(t)[∂ẋ/∂x + ∂v̇/∂v] − ẋ ∂ρ(t)/∂x − v̇ ∂ρ(t)/∂v (6.94)


where we let ρ(t) = ρ(x, v, t). The first term on the right hand side, which did not contribute in the case of Hamiltonian systems, now yields a contribution −γρ/m, due to the presence of dissipation. Also, the friction and noise terms provide additional contributions to v̇ ∂ρ/∂v, so the equation for the probability flow is now given by

∂ρ(t)/∂t = −L0ρ(t) − L1ρ(t) (6.95)

where the differential operators L0 and L1 are defined as

L0 = −γ/m + v ∂/∂x − (V′(x)/m) ∂/∂v − (γ/m) v ∂/∂v (6.96)

and

L1 = (1/m) Γ(t) ∂/∂v (6.97)

Since Γ(t) is a stochastic variable, the time evolution of ρ(x, v, t) will be different for each realization of Γ(t). So we are interested in the average probability

P(x, v, t) = 〈ρ(x, v, t)〉_Γ (6.98)

Let us derive P(x, v, t). This is more easily done in terms of the new probability density σ(t), defined by

ρ(t) = e^{−L0 t} σ(t) (6.99)

which satisfies the equation

∂σ/∂t = −V(t)σ(t) (6.100)

where

V(t) = e^{L0 t} L1(t) e^{−L0 t} (6.101)

The formal solution is

σ(t) = e^{−∫_0^t dt′ V(t′)} σ(0) (6.102)

We now expand the exponential:

σ(t) = [Σ_{n=0}^∞ ((−1)ⁿ/n!) (∫_0^t dt′ V(t′))ⁿ] σ(0) (6.103)

Because the noise Γ(t) has zero mean and is Gaussian, Wick's theorem applies. Only even values of n remain:

〈σ(t)〉_Γ = [Σ_{n=0}^∞ (1/(2n)!) 〈(∫_0^t dt′ V(t′))^{2n}〉_Γ] σ(0) (6.104)

and the average 〈(∫_0^t dt′ V(t′))^{2n}〉_Γ decomposes into (2n)!/(n! 2ⁿ) identical terms, each containing a product of n pairwise averages 〈∫_0^t dti V(ti) ∫_0^t dtj V(tj)〉_Γ. Thus, we have

〈σ(t)〉_Γ = [Σ_{n=0}^∞ (1/n!) (½ ∫_0^t dt2 dt1 〈V(t2)V(t1)〉_Γ)ⁿ] σ(0) (6.105)


We can now sum this series to obtain

〈σ(t)〉_Γ = exp[½ ∫_0^t dt2 dt1 〈V(t2)V(t1)〉_Γ] σ(0) (6.106)

Let us compute the integral:

½ ∫_0^t dt2 dt1 〈V(t2)V(t1)〉_Γ = (1/2m²) ∫_0^t dt2 dt1 〈Γ(t2)Γ(t1)〉_Γ e^{L0 t2} (∂/∂v) e^{−L0(t2−t1)} (∂/∂v) e^{−L0 t1}
= (D/m²) ∫_0^t dt1 e^{L0 t1} (∂²/∂v²) e^{−L0 t1} (6.107)

Taking the time derivative of (6.106) and substituting (6.107), we get

∂〈σ(t)〉/∂t = (D/m²) e^{L0 t} (∂²/∂v²) e^{−L0 t} 〈σ(t)〉 (6.108)

Going back to the original variables we finally obtain

∂〈ρ(t)〉_Γ/∂t = −L0〈ρ(t)〉_Γ + (D/m²) ∂²〈ρ(t)〉_Γ/∂v² (6.109)

The equation for the probability density P = 〈ρ〉_Γ takes the form

∂P/∂t = −v ∂P/∂x + ∂/∂v {[(γ/m)v + V′(x)/m] P} + (D/m²) ∂²P/∂v² (6.110)
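As a consistency check on this equation and the identification D = γκBT, one can integrate the underdamped Langevin equations (6.90) numerically and verify that the stationary velocity distribution satisfies equipartition, 〈v²〉 = κBT/m. A minimal Python sketch (free particle, V = 0; all units and parameter values are arbitrary choices for illustration):

```python
import math, random

random.seed(2)
m, gamma, kT = 1.0, 1.0, 1.0
D = gamma * kT              # <Gamma(t)Gamma(t')> = 2D delta(t - t')
dt, steps = 0.01, 200_000

v, sum_v2 = 0.0, 0.0
for _ in range(steps):
    dW = random.gauss(0.0, math.sqrt(2 * D * dt))
    v += (-gamma / m) * v * dt + dW / m   # Euler-Maruyama step for (6.90), V = 0
    sum_v2 += v * v

v2_avg = sum_v2 / steps
print(v2_avg)  # time average of v^2, close to kT/m = 1
```

The time average over a single long trajectory converges to the Maxwell value κBT/m, which is the stationary solution of (6.110) for the velocity marginal.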

6.7.1 The strong friction limit: Smoluchowski equation

Consider a Brownian particle moving in one dimension in a potential well V(x), and assume that the friction coefficient γ is so large that the velocity of the Brownian particle relaxes to its stationary state very rapidly. Then we can neglect time variations of the velocity (set dv/dt ≈ 0). With this assumption, the Langevin equation reduces to

ẋ = −V′(x)/γ + Γ(t)/γ (6.111)

Comparing with the univariate Langevin equation, we identify

A = −V′/γ,  B = 1/γ (6.112)

Inserting these results in the Fokker-Planck equation one gets (setting D/γ = κBT) the Smoluchowski equation

∂P/∂t = (1/γ) ∂/∂x [(V′(x) + κBT ∂/∂x) P] (6.113)

The result D/γ = κBT can be obtained by inserting the marginal Boltzmann distribution P0 ∝ exp[−V(x)/κBT] and finding the conditions for it to be a stationary solution. Let us specialize the above results to the cases of the free particle and the harmonic oscillator.


• In the absence of a potential, the Smoluchowski equation reduces to the equation for free diffusion

∂P/∂t = (κBT/γm) ∂²P/∂x² = DE ∂²P/∂x² (6.114)

with solution given by the Wiener-Levy process.

• For the overdamped harmonic oscillator, V(x) = mω²x²/2, the Fokker-Planck equation is solved by the Ornstein-Uhlenbeck process with τ = γ/mω² and D = κBT/mω²:

P(x, t) = √(mω²/(2πκBT(1 − e^{−2t/τ}))) exp[−mω²(x − x0 e^{−t/τ})²/(2κBT(1 − e^{−2t/τ}))] (6.115)

6.8 Relaxation times

The solution of the Fokker-Planck equation for the overdamped harmonic oscillator relaxes to the equilibrium state P0 ∝ exp(−½mω²x²/κBT) on the timescale τ = γ/mω². This is also called the relaxation time, and in this problem it is independent of the temperature.

It is very common in physics to find expressions for relaxation times that depend exponentially on the temperature T (Arrhenius law), which is the generic behaviour when, to establish equilibrium at low temperature, potential barriers need to be overcome. This was not the situation in the previous example, so such dependence was absent. We shall now solve a simple problem with potential barriers in the regime of low temperature and see how the exponential dependence arises. Let us consider an overdamped particle, described by the Smoluchowski equation

∂P/∂t = (1/γ)[∂/∂x V′ + (κBT/m) ∂²/∂x²] P = −∂J/∂x (6.116)

where the last equality defines the probability current J. At very low temperature, the probability to escape from the metastable minimum is very low (zero at T = 0, for a deterministic system). Therefore the flux of particles over the barrier is very slow, and we can solve the problem as if it were stationary. Then the expression for J is assumed to be independent of x, and we can integrate the differential equation for P:

−γJ = V′P + (κBT/m) ∂P/∂x (6.117)

thus obtaining (with U = mV and D = κBT/mγ)

P = e^{−U/κBT} [P(c)e^{U(c)/κBT} − (J/D) ∫_c^x dx′ e^{U(x′)/κBT}] (6.118)

where c is an arbitrary point. If we choose c well outside the barrier region, c → ∞, we have P(c) ≃ 0 and we find

J = −D P(x)e^{U(x)/κBT} / ∫_c^x dx′ e^{U(x′)/κBT} (6.119)

Since J is independent of x, we can choose x at will. We choose x = a, the metastable minimum, so that the integral in the denominator covers the entire maximum. The main contribution to the


integral comes from a small region about the maximum x = b, so we expand there U(x) ≃ Ub − ½mωb²(x − b)², with mωb² = |U″(b)|. The integration limits can be shifted to ±∞, so the resulting Gaussian integral leads to

J = D P(a) √(mωb²/2πκBT) e^{−(Ub−Ua)/κBT} (6.120)

Since the flux J leaving the well is given by the number of particles Na in the well times the escape rate, the relaxation rate is defined as

1/τ = J/Na = D √(mωb²/2πκBT) (P(a)/Na) e^{−(Ub−Ua)/κBT} (6.121)

To obtain the number of particles in the well, we integrate P(x) in a region around a, with P(x) = Ce^{−U(x)/κBT} and U(x) = Ua + ½mωa²(x − a)², where mωa² = U″(a), so P(a)/Na = √(mωa²/2πκBT). One thus finds for the relaxation time

1/τ = (ωaωb/2πγ) e^{−(Ub−Ua)/κBT} (6.122)

the typical exponential dependence on the barrier height over the temperature.
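Since (6.122) is a closed formula, the Arrhenius behaviour can be made concrete in a few lines of Python (the frequencies, friction and barrier height below are invented numbers for illustration): halving the temperature multiplies the escape time by e^{∆U/κBT}, and the log of the rate is linear in 1/T.

```python
import math

def kramers_rate(omega_a, omega_b, gamma, dU, kT):
    """Escape rate 1/tau = (omega_a*omega_b / (2*pi*gamma)) * exp(-dU/kT), eq. (6.122)."""
    return omega_a * omega_b / (2 * math.pi * gamma) * math.exp(-dU / kT)

omega_a, omega_b, gamma, dU = 1.0, 1.0, 1.0, 5.0   # dU = Ub - Ua
r1 = kramers_rate(omega_a, omega_b, gamma, dU, kT=1.0)
r2 = kramers_rate(omega_a, omega_b, gamma, dU, kT=0.5)

# Arrhenius law: log(r1/r2) = dU * (1/kT2 - 1/kT1) = 5.0 * (2 - 1)
print(math.log(r1 / r2))  # prints 5.0 (up to rounding)
```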

6.9 Fluctuation-dissipation relations for Langevin equation

In this section we derive the fluctuation-dissipation relations in systems subject to a deterministic drift −∂H[x(t)]/∂x(t) and a random force Γ(t), evolving according to the Langevin dynamics

dx(t)/dt = −∂H[x(t)]/∂x(t) + Γ(t) (6.123)

where Γ(t) is a Gaussian white noise with 〈Γ(t)Γ(t′)〉 = 2Tδ(t − t′). We define the two-time correlation function as

C(t, t′) = 〈x(t)x(t′)〉 (6.124)

and assume t > t′. In order to calculate the response of the system to a time-dependent external perturbation, we consider the following perturbed Hamiltonian:

Hh[x(t)] = H[x(t)] + h(t)x(t) (6.125)

The variation of the quantity x(t) induced by the presence of h(t) is measured by the two-time response function R(t, t′), defined as

R(t, t′) = δ〈x(t)〉/δh(t′) (6.126)

If the system is in equilibrium, the response function is not independent of the correlation, but is related to it by the celebrated fluctuation-dissipation theorem

R(t, t′) = (1/T) θ(t − t′) ∂C(t, t′)/∂t′ (6.127)


and it is also a homogeneous function of time.

Let us compute the following derivative of C(t, t′):

∂C(t, t′)/∂t = 〈[−∂H[x(t)]/∂x(t) + Γ(t)] x(t′)〉 (6.128)

and subtract the derivative with respect to the argument t′:

∂C(t, t′)/∂t − ∂C(t, t′)/∂t′ = A(t, t′) − 〈x(t)Γ(t′)〉 (6.129)

with

A(t, t′) = −〈(∂H[x(t)]/∂x(t)) x(t′)〉 + 〈(∂H[x(t′)]/∂x(t′)) x(t)〉 (6.130)

where we dropped the noise term 〈x(t′)Γ(t)〉 because of causality. At equilibrium, due to time-reversal invariance, the correlation functions satisfy

〈B(t)D(t + τ)〉 = 〈B(t + τ)D(t)〉 (6.131)

where B(t), D(t) are any two functions of x(t). Therefore, the asymmetry term A on the right hand side of (6.129) cancels. The last term can be evaluated by means of Novikov's theorem, which states that if Γ(t) is a Gaussian process and f is a function of x(t), then

〈f[x(t)]Γ(t′)〉 = ∫dt″ 〈δf[x(t)]/δΓ(t″)〉 〈Γ(t″)Γ(t′)〉 = ∫dt″ 〈δf[x(t)]/δh(t″)〉 〈Γ(t″)Γ(t′)〉 (6.132)

Setting f[x(t)] = x(t), we obtain

〈x(t)Γ(t′)〉 = ∫dt″ R(t, t″)〈Γ(t″)Γ(t′)〉 = 2TR(t, t′) (6.133)

and we find

∂C(t, t′)/∂t − ∂C(t, t′)/∂t′ = −2TR(t, t′) (6.134)

At equilibrium, due to invariance under translations in time, the two-time average depends only on the time difference, so that

∂C(t, t′)/∂t = −∂C(t, t′)/∂t′ (6.135)

hence we retrieve the fluctuation-dissipation theorem

∂C(t, t′)/∂t′ = TR(t, t′) (6.136)

and the response function has the form

R(t, t′) = (1/T) θ(t − t′) ∂C(t, t′)/∂t′ (6.137)


Proof of Novikov's theorem

Novikov's theorem states that for a multivariate Gaussian distribution with zero mean,

P(x) = √(det A/(2π)ⁿ) exp(−½ x·Ax) (6.138)

averages of the type 〈xi f(x)〉 can be obtained as

〈f(x)xi〉 = Σ_m 〈∂f/∂xm〉〈xi xm〉 (6.139)

Note that applying this result to f(x) = xj xk xℓ and using ∂xi/∂xm = δim, we get Wick's formula

〈xi xj xk xℓ〉 = 〈xi xj〉〈xk xℓ〉 + 〈xi xk〉〈xj xℓ〉 + 〈xi xℓ〉〈xj xk〉 (6.140)

We prove this theorem in three steps.

• Denote E(x) = ½ Σ_{ij} xi Aij xj. We have (using the symmetry of A)

∂E/∂xm = Σ_j Amj xj  →  xi = Σ_m (A⁻¹)im ∂E/∂xm (6.141)

• Inserting this in the definition of the average 〈xi f(x)〉, and integrating by parts:

〈xi f(x)〉 = C ∫dx xi f(x)e^{−E(x)} = C Σ_m (A⁻¹)im ∫dx f(x) (∂E/∂xm) e^{−E(x)}
= C Σ_m (A⁻¹)im ∫dx (∂f/∂xm) e^{−E(x)} = Σ_m (A⁻¹)im 〈∂f/∂xm〉 (6.142)

• Using the above result for f(x) = xj, and ∂xj/∂xm = δjm, we get 〈xi xj〉 = (A⁻¹)ij. Inserting this above completes the proof.

6.9.1 Fluctuation-dissipation violation

If the system is out of equilibrium, neither homogeneity nor the fluctuation-dissipation theorem holds, and the asymmetry term may be present. Equation (6.137) is not valid in general, and a generalised relation between response and correlation must be considered:

R(t, t′) = (1/T) θ(t − t′) X(t, t′) ∂C(t, t′)/∂t′ (6.143)

with X a function of both times t, t′. Deviations from X = 1 quantify the out-of-equilibrium dynamics.


The off-equilibrium dynamics of a simple unfrustrated system

The simplest example of a dynamical system that does not reach equilibrium is the random walk. In the continuum limit the quantity x(t) satisfies the very simple differential equation

dx(t)/dt = Γ(t) (6.144)

with Γ(t) a Gaussian noise with variance given by

〈Γ(t)Γ(t′)〉 = 2Tδ(t − t′) (6.145)

The correlation function is easily obtained from

C(t, t′) = 〈x(t)x(t′)〉 = ∫_0^t dt1 ∫_0^{t′} dt2 〈Γ(t1)Γ(t2)〉 = 2T min(t, t′) (6.146)

whereas in order to get the response function we apply a small perturbation to the Hamiltonian, H → H + ∫dt h(t)x(t), so the dynamical equation becomes

dx(t)/dt = h(t) + Γ(t) (6.147)

Averaging over the zero-mean noise we get

〈x(t)〉 = ∫_0^t dt1 h(t1) (6.148)

so

R(t, t′) = δ〈x(t)〉/δh(t′) = θ(t − t′) (6.149)

Hence the relation (6.143) is satisfied with

X(t, t′) = 1/2 (6.150)

a constant function ∀ t, t′, but different from the usual FDT result X = 1, i.e. the system never reaches equilibrium.
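The ageing correlation (6.146) behind this violation can be checked directly in simulation. The Python sketch below (minimal illustration; noise strength, times and path count are ad-hoc choices) samples discretised random walks and estimates C(t, t′) = 〈x(t)x(t′)〉, which should reproduce 2T min(t, t′) rather than a function of t − t′.

```python
import math, random

random.seed(4)
T, dt, steps, paths = 0.5, 0.01, 100, 5000
i_tp, t_p, t_f = 50, 0.5, 1.0     # intermediate time t' = 0.5, final time t = 1.0

acc = 0.0
for _ in range(paths):
    x, x_tp = 0.0, 0.0
    for i in range(1, steps + 1):
        x += random.gauss(0.0, math.sqrt(2 * T * dt))  # dx = Gamma(t) dt
        if i == i_tp:
            x_tp = x                                   # record x(t')
    acc += x * x_tp                                    # accumulate x(t) x(t')

C_est = acc / paths
print(C_est, 2 * T * min(t_f, t_p))  # estimate vs 2T min(t, t') = 0.5
```

The dependence on min(t, t′) rather than on t − t′ shows the lack of time-translation invariance, which is exactly what makes X = 1/2 instead of the equilibrium value 1.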


Chapter 7

Applications to complex systems

[This chapter follows closely the book "Neural information processing systems"¹]

Statistical mechanics deals with large systems of interacting microscopic elements. Its strategy is to avoid solving models of such systems at the microscopic level, trying instead to use the microscopic laws to derive laws describing the behaviour of suitably chosen macroscopic observables. It turns out that if we consider large systems, the macroscopic laws are usually deterministic and simple, even though the microscopic ones are stochastic and complex. Macroscopic observables normally represent a kind of average over the microscopic ones, so the more elements we average over, the smaller the fluctuations of the average. In the limit of an infinitely large system the fluctuations vanish and the averages evolve deterministically. Statistical mechanics enables this reduction from the microscopic to the macroscopic level. Here we apply statistical mechanics tools to recurrent neural networks (networks with feedback loops between the neurons), functioning as associative memories: by an appropriate choice of the synaptic weights we can ensure that the neural firing patterns, starting from a given initial state, will converge towards one of a number of patterns that we wish to store.

7.1 Neural network models

One can think of neural networks as systems of N neurons which are either firing (on) or quiet (off), so that the state of each neuron can be described by a binary variable σi ∈ {−1, 1}. Each neuron interacts with other neurons via synaptic connections, whose strengths are represented by real-valued numbers Jik, and is influenced by a field at its location

hi(t) = Σ_{k=1}^N Jik σk(t) + θi (7.1)

where θi is a threshold and/or external stimulus, characteristic of each neuron.

In the McCulloch-Pitts model time is discretized in units ∆ and each neuron evolves in time

¹by ACC Coolen, R Kuhn, P Sollich


according to the deterministic rule

σi(t + ∆) = sgn(hi(t)) (7.2)

If we wish to take into account noise in systems with McCulloch-Pitts type neurons (neurons do not always do what they are expected to do), it is convenient to assume that the thresholds are random variables, distributed symmetrically around an average value θ⋆i, all with the same variance, and to write them in a form where the average and variance are explicit:

θi(t) = θ⋆i + Tzi(t) (7.3)
〈zi(t)〉 = 0,  〈z²i(t)〉 = 1 (7.4)
T² = 〈[θi(t) − θ⋆i]²〉 (7.5)

The parameter T measures the level of noise in the system (it plays the role of temperature in magnetic systems). For T = 0 we return to our previous deterministic law; for T → ∞ the system behaves in a completely random manner. The probability to find a neuron in state σi(t + ∆) can be expressed in terms of the distribution P(z) of the independent noise variables zi(t). For symmetric noise distributions, that is P(z) = P(−z) ∀z, this probability can be written in a very compact way:

Prob[σi(t + ∆)] = g(σi(t + ∆)hi(t)/T) (7.6)

with

g(x) = ∫_{−∞}^x dz P(z) (7.8)

[We used Prob[σi(t + ∆) = 1] = Prob[hi(t) + Tzi(t) > 0] = g[hi(t)/T] and Prob[σi(t + ∆) = −1] = Prob[hi(t) + Tzi(t) < 0] = g[−hi(t)/T].]

A natural choice for the distribution P(z) of the noise variables zi is the Gaussian

P(z) = (1/√(2π)) e^{−z²/2} (7.9)

which leads to

g(x) = ½[1 + erf(x/√2)] (7.10)

where erf(x) denotes the error function. An alternative choice, which simplifies many calculations considerably, is to replace the above function g(x) by the following (qualitatively similar) one:

g(x) = ½[1 + tanh(x)] (7.11)

which corresponds to the noise distribution

P(z) = ½[1 − tanh²(z)] (7.12)

In a network of such units, updates can be carried out either in parallel (synchronously) orsequentially (one after the other). In the first case, since all noise variables are independent, the


combined probability to find state σ(t + ∆) = (σ1(t + ∆), ..., σN(t + ∆)) ∈ {−1, 1}ᴺ equals the product of the individual probabilities, so

parallel: Prob[σ(t + ∆)] = Π_{i=1}^N g(σi(t + ∆)hi(t)/T) (7.13)

In the second case we have to take into account that only one neuron changes its state at a time. The candidate is drawn at random with probability N⁻¹, resulting in

sequential: choose i randomly from {1, ..., N},
Prob[σi(t + ∆)] = g(σi(t + ∆)hi(t)/T) (7.14)

If we choose (7.11) for g, abbreviate the inverse noise level as T⁻¹ = β, and use tanh(σz) = σ tanh(z) for σ ∈ {−1, 1}, we can write the microscopic dynamics in the following form:

parallel: Prob[σ(t + ∆)] = Π_{i=1}^N ½[1 + σi(t + ∆) tanh(βhi(σ(t)))] (7.15)

sequential: choose i randomly from {1, ..., N},
Prob[σi(t + ∆)] = ½[1 + σi(t + ∆) tanh(βhi(σ(t)))] (7.16)

Note that the microscopic laws can be rewritten in a number of ways, for example via

½[1 + σ tanh(βh)] = e^{βσh}/(2 cosh(βh)) = 1/(1 + e^{−2βhσ}) (7.17)
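The equivalence of the three expressions in (7.17) is a short algebraic identity (write tanh and cosh in terms of exponentials and use σ² = 1); a quick numerical check in Python, over a few arbitrarily chosen values of β and h:

```python
import math

for beta in (0.1, 1.0, 2.5):
    for h in (-1.3, 0.0, 0.7):
        for sigma in (-1, 1):
            a = 0.5 * (1 + sigma * math.tanh(beta * h))        # first form
            b = math.exp(beta * sigma * h) / (2 * math.cosh(beta * h))  # second form
            c = 1.0 / (1.0 + math.exp(-2 * beta * h * sigma))  # third form
            assert abs(a - b) < 1e-12 and abs(a - c) < 1e-12
print("all three forms of (7.17) agree")
```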

7.2 Microscopic dynamics in probabilistic form

Next, we write the above two versions of the stochastic dynamics in terms of the evolving microscopic state probabilities

pt(σ) = Prob[σ(t) = σ] (7.18)

7.2.1 Parallel dynamics

For parallel dynamics we can set the elementary time unit to ∆ = 1. Then,

pt+1(σ) = Π_{i=1}^N ½[1 + σi tanh(βhi(σ(t)))] = Π_{i=1}^N e^{βσihi(σ(t))}/(2 cosh(βhi(σ(t)))) (7.19)

If, instead of the precise microscopic state σ(t), the probability distribution pt(σ′) over all possible values σ′ of σ(t) at time t is given, the above expression generalizes to the corresponding average over σ′:

pt+1(σ) = Σ_{σ′} W(σ, σ′)pt(σ′) (7.20)

This defines a Markov chain with transition probability matrix

W(σ, σ′) = Π_{i=1}^N e^{βσihi(σ′)}/(2 cosh(βhi(σ′))) (7.21)


which satisfies

W(σ, σ′) ∈ [0, 1],  Σ_σ W(σ, σ′) = 1 (7.22)

If we think of pt(σ) as a 2ᴺ-dimensional vector, indexed by the possible states σ of the network, then the vector pt+1 is the product of the 2ᴺ × 2ᴺ transition matrix W with the vector pt.

7.2.2 Sequential dynamics

In the case of sequential dynamics the stochasticity of the dynamics lies both in the choice of the site i to be updated and in the stochastic update of the chosen neuron. If both σ(t) and the chosen site i are given, we find

pt+∆(σ) = ½[1 + σi tanh(βhi(σ(t)))] Π_{j≠i} δσj,σj(t) (7.23)

After averaging this expression over the random site i we obtain

pt+∆(σ) = (1/N) Σ_{i=1}^N ½[1 + σi tanh(βhi(σ(t)))] Π_{j≠i} δσj,σj(t) (7.24)

If instead of the precise microscopic state σ(t) only the probability distribution pt(σ′) is given, this expression is again to be averaged over the possible states σ′ at time t:

pt+∆(σ) = Σ_{σ′} W(σ, σ′)pt(σ′) (7.25)

where now

W(σ, σ′) = (1/N) Σ_{i=1}^N ½[1 + σi tanh(βhi(σ′))] Π_{j≠i} δσj,σ′j (7.26)

The transition matrix can be written in a simpler form in terms of the i-th spin-flip operator

Fiσ = (σ1, ..., σi−1, −σi, σi+1, ..., σN) (7.27)

and using the definition of the Kronecker delta δσ,σ′ = Π_{i=1}^N δσi,σ′i:

W(σ, σ′) = (1/N) Σ_{i=1}^N {½[1 + σ′i tanh(βhi(σ′))]δσ,σ′ + ½[1 − σ′i tanh(βhi(σ′))]δFiσ,σ′} (7.28)

If we also define

wi(σ′) = ½[1 − σ′i tanh(βhi(σ′))] (7.29)

as the probability that neuron i, being in state σ′i at time t, is updated to the opposite state σi = −σ′i at time t + ∆, we can write the transition matrix in yet another form:

W(σ, σ′) = δσ,σ′ + (1/N) Σ_{i=1}^N [wi(σ′)δFiσ,σ′ − wi(σ′)δσ,σ′] (7.30)


Using expression (7.30) we are able to rewrite the Markov chain (7.20) as a Master equation:

pt+∆(σ) − pt(σ) = (1/N) Σ_{i=1}^N [wi(Fiσ)pt(Fiσ) − wi(σ)pt(σ)] (7.31)

In order to observe changes at a finite rate when N → ∞, we ensure that the time increment ∆ for a single sequential update has duration N⁻¹. Choosing ∆ = N⁻¹ and with the approximation

pt+1/N(σ) − pt(σ) ≈ (1/N) (d/dt)pt(σ) (7.32)

we obtain the master equation in continuous time

(d/dt)pt(σ) = Σ_{i=1}^N [wi(Fiσ)pt(Fiσ) − wi(σ)pt(σ)] (7.33)

The quantities wi(σ) now play the role of transition rates: in any short interval dt of rescaled time, the probability for neuron i to change state is wi(σ)dt.
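The flip probabilities wi can be sanity-checked on a single neuron in a constant field h, where the stationary state of the sequential dynamics is known: repeated application of rule (7.16) drives the time average of σ to tanh(βh). A minimal Python simulation (field, temperature and update count invented for illustration):

```python
import math, random

random.seed(6)
beta, h = 1.0, 0.5
sigma, total, updates = 1, 0, 200_000

for _ in range(updates):
    # flip with probability w = (1 - sigma*tanh(beta*h))/2, eq. (7.29)
    if random.random() < 0.5 * (1 - sigma * math.tanh(beta * h)):
        sigma = -sigma
    total += sigma

m_emp = total / updates
print(m_emp, math.tanh(beta * h))  # empirical <sigma> vs tanh(beta h)
```

The balance condition P(+)w(+→−) = P(−)w(−→+) for these rates gives P(+)/P(−) = (1 + tanh βh)/(1 − tanh βh), i.e. 〈σ〉 = tanh(βh), which the simulation reproduces.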

7.3 Macroscopic dynamics in probabilistic form

In non-equilibrium statistical mechanics one aims to derive, from the microscopic stochastic laws for the evolution of the system's configuration σ, dynamical equations for the probability distribution Pt(Ω) of a suitably small set of macroscopic quantities Ω(σ) = (Ω1(σ), ..., Ωn(σ)). In many cases one finds, as a further simplification, that the macroscopic dynamics becomes deterministic for N → ∞. It turns out that differential equations for the probability distribution of suitably defined macroscopic state variables can be derived if the interaction matrix has a suitable structure. There will be a restriction on the number n of macroscopic state variables if we require the evolution of the variables Ω to obey a closed set of deterministic laws.

7.3.1 Sequential dynamics

We begin by looking at the macroscopic dynamics for sequential updates.

Toy model

We illustrate the basic ideas with the help of a simple toy model, defined by

Jij = (J/N)ηiξj,  θi = 0 (7.34)

The local fields become hi(σ) = Jηi m(σ) with m(σ) = N⁻¹ Σ_k ξkσk. Since they depend on σ only through m, the latter quantity is a natural candidate observable for a macroscopic level of description. The probability of finding the value m(σ) = m at time t is given by

Pt(m) = Σ_σ pt(σ)δ(m − m(σ)) (7.35)

Its time derivative is obtained from the continuous-time master equation:

∂_t P_t(m) = ∑_i ∑_σ δ(m − m(σ)) [w_i(F_iσ) p_t(F_iσ) − w_i(σ) p_t(σ)]   (7.36)


Relabeling the summation variable σ → F_iσ in the first term gives

∂_t P_t(m) = ∑_i ∑_σ w_i(σ) p_t(σ) [δ(m − m(F_iσ)) − δ(m − m(σ))]   (7.37)

Because m(F_iσ) = m(σ) − 2σ_iξ_i/N, the arguments of the two delta functions differ only by an O(N^{−1}) term, so we can make a Taylor expansion:

∂_t P_t(m) = ∑_i ∑_σ w_i(σ) p_t(σ) [ (2/N) σ_iξ_i ∂_m δ(m − m(σ)) + O(N^{−2}) ]   (7.38)

It may seem strange to make a Taylor expansion of a δ-function, which is not exactly a smooth object. The validity of this operation can be confirmed by applying the expansion to an expression of the form ∫ dm P_t(m) G(m), where G is a smooth test function. Pulling the m-derivative in front of the sum and reordering the i-dependent terms gives

∂_t P_t(m) = (∂/∂m) [ ∑_σ p_t(σ) δ(m − m(σ)) (2/N) ∑_i σ_iξ_i w_i(σ) ] + O(N^{−1})   (7.39)

and inserting (7.29)

∂_t P_t(m) = −(∂/∂m) ∑_σ p_t(σ) δ(m − m(σ)) [ (1/N) ∑_i ξ_i tanh(βh_i(σ)) − m(σ) ] + O(N^{−1})   (7.40)

The key simplification, which allows us to eliminate p_t(σ), is that the local field depends on the state σ only through m(σ), so

∂_t P_t(m) = −(∂/∂m) P_t(m) [ (1/N) ∑_i ξ_i tanh(βJmη_i) − m ] + O(N^{−1})   (7.41)

In the thermodynamic limit only the first term survives and we have a closed equation for the macroscopic probability distribution P_t(m):

∂_t P_t(m) = −(∂/∂m) [P_t(m) F(m)]   (7.42)

with

F(m) = lim_{N→∞} (1/N) ∑_i ξ_i tanh(βJmη_i) − m   (7.43)

The solution of the Liouville equation (7.42), for any function F(m), is the probability density

P_t(m) = ∫ dm_0 P_0(m_0) δ(m − m⋆(t; m_0))   (7.44)

where m⋆(t; m_0) denotes the solution of the differential equation

(d/dt) m⋆ = F(m⋆)  for the initial condition  m⋆(0) = m_0   (7.45)


For η_i = ξ_i we recover the macroscopic laws for the Hopfield model with a single pattern. In that particular case we have F(m) = tanh(βJm) − m and

dm/dt = tanh(βJm) − m   (7.46)

For t → ∞ the system converges to a point where m = tanh(βJm). This solution also provides a macroscopic characterization of the thermodynamic equilibrium of this model. We will see that the detailed balance condition for a stochastic dynamics to converge to equilibrium is satisfied when the interaction matrix is symmetric. In this case the long-time behaviour of the system can be analysed using equilibrium statistical mechanics. For non-symmetric interactions, on the other hand, detailed balance is absent and dynamical studies are the only option.

General formalism

We now generalize to less trivial choices of the interaction matrix and try to calculate the time evolution of a set of macroscopic state variables Ω(σ) = (Ω_1(σ), ..., Ω_n(σ)). The equivalents of (7.35) and (7.36) are now

P_t(Ω) = ∑_σ p_t(σ) δ(Ω − Ω(σ))   (7.47)

and

(∂/∂t) P_t(Ω) = ∑_i ∑_σ δ(Ω − Ω(σ)) [w_i(F_iσ) p_t(F_iσ) − w_i(σ) p_t(σ)]   (7.48)

The difference between the arguments of the delta functions is now Ω_µ(F_iσ) − Ω_µ(σ) = ∆_{iµ}(σ) (this represents the sensitivity of the macroscopic observables to a single neuron changing state), which is no longer guaranteed to be O(N^{−1}), so we cannot truncate the expansion at linear order:

∂_t P_t(Ω) = ∑_{ℓ≥1} ((−1)^ℓ/ℓ!) ∑_{µ_1=1}^n ⋯ ∑_{µ_ℓ=1}^n (∂^ℓ/∂Ω_{µ_1}⋯∂Ω_{µ_ℓ}) [ P_t(Ω) F^{(ℓ)}_{µ_1,...,µ_ℓ}(Ω; t) ]   (7.49)

with

F (ℓ)µ1,...,µℓ

(Ω; t) = 〈N∑

j=1

wj(σ)∆jµ1(σ)...∆jµℓ(σ)〉Ω;t (7.50)

defined in terms of the conditional (or sub-shell) average

〈...〉Ω;t =

σpt(σ)...δ(Ω−Ω(σ))

σpt(σ)δ(Ω−Ω(σ))

(7.51)

As in the previous toy model, the Kramers-Moyal expansion (7.49) is to be interpreted in a distributional sense, that is, only to be used in expressions of the form ∫ dΩ P_t(Ω) G(Ω), with a sufficiently smooth function G(Ω). The first (ℓ = 1) term gives the Liouville equation, which describes deterministic flow in Ω space, driven by the flow field F^{(1)}. Including the second term (ℓ = 2) gives the Fokker-Planck equation, which in addition to the flow also describes diffusion of the macroscopic probability density P_t(Ω), generated by the diffusion matrix F^{(2)}_{µν}. A sufficient condition for the observables Ω(σ) to evolve deterministically in the limit N → ∞ can be derived by using the distributional interpretation of the Kramers-Moyal expansion and making the derivatives ∂/∂Ω_µ act on the function G alone, via integration by parts; since the latter are (uniformly) bounded by assumption, and |w_j(σ)| < 1, the condition for the ℓ ≥ 2 terms in the expansion to be suppressed as N → ∞ is

lim_{N→∞} ∑_{ℓ≥2} (1/ℓ!) ∑_{µ_1=1}^n ⋯ ∑_{µ_ℓ=1}^n ∑_{j=1}^N w_j ⟨ |∆_{jµ_1}(σ) ⋯ ∆_{jµ_ℓ}(σ)| ⟩_{Ω;t} = 0   (7.52)

If for a given set of macroscopic observables this condition is satisfied (this will impose a restriction on the number n), we can for large N describe the evolution of the macroscopic probability distribution by the Liouville equation

∂_t P_t(Ω) = −∑_{µ=1}^n (∂/∂Ω_µ) [ P_t(Ω) F^{(1)}_µ(Ω) ]   (7.53)

the solution of which describes deterministic flow:

P_t(Ω) = ∫ dΩ_0 P_0(Ω_0) δ(Ω − Ω⋆(t; Ω_0)),   (d/dt) Ω⋆ = F^{(1)}(Ω⋆(t; Ω_0); t),   Ω⋆(0; Ω_0) = Ω_0   (7.54)

Equations (7.54) will in general not be autonomous. A way to eliminate the explicit time dependence, and thereby turn the state variables Ω into an autonomous level of description, is to choose (if possible) the macroscopic state variables Ω in such a way that there is no explicit time dependence in the flow field F^{(1)}(Ω; t), i.e. there must exist a vector field Φ(Ω) such that

lim_{N→∞} ∑_{j=1}^N w_j ∆_j(σ) = Φ(Ω(σ))   (7.55)

If, for example, one chooses the macroscopic state variables Ω_µ to depend linearly on the microscopic state variables σ, i.e.

Ω(σ) = (1/N) ∑_{j=1}^N ω_j σ_j   (7.56)

then we obtain

lim_{N→∞} ∑_{j=1}^N w_j ∆_j(σ) = lim_{N→∞} (1/N) ∑_{j=1}^N ω_j tanh(βh_j(σ)) − Ω   (7.57)

Then the only further condition for eliminating the explicit time dependence from F^{(1)} is that all local fields h_k depend (to leading order in N) on the microscopic state σ only through the macroscopic state variables Ω. Since the local fields depend linearly on σ, this in turn implies that the interaction matrix must be separable, i.e. J_ij = Q(ξ_i, ξ_j) = N^{−1} ξ_i·A·ξ_j. The choice A_{µν} = δ_{µν} corresponds to the Hopfield model with simultaneous storage of several patterns ξ^µ = (ξ^µ_1, ..., ξ^µ_N), µ = 1, ..., p. Taking then ω^µ_j = ξ^µ_j, the observable Ω_µ becomes the overlap between the µ-th stored pattern and the state of the system. The fixed-point equations can then be written as

Ω_µ = ⟨ ξ_µ tanh(β ∑_ν ξ_ν Ω_ν) ⟩_ξ   (7.58)

where ⟨f(ξ)⟩_ξ = N^{−1} ∑_i f(ξ_i). If it is not possible to find a set of macroscopic observables that satisfies (7.55), additional assumptions or restrictions are needed.
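The fixed-point equations (7.58) can be solved numerically by damped iteration, with the site average ⟨·⟩_ξ taken empirically over N sites. Everything below (N, p, β, the random ±1 patterns, the damping factor) is an illustrative assumption, not part of the notes; starting near the first pattern, the iteration converges to a retrieval state with one large overlap and small cross-talk with the other patterns.

```python
# Sketch: damped fixed-point iteration of (7.58) for the Hopfield overlaps.
# Assumptions: N = 2000 sites, p = 3 random +-1 patterns, beta = 2.
import numpy as np

rng = np.random.default_rng(1)
N, p, beta = 2000, 3, 2.0
xi = rng.choice([-1, 1], size=(p, N))          # patterns xi^mu_i

omega = xi @ xi[0] / N                         # start near pattern 1: ~ (1, 0, 0)
for _ in range(200):
    fields = xi.T @ omega                      # sum_nu xi^nu_i Omega_nu, per site
    omega_new = xi @ np.tanh(beta * fields) / N
    omega = 0.5 * omega + 0.5 * omega_new      # damping for stable convergence

# residual of the fixed-point equations (7.58)
res = np.max(np.abs(omega - xi @ np.tanh(beta * (xi.T @ omega)) / N))
```

The large component approaches the single-pattern value solving m = tanh(βm), while the remaining overlaps stay at the O(1/√N) cross-talk level.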


7.3.2 Parallel dynamics

We now turn to the case of parallel dynamics, i.e. to the discrete-time stochastic microscopic laws (7.20), and try to obtain discrete mappings (instead of differential equations) for the evolution of the macroscopic probability distribution.

Toy model

We first see what happens in our toy model. The evolution of the macroscopic probability distribution is easily obtained as

P_{t+1}(m) = ∑_{σσ'} δ(m − m(σ)) W(σ, σ') p_t(σ') = ∫ dm' W_t(m, m') P_t(m')   (7.59)

with

W_t(m, m') = ∑_{σσ'} δ(m − m(σ)) δ(m' − m(σ')) W(σ, σ') p_t(σ') / ∑_{σ'} δ(m' − m(σ')) p_t(σ')   (7.60)

We next insert

W(σ, σ') = ∏_i e^{βh_i(σ')σ_i} / [2 cosh(βh_i(σ'))],   h_i(σ) = Jη_i m(σ)   (7.61)

and m(σ) = N^{−1} ∑_k ξ_k σ_k, and we try to describe the dynamics at the macroscopic level of m(σ). Because the fields depend on the microscopic state σ only through m(σ), p_t(σ) drops out of W_t, which thereby loses its explicit time dependence, i.e. W_t(m, m') → W(m, m'):

W(m, m') = e^{−∑_i ln cosh(βJm'η_i)} ⟨ δ[m − m(σ)] e^{βJm' ∑_i η_iσ_i} ⟩_σ   (7.62)

with ⟨⋯⟩_σ = 2^{−N} ∑_σ ⋯. Inserting the integral representation of the δ-function

δ(m − m(σ)) = (βN/2π) ∫ dk e^{iβNk[m − m(σ)]} = (βN/2π) ∫ dk e^{iβNkm − iβk∑_i ξ_iσ_i}   (7.63)

allows us to perform the spin average, giving

W(m, m') = (βN/2π) ∫ dk e^{Nψ(m,m',k)}   (7.64)

with

ψ(m, m', k) = iβkm + ⟨ ln cosh(βJηm' − iβkξ) ⟩_{η,ξ} − ⟨ ln cosh(βJηm') ⟩_η   (7.65)

where ⟨f(η)⟩_η = N^{−1} ∑_i f(η_i). Since the kernel W(m, m') is normalized by construction, i.e. ∫ dm W(m, m') = 1, we find that for N → ∞ the expectation value with respect to W(m, m') of any sufficiently smooth function f(m) will be determined only by the value m⋆(m') of m at the relevant saddle point of ψ:

∫ dm f(m) W(m, m') = ∫ dm dk f(m) e^{Nψ(m,m',k)} / ∫ dm dk e^{Nψ(m,m',k)} → f(m⋆(m'))  (N → ∞)   (7.66)

Variation of ψ with respect to k and m gives the saddle-point equations

m = ⟨ ξ tanh(βJηm' − βξk) ⟩_{η,ξ},   k = 0   (7.67)


We can now conclude from (7.66) that

lim_{N→∞} W(m, m') = δ(m − m⋆(m')),   m⋆(m') = ⟨ ξ tanh(βJηm') ⟩_{η,ξ}   (7.68)

and the macroscopic equation becomes

P_{t+1}(m) = ∫ dm' δ[ m − ⟨ ξ tanh(βJηm') ⟩_{η,ξ} ] P_t(m')   (7.69)

which describes deterministic evolution. If at time 0 we know m exactly, this will remain so for finite times, and m will evolve according to a discrete version of the sequential dynamical law (7.46):

m_{t+1} = ⟨ ξ tanh(βJηm_t) ⟩_{η,ξ}   (7.70)
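For the Hopfield case η_i = ξ_i, where (7.70) reduces to m_{t+1} = tanh(βJm_t), the deterministic map can be compared with a direct Monte Carlo simulation of the parallel dynamics (7.20). The parameters N, βJ, the initial overlap and the number of steps below are all illustrative choices.

```python
# Sketch: Monte Carlo check of the parallel map (7.70) in the Hopfield case
# eta_i = xi_i, where it reduces to m_{t+1} = tanh(betaJ * m_t).
# Assumptions: N = 20000 spins, betaJ = 2, initial overlap ~ 0.2.
import numpy as np

rng = np.random.default_rng(3)
N, betaJ, steps = 20000, 2.0, 10
xi = rng.choice([-1, 1], N)

# start with overlap m ~ 0.2: sigma_i = xi_i with probability (1 + 0.2)/2
sigma = xi * np.where(rng.random(N) < 0.6, 1, -1)
m_mc = [xi @ sigma / N]
m_det = [0.2]
for _ in range(steps):
    bh = betaJ * xi * m_mc[-1]               # beta*h_i(sigma) = betaJ*xi_i*m
    prob_up = 1.0 / (1.0 + np.exp(-2 * bh))  # P(sigma_i -> +1) under (7.20)
    sigma = np.where(rng.random(N) < prob_up, 1, -1)
    m_mc.append(xi @ sigma / N)
    m_det.append(np.tanh(betaJ * m_det[-1]))
```

For large N the stochastic overlap tracks the deterministic map to within O(1/√N) fluctuations, as the saddle-point argument predicts.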

7.3.3 General formalism

For more general sets of intensive macroscopic variables Ω(σ) = (Ω_1(σ), ..., Ω_n(σ)), an equation for

P_t(Ω) = ∑_σ p_t(σ) δ(Ω − Ω(σ))   (7.71)

is obtained as

P_{t+1}(Ω) = ∫ dΩ' W_t(Ω, Ω') P_t(Ω')   (7.72)

with

W_t(Ω, Ω') = ∑_{σσ'} δ(Ω − Ω(σ)) δ(Ω' − Ω(σ')) W(σ, σ') p_t(σ') / ∑_{σ'} δ(Ω' − Ω(σ')) p_t(σ')   (7.73)

and

W(σ, σ') = ∏_i e^{βσ_i h_i(σ')} / [2 cosh(βh_i(σ'))]   (7.74)

If the local fields h_i(σ) depend on the microscopic state σ only through the values of Ω(σ), then again W_t(Ω, Ω') → W(Ω, Ω') and

W(Ω, Ω') = (βN/2π)^n ∫ dK e^{Nψ(Ω,Ω',K)}   (7.75)

with

ψ(Ω, Ω', K) = iβK·Ω + (1/N) ln⟨ e^{β[∑_i σ_i h_i(Ω') − iNK·Ω(σ)]} ⟩_σ − (1/N) ∑_i ln cosh(βh_i(Ω'))   (7.76)

Using the normalization relation ∫ dΩ W(Ω, Ω') = 1, we can write expectation values with respect to W(Ω, Ω') of macroscopic quantities f(Ω) as

∫ dΩ f(Ω) W(Ω, Ω') = ∫ dΩ dK f(Ω) e^{Nψ(Ω,Ω',K)} / ∫ dΩ dK e^{Nψ(Ω,Ω',K)} → f(Ω⋆(Ω'))  (N → ∞)   (7.77)

For the saddle-point method to apply in determining the leading order in N of the average (7.77), we encounter a restriction on the number n of our macroscopic variables, since n determines the dimension of the integration concerned. (We must require that Gaussian fluctuations around the maximum of ψ are suppressed as N → ∞, which is only guaranteed if lim_{N→∞} n/√N = 0.) Variation of ψ with respect to K and Ω gives the saddle-point equations

Ω = ⟨ Ω(σ) e^{β[∑_i σ_i h_i(Ω') − iNK·Ω(σ)]} ⟩_σ / ⟨ e^{β[∑_i σ_i h_i(Ω') − iNK·Ω(σ)]} ⟩_σ,   K = 0   (7.78)

We can now conclude that

lim_{N→∞} W(Ω, Ω') = δ(Ω − Ω⋆(Ω')),   Ω⋆(Ω') = ⟨ Ω(σ) e^{β∑_i σ_i h_i(Ω')} ⟩_σ / ⟨ e^{β∑_i σ_i h_i(Ω')} ⟩_σ   (7.79)

and the macroscopic equation becomes

P_{t+1}(Ω) = ∫ dΩ' δ( Ω − ⟨ Ω(σ) e^{β∑_i σ_i h_i(Ω')} ⟩_σ / ⟨ e^{β∑_i σ_i h_i(Ω')} ⟩_σ ) P_t(Ω')   (7.80)

This equation again describes deterministic evolution:

Ω_{t+1} = ⟨ Ω(σ) e^{β∑_i σ_i h_i(Ω_t)} ⟩_σ / ⟨ e^{β∑_i σ_i h_i(Ω_t)} ⟩_σ   (7.81)

Exercise Consider a simple system consisting of N = 2 neurons with no self-interactions and no external inputs, so that the fields acting on neurons 1 and 2 are, respectively,

h_1(σ) = J_{12} σ_2,   h_2(σ) = J_{21} σ_1   (7.82)

where the vector σ = (σ_1, σ_2) ranges over the four possible states of the two neurons. Write down the transition probabilities W(σ, σ') for both parallel and sequential dynamics and verify that they satisfy the properties of a stochastic matrix.
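A numerical check in the spirit of the exercise, for the parallel case: the sketch below builds the 4 × 4 matrix W(σ, σ') of (7.74) for N = 2 and verifies that each column is a probability distribution (nonnegative entries summing to one). The values of J_{12}, J_{21} and β are illustrative.

```python
# Sketch: 4x4 parallel transition matrix W(sigma, sigma') of (7.74) for
# N = 2 neurons with h1 = J12*sigma2, h2 = J21*sigma1 (illustrative J, beta).
import itertools
import numpy as np

beta, J12, J21 = 1.0, 0.5, -0.3
states = list(itertools.product([-1, 1], repeat=2))

def W(s, sp):
    # product over neurons of exp(beta*sigma_i*h_i(sigma'))/[2cosh(beta*h_i(sigma'))]
    h = (J12 * sp[1], J21 * sp[0])
    out = 1.0
    for si, hi in zip(s, h):
        out *= np.exp(beta * si * hi) / (2 * np.cosh(beta * hi))
    return out

Wmat = np.array([[W(s, sp) for sp in states] for s in states])
col_sums = Wmat.sum(axis=0)   # each column should sum to one
```

Nonnegativity and unit column sums are exactly the stochastic-matrix properties the exercise asks for.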

7.4 Detailed balance

We have shown that the dynamics of a recurrent network of binary units takes the form of a Markov chain, for both parallel and sequential updates:

p_{t+1}(σ) = ∑_{σ'} W(σ, σ') p_t(σ')   (7.83)

From the theory of Markov chains we know that if the transition matrix is regular, then the Markov chain is ergodic and the probability distribution p_t(σ) of a finite system is guaranteed to converge to a unique stationary distribution p_∞(σ) for t → ∞. If, in addition, the system satisfies detailed balance, the resulting stationary state is called an equilibrium state and p_∞(σ) has the form of a Boltzmann distribution.


7.4.1 Interaction symmetry

The unique stationary distribution p_∞(σ) over the 2^N possible microscopic states σ, which is reached in the limit t → ∞, is determined by the condition

for all σ ∈ {−1, 1}^N:   p_∞(σ) = ∑_{σ'} W(σ, σ') p_∞(σ')   (7.84)

While it is reassuring that p_∞ exists, to calculate it we would have to solve this system of 2^N linear equations for the 2^N values p_∞(σ), subject to the conditions p_∞(σ) ≥ 0 and ∑_σ p_∞(σ) = 1. This seems a rather hopeless task.

Detailed balance is a special feature which greatly simplifies the calculation of the stationary distribution. For systems which satisfy detailed balance, the stationary distribution is such that

W(σ, σ') p_∞(σ') = W(σ', σ) p_∞(σ)   (7.85)

We note that any distribution which satisfies (7.85) is necessarily stationary; the converse is not true.

Next we show that for our neural network models detailed balance holds if and only if the interactions are symmetric, i.e. J_ij = J_ji for all neuron pairs (i, j) (for sequential dynamics there must also be no self-interactions). We consider the cases of parallel and sequential dynamics separately.

Parallel dynamics

For parallel dynamics, the detailed balance condition becomes

e^{β∑_{i=1}^N σ_i h_i(σ')} p_∞(σ') / ∏_{i=1}^N cosh(βh_i(σ')) = e^{β∑_{i=1}^N σ'_i h_i(σ)} p_∞(σ) / ∏_{i=1}^N cosh(βh_i(σ))   for all σ, σ'   (7.86)

with h_i(σ) = ∑_j J_{ij}σ_j + θ_i. The transition matrix indeed describes an ergodic system: since from any initial state one can reach any final state with nonzero probability, the stationary probability p_∞(σ) must be nonzero. Without loss of generality we can therefore put

p_∞(σ) = e^{β[∑_{i=1}^N θ_iσ_i + K(σ)]} ∏_{i=1}^N cosh(βh_i(σ))   (7.87)

Detailed balance then simplifies to

K(σ) − K(σ') = ∑_{ij} σ_i (J_{ij} − J_{ji}) σ'_j   (7.88)

Exercise Show that if the interactions are symmetric, this condition is satisfied by K(σ) = K (K a constant, which is determined from the normalization of p_∞) and the equilibrium distribution has the Boltzmann form

p_∞(σ) = (1/Z) e^{−βH(σ)},   Z = ∑_σ e^{−βH(σ)}   (7.89)

with the pseudo-Hamiltonian (which still depends on the noise parameter β)

H(σ) = −∑_i θ_iσ_i − (1/β) ∑_i ln cosh(βh_i(σ))   (7.90)

Conversely, if detailed balance holds, i.e. there is a function K(σ) obeying (7.88), taking an average over the 2^N states σ' of (7.88) shows that K(σ) = ⟨K(σ')⟩ for all σ, so K(σ) is again constant, and this implies symmetry of the interactions.
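The claim can also be verified numerically: with a symmetric interaction matrix, the Boltzmann distribution built from the pseudo-Hamiltonian (7.90) satisfies the detailed balance condition (7.85) for parallel dynamics to machine precision. N, J, θ and β below are illustrative choices.

```python
# Sketch: numerical check of detailed balance (7.85) for parallel dynamics
# with symmetric couplings and the pseudo-Hamiltonian (7.90).
import itertools
import numpy as np

rng = np.random.default_rng(4)
N, beta = 3, 0.8
J = rng.normal(size=(N, N)); J = (J + J.T) / 2    # symmetric interactions
theta = rng.normal(size=N)

states = [np.array(s) for s in itertools.product([-1, 1], repeat=N)]

def h(s):
    return J @ s + theta                          # local fields h_i(sigma)

def W(s, sp):
    hp = h(sp)
    return np.prod(np.exp(beta * s * hp) / (2 * np.cosh(beta * hp)))

def H(s):                                         # pseudo-Hamiltonian (7.90)
    return -theta @ s - np.sum(np.log(np.cosh(beta * h(s)))) / beta

Z = sum(np.exp(-beta * H(s)) for s in states)
p_inf = {tuple(s): np.exp(-beta * H(s)) / Z for s in states}

max_violation = max(
    abs(W(s, sp) * p_inf[tuple(sp)] - W(sp, s) * p_inf[tuple(s)])
    for s in states for sp in states
)
```

Replacing J by a non-symmetric matrix makes the violation O(1), illustrating the "only if" direction.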

Sequential dynamics without self-interaction

Now consider the case of sequential dynamics without self-interactions, i.e. J_{ii} = 0. Detailed balance is now non-trivial only for σ' = F_iσ, where (7.85) simplifies to

e^{−β(F_iσ)_i h_i(F_iσ)} p_∞(F_iσ) / cosh(βh_i(F_iσ)) = e^{−βσ_i h_i(σ)} p_∞(σ) / cosh(βh_i(σ))   for all σ and all i   (7.91)

with h_i(σ) = ∑_{j≠i} J_{ij}σ_j + θ_i. Note that h_i(σ) = h_i(F_iσ). Since the system is ergodic, all stationary probabilities p_∞(σ) must be non-zero and we can write

p_∞(σ) = e^{β[∑_{k=1}^N θ_kσ_k + (1/2)∑_{k≠ℓ} σ_k J_{kℓ} σ_ℓ + K(σ)]}   (7.92)

Exercise Show that the detailed balance condition simplifies to

K(F_iσ) − K(σ) = ∑_{ℓ≠i} σ_i (J_{ℓi} − J_{iℓ}) σ_ℓ   (7.93)

and that if the interactions are symmetric, this is satisfied by K(σ) = K (K a constant, determined from the normalization of p_∞). The stationary probability distribution then takes the Boltzmann form

p_∞(σ) = (1/Z) e^{−βH(σ)},   Z = ∑_σ e^{−βH(σ)}   (7.94)

with the Hamiltonian

H(σ) = −(1/2) ∑_{k≠ℓ} J_{kℓ}σ_kσ_ℓ − ∑_k θ_kσ_k   (7.95)

Conversely, if detailed balance holds, i.e. there is a function K(σ) obeying (7.93), then

K(F_jF_iσ) − K(F_jσ) − [K(F_iσ) − K(σ)] = −2σ_i(J_{ji} − J_{ij})σ_j   (7.96)

Since the LHS is symmetric under permutation of the pair (i, j), so must be the RHS, which implies symmetry of the interactions, i.e. J_ij = J_ji.


Chapter 8

Generating functional analysis

8.1 General

The approach followed in the previous chapter to derive closed macroscopic laws from the microscopic equations fails when the number of attractors of the dynamics is no longer small compared to the number N of microscopic neuronal variables. This is due to the presence of a number of "disorder" variables per degree of freedom which is proportional to N, over which we are forced to average the macroscopic laws. This situation is reflected in the inability to find an exact set of closed equations for a finite number of observables. The only fully exact procedure available at present is generating functional analysis, also called "path integral formalism" or "dynamic mean-field theory", which is based on a philosophy different from the one described so far. Rather than working with the probability p_t(σ) of finding a microscopic state σ at time t in order to calculate the statistics of a set of macroscopic observables Ω(σ) at time t, one here turns to the probability Prob[σ(0), ..., σ(t_m)] of finding a microscopic path σ(0) → σ(t_1) → ... → σ(t_m). The idea is to concentrate on the moment generating function Z, which, like Prob[σ(0), ..., σ(t_m)], fully captures the statistics of paths. Correlation and response functions will be the language in which the generating functional methods are formulated. We first illustrate how to calculate the generating function for a system described by one degree of freedom, and then we move on to consider systems whose state is described by a vector.

8.2 Generating function

Consider a one-dimensional random variable φ evolving according to the following Markov process:

φ(m+1) = f(φ(m)) + η(m)   (8.1)

where η(m) is a random variable generated independently at each time step from a Gaussian probability density p(η) with zero mean and variance 2T. The "temperature" T governs the size of the fluctuations. Instead of studying the evolution of a single instance of φ, one can study the evolution of the probability distribution p_m(φ). This evolution is described by

p_{m+1}(φ) = ∫ dφ' W_m(φ|φ') p_m(φ')   (8.2)


where the transition probability is given by

W_m(φ|φ') = ∫ dη p(η) δ(φ − f(φ') − η)   (8.3)

We introduce the following representation of the Dirac delta-function:

δ(φ) = ∫ (dφ̂/√(2π)) e^{iφ̂φ}   (8.4)

We integrate over the Gaussian noise η by using the identity

∫ (dx/√(2π)) e^{−(1/2)ax² − bx} = a^{−1/2} e^{b²/(2a)}   (8.5)

thus obtaining (constant prefactors will drop out of the final, normalized generating function)

W_m(φ|φ') = ∫ (dφ̂/√(2π)) exp[ iφ̂(φ − f(φ')) − T φ̂² ]   (8.6)

We study the probability that a certain path in time is taken by φ. The probability of a particular path from step m = 0 to step m = M can be written as

p(φ(M), ..., φ(1), φ(0)) = ∏_{m=0}^{M−1} W_m(φ(m+1)|φ(m)) p_0(φ(0))   (8.7)

We will denote averages over paths by ⟨⋯⟩. A convenient way to study the joint probability distribution (8.7) is via its characteristic or moment generating function

Z_M[ψ] = ⟨ exp[ ∑_{m=0}^M iψ(m)φ(m) ] ⟩ = ∫ ∏_{m=0}^M dφ(m) p(φ(M), ..., φ(0)) exp[ ∑_{m=0}^M iψ(m)φ(m) ]   (8.8)

This function, which is essentially the Fourier transform of the path probability distribution, can be used to generate all moments of the distribution. For example, the first and second moments ("magnetization" and two-time correlation) are given by

M(m) ≡ ⟨φ(m)⟩ = ∂Z_M[ψ]/i∂ψ(m) |_{ψ=0},   C(m, n) ≡ ⟨φ(m)φ(n)⟩ = ∂²Z_M[ψ]/i∂ψ(m) i∂ψ(n) |_{ψ=0}   (8.9)
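The moments (8.9) can also be estimated by sampling paths of (8.1) directly. The sketch below assumes, purely for illustration, the linear choice f(φ) = aφ with φ(0) = 0, for which the equal-time correlation is known exactly: C(m, m) = 2T(1 − a^{2m})/(1 − a²).

```python
# Sketch: Monte Carlo estimate of the moments (8.9) for the process (8.1)
# with the assumed linear drift f(phi) = a*phi and phi(0) = 0.
import numpy as np

rng = np.random.default_rng(7)
a, T, M, n_paths = 0.7, 0.5, 20, 100000

phi = np.zeros(n_paths)                    # p0 concentrated at phi(0) = 0
for _ in range(M):
    phi = a * phi + rng.normal(0.0, np.sqrt(2 * T), n_paths)

M_est = phi.mean()                         # magnetisation M(M)
C_est = np.mean(phi ** 2)                  # equal-time correlation C(M, M)
C_exact = 2 * T * (1 - a ** (2 * M)) / (1 - a ** 2)
```

The empirical magnetisation vanishes and the empirical correlation matches the geometric-series value, up to O(1/√n_paths) sampling error.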

Upon introducing the notation Dφ = ∏_{m=0}^{M−1} [dφ(m)/√(2π)] for path integration, and similarly Dφ̂ for the conjugate variables φ̂(m) introduced via (8.4), the full generating function reads

Z_M[ψ] = ∫ dφ(M) Dφ Dφ̂ exp[ S[φ, φ̂] + ∑_{m=0}^M iψ(m)φ(m) ] p_0(φ(0))   (8.10)

where in the effective action

S[φ, φ̂] = ∑_{m=0}^{M−1} iφ̂(m)[φ(m+1) − f(φ(m))] − (1/2) ∑_{m=0}^{M−1} 2T φ̂²(m)   (8.11)


only the equation of motion without noise appears; the effect of the noise is expressed solely through the noise variance 2T. When we introduce a time-dependent field θ(m) into the system, i.e.

φ(m+1) = f(φ(m)) + θ(m) + η(m)   (8.12)

we will have in S an extra term ∑_{m=0}^{M−1} iφ̂(m)θ(m). So if we differentiate the average of any quantity with respect to θ(n), we pull down a factor iφ̂(n) from the exponential. The response of M(m) to an infinitesimally small external field θ at step n is

G(m, n) ≡ ∂M(m)/∂θ(n) = ∂²Z/∂θ(n) i∂ψ(m) |_{ψ=0} = ⟨iφ̂(n)φ(m)⟩   (8.13)

Because of the normalization of the generating function, Z[0] = 1, averages involving only φ̂ are zero; for example

⟨φ̂(m)⟩ = ∂Z[0]/i∂θ(m) = 0,   ⟨φ̂(m)φ̂(n)⟩ = ∂²Z[0]/i∂θ(m) i∂θ(n) = 0   (8.14)

These rather simple identities may be helpful in reducing the complexity of problems later on.

8.3 Langevin dynamics

We now consider the dynamics of one degree of freedom φ(t), which evolves under the Hamiltonian H[φ(t)] in the presence of noise η(t). The Langevin equation is

φ̇ = −∂H[φ(t)]/∂φ(t) + η(t)   (8.15)

where η is a Gaussian white noise with properties

⟨η(t)⟩ = 0,   ⟨η(t)η(t')⟩ = 2Tδ(t − t')   (8.16)

If we want to construct a generating function for this continuous-time process, it is natural to discretise time in steps of size ∆. The evolution equation gives

φ(t+∆) = φ(t) − ∫_t^{t+∆} dt' ∂H[φ(t')]/∂φ(t') + ∫_t^{t+∆} dt' η(t')   (8.17)

At this point we have a choice in how to calculate the integral of the force f(φ(t')) = −∂H[φ(t')]/∂φ(t'). The simplest way is to apply the Ito interpretation to the Langevin equation: introduce a discrete time t = m∆ and discretize the equation as

φ(m+1) = φ(m) + ∆ f(φ(m)) + η(m)   (8.18)

where the integrated noise η(m) = ∫_{m∆}^{(m+1)∆} dt' η(t') is Gaussian with zero mean and variance

⟨η(m)η(n)⟩ = 2T∆ δ_{m,n}   (8.19)
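The Ito discretisation (8.18)-(8.19) translates directly into a simulation scheme. As an illustration (all parameters below are assumptions, not from the notes) we take the quadratic Hamiltonian H = ½mφ² of Section 8.4, so f(φ) = −mφ, and check that the stationary variance of the process approaches the equilibrium value T/m.

```python
# Sketch: Ito/Euler simulation of (8.18) with f(phi) = -m*phi and
# per-step noise variance 2*T*Delta, as in (8.19).  Illustrative parameters.
import numpy as np

rng = np.random.default_rng(5)
m, T, dt, steps, n_paths = 1.0, 0.5, 0.01, 5000, 2000

phi = np.zeros(n_paths)
for _ in range(steps):
    noise = rng.normal(0.0, np.sqrt(2 * T * dt), n_paths)
    phi = phi + dt * (-m * phi) + noise

var_est = phi.var()      # stationary variance of this Ornstein-Uhlenbeck process is T/m
```

The small O(∆) bias of the Euler scheme disappears as the step size is reduced.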


Up to some factors of ∆ this process is identical to the one in the previous section, so we can easily write down the generating function corresponding to our new process (we only rescale ψ(m) → ∆ψ(m)):

Z_M[ψ] = ∫ dφ(M) Dφ Dφ̂ exp[ ∆S[φ, φ̂] + ∆ ∑_{m=0}^M iψ(m)φ(m) ] p_0(φ(0))   (8.20)

where

S[φ, φ̂] = ∑_{m=0}^{M−1} iφ̂(m)[ (φ(m+1) − φ(m))/∆ − f(φ(m)) ] − (1/2) ∑_{m=0}^{M−1} 2T φ̂²(m)   (8.21)

At this point we can take the limit ∆ → 0 to recover a description of the original process. This replaces ∆∑_m by ∫ dt. The generating function becomes a generating functional of the function ψ(t), which can be written as

Z[ψ] = ∫ Dφ Dφ̂ exp[ S[φ, φ̂] + ∫_0^∞ dt iψ(t)φ(t) ] p_0(φ(0))   (8.22)

where the effective action is

S[φ, φ̂] = ∫_0^∞ dt iφ̂(t)[φ̇(t) − f(φ(t))] − (1/2) ∫_0^∞ dt 2T φ̂²(t)   (8.23)

and we have introduced φ̇(t) = lim_{∆→0} [φ(t+∆) − φ(t)]/∆. A rigorous definition of the continuous-time limit of the integrals over Dφ and Dφ̂ has been given by Wiener. For the time being we ignore this difficulty and treat the integrals over paths like products of many ordinary integrals. An alternative derivation of (8.22) can be obtained by starting with the (discretised version of the) probability measure on a path η(t). We assume η is a Gaussian white noise, i.e. drawn from the distribution

p[η] ∝ exp[ −(1/2) ∫ dt dt' η(t) D^{−1}(t − t') η(t') ]  with  ⟨η(t)η(t')⟩ = 2Tδ(t − t') = D(t − t')  and  ⟨η(t)⟩ = 0   (8.24)

The distribution p[η] induces a distribution P[x] of x. In order to calculate the average over the noise of a generic observable A, a function of the degree of freedom x(t), one performs

⟨A(x)⟩ = ∫ Dη p(η) A(x_η) = ∫ Dη p(η) ∫ Dx δ(x − x_η) A(x) = ∫ Dη p(η) ∫ Dx δ(E[x(t)]) |δE/δx| A(x)   (8.25)

where E[x(t)] = ∂_t x + ∂_x H − η. In the last line we used the identity

δ[f(x)] = δ(x − x_0) / |f'(x_0)|  with  f(x_0) = 0   (8.26)

On the other hand, ⟨A(x)⟩ = ∫ Dx P(x) A(x), so the noise-induced distribution of x is determined as

P(x) = ∫ Dη p(η) δ(E[x(t)]) |δE/δx|   (8.27)


In order to calculate the determinant of the Jacobian

J(t, t') = |δE(x(t))/δx(t')|

of the transformation x → E(x), we rewrite the Langevin equation in integral form:

x(t+τ) = x(t) − ∫_t^{t+τ} dt' ∂H(x(t'))/∂x + ∫_t^{t+τ} dt' η(t')   (8.28)

The integral of the force F(x(t')) = ∂H(x(t'))/∂x can be approximated in two ways as τ → 0:

∫_t^{t+τ} dt' F(x(t')) ≃ τF(x(t))  (Ito prescription),  or  τ[F(x(t)) + F(x(t+τ))]/2  (Stratonovich prescription)   (8.29)

We use the Ito prescription and discretize the equation:

x_k = x_{k−1} − τF(x_{k−1}) + W_k   (8.30)

The element J_{kl} of the Jacobian can now be found by calculating the variation of

E_k = x_k − x_{k−1} + τF(x_{k−1}) − W_k

when x_l is changed:

J_{kl} = ∂E_k/∂x_l = { 1 if k = l;  −1 + τF'(x_{k−1}) if l = k − 1;  0 otherwise }   (8.31)

J is therefore a lower-triangular matrix with 1's on the diagonal, so the determinant of the Jacobian is |J| = 1, and going back to continuous time we have

P(x) = ∫ Dη p(η) δ(E[x(t)])   (8.32)

If we use the Stratonovich definition, J ≠ 1 and the action gains a term which is not present in the Ito generating functional. As long as the noise in the Langevin equation is additive, all measurable quantities are unaffected by the choice of interpretation of the Langevin equation. In the case of multiplicative noise, i.e. for noise terms like g(φ(t))η(t), the choice whether to evaluate g(φ(t)) before changing φ due to the impact of the noise term at time t, or to approximate the noise by a smoother function, has an influence on the evolution.

8.4 A simple example: the harmonic oscillator

We consider the evolution of a system with one degree of freedom, whose dynamics, in the absence of noise, is governed by the Hamiltonian

H = (1/2) m x²   (8.33)


In the presence of noise the time evolution of the system is given by the Langevin equation

∂_t x = −mx + η(t)   (8.34)

and we assume Gaussian white noise, i.e.

⟨η(t)⟩ = 0,   ⟨η(t)η(t')⟩ = D(t − t')   (8.35)

The generating functional is given by

Z = ∫ Dx Dx̂ exp{ −(1/2) ∫ dt dt' x̂(t) D(t, t') x̂(t') + i ∫ dt [x̂(t)∂_t x(t) + x̂(t) m x(t)] }   (8.36)

Because ∫ dt x̂∂_t x = −∫ dt x∂_t x̂ (integration by parts, ignoring boundary terms), we can rewrite the first term in the square brackets in the symmetrised form

x̂ ∂_t x = (1/2) x̂ ∂_t x − (1/2) x ∂_t x̂

So if we define

φ(t) = (x̂(t), x(t))   (8.37)

the generating functional can be expressed as

Z = ∫ Dφ exp[ −(1/2) ∫ dt dt' φ(t) A(t, t') φ(t') ]   (8.38)

where

A(t, t') = ( D   −i∂_t − im ; i∂_t − im   0 )

The elements of the inverse matrix

A^{−1} = ( 0   G_R ; G_A   C )   (8.39)

will give the propagators¹ we are interested in (we first state the results and then prove them):

⟨x(t)x(t')⟩ = [A^{−1}(t, t')]_{22} = C(t, t')   (8.40)
⟨x̂(t)x̂(t')⟩ = [A^{−1}(t, t')]_{11} = 0   (8.41)
⟨x(t)x̂(t')⟩ = [A^{−1}(t, t')]_{21} = G_A(t, t')   advanced propagator   (8.42)
⟨x̂(t)x(t')⟩ = [A^{−1}(t, t')]_{12} = G_R(t, t')   retarded propagator   (8.43)

with

G_A(ω) = 1/(ω − im),   G_R(ω) = −1/(ω + im),   C(ω) = D(ω)/(ω² + m²)   (8.44)

We first prove the results stated in (8.40)-(8.43) and then we prove (8.44). For the correlator one has

⟨xx⟩ = ∫ Dx Dx̂ xx e^{−(1/2)φAφ} / ∫ Dx Dx̂ e^{−(1/2)φAφ} = −2 (∂/∂A_{22}) log ∫ Dx Dx̂ e^{−(1/2)φAφ} = −2 (∂/∂A_{22}) log (det A)^{−1/2} = (∂/∂A_{22}) log det A = (A^{−1})_{22}   (8.45)

¹We distinguish the advanced and retarded propagators. The former is zero if t is to the past of t′; the latter is zero if t is to the future of t′.


and similarly one can derive (8.41), (8.42) and (8.43).

In order to obtain C, G_A and G_R there are two ways.

First way

Go directly to Fourier space: −i∂_t → ω.

A(ω) = ( D(ω)   ω − im ; −ω − im   0 )  →  A^{−1}(ω) = (1/(ω² + m²)) ( 0   −ω + im ; ω + im   D(ω) ) = ( 0   −1/(ω + im) ; 1/(ω − im)   D(ω)/(ω² + m²) )   (8.46)

Second way

In matrix notation

I = A A^{−1} = ( D   −i∂_t − im ; i∂_t − im   0 ) ( 0   G_R ; G_A   C ) = ( (−i∂_t − im)G_A   D G_R − (i∂_t + im)C ; 0   (i∂_t − im)G_R ) = ( δ(t − t')   0 ; 0   δ(t − t') )   (8.47)

Going to Fourier space and equating the elements of the matrices, we have

(−i∂_t − im)G_A = δ(t − t')  →  (ω − im)G_A(ω) = 1   (8.48)
(i∂_t − im)G_R = δ(t − t')  →  −(ω + im)G_R(ω) = 1   (8.49)

and the first two equations of (8.44) follow. The third one is easily verified by observing that C(ω) = D(ω)/(ω² + m²) solves

D(t, t')G_R − (i∂_t + im)C(t, t') = 0  →  −D(ω)/(ω + im) + (ω − im) D(ω)/(ω² + m²) = 0   (8.50)

Going back to the time domain we have

G_A(t − t') = ∫ dω e^{iω(t−t')} G_A(ω) = ∫ dω e^{iω(t−t')}/(ω − im) = e^{−m(t−t')} θ(t − t')   (8.51)

where in the last step we performed the integration by residues (up to normalization constants), noting that the integrand has a singularity at ω = im and, for t > t', closing the integration path in the upper half-plane.

For the correlation function, the residues at ω = ±im give

C(t − t') = ∫ dω e^{iω(t−t')} D(ω)/(ω² + m²) = { e^{−m(t−t')} D(im)/(2m) for t > t';  e^{m(t−t')} D(−im)/(2m) for t' > t }   (8.52)


Now we use D(t, t') = 2Tδ(t − t'), so D(ω) = 2T, independent of ω. So for all t, t'

C(t, t') = (T/m) e^{−m|t−t'|}   (8.53)

We retrieve the fluctuation-dissipation theorem:

θ(t − t') ∂_t C = −T e^{−m|t−t'|} θ(t − t') = −T G_A   (8.54)
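The result (8.53) is easy to confirm by simulating the Langevin equation ∂_t x = −mx + η directly; the parameters and the lag τ = 1 below are illustrative choices.

```python
# Sketch: empirical check of C(t, t') = (T/m) exp(-m|t - t'|), eq. (8.53),
# via Euler simulation of dx/dt = -m*x + eta over many independent paths.
import numpy as np

rng = np.random.default_rng(6)
m, T, dt = 1.0, 0.5, 0.01
n_paths, burn, lag = 4000, 3000, 100          # lag*dt = 1.0

x = np.zeros(n_paths)
for _ in range(burn):                         # relax to stationarity
    x += dt * (-m * x) + rng.normal(0.0, np.sqrt(2 * T * dt), n_paths)
x0 = x.copy()
for _ in range(lag):                          # evolve a further time tau = lag*dt
    x += dt * (-m * x) + rng.normal(0.0, np.sqrt(2 * T * dt), n_paths)

C_est = np.mean(x0 * x)                       # empirical C(tau)
C_theory = (T / m) * np.exp(-m * lag * dt)    # prediction (8.53)
```

The empirical correlator agrees with the exponential decay of (8.53) up to sampling noise and O(dt) discretisation bias.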

8.5 Quenched disorder

So far we have looked only at one-dimensional stochastic processes. The application of the theory to multi-component processes is straightforward. We now consider a system of N microscopic particles, labelled by Roman indices i = 1, ..., N. The state of every microscopic element evolves in time according to the following Langevin equation

φ̇_i(t) = f_0(φ_i(t)) + f^J_i(φ(t)) + η_i(t)   (8.55)

where the term f^J_i contains all the interactions between the other particles and particle i. The interactions are random but fixed during the life of the system, which is why they are referred to as "quenched" disorder. The noise η does not depend on the quenched disorder and is again assumed Gaussian:

⟨η_i(t)η_j(t')⟩ = 2Tδ(t − t')δ_{ij} ≡ D^0_i(t − t')δ_{ij}   (8.56)

We define

L^0_i(φ_i(t)) = ∂_tφ_i(t) − f_0(φ_i(t)),   L^J_i = −f^J_i(φ(t))   (8.57)

The Langevin equation now reads

L^0_i + L^J_i = η_i(t)   (8.58)

and following the same steps as before, the generating functional is given by

Z[ψ] = ∏_i ∫ Dφ_i Dφ̂_i exp( ∑_i ∫ dt iψ_i(t)φ_i(t) + ∑_i ∫ dt iφ̂_i(t)[L^0_i + L^J_i] − (1/2) ∫ dt dt' ∑_i φ̂_i(t) D^0_i(t − t') φ̂_i(t') )   (8.59)

In dynamics Z[0] plays the same role as the partition function in thermodynamics. This would suggest averaging log Z over the quenched disorder rather than Z itself. However, since Z[0] = 1, Z can be safely averaged over the quenched disorder, and there is no need for replicas in dynamics; time will play the role of replicas instead. Averaging over the quenched disorder decouples the sites but couples different times:

\[
\overline{Z[\psi]} = \prod_i \int D\phi_i\, D\hat{\phi}_i\;
\exp\Big( \sum_i \int dt\; i\psi_i(t)\phi_i(t)
+ \sum_i \int dt\; i\hat{\phi}_i(t) L^0_i
- \frac{1}{2}\int dt\, dt' \sum_i \hat{\phi}_i(t)\, D^0_i(t-t')\, \hat{\phi}_i(t') \Big)
\times \overline{\exp\Big[ \sum_i \int dt\; i\hat{\phi}_i(t) L^J_i \Big]} \tag{8.60}
\]


Now the disorder is contained only in the last exponential factor and can be integrated out. This average renormalizes the coefficients of \(\hat{\phi}^2\) and \(\hat{\phi}\), giving rise to a new effective Langevin equation. Define

\[
e^{\Delta[\phi,\hat{\phi}]} \equiv \overline{ e^{\,i \sum_i \int dt\; \hat{\phi}_i(t) L^J_i} } \tag{8.61}
\]

Once the average is done, it will be possible to isolate various terms in ∆:

\[
\Delta[\phi,\hat{\phi}] = -\frac{1}{2}\int dt\, dt' \sum_i \hat{\phi}_i(t)\, D^1(t-t')\, \hat{\phi}_i(t')
+ i \sum_i \int dt\; \hat{\phi}_i(t)\, L^1 + \cdots \tag{8.62}
\]

L^1 renormalizes the disorder-independent part of the Langevin equation and D^1 renormalizes the noise correlator:

\[
\overline{Z[\psi]} = \prod_i \int D\phi_i\, D\hat{\phi}_i\;
e^{\,-\frac{1}{2}\int dt\, dt'\; \hat{\phi}(t)\,(D^0+D^1)(t-t')\,\hat{\phi}(t')
+ \int dt\; i\hat{\phi}(t)(L^0+L^1)
+ \int dt\; i\psi(t)\phi(t)} \tag{8.63}
\]

So we have transformed our original problem into the problem of solving the dynamics of a system evolving according to the effective Langevin equation

\[
L^0 + L^1 = \xi(t) \tag{8.64}
\]

with

\[
\langle \xi(t)\xi(t')\rangle = (D^0 + D^1)(t-t') \tag{8.65}
\]

In other words, the disorder is no longer present in our system, but the new noise ξ is no longer δ-correlated in time: the integration over the disorder has introduced a form of memory into the dynamics. This is a common phenomenon in statistical physics: whenever, starting from a Markovian stochastic process, we integrate over some degrees of freedom (disorder, fast variables, momenta, etc.), we end up with a new effective equation which is no longer Markovian and in which modes that were previously uncoupled become coupled.
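As a worked illustration of where D^1 comes from (the specific coupling \(f^J_i(\phi)=\sum_j J_{ij}\phi_j\), with independent Gaussian \(J_{ij}\) of zero mean and variance \(J^2/N\), is an assumption made for this example, not part of the notes), the disorder average (8.61) reduces to an elementary Gaussian integral:

```latex
\begin{aligned}
e^{\Delta[\phi,\hat{\phi}]}
&= \overline{\exp\Big(-i\sum_{ij} J_{ij}\int dt\; \hat{\phi}_i(t)\phi_j(t)\Big)}
 = \exp\Big(-\frac{J^2}{2N}\sum_{ij}\Big[\int dt\; \hat{\phi}_i(t)\phi_j(t)\Big]^2\Big)\\
&= \exp\Big(-\frac{J^2}{2}\int dt\, dt'\sum_i \hat{\phi}_i(t)\, C(t,t')\, \hat{\phi}_i(t')\Big),
\qquad C(t,t') \equiv \frac{1}{N}\sum_j \phi_j(t)\phi_j(t')
\end{aligned}
```

so here \(D^1(t,t') = J^2 C(t,t')\): the renormalized noise inherits its memory from the correlations of the system itself. (Had \(J_{ij}\) and \(J_{ji}\) been correlated, an \(L^1\) term proportional to the response function would also have appeared.)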


Appendix A

Kramers-Kronig relations

Let us first study the consequences of causality. Causality of R(t) (R(t) = 0 ∀ t < 0) allows us to define the Laplace transform

\[
\chi(z) = \int_0^\infty dt\; e^{izt} R(t) \tag{A.1}
\]

as an analytic function for any complex value of z = z₁ + iz₂ with z₂ > 0. The function χ(ω) is complex (χ(ω) = χ⋆(−ω)), but causality enables us to obtain a relation between its real and imaginary parts. We do this by means of the following trick. Integrate the function

\[
f(z) = \frac{\chi(z)}{z-\omega} \tag{A.2}
\]

over a contour C chosen so that it does not enclose any pole of f(z): C runs along the real axis, avoids the point z = ω by a small semicircle of radius r in the upper half-plane, and is closed by a large semicircle at infinity in the upper half-plane. Since χ(z) → 0 as z₂ → ∞ there will be no contribution from the semicircle at infinity. Thus

\[
\oint_C dz\, \frac{\chi(z)}{z-\omega}
= \int_{-\infty}^{\omega-r} d\omega'\, \frac{\chi(\omega')}{\omega'-\omega}
+ \int_{\omega+r}^{\infty} d\omega'\, \frac{\chi(\omega')}{\omega'-\omega}
+ ir \int_{\pi}^{0} d\phi\; e^{i\phi}\, \frac{\chi(\omega + re^{i\phi})}{\omega + re^{i\phi} - \omega} = 0 \tag{A.3}
\]

It is useful to introduce the Cauchy principal part:

\[
P\!\int_{-\infty}^{\infty} d\omega'\, \frac{\chi(\omega')}{\omega'-\omega}
= \lim_{r\to 0}\left[ \int_{-\infty}^{\omega-r} d\omega'\, \frac{\chi(\omega')}{\omega'-\omega}
+ \int_{\omega+r}^{\infty} d\omega'\, \frac{\chi(\omega')}{\omega'-\omega} \right]
= -i \lim_{r\to 0} \int_{\pi}^{0} d\phi\; \chi(\omega + re^{i\phi}) = i\pi \chi(\omega) \tag{A.4}
\]

or

\[
\chi(\omega) = \frac{1}{\pi i}\, P\!\int_{-\infty}^{\infty} d\omega'\, \frac{\chi(\omega')}{\omega'-\omega} \tag{A.5}
\]

This equation is a consequence of causality and allows us to relate the real part χ′(ω) and the imaginary part χ″(ω) of the response function. Writing χ = χ′ + iχ″ in (A.5) and separating real and imaginary parts gives

\[
\chi'(\omega) = \frac{1}{\pi}\, P\!\int_{-\infty}^{\infty} d\omega'\, \frac{\chi''(\omega')}{\omega'-\omega},
\qquad
\chi''(\omega) = -\frac{1}{\pi}\, P\!\int_{-\infty}^{\infty} d\omega'\, \frac{\chi'(\omega')}{\omega'-\omega} \tag{A.6}
\]


These relations are called the Kramers-Kronig relations; they allow one to calculate the real part of χ(ω) from the imaginary part and vice versa. As we have seen, the imaginary part can be obtained from experiments.
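As a quick numerical sanity check (not from the notes; the exponential response R(t) = e^{−γt}θ(t) and all parameter values are assumptions chosen for this example), take R(t) = e^{−γt}θ(t), for which (A.1) gives χ(ω) = 1/(γ − iω). The first relation in (A.6) can then be verified by evaluating the principal-value integral on a grid whose points straddle the singularity at ω′ = ω:

```python
import numpy as np

# Causal response R(t) = exp(-gamma*t) for t > 0, so chi(omega) = 1/(gamma - i*omega)
gamma = 1.0
def chi_real(w):   # chi'(omega)  = gamma / (gamma^2 + omega^2)
    return gamma / (gamma**2 + w**2)
def chi_imag(w):   # chi''(omega) = omega / (gamma^2 + omega^2)
    return w / (gamma**2 + w**2)

def kk_real_from_imag(w, cutoff=2000.0, h=1e-3):
    """Kramers-Kronig: chi'(w) = (1/pi) P int chi''(w')/(w'-w) dw'.
    The grid is offset by h/2 so the pole at w' = w falls exactly between
    two grid points; the odd part of the integrand then cancels, which is
    a simple midpoint-rule realisation of the principal value."""
    k = np.arange(-int(cutoff / h), int(cutoff / h))
    wp = w + (k + 0.5) * h
    return np.sum(chi_imag(wp) / (wp - w)) * h / np.pi

w0 = 0.7
print(kk_real_from_imag(w0), chi_real(w0))
```

The two printed numbers should agree to roughly 10⁻³, the residual error coming from the finite cutoff and the midpoint discretisation.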