· GRADIENT FLOW STRUCTURE FOR MCKEAN-VLASOV EQUATIONS ON DISCRETE SPACES MATTHIAS ERBAR, MAX...

GRADIENT FLOW STRUCTURE FOR MCKEAN-VLASOV

EQUATIONS ON DISCRETE SPACES

MATTHIAS ERBAR, MAX FATHI, VAIOS LASCHOS, AND ANDRE SCHLICHTING

Abstract. In this work, we show that a family of non-linear mean-field equations ondiscrete spaces, can be viewed as a gradient flow of a natural free energy functionalwith respect to a certain metric structure we make explicit. We also prove that thisgradient flow structure arises as the limit of the gradient flow structures of a naturalsequence of N -particle dynamics, as N goes to infinity.

1. Introduction

In this work, we are interested in the gradient flow structure of McKean-Vlasov equa-tions on a finite discrete spaces. They take the form

c(t) = c(t)Q(c(t)) (1.1)

where c(t) is a flow of probability measures on a fixed finite set X = {1, . . . , d}, andQxy(µ) is collection of parametrized transition rates, that is for each µ ∈ P(X ), Q(µ) isa Markov transition kernel.

Such non-linear equations arise naturally as the scaling limit for the evolution of theempirical measure of a system of N particles undergoing a linear Markov dynamics withmean field interaction. Here the interaction is of mean field type if the transition rateQNi;x,y for the i-th particle to jump from site x to y only depends on the empirical measureof the particles.

Mean-field systems are commonly used in physics and biology to model the evolutionof a system where the influence of all particles on a tagged particle is the average ofthe force exerted by each particle on the tagged particle. In the recent work [7], it wasshown that whenever Q satisfies suitable conditions a free energy of the form

F(µ) =∑x∈X

µx logµx +∑x∈X

µxKx(µ) (1.2)

for some appropriate potential K : P(X ) × X → R (see Definition 2.3) is a Lyapunovfunction for the evolution equation (1.1), i.e. it decreases along the flow.

In this work, we show that this monotonicity is actually a consequence of a morefundamental underlying structure. Namely, we exibit a novel geometric structure on thespace of probability measure P(X ) that allows to view the evolution equation (1.1) asthe gradient flow of the free energy F .

Date: January 29, 2016.2010 Mathematics Subject Classification. 34A34, 49J40, 49J45, 60J27.Key words and phrases. Gradient flow structure, weakly interacting particles systems, nonlinear

Markov chains, McKean-Vlasov dynamic, mean-field limit, evolutionary Gamma convergence, trans-portation metric.

1

2 MATTHIAS ERBAR, MAX FATHI, VAIOS LASCHOS, AND ANDRE SCHLICHTING

This gradient flow structure is a natural non-linear extension of the discrete gradientflow structures that were discovered in [30] and [33] in the context of linear equationsdescribing Markov chains.

Moreover, we shall show that our new gradient flow structure for the non-linear equa-tion arises as the limit of the gradient flow structures associated to a sequence of mean-field N particle Markov chains. As an application, we use the stability of gradient flowsto show convergence of these mean-field dynamics to solutions of the non-linear equation(1.1), see Theorem 3.12.

1.1. Gradient flows in spaces of probability measures. Classically, a gradient flowis an ordinary differential equation of the form

x(t) = −∇F(x(t)).

By now there exist a extensive theory, initiated by De Giorgi and his collaborators [15],giving meaning to the notion of gradient flow when the curve x takes values in a metricspace.

Examples of these generalized gradient flows are the famous results by Otto [27, 36]stating that many diffusive partial differential equations can be interpreted as gradientflows of appropriate energy functionals on the space of probability measures on Rd

equipped with the L2 Wasserstein distance. These include the Fokker–Planck and thepourous medium equations. An extensive treatment of these examples in the frameworkof De Giorgi was accomplished in [2].

Gradient flow structures allow to better understand the role of certain Lyapunovfunctionals as thermodynamic free energies. Recently, also unexpected connections ofgradient flows to large deviations have been unveiled [1] [16], [19], [23].

Since the heat equation is the PDE that governs the evolution of the Brownian mo-tion, a natural question was whether a similar structure can be uncovered for reversibleMarkov chains on discrete spaces. This question was answered positively in works ofMaas [30] and Mielke [33], which state that the evolution equations associated to re-versible Markov chains on finite spaces can be reformulated as gradient flows of theentropy (with respect to the invariant measure of the chain) for a certain metric struc-ture on the space of probability measures over the finite space. In [22], a gradient flowstructure for discrete porous medium equations was also uncovered, based on similarideas.

In Section 2, we shall highlight a gradient flow structure for (1.1), which is a naturalnon-linear generalization of the structure discovered in [30] and [33] for such non-linearMarkov processes. This structure explains why the non-linear entropies of [7] are Lya-punov functions for the non-linear ODE. Moreover, we shall show in Section 3 that thisstructure is compatible with those of [30] and [33], in the sense that it arises as a limitof gradient flow structures for N -particle systems as N goes to infinity.

1.2. Convergence of gradient flows. Gradient flows have proven to be particularlyuseful for the study of convergence of sequences of evolution equations to some limit sincethey provide a very rigid structure. Informally, the philosophy can be summarized asfollows: consider a sequence of gradient flows, each associated to some energy functionalFN and some metric structure. If the sequence FN converges in some sense to a limitF∞ and if the metric structures converge to some limiting metric, then one would expect

GRADIENT FLOW STRUCTURE FOR MCKEAN-VLASOV EQUATIONS ON DISCRETE SPACES 3

the sequence of gradient flows to converge to a limit that can be described as a gradientflow of F∞ for the asymptotic metric structure.

There are several ways of rigorously implementing out this philosophy to actuallyprove convergence in concrete situations. The one we shall be using in this work isdue to Sandier and Serfaty in [37], and was later generalized in [39]. Other methods,based on discretization schemes, have been developed in [3] and [13]. See also the recentsurvey [34] for an extension of the theory to generalized gradient systems. In the contextof diffusion equations, arguments similar to those of [39] have been used in [24] to studylarge deviations.

In the discrete setting, we can combine the framework of [30] and [33] with the methodof [39] to study scaling limits of Markov chains on discrete spaces. In this work, we shalluse this method to study scaling limits ofN -particle mean-field dynamics on finite spaces.While the convergence statement could be obtained through more classical techniques,such as those of [35, 40], our focus here is on justifying that the gradient flow structurewe present is the natural one, since it arises as the limit of the gradient flow structuresfor the N -particle systems.

While we were writing this article, we have been told by Maas and Mielke that theyhave also successfully used this technique to study the evolution of concentrations inchemical reactions. We also mention the work [26], which showed that the metric asso-ciated to the gradient flow structure for the simple random walk on the discrete torusZ/NZ converges to the Wasserstein structure on P(T), establishing compatibility ofthe discrete and continuous settings in a typical example. The technique can also beused to prove convergence of interacting particle systems on lattices, such as the simpleexclusion process (see [25]). The technique is not restricted to the evolution of proba-bility measures by Wasserstein-type gradient flows, but can be also applied for instanceto coagulation-fragmentation processes like the Becker-Doring equations, where one canproof macroscopic limits (see [38]).

1.3. Continuous mean-field particle dynamics. Let us briefly compare the scalinglimit for discrete mean-field dynamics considered in this paper with the more classi-cal analogous scaling limit for particles in the continuum decribed by McKean-Vlasovequations.N -particle mean-field dynamics describe the behavior of N particles given by some

spatial positions x1(t), .., xN (t), where each particle is allowed to interact through theempirical measure of all other particles.

In nice situations, when the number of particles goes to infinity, the empirical measureof the system 1

N

∑δxi(t) converges to some probability measure µ(t), whose evolution

is described by a McKean-Vlasov equation. In the continuous setting, with positions inRd, such an equation can for example be a PDE of the form

∂tµ(t) = ∆µ(t) + div(µ(t)(∇W ∗ µ(t)))

where ∇W ∗ µ is the convolution of µ with an interaction that derives from a potentialW . The according free energy in this case is given by

F(µ) =

{∫ dµdx (x) log dµ

dx (x) dx+ 12

∫ ∫W (x− y)µ(dx)µ(dy), µ� L,

∞, otherwise,


i.e. formally Kx(µ) = 12(W ∗ µ)(x) in (1.2). More general PDEs, involving diffusion

coefficients and confinement potentials, are also possible. We refer to [17, 40] for moreinformation on convergence of N -particle dynamics to McKean-Vlasov equations. Wealso refer to [14, 12] for the large deviations behavior. An important consequence of thisconvergence is that, for initial conditions for which the particles are exchangeable, thereis propagation of chaos: the law of two different tagged particles become independent asthe number of particles goes to infinity [40, Proposition 2.2].

It has been first noted in [9] that McKean-Vlasov equations on Rd can be viewed asgradient flows of the free energy in the space of probability measures endowed with theWasserstein metric. This fact has been useful in the study of the long-time behavior ofthese equations (cf. [9, 10, 11, 31] among others). The study of long-time behavior ofparticle systems on finite spaces has attracted recent interest (see for example [29] forthe mean-field Ising model), and we can hope that curvature estimates for such systemsmay be useful to tackle this problem, as they have been in the continuous setting. Sincelower bounds on curvature are stable, the study of curvature bounds for the mean fieldlimit (which is defined as convexity of the free energy in the metric structure for whichthe dynamic is a gradient flow, see for example [21]) can shed light on this problem. Weleave this issue for future work. We must also mention that Wasserstein distances havealso been used to quantify the convergence of mean-field systems to their scaling limit,see for example [5].

2. Gradient flow structure of mean-field systems on discrete spaces

2.1. Gradient flows in a metric setting. Let briefly recall the basic notions concern-ing gradient flows in metric spaces. For an extensive treatment we refer to [2].

Let (M,d) be a complete metric space. A curve (a, b) 3 t 7→ u(t) ∈ M is said to belocally p-absolutely continuous if there exists m ∈ Lploc((a, b)) such that

∀a ≤ s < t ≤ b : d(u(s), u(t)) ≤∫ t

sm(r) dr. (2.1)

We write for short u ∈ ACploc((a, b), (M,d). For any such curve the metric derivative is

defined by

|u′(t)| = lims→t

d(u(s), u(t))

|s− t|.

The limit exists for a.e. t ∈ (a, b) and is the smallest m in (2.1), see [2, Thm. 1.1.2].Now, let Φ : M → R be lower semicontinuous function. The metric analogue of the

modulus of the gradient of Φ is given by the following definition.

Definition 2.1 (Strong upper gradient). A function G : M → [0,∞], is a strong uppergradient for Φ if for every absolutely continuous curve u : (a, b)→M, the function G(u)is Borel and

|Φ(u(s))− Φ(u(t))| ≤∫ t

sG(u(r))|u′|(r)dr, ∀a < s ≤ t < b.


By Young’s inequality, we see that the last inequality implies that

Φ(u(s))− Φ(u(t)) ≤ 1

2

∫ t

s|u′|2(r)dr +

1

2

∫ t

sG(u(r))dr ,

for any absolutely continuous curve u provided the G is a strong upper gradient.The following definition formalizes what it means for a curve to be a gradient flow

of the function Φ in the metric space (M,d). Shortly, it is a curve that saturates theprevious inequality.

Definition 2.2 (Curve of maximal slope). A locally absolutely continuous curve u :(a, b) → M is called a curve of maximal slope for Φ with respect to its strong uppergradient G if for all a ≤ s ≤ t ≤ b we have the energy identity

Φ(u(s))− Φ(u(t)) =1

2

∫ t

s|u′|2(r)dr +

1

2

∫ t

sG(u(r))dr . (2.2)

When Φ is bounded below one has a convenient estimate on the modulus of continuityof a curve of maximal slope u. By Holder’s inequality and (2.2) we infer that for alls < t we have

d(u(s), u(t)) ≤∫ t

s|u′|(r)dr ≤

√t− s

√2

(∫ t

s|u′|2(r)dr

) 12

≤√t− s

√2(Φ(u(0)

)− Φmin

). (2.3)

2.2. Discrete setting. Let us now introduce the setting for the discrete McKean–Vlasov equations that we consider.

In the sequel, we will denote with P(X ), the space of probability measure on X , andwith P∗(X ) the set of all measures that are strictly positive, i.e.

µ ∈ P∗(X ) iff ∀x ∈ X : µx > 0,

and finally with Pa(X ), the set of all measures that have everywhere mass bigger than a,i.e.

µ ∈ Pa(X ) iff ∀x ∈ X : µx ≥ a.As in [6, 7], we shall consider equations of the form (1.1) where Q is Gibbs with somepotential function K. Here is the definition of such transition rates, taken from [7]:

Definition 2.3. Let K : P(X )× X → R be such that for each x ∈ X ,Kx : P(X )→ Ris a twice continuously differentiable function on P(X ). A family of matrices {Q(µ) ∈RX×X }µ∈P(X ) is Gibbs with potential function K, if for each µ ∈ P(X ) : Q(µ) is the ratematrix of an irreducible, reversible ergodic Markov chain with respect to the probabilitymeasure

πx(µ) =1

Z(µ)exp(−Hx(µ)), with Hx(µ) =

∂

∂µxU(µ), and U(µ) =

∑x∈X

µxKx(µ).

(2.4)In particular Q(µ) satisfies the detailed balance condition wrt. π(µ), that is for all x, y ∈X

πx(µ)Qxy(µ) = πy(µ)Qyx(µ) (2.5)


holds. Moreover, we assume that for each x, y ∈ X the map µ 7→ Qxy(µ) is Lipschitzcontinuous over P(X ).

In the above definition we say that a function Kx(·) : P(X ) → R, is (twice) contin-uously differentiable iff it can be extended in an open neighborhood of P(X ) ⊂ Rd inwhich it is (twice) continuously differentiable in the usual sense.

Remark 2.4. There are many ways of building a Markov kernel that is reversible withrespect to a given probability measure. The most widely used method is the Metropolisalgorithm, first introduced in [32]:

QMHxy (µ) = min

(πy(µ)

πx(µ), 1

)= e−(Hy(µ)−Hx(µ))+ , with (a)+ := max {0, a} . (2.6)

By this choice of the rates it is only necessary to calculate H(µ) and not the partitionsum Z(µ) in (2.4), which often is a costly computational problem.

A general scheme for obtaining rates satisfying the detailed balance condition wrt. πin (2.4) is to consider

Qxy(µ) =

√πy(µ)√πx(µ)

Axy(µ), (2.7)

where {A(µ)}µ∈P(X ) is a family of irreducible symmetric matrices. If we choose Axy(µ) =

αx,y min

(√πy(µ)√πx(µ)

,

√πx(µ)√πy(µ)

)with α ∈ {0, 1}X×X an irreducible symmetric adjacency ma-

trix, we recover the Metropolis algorithm on the corresponding graph.

We will be interested in the non-linear evolution equation

cx(t) =∑y∈X

cy(t)Qxy(c(t)) . (2.8)

By the Lipschitz assumption on Q this equation has a unique solution.One goal will be to express this evolution as the gradient flow of the associated free

energy functional F : P(X )→ R defined by

F(µ) =∑x∈X

µx logµx + U(µ), with U(µ) =∑x∈X

µxKx(µ) . (2.9)

To this end, it will be convenient to introduce the so-called Onsager operator K(µ) :RX → RX . It is defined as follows:

Let Λ : R+ ×R+ → R+, denote the logarithmic mean given by

Λ(s, t) =

∫ 1

0sαt1−αdα .

Λ is continuous, increasing in both variables, jointly concave and 1-homogeneous. Seefor example [30, 21] for more about properties of this logarithmic mean. In the sequelwe are going to use the following notation

wxy(µ) := Λ(µxQxy(µ), µyQyx(µ)) (2.10)

since this term will appear very often. By the definition of Λ and the properties of Q weget that wxy is uniformly bounded on P(X ), by a constant Cw.


Now, we can define

K(µ) :=1

2

∑x,y

wxy(µ) (ex − ey)⊗ (ex − ey) , (2.11)

where {ex}x∈X is identified with the standard basis of RX . More explicitly, we have for

ψ ∈ RX : (K(µ)ψ

)x

=∑y

wxy(µ)(ψx − ψy

).

With this in mind, we can formally rewrite the evolution (2.8) in gradient flow form:

c(t) = −K(c(t))DF(c(t)), (2.12)

where DF(µ) ∈ RX is the differential of F given by DF(µ)x = ∂µxF(µ).Finally, let us introduce the Fisher information I : P(X ) → [0,∞] defined for µ ∈

P∗(X ) by

I(µ) :=∑

(x,y)∈Eµ

wxy(µ) (log(µxQxy(µ))− log(µyQyx(µ)))2 (2.13)

where, for µ ∈ P(X ), we define the edges of possible jumps by

Eµ := {(x, y) ∈ X × X : Qxy(µ) > 0} . (2.14)

For µ ∈ P(X ) \ P∗(X ) we set I(µ) = +∞.I gives the dissipation of F along the evolution, namely, if c is a solution to (2.8) then

d

dtF(c(t)) = −I(c(t)) .

2.3. Continuity equation and action. In the sequel we shall use the notation for thediscrete gradient. Given a function ψ ∈ RX we define ∇ψ ∈ RX×X via ∇xyψ := ψy−ψxfor x, y ∈ X . We shall also use a notion of discrete divergence, given v ∈ RX×X , wedefine δv ∈ RX via (δv)x = 1

2

∑y(vxy − vyx).

Definition 2.5 (Continuity equation). Let T > 0 and µ, ν ∈ P(X ). A pair (c, v) is

called a solution to the continuity equation, for short (c, v) ∈ ~CET (µ, ν), if

(i) c ∈ C0([0, T ],P(X ));(ii) c(0) = µ; c(T ) = ν;

(iii) v : [0, T ]→ RX×X is measurable and integrable;(iv) The pair (c, v) satisfies the continuity equation for t ∈ (0, T ) in the weak form,

i.e. for all ϕ ∈ C1c ((0, T ),R) and all x ∈ X holds∫ T

0

[ϕ(t) cx(t)− ϕ(t) (δv)x(t)

]dt = 0. (2.15)

In a similar way, we shall write (c, ψ) ∈ CET (µ, ν) if (µ,w(µ)∇ψ) ∈ ~CET (µ, ν) forψ : [0, T ] → RX and (w∇ψ)xy(t) := wxy(µ(t))∇xyψ(t) defined pointwise. In the case

T = 1 we will often neglect the time index in the notation setting ~CE(µ, ν) := ~CE1(µ, ν).Also, the endpoints (µ, ν) will often be suppressed in the notation.


To define the action of a curve it will be convenient to introduce the function α :R×R+ → R+ defined by

α(v, w) :=

v2

w , w > 0

0 , v = 0 = w

+∞ , else

. (2.16)

Note that α is convex and lower semicontinuous.

Definition 2.6 (Curves of finite action). Given µ ∈ P(X ), v ∈ RX×X and ψ ∈ RX , wedefine the action of (µ, v) and (µ, ψ) via

~A(µ, v) :=1

2

∑x,y

α (vxy, wxy(µ)) , (2.17)

A(µ, ψ) := ~A(µ,w(µ)ψ) =∑x,y

(ψy − ψx)2wxy(µ) . (2.18)

Moreover, a solution to the continuity equation (µ, v) ∈ CET is called a curve of finiteaction if ∫ T

0

~A(c(t), v(t)) dt <∞ . (2.19)

It will be convenient to note that for a given solution (c, v) to the continuity equationwe can find a vector field v = w(c)∇ψ of gradient form such that (c, v) still solves thecontinuity equation and has lower action.

Proposition 2.7 (Gradient fields). Let (c, v) ∈ ~CET (µ, ν) be a curve of finite action,then there exists ψ : [0, T ]→ RX measurable such that (c, ψ) ∈ CET (µ, ν) and∫ T

0A(c(t), ψ(t)) dt ≤

∫ T

0

~A(c(t), v(t)) dt. (2.20)

Proof. Given c ∈ P(X ) we will endow RX×X with the weighted inner product

〈Ψ,Φ〉µ :=1

2

∑x,y

ΨxyΦxywxy(µ) ,

such that ~A(µ, v) = |Ψ|2µ if vxy = wxyΨxy. Denote by Ran(∇) := {∇ψ : ψ ∈ RX } ⊂RX×X the space of gradient fields. Moreover, denote by

Ker(∇∗µ) :=

{Ψ ∈ RX×X :

∑x,y

(Ψyx −Ψxy)wxy(µ) = 0

}the space of divergence free vector fields. Note that we have the orthogonal decomposi-tion

RX×X = Ker(∇∗µ)⊕⊥ Ran(∇) .

Now, given (c, v) ∈ ~CET (µ, ν), we have ~A(c(t), v(t)) < ∞ for a.e. t ∈ [0, T ]. Thus,from (2.16) we see that for a.e. t and all x, y we have that vxy(t) = 0 whenever wxy(c(t)) =


0. Hence, we can define

Ψxy(t) :=vxy(t)

wxy(c(t))for a.e. t ∈ [0, T ].

Then ψ : [0, T ] → RX can be given by setting ∇ψ(t) to be the orthogonal projectionof Ψ(t) onto Ran(∇) w.r.t. 〈·, ·〉c(t). The orthogonal decomposition above then implies

immediately that (c, w∇ψ) ∈ ~CET (µ, ν) and that |∇ψ(t)|2c(t) ≤ |Ψ(t)|2c(t) = ~A(c(t), v(t))

for a.e. t ∈ [0, T ]. This yields (2.20). �

2.4. Metric. We shall now introduce a new transportation distance on the space P(X ),which will provide the underlying geometry for the gradient flow interpretation of themean field evolution equation (1.1).

Definition 2.8 (Transportation distance). Given µ, ν ∈ P(X ), we define

W2(µ, ν) := inf

{∫ 1

0A(c(t), ψ(t)) dt : (c, ψ) ∈ CE1(µ, ν)

}. (2.21)

Remark 2.9. As a consequence of Proposition 2.7 and the fact that for any µ ∈ P(X )

and ψ ∈ RX give rise to v ∈ RX×X via vxy = wxy(µ)∇xyψ such that A(µ, ψ) = ~A(µ, v)we obtain an equivalent reformulation of the function W:

W2(µ, ν) = inf

{∫ 1

0

~A(c(t), v(t)) dt : (c, v) ∈ ~CE1(µ, ν)

}. (2.22)

It turns out that W is indeed a distance.

Proposition 2.10. The function W defined in (2.8) is a metric and the metric space(P(X ),W) is seperable and complete. Moreover, any two points µ, ν ∈ P(X ) can bejoined by a constant speed W-geodesic, i.e. there exists a curve (γt)t∈[0,1] with γ0 = µand γ1 = ν satisfying W(γs, γt) = |t− s|W(µ, ν) for all s, t ∈ [0, 1].

We defer the proof of this statement until Section 4. Let us give a characterization ofabsolutely continuous curves w.r.t. W.

Proposition 2.11. A curve c : [0, T ] → P(X ) is absolutely continuous wrt. to W iff

there exists ψ : [0, T ]× X → R such that (c, ψ) ∈ CET , and∫ T

0

√A(c(t), ψ(t)) dt <∞.

Moreover, we can choose ψ such that the metric derivative of c is given as |c′(t)| =√A(c(t), ψ(t)) for a.e. t.

Proof. The proof is identical to the one of [18, Thm. 5.17]. �

2.5. Gradient flows. In this section, we shall present the interpretation of the discreteMcKean-Vlasov equation as a gradient flow with respect to the distanceW. We will usethe abstract framework introduced in Section 2.1 above, where (M,d) = (P(X ),W) andΦ = F .

Lemma 2.12. Let I : P(X ) → [0,∞] defined in (2.13) denote the Fisher information

and let F : P(X ) → R defined in (2.9) denote the free energy. Then,√I is a strong

upper gradient for F on (P(X ),W), i.e. for (c, ψ) ∈ CET and 0 ≤ t1 < t2 ≤ T holds

|F(c(t2)−F(c(t1))| ≤∫ t2

t1

√I(c(t))

√A(c(t), ψ(t)) dt. (2.23)


Proof. Let c : (a, b) → (P(X ),W) be a W−absolutely continuous curve with ψ the as-

sociated gradient potential such that (c, ψ) ∈ CE(c(a), c(b)) and |c′|(t) =√A(c(t), ψ(t))

for a.e. t ∈ (a, b). We can assume wlog. that∫ t2t1

√A(c(t), ψ(t))

√I(c(t))dt is finite. For

the proof, we are going to define the auxiliary functions

Fδ(µ) =∑x∈X

(µx + δ) log(µx + δ) + U(µ).

The function Fδ(µ) are Lipschitz continuous and converge uniformly to F , as δ → 0. ByLemma 4.2, c is also absolutely continuous with respect to Euclidean distance. There-with, since Fδ are Lipschitz continuous, we have that Fδ(c) : (a, b) → R is absolutelycontinuous and hence

Fδ(c(t2))−Fδ(c(t1)) =

∫ t2

t1

d

dtFδ(c)(t)dt =

∫ t2

t1

DFδ(c(t))c(t) dt,

where DFδ(c(t)) is well defined for a.e. t and given in terms of

∂exFδ(c(t)) = Hx(c(t)) + log(cx(t) + δ) + 1.

Now, we have by using the Cauchy-Schwarz inequality∫ t2

t1

|DFδ(c(t))c(t)| dt ≤∫ t2

t1

∣∣∣∣∣∣12∑x,y∈X

(ψx − ψy)(∂exFδ(c(t))− ∂eyFδ(c(t)))wxy(c(t))

∣∣∣∣∣∣ dt≤∫ t2

t1

√1

2

∑x,y∈X

(∇xyψ(t))2wxy(c(t))

√1

2

∑x,y∈X

(∂exFδ(c(t))− ∂eyFδ(c(t)))2wxy(c(t)) dt

=

∫ t2

t1

√A(c(t), ψ(t))×√

1

2

∑x,y∈X

(log(cx(t) + δ) +Hx(c(t))− log(cy(t) + δ)−Hy(c(t)))2wxy(c(t)) dt.

≤∫ t2

t1

√A(c(t), ψ(t))

√2I(c(t)) + C2

H

∑x,y∈X

wxy(c(t)) dt

For the last inequality, we observe that since Hx for x ∈ X are uniformly bounded and fora < b, δ > 0 it holds b

a ≥b+δa+δ , it is easy to see that the quantity | log(cx(t)+δ)+Hx(c(t))−

log(cy(t)+δ)−Hy(c(t))| is bounded by | log(cx(t))+Hx(c(t))−log(cy(t))−Hy(c(t))|+CHwith CH only depending on H. Moreover, we observe that by definitions of Hx and πfrom (2.4), it holds

|log(µx) +Hx(µ)− log(µy)−Hy(µ)| =∣∣∣∣log

µxπx(µ)

− logµy

πy(µ)

∣∣∣∣= |log (µxQxy(µ))− log (πx(µ)Qxy(µ))− log (πy(µ)Qyx(µ))− log (µyQyx(µ))| .

Then, by the detailed balance condition (2.5) the two middle terms cancel and we arrive

at I(µ). Since, we assumed∫ t2t1

√A(c(t), ψ(t))

√I(c(t)) dt to be finite, we can apply the

dominated convergence theorem and get the conclusion. �


Proposition 2.13. For any absolutely continuous curve (c(t))t∈[0,T ] in P(X ) holds

J (c) := F(c(T ))−F(c(0)) +1

2

∫ T

0I(c(t)) dt+

1

2

∫ T

0A(c(t), ψ(t)) dt ≥ 0 (2.24)

Moreover, equality is attained if and only if (c(t))t∈[0,T ] is a solution to (1.1). In thiscase c(t) ∈ P∗(X ) for all t > 0.

In other words, solution to (1.1) are the only gradient flow curves (i.e. curves ofmaximal slope) of F .

Proof. The first statement follows as above by Young’s inequality from the fact that Iis strong upper gradient for F .

Now let us assume that for a curve c, J (c) ≤ 0 holds. Then since (2.23) holds forevery curve we can deduce that we actually have

F(c(t2))−F(c(t1)) +1

2

∫ t2

t1

I(c(t))dt+1

2

∫ t2

t1

A(c(t), ψ(t))dt = 0, 0 ≤ t1 ≤ t2 ≤ T.

Since∫ T

0 I(c(t))dt < ∞, we can find a sequence εn, converging to zero, such thatI(c(εn)) < ∞. By continuity of c, we can find a, ε > 0, such that c(t) ∈ Pa(X ), fort ∈ [εn, εn + ε]. Now, since I is Lipschitz in Pa(X ), we can apply the chain rule forεn ≤ t1 ≤ t2 ≤ Tn and get

F(c(t1))−F(c(t2)) =

∫ t2

t1

〈DF(c(t)),K(c(t))∇ψ(t)〉

=1

2

∫ t2

t1

A(c(t), ψ(t))dt+1

2

∫ t2

t1

I(c(t))dt,

by comparison we get

〈DF(c(t)),K(c(t))∇ψ(t)〉 =√A(c(t), ψ(t)) I(c(t)) = A(c(t), ψ(t)) = I(c(t)),

for t ∈ [εn, εn + ε]. From which, by an application of the inverse of the Cauchy-Schwarzinequality, we get that ψx(t)− ψy(t) = ∂exF(c(t))− ∂eyF(c(t)). Now we have

c(t) = −K(c(t))DF(c(t))

= −1

2

∑x,y

wxy(c(t)) (ex − ey)(∂exF(c(t))− ∂eyF(c(t))

)= −1

2

∑x,y

(Qxy(c(t))cx(t)−Qyx(µ)cy(t)) (ex − ey)

= −∑x

(∑y

(Qxy(c(t))cx(t)−Qyx(µ)cy(t))

)ex = c(t)Q(c(t))

(2.25)

on the interval [εn, εn + ε]. We actually have that c(t) is a solution to c(t) = c(t)Q(c(t))on [εn, T ]. Indeed, let Tn = sup{t′ ≤ T : c(t) = c(t)Q(c(t)),∀t ∈ [εn, t

′]}. We havec(Tn) ∈ Pb(X ), for some b > 0, because c is a solution to c(t) = c(t)Q(c(t)), on [εn, Tn)and the dynamics are irreducible. Now if we apply the same argument for Tn, thatwe used for εn, we can extent the solution beyond Tn. If Tn < T, then we will get a


contradiction, Therefore Tn = T. Now by sending εn to zero we get that c is a solutionto c(t) = c(t)Q(c(t)), on [0, T ].

Now on the other hand if c is a solution to c(t) = c(t)Q(c(t)) on [0, T ], we can getthat for every ε > 0, it exists a > 0, such that c(t) ∈ Pa(X ) on [ε, T ]. The choiceψ(t) = DF (t), satisfies the continuity equation (see (2.25))), and by applying the chainrule, we get that

F(c(T ))−F(c(ε)) +1

2

∫ T

εI(c(t))dt+

1

2

∫ T

εA(c(t), ψ(t))dt = 0.

Sending ε to zero concludes the proof. �

Remark 2.14. Note that the formulation above contains the usual entropy entropy-production relation for gradient flows. If c is a solution to (1.1), then ψ(t) = −DF(c(t))and especially it holds that A (c(t),−DF(c(t))) = I(c(t)) and hence (2.24) becomes

F(c(t)) +

∫ T

0I(c(t)) dt = F(µ). (2.26)

2.6. Lifted dynamics on the space of random measures. It is possible to lift theevolution c(t) = c(t)Q(c(t)) in P(X ) to an evolution for measures C on P(X ). This isconvenient, if one does not want to start from a deterministic point but consider randominitial data. The evolution is then formally given by

∂tC(t, ν) + divP(X ) (C(t, ν) (νQ(ν))) = 0, with divP(X ) =∑x∈X

∂ex . (2.27)

Notation 2.15. In the following, all quantities connected to the space P(P(X )) willbe denoted by blackboard-bold letters, like for instance random probability measures M ∈P(P(X )) or functionals F : P(P(X ))→ R.

The evolution (2.27) also has a natural gradient flow structure that is obtained bylifting the gradient flow structure of the underlying dynamics. In fact, (2.27) will turnout to be a gradient flow w.r.t. to the classical L2-Wasserstein distance on P(P(X )),which is build from the distance W on the base space P(X ). To establish this gradientstructure, we need to introduce lifted analogues of the continuity equation and the actionof a curve as well as a probabilistic representation result for the continuity equation.

Definition 2.16 (Lifted continuity equation). A pair (C,V) is called a solution to the

lifted continuity equation, for short (C,V) ∈ ~CET (M,N), if

(i) [0, T ] 3 t 7→ C(t) ∈ P(P(X )) is weakly∗ continuous,(ii) C(0) = M; C(T ) = N;

(iii) V : [0, T ]× P(X )→ RX×X is measurable and integrable w.r.t. C(t, dµ)dt,(iv) The pair (C,V) satisfies the continuity equation for t ∈ (0, T ) in the weak form,

i.e. for all ϕ ∈ C1c ((0, T )× P(X )) holds∫ T

0

∫P(X )

(ϕ(t, ν)− 〈∇ϕ(t, ν), δV(t, ν)〉)C(t, dν) dt = 0, (2.28)

where δV : P(X )→ RX×X , is given by δV(ν)x :=∑

y(Vxy(ν)− Vyx(ν)).


Here we consider P(X ) as a subset of Euclidean space RX with 〈·, ·〉 the usual innerproduct. In particular, ∇ϕ(t, µ) = (∂µxϕ(t, µ))x∈X denotes the usual gradient on RX

and we have explicitly

〈∇ϕ(t, ν), δV(t, ν)〉 =∑x

∂µxϕ(t, µ)(δV(t, µ))x =1

2

∑xy

∂µxϕ(t, µ)(Vxy(t, µ)− Vyx(t, µ)

).

Thus, (2.28) is simply the weak formulation of the classical continuity equation in RX .In a similar way, we shall write (C,) ∈ CET (M,N) if : [0, T ] × P(X ) → RX is a

function such that (C, V) ∈ ~CET (M,N) with Vxy(t, µ) = wxy(µ)∇xy(t, µ). In this casewe have that δV(µ) = K(µ)(µ), where K(µ) is the Onsager operator defined in (2.11).Solutions to (2.27) are understood as weak solutions like in Definition (2.16). That is Cis a weak solution to (2.27) if (C,∗) ∈ CET with ∗(ν) := −DF(ν). This leads, via theformal calculation

δV∗(ν) := K(ν)∗(ν) = −K(ν)DF(ν) = νQ(ν),

to the formulation: For all ϕ ∈ C1c ([0, T ]× P(X )) we have∫ T

0

∫P(X )

(ϕ(t, ν)− 〈∇ϕ(t, ν), νQ(ν)〉)C(t, dν) dt = 0 . (2.29)

By the Lipschitz assumption on Q the vectorfield νQ(ν) given by the components(νQ(ν))x =

∑y νyQxy(ν) is also Lipschitz. Then standard theory implies that equa-

tion (2.29) has a unique solution (cp. [2, Chapter 8]).

Definition 2.17 (Lifted action). Given M ∈ P(P(X )), V : P(X )→ RX and : P(X )→RX , we define the action of (M,V) and (M,) by

~A(M,V) :=

∫P(X )

~A(ν,V(ν)) M(dν) , (2.30)

A(M,) :=

∫P(X )

A(ν,(ν)) M(dν) . (2.31)

The next result tells us that is is sufficient to consider only gradient vector fields. Itis the analog of Proposition 2.7.

Proposition 2.18 (Gradient fields for Liouville equation). If (C,V) ∈ ~CET is a curve offinite action, then there exists : [0, T ]×P(X )→ RX measurable such that (C,) ∈ CETand ∫ T

0A(C(t),(t))dt ≤

∫ T

0

~A(C(t),V(t))dt. (2.32)

Proof. Given a solution (C,V) ∈ ~CET , for each t and ν ∈ P(X ) we apply the contruction

in the proof of Proposition 2.7 to V(ν) to obtain (t, ν) with A(ν,(t, ν)) ≤ ~A(ν,V(t, ν)).It is readily checked that (C,) ∈ CET . Integration against C and dt yields (2.32). �

Definition 2.19 (Lifted distance). Given M,N ∈ P(P(X )) we define

W2(M,N) := inf

{∫ 1

0A(C(t),(t)) dt : (C,) ∈ CET (M,N)

}. (2.33)


Analogously to Remark 2.9 we obtain an equivalent formulation of W:

W2(M,N) = inf

{∫ 1

0

~A(C(t),V(t)) dt : (C,V) ∈ ~CET (M,N)

}. (2.34)

The following result is a probabilistic representation via characteristics for the continuityequation. It is a variant of [2, Prop. 8.2.1] adapted to our setting.

Proposition 2.20. For a given M,N ∈ P(P(X )) let (C,) ∈ CET (M,N) be a solution ofthe continuity equation with finite action.

Then there exists a probability measure Θ on P(X )×AC([0, T ],P(X )) such that:

(1) Any (µ, c) ∈ supp Θ is a solution of the ODE

c(t) = K(c(t))(t, c(t)) for a.e. t ∈ [0, T ] ,

c(0) = µ .

(2) For any ϕ ∈ C0b (P(X )) and any t ∈ [0, T ] holds∫

P(X )ϕ(ν) C(t, dν) =

∫P(X )×AC([0,T ],P(X ))

ϕ(c(t)) Θ(dµ0, dc). (2.35)

Conversely any Θ satisfying (1) and∫P(X )×AC([0,T ],P(X ))

∫ T

0A (c(t),(t, c(t))) dt Θ(dµ, dc) <∞ (2.36)

induces a family of measures C(t) via (2.35) such that (C,) ∈ CET (M,N).

We will also use the measure Θ on ×AC([0, T ],P(X )) given by

Θ(dc) =

∫P(X )

Θ(dµ, dc) . (2.37)

Note that (2.35) then can be rewritten as C(t) = (et)#Θ, where et : AC([0, T ],P(X )) 3c 7→ c(t) ∈ P(X ) denotes the evaluation at time t.

Proof. Let (C,) ∈ CET (M,N) be a solution of the continuity equation with finite action.Define V : [0, T ] × P(X ) → RX×X via Vxy(t, ν) = wxy(ν)∇xy(t, ν) and note thatδV(t, ν) = K(ν)(t, ν). We view P(X ) as a subset of RX and δV as a time-dependentvector field on RX and note that (C, δV) is a solution to the classical continuity equationin weak form ∫ T

0

∫RX

(ϕ(t, ν)−∇ϕ(t, ν)δV(t, ν))C(t, dν) dt = 0

for all ϕ ∈ C1c

((0, T ) ×RX

). Moreover, note that for any ∈ RX we have by Jensens

inequality

|K(ν)|2 =∑x∈X

∣∣∣∣∣∣∑y∈X

wxy(ν)(x − y)

∣∣∣∣∣∣2

≤∑x,y∈X

Cwwxy(ν) |(x − y)|2 = CwA(ν,(ν)) ,

withCw := max

x,y∈Xsup

ν∈P(X )wxy(ν) = max

x,y∈Xsup

ν∈P(X )Λ (νxQxy(ν), νyQyx(ν)) .


Since Q : P(X )→ R+ is continuous, Cw is finite. This yields the integrability estimate∫ T

0

∫Rd

|δV(t, ν)|2 dC(t, ν) dt ≤ C∫ T

0A(C(t),(t)) dt <∞ .

Now, by the representation result [2, Proposition 8.2.1] for the classical continuityequation there exists a probability measure Θ on RX × AC([0, T ],RX ) such that any(µ, c) ∈ supp Θ satisfies c(0) = µ and c(t) = δV(t, c(t)) in the sense weak sense (2.15) andmoreover, (2.35) holds with P(X ) replaced by RX . Since, C(t) is supported on P(X ) wefind that Θ is actually a measure on P(X )×AC([0, T ],P(X )), where absolute continuityis understood still w.r.t. the Euclidean distance. To see that Θ is the desired measureit remains to check that for Θ-a.e. (µ, c) we have that c is a curve of finite action. Butthis follows by observing that (2.35) implies∫ ∫ T

0A(c(t),(t, c(t))) dt Θ(dµ, dc) =

∫ T

0A(C(t),(t)) dt <∞ .

This finishes the proof of the first statement.The converse, statement follows in the same way as in [2, Proposition 8.2.1]. �

Proposition 2.21 (Identification with Wasserstein distance). The distance W definedin (2.33) coincides with the L2-Wasserstein distance on P(P(X )) wrt. the distance Won P(X ). More precisely, for M,N ∈ P(P(X )) there holds

W2(M,N) = W 2W(M,N) := inf

G∈Π(M,N)

{∫P(X )×P(X )

W2(µ, ν) dG(µ, ν)

}, (2.38)

where Π(M,N) is the set of all probability measures on P(X ) × P(X ) with marginals Mand N.

Proof. We first show the inequality “≥”. For ε > 0 let (C,) be a solution to the

continuity equation such that∫ 1

0 A(C(t),(t))dt ≤ W2(M,N)+ε and let Θ be the measureon AC([0, T ],P(X )) given by the previous Proposition. Then we obtain a couplingG ∈ Π(M,N) by setting G = (e0, e1)#Θ. This yields

W 2W(M,N) ≤

∫W2(µ, ν) dG(µ, ν) =

∫W2(c(0), c(1))dΘ

≤∫ ∫ 1

0A(c(t),(t, c(t)))dtdΘ(c) =

∫ 1

0A(C(t),(t))dt ≤ W2(M,N) + ε .

Since ε was arbitrary this yields the inequality “≥”.To prove the converse inequality “≤”, fix an optimal coupling G, fix ε > 0 and choose

for G-a.e. (µ, ν) a couple (cµ,ν , vµ,ν) ∈ ~CE1(µ, ν) such that∫ 1

0~A(cµ,ν(t), vµ,ν(t))dt ≤

W(µ, ν) + ε. Now, define a family of measures C : [0, 1] → P(P(X )) and a family ofvector-valued measures V : [0, 1]→ P(P(X ); RX×X ) via

dC(t, ν) =

∫P(X )×P(X )

dδcµ,ν(t)(ν)dG(µ, ν) ,

V (t, ν) =

∫P(X )×P(X )

vµ,ν(t)dδcµ,ν(t)(ν)dG(µ, ν) .


Note that V (t)� C(t) and define V : [0, 1]×P(X )→ RX×X as the density of V w.r.t. C.

By linearity of the continuity equations have that (C,V) ∈ ~CE1(M,N). Moreover, we find∫ 1

0

~A(C(t),V(t))dt =

∫ 1

0

∫~A(cµ,ν(t), vµ,ν(t))dG(µ, ν)dt

≤∫W2(µ, ν)dG(µ, ν) + ε = W 2

W(M,N) + ε .

Since ε was arbitrary, in view of (2.34) this finishes the proof. �

Finally, we can obtain a gradient flow structure for the Liouville equation (2.27) ina straightforward manner by averaging the gradient flow structure of the underlyingdynamical system.

To this end, given M ∈ P(P(X )) define the free energy by

F(M) :=

∫P(X )

F(ν) M(dν) , (2.39)

and define the Fisher information by

I(M) := A(M,−DF) =

∫P(X )

I(ν) M(dν). (2.40)

Proposition 2.22 (Gradient flow structure for Liouville equation). The Liouville equa-

tion (2.27) is the gradient flow of F wrt. W. Moreover precisely,√I is a strong upper

gradient for F on the metric space (P(P(X )),W) and any solution to (2.27) is a curve ofmaximal slope. In other words, for any absolutely continuous curve C in P(P(X )) holds

J(C) := F(C(T ))− F(C(0)) +1

2

∫ T

0I(C(t)) dt+

1

2

∫ T

0A(C(t),(t)) dt ≥ 0 (2.41)

with (C(t),(t)) ∈ CET . Moreover, equality holds if and only if C solves (2.29).

Proof. Let Θ be the disintegration of C from Proposition 2.20 defined in (2.37). The fact

that√I is a strong upper gradient of F can be seen by integrating its defining inequality

on the underlying level (2.23) wrt. Θ

|F(C(t2))− F(C(t1))| ≤∫

AC([0,T ];P(X ))|F(c(t2))−F(c(t1))| Θ(dc)

≤∫

AC([0,T ];P(X ))

∫ t2

t1

√I(c(t))

√A(c(t), ψ(t)) dt Θ(dc).

Then, using Jensen’s inequality on the concave function (a, b) 7→√ab , we get the strong

upper gradient property for√I.

The “if” part of the last claim is easily verified from the definition. Now, assume thatJ(C) = 0. Since C is absolutely continuous, we can apply Proposition 2.20 and obtain theprobabilistic representation Θ ∈ P (AC([0, T ]× P(X ))) (2.37) such that C(t) = (et)]Θ.Then, (2.41) can be obtained by just integrating J from (2.24) along Θ and it holdsJ (c) = 0 for Θ-a.e. c ∈ AC([0, T ],P(X )). These c are by Proposition 2.13 solutionsto (2.12) and satisfy c(t) ∈ P∗(X ) for all t > 0. Then, we can conclude by the conversestatement of Proposition 2.20 that K(c(t))(t, c(t)) = −K(c(t))DF(c(t)), which implies


since c(t) ∈ P∗(X ) for t > 0 up to a constant that (t, c(t)) = −DF(c(t)) and hence Csolves (2.29). �

3. From weakly interacting particle systems to mean field systems

In this section, we will show how the gradient flow structure we described in theprevious sections arises as the limit of gradient flow structures for N -particle systemswith mean field interactions, in the limit N →∞. Moreover, we show that the empiricaldistribution of the N -particle dynamics converges to a solution of the non-linear equation(1.1).

Notation 3.1. For N an integer bold face letters are elements connected to the spaceXN and hence implicitly depending on N . Examples are vectors x,y ∈ XN , matrices

Q ∈ RXN×XN or measures µ ∈ P(XN ). For i ∈ {1, . . . , N} let ei be the placeholder for

i-th particle, such that x · ei = xi ∈ X is the position of the i-th particle. For x ∈ XNand y ∈ X we denote by xi;y the particle system obtained from x where the i-th particlejumped to site y

xi;y := x− (xi − y)ei = (x1, . . . ,xi−1, y,xi+1, . . . ,xN ).

LN (x) will denote the empirical distribution for x ∈ XN , defined by

LN (x) :=1

N

N∑i=1

δxi (3.1)

We introduce the discretized simplex PN (X ) ⊂ P(X ), given by

PN (X ) :={LN (x) : x ∈ XN

}. (3.2)

Let us introduce a natural class of mean-field dynamics for the N -particle system. Wefollow the standard procedure outlined in Remark 2.4.

In analog to Definition 2.3, we fix K : P(X ) × X → R such that for each x ∈ X ,Kx

is a twice continuously differentiable function on P(X ) and set U(µ) :=∑

x∈X µxKx(µ).

For every natural number N define the probability measure πN for x ∈ XN by

πNx :=1

ZNexp

(−NU

(LNx

)), (3.3)

and ZN :=∑x∈XN exp

(−NU

(LNx

))is the partition sum. This shall be the invariant

measure of the particle system and is already of mean-field form.To introduce the dynamic, we use a family

{AN (µ) ∈ RX×X

}µ∈PN (X )

of irreducible

symmetric matrices and define a family of rate matrices{QN (µ) ∈ RX×X

}µ∈PN (X )

for

any x ∈ XN , y ∈ X , i ∈ {1, . . . N} by

QNxi,y(LNx) :=

√πNxi;y

πNxANx,y(L

Nx) = exp(−N

2

(U(LNxi;y)− U(LNx)

))Ax,y(L

Nx).

(3.4)Finally, the actual rates of the N -particle system are given in terms of the rate matrix

QN ∈ RXN×XN : QN

x,xi;y := QNxi,y(LN (x)). (3.5)

By construction QN is irreducible and reversible wrt. the unique invariant measure πN .


Remark 3.2. The irreducible family of matrices{AN (µ)

}µ∈P(X )

encodes the under-

lying graph structure of admissible jumps and also the rates of the jumps. For in-stance, ANx,y(ν) = αx,y for any symmetric adjacency matrix α ∈ {0, 1}X×X corresponds

to Glauber dynamics on the corresponding graph. Another choice is ANx,y(LNx) :=

exp(−N

2

∣∣U(LNx)− U(LNxi;y)∣∣), which corresponds to Metropolis dynamic on the com-

plete graph. In particular, all of these examples satisfy Assumption 3.3.

Assumption 3.3. There exists a family of irreducible symmetric matrices {A(µ)}µ∈P(X )

such that µ 7→ A(µ) is Lipschitz continuous on P(X ) and the family{AN (µ)

}µ∈P(X )

of

irreducible symmetric matrices satisfies

∀x, y ∈ X : ANx,y → Ax,y on P(X ). (3.6)

Lemma 3.4. Assume{QN (µ) ∈ RX×X

}µ∈PN (X )

is given by (3.4) with AN satisfying

Assumption 3.3, then for all x, y ∈ XQNx,y → Qx,y on P(X ) (3.7)

with Qx,y(µ) =√

πy(µ)πx(µ)Ax,y(µ) with π given in (2.4). In particular, µ 7→ Qx,y(µ) is

Lipschitz continuous on P(X ) for all x, y ∈ X .

Proof. By [6, Lemma 4.1] holds for x ∈ XN with µ = LNx, y ∈ X and i ∈ {1, . . . , N}πNxi;y

πNx= N

(U(LNx)− U(LNxi;y)

)= ∂µxU(µ)− ∂µyU(µ) +O(N−1) =

πy(µ)

πx(µ),

which shows by Assumption 3.3 the convergence statement. The Lipschitz continuity fol-lows, since A is assumed Lipschitz and the function µ 7→ ∂µxU(µ) = µx+

∑y µy∂µxKy(µ)

is continuously differentiable, since K is assumed twice continuously differentiable. �

Remark 3.5. The mean-field behavior is manifested in the convergence statement (3.7).The typical example we have is mind, as presented in Section 4 of [6], is as follows: themean-field model is described by

Kx(µ) := V (x) +∑y

W (x, y)µy

where V is a potential energy and W an interaction energy between particles on sitesx and y. For the N particle system, we can use a Metropolis dynamics, where possiblejumps are those between configurations that differ by the position of a single particle,and reversible with respect to the measure

πNx = A−1 exp(−UN (x)); UN (x) :=

N∑i=1

V (xi) +1

N

N∑i=1

N∑j=1

W (xi, xj).

Note, the by the definition of the LN , we have the identity

UN (x) = NU(LNx) with U(µ) =∑x

µxKx(µ),

which makes it consistent with Definition 3.3. This is a typical class of mean-field spinsystems from statistical mechanics.


In particular, the Curie-Weiss mean-field spin model for ferromagnetism is obtainedby choosing X = {−,+}, V (−) = V (+) = W (−,−) = W (+,+) = 0 and W (−,+) =W (+,−) = β > 0. This is among the simplest models of statistical mechanics showinga phase transition in the free energy

Fβ(µ) :=∑

σ∈{−,+}

(logµσ +Kσ(µ))µσ = µ− logµ− + µ+ logµ+ + 2βµ−µ+

at β = 1. For β ≤ 1 the free energy is convex whereas for β > 1 it is non-convexon P(X ). We will investigate this phase transition on the level of curvature for themean-field system as well as for the finite particle system in future work.

3.1. Gradient flow structure of interacting N-particle systems. The N -particledynamic on XN is now defined by the rate matrixQN given as in (3.5) with the generator

LNf :=

N∑i=1

∑y∈X

(f(xi;y)− f(x))QNx,xi;y . (3.8)

Likewise the evolution of a initial density µ0 ∈ P(XN ) satisfies

c(t) = c(t)QN . (3.9)

Since by construction the rate matrix QN defined in (3.5) satisfy the detailed balancecondition w.r.t. πN (3.3), this is the generator of a reversible Markov process w.r.t. πN

on the finite space XN . Hence, we can use the framework developed in [30] and [33] toview this dynamic as a gradient flow of the relative entropy with respect to its invariantmeasure. Let us introduce the relevant quantities.

We define the relative entropy H(µ|πN ) for µ,πN ∈ P(XN ) by setting

FN (µ) := H(µ|πN ) =∑x∈XN

µx logµxπNx

. (3.10)

Futher more we define the action of a pair µ ∈ P(XN ), ψ ∈ RXN

by

AN (µ,ψ) =1

2

∑x,y

(ψy −ψx)2wNx,y(µ), (3.11)

where the weights wNx,y(µ) are defined like in (2.10) as follows

wNx,y(µ) := Λ

(µxQ

N (x,y),µyQN (y,x)

). (3.12)

Then, a distance WN on P(XN ) is given by

WN (µ,ν)2 := inf(c(t),ψ(t))

∫ 1

0AN (c(t),ψ(t))dt (3.13)

where the infimum runs over all pairs such that c is a path from µ to ν in P(XN ), andsuch that the continuity equation

cx(t) +∑y

(ψy(t)−ψx(t))wNx,y(c(t)) = 0 (3.14)

holds. For details of the construction and the proof that this defines indeed a distancewe refer to [30]. In particular, we note that for any absolutely continuous curve c :


[0, T ]→ (P(XN ),WN ) there exist a function ψ : [0, T ]→ RXN

such that the continuityequation (3.14) holds.

Finally, we define the N -particle Fisher information by

IN (µ) :=

∑

(x,y)∈Eµ

wNxy(µ)(log(µxQ

Nxy(µ))− log(µyQ

Nyx(µ)))2 µ ∈ P∗(XN )

∞. otherwise

(3.15)

We formulate the statement that (3.9) is the gradient flow of FN w.r.t. WN again interms of curves of maximal slope.

Proposition 3.6. For any absolutely continuous curve c : [0, T ] → (P(XN ),WN ) thefunction J N given by

J N (c) := FN (c(T ))−FN (c(0)) +1

2

∫ T

0IN (c(t)) dt+

1

2

∫ T

0AN (c(t),ψ(t)) dt,

(3.16)

is non-negative, where ψt is such that the continuity equation (3.14) holds. Moreover, acurve c is a solution to c(t) = c(t)QN if and only if equality holds.

Proof. The proof is exactly the same as for Proposition 2.13, so we omit it. �

3.2. Convergence of gradient flows. In this section we prove convergence of the em-pirical distribution of the N -particle system (3.9) to a solution of the non-linear equation(1.1). This will be done by using the gradient flow structure exibited in the previoussections together with the techniques developed in [39] on convergence of gradient flows.

Heuristiclly, consider a sequence of gradient flows associated to a senquence of metricspaces and engergy functionals. Then to prove convergence of the flows it is sufficientto establish convergence of the metrics and the energy functionals in the sense thatfunctionals of the type (3.16) satisfy a suitable notion of Γ− lim inf estimate.

In the following Theorem 3.8 we adapt the result in [39] to our setting.We consider the sequence of metric spaces SN := (P(XN ),WN ) with WN defined

in (3.13) and the limiting metric space S := (P(P(X )),W) with W defined in (2.33). Thefollowing notion of convergence will provide the correct topology in our setting.

Definition 3.7 (Convergence of random measures). A sequence µN ∈ P(XN ) convergesin τ topology to a point M ∈ P(P(X )) if and only if LN#(µN ) ∈ P(P(X )) converges in dis-

tribution to M, where LN : XN → P(X ) is defined in (3.1). Likewise, for(cN (t)

)t∈[0,T ]

with cNt ∈ P(XN ): cNτ→ C if for all t ∈ [0, T ] : LN] c

N (t) ⇀ C(t).

Theorem 3.8 (Convergence of gradient flows a la [39]). Assume there exists a topology τsuch that whenever a sequence cN ∈ AC([0, T ],SN ) converges pointwise wrt. τ to a limitC ∈ AC([0, T ],S), then this convergence is compatible with the energy functionals, thatis

cNτ→ C ⇒ lim inf

N→∞

1

NFN (cN (T )) ≥ F(C(T ))−F0, (3.17)

for some finite constant F0 ∈ R. In addition, assume the following holds


(1) lim inf-estimate of metric derivatives:

lim infN→∞

1

N

∫ T

0AN (cN (t),ψN (t)) dt ≥

∫ T

0A(C(t),(t)) dt, (3.18)

where (cN ,ψN ) and (C(t),(t)) are related as solutions of certain continuityequations in SN and S, respectively.

(2) lim inf-estimate of the slopes pointwise in t ∈ [0, T ]:

lim infN→∞

1

NIN (cN (t)) ≥ I(C(t)). (3.19)

Let cN be a curve of maximal slope on (0, T ) for J N (3.16) such that cN (0)τ→ C(0)

which is well-prepared in the sense that limN→∞FN (cN (0)) = F(C(0)). Then C is acurve of maximal slope for J (2.41) and

∀t ∈ [0, T ) : limN→∞

1

NFN (cN (t)) = F(C(t)) (3.20)

1

NAN (cN ,ψN )→ A(C,) in L2[0, T ] (3.21)

1

NIN (cN )→ I(C) in L2[0, T ] (3.22)

Proof. Let us sketch the proof. The assumptions (3.17), (3.18), (3.19) and the well-preparedness of the initial data allow to pass in the limit in the individual terms of1NJ N (3.16) to obtain

lim infN→∞

1

NJ N (cN ) ≥ J(C). (3.23)

Hence, if each cN is a curve of maximal slope wrt. 1NJ N (cN ) then so is C wrt. J. The

other statements also can be directly adapted from [39]. �

3.3. Application. To apply Theorem 3.8, we first have to show the convergence of theenergy (3.17).

Proposition 3.9 (lim inf inequality for the relative entropy). Let a sequence µN ∈P(XN ) be given such that µN

τ→ M as N →∞, then

limN→∞

1

NH(µN |πN ) ≥

∫P(X )

(F(ν)−F0) M(dν) = F(M)−F0, (3.24)

where

F0 := infµ∈P(X )

F(µ). (3.25)

Proof. By the definition of πN (3.3) and U(ν) (2.9) for some µ ∈ P(X ) follows

log1

πNx− logZN =

N∑i=1

Kxi(LN (x)) =

∑x∈X

Kx(LN (x))NLNx (x) = NU(LN (x)). (3.26)


Therefore, we can write

1

NH(µN |πN ) =

1

N

∑x∈XN

µNx logµNxπNx

=1

N

∑x∈XN

µNx logµNx +∑x∈XN

µNx U(LN (x)) +1

NlogZN

=1

NH(µN ) + ELN#µN

[U ] +1

NlogZN . (3.27)

Let us investigate the first term. We define for ν ∈ PN (X )

TN (ν) ={x ∈ XN : LN (x) = ν

}. (3.28)

Therefore, we can write

H(µ) =∑

ν∈PN (X )

∑x∈TN (ν)

µNx logµNx

=∑

ν∈PN (X )

LN#µN (ν)

∑x∈TN (ν)

µNxLN#µ

N (ν)log

µNx |TN (ν)|LN#µ

N (ν)− log

|TN (ν)|LN#µ

N (ν)

.

Now, we observe that the first term inside the bracket is the relative entropy of theconditioned measure µN (·|LN (x) = ν) wrt. to the uniform measure on TN (ν) and espe-cially non-negative. If, we split the logarithm in the second term, we obtain the followingdecomposition

1

NH(µ) =

1

NELN#µN

[H(µN (• | LN = ν) | 1/|TN (ν)|

)]+

1

NHN

(LN#µ

N∣∣∣ 1/ |PN (X )|

)− 1

Nlog |PN (X )|

− 1

NELN#µN

[log |TN |]

≥ − 1

Nlog |PN (X )| − 1

NELN#µN

[log |TN |]

(3.29)

where HN(·∣∣∣1/ |PN (X )|

)is the relative entropy on P(PN (X )), with respect to the

uniform measure on the set, and where we used that the first two terms are non-negative.The cardinality of PN (X ) is given by

(N+d−1N

)≤ Nd−1/d! and hence log |PN (X )| ≤

(d − 1) logN . Moreover, by Stirling’s formula (cf. Lemma A.1), it follows that for anyν ∈ PN (X )

− 1

Nlog |TN (ν)| = − 1

Nlog

N !∏x∈X (Nν(x))!

≥∑x∈X

ν(x) log ν(x)− log(N + 1)

N.

Hence, we can pass to the limit by the lower semi-continuity of the entropy and obtain

limN→∞

1

NH(µ) ≥ lim

N→∞

(ELN#µN

[H(ν)]− (d− 1) logN

N− log(N + 1)

N

)≥ EM (H(ν)) .


We can pass directly to the limit in the second term of (3.27). Now for 1N logZN , we

can write by using (3.26)

ZN =∑

ν∈PN (X )

e−NU(ν)|TN (ν)|. (3.30)

Hence, by using Sanov’s and Varadhan’s theorem [20] on the asymptotic evaluation ofexponential integrals, it easily follows that

limN→∞

1

NlogZN = − inf

ν∈P(X )

{∑x∈X

ν(x) log ν(x) + U(ν)

}=: −F0.

�

The other ingredient of the proof of Theorem 3.8 consists in proving the convergenceof the metric derivatives (3.18) and slopes (3.19).

Proposition 3.10 (Convergence of metric derivative and slopes). Let cN be an elementof AC([0, T ],P(XN )), and choose ψN : [0, T ] → P(XN ) such that (cN ,ψN ) solves thecontinuity equation. Furthermore, assume that

cNτ→ C, (3.31)

with some measurable C : [0, T ]→ P(P(X )), and that

lim infN→∞

∫ T

0

1

NAN (cN (t),ψN (t))dt <∞.

Then C ∈ AC ([0, T ],P(P(X ))) , and it exists : [0, T ] → P(P(X )), for which (C,)satisfy the continuity equation and for which we have

lim infN→∞

∫ T

0

1

NAN (cN (t),ψN (t))dt ≥

∫ T

0A(C(t),(t)) dt (3.32)

and

lim infN→∞

∫ T

0

1

NIN

(cN (t)

)dt ≥

∫ T

0I (C(t)) dt. (3.33)

Proof. Let us summarize consequences of the assumption cNτ→ C. By Definition 3.7,

this means LN] cN (t) ⇀ C(t) for all t ∈ [0, T ]. For x, y ∈ X we define two auxiliary

measures �N ;x,y;1(t),�N ;x,y;2(t) ∈ P(P(X )× P(X )

)by setting

�N ;x,y;1(t, ν, µ) := δνN ;x,y(µ)νxQNxy(ν)LN] c

N (t, ν)

�N ;x,y;2(t, ν, µ) := δµN ;y,x(ν)µyQNyx(µ)LN] c

N (t, µ),

where νN ;x,y := ν− δx−δyN . Then, we have �N ;x,y;1(t, ν, µ) = �N ;y,x;2(t, µ, ν). Due to (3.7)

from Lemma 3.4 it holds

�N ;x,y;1(t, ν, µ) ⇀ δν(µ)νxQxy(ν)C(t, ν) (3.34)

�N ;x,y;2(t, ν, µ) ⇀ δµ(ν)µxQxy(µ)C(t, µ). (3.35)


In the sequel, we will decompose the sum over all possible jumps of the particle systemin different ways

∑x,y

f(x,y) =∑

ν,µ∈PN (X )

∑x:LNx=νy:LNy=µ

f(x,y) =∑x,y∈X

∑ν∈PN (X )

N∑i=1

∑x:LNx=νxi=x

f(x,xi;y).

where xi;y = x− (xi − y)ei and f : XN ×XN → R with f(x,y) = 0 unless y = xi;y forsome i ∈ {1, . . . , N} and y ∈ X . We define the following vectorfield on P(X )× P(X )

vN ;x,y(t, ν, µ) :=1

2NδνN ;x,y(µ)

∑x:LNx=νy:LNy=µ

(ψNy (t)−ψNx (t)

)wNxy

(cN (t)

)

=1

2NδνN ;x,y(µ)

N∑i=1

∑x:LNx=νxi=x

(ψNxi;y(t)−ψ

Nx (t)

)wNxxi;y

(cN (t)

),

where wNxy

(cN (t)

)is defined in (3.12). From the definition of vN ;x,y(t) and the Cauchy-

Schwarz inequality, it follows that for ν, µ ∈ P(X ) with µ = νN ;x,y for some x, y ∈ X

∣∣vN ;x,y(t, ν, µ)∣∣ ≤

1

2N

∑x:LNx=νy:LNy=µ

(ψNy (t)−ψNx (t)

)2wNxy

(cN (t)

)12

×

1

2N

N∑i=1

∑x:LNx=νxi=x

wNxxi;y

(cN (t)

)12

.

By using the identity

1

N

N∑i=1

∑x:LN (x)=νxi=x

cNx (t) =∑x

δν(LN (x)) LNx (x) cNx (t) = LN#cN (t, ν) νx,

and the fact that the logarithmic mean is jointly concave and 1-homogeneous, we canconclude

1

N

N∑i=1

∑x:LNx=νxi=x

wNxxi;y

(cN (t)

)≤ Λ

(�N ;x,y;1(t, ν, νN ;x,y),�N ;x,y;2(t, ν, νN ;x,y)

),

which first shows that vN ;x,y(t) � Λ(�N ;x,y;1(t),�N ;x,y;2(t)

)as product measure on

P(X ) × P(X ). Moreover, by summation and integration over any Borel I ⊂ [0, T ] we


get∫I

∑ν,µ∈PN (X )

∣∣vN ;x,y(t, ν, µ)∣∣ dt ≤ (√T ∫ T

0

1

NA(cN (t),ψ(t)) dt

) 12

×

|I|2

∑ν,µ∈PN (X )

supt∈I

Λ(�N ;x,y;1(t, ν, µ),�N ;x,y;2(t, ν, µ)

) 12

.

(3.36)

The second sum is uniformly bounded in N , since X is finite and by Lemma 3.4 QN → Quniformly with Q continuous in the first argument on the compact space P(X ). Now,from (3.36), we conclude that for some subsequence and all x, y ∈ X we have

vN ;x,y ⇀ vx,y with vx,y a Borel measure on [0, T ]× P(X )× P(X ). (3.37)

Using the Jensen inequality applied to the 1-homogeneous jointly convex function R×R2

+ 3 (v, a, b) 7→ v2

Λ(a,b) , we get∫ T

0

1

NA(cN (t),ψN (t)

)dt

=

∫ T

0

1

2

∑x,y

∑ν∈PN (X )

1

N

N∑i=1

∑x:LNx=νxi=x

((ψNxi;y

(t)−ψNx (t))wNxxi;x,y

(cN (t)))2

wNxxi;x,y

(cN (t))dt

≥∫ T

0

1

2

∑x,y

∑ν∈PN (X )

(vN (t, ν, νN ;x,y)

)2Λ (�N ;x,y;1(t, ν, νN ;x,y),�N ;x,y;2(t, ν, νN ;x,y))

dt .

The last term can be written as

1

2

∑x,y

F (vN ;x,y,�N ;x,y;1,�N ;x,y;2) ,

where the functional F on triples of measure on [0, T ]× P(X )2 is defined via

F (v,�1,�2) =

∫ T

0

∫∫P(X )2

α

(dvdσ,d�1

dσ,d�2

dσ

)dσ ,

where α is the function defined in (2.16) and σ is any measure on [0, T ]×P(X )2 such thatv,�1,�2 � σ. The definition does not depend on the choice of σ by the 1-homogeneityof α. Then, by a general result on lower semicontinuity of integral functionals [8,Thm. 3.4.3] we can conclude, that

lim infN→∞

F (vN ;x,y,�N ;x,y;1,�N ;x,y;2) ≥ F (vx,y,�x,y;1,�x,y;2) .

In particular, this implies

dvx,y � Λ

(d�x,y;1

dσ,d�x,y;2

dσ

)dσ,


which by (3.34) and (3.35) is given in terms of

Λ

(d�x,y;1

dσ(t, ν, µ),

d�x,y;2

dσ(t, ν, µ)

)dσ = δν(µ)Λ (νxQxy(ν), νyQyx(ν))C(t, dν)dt.

Therefore, with the notation of Proposition 2.18, we obtain the statement

lim infN→∞

∫ T

0

1

NA(cN (t),ψ(t)N

)dt ≥ 1

2

∑x,y

∫ T

0

∫P(X )

(Vxy(t, ν))2

Λ (νxQxy(ν), νyQyx(ν))C(t, dν) dt

=

∫ T

0

~A(C(t),V(t)) dt with Vxy(t, ν) :=dvx,y

dC(t)dt.

From the convergence of the vector field (3.37) it is straightforward to check that (C,V) ∈~CET (C0,CT ) and hence by the conclusion of Proposition 2.18, there exists : [0, T ] ×P(X )→ RX such that

lim infN→∞

1

N

∫ T

0A(cN (t),ψN (t)

)dt ≥

∫ T

0

~A(C(t),V(t)) dt ≥∫ T

0A(C(t),(t)) dt,

which concludes the first part.The lim inf estimate of the Fisher information follows by a similar but simpler argu-

ment. We introduce the convex 1-homogeneous function λ(a, b) = (a− b) (log a− log b)and write

1

NIN

(cN (t)

)dt

=1

2

∑x,y

∑ν∈PN (X )

1

N

N∑i=1

∑x:LNx=νxi=x

λ(cNx (t)QN

xxi;y(cN (t)), cNxi;y(t)Q

Nxi;yx(cN (t))

)

≥ 1

2

∑x,y

∫∫P(X )2

λ

(d�N ;x,y;1(t)

dσ,d�N ;x,y;2(t)

dσ

)dσ.

Then, the result follows by an application of [8, Thm. 3.4.3]. �

In order to apply Theorem 3.8, we still need to prove that a sequence of N -particledynamics starting from nice initial conditions is tight.

Lemma 3.11. Let XN be the continuous Markov jump process with generator (3.8),then the sequence of laws of empirical measures is tight in the Skorokhod topology, i.e. itholds for any x ∈ X and ε > 0

limδ→0

lim supN→∞

P

[sup|t−s|≤δ

∣∣LNx (XN (t))− LNx (XN (s))∣∣ > ε

]= 0. (3.38)

The proof follows standard arguments for tightness of empirical measures of sequencesof interacting particle systems. Since there is no original argument here, the expositionshall be brief, and we refer to [28] for more details, in a more general context of interactingparticle systems.


Proof. The process

MNx (t) = LNx (XN (t))− LNx (XN (0))

−∫ t

0

∑y 6=x

LNx (XN (s))QNxy(LN (XN (s)))− LNy (XN (s))QNyx(LN (XN (s)))ds

is a martingale. Since the rates are bounded,∣∣∣∣∣∣∫ t

s

∑y 6=x

LNx (XN (r))QNxy(LN (XN (r)))− LNy (XN (r))QNyx(LN (XN (r)))dr

∣∣∣∣∣∣ ≤ C|t− s|and therefore, to prove (3.38), it is enough to show that

P

[sup|t−s|≤δ

∣∣MNx (t)−MN

x (s)∣∣ > ε

]is small. To do so, we shall estimate the quadratic variation of the martingale. It isgiven by

〈MN 〉(t) =1

N2

∑y 6=x

((NLNx (XN (t))− 1)2 − (NLNx (XN (t)))2)QNxy(LN (XN (s)))

+ ((NLNx (XN (t)) + 1)2 − (NLNx (XN (t))2)QNyx(LN (XN (s)))+

2

N2

∑y 6=x

NLNx (XN (t))(LNx (XN (s))QNxy

(LN (XN (s))

)− LNy (XN (s))QNyx

(LN (XN (s))

)).

Using the boundedness of the rates, it is straightforward to see that

|〈MN 〉(t)| ≤ CN−1.

The quadratic variation of the martingale vanishes. From Doob’s martingale inequality,we deduce that for any ε > 0

P

[sup

0≤s≤t≤T|MN

x (t)−MNx (s)| > ε

]−→ 0.

Tightness of the sequence of processes follows. �

We can now combine the work done so far to obtain our main result.

Theorem 3.12 (Convergence of the particle system to the mean field equation). LetcN be a solution to (3.9). Moreover assume its initial distribution to be well prepared

1

NFN (cN (0))→ F(C(0))−F0 with LN] cN (0) ⇀ C(0) as N →∞. (3.39)

Then it holds

LN] cN (t) ⇀ C(t) for all t ∈ (0,∞) , (3.40)

with C a weak solution to (2.27) and moreover

1

NFN (cN (t))→ F(C(t))−F0 for all t ∈ (0,∞). (3.41)


Proof. Fix T > 0. By the tightness Lemma 3.11, we have that the sequence of empiricalmeasures LN] cN : [0, T ] → P(P(X )) is tight wrt. the Skorokhod topology [4, Theorem

13.2]. Hence, there exist a measurable curve C : [0, T ] → P(P(X )) such that up to asubsequence LN] cN (t) weakly converges to C(t) for all t ≥ 0. By the Propositions 3.9

and 3.10, we get from Theorem 3.8 that (3.41) holds and C is curve of maximal slope forthe functional J. By Proposition 2.22, it is characterized as weak solution to (2.27). ByLemma 3.4 the limiting rate matrix Q is Lipschitz on P(X ) providing uniqueness of theLiouville equation (2.27). Hence, the convergence actually holds for the full sequence. �

Corollary 3.13. In the setting of Theorem 3.12 assume in addition that

LN] cN (0) ⇀ δc(0) for some c(0) ∈ P(X ) . (3.42)

Then it holdsLN] cN (t) ⇀ δc(t) for all t ∈ (0,∞) , (3.43)

with c a solution to (1.1) and moreover

1

NFN (cN (t))→ F(c(t))−F0 for all t ∈ (0,∞) . (3.44)

Proof. The proof is a direct application of Theorem 3.12 and a variance estimate for theparticle system (Lemma B.1). �

4. Properties of the metric W

In this section, we give the proof of Propostion 2.10 stating that W defines a distanceon P(X ) and that the resulting metric space is seperable, complete and geodesic. Theproof will be accomplished by a sequence of lemmas giving various estimates on andproperties of W. Some work is needed in particular to show finiteness of W.

Lemma 4.1. For µ, ν, and T > 0 we have

W(µ, ν) = inf

{∫ T

0

√A(c(t), ψ(t)) dt : (c, ψ) ∈ CET (µ, ν)

}. (4.1)

Proof. This follows from a standard reparametrization argument. See for instance [18,Thm. 5.9] for details in a similar situation. �

From the previous lemma we easily deduce the triangular inequality

W(µ, η) ≤ W(µ, ν) +W(ν, η) ∀µ, ν, η ∈ P(X ) , (4.2)

by concatenating two curves (c, ψ) ∈ CET (µ, ν) and (c′, ψ′) ∈ CET (ν, η) to form a curvein CE2T (µ, η).

The next sequence of lemmas puts W in relation with the total variation distance onP(X ). To proceed, we define similarly to [30], for every µ ∈ P(X ), the matrix

Bxy(µ) :=

{∑z 6=xwxz(µ), x = y,

−wxy(µ), x = y(4.3)

Now (2.21) can be rewritten as

W2(µ, ν) = inf

{∫ 1

0〈B(c(t))ψ(t), ψ(t)〉 dt : (c, ψ) ∈ CE1(µ, ν)

}, (4.4)


where the 〈ψ, φ〉 =∑

x∈X ψxφx is the usual inner product on RX .

Lemma 4.2. For any µ, ν ∈ P(X ) holds

W(µ, ν) ≥ 1√2‖µ− ν‖.

Moreover, for every a > 0, it exists a constant Ca, such that for all µ, ν ∈ Pa(X )

W(µ, ν) ≤ Ca‖µ− ν‖ .

Proof. The proof of the lower bound on W can be obtained very similar to [21, Propo-sition 2.9].

Let us show the upper bound. Following Lemma A.1. in [30], we notice that for µ ∈Pa(X ), the map ψ 7→ B(µ)ψ has an image of dimension d−1. In addition, the dimensionof the space {ax :

∑x∈X ax = 0} is d−1, therefore the map is surjective. From the above

we get that the matrix B(µ) restricts to an isomorphism B(µ), on the d− 1 dimensional

space {ax :∑ax = 0}. Now, since the mapping Pa(X ) 3 µ → ‖B−1(µ)‖, is continuous

with respect to the euclidean metric, we have an upper bound 1c by compactness. Also

Pa(X ) 3 µ → ‖B(µ)‖ has an upper bound C as a result of all entries in B(µ) beinguniformly bounded. From that we get

c‖ψ‖ ≤ ‖B(µ)ψ‖ ≤ C‖ψ‖,∀µ ∈ Pa(X ) (4.5)

for some suitable positive constants.Similarly to the proof of Lemma 3.19 in [30], for t ∈ [0, 1], we set c(t) = (1−t)µ+tν and

note that c(t) lies in Pa(X ), since it is a convex set. Since c(t) = ν−µ ∈ RangeB(c(t)),there exists a unique element ψ(t) for which we have c(t) = B(c(t))ψ(t), and ‖ψ(t)‖ ≤1c‖µ− ν‖.

From that we get

W(µ, ν) ≤∫ 1

0〈B(c(t))ψ(t), ψ(t)〉 dt ≤ 1

c2C‖µ− ν‖2.

�

Lemma 4.3. For every µ ∈ P(X ), ε > 0 it exists δ > 0 such that W(µ, ν) < ε, for everyν ∈ P(X ), with ‖µ− ν‖ ≤ δ

The proof of this lemma is similar to the proof [30, Theorem 3.12] and uses comparisonofW to corresponding quantity on the two point space X = {a, b}. However, significantlymore care is needed in the present setting to implement this argument. The reason beingthat the set of pairs of points x, y with Qx,y(µ) > 0 now depends on µ.

Proof. Let ε > 0 and µ ∈ P(X ) be fixed. Since X is finite, it holds

inf {Qxy(µ) : (x, y) ∈ Eµ} = a > 0.

Let Br(µ) = {ν ∈ P(X ) : ‖ν − µ‖ < r} denote a r-neighborhood around µ. Since, Q(µ)is continuous in µ, there exists for η > 0 a δ1 > 0 s.t.

∀ν ∈ Bδ1(µ) holds |Q(ν)−Q(µ)|L∞(X×X ) ≤ η.Especially, it holds by choosing η ≤ a/2 that Eµ ⊆ Eν and in addition

inf {Qxy(ν) : (x, y) ∈ Eµ, ν ∈ Bδ1(µ)} ≥ a/2.


For the next argument, observe that by the concavity of the logarithmic mean and afirst order Taylor expansion holds for a, b, s, t, η > 0

Λ ((s+ η)a, (t+ η)b) ≤ Λ(sa, tb) + η (∂sΛ(sa, tb) + ∂tΛ(sa, tb))

= Λ(sa, tb) + η (aΛ1(sa, tb) + bΛ2(sa, tb)) ,

where Λi is the i-th partial derivative of Λ. Therefore we can estimate for ν ∈ Bδ(µ).

Λ (Qxy(ν)ν(x), Qyx(ν)ν(y))− Λ (Qxy(µ)ν(x), Qyx(µ)ν(y))

≤ Λ ((Qxy(µ) + η) ν(x), (Qyx(µ) + η) ν(y))− Λ (Qxy(µ)ν(x), Qyx(µ)ν(y))

≤ η(ν(x)Λ1 (Qxy(µ)ν(x), Qyx(µ)ν(y)) + ν(y)Λ2 (Qxy(µ)ν(x), Qyx(µ)ν(y))

)≤ 2η

a

(Qxy(µ)ν(x)Λ1(Qxy(µ)ν(x), Qyx(µ)ν(y))

+Qyx(µ)ν(y)Λ2(Qxy(µ)ν(x), Qyx(µ)ν(y)))

=2η

aΛ(Qxy(µ)ν(x), Qyx(µ)ν(y)) ≤ Λ(Qxy(µ)ν(x), Qyx(µ)ν(y)),

Moreover, the last identity follows directly from the one-homogeneity of the logarithmicmean. Furthermore, we used η ≤ a

2 to obtain the last estimate. Repeating the argumentfor the other direction we get

1

2Λ (Qxy(µ)ν(x), Qyx(µ)ν(y)) ≤ Λ (Qxy(ν)ν(x), Qyx(ν)ν(y))

≤ 2Λ (Qxy(µ)ν(x), Qyx(µ)ν(y))(4.6)

Now, let c be an absolutely continuous curve with respect to WQ(µ), where WQ(µ) is thedistance that corresponds to the linear Markov process with fixed rates Q(µ), and livesinside the ball Bδ1(µ), then it is also absolutely continuous with respect to W, and if ψ

solves the continuity equation for c, with respect to the rates Q(µ), then it exists a ψ,that solves the continuity equation with respect to the variable rates Q(c(t)) and∫ 1

0A(c(t), ψ(t))dt ≤ 2

∫ 1

0AQ(µ)(c(t), ψ(t))dt, (4.7)

where AQ(µ) is the action with fixed rate kernel Q(µ).Indeed let ψ be a solution for the continuity equation for c with respect to the fixed

rates Q(µ), i.e.

cx(t) =∑y

(ψy(t)− ψx(t))Λ(cx(t)Qxy(µ), cy(t)Qyx(µ)).

For (x, y) ∈ Eµ, and t ∈ [0, 1] we define

vxy(t) := (ψy(t)− ψx(t))Λ(cx(t)Qxy(µ), cy(t)Qyx(µ)).


Then, it is easy to verify that (c, v) ∈ ~CE(c(0), c(1)) (cf. Definition 2.5) and we canestimate∫ 1

0

~A(c(t), v(t)) dt =

∫ 1

0

1

2

∑x,y

α (vxy(t),Λ(cx(t)Qxy(c(t)), cy(t)Qyx(c(t)))) dt

=

∫ 1

0

1

2

∑x,y

(ψy(t)− ψx(t))2×

× Λ(cx(t)Qxy(µ), cy(t)Qyx(µ))

Λ(cx(t)Qxy(c(t)), cy(t)Qyx(c(t)))Λ(cx(t)Qxy(µ), cy(t)Qyx(µ)) dt

(4.6)

≤∫ 1

0

1

2

∑x,y

(ψy(t)− ψx(t))2 2Λ(cx(t)Qxy(µ), cy(t)Qyx(µ))dt.

Now, the existence of ψ is a straightforward application of Lemma 2.7.Having established (4.7), the final result will follow by a comparison with the two-point

space for the Wasserstein distance with fixed rate kernel Q(µ).Now, for ν ∈ Bδ(µ), we can find a sequence of at most (d − 1) measures µi ∈ Bδ(µ),

such that µ0 = µ and µK = ν and

supp(µi − µi−1

)= {xi, yi} ∈ Eµ for i = 1, . . . ,K.

Indeed we can use the following matching procedure: Find a pair (i, j) with µi 6= νiand µj 6= νj . Set h = min {|µi − νi|, |µj − νj |}. Then define µ1

i := µi±h and µ1j := µj∓h

with signs chosen as the sign of νi−µi. After this step at least (d− 1)-coordinates of µ1

and ν agree. This procedure finishes after at most d− 1 steps, because the defect massof the last pair will match. Therewith, we can compare with the two-point space [30,Lemma 3.14]

WQ(µ)(µi−1, µi) ≤ 1√

2pxiyi

∣∣∣∣∣∫ 1−2µi−1

xi

1−2µixi

√arctanh r

rdr

∣∣∣∣∣ ≤ δ1

2,

with pxiyi = Qxiyi(µ)πxi(µ). The last estimate follows from the fact, that the function√arctanh r

r dr, is integrable in [−1, 1]. Therefore, we can find a δ ≤ δ1 such that for any a, b

with |a − b| ≤ δ, we have∫ 1−2b

1−2a

√arctanh r

r dr ≤ δ12 min{1,

√2pxiyi}. Finally, by Lemma

4.2, we can infer that any curve has Euclidean length smaller the its action value. Wecan conclude that the AQ(µ)-minimizing curve between any µi−1, µi, stays inside the ballBδ1(µ), from which we can further conclude that

W(µi−1, µi) ≤ 2WQ(µ)(µi−1, µi) ≤ 2

δ1

2= δ1

By an application of the triangular inequality (4.2), we get W(µ, ν) ≤ (d− 1)δ1, and theproof concludes if we pick δ such that (d− 1)δ1 ≤ ε. �

Lemma 4.4. For µk, µ ∈ P(X ), we have

W(µk, µ)→ 0⇔ ‖µk − µ‖ → 0.

Moreover, the space P(X ), along with the metric W, is a complete space.


Proof. The proof is a direct application of Lemmas 4.2 and 4.3. �

Theorem 4.5 (Compactness of curves of finite action). Let {(ck, vk)}k, with

(ck, vk) ∈ ~CET (ck(0), ck(T )),

be a sequence of weak solutions to the continuity equation with uniformly bounded action

supk∈N

{∫ T

0

~A(ck(t), vk(t)) dt

}≤ C <∞. (4.8)

Then, there exists a sub-sequence and a limit (c, v), such that ck converges uniformly to

c in [0, T ]; (c, v) ∈ ~CET (µ, ν(t)) and the for the action we have

lim infk→∞

∫ T

0

~A(ck(t), vk(t)) dt ≥∫ T

0

~A(c(t), v(t)) dt. (4.9)

Proof. Let x, y ∈ X and (ck, vk) be given as in the statement. Using the Cauchy-Schwarzinequality, we see that for any Borel I ⊂ [0, T ] we have the a priori estimate on vk

∫I

1

2

∑x,y

∣∣∣vkxy(t)∣∣∣ dt ≤ ∫ T

0

(~A(ck(t), vk(t)

)) 12

(1

2

∑x,y

wxy (c(t))

) 12

dt ≤√CT√Cw |I|,

with wxy (c(t)) from (2.10). Since Q is continuous on P(X ) by Definition 2.3,

supν∈P(X )

1

2

∑x,y

wxy (ν) = Cw <∞.

Together with the assumption (4.8), the whole rhs. is uniformly bounded in k. Therefore,for a subsequence holds vkxy ⇀ vxy as Borel measure on [0, T ] and all x, y ∈ X . Now, wechoose a sequence of smooth test functions ϕε in (2.15), which converge to the indicatorof the interval [t1, t2] as ε→ 0. Therewith and using the above a priori estimate on vk,we deduce∣∣∣ckx(t2)− ckx(t1)

∣∣∣ ≤ ∫ t2

t1

1

2

∑y∈X

(∣∣∣vkxy(t)∣∣∣+∣∣∣vkyx(t)

∣∣∣) dt ≤√CCw

√|t2 − t1|. (4.10)

Hence, ck is equi-continuous and therefore converges (upto a further sub-sub-sequence)to some continuous curve c. This, already implies that we can pass to the limit in (2.15)and obtain that (c, v) ∈ CET .

Moreover, we can deduce since ν 7→ Q(ν) is continuous for all x, y ∈ X also ck1;x,y :=

ckx(t)Qxy(ck(t))→ cx(t)Qxy(c(t)) =: c1;x,y(t) and analogue with ck2;x,y := cky(t)Qyx(ck(t)).

We rewrite the action (2.17) as

~A(ck(t), vk(t)) =1

2

∑x,y

α(vkx,y(t),Λ

(ck1;x,y(t), c

k2;x,y(t)

))


The conclusion (4.9) follows now from [8, Thm. 3.4.3] by noting that (v, c1, c2) 7→α (v,Λ (c1, c2)) is lsc., jointly convex and 1-homogeneous and hence

lim infk

∫ T

0

~A(ck(t), vk(t)) dt ≥∫ T

0

1

2

∑x,y

α (vx,y(t),Λ (c1;x,y(t), c2;x,y(t))) dt

=

∫ T

0

~A(c(t), v(t)) dt.

�

We can now give the proof of Proposition 2.10:

Proof of Proposition 2.10. Symmetry of W is obvious, the coincidence axiom followsfrom Lemma 4.2 and the triangular inequality from Lemma 4.1 as indicated above. Thefiniteness of W comes by using Lemmas 4.2, 4.3 and the triangular inequality. ThusW defines a metric. Completeness and separability follow directly from Lemmas 4.4and 4.2. By the direct method of the calculus of variations and the compactness resultsProposition 4.5, we obtain for any µ, ν ∈ P(X ) a curve (γt)t∈[0,1] with minimal action

connecting them, i.e.W(µ, ν) =∫ 1

0 A(γt, ψt)dt =∫ 1

0 |γ′t|2dt, where in the last equality we

used Proposition 2.11. From this, it is easy to see that γ is a constant speed geodesic. �

Appendix A. Stirling formula with explicit error estimate

Lemma A.1. Let ν ∈ PN (X ), then it holds

− log(N + 1)

N≤ − 1

Nlog

N !∏x∈X (Nν(x))!

−∑x∈X

ν(x) log ν(x) ≤ |X | logN

N+

1

N. (A.1)

Proof. We write

− logN !∏

x∈X (Nν(x))!=∑x∈X

Nν(x)∑k=1

log k −N∑k=1

log k

≥∑x∈X

∫ Nν(x)

1log y dy −

∫ N+1

1log(y) dy

=∑x∈X

[Nν(x) (logNν(x)− 1)− 1]− (N + 1) (log(N + 1)− 1)− 1

= N∑x∈X

ν(x) log ν(x) +NRN ,

where the remainder RN can be estimated as follows

RN =|X |N

+ logN

N + 1− log(N + 1)

N≥ − log(N + 1)

N

for |X | ≥ 2 and N ≥ 1. The other bound can be obtained by shifting the integrationbounds appropriately in the above estimate. �


Appendix B. Variance estimate for the particle system

Lemma B.1. For the N -Particle process XN with generator 3.8 holds for some C > 0and all t ∈ [0, T ] with T <∞

∀x ∈ X : var[LNx (XN (t))

]≤ eCt

(var[LNx (XN (0))

]+O(N−1)

). (B.1)

Proof. We write Nx(x) = NLNx (x) the number of particles at site x. The empiricaldensity of particles at site x is then Nx(x)/N = LNx (x). Therewith, we have

d

dtvar(Nx(XN (t))) = E[LNN2

x(XN (t))]− 2E[Nx(XN (t))]E[LNNx(XN (t))]

= E

[∑y

Nx(XN (t))(N2x(XN (t))− (Nx(XN (t))− 1)2)QNxy(L

N (XN (t)))

+∑y

Ny(XN (t))(N2

x(XN (t))− (Nx(XN (t)) + 1)2)QNyx(LN (XN (t)))

]

− 2E[Nx(XN (t))]E

[∑y

Nx(XN (t))QNxy(LN (XN (t)))−Ny(X

N (t))QNyx(LN (XN (t)))

]

= 2E[Nx(XN (t))2Qxy(LN (XN (t)))]− 2E

[∑y

Nx(XN (t))Ny(XN (t))QNyx(LN (XN (t)))

]

− 2E[Nx(XN (t))]E

[∑y

Nx(XN (t))QNxy(LN (XN (t)))−Ny(X

N (t))QNyx(LN (XN (t)))

]+O(N)

≤ C var(Nx(XN (t))) + C∑y 6=x

var(Nx(XN (t)))1/2 varNy(XN (t))

1/2

≤ C∑y

var(Ny(XN (t))) +O(N).

In theses computations, we used the fact that QN is uniformly bounded and that thestate space is finite.

Henced

dtvar(Nx(XN (t))/N) ≤ C

∑y

var(Ny(XN (t))/N) +O(N−1)

and therefore, using Gronwall’s Lemma, as soon as the sum of initial variances goes tozero when N goes to infinity, it also goes to zero at any positive time, and uniformly onbounded time intervals. �

Acknowledgements. This work was done while the authors were enjoying the hos-pitality of the Hausdorff Research Institute for Mathematics during the Junior TrimesterProgram on Optimal Transport, whose support is gratefully acknowledged. We wouldlike to thank Hong Duong for discussions on this topic. M.F. gratefully acknowledges


funding from NSF FRG grant DMS-1361185 and GdR MOMAS. M.E. and A.S. acknowl-edge support by the German Research Foundation through the Collaborative ResearchCenter 1060 The Mathematics of Emergent Effects.

References

[1] S. Adams, N. Dirr, M. A. Peletier, and J. Zimmer. From a large-deviations principle to the Wasser-stein gradient flow: a new micro-macro passage. Comm. Math. Phys., 307(3):791–815, 2011.

[2] L. Ambrosio, N. Gigli, and G. Savare. Gradient flows in metric spaces and in the space of probabilitymeasures. Lectures in Mathematics ETH Zurich. Birkhauser Verlag, Basel, second edition, 2008.

[3] L. Ambrosio, G. Savare, and L. Zambotti. Existence and stability for Fokker-Planck equations withlog-concave reference measure. Probab. Theory Related Fields, 145(3-4):517–564, 2009.

[4] P. Billingsley. Probability and measure. Wiley Series in Probability and Mathematical Statistics:Probability and Mathematical Statistics. John Wiley & Sons, Inc., New York, second edition, 1986.

[5] F. Bolley, A. Guillin, and C. Villani. Quantitative concentration inequalities for empirical measureson non-compact spaces. Probab. Theory Related Fields, 137(3-4):541–593, 2007.

[6] A. Budhiraja, P. Dupuis, M. Fischer, and K. Ramanan. Limits of relative entropies associated withweakly interacting particle systems. pages 1–29, dec 2014.

[7] A. Budhiraja, P. Dupuis, M. Fischer, and K. Ramanan. Local stability of Kolmogorov forwardequations for finite state nonlinear Markov processes. Electron. J. Probab., 20:no. 81, 30, 2015.

[8] G. Buttazzo. Semicontinuity, relaxation and integral representation in the calculus of variations,volume 207 of Pitman Research Notes in Mathematics Series. Longman Scientific & Technical,Harlow; copublished in the United States with John Wiley & Sons, Inc., New York, 1989.

[9] J. A. Carrillo, R. J. McCann, and C. Villani. Kinetic equilibration rates for granular media andrelated equations: entropy dissipation and mass transportation estimates. Rev. Mat. Iberoamericana,19(3):971–1018, 2003.

[10] J. A. Carrillo, R. J. McCann, and C. Villani. Contractions in the 2-Wasserstein length space andthermalization of granular media. Arch. Ration. Mech. Anal., 179(2):217–263, 2006.

[11] P. Cattiaux, A. Guillin, and F. Malrieu. Probabilistic approach for granular media equations in thenon-uniformly convex case. Probab. Theory Related Fields, 140(1-2):19–40, 2008.

[12] P. Dai Pra and F. den Hollander. McKean-Vlasov limit for interacting random processes in randommedia. J. Statist. Phys., 84(3-4):735–772, 1996.

[13] S. Daneri and G. Savare. Lecture notes on gradient flows and optimal transport. In Optimal trans-portation, volume 413 of London Math. Soc. Lecture Note Ser., pages 100–144. Cambridge Univ.Press, Cambridge, 2014.

[14] D. A. Dawson and J. Gartner. Large deviations from the McKean-Vlasov limit for weakly interactingdiffusions. Stochastics, 20(4):247–308, 1987.

[15] E. De Giorgi, A. Marino, and M. Tosques. Problems of evolution in metric spaces and maximaldecreasing curve. Atti Accad. Naz. Lincei Rend. Cl. Sci. Fis. Mat. Natur. (8), 68(3):180–187, 1980.

[16] N. Dirr, V. Laschos, and J. Zimmer. Upscaling from particle models to entropic gradient flows. J.Math. Phys., 53(6):063704, 9, 2012.

[17] R. Dobrushin. Vlasov equations. Functional Analysis and Its Applications, 13(2):115–123, 1979.[18] J. Dolbeault, B. Nazaret, and G. Savare. A new class of transport distances between measures. Calc.

Var. Partial Differential Equations, 34(2):193–231, 2009.[19] M. Duong, V. Laschos, and M. Renger. Wasserstein gradient flows from large deviations of many-

particle limits. . . . : Control, Optimisation and . . . , 2013.[20] P. Dupuis and R. S. Ellis. A weak convergence approach to the theory of large deviations. Wiley

Series in Probability and Statistics: Probability and Statistics. John Wiley & Sons, Inc., New York,1997. A Wiley-Interscience Publication.

[21] M. Erbar and J. Maas. Ricci curvature of finite Markov chains via convexity of the entropy. Arch.Ration. Mech. Anal., 206(3):997–1038, 2012.

[22] M. Erbar and J. Maas. Gradient flow structures for discrete porous medium equations. DiscreteContin. Dyn. Syst., 34(4):1355–1374, 2014.


[23] M. Erbar, J. Maas, and M. Renger. From large deviations to Wasserstein gradient flows in multipledimensions. Electronic Communications in Probability, 20(89):1–12, nov 2015.

[24] M. Fathi. A gradient flow approach to large deviations for diffusion processes. page 38, may 2014.[25] M. Fathi and M. Simon. The gradient flow approach to hydrodynamic limits for the simple exclusion

process. arXiv [math.PR], pages 1–16, jul 2015.[26] N. Gigli and J. Maas. Gromov-Hausdorff convergence of discrete transportation metrics. SIAM J.

Math. Anal., 45(2):879–899, 2013.[27] R. Jordan, D. Kinderlehrer, and F. Otto. The variational formulation of the Fokker-Planck equation.

SIAM J. Math. Anal., 29(1):1–17, 1998.[28] C. Kipnis and C. Landim. Scaling limits of interacting particle systems, volume 320 of Grundlehren

der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer-Verlag, Berlin, 1999.

[29] D. A. Levin, M. J. Luczak, and Y. Peres. Glauber dynamics for the mean-field Ising model: cut-off,critical power law, and metastability. Probab. Theory Related Fields, 146(1-2):223–265, 2010.

[30] J. Maas. Gradient flows of the entropy for finite Markov chains. J. Funct. Anal., 261(8):2250–2292,2011.

[31] F. Malrieu. Convergence to equilibrium for granular media equations and their Euler schemes. Ann.Appl. Probab., 13(2):540–560, 2003.

[32] N. Metropolis, A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller. Equations of StateCalculations by Fast Computing Machines. J. Chem. Phys., 21(6):1087–1091, 1953.

[33] A. Mielke. Geodesic convexity of the relative entropy in reversible Markov chains. Calc. Var. PartialDifferential Equations, 48(1-2):1–31, 2013.

[34] A. Mielke. On evolutionary Gamma convergence for gradient systems. WIAS Preprint, 1915, 2014.[35] K. Oelschlager. A martingale approach to the law of large numbers for weakly interacting stochastic

processes. Ann. Probab., 12(2):458–479, 1984.[36] F. Otto. The geometry of dissipative evolution equations: the porous medium equation. Comm.

Partial Differential Equations, 26(1-2):101–174, 2001.[37] E. Sandier and S. Serfaty. Gamma-convergence of gradient flows with applications to Ginzburg-

Landau. Comm. Pure Appl. Math., 57(12):1627–1672, 2004.[38] A. Schlichting. Macroscopic limits of the Becker-Doring equations via gradient flows. preprint, 2016.[39] S. Serfaty. Gamma-convergence of gradient flows on Hilbert and metric spaces and applications.

Discrete Contin. Dyn. Syst., 31(4):1427–1451, 2011.

[40] A.-S. Sznitman. Topics in propagation of chaos. In Ecole d’Ete de Probabilites de Saint-Flour XIX—1989, volume 1464 of Lecture Notes in Math., pages 165–251. Springer, Berlin, 1991.

University of Bonn, GermanyE-mail address: [email protected]

University of California, BerkeleyE-mail address: [email protected]

Weierstrass InstituteE-mail address: [email protected]

University of Bonn, GermanyE-mail address: [email protected]

· GRADIENT FLOW STRUCTURE FOR MCKEAN-VLASOV EQUATIONS ON DISCRETE SPACES MATTHIAS ERBAR, MAX...

Documents

Transcript of · GRADIENT FLOW STRUCTURE FOR MCKEAN-VLASOV EQUATIONS ON DISCRETE SPACES MATTHIAS ERBAR, MAX...