Kenneth F. Caluya, Abhishek Halder - arXiv · Kenneth F. Caluya, Abhishek Halder Abstract—We...


Proximal Recursion for Solving the Fokker-Planck Equation

Kenneth F. Caluya, Abhishek Halder

Abstract: We develop a new method to solve the Fokker-Planck or Kolmogorov's forward equation that governs the time evolution of the joint probability density function of a continuous-time stochastic nonlinear system. Numerical solution of this equation is fundamental for propagating the effect of initial condition, parametric and forcing uncertainties through a nonlinear dynamical system, and has applications encompassing but not limited to forecasting, risk assessment, nonlinear filtering and stochastic control. Our methodology breaks away from the traditional approach of spatial discretization for solving this second-order partial differential equation (PDE), which, in general, suffers from the "curse of dimensionality". Instead, we numerically solve an infinite dimensional proximal recursion in the space of probability density functions, which is theoretically equivalent to solving the Fokker-Planck-Kolmogorov PDE. We show that the dual formulation, along with the introduction of an entropic regularization, leads to a smooth convex optimization problem that can be implemented via a suitable block coordinate iteration and has fast convergence due to a certain contraction property that we establish. This approach enables a meshless implementation, leading to remarkably fast computation.

I. INTRODUCTION

Given a deterministic or stochastic dynamical system in continuous time over some finite dimensional state space, say $\mathbb{R}^n$, we consider the problem of propagating the trajectory ensembles or densities subject to stochastic initial conditions, often referred to as the belief or uncertainty propagation problem. Mathematically, this amounts to solving an initial value problem associated with a partial differential equation (PDE) of the form

$$\frac{\partial\rho}{\partial t} = \mathcal{L}\rho, \qquad \rho(x, t=0) = \rho_0(x)\ \text{given}, \tag{1}$$

describing the transport of the density function $\rho(x,t)$, which is a function of the state vector $x \in \mathbb{R}^n$ and time $t \geq 0$. Here, $\mathcal{L}$ is a spatial operator that guarantees $\rho \geq 0$ and $\int_{\mathbb{R}^n}\rho(x,t)\,\mathrm{d}x = 1$ for all $t \geq 0$. Without loss of generality, one can interpret $\rho(x,t)$ as the joint probability density function (PDF) of the state vector $x$ at time $t$. We refer to (1) as the transport PDE.

The structural form of $\mathcal{L}$ in (1) depends on the underlying trajectory-level dynamics. For example, consider the case when the dynamics of $x(t) \in \mathbb{R}^n$ is governed by an ordinary differential equation (ODE) $\dot{x} = f(x,t)$, subject to the random initial condition $x(t=0) = x_0$ with known joint PDF $\rho_0$ (for notational ease, we write $x_0 \sim \rho_0$). Then $\mathcal{L}\rho \equiv -\nabla\cdot(\rho f)$, where $\nabla$ denotes the gradient with respect to (w.r.t.) the standard Euclidean metric, and the resulting first-order transport PDE is known as the Liouville equation.

Kenneth F. Caluya and Abhishek Halder are with the Department of Applied Mathematics, University of California, Santa Cruz, CA 95064, USA, {kcaluya,ahalder}@ucsc.edu

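More generally, consider the case when the dynamics of $x(t) \in \mathbb{R}^n$ is governed by an Itô stochastic differential equation (SDE) $\mathrm{d}x = f(x,t)\,\mathrm{d}t + g(x,t)\,\mathrm{d}w$, $x(t=0) = x_0 \sim \rho_0$ (given), where the process noise $w(t) \in \mathbb{R}^m$ is a Wiener process satisfying $\mathbb{E}[\mathrm{d}w_i\,\mathrm{d}w_j] = \delta_{ij}\,\mathrm{d}t$ for all $i, j = 1, \ldots, m$, with $\delta_{ij} = 1$ for $i = j$ and zero otherwise. Then

$$\mathcal{L}\rho \equiv -\nabla\cdot(\rho f) + \frac{1}{2}\sum_{i,j=1}^{n}\frac{\partial^2}{\partial x_i\,\partial x_j}\left(\rho\, g g^\top\right)_{ij}, \tag{2}$$

and the resulting transport PDE, i.e., (1) with $\mathcal{L}$ given by (2), is known as the Fokker-Planck or Kolmogorov's forward equation. Hereafter, we will refer to it as the FPK PDE.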
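Although this paper deliberately avoids sampling-based density estimates, a Monte Carlo baseline helps make (2) concrete: the empirical distribution of many independent sample paths of the SDE approximates $\rho(x,t)$. The sketch below uses the standard Euler-Maruyama scheme, a swapped-in technique and not the method proposed here; the function name `euler_maruyama` and the Ornstein-Uhlenbeck test case ($f(x) = -x$, $g = \sqrt{2}$, $x_0 \sim \mathcal{N}(2, 0.25)$) are illustrative assumptions. For that linear case the exact FPK solution stays Gaussian with mean $2e^{-t}$ and variance $0.25e^{-2t} + 1 - e^{-2t}$, which the sample statistics can be compared against.

```python
import numpy as np


def euler_maruyama(f, g, x0, t_final, dt, rng):
    """Propagate an ensemble of samples of dx = f(x, t) dt + g(x, t) dw.

    x0 has shape (num_samples, n); f returns shape (num_samples, n) and
    g returns shape (num_samples, n, m). The empirical distribution of the
    returned samples approximates rho(x, t_final) solving the FPK PDE.
    """
    x = np.array(x0, dtype=float)
    t = 0.0
    num_steps = int(round(t_final / dt))
    for _ in range(num_steps):
        G = g(x, t)                                   # diffusion matrices, (S, n, m)
        dw = rng.normal(scale=np.sqrt(dt), size=(x.shape[0], G.shape[2]))
        x = x + f(x, t) * dt + np.einsum("snm,sm->sn", G, dw)
        t += dt
    return x


# Illustrative Ornstein-Uhlenbeck example: dx = -x dt + sqrt(2) dw in R^1.
rng = np.random.default_rng(0)
x0 = rng.normal(2.0, 0.5, size=(20000, 1))            # x0 ~ N(2, 0.25)
f = lambda x, t: -x
g = lambda x, t: np.full((x.shape[0], 1, 1), np.sqrt(2.0))
xT = euler_maruyama(f, g, x0, t_final=1.0, dt=0.01, rng=rng)

exact_mean = 2.0 * np.exp(-1.0)
exact_var = 0.25 * np.exp(-2.0) + 1.0 - np.exp(-2.0)
```

The sample mean and variance of `xT` agree with `exact_mean` and `exact_var` up to Monte Carlo and time-discretization error; the cost of resolving the full PDF this way grows quickly with dimension, which is part of the motivation for the approach developed below.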

The problem of uncertainty propagation, that is, the problem of computing $\rho(x,t)$ that satisfies a PDE of the form (1) with $\mathcal{L}$ given by (2), is ubiquitous across science and engineering. Representative applications include meteorological forecasting [1], dispersion analysis in spacecraft entry-descent-landing [2], orientation density evolution for liquid crystals in chemical physics [3]–[5], motion planning in robotics [6]–[8], computing the prior PDF in nonlinear filtering [9], [10], probabilistic model validation [11]–[13], and analyzing the statistical mechanics of macromolecules [14]. In all these applications, it is important to compute the joint PDF $\rho(x,t)$ in a scalable and unified manner, rather than employing specialized techniques on a case-by-case basis or developing discretization-based PDE solvers, which suffer from the "curse of dimensionality" [15]. Being first order, the Liouville PDE can be solved efficiently using the method of characteristics [2]. However, solving the second-order FPK PDE in a manner that avoids both spatial discretization and function approximation remains challenging to date.

In this paper, we pursue the solution of (1) through a variational viewpoint arising from the theory of optimal mass transport [16]. This viewpoint, first proposed in [17], interprets (1) as a gradient (steepest descent) flow of a certain functional $\Phi(\cdot)$ on the infinite dimensional manifold of PDFs with finite second (raw) moments, denoted as¹

$$\mathcal{D}_2 := \left\{\rho : \mathbb{R}^n \mapsto \mathbb{R} \;\middle|\; \rho \geq 0,\ \int_{\mathbb{R}^n}\rho = 1,\ \mathbb{E}_\rho\!\left[x^\top x\right] < \infty\right\}.$$

Specifically, for some fixed time-step $h > 0$, consider the variational recursion

$$\varrho_k(x) = \arg\inf_{\varrho\in\mathcal{D}_2}\ \frac{1}{2}\,d^2(\varrho, \varrho_{k-1}) + h\,\Phi(\varrho), \qquad k = 1, 2, \ldots, \tag{3}$$

subject to the initial condition $\varrho_0(x) := \rho_0(x)$, i.e., the initial PDF of (1). Here, $d(\cdot,\cdot)$ is a distance metric on the manifold $\mathcal{D}_2$. The idea is then to design the metric $d(\cdot,\cdot)$ and the functional $\Phi(\cdot)$ in (3) such that $\varrho_k(x) \to \rho(x, t = kh)$ as $h \downarrow 0$, i.e., in the small time-step limit, the solution of the variational recursion (3) converges (in the strong $L^1$ sense) to that of (1). The main result in [17] was to show that for FPK operators of the form (2) with $f$ being a gradient vector field and $g$ being a scalar multiple of the identity matrix, the distance $d(\cdot,\cdot)$ can be taken as the Wasserstein-2 metric with $\Phi(\cdot)$ as the free energy functional. We will make these ideas precise in Sections II and III. The resulting variational recursion (3) has since become known as the Jordan-Kinderlehrer-Otto (JKO) scheme [18], and we will say that an FPK operator satisfying these assumptions on $f$ and $g$ is in "JKO canonical form". Similar gradient descent schemes have been derived for many other PDEs; see, e.g., [19] for a recent survey.

¹ We denote the expectation operator w.r.t. the measure $\rho(x)\,\mathrm{d}x$ as $\mathbb{E}_\rho[\cdot]$.

arXiv:1809.10844v2 [math.OC] 15 Nov 2018

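To motivate gradient descent in infinite dimensional spaces, we appeal to a more familiar setting, i.e., gradient descent in $\mathbb{R}^n$ associated with the flow

$$\frac{\mathrm{d}x}{\mathrm{d}t} = -\nabla\varphi(x), \qquad x(0) = x_0, \tag{4}$$

where $x, x_0 \in \mathbb{R}^n$, and $\varphi : \mathbb{R}^n \to \mathbb{R}_{\geq 0}$ is continuously differentiable. The Euler discretization of (4) is given by

$$x_k - x_{k-1} = -h\,\nabla\varphi(x_{k-1}), \tag{5}$$

which, up to $o(h)$ terms, can be rewritten as the variational recursion

$$x_k = \arg\min_{x}\ \frac{1}{2}\|x - x_{k-1}\|^2 + h\,\varphi(x). \tag{6}$$

Indeed, the first-order optimality condition of (6) reads $x_k - x_{k-1} = -h\,\nabla\varphi(x_k)$, i.e., the implicit (backward) Euler step, which differs from (5) only by $h\left(\nabla\varphi(x_{k-1}) - \nabla\varphi(x_k)\right)$, a term of order $o(h)$ since $\nabla\varphi$ is continuous.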

In the optimization literature, the mapping $x_{k-1} \mapsto x_k$, given by

$$\operatorname{prox}^{\|\cdot\|}_{h\varphi}(x_{k-1}) := \arg\min_{x}\ \frac{1}{2}\|x - x_{k-1}\|^2 + h\,\varphi(x), \tag{7}$$

is called the "proximal operator" [20, p. 142]. The sequence $\{x_k\}$ generated by the proximal recursion

$$x_k = \operatorname{prox}^{\|\cdot\|}_{h\varphi}(x_{k-1}), \qquad k = 1, 2, \ldots, \tag{8}$$

converges to the flow of the ODE (4), i.e., the sequence satisfies $x_k \to x(t = kh)$ as the step-size $h \downarrow 0$. Using the finite dimensional viewpoint (7), we define

$$\operatorname{prox}^{d^2}_{h\Phi}(\varrho_{k-1}) := \arg\inf_{\varrho\in\mathcal{D}_2}\ \frac{1}{2}\,d^2(\varrho, \varrho_{k-1}) + h\,\Phi(\varrho), \tag{9}$$

as an infinite dimensional proximal operator. As mentioned above, the sequence $\{\varrho_k\}$ generated by the proximal recursion (3) converges to the flow of the PDE (1), i.e., the sequence satisfies $\varrho_k(x) \to \rho(x, t = kh)$ as the step-size $h \downarrow 0$. We also note that in the finite dimensional case,

$$\frac{\mathrm{d}}{\mathrm{d}t}\varphi = \langle\nabla\varphi, -\nabla\varphi\rangle = -\|\nabla\varphi\|^2 \leq 0, \tag{10}$$

with equality only at stationary points of $\varphi$, which implies that $\varphi$ decays along the flow of (4). As we will see next, the appeal of using (3) to solve the FPK PDE comes from the fact that Euclidean gradient descent can be generalized to the manifold $\mathcal{D}_2$ by appropriately choosing the metric $d(\cdot,\cdot)$ and the functional $\Phi(\cdot)$ in (3), in parallel with the quantities $\|\cdot\|$ and $\varphi(\cdot)$ in (8), respectively.
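As a sanity check of (6)-(8) in the finite dimensional setting, the following sketch takes the hypothetical quadratic potential $\varphi(x) = \frac{1}{2}x^\top A x$ with $A$ symmetric positive definite, for which the proximal operator (7) has the closed form $(I_n + hA)^{-1}x_{k-1}$. It compares the proximal iterates at $T = kh = 1$ with the exact flow $x(T) = e^{-AT}x_0$ of (4) for decreasing $h$, and checks that $\varphi$ is nonincreasing along the iterates, in line with (10). The matrix $A$, the initial condition, and all function names are illustrative choices.

```python
import numpy as np


def prox_quadratic(x_prev, A, h):
    """Proximal operator (7) for phi(x) = 0.5 x^T A x.

    Setting the gradient of 0.5||x - x_prev||^2 + h phi(x) to zero gives
    (I + h A) x = x_prev, i.e., one implicit (backward) Euler step.
    """
    return np.linalg.solve(np.eye(len(x_prev)) + h * A, x_prev)


def proximal_recursion(x0, A, h, num_steps):
    """Iterates of (8), returned as an array of shape (num_steps + 1, n)."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(num_steps):
        xs.append(prox_quadratic(xs[-1], A, h))
    return np.array(xs)


def exact_flow(x0, A, t):
    """Exact solution exp(-A t) x0 of (4), via the eigendecomposition of A."""
    lam, V = np.linalg.eigh(A)
    return V @ (np.exp(-lam * t) * (V.T @ x0))


A = np.array([[2.0, 0.5], [0.5, 1.0]])   # symmetric positive definite
x0 = np.array([1.0, -1.0])
T = 1.0
errors = {}
for h in (0.1, 0.01, 0.001):
    xs = proximal_recursion(x0, A, h, int(round(T / h)))
    errors[h] = np.linalg.norm(xs[-1] - exact_flow(x0, A, T))

# phi evaluated along the iterates of the last (smallest h) run
phi = 0.5 * np.einsum("ki,ij,kj->k", xs, A, xs)
```

The error at $T$ shrinks roughly in proportion to $h$, as expected of a first-order scheme, and `phi` decreases monotonically because each proximal step can never increase the objective in (7).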

Fig. 1: The JKO scheme can be described by successive evaluation of proximal operators to recursively update PDFs from time $t = (k-1)h$ to $t = kh$ for $k = 1, 2, \ldots$, and time-step $h > 0$.

In this paper, we develop an algorithm to solve the FPK PDE via a proximal recursion of the form (3) without any spatial discretization. A schematic is shown in Fig. 1. The resulting recursion is proved to be contractive and admits a fast numerical implementation. Numerical simulation results show the efficacy of the proposed formulation.

II. PRELIMINARIES

In the following, we provide the definitions of the Kullback-Leibler divergence and the 2-Wasserstein metric, which will be useful in the sequel. We also fix some notation used throughout this paper.

Definition 1: The Kullback-Leibler divergence between two probability measures $\mathrm{d}\pi_i(x) = \rho_i(x)\,\mathrm{d}x$, $i \in \{1, 2\}$, is given by

$$D_{\mathrm{KL}}(\mathrm{d}\pi_1 \,\|\, \mathrm{d}\pi_2) := \int \rho_1(x)\log\frac{\rho_1(x)}{\rho_2(x)}\,\mathrm{d}x, \tag{11}$$

which is nonnegative and vanishes if and only if $\rho_1 = \rho_2$. However, (11) is not a metric since it is neither symmetric nor does it satisfy the triangle inequality.
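A discrete analogue of (11), with the integral replaced by a sum over a common finite support, makes the asymmetry easy to see. This is a minimal sketch: the function name and the example vectors are illustrative, and it assumes $q_i > 0$ wherever $p_i > 0$.

```python
import numpy as np


def kl_divergence(p, q):
    """Discrete analogue of (11): sum_i p_i log(p_i / q_i), using 0 log 0 := 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0                      # terms with p_i = 0 contribute nothing
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))


p = np.array([0.5, 0.4, 0.1])
q = np.array([0.3, 0.3, 0.4])
d_pq = kl_divergence(p, q)           # divergence of p from q
d_qp = kl_divergence(q, p)           # swapping the arguments changes the value
```

Both values are positive but differ, which is exactly why (11) fails to be a metric.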

Definition 2: The 2-Wasserstein metric between two probability measures $\mathrm{d}\pi_1(x) = \rho_1(x)\,\mathrm{d}x$ and $\mathrm{d}\pi_2(y) = \rho_2(y)\,\mathrm{d}y$, supported respectively on $\mathcal{X}, \mathcal{Y} \subseteq \mathbb{R}^n$, is denoted as $W(\pi_1, \pi_2)$ (equivalently, $W(\rho_1, \rho_2)$ whenever $\pi_1, \pi_2$ are absolutely continuous so that the PDFs $\rho_1, \rho_2$ exist), and arises in the theory of optimal mass transport [16]; it is defined as

$$W(\pi_1, \pi_2) := \left(\inf_{\mathrm{d}\pi\in\Pi(\pi_1,\pi_2)} \int_{\mathcal{X}\times\mathcal{Y}} \|x - y\|_2^2\,\mathrm{d}\pi(x, y)\right)^{1/2}, \tag{12}$$

where $\Pi(\pi_1, \pi_2)$ denotes the collection of all probability measures on the product space $\mathcal{X}\times\mathcal{Y}$ having finite second moments, with marginals $\pi_1$ and $\pi_2$, respectively. Its square, $W^2(\pi_1, \pi_2)$, equals [21] the minimum amount of work required to transport $\pi_1$ to $\pi_2$ (or equivalently, $\rho_1$ to $\rho_2$). It is well known [16, Ch. 7] that $W(\pi_1, \pi_2)$ defines a metric on the manifold $\mathcal{D}_2$.
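For intuition about (12), the sketch below handles only the special case of two uniform empirical measures on $\mathbb{R}$ with the same number of atoms. In one dimension the optimal coupling is the monotone rearrangement that pairs sorted samples, so $W^2$ reduces to the mean squared gap between order statistics. The function name and the test data are illustrative; general multivariate instances of (12) require solving a linear program instead.

```python
import numpy as np


def wasserstein2_1d_empirical(x, y):
    """W from (12) between two uniform empirical measures on R of equal size.

    Pairing the i-th smallest atom of x with the i-th smallest atom of y is
    optimal in 1D, so W^2 is the mean squared difference of sorted samples.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.sqrt(np.mean((x - y) ** 2)))


rng = np.random.default_rng(1)
samples = rng.normal(size=1000)
w_shift = wasserstein2_1d_empirical(samples, samples + 3.0)            # pure translation
w_self = wasserstein2_1d_empirical(samples, rng.permutation(samples))  # same atoms
```

Translating a measure by $c$ moves it a $W$-distance of exactly $|c|$, and reordering the same atoms costs nothing, consistent with $W$ being a metric.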

Notations: Throughout the paper, we use bold-faced capital letters for matrices and bold-faced lower-case letters for column vectors. We use the symbol $\langle\cdot,\cdot\rangle$ to denote the Euclidean inner product. In particular, $\langle A, B\rangle := \operatorname{trace}(A^\top B)$ denotes the Frobenius inner product between matrices $A$ and $B$, and $\langle a, b\rangle := a^\top b$ denotes the inner product between column vectors $a$ and $b$. We use $\mathcal{N}(\mu, \sigma^2)$ to denote a univariate Gaussian PDF with mean $\mu$ and variance $\sigma^2$. Likewise, $\mathcal{N}(\mu, \Sigma)$ denotes a multivariate Gaussian PDF with mean vector $\mu$ and covariance matrix $\Sigma$. The operations $\log(\cdot)$, $\exp(\cdot)$ and $\geq 0$ are to be understood element-wise. The symbols $\odot$ and $\oslash$ denote element-wise (Hadamard) product and division, respectively. We use $I_n$ to denote the $n\times n$ identity matrix. The symbols $\mathbf{1}$ and $\mathbf{0}$ stand for column vectors of appropriate dimension containing all ones and all zeros, respectively.

III. JKO CANONICAL FORM

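In this paper, we consider the Itô SDE

$$\mathrm{d}x = -\nabla\psi(x)\,\mathrm{d}t + \sqrt{2\beta^{-1}}\,\mathrm{d}w, \qquad x(0) = x_0, \tag{13}$$

where the time $t \in [0, \infty)$, the state vector $x \in \mathbb{R}^n$, the drift potential $\psi : \mathbb{R}^n \mapsto (0, \infty)$, the parameter $\beta > 0$ (so that $\beta^{-1}$ plays the role of the diffusion coefficient), and the initial condition $x_0 \sim \rho_0(x)$. For the sample path dynamics $x(t)$ given by the SDE (13), the flow of the joint PDF $\rho(x,t)$ is governed by the FPK PDE

$$\frac{\partial\rho}{\partial t} = \nabla\cdot(\rho\nabla\psi) + \beta^{-1}\Delta\rho, \qquad \rho(x, 0) = \rho_0(x), \tag{14}$$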

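and its solution satisfies $\rho \geq 0$ and $\int_{\mathbb{R}^n}\rho\,\mathrm{d}x = 1$ for all $t \in [0, \infty)$. It is easy to verify that the unique stationary solution of (14) is the Gibbs PDF $\rho_\infty(x) = \kappa\exp(-\beta\psi(x))$, where the normalizing constant is $\kappa := 1/Z$, and $Z := \int_{\mathbb{R}^n}\exp(-\beta\psi(x))\,\mathrm{d}x$ is referred to as the partition function.

A Lyapunov functional associated with the FPK PDE (14) is the free energy

$$F(\rho) := \mathbb{E}_\rho\left[\psi + \beta^{-1}\log\rho\right] \tag{15}$$

$$\phantom{F(\rho)} = \beta^{-1}D_{\mathrm{KL}}(\rho \,\|\, \rho_\infty) - \beta^{-1}\log Z, \tag{16}$$

which is therefore bounded below by $-\beta^{-1}\log Z$, with equality if and only if $\rho = \rho_\infty$. The free energy decays [17] along the solution trajectory of (14), i.e., $\frac{\mathrm{d}}{\mathrm{d}t}F \leq 0$. This follows from rewriting (14) as

$$\frac{\partial\rho}{\partial t} = \nabla\cdot(\rho\nabla\zeta), \qquad \text{where } \zeta := \beta^{-1}(1 + \log\rho) + \psi, \tag{17}$$

and consequently, since $\zeta$ is the first variation of $F$, integrating $\int_{\mathbb{R}^n}\zeta\,\frac{\partial\rho}{\partial t}\,\mathrm{d}x$ by parts gives

$$\frac{\mathrm{d}}{\mathrm{d}t}F = -\mathbb{E}_\rho\left[\|\nabla\zeta\|^2\right] \leq 0, \tag{18}$$

with equality achieved at the stationary solution $\rho_\infty = \kappa e^{-\beta\psi(x)}$. In our context, (18) serves as the infinite dimensional analog of (10). The term free energy is motivated by noting that (15) can be seen as the sum of the potential energy $\int_{\mathbb{R}^n}\psi(x)\rho\,\mathrm{d}x$ and the internal energy $\beta^{-1}\int_{\mathbb{R}^n}\rho\log\rho\,\mathrm{d}x$. When $\psi$ is constant, $\nabla\psi \equiv 0$ and the PDE (14) reduces to the heat equation, which, by (15), can then be interpreted as an entropy maximizing flow.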
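The relation between the free energy (15) and the Kullback-Leibler divergence to the Gibbs PDF can be checked numerically on a one-dimensional grid. The sketch below assumes the illustrative potential $\psi(x) = x^2/2 + 1$, $\beta = 2$, and a truncated domain $[-8, 8]$ with a Riemann-sum quadrature. It verifies that $F(\rho) = \beta^{-1}D_{\mathrm{KL}}(\rho\,\|\,\rho_\infty) - \beta^{-1}\log Z$ with $Z = \int\exp(-\beta\psi)\,\mathrm{d}x$, and hence that the Gibbs PDF attains a lower free energy than a test Gaussian.

```python
import numpy as np

beta = 2.0
x = np.linspace(-8.0, 8.0, 4001)       # truncated 1D domain (tails are negligible)
dx = x[1] - x[0]
psi = 0.5 * x**2 + 1.0                 # hypothetical positive drift potential


def integrate(values):
    """Riemann sum on the uniform grid."""
    return float(np.sum(values) * dx)


def free_energy(rho):
    """F(rho) = E_rho[psi + beta^{-1} log rho], eq. (15)."""
    return integrate(rho * (psi + np.log(rho) / beta))


def gaussian(mean, var):
    return np.exp(-(x - mean) ** 2 / (2 * var)) / np.sqrt(2 * np.pi * var)


Z = integrate(np.exp(-beta * psi))     # partition function
rho_inf = np.exp(-beta * psi) / Z      # Gibbs PDF, the stationary solution of (14)

rho = gaussian(1.0, 0.3)               # an arbitrary non-stationary test PDF
kl = integrate(rho * np.log(rho / rho_inf))
F_rho = free_energy(rho)
F_gibbs = free_energy(rho_inf)
```

Since the divergence term is nonnegative and vanishes only at $\rho_\infty$, `F_gibbs` equals $-\beta^{-1}\log Z$ and is strictly smaller than `F_rho`.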

The seminal paper [17] establishes that the FPK PDE (14) can be seen as the gradient descent flow of the free energy functional $F(\cdot)$ w.r.t. the 2-Wasserstein metric. Specifically, the solution of (14) can be recovered from the following proximal recursion of the form (3):

$$\varrho_k = \operatorname{prox}^{W^2}_{hF(\cdot)}(\varrho_{k-1}) \tag{19a}$$

$$\phantom{\varrho_k} = \arg\inf_{\varrho\in\mathcal{D}_2}\ \frac{1}{2}W^2(\varrho_{k-1}, \varrho) + h\,F(\varrho), \qquad k = 1, 2, \ldots, \tag{19b}$$

with $\varrho_0 \equiv \rho_0(x)$ (from (14)), as $h \downarrow 0$. Next, we develop a framework to numerically solve (19).

IV. MAIN RESULTS

To solve (19), we discretize time as $t = 0, h, 2h, \ldots$, and develop an algorithm to solve (19) without any spatial discretization. In other words, we would like to perform the recursion (19) on a weighted scattered point cloud $\{x_k^i, \varrho_k^i\}_{i=1}^N$ of cardinality $N$ at $t_k = kh$, $k \in \mathbb{N}$, where the location of the point $x_k^i \in \mathbb{R}^n$ denotes the state-space coordinate, and the corresponding weight $\varrho_k^i \in \mathbb{R}_{\geq 0}$ denotes the value of the joint PDF evaluated at that point at time $t_k$. Such a weighted scattered point cloud representation of (19) results in the following problem:

$$\varrho_k = \arg\min_{\varrho}\left\{\min_{M\in\Pi(\varrho_{k-1},\varrho)} \frac{1}{2}\langle C_k, M\rangle + h\left\langle\psi_{k-1} + \beta^{-1}\log\varrho,\ \varrho\right\rangle\right\}, \tag{20}$$

to be solved for $k = 1, 2, \ldots$, where the drift potential vector $\psi_{k-1} \in \mathbb{R}^N$ is given by

$$\psi_{k-1}(i) := \psi(x_{k-1}^i), \qquad i = 1, 2, \ldots, N.$$

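Similarly, $\varrho, \varrho_{k-1} \in \mathbb{R}^N$ are the corresponding probability weight vectors. Furthermore, for each $k = 1, 2, \ldots$, the matrix $C_k \in \mathbb{R}^{N\times N}$ is given by

$$C_k(i, j) := \|x_{k-1}^i - x_k^j\|_2^2, \qquad i, j = 1, 2, \ldots, N,$$

so that rows are indexed by the points at time $t_{k-1}$ and columns by the points at time $t_k$, and $\Pi(\varrho_{k-1}, \varrho)$ stands for the set of all matrices $M \in \mathbb{R}^{N\times N}$ such that

$$M \geq 0, \qquad M\mathbf{1} = \varrho_{k-1}, \qquad M^\top\mathbf{1} = \varrho. \tag{21}$$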
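Assembling the data of (20)-(21) from two point clouds is a short vectorized computation. In the sketch below (function names hypothetical), rows of $C_k$ are indexed by the points at $t_{k-1}$ and columns by the points at $t_k$, which is the convention consistent with the row-sum constraint $M\mathbf{1} = \varrho_{k-1}$ in (21); the potential $\psi(x) = \frac{1}{2}\|x\|_2^2 + 1$ and the random point clouds are illustrative assumptions.

```python
import numpy as np


def cost_matrix(x_prev, x_curr):
    """C[i, j] = ||x_prev[i] - x_curr[j]||_2^2 for point clouds of shape (N, n).

    Rows follow the previous cloud and columns the current one, matching the
    marginal constraints M 1 = rho_{k-1} and M^T 1 = rho in (21).
    """
    diff = x_prev[:, None, :] - x_curr[None, :, :]   # pairwise differences, (N, N, n)
    return np.sum(diff**2, axis=-1)


def drift_potential_vector(psi, x_prev):
    """psi_{k-1}(i) = psi(x^i_{k-1}), where psi acts on a single point of shape (n,)."""
    return np.array([psi(p) for p in x_prev])


rng = np.random.default_rng(2)
x_prev = rng.normal(size=(5, 2))                     # points at t_{k-1}
x_curr = x_prev + 0.1 * rng.normal(size=(5, 2))      # points at t_k
C = cost_matrix(x_prev, x_curr)
psi_vec = drift_potential_vector(lambda p: 0.5 * p @ p + 1.0, x_prev)
```

Broadcasting builds all $N^2$ squared distances at once, which is convenient for moderate $N$; for very large clouds the $N \times N$ memory footprint becomes the limiting factor.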

Due to the nested minimization structure in (20), its numerical solution is far from obvious. Notice that, for a given $\varrho$, the inner minimization in (20) is a standard linear program, as in Monge-Kantorovich optimal mass transport [16]. However, the outer minimization in (20) precludes a direct numerical approach.

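To circumvent these issues, following [22], we first regularize and then dualize (20). Specifically, adding an entropic regularization $H(M) := \langle M, \log M\rangle$ to (20) yields

$$\varrho_k = \arg\min_{\varrho}\left\{\min_{M\in\Pi(\varrho_{k-1},\varrho)} \frac{1}{2}\langle C_k, M\rangle + \varepsilon H(M) + h\left\langle\psi_{k-1} + \beta^{-1}\log\varrho,\ \varrho\right\rangle\right\}, \tag{22}$$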

where $\varepsilon > 0$ is a regularization parameter. The entropic regularization is standard in the optimal mass transport literature [23], [24] and leads to the efficient Sinkhorn iteration for the inner minimization. In our context, the entropic regularization "algebrizes" the inner minimization in the sense that if $\lambda_0, \lambda_1$ are the Lagrange multipliers associated with the equality constraints in (21), then the optimal coupling matrix $M^{\mathrm{opt}} := [m^{\mathrm{opt}}(i,j)]$ in (22) has the Sinkhorn form

$$m^{\mathrm{opt}}(i,j) = \exp\left(\lambda_0(i)h/\varepsilon\right)\exp\left(-C_k(i,j)/(2\varepsilon)\right)\exp\left(\lambda_1(j)h/\varepsilon\right). \tag{23}$$
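For a fixed $\varrho$, the inner minimization in (22) is a standard entropically regularized transport problem, and the product structure (23) is what the classical Sinkhorn iteration exploits. The sketch below is a textbook Sinkhorn loop, not PROXRECUR itself: it alternately rescales the rows and columns of $\Gamma = \exp(-C/(2\varepsilon))$ so that $M = \operatorname{diag}(y)\,\Gamma\,\operatorname{diag}(z)$ matches two given marginals with equal total mass. The function name, tolerance, and test data are assumptions.

```python
import numpy as np


def sinkhorn_coupling(a, b, C, eps, num_iters=1000, tol=1e-10):
    """Entropic OT coupling with row marginal a and column marginal b.

    Alternates y <- a / (Gamma z) and z <- b / (Gamma^T y), so the returned
    M = diag(y) Gamma diag(z) has the product form (23) with
    Gamma = exp(-C / (2 eps)). Requires a.sum() == b.sum().
    """
    Gamma = np.exp(-C / (2.0 * eps))
    z = np.ones_like(b)
    for _ in range(num_iters):
        y = a / (Gamma @ z)                 # match row sums
        z_new = b / (Gamma.T @ y)           # match column sums
        if np.max(np.abs(z_new - z)) < tol:
            z = z_new
            break
        z = z_new
    y = a / (Gamma @ z)                     # final row rescaling
    return y[:, None] * Gamma * z[None, :]


rng = np.random.default_rng(3)
x_prev = rng.uniform(size=(6, 2))
x_curr = rng.uniform(size=(6, 2))
C = np.sum((x_prev[:, None, :] - x_curr[None, :, :]) ** 2, axis=-1)
a = rng.uniform(0.5, 1.5, size=6)
a /= a.sum()
b = rng.uniform(0.5, 1.5, size=6)
b /= b.sum()
M = sinkhorn_coupling(a, b, C, eps=0.5)
```

In (22) the column marginal $\varrho$ is itself unknown, which is why the paper replaces the second Sinkhorn update by the nonlinear condition derived below rather than using this loop directly.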

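Since the objective in (22) is proper, convex, and lower semicontinuous in $\varrho$, strong duality holds, and we consider the Lagrange dual of (22) given by

$$\lambda_0^{\mathrm{opt}}, \lambda_1^{\mathrm{opt}} = \arg\max_{\lambda_0, \lambda_1\in\mathbb{R}^N}\left\{\langle\lambda_0, \varrho_{k-1}\rangle - F^\star(-\lambda_1) - \frac{\varepsilon}{h}\exp(\lambda_0 h/\varepsilon)^\top\exp\left(-C_k/(2\varepsilon)\right)\exp(\lambda_1 h/\varepsilon)\right\}, \tag{24}$$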

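where

$$F^\star(y) := \sup_{x\in\mathbb{R}^N}\left\{\langle y, x\rangle - F(x)\right\} \tag{25}$$

is the Legendre-Fenchel transform of the free energy $F(\cdot)$ given by (15), here in its discrete form $F(\varrho) = \langle\psi_{k-1} + \beta^{-1}\log\varrho, \varrho\rangle$ from (20). Next, we derive the first-order optimality conditions for (24), and then provide an algorithm to solve them.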

A. Conditions for Optimality

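Given the vectors $\varrho_{k-1}, \psi_{k-1}$, the matrix $C_k$, and the positive scalars $\beta, h, \varepsilon$ in (24), let

$$y := \exp(\lambda_0 h/\varepsilon), \qquad z := \exp(\lambda_1 h/\varepsilon), \tag{26}$$

$$\Gamma_k := \exp\left(-C_k/(2\varepsilon)\right), \qquad \xi_{k-1} := \exp(-\beta\psi_{k-1} - 1). \tag{27}$$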

The following result provides a way of computing $\lambda_0^{\mathrm{opt}}, \lambda_1^{\mathrm{opt}}$ in (24), and consequently $\varrho_k$ in (22).

Theorem 1: The vectors $\lambda_0^{\mathrm{opt}}, \lambda_1^{\mathrm{opt}}$ in (24) can be found by solving for $y$ and $z$ from the following system of equations:

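$$y \odot (\Gamma_k z) = \varrho_{k-1}, \tag{28a}$$

$$z \odot \left(\Gamma_k^\top y\right) = \xi_{k-1} \odot z^{-\beta\varepsilon/h}, \tag{28b}$$

and then inverting the maps (26). The vector $\varrho_k$ in (22), i.e., the proximal update (Fig. 1), can then be obtained as

$$\varrho_k = z^{\mathrm{opt}} \odot \left(\Gamma_k^\top y^{\mathrm{opt}}\right), \tag{29}$$

where $(y^{\mathrm{opt}}, z^{\mathrm{opt}})$ denotes the solution of (28).

Proof: By (15) and (25), we have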

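$$F^\star(\lambda) = \sup_{\varrho\in\mathbb{R}^N}\left\{\lambda^\top\varrho - \psi^\top\varrho - \beta^{-1}\varrho^\top\log\varrho\right\}, \tag{30}$$

where we write $\psi \equiv \psi_{k-1}$ for brevity. We seek an explicit algebraic expression of (30) to be substituted in (24). Setting the gradient of the objective function in (30) w.r.t. $\varrho$ to zero, i.e., $\lambda - \psi - \beta^{-1}(\log\varrho + \mathbf{1}) = \mathbf{0}$, and solving for $\varrho$ yields

$$\varrho_{\max} = \exp\left(\beta(\lambda - \psi) - 1\right). \tag{31}$$

Substituting (31) back into (30) results in

$$F^\star(\lambda) = \beta^{-1}\mathbf{1}^\top\exp\left(\beta(\lambda - \psi) - 1\right). \tag{32}$$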

Fixing $\lambda_1$ and setting the gradient of the objective in (24) w.r.t. $\lambda_0$ to zero gives (28a). Likewise, fixing $\lambda_0$ and setting the gradient of the objective in (24) w.r.t. $\lambda_1$ to zero gives

$$\left(\nabla F^\star\right)(-\lambda_1) = z \odot \left(\Gamma_k^\top y\right). \tag{33}$$

Using (32) to simplify the left-hand side of (33) results in (28b). To derive (29), notice that combining the last equality constraint in (21) with (23), (26) and (27) gives

$$\varrho_k(i) = \left((M^{\mathrm{opt}})^\top\mathbf{1}\right)(i) = \sum_{j=1}^{N} m^{\mathrm{opt}}(j, i) = z(i)\sum_{j=1}^{N}\Gamma_k(j, i)\,y(j),$$

which is equal to $z \odot (\Gamma_k^\top y)$, as claimed.
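The closed-form conjugate (31)-(32) used in the proof can be spot-checked numerically: the objective in (30) evaluated at $\varrho_{\max}$ should equal $\beta^{-1}\mathbf{1}^\top\varrho_{\max}$, and random multiplicative perturbations of $\varrho_{\max}$ should never increase it, since the objective is strictly concave on the positive orthant. The vectors $\lambda$, $\psi$ and the value of $\beta$ below are arbitrary illustrative choices.

```python
import numpy as np


def conjugate_objective(rho, lam, psi, beta):
    """Objective inside the supremum (30): lam^T rho - psi^T rho - beta^{-1} rho^T log rho."""
    return lam @ rho - psi @ rho - (rho @ np.log(rho)) / beta


def conjugate_closed_form(lam, psi, beta):
    """(31)-(32): maximizer rho_max = exp(beta (lam - psi) - 1), value beta^{-1} 1^T rho_max."""
    rho_max = np.exp(beta * (lam - psi) - 1.0)
    return rho_max, rho_max.sum() / beta


rng = np.random.default_rng(4)
beta = 1.5
lam = rng.normal(size=8)
psi = rng.uniform(1.0, 2.0, size=8)
rho_max, value = conjugate_closed_form(lam, psi, beta)

# objective values at randomly perturbed (still positive) points
perturbed = [
    conjugate_objective(rho_max * np.exp(0.1 * rng.normal(size=8)), lam, psi, beta)
    for _ in range(100)
]
```

Differentiating (32) at $-\lambda_1$ gives $\xi_{k-1}\odot\exp(-\beta\lambda_1) = \xi_{k-1}\odot z^{-\beta\varepsilon/h}$, which is exactly the right-hand side of (28b).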

B. Algorithm

1) Proximal recursion: We now propose a block coordinate iteration scheme to solve (28). Specifically, the proposed procedure, which we call PROXRECUR and detail in Algorithm 1, takes $\varrho_{k-1}$ as input and returns the proximal update $\varrho_k$ as output for $k = 1, 2, \ldots$. In addition to the data $\varrho_{k-1}, \psi_{k-1}, C_k, \beta, h, \varepsilon, N$, Algorithm 1 requires two parameters as user input: the numerical tolerance $\delta$ and the maximum number of iterations $L$. The computation in Algorithm 1, as presented, involves making an initial guess for the vector $z$ and then updating $y$ and $z$ until convergence.

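Algorithm 1 Proximal recursion to compute $\varrho_k$ from $\varrho_{k-1}$

1: procedure PROXRECUR($\varrho_{k-1}, \psi_{k-1}, C_k, \beta, h, \varepsilon, N, \delta, L$)
2:     $\Gamma_k \leftarrow \exp(-C_k/(2\varepsilon))$
3:     $\xi_{k-1} \leftarrow \exp(-\beta\psi_{k-1} - 1)$
4:     $z_0 \leftarrow \mathrm{rand}_{N\times 1}$    ▷ initialize
5:     $z \leftarrow [z_0, \mathbf{0}_{N\times L}]$
6:     $y \leftarrow [\varrho_{k-1} \oslash (\Gamma_k z_0), \mathbf{0}_{N\times L}]$
7:     $\ell \leftarrow 1$    ▷ iteration index
8:     while $\ell \leq L$ do
9:         $z(:, \ell+1) \leftarrow \left(\xi_{k-1} \oslash \left(\Gamma_k^\top y(:, \ell)\right)\right)^{1/(1+\beta\varepsilon/h)}$
10:        $y(:, \ell+1) \leftarrow \varrho_{k-1} \oslash (\Gamma_k z(:, \ell+1))$
11:        if $\|y(:, \ell+1) - y(:, \ell)\| < \delta$ and $\|z(:, \ell+1) - z(:, \ell)\| < \delta$ then    ▷ error within tolerance
12:            break
13:        else
14:            $\ell \leftarrow \ell + 1$
15:        end if
16:    end while
17:    return $\varrho_k \leftarrow z(:, \ell) \odot \left(\Gamma_k^\top y(:, \ell)\right)$
18: end procedure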
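For readers who want to experiment, here is a minimal NumPy sketch of Algorithm 1. It is our transcription, not the authors' code: it keeps only the latest iterates instead of the $N \times (L+1)$ history arrays, replaces rand by a seeded uniform draw on $[0.1, 1]$ to keep the initial $z$ strictly positive, and returns the update built from the latest pair $(y, z)$. The usage example assumes a hypothetical setup (standard Gaussian $\varrho_{k-1}$, $\psi(x) = x^2/2 + 1$, points moved by one Euler-Maruyama step of (13), rows of $C_k$ indexed by the points at $t_{k-1}$) and reports the relative residuals of (28a)-(28b) at the returned iterate.

```python
import numpy as np


def prox_recur(rho_prev, psi_prev, C, beta, h, eps, delta=1e-8, max_iters=1000, seed=0):
    """Sketch of Algorithm 1: block coordinate iteration for (28a)-(28b).

    rho_prev, psi_prev: length-N vectors; C: N x N cost matrix.
    Returns the proximal update (29) together with the final y and z.
    """
    Gamma = np.exp(-C / (2.0 * eps))
    xi = np.exp(-beta * psi_prev - 1.0)
    rng = np.random.default_rng(seed)
    z = rng.uniform(0.1, 1.0, size=len(rho_prev))      # strictly positive initial guess
    y = rho_prev / (Gamma @ z)                          # enforce (28a) for the initial z
    power = 1.0 / (1.0 + beta * eps / h)
    for _ in range(max_iters):
        z_new = (xi / (Gamma.T @ y)) ** power           # solve (28b) for z with y fixed
        y_new = rho_prev / (Gamma @ z_new)              # solve (28a) for y with z fixed
        converged = (np.linalg.norm(y_new - y) < delta
                     and np.linalg.norm(z_new - z) < delta)
        y, z = y_new, z_new
        if converged:
            break
    rho_next = z * (Gamma.T @ y)                        # proximal update (29)
    return rho_next, y, z


# Hypothetical 1D test problem.
rng = np.random.default_rng(5)
N, beta, h, eps = 40, 1.0, 1e-3, 5e-2
x_prev = np.sort(rng.normal(size=N))
x_curr = x_prev - h * x_prev + np.sqrt(2 * h / beta) * rng.normal(size=N)
rho_prev = np.exp(-x_prev**2 / 2) / np.sqrt(2 * np.pi)
psi_prev = 0.5 * x_prev**2 + 1.0
C = (x_prev[:, None] - x_curr[None, :]) ** 2
rho_next, y, z = prox_recur(rho_prev, psi_prev, C, beta, h, eps)

# Relative residuals of the optimality conditions at the returned iterate.
Gamma = np.exp(-C / (2 * eps))
xi = np.exp(-beta * psi_prev - 1.0)
rhs_28b = xi * z ** (-beta * eps / h)
res_28a = np.max(np.abs(y * (Gamma @ z) - rho_prev) / rho_prev)
res_28b = np.max(np.abs(z * (Gamma.T @ y) - rhs_28b) / rhs_28b)
```

In this example the exponent $1/(1+\beta\varepsilon/h) = 1/51$ damps each $z$-update strongly, and the loop stops after only a few iterations; whether that behaviour carries over to other parameter regimes is exactly the contraction question raised next.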

Several questions arise: how can one ensure that such a procedure converges? And even if convergence can be guaranteed, is the rate fast in practice? The latter issue is important since the time-step $h$ in the JKO scheme is small, and during the computation of Algorithm 1, the physical time is "frozen". We will establish convergence by showing certain contractive properties of the recursion given in Algorithm 1. Before doing so, we next outline the overall algorithmic setup to implement the proximal recursion over probability-weighted scattered point cloud data.

Page 5: Kenneth F. Caluya, Abhishek Halder - arXiv · Kenneth F. Caluya, Abhishek Halder Abstract—We develop a new method to solve the Fokker-Planck or Kolmogorov’s forward equation that

D2. Then, the idea is to design the metric d(·, ·) and thefunctional �(·) in (3) such that %k(x) ! ⇢(x, t = kh) ash # 0, i.e., in the small time-step limit, the solution of thevariational recursion (3) converges (in strong L1 sense) tothat of (1). The main result in [17] was to show that for FPKoperators of the form (2) with f being a gradient vector fieldand g being a scalar multiple of identity matrix, the distanced(·, ·) can be taken as the Wasserstein-2 metric with �(·) asthe free energy functional. We will make these ideas precisein Section II and III. The resulting variational recursion (3)has since been known as the Jordan-Kinderlehrer-Otto (JKO)scheme [18], and we will refer the FPK operator with suchassumptions on f and g to be in “JKO canonical form”.Similar gradient descent schemes have been derived for manyother PDEs; see e.g., [19] for a recent survey.

To motivate gradient descent in infinite dimensionalspaces, we appeal to a more familiar setting, i.e., gradientdescent in Rn associated with the flow

dx

dt= �r' (x) x(0) = x0, (4)

where x, x0 2 Rn and ' : Rn ! R�0, and is continuouslydifferentiable. The Euler discretization for (4) is given by

xk � xk�1 = �hr'(xk�1), (5)

which can be rewritten as a variational recursion

xk = arg minx

1

2k x� xk�1 k2 +h '(x) + o(h). (6)

In the optimization literature, the mapping xk�1 7! xk,given by

proxk·kh'(xk�1) := arg min

x

1

2k x� xk�1 k2 +h '(x), (7)

is called the “proximal operator” [20, p. 142]. The sequence{xk} generated by the proximal recursion

xk = proxk·kh'(xk�1), k = 0, 1, 2, . . . (8)

converges to the flow of the ODE (4), i.e., the sequencesatisfies xk ! x(t = kh) as the step-size h # 0. Using thefinite dimensional viewpoint (7), we define

proxd2

h�(%k�1) := arg inf%2D2

1

2d2 (%, %k�1) + h �(%), (9)

as an infinite dimensional proximal operator. As mentionedabove, the sequence {%k} generated by the proximal re-cursion (3) converges to the flow of the PDE (4), i.e., thesequence satisfies %k(x) ! ⇢(x, t = kh) as the step-sizeh # 0. We also note that in the finite dimensional case,

d

dt' = hr',�r'i = � k r' k2< 0 (10)

which implies ' decays along the flow of (4). As we will seenext, the appeal of using (3) to solve the FPK PDE comesfrom the fact that the Euclidean gradient descent can begeneralized to the manifold D2 by appropriately choosingthe metric d(·, ·) and the functional �(·) in (3), in parallelwith the quantities k · k and '(·) in (8), respectively.

Fig. 1: The JKO scheme can be described by successive evaluationof proximal operators to recursively update PDFs from time t =(k � 1)h to t = kh for k = 1, 2, . . ., and time-step h > 0.

In this paper, we will develop an algorithm to solve theFPK PDE via proximal recursion of the form (3) withoutmaking any spatial discretization. A schematic is shown inFig. 1. The resulting recursion is proved to be contractive andenjoy fast numerical implementation. Numerical simulationresults show the efficacy of the proposed formulation.

II. PRELIMINARIES

In the following, we provide the definitions of theKullback-Leibler divergence, and the 2-Waserstein metric,which will be useful in the sequel. We also point out somenotations used throughout this paper.

Definition 1: The Kullback-Leibler divergence betweentwo probability measures d⇡i(x) = ⇢i(x)dx, i = {1, 2},is given by

DKL (d⇡1 k d⇡2) :=

Z⇢1(x) log

⇢1(x)

⇢2(x)dx, (11)

which is non-negative, and vanishes if and only if ⇢1 = ⇢2.However, (11) is not a metric since it is neither symmetric,nor does it satisfy the triangle inequality.

Definition 2: The 2-Wasserstein metric between two prob-ability measures d⇡1(x) = ⇢1(x)dx and d⇡2(y) = ⇢2(y)dysupported respectively on X , Y ✓ Rn, is denoted asW (⇡1,⇡2) (equivalently, W (⇢1, ⇢2) whenever ⇡1,⇡2 areabsolutely continuous so that the PDFs ⇢1, ⇢2 exist, andarises in the theory of optimal mass transport [16]; it isdefined as

W (⇡1,⇡2) :=✓

infd⇡2⇧(⇡1,⇡2)

Z

X⇥Yk x� y k22 d⇡ (x, y)

◆ 12

, (12)

where ⇧ (⇡1,⇡2) denotes the collection of all probabilitymeasures on the product space X ⇥ Y having finite secondmoments, with marginals ⇡1 and ⇡2, respectively. Its square,W 2(⇡1,⇡2) equals [21] the minimum amount of work re-quired to transport ⇡1 to ⇡2 (or equivalently, ⇢1 to ⇢2). It iswell-known [16, Ch. 7] that W (⇡1,⇡2) defines a metric onthe manifold D2.

Notations: Throughout the paper, we will use bold-facedcapital letters for matrices and bold-faced lower-case lettersfor column vectors. We use the symbol h·, ·i to denote the Eu-clidean inner product. In particular, hA, Bi := trace(A>B)

D2. Then, the idea is to design the metric d(·, ·) and thefunctional �(·) in (3) such that %k(x) ! ⇢(x, t = kh) ash # 0, i.e., in the small time-step limit, the solution of thevariational recursion (3) converges (in strong L1 sense) tothat of (1). The main result in [17] was to show that for FPKoperators of the form (2) with f being a gradient vector fieldand g being a scalar multiple of identity matrix, the distanced(·, ·) can be taken as the Wasserstein-2 metric with �(·) asthe free energy functional. We will make these ideas precisein Section II and III. The resulting variational recursion (3)has since been known as the Jordan-Kinderlehrer-Otto (JKO)scheme [18], and we will refer the FPK operator with suchassumptions on f and g to be in “JKO canonical form”.Similar gradient descent schemes have been derived for manyother PDEs; see e.g., [19] for a recent survey.

To motivate gradient descent in infinite dimensionalspaces, we appeal to a more familiar setting, i.e., gradientdescent in Rn associated with the flow

dx

dt= �r' (x) x(0) = x0, (4)

where x, x0 2 Rn and ' : Rn ! R�0, and is continuouslydifferentiable. The Euler discretization for (4) is given by

xk � xk�1 = �hr'(xk�1), (5)

which can be rewritten as a variational recursion

xk = arg minx

1

2k x� xk�1 k2 +h '(x) + o(h). (6)

In the optimization literature, the mapping xk�1 7! xk,given by

proxk·kh'(xk�1) := arg min

x

1

2k x� xk�1 k2 +h '(x), (7)

is called the “proximal operator” [20, p. 142]. The sequence{xk} generated by the proximal recursion

xk = proxk·kh'(xk�1), k = 0, 1, 2, . . . (8)

converges to the flow of the ODE (4), i.e., the sequencesatisfies xk ! x(t = kh) as the step-size h # 0. Using thefinite dimensional viewpoint (7), we define

proxd2

h�(%k�1) := arg inf%2D2

1

2d2 (%, %k�1) + h �(%), (9)

as an infinite dimensional proximal operator. As mentionedabove, the sequence {%k} generated by the proximal re-cursion (3) converges to the flow of the PDE (4), i.e., thesequence satisfies %k(x) ! ⇢(x, t = kh) as the step-sizeh # 0. We also note that in the finite dimensional case,

d

dt' = hr',�r'i = � k r' k2< 0 (10)

which implies ' decays along the flow of (4). As we will seenext, the appeal of using (3) to solve the FPK PDE comesfrom the fact that the Euclidean gradient descent can begeneralized to the manifold D2 by appropriately choosingthe metric d(·, ·) and the functional �(·) in (3), in parallelwith the quantities k · k and '(·) in (8), respectively.

Fig. 1: The JKO scheme can be described by successive evaluation of proximal operators to recursively update PDFs from time $t = (k-1)h$ to $t = kh$ for $k = 1, 2, \ldots$, and time-step $h > 0$.

In this paper, we will develop an algorithm to solve the FPK PDE via proximal recursion of the form (3) without making any spatial discretization. A schematic is shown in Fig. 1. The resulting recursion is proved to be contractive and enjoys a fast numerical implementation. Numerical simulation results show the efficacy of the proposed formulation.

II. PRELIMINARIES

In the following, we provide the definitions of the Kullback-Leibler divergence and the 2-Wasserstein metric, which will be useful in the sequel. We also introduce some notation used throughout this paper.

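Definition 1: The Kullback-Leibler divergence between two probability measures $\mathrm{d}\pi_i(x) = \rho_i(x)\,\mathrm{d}x$, $i \in \{1, 2\}$, is given by

$$D_{\mathrm{KL}}(\mathrm{d}\pi_1 \,\|\, \mathrm{d}\pi_2) := \int \rho_1(x) \log\frac{\rho_1(x)}{\rho_2(x)}\,\mathrm{d}x, \tag{11}$$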

which is non-negative, and vanishes if and only if $\rho_1 = \rho_2$ (almost everywhere). However, (11) is not a metric since it is neither symmetric, nor does it satisfy the triangle inequality.
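For probability vectors in place of densities, (11) becomes a finite sum, which makes the properties just stated easy to check. The Python sketch below is illustrative only; the function name and the three example distributions are our own choices.

```python
import math
from typing import Sequence


def kl_divergence(p: Sequence[float], q: Sequence[float]) -> float:
    """Discrete analogue of (11): sum_i p_i * log(p_i / q_i) for probability vectors.

    Terms with p_i = 0 contribute 0 (convention 0 * log 0 = 0); if some q_i = 0
    while p_i > 0, the divergence is +inf.
    """
    if len(p) != len(q):
        raise ValueError("p and q must have the same length")
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:
            if qi <= 0.0:
                return math.inf
            total += pi * math.log(pi / qi)
    return total


if __name__ == "__main__":
    a, b, c = [0.9, 0.1], [0.5, 0.5], [0.1, 0.9]
    print(kl_divergence(a, a))                        # zero when the arguments coincide
    print(kl_divergence(a, b), kl_divergence(b, a))   # not symmetric
    # Triangle inequality fails here: D(a||c) exceeds D(a||b) + D(b||c).
    print(kl_divergence(a, c), kl_divergence(a, b) + kl_divergence(b, c))
```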

Definition 2: The 2-Wasserstein metric between two probability measures $\mathrm{d}\pi_1(x) = \rho_1(x)\,\mathrm{d}x$ and $\mathrm{d}\pi_2(y) = \rho_2(y)\,\mathrm{d}y$, supported respectively on $\mathcal{X}, \mathcal{Y} \subseteq \mathbb{R}^n$, is denoted as $W(\pi_1, \pi_2)$ (equivalently, $W(\rho_1, \rho_2)$ whenever $\pi_1, \pi_2$ are absolutely continuous so that the PDFs $\rho_1, \rho_2$ exist). It arises in the theory of optimal mass transport [16], and is defined as

$$W(\pi_1, \pi_2) := \left(\inf_{\mathrm{d}\pi \in \Pi(\pi_1, \pi_2)} \int_{\mathcal{X}\times\mathcal{Y}} \|x - y\|_2^2 \,\mathrm{d}\pi(x, y)\right)^{\frac{1}{2}}, \tag{12}$$

where $\Pi(\pi_1, \pi_2)$ denotes the collection of all probability measures on the product space $\mathcal{X}\times\mathcal{Y}$ having finite second moments, with marginals $\pi_1$ and $\pi_2$, respectively. Its square, $W^2(\pi_1, \pi_2)$, equals [21] the minimum amount of work required to transport $\pi_1$ to $\pi_2$ (or equivalently, $\rho_1$ to $\rho_2$). It is well-known [16, Ch. 7] that $W(\pi_1, \pi_2)$ defines a metric on the manifold $\mathcal{D}_2$.
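On the real line, Definition 2 becomes very concrete: for two empirical measures with the same number of equally weighted atoms, an optimal coupling in (12) simply pairs the sorted samples, and for two Gaussians $\mathcal{N}(m_1, s_1^2)$, $\mathcal{N}(m_2, s_2^2)$ one has $W^2 = (m_1 - m_2)^2 + (s_1 - s_2)^2$. The Python sketch below uses these standard one-dimensional facts to sanity-check a sample-based estimate; it only illustrates (12) and is not part of the proposed algorithm. The function names, sample size, seed and distributions are arbitrary choices.

```python
import math
import random
from typing import Sequence, Tuple


def w2_empirical_1d(xs: Sequence[float], ys: Sequence[float]) -> float:
    """W in (12) between two uniform empirical measures on R with equally many atoms.

    On the real line an optimal coupling pairs the sorted samples (monotone
    rearrangement), so the infimum in (12) needs no numerical optimization.
    """
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("need two non-empty samples of equal size")
    pairs = zip(sorted(xs), sorted(ys))
    return math.sqrt(sum((x - y) ** 2 for x, y in pairs) / len(xs))


def w2_gaussian_1d(m1: float, s1: float, m2: float, s2: float) -> float:
    """Closed-form W between N(m1, s1**2) and N(m2, s2**2) on R."""
    return math.sqrt((m1 - m2) ** 2 + (s1 - s2) ** 2)


def compare(n: int = 20000, seed: int = 0) -> Tuple[float, float]:
    """Empirical estimate vs. closed form for N(0, 1) and N(3, 2**2)."""
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n)]
    ys = [rng.gauss(3.0, 2.0) for _ in range(n)]
    return w2_empirical_1d(xs, ys), w2_gaussian_1d(0.0, 1.0, 3.0, 2.0)


if __name__ == "__main__":
    empirical, exact = compare()
    print(f"empirical W = {empirical:.4f}, closed form W = {exact:.4f}")
```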

Notations: Throughout the paper, we will use bold-faced capital letters for matrices and bold-faced lower-case letters for column vectors. We use the symbol $\langle\cdot,\cdot\rangle$ to denote the Euclidean inner product. In particular, $\langle A, B\rangle := \operatorname{trace}(A^\top B)$

Fig. 2: Schematic of the proposed algorithmic setup for propagating the joint state PDF as a probability weighted scattered point cloud $\{x_k^i, \varrho_k^i\}_{i=1}^N$. The locations of the points $\{x_k^i\}_{i=1}^N$ are updated via the Euler-Maruyama scheme; the corresponding probability weights are updated via Algorithm 1.

2) Overall scheme: Samples from the known initial joint PDF $\rho_0$ are generated as a point cloud $\{x_0^i, \varrho_0^i\}_{i=1}^N$. Then for $k = 1, 2, \ldots$, the point clouds $\{x_k^i, \varrho_k^i\}_{i=1}^N$ are updated as shown in Fig. 2. Specifically, the state vectors are updated via the Euler-Maruyama scheme applied to the underlying SDE; the corresponding probability weights are updated via Algorithm 1. Notice that computing $C_k$ requires both $\{x_{k-1}^i\}_{i=1}^N$ and $\{x_k^i\}_{i=1}^N$, and that $C_k$ needs to be passed as input to Algorithm 1. Thus, the execution of the Euler-Maruyama scheme precedes that of Algorithm 1.
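A minimal Python sketch of the location update in Fig. 2 is given below. For illustration, it assumes that the SDE is in JKO canonical form with diffusion coefficient $\sqrt{2\beta^{-1}}$, as in (39), and it uses the squared Euclidean distances $\|x_k^i - x_{k-1}^j\|^2$ as a hypothetical stand-in for $C_k$ (the actual definition is (27)). The weight update of Algorithm 1 is not reproduced, and all function names are ours.

```python
import math
import random
from typing import Callable, List, Sequence

Vector = List[float]


def euler_maruyama_step(points: Sequence[Vector], drift: Callable[[Vector], Vector],
                        beta: float, h: float, rng: random.Random) -> List[Vector]:
    """Move every point of the cloud by one Euler-Maruyama step of
    dx = drift(x) dt + sqrt(2/beta) dw (assumed diffusion, as in (39))."""
    sigma = math.sqrt(2.0 * h / beta)  # std of sqrt(2/beta) * (w(t+h) - w(t))
    return [[xi + fi * h + sigma * rng.gauss(0.0, 1.0) for xi, fi in zip(x, drift(x))]
            for x in points]


def squared_distance_matrix(new_points: Sequence[Vector],
                            old_points: Sequence[Vector]) -> List[List[float]]:
    """Hypothetical stand-in for C_k: C[i][j] = ||new_points[i] - old_points[j]||^2."""
    return [[sum((a - b) ** 2 for a, b in zip(xi, xj)) for xj in old_points]
            for xi in new_points]


if __name__ == "__main__":
    rng = random.Random(0)
    x_prev = [[rng.gauss(5.0, 0.2)] for _ in range(5)]   # 1D point cloud drawn from N(5, 0.04)
    # Location update first (here for the drift -x of (39) with a = 1, beta = 1) ...
    x_next = euler_maruyama_step(x_prev, lambda x: [-v for v in x], beta=1.0, h=1e-3, rng=rng)
    # ... then the matrix the weight update needs, which depends on both clouds.
    C = squared_distance_matrix(x_next, x_prev)
    print(len(C), "x", len(C[0]))
```

The ordering in the usage example mirrors the text: the matrix can only be formed once both $\{x_{k-1}^i\}$ and $\{x_k^i\}$ are available.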

C. Convergence

Definition 3 and Proposition 1 below will be useful in proving Theorem 2, which establishes the convergence of Algorithm 1.

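Definition 3: (Thompson metric) Consider $z, \bar{z} \in \mathcal{K}$, where $\mathcal{K}$ is a non-empty open convex cone. Further, suppose that $\mathcal{K}$ is a normal cone, i.e., there exists a constant $\alpha$ such that $\|z\| \leq \alpha\|\bar{z}\|$ for $z \leq \bar{z}$. Thompson [25] proved that $\mathcal{K}$ is a complete metric space w.r.t. the so-called Thompson metric given by

$$d_T(z, \bar{z}) := \max\{\log\gamma(z/\bar{z}), \log\gamma(\bar{z}/z)\},$$

where $\gamma(z/\bar{z}) := \inf\{c > 0 \mid z \leq c\bar{z}\}$. In particular, if $\mathcal{K} \equiv \mathbb{R}^n_{>0}$ (the positive orthant of $\mathbb{R}^n$), then

$$d_T(z, \bar{z}) = \log\max\left\{\max_{i=1,\ldots,n}\left(\frac{z_i}{\bar{z}_i}\right), \max_{i=1,\ldots,n}\left(\frac{\bar{z}_i}{z_i}\right)\right\}. \tag{34}$$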
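On the positive orthant, (34) is a one-line computation. The Python sketch below (function name ours) evaluates it and checks two facts that are used in the proof of Theorem 2: entrywise scaling by a fixed positive vector and entrywise inversion are isometries w.r.t. $d_T$.

```python
import math
from typing import Sequence


def thompson_metric(z: Sequence[float], zbar: Sequence[float]) -> float:
    """d_T(z, zbar) from (34) for entrywise positive vectors of equal length."""
    if len(z) != len(zbar) or len(z) == 0:
        raise ValueError("need two non-empty vectors of equal length")
    if min(z) <= 0.0 or min(zbar) <= 0.0:
        raise ValueError("(34) is defined on the positive orthant only")
    ratios = [zi / zbi for zi, zbi in zip(z, zbar)]
    return math.log(max(max(ratios), max(1.0 / r for r in ratios)))


if __name__ == "__main__":
    z, zbar = [1.0, 2.0, 3.0], [2.0, 2.0, 1.0]
    eta = [0.1, 5.0, 7.0]
    print(thompson_metric(z, zbar))  # log 3
    # Entrywise scaling by a fixed positive vector and entrywise inversion
    # leave d_T unchanged (both are isometries of the positive orthant).
    print(thompson_metric([e * v for e, v in zip(eta, z)], [e * v for e, v in zip(eta, zbar)]))
    print(thompson_metric([1.0 / v for v in z], [1.0 / v for v in zbar]))
```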

Proposition 1: [26, Proposition 3.2], [27] Let $\mathcal{K}$ be an open, normal, convex cone and let $\phi : \mathcal{K} \mapsto \mathcal{K}$ be an order preserving homogeneous map of degree $r \geq 0$. Then, for all $z, \bar{z} \in \mathcal{K}$, we have

$$d_T(\phi(z), \phi(\bar{z})) \leq r\, d_T(z, \bar{z}).$$

In particular, if $r \in [0, 1)$, then the map $\phi(\cdot)$ is strictly contractive in the Thompson metric $d_T$, and admits a unique fixed point in $\mathcal{K}$.
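Proposition 1 can be probed numerically. In the Python sketch below (names and random test vectors are ours), the entrywise power map $z \mapsto z^r$, which is order preserving and homogeneous of degree $r$, shrinks $d_T$ by exactly the factor $r$, while multiplication by an entrywise positive matrix (order preserving, homogeneous of degree 1) never increases $d_T$. These are the two kinds of maps that appear in the proof of Theorem 2.

```python
import math
import random
from typing import List, Sequence, Tuple


def thompson_metric(z: Sequence[float], zbar: Sequence[float]) -> float:
    """d_T from (34) on the positive orthant."""
    ratios = [a / b for a, b in zip(z, zbar)]
    return math.log(max(max(ratios), max(1.0 / r for r in ratios)))


def power_map(z: Sequence[float], r: float) -> List[float]:
    """Entrywise z**r: order preserving and homogeneous of degree r."""
    return [v ** r for v in z]


def positive_linear_map(G: Sequence[Sequence[float]], z: Sequence[float]) -> List[float]:
    """z -> G z with entrywise positive G: order preserving, homogeneous of degree 1."""
    return [sum(g * v for g, v in zip(row, z)) for row in G]


def worst_ratios(r: float = 0.5, n: int = 4, trials: int = 1000,
                 seed: int = 0) -> Tuple[float, float]:
    """Largest observed d_T(phi(z), phi(zbar)) / d_T(z, zbar) for the two maps."""
    rng = random.Random(seed)
    G = [[rng.uniform(0.1, 1.0) for _ in range(n)] for _ in range(n)]
    worst_pow, worst_lin = 0.0, 0.0
    for _ in range(trials):
        z = [rng.uniform(0.01, 10.0) for _ in range(n)]
        zbar = [rng.uniform(0.01, 10.0) for _ in range(n)]
        d = thompson_metric(z, zbar)
        if d == 0.0:
            continue
        worst_pow = max(worst_pow, thompson_metric(power_map(z, r), power_map(zbar, r)) / d)
        worst_lin = max(worst_lin, thompson_metric(positive_linear_map(G, z),
                                                   positive_linear_map(G, zbar)) / d)
    return worst_pow, worst_lin


if __name__ == "__main__":
    pow_ratio, lin_ratio = worst_ratios()
    print(f"power map (r = 0.5): {pow_ratio:.6f}; positive linear map: {lin_ratio:.6f}")
```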

Using (34) and Proposition 1, we establish the convergence result below.

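Theorem 2: Consider the notations in (26)-(27), and those in Algorithm 1. The iteration

$$\begin{aligned} z(:, \ell+1) &= \left(\xi_{k-1} \oslash \left(\Gamma_k^\top y(:, \ell)\right)\right)^{\frac{1}{1+\beta\varepsilon/h}} \\ &= \left(\xi_{k-1} \oslash \left(\Gamma_k^\top \left(\varrho_{k-1} \oslash \left(\Gamma_k z(:, \ell)\right)\right)\right)\right)^{\frac{1}{1+\beta\varepsilon/h}} \end{aligned} \tag{35}$$

for $\ell = 1, 2, \ldots$, is strictly contractive in the Thompson metric (34) on $\mathbb{R}^n_{>0}$, and admits a unique fixed point $z_{\mathrm{opt}} \in \mathbb{R}^n_{>0}$.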

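Proof: Let $r := 1/(1 + \beta\varepsilon/h)$. Iteration (35) can be expressed as a cone preserving composite map $\theta := \theta_1 \circ \theta_2 \circ \theta_3 \circ \theta_4 \circ \theta_5$, where $\theta : \mathbb{R}^n_{>0} \mapsto \mathbb{R}^n_{>0}$ is given by

$$z(:, \ell+1) = \theta(z(:, \ell)) = \theta_1 \circ \theta_2 \circ \theta_3 \circ \theta_4 \circ \theta_5 (z(:, \ell)), \tag{36}$$

and $\theta_1(z) := z^{r}$ (element-wise exponentiation), $\theta_2(z) := \xi_{k-1} \oslash z$, $\theta_3(z) := \Gamma_k^\top z$, $\theta_4(z) := \varrho_{k-1} \oslash z$, $\theta_5(z) := \Gamma_k z$. Our strategy is to prove that the composite map $\theta$ is contractive on $\mathbb{R}^n_{>0}$ w.r.t. the metric $d_T$.

From (27), notice that since $C_k(i, j) \in [0, \infty)$, we have $\Gamma_k(i, j) \in (0, 1]$; therefore, $\Gamma_k$ and $\Gamma_k^\top$ are positive linear maps for each $k = 1, 2, \ldots$. Such maps are order preserving and homogeneous of degree 1, so by (linear) Perron-Frobenius theory (equivalently, Proposition 1 with $r = 1$), the maps $\theta_3$ and $\theta_5$ are contractive on $\mathbb{R}^n_{>0}$ w.r.t. $d_T$, in the sense that $d_T(\theta_j(z), \theta_j(\bar{z})) \leq d_T(z, \bar{z})$ for $j \in \{3, 5\}$. The maps $\theta_2$ and $\theta_4$ consist of element-wise inversion followed by element-wise scaling with a fixed positive vector; by (34), both operations are isometries on $\mathbb{R}^n_{>0}$ w.r.t. $d_T$. As for the map $\theta_1$, notice that $r \in (0, 1)$ since $\beta\varepsilon/h > 0$. Therefore, $\theta_1$ is monotone (order preserving) and homogeneous of degree $r \in (0, 1)$ on $\mathbb{R}^n_{>0}$, and by Proposition 1 it is strictly contractive. Thus, the composition

$$\theta = \underbrace{\theta_1}_{\text{strictly contractive}} \circ \underbrace{\theta_2}_{\text{isometry}} \circ \underbrace{\theta_3}_{\text{contractive}} \circ \underbrace{\theta_4}_{\text{isometry}} \circ \underbrace{\theta_5}_{\text{contractive}}$$

satisfies $d_T(\theta(z), \theta(\bar{z})) \leq r\, d_T(z, \bar{z})$, i.e., it is strictly contractive w.r.t. $d_T$, and (by the Banach contraction mapping theorem, since $\mathbb{R}^n_{>0}$ is complete w.r.t. $d_T$ [25]) admits a unique fixed point $z_{\mathrm{opt}}$ in $\mathbb{R}^n_{>0}$.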
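Theorem 2 can also be observed numerically. The Python sketch below iterates (35) on synthetic data and reports the Thompson distance between successive iterates, which by the theorem must shrink at least by the factor $r = 1/(1 + \beta\varepsilon/h)$ per step. Since the exact definitions in (26)-(27) are not needed for the contraction argument, the inputs are hypothetical stand-ins: one-dimensional points, $C_k(i, j)$ taken as squared distances, $\Gamma_k = \exp(-C_k/(2\varepsilon))$ entrywise (so that $\Gamma_k(i, j) \in (0, 1]$, as used in the proof), and arbitrary positive vectors for $\varrho_{k-1}$ and $\xi_{k-1}$. The parameters $\beta = 1$, $\varepsilon = 5\times 10^{-2}$, $h = 10^{-3}$ are those of Section V; the function names are ours.

```python
import math
import random
from typing import List, Sequence, Tuple


def thompson_metric(z: Sequence[float], zbar: Sequence[float]) -> float:
    """d_T from (34) on the positive orthant."""
    ratios = [a / b for a, b in zip(z, zbar)]
    return math.log(max(max(ratios), max(1.0 / r for r in ratios)))


def theta(z: Sequence[float], Gamma: Sequence[Sequence[float]], rho_prev: Sequence[float],
          xi_prev: Sequence[float], r: float) -> List[float]:
    """One step of (35): z -> (xi ⊘ (Gamma^T (rho ⊘ (Gamma z))))**r, entrywise."""
    n = len(z)
    gz = [sum(Gamma[i][j] * z[j] for j in range(n)) for i in range(n)]      # Gamma z
    y = [rho_prev[i] / gz[i] for i in range(n)]                               # y = rho ⊘ (Gamma z)
    gty = [sum(Gamma[i][j] * y[i] for i in range(n)) for j in range(n)]       # Gamma^T y
    return [(xi_prev[j] / gty[j]) ** r for j in range(n)]


def run_iteration(n: int = 20, beta: float = 1.0, eps: float = 5e-2, h: float = 1e-3,
                  num_iters: int = 5, seed: int = 0) -> Tuple[float, List[float]]:
    """Iterate (35) on synthetic data; return r and d_T between successive iterates."""
    rng = random.Random(seed)
    x_old = [rng.random() for _ in range(n)]
    x_new = [x + 0.05 * rng.gauss(0.0, 1.0) for x in x_old]
    # Hypothetical stand-ins for the quantities defined in (26)-(27):
    C = [[(x_new[i] - x_old[j]) ** 2 for j in range(n)] for i in range(n)]
    Gamma = [[math.exp(-c / (2.0 * eps)) for c in row] for row in C]   # entries in (0, 1]
    rho_prev = [rng.uniform(0.5, 1.5) for _ in range(n)]
    xi_prev = [rng.uniform(0.5, 1.5) for _ in range(n)]
    r = 1.0 / (1.0 + beta * eps / h)
    iterates = [[1.0] * n]
    for _ in range(num_iters):
        iterates.append(theta(iterates[-1], Gamma, rho_prev, xi_prev, r))
    dists = [thompson_metric(a, b) for a, b in zip(iterates[1:], iterates[:-1])]
    return r, dists


if __name__ == "__main__":
    r, dists = run_iteration()
    print(f"r = {r:.4f}")
    for ell, d in enumerate(dists, start=1):
        print(f"d_T(z_{ell}, z_{ell - 1}) = {d:.3e}")
```

With these parameters $r = 1/51$, so a handful of sub-iterations already drives the successive distances far below the tolerance $\delta = 10^{-3}$ used in Section V.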

Corollary 3: Algorithm 1 converges to a unique fixed point $(y_{\mathrm{opt}}, z_{\mathrm{opt}}) \in \mathbb{R}^n_{>0} \times \mathbb{R}^n_{>0}$.

Proof: Since $y(:, \ell+1) = \varrho_{k-1} \oslash (\Gamma_k z(:, \ell+1))$, the $z$ iterates converge to the unique fixed point $z_{\mathrm{opt}} \in \mathbb{R}^n_{>0}$ (by Theorem 2), and the linear maps $\Gamma_k$ are contractive (by Perron-Frobenius theory, as before), the $y$ iterates also converge to a unique fixed point $y_{\mathrm{opt}} \in \mathbb{R}^n_{>0}$. Hence the statement.

V. NUMERICAL SIMULATION

In this section, we apply the algorithmic setup proposed in Section IV.B to a few examples illustrating the numerical approach. Our examples involve systems which are already in JKO canonical form (Section III), as well as those which can be transformed to such form by a non-obvious change of coordinates.


Fig. 3: Comparison of the analytical and proximal solutions of the FPK PDE for (39) with time step $h = 10^{-3}$, and with parameters $a = 1$, $\beta = 1$, $\varepsilon = 5\times 10^{-2}$. Shown above are the time evolution of the (left) PDFs, (middle) means, and (right) variances.

A. Linear Gaussian System

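For an Itô SDE of the form

$$\mathrm{d}x = Ax\,\mathrm{d}t + B\,\mathrm{d}w, \tag{37}$$

it is well known that if $x_0 := x(t = 0) \sim \mathcal{N}(\mu_0, \Sigma_0)$, then the transient joint PDFs are $\rho(x, t) = \mathcal{N}(\mu(t), \Sigma(t))$, where the vector-matrix pair $(\mu(t), \Sigma(t))$ evolves according to the ODEs

$$\dot{\mu}(t) = A\mu(t), \quad \mu(0) = \mu_0, \tag{38a}$$

$$\dot{\Sigma}(t) = A\Sigma(t) + \Sigma(t)A^\top + BB^\top, \quad \Sigma(0) = \Sigma_0. \tag{38b}$$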
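Because (38a)-(38b) are ordinary differential equations, the analytical benchmark can be produced with any ODE integrator. The Python sketch below (names ours) uses forward Euler with the step $h = 10^{-3}$ used in Section V and, as an example, the pair $(A, B)$ of Section V.A.2. By $t = 4$ the mean has essentially decayed to zero and $\Sigma(t)$ has settled at the stationary covariance, i.e., $A\Sigma + \Sigma A^\top + BB^\top \approx 0$.

```python
from typing import List, Sequence, Tuple

Matrix = List[List[float]]


def matmul(X: Sequence[Sequence[float]], Y: Sequence[Sequence[float]]) -> Matrix:
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X: Sequence[Sequence[float]]) -> Matrix:
    return [list(col) for col in zip(*X)]


def propagate_moments(A: Matrix, B: Matrix, mu0: List[float], Sigma0: Matrix,
                      h: float, steps: int) -> Tuple[List[float], Matrix]:
    """Forward-Euler integration of (38a)-(38b) with step h."""
    n = len(mu0)
    At, BBt = transpose(A), matmul(B, transpose(B))
    mu, Sigma = list(mu0), [list(row) for row in Sigma0]
    for _ in range(steps):
        Amu = [sum(A[i][j] * mu[j] for j in range(n)) for i in range(n)]
        AS, SAt = matmul(A, Sigma), matmul(Sigma, At)   # both use the old Sigma
        mu = [mu[i] + h * Amu[i] for i in range(n)]
        Sigma = [[Sigma[i][j] + h * (AS[i][j] + SAt[i][j] + BBt[i][j]) for j in range(n)]
                 for i in range(n)]
    return mu, Sigma


def lyapunov_residual(A: Matrix, B: Matrix, Sigma: Matrix) -> float:
    """Largest entry of |A Sigma + Sigma A^T + B B^T|; zero at a stationary covariance."""
    M1, M2, M3 = matmul(A, Sigma), matmul(Sigma, transpose(A)), matmul(B, transpose(B))
    n = len(A)
    return max(abs(M1[i][j] + M2[i][j] + M3[i][j]) for i in range(n) for j in range(n))


if __name__ == "__main__":
    A = [[-10.0, 5.0], [-30.0, 0.0]]
    B = [[2.0], [2.5]]
    mu, Sigma = propagate_moments(A, B, [4.0, 4.0], [[4.0, 0.0], [0.0, 4.0]], h=1e-3, steps=4000)
    print("mu(4) =", mu)
    print("Sigma(4) =", Sigma, "residual:", lyapunov_residual(A, B, Sigma))
```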

We benchmark the numerical results produced by the proposed proximal algorithm vis-à-vis the above analytical solutions. We consider the following two sub-cases of (37).

1) Ornstein-Uhlenbeck Process: We consider the 1D system

$$\mathrm{d}x = -ax\,\mathrm{d}t + \sqrt{2\beta^{-1}}\,\mathrm{d}w, \quad a, \beta > 0, \tag{39}$$

which is in JKO canonical form with $\psi(x) = \frac{1}{2}ax^2$. We generate $N = 400$ samples from the initial PDF $\rho_0 = \mathcal{N}(\mu_0, \sigma_0^2)$ with $\mu_0 = 5$ and $\sigma_0^2 = 4\times 10^{-2}$, and apply the proposed proximal recursion for (39) with time step $h = 10^{-3}$, and with parameters $a = 1$, $\beta = 1$, $\varepsilon = 5\times 10^{-2}$. For implementing Algorithm 1, we set the tolerance $\delta = 10^{-3}$ and the maximum number of iterations $L = 100$. Fig. 3 shows that the PDF point clouds generated by the proximal recursion match the analytical PDFs $\mathcal{N}\left(\mu_0\exp(-at), \left(\sigma_0^2 - \frac{1}{a\beta}\right)\exp(-2at) + \frac{1}{a\beta}\right)$, and that the mean-variance trajectories (computed from the numerical integration of the point cloud data) match the corresponding analytical solutions.
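The closed-form mean and variance quoted above can be cross-checked independently of the proximal solver. The Python sketch below (names ours) compares them with the sample statistics of a plain Euler-Maruyama ensemble for (39), using the parameters of Fig. 3. The ensemble size, step size and seed are arbitrary choices made to keep the run short; this is a sanity check of the benchmark, not the proposed method.

```python
import math
import random
from typing import Tuple


def ou_analytical(t: float, mu0: float, var0: float, a: float,
                  beta: float) -> Tuple[float, float]:
    """Mean and variance of x(t) for (39) with x(0) ~ N(mu0, var0)."""
    mean = mu0 * math.exp(-a * t)
    var = (var0 - 1.0 / (a * beta)) * math.exp(-2.0 * a * t) + 1.0 / (a * beta)
    return mean, var


def ou_euler_maruyama(t_final: float, h: float, n: int, mu0: float, var0: float,
                      a: float, beta: float, seed: int = 0) -> Tuple[float, float]:
    """Sample mean and (unbiased) sample variance of an Euler-Maruyama ensemble for (39)."""
    rng = random.Random(seed)
    xs = [rng.gauss(mu0, math.sqrt(var0)) for _ in range(n)]
    noise_std = math.sqrt(2.0 * h / beta)   # std of sqrt(2/beta) * (w(t+h) - w(t))
    for _ in range(round(t_final / h)):
        xs = [x - a * x * h + noise_std * rng.gauss(0.0, 1.0) for x in xs]
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / (n - 1)
    return mean, var


if __name__ == "__main__":
    params = dict(mu0=5.0, var0=4e-2, a=1.0, beta=1.0)   # values used for Fig. 3
    print("analytical:", ou_analytical(1.0, **params))
    print("Euler-Maruyama:", ou_euler_maruyama(1.0, 1e-2, 2000, **params))
```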

2) Multivariate LTI: We next consider the multivariate case (37) where the pair $(A, B)$ is assumed to be controllable, and the matrix $A$ is Hurwitz (not necessarily symmetric). Under these assumptions, the stationary PDF is $\mathcal{N}(0, \Sigma_\infty)$, where $\Sigma_\infty$ is the unique stationary solution of (38b), which is guaranteed to be symmetric positive definite. However, it is not apparent whether (37) can be expressed in the form (13), since for non-symmetric $A$, there does not exist a constant symmetric positive definite matrix $\Psi$ such that $Ax = -\nabla_x (x^\top\Psi x)$, i.e., the drift vector field does not admit a natural potential. Thus, implementing the JKO scheme for (37) is non-trivial in general.

In a recent work [28], two successive time-varying coordinate transformations were given which can bring (37) into the form (13), thus making it amenable to the JKO scheme. We apply these changes of coordinates to (37) with

$$A = \begin{pmatrix} -10 & 5 \\ -30 & 0 \end{pmatrix}, \quad B = \begin{pmatrix} 2 \\ 2.5 \end{pmatrix},$$

which satisfy the stated assumptions on $(A, B)$, and implement the proposed proximal recursion in these transformed coordinates with $N = 400$ samples generated from the initial PDF $\rho_0 = \mathcal{N}(\mu_0, \Sigma_0)$, where $\mu_0 = (4, 4)^\top$ and $\Sigma_0 = 4I_2$. As before, we set $\delta = 10^{-3}$, $L = 100$, $h = 10^{-3}$, $\beta = 1$, $\varepsilon = 5\times 10^{-2}$. Once the proximal updates are done, we transform the probability weighted scattered point cloud back to the original state space coordinates via the change-of-measure formula associated with the known coordinate transforms [28, Section III.B]. Fig. 4 shows the resulting point clouds superimposed with the contour plots of the analytical solutions $\mathcal{N}(\mu(t), \Sigma(t))$ given by (38). Figs. 5 and 6 compare the respective mean and covariance evolution. We point out that the change of coordinates in [28] requires implementing the JKO scheme in a time-varying rotating frame (defined via the exponential of a certain time-varying skew-symmetric matrix) that depends on the stationary covariance $\Sigma_\infty$. As a consequence, the stationary covariance resulting from the proximal recursion oscillates about the true stationary value.
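The assumptions stated for this example can be verified exactly. The Python sketch below (helper names ours) uses rational arithmetic and the $2\times 2$ special cases of the standard tests (Hurwitz if and only if trace $< 0$ and determinant $> 0$; controllability if and only if $\det[B, AB] \neq 0$), and solves $A\Sigma_\infty + \Sigma_\infty A^\top + BB^\top = 0$, the stationary form of (38b), for $\Sigma_\infty$. It only checks the setup; it does not implement the coordinate transformations of [28].

```python
from fractions import Fraction as Fr
from typing import List, Sequence

Matrix = List[List[Fr]]


def det3(M: Sequence[Sequence[Fr]]) -> Fr:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (M[0][0] * (M[1][1] * M[2][2] - M[1][2] * M[2][1])
            - M[0][1] * (M[1][0] * M[2][2] - M[1][2] * M[2][0])
            + M[0][2] * (M[1][0] * M[2][1] - M[1][1] * M[2][0]))


def stationary_covariance_2x2(A: Matrix, b: List[Fr]) -> Matrix:
    """Solve A S + S A^T + b b^T = 0 for symmetric S = [[p, q], [q, s]] (Cramer's rule)."""
    (a11, a12), (a21, a22) = A
    M = [[2 * a11, 2 * a12, Fr(0)],        # (1,1) entry of the Lyapunov equation
         [a21, a11 + a22, a12],            # (1,2) entry
         [Fr(0), 2 * a21, 2 * a22]]        # (2,2) entry
    rhs = [-b[0] * b[0], -b[0] * b[1], -b[1] * b[1]]
    D = det3(M)
    if D == 0:
        raise ValueError("no unique stationary covariance")
    sol = []
    for col in range(3):
        Mc = [row[:] for row in M]
        for i in range(3):
            Mc[i][col] = rhs[i]
        sol.append(det3(Mc) / D)
    p, q, s = sol
    return [[p, q], [q, s]]


def is_hurwitz_2x2(A: Matrix) -> bool:
    """A real 2x2 matrix is Hurwitz iff its trace is negative and its determinant positive."""
    return A[0][0] + A[1][1] < 0 and A[0][0] * A[1][1] - A[0][1] * A[1][0] > 0


def is_controllable_2x2(A: Matrix, b: List[Fr]) -> bool:
    """Kalman rank test: the 2x2 matrix [b, A b] must be nonsingular."""
    Ab = [A[0][0] * b[0] + A[0][1] * b[1], A[1][0] * b[0] + A[1][1] * b[1]]
    return b[0] * Ab[1] - b[1] * Ab[0] != 0


A = [[Fr(-10), Fr(5)], [Fr(-30), Fr(0)]]
b = [Fr(2), Fr(5, 2)]

if __name__ == "__main__":
    print("A symmetric:", A[0][1] == A[1][0])
    print("Hurwitz:", is_hurwitz_2x2(A), "controllable:", is_controllable_2x2(A, b))
    S = stationary_covariance_2x2(A, b)
    print("Sigma_inf =", S)
    print("positive definite:", S[0][0] > 0 and S[0][0] * S[1][1] - S[0][1] ** 2 > 0)
```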

B. Nonlinear non-Gaussian System

Next we consider the 2D nonlinear system of the form (13) with $\psi(x_1, x_2) = \frac{1}{4}(1 + x_1^4) + \frac{1}{2}(x_2^2 - x_1^2)$ (see Fig. 7).

[Fig. 4 panels: $t = 0.0, 0.5, 1.0, 2.0, 3.0, 4.0$; axes $x_1$ (horizontal), $x_2$ (vertical); legend: $\rho_{\text{analytical}}$ (contours), $\rho_{\text{proximal}}$ (point cloud)]

Fig. 4: Comparison of the analytical (contour plots) and proximal (weighted scattered point cloud) joint PDFs of the FPK PDE for (37) with time step $h = 10^{-3}$, and with parameters $\beta = 1$, $\varepsilon = 5\times 10^{-2}$. Simulation details are given in Section V.A.2. The color (red = high, blue = low) denotes the joint PDF value obtained via proximal recursion at a point at that time (see colorbar).

[Fig. 5 panels: $\mu_x$ and $\mu_y$ versus $t \in [0, 4]$; legend: analytical, proximal]

Fig. 5: Comparison of the components of the mean vectors from analytical (dashed) and proximal (solid) computation of the joint PDFs for (37) with time step $h = 10^{-3}$, and with parameters $\beta = 1$, $\varepsilon = 5\times 10^{-2}$. Simulation details are given in Section V.A.2.

As mentioned in Section III, the stationary PDF is $\rho_\infty(x) = \kappa\exp(-\beta\psi(x))$, which, for our choice of $\psi$, is bimodal. The transient PDFs have no known analytical solution but can be computed using the proposed proximal recursion. For doing so, we generate $N = 400$ samples from the initial PDF $\rho_0 = \mathcal{N}(\mu_0, \Sigma_0)$ with $\mu_0 = (2, 2)^\top$ and $\Sigma_0 = 4I_2$, and set $\delta = 10^{-3}$, $L = 100$, $h = 10^{-3}$, $\beta = 1$, $\varepsilon = 5\times 10^{-2}$, as before. The resulting weighted point clouds are shown in Fig. 8; it can be seen that as time progresses, the joint PDFs computed via the proximal recursion tend to the known stationary solution $\rho_\infty$ (contour plots in the bottom-right sub-figure of Fig. 8).
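The bimodality of $\rho_\infty$ follows directly from $\psi$: $\nabla\psi = (x_1^3 - x_1, x_2)$ vanishes at $(0, 0)$ and $(\pm 1, 0)$, and $\psi(\pm 1, 0) = 0 < \psi(0, 0) = 1/4$, so $\rho_\infty$ has two equal peaks at $(\pm 1, 0)$ separated by a saddle at the origin. The Python sketch below checks this on a grid; the box $[-3, 3]^2$, the grid size, the Riemann-sum approximation of $\kappa$ and the function names are illustrative choices, and the code does not compute the transient PDFs.

```python
import math
from typing import List, Tuple


def psi(x1: float, x2: float) -> float:
    """Drift potential of Section V.B (Fig. 7)."""
    return 0.25 * (1.0 + x1 ** 4) + 0.5 * (x2 ** 2 - x1 ** 2)


def grad_psi(x1: float, x2: float) -> Tuple[float, float]:
    return (x1 ** 3 - x1, x2)


def stationary_density_on_grid(beta: float = 1.0, lo: float = -3.0, hi: float = 3.0,
                               m: int = 241) -> Tuple[List[float], List[List[float]], float]:
    """kappa * exp(-beta * psi) on an m x m grid; kappa approximated by a Riemann sum."""
    step = (hi - lo) / (m - 1)
    grid = [lo + i * step for i in range(m)]
    vals = [[math.exp(-beta * psi(x1, x2)) for x2 in grid] for x1 in grid]
    mass = sum(map(sum, vals)) * step * step
    return grid, [[v / mass for v in row] for row in vals], step


def grid_argmax(grid: List[float], dens: List[List[float]]) -> Tuple[float, float]:
    """Grid point (x1, x2) with the largest density value."""
    i, j = max(((i, j) for i in range(len(grid)) for j in range(len(grid))),
               key=lambda ij: dens[ij[0]][ij[1]])
    return grid[i], grid[j]


if __name__ == "__main__":
    print("grad psi at (1, 0), (-1, 0), (0, 0):",
          grad_psi(1.0, 0.0), grad_psi(-1.0, 0.0), grad_psi(0.0, 0.0))
    print("psi at a well and at the saddle:", psi(1.0, 0.0), psi(0.0, 0.0))
    grid, dens, step = stationary_density_on_grid()
    print("grid maximizer of rho_inf:", grid_argmax(grid, dens))
```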

Fig. 9 shows the computational times for the proposed proximal recursions applied to the above nonlinear non-Gaussian system. Since the proposed algorithm involves sub-iterations (see the while loop in Algorithm 1) while keeping the physical time "frozen", the convergence reported in Section IV.C must be achieved at the "sub-physical time step" level, i.e., each proximal update must take less than $h$ (here, $h = 10^{-3}$ s) of computational time. Indeed, Fig. 9 shows that each proximal update takes approximately $10^{-6}$ s, or $10^{-3}h$, of computational time, which demonstrates the efficacy of the proposed framework.

[Fig. 6 panels: $P_{11}$, $P_{12}$, $P_{22}$ versus $t \in [0, 4]$; legend: analytical, proximal]

Fig. 6: Comparison of the components of the covariance matrices from analytical (dashed) and proximal (solid) computation of the joint PDFs for (37) with time step $h = 10^{-3}$, and with parameters $\beta = 1$, $\varepsilon = 5\times 10^{-2}$. Simulation details are given in Section V.A.2.

Fig. 7: The drift potential $\psi(x_1, x_2) = \frac{1}{4}(1 + x_1^4) + \frac{1}{2}(x_2^2 - x_1^2)$ used in the example given in Section V.B.

VI. CONCLUSIONS

We proposed a variational recursion to numerically solve the transient Fokker-Planck or Kolmogorov's forward equation by exploiting the underlying infinite-dimensional gradient flow structure in the manifold of PDFs. From a computational standpoint, this work develops a novel point cloud solver for performing the Otto calculus while avoiding spatial discretization or function approximation. From a systems-theoretic standpoint, this work contributes to an emerging research program [28], [29] aimed at uncovering new geometric meanings of the equations of uncertainty propagation and filtering, and at using the same to efficiently solve these equations via proximal algorithms [20].


[Fig. 8 panels: $t = 0.0, 0.5, 1.0, 2.0, 3.0, 4.0$; axes $x_1$ (horizontal), $x_2$ (vertical); legend: $\rho^{\infty}_{\text{analytical}} = \frac{1}{Z}\exp(-\beta\psi(x_1, x_2))$ (contours), $\rho_{\text{proximal}}$ (point cloud)]

Fig. 8: The proximal (weighted scattered point cloud) joint PDFs of the FPK PDE (14) with the drift potential shown in Fig. 7, time step $h = 10^{-3}$, and with parameters $\beta = 1$, $\varepsilon = 5\times 10^{-2}$. Simulation details are given in Section V.B. The color (red = high, blue = low) denotes the joint PDF value obtained via proximal recursion at a point at that time (see colorbar).

[Fig. 9 plot: computational time (seconds, of order $10^{-6}$) versus physical time $t_k = kh$ (seconds)]

Fig. 9: The computational times for proximal updates. Simulation details are given in Section V.B. Here, the physical time-step is $h = 10^{-3}$ s, and $k = 1, 2, \ldots$.

REFERENCES

[1] M. Ehrendorfer, "The Liouville equation and its potential usefulness for the prediction of forecast skill. Part I: Theory," Monthly Weather Review, vol. 122, no. 4, pp. 703–713, 1994.

[2] A. Halder and R. Bhattacharya, "Dispersion analysis in hypersonic flight during planetary entry using stochastic Liouville equation," Journal of Guidance, Control, and Dynamics, vol. 34, no. 2, pp. 459–474, 2011.

[3] S. Hess, "Fokker-Planck-equation approach to flow alignment in liquid crystals," Zeitschrift für Naturforschung A, vol. 31, no. 9, pp. 1034–1037, 1976.

[4] W. Muschik and B. Su, "Mesoscopic interpretation of Fokker-Planck equation describing time behavior of liquid crystal orientation," The Journal of Chemical Physics, vol. 107, no. 2, pp. 580–584, 1997.

[5] Y. P. Kalmykov and W. T. Coffey, "Analytical solutions for rotational diffusion in the mean field potential: application to the theory of dielectric relaxation in nematic liquid crystals," Liquid Crystals, vol. 25, no. 3, pp. 329–339, 1998.

[6] W. Park, J. S. Kim, Y. Zhou, N. J. Cowan, A. M. Okamura, and G. S. Chirikjian, "Diffusion-based motion planning for a nonholonomic flexible needle model," in Robotics and Automation, 2005. ICRA 2005. Proceedings of the 2005 IEEE International Conference on. IEEE, 2005, pp. 4600–4605.

[7] W. Park, Y. Liu, Y. Zhou, M. Moses, and G. S. Chirikjian, "Kinematic state estimation and motion planning for stochastic nonholonomic systems using the exponential map," Robotica, vol. 26, no. 4, pp. 419–434, 2008.

[8] H. Hamann and H. Wörn, "A framework of space–time continuous models for algorithm design in swarm robotics," Swarm Intelligence, vol. 2, no. 2-4, pp. 209–239, 2008.

[9] S. Challa and Y. Bar-Shalom, "Nonlinear filter design using Fokker-Planck-Kolmogorov probability density evolutions," IEEE Transactions on Aerospace and Electronic Systems, vol. 36, no. 1, pp. 309–315, 2000.

[10] F. Daum, "Nonlinear filters: beyond the Kalman filter," IEEE Aerospace and Electronic Systems Magazine, vol. 20, no. 8, pp. 57–69, 2005.

[11] A. Halder and R. Bhattacharya, "Model validation: A probabilistic formulation," in Decision and Control and European Control Conference (CDC-ECC), 2011 50th IEEE Conference on. IEEE, 2011, pp. 1692–1697.

[12] ——, "Further results on probabilistic model validation in Wasserstein metric," in Decision and Control (CDC), 2012 IEEE 51st Annual Conference on. IEEE, 2012, pp. 5542–5547.

[13] ——, "Probabilistic model validation for uncertain nonlinear systems," Automatica, vol. 50, no. 8, pp. 2038–2050, 2014.

[14] P. J. Flory and M. Volkenstein, Statistical Mechanics of Chain Molecules. Wiley, 1969.

[15] R. E. Bellman, Dynamic Programming. Courier Dover Publications, 1957.

[16] C. Villani, Topics in Optimal Transportation. American Mathematical Soc., 2003, no. 58.

[17] R. Jordan, D. Kinderlehrer, and F. Otto, "The variational formulation of the Fokker–Planck equation," SIAM Journal on Mathematical Analysis, vol. 29, no. 1, pp. 1–17, 1998.

[18] L. Ambrosio, N. Gigli, and G. Savaré, Gradient Flows: in Metric Spaces and in the Space of Probability Measures. Springer Science & Business Media, 2008.

[19] F. Santambrogio, "{Euclidean, metric, and Wasserstein} gradient flows: an overview," Bulletin of Mathematical Sciences, vol. 7, no. 1, pp. 87–154, 2017.

[20] N. Parikh, S. Boyd et al., "Proximal algorithms," Foundations and Trends® in Optimization, vol. 1, no. 3, pp. 127–239, 2014.

[21] J.-D. Benamou and Y. Brenier, "A computational fluid mechanics solution to the Monge-Kantorovich mass transfer problem," Numerische Mathematik, vol. 84, no. 3, pp. 375–393, 2000.

[22] J. Karlsson and A. Ringh, "Generalized Sinkhorn iterations for regularizing inverse problems using optimal mass transport," SIAM Journal on Imaging Sciences, vol. 10, no. 4, pp. 1935–1962, 2017.

[23] M. Cuturi, "Sinkhorn distances: Lightspeed computation of optimal transport," in Advances in Neural Information Processing Systems, 2013, pp. 2292–2300.

[24] J.-D. Benamou, G. Carlier, M. Cuturi, L. Nenna, and G. Peyré, "Iterative Bregman projections for regularized transportation problems," SIAM Journal on Scientific Computing, vol. 37, no. 2, pp. A1111–A1138, 2015.

[25] A. C. Thompson, "On certain contraction mappings in a partially ordered vector space," Proceedings of the American Mathematical Society, vol. 14, no. 3, pp. 438–443, 1963.

[26] Y. Lim, "Nonlinear equations based on jointly homogeneous mappings," Linear Algebra and Its Applications, vol. 430, no. 1, pp. 279–285, 2009.

[27] R. D. Nussbaum, Hilbert's Projective Metric and Iterated Nonlinear Maps. Memoirs of the American Mathematical Soc., 1988, vol. 391.

[28] A. Halder and T. T. Georgiou, "Gradient flows in uncertainty propagation and filtering of linear Gaussian systems," in Decision and Control (CDC), 2017 IEEE 56th Annual Conference on. IEEE, 2017, pp. 3081–3088.

[29] ——, "Gradient flows in filtering and Fisher-Rao geometry," in 2018 Annual American Control Conference (ACC). IEEE, 2018, pp. 4281–4286.