Lecture Notes on Optimal Transport Problems ∗

Luigi Ambrosio †

Contents

1 Some elementary examples

2 Optimal transport plans: existence and regularity

3 The one dimensional case

4 The ODE version of the optimal transport problem

5 The PDE version of the optimal transport problem and the p-laplacian approximation

6 Existence of optimal transport maps

7 Regularity and uniqueness of the transport density

8 The Bouchitté–Buttazzo mass optimization problem

9 Appendix: some measure theoretic results

∗ Lectures given in Madeira (PT), Euro Summer School “Mathematical aspects of evolving interfaces”, 2–9 July 2000.

† Scuola Normale Superiore, Pisa; work partially supported by the GNAFA project “Problemi di Monge–Kantorovich e strutture deboli”.

Introduction

These notes are devoted to the Monge–Kantorovich optimal transport problem. This problem, in the original formulation of Monge, can be stated as follows: given two distributions with equal masses of a given material $g_0(x)$, $g_1(x)$ (corresponding for instance to an embankment and an excavation), find a transport map $\psi$ which carries the first distribution into the second and minimizes the transport cost

$$C(\psi) := \int_X |x - \psi(x)|\, g_0(x)\, dx.$$

The condition that the first distribution of mass is carried into the second can be written as

$$\int_{\psi^{-1}(B)} g_0(x)\, dx = \int_B g_1(y)\, dy \qquad \forall B \subset X \text{ Borel} \tag{1}$$

or, by the change of variables formula, as

$$g_1(\psi(x))\, |\det \nabla\psi(x)| = g_0(x) \qquad \text{for } \mathcal{L}^n\text{-a.e. } x \in X$$

if $\psi$ is one to one and sufficiently regular.

More generally one can replace the functions $g_0$, $g_1$ by positive measures $f_0$, $f_1$ with equal mass, so that (1) reads $f_1 = \psi_\# f_0$, and replace the Euclidean distance by a generic cost function $c(x, y)$, studying the problem

$$\min_{\psi_\# f_0 = f_1} \int_X c(x, \psi(x))\, df_0(x). \tag{2}$$

The infimum of the transport problem leads also to a $c$-dependent distance between measures with equal mass, known as Kantorovich–Wasserstein distance.

The optimal transport problem and the Kantorovich–Wasserstein distance have a very broad range of applications: Fluid Mechanics [9], [10]; Partial Differential Equations [32, 29]; Optimization [13], [14], to quote just a few examples. Moreover, the 1-Wasserstein distance (corresponding to $c(x, y) = |x - y|$ in (2)) is related to the so-called flat distance in Geometric Measure Theory, which plays an important role in its development (see [6], [24], [30], [27], [44]). However, rather than showing specific applications (for which we mainly refer to the Evans survey [21] or to the introduction of [9]), the main aim of the notes is to present the different formulations of the optimal transport problem and to compare them, focussing mainly on the linear case $c(x, y) = |x - y|$. The main sources for the preparation of the notes have been the papers by Bouchitté–Buttazzo [13, 14], Caffarelli–Feldman–McCann [15], Evans–Gangbo [22], Gangbo–McCann [26], Sudakov [42] and Evans [21].

The notes are organized as follows. In Section 1 we discuss some basic examples and in Section 2 we discuss Kantorovich’s generalized solutions, i.e. the transport plans, pointing out the connection between them and the transport maps. Section 3 is entirely devoted to the one dimensional case: in this situation the order structure plays an important role and considerably simplifies the theory. Sections 4 and 5 are devoted to the ODE and PDE based formulations of the optimal transport problem (respectively due to Brenier and Evans–Gangbo); we discuss in particular the role of the so-called transport density and the equivalence of its different representations. Namely, we prove that any transport density $\mu$ can be represented as $\int_0^1 \pi_{t\#}(|y - x|\gamma)\, dt$, where $\gamma$ is an optimal planning, as $\int_0^1 |E_t|\, dt$, where $E_t$ is the “velocity field” in the ODE formulation, or as the solution of the PDE $\operatorname{div}(\nabla_\mu u\, \mu) = f_1 - f_0$, with no regularity assumption on $f_1$, $f_0$. Moreover, in the same generality we prove convergence of the p-laplacian approximation.

In Section 6 we discuss the existence of the optimal transport map, following essentially the original Sudakov approach and filling a gap in his original proof (see also [15, 43]). Section 7 deals with recent results, related to those obtained in [25], on the regularity and the uniqueness of the transport density. Section 8 is devoted to the connection between the optimal transport problem and the so-called mass optimization problem. Finally, Section 9 contains a self contained list of the measure theoretic results needed in the development of the theory.

Main notation

$X$    a compact convex subset of a Euclidean space $\mathbb{R}^n$
$\mathcal{B}(X)$    Borel σ-algebra of $X$
$\mathcal{L}^n$    Lebesgue measure in $\mathbb{R}^n$
$\mathcal{H}^k$    Hausdorff $k$-dimensional measure in $\mathbb{R}^n$
$\mathrm{Lip}(X)$    real valued Lipschitz functions defined on $X$
$\mathrm{Lip}_1(X)$    functions in $\mathrm{Lip}(X)$ with Lipschitz constant not greater than 1
$\Sigma_u$    the set of points where $u$ is not differentiable
$\pi_t$    projections $(x, y) \mapsto x + t(y - x)$, $t \in [0, 1]$
$S_o(X)$    open segments $]]x, y[[$ with $x, y \in X$
$S_c(X)$    closed segments $[[x, y]]$ with $x, y \in X$, $x \neq y$
$\mathcal{M}(X)$    signed Radon measures with finite total variation in $X$
$\mathcal{M}_+(X)$    positive and finite Radon measures in $X$
$\mathcal{M}_1(X)$    probability measures in $X$
$|\mu|$    total variation of $\mu \in [\mathcal{M}(X)]^n$
$\mu^+$, $\mu^-$    positive and negative part of $\mu \in \mathcal{M}(X)$
$f_\#\mu$    push forward of $\mu$ by $f$

1 Some elementary examples

In this section we discuss some elementary examples that illustrate the kind of phenomena (non existence, non uniqueness) which can occur. The first one shows that optimal transport maps need not exist if the first measure $f_0$ has atoms.

Example 1.1 (Non existence) Let $f_0 = \delta_0$ and $f_1 = (\delta_1 + \delta_{-1})/2$. In this case the optimal transport problem has no solution simply because there is no map $\psi$ such that $\psi_\# f_0 = f_1$.

The following two examples deal with the case when the cost function $c$ in $X \times X$ is $|x - y|$, i.e. the Euclidean distance between $x$ and $y$. In this case we will use as a test for optimality the fact that the infimum of the transport problem is always greater than or equal to

$$\sup\left\{ \int_X u\, d(f_1 - f_0) : u \in \mathrm{Lip}_1(X) \right\}. \tag{3}$$

Indeed,

$$\int_X u\, d(f_1 - f_0) = \int_X \bigl( u(\psi(x)) - u(x) \bigr)\, df_0(x) \le \int_X |\psi(x) - x|\, df_0(x)$$

for any admissible transport $\psi$. Actually we will prove that this lower bound is sharp if $f_0$ has no atom (see (6) and (13)).

Our second example shows that in general the solution of the optimal transport problem is not unique. In the one-dimensional case we will obtain (see Theorem 3.1) uniqueness (and existence) in the class of nondecreasing maps.

Example 1.2 (Book shifting) Let $n \ge 1$ be an integer and $f_0 = \chi_{[0,n]}\mathcal{L}^1$ and $f_1 = \chi_{[1,n+1]}\mathcal{L}^1$. Then the map $\psi(t) = t + 1$ is optimal. Indeed, the cost relative to $\psi$ is $n$ and, choosing the 1-Lipschitz function $u(t) = t$ in (3), we obtain that the supremum is at least $n$, whence the optimality of $\psi$ follows. But since the minimal cost is $n$, if $n > 1$ another optimal map $\psi$ is given by

$$\psi(t) = \begin{cases} t + n & \text{on } [0, 1] \\ t & \text{on } [1, n]. \end{cases}$$

In the previous example the two transport maps coincide when $n = 1$; however in this case there is one more (and actually infinitely many) optimal transport map.

Example 1.3 Let $f_0 = \chi_{[0,1]}\mathcal{L}^1$ and $f_1 = \chi_{[1,2]}\mathcal{L}^1$ (i.e. $n = 1$ in the previous example). We have already seen that $\psi(t) = t + 1$ is optimal. But in this case the map $\psi(t) = 2 - t$ is optimal as well.
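The costs quoted in Examples 1.2 and 1.3 can be checked numerically. The following is a minimal sketch that is not part of the original notes: the helper `transport_cost`, the midpoint quadrature and the choice $n = 3$ are illustrative assumptions, and the script only evaluates the cost of each candidate map (it does not verify the constraint $\psi_\# f_0 = f_1$).

```python
# Illustrative check of Examples 1.2 and 1.3 (all names are hypothetical).

def transport_cost(psi, a, b, cells):
    """Midpoint-rule approximation of the integral over [a, b] of |t - psi(t)| dt."""
    h = (b - a) / cells
    total = 0.0
    for k in range(cells):
        t = a + (k + 0.5) * h
        total += abs(t - psi(t))
    return total * h


n = 3  # number of "books" in Example 1.2 (assumed value for the demo)


def shift(t):
    # psi(t) = t + 1: every book moves one step to the right
    return t + 1.0


def book_shift(t):
    # only the first book moves, by n steps
    return t + n if t <= 1.0 else t


def flip(t):
    # psi(t) = 2 - t, the extra optimal map of Example 1.3 (case n = 1)
    return 2.0 - t


# The number of cells is a multiple of n, so the jump of book_shift at t = 1
# falls on a cell boundary.
cost_shift = transport_cost(shift, 0.0, float(n), 1000 * n)
cost_book = transport_cost(book_shift, 0.0, float(n), 1000 * n)
cost_shift_1 = transport_cost(shift, 0.0, 1.0, 1000)
cost_flip_1 = transport_cost(flip, 0.0, 1.0, 1000)
print(cost_shift, cost_book, cost_shift_1, cost_flip_1)
```

Under these assumptions the first two costs agree with the value $n$ and the last two with the value 1, up to rounding, consistently with the lower bound obtained from $u(t) = t$ in (3).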

In all the previous examples the optimal transport maps $\psi$ satisfy the condition $\psi(t) \ge t$. However it is easy to find examples where this does not happen.

Example 1.4 Let $f_0 = \chi_{[-1,1]}\mathcal{L}^1$ and $f_1 = \delta_{-1} + \delta_1$. In this case the optimal transport map $\psi$ is unique (modulo $\mathcal{L}^1$-negligible sets); it is identically equal to $-1$ on $[-1, 0)$ and identically equal to $1$ on $(0, 1]$. The verification is left to the reader as an exercise.

We conclude this section with some two dimensional examples.

Example 1.5 Assume that $2f_0$ is the sum of the unit Dirac masses at $(1, 1)$ and $(0, 0)$, and that $2f_1$ is the sum of the unit Dirac masses at $(1, 0)$ and $(0, 1)$. Then the “horizontal” transport and the “vertical” transport are both optimal. Indeed, the cost of these transports is 1 and choosing

$$u(x_1, x_2) = \begin{cases} \sqrt{x_1^2 + x_2^2} & \text{if } x_1 + x_2 \le 1 \\ \sqrt{(1 - x_1)^2 + (1 - x_2)^2} & \text{if } x_1 + x_2 > 1 \end{cases}$$

in (3) we obtain that the infimum of the transport problem is at least 1.
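The arithmetic behind Example 1.5 can be reproduced in a few lines. This is only an illustrative sketch (the variable names are not from the notes): it evaluates the cost of the two transports, the value of the functional in (3) at the potential $u$, and tests the 1-Lipschitz property of $u$ on a small grid only, which is a sanity check rather than a proof.

```python
import math

# Atoms of 2*f0 and 2*f1 in Example 1.5; each atom carries mass 1/2.
sources = [(1.0, 1.0), (0.0, 0.0)]
targets = [(1.0, 0.0), (0.0, 1.0)]


def dist(p, q):
    """Euclidean distance in the plane."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def u(p):
    """The potential of Example 1.5."""
    x1, x2 = p
    if x1 + x2 <= 1.0:
        return math.hypot(x1, x2)
    return math.hypot(1.0 - x1, 1.0 - x2)


# "Horizontal" transport: (1,1) -> (0,1) and (0,0) -> (1,0).
horizontal = 0.5 * (dist(sources[0], targets[1]) + dist(sources[1], targets[0]))
# "Vertical" transport: (1,1) -> (1,0) and (0,0) -> (0,1).
vertical = 0.5 * (dist(sources[0], targets[0]) + dist(sources[1], targets[1]))
# Value of the integral of u against f1 - f0, as in (3).
dual_value = 0.5 * sum(u(q) for q in targets) - 0.5 * sum(u(p) for p in sources)

# Grid test of |u(p) - u(q)| <= |p - q| on a 5 x 5 grid of the unit square.
grid = [(i / 4, j / 4) for i in range(5) for j in range(5)]
lipschitz_on_grid = all(
    abs(u(p) - u(q)) <= dist(p, q) + 1e-12 for p in grid for q in grid
)
print(horizontal, vertical, dual_value, lipschitz_on_grid)
```

Since the primal costs and the dual value coincide, both transports are optimal, which is the argument used in the example.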

Example 1.6 Assume that $f_1$ is the sum of two Dirac masses at $A, B \in \mathbb{R}^2$ and assume that $f_0$ is supported on the middle axis between them. Then

$$\int_X |x - \psi(x)|\, df_0(x) = \int_X |x - A|\, df_0(x)$$

whenever $\psi(x) \in \{A, B\}$, hence any admissible transport is optimal.

2 Optimal transport plans: existence and regularity

In this section we discuss Kantorovich’s approach to the optimal transport problem. His idea has been to look for optimal transport “plans”, i.e. probability measures $\gamma$ in the product space $X \times X$, rather than optimal transport maps. We will see that this more general viewpoint can be used in several situations to prove that actually optimal transport maps exist (this intermediate passage through a weak formulation of the problems is quite common in PDE and Calculus of Variations).

(MK) Let $f_0, f_1 \in \mathcal{M}_1(X)$. We say that a probability measure $\gamma$ in $\mathcal{M}_1(X \times X)$ is admissible if its marginals are $f_0$ and $f_1$, i.e.

$$\pi_{0\#}\gamma = f_0, \qquad \pi_{1\#}\gamma = f_1.$$

Then, given a Borel cost function $c : X \times X \to [0, \infty]$, we minimize

$$I(\gamma) := \int_{X \times X} c(x, y)\, d\gamma(x, y)$$

among all admissible $\gamma$ and we denote by $\mathcal{F}_c(f_0, f_1)$ the value of the infimum.

We also call an admissible $\gamma$ a transport plan. Notice also that in Kantorovich’s setting no restriction on $f_0$ or $f_1$ is necessary to produce admissible transport plans: the product measure $f_0 \times f_1$ is always admissible. In particular the following definition is well posed and produces a family of distances in $\mathcal{M}_1(X)$.

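Definition 2.1 (Kantorovich–Wasserstein distances) Let $p \ge 1$ and $f_0, f_1 \in \mathcal{M}_1(X)$. We define the $p$-Wasserstein distance between $f_0$ and $f_1$ by

$$\mathcal{F}_p(f_0, f_1) := \left( \min_{\pi_{0\#}\gamma = f_0,\ \pi_{1\#}\gamma = f_1} \int_{X \times X} |x - y|^p\, d\gamma \right)^{1/p}. \tag{4}$$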
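For two uniform measures supported on equally many atoms of the real line, formula (4) can be evaluated by brute force. The sketch below is an illustration under stated assumptions, not the method of the notes: it relies on the Birkhoff–von Neumann theorem (the extreme points of the set of doubly stochastic matrices are the permutation matrices), so that in this special case the minimum over plans is attained at a permutation. The function names and the sample atoms are hypothetical.

```python
from itertools import permutations


def wasserstein_uniform(xs, ys, p=1):
    """p-Wasserstein distance (4) between the uniform measures on the atoms
    xs and ys (same number of real atoms), by enumeration of permutations."""
    n = len(xs)
    if n != len(ys):
        raise ValueError("both measures need the same number of atoms")
    best = min(
        sum(abs(x - ys[j]) ** p for x, j in zip(xs, perm))
        for perm in permutations(range(n))
    )
    return (best / n) ** (1.0 / p)


def product_plan_cost(xs, ys, p=1):
    """Cost I(gamma) of the always admissible product plan f0 x f1."""
    n = len(xs)
    return sum(abs(x - y) ** p for x in xs for y in ys) / (n * n)


xs = [0.0, 1.0, 3.0]
ys = [1.0, 2.0, 5.0]
w1 = wasserstein_uniform(xs, ys, p=1)
w2 = wasserstein_uniform(xs, ys, p=2)
upper = product_plan_cost(xs, ys, p=1)
print(w1, w2, upper)
```

The product plan gives an upper bound for the minimum in (4) with $p = 1$; the enumeration is factorial in the number of atoms and is meant only for very small examples.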

The difference between transport maps and transport plans can be better understood with the following proposition.

Proposition 2.1 (Transport plans versus transport maps) Any Borel transport map $\psi : X \to X$ induces a transport plan $\gamma_\psi$ defined by

$$\gamma_\psi := (\mathrm{Id} \times \psi)_\# f_0. \tag{5}$$

Conversely, a transport plan $\gamma$ is induced by a transport map if $\gamma$ is concentrated on a $\gamma$-measurable graph $\Gamma$.

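Proof. Let $\psi$ be a transport map. Since $\pi_0 \circ (\mathrm{Id} \times \psi) = \mathrm{Id}$ and $\pi_1 \circ (\mathrm{Id} \times \psi) = \psi$ we obtain immediately that $\pi_{0\#}\gamma_\psi = f_0$ and $\pi_{1\#}\gamma_\psi = \psi_\# f_0 = f_1$. Notice also that, by Lusin’s theorem, the graph of $\psi$ is $\gamma_\psi$-measurable.

Conversely, let $\Gamma \subset X \times X$ be a $\gamma$-measurable graph on which $\gamma$ is concentrated and write

$$\Gamma = \{ (x, \phi(x)) : x \in \pi_0(\Gamma) \}$$

for some function $\phi : \pi_0(\Gamma) \to X$. Let $(K_h)$ be an increasing sequence of compact subsets of $\Gamma$ such that $\gamma(\Gamma \setminus K_h) \to 0$ and notice that

$$f_0(\pi_0(K_h)) = \gamma\bigl(\pi_0^{-1}(\pi_0(K_h))\bigr) \ge \gamma(K_h) \to 1.$$

Hence, $\pi_0(\Gamma) \supset \cup_h \pi_0(K_h)$ is $f_0$-measurable and with full measure in $X$. Moreover, representing $\gamma$ as $\gamma_x \otimes f_0$ as in (58) we get

$$0 = \lim_{h \to \infty} \gamma(X \times X \setminus K_h) = \int_X \gamma_x\bigl(\{ y : (x, y) \notin \cup_h K_h \}\bigr)\, df_0(x) \ge \int^*_{\pi_0(\Gamma)} \gamma_x\bigl(X \setminus \{\phi(x)\}\bigr)\, df_0(x)$$

(here $\int^*$ denotes the outer integral). Hence $\gamma_x$ is the unit Dirac mass at $\phi(x)$ for $f_0$-a.e. $x \in X$. Since

$$x \mapsto \psi(x) := \int_X y\, d\gamma_x(y)$$

is a Borel map coinciding with $\phi$ $f_0$-a.e., we obtain that $\gamma_x$ is the unit Dirac mass at $\psi(x)$ for $f_0$-a.e. $x$. For $A, B \in \mathcal{B}(X)$ we get

$$\gamma(A \times B) = \int_A \gamma_x(B)\, df_0(x) = f_0\bigl(\{ x : (x, \psi(x)) \in A \times B \}\bigr) = \gamma_\psi(A \times B)$$

and therefore $\gamma = \gamma_\psi$.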

The existence of optimal transport plans is a straightforward consequence of the w∗-compactness of probability measures and of the lower semicontinuity of $I$.

Theorem 2.1 (Existence of optimal plans) Assume that $c$ is lower semicontinuous in $X \times X$. Then there exists $\gamma \in \mathcal{M}_1(X \times X)$ solving (MK). Moreover, if $c$ is continuous and real valued we have

$$\min\,(\mathrm{MK}) = \inf_{\psi_\# f_0 = f_1} \int_X c(x, \psi(x))\, df_0(x) \tag{6}$$

provided $f_0$ has no atom.

Proof. Clearly the set of admissible $\gamma$’s is closed, bounded and w∗-compact for the w∗-convergence of measures (i.e. in the duality with continuous functions in $X \times X$). Hence, it suffices to prove that

$$I(\gamma) \le \liminf_{h \to \infty} I(\gamma_h)$$

whenever $\gamma_h$ w∗-converge to $\gamma$. This lower semicontinuity property follows by the fact that $c$ can be approximated from below by an increasing sequence of continuous and real valued functions $c_h$ (this is a well known fact: see for instance Lemma 1.61 in [4]). The functionals $I_h$ induced by $c_h$ converge monotonically to $I$, whence the lower semicontinuity of $I$ follows.

In order to prove the last part we need to show the existence, for any $\gamma \in \mathcal{M}_+(X \times X)$ with $\pi_{0\#}\gamma = f_0$ and $\pi_{1\#}\gamma = f_1$, of Borel maps $\psi_h : X \to X$ such that $\psi_{h\#} f_0 = f_1$ and $\delta_{\psi_h(x)} \otimes f_0$ weakly converge to $\gamma$ in $\mathcal{M}(X \times X)$. The approximation Theorem 9.3 provides us, on the other hand, with a Borel map $\varphi : X \to X$ such that $\varphi_\# f_0$ has no atom, is arbitrarily close to $f_1$ and $\delta_{\varphi(x)} \otimes f_0$ is arbitrarily close (with respect to the weak topology) to $\gamma$. We will build $\psi_h$ by an iterated application of this result.

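By a standard approximation argument we can assume that $L = \mathrm{Lip}(c)$ is finite and that

$$\delta |x - y| \le c(x, y) \qquad \forall x, y \in X$$

for some $\delta > 0$. Possibly replacing $c$ by $c/\delta$ we assume $\delta = 1$.

Fix now an integer $h$, set $f_0^0 = f_0$ and choose $\varphi_0 : X \to X$ such that $\varphi_{0\#} f_0^0$ has no atom and

$$\mathcal{F}_1(f_1, \varphi_{0\#} f_0^0) < 2^{-h} \quad \text{and} \quad \int_X c(\varphi_0(x), x)\, df_0^0(x) < \mathcal{F}_c(f_1, f_0^0) + 2^{-h}.$$

Then, we set $f_0^1 = \varphi_{0\#} f_0^0$, find $\varphi_1 : X \to X$ such that $\varphi_{1\#} f_0^1$ has no atom and

$$\mathcal{F}_1(f_1, \varphi_{1\#} f_0^1) < 2^{-1-h} \quad \text{and} \quad \int_X c(\varphi_1(x), x)\, df_0^1(x) < \mathcal{F}_c(f_1, f_0^1) + 2^{-1-h}.$$

Proceeding inductively and setting $f_0^k = \varphi_{(k-1)\#} f_0^{k-1}$ we find $\varphi_k$ such that

$$\mathcal{F}_1(f_1, \varphi_{k\#} f_0^k) < 2^{-k-h} \quad \text{and} \quad \int_X c(\varphi_k(x), x)\, df_0^k(x) < \mathcal{F}_c(f_1, f_0^k) + 2^{-k-h}$$

and $\varphi_{k\#} f_0^k$ has no atom. Then, we set $\phi_0(x) = x$ and $\phi_k = \varphi_{k-1} \circ \cdots \circ \varphi_0$ for $k \ge 1$, so that $f_0^k = \phi_{k\#} f_0$. We claim that $(\phi_k)$ is a Cauchy sequence in $L^1(X, f_0; X)$. Indeed,

$$\sum_{k=0}^\infty \int_X |\phi_{k+1}(x) - \phi_k(x)|\, df_0(x) = \sum_{k=0}^\infty \int_X |\varphi_k(y) - y|\, df_0^k(y) \le 2^{1-h} + \sum_{k=0}^\infty \mathcal{F}_1(f_1, f_0^k) < \infty.$$

Denoting by $\psi_h$ the limit of $\phi_k$, passing to the limit as $k \to \infty$ we obtain $\psi_{h\#} f_0 = f_1$; moreover, we have

$$\int_X c(\phi_k(x), x)\, df_0(x) \le \int_X c(\varphi_0(x), x)\, df_0(x) + L \sum_{i=1}^k \int_X |\phi_i(x) - \phi_{i-1}(x)|\, df_0(x)$$

$$\le \mathcal{F}_c(f_1, f_0) + 2^{-h} + L \sum_{i=1}^k \int_X |\varphi_i(y) - y|\, df_0^i(y) \le \mathcal{F}_c(f_1, f_0) + 2^{-h}(1 + 2L).$$

Passing to the limit as $k \to \infty$ we obtain

$$\int_X c(\psi_h(x), x)\, df_0(x) \le \mathcal{F}_c(f_1, f_0) + 2^{-h}(1 + 2L)$$

and the proof is achieved.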

For instance in the case of Example 1.1 (where transport maps do not exist at all) it is easy to check that the unique optimal transport plan is given by

$$\frac{1}{2}\, \delta_0 \times \delta_{-1} + \frac{1}{2}\, \delta_0 \times \delta_1.$$

In general, however, uniqueness fails because of the linearity of $I$ and of the convexity of the class of admissible plans $\gamma$. In Example 1.2, for instance, any measure

$$t\, (\mathrm{Id} \times \psi_1)_\# f_0 + (1 - t)\, (\mathrm{Id} \times \psi_2)_\# f_0$$

is optimal, with $t \in [0, 1]$ and $\psi_1$, $\psi_2$ optimal transport maps.

In order to understand the regularity properties of optimal plans $\gamma$ we introduce, following [26, 36], the concept of cyclical monotonicity.

Definition 2.2 (Cyclical monotonicity) Let $\Gamma \subset X \times X$. We say that $\Gamma$ is $c$-cyclically monotone if

$$\sum_{i=1}^n c(x_{i+1}, y_i) \ge \sum_{i=1}^n c(x_i, y_i) \tag{7}$$

whenever $n \ge 2$ and $(x_i, y_i) \in \Gamma$ for $1 \le i \le n$, with $x_{n+1} = x_1$.

The cyclical monotonicity property can also be stated in an (apparently) stronger form:

$$\sum_{i=1}^n c(x_{\sigma(i)}, y_i) \ge \sum_{i=1}^n c(x_i, y_i) \tag{8}$$

for any permutation $\sigma : \{1, \ldots, n\} \to \{1, \ldots, n\}$. The equivalence can be proved either directly (reducing to the case when $\sigma$ has no nontrivial invariant set) or verifying, as we will soon do, that any cyclically monotone set is contained in the $c$-superdifferential of a $c$-concave function and then checking that the superdifferential fulfils (8).
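For a finite set $\Gamma$ condition (8) can be tested directly. The sketch below is illustrative only (function names and sample data are assumptions): it enumerates the permutations of the whole set, which also covers cycles on subsets because the indices fixed by $\sigma$ contribute the same terms to both sides of (8).

```python
from itertools import permutations


def is_c_cyclically_monotone(pairs, cost, tol=1e-12):
    """Test inequality (8) for every permutation of a finite set of pairs (x, y)."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    base = sum(cost(x, y) for x, y in pairs)
    for sigma in permutations(range(len(pairs))):
        permuted = sum(cost(xs[s], y) for s, y in zip(sigma, ys))
        if permuted < base - tol:
            return False
    return True


def quadratic(x, y):
    """Quadratic cost c(x, y) = |x - y|^2 / 2 on the real line."""
    return 0.5 * (x - y) ** 2


# Points on a nondecreasing graph, and a pair of "crossing" couples.
monotone_pairs = [(0.0, 1.0), (1.0, 2.0), (2.0, 4.0)]
crossing_pairs = [(0.0, 2.0), (1.0, 1.0)]
print(is_c_cyclically_monotone(monotone_pairs, quadratic))
print(is_c_cyclically_monotone(crossing_pairs, quadratic))
```

For the crossing pairs, exchanging the two targets lowers the total quadratic cost from 2 to 1, so (8) fails; the enumeration is factorial in the size of $\Gamma$ and is meant for small examples.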

Theorem 2.2 (Regularity of optimal plans) Assume that $c$ is continuous and real valued. Then, for any optimal $\gamma$ the set $\operatorname{spt}\gamma$ is $c$-cyclically monotone. Moreover, the union of $\operatorname{spt}\gamma$ as $\gamma$ ranges among all optimal plans is $c$-cyclically monotone.

Proof. Assume by contradiction that there exist an integer $n \ge 2$ and points $(x_i, y_i) \in \operatorname{spt}\gamma$, $i = 1, \ldots, n$, such that

$$f((x_i), (y_i)) := \sum_{i=1}^n \bigl( c(x_{i+1}, y_i) - c(x_i, y_i) \bigr) < 0$$

with $x_{n+1} = x_1$. For $1 \le i \le n$, let $U_i$, $V_i$ be compact neighbourhoods of $x_i$ and $y_i$ respectively such that $\gamma(U_i \times V_i) > 0$ and $f((u_i), (v_i)) < 0$ whenever $u_i \in U_i$ and $v_i \in V_i$.

Set now $\lambda = \min_i \gamma(U_i \times V_i)$ and denote by $\gamma_i \in \mathcal{M}_1(U_i \times V_i)$ the normalized restriction of $\gamma$ to $U_i \times V_i$. We can find a compact space $Y$, a probability measure $\sigma$ in $Y$ and Borel maps $\eta_i = u_i \times v_i : Y \to U_i \times V_i$ such that $\gamma_i = \eta_{i\#}\sigma$ for $i = 1, \ldots, n$ (it suffices for instance to define $Y$ as the product of $U_i \times V_i$, so that $\eta_i$ are the projections on the $i$-coordinate) and define

$$\gamma' := \gamma + \frac{\lambda}{n} \sum_{i=1}^n \bigl( (u_{i+1} \times v_i)_\#\sigma - (u_i \times v_i)_\#\sigma \bigr).$$

Since $\lambda \eta_{i\#}\sigma = \lambda\gamma_i \le \gamma$ we obtain that $\gamma' \in \mathcal{M}_+(X \times X)$; moreover, it is easy to check that $\pi_{0\#}\gamma' = f_0$ and $\pi_{1\#}\gamma' = f_1$. This leads to a contradiction because

$$I(\gamma') - I(\gamma) = \frac{\lambda}{n} \int_Y \sum_{i=1}^n \bigl( c(u_{i+1}, v_i) - c(u_i, v_i) \bigr)\, d\sigma < 0.$$

In order to show the last part of the statement we notice that the collection of optimal transport plans is w∗-closed and compact. If $(\gamma_h)_{h \ge 1}$ is a countable dense set, then

$$\bigcup_{i=1}^h \operatorname{spt}\gamma_i = \operatorname{spt}\left( \sum_{i=1}^h \frac{1}{h}\, \gamma_i \right)$$

is $c$-cyclically monotone for any $h \ge 1$. Passing to the limit as $h \to \infty$ we obtain that the closure of the union of $\operatorname{spt}\gamma_h$ is $c$-cyclically monotone. By the density of $(\gamma_h)$, this closure contains $\operatorname{spt}\gamma$ for any optimal plan $\gamma$.

Next, we relate the $c$-cyclical monotonicity to suitable concepts (adapted to $c$) of concavity and superdifferential.

Definition 2.3 ($c$-concavity) We say that a function $u : X \to \mathbb{R}$ is $c$-concave if it can be represented as the infimum of a family $(u_i)$ of functions given by

$$u_i(x) := c(x, y_i) + t_i$$

for suitable $y_i \in X$ and $t_i \in \mathbb{R}$.

Remark 2.1 [Linear and quadratic case] In the case when $c(x, y)$ is a symmetric function satisfying the triangle inequality, the notion of $c$-concavity is equivalent to 1-Lipschitz continuity with respect to the metric $d_c$ induced by $c$. Indeed, given $u \in \mathrm{Lip}_1(X, d_c)$, the family of functions whose infimum is $u$ is simply

$$\{ c(x, y) + u(y) : y \in X \}.$$

In the quadratic case $c(x, y) = |x - y|^2/2$ a function $u$ is $c$-concave if and only if $u - |x|^2/2$ is concave. Indeed, $u = \inf_i c(\cdot, y_i) + t_i$ implies

$$u(x) - \frac{1}{2}|x|^2 = \inf_i\ \langle x, -y_i \rangle + \frac{1}{2}|y_i|^2 + t_i$$

and therefore the concavity of $u - |x|^2/2$. Conversely, if $v = u - |x|^2/2$ is concave, from the well known formula

$$v(x) = \inf_{y,\ p \in \partial^+ v(y)} v(y) + \langle p, x - y \rangle$$

(here $\partial^+ v$ is the superdifferential of $v$ in the sense of convex analysis) we obtain

$$u(x) = \inf_{y,\ -p \in \partial^+ v(y)} \frac{1}{2}|p - x|^2 + c(p, y).$$
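The first implication of the quadratic case in Remark 2.1 can be illustrated on the real line. In the sketch below (an illustration with hypothetical data $y_i$, $t_i$, not taken from the notes) $u$ is the minimum of three parabolas $|x - y_i|^2/2 + t_i$; midpoint concavity is tested on a finite grid only, which is a sanity check and not a proof.

```python
def make_c_concave(ys, ts):
    """u(x) = min_i ( |x - y_i|^2 / 2 + t_i ), a c-concave function for the
    quadratic cost (finite family, one space dimension)."""
    def u(x):
        return min(0.5 * (x - y) ** 2 + t for y, t in zip(ys, ts))
    return u


def midpoint_concave(g, points, tol=1e-9):
    """Check g((a + b) / 2) >= (g(a) + g(b)) / 2 for all pairs of grid points."""
    return all(
        g(0.5 * (a + b)) >= 0.5 * (g(a) + g(b)) - tol
        for a in points
        for b in points
    )


ys = [-1.0, 0.5, 2.0]   # hypothetical centres y_i
ts = [0.3, 0.0, -0.4]   # hypothetical shifts t_i
u = make_c_concave(ys, ts)


def v(x):
    # v = u - |x|^2 / 2 is a minimum of affine functions, hence concave
    return u(x) - 0.5 * x * x


grid = [-3.0 + 0.25 * k for k in range(25)]
v_concave = midpoint_concave(v, grid)
u_concave = midpoint_concave(u, grid)
print(v_concave, u_concave)
```

With these data $v$ passes the test while $u$ itself does not (for instance between the grid points 0 and 1), which shows that $c$-concavity for the quadratic cost differs from ordinary concavity.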

Definition 2.4 ($c$-superdifferential) Let $u : X \to \mathbb{R}$ be a function. The $c$-superdifferential $\partial_c u(x)$ of $u$ at $x \in X$ is defined by

$$\partial_c u(x) := \{ y : u(z) \le u(x) + c(z, y) - c(x, y)\ \ \forall z \in X \}. \tag{9}$$

The following theorem ([40], [41], [36]) shows that the graphs of superdifferentials of $c$-concave functions are maximal (with respect to set inclusion) $c$-cyclically monotone sets. It may be considered as the extension of the well known result of Rockafellar to this setting.

Theorem 2.3 Any $c$-cyclically monotone set $\Gamma$ is contained in the graph of the $c$-superdifferential of a $c$-concave function. Conversely, the graph of the $c$-superdifferential of a $c$-concave function is $c$-cyclically monotone.

Proof. This proof is taken from [36]. We fix $(x_0, y_0) \in \Gamma$ and define

$$u(x) := \inf\ c(x, y_n) - c(x_n, y_n) + \cdots + c(x_1, y_0) - c(x_0, y_0) \qquad \forall x \in X$$

where the infimum runs among all collections $(x_i, y_i) \in \Gamma$ with $1 \le i \le n$ and $n \ge 1$. Then $u$ is $c$-concave by construction and the cyclical monotonicity of $\Gamma$ gives $u(x_0) = 0$ (the minimum is achieved with $n = 1$ and $(x_1, y_1) = (x_0, y_0)$).

We will prove the inequality

$$u(x) \le u(x') + c(x, y') - c(x', y') \tag{10}$$

for any $x \in X$ and $(x', y') \in \Gamma$. In particular (choosing $x = x_0$) this implies that $u(x') > -\infty$ and that $y' \in \partial_c u(x')$. In order to prove (10) we fix $\lambda > u(x')$ and find $(x_i, y_i) \in \Gamma$, $1 \le i \le n$, such that

$$c(x', y_n) - c(x_n, y_n) + \cdots + c(x_1, y_0) - c(x_0, y_0) < \lambda.$$

Then, setting $(x_{n+1}, y_{n+1}) = (x', y')$ we find

$$u(x) \le c(x, y_{n+1}) - c(x_{n+1}, y_{n+1}) + c(x_{n+1}, y_n) - c(x_n, y_n) + \cdots + c(x_1, y_0) - c(x_0, y_0) \le c(x, y') - c(x', y') + \lambda.$$

Since $\lambda$ is arbitrary (10) follows.

Finally, if $v$ is $c$-concave, $y_i \in \partial_c v(x_i)$ for $1 \le i \le n$ and $\sigma$ is a permutation we can add the inequalities

$$v(x_{\sigma(i)}) - v(x_i) \le c(x_{\sigma(i)}, y_i) - c(x_i, y_i)$$

to obtain (8).
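When $\Gamma$ is finite, the infimum defining $u$ in the proof above is a shortest path problem: moving from $(x_a, y_a)$ to $(x_j, y_j)$ costs $c(x_j, y_a) - c(x_a, y_a)$, and cyclical monotonicity says that no cycle has negative total cost. The sketch below uses Bellman–Ford relaxation, which is a choice made here for illustration and not a construction from the notes; the function name and the sample set are hypothetical.

```python
def rockafellar_potential(pairs, cost):
    """Potential u from the proof of Theorem 2.3 for a finite set of pairs.

    Assumes `pairs` is c-cyclically monotone (no negative cycles) and uses
    pairs[0] as the base point (x_0, y_0).
    """
    n = len(pairs)
    # d[j]: least value of the chain sum
    #   c(x_j, y_a) - c(x_a, y_a) + ... + c(x_1, y_0) - c(x_0, y_0)
    # among chains in the set that start at index 0 and end at index j.
    d = [float("inf")] * n
    d[0] = 0.0
    for _ in range(n - 1):  # Bellman-Ford relaxation rounds
        for a, (xa, ya) in enumerate(pairs):
            for j, (xj, _) in enumerate(pairs):
                candidate = d[a] + cost(xj, ya) - cost(xa, ya)
                if candidate < d[j]:
                    d[j] = candidate

    def u(x):
        return min(
            d[j] + cost(x, yj) - cost(xj, yj)
            for j, (xj, yj) in enumerate(pairs)
        )

    return u


def quadratic(x, y):
    return 0.5 * (x - y) ** 2


# A nondecreasing graph, hence cyclically monotone for the quadratic cost.
gamma = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
u = rockafellar_potential(gamma, quadratic)

# Inequality (10), tested on a grid of points z for every (x', y') in gamma.
zs = [-2.0 + 0.5 * k for k in range(13)]
inequality_10_holds = all(
    u(z) <= u(xp) + quadratic(z, yp) - quadratic(xp, yp) + 1e-9
    for (xp, yp) in gamma
    for z in zs
)
print(u(0.0), inequality_10_holds)
```

In this example $u(x_0) = 0$ and (10) holds at all tested points, in agreement with the statement that $\Gamma$ is contained in the graph of $\partial_c u$.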

In the following corollary we assume that the cost function is symmetric, continuous and satisfies the triangle inequality, so that $c$-concavity reduces to 1-Lipschitz continuity with respect to the distance induced by $c$.

Corollary 2.1 (Linear case) Let $\gamma \in \mathcal{M}_1(X \times X)$ with $\pi_{0\#}\gamma = f_0$ and $\pi_{1\#}\gamma = f_1$. Then $\gamma$ is optimal for (MK) if and only if there exists $u : X \to \mathbb{R}$ such that

$$|u(x) - u(y)| \le c(x, y) \qquad \forall (x, y) \in X \times X \tag{11}$$

$$u(x) - u(y) = c(x, y) \qquad \text{for } (x, y) \in \operatorname{spt}\gamma. \tag{12}$$

In addition, there exists $u$ satisfying (11) such that (12) holds for any optimal planning $\gamma$. We will call any function $u$ with these properties a maximal Kantorovich potential.

Proof. (Sufficiency) Let $\gamma'$ be any admissible transport plan; by applying (11) first and then (12) we get

$$I(\gamma') \ge \int_{X \times X} \bigl( u(x) - u(y) \bigr)\, d\gamma' = \int_X u\, df_0 - \int_X u\, df_1 = \int_{X \times X} \bigl( u(x) - u(y) \bigr)\, d\gamma = I(\gamma).$$

(Necessity) Let $\Gamma$ be the closure of the union of $\operatorname{spt}\gamma'$ as $\gamma'$ varies among all optimal plans for (MK). Then we know that $\Gamma$ is $c$-cyclically monotone, hence there exists a $c$-concave function $u$ such that $\Gamma \subset \partial_c u$. Then, (11) follows by the $c$-concavity of $u$, while the inclusion $\Gamma \subset \partial_c u$ implies

$$u(y) - u(x) \le c(y, y) - c(x, y) = -c(x, y)$$

for any $(x, y) \in \operatorname{spt}\gamma \subset \Gamma$. This, taking into account (11), proves (12).

A direct consequence of the proof of sufficiency is the identity

$$\min\,(\mathrm{MK}) = \max\left\{ \int_X u\, d(f_0 - f_1) : u \in \mathrm{Lip}_1(X, d_c) \right\} \tag{13}$$

(where $d_c$ is the distance in $X$ induced by $c$) and the maximum on the right is achieved precisely whenever $u$ satisfies (12).

In the following corollary, instead, we consider the case when $c(x, y) = |x - y|^2/2$. The result below, taken from [26], was proved first by Brenier in [7, 8] under more restrictive assumptions on $f_0$, $f_1$ (see also [26] for the general case $c(x, y) = h(|x - y|)$). Before stating the result we recall that the set $\Sigma_v$ of points of nondifferentiability of a real valued concave function $v$ (i.e. the set of points $x$ such that $\partial^+ v(x)$ is not a singleton) is countably (CC) regular (see [45] and also [1]). This means that $\Sigma_v$ can be covered with a countable family of (CC) hypersurfaces, i.e. graphs of differences of convex functions of $n - 1$ variables. This property is stronger than the canonical $\mathcal{H}^{n-1}$-rectifiability: it implies that $\mathcal{H}^{n-1}$-almost all of $\Sigma_v$ can be covered by a sequence of $C^2$ hypersurfaces.

Corollary 2.2 (Quadratic case) Assume that any (CC) hypersurface is $f_0$-negligible. Then the optimal planning $\gamma$ is unique and is induced by an optimal transport map $\psi$. Moreover $\psi$ is the gradient of a convex function.

Proof. Let $u : X \to \mathbb{R}$ be a $c$-concave function such that the graph $\Gamma$ of its superdifferential contains the support of any optimal planning $\gamma$. As $c(x, y) = |x - y|^2/2$, an elementary computation shows that $(x_0, y_0) \in \Gamma$ if and only if

$$-y_0 \in \partial^+ v(x_0)$$

where $v(x) = u(x) - |x|^2/2$ is the concave function already considered in Remark 2.1. Then, by the above mentioned results on differentiability of concave functions, the set of points where $v$ is not differentiable is $f_0$-negligible, hence for $f_0$-a.e. $x_0 \in X$ there is a unique $y_0 = -\nabla v(x_0) \in X$ such that $(x_0, y_0) \in \Gamma$. As $\operatorname{spt}\gamma \subset \Gamma$, by Proposition 2.1 we infer that

$$\gamma = (\mathrm{Id} \times \psi)_\# f_0$$

for any optimal planning $\gamma$, with $\psi = -\nabla v$.

Example 2.1 (Brenier polar factorization theorem) A remarkable consequence of Corollary 2.2 is the following result, known as polar factorization theorem. A vector field $r : X \to X$ such that

$$\mathcal{L}^n(r^{-1}(B)) = 0 \quad \text{whenever } \mathcal{L}^n(B) = 0 \tag{14}$$

can be written as $\nabla u \circ \eta$, with $u$ convex and $\eta$ measure preserving.

It suffices to apply the Corollary to the measure $f_0 = r_\#(\mathcal{L}^n \llcorner X)$ (absolutely continuous, due to (14)) and $f_1 = \mathcal{L}^n \llcorner X$; we have then

$$\mathcal{L}^n \llcorner X = (\nabla v)_\#\bigl( r_\#(\mathcal{L}^n \llcorner X) \bigr) = (\nabla v \circ r)_\#(\mathcal{L}^n \llcorner X)$$

for a suitable convex function $v$, hence $\eta = (\nabla v) \circ r$ is measure preserving. The desired representation follows with $u = v^*$, since $\nabla u = (\nabla v)^{-1}$.

Let us assume that $f_0 = \mathcal{L}^n \llcorner X$ and let $f_1 \in \mathcal{M}_+(X)$ be any other measure such that $f_1(X) = \mathcal{L}^n(X)$; in general the problem of mapping $f_0$ into $f_1$ through a Lipschitz map has no solution (it suffices to consider, for instance, the case when $X = B_1$ and $f_1 = \frac{1}{n} \mathcal{H}^{n-1} \llcorner \partial B_1$). However, another remarkable consequence of Corollary 2.2 is that the problem has a solution if we require the transport map to be only a function of bounded variation: indeed, bounded monotone functions (in particular gradients of Lipschitz convex functions) are functions of bounded variation (see for instance Proposition 5.1 of [2]). Moreover, we can give a sharp quantitative estimate of the error made in the approximation by Lipschitz transport maps.

Theorem 2.4 There exists a constant $C = C(n, X)$ with the following property: for any $\mu \in \mathcal{M}_+(X)$ with $\mu(X) = \mathcal{L}^n(X)$ and any $M > 0$ there exist a Lipschitz function $\phi : X \to X$ and $B \in \mathcal{B}(X)$ such that $\mathrm{Lip}(\phi) \le M$, $\mathcal{L}^n(X \setminus B) \le C/M$ and

$$\mu = \phi_\#(\mathcal{L}^n \llcorner B) + \mu_s$$

with $\mu_s \in \mathcal{M}_+(X)$ and $\mu_s(X) \le \mathcal{L}^n(X \setminus B)$.

Proof. By Corollary 2.2 we can represent $\mu = \psi_\#(\mathcal{L}^n \llcorner X)$ with $\psi : X \to X$ equal to the gradient of a convex function. Let $\Omega$ be the interior of $X$; by applying Proposition 5.1 of [2] (valid, more generally, for monotone operators) we obtain that the total variation $|D\psi|(\Omega)$ can be estimated with a suitable constant $C$ depending only on $n$ and $X$. Therefore, by applying Theorem 5.34 of [4] we can find a Borel set $B \subset X$ (it is a suitable sublevel set of the maximal function of $|D\psi|$) such that $\mathcal{L}^n(X \setminus B) \le c(n)|D\psi|(\Omega)/M$ and the restriction of $\psi$ to $B$ is an $M$-Lipschitz function, i.e. with Lipschitz constant not greater than $M$. By Kirszbraun theorem (see for instance [24]) we can extend $\psi|_B$ to an $M$-Lipschitz function $\phi : X \to X$. Setting $B^c = X \setminus B$ we have then

$$\mu = \psi_\#(\mathcal{L}^n \llcorner B) + \psi_\#(\mathcal{L}^n \llcorner B^c) = \phi_\#(\mathcal{L}^n \llcorner B) + \psi_\#(\mathcal{L}^n \llcorner B^c)$$

and setting $\mu_s = \psi_\#(\mathcal{L}^n \llcorner B^c)$ the proof is achieved.

3 The one dimensional case

In this section we assume that $X = I$ is a closed interval of the real line; we also assume for simplicity that the transport cost is $c(x, y) = |x - y|^p$ for some $p \ge 1$.

Theorem 3.1 (Existence and uniqueness) Assume that $f_0$ is a diffuse measure, i.e. $f_0(\{t\}) = 0$ for any $t \in I$. Then

(i) there exists a unique (modulo countable sets) nondecreasing function $\psi : \operatorname{spt} f_0 \to X$ such that $\psi_\# f_0 = f_1$;

(ii) the function $\psi$ in (i) is an optimal transport, and if $p > 1$ is the unique optimal transport.

In the one-dimensional case these results are sharp: we have already seen that transport maps need not exist if $f_0$ has atoms (Example 1.1) and that, without the monotonicity constraint, they are not necessarily unique when $p = 1$.

Proof. (i) Let $m = \min I$ and define

$$\psi(s) := \sup\{ t \in I : f_1([m, t]) \le f_0([m, s]) \}. \tag{15}$$

It is easy to check that the following properties hold:

(a) $\psi$ is non decreasing;

(b) $\psi(I) \supset \operatorname{spt} f_1$;

(c) if $\psi(s)$ is not an atom of $f_1$ we have

$$f_1([m, \psi(s)]) = f_0([m, s]). \tag{16}$$

Let $T$ be the at most countable set made by the atoms of $f_1$ and by the points $t \in I$ such that $\psi^{-1}(t)$ contains more than one point; then $\psi^{-1}$ is well defined on $\psi(I) \setminus T$ and $\psi^{-1}([t, t']) = [\psi^{-1}(t), \psi^{-1}(t')]$ whenever $t, t' \in \psi(I) \setminus T$ with $t < t'$. Then (c) gives

$$f_1([t, t']) = f_1([m, t']) - f_1([m, t]) = f_0([m, \psi^{-1}(t')]) - f_0([m, \psi^{-1}(t)]) = f_0([\psi^{-1}(t), \psi^{-1}(t')]) = f_0(\psi^{-1}([t, t']))$$

(notice that only here we use the fact that $f_0$ is diffuse). By (b) the closed intervals whose endpoints belong to $\psi(I) \setminus T$ generate the Borel σ-algebra of $\operatorname{spt} f_1$, and this proves that $\psi_\# f_0 = f_1$.

Let $\phi$ be any nondecreasing function such that $\phi_\# f_0 = f_1$ and assume, possibly modifying $\phi$ on a countable set, that $\phi$ is right continuous. Let

$$T := \{ s \in \operatorname{spt} f_0 : (s, s') \cap \operatorname{spt} f_0 = \emptyset \text{ for some } s' > s \}$$

and notice that $T$ is at most countable (since we can index with $T$ a family of pairwise disjoint open intervals). We claim that $\phi \ge \psi$ on $\operatorname{spt} f_0 \setminus T$; indeed, for $s \in \operatorname{spt} f_0 \setminus T$ and $s' > s$ we have the inequalities

$$f_1([m, \phi(s')]) = f_0(\phi^{-1}([m, \phi(s')])) \ge f_0([m, s']) > f_0([m, s])$$

and, by the definition of $\psi$, the inequality follows letting $s' \downarrow s$. In particular

$$\int_I (\phi - \psi)\, df_0 = \int_I \phi\, df_0 - \int_I \psi\, df_0 = \int_I t\, df_1(t) - \int_I t\, df_1(t) = 0$$

whence $\phi = \psi$ $f_0$-a.e. in $I$. It follows that $\phi(s) = \psi(s)$ at any continuity point $s \in \operatorname{spt} f_0$ of $\phi$ and $\psi$.

(ii) By a continuity argument it suffices to prove that $\psi$ is the unique solution of the transport problem for any $p > 1$ (see also [26, 15]). Let $\gamma$ be an optimal planning and notice that the cyclical monotonicity proved in Theorem 2.2 gives

$$|x - y'|^p + |x' - y|^p \ge |x - y|^p + |x' - y'|^p$$

whenever $(x, y), (x', y') \in \operatorname{spt}\gamma$. If $x < x'$, this condition implies that $y \le y'$ (this is a simple analytic calculation that we omit, and here the fact that $p > 1$ plays a role). This means that the set

$$T := \{ x \in \operatorname{spt} f_0 : \operatorname{card}(\{ y : (x, y) \in \operatorname{spt}\gamma \}) > 1 \}$$

is at most countable (since we can index with $T$ a family of pairwise disjoint open intervals) hence $f_0$-negligible. Therefore for $f_0$-a.e. $x \in I$ there exists a unique $y = \tilde\psi(x) \in I$ such that $(x, y) \in \operatorname{spt}\gamma$ (the existence of at least one $y$ follows by the fact that the projection of $\operatorname{spt}\gamma$ on the first factor is $\operatorname{spt} f_0$). Notice also that $\tilde\psi$ is nondecreasing in its domain.

Arguing as in Proposition 2.1 we obtain that

$$\gamma = (\mathrm{Id} \times \tilde\psi)_\# f_0.$$

In particular

$$f_1 = \pi_{1\#}\gamma = \pi_{1\#}\bigl( (\mathrm{Id} \times \tilde\psi)_\# f_0 \bigr) = \tilde\psi_\# f_0$$

and since $\tilde\psi$ is non decreasing it follows that $\tilde\psi = \psi$ (up to countable sets) on $\operatorname{spt} f_0$.
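Formula (15) lends itself to a direct numerical evaluation when the distribution functions $F_i(t) = f_i([m, t])$ are available. The sketch below is an illustration under assumptions that are not in the notes: the measures are given through hypothetical callables `F0`, `F1`, the measure $f_1$ has no atom at $m$, and the supremum is located by bisection (legitimate because $F_1$ is nondecreasing and right continuous).

```python
def monotone_map(F0, F1, m, M, iterations=60):
    """psi(s) = sup{ t in [m, M] : F1(t) <= F0(s) }, cf. (15).

    F0, F1 are the distribution functions t -> f_i([m, t]) on I = [m, M];
    it is assumed that F1(m) = 0, i.e. f1 has no atom at m.
    """
    def psi(s):
        level = F0(s)
        if F1(M) <= level:
            return M
        lo, hi = m, M  # invariant: F1(lo) <= level < F1(hi)
        for _ in range(iterations):
            mid = 0.5 * (lo + hi)
            if F1(mid) <= level:
                lo = mid
            else:
                hi = mid
        return lo

    return psi


def clamp01(value):
    return min(1.0, max(0.0, value))


# Example 1.3 on I = [0, 2]: f0 uniform on [0, 1], f1 uniform on [1, 2].
psi_shift = monotone_map(lambda s: clamp01(s), lambda t: clamp01(t - 1.0), 0.0, 2.0)

# On I = [0, 1]: f0 uniform, f1 with density 2t, so that F1(t) = t^2.
psi_sqrt = monotone_map(lambda s: clamp01(s), lambda t: clamp01(t) ** 2, 0.0, 1.0)

print(psi_shift(0.3), psi_sqrt(0.25))
```

In the first case the sketch returns (up to the bisection tolerance) the nondecreasing map $s \mapsto s + 1$ of Example 1.3 rather than $s \mapsto 2 - s$; in the second it returns $s \mapsto \sqrt{s}$.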

4 The ODE version of the optimal transport problem

In this and in the next section we rephrase the optimal transport problem in differential terms. In the following we consider a fixed auxiliary open set $\Omega$ containing $X$; we assume that $\Omega$ is sufficiently large, namely that the open $r$-neighbourhood of $X$ is contained in $\Omega$, with $r > \operatorname{diam}(X)$ (the necessity of this condition will be discussed later on).

The first idea, due to Brenier ([10, 9] and also [11]) is to look for all the paths $f_t$ in $\mathcal{M}_+(X)$ connecting $f_0$ to $f_1$. In the simplest case when $f_t = \delta_{x(t)}$, it turns out that the velocity field $E_t = \dot x(t)\, \delta_{x(t)}$ is related to $f_t$ by the equation

$$\dot f_t + \nabla \cdot E_t = 0 \quad \text{in } (0, 1) \times \Omega \tag{17}$$

in the distribution sense. Indeed, given $\varphi \in C_c^\infty(0, 1)$ and $\phi \in C_c^\infty(\Omega)$, it suffices to take $\varphi(t)\phi(x)$ as test function in (17) and to use the definitions of $f_t$ and $E_t$ to obtain

$$(\dot f_t + \nabla \cdot E_t)(\varphi\phi) = -\int_0^1 \bigl[ \dot\varphi(t)\, \phi(x(t)) + \varphi(t)\, \langle \nabla\phi(x(t)), \dot x(t) \rangle \bigr]\, dt = -\int_0^1 \frac{d}{dt}\bigl[ \varphi(t)\, \phi(x(t)) \bigr]\, dt = 0.$$

More generally, regardless of any assumption on $(f_t, E_t) \in \mathcal{M}_+(X) \times [\mathcal{M}(\Omega)]^n$, it is easy to check that (17) holds in the distribution sense if and only if

$$\dot f_t(\phi) = \nabla\phi \cdot E_t \quad \text{in } (0, 1) \qquad \forall \phi \in C_c^\infty(\Omega) \tag{18}$$

in the distribution sense. We will use both interpretations in the following.

One more interpretation of (17) is given in the following proposition. Recall that a map $f$ defined in $(0, 1)$ with values in a metric space $(E, d)$ is said to be absolutely continuous if for any $\varepsilon > 0$ there exists $\delta > 0$ such that

$$\sum_i (y_i - x_i) < \delta \implies \sum_i d(f(y_i), f(x_i)) < \varepsilon$$

for any family of pairwise disjoint intervals $(x_i, y_i) \subset (0, 1)$.

Proposition 4.1 If some family $(f_t) \subset \mathcal{M}_1(X)$ fulfils (17) for suitable measures $E_t \in [\mathcal{M}(\Omega)]^n$ satisfying $\int_0^1 |E_t|(\Omega)\, dt < \infty$ then $f$ is an absolutely continuous map between $(0, 1)$ and $\mathcal{M}_1(X)$, endowed with the 1-Wasserstein distance (4), and

$$\lim_{h \to 0} \frac{\mathcal{F}_1(f_{t+h}, f_t)}{|h|} \le |E_t|(\Omega) \qquad \text{for } \mathcal{L}^1\text{-a.e. } t \in (0, 1). \tag{19}$$

Conversely, if $f_t$ is an absolutely continuous map we can choose $E_t$ so that equality holds in (19).

Proof. In (13) we can obviously restrict to test functions $u$ such that $|u| \le r = \operatorname{diam}(X)$ on $X$; by our assumption on $\Omega$ any of these functions can be extended to $\mathbb{R}^n$ in such a way that the Lipschitz constant is still not greater than 1 and $u \equiv 0$ in a neighbourhood of $\mathbb{R}^n \setminus \Omega$. In particular, choosing an optimal $u$ and setting $u_\varepsilon = u * \rho_\varepsilon$ we get

$$\mathcal{F}_1(f_s, f_t) = \lim_{\varepsilon \to 0^+} \int_X u_\varepsilon\, d(f_t - f_s) = \lim_{\varepsilon \to 0^+} \int_s^t \nabla \cdot E_\tau(u_\varepsilon)\, d\tau = -\lim_{\varepsilon \to 0^+} \int_s^t E_\tau \cdot \nabla u_\varepsilon\, d\tau \le \int_s^t |E_\tau|(\Omega)\, d\tau \tag{20}$$

whenever $0 \le s \le t \le 1$ and this easily leads to (19).

In the proof of the converse implication we can assume with no loss of generality (up to a reparameterization by arclength) that $f$ is a Lipschitz map. We can consider $\mathcal{M}_1(X)$ as a subset of the dual $Y = G^*$, where

$$G := \{ \phi \in C^1(\mathbb{R}^n) \cap \mathrm{Lip}(\mathbb{R}^n) : \phi \equiv 0 \text{ on } \mathbb{R}^n \setminus \Omega \}$$

endowed with the norm $\|\phi\| = \mathrm{Lip}(\phi)$. By using convolutions and (13) it is easy to check that $\mathcal{F}_1(\mu, \nu) = \|\mu - \nu\|_Y$, so that $\mathcal{M}_1(X)$ is isometrically embedded in $Y$. Moreover, using the Hahn–Banach theorem it is easy to check that any $y \in Y$ is representable as the divergence of a measure $E \in [\mathcal{M}(\Omega)]^n$ with $\|y\| = |E|(\Omega)$ ($E$ is not unique, of course).

By a general result proved in [5], valid also for more than one independent variable, any Lipschitz map $f$ from $(0, 1)$ into the dual $Y$ of a separable Banach space is weakly∗-differentiable for $\mathcal{L}^1$-almost every $t$, i.e.

$$\exists\ w^*\text{-}\lim_{h \to 0} \frac{f_{t+h} - f_t}{h} =: \dot f(t)$$

and $f_t - f_s = \int_s^t \dot f(\tau)\, d\tau$ for $s, t \in (0, 1)$. In addition, the map is also metrically differentiable for $\mathcal{L}^1$-almost every $t$, i.e.

$$\exists\ \lim_{h \to 0} \frac{\|f_{t+h} - f_t\|}{|h|} =: m\dot f(t).$$

Although $\dot f$ is only a w∗-limit of the difference quotients, it turns out (see [5]) that the metric derivative $m\dot f$ is $\mathcal{L}^1$-a.e. equal to $\|\dot f\|$.

Putting together this information the conclusion follows.

According to Brenier, we can formulate the optimal transport problem as follows.

(ODE) Let f0, f1 ∈ M1(X) be given probability measures. Minimize

\[
J(E) := \int_0^1 |E_t|(\Omega)\, dt \tag{21}
\]

among all Borel maps ft : [0, 1] → M+(X) and Et : [0, 1] → [M(Ω)]n such that (17) holds.

Example 4.1 In the case considered in Example 1.3 the measures

\[
f_t = \chi_{[t,t+1]}\mathcal{L}^1, \qquad E_t = \chi_{[t,t+1]}\mathcal{L}^1
\]

provide an admissible and optimal flow, obviously related to the optimal transport map x ↦ x + 1. But, quite surprisingly, we can also define an optimal flow by

\[
f_t =
\begin{cases}
\dfrac{1}{1-2t}\,\chi_{[2t,1]}\mathcal{L}^1 & \text{if } 0 \le t < 1/2 \\[1ex]
\delta_1 & \text{if } t = 1/2 \\[1ex]
\dfrac{1}{2t-1}\,\chi_{[1,2t]}\mathcal{L}^1 & \text{if } 1/2 < t \le 1
\end{cases}
\]

whose “velocity field” is

\[
E_t =
\begin{cases}
\dfrac{2(1-x)}{(1-2t)^2}\,\chi_{[2t,1]}\mathcal{L}^1 & \text{if } 0 \le t < 1/2 \\[1ex]
\dfrac{2(x-1)}{(2t-1)^2}\,\chi_{[1,2t]}\mathcal{L}^1 & \text{if } 1/2 < t \le 1.
\end{cases}
\]

It is easy to check that $\dot f_t + \nabla\cdot E_t = 0$ and that |Et| are probability measures for any t ≠ 1/2, hence J(E) = 1. The relation of this new flow with the optimal transport map x ↦ 2 − x will be seen in the following (see Remark 4.1(3)).
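The two claims about the second flow of Example 4.1 can be checked numerically. The sketch below is a sanity check, not a proof: the quadrature rule, the test function (sine) and all helper names are choices made here for illustration. It verifies that |Et| has unit mass and that the weak form of the continuity equation, d/dt ∫φ dft = ∫φ′ dEt, holds up to discretization error on both sides of t = 1/2.

```python
import math


def midpoint(func, a, b, n=2000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))


def support(t):
    """Support of f_t and E_t in the second flow of Example 4.1 (t != 1/2)."""
    return (2 * t, 1.0) if t < 0.5 else (1.0, 2 * t)


def f_density(t, x):
    return 1.0 / abs(1 - 2 * t)


def E_density(t, x):
    # 2(1-x)/(1-2t)^2 on [2t,1] for t < 1/2 and 2(x-1)/(2t-1)^2 on [1,2t] for t > 1/2.
    return 2 * abs(1 - x) / (1 - 2 * t) ** 2


def integrate_f(phi, t):
    a, b = support(t)
    return midpoint(lambda x: phi(x) * f_density(t, x), a, b)


def integrate_E(phi, t):
    a, b = support(t)
    return midpoint(lambda x: phi(x) * E_density(t, x), a, b)


def weak_equation_gap(t, h=1e-4):
    """|d/dt int(sin) df_t - int(cos) dE_t|, with a central difference in t."""
    lhs = (integrate_f(math.sin, t + h) - integrate_f(math.sin, t - h)) / (2 * h)
    rhs = integrate_E(math.cos, t)
    return abs(lhs - rhs)


masses = [integrate_E(lambda x: 1.0, t) for t in (0.2, 0.8)]
gaps = [weak_equation_gap(t) for t in (0.2, 0.8)]
```

Since the density of Et is nonnegative, its integral is the total variation |Et|(Ω), so unit mass for every t ≠ 1/2 gives J(E) = 1.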

In order to relate solutions of (ODE) to solutions of (MK) we will need a uniqueness theorem, under regularity assumptions in the space variable, for the ODE $\dot f_t + \nabla\cdot(g_t f_t) = 0$. If ft, gt are smooth (say C2) with respect to both the space and time variables, uniqueness is a consequence of the classical method of characteristics (see for instance §3.2 of [20]), which provides the representation

\[
f_t(x_t) = f_0(x) \exp\left( -\int_0^t c_s(x_s)\, ds \right)
\]

where ct = ∇ · gt and xt solves the ODE

\[
\dot x_t = g_t(x_t), \qquad x_0 = x, \qquad t \in (0,1).
\]

See also [33] for more general uniqueness and representation results in a weak setting.
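As an illustration of the representation by characteristics, consider the one-dimensional linear field g(x) = Ax, for which xt = x e^{At} and ct = A. The sketch below is a minimal check under these assumptions; the constant `A`, the Gaussian initial density and the function names are hypothetical. It builds ft from the formula and verifies by finite differences that the residual of the continuity equation is negligible.

```python
import math

A = 0.7  # hypothetical constant: the velocity field is g(x) = A * x, so c = div g = A


def f0(x):
    """Hypothetical smooth initial density (standard Gaussian)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


def flow(t, x):
    """Characteristic x_t solving x' = A x with x_0 = x."""
    return x * math.exp(A * t)


def f_along_characteristic(t, x):
    """Representation formula: f_t(x_t) = f_0(x) * exp(-int_0^t c_s(x_s) ds)."""
    return f0(x) * math.exp(-A * t)


def f(t, y):
    """Density at the Eulerian point y, obtained by following the characteristic back."""
    x = y * math.exp(-A * t)
    return f_along_characteristic(t, x)


def residual(t, y, h=1e-4):
    """Central-difference approximation of d/dt f + d/dy (g f) at (t, y)."""
    dt = (f(t + h, y) - f(t - h, y)) / (2 * h)
    flux = lambda z: A * z * f(t, z)
    dy = (flux(y + h) - flux(y - h)) / (2 * h)
    return dt + dy


res = residual(0.5, 0.3)
```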

Theorem 4.1 Assume that

\[
\dot f_t + \nabla\cdot(g_t f_t) = 0 \qquad \text{in } (0,1)\times\mathbb{R}^n \tag{22}
\]

where $\int_0^1 |f_t|(\mathbb{R}^n)\, dt < \infty$ and |gt| + Lip(gt) ≤ C, with C independent of t and f0 = 0. Then ft = 0 for any t ∈ (0, 1).

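Proof. Let $g^\varepsilon_t$ be obtained from gt by a mollification with respect to the space and time variables and define $X^\varepsilon(s,t,x)$ as the solution of the ODE $\dot x = g^\varepsilon_s(x)$ (with s as independent variable) such that $X^\varepsilon(t,t,x) = x$. Define, for $\psi \in C^\infty_c((0,1)\times\mathbb{R}^n)$ fixed,

\[
\varphi^\varepsilon(t,x) := -\int_t^1 \psi\bigl(s, X^\varepsilon(s,t,x)\bigr)\, ds.
\]

Since $X^\varepsilon(s,t,X^\varepsilon(t,0,x)) = X^\varepsilon(s,0,x)$ we have

\[
\varphi^\varepsilon\bigl(t, X^\varepsilon(t,0,x)\bigr) = -\int_t^1 \psi\bigl(s, X^\varepsilon(s,0,x)\bigr)\, ds
\]

and, differentiating both sides, we infer

\[
\left[ \frac{\partial\varphi^\varepsilon}{\partial t} + g^\varepsilon_t\cdot\nabla\varphi^\varepsilon \right]\bigl(t, X^\varepsilon(t,0,x)\bigr) = \psi\bigl(t, X^\varepsilon(t,0,x)\bigr)
\]

whence $\psi = \partial\varphi^\varepsilon/\partial t + g^\varepsilon_t\cdot\nabla\varphi^\varepsilon$ in (0, 1) × Rn.

Insert now the test function $\varphi^\varepsilon$ in (22) and take into account that f0 = 0 to obtain

\[
0 = \int_0^1\!\!\int_{\mathbb{R}^n} f_t \left( \frac{\partial\varphi^\varepsilon}{\partial t} + g_t\cdot\nabla\varphi^\varepsilon \right) dx\,dt
= \int_0^1\!\!\int_{\mathbb{R}^n} f_t\psi + f_t\,(g_t - g^\varepsilon_t)\cdot\nabla\varphi^\varepsilon \, dx\,dt.
\]

The proof is finished letting ε → 0+ and noticing that $|g^\varepsilon_t| + |\nabla\varphi^\varepsilon| \le C$, with C independent of ε.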

In the following theorem we show that (MK) and (ODE) are basically equivalent. Here the assumption that Ω is large enough plays a role: indeed, if for instance X = [[x0, x1]], f0 = δx0 and f1 = δx1, the infimum of (ODE) is easily seen to be less than

\[
\mathrm{dist}(x_0, \partial\Omega) + \mathrm{dist}(x_1, \partial\Omega).
\]

Therefore, in the general case when f0, f1 are arbitrary measures in X, we require that dist(∂Ω, X) > diam(X).

Theorem 4.2 ((MK) versus (ODE)) The problem (ODE) has at least one solution and min (ODE) = min (MK). Moreover, for any optimal planning γ ∈ M1(X × X) for (MK) the measures

\[
f_t := \pi_{t\#}\gamma, \qquad E_t := \pi_{t\#}\bigl((y-x)\gamma\bigr), \qquad t \in [0,1] \tag{23}
\]

with πt(x, y) = x + t(y − x) solve (ODE).

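Proof. Let (ft, Et) be defined by (23). For any φ ∈ C∞(Rn) we compute

\[
\begin{aligned}
\frac{d}{dt}\int_X \phi \, df_t &= \frac{d}{dt}\int_{X\times X} \phi\bigl(x + t(y-x)\bigr)\, d\gamma \\
&= \int_{X\times X} \nabla\phi\bigl(x + t(y-x)\bigr)\cdot(y-x)\, d\gamma \\
&= \sum_{i=1}^n \int_X \nabla_i\phi \, dE_{t,i} = -\nabla\cdot E_t(\phi)
\end{aligned}
\]

hence the ODE (17) is satisfied. Then, we simply evaluate the energy J(E) in (21) by

\[
\begin{aligned}
J(E) &= \int_0^1 \bigl|\pi_{t\#}((y-x)\gamma)\bigr|(X)\, dt \le \int_0^1 \pi_{t\#}(|y-x|\gamma)(X)\, dt \\
&= \int_0^1\!\!\int_{X\times X} |x-y| \, d\gamma\, dt = I(\gamma).
\end{aligned} \tag{24}
\]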

This shows that inf (ODE) ≤ min (MK).

In order to prove the opposite inequality we first use Proposition 4.1 and then we present a different strategy, which provides more geometric information. By Proposition 4.1 we have

\[
F_1(f_0, f_1) \le \int_0^1 \frac{d}{dt} F_1(f_t, f_0)\, dt \le \int_0^1 |E_t|(\Omega)\, dt
\]

for any admissible flow (ft, Et). This proves that min (MK) ≤ inf (ODE).

For given (and admissible) (ft, Et), and assuming that

\[
\mathrm{spt}\int_0^1 |E_t|\, dt \subset\subset \Omega
\]

we exhibit an optimal planning γ with I(γ) ≤ J(E). To this aim we fix a cut-off function $\theta \in C^\infty_c(\Omega)$ with 0 ≤ θ ≤ 1 and θ ≡ 1 on $X \cup \mathrm{spt}\int_0^1 |E_t|\, dt$; then, we define

\[
f^\varepsilon_t := f_t * \rho_\varepsilon + \varepsilon\theta \qquad\text{and}\qquad E^\varepsilon_t := E_t * \rho_\varepsilon
\]

where ρ is any convolution kernel with compact support and $\rho_\varepsilon(x) = \varepsilon^{-n}\rho(x/\varepsilon)$. Notice that $f^\varepsilon_t$ are strictly positive on spt Et; moreover (17) still holds, $\mathrm{spt}\,E^\varepsilon_t \subset \Omega$ for ε small enough and L1-a.e. t, and by (53) we have

\[
\int_{\mathbb{R}^n} |E^\varepsilon_t|(y)\, dy \le |E_t|(\Omega) \qquad \forall t \in [0,1]. \tag{25}
\]

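We define $g^\varepsilon_t = E^\varepsilon_t / f^\varepsilon_t$ and denote by $\psi^\varepsilon_t(x)$ the semigroup in [0, 1] associated to the ODE

\[
\dot\psi^\varepsilon_t(x) = g^\varepsilon_t\bigl(\psi^\varepsilon_t(x)\bigr), \qquad \psi^\varepsilon_0(x) = x. \tag{26}
\]

Notice that this flow for ε small enough maps Ω into itself and leaves Rn \ Ω fixed.

Now, we claim that $f^\varepsilon_t = \psi^\varepsilon_{t\#}(f^\varepsilon_0\mathcal{L}^n)$ for any t ∈ [0, 1]. Indeed, since by definition

\[
\dot f^\varepsilon_t + \nabla\cdot(g^\varepsilon_t f^\varepsilon_t) = \dot f^\varepsilon_t + \nabla\cdot E^\varepsilon_t = 0
\]

by Theorem 4.1 and the linearity of the equation we need only to check that also $\nu^\varepsilon_t = \psi^\varepsilon_{t\#}(f^\varepsilon_0\mathcal{L}^n)$ satisfies the ODE $\dot\nu^\varepsilon_t + \nabla\cdot(g^\varepsilon_t\nu^\varepsilon_t) = 0$. This is a straightforward computation based on (26).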

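Using this representation, we can view $\psi^\varepsilon_1$ as approximate solutions of the optimal transport problem and define

\[
\gamma^\varepsilon := (\mathrm{Id}\times\psi^\varepsilon_1)_\#(f^\varepsilon_0\mathcal{L}^n),
\]

i.e.

\[
\int \varphi(x,y)\, d\gamma^\varepsilon(x,y) = \int_{\mathbb{R}^n} \varphi\bigl(x, \psi^\varepsilon_1(x)\bigr)\, f^\varepsilon_0(x)\, dx
\]

for any bounded Borel function ϕ in Rn × Rn.

In order to evaluate I(γε), we notice that

\[
\mathrm{Length}(\psi^\varepsilon_t(x)) = \int_0^1 |g^\varepsilon_t|\bigl(\psi^\varepsilon_t(x)\bigr)\, dt
\]

hence, by multiplying by $f^\varepsilon_0$ and integrating we get

\[
\begin{aligned}
\int_{\mathbb{R}^n} \mathrm{Length}(\psi^\varepsilon_t(x))\, f^\varepsilon_0(x)\, dx &= \int_0^1\!\!\int_{\mathbb{R}^n} |g^\varepsilon_t(y)|\, f^\varepsilon_t(y)\, dy\, dt \\
&= \int_0^1\!\!\int_{\mathbb{R}^n} |E^\varepsilon_t|(y)\, dy\, dt \le \int_0^1 |E_t|(\Omega)\, dt.
\end{aligned} \tag{27}
\]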

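As

\[
I(\gamma^\varepsilon) = \int_{\mathbb{R}^n} |\psi^\varepsilon_1(x) - x|\, f^\varepsilon_0(x)\, dx \le \int_{\mathbb{R}^n} \mathrm{Length}(\psi^\varepsilon_t(x))\, f^\varepsilon_0(x)\, dx,
\]

this proves that I(γε) ≤ J(E). Passing to the limit as ε ↓ 0 and noticing that

\[
\pi_{0\#}\gamma^\varepsilon = f^\varepsilon_0\mathcal{L}^n, \qquad \pi_{1\#}\gamma^\varepsilon = f^\varepsilon_1\mathcal{L}^n
\]

we obtain, possibly passing to a subsequence, an admissible planning γ ∈ M1(Rn × Rn) for f0 and f1 such that I(γ) ≤ J(E). We can turn γ into a planning supported in X × X simply replacing γ by (π × π)#γ, where π : Rn → X is the orthogonal projection.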
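The construction (23) and the inequality in (24) are easy to visualize for a discrete planning. The following sketch is an illustration only: the planning `plan`, made of two atoms exchanging positions, is a hypothetical (and deliberately non-optimal) example, and atoms are grouped by rounded position to detect cancellations. Away from t = 1/2 the atoms are distinct and |Et|(X) equals I(γ); at t = 1/2 they coincide with opposite velocities and |Et|(X) vanishes, so the inequality in (24) is strict at that time.

```python
import math
from collections import defaultdict


def interpolate(plan, t):
    """Atoms of f_t = pi_t# gamma and E_t = pi_t#((y - x) gamma) for a discrete plan.

    plan is a list of (x, y, w): mass w is moved from x to y (points of R^2).
    """
    f_t = defaultdict(float)
    E_t = defaultdict(lambda: [0.0, 0.0])
    for (x, y, w) in plan:
        z = tuple(round(xi + t * (yi - xi), 12) for xi, yi in zip(x, y))
        f_t[z] += w
        for i in range(2):
            E_t[z][i] += w * (y[i] - x[i])
    return f_t, E_t


def total_variation(E_t):
    """|E_t|(X) for a vector measure made of finitely many atoms."""
    return sum(math.hypot(v[0], v[1]) for v in E_t.values())


def cost(plan):
    """I(gamma): integral of |x - y| with respect to gamma."""
    return sum(w * math.dist(x, y) for (x, y, w) in plan)


# Hypothetical planning: two atoms of mass 1/2 exchanging their positions.
plan = [((0.0, 0.0), (1.0, 0.0), 0.5), ((1.0, 0.0), (0.0, 0.0), 0.5)]

tv_quarter = total_variation(interpolate(plan, 0.25)[1])
tv_half = total_variation(interpolate(plan, 0.5)[1])
mass_half = sum(interpolate(plan, 0.5)[0].values())
```

For an optimal planning this cancellation cannot occur on a set of times of positive measure, which is the content of (28) in Remark 4.2 below.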

Remark 4.1 (1) Starting from (ft, Et), the (possibly multivalued) operation leading to γ and then to the new flow

\[
\tilde f_t := \pi_{t\#}\gamma, \qquad \tilde E_t := \pi_{t\#}\bigl((y-x)\gamma\bigr)
\]

can be understood as a sort of “arclength reparameterization” of (ft, Et). However, since this operation is local in space (consider for instance the case of two line paths, one parameterized by arclength, one not), in general there is no function ϕ(t) such that

\[
(f_t, E_t) = (\tilde f_{\varphi(t)}, \tilde E_{\varphi(t)}).
\]

(2) The solutions (ft, Et) in Example 4.1 are built as πt#γ and πt#((y − x)γ) where γ = (Id × ψ)#f0 and ψ(x) = x + 1, ψ(x) = 2 − x. In particular, in the second example, f1/2 = δ1 because 1 is the midpoint of any transport ray.

(3) By making a regularization first in space and then in time of (ft, Et) one can avoid the use of Theorem 4.1, using only the classical representation of solutions of (22) with characteristics.

Remark 4.2 (Optimality conditions) (1) If (ft, Et) is optimal, then $\mathrm{spt}\int_0^1 |E_t|\, dt \subset\subset \Omega$. Indeed, we can find Ω′ ⊂⊂ Ω which still has the property that the open r-neighbourhood of X is contained in Ω′, with r = diam(X), hence

\[
\int_0^1 |E_t|(\Omega)\, dt = \min\,(\mathrm{MK}) = \int_0^1 |E_t|(\Omega')\, dt.
\]

(2) Since the two sides in the chain of inequalities (24) are equal when γ is optimal, we infer

\[
\bigl|\pi_{t\#}((y-x)\gamma)\bigr| = \pi_{t\#}(|y-x|\gamma) \qquad \text{for } \mathcal{L}^1\text{-a.e. } t \in (0,1). \tag{28}
\]

Analogously, we have

\[
\left| \int_0^1 E_t\, dt \right| = \int_0^1 |E_t|\, dt \tag{29}
\]

whenever (ft, Et) is optimal. Indeed, if the strict inequality < held, then

\[
\tilde f_t = f_0 + t(f_1 - f_0) \qquad\text{and}\qquad \tilde E_t = \int_0^1 E_\tau\, d\tau
\]

would provide an admissible pair for (ODE) with strictly less energy.

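Remark 4.3 (Nonlinear cost) When the cost function c(x, y) is |x − y|p for some p > 1 the corresponding problem (MK) is still equivalent to (ODE), provided we minimize, instead of J, the energy

\[
J_p(f, E) := \int_0^1 \Phi_p(f_t, E_t)\, dt
\]

where

\[
\Phi_p(\sigma, \nu) =
\begin{cases}
\displaystyle\int_X \left| \frac{\nu}{\sigma} \right|^p d\sigma & \text{if } |\nu| \ll \sigma; \\[1ex]
+\infty & \text{otherwise.}
\end{cases}
\]

The proof of the equivalence is quite similar to the one given in Theorem 4.2. Again the essential ingredient is the inequality

\[
\Phi_p(\sigma * \rho_\varepsilon, \nu * \rho_\varepsilon) \le \Phi_p(\sigma, \nu).
\]

The latter follows by Jensen’s inequality and the convexity of the map $(z,t) \mapsto |z|^p t^{1-p}$ in Rn × (0, ∞), which provide the pointwise estimate

\[
\left| \frac{\nu * \rho_\varepsilon}{\sigma * \rho_\varepsilon} \right|^p \sigma * \rho_\varepsilon \le \left( \left| \frac{\nu}{\sigma} \right|^p \sigma \right) * \rho_\varepsilon.
\]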

In this case, under the same assumptions on f0 made in Corollary 2.2, due to the uniqueness of the optimal planning γ = (Id × ψ)#f0, we obtain that, for optimal (ft, Et), $\gamma^\varepsilon = (\mathrm{Id}\times\psi^\varepsilon_1)_\# f^\varepsilon_0$ converge to γ as ε → 0+. As a byproduct, the maps $\psi^\varepsilon_1$ converge to ψ as Young measures. If f0 << Ln it follows that $\psi^\varepsilon_1$ converge to ψ in [Lr(f0)]n for any r ∈ [1, ∞) (see Lemma 9.1 and the remark following it).

See §2.6 of [4] for a systematic analysis of the continuity and semicontinuity properties of the functional (σ, ν) ↦ Φp(σ, ν) with respect to weak convergence of measures.
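The joint convexity of (z, t) ↦ |z|^p t^{1−p} used in Remark 4.3 can be probed numerically in the scalar case n = 1. The sketch below is a randomized midpoint-convexity test, not a proof; the sampling ranges, the seed and the function names are arbitrary choices made here.

```python
import random


def perspective(z, t, p):
    """The integrand |z|^p * t^(1-p), defined for t > 0 (scalar z for simplicity)."""
    return abs(z) ** p * t ** (1 - p)


def midpoint_convex(p, trials=1000, seed=0):
    """Check f((a+b)/2) <= (f(a)+f(b))/2 on random pairs a = (z1,t1), b = (z2,t2)."""
    rng = random.Random(seed)
    for _ in range(trials):
        z1, z2 = rng.uniform(-5, 5), rng.uniform(-5, 5)
        t1, t2 = rng.uniform(0.1, 5), rng.uniform(0.1, 5)
        mid = perspective((z1 + z2) / 2, (t1 + t2) / 2, p)
        avg = (perspective(z1, t1, p) + perspective(z2, t2, p)) / 2
        if mid > avg + 1e-9 * (1 + avg):  # small tolerance for rounding errors
            return False
    return True


convex_p2 = midpoint_convex(2.0)
convex_p35 = midpoint_convex(3.5)
```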

In the previous proof I don’t know whether it is actually possible to show full convergence of γε as ε → 0+. However, using a more refined estimate and a geometric lemma (see Lemma 4.1 below) we will prove that any limit measure γ satisfies the condition

\[
\int_0^1 \pi_{t\#}(|y-x|\gamma)\, dt = \int_0^1 |E_t|\, dt. \tag{30}
\]

We call transport density any of the measures in (30). This double representation will be relevant in Section 7, where under suitable assumptions we will obtain the uniqueness of the transport density.

Corollary 4.1 Let (ft, Et) be optimal for (ODE). Then there exists an optimal planning γ such that (30) holds. In particular $\mathrm{spt}\int_0^1 |E_t|\, dt \subset X$.

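Proof. Let m = min (MK) = min (ODE) and recall that, by Remark 4.2(1), the measure $\int_0^1 |E_t|\, dt$ has compact support in Ω. It suffices to build an optimal planning γ such that

\[
\int_0^1 \pi_{t\#}\bigl((y-x)\gamma\bigr)\, dt = \int_0^1 E_t\, dt. \tag{31}
\]

Indeed, by (29) we get

\[
\int_0^1 \pi_{t\#}(|y-x|\gamma)\, dt \ge \int_0^1 |E_t|\, dt
\]

and the two measures coincide, having both mass equal to m.

Keeping the same notation used in the final part of the proof of Theorem 4.2, we define $\gamma^\varepsilon_t$ as $(\mathrm{Id}\times\psi^\varepsilon_t)_\# f^\varepsilon_0$ and we compute

\[
I(\gamma^\varepsilon) = \int_0^1 \frac{d}{dt} I(\gamma^\varepsilon_t)\, dt = \int_0^1\!\!\int_{B_R} \frac{\psi^\varepsilon_t(x) - x}{|\psi^\varepsilon_t(x) - x|}\cdot\dot\psi^\varepsilon_t(x)\, f^\varepsilon_0(x)\, dx\, dt.
\]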

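Since I(γε) → m as ε → 0+ and since (by (25))

\[
\int_0^1\!\!\int_{B_R} |\dot\psi^\varepsilon_t(x)|\, f^\varepsilon_0(x)\, dx\, dt = \int_0^1\!\!\int_{B_R} |E^\varepsilon_t|\, dx\, dt \le \int_0^1 |E_t|(\Omega)\, dt = m
\]

we infer

\[
\lim_{\varepsilon\to 0^+} \int_{B_R}\!\int_0^1 \left| \frac{\dot\psi^\varepsilon_t(x)}{|\dot\psi^\varepsilon_t(x)|} - \frac{\psi^\varepsilon_t(x) - x}{|\psi^\varepsilon_t(x) - x|} \right|^2 |\dot\psi^\varepsilon_t(x)|\, f^\varepsilon_0(x)\, dt\, dx = 0. \tag{32}
\]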

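Now using Lemma 4.1 and the Young inequality 2ab ≤ δa² + b²/δ, for any $\phi \in C^\infty_c(\mathbb{R}^n)$ we obtain

\[
\begin{aligned}
&\left| \int_0^1 \bigl[\pi_{t\#}((y-x)\gamma^\varepsilon) - E^\varepsilon_t\bigr](\phi)\, dt \right| \\
&\quad= \left| \int_X\!\int_0^1 \bigl[(\psi^\varepsilon_1(x) - x)\,\phi\bigl(t(\psi^\varepsilon_1(x) - x)\bigr) - \dot\psi^\varepsilon_t(x)\,\phi(\psi^\varepsilon_t(x))\bigr]\, f^\varepsilon_0(x)\, dt\, dx \right| \\
&\quad\le R\,\mathrm{Lip}(\phi)\,\delta \int_X \mathrm{Length}(\psi^\varepsilon_t)\, f^\varepsilon_0(x)\, dx \\
&\qquad+ \frac{R\,\mathrm{Lip}(\phi)}{\delta} \int_X\!\int_0^1 \left| \frac{\dot\psi^\varepsilon_t(x)}{|\dot\psi^\varepsilon_t(x)|} - \frac{\psi^\varepsilon_t(x) - x}{|\psi^\varepsilon_t(x) - x|} \right|^2 |\dot\psi^\varepsilon_t(x)|\, f^\varepsilon_0(x)\, dt\, dx
\end{aligned}
\]

with R = diam(Ω). Passing to the limit first as ε → 0+, taking (32) into account, and then passing to the limit as δ → 0+ we obtain

\[
\int_0^1 \pi_{t\#}\bigl((y-x)\gamma\bigr)(\phi)\, dt = \int_0^1 E_t(\phi)\, dt.
\]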

Remark 4.4 [Decomposition of vectorfields] Let E be a vectorfield with ∇ · E = f in Ω and assume that |E|(Ω) = min (MK) with f0 = f+ and f1 = f−. Then, choosing Et = E and ft = f0 + t(f1 − f0) in (31), we obtain that E can be represented as the superposition of elementary vectorfields

\[
E = \int_{\Omega\times\Omega} E_{xy}\, d\gamma(x,y) \qquad\text{with}\qquad E_{xy} := (y-x)\,\mathcal{H}^1 \llcorner [[x,y]]
\]

without cancellations, i.e. in such a way that

\[
|E| = \int_{\Omega\times\Omega} |E_{xy}|\, d\gamma(x,y), \qquad |\nabla\cdot E| = \int_{\Omega\times\Omega} |\nabla\cdot E_{xy}|\, d\gamma(x,y).
\]

The possibility to represent a generic vectorfield whose divergence is a measure as a superposition without cancellations of elementary vectorfields associated to rectifiable curves is discussed in [39]. It turns out that this property is not true in general, at least if n ≥ 3; hence the optimality condition |E|(Ω) = min (MK) is essential.

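Lemma 4.1 Let ψ ∈ Lip([0, 1], Rn) with ψ(0) = 0 and φ ∈ Lip(Rn). Then

\[
\left| \int_0^1 \psi(1)\,\phi(t\psi(1)) - \dot\psi(t)\,\phi(\psi(t))\, dt \right| \le LR \int_0^1 \left| \frac{\dot\psi(t)}{|\dot\psi(t)|} - \frac{\psi(t)}{|\psi(t)|} \right| |\dot\psi(t)|\, dt
\]

with L = Lip(φ), R = sup |ψ|.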

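Proof. We start from the elementary identity

\[
\begin{aligned}
&\frac{d}{dt}\bigl[\phi(s\psi(t))\,\psi_i(t)\bigr] - \frac{d}{ds}\bigl[s\,\phi(s\psi(t))\,\dot\psi_i(t)\bigr] \\
&\qquad= s\sum_{j=1}^n \frac{\partial\phi}{\partial x_j}(s\psi(t))\,\bigl(\psi_i(t)\dot\psi_j(t) - \psi_j(t)\dot\psi_i(t)\bigr)
\end{aligned} \tag{33}
\]

and integrate both sides in [0, 1] × [0, 1]. Then, the left side becomes exactly the integral that we need to estimate from above. The right side, up to the multiplicative constant Lip(φ), can be estimated with

\[
\begin{aligned}
\int_0^1 |\psi(t)\wedge\dot\psi(t)|\, dt &= \int_0^1 |\dot\psi(t)| \left| \psi(t)\wedge\left( \frac{\dot\psi(t)}{|\dot\psi(t)|} - \frac{\psi(t)}{|\psi(t)|} \right) \right| dt \\
&\le \sup|\psi| \int_0^1 \left| \frac{\dot\psi(t)}{|\dot\psi(t)|} - \frac{\psi(t)}{|\psi(t)|} \right| |\dot\psi(t)|\, dt.
\end{aligned}
\]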
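The estimate of Lemma 4.1 can be tested on an explicit curve. In the sketch below the planar curve ψ(t) = (t, t²), the function φ(x) = sin(x₁ + 2x₂) with Lip(φ) ≤ √5, and the midpoint quadrature are all assumptions chosen for illustration; the code evaluates the Euclidean norm of the vector-valued left side and the right side LR∫|·||ψ̇| dt, and compares them.

```python
import math


def psi(t):
    """Hypothetical Lipschitz curve in R^2 with psi(0) = 0."""
    return (t, t * t)


def dpsi(t):
    """Derivative of psi."""
    return (1.0, 2.0 * t)


def phi(x):
    """Hypothetical Lipschitz test function."""
    return math.sin(x[0] + 2.0 * x[1])


LIP_PHI = math.sqrt(5.0)  # |grad phi| = |cos(...)| * |(1, 2)| <= sqrt(5)


def norm(v):
    return math.hypot(v[0], v[1])


def lemma_sides(n=4000):
    """Return (|left side|, right side) of Lemma 4.1, by the midpoint rule."""
    h = 1.0 / n
    end = psi(1.0)
    lhs = [0.0, 0.0]
    rhs_integral = 0.0
    sup_psi = norm(end)
    for k in range(n):
        t = (k + 0.5) * h
        p, v = psi(t), dpsi(t)
        a = phi((t * end[0], t * end[1]))  # phi(t psi(1))
        b = phi(p)                         # phi(psi(t))
        for i in range(2):
            lhs[i] += h * (end[i] * a - v[i] * b)
        diff = (v[0] / norm(v) - p[0] / norm(p), v[1] / norm(v) - p[1] / norm(p))
        rhs_integral += h * norm(diff) * norm(v)
        sup_psi = max(sup_psi, norm(p))
    return norm(lhs), LIP_PHI * sup_psi * rhs_integral


left, right = lemma_sides()
```

When ψ is a straight segment through the origin the right side vanishes, consistently with the fact that the two integrals on the left then coincide after a change of variable.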

Remark 4.5 The geometric meaning of the proof above is the following: the integral to be estimated is the action on the 1-form φ dxi of the closed and rectifiable current T associated to the closed path starting at 0, arriving at ψ(1) following the curve ψ(t), and then going back to 0 through the segment [[0, ψ(1)]]; for any 2-dimensional current G such that ∂G = T, as

\[
T(\phi\, dx_i) = G(d\phi\wedge dx_i), \tag{34}
\]

this action can be estimated by the mass of G times Lip(φ). The cone construction provides a current G with ∂G = T, whose mass can be estimated by

\[
\int_0^1\!\!\int_0^1 s\,|\psi(t)\wedge\dot\psi(t)|\, ds\, dt.
\]

With this choice of G the identity (34) corresponds to (33).

In order to relate also (ft, Et) to the Kantorovich potential, we define the transport rays and the transport set and we prove the differentiability of the potential on the transport set.

Definition 4.1 (Transport rays and transport set) Let u ∈ Lip1(X). We say that a segment ]]x, y[[ ⊂ X is a transport ray if it is a maximal open oriented segment whose endpoints x, y satisfy the condition

\[
u(x) - u(y) = |x - y|. \tag{35}
\]

The transport set Tu is defined as the union of all transport rays. We also define $T^e_u$ as the union of the closures of all transport rays.

Denoting by F the compact collection of pairs (x, y) with x ≠ y such that (35) holds, the transport set is also given by

\[
T_u = \bigcup_{t\in(0,1)}\ \bigcup_{(x,y)\in F} \{x + t(y-x)\}
\]

and therefore is a Borel set (precisely a countable union of closed sets).

We can now easily prove that u is differentiable at any point in Tu.

Proposition 4.2 (Differentiability of the potential) Let u ∈ Lip1(X). Then u is differentiable at any point z ∈ Tu. Moreover −∇u(z) is the unit vector parallel to the transport ray containing z.

Proof. Let x, y ∈ X be such that (35) holds. By the triangle inequality and the 1-Lipschitz continuity of u we get

\[
u(x) - u(x + t(y-x)) = t|y-x| \qquad \forall t \in [0,1].
\]

This implies that, setting ν = (y − x)/|y − x|, the partial derivative of u along ν is equal to −1 for any internal point z of the segment. For any unit vector ξ perpendicular to ν we have

\[
\begin{aligned}
u(z + h\xi) - u(z) &= u(z + h\xi) - u(z + \sqrt{|h|}\,\nu) + u(z + \sqrt{|h|}\,\nu) - u(z) \\
&\le \sqrt{|h|^2 + |h|} - \sqrt{|h|} = O(|h|^{3/2}) = o(|h|).
\end{aligned}
\]

A similar argument also proves that u(z + hξ) − u(z) ≥ o(|h|). This proves the differentiability of u at z and the identity ∇u(z) = −ν.
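A concrete potential helps to see Proposition 4.2 at work. In the sketch below the potential u(x) = |x − y0| is an assumed example (the point `Y0` and the helper names are hypothetical): every pair (x, y0) satisfies (35), so the rays are segments pointing towards y0, and a finite-difference gradient at an interior point z of such a ray should return −ν. The last function checks the elementary bound √(h² + |h|) − √|h| ≤ |h|^{3/2}/2 used in the proof.

```python
import math

Y0 = (2.0, 1.0)  # hypothetical common right endpoint of all rays


def u(p):
    """Assumed 1-Lipschitz potential: distance from Y0, so u(x) - u(Y0) = |x - Y0|."""
    return math.dist(p, Y0)


def grad_fd(func, p, h=1e-6):
    """Central finite-difference gradient in R^2."""
    gx = (func((p[0] + h, p[1])) - func((p[0] - h, p[1]))) / (2 * h)
    gy = (func((p[0], p[1] + h)) - func((p[0], p[1] - h))) / (2 * h)
    return (gx, gy)


def ray_direction(x, y):
    """Unit vector nu = (y - x) / |y - x| of the oriented segment from x to y."""
    d = math.dist(x, y)
    return ((y[0] - x[0]) / d, (y[1] - x[1]) / d)


def transversal_bound_holds(h):
    """Check sqrt(h^2 + |h|) - sqrt(|h|) <= |h|^(3/2) / 2, as in the proof."""
    a = abs(h)
    return math.sqrt(a * a + a) - math.sqrt(a) <= 0.5 * a ** 1.5 + 1e-15


x = (0.0, 0.0)
z = (0.5 * (x[0] + Y0[0]), 0.5 * (x[1] + Y0[1]))  # interior point of the ray from x to Y0
nu = ray_direction(x, Y0)
g = grad_fd(u, z)
```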

In Section 6 we will need a mild Lipschitz property of the potential ∇u on $T^e_u$ (see also [15, 43] for the case of general strictly convex norms). We will use this property to prove in Corollary 6.1 that $T^e_u \setminus T_u$ is Lebesgue negligible. This property can also be used to prove that ∇u is approximately differentiable Ln-a.e. on Tu (see for instance Theorem 3.1.9 of [24]).

Theorem 4.3 (Countable Lipschitz property) Let u ∈ Lip1(X). There exists a sequence of Borel sets Th covering Ln-almost all of $T^e_u$ such that ∇u restricted to Th is a Lipschitz function.

Proof. Given a direction ν ∈ Sn−1 and a ∈ R, let R be the union of the half closed transport rays [[x, y[[ with ⟨y − x, ν⟩ ≥ 0 and ⟨y, ν⟩ ≥ a. It suffices to prove that the restriction of ∇u to

\[
T := R \cap \{x : x\cdot\nu < a\} \setminus \Sigma_u
\]

has the countable Lipschitz property stated in the theorem. To this aim, since BVloc functions have this property (see for instance Theorem 5.34 of [4] or [24]), it suffices to prove that ∇u coincides Ln-a.e. in T with a suitable function w ∈ [BVloc(Sa)]n, where Sa = {x : x · ν < a}. To this aim we define

\[
\tilde u(x) := \min\,\{u(y) + |x - y| : y \in Y_a\}
\]

where Ya is the collection of all right endpoints of transport rays with y · ν ≥ a. By construction $\tilde u \ge u$ and equality holds on R ⊃ T.

We claim that, for b < a, $\tilde u - C|x|^2$ is concave in Sb for C = C(b) large enough. Indeed, since |x − y| ≥ a − b > 0 for any y ∈ Ya and any x ∈ Sb, the functions

\[
u(y) + |x - y| - C|x|^2, \qquad y \in Y_a
\]

are all concave in Sb for C large enough depending on a − b. In particular, as gradients of real valued concave functions are BVloc (see for instance [2]), we obtain that

\[
w := \nabla\tilde u = \nabla(\tilde u - C|x|^2) + 2Cx
\]

is a BVloc function in Sa. Since ∇u = w Ln-a.e. in T the proof is achieved.

By a similar argument, using semi-convexity in place of semi-concavity, one can take into account the right extreme points of the transport rays.

As a byproduct of the equivalence between (MK) and (ODE) we can prove that (ODE) has “regular” solutions, related to the transport set and to the gradient of any maximal Kantorovich potential u.

Theorem 4.4 (Regularity of (ft, Et)) For any solution (ft, Et) of (ODE) representable as in Theorem 4.2 for a suitable optimal planning γ there exists a 1-Lipschitz function u such that

(i) ft is concentrated on the transport set Tu and |Et| ≤ Cft for L1-a.e. t ∈ (0, 1), with C = diam(X);

(ii) Et = −∇u|Et| for L1-a.e. t ∈ (0, 1);

(iii) for any convolution kernel ρ we have

\[
\lim_{\varepsilon\to 0^+} \int_0^1\!\!\int_X |\nabla u - \nabla u * \rho_\varepsilon|^2\, d|E_t|\, dt = 0; \tag{36}
\]

(iv) $|E_t|(X) = \int_0^1 |E_\tau|(X)\, d\tau$ for L1-a.e. t ∈ (0, 1).

Proof. (i) Let u be any maximal Kantorovich potential and let T = Tu be the associated transport set. For any t ∈ (0, 1) we have

\[
f_t(X\setminus T) = \int_{X\times X} \chi_{\{(x,y):\ x + t(y-x)\notin T\}}\, d\gamma = 0
\]

because, by (12), any segment ]]x, y[[ with (x, y) ∈ spt γ is contained in a transport ray, hence contained in T. The inequality |Et| ≤ Cft simply follows by the fact that |x − y| ≤ C on spt γ.

(ii) Choosing t satisfying (28) and taking into account Proposition 4.2, for any φ ∈ C(X) we obtain

\[
\begin{aligned}
\int_X \phi\,\nabla u \, d|E_t| &= \int_{X\times X} \phi(\pi_t)\,\nabla u(\pi_t)\,|y-x|\, d\gamma \\
&= -\int_{X\times X} \phi(\pi_t)\,(y-x)\, d\gamma = -E_t(\phi).
\end{aligned}
\]

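(iii) Let m = J(E) = min (MK). By (13) we get

\[
\begin{aligned}
m &= \int_X u \, d(f_0 - f_1) = \lim_{\varepsilon\to 0^+} \int_X u * \rho_\varepsilon \, d(f_0 - f_1) \\
&= -\lim_{\varepsilon\to 0^+} \int_0^1\!\!\int_X u * \rho_\varepsilon \, d\dot f_t\, dt = -\lim_{\varepsilon\to 0^+} \int_0^1 \nabla u * \rho_\varepsilon\cdot E_t\, dt \\
&\le \int_0^1 |E_t|(X)\, dt = m.
\end{aligned}
\]

This proves that

\[
\lim_{\varepsilon\to 0^+} \int_0^1 \left[ |E_t|(X) - \int_X \langle\nabla u * \rho_\varepsilon, \nabla u\rangle\, d|E_t| \right] dt = 0
\]

whence, taking into account that |∇u| = 1 and |∇u ∗ ρε| ≤ 1, (36) follows.

(iv) The inequality |Et|(X) ≤ I(γ) = J(E) has been established during the proof of Theorem 4.2. By minimality equality holds for L1-a.e. t ∈ (0, 1).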

Remark 4.6 (1) It is easy to produce examples of optimal flows (ft, Et) satisfying (i), (ii), (iii), (iv) which are not representable as in Theorem 4.2 (again it suffices to consider the sum of two paths, with constant sum of velocities). It is not clear which conditions must be added in order to obtain this representation property.

(2) Notice that condition (i) actually implies that f is a Lipschitz map from [0, 1] into M1(X) endowed with the 1-Wasserstein metric.

5 The PDE version of the optimal transport problem and the p-laplacian approximation

In this section we see how, in the case when the cost function is linear, the optimal transport problem can be rephrased using a PDE. As in the previous section we consider an auxiliary bounded open set Ω such that the open r-neighbourhood of X is contained in Ω, with r > diam(X); we assume also that Ω has a Lipschitz boundary.

Specifically, we are going to consider the following problem and its connections with (MK) and (ODE).

(PDE) Let f ∈ M(X) be a measure with f(X) = 0. Find µ ∈ M+(Ω) and u ∈ Lip1(Ω) such that:

(i) there exist smooth functions uh uniformly converging to u on X, equal to 0 on ∂Ω and such that the functions ∇uh converge in [L2(µ)]n to a function ∇µu satisfying

\[
|\nabla_\mu u| = 1 \qquad \mu\text{-a.e. in } \Omega; \tag{37}
\]

(ii) the following PDE is satisfied in the sense of distributions

\[
-\nabla\cdot(\nabla_\mu u\,\mu) = f \qquad \text{in } \Omega. \tag{38}
\]

Given (µ, u) admissible for (PDE), we call µ the transport density (the reason for that will soon be clear) and ∇µu the tangential gradient of u.

Remark 5.1 (1) Choosing uh as test function in (38) gives

\[
\begin{aligned}
\mu(\Omega) &= \int_\Omega |\nabla_\mu u|^2\, d\mu = \lim_{h\to\infty} \int_\Omega \langle\nabla_\mu u, \nabla u_h\rangle\, d\mu \\
&= \lim_{h\to\infty} f(u_h) = f(u),
\end{aligned} \tag{39}
\]

so that the total mass µ(Ω) depends only on f and u. We will prove in Theorem 5.1 that actually µ(Ω) depends only on f and that µ is concentrated on X.

(2) It is easy to prove that, given (µ, u), there is at most one tangential gradient ∇µu: indeed if $\nabla u_h \to \nabla_\mu u$ and $\nabla\tilde u_h \to \tilde\nabla_\mu u$ we have

\[
\int_\Omega \nabla_\mu u\cdot\tilde\nabla_\mu u\, d\mu = \lim_{h\to\infty} \int_\Omega \nabla_\mu u\cdot\nabla\tilde u_h\, d\mu = \lim_{h\to\infty} f(\tilde u_h) = f(u) = \mu(\Omega)
\]

whence $\nabla_\mu u = \tilde\nabla_\mu u$ µ-a.e. On the other hand, Example 5.1 below shows that to some u there could correspond more than one measure µ solving (PDE). We will obtain uniqueness in Section 7 under absolute continuity assumptions on f+ or f−.
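A one-dimensional worked example may help to fix ideas about (37), (38) and (39). The data below (the interval X, the set Ω, the measure f and the potential u) are chosen here purely for illustration and are not taken from the notes.

```latex
% Assumed data: X = [0,1], \Omega = (-2,3), f = \delta_0 - \delta_1, so f(X) = 0.
% Candidate pair (\mu, u):
\mu = \mathcal{L}^1 \llcorner [0,1], \qquad
u(x) =
\begin{cases}
0 & \text{if } -2 \le x \le 0, \\
-x & \text{if } 0 \le x \le 1, \\
-1 + \tfrac{1}{2}(x-1) & \text{if } 1 \le x \le 3.
\end{cases}
% u is 1-Lipschitz, vanishes on \partial\Omega, and u' = -1 on [0,1], so
\nabla_\mu u = -1 \quad \mu\text{-a.e.}, \qquad |\nabla_\mu u| = 1 .
% Equation (38), in the sense of distributions on \Omega:
-\bigl(\nabla_\mu u \, \mu\bigr)' = \bigl(\chi_{[0,1]}\mathcal{L}^1\bigr)' = \delta_0 - \delta_1 = f .
% Total mass, as in (39):
\mu(\Omega) = 1 = u(0) - u(1) = f(u) .
```

The value µ(Ω) = 1 is also the cost of moving a unit mass from 0 to 1, in agreement with the identity µ(X) = min (MK) established in Theorem 5.1 below.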

We will need the following lemma, in which the assumption that Ω is “large enough” plays a role.

Lemma 5.1 If (µ, u) solves (PDE) then spt µ ⊂⊂ Ω.

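Proof. Let Ω′ ⊂⊂ Ω′′ ⊂⊂ Ω be such that |x − y| > r = diam(X) for any x ∈ X and any y ∈ Rn \ Ω′. We choose a minimum point x0 for the restriction of u to X and define

\[
w(x) := \min\,\bigl\{ [u(x) - u(x_0)]^+,\ \mathrm{dist}(x, \mathbb{R}^n\setminus\Omega') \bigr\}.
\]

Then, it is easy to check that w(x) = u(x) − u(x0) on X and that w ≡ 0 on Rn \ Ω′. Since

\[
\begin{aligned}
\mu(\Omega) &= f(u) = f(u - u(x_0)) = \lim_{\varepsilon\to 0^+} f(w * \rho_\varepsilon) \\
&= \lim_{\varepsilon\to 0^+} \int_\Omega \nabla_\mu u\cdot\nabla w * \rho_\varepsilon\, d\mu \le \mu(\Omega'')
\end{aligned}
\]

we conclude that µ(Ω \ Ω′′) = 0.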

In the following theorem we build a solution of (PDE) choosing µ as a transport density of (MK) with f0 = f+ and f1 = f−. As a byproduct, by Corollary 4.1 we obtain that the support of µ is contained in X and that µ is representable as the right side in (30).

Theorem 5.1 ((ODE) versus (PDE)) Assume that (ft, Et) with $\mathrm{spt}\int_0^1 |E_t|\, dt \subset\subset \Omega$ solves (ODE) and let u be a maximal Kantorovich potential relative to f0, f1. Then, setting f = f0 − f1, the pair (µ, u) with $\mu = \int_0^1 |E_t|\, dt$ solves (PDE) with ∇µu = ∇u and, in particular, spt µ ⊂ X.

Conversely, if (µ, u) solves (PDE), setting f0 = f+ and f1 = f−, the measures (ft, Et) defined by

\[
f_t := f_0 + t(f_1 - f_0), \qquad E_t := -\nabla_\mu u\,\mu
\]

solve (ODE) and spt µ ⊂ X. In particular

\[
\mu(X) = \min\,(\mathrm{ODE}) = \min\,(\mathrm{MK}).
\]

Proof. By Corollary 4.1 we can assume that µ is representable as in (30) for a suitable optimal planning γ. Then, condition (i) in (PDE) with ∇µu = ∇u follows by (36). By (17) and condition (ii) of Theorem 4.4 we get

\[
\nabla\cdot(\nabla_\mu u\,|F_t|) = \nabla\cdot(\nabla u\,|F_t|) = \dot f_t
\]

with Ft = πt#((y − x)γ). Integrating in time and taking into account the definition of µ, we obtain (38). Notice also that µ(X) = J(E).

If (ft, Et) are defined as above, we get

\[
\dot f_t + \nabla\cdot E_t = f_1 - f_0 - \nabla\cdot(\nabla_\mu u\,\mu) = f_1 - f_0 + f = 0.
\]

Also in this case J(E) = µ(X).

Example 5.1 (Non uniqueness of µ) In general, given f, the measure which solves (PDE) for a given u ∈ Lip1(X) is not unique. As an example one can consider the situation in Example 1.5, where µ can be concentrated either on the horizontal sides of the square or on the vertical sides of the square.

As shown in [22], an alternative construction can be obtained by solving the problem “−∆∞u = f”, i.e. studying the following problems

\[
\begin{cases}
-\nabla\cdot(|\nabla u_p|^{p-2}\nabla u_p) = f & \text{in } \Omega \\
u_p = 0 & \text{on } \partial\Omega
\end{cases} \tag{40}
\]

as p → ∞. In this case µ is the limit, up to subsequences, of $|\nabla u_p|^{p-1}\mathcal{L}^n$. Under suitable regularity assumptions, Evans and Gangbo prove that µ = aLn for some a ∈ L1(Ω) and use (a, u) to build an optimal transport ψ; their construction is based on a careful regularization corresponding to the one used in Theorem 4.2 in the special case Et = aLn and ft = f0 + t(f1 − f0). However, since the goal in Theorem 4.2 is to build only an optimal planning, and not an optimal transport, our proof is much simpler and works in a much greater generality (i.e. no special assumption on f).

Now we also prove that the limiting procedure of Evans and Gangbo leads to solutions of (PDE), regardless of any assumption on the data f0, f1.

Theorem 5.2 Let up be solutions of (40) with p ≥ n + 1. Then

(i) the measures $H_p = |\nabla u_p|^{p-2}\nabla u_p\,\mathcal{L}^n \llcorner \Omega$ are equi-bounded in Ω;

(ii) (up) are equibounded and equicontinuous in Ω;

(iii) if (H, u) = limj (Hpj, upj) with pj → ∞, then (µ, u) solves (PDE) with µ = |H|.

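Proof. (i) Using up as test function and the Sobolev embedding theorem we get

\[
\int_\Omega |\nabla u_p|^p\, dx = f(u_p) \le |f|(X)\,\|u_p\|_\infty \le C \left( \int_\Omega |\nabla u_p|^{p_0}\, dx \right)^{1/p_0}
\]

with p0 = n + 1. Using Hölder inequality we infer

\[
\int_\Omega |\nabla u_p|^p\, dx \le C^{\frac{p}{p-1}}\, \mathcal{L}^n(\Omega)^{\frac{p-p_0}{p_0(p-1)}}.
\]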

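(ii) Follows by (i) and the Sobolev embedding theorem.

(iii) Clearly −∇ · H = f in Ω. By (i) and Hölder inequality we infer

\[
\sup_{q\ge n+1} \int_\Omega |\nabla u|^q\, dx < \infty
\]

whence |∇u| ≤ 1 Ln-a.e. in Ω.

We notice first that the functional

\[
\nu \mapsto \int_\Omega \left| \frac{\nu}{|\nu|} - w \right|^2 \phi\, d|\nu|
\]

is lower semicontinuous with respect to the weak convergence of measures for any nonnegative φ ∈ Cc(Ω), w ∈ [C(Ω)]n. The verification of this fact is straightforward: it suffices to expand the squares. Second, we notice that

\[
\lim_{\varepsilon\to 0^+} \limsup_{p\to\infty} \int_\Omega \left| \frac{H_p}{|H_p|} - \nabla u_\varepsilon \right|^2 d|H_p| = 0
\]

whenever uε ∈ Lip1(Rn) are smooth functions uniformly converging to u in X and equal to 0 on ∂Ω. Indeed, as |∇uε| ≤ 1, we have

\[
\begin{aligned}
\int_\Omega \left| \frac{H_p}{|H_p|} - \nabla u_\varepsilon \right|^2 d|H_p| &\le 2\int_\Omega |\nabla u_p|^{p-1} \left( 1 - \frac{\nabla u_\varepsilon\cdot\nabla u_p}{|\nabla u_p|} \right) dx \\
&\le 2\int_\Omega |\nabla u_p|^{p-2}\bigl(|\nabla u_p|^2 - \nabla u_\varepsilon\cdot\nabla u_p\bigr)\, dx + \omega_p \\
&= 2f(u_p) - 2f(u_\varepsilon) + \omega_p
\end{aligned}
\]

where $\omega_p = \sup_{t\ge 0}\,(t^{p-1} - t^p)$ tends to 0 as p → ∞.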

Taking into account these two remarks, setting p = pj and passing to the limit as j → ∞ we obtain

\[
\lim_{\varepsilon\to 0^+} \int_\Omega \left| \frac{H}{|H|} - \nabla u_\varepsilon \right|^2 \phi\, d|H| = 0
\]

for any nonnegative φ ∈ Cc(Ω). This implies that ∇uε converge in $L^2_{\mathrm{loc}}(|H|)$ to the Radon–Nikodym derivative H/|H|. Since |∇uε| ≤ 1 we have also convergence in [L2(|H|)]n.

We now set µ = |H| and ∇µu = limε ∇uε, so that H = ∇µu µ and (38) is satisfied.

6 Existence of optimal transport maps

In this section we prove the existence of optimal transport maps in the case when c(x, y) = |x − y| and f0 is absolutely continuous with respect to Ln, following essentially the original Sudakov approach and filling a gap in his original proof (see the comments after Theorem 6.1). Using a maximal Kantorovich potential we decompose almost all of X in transport rays and we build an optimal transport map by gluing the 1-dimensional transport maps obtained in each ray. The assumption f0 << Ln is used to prove that the conditional measures f0C within any transport ray C are non atomic (and even absolutely continuous with respect to H1 ⌞ C), so that Theorem 3.1 is applicable. Therefore the proof depends on the following two results, whose proof is based on the countable Lipschitz property of ∇u stated in Theorem 4.3.

Theorem 6.1 Let B ∈ B(X) and let π : B → Sc(X) be a Borel map satisfying the conditions

(i) π(x) ∩ π(x′) = ∅ whenever π(x) ≠ π(x′);

(ii) x ∈ π(x) for any x ∈ B;

(iii) the direction ν(x) of π(x) is a Sn−1-valued countably Lipschitz map on B.

Then, for any measure λ ∈ M+(X) absolutely continuous with respect to Ln ⌞ B, setting µ = π#λ ∈ M+(Sc(X)), the measures λC of Theorem 9.1 are absolutely continuous with respect to H1 ⌞ C for µ-a.e. C ∈ Sc(X).

Proof. Being the property stated stable under countable disjoint unions we may assume that

(a) there exists a unit vector ν such that ν(x) · ν ≥ 1/2 for any x ∈ B;

(b) ν(x) is a Lipschitz map on B;

(c) B is contained in a strip

\[
\{x : a - b \le x\cdot\nu \le a\}
\]

with b > 0 sufficiently small (depending only on the Lipschitz constant of ν) and π(x) intersects the hyperplane {x : x · ν = a}.

Assuming with no loss of generality ν = en and a = 0, we write x = (y, z) with y ∈ Rn−1 and z < 0. Under assumption (a), the map T : π(B) → Rn−1 which associates to any segment π(x) the vector y ∈ Rn−1 such that (y, 0) ∈ π(x) is well defined. Moreover, by condition (i), T is one to one. Hence, setting f = T ∘ π : B → Rn−1,

\[
\nu = T_\#\mu = f_\#\lambda, \qquad C(y) = T^{-1}(y) \supset f^{-1}(y)
\]

and representing λ = ηy ⊗ ν with ηy = λC(y) ∈ M1(f−1(y)), we need only to prove that ηy << H1 ⌞ C(y) for ν-a.e. y.

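To this aim we examine the Jacobian, in the y variables, of the map f(y, t). Writing ν = (νy, νt), we have

\[
f(y,t) = y + \tau(y,t)\,\nu_y(y,t) \qquad\text{with}\qquad \tau(y,t) = -\frac{t}{\nu_t(y,t)}.
\]

Since νt ≥ 1/2 and τ ≤ 2b on B we have

\[
\det\bigl(\nabla_y f(y,t)\bigr) = \det\left( \mathrm{Id} + \tau\,\nabla_y\nu_y + \frac{t}{\nu_t^2}\,\nabla_y\nu_t\otimes\nu_y \right) > 0
\]

if b is small enough, depending only on Lip(ν).

Therefore, the coarea factor

\[
C_f := \sqrt{\sum_A \det{}^2 A}
\]

(where the sum runs on all (n − 1) × (n − 1) minors A of ∇f) of f is strictly positive on B and, writing λ = gLn with g = 0 out of B, Federer’s coarea formula (see for instance [4], [38]) gives

\[
\lambda = \frac{g}{C_f}\, C_f\,\mathcal{L}^n = \frac{g}{C_f}\,\mathcal{H}^1 \llcorner f^{-1}(y) \otimes \mathcal{L}^{n-1} = \eta'_y \otimes \nu'
\]

with L = {y ∈ Rn−1 : H1(f−1(y)) > 0} and

\[
\eta'_y := \frac{\dfrac{g}{C_f}\,\mathcal{H}^1 \llcorner f^{-1}(y)}{\displaystyle\int_{f^{-1}(y)} g/C_f \, d\mathcal{H}^1}, \qquad
\nu' = \left( \int_{f^{-1}(y)} \frac{g}{C_f}\, d\mathcal{H}^1 \right) \mathcal{L}^{n-1} \llcorner L.
\]

By Theorem 9.2 we obtain ν = ν′ and ηy = η′y for ν-a.e. y, and this concludes the proof.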

Remark 6.1 (1) In [42] V.N. Sudakov stated the theorem above (see Proposition 78 therein) for maps π : B → So(X) without the countable Lipschitz assumption (iii) and also in a greater generality, i.e. for a generic “Borel decomposition” of the space in open affine regions, even of different dimensions. However, it turns out that the assumption (iii) is essential even if we restrict to 1-dimensional decompositions. Indeed, G. Alberti, B. Kirchheim and D. Preiss [3] have recently found an example of a compact family of open and pairwise disjoint segments in R3 such that the collection B of the midpoints of the segments has strictly positive Lebesgue measure. In this case, of course, if λ = Ln ⌞ B the conditional measures λC are unit Dirac masses concentrated at the midpoint of C, so that the conclusion of Theorem 6.1 fails.

If n = 2 it is not hard to prove that actually condition (iii) follows by (ii), so that the counterexample mentioned above is optimal.

(2) Since any open segment can be approximated from inside by closed segments, by a simple approximation argument one can prove that Theorem 6.1 still holds as well for maps π : B → So(X) or for maps with values into half open [[x, y[[ segments.

Also the assumption (i) that the segments do not intersect can be relaxed. For instance we can assume that the segments can intersect only at their extreme points, provided the collection of these extreme points is Lebesgue negligible. This is precisely the content of the next corollary; again, a counterexample in [28] shows that this property need not be true for a generic compact family of disjoint line segments in Rn, n ≥ 3.

Corollary 6.1 (Negligible extreme points) Let u ∈ Lip1(X). Then the collection of the extreme points of the transport rays is Lebesgue negligible.

Proof. We prove that the collection L of all left extreme points is negligible, the proof for the right ones being similar. Let B = L \ Σu and set

\[
\pi(x) := \Bigl[\Bigl[\,x,\ x - \frac{r(x)}{2}\,\nabla u(x)\Bigr]\Bigr]
\]

where r(x) is the length of the transport ray emanating from x (this ray is unique due to the differentiability of u at x). By Theorem 4.3 the map π has the countable Lipschitz property on B and, by construction, π(x) ∩ π(x′) = ∅ whenever x ≠ x′. By Theorem 6.1 we obtain

\[
\lambda = \int_{S_c(X)} \lambda_C\, d\nu
\]

with λ = Ln ⌞ B, ν = π#λ and λC << H1 ⌞ C probability measures concentrated on π−1(C) for ν-a.e. C. Since π−1(C) contains only one point for any C ∈ π(B) we obtain λC = 0 for ν-a.e. C, whence λ(B) = ν(Sc(X)) = 0.

Theorem 6.2 (Sudakov) Let f0, f1 ∈ M1(X) and assume that f0 << Ln. Then there exists an optimal transport ψ mapping f0 to f1. Moreover, if f1 << Ln we can choose ψ so that ψ−1 is well defined f1-a.e. and $\psi^{-1}_\# f_1 = f_0$.

Proof. Let γ be an optimal planning. In the first two steps we assume that

\[
\text{for } f_0\text{-a.e. } x \text{ there exists } y \ne x \text{ such that } (x,y) \in \mathrm{spt}\,\gamma. \tag{41}
\]

This condition holds for instance if f0 ∧ f1 = 0 because in this case any optimal planning γ does not charge the diagonal ∆ of X × X, i.e. γ(∆) = 0 (otherwise h = π0#(χ∆γ) = π1#(χ∆γ) would be a nonzero measure less than f0 and f1).

Let u ∈ Lip1(X) be a maximal Kantorovich potential given by Corollary 2.1, i.e. a function satisfying

\[ u(x) - u(y) = |x - y| \quad \gamma\text{-a.e. in } X \times X \tag{42} \]

for any optimal planning γ and let Tu be the transport set relative to u (see Definition 4.1). In the following we set

\[ \tilde X := \{ x \in X \setminus \Sigma_u : \exists\, y \neq x \text{ s.t. } (x, y) \in \operatorname{spt}\gamma \}. \]

By (41) and the absolute continuity assumption we know that f0 is concentrated on X̃. Notice also that for any x ∈ X̃ there exists a unique closed transport ray containing x: this follows by the fact that any x ∈ X̃ is a differentiability point of u and by Proposition 4.2.

Step 1. We define r : X̃ × X → Sc(X) as the map which associates to any pair (x, y) the closed transport ray containing [[x, y]]. By (42) the map r is well defined γ-a.e. out of the diagonal ∆; moreover, being f0 concentrated on X̃, r is also well defined γ-a.e. on ∆. Hence, according to Theorem 9.1 we can represent

\[ \gamma = \gamma_C \otimes \nu \quad \text{with } \nu := r_\# \gamma \]

and (42) gives

\[ u(x) - u(y) = |x - y| \quad \gamma_C\text{-a.e. in } X \times X \tag{43} \]


for ν-a.e. C ∈ Sc(X). By the sufficiency part in Corollary 2.1 we infer that γC is an optimal planning relative to the probability measures

\[ f_{0C} := \pi_{0\#}\gamma_C, \qquad f_{1C} := \pi_{1\#}\gamma_C \]

for ν-a.e. C ∈ Sc(X). By (56) we infer

\[ \pi_{t\#}\gamma(B) = \int_{S_c(X)} \pi_{t\#}\gamma_C(B)\, d\nu(C) \quad \forall t \in [0,1],\ B \in \mathcal{B}(X). \tag{44} \]

Notice also that

\[ F_1(f_0, f_1) = \int_{S_c(X)} F_1(f_{0C}, f_{1C})\, d\nu(C). \tag{45} \]

Indeed,

\[ I(\gamma) = \int_{X\times X} |x-y|\, d\gamma = \int_{S_c(X)} \int_{X\times X} |x-y|\, d\gamma_C\, d\nu(C) = \int_{S_c(X)} I(\gamma_C)\, d\nu(C) \]

and we know by (43) that γC is optimal for ν-a.e. C ∈ Sc(X).

Step 2. We denote by π : X̃ → Sc(X) the natural map, so that π(x) is the closed transport ray containing x. Since r = π ∘ π0 on X̃ × X we obtain

\[ \nu = r_\#\gamma = \pi_\#(\pi_{0\#}\gamma) = \pi_\# f_0. \]

Moreover (44) gives

\[ f_0(B) = \int_{S_c(X)} f_{0C}(B)\, d\nu(C) \quad \forall B \in \mathcal{B}(X). \tag{46} \]

Notice that the segments π(x), x ∈ X̃, can intersect only at their right extreme point and that, by Corollary 6.1, the collection of these extreme points is Lebesgue negligible. As a consequence of (46) with ν = π#f0 and Remark 6.1(2), the measures f0C are absolutely continuous with respect to H^1 ⌞ C for ν-a.e. C ∈ Sc(X). Hence, by Theorem 3.1, for ν-a.e. C ∈ Sc(X) we can find a nondecreasing map ψC : C → C (this notion makes sense, since C is oriented) such that ψC#f0C = f1C. Notice also that ψC^{-1} is well defined f1C-a.e., if also f1C ≪ H^1 ⌞ C.

Taking into account that the closed transport rays in π(X̃) are pairwise disjoint in X̃, we can glue all the maps ψC to produce a single Borel map ψ : X̃ → X. The map ψ is Borel because we have been able to exhibit the one dimensional transport map constructively and because of the Borel property of the maps C ↦ fiC (see (15) and (54)); the simple but boring details are left to the reader.
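The one dimensional building block used on each ray, a nondecreasing map carrying one measure into another, can be illustrated with a small discrete stand-in. The sketch below is only an illustration under assumptions that are not in the notes: both measures are empirical measures with the same number of equal-weight atoms (so this is not the nonatomic setting of Theorem 3.1), and the function name `monotone_map` is hypothetical.

```python
import numpy as np

def monotone_map(x, y):
    """Pair the k-th smallest source atom with the k-th smallest target atom.

    x, y: 1-D arrays of the same length (equal-weight atoms on an oriented ray).
    Returns psi with psi[i] = image of x[i] under the nondecreasing pairing.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.shape != y.shape:
        raise ValueError("source and target need the same number of atoms")
    order = np.argsort(x, kind="stable")   # positions of x in increasing order
    psi = np.empty_like(x)
    psi[order] = np.sort(y)                # k-th smallest x -> k-th smallest y
    return psi

x = np.array([0.9, 0.1, 0.5, 0.3])         # atoms of the source measure
y = np.array([2.0, 1.0, 4.0, 3.0])         # atoms of the target measure
psi = monotone_map(x, y)

# Transport cost of the monotone pairing with c(x, y) = |x - y|.
cost = float(np.mean(np.abs(x - psi)))
```

Gluing such maps ray by ray is the discrete analogue of the construction of ψ from the maps ψC; the Borel measurability issue discussed above has no counterpart in this finite setting.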


Since ψ#f0C = ψC#f0C = f1C for ν-a.e. C ∈ Sc(X), taking (44) into account we infer

\[ \psi_\# f_0 = \int_{S_c(X)} \psi_\# f_{0C}\, d\nu(C) = \int_{S_c(X)} f_{1C}\, d\nu(C) = f_1. \]

Finally ψ is an optimal transport because (45) holds and any ψC is an optimal transport.

Step 3. In this step we show how the assumption (41) can be removed. We define

\[ X' := \{ x \in X : (x,x) \in \operatorname{spt}\gamma \text{ and } (x,y) \notin \operatorname{spt}\gamma \ \ \forall y \neq x \} \]

and L = {(x, x) : x ∈ X′}. Then, we set f′0 = f0 ⌞ X′ and f″0 = f0 ⌞ X″, with X″ = X \ X′, and

\[ f_1' := \pi_{1\#}(\gamma \llcorner L), \qquad f_1'' := f_1 - f_1'. \]

Since L ⊂ ∆, we have f′1 = π0#(γ ⌞ L) = f′0, hence we can choose on X′ the map ψ1 = Id as transport map to obtain (ψ1)#f′0 = f′1. Since f″0 is concentrated on X″ the condition (41) is satisfied with f″0 in place of f0 and we can find an optimal transport map ψ2 : X″ → X such that (ψ2)#f″0 = f″1. Gluing these two transport maps we obtain a transport map ψ such that ψ#f0 = f1; since, by construction,

\[ u(x) - u(\psi(x)) = |x - \psi(x)| \quad f_0\text{-a.e. in } X \]

we infer that ψ is optimal.

This proof strongly depends on the strict convexity of the euclidean distance, which provides the first and second order differentiability properties of the potential u on the transport set Tu. Notice also that if c(x, y) = ‖x − y‖ and the norm ‖ · ‖ is not strictly convex, then the “transport rays” need not be one-dimensional and, to our knowledge, the existence of an optimal transport map is an open problem in this situation. Indeed, this existence result is stated by Sudakov in [42] but his proof is faulty, for the reasons outlined in Remark 6.1(1).

Under special assumptions on the data f0, f1 (absolute continuity, separated supports, Lipschitz densities) Evans and Gangbo provided in [22] a proof based on differential methods of the existence of optimal transport maps. For strictly convex norms, the first fully rigorous proofs of the existence of an optimal transport map under the only assumption that f0 ≪ L^n have been given in [15] and [43]. As in Theorem 6.2 the proof is strongly based on the differentiability of the directions of transport rays.


7 Regularity and uniqueness of the transport density

In this section we investigate the regularity and the uniqueness properties of the transport density µ arising in (PDE). Recall that, as Theorem 5.1 shows, any such measure µ can also be represented as ∫_0^1 |E_t| dt for a suitable optimal pair (f_t, E_t) for (ODE) (even with E_t independent of t). In turn, by Corollary 4.1, any optimal measure µ = ∫_0^1 |E_t| dt can be represented as

\[ \mu = \int_0^1 \pi_{t\#}(|y - x|\gamma)\, dt \tag{47} \]

for a suitable optimal planning γ. For this reason, in the following we restrict our attention to the representation (47), valid for any optimal measure µ for (PDE).

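A more manageable formula for µ, first considered by G. Bouchitté and G. Buttazzo in [14], is given in the following elementary lemma.

Lemma 7.1 Let µ be as in (47). Then

\[ \mu(B) = \int_{X\times X} \operatorname{Length}\big(]\!]x,y[\![\, \cap B\big)\, d\gamma(x,y) \quad \forall B \in \mathcal{B}(X). \tag{48} \]

Proof. For any Borel set B we have

\[ \begin{aligned} \mu(B) &= \int_0^1 \int_{\pi_t^{-1}(B)} |y - x|\, d\gamma(x,y)\, dt = \int_{X\times X} |y - x| \int_0^1 \chi_{\pi_t^{-1}(B)}(x,y)\, dt\, d\gamma(x,y) \\ &= \int_{X\times X} \operatorname{Length}\big(]\!]x,y[\![\, \cap B\big)\, d\gamma(x,y). \end{aligned} \]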
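Formula (48) is easy to test numerically in one dimension. The sketch below is an illustration only, under assumptions not taken from the notes: the planning γ is a finite sum of weighted Dirac masses, B is a closed interval [a, b], and π_t(x, y) is read as x + t(y − x). The helper names are hypothetical. It compares the "length" formula (48) with a midpoint-rule discretization of the time integral in (47).

```python
import numpy as np

def overlap_length(x, y, a, b):
    """Length of the intersection of the segment between x and y with [a, b]."""
    lo, hi = min(x, y), max(x, y)
    return max(0.0, min(hi, b) - max(lo, a))

def mu_via_lengths(plan, a, b):
    """Right-hand side of (48) for a discrete plan [(x, y, weight), ...]."""
    return sum(w * overlap_length(x, y, a, b) for x, y, w in plan)

def mu_via_time_integral(plan, a, b, steps=20000):
    """Midpoint-rule approximation of (47) evaluated on B = [a, b]."""
    t = (np.arange(steps) + 0.5) / steps
    total = 0.0
    for x, y, w in plan:
        z = (1.0 - t) * x + t * y            # assumed form of pi_t(x, y)
        inside = (z >= a) & (z <= b)
        total += w * abs(y - x) * inside.mean()
    return total

# A toy plan on the real line: (source point, target point, mass).
plan = [(0.0, 1.0, 0.5), (0.2, 0.6, 0.3), (1.0, 0.4, 0.2)]
a, b = 0.5, 0.8

lhs = mu_via_time_integral(plan, a, b)
rhs = mu_via_lengths(plan, a, b)             # equals 0.24 up to rounding
```

The two numbers agree up to the discretization error of the time integral, which is of order |y − x|/steps for each atom of the plan.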

A direct consequence of (48) is that µ is concentrated on the transport set (since γ-a.e. segment ]]x, y[[ is contained in a transport ray); another consequence is the density estimate

\[ \frac{\mu(B_r(x))}{2r} \le f_0(X) f_1(X) \tag{49} \]

because Length(]]x, y[[ ∩ Br(x)) ≤ 2r for any ball Br(x) and any segment ]]x, y[[. The first results on µ that we state, proved by A. Pratelli in [34] (see also [19]), relate the dimensions of f0 and f1 to the dimension of µ and show that necessarily µ is absolutely continuous with respect to the Lebesgue measure if f0 (or, by symmetry, f1) has this property.


Theorem 7.1 Assume that for some k > 0 we have

\[ \sup_{r \in (0,1)} \frac{f_0(B_r(x))}{r^k} < \infty \quad f_0\text{-a.e. in } X. \]

Then µ has the same property. In particular µ ≪ L^n if f0 ≪ L^n.

The proof of Theorem 7.1 is based on (48); the density estimate on µ is achieved by a careful analysis of the transport rays crossing a generic ball Br(x). A similar analysis proved the following summability estimate (see [19]).

Theorem 7.2 Assume that f0 = g0 L^n and f1 = g1 L^n with g0, g1 ∈ L^p(X), p > 1. Then µ = h L^n with h ∈ L^∞(X) if p = ∞, h ∈ L^q(X) for any q < p if p < ∞.

It is not known whether g0, g1 ∈ L^p implies h ∈ L^p for p < ∞.

Definition 7.1 (Hausdorff dimension of a measure) Let µ ∈ M+(X). The Hausdorff dimension H-dim(µ) is the supremum of all k ≥ 0 such that µ ≪ H^k.

In other words µ(B) = 0 whenever H^k(B) = 0 for some k < H-dim(µ) and for any k > H-dim(µ) there exists a Borel set B with µ(B) > 0 and H^k(B) = 0. Notice that if µ is made of pieces of different dimensions, then H-dim(µ) is the smallest of these dimensions.

Using the density estimates and the implications (see for instance Theorem 2.56 of [4] or [38]; here t > 0 and k > 0)

\[ \limsup_{r \to 0^+} \frac{\mu(B_r(x))}{\omega_k r^k} \ge t \quad \forall x \in B \quad \Longrightarrow \quad \mu(B) \ge t\, \mathcal{H}^k(B) \tag{50} \]

\[ \limsup_{r \to 0^+} \frac{\mu(B_r(x))}{\omega_k r^k} \le t \quad \forall x \in B \quad \Longrightarrow \quad \mu(B) \le 2^k t\, \mathcal{H}^k(B) \tag{51} \]

we can prove a natural lower bound on the Hausdorff dimension of µ.

Corollary 7.1 Let µ be a transport density. Then

\[ \mathcal{H}\text{-dim}(\mu) \ge \max\{ 1,\ \mathcal{H}\text{-dim}(f_0),\ \mathcal{H}\text{-dim}(f_1) \}. \]

Proof. By (49) we infer that µ has finite (and even bounded) 1-dimensional density at any point. In particular (51) gives µ(B) = 0 whenever H^1(B) = 0, so that H-dim(µ) ≥ 1.

Let k = H-dim(f0) and k′ < k; then f0 has finite k′-dimensional density f0-a.e., otherwise by (50) the set where the density is not finite would be H^{k′}-negligible and with strictly positive f0-measure. By Theorem 7.1 we infer that µ has finite


k′-dimensional density µ-a.e. By the same argument used before with k′ = 1 we obtain that H-dim(µ) ≥ k′ and therefore, since k′ < k is arbitrary, H-dim(µ) ≥ k. A symmetric argument proves that H-dim(µ) ≥ H-dim(f1).

We conclude this section proving the uniqueness of the transport density, under the assumption that either f0 ≪ L^n or f1 ≪ L^n (see Example 5.1 for a nonuniqueness example if neither f0 nor f1 is absolutely continuous). Similar results were first announced by Feldman and McCann (see [25]).

We first deal with the one dimensional case, where this absolute continuity assumption is not needed.

Lemma 7.2 Let µ be a transport density in X ⊂ R and assume that the interior (a, b) of X is a transport ray. Then µ = h L^1 in (a, b) with

\[ h(t) = \sigma f((a,t)) + c \quad L^1\text{-a.e. in } (a,b) \]

for some constant c, where f = f1 − f0 and σ = 1 if u′ = 1 in (a, b), σ = −1 if u′ = −1 in (a, b). Moreover

\[ c = \frac{F_1(f_0, f_1) - \sigma \int_{(a,b)} (b - t)\, df(t)}{b - a}. \]

Proof. The proof of the first statement is based on the equation µ′ = σf and on a smoothing argument. By integrating both sides we get

\[ \begin{aligned} c(b-a) &= \mu(X) - \sigma \int_a^b f((a,t))\, dt = \mu(X) - \sigma \int_a^b \int_{(a,t)} 1\, df(\tau)\, dt \\ &= \mu(X) - \sigma \int_{(a,b)} \int_\tau^b 1\, dt\, df(\tau) = \mu(X) - \sigma \int_{(a,b)} (b - \tau)\, df(\tau). \end{aligned} \]

The conclusion is achieved taking into account that µ(X) = min (MK) = F1(f0, f1).
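The formulas of Lemma 7.2 can be checked on a simple example. The sketch below is an illustration under assumptions of ours: X = [0, 1], f0 is the uniform probability on [0, 1/2] and f1 the uniform probability on [1/2, 1], so that all mass moves to the right by 1/2 and the transport cost is 1/2. With the convention (42) the potential decreases along the direction of transport, hence we take σ = −1. The grid size and variable names are arbitrary choices.

```python
import numpy as np

n = 2000                                   # number of grid cells on (a, b)
a, b = 0.0, 1.0
dt = (b - a) / n
edges = np.linspace(a, b, n + 1)
mids = 0.5 * (edges[:-1] + edges[1:])      # cell midpoints

g0 = np.where(mids < 0.5, 2.0, 0.0)        # density of f0 (uniform on [0, 1/2])
g1 = np.where(mids >= 0.5, 2.0, 0.0)       # density of f1 (uniform on [1/2, 1])
f = g1 - g0                                # density of f = f1 - f0

sigma = -1.0                               # u' = -1: mass moves to the right
cost = 0.5                                 # every unit of mass travels 1/2

# Constant c from the lemma: (F_1 - sigma * int (b - t) df(t)) / (b - a).
c = (cost - sigma * np.sum((b - mids) * f) * dt) / (b - a)

# h(t) = sigma * f((a, t)) + c, sampled at the right edge of each cell.
h = sigma * np.cumsum(f) * dt + c

total_mass = float(np.sum(h) * dt)         # approximates mu(X)
```

In this example c vanishes and h is the tent function equal to 2t on (0, 1/2) and 2 − 2t on (1/2, 1), whose integral is the transport cost 1/2, in agreement with µ(X) = F1(f0, f1).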

Theorem 7.3 (Uniqueness) Assume that either f0 or f1 is absolutely continuous with respect to L^n. Then the transport density is absolutely continuous and unique.

Proof. We already know from Theorem 7.1 that any transport density is absolutely continuous and we can assume that f0 ≪ L^n. By Corollary 4.1 we know that the class of transport densities relative to (f0, f1) is equal to the class of transport densities relative to (f0 − h, f1 − h) where h is any measure in M+(X) such that


h ≤ f0 ∧ f1 (indeed, this subtraction does not change the velocity field E_t). Hence, it is not restrictive to assume that f0 ∧ f1 = 0.

We can assume that µ is representable as in (30) for a suitable optimal planning γ. Adopting the same notation of the proof of Theorem 6.1, we have

\[ \gamma = \gamma_C \otimes \nu \]

where ν = π#f0 does not depend on γ (and here the absolute continuity assumption on f0 plays a crucial role) and γC is an optimal planning relative to the measures f0C = π0#γC, f1C = π1#γC for ν-a.e. C. In particular

\[ \mu = \int_0^1 \pi_{t\#}\big(|y - x|\, \gamma_C \otimes \nu\big)\, dt = \int_0^1 \pi_{t\#}(|y - x|\gamma_C)\, dt \otimes \nu. \]

Hence, it suffices to show that the measures

\[ \mu_C := \int_0^1 \pi_{t\#}(|y - x|\gamma_C)\, dt \]

do not depend on γ (up to ν-negligible sets of course). To this aim, taking into account Lemma 7.2, it suffices to show that f0C and f1C do not depend on γ.

Indeed, we already know from (46) and Theorem 9.2 that f0C does not depend on γ.

The argument for f1C is more involved. First, since γ(∆) = 0 (due to the assumption f0 ∧ f1 = 0), we have |x − y| > 0 γ-a.e., and therefore |x − y| > 0 γC-a.e. for ν-a.e. C ∈ Sc(X). As a consequence, setting C = [[xC, yC]], we obtain that f1C({xC}) = 0 for ν-a.e. C. Second, we examine the restriction f′1C of f1C to the relative interior of C noticing that (44) gives

\[ f_1 \llcorner T_u = \int_{S_c(X)} f_{1C} \llcorner T_u\, d\nu(C) = \int_{S_c(X)} f'_{1C}\, d\nu(C) \]

because Tu ∩ C is the relative interior of C for any C ∈ π(X). By Theorem 9.2 we obtain that f′1C depends only on f1, Tu and ν.

In conclusion, since

\[ f_{1C} = f'_{1C} + \big(1 - f'_{1C}(X)\big)\, \delta_{y_C} \]

we obtain that f1C does not depend on γ as well.


8 The Bouchitté–Buttazzo mass optimization problem

In [13, 14] Bouchitté and Buttazzo consider the following problem. Given f ∈ M(X) with f(X) = 0, they define

\[ E(\mu) := \inf \left\{ \int_X \frac{1}{2} |\nabla v|^2\, d\mu - f(v) : v \in C^\infty(X) \right\} \]

for any µ ∈ M+(X). Then, they raised the following mass optimization problem.

(BB) Given m > 0, maximize E(µ) among all measures µ ∈ M+(X) with µ(X) = m.

A possible physical interpretation of this problem is the following: we may imagine that µ represents the conductivity of some material, thinking that the conductivity (i.e. the inverse resistivity) is zero out of spt µ; accordingly we may imagine that f = f+ − f− is a balanced density of positive and negative charges. Then −E(µ) represents the heating corresponding to the given conductivity, so that there is an obvious interest in maximizing E(µ) and, for a minimizer u, the (formal) first variation of the energy

\[ -\nabla \cdot (\nabla u\, \mu) = f^+ - f^- \]

corresponds to Ohm’s law.

More generally, problems of this sort appear in Shape Optimization and Linear Elasticity. In these cases u is no longer real valued and general energies of the form

\[ F(u) = \int_X f(x, \nabla u(x))\, d\mu(x) \tag{52} \]

must be considered in order to take into account light structures, corresponding to mass distributions not absolutely continuous with respect to L^n. This was the main motivation for Bouchitté, Buttazzo and Seppecher [12] in their development of a general differential calculus with measures. This calculus, based on a suitable concept of tangent space to a measure, enables the study of the relaxation of functionals (52) and provides an explicit formula for their lower semicontinuous envelope.

Coming back to the scalar problem (BB), a remarkable fact discovered in [13, 14] is its connection with the Monge–Kantorovich optimal transport problem, in the case when c(x, y) = |x − y|. It turns out that the solutions of (BB) are in one to one correspondence with constant multiples of transport densities for (PDE) (or, equivalently, for (MK) with f0 = f+ and f1 = f−).


As a byproduct we have that µ is unique and absolutely continuous with respect to L^n if either f0 or f1 is absolutely continuous with respect to L^n.

Theorem 8.1 ((PDE) versus (BB)) Let (µ, u) be a solution of (PDE) and set m̃ = µ(X). Then µ̃ = (m/m̃) µ solves (BB) and any solution of (BB) is representable in this way. Moreover

\[ \max\,(\mathrm{BB}) = -\frac{1}{2} \frac{\tilde m^2}{m}. \]

Proof. Setting v = λu_h (with u_h as in the definition of (PDE)), for any ν ∈ M+(X) with ν(X) = m we estimate

\[ E(\nu) \le \frac{m}{2}\lambda^2 - \lambda f(u_h) \]

so that, letting h → ∞ and taking into account (39), we obtain

\[ E(\nu) \le \frac{m}{2}\lambda^2 - \tilde m \lambda. \]

By minimizing with respect to λ we obtain E(ν) ≤ −m̃²/(2m).

On the other hand, using Young inequality and choosing λ so that λm = m̃ we can estimate

\[ \begin{aligned} \int_X \frac{1}{2} |\nabla v|^2\, d\tilde\mu - f(v) &\ge \int_X \langle \nabla v, \lambda \nabla_\mu u \rangle\, d\tilde\mu - \frac{\lambda^2 m}{2} - f(v) \\ &= -\nabla \cdot (\nabla_\mu u\, \mu)(v) - f(v) - \frac{1}{2} \frac{\tilde m^2}{m} \\ &= -\frac{1}{2} \frac{\tilde m^2}{m} \end{aligned} \]

for any v ∈ C^∞(X). This proves that µ̃ is optimal for (BB).

Conversely, it has been proved in [13, 14] that for any solution σ of (BB) there exists v ∈ Lip(X) such that |∇σ v| = m̃/m σ-a.e. and

\[ -\nabla \cdot (\nabla_\sigma v\, \sigma) = f, \]

where ∇σ v is understood in the Bouchitté–Buttazzo sense. But since (see [14] again)

\[ \int_X |\nabla_\sigma v|^2\, d\sigma = \inf \left\{ \liminf_{h \to \infty} \int_X |\nabla v_h|^2\, d\sigma : v_h \to v \text{ uniformly},\ v_h \in C^\infty(X) \right\} \]

we obtain a sequence of smooth functions v_h uniformly converging to v such that ∇v_h → ∇σ v in [L²(σ)]^n. Setting u = (m/m̃) v and µ = (m̃/m) σ, this proves that (µ, u) solves (PDE).
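For the reader's convenience, the elementary minimization in λ used in the first part of the proof can be written out. The computation below is only a worked step; the symbol \tilde m for µ(X), as distinct from the prescribed mass m in (BB), is our notational reading of the statement.

\[ \varphi(\lambda) := \frac{m}{2}\lambda^2 - \tilde m\,\lambda, \qquad \varphi'(\lambda) = m\lambda - \tilde m = 0 \iff \lambda = \frac{\tilde m}{m}, \]

\[ \min_{\lambda \in \mathbb{R}} \varphi(\lambda) = \frac{m}{2}\,\frac{\tilde m^2}{m^2} - \frac{\tilde m^2}{m} = -\frac{1}{2}\,\frac{\tilde m^2}{m}. \]

The minimizer is the same value of λ (namely λm = \tilde m) that is chosen in the second part of the proof, which is why the upper and lower bounds match.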


9 Appendix: some measure theoretic results

In this section we list all the measure theoretic results used in the previous sections; reference books for the content of this section are [37], [18] and [4].

Let us begin with some terminology.

• (Measures) Let X be a locally compact and separable metric space. We denote by [M(X)]^m the space of Radon measures with values in R^m and with finite total variation in X. We recall that the total variation measure of µ = (µ1, . . . , µm) ∈ [M(X)]^m is defined by

\[ |\mu|(B) := \sup \left\{ \sum_{i=1}^\infty |\mu(B_i)| : B = \bigsqcup_{i=1}^\infty B_i,\ B_i \in \mathcal{B}(X) \right\} \]

and belongs to M+(X). By Riesz theorem the space [M(X)]^m endowed with the norm ‖µ‖ = |µ|(X) is isometric to the dual of [C(X)]^m. The duality is given by the integral, i.e.

\[ \langle \mu, u \rangle := \sum_{i=1}^m \int_X u_i\, d\mu_i. \]

Recall also that, for µ ∈ M+(X) and f ∈ [L^1(X, µ)]^m, the measure fµ ∈ [M(X)]^m is defined by

\[ f\mu(B) := \int_B f\, d\mu \quad \forall B \in \mathcal{B}(X) \]

and |fµ| = |f|µ.

• (Push forward of measures) Let µ ∈ [M(X)]^m, let Y be another metric space and let f : X → Y be a Borel map. Then the push forward measure f#µ ∈ [M(Y)]^m is defined by

\[ f_\#\mu(B) := \mu\big(f^{-1}(B)\big) \quad \forall B \in \mathcal{B}(Y) \]

and satisfies the more general property

\[ \int_Y u\, df_\#\mu = \int_X u \circ f\, d\mu \quad \text{for any bounded Borel function } u : Y \to \mathbb{R}. \]

It is easy to check that |f#µ| ≤ f#|µ|.

• (Support) We say that µ ∈ [M(X)]^m is concentrated on a Borel set B if |µ|(X \ B) = 0 and we denote by spt µ the smallest closed set on which µ is concentrated (the existence of a smallest set follows by the separability of X, precisely by the Lindelöf property). The support is also given by the formula (sometimes taken as the definition)

\[ \operatorname{spt}\mu = \{ x \in X : |\mu|(B_\varrho(x)) > 0 \ \ \forall \varrho > 0 \}. \]
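The notions of push forward, total variation and support recalled above become completely explicit for finitely supported measures. The following sketch is an illustration with hypothetical helper names: a real valued measure is stored as a dictionary mapping atoms to (possibly negative) weights. It shows that the change of variables formula holds and that the inequality |f#µ| ≤ f#|µ| can be strict when f glues together atoms of opposite sign.

```python
def push_forward(mu, f):
    """Push forward of a finitely supported measure {atom: weight} under f."""
    out = {}
    for x, w in mu.items():
        y = f(x)
        out[y] = out.get(y, 0.0) + w
    return out

def total_variation(mu):
    """Total variation measure of a finitely supported real measure."""
    return {x: abs(w) for x, w in mu.items()}

def support(mu):
    """Support: the atoms carrying nonzero mass (a finite, hence closed, set)."""
    return {x for x, w in mu.items() if w != 0.0}

def integrate(u, mu):
    """Integral of the function u against the measure mu."""
    return sum(u(x) * w for x, w in mu.items())

mu = {-1: 0.5, 1: -0.5, 2: 1.0}            # a signed measure on the real line
f = lambda x: x * x                        # glues the atoms -1 and 1 together
u = lambda y: y + 1.0                      # a test function on the target space

nu = push_forward(mu, f)                   # atoms at 1 (mass 0) and 4 (mass 1)

lhs = integrate(u, nu)                     # integral of u against f#mu
rhs = integrate(lambda x: u(f(x)), mu)     # integral of u o f against mu

mass_tv_of_push = sum(total_variation(nu).values())                  # |f#mu|(Y)
mass_push_of_tv = sum(push_forward(total_variation(mu), f).values()) # f#|mu|(Y)
```

Here the masses at −1 and 1 cancel after the push forward, so |f#µ| has total mass 1 while f#|µ| has total mass 2, and the atom at 1 drops out of spt f#µ.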


• (Convolution) If X ⊂ R^n, µ ∈ [M(X)]^m and ρ ∈ C^∞_c(R^n) (for instance a convolution kernel), we define

\[ \mu * \rho(x) := \int_X \rho(x - y)\, d\mu(y) \quad x \in \mathbb{R}^n. \]

Notice that µ ∗ ρ ∈ [C^∞(R^n)]^m because D^α(µ ∗ ρ) = (D^α ρ) ∗ µ for any multiindex α and

\[ \sup |D^\alpha(\mu * \rho)| \le \sup |D^\alpha \rho|\, \|\mu\|. \]

Moreover, using Jensen’s inequality it is easy to check (see for instance Theorem 2.2(ii) of [4]) that

\[ \int_C |\mu * \rho|\, dx \le |\mu|(C_\rho) \quad \text{where } C_\rho := \{ x : \operatorname{dist}(x, C) \le \operatorname{diam}(\operatorname{spt}\rho) \} \tag{53} \]

for any closed set C ⊂ R^n.

• (Weak convergence) Assume that X is a compact metric space. We say that a family of measures (µh) ⊂ M(X) weakly converges to µ ∈ M(X) if

\[ \lim_{h \to \infty} \int_X u\, d\mu_h = \int_X u\, d\mu \quad \forall u \in C(X) \]

i.e., if µh weakly∗ converge to µ as elements of the dual of C(X). In the case X is locally compact and separable, the concept is analogous, simply replacing C(X) by the subspace Cc(X) of functions with compact support. It is easy to check that (µ ∗ ρ_ε) L^n weakly converge to µ in R^n as ε → 0+ whenever ρ is a convolution kernel.

We mention also the following criterion for the weak convergence of positive measures (see for instance Proposition 1.80 of [4]): if µh, µ ∈ M+(X) satisfy

\[ \liminf_{h \to \infty} \mu_h(A) \ge \mu(A) \quad \forall A \subset X \text{ open} \]

and

\[ \limsup_{h \to \infty} \mu_h(X) \le \mu(X) \]

then ∫_X φ dµh → ∫_X φ dµ for any bounded continuous function φ : X → R.

• (Measure valued maps) Let X, Y be locally compact and separable metric spaces. Let y ↦ λy be a map which assigns to any y ∈ Y a measure λy ∈ [M(X)]^m. We say that λy is a Borel map if y ↦ λy(A) is a real valued Borel map for any open set A ⊂ X. By a monotone class argument it can be proved that y ↦ λy is a Borel map if and only if

\[ y \mapsto \lambda_y\big(\{ x : (y, x) \in B \}\big) \text{ is a Borel map for any } B \in \mathcal{B}(Y \times X). \tag{54} \]


Moreover y ↦ |λy| is a Borel map whenever y ↦ λy is Borel (detailed proofs are in §2.5 of [4]). If µ ∈ M+(Y), analogous statements hold for B(Y)µ-measurable measure valued maps, where B(Y)µ is the σ-algebra of µ-measurable sets (in this case one has to replace B(Y × X) by B(Y)µ ⊗ B(X) in (54)).

• (Decomposition of measures) The following result plays a fundamental role in these notes; it is also known as disintegration theorem.

Theorem 9.1 (Decomposition of measures) Let X, Y be locally compact and separable metric spaces and let π : X → Y be a Borel map. Let λ ∈ [M(X)]^m and set µ = π#|λ| ∈ M+(Y). Then there exist measures λy ∈ [M(X)]^m such that

(i) y ↦ λy is a Borel map and |λy| is a probability measure in X for µ-a.e. y ∈ Y;

(ii) λ = λy ⊗ µ, i.e.

\[ \lambda(A) = \int_Y \lambda_y(A)\, d\mu(y) \quad \forall A \in \mathcal{B}(X); \tag{55} \]

(iii) |λy|(X \ π^{-1}(y)) = 0 for µ-a.e. y ∈ Y.

The representation provided by Theorem 9.1 of λ can be used sometimes to compute the push forward of λ. Indeed,

\[ f_\#(\lambda_y \otimes \mu) = f_\#\lambda_y \otimes \mu \tag{56} \]

for any Borel map f : X → Z, where Z is any other compact metric space. Notice also that

\[ |\lambda| = |\lambda_y| \otimes \mu. \tag{57} \]

Indeed, the inequality ≤ is trivial and the opposite one follows by evaluating both measures at B = X, using the fact that |λy|(X) = 1 for µ-a.e. y.
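In the finite setting the decomposition of Theorem 9.1 reduces to the familiar computation of conditional probabilities. The sketch below is an illustration under assumptions of ours (a positive measure with finitely many atoms, stored as a dictionary; the function names are hypothetical): it computes µ = π#λ and the measures λy, and then checks the representation (55) on a test set.

```python
from collections import defaultdict

def disintegrate(lam, pi):
    """Disintegrate a finitely supported positive measure lam along the map pi.

    lam: dict {point of X: mass}; pi: map from X to Y.
    Returns (mu, cond), where mu = pi# lam and cond[y] is the probability
    measure lambda_y, concentrated on the fibre pi^{-1}(y).
    """
    mu = defaultdict(float)
    for x, w in lam.items():
        mu[pi(x)] += w
    cond = defaultdict(dict)
    for x, w in lam.items():
        y = pi(x)
        if mu[y] > 0.0:
            cond[y][x] = w / mu[y]
    return dict(mu), dict(cond)

def reconstruct(mu, cond, A):
    """Right-hand side of (55): integral of lambda_y(A) with respect to mu."""
    return sum(
        mu[y] * sum(p for x, p in cond[y].items() if x in A)
        for y in mu
    )

# X is a finite subset of a product, pi is the projection on the first factor.
lam = {(0, 0): 0.1, (0, 1): 0.3, (1, 0): 0.2, (1, 2): 0.4}
pi = lambda x: x[0]

mu, cond = disintegrate(lam, pi)
A = {(0, 1), (1, 2)}
value = reconstruct(mu, cond, A)           # should agree with lam(A) = 0.7
```

Properties (i) and (iii) correspond to the facts that each `cond[y]` sums to one and charges only points x with π(x) = y.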

In the case when m = 1 and λ ∈ M+(X) the proof of Theorem 9.1 is available in many textbooks of measure theory or probability (in this case λy are the so-called conditional probabilities induced by the random variable π, see for instance [18]); in the vector valued case one can argue component by component, but the fact that |λy| are probability measures is not straightforward.

In Theorem 2.28 of [4] the decomposition theorem is proved in the case when X = Y × Z is a product space and π(y, z) = y is the projection on the first variable; in this situation, since λy are concentrated on π^{-1}(y) = {y} × Z, it is sometimes convenient to consider them as measures on Z, rather than measures on X, writing (55) in the form

\[ \lambda(B) = \int_Y \lambda_y\big(\{ z : (y, z) \in B \}\big)\, d\mu(y) \quad \forall B \in \mathcal{B}(X). \tag{58} \]


Once the decomposition theorem is known in the special case X = Y × Z and π(y, z) = y the general case can be easily recovered: it suffices to embed X into the product Y × X through the map f(x) = (π(x), x) and to apply the decomposition theorem to λ̃ = f#λ.

Now we discuss the uniqueness of λy and µ in the representation λ = λy ⊗ µ. For simplicity we discuss only the case of positive measures.

Theorem 9.2 Let X, Y and π be as in Theorem 9.1; let λ ∈ M+(X), µ ∈ M+(Y) and let y ↦ ηy be a Borel M+(X)-valued map defined on Y such that

(i) λ = ηy ⊗ µ, i.e. λ(A) = ∫_Y ηy(A) dµ(y) for any A ∈ B(X);

(ii) ηy(X \ π^{-1}(y)) = 0 for µ-a.e. y ∈ Y.

Then ηy are uniquely determined µ-a.e. in Y by (i), (ii) and, setting B = {y : ηy(X) > 0}, the measure µ ⌞ B is absolutely continuous with respect to π#λ. In particular

\[ \frac{\mu \llcorner B}{\pi_\#\lambda}\, \eta_y = \lambda_y \quad \text{for } \pi_\#\lambda\text{-a.e. } y \in Y \tag{59} \]

where λy are as in Theorem 9.1.

Proof. Let ηy, η′y satisfy (i), (ii). We have to show that ηy = η′y for µ-a.e. y. Let (An) be a sequence of open sets stable by finite intersection which generates the Borel σ-algebra of X. Choosing A = An ∩ π^{-1}(B), with B ∈ B(Y), in (i) gives

\[ \int_B \eta_y(A_n)\, d\mu(y) = \int_B \eta'_y(A_n)\, d\mu(y). \]

Being B arbitrary, we infer that ηy(An) = η′y(An) for µ-a.e. y, and therefore there exists a µ-negligible set N such that ηy(An) = η′y(An) for any n ∈ N and any y ∈ Y \ N. By Proposition 1.8 of [4] we obtain that ηy = η′y for any y ∈ Y \ N.

Let B′ ⊂ B be any π#λ-negligible set; then π^{-1}(B′) is λ-negligible and therefore (ii) gives

\[ 0 = \int_Y \eta_y\big(\pi^{-1}(B')\big)\, d\mu(y) = \int_{B'} \eta_y(X)\, d\mu(y). \]

As ηy(X) > 0 on B ⊃ B′ this implies that µ(B′) = 0. Writing µ ⌞ B = h π#λ we obtain λ = hηy ⊗ π#λ and λ = λy ⊗ π#λ. As a consequence (59) holds.

• (Young measures) These measures, introduced by L.C. Young, arise in a natural way in the study of oscillatory phenomena and in the analysis of weak limits of nonlinear quantities (see [31] for a comprehensive introduction to this wide topic).


Specifically, assume that we are given compact metric spaces X, Y and a sequence of Borel maps ψh : X → Y; in order to understand the limit behaviour of ψh we associate to them the measures

\[ \gamma_{\psi_h} = (\mathrm{Id} \times \psi_h)_\#\mu = \int \delta_{\psi_h(x)}\, d\mu(x) \]

and we study their limit in M(X × Y). Assuming, possibly passing to a subsequence, that γψh → γ, due to the fact that π0#(γψh) = µ for any h we obtain that π0#γ = µ, hence according to Theorem 9.1 we can represent γ as

\[ \gamma = \gamma_x \otimes \mu \]

for suitable probability measures γx in Y, with x ↦ γx Borel. The family of measures γx is called Young limit of the sequence (ψh); once the Young limit is known, we can compute the w∗-limit of γ(ψh) in L^∞(X, µ) for any γ ∈ C(Y): indeed, using test functions of the form φ(x)γ(y), we easily obtain that the limit (in the dual of C(X) and therefore in L^∞(X, µ)) is given by

\[ L_\gamma(x) := \int_Y \gamma(y)\, d\gamma_x(y). \]
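A standard oscillating example makes the formula for the weak∗ limit concrete. The sketch below is an illustration under assumptions of ours: X = [0, 1] with Lebesgue measure, Y = [−1, 1], and ψh(x) equal to the sign of sin(2πhx), whose Young limit is (δ−1 + δ1)/2 for every x. To avoid the clash with the planning γ, the continuous test function on Y is called g in the code; all names are hypothetical.

```python
import numpy as np

def weak_pairing(h, phi, g, n=200_000):
    """Midpoint-rule value of the integral of phi(x) * g(psi_h(x)) over [0, 1],
    where psi_h(x) = sign(sin(2 pi h x)) takes the values +1 and -1."""
    x = (np.arange(n) + 0.5) / n
    psi_h = np.where(np.sin(2.0 * np.pi * h * x) >= 0.0, 1.0, -1.0)
    return float(np.mean(phi(x) * g(psi_h)))

phi = lambda x: x                          # test function on X
g = lambda y: y**2 + y                     # nonlinear test function on Y

# Young limit (delta_{-1} + delta_{+1}) / 2 gives L_g(x) = (g(-1) + g(1)) / 2.
L_g = 0.5 * (g(-1.0) + g(1.0))
limit = L_g * 0.5                          # integral of x * L_g over [0, 1]

coarse = weak_pairing(1, phi, g)           # far from the limit
fine = weak_pairing(50, phi, g)            # close to the limit
```

As h grows the pairing approaches the value predicted by the Young limit, although ψh itself has no pointwise limit; the error observed in this example is of order 1/h.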

We will use the following two well known results of the theory of Young measures.

Theorem 9.3 (Approximation theorem) Let γ ∈ M+(X × Y) and write γ = γx ⊗ µ with µ = π0#γ. Then, if µ has no atom, there exists a sequence of Borel maps ψh : X → Y such that

\[ \gamma_x \otimes \mu = \lim_{h \to \infty} \delta_{\psi_h(x)} \otimes \mu. \]

Moreover, we can choose ψh in such a way that the measures ψh#µ have no atom as well.

Proof. (Sketch) We assume first that γx = γ is independent of x. By approximation we can also assume that

\[ \gamma = \sum_{i=1}^p p_i\, \delta_{y_i} \]

for suitable yi ∈ Y and pi ∈ [0, 1]. Let Q_h, h ∈ Z^n, be a partition of R^n in cubes with side length 1/h; since µ has no atom, by Lyapunov theorem we can find a partition X^1_h, . . . , X^p_h of X ∩ Q_h such that µ(X^i_h) = pi µ(X ∩ Q_h). Then we define

\[ \psi_h = y_i \quad \text{on } X^i_h. \]


For any φ ∈ C(X) and ϕ ∈ C(Y) we have

\[ \begin{aligned} \int_{X \times Y} \phi\varphi\, d(\delta_{\psi_h} \otimes \mu) &= \sum_{h \in \mathbb{Z}^n} \sum_{i=1}^p \varphi(y_i) \int_{X^i_h} \phi\, d\mu \\ &\sim \sum_{h \in \mathbb{Z}^n} \sum_{i=1}^p p_i\, \varphi(y_i) \int_{X \cap Q_h} \phi\, d\mu = \int_{X \times Y} \phi\varphi\, d(\mu \times \gamma) \end{aligned} \]

and this proves that

\[ \mu \times \gamma = \lim_{h \to \infty} \delta_{\psi_h} \otimes \mu. \]

If we want ψh#µ to be non atomic, the above construction needs to be modified only slightly: it suffices to take small balls Bi centered at yi and to define ψh equal to φi on X^i_h, where φi : R^n → Bi is any Borel and one to one map.

If γx is piecewise constant (say in a canonical subdivision of X induced by a partition in cubes) then we can repeat the local construction above in each region where γx is constant. Moreover we can approximate Lipschitz functions γx (with respect to the 1-Wasserstein distance) with piecewise constant ones. Finally, any Borel map γx can be approximated by Lipschitz ones through a convolution.

We will also need the following result.

Lemma 9.1 Let ψh, ψ : X → Y be Borel maps and µ ∈ M1(X). Then ψh → ψ µ-a.e. if and only if

\[ \gamma_h := \delta_{\psi_h(x)} \otimes \mu(x) \to \gamma := \delta_{\psi(x)} \otimes \mu(x) \]

weakly in M(X × Y).

Proof. Assume that ψh → ψ µ-a.e. Since

\[ \int_{X \times Y} \varphi\, d\gamma_\phi = \int_X \varphi(x, \phi(x))\, d\mu(x) \quad \forall \varphi \in C(X \times Y) \]

the dominated convergence theorem gives that γh → γ. To prove the opposite implication, fix ε > 0 and a compact set K such that ψ|K is continuous and µ(X \ K) < ε. We use as test function

\[ \varphi(x, y) = \chi_K(x)\, \gamma(y - \psi(x)) \]

with γ(t) = 1 ∧ |t|/ε (by approximation, although not continuous in X × Y, this is an admissible test function) to obtain

\[ \lim_{h \to \infty} \mu\big(\{ x \in K : |\psi_h(x) - \psi(x)| \ge \varepsilon \}\big) = 0. \]


The conclusion follows letting ε → 0+.

With a similar proof one can obtain a slightly more general result, namely

\[ \gamma_h := \delta_{\psi_h(x)} \otimes \mu_h \to \gamma := \delta_{\psi(x)} \otimes \mu \]

implies ψh → ψ µ-a.e. provided |µh − µ|(X) → 0.

References

[1] G.Alberti: On the structure of the singular set of convex functions. Calc. Var., 2 (1994), 17–27.

[2] G.Alberti & L.Ambrosio: A geometric approach to monotone functions in R^n. Math. Z., 230 (1999), 259–316.

[3] G.Alberti, B.Kirchheim & D.Preiss: Personal communication.

[4] L.Ambrosio, N.Fusco & D.Pallara: Functions of Bounded Variation and Free Discontinuity Problems. Oxford University Press, 2000.

[5] L.Ambrosio & B.Kirchheim: Rectifiable sets in metric and Banach spaces. Mathematische Annalen, in press.

[6] L.Ambrosio & B.Kirchheim: Currents in metric spaces. Acta Mathematica, in press.

[7] Y.Brenier: Décomposition polaire et réarrangement monotone des champs de vecteurs. Comm. Pure Appl. Math., 44 (1991), 375–417.

[8] Y.Brenier: Polar factorization and monotone rearrangement of vector-valued functions. Arch. Rational Mech. & Anal., 122 (1993), 323–351.

[9] J.Benamou & Y.Brenier: A computational fluid mechanics solution to the Monge–Kantorovich mass transfer problem. Numerische Math., 84 (2000), 375–393.

[10] Y.Brenier: A homogenized model for vortex sheets. Arch. Rational Mech. & Anal., 138 (1997), 319–353.

[11] Y.Brenier: A Monge–Kantorovich approach to the Maxwell equations. In preparation.


[12] G.Bouchitté, G.Buttazzo & P.Seppecher: Energies with respect to a measure and applications to low dimensional structures. Calc. Var., 5 (1997), 37–54.

[13] G.Bouchitté, G.Buttazzo & P.Seppecher: Shape optimization via Monge–Kantorovich equation. C.R. Acad. Sci. Paris, 324-I (1997), 1185–1191.

[14] G.Bouchitté & G.Buttazzo: Characterization of optimal shapes and masses through Monge–Kantorovich equation. Preprint, 2000.

[15] L.Caffarelli, M.Feldman & R.J.McCann: Constructing optimal maps for Monge’s transport problem as a limit of strictly convex costs. Forthcoming.

[16] C.Castaing & M.Valadier: Convex analysis and measurable multifunctions. Lecture Notes in Mathematics 580, Springer, 1977.

[17] A.Cellina & S.Perrotta: On the validity of the maximum principle and of the Euler–Lagrange equation for a minimum problem depending on the gradient. SIAM J. Control Optim., 36 (1998), 1987–1998.

[18] C.Dellacherie & P.Meyer: Probabilities and potential. Mathematical Studies 29, North Holland, 1978.

[19] L.De Pascale & A.Pratelli: Regularity properties for Monge transport density and for solutions of some shape optimization problem. Forthcoming.

[20] L.C.Evans: Partial Differential Equations. Graduate Studies in Mathematics, 19, AMS 1998.

[21] L.C.Evans: Partial Differential Equations and Monge–Kantorovich Mass Transfer. Current Developments in Mathematics, forthcoming.

[22] L.C.Evans & W.Gangbo: Differential Equation Methods for the Monge–Kantorovich Mass Transfer Problem. Memoirs AMS, 653, 1999.

[23] K.J.Falconer: The geometry of fractal sets. Cambridge University Press, 1985.

[24] H.Federer: Geometric measure theory. Springer, 1969.

[25] M.Feldman & R.McCann: Uniqueness and transport density in Monge’s mass transport problem. Forthcoming.

[26] W.Gangbo & R.J.McCann: The geometry of optimal transportation. Acta Math., 177 (1996), 113–161.


[27] R.Jerrard & H.M.Soner: Functions of bounded n-variation. Preprint, 1999.

[28] D.G.Larman: A compact set of disjoint line segments in R^3 whose end set has positive measure. Mathematika, 18 (1971), 112–125.

[29] R.Jordan, D.Kinderlehrer & F.Otto: The variational formulation of the Fokker–Planck equation. SIAM J. Math. Anal., 29 (1998), 1–17.

[30] F.Morgan: Geometric measure theory – A beginner’s guide. Second edition, Academic Press, 1995.

[31] S.Müller: Variational methods for microstructure and phase transitions. Proc. CIME summer school “Calculus of variations and geometric evolution problems”, Cetraro (Italy), 1996, S.Hildebrandt and M.Struwe eds, http:///www.mis.mpg.de/cgi-bin/lecture-notes.pl.

[32] F.Otto: The geometry of dissipative evolution equations: the porous medium equation.

[33] F.Poupaud & M.Rascle: Measure solutions to the linear multi-dimensional transport equation with nonsmooth coefficients. Comm. PDE, 22 (1997), 301–324.

[34] A.Pratelli: Regolarità delle misure ottimali nel problema di trasporto di Monge–Kantorovich. Thesis, Pisa University, 2000.

[35] S.T.Rachev & L.Rüschendorf: Mass transportation problems. Probability and its applications, Springer.

[36] L.Rüschendorf: On c-optimal random variables. Statistics & Probability Letters, 27 (1996), 267–270.

[37] W.Rudin: Real and Complex Analysis. McGraw–Hill, New York, 1974.

[38] L.Simon: Lectures on geometric measure theory, Proc. Centre for Math. Anal., Australian Nat. Univ., 3, 1983.

[39] S.K.Smirnov: Decomposition of solenoidal vector charges into elementary solenoids and the structure of normal one-dimensional currents. St. Petersburg Math. J., 5 (1994), 841–867.

[40] C.Smith & M.Knott: On the optimal transportation of distributions. J. Optim. Theory Appl., 52 (1987), 323–329.


[41] C.Smith & M.Knott: On Hoeffding–Fréchet bounds and cyclic monotone relations. J. Multivariate Anal., 40 (1992), 328–334.

[42] V.N.Sudakov: Geometric problems in the theory of infinite dimensional distributions. Proc. Steklov Inst. Math., 141 (1979), 1–178.

[43] N.S.Trudinger & X.J.Wang: On the Monge mass transfer problem. Calc. Var. PDE, forthcoming.

[44] B.White: Rectifiability of flat chains. Annals of Mathematics, forthcoming.

[45] L.Zajíček: On the differentiability of convex functions in finite and infinite dimensional spaces. Czechoslovak Math. J., 29 (104) (1979), 340–348.
