
Calculus of Variations

Instructor: Robert Kohn. Compiled by Eduardo Corona

Fall 2009

Contents

1 Introduction

2 The Direct Method
  2.1 Subtleties
    2.1.1 1. The Neumann Problem (choosing the wrong variational problem)
    2.1.2 2. The functional can be unbounded
    2.1.3 Choosing the right f
    2.1.4 3. Sometimes even Dirichlet data can be lost
    2.1.5 4. Nonconvexity of the integrand (in terms of ∇u)
    2.1.6 5. What about local minima? Saddle points?
    2.1.7 6. The choice of function space matters
  2.2 How does the direct method work?
    2.2.1 Does the minimizer satisfy the Euler-Lagrange equations?
  2.3 Suggested Exercises

3 Convex Duality
  3.1 Central Idea (Linear Programming)
  3.2 Linear PDE framework
    3.2.1 Second Pass: Obtaining the Dual Problem
    3.2.2 Examples
  3.3 Take Home Message
  3.4 Suggested Exercises

4 Classic Calculus of Variations: Geodesics
  4.1 Geodesics as a key example of 1D variational problems
    4.1.1 Key Properties of Geodesics
    4.1.2 Regularity
    4.1.3 How does convexity help us?
    4.1.4 Is u differentiable? Does Euler-Lagrange even hold?
    4.1.5 Local Minimal Solution
    4.1.6 Conjugate Points and Jacobi Fields
  4.2 Suggested Exercises

5 Recap and Introduction to Optimal Control
  5.1 Some more notes on 1D variational problems
  5.2 What if convexity fails? What does it mean for convexity to be natural?
  5.3 Brief introduction to Optimal Control
    5.3.1 Hamilton-Jacobi
    5.3.2 Pontryagin Maximum Principle

1st Session, 09/09/09

1 Introduction

What is the calculus of variations? It is the solution of optimization problems over functions of one or more variables. Some of the applications include optimal control and minimal surfaces. A simple minimal surface problem, for example, has the form:

$$\min_{u=\varphi \text{ on } \partial\Omega} \int_\Omega \sqrt{1+\|\nabla u\|^2}\,dx$$

Now, one of the most basic examples from PDE is the Dirichlet problem for a bounded domain $\Omega$:

$$E[u]=\int_\Omega \tfrac12\|\nabla u\|^2+fu\,dx$$

$$\min_{u=\varphi \text{ on } \partial\Omega} E[u] \iff -\Delta u=f \text{ in } \Omega,\quad u=\varphi \text{ on } \partial\Omega$$

Here we see how the variational principle is equivalent to the Laplace equation, which results from the vanishing of the first variation (the Euler-Lagrange equation).

2 The Direct Method

This method concerns itself with the existence of solutions, usually in a space like $W^{1,p}(\Omega)$ (finite energy). It also suggests a numerical method (e.g. Galerkin-Ritz) via minimization over finite-dimensional subspaces.

We first review what we know about the variational approach to the Laplace equation. We take $u\in H^1(\Omega)$, and $f\in L^2(\Omega)$ / $H^{-1}(\Omega)$ (and $g\in H^{1/2}(\partial\Omega)$). Suppose that $E[u]$ achieves a minimum. Then the first variation vanishes:

$$\frac{d}{dt}E[u+tv]\Big|_{t=0}=0\qquad\forall v\in H^1_0(\Omega)$$

This is:

$$\frac{d}{dt}\left\{\int_\Omega\tfrac12\|\nabla u\|^2+t\,\nabla u^\top\nabla v+\tfrac{t^2}{2}\|\nabla v\|^2+fu+tvf\right\}\bigg|_{t=0}=\left\{\int_\Omega\nabla u^\top\nabla v+t\|\nabla v\|^2+vf\right\}\bigg|_{t=0}=\int_\Omega\nabla u^\top\nabla v+vf=0$$


Using integration by parts (Green's second identity), we get:

$$\int_\Omega(-\Delta u+f)\,v=0\qquad\forall v\in H^1_0(\Omega)$$

And so, $u$ is a weak solution of the Laplace equation if and only if the first variation vanishes. To see that $u$ is a classical solution (e.g. $u\in C^{2,\alpha}$ if $f\in C^\alpha$) we need results from PDE regularity theory / Sobolev space theory. We don't easily obtain regularity from the variational principle itself.

Now, in general, let's say $E[u]$ consists of a linear term plus a convex term (in the present case, a quadratic one). Then, for any $w\in H^1(\Omega)$ such that $\gamma(w)=\varphi$,

$$E[w]=E[u]+(\text{1st variation of }E\text{ at }u\text{ in the direction of }w-u)+(\text{2nd variation at some intermediate function})$$

The first variation vanishes (by definition), and since $E$ is convex, the second variation is nonnegative. For the Dirichlet problem:

$$\int_\Omega\tfrac12\|\nabla w\|^2+fw\,dx=\int_\Omega\tfrac12\|\nabla u\|^2+fu\,dx+\int_\Omega\nabla u^\top\nabla(w-u)+(w-u)f+\int_\Omega\tfrac12\|\nabla u-\nabla w\|^2$$

So $E[w]-E[u]\ge0$ for all such $w$, with equality $\iff w=u$.

Remark 1 This existence and uniqueness is most often proved using the Lax-Milgram lemma. However, the minimization of $E$ provides an alternate route, which has the key advantage of generalizing easily to nonlinear PDE.

2.1 Subtleties

The direct method looks easy, but there are a number of subtleties (things that can go wrong if we are not careful with the analysis).

2.1.1 1. The Neumann Problem (choosing the wrong variational problem)

So, can the Neumann problem be solved by

$$\min_{\partial u/\partial n=g\ \text{on}\ \partial\Omega}\ E[u]\ ?$$

The answer is NO. What goes wrong? To illustrate it, we look at a simple 1D example with $f=0$:

$$\min_{u_x(0)=-1,\ u_x(1)=1}\ E[u]$$

For this problem, we can obtain a minimizing sequence $u_\varepsilon$ which is $0$ on $(\varepsilon,1-\varepsilon)$ and linear everywhere else ($\varepsilon-x$ on $(0,\varepsilon)$ and $x-(1-\varepsilon)$ on $(1-\varepsilon,1)$). It is easy to see that $E[u_\varepsilon]=\varepsilon$, and that $u_\varepsilon\to0$. This shows that the infimum is $0$. However, the function $0$ does not fulfill the boundary conditions.
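A quick numerical check of this computation (a minimal sketch of my own; the grid size and the values of $\varepsilon$ are arbitrary choices):

```python
import numpy as np

def u_eps(x, eps):
    """Piecewise linear minimizing sequence: slope -1 near 0, slope +1 near 1."""
    return np.where(x < eps, eps - x, np.where(x > 1 - eps, x - (1 - eps), 0.0))

x = np.linspace(0.0, 1.0, 200001)
for eps in [0.1, 0.01, 0.001]:
    ux = np.gradient(u_eps(x, eps), x)   # approximate u_x
    E = np.trapz(0.5 * ux**2, x)         # E[u] = ∫ (1/2) u_x^2 dx
    print(f"eps = {eps:6.3f}   E[u_eps] ≈ {E:.4f}   (exact value: eps)")
```

The boundary slopes are $\pm1$ for every $\varepsilon$, yet the energy collapses to $0$, exactly as claimed.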


We should have been suspicious from the beginning, because we know that the Neumann problem requires the following compatibility condition (on the data), which can be obtained by Green's theorem:

$$\int_\Omega f=\int_{\partial\Omega}g$$

2.1.2 2. The functional can be unbounded

Now, the correct approach is to minimize the following functional:

$$\min_{u\in H^1(\Omega)}\left\{\int_\Omega\tfrac12\|\nabla u\|^2+fu\,dx-\int_{\partial\Omega}ug\,dS\right\}$$

For a Neumann problem with inconsistent data, we can use $u=C$ (constant); then

$$E[u]=C\left(\int_\Omega f(x)\,dx-\int_{\partial\Omega}g(x)\,dS_x\right)$$

Since the data are incompatible, we can make $C$ go to $\pm\infty$, and so the functional is unbounded below.

Now, if the data are consistent, why is the functional bounded below?

- The basic ingredient we need is the trace estimate:

$$\|\gamma(u)\|_{L^2(\partial\Omega)}\le C\,\|u\|_{H^1(\Omega)}$$

This shows that the trace operator $\gamma:H^1(\Omega)\to L^2(\partial\Omega)$ (or even to $H^{1/2}(\partial\Omega)$) is continuous (bounded).

- Since changing $u$ to $u-c$ does not change the energy, we can focus on functions with zero average. For those functions, we have the Poincaré inequality:

$$\int_\Omega\|\nabla u\|^2\ge C_1\int_\Omega|u|^2$$

We can think of the optimal $C_1$ as the first nonzero eigenvalue of the Neumann Laplacian on $\Omega$ (so $C_1=\lambda_2(\Omega)$, the famous Fiedler eigenvalue, which depends on $\Omega$).

- And so, using both estimates and the fact that $\int_\Omega fu-\int_{\partial\Omega}gu$ is a continuous linear functional,

$$E[u]\ge C_1\|u\|_{1,\Omega}^2-C_2\|u\|_{1,\Omega}$$

The right-hand side is bounded below, and so our functional is, too.


2.1.3 Choosing the right f

So, another example: for a Dirichlet problem, can we allow $f$ to be a delta function? Is the following problem bounded below?

$$\min_{u=\varphi\ \text{on}\ \partial\Omega}\left\{\int_\Omega\tfrac12\|\nabla u\|^2\,dx+u(x_0)\right\}$$

The answer is dimension dependent. For $n=1$, the Sobolev embedding tells us that $H^1\hookrightarrow C$, and so point evaluation is continuous ($\delta\in H^{-1}$). However, for $n\ge2$ this is not the case, and we can come up with a sequence for which $u(x_0)$ is arbitrarily large in absolute value while $\int\|\nabla u\|^2$ goes to $0$.

2.1.4 3. Sometimes even Dirichlet data can be lost

We now study the minimal surface problem with Dirichlet boundary conditions. The functional is not quadratic anymore, but it is still convex. Now, let $\Omega$ be the annulus between $r=a$ and $r=b$, and let us assume $u$ is radial. We impose the Dirichlet conditions $u(a)=A$ and $u(b)=B$.

The problem is that, if $A-B$ gets too large, the minimizer exists, but it loses the boundary conditions. The solution goes from $B$ to $A$, but the slope becomes infinite at some point. The function $\sqrt{z^2+1}$ equals $z+O(1/z)$ when $z$ is large, and so for large gradients it is as if

$$\sqrt{1+\|\nabla u\|^2}\approx\|\nabla u\|,$$

i.e. the integrand grows only linearly in the gradient.

[Figure: $\sqrt{1+z^2}$ against its linear asymptote.]

Hence, if the gradient becomes too big, the minimizer absorbs a "jump", and the boundary conditions are lost. Obviously, there is a problem with the functional. It can be fixed by considering instead the relaxed problem

$$\min\ \int_\Omega\sqrt{1+\|\nabla u\|^2}+\int_{\partial\Omega}|u-\varphi|\,dS$$


which correctly accounts for the vertical pieces at the boundary. The key points are that the relaxed problem achieves a minimum, and that minimizing sequences of the original problem converge to a minimizer of the relaxed problem.

2.1.5 4. Nonconvexity of the integrand (in terms of ∇u)

In this case, the minimizing sequence can develop oscillations, and the minimum may not be achieved. To illustrate this, consider the problem:

$$\min_{u(0)=u(1)=0}\left\{\int_0^1(u_x^2-1)^2+u^2\,dx\right\}=\min_{u(0)=u(1)=0}F(u)$$

For the first term to be $0$, we would require $|u_x|=1$; for the second one, $u=0$. This is clearly impossible. However, we can come up with an oscillating sequence $u_\varepsilon$ (piecewise linear, with slopes $1$ and $-1$) such that $F[u_\varepsilon]\to0$.

[Figure: the double-well function $(x^2-1)^2$.]


[Figure: an oscillating minimizing sequence $u_\varepsilon$.]
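A minimal numerical sketch of this collapse (the particular sawtooth and grid are my own choices): a triangle wave with slopes $\pm1$ and period $2\varepsilon$ has $(u_x^2-1)^2\equiv0$ and $|u|\le\varepsilon$, so $F[u_\varepsilon]=O(\varepsilon^2)$.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 200001)
for eps in [0.1, 0.01, 0.001]:                       # eps = 1/(2k), so u(1) = 0
    u = eps - np.abs((x % (2 * eps)) - eps)          # triangle wave, amplitude eps
    ux = np.where((x % (2 * eps)) < eps, 1.0, -1.0)  # exact slopes ±1
    F = np.trapz((ux**2 - 1)**2 + u**2, x)           # first term vanishes identically
    print(f"eps = {eps:6.3f}   F[u_eps] ≈ {F:.2e}   (exact: eps^2/3 = {eps**2/3:.2e})")
```

The sequence converges uniformly to $0$, but $u\equiv0$ has $F=1$: the limit does not attain the infimum.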

2.1.6 5. What about local minima? Saddle points?

We will concern ourselves later with the study of local minima and saddle points. However, it is important to point out that, for nonconvex problems, whether or not a function is a "local minimum" depends on the meaning of local. For example, let us consider:

$$\min_{u(0)=a,\ u(1)=b}\left\{\int_0^1(u_x^2-1)^2\,dx\right\}$$

With $W(t)=(t^2-1)^2$, the Euler-Lagrange equation is:

$$\frac{d}{dx}\left[W'(u_x)\right]=0$$

Now, if $u$ is a linear function, then it certainly satisfies the equation. Our claim is that if $u_x=c$ and $W''(c)>0$ (local convexity), then $u$ achieves a local minimum in the $C^1$ topology, but not in the $L^\infty$ topology (if $|u_x|<1$).

2.1.7 6. The choice of function space matters

(Example by Zhikov, paper on the webpage.) Let $\Omega=B_1(0)$ and, in polar coordinates $(r,\theta)$, let $p(x)=p_1$ for $\theta\in(0,\pi/2)\cup(\pi,3\pi/2)$ and $p(x)=p_2$ for $\theta\in(\pi/2,\pi)\cup(3\pi/2,2\pi)$. Then we try to solve the following variational problem:

$$\min_{u=\varphi\ \text{at}\ \partial\Omega}\left\{\int_{B_1(0)}\|\nabla u\|^{p(x)}\,dx\right\}$$


If $p_1<p_2$, it follows that the minimum over $u\in W^{1,p_1}$ is less than or equal to the minimum over $W^{1,p_2}$ (since we are minimizing over a larger class of functions). Furthermore, it can be shown that the inequality is strict!

(Example by Manià.) Let us consider the following one-dimensional problem:

$$\min_{u(0)=0,\ u(1)=1}\left\{\int_0^1(u^3-x)^2u_x^6\,dx\right\}$$

This functional achieves its minimum (value $0$) at $u(x)=x^{1/3}$; but if we restrict ourselves to $u\in C^1(0,1)$, the infimum is bounded away from $0$.

2.2 How does the direct method work?

Usually, one wants to consider the following problem: for $W(x,u,\nabla u)$ measurable in $x$, continuous in $u$, and convex with a certain growth condition in $\nabla u$, find:

$$\min_{u=\varphi\ \text{at}\ \partial\Omega}\left\{\int_\Omega W(x,u,\nabla u)+fu\,dx\right\}$$

To make our lives easier, we simplify it to $W(\nabla u)$, where there exist $C_1,C_2>0$ such that:

$$C_1(\|\xi\|^p-1)\le W(\xi)\le C_2(1+\|\xi\|^p)$$

for $p\in(1,\infty)$. This growth condition tells us that the natural space to work with is $W^{1,p}(\Omega)$. We now proceed to ask the following questions:

1. Is the functional well defined and continuous on $W^{1,p}(\Omega)$?

2. Is it bounded below?

3. Let $m$ be the infimum, and let $\{u_k\}\subset W^{1,p}(\Omega)$ be such that $E[u_k]\to m$.

(a) Show that $\|u_k\|_{1,p,\Omega}$ is uniformly bounded.

(b) From functional analysis, we know that this implies $u_k$ has a weakly convergent subsequence, with limit $u^*$.

4. Is $E$ lower semicontinuous (can the value of $E$ "jump up" at $u^*$)? That is:

$$E[u^*]\le\liminf_k E[u_k]$$

5. (Bonus) Does a Galerkin / finite element numerical scheme converge?

Details (from lecture notes):


1. To be sure that the functional is well defined, we can impose some restrictions on $\varphi$ and $f$. For $\varphi$, we can assume, for example, that there exists $\Phi\in W^{1,p}(\Omega)$ such that $\gamma_0(\Phi)=\varphi$. Then, our problem can be written as:

$$\min\left\{\int_\Omega W(\nabla\Phi+\nabla\tilde u)+f(\Phi+\tilde u)\,dx\ :\ \tilde u=u-\Phi\in W^{1,p}_0(\Omega)\right\}$$

For $f$, we only require that $\int fu$ be a continuous linear functional, that is:

$$\int_\Omega fu\le C\,\|u\|_{1,p,\Omega}$$

It is sufficient then to ask for $f\in L^q$ (in general, we need $f\in(W^{1,p})^*=W^{-1,q}$).

2. By the growth hypothesis, we have:

$$\int_\Omega W(\nabla u)\,dx\ge C\int_\Omega\|\nabla u\|^p-K$$

But $u=\Phi+\tilde u$, with $\tilde u\in W^{1,p}_0(\Omega)$. By the Poincaré inequality,

$$\int_\Omega|\tilde u|^p\le C\int_\Omega\|\nabla\tilde u\|^p\qquad\forall\,\tilde u\in W^{1,p}_0(\Omega)$$

And so,

$$\|u\|_{1,p,\Omega}\le\|\Phi\|_{1,p,\Omega}+\|\tilde u\|_{1,p,\Omega}\le C\left(\int_\Omega\|\nabla\tilde u\|^p\right)^{1/p}+K\le C\left(\int_\Omega\|\nabla u\|^p\right)^{1/p}+K$$

We then have

$$\int_\Omega W(\nabla u)+fu\,dx\ge C_1\|u\|_{1,p,\Omega}^p-C_2\|u\|_{1,p,\Omega},\qquad C_1>0$$


[Figure: the function $x^p-x$; for $p>1$ it is bounded below on $z\ge0$.]

Since $\min_{z\ge0}\ c_1z^p-c_2z$ is finite for $p>1$, we see that the functional is bounded below. Moreover, for any $\lambda$, the set

$$\left\{u\ :\ \int_\Omega W(\nabla u)+fu\,dx\le\lambda,\ u=\varphi\ \text{at}\ \partial\Omega\right\}$$

is a bounded set in the $W^{1,p}(\Omega)$ norm.

3. From step 2, it follows that we can take a minimizing sequence, and any such sequence stays bounded in the $W^{1,p}$ norm. For $p>1$, the unit ball is weakly compact, and so there exists a weakly convergent subsequence $u_{k_j}\rightharpoonup u^*$.

4. To show $u^*$ is a minimizer, we use that a closed, convex subset of $W^{1,p}$ is weakly closed. We apply this to:

$$\left\{u\in W^{1,p}(\Omega)\ :\ \gamma_0(u)=\varphi\ \text{and}\ \int_\Omega W(\nabla u)+fu\,dx\le m+\varepsilon\right\}$$

for any $\varepsilon>0$. Then, it follows that:

$$E[u^*]\le\liminf_{k_j}E[u_{k_j}]=m$$

Since $u_{k_j}$ is a minimizing sequence, $u^*$ must then achieve the minimum.


2.2.1 Does the minimizer satisfy the Euler-Lagrange equations?

With no further hypotheses, the Euler-Lagrange equation can be very singular, e.g. if

$$W(\nabla u)=\begin{cases}2\|\nabla u\|, & \|\nabla u\|\le1\\ 1+\|\nabla u\|^2, & \|\nabla u\|>1\end{cases}$$

then $W$ is $C^1$ except at $0$, but the Euler-Lagrange equation is:

$$-\mathrm{div}\left(\frac{\partial W}{\partial(\nabla u)}\right)+f=0$$

which is undefined at $\nabla u=0$. For problems like this, we need convex duality as a substitute for the Euler-Lagrange equations.

However, if $W$ is sufficiently differentiable then there is no problem: suppose that $\xi\mapsto W(\xi)$ is differentiable and that:

$$\left|\frac{\partial W}{\partial\xi}\right|\le C(|\xi|^{p-1}+1)$$

Then the usual derivation of the Euler-Lagrange equation is justified (we can differentiate under the integral), since for $u,v\in W^{1,p}(\Omega)$,

$$\int_\Omega\frac{d}{dt}W(\nabla u+t\nabla v)=\int_\Omega\left\langle\frac{dW}{d(\nabla u)}(\nabla u+t\nabla v),\,\nabla v\right\rangle\le\left\|\frac{dW}{d(\nabla u)}(\nabla u+t\nabla v)\right\|_{q,\Omega}\|\nabla v\|_{p,\Omega},$$

which can be controlled independently of $t$ by the $W^{1,p}$ norms of $u$ and $v$ (by the growth estimate assumption, since $q(p-1)=p$).

Remark 2 The hypothesis that $W(\nabla u)$ be a convex function of $\nabla u$ is a natural one when $u$ is scalar valued, but not when $u$ is vector valued. In fact, in the vector-valued setup, the functional can be lower semicontinuous even if $W$ is not convex. We will return to that later.

2.3 Suggested Exercises

1. Two examples were given where the direct method fails because the boundary condition is lost in the limit (examples A and C). Which steps in our proof of the direct method fail in each case, and why?

Solution 3 (Sinziana) (A) was an example of the direct method failing to solve the Laplace equation $-\Delta u+f=0$ with Neumann boundary conditions $\partial u/\partial n=g$ at $\partial\Omega$. In the 1D case, take $\Omega=[0,1]$, $f=0$, $u_x(0)=-1$, $u_x(1)=1$, and solve

$$\min_{u_x(0)=-1,\ u_x(1)=1}\int_0^1\tfrac12u_x^2\,dx$$


A minimizing sequence is

$$u_\varepsilon=\begin{cases}-x+\varepsilon & 0\le x\le\varepsilon\\ 0 & \varepsilon\le x\le1-\varepsilon\\ x-(1-\varepsilon) & 1-\varepsilon\le x\le1\end{cases}$$

and $E[u_\varepsilon]=\varepsilon$. Taking $\varepsilon\to0$, the minimum value is $0$, so the boundary conditions are lost.

Following the steps of the direct method with $W(\nabla u)=\tfrac12u_x^2$, we see that $W$ is well defined, continuous on $H^1[0,1]$, convex, and bounded below by $0$. Let $S$ be the space of functions which satisfy the boundary conditions. Since $W$ is bounded below, there exists a uniformly bounded minimizing sequence $u_{\varepsilon_k}\in S$ such that $\|u_{\varepsilon_k}\|_{W^{1,p}}\le M$; by weak compactness of the unit ball for $p>1$, this sequence has a weakly convergent subsequence $u_{\varepsilon_{k_j}}\rightharpoonup u^*$ in $H^1$. However, the above shows that $u^*=0\notin S$ (since $u_{\varepsilon_k}\to0$), i.e. the space $S$ is not weakly closed: $u^*$ escapes from it and loses the boundary conditions. This is where the direct method fails.

Remark: The map $f\mapsto\frac{\partial f}{\partial n}\big|_{\partial\Omega}=g$ is not a continuous functional on $H^1$, so this had no hope of working. This is because $u\in H^1\Rightarrow\nabla u\in L^2\Rightarrow\mathrm{tr}(\nabla u)\in H^{-1/2}$, and we need at least $H^{1/2}$ for continuity.

(C) was a minimal surface problem with Dirichlet boundary conditions, the surface given by the graph of $u$. This example shows that even Dirichlet boundary conditions may be lost.

$$\min_{u|_{\partial\Omega}=\varphi}\int_\Omega(1+|\nabla u|^2)^{1/2}$$

To use the direct method, let $W(\nabla u)=\sqrt{1+|\nabla u|^2}$, so $W$ is convex and therefore lower semicontinuous. Let $\Omega$ be the annulus $\{a<r<b\}$ in $\mathbb R^2$, noting that $\Omega$ is not convex. Assuming $u$ is radial, we impose the boundary condition

$$\varphi=\begin{cases}A & r=a\\ B & r=b\end{cases}$$

with $A,B$ sufficiently large.

We observe that if $|\nabla u|$ is very large, $\sqrt{1+|\nabla u|^2}\approx|\nabla u|$, so $W(\nabla u)$ behaves like the $L^1$ norm of $\nabla u$. Following the steps of the direct method, this implies that the lower bound of the growth condition holds only for $p=1$, i.e. if $\xi$ is large,

$$C_1(\|\xi\|_1-1)\le W(\xi)\le C_2(\|\xi\|_p+1)$$

(we can show the upper bound still holds for $p>1$ by applying Jensen's inequality). $W$ is well defined, continuous, and bounded below. Take a minimizing sequence $u_k\in W^{1,p}$. Since for $p=1$ the unit ball is no longer weakly compact, we cannot extract a weakly convergent subsequence.

What happens here is that when $A-B$ gets large, the minimizer exists, but it loses the boundary conditions. For $A-B$ large enough, the surface has a


vertical piece, so it is no longer a graph. We can show this in 2D. Changing to polar coordinates,

$$E[u]=2\pi\int_a^b r\sqrt{1+u_r^2}\,dr,$$

so the minimization problem becomes

$$\min_{u(a)=A,\ u(b)=B}\int_a^b r\sqrt{1+u_r^2}\,dr$$

Solving this is equivalent to solving the Euler-Lagrange equation

$$\frac{d}{dr}\left[\frac{d}{dp}\left(r\sqrt{1+p^2}\right)\right]=0$$

where $p=u_r$. This gives

$$\frac{d}{dr}\left(\frac{r\,u_r}{\sqrt{1+u_r^2}}\right)=0\ \Rightarrow\ \frac{u_r}{\sqrt{1+u_r^2}}+\frac{r\,u_{rr}}{(1+u_r^2)^{3/2}}=0$$

Calling $y=u_r$ and using separation of variables, this becomes

$$(1+y^2)\,y+r\,y'=0\ \Rightarrow\ \frac{dy}{y(1+y^2)}=-\frac{dr}{r}$$

so that

$$\ln\left|\frac{y}{\sqrt{1+y^2}}\right|=-\ln r+\text{const}\ \Rightarrow\ \left|\frac{y}{\sqrt{1+y^2}}\right|=\frac{C}{r}\ \Leftrightarrow\ y=\pm\frac{C}{\sqrt{r^2-C^2}},$$

and integrating we obtain

$$u(r)=\pm C\log\left(r+\sqrt{r^2-C^2}\right)+D$$

Assuming $A>B$, only the negative solution makes sense. Plugging in the boundary conditions,

$$-C\log\left(a+\sqrt{a^2-C^2}\right)+D=A$$
$$-C\log\left(b+\sqrt{b^2-C^2}\right)+D=B$$

thus

$$A-B=-C\log\left(\frac{a+\sqrt{a^2-C^2}}{b+\sqrt{b^2-C^2}}\right)$$

Here $C\le a$ is determined by the boundary conditions; if $A-B$ is very large, the above equation cannot hold for $a,b$ fixed, and the solution we obtain is discontinuous. Namely, we find $C=a$, $D=B+a\log\left(b+\sqrt{b^2-a^2}\right)$, so that

$$u(r)=\begin{cases}B-a\log\left(\dfrac{r+\sqrt{r^2-a^2}}{b+\sqrt{b^2-a^2}}\right) & r>a\\[2mm] A & r=a\end{cases}$$

In this case, the surface given by the graph of $u$ has a vertical region.
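As a sanity check (a minimal sketch of my own; the values $a=1$, $b=3$ and the normalization $B=0$ are arbitrary), the explicit solution can be verified numerically against the first integral $r\,u_r/\sqrt{1+u_r^2}=\text{const}$ of the Euler-Lagrange equation:

```python
import numpy as np

a, b = 1.0, 3.0
u = lambda r: -a * np.log((r + np.sqrt(r**2 - a**2)) / (b + np.sqrt(b**2 - a**2)))

r = np.linspace(a + 0.05, b, 2001)   # stay off r = a, where u_r blows up
ur = np.gradient(u(r), r)            # u_r, numerically
first_integral = r * ur / np.sqrt(1 + ur**2)
print(first_integral[::500])         # ≈ -a everywhere, up to discretization error
```

Analytically, $u_r=-a/\sqrt{r^2-a^2}$, so $r\,u_r/\sqrt{1+u_r^2}\equiv-a$, and the infinite slope at $r=a$ is exactly the vertical piece described above.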


2. It was mentioned by the end of class that we can solve a variational problem numerically by minimizing the functional over a finite-dimensional subspace (for example, piecewise linear functions on a particular triangulation). Justify this for the special case

$$-\Delta u=f\ \text{in}\ \Omega,\qquad u=0\ \text{on}\ \partial\Omega$$

by showing that for any subspace $V\subset H^1_0(\Omega)$, the solution $w^*$ of

$$\min_{w\in V}\int_\Omega\tfrac12\|\nabla w\|^2+wf$$

satisfies

$$\int_\Omega\tfrac12\|\nabla u-\nabla w^*\|^2=\min_{w\in V}\int_\Omega\tfrac12\|\nabla u-\nabla w\|^2$$

(The right-hand side is easy to estimate, using the smoothness of $u$.) Food for thought: how can this type of result be generalized to $\int W(\nabla u)+fu$ when $W$ is convex but not quadratic?

Solution 4 This result can be generalized by replacing $\|\nabla w\|^2$ with $a(w,w)$, where $a$ is a symmetric, continuous, elliptic bilinear form; it is known as Céa's lemma (the finite-dimensional minimization is known as the Galerkin-Ritz method). First, we note that the minimizer over the entire space, that is,

$$u^*=\arg\min_{u|_{\partial\Omega}=\varphi}\int_\Omega\tfrac12\|\nabla u\|^2+uf,$$

satisfies the Euler-Lagrange equations:

$$\int_\Omega\nabla u^{*\top}\nabla v+fv=0\qquad\forall v\in H^1_0(\Omega)$$

For the exact same reason, the solution $w^*$ satisfies:

$$\int_\Omega\nabla w^{*\top}\nabla w+fw=0\qquad\forall w\in V$$

Now, since $V$ is a subspace of $H^1_0(\Omega)$, we can subtract these two to obtain:

$$\int_\Omega\nabla(u^*-w^*)^\top\nabla w=0\qquad\forall w\in V$$

We note that this is an orthogonality statement: $u^*-w^*\in V^\perp$, and hence $w^*$ is the (energy-)orthogonal projection of $u^*$ onto $V$. In any case, we can


now study the quantity:

$$\frac12\int_\Omega\|\nabla u^*-\nabla w\|^2=\frac12\int_\Omega\langle\nabla(u^*-w),\nabla(u^*-w)\rangle$$
$$=\frac12\int_\Omega\langle\nabla(u^*-w^*+w^*-w),\nabla(u^*-w^*+w^*-w)\rangle$$
$$=\frac12\int_\Omega\langle\nabla(u^*-w^*),\nabla(u^*-w^*)\rangle+\langle\nabla(w^*-w),\nabla(w^*-w)\rangle$$
$$=\frac12\int_\Omega\|\nabla u^*-\nabla w^*\|^2+\|\nabla w^*-\nabla w\|^2,$$

which is essentially the Pythagorean theorem (the cross term vanishes by the orthogonality relation above). We can easily see that the minimum is attained when $\|\nabla w^*-\nabla w\|_{L^2(\Omega)}=0\iff w=w^*$.

Food for thought: Although we can easily generalize this last result for quadratic functionals, it is not straightforward to do so for a general convex $W$. The idea here is to use convex duality or something else to show that if $w^*$ is such that $\int W(\nabla w^*)$ is nearly optimal, then $w^*$ is close to $u^*$. If the Hessian of $W$ is strictly positive definite, then we may use a quadratic approximation for $W$. Assuming $W\in C^2$ and $V=\mathrm{span}\{\phi_1,\dots,\phi_n\}$, the convex minimization problem on $V$ is:

$$\min_{w\in V}\left\{\int_\Omega W(\nabla w)-fw\right\}=\min_{\alpha\in\mathbb R^n}\left\{\int_\Omega W\Big(\sum_{i=1}^n\alpha_i\nabla\phi_i\Big)-\sum_{i=1}^n\alpha_i f\phi_i\right\}$$

The Euler-Lagrange conditions now read:

$$\int_\Omega\nabla\phi_i^\top\,\nabla W\Big(\sum_j\alpha_j\nabla\phi_j\Big)-f\phi_i=0\qquad\forall i,$$

i.e., weakly,

$$-\mathrm{div}\left(\nabla W\Big(\sum_j\alpha_j\nabla\phi_j\Big)\right)=f$$

(Work in progress)
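For the quadratic case, here is a minimal numerical sketch of the Galerkin-Ritz method (my own illustration; the choice $f\equiv1$, the grid, and all names are mine) which also exhibits the projection property of the exercise:

```python
import numpy as np

# Solve  min_{w in V} ∫ (1/2)|w'|^2 - f w  on (0,1), w(0)=w(1)=0, f ≡ 1,
# with V = piecewise linear "hat" functions on a uniform grid.
# Exact solution of -u'' = 1:  u(x) = x(1-x)/2.
n = 8                                   # number of interior nodes
h = 1.0 / (n + 1)
x = np.linspace(h, 1 - h, n)

# Stiffness matrix K_ij = ∫ phi_i' phi_j' (tridiagonal for hat functions)
K = (np.diag(2 * np.ones(n)) - np.diag(np.ones(n - 1), 1)
     - np.diag(np.ones(n - 1), -1)) / h
F = h * np.ones(n)                      # load vector ∫ f phi_i for f ≡ 1

alpha = np.linalg.solve(K, F)           # Galerkin coefficients
u_exact = x * (1 - x) / 2

# Projection property: for this 1D problem the Galerkin solution is the
# energy-orthogonal projection of u, and in fact interpolates u at the nodes.
print(np.max(np.abs(alpha - u_exact)))  # ~ machine precision
```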

3. Does the variational problem

$$\min_{\gamma(u)=\varphi}\left\{\int_\Omega\tfrac12\|\nabla u\|^2+fu\,dx\right\}$$

achieve its minimum when $f=\delta_{x_0}$ for some $x_0\in\Omega$? Hint: $\dim\Omega=n\ge2$ is different from $\dim\Omega=1$.

4. What PDE and boundary condition hold at the critical points of

$$\int_\Omega\tfrac12\|\nabla u\|^2+fu\,dx+\lambda\int_{\partial\Omega}|u|^2\ ?$$


5. Suppose $W$ is strictly convex and $C^2$, and $u$ is a classical solution of:

$$\mathrm{div}\left(\frac{\partial W}{\partial(\nabla u)}(\nabla u)\right)=0\ \text{on}\ \Omega$$

with Dirichlet boundary conditions ($u=\varphi$ at $\partial\Omega$). Show that $u$ achieves

$$\min_{\gamma(u)=\varphi}\int_\Omega W(\nabla u)$$

and that it is the unique (classical) minimizer.

Proof. First we show that $u$ is a critical point of $I(u)=\int_\Omega W(\nabla u)$ by showing that $I'(u)=0$ (first variation). Taking any $v\in C^2$ such that $\gamma(v)=0$, we compute

$$I'(u)=\frac{d}{d\varepsilon}\bigg|_{\varepsilon=0}I(u+\varepsilon v)=\int_\Omega\frac{\partial W}{\partial\nabla u}(\nabla u)\cdot\nabla v=-\int_\Omega\mathrm{div}\left(\frac{\partial W}{\partial\nabla u}(\nabla u)\right)v=0$$

since $u$ satisfies $\mathrm{div}\left(\frac{\partial W}{\partial\nabla u}(\nabla u)\right)=0$ on $\Omega$. Thus $u$ is a critical point of $I$. Since $I$ is a strictly convex functional, it follows that it must be the unique minimizer over all functions in $C^2$ satisfying the boundary conditions.

We can also verify this by writing, for any $w\in C^2$ satisfying the Dirichlet boundary conditions,

$$I(w)=\int_\Omega W(\nabla u+\nabla(w-u))$$
$$=\int_\Omega W(\nabla u)+\frac{\partial W}{\partial\nabla u}(\nabla u)\cdot\nabla(w-u)+(\text{strictly positive term})$$
$$=I(u)-\int_\Omega\mathrm{div}\left(\frac{\partial W}{\partial\nabla u}(\nabla u)\right)(w-u)+(\text{strictly positive term})$$
$$=I(u)+(\text{strictly positive term})>I(u),$$

noting that $w-u$ is zero on the boundary, so the boundary term from the integration by parts vanishes, and the divergence term vanishes because $u$ solves the PDE. Thus $u$ is a global minimizer over functions in $C^2$ satisfying the Dirichlet boundary conditions.

6. Consider $W(u_x)=(u_x^2-1)^2$, and suppose that $W''(b-a)>0$. Show that the linear function $u^*(x)=a+(b-a)x$ is a $C^1$-local minimizer of $\int_0^1W(u_x)\,dx$ subject to $u(0)=a$ and $u(1)=b$, in the sense that for any $v\in C^1(0,1)$ with the same boundary conditions and $\|v-u^*\|_{C^1}$ sufficiently small, we have:

$$\int W(v_x)\,dx>\int W(u^*_x)\,dx$$

(Hint: start by showing that the function $t\mapsto\int_0^1W(u^*_x+t(v_x-u^*_x))\,dx$ is convex.)

2nd Session, 23/09/09

3 Convex Duality

(a) It is a semiautomatic scheme for obtaining lower bounds for the optimization problem:

$$\inf_{BC[u]}\left\{\int_\Omega W(\nabla u)\,dx\right\}$$

(b) It replaces the Euler-Lagrange equations if the problem is convex but nonsmooth. For example,

$$\inf\left\{\int_\Omega\|\nabla u\|\,dx\ :\ \int_\Omega u\,dx=1,\ \gamma_0(u)=0\right\}$$

The convex dual provides necessary and sufficient conditions for optimality.

3.1 Central Idea (Linear Programming)

We consider the standard primal problem:

$$(P)\qquad\min\{c^\top x\ :\ Ax\ge b,\ x\ge0\}$$

We can then derive a "trivial" lower bound by taking a linear combination of the constraints: if $y\ge0$,

$$y^\top Ax\ge y^\top b$$

Regrouping terms, for $y\ge0$ with $A^\top y\le c$, we have that:

$$c^\top x\ge x^\top(A^\top y)\ge b^\top y$$

We call such a $y$ dual admissible, and thus:

$$\min\{c^\top x\ :\ Ax\ge b,\ x\ge0\}\ \ge\ \max\{b^\top y\ :\ A^\top y\le c,\ y\ge0\}$$

Then the dual problem is:

$$(D)\qquad\max\{b^\top y\ :\ A^\top y\le c,\ y\ge0\}$$

So, any dual admissible $y$ gives us a "trivial" lower bound, and the best of those is obtained by maximization.
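A minimal numerical sketch (the particular $A$, $b$, $c$ are my own arbitrary choices; scipy's `linprog` is used as a generic LP solver) verifying that $\min(P)=\max(D)$:

```python
import numpy as np
from scipy.optimize import linprog

A = np.array([[1.0, 2.0], [3.0, 1.0]])
b = np.array([4.0, 6.0])
c = np.array([2.0, 3.0])

# Primal: min c^T x  s.t.  Ax >= b, x >= 0   (linprog uses <=, so negate)
primal = linprog(c, A_ub=-A, b_ub=-b, bounds=[(0, None)] * 2)
# Dual:   max b^T y  s.t.  A^T y <= c, y >= 0  (maximize => minimize -b^T y)
dual = linprog(-b, A_ub=A.T, b_ub=c, bounds=[(0, None)] * 2)

print(primal.fun, -dual.fun)   # both 6.8: no duality gap for feasible LPs
```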


Theorem 5 (Duality Theorem of LP) If the primal and dual problems are feasible, there is no duality gap:

$$\max(D)=\min(P)$$

Moreover, $x^*$ and $y^*$ solve the primal and dual LPs if and only if:

$$Ax^*\ge b,\quad y^*\ge0,\quad\text{with equality in at least one of each componentwise pair;}$$
$$A^\top y^*\le c,\quad x^*\ge0,\quad\text{with equality in at least one of each componentwise pair.}$$

This shows that the dual variables are Lagrange multipliers for the primal's active constraints (complementary slackness).

3.2 Linear PDE framework

First, we work on a basic PDE example of two problems in duality. Assuming that $f\in H^{1/2}(\partial\Omega)$ and $\int_{\partial\Omega}f\,ds=0$, we consider:

$$(P)\qquad\min\left\{\int_\Omega\tfrac12\|\nabla u\|^2-\int_{\partial\Omega}uf\right\}$$

This is equivalent to the Neumann problem. Then the dual (also known as the Thompson variational principle) is:

$$(D)\qquad\max\left\{-\int_\Omega\tfrac12\|\sigma\|^2\ :\ \mathrm{div}(\sigma)=0,\ \gamma_0(\sigma\cdot n)=f\right\}$$

Where does this come from? In our first pass, let us show that for $u$ primal admissible and $\sigma$ dual admissible,

$$-\int_\Omega\tfrac12\|\sigma\|^2\le\int_\Omega\tfrac12\|\nabla u\|^2-\int_{\partial\Omega}uf,$$

and in fact that equality is possible if we choose this pair adequately. In an analogous fashion to the LP case, we can "reorganize the integral" (integration by parts) to obtain:

$$\frac12\int_\Omega\|\sigma-\nabla u\|^2\ge0\iff\int_\Omega\tfrac12\|\sigma\|^2+\int_\Omega\tfrac12\|\nabla u\|^2-\int_\Omega\sigma^\top\nabla u\ge0$$
$$\Rightarrow\ \int_\Omega\tfrac12\|\sigma\|^2+\int_\Omega\tfrac12\|\nabla u\|^2-\int_\Omega\mathrm{div}(u\sigma)\ge0$$
$$\Rightarrow\ \int_\Omega\tfrac12\|\sigma\|^2+\int_\Omega\tfrac12\|\nabla u\|^2-\int_{\partial\Omega}uf\ge0,$$


using $\mathrm{div}(u\sigma)=u\,\mathrm{div}(\sigma)+\sigma^\top\nabla u$, the divergence theorem, and the boundary conditions for both the primal and the dual problems. Hence, the inequality is established, and equality is achieved if and only if the optimal pair $(u^*,\sigma^*)$ satisfies:

$$\sigma^*=\nabla u^*$$

Going back to this problem's Euler-Lagrange equation, $\Delta u^*=0$, $\frac{\partial u^*}{\partial n}=f$, we observe that this holds $\iff\sigma^*=\nabla u^*$, $\mathrm{div}(\sigma^*)=0$ and $\sigma^*\cdot n=f$. Hence, duality does in fact play the same role as the Euler-Lagrange equation in solving this variational problem.

In fact, there is more. Suppose that we have an admissible but not optimal pair $(\hat u,\hat\sigma)$. If the associated "duality gap" is small, that is:

$$0<\int_\Omega\tfrac12\|\nabla\hat u\|^2-\int_{\partial\Omega}\hat uf+\int_\Omega\tfrac12\|\hat\sigma\|^2<\delta,$$

then this pair is close to the optimal one, in the sense that:

$$\int_\Omega\tfrac12\|\hat\sigma-\sigma^*\|^2<\delta,\qquad\int_\Omega\tfrac12\|\nabla\hat u-\nabla u^*\|^2<\delta$$

Remark 6 When doing a numerical approximation by the finite element method / Galerkin, it can be difficult to know how good your approximation is. A primal-dual method can study and solve both primal and dual problems simultaneously; the duality gap then provides a computable error bound, as above.

3.2.1 Second Pass: Obtaining the Dual Problem

How could we have found the dual problem systematically? The answer is that dual pairs are associated with saddle point (Lagrange multiplier flavored) variational problems, which involve switching a max-min / inf-sup. This is easier to see if we consider the generalized problem for $W$ convex:

$$(P)\qquad\min\left\{\int_\Omega W(\nabla u)-\int_{\partial\Omega}uf\right\}$$

From convex analysis, we know that $W$ is a convex function $\iff$ its graph is the envelope of its supporting hyperplanes $\iff$ its epigraph is a convex set $\iff$

$$W(\xi)=\sup_\sigma\left[\langle\sigma,\xi\rangle-W^*(\sigma)\right]$$

where $W^*$ is the Fenchel conjugate / Legendre transform of $W$ (in this case, also a convex function). It is given by:

$$W^*(\sigma)=\sup_\xi\left[\langle\sigma,\xi\rangle-W(\xi)\right]$$

which also tells us that $W^{**}=W$ (assuming $W$ also has a closed epigraph).


Some convex analysis results Some useful properties of the Fenchel conjugate are:

1. (Young's inequality) From the definition of the conjugate, we can see that $W(x)+W^*(y)\ge x^\top y$ for all $x\in\mathrm{Dom}(W)$, $y\in\mathrm{Dom}(W^*)$.

2. ($W$ differentiable) If $W$ is differentiable, let $z\in\mathbb R^n$, $y=\nabla W(z)$; then:

$$W^*(y)=z^\top\nabla W(z)-W(z)$$
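A standard worked example (my own addition, verifiable from property 2): the conjugate of a power of the norm. With $\tfrac1p+\tfrac1q=1$,

$$W(z)=\frac{|z|^p}{p}\ \Rightarrow\ y=\nabla W(z)=|z|^{p-2}z,\qquad W^*(y)=z^\top y-W(z)=|z|^p-\frac{|z|^p}{p}=\frac{|z|^p}{q}=\frac{|y|^q}{q},$$

since $|y|=|z|^{p-1}$ implies $|y|^q=|z|^p$. In particular, $p=q=2$ recovers the self-dual case $W=W^*=\tfrac12|\cdot|^2$ used in Solution 8 below.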

The main use of this conjugate function in convex optimization has to do, of course, with duality theory. For instance, if we want to solve the convex problem:

$$\min\left\{f_0(x)\ :\ f_i(x)\le0,\ i=1,\dots,n;\ \ h_j(x)=0,\ j=1,\dots,m\right\},$$

then the associated Lagrangian function for $\lambda\ge0$ (which we may recall from Karush-Kuhn-Tucker) is:

$$L(x,\lambda,\nu)=f_0(x)+\sum_{i=1}^n\lambda_if_i(x)+\sum_{j=1}^m\nu_jh_j(x)$$

The variables $(\lambda,\nu)$ are known as the Lagrange multipliers, and they are our dual variables. From here, we can define the Lagrange dual function $g(\lambda,\nu)$:

$$g(\lambda,\nu)=\inf_{x\in D}L(x,\lambda,\nu)$$

This is always a concave function, and it assumes the value $-\infty$ whenever the Lagrangian is unbounded below. We can also easily show that the Lagrangian function provides "trivial" lower bounds, since for any feasible $x$ we have that $g(\lambda,\nu)\le L(x,\lambda,\nu)\le f_0(x)$. Finally, the dual problem is:

$$\max\{g(\lambda,\nu)\ :\ \lambda\ge0\}$$

In the particular case in which the $f_i$ and $h_j$ are linear constraints, we can write this in terms of the Fenchel conjugate:

$$g(\lambda,\nu)=\inf_x\{f_0(x)+\lambda^\top(Ax-b)+\nu^\top(Cx-d)\}$$
$$=-\lambda^\top b-\nu^\top d+\inf_x\{f_0(x)+x^\top(A^\top\lambda+C^\top\nu)\}$$
$$=-\left(\lambda^\top b+\nu^\top d+f_0^*(-A^\top\lambda-C^\top\nu)\right)$$

Back to the variational problem We obtain a similar result by substituting $W$ in terms of its Fenchel conjugate in the variational problem:

$$\min_{BC[u]}\max_\sigma\left\{\int_\Omega\sigma^\top\nabla u-W^*(\sigma)\,dx-\int_{\partial\Omega}uf\,dS\right\}$$


One can prove that the minimum and maximum can be swapped, and so this is also:

$$\max_\sigma\min_{BC[u]}\left\{\int_\Omega-\mathrm{div}(\sigma)u-W^*(\sigma)\,dx+\int_{\partial\Omega}\gamma_0(u)\,(\sigma\cdot n-f)\,dS\right\}$$

Holding $\sigma$ fixed, one can see that if $\mathrm{div}(\sigma)\ne0$ or $\sigma\cdot n\ne f$, we can drive each term to $-\infty$. Imposing these two conditions, $u$ disappears and we then have:

$$\max_\sigma\left\{\int_\Omega-W^*(\sigma)\,dx\ :\ \mathrm{div}(\sigma)=0,\ \gamma_0(\sigma\cdot n)=f\right\}$$

Remark 7 For this to make sense, we need $\sigma\cdot n$ to be defined on the boundary. In particular, we want Green's formula to hold, and so if $\sigma\in L^2$ and $\mathrm{div}(\sigma)=0$, we have:

$$\int_{\partial\Omega}(\sigma\cdot n)u=\int_\Omega u\,\mathrm{div}(\sigma)+\int_\Omega\sigma^\top\nabla u=\int_\Omega\sigma^\top\nabla u$$

That is, $\sigma\cdot n$ is a well-defined linear functional on boundary traces of $H^1$ functions, which means $\sigma\cdot n\in H^{-1/2}(\partial\Omega)$.

Why can we swap min and max? In this case, we have that for an admissible pair $(\sigma,u)$, the value of the dual at $\sigma$ is smaller than the value of the primal at $u$. Also, in a similar fashion as for the Laplace Neumann problem, we can find a pair for which equality is possible (no duality gap):

$$\int_\Omega W(\nabla u)\ge\int_\Omega\sigma^\top\nabla u-W^*(\sigma)\quad\text{(by definition)}$$

$$\Rightarrow\quad\int_\Omega W(\nabla u)-\int_{\partial\Omega}uf\ge-\int_\Omega W^*(\sigma),$$

which establishes the inequality. Now, given $u^*$, equality is possible $\iff$ we can choose $\sigma^*$ such that $W(\nabla u^*)=\sigma^{*\top}\nabla u^*-W^*(\sigma^*)$. It is an exercise to check that, given $u^*$, this $\sigma^*$ is admissible by the Euler-Lagrange equations (if $W$ is smooth). That is, if

$$\mathrm{div}\left(\frac{\partial W}{\partial\nabla u}\right)=0,\qquad\frac{\partial W}{\partial\nabla u}\cdot n=f$$

has a solution, then we can take $\sigma^*=\frac{\partial W}{\partial\nabla u}$.

If $W$ is nonsmooth, this is not so easy to check. Basically, if the function inside the min(max(·)) is convex in $u$ and concave in $\sigma$, and some other technical conditions are satisfied, then we get equality (check the Ekeland & Temam book on convex analysis and variational problems with applications to PDE).


3.2.2 Examples

First Dirichlet Eigenvalue of the Laplacian Let us consider the following problem:

$$\lambda^*=\min\left\{\frac{\int_\Omega|\nabla u|^2}{\int_\Omega u^2}\ :\ u|_{\partial\Omega}=0\right\}=\min\left\{\int_\Omega|\nabla u|^2\ :\ u|_{\partial\Omega}=0,\ \int_\Omega u^2=1\right\},$$

since we know that minimizing the Rayleigh quotient yields the smallest eigenvalue of the Laplacian. The trick now is to turn this into a convex problem. We note the following:

1. It is sufficient to consider $u\ge0$ (since the first eigenmode is nonnegative). Hence, we can replace $u$ by $|u|$.

2. Let $\rho=u^2$ ($u=\sqrt\rho$), and rewrite $\lambda^*$ as:

$$\lambda^*=\min\left\{\int_\Omega\frac{|\nabla\rho|^2}{4\rho}\,dx\ :\ \rho\ge0,\ \rho|_{\partial\Omega}=0,\ \int_\Omega\rho=1\right\}$$

It turns out that $\Phi(\xi,t)=|\xi|^2/4t$ is a jointly convex function of $(\xi,t)$ for $t>0$. We can check this directly, or just by computing its representation as a supremum of affine functions:

$$\frac{|\xi|^2}{4t}=\max_\sigma\{\langle\sigma,\xi\rangle-t|\sigma|^2\}$$

At the optimal $\sigma$, we have $\xi=2t\sigma\iff\sigma=\xi/2t$.

3. Once we know this to be a convex problem, we can switch to "autopilot". Using the same procedure as before,

$$\lambda^*=\min_\rho\max_\sigma\left\{\int_\Omega\sigma^\top\nabla\rho-\rho|\sigma|^2\ :\ \rho\ge0,\ \rho|_{\partial\Omega}=0,\ \int_\Omega\rho=1\right\}$$
$$\ge(=)\ \max_\sigma\min_\rho\left\{\int_{\partial\Omega}\rho\,\sigma\cdot n-\int_\Omega\rho\,(\mathrm{div}(\sigma)+|\sigma|^2)\ :\ \rho\ge0,\ \rho|_{\partial\Omega}=0,\ \int_\Omega\rho=1\right\}$$
$$=\max_\sigma\min_\rho\left\{-\int_\Omega\rho\,(\mathrm{div}(\sigma)+|\sigma|^2)\ :\ \rho\ge0,\ \rho|_{\partial\Omega}=0,\ \int_\Omega\rho=1\right\},$$

where the last statement was obtained by swapping max and min, using $\mathrm{div}(\rho\sigma)=\rho\,\mathrm{div}(\sigma)+\sigma^\top\nabla\rho$ and the divergence theorem. The boundary term vanishes (because of the Dirichlet condition on $\rho$), and we can see that the second term can blow up unless $\mathrm{div}(\sigma)+|\sigma|^2$ is constant. Then,

$$\lambda^*\ \ge(=)\ \max_\sigma\left\{\lambda\ :\ \mathrm{div}(\sigma)+|\sigma|^2=-\lambda\right\}$$

The question that arises from this is: can one construct vector fields that fulfill this condition? Indeed, and equality holds. Let $u_0$ be the eigenfunction associated with $\lambda^*$, and $\rho_0=u_0^2$. We then obtain equality using the identity obtained with the Fenchel transform:

$$\sigma^*=\frac{\nabla\rho_0}{2\rho_0}$$
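A quick verification, worked out here from the notes' construction: with $\sigma^*=\nabla\rho_0/2\rho_0=\nabla u_0/u_0$,

$$\mathrm{div}(\sigma^*)+|\sigma^*|^2=\frac{\Delta u_0}{u_0}-\frac{|\nabla u_0|^2}{u_0^2}+\frac{|\nabla u_0|^2}{u_0^2}=\frac{\Delta u_0}{u_0}=-\lambda^*,$$

so $\sigma^*$ is admissible with constant exactly $-\lambda^*$, closing the duality gap.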


($L^1$-$L^\infty$) pairs The problem at hand is now the following:

$$\min\left\{\|\sigma\|_{\infty,\Omega}\ :\ \mathrm{div}(\sigma)=1\ \text{in}\ \Omega\right\}$$

An interpretation of this problem could be the following: $\Omega$ is a flat region of earth, and $\sigma$ is the velocity field of water flow. It is raining uniformly. What is the $\sigma$ that minimizes the (maximum) flow?

(Physics literature) They minimize $\int|\sigma|^{1/2}$ (which is not a norm, and certainly a nonconvex problem). This produces "fractal river basins".

(Mechanics literature) $\sigma$ is a symmetric stress tensor (in plasticity). In that setting the material is flexible for a while, and then its atoms "give up". After that, there is a permanent deformation.

We first rewrite this problem in the equivalent form:

$$-\min\left\{\lambda\ :\ \|\sigma\|_\infty\le1,\ \mathrm{div}(\sigma)=\lambda\right\},$$

where $\lambda^*=-1/\|\sigma^*\|_\infty$, with $\sigma^*$ solving the original problem. Setting up a duality scheme, we have:

$$\max_\sigma\min_u\left\{\int_\Omega\sigma^\top\nabla u\ :\ \|\sigma\|_\infty\le1,\ u|_{\partial\Omega}=0,\ \int_\Omega u=1\right\}$$
$$(=)\ \min_u\max_\sigma\left\{\int_\Omega\sigma^\top\nabla u\ :\ \|\sigma\|_\infty\le1,\ u|_{\partial\Omega}=0,\ \int_\Omega u=1\right\}$$

The first expression can be rewritten as $-\int_\Omega u\,\mathrm{div}(\sigma)=-\lambda\int_\Omega u$, which explains the normalization $\int_\Omega u=1$. Also, the maximum with respect to $\sigma$ is achieved $\iff\sigma=\frac{\nabla u}{|\nabla u|}$, in which case:

$$\min_u\max_\sigma\left\{\int_\Omega\sigma^\top\nabla u\ :\ \|\sigma\|_\infty\le1,\ u|_{\partial\Omega}=0,\ \int_\Omega u=1\right\}=\min_u\left\{\int_\Omega|\nabla u|\ :\ u|_{\partial\Omega}=0,\ \int_\Omega u=1\right\}$$

Although the equality is true, it is not so easy to show when $u$ and $\sigma$ are not so smooth; after all, we might be dividing by $0$. Moreover, the optimal $u$ is often rather singular: as we will see, it is the characteristic function of a set. This equality can then be proved using results from the book by Ekeland & Temam (Convex Analysis and Variational Problems). Similar $L^1$-$L^\infty$ duality problems arise in plasticity and other areas in mechanics; see the book by Duvaut & Lions (Inequalities in Mechanics and Physics).

So, curiously (by "accident"), this minimum is more or less explicitly solvable:

1. First of all, it is easy to see that our problem is equivalent to:

$$\min_u\left\{\frac{\int_\Omega\|\nabla u\|}{\int_\Omega u(x)\,dx}\ :\ u|_{\partial\Omega}=0\right\}$$


2. We may then assume $u\ge0$ and replace $u$ by $|u|$, since this leaves the numerator invariant and only increases the denominator.

3. Now, for $u\ge0$, we have:

$$\int_\Omega u(x)\,dx=\int_\Omega\int_0^{u(x)}dt\,dx=\int_0^\infty A([u\ge t])\,dt$$

by Fubini's theorem (here $A$ denotes area). We also use the coarea formula (proof / intuition?):

$$\int_\Omega f\,\|\nabla\varphi\|\,dx=\int_0^\infty\int_{\{\varphi=\tau\}}f\,ds\,d\tau$$

And so, for $f=1$, we have:

$$\int_\Omega\|\nabla u\|\,dx=\int_0^\infty\int_{\{u=\tau\}}ds\,d\tau=\int_0^\infty l(\partial[u\ge\tau])\,d\tau$$

(where $l$ denotes perimeter).

4. Now, let us consider the special case in which $u=\chi_D$, $D\subset\Omega$ measurable. Then,

$$\frac{\int_\Omega\|\nabla u\|}{\int_\Omega u(x)\,dx}=\frac{l(\partial D)}{A(D)}$$

We can draw some intuition from the fact that, viewed as a distribution, $\chi_D$ has a weak derivative which is essentially the $(n-1)$-dimensional measure on $\partial D$. That is, for $\phi\in C_0^1(\Omega)$,

$$\langle\partial_{x_i}\chi_D,\phi\rangle=-\langle\chi_D,\partial_{x_i}\phi\rangle=-\int_D\partial_{x_i}\phi=-\int_{\partial D}\phi\,\nu_i,$$

using the definition of weak derivative and the divergence theorem. However, there is still the subtle, nontrivial matter of using this to make sense of $\int\|\nabla\chi_D\|$ (since $\nabla\chi_D$ is NOT an $L^1$ function). Presumably this can be done by approximating $\chi_D$ with carefully chosen smooth functions. In any case, we consider the geometric optimization problem:

$$\min_{D\subset\Omega}\left\{\frac{l(\partial D)}{A(D)}\right\}$$

5. Assuming the former optimization problem has minimum value $h$, then using the formulas obtained for the general case, we have that:

$$l(\partial[u\ge t])\ge h\,A([u\ge t])\qquad\forall t,$$

and so,

$$\frac{\int_\Omega\|\nabla u\|}{\int_\Omega u(x)\,dx}=\frac{\int_0^\infty l(\partial[u\ge\tau])\,d\tau}{\int_0^\infty A([u\ge t])\,dt}\ge h$$


Thus, if the geometric problem achieves its minimizer, a corresponding characteristic function is a minimizer for the general optimization problem. For more closely related problems, see Gilbert Strang's preprint "Maximum flows + minimum cuts in the plane" (Journal of Global Optimization). We will inspect just one item from there, which consists of a very efficient proof (due to Grieser) of Cheeger's inequality, as a corollary of our previous result.

An Efficient Proof of Cheeger's Inequality If $\lambda^*$ is the smallest Dirichlet eigenvalue of the Laplacian, and

$$h=\min_{D\subset\Omega}\left\{\frac{l(\partial D)}{A(D)}\right\},$$

then

$$\frac{h^2}{4}\le\lambda^*$$

Proof. From our prior discussion, there exists $\sigma$ such that $\|\sigma\|_\infty\le1$ and $\mathrm{div}(\sigma)=h$. Let $u_0$ be the first eigenfunction. Then:

$$h\int_\Omega u_0^2=\int_\Omega\mathrm{div}(\sigma)\,u_0^2=-\int_\Omega2u_0\,\langle\sigma,\nabla u_0\rangle\le2\int_\Omega|u_0|\,\|\nabla u_0\|\le2\,\|u_0\|_2\,\|\nabla u_0\|_2$$

And so,

$$h\le2\,\frac{\|\nabla u_0\|_2}{\|u_0\|_2}=2\sqrt{\lambda^*}$$
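A concrete sanity check (my own example, not from the lecture): for the unit disk, the Cheeger constant is attained by the disk itself, so

$$h=\frac{l(\partial D)}{A(D)}=\frac{2\pi}{\pi}=2,\qquad \lambda^*=j_{0,1}^2\approx5.78,$$

where $j_{0,1}\approx2.405$ is the first zero of the Bessel function $J_0$; indeed $h^2/4=1\le5.78$.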

3.3 Take Home Message

Convex duality:

- is restricted to convex problems;
- is a substitute for the Euler-Lagrange equation in the nonsmooth case;
- is a semi-automatic machinery to obtain "trivial" lower bounds;
- is a source of alternative characterizations of primal solutions;
- has many applications: we have shown eigenvalue problems, $L^1$-$L^\infty$ pairs, problems in mechanics and physics, and as a bonus the coarea formula and a slick proof of Cheeger's inequality.


3.4 Suggested Exercises

1. Show that the convex dual of

$$\min_{u|_{\partial\Omega}=u_0}\left\{\int_\Omega\tfrac12\|\nabla u\|^2\right\}$$

is

$$\max_{\mathrm{div}(\sigma)=0}\left\{\int_{\partial\Omega}(\sigma\cdot n)\,u_0-\frac12\int_\Omega\|\sigma\|^2\right\}$$

Solution 8 (Jim) We will use that the convex conjugate of the function $f$ defined by

$$f(x)=\tfrac12|x|^2$$

is

$$f^*(y)=\sup_x\left\{\langle x,y\rangle-\tfrac12|x|^2\right\}=\tfrac12|y|^2.$$

We follow the procedure of writing down the minimization problem, substituting the integrand by the maximization problem involving its convex conjugate, interchanging the min and the max, applying partial integration, and inspecting the max-min problem to find the constraints involved in the dual problem:

$$\min_{u|_{\partial\Omega}=u_0}\int_\Omega\tfrac12|\nabla u|^2=\min_{u|_{\partial\Omega}=u_0}\max_\sigma\int_\Omega\langle\sigma,\nabla u\rangle-\tfrac12|\sigma|^2\tag{1}$$
$$\ge\max_\sigma\min_{u|_{\partial\Omega}=u_0}\int_\Omega\langle\sigma,\nabla u\rangle-\tfrac12|\sigma|^2$$
$$=\max_\sigma\min_{u|_{\partial\Omega}=u_0}\int_\Omega(-\mathrm{div}(\sigma))u+\int_{\partial\Omega}(\sigma\cdot n)u_0-\int_\Omega\tfrac12|\sigma|^2$$
$$=\max_{\mathrm{div}(\sigma)=0}\int_{\partial\Omega}(\sigma\cdot n)u_0-\int_\Omega\tfrac12|\sigma|^2.$$

In case $u^*$ is the minimizer of

$$\min_{u|_{\partial\Omega}=u_0}\int_\Omega\tfrac12|\nabla u|^2,$$

we may define $\sigma^*=\nabla u^*$ and see

$$\max_\sigma\ \langle\sigma,\nabla u^*\rangle-\tfrac12|\sigma|^2=\langle\sigma^*,\nabla u^*\rangle-\tfrac12|\sigma^*|^2=\tfrac12|\sigma^*|^2.$$

It follows that


$$\max_\sigma\min_{u|_{\partial\Omega}=u_0}\int_\Omega\langle\sigma,\nabla u\rangle-\tfrac12|\sigma|^2=\max_\sigma\min_{u|_{\partial\Omega}=u_0}\int_\Omega\langle\sigma,\nabla u^*\rangle-\tfrac12|\sigma|^2$$
$$=\max_\sigma\min_{u|_{\partial\Omega}=u_0}\int_\Omega\langle\sigma^*,\nabla u^*\rangle-\tfrac12|\sigma^*|^2$$
$$=\min_{u|_{\partial\Omega}=u_0}\max_\sigma\int_\Omega\langle\sigma^*,\nabla u^*\rangle-\tfrac12|\sigma^*|^2$$
$$=\min_{u|_{\partial\Omega}=u_0}\int_\Omega\tfrac12|\nabla u|^2,$$

where we used in the second line that the maximum value will only be attained for $\mathrm{div}(\sigma)=0$, and that in that case, the value of the integrand does not depend on the actual choice of $u$. Hence, inequality (1) was actually an equality. Summarizing,

$$\min_{u|_{\partial\Omega}=u_0}\int_\Omega\tfrac12|\nabla u|^2=\max_{\mathrm{div}(\sigma)=0}\left\{\int_{\partial\Omega}(\sigma\cdot n)u_0-\int_\Omega\tfrac12|\sigma|^2\right\}.$$

2. Show that the variational problems

$$(P)\qquad\min\left\{\int_\Omega\|\sigma\|\ :\ \mathrm{div}(\sigma)=F\ \text{in}\ \Omega,\ \sigma\cdot n|_{\partial\Omega}=f\right\}$$

and

$$(D)\qquad\max\left\{\int_{\partial\Omega}uf\,dS-\int_\Omega uF\,dx\ :\ \|\nabla u\|_\infty\le1\right\}$$

are a dual pair, if

$$\int_\Omega F\,dx=\int_{\partial\Omega}f\,dS$$

How should $\sigma$ and $\nabla u$ be related if equality is to hold? Hint:

$$\max_{\sigma\in\mathbb R^n}\{\langle\sigma,\xi\rangle-\|\sigma\|\}=\begin{cases}0, & \|\xi\|\le1\\ \infty, & \|\xi\|>1\end{cases}$$

Remark: if $\Omega\subset\mathbb R^2$ and $f=0$, then $(P)$ can be solved explicitly in simple cases using the coarea formula. Why?

3. When we study homogenization, we will consider a periodic "conductivity" $A(x)$, and we will learn that the associated effective conductivity $A_{\mathrm{eff}}$ satisfies:

$$\langle A_{\mathrm{eff}}\,\xi,\xi\rangle=\min_{\varphi\ \text{periodic}}\left\{\frac{1}{|\Omega|}\int_\Omega\langle A(x)(\xi+\nabla\varphi),\,\xi+\nabla\varphi\rangle\right\}$$

($A_{\mathrm{eff}}$ is in general a matrix). Show using duality that:

$$\langle A_{\mathrm{eff}}\,\xi,\xi\rangle=\max_\sigma\left\{\frac{1}{|\Omega|}\int_\Omega2\langle\sigma,\xi\rangle-\langle A^{-1}(x)\sigma,\sigma\rangle\ :\ \mathrm{div}(\sigma)=0,\ \sigma\ \text{periodic}\right\}$$

This can alternatively be written as:

$$2\langle\tilde\sigma,\xi\rangle-\langle A_{\mathrm{eff}}\,\xi,\xi\rangle=\min_\sigma\left\{\frac{1}{|\Omega|}\int_\Omega\langle A^{-1}(x)\sigma,\sigma\rangle\ :\ \mathrm{div}(\sigma)=0,\ \sigma\ \text{periodic},\ \frac{1}{|\Omega|}\int_\Omega\sigma=\tilde\sigma\right\}$$

for any $\xi,\tilde\sigma\in\mathbb R^n$. Optimizing over $\xi$, we get:

$$\langle A_{\mathrm{eff}}^{-1}\,\tilde\sigma,\tilde\sigma\rangle=\min_\sigma\left\{\frac{1}{|\Omega|}\int_\Omega\langle A^{-1}(x)\sigma,\sigma\rangle\ :\ \mathrm{div}(\sigma)=0,\ \sigma\ \text{periodic},\ \frac{1}{|\Omega|}\int_\Omega\sigma=\tilde\sigma\right\}$$

4 Classic Calculus of Variations: Geodesics

4.1 Geodesics as a key example of 1D variational problems

A 1D variational problem often looks like:

$$\min_{BC[u]}\left\{\int_a^bF(t,u(t),\dot u(t))\,dt\right\}\qquad\text{with}\ u:[a,b]\to\mathbb R^n,$$

along with assumptions on $F$. We will emphasize the following:

- Geodesics, as a key example
- The importance of $F$ being convex with respect to $\dot u$
- The role of the second variation and conjugate points
- Regularity of minimizers
- Applications to optimal control: e.g. NASA wants to land the Moon Lander, controlling only a few of the rockets, and minimizing cost, fuel, etc.

A geodesic (by definition) is a curve that (locally) minimizes an arc length functional / metric. In local coordinates (e.g. on the sphere: stereographic projection, Mercator, etc.), if the curve is given by $x(t)$, then the arc length is:

$$\int_a^b|\dot x(t)|\,dt,\qquad\text{where}\quad|\dot x(t)|=\Big(\sum g_{ij}(x(t))\,\dot x_i\dot x_j\Big)^{1/2},$$


with $(g_{ij})$ the Riemannian metric tensor. The classic geodesic problem is then to minimize arc length subject to fixing the endpoints, that is:

$$\min\left\{\int_a^b|\dot x(t)|\,dt\ :\ x(a)=x_0,\ x(b)=x_1\right\}$$

We note that this problem is independent of the parametrization chosen, and that $F$ is not a smooth function at $\dot x=0$, which is quite annoying. A standard "fix" is to consider instead:

$$\min\left\{\frac12\int_a^b|\dot x(t)|^2\,dt\ :\ x(a)=x_0,\ x(b)=x_1\right\},$$

since the minimizer of this problem also minimizes length and has constant velocity. Let $L[x]$ be the length functional and $E[x]$ the quadratic functional (also known as the energy functional). Then an obvious relation can be found using the Hölder inequality:

$$L[x]\le\|\dot x\|_{L^2}\sqrt{b-a}=\sqrt{2(b-a)E[x]},$$

with equality $\iff|\dot x(t)|=c$ constant. For the minimizers, we have the same inequality:

$$\min L\le\sqrt{2(b-a)\min E}$$

Now, to obtain the opposite inequality: given a curve $x(t)$ with length $L$, we know that its constant-speed parametrization has $|\dot x(t)|=\frac{L}{b-a}$, and so:

$$E[x]=\frac{1}{2(b-a)}L[x]^2\quad\Rightarrow\quad\min E=\frac{1}{2(b-a)}(\min L)^2$$

And since the minimizer of $E$ has constant speed, the minimizer of this modified problem also minimizes length.

4.1.1 Key Properties of Geodesics

- They are smooth (if the metric is smooth).
- They are locally paths of shortest length.
- Globally, they need not be paths of shortest length. A canonical example is an arc of a great circle on the sphere.

Remark 9 The discussion above assumed that we had a single coordinate chart, valid along the entire geodesic. Locally, this is true, but it is not necessarily so globally. If needed, we can use different coordinate charts on different parts of the curve.


These properties of geodesics are not special to them, and can be generalized as long as $F$ is strictly convex in $\dot u$. We note that the Euler-Lagrange equations in this setting are obtained from variations

$$u_s(t)=u(t)+s\phi(t),\qquad\phi\in H^1_0([a,b])\ (\phi(a)=\phi(b)=0):$$

$$0=\frac{d}{ds}\bigg|_{s=0}\left\{\int_a^bF(t,u_s,\dot u_s)\,dt\right\}=\int_a^b\frac{\partial F}{\partial u_i}\phi_i(t)+\frac{\partial F}{\partial\dot u_i}\dot\phi_i(t)\,dt\qquad\forall i,\ \forall\phi\in H^1_0([a,b])$$

Integrating by parts, we obtain:

$$\int_a^b\left[\frac{\partial F}{\partial u_i}-\frac{d}{dt}\left(\frac{\partial F}{\partial\dot u_i}\right)\right]\phi_i(t)=0\quad\forall i,\ \forall\phi\in H^1_0([a,b])\iff\frac{\partial F}{\partial u_i}-\frac{d}{dt}\left(\frac{\partial F}{\partial\dot u_i}\right)=0\quad\forall i$$

As we know, these are necessary, but not sufficient, conditions for $u$ to be a minimizer.

4.1.2 Regularity

Also, without assuming strict convexity of $F$ in $\dot u$, we cannot expect the optimal $u$ to be smooth.

1. For example:

$$\min\left\{\int_{-1}^1(u_t^2-1)^2\,dt\ :\ u(-1)=u(1)=1\right\}$$

Clearly, $u(t)=|t|$ is a minimizer for this problem. Also, there are multiple solutions, all nonsmooth: any path that employs only line segments of slope $\pm1$ and fulfills the boundary conditions is a minimizer.

[Figure: the nonsmooth minimizer $u(t)=|t|$.]


2. Using our other favorite 1D example:

$$\min\left\{\int_{-1}^1(u_t-1)^2u_t^2\,dt\ :\ u(-1)=0,\ u(1)=1\right\}$$

Here, we have the alternative of choosing $u_t=1$ or $u_t=0$, and so again we have nonsmooth, continuous solutions, such as $u(t)=\max\{t,0\}$.

[Figure: the nonsmooth minimizer $u(t)=\max\{t,0\}$.]

We note that the Euler-Lagrange equations are valid away from the discontinuity points (of $\dot u$), and also in the limit. Both examples illustrate that convexity alone is not enough: strict convexity is needed to gain regularity.

4.1.3 How does convexity help us?

Assume that $u\in C^2$. Then, differentiating the Euler-Lagrange equation (using the chain rule) yields:

$$\frac{\partial F}{\partial u_i}=\frac{\partial^2F}{\partial t\,\partial\dot u_i}+\sum_j\frac{\partial^2F}{\partial\dot u_i\,\partial u_j}\dot u_j+\sum_j\frac{\partial^2F}{\partial\dot u_i\,\partial\dot u_j}\ddot u_j$$

If $D^2=\left(\frac{\partial^2F}{\partial\dot u_i\,\partial\dot u_j}\right)$, as a matrix, is invertible, we can solve for $\ddot u_j$ in terms of $t$, $u(t)$ and $\dot u(t)$. Hence, strict convexity will be what determines regularity: it does not allow $\dot u$ to "jump" (a similar thing happens for certain elliptic equations).

This result provides intuition; to prove the statement in general, however, we must be more careful. Let $p=\dot u$ and let $q$ denote the dual (momentum) variable:

$$\varphi_j(t,u,q,p)=\frac{\partial F}{\partial p_j}-q_j$$

Then, along a solution, the Euler-Lagrange equations are equivalent to $\varphi\equiv0$. The point of asking for convexity of $F$ with respect to $p$ is that:

$$\varphi\equiv0\iff p=\arg\max_p\{\langle p,q\rangle-F(t,u,p)\},$$

and so $\varphi\equiv0$ globally and uniquely determines $p$.

Now, write $p=\Psi(t,u,q)$ for the solution of $\frac{\partial F}{\partial p_j}=q_j$. By the implicit function theorem, $\Psi$ is a smooth function of $(t,u,q)$ (since $\frac{\partial^2F}{\partial p^2}$ is nondegenerate). But $\varphi\equiv0$ along the solution, and in particular we can take $p=\dot u$. Hence,

$$\dot u(t)=\Psi\left(t,u,\frac{\partial F}{\partial p}\right)$$

We know by the Euler-Lagrange equations that $\frac{\partial F}{\partial p}$ is differentiable along the solution. Hence, if $u$ is differentiable, $\dot u$ is too ($u\in C^1\Rightarrow u\in C^2$).

4.1.4 Is u differentiable? Does Euler-Lagrange even hold?

How do we know $u$ is even once differentiable? Do the Euler-Lagrange equations even hold? This is not to be taken for granted.

3. For example:

$$\min\left\{\int_0^1(u^3-t)^2u_t^6\,dt\ :\ u(0)=0,\ u(1)=1\right\}$$

We know a minimizer is given by $u(t)=t^{1/3}$. (We can even modify this problem so that the cusp of this solution lies in the interior of the domain.) In this case, the Euler-Lagrange equation does hold away from the cusp; however, the naive differentiation under the integral is not justified. Expanding around the minimizer,

$$[(u+\varepsilon\phi)^3-t]^2=(3\varepsilon\phi t^{2/3}+3\varepsilon^2\phi^2t^{1/3}+\varepsilon^3\phi^3)^2,$$

$$\left[\frac{d}{dt}(t^{1/3}+\varepsilon\phi)\right]^6=\left(\tfrac13t^{-2/3}+\varepsilon\phi_t\right)^6=\frac{1}{3^6}\,t^{-4}+\dots$$

The product of the leading coefficients yields a term of order $Ct^{-2}$, which is not even integrable at $t=0$.

So, a simple hypothesis we may incorporate is:

$$C_1(1+\|\dot u\|^p)\le F(t,u,\dot u)\le C_2(1+\|\dot u\|^p),$$

that is, polynomial growth in $\dot u$. Then $F$ being integrable forces $\dot u\in L^p\Rightarrow\dot u+\varepsilon\dot\varphi\in L^p$, and the integral will be finite for small $\varepsilon$. Also,

$$\varepsilon\mapsto\int_a^bF(t,u+\varepsilon\varphi,\dot u+\varepsilon\dot\varphi)\quad\text{is }C^1.$$

To conclude that $u\in C^1$ requires being more organized.


4.1.5 Local Minimal Solution

We will show that the second variation is strictly positive if the variations are restricted to a small enough interval:

$$\frac{d^2}{d\varepsilon^2}\bigg|_{\varepsilon=0}E[u+\varepsilon\varphi]=\int_a^b\varphi^\top F_{u,u}\varphi+2\varphi^\top F_{u,\dot u}\dot\varphi+\dot\varphi^\top F_{\dot u,\dot u}\dot\varphi=\int_a^b\begin{pmatrix}\varphi\\\dot\varphi\end{pmatrix}^{\!\top}\begin{pmatrix}F_{u,u}&F_{u,\dot u}\\F_{u,\dot u}&F_{\dot u,\dot u}\end{pmatrix}\begin{pmatrix}\varphi\\\dot\varphi\end{pmatrix}$$

The assertion is that if $F_{\dot u,\dot u}\ge cI$ for some $c>0$, then the second variation is strictly positive if $\mathrm{supp}(\varphi)$ is restricted to a sufficiently small interval. Using the Poincaré inequality,

$$\int_a^b\dot\varphi^\top F_{\dot u,\dot u}\dot\varphi\ge c\int_a^b\|\dot\varphi\|^2\ge\frac{c}{(b-a)^2}\int_a^b\|\varphi\|^2$$

In particular, for $b-a$ small,

$$\int_a^b\|\varphi\|^2\ll\int_a^b\|\dot\varphi\|^2,$$

and the terms $\varphi^\top F_{u,\dot u}\dot\varphi$ and $\varphi^\top F_{u,u}\varphi$ are a lot smaller ($O(\varepsilon)$ and $O(\varepsilon^2)$, respectively, with $\varepsilon=b-a$).

Is the convexity natural, then? The answer for $u:[a,b]\to\mathbb R^n$ is yes: if $F_{\dot u,\dot u}$ is not positive definite, then the second variation can be negative for a well-chosen $\varphi$. Let $\eta$ be such that $\eta^\top F_{\dot u,\dot u}\eta<0$ at $t_0$. We then construct a function with support $B_\varepsilon(t_0)$:

$$\dot\varphi=\eta\left[\chi_{[t_0-\varepsilon,t_0]}(t)-\chi_{[t_0,t_0+\varepsilon]}(t)\right]$$

[Figure: the resulting hat function $\varphi$ supported on $B_\varepsilon(t_0)$.]

To compute the second variation in the direction of $\varphi$, we observe that $\dot\varphi^\top F_{\dot u,\dot u}\dot\varphi\approx\eta^\top F_{\dot u,\dot u}(t_0)\,\eta<0$, while $\varphi^\top F_{u,\dot u}\dot\varphi=O(\varepsilon)$ and $\varphi^\top F_{u,u}\varphi=O(\varepsilon^2)$. Thus, it will be negative for $\varepsilon$ sufficiently small.

4.1.6 Conjugate Points and Jacobi Fields

Let $u(t)$ be a solution of the Euler-Lagrange equation. Fixing the endpoints $[a,b]$, we know that on $[a,a']$, for $a'$ close enough to $a$, the solution has a positive second variation. We observe that, as $a'\to b$, the positivity of the second variation may be lost (e.g. in an eigenvalue problem, $\lambda$ goes to $0$ and then goes negative). If it is lost, the first point at which this happens is called a conjugate point (e.g. the antipodal point on a great circle on the sphere).

To make things a bit more rigorous, we first pose the following variational problem, to attempt to minimize the second variation over the direction $\varphi$:

$$\min\left\{\int_a^b\begin{pmatrix}\varphi\\\dot\varphi\end{pmatrix}^{\!\top}\begin{pmatrix}F_{u,u}&F_{u,\dot u}\\F_{u,\dot u}&F_{\dot u,\dot u}\end{pmatrix}\begin{pmatrix}\varphi\\\dot\varphi\end{pmatrix}\ :\ \varphi(a)=\varphi(b)=0\right\}$$

If the second variation is negative in some direction, the true minimum is $-\infty$ (by scaling $\varphi$).

Definition 10 $\varphi$ is a Jacobi field if it solves the Euler-Lagrange equation for this variational problem, and $\varphi\ne0$.

Definition 11 We say $(t_c,u(t_c))$ is a conjugate point if there exists a Jacobi field (on $[a,t_c]$).

Claim 12 (i) If $\varphi$ is a Jacobi field, then the second variation in the direction of $\varphi$ is $0$. (ii) Beyond the first conjugate point, the path ceases to be minimal (there is a negative direction for the second-variation quadratic form).

Proof. (i) The Euler-Lagrange equations for this problem read:

$$F_{u,u}\varphi+F_{u,\dot u}\dot\varphi=\frac{d}{dt}\left[F_{\dot u,u}\varphi+F_{\dot u,\dot u}\dot\varphi\right],$$

which is a homogeneous linear ODE. Now, let $G(t,\varphi,\dot\varphi)=\varphi^\top F_{u,u}\varphi+2\varphi^\top F_{u,\dot u}\dot\varphi+\dot\varphi^\top F_{\dot u,\dot u}\dot\varphi$. Since it is quadratic in the $(\varphi,\dot\varphi)$ variables, it is clear that $G(t,\lambda\varphi,\lambda\dot\varphi)=\lambda^2G(t,\varphi,\dot\varphi)$, and so (Euler's identity for homogeneous functions):

$$\int_a^b\sum_j\varphi_jG_{\varphi_j}+\dot\varphi_jG_{\dot\varphi_j}=\int_a^b2G$$

Integrating by parts on the left-hand side, we have that:

$$\int_a^b\sum_j\varphi_jG_{\varphi_j}+\dot\varphi_jG_{\dot\varphi_j}=\int_a^b\sum_j\varphi_j\left[G_{\varphi_j}-\frac{d}{dt}G_{\dot\varphi_j}\right]=0,$$

by the Euler-Lagrange equation. But then,

$$\int_a^bG=0$$

(ii) Let $c\in(a,b)$ be a conjugate point, and let $\tilde\varphi$ be the extension by $0$ to $[a,b]$ of the corresponding Jacobi field for $[a,c]$. Then, by (i), we know that the second variation vanishes at this $\tilde\varphi$. If the second variation were nonnegative, then $\tilde\varphi$ would be a minimizer, and by regularity it would have to be smooth. But $\varphi$, being a nonzero solution of a linear ODE, cannot satisfy $\varphi(c)=\dot\varphi(c)=0$ (this would imply $\varphi\equiv0$). Thus, the extension by zero $\tilde\varphi$ is not smooth, which is a contradiction.
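A standard illustration (my own addition, for the unit sphere): along a great circle parametrized by arc length, the second-variation problem has $F_{\dot u,\dot u}=I$ and the Jacobi equation reduces to

$$\ddot\varphi+\varphi=0,\qquad\varphi(t)=\sin t,\qquad\varphi(0)=\varphi(\pi)=0,$$

so the antipodal point $t=\pi$ is the first conjugate point, matching the earlier remark that great-circle arcs can fail to be globally shortest.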


4.2 Suggested Exercises

1. Show directly (using the Euler-Lagrange equations) that any extremal for \(\int_a^b |\dot x|^2\,dt\) has constant speed. Here \(|\dot x|^2 = \sum g_{ij}(x(t))\,\dot x_i \dot x_j\).

Solution 13 (Logan) We are dealing with the variational problem

\[
\min_{u(a)=\alpha,\; u(b)=\beta} \int_a^b F(t, u(t), \dot u(t))\,dt,
\]

where \(F : \mathbb{R}\times\mathbb{R}^d\times\mathbb{R}^d \to \mathbb{R}\) is defined by \(F(t,u,p) = \sum_{i,j=1}^d g_{ij}(u)\,p_i p_j\), and \(g\) denotes a Riemannian metric, i.e., a symmetric, positive definite bilinear form on \(\mathbb{R}^d\), depending smoothly on \(u\). If \(u(t)\) is a minimizer for this variational problem, then it must satisfy the Euler-Lagrange equations:

\[
F_u(t,u(t),\dot u(t)) = \frac{d}{dt} F_p(t,u(t),\dot u(t)).
\]

Now, in order to show that the speed of \(u\) is constant, we examine the derivative of \(|\dot u(t)|^2\):

\[
\frac{d}{dt}|\dot u(t)|^2 = \frac{d}{dt} F(t,u(t),\dot u(t))
= F_u\dot u + F_p\ddot u
= \Big(\frac{d}{dt}F_p\Big)\dot u + F_p\ddot u \quad \text{(due to the E-L equations)}
= \frac{d}{dt}\big(F_p\dot u\big).
\]

Here we observe that

\[
F_p = \Big( \frac{\partial F}{\partial p_1}, \dots, \frac{\partial F}{\partial p_d} \Big)
= \Big[ \sum_{i=1}^d 2g_{1i}(u)p_i,\; \dots,\; \sum_{i=1}^d 2g_{di}(u)p_i \Big],
\]

so that

\[
F_p(t,u(t),\dot u(t))\,\dot u = \sum_{i,j=1}^d 2 g_{ji}(u(t))\,\dot u_i(t)\,\dot u_j(t) = 2|\dot u(t)|^2.
\]

Substituting this into our chain of equalities tells us that

\[
\frac{d}{dt}|\dot u(t)|^2 = 2\,\frac{d}{dt}|\dot u(t)|^2,
\]

and hence \(\frac{d}{dt}|\dot u(t)|^2 = 0\), meaning the speed of \(u\) (with respect to \(g\)) is constant.
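A numerical companion (my own discretization, with an assumed conformal metric, not from the notes): minimizing the discretized action directly, then checking that the resulting extremal indeed has nearly constant \(g\)-speed.

```python
# Sketch (assumed setup): minimize the discretized action
#   sum_i g(x_mid) |x_{i+1} - x_i|^2 / dt
# for the conformal metric g_ij(x) = (1 + |x|^2) delta_ij, then verify
# that the discrete extremal has (approximately) constant g-speed.
import numpy as np
from scipy.optimize import minimize

N = 40
P, Q = np.array([0.0, 0.0]), np.array([1.0, 1.0])
dt = 1.0 / N

def g(x):                                  # conformal factor of the metric
    return 1.0 + np.sum(x**2, axis=-1)

def action(z):
    x = np.vstack([P, z.reshape(N - 1, 2), Q])
    dx = np.diff(x, axis=0)
    mid = 0.5 * (x[:-1] + x[1:])
    return np.sum(g(mid) * np.sum(dx**2, axis=1)) / dt

z0 = np.linspace(P, Q, N + 1)[1:-1].ravel()   # straight-line initial guess
res = minimize(action, z0, method="BFGS")

x = np.vstack([P, res.x.reshape(N - 1, 2), Q])
dx = np.diff(x, axis=0)
mid = 0.5 * (x[:-1] + x[1:])
speed = np.sqrt(g(mid) * np.sum(dx**2, axis=1)) / dt   # |x'|_g per cell
print("g-speed along the extremal: min %.4f  max %.4f" % (speed.min(), speed.max()))
```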


2. Show that if \(b\) is conjugate to \(a\), then:

\[
\min\left\{ \int_a^b
\begin{pmatrix} \varphi & \dot\varphi \end{pmatrix}
\begin{pmatrix} F_{u,u} & F_{u,\dot u} \\ F_{u,\dot u} & F_{\dot u,\dot u} \end{pmatrix}
\begin{pmatrix} \varphi \\ \dot\varphi \end{pmatrix}
\;:\; \varphi(a) = \varphi(b) = 0 \right\} = 0
\]

3. When studying waves it is useful to consider paths that minimize travel time, where the wave speed \(c(x)\) is a specified function of the location \(x\). Show that this amounts to considering geodesics in the metric \(g_{ij}(x) := \frac{1}{c(x)^2}\,\delta_{ij}\).

Solution 14 (Danny) This problem is related to Fermat's principle of least time, which postulates that rays of light follow the path which takes the least traveling time. The aim of this exercise is to prove that the wave paths governed by Fermat's principle in an isotropic medium (i.e., a medium in which the wave speed \(c(x)\) at each point \(x\) is direction-independent) are geodesics with respect to the metric \(g_{ij}(x) := \frac{1}{c(x)^2}\delta_{ij}\). In other words, we will show:

Theorem 15 In an isotropic medium, the paths that minimize the traveling time and the geodesics in the metric

\[
g_{ij}(x) := \frac{1}{c(x)^2}\,\delta_{ij} \tag{1}
\]

are exactly the same locally, provided that \(c(x) \geq c_0 > 0\) for some constant \(c_0\), where \(c(x)\) is the wave speed at location \(x\) given by the medium.

Remark 16 (Generalization of Theorem 15) (i) In order to simplify our argument, we assume

\[
c(x) \geq c_0 > 0,
\]

which is natural in the physical world, but not necessary for Theorem 15 to hold. (ii) The full version of Fermat's principle states that the optical paths are extremals corresponding to the traversal time (as action). One can prove that such extremals and the geodesics described in Theorem 15 are exactly the same; see Section 33.3 in [1], for instance.


Proof of Theorem 15. For any two given fixed points \(P, Q \in \mathbb{R}^n\), we denote the space of curves connecting \(P\) and \(Q\) by

\[
\mathcal{A} := \{ \gamma \subset \mathbb{R}^n :\; \gamma = \operatorname{Image}(X) \text{ for some continuous } X : [a,b] \to \mathbb{R}^n \text{ with } X(a) = P,\; X(b) = Q \}.
\]

[Figure 4.2: an element of \(\mathcal{A}\), a curve joining \(P\) and \(Q\).]

On this space \(\mathcal{A}\), we can define an action

\[
I[\gamma] := \int_\gamma |dx|_g = \int_\gamma \frac{1}{c(x)}\sqrt{dx_1^2 + \dots + dx_n^2}.
\]

Indeed, \(I[\gamma]\) is the arc length of \(\gamma\) under the metric \(g_{ij}(x) := \frac{1}{c(x)^2}\delta_{ij}\). In the following, we will show that the traveling time of the wave from \(P\) to \(Q\) along the path \(\gamma\) is \(I[\gamma]\), and hence Theorem 15 follows.

Let \(X : [0,T] \to \mathbb{R}^n\) be a wave path (parameterized by the time \(t\)) traveling from \(P\) to \(Q\) in the given isotropic medium, so that \(X\) satisfies

\[
\begin{cases}
\left| \frac{d}{dt} X(t) \right| = c(X(t)) \\
X(0) = P \\
X(T) = Q.
\end{cases} \tag{2}
\]

Under this notation, \(T\) is the traveling time of the path \(X\). Using the above parametrization and (2), we compute

\[
I[\operatorname{Image}(X)]
= \int_{\operatorname{Image}(X)} \frac{1}{c(x)}\sqrt{dx_1^2 + \dots + dx_n^2}
= \int_0^T \frac{1}{c(X(t))}\sqrt{\Big(\frac{dX_1(t)}{dt}\Big)^2 + \dots + \Big(\frac{dX_n(t)}{dt}\Big)^2}\,dt
= \int_0^T \frac{1}{c(X(t))}\,c(X(t))\,dt = T.
\]

On the other hand, for any \(\gamma \in \mathcal{A}\), since \(c(x) \geq c_0 > 0\), by applying an appropriate reparametrization there exist a constant \(T \in \mathbb{R}_+\) and a function \(X : [0,T] \to \mathbb{R}^n\) satisfying (2) such that \(\gamma = \operatorname{Image}(X)\). Therefore, by the previous computation,

\[
I[\gamma] = T,
\]

which is the traveling time of the wave from \(P\) to \(Q\) along \(\gamma\).

In conclusion, traveling time\((\gamma) = I[\gamma]\), and hence, when \(P\) and \(Q\) are close enough,

the path that minimizes the traveling time from \(P\) to \(Q\)
= minimizer of \(\min_{\gamma\in\mathcal{A}}\) traveling time\((\gamma)\)
= minimizer of \(\min_{\gamma\in\mathcal{A}} I[\gamma]\)
= minimizer of \(\min_{\gamma\in\mathcal{A}}\) arc-length\((\gamma)\) under the metric \(g_{ij}(x) = \frac{1}{c(x)^2}\delta_{ij}\)
= the geodesic from \(P\) to \(Q\) under the metric \(g_{ij}(x) = \frac{1}{c(x)^2}\delta_{ij}\).

Remark 17 (Application of Theorem 15) In a uniform medium, in which \(c(x) \equiv c\) is constant, the metric in (1) becomes \(g_{ij}(x) = \frac{1}{c^2}\delta_{ij}\), so the wave paths are straight lines.

References

[1] Dubrovin, B. A.; Fomenko, A. T.; Novikov, S. P. Modern Geometry: Methods and Applications. Part I. The Geometry of Surfaces, Transformation Groups, and Fields. Translated from the Russian by Robert G. Burns. Graduate Texts in Mathematics, 93. Springer-Verlag, New York, 1984.
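A small numerical illustration of Theorem 15 (an assumed two-media setup, not part of the solution above): across a flat interface between regions with wave speeds \(c_1\) and \(c_2\), minimizing the travel time over the crossing point recovers Snell's law, the classical signature of these geodesics.

```python
# Sketch (assumed setup): Fermat's principle across a flat interface.
# Travel from (0, a) above the interface to (d, -b) below it, crossing
# at (x, 0); minimizing the travel time should recover Snell's law
#   sin(theta1)/c1 = sin(theta2)/c2.
import numpy as np
from scipy.optimize import minimize_scalar

c1, c2 = 1.0, 2.0              # wave speeds above / below the interface
a, b, d = 1.0, 1.0, 2.0

def travel_time(x):
    return np.hypot(x, a) / c1 + np.hypot(d - x, b) / c2

x = minimize_scalar(travel_time, bounds=(0.0, d), method="bounded").x
sin1 = x / np.hypot(x, a)          # sine of the incidence angle
sin2 = (d - x) / np.hypot(d - x, b)
print("sin(theta1)/c1 =", sin1 / c1)
print("sin(theta2)/c2 =", sin2 / c2)   # the two should agree
```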


4. In this lecture we focused on Dirichlet boundary conditions. Suppose instead we impose \(u(a) = \alpha\) and \(u(b) \in M\), where \(M\) is a submanifold. What endpoint condition supplements the Euler-Lagrange equation at \(t = b\)? What is the proper notion of a Jacobi field in this case?

5. (Lecture 4) Let \(u\) be a critical point of \(\int_\Omega W(\nabla u)\,dx\). Assume \(u\) is scalar valued and \(C^2\), and suppose \(W\) is nonconvex at \(\nabla u(x)\) for some \(x \in \Omega\). Show that there exists \(\varphi\) with compact support in \(\Omega\) such that:

\[
\frac{d^2}{d\varepsilon^2}\Big|_{\varepsilon=0} \left[ \int_\Omega W(\nabla(u+\varepsilon\varphi))\,dx \right] < 0
\]

5 Recap and Introduction to Optimal Control

5.1 Some more notes on 1D variational problems

So, our recent discussion has been on 1D variational problems:

• Euler-Lagrange equation, smoothness of solutions

• strict convexity in \(\dot u\)

• growth conditions for \(W(\dot u)\) (similarly for \(W(t,u,\dot u)\))

• convexity as a natural condition for 1D problems

• Jacobi fields and conjugate points

• nonnegative second variation \(\implies\) convexity in \(\dot u\)

• in the Direct Method, a crucial step is lower semicontinuity (lsc), and we showed that convexity is sufficient for this to hold.

Again, let's underline the case in which the EL equation doesn't hold (the Lavrentiev phenomenon):

\[
\min\left\{ \int_0^1 (u^3 - x)^2\,\dot u^6\,dx \;:\; u(0) = 0,\; u(1) = 1 \right\}
\]

Is the solution \(u = x^{1/3}\)? When \(u\) is this singular, the functional at \(u + \varepsilon\phi\) could be undefined (or even \(+\infty\)) for every \(\varepsilon \neq 0\).

We can avoid this if \(W(\dot u)\) satisfies a growth condition. In such a case, the minimum over \(C^1\) functions is \(\neq 0\) (it is bounded away from 0). A fair question we could ask: in a numerical scheme using piecewise polynomials, what value would we get? Also, how can we write a numerical scheme that approaches the different minima? The proof of this is sketched in the notes.


To obtain the value 0, we solve (for \(\lambda \gg 1\)):

\[
\min\left\{ \int_0^1 (u^3 - x)^2 g^6 + \lambda\,|g - \dot u|\;dx \right\}
\]
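To see numerically why a naive scheme misses the minimum (my own experiment on an assumed uniform mesh, not from the notes): evaluating the functional on piecewise-linear interpolants of \(u(x) = x^{1/3}\) gives values that grow like \(O(n)\) rather than tending to 0, which is exactly why a relaxation like the one above is needed.

```python
# Sketch (assumed discretization): evaluate the Mania functional
#   I(u) = int_0^1 (u^3 - x)^2 u'^6 dx
# on the piecewise-linear interpolants u_n of the true minimizer
# u(x) = x^(1/3).  Instead of approaching the minimum value 0, the
# values grow like O(n): the Lavrentiev gap in action.
import numpy as np

def I_on_interpolant(n, quad=200):
    nodes = np.linspace(0.0, 1.0, n + 1)
    vals = nodes ** (1.0 / 3.0)
    total = 0.0
    for i in range(n):                       # integrate cell by cell
        x0, x1 = nodes[i], nodes[i + 1]
        slope = (vals[i + 1] - vals[i]) / (x1 - x0)
        xq = np.linspace(x0, x1, quad)
        uq = vals[i] + slope * (xq - x0)
        f = (uq**3 - xq) ** 2 * slope**6
        dxq = xq[1] - xq[0]
        total += dxq * (np.sum(f) - 0.5 * (f[0] + f[-1]))   # trapezoid rule
    return total

for n in [4, 16, 64, 256]:
    print(f"n = {n:4d}:  I(u_n) = {I_on_interpolant(n):.3f}")
```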

We can also consider smoothness for higher-dimensional problems:

\[
\min_{\mathrm{BC}} \left\{ \int_\Omega W(\nabla u)\,dx \right\}
\]

The Euler-Lagrange equations are:

\[
\operatorname{div}\Big(\frac{\partial W}{\partial \nabla u}\Big) = 0
\iff
\sum_{i,j} \Big(\frac{\partial^2 W}{\partial \nabla_i u\,\partial \nabla_j u}\Big) \frac{\partial^2 u}{\partial x_i \partial x_j} = 0.
\]

The ellipticity of this second-order equation is associated with the convexity of \(W\). The answer is that the solution \(u : \Omega \to \mathbb{R}\) is indeed smooth, but this is a highly nontrivial result (De Giorgi, Nash). Finally, what about \(u : \Omega \to \mathbb{R}^m\)? This case is quite different, since convexity no longer implies smoothness (it is too weak), and it is no longer the natural condition (it is too strong).

5.2 What if convexity fails? What does it mean for convexity to be natural?

So now, let's assume we are in 1D, and that there exist \(\xi_1, \xi_2\) such that

\[
W(\lambda\xi_1 + (1-\lambda)\xi_2) > \lambda W(\xi_1) + (1-\lambda)W(\xi_2).
\]

Now, set \(\bar\xi = \lambda\xi_1 + (1-\lambda)\xi_2\), and consider

\[
\min\left\{ \int W(u_x) + \left|u - \bar\xi x\right|^2 \right\}.
\]

We can then construct a sequence of piecewise linear functions \(u_n\) that zigzag with slopes \(\xi_1\) and \(\xi_2\), making the size of each piece \(\varepsilon_n \to 0\). Then:

\[
\min\left\{ \int W((u_n)_x) + \left|u_n - \bar\xi x\right|^2 \right\} \leq \lambda W(\xi_1) + (1-\lambda)W(\xi_2) + O(\varepsilon_n).
\]

However, by construction, the weak limit \(\bar\xi x\) has larger energy, and so the energy "jumps" in the limit. In other words, lower semicontinuity is broken.

We can do the analogous construction in higher dimensions, as long as the functions are scalar valued. However, the boundary conditions might cause some trouble, and so to fully reproduce the effect of the 1D counterexample, we have to add some form of boundary layers.
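Here is a minimal numerical version of the zigzag construction (an assumed double-well example, not from the notes): take \(W(p) = (p^2 - 1)^2\) with \(\xi_1 = 1\), \(\xi_2 = -1\) and \(\lambda = 1/2\), so \(\bar\xi = 0\); the zigzag energies tend to 0, strictly below the energy of the weak limit \(u \equiv 0\).

```python
# Sketch (assumed example): failure of lower semicontinuity for the
# double-well W(p) = (p^2 - 1)^2 with xi_1 = 1, xi_2 = -1, lambda = 1/2,
# so xi_bar = 0.  The sawtooth u_n (slopes +-1, period 2/n) has energy
# E(u_n) -> 0, while its weak limit u = 0 has energy W(0) = 1.
import numpy as np

def W(p):
    return (p**2 - 1.0) ** 2

def energy(u, x):                      # E(u) = int W(u_x) + u^2
    dx = x[1] - x[0]
    p = np.diff(u) / dx
    umid = 0.5 * (u[:-1] + u[1:])
    return np.sum((W(p) + umid**2) * dx)

x = np.linspace(0.0, 1.0, 200_001)
for n in [5, 50, 500]:
    h = 1.0 / n
    zigzag = np.abs(x / h - np.floor(x / h + 0.5)) * h   # slopes are +-1
    print(f"n = {n:4d}:  E(u_n) = {energy(zigzag, x):.6f}")
print("weak limit u = 0:  E =", energy(np.zeros_like(x), x))
```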


Now, let \(u : \mathbb{R}^n \to \mathbb{R}^m\). Then \(\xi_1\) and \(\xi_2\) are matrices, and all we need is to make sure that, for every direction \(\tau\) tangent to the interface, the tangential derivative \(D_\tau u\) is continuous across the interface:

\[
(\xi_2 - \xi_1)\tau = 0 \iff \xi_2 - \xi_1 = a \otimes n.
\]

Thus, we can prove that lower semicontinuity only implies this rank-one convexity condition. Moreover, this condition is not sufficient.

5.3 Brief introduction to Optimal Control

We will follow two main directions: the study of Hamilton-Jacobi equations and the Pontryagin Maximum Principle.

5.3.1 Hamilton Jacobi

This topic originated in the study of mechanics. However, that point of view by itself is very narrow: optimal control is used nowadays by economists, biologists, aeronautical engineers, mechanical engineers, etc. The typical problem one may encounter looks as follows:

\[
\max\left\{ \int_0^T h(y(s), \alpha(s))\,ds + g(y(T)) \;:\; \dot y(s) = f(y(s), \alpha(s)),\; y(0) = x \right\}
\]

So, we are maximizing a running utility plus a final-state payoff over the control function \(\alpha(s)\). The ODE which appears as our constraint is often called the state equation, and \(\alpha(s)\) is our control. (In a stochastic problem, what we can know at each time is also constrained.) An additional time-discount factor \(e^{-\rho s}\) might also be included.

"Baby" Economic Example (Merton) We have an investor, who investsat an interest rate r. Our control is his consumption rate. His wealth today isx; and he�ll get to "live" T years (really, y(s) � 0). The goal is to maximize histotal discounted utility:

max

�Z �

0

e��sh(�(s))ds : _y(s) = ry � �; y(0) = x

�Where � is the �rst time y = 0, or � = T . h in this case is a concave utilityfunction.Our claim is that this is almost a calculus of variations problem like the ones

we have encountered before. If we substitute �(s) = ry � _y; we have:

max

�Z �

0

e��sh(ry � _y)ds : y(0) = x


Maybe we can alter \(h\) to reflect that \(y \geq 0\) (penalizing \(y = 0\) or \(y \leq 0\) very steeply). In general, we might not be able to solve for the controls like this, and the control might be multivalued.

The first thing to realize is that there is nothing special about time 0, and so we define what is called a value function:

\[
u(x,t) = \max_{\alpha(s)\in A}\left\{ \int_t^\tau e^{-\rho s}\,h(\alpha(s))\,ds + g(y(T)) \;:\; \dot y(s) = ry - \alpha,\; y(t) = x \right\}
\]

This is well defined even if the sup is not achieved. Also, we can derive a PDE for \(u\) (Hamilton-Jacobi-Bellman).

Hamilton-Jacobi-Bellman: derivation We employ the principle of dynamic programming: "just tell me what to do today; I'll ask you again tomorrow."

\[
u(x,t) = \max_{\alpha(s)\in A}\left\{ \int_t^{t+\Delta t} e^{-\rho s}\,h(\alpha(s))\,ds + u(y(t+\Delta t),\,t+\Delta t) \;:\; \dot y(s) = ry - \alpha,\; y(t) = x \right\}
\]

This is a recursive argument, which employs the value function at tomorrow's position and time. Now, for \(\Delta t\) small, we have:

\[
u(x,t) \approx \max_{a\in A}\left\{ h(x,a)\,\Delta t + u(x,t) + u_t(x,t)\,\Delta t + \nabla u^\top f(x,a)\,\Delta t \right\}
\]

And so, to leading order,

\[
u_t + \max_{a\in A}\left\{ h(x,a) + \nabla u(x,t)^\top f(x,a) \right\} = 0,
\]

or equivalently,

\[
u_t + H(\nabla u) = 0, \qquad H(\xi) = \max_{a\in A}\left\{ h(x,a) + \xi^\top f(x,a) \right\}.
\]

If the state equations are translation invariant (functions of the control only, so that \(H\) does not depend on \(x\)), we have a first-order nonlinear PDE for \(u\), with boundary condition \(u(0,t) = 0\) (the investor is broke). This equation is to be solved backwards in time, with the final-time condition \(u(x,T) = g(x)\).

Now, how should we go about solving this PDE? The method of characteristics is good locally, but not so much if we want global solutions.
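As a minimal global alternative (my own toy problem, not from the notes): for the dynamics \(\dot y = a\) with \(|a| \leq 1\) and no running payoff, the HJB equation is \(u_t + |u_x| = 0\) for \(t < T\) with \(u(x,T) = g(x)\), and the value function is \(u(x,t) = \max_{|y-x| \leq T-t} g(y)\). A monotone upwind scheme marched backwards in time reproduces it.

```python
# Sketch (assumed toy problem): solve u_t + |u_x| = 0 backwards in time
# from u(x,T) = g(x), the HJB equation for y' = a, |a| <= 1, h = 0.
# The exact value function is u(x,t) = max over |y-x| <= T-t of g(y);
# we compare the monotone upwind scheme against it at t = 0.
import numpy as np

L, T, nx = 4.0, 1.0, 801
x = np.linspace(-L, L, nx)
dx = x[1] - x[0]
dt = 0.5 * dx                             # CFL condition: dt <= dx
g = lambda y: np.exp(-y**2)               # final-time payoff

u = g(x)                                  # start at t = T, march to t = 0
t = T
while t > 0.0:
    step = min(dt, t)
    Dm = np.zeros_like(u)
    Dp = np.zeros_like(u)
    Dm[1:] = (u[1:] - u[:-1]) / dx        # backward difference
    Dp[:-1] = (u[1:] - u[:-1]) / dx       # forward difference
    # upwind |u_x|, and u(t - dt) = u(t) + dt * |u_x|
    u = u + step * np.maximum(np.maximum(Dp, -Dm), 0.0)
    t -= step

exact = np.array([g(x[np.abs(x - xi) <= T]).max() for xi in x])
print("max error vs exact value function:", np.abs(u - exact).max())
```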

Another class of examples: Minimum Arrival Time We are now trying to solve:

\[
\min_{\alpha(s)\in A}\{\text{time of arrival at some target set}\}
\]

A baby version would be: \(\dot y = \alpha\), \(|\alpha| \leq 1\). Then, starting at some \(x\) in some domain \(\Omega\), the goal is to arrive at the boundary. The dynamic programming relation is:

\[
u(x) = \min\left\{ \Delta t + u(y_\alpha(\Delta t)) \;:\; |\alpha| \leq 1 \right\},
\]

so that

\[
1 + \min_{a\in A} \nabla u^\top f = 0,
\]

and we obtain the eikonal equation:

\[
\|\nabla u\| = 1, \qquad u|_{\partial\Omega} = 0.
\]

We note that the solution, which is the distance function to the boundary, is never (globally) smooth. Furthermore, there are infinitely many piecewise linear weak solutions. However, the viscosity solution is unique, and it agrees with the value function of the optimal control problem.
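The standard way to compute this viscosity solution on a grid is a monotone upwind discretization. Below is a minimal fast-sweeping sketch (a standard method; the details here are my own and assumed) for the unit square; it reproduces the distance function \(\min(x, 1-x, y, 1-y)\), ridges along the diagonals included.

```python
# Sketch (fast sweeping, assumed details): viscosity solution of the
# eikonal equation |grad u| = 1 on the unit square, u = 0 on the boundary.
# The Godunov upwind update plus four alternating sweeps converge to the
# distance function d(x, boundary) = min(x, 1-x, y, 1-y).
import numpy as np

n = 101
h = 1.0 / (n - 1)
BIG = 1e10
u = np.full((n, n), BIG)
u[0, :] = u[-1, :] = u[:, 0] = u[:, -1] = 0.0   # boundary condition

def update(a, b, h):
    # Godunov solve of the upwind quadratic for |grad u| = 1 at one node
    if abs(a - b) >= h:
        return min(a, b) + h
    return 0.5 * (a + b + np.sqrt(2.0 * h * h - (a - b) ** 2))

for sweep in range(4):                          # 4 orderings suffice here
    ir = range(1, n - 1) if sweep % 2 == 0 else range(n - 2, 0, -1)
    jr = range(1, n - 1) if sweep < 2 else range(n - 2, 0, -1)
    for i in ir:
        for j in jr:
            a = min(u[i - 1, j], u[i + 1, j])   # upwind neighbor in x
            b = min(u[i, j - 1], u[i, j + 1])   # upwind neighbor in y
            u[i, j] = min(u[i, j], update(a, b, h))

x = np.linspace(0.0, 1.0, n)
m = np.minimum(x, 1.0 - x)
dist = np.minimum(m[:, None], m[None, :])       # min(x, 1-x, y, 1-y)
print("max |u - dist| =", np.abs(u - dist).max())
```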

Is this of any use? Relationship between the value function and an optimal policy Both for the typical final-time problem and for the minimum arrival time, we have used a formal calculation based on dynamic programming to derive the Hamilton-Jacobi-Bellman (HJB) equation. For example, in the "baby version" of the minimum arrival time problem we arrived at:

\[
\dot y = \alpha(s), \qquad A = \{|\alpha| \leq 1\},
\]
\[
\|\nabla u\| = 1, \qquad u|_{\partial\Omega} = 0,
\]

and the viscosity solution of the HJB equation (which in this case is just an eikonal equation) is \(u = d(x, \partial\Omega)\).

• The formal calculation tells us exactly how the optimal control is related to \(\nabla u(y(s), s)\):

\[
u(x,t) \approx \max_{a\in A}\left\{ \Delta t\, h(x,a) + u(y(t+\Delta t),\,t+\Delta t) \right\}.
\]

Taylor expanding:

\[
0 \approx \max_{a\in A}\left\{ \Delta t\, h(x,a) + u_t\,\Delta t + \nabla u^\top f\,\Delta t \right\}.
\]

• In other words, it specifies a "feedback law" determining the optimal control \(\alpha^*\).

• The method of dynamic programming is like the method of characteristics: along the optimal curve we need to solve an ODE. For this to be possible, we need to know in advance what \(u\) and \(\nabla u\) are. (For example, for the eikonal equation, the characteristic curves are straight lines.)

• We obtain \(u\) from the viscosity solution of the HJB equation.

Verification Argument: Let \(u\) be the value function (the optimal value starting at \(x\) at time \(t\)). Then we establish the following bounds:

\[
\begin{pmatrix} \text{value of a specific strategy} \\ \text{(explicit)} \end{pmatrix}
\;\leq\; u(x,t) \;\leq\;
\begin{pmatrix} \text{proposed formula} \\ \text{for the optimal value} \end{pmatrix}
\]


The upper bound is nontrivial, and needs to be verified using the Hamilton-Jacobi equation.

Let's see this idea applied to the special case of the eikonal equation on a square. Clearly, the optimal strategy is to go to the nearest point on the boundary \(\partial\Omega\) (we divide the square along both diagonals into 4 triangular pieces; on each piece, we just follow a horizontal or vertical line to the nearest edge of the square).

The verification argument starts by integrating the equation along an arbitrary controlled path. Consider any path associated to a control \(\alpha(s)\). Then, where \(u^*\) is smooth:

\[
\frac{d}{ds}\,u^*(y(s)) = \nabla u^*(y(s))^\top \alpha(s) \;\geq\; \min_{|\alpha|\leq 1}\left\{ \nabla u^{*\top}\alpha \right\} = -\|\nabla u^*\|.
\]

In this case,

\[
\frac{d}{ds}\,u^*(y(s)) \;\geq\; -\|\nabla u^*\| = -1.
\]

Hence,

\[
u^*(y(\text{arrival time})) - u^*(x) \;\geq\; -(\text{arrival time}).
\]

By definition, \(u^*\) is 0 at the arrival, so:

\[
u^*(x) \;\leq\; \text{arrival time to the boundary}.
\]

This argument is not entirely valid, since \(u^*\) is never entirely smooth. However, we need only replace \(u^*\) by \(w \in C^1\) with \(w = 0\) on \(\partial\Omega\) and \(-\|\nabla w\| \geq -1 + \varepsilon\) (we "smooth off" the pyramid). Then, repeating this argument, we have:

\[
w(x) \;\leq\; \text{travel time} + O(\varepsilon).
\]

Now, some food for thought. Let's say I have an "incorrect" \(u\), say:

[Figure: a candidate profile \(u\) that does not solve the eikonal equation.]

Then, we must check that the lower bound no longer holds.

Hopf-Lax Principle / Solution Formula We begin with the following equation:

\[
\begin{cases} u_t + H(\nabla u) = 0 & t < T \\ u = g(x) & t = T \end{cases}
\]

with \(H\) convex (or concave). The idea is to recognize this PDE as the HJB equation for a control problem. That is, we would like to find a control problem

\[
\max_{\alpha(s)} \left\{ \int_t^T h(\alpha(s))\,ds + g(y(T)) \right\}, \qquad \dot y = f(\alpha(s)),\; y(t) = x,
\]

whose Hamiltonian, \(H(\nabla u) = \max\{f \cdot \nabla u + h\}\), matches the given \(H\). Given \(H\) convex, let's choose \(f(\alpha) = \alpha\); then \(H\) is the Fenchel conjugate of \(-h\). Hence, the "good choice" of \(h\) for our problem is:

\[
h(\alpha) = -H^*(\alpha) = -\sup_\xi\left\{ \alpha^\top\xi - H(\xi) \right\}.
\]

Hence, there is a unique concave \(h\) such that this control problem has our original equation as its HJB equation.

Claim 18 An optimal path for this problem must have constant velocity.

Proof. Since \(h\) is concave, by Jensen's inequality,

\[
\frac{1}{T-t}\int_t^T h(\dot y(s))\,ds \;\leq\; h\!\left( \frac{1}{T-t}\int_t^T \dot y(s)\,ds \right) = h\!\left( \frac{y(T)-x}{T-t} \right).
\]

The straight path that starts at \(x\) and ends at \(\xi\) at time \(T\) has constant velocity \(\frac{\xi - x}{T-t}\). Hence,

\[
u(x,t) = \max_{\xi\in\mathbb{R}^n}\left\{ (T-t)\,h\!\left( \frac{\xi - x}{T-t} \right) + g(\xi) \right\}.
\]

One can prove that this is the unique viscosity solution of the HJB equation. Furthermore, we can see that when there is more than one optimal \(\xi\), we have a failure of smoothness.
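A brute-force check of the formula (an assumed example, not from the notes): take \(H(p) = p^2/2\), so that \(h(q) = -H^*(q) = -q^2/2\) and \(u(x,t) = \max_\xi \{ g(\xi) - (\xi-x)^2 / (2(T-t)) \}\); finite differences at a point where \(u\) is smooth show the PDE residual \(u_t + H(u_x)\) vanishing.

```python
# Sketch (assumed example): the Hopf-Lax formula for H(p) = p^2/2, where
# h(q) = -H*(q) = -q^2/2 and
#   u(x,t) = max over xi of { g(xi) - (xi - x)^2 / (2 (T - t)) }.
# We evaluate the max by brute force and check the residual u_t + u_x^2/2
# at a point where u is smooth.
import numpy as np

T = 1.0
g = lambda xi: -np.abs(xi)                 # concave final-time data
xi = np.linspace(-5.0, 5.0, 400_001)       # brute-force grid for the max

def u(x, t):
    return np.max(g(xi) - (xi - x) ** 2 / (2.0 * (T - t)))

x0, t0, e = 0.7, 0.5, 1e-3                 # u is smooth here (x0 > T - t0)
ux = (u(x0 + e, t0) - u(x0 - e, t0)) / (2 * e)
ut = (u(x0, t0 + e) - u(x0, t0 - e)) / (2 * e)
print("PDE residual u_t + u_x^2 / 2 =", ut + 0.5 * ux**2)
```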

5.3.2 Pontryagin Maximum Principle

In Jost and Li-Jost, the chapters on optimal control follow this route:

• Hamiltonian mechanics

• a reformulation as an "action" minimization, which yields the variational problem

• the Hamilton-Jacobi-Bellman equation, which gives the optimal solution as a function of the endpoint and time.


The Pontryagin maximum principle takes us back to Hamiltonian mechanics. We have become accustomed to the idea that some variational problems have no solutions. For example, recall the oscillations in the 1D problem:

\[
\min\left\{ \int_0^1 \left( u_x^2 - 1 \right)^2 + u^2\,dx \right\}
\]

We may encounter the same phenomenon in optimal control problems (e.g. having to fire left/right rockets in rapid succession, or the classic case of mixing hot and cold water).

problem has no solution due to oscillatory behavior. For the sake of simplicity,we will see a "watered-down" version in which we assume existence:

u(x; t) = max

(Z T

t

h(y(s); �(s))ds+ g(y(T )) : _y = f(y; �), y(t) = x, � 2 A)

Using Lagrange multipliers,

\[
\max_{\substack{y(t)=x \\ \alpha\in A}}\; \min_{\lambda(s)} \left\{ \int_t^T \lambda(s)\cdot\big[f(y(s),\alpha(s)) - \dot y\big]\,ds + \int_t^T h\,ds + g(y(T)) \right\}
= \min_{\lambda(s)}\; \max_{\substack{y(t)=x \\ \alpha\in A}} \left\{ \int_t^T \lambda(s)\cdot\big[f(y(s),\alpha(s)) - \dot y\big]\,ds + \int_t^T h\,ds + g(y(T)) \right\}
\]

Switching the max with the min is OK if a solution exists. Now, taking the maximum over \(\alpha\) produces the Hamiltonian \(H(\lambda, y) = \max_{\alpha\in A}\{\lambda\cdot f(y,\alpha) + h(y,\alpha)\}\):

\[
\min_{\lambda(s)}\; \max_{y(t)=x} \left\{ \int_t^T \big[H(\lambda(s), y(s)) - \lambda\cdot\dot y\big]\,ds + g(y(T)) \right\},
\]

and integrating by parts,

\[
\min_{\lambda(s)}\; \max_{y(t)=x} \left\{ \int_t^T \big[H(\lambda(s), y(s)) + y\cdot\dot\lambda\big]\,ds - \lambda(T)\cdot y(T) + \lambda(t)\cdot x + g(y(T)) \right\}.
\]

Now, performing the inner maximization with respect to \(y\), the vanishing of the first variation yields:

\[
\frac{\partial H}{\partial y}(\lambda(s), y(s)) = -\dot\lambda,
\]

together with the endpoint condition \(\nabla g(y(T)) = \lambda(T)\). Summarizing:

• \(\alpha^*(s)\) achieves the optimum value \(\max_{\alpha\in A}\{\lambda\cdot f(y,\alpha) + h(y,\alpha)\}\);

• the evolution of \(y(s)\) and \(\lambda(s)\) is governed by:

\[
\begin{cases}
\dot y = \nabla_\lambda H(\lambda(s), y(s)), & y(t) = x \\
\dot\lambda = -\nabla_y H(\lambda(s), y(s)), & \lambda(T) = \nabla g(y(T));
\end{cases}
\]

• we have \(H(\lambda, x) = \max_{\alpha\in A}\{\lambda\cdot f(x,\alpha) + h(x,\alpha)\} = \lambda\cdot f(x, \alpha^*(\lambda,x)) + h(x, \alpha^*(\lambda,x))\).


• Using the envelope theorem, we recover the state equation:

\[
\nabla_\lambda H = f(x, \alpha^*).
\]

• Moreover, the curve \((y(s), \lambda(s))\) is a characteristic curve of the HJB equation!

What would a practical person do with this?

• The Hamilton-Jacobi equation is limited to special problems, or to low space dimensions.

• The Pontryagin Maximum Principle applies more broadly.

• Problem: if we want to solve the Hamiltonian ODE system, we only know \(y(t)\) and \(\lambda(T)\) (a two-point boundary value problem). We use a "shooting method"; a minimal sketch follows at the end of this section.

• We can also discretize the original control problem. People do this, and then solve the resulting optimization problem using conjugate gradients or some other method.

The original Pontryagin maximum principle uses a convexification to deal with the general case.
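Here is the promised shooting sketch (an assumed linear-quadratic example, not from the notes): with \(h(y,\alpha) = -(y^2 + \alpha^2)/2\), \(f(y,\alpha) = \alpha\) and \(g \equiv 0\), we get \(H(\lambda,y) = \lambda^2/2 - y^2/2\) and \(\alpha^* = \lambda\), so the Hamiltonian system is \(\dot y = \lambda\), \(\dot\lambda = y\) with \(y(0) = x\) and \(\lambda(T) = 0\). We shoot on the unknown \(\lambda(0)\), whose exact value is \(-x\tanh T\).

```python
# Sketch (assumed LQ example): shooting method for the Pontryagin system
#   y' = dH/dlambda = lambda,   lambda' = -dH/dy = y,
# with y(0) = x known but lambda known only at the final time,
# lambda(T) = grad g(y(T)) = 0.  We adjust lambda(0) until lambda(T) = 0.
import numpy as np
from scipy.integrate import solve_ivp
from scipy.optimize import brentq

x, T = 1.0, 1.0

def lam_at_T(lam0):
    sol = solve_ivp(lambda s, z: [z[1], z[0]],      # z = (y, lambda)
                    (0.0, T), [x, lam0], rtol=1e-10, atol=1e-12)
    return sol.y[1, -1]                             # lambda(T), want 0

lam0 = brentq(lam_at_T, -10.0, 10.0)                # root of the mismatch
print("shooting lambda(0) =", lam0)
print("exact    lambda(0) =", -x * np.tanh(T))
```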
