
Part IB — Variational Principles

Based on lectures by P. K. Townsend
Notes taken by Dexter Chua

Easter 2015

These notes are not endorsed by the lecturers, and I have modified them (often significantly) after lectures. They are nowhere near accurate representations of what was actually lectured, and in particular, all errors are almost surely mine.

Stationary points for functions on R^n. Necessary and sufficient conditions for minima and maxima. Importance of convexity. Variational problems with constraints; method of Lagrange multipliers. The Legendre Transform; need for convexity to ensure invertibility; illustrations from thermodynamics. [4]

The idea of a functional and a functional derivative. First variation for functionals, Euler-Lagrange equations, for both ordinary and partial differential equations. Use of Lagrange multipliers and multiplier functions. [3]

Fermat’s principle; geodesics; least action principles, Lagrange’s and Hamilton’s equations for particles and fields. Noether theorems and first integrals, including two forms of Noether’s theorem for ordinary differential equations (energy and momentum, for example). Interpretation in terms of conservation laws. [3]

Second variation for functionals; associated eigenvalue problem. [2]


Contents

0 Introduction

1 Multivariate calculus
  1.1 Stationary points
  1.2 Convex functions
    1.2.1 Convexity
    1.2.2 First-order convexity condition
    1.2.3 Second-order convexity condition
  1.3 Legendre transform
  1.4 Lagrange multipliers

2 Euler-Lagrange equation
  2.1 Functional derivatives
  2.2 First integrals
  2.3 Constrained variation of functionals

3 Hamilton’s principle
  3.1 The Lagrangian
  3.2 The Hamiltonian
  3.3 Symmetries and Noether’s theorem

4 Multivariate calculus of variations

5 The second variation
  5.1 The second variation
  5.2 Jacobi condition for local minima of F[x]


0 Introduction

Consider a light ray travelling towards a mirror and being reflected.

[Figure: a light ray travels towards a mirror, is reflected at the point z, and reaches the eye.]

We see that the light ray travels towards the mirror, gets reflected at z, and hits the (invisible) eye. What determines the path taken? The usual answer would be that the reflected angle shall be the same as the incident angle. However, the ancient Greek mathematician Hero of Alexandria provided a different answer: the path of the light minimizes the total distance travelled.

We can assume that light travels in a straight line except when reflected. Then we can characterize the path by a single variable z, the point where the light ray hits the mirror. We let L(z) be the length of the path, and we can solve for z by setting L′(z) = 0.

This principle sounds reasonable: in the absence of mirrors, light travels in a straight line, which is the shortest path between two points. But is it always true that the shortest path is taken? No! We only considered a plane mirror, and this doesn’t hold if we have, say, a spherical mirror. However, it turns out that in all cases, the path is a stationary point of the length function, i.e. L′(z) = 0.
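As a rough numerical illustration of Hero's principle for a plane mirror, the sketch below minimizes L(z) directly and checks that the minimizer has equal angles of incidence and reflection. The geometry (mirror along the line y = 0, source at (0, a), eye at (d, b)) and all names are assumptions made for this example only; they are not taken from the lectures.

```python
import math

def path_length(z, a=1.0, b=2.0, d=3.0):
    """Length of the path source -> mirror point (z, 0) -> eye.

    Hypothetical setup: the mirror is the line y = 0, the source sits at
    (0, a) and the eye at (d, b).
    """
    return math.hypot(z, a) + math.hypot(d - z, b)

def minimise(f, lo, hi, tol=1e-10):
    """Golden-section search for the minimum of a unimodal f on [lo, hi]."""
    inv_phi = (math.sqrt(5) - 1) / 2
    while hi - lo > tol:
        c = hi - inv_phi * (hi - lo)
        e = lo + inv_phi * (hi - lo)
        if f(c) < f(e):
            hi = e
        else:
            lo = c
    return (lo + hi) / 2

z_star = minimise(path_length, 0.0, 3.0)
# Tangents of the angles of incidence and reflection, measured from the normal.
tan_in = z_star / 1.0
tan_out = (3.0 - z_star) / 2.0
print(z_star, tan_in, tan_out)
```

For these values the stationarity condition L′(z) = 0 can be solved by hand to give z = ad/(a + b) = 1, which is what the search should approach.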

Fermat took this principle further. Assuming that light travels at a finite speed, the shortest path is the path that takes the minimum time. Fermat’s principle thus states that

Light travels on the path that takes the shortest time.

This alternative formulation has the advantage that it applies to refraction as well. Light travels at different speeds in different media. Hence when it passes from one medium to another, it changes direction such that the total time taken is minimized.

We usually define the refractive index n of a medium to be n = 1/v, where v is the velocity of light in the medium (in units where the speed of light in vacuum is 1). Then we can write the variational principle as

\[ \text{minimize} \int_{\text{path}} n \, ds, \]

where ds is the path length element. This is easy to solve if we have two distinct media. Since light travels in a straight line in each medium, we can simply characterize the path by the point where the light crosses the boundary. However, in the general case, we should be considering any possible path between two points. In this case, we can no longer use ordinary calculus, and need a new tool: the calculus of variations.

In calculus of variations, the main objective is to find a function x(t) that minimizes an integral $\int f(x)\, dt$ for some function f. For example, we might want to minimize $\int (x^2 + x)\, dt$. This differs greatly from ordinary minimization problems. In ordinary calculus, we minimize a function f(x) for all possible values of x ∈ R. However, in calculus of variations, we will be minimizing the integral $\int f(x)\, dt$ over all possible functions x(t).


1 Multivariate calculus

Before we start calculus of variations, we will first have a brief overview of minimization problems in ordinary calculus. Unless otherwise specified, f will be a function R^n → R. For convenience, we write the argument of f as $\mathbf{x} = (x_1, \cdots, x_n)$ and $x = |\mathbf{x}|$. We will also assume that f is sufficiently smooth for our purposes.

1.1 Stationary points

The quest of minimization starts with finding stationary points.

Definition (Stationary points). Stationary points are points in R^n for which ∇f = 0, i.e.

\[ \frac{\partial f}{\partial x_1} = \frac{\partial f}{\partial x_2} = \cdots = \frac{\partial f}{\partial x_n} = 0. \]

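All minima and maxima are stationary points, but knowing that a point is stationary is not sufficient to determine which type it is. To know more about the nature of a stationary point, we Taylor expand f about such a point, which we assume is 0 for notational convenience.

\[
\begin{aligned}
f(\mathbf{x}) &= f(\mathbf{0}) + \mathbf{x} \cdot \nabla f + \frac{1}{2} \sum_{i,j} x_i x_j \frac{\partial^2 f}{\partial x_i \partial x_j} + O(x^3) \\
&= f(\mathbf{0}) + \frac{1}{2} \sum_{i,j} x_i x_j \frac{\partial^2 f}{\partial x_i \partial x_j} + O(x^3).
\end{aligned}
\]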

The second term is so important that we have a name for it:

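Definition (Hessian matrix). The Hessian matrix is

\[ H_{ij}(\mathbf{x}) = \frac{\partial^2 f}{\partial x_i \partial x_j}. \]

Using summation notation, we can write our result as

\[ f(\mathbf{x}) - f(\mathbf{0}) = \frac{1}{2} x_i H_{ij} x_j + O(x^3). \]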

Since H is symmetric, it is diagonalizable. Thus after rotating our axes to a suitable coordinate system, we have

\[
H'_{ij} =
\begin{pmatrix}
\lambda_1 & 0 & \cdots & 0 \\
0 & \lambda_2 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & \lambda_n
\end{pmatrix},
\]

where λi are the eigenvalues of H. Since H is real symmetric, these are all real. In our new coordinate system, we have

\[ f(\mathbf{x}) - f(\mathbf{0}) = \frac{1}{2} \sum_{i=1}^n \lambda_i (x'_i)^2. \]



This is useful information. If all eigenvalues λi are positive, then f(x) − f(0) must be positive (for small x). Hence our stationary point is a local minimum. Similarly, if all eigenvalues are negative, then it is a local maximum.

If there are mixed signs, say λ1 > 0 and λ2 < 0, then f increases in the x1 direction and decreases in the x2 direction. In this case we say we have a saddle point.

If some λ = 0, then we have a degenerate stationary point. To identify the nature of this point, we must look at even higher derivatives.

In the special case where n = 2, we do not need to explicitly find the eigenvalues. We know that det H is the product of the two eigenvalues. Hence if det H is negative, the eigenvalues have different signs, and we have a saddle. If det H is positive, then the eigenvalues are of the same sign.

To determine if it is a maximum or minimum, we can look at the trace of H, which is the sum of eigenvalues. If tr H is positive, then we have a local minimum. Otherwise, it is a local maximum.

Example. Let $f(x, y) = x^3 + y^3 - 3xy$. Then

\[ \nabla f = 3(x^2 - y,\; y^2 - x). \]

This is zero iff $x^2 = y$ and $y^2 = x$. This is satisfied iff $y^4 = y$. So either y = 0, or y = 1. So there are two stationary points: (0, 0) and (1, 1).

The Hessian matrix is

\[ H = \begin{pmatrix} 6x & -3 \\ -3 & 6y \end{pmatrix}, \]

and we have

\[ \det H = 9(4xy - 1), \qquad \operatorname{tr} H = 6(x + y). \]

At (0, 0), det H < 0. So this is a saddle point. At (1, 1), det H > 0, tr H > 0. So this is a local minimum.
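The determinant-and-trace test for n = 2 is mechanical enough to write down as a short sketch. The function names below are illustrative choices, and the Hessian is hard-coded for the example f(x, y) = x^3 + y^3 − 3xy rather than computed symbolically.

```python
def hessian(x, y):
    """Hessian of f(x, y) = x**3 + y**3 - 3*x*y as ((fxx, fxy), (fxy, fyy))."""
    return ((6 * x, -3), (-3, 6 * y))

def classify(x, y):
    """Classify a stationary point of f using det H and tr H (n = 2 only)."""
    (a, b), (_, d) = hessian(x, y)
    det, tr = a * d - b * b, a + d
    if det < 0:
        return "saddle"
    if det > 0:
        return "local minimum" if tr > 0 else "local maximum"
    return "degenerate"

for point in [(0, 0), (1, 1)]:
    print(point, classify(*point))
```

The test only makes sense at points where ∇f = 0; the sketch assumes the caller has already found the stationary points.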

1.2 Convex functions

1.2.1 Convexity

Convex functions are an important class of functions with a lot of nice properties. For example, stationary points of convex functions are all minima, and a convex function can have at most one minimum value. To define convex functions, we need to first define a convex set.

Definition (Convex set). A set S ⊆ R^n is convex if for any distinct x, y ∈ S, t ∈ (0, 1), we have (1 − t)x + ty ∈ S. Alternatively, any line segment joining two points in S lies completely within S.

[Figure: a non-convex set (left) and a convex set (right).]



Definition (Convex function). A function f : Rn → R is convex if

(i) The domain D(f) is convex

(ii) The function f lies below (or on) all its chords, i.e.

f((1− t)x + ty) ≤ (1− t)f(x) + tf(y) (∗)

for all x,y ∈ D(f), t ∈ (0, 1).

A function is strictly convex if the inequality is strict, i.e.

f((1− t)x + ty) < (1− t)f(x) + tf(y).

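[Figure: the graph of a convex function with the chord joining (x, f(x)) and (y, f(y)); above the point (1 − t)x + ty, the chord has height (1 − t)f(x) + tf(y).]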

A function f is (strictly) concave iff −f is (strictly) convex.

Example.

(i) f(x) = x2 is strictly convex.

(ii) f(x) = |x| is convex, but not strictly.

(iii) f(x) = 1/x defined on x > 0 is strictly convex.

(iv) f(x) = 1/x defined on R∗ = R \ {0} is not convex. For one thing, R∗ is not a convex domain. But even if we extended the definition by setting, say, f(0) = 0, the function would still not be convex, as we see by considering the chord joining (−1, −1) and (1, 1) (and in fact f(x) = 1/x defined on x < 0 is concave).

1.2.2 First-order convexity condition

While the definition of a convex function seems a bit difficult to work with, if our function is differentiable, it is easy to check if it is convex.

First assume that our function is once differentiable, and we attempt to find a first-order condition for convexity. Suppose that f is convex. For fixed x, y, we define the function

h(t) = (1 − t)f(x) + tf(y) − f((1 − t)x + ty).

By the definition of convexity of f, we must have h(t) ≥ 0. Also, trivially h(0) = 0. So

\[ \frac{h(t) - h(0)}{t} \geq 0 \]

for any t ∈ (0, 1). So h′(0) ≥ 0.

On the other hand, we can also differentiate h directly and evaluate at 0:

h′(0) = f(y)− f(x)− (y − x) · ∇f(x).

Combining our two results, we know that

f(y) ≥ f(x) + (y − x) · ∇f(x) (†)

It is also true that this condition implies convexity, which is an easy result.

How can we interpret this result? Viewed as a function of y, the expression f(x) + (y − x) · ∇f(x) describes the tangent plane of f at x. Hence this condition is saying that a convex differentiable function lies above all its tangent planes.

We immediately get the corollary

Corollary. A stationary point of a convex function is a global minimum. There can be more than one global minimum (e.g. a constant function), but there is at most one if the function is strictly convex.

Proof. Given x0 such that ∇f(x0) = 0, (†) implies that for any y,

f(y) ≥ f(x0) + (y − x0) · ∇f(x0) = f(x0).

We can write our first-order convexity condition in a different way. We can rewrite (†) into the form

(y − x) · [∇f(y) − ∇f(x)] ≥ f(x) − f(y) − (x − y) · ∇f(y).

By applying (†) to the right hand side (with x and y swapped), we know that the right hand side is ≥ 0. So we have another first-order condition:

(y − x) · [∇f(y) − ∇f(x)] ≥ 0.

It can be shown that this is equivalent to the other conditions.

This condition might seem a bit weird to comprehend, but all it says is that ∇f(x) is a non-decreasing function. For example, when n = 1, the inequality states that (y − x)(f′(y) − f′(x)) ≥ 0, which is the same as saying f′(y) ≥ f′(x) whenever y > x.

1.2.3 Second-order convexity condition

We have an even nicer condition when the function is twice differentiable. We start with the inequality we just obtained:

(y − x) · [∇f(y) − ∇f(x)] ≥ 0.

Write y = x + h. Then

h · (∇f(x + h) − ∇f(x)) ≥ 0.

Expand the left in Taylor series. Using suffix notation, this becomes

\[ h_i \left[ h_j \nabla_j \nabla_i f + O(h^2) \right] \geq 0. \]

But $\nabla_j \nabla_i f = H_{ij}$. So we have

\[ h_i H_{ij} h_j + O(h^3) \geq 0. \]

For this to hold for all sufficiently small h, the Hessian H must be positive semi-definite (or simply positive), i.e. its eigenvalues are non-negative. If they are in fact all positive, then we say H is positive definite.

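Hence convexity implies that the Hessian matrix is positive for all x ∈ D(f).

The converse is also true: if the Hessian is positive everywhere on a convex domain, then f is convex. If the Hessian is in fact positive definite everywhere, then f is strictly convex. Note that strict convexity alone does not force the Hessian to be positive definite: f(x) = x^4 is strictly convex, but f′′(0) = 0.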

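Example. Let f(x, y) = 1/(xy) for x, y > 0. Then the Hessian is

\[ H = \frac{1}{xy} \begin{pmatrix} \dfrac{2}{x^2} & \dfrac{1}{xy} \\[1ex] \dfrac{1}{xy} & \dfrac{2}{y^2} \end{pmatrix}. \]

The determinant is

\[ \det H = \frac{3}{x^4 y^4} > 0 \]

and the trace is

\[ \operatorname{tr} H = \frac{2}{xy} \left( \frac{1}{x^2} + \frac{1}{y^2} \right) > 0. \]

So f is convex.

To conclude that f is convex, we only used the fact that xy is positive, instead of x and y being individually positive. Then could we relax the domain condition to be xy > 0 instead? The answer is no, because in this case, the function will no longer be convex! The set xy > 0 consists of the first and third quadrants, which is not a convex set: the midpoint of (1, 1) and (−1, −1) is the origin, which does not lie in it.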

1.3 Legendre transform

The Legendre transform is an important tool in classical dynamics and thermodynamics. In classical dynamics, it is used to transform between the Lagrangian and the Hamiltonian. In thermodynamics, it is used to transform between the energy, Helmholtz free energy and enthalpy. Despite its importance, the definition is slightly awkward.

Suppose that we have a function f(x), which we’ll assume is differentiable. For some reason, we want to transform it into a function of the conjugate variable $p = \frac{df}{dx}$ instead. In most applications to physics, this quantity has a particular physical significance. For example, in classical dynamics, if L is the Lagrangian, then $p = \frac{\partial L}{\partial \dot{x}}$ is the (conjugate) momentum. p also has a context-independent geometric interpretation, which we will explore later. For now, we will assume that p is more interesting than x.

Unfortunately, the obvious option f∗(p) = f(x(p)) is not the transform we want. There are various reasons for this, but the major reason is that it is ugly. It lacks any mathematical elegance, and has almost no nice properties at all.

In particular, we want our f∗(p) to satisfy the property

\[ \frac{df^*}{dp} = x. \]

This says that if p is the conjugate of x, then x is the conjugate of p. We will soon see how this is useful in the context of thermodynamics.



The symmetry is better revealed if we write in terms of differentials. The differential of the function f is

\[ df = \frac{df}{dx}\, dx = p \, dx. \]

So we want our f∗ to satisfy

\[ df^* = x \, dp. \]

How can we obtain this? From the product rule, we know that

\[ d(xp) = x \, dp + p \, dx. \]

So if we define f∗ = xp − f (more explicitly written as f∗(p) = x(p)p − f(x(p))), then we obtain the desired relation df∗ = x dp. Alternatively, we can say $\frac{df^*}{dp} = x$.

The actual definition we give will not be exactly this. Instead, we define it in a way that does not assume differentiability. We’ll also assume that the function takes the more general form R^n → R.

Definition (Legendre transform). Given a function f : R^n → R, its Legendre transform f∗ (the “conjugate” function) is defined by

\[ f^*(\mathbf{p}) = \sup_{\mathbf{x}} \left( \mathbf{p} \cdot \mathbf{x} - f(\mathbf{x}) \right). \]

The domain of f∗ is the set of p ∈ R^n such that the supremum is finite. p is known as the conjugate variable.

This relation can also be written as f∗(p) + f(x) = px, where x(p) is the value of x that maximizes the function.

To show that this is the same as what we were just talking about, note that the supremum of p · x − f(x) is obtained when its derivative is zero, i.e. p = ∇f(x). In particular, in the 1D case, f∗(p) = px − f(x), where x satisfies f′(x) = p. So p is indeed the derivative of f with respect to x.

From the definition, we can immediately conclude that

Lemma. f∗ is always convex.

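Proof.

\[
\begin{aligned}
f^*((1 - t)\mathbf{p} + t\mathbf{q}) &= \sup_{\mathbf{x}} \left[ (1 - t)\mathbf{p} \cdot \mathbf{x} + t \mathbf{q} \cdot \mathbf{x} - f(\mathbf{x}) \right] \\
&= \sup_{\mathbf{x}} \left[ (1 - t)(\mathbf{p} \cdot \mathbf{x} - f(\mathbf{x})) + t(\mathbf{q} \cdot \mathbf{x} - f(\mathbf{x})) \right] \\
&\leq (1 - t) \sup_{\mathbf{x}} \left[ \mathbf{p} \cdot \mathbf{x} - f(\mathbf{x}) \right] + t \sup_{\mathbf{x}} \left[ \mathbf{q} \cdot \mathbf{x} - f(\mathbf{x}) \right] \\
&= (1 - t) f^*(\mathbf{p}) + t f^*(\mathbf{q}).
\end{aligned}
\]

Note that we cannot immediately say that f∗ is convex, since we have to show that the domain is convex. But by the above bounds, f∗((1 − t)p + tq) is bounded by the sum of two finite terms, which is finite. So (1 − t)p + tq is also in the domain of f∗.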

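This transformation can be given a geometric interpretation. We will only consider the 1D case, because drawing higher-dimensional graphs is hard. For any fixed x, we draw the tangent line of f at the point x, which has slope p = f′(x). Then −f∗(p) is the y-intercept of this tangent line: the tangent passes through (x, f(x)), so it meets the y axis at height f(x) − px.

[Figure: the graph of f(x) with its tangent line of slope p at x; the tangent meets the y axis at −f∗(p), where f∗(p) = px − f(x).]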

Example.

(i) Let $f(x) = \frac{1}{2} a x^2$ for a > 0. Then p = ax at the maximum of px − f(x). So

\[ f^*(p) = px - f(x) = p \cdot \frac{p}{a} - \frac{1}{2} a \left( \frac{p}{a} \right)^2 = \frac{1}{2a} p^2. \]

So the Legendre transform maps a parabola to a parabola.

(ii) $f(v) = -\sqrt{1 - v^2}$ for |v| < 1 is a lower semi-circle. We have

\[ p = f'(v) = \frac{v}{\sqrt{1 - v^2}}. \]

So

\[ v = \frac{p}{\sqrt{1 + p^2}} \]

and exists for all p ∈ R. So

\[ f^*(p) = pv - f(v) = \frac{p^2}{\sqrt{1 + p^2}} + \frac{1}{\sqrt{1 + p^2}} = \sqrt{1 + p^2}. \]

A circle gets mapped to a hyperbola.

(iii) Let f = cx for c > 0. This is convex but not strictly convex. Then px − f(x) = (p − c)x. This has no maximum unless p = c. So the domain of f∗ is simply {c}, a single point, and f∗(c) = 0. So a line goes to a point.
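The supremum in the definition can also be approximated numerically, which gives a quick sanity check of example (i). The sketch below replaces the supremum over all x by a maximum over a finite grid, so it is only meaningful when the maximizer lies inside the grid; the grid, the value a = 2 and the function name `legendre` are assumptions of this illustration.

```python
def legendre(f, p, xs):
    """Approximate f*(p) = sup_x (p*x - f(x)) by a maximum over the grid xs."""
    return max(p * x - f(x) for x in xs)

a = 2.0
f = lambda x: 0.5 * a * x * x
xs = [i / 1000 for i in range(-5000, 5001)]  # grid on [-5, 5], step 0.001

# Compare the grid approximation with the closed form p^2 / (2a).
for p in (-1.0, 0.5, 3.0):
    print(p, legendre(f, p, xs), p * p / (2 * a))
```

For example (iii), the same approximation would simply return the value at an end of the grid whenever p ≠ c, reflecting the fact that the true supremum is infinite there.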

Finally, we prove that applying the Legendre transform twice gives the original function.

Theorem. If f is convex, differentiable with Legendre transform f∗, then f∗∗ = f.

Proof. We have f∗(p) = p · x(p) − f(x(p)), where x(p) satisfies p = ∇f(x(p)). Differentiating with respect to p, we have

\[
\begin{aligned}
\nabla_i f^*(\mathbf{p}) &= x_i + p_j \nabla_i x_j(\mathbf{p}) - \nabla_i x_j(\mathbf{p}) \nabla_j f(\mathbf{x}) \\
&= x_i + p_j \nabla_i x_j(\mathbf{p}) - \nabla_i x_j(\mathbf{p})\, p_j \\
&= x_i.
\end{aligned}
\]

So

\[ \nabla f^*(\mathbf{p}) = \mathbf{x}. \]

This means that the conjugate variable of p is our original x. So

\[
\begin{aligned}
f^{**}(\mathbf{x}) &= \left. (\mathbf{x} \cdot \mathbf{p} - f^*(\mathbf{p})) \right|_{\mathbf{p} = \mathbf{p}(\mathbf{x})} \\
&= \mathbf{x} \cdot \mathbf{p} - (\mathbf{p} \cdot \mathbf{x} - f(\mathbf{x})) \\
&= f(\mathbf{x}).
\end{aligned}
\]

Note that strict convexity is not required. For example, in our last example above with the straight line, f∗(p) = 0 for p = c. So $f^{**}(x) = (xp - f^*(p))|_{p = c} = cx = f(x)$.

However, convexity is required. If f∗∗ = f is true, then f must be convex, since it is a Legendre transform. Hence f∗∗ = f cannot be true for non-convex functions.

Application to thermodynamics

Given a system of a fixed number of particles, the energy of a system is usually given as a function of entropy and volume:

E = E(S, V).

We can think of this as a gas inside a piston with variable volume.

There are two things that can affect the energy: we can push in the piston and modify the volume. This corresponds to a work done of −p dV, where p is the pressure. Alternatively, we can simply heat it up and create a heat change of T dS, where T is the temperature. Then we have

dE = T dS − p dV.

Comparing with the chain rule, we have

\[ \frac{\partial E}{\partial S} = T, \qquad -\frac{\partial E}{\partial V} = p. \]

However, the entropy is a mysterious quantity no one understands. Instead, we like temperature, defined as $T = \frac{\partial E}{\partial S}$. Hence we use the (negative) Legendre transform to obtain the conjugate function, the Helmholtz free energy:

\[ F(T, V) = \inf_S \left[ E(S, V) - TS \right] = E(S, V) - S \frac{\partial E}{\partial S} = E - ST. \]

Note that the Helmholtz free energy satisfies

dF = −S dT − p dV.

Just as we could recover T and p from E via taking partial derivatives with respect to S and V, we are able to recover S and p from F by taking partial derivatives with respect to T and V. This would not be the case if we simply defined F(T, V) = E(S(T, V), V).

If we take the Legendre transform with respect to V, we get the enthalpy instead, and if we take the Legendre transform with respect to both, we get the Gibbs free energy.
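The differential of F quoted above follows in one line from F = E − TS and dE = T dS − p dV. The worked step below spells this out; the expressions for the enthalpy H and the Gibbs free energy G use the standard sign conventions H = E + pV and G = E − TS + pV, which are assumed here rather than taken from the lectures.

```latex
\begin{aligned}
dF &= dE - T\,dS - S\,dT = (T\,dS - p\,dV) - T\,dS - S\,dT = -S\,dT - p\,dV, \\
dH &= dE + p\,dV + V\,dp = T\,dS + V\,dp, \\
dG &= dE - T\,dS - S\,dT + p\,dV + V\,dp = -S\,dT + V\,dp.
\end{aligned}
```

In each case the Legendre transform swaps a variable for its conjugate in the differential: S for T, V for p, or both.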



1.4 Lagrange multipliers

At the beginning, we considered the problem of unconstrained maximization. We wanted to maximize f(x, y) where x, y can be any real value. However, sometimes we want to restrict to certain values of (x, y). For example, we might want x and y to satisfy x + y = 10.

We take a simple example of a hill. We model it using the function f(x, y) given by the height above the ground. The hilltop would be given by the maximum of f, which satisfies

0 = df = ∇f · dx

for any (infinitesimal) displacement dx. So we need

∇f = 0.

This would be a case of unconstrained maximization, since we are considering all possible values of x and y.

A problem of constrained maximization would be as follows: we have a path p defined by p(x, y) = 0. What is the highest point along the path p?

We still need ∇f · dx = 0, but now dx is not arbitrary. We only consider the dx parallel to the path. Alternatively, ∇f has to be entirely perpendicular to the path. Since we know that the normal to the path is ∇p, our condition becomes

∇f = λ∇p

for some constant λ. Of course, we still have the constraint p(x, y) = 0. So what we have to solve is

\[ \nabla f = \lambda \nabla p, \qquad p = 0 \]

for the three variables x, y, λ.

Alternatively, we can change this into a single problem of unconstrained extremization. We ask for the stationary points of the function φ(x, y, λ) given by

φ(x, y, λ) = f(x, y) − λp(x, y).

Requiring φ to be stationary with respect to the variables x and y gives the condition ∇f = λ∇p, and requiring it to be stationary with respect to λ gives the condition p = 0.

Example. Find the radius of the smallest circle centered on the origin that intersects $y = x^2 - 1$.

(i) First do it the easy way: for a circle of radius R to work, $x^2 + y^2 = R^2$ and $y = x^2 - 1$ must have a solution. So

\[ (x^2)^2 - x^2 + 1 - R^2 = 0 \]

and

\[ x^2 = \frac{1}{2} \pm \sqrt{R^2 - \frac{3}{4}}. \]

So $R_{\min} = \sqrt{3}/2$.

(ii) We can also view this as a variational problem. We want to minimize $f(x, y) = x^2 + y^2$ subject to the constraint p(x, y) = 0 for $p(x, y) = y - x^2 + 1$.

We can solve this directly. We can solve the constraint to obtain $y = x^2 - 1$. Then

\[ R^2(x) = f(x, y(x)) = (x^2)^2 - x^2 + 1. \]

We look for stationary points of $R^2$:

\[ (R^2(x))' = 0 \;\Rightarrow\; x \left( x^2 - \frac{1}{2} \right) = 0. \]

So x = 0 and R = 1; or $x = \pm \frac{1}{\sqrt{2}}$ and $R = \frac{\sqrt{3}}{2}$. Since $\frac{\sqrt{3}}{2}$ is smaller, this is our minimum.

(iii) Finally, we can use Lagrange multipliers. We find stationary points of the function

\[ \varphi(x, y, \lambda) = f(x, y) - \lambda p(x, y) = x^2 + y^2 - \lambda (y - x^2 + 1). \]

The partial derivatives give

\[
\begin{aligned}
\frac{\partial \varphi}{\partial x} = 0 &\;\Rightarrow\; 2x(1 + \lambda) = 0 \\
\frac{\partial \varphi}{\partial y} = 0 &\;\Rightarrow\; 2y - \lambda = 0 \\
\frac{\partial \varphi}{\partial \lambda} = 0 &\;\Rightarrow\; y - x^2 + 1 = 0.
\end{aligned}
\]

The first equation gives us two choices:

– x = 0. Then the third equation gives y = −1. So $R = \sqrt{x^2 + y^2} = 1$.

– λ = −1. So the second equation gives $y = -\frac{1}{2}$ and the third gives $x = \pm \frac{1}{\sqrt{2}}$. Hence $R = \frac{\sqrt{3}}{2}$ is the minimum.
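The three candidate points found with Lagrange multipliers can be checked numerically. The sketch below plugs them into the partial derivatives of φ (the value λ = −2 at x = 0 comes from the second equation, λ = 2y) and cross-checks the minimum radius by brute force along the parabola. The function name `grad_phi` and the grid are choices made for this illustration.

```python
import math

def grad_phi(x, y, lam):
    """Partial derivatives of phi = x^2 + y^2 - lam*(y - x^2 + 1)."""
    return (2 * x * (1 + lam), 2 * y - lam, -(y - x * x + 1))

# Candidate stationary points (x, y, lambda) read off from the three equations.
candidates = [
    (0.0, -1.0, -2.0),
    (1 / math.sqrt(2), -0.5, -1.0),
    (-1 / math.sqrt(2), -0.5, -1.0),
]

for x, y, lam in candidates:
    residual = max(abs(g) for g in grad_phi(x, y, lam))
    print((x, y, lam), "residual:", residual, "R:", math.hypot(x, y))

# Brute-force cross-check: minimise the distance to the origin along the parabola.
r_min = min(math.hypot(t / 1000, (t / 1000) ** 2 - 1) for t in range(-3000, 3001))
print(r_min, math.sqrt(3) / 2)
```

The residuals should vanish up to rounding, and the brute-force minimum should agree with √3/2 to within the grid resolution.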

This can be generalized to problems with functions R^n → R using the same logic.

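Example. For x ∈ R^n, find the minimum of the quadratic form

\[ f(\mathbf{x}) = x_i A_{ij} x_j \]

on the surface $|\mathbf{x}|^2 = 1$, where we take A to be symmetric (the quadratic form only depends on the symmetric part of A).

(i) The constraint imposes a normalization condition on x. But if we scale up x, f(x) scales accordingly. So if we define

\[ \Lambda(\mathbf{x}) = \frac{f(\mathbf{x})}{g(\mathbf{x})}, \qquad g(\mathbf{x}) = |\mathbf{x}|^2, \]

the problem is equivalent to minimization of Λ(x) without constraint. Then

\[ \nabla_i \Lambda(\mathbf{x}) = \frac{2}{g} \left[ A_{ij} x_j - \frac{f}{g} x_i \right]. \]

So we need

\[ A\mathbf{x} = \Lambda \mathbf{x}. \]

So the extremal values of Λ(x) are the eigenvalues of A. So Λmin is the lowest eigenvalue.

This answer is intuitively obvious if we diagonalize A.

(ii) We can also do it with Lagrange multipliers. We want to find stationary values of

\[ \varphi(\mathbf{x}, \lambda) = f(\mathbf{x}) - \lambda (|\mathbf{x}|^2 - 1). \]

So

\[ 0 = \nabla \varphi \;\Rightarrow\; A_{ij} x_j = \lambda x_i. \]

Differentiating with respect to λ gives

\[ \frac{\partial \varphi}{\partial \lambda} = 0 \;\Rightarrow\; |\mathbf{x}|^2 = 1. \]

So we get the same set of equations.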
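For n = 2 the claim that the constrained minimum is the lowest eigenvalue is easy to test by brute force, since the unit circle can be scanned with a single angle. The sketch below does this for one arbitrarily chosen symmetric matrix; the matrix and all function names are hypothetical choices for illustration only.

```python
import math

def quadratic_form(A, x):
    """Return x_i A_ij x_j for a 2x2 matrix A given as nested tuples."""
    return sum(x[i] * A[i][j] * x[j] for i in range(2) for j in range(2))

def min_on_unit_circle(A, steps=100000):
    """Scan x = (cos t, sin t) and return the smallest value of the form."""
    return min(
        quadratic_form(A, (math.cos(2 * math.pi * k / steps),
                           math.sin(2 * math.pi * k / steps)))
        for k in range(steps)
    )

def smallest_eigenvalue(A):
    """Closed form for a symmetric 2x2 matrix via trace and determinant."""
    tr = A[0][0] + A[1][1]
    det = A[0][0] * A[1][1] - A[0][1] * A[1][0]
    return tr / 2 - math.sqrt(tr * tr / 4 - det)

A = ((2.0, 1.0), (1.0, 3.0))  # hypothetical symmetric example matrix
print(min_on_unit_circle(A), smallest_eigenvalue(A))
```

In higher dimensions one would use an eigenvalue routine instead of scanning, but the scan makes the variational statement itself visible.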

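Example. Find the probability distribution {p1, · · · , pn} satisfying $\sum_i p_i = 1$ that maximizes the information entropy

\[ S = -\sum_{i=1}^n p_i \ln p_i. \]

We look for stationary points of

\[ \varphi(\mathbf{p}, \lambda) = -\sum_{i=1}^n p_i \ln p_i - \lambda \sum_{i=1}^n p_i + \lambda. \]

We have

\[ \frac{\partial \varphi}{\partial p_i} = -\ln p_i - (1 + \lambda) = 0. \]

So

\[ p_i = e^{-(1 + \lambda)}. \]

It is the same for all i. So, by the normalization condition, we must have $p_i = \frac{1}{n}$.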
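A quick numerical experiment is consistent with this result: among many randomly drawn distributions, none has entropy above that of the uniform one, whose entropy is ln n. The sketch below is only a sanity check under assumed parameters (n = 5, a fixed random seed, 10000 samples), not a proof.

```python
import math
import random

def entropy(p):
    """Information entropy S = -sum p_i ln p_i (terms with p_i = 0 contribute 0)."""
    return -sum(q * math.log(q) for q in p if q > 0)

def random_distribution(n, rng):
    """Draw n positive weights and normalise them so they sum to 1."""
    w = [rng.random() + 1e-12 for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]

n = 5
rng = random.Random(0)  # fixed seed so the run is reproducible
uniform = [1 / n] * n
best_random = max(entropy(random_distribution(n, rng)) for _ in range(10000))
print(entropy(uniform), math.log(n), best_random)
```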


2 Euler-Lagrange equation

2.1 Functional derivatives

Definition (Functional). A functional is a function that takes in another real-valued function as an argument. We usually write them as F[x] (square brackets), where x = x(t) : R → R is a real function. We say that F[x] is a functional of the function x(t).

Of course, we can also have functionals of many functions, e.g. F[x, y] ∈ R for x, y : R → R. We can also have functionals of a function of many variables.

Example. Given a medium with refractive index n(x), the time taken by a path x(t) from x0 to x1 is given by the functional

\[ T[\mathbf{x}] = \int_{\mathbf{x}_0}^{\mathbf{x}_1} n(\mathbf{x}) \, dt. \]

While this is a very general definition, in reality, there is just one particular class of functionals we care about. Given a function x(t) defined for α ≤ t ≤ β, we study functionals of the form

\[ F[x] = \int_\alpha^\beta f(x, \dot{x}, t) \, dt \]

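for some function f.

Our objective is to find a stationary point of the functional F[x]. To do so, suppose we vary x(t) by a small amount δx(t). Then the corresponding change δF[x] of F[x] is

\[
\begin{aligned}
\delta F[x] &= F[x + \delta x] - F[x] \\
&= \int_\alpha^\beta \left( f(x + \delta x, \dot{x} + \delta \dot{x}, t) - f(x, \dot{x}, t) \right) dt.
\end{aligned}
\]

Taylor expand to obtain

\[ \delta F[x] = \int_\alpha^\beta \left( \delta x \frac{\partial f}{\partial x} + \delta \dot{x} \frac{\partial f}{\partial \dot{x}} \right) dt + O(\delta x^2). \]

Integrate the second term by parts to obtain

\[ \delta F[x] = \int_\alpha^\beta \delta x \left[ \frac{\partial f}{\partial x} - \frac{d}{dt} \left( \frac{\partial f}{\partial \dot{x}} \right) \right] dt + \left[ \delta x \frac{\partial f}{\partial \dot{x}} \right]_\alpha^\beta. \]

This doesn’t seem like a very helpful equation. Life would be much easier if the last term (known as the boundary term) $\left[ \delta x \frac{\partial f}{\partial \dot{x}} \right]_\alpha^\beta$ vanishes. Fortunately, for most of the cases we care about, the boundary conditions mandate that the boundary term does indeed vanish. Most of the time, we are told that x is fixed at t = α, β. So δx(α) = δx(β) = 0. But regardless of what we do, we always choose boundary conditions such that the boundary term is 0. Then

\[ \delta F[x] = \int_\alpha^\beta \left( \delta x \frac{\delta F[x]}{\delta x(t)} \right) dt, \]

where

Definition (Functional derivative).

\[ \frac{\delta F[x]}{\delta x} = \frac{\partial f}{\partial x} - \frac{d}{dt} \left( \frac{\partial f}{\partial \dot{x}} \right) \]

is the functional derivative of F[x].

If we want to find a stationary point of F, then we need $\frac{\delta F[x]}{\delta x} = 0$. So

Definition (Euler-Lagrange equation). The Euler-Lagrange equation is

\[ \frac{\partial f}{\partial x} - \frac{d}{dt} \left( \frac{\partial f}{\partial \dot{x}} \right) = 0 \]

for α ≤ t ≤ β.

There is an obvious generalization to functionals F[x] for x(t) ∈ R^n:

\[ \frac{\partial f}{\partial x_i} - \frac{d}{dt} \left( \frac{\partial f}{\partial \dot{x}_i} \right) = 0 \quad \text{for all } i. \]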

Example (Geodesics of a plane). What is the curve C of minimal length between two points A, B in the Euclidean plane? The length is

\[ L = \int_C d\ell, \]

where $d\ell = \sqrt{dx^2 + dy^2}$.

There are two ways we can do this:

(i) We restrict to curves for which x (or y) is a good parameter, i.e. y can be made a function of x. Then

\[ d\ell = \sqrt{1 + (y')^2} \, dx. \]

Then

\[ L[y] = \int_\alpha^\beta \sqrt{1 + (y')^2} \, dx. \]

Since there is no explicit dependence on y, we know that

\[ \frac{\partial f}{\partial y} = 0. \]

So the Euler-Lagrange equation says that

\[ \frac{d}{dx} \left( \frac{\partial f}{\partial y'} \right) = 0. \]

We can integrate once to obtain

\[ \frac{\partial f}{\partial y'} = \text{constant}. \]

This is known as a first integral, which will be studied in more detail later.

Plugging in our value of f, we obtain

\[ \frac{y'}{\sqrt{1 + (y')^2}} = \text{constant}. \]

This shows that y′ must be constant. So y must be a straight line.

(ii) We can get around the restriction to “good” curves by choosing an arbitrary parameterization r = (x(t), y(t)) for t ∈ [0, 1] such that r(0) = A, r(1) = B. So

\[ d\ell = \sqrt{\dot{x}^2 + \dot{y}^2} \, dt. \]

Then

\[ L[x, y] = \int_0^1 \sqrt{\dot{x}^2 + \dot{y}^2} \, dt. \]

We have, again,

\[ \frac{\partial f}{\partial x} = \frac{\partial f}{\partial y} = 0. \]

So we are left to solve

\[ \frac{d}{dt} \left( \frac{\partial f}{\partial \dot{x}} \right) = \frac{d}{dt} \left( \frac{\partial f}{\partial \dot{y}} \right) = 0. \]

So we obtain

\[ \frac{\dot{x}}{\sqrt{\dot{x}^2 + \dot{y}^2}} = c, \qquad \frac{\dot{y}}{\sqrt{\dot{x}^2 + \dot{y}^2}} = s, \]

where c and s are constants. While we have two constants, they are not independent. We must have $c^2 + s^2 = 1$. So we let c = cos θ, s = sin θ. Then the two conditions are both equivalent to

\[ (\dot{x} \sin\theta)^2 = (\dot{y} \cos\theta)^2. \]

Hence

\[ \dot{x} \sin\theta = \pm \dot{y} \cos\theta. \]

We can choose a θ such that we have a positive sign. So

\[ y \cos\theta = x \sin\theta + A \]

for a constant A. This is a straight line with slope tan θ.
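The conclusion that the straight line is the shortest curve can be illustrated by discretizing the length functional. The sketch below approximates L[x, y] by the length of a polygon through sampled points and compares the straight line with perturbed curves sharing the same endpoints. The endpoints, the sine-shaped perturbation and the function names are assumptions made for this example.

```python
import math

def curve_length(points):
    """Length of the polygonal curve through a list of (x, y) points."""
    return sum(math.dist(p, q) for p, q in zip(points, points[1:]))

def sample(path, n=2000):
    """Sample a parameterised path r(t), t in [0, 1], at n + 1 points."""
    return [path(k / n) for k in range(n + 1)]

A, B = (0.0, 0.0), (3.0, 4.0)  # hypothetical endpoints
straight = lambda t: (A[0] + t * (B[0] - A[0]), A[1] + t * (B[1] - A[1]))

def bumped(eps):
    """Straight line plus a perturbation that vanishes at t = 0 and t = 1."""
    return lambda t: (straight(t)[0], straight(t)[1] + eps * math.sin(math.pi * t))

for eps in (0.0, 0.1, 0.5, -0.5):
    print(eps, curve_length(sample(bumped(eps))))
```

The perturbation vanishes at both ends, mirroring the boundary condition δx(α) = δx(β) = 0 used in deriving the Euler-Lagrange equation.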

2.2 First integrals

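In our example above, f did not depend on x, and hence $\frac{\partial f}{\partial x} = 0$. Then the Euler-Lagrange equations entail

\[ \frac{d}{dt} \left( \frac{\partial f}{\partial \dot{x}} \right) = 0. \]

We can integrate this to obtain

\[ \frac{\partial f}{\partial \dot{x}} = \text{constant}. \]

We call this the first integral. First integrals are important in several ways. The most immediate advantage is that it simplifies the problem a lot. We only have to solve a first-order differential equation, instead of a second-order one. Not needing to differentiate $\frac{\partial f}{\partial \dot{x}}$ also prevents a lot of mess arising from the product and quotient rules.

This has an additional significance when applied to problems in physics. If we have a first integral, then we get $\frac{\partial f}{\partial \dot{x}} = \text{constant}$. This corresponds to a conserved quantity of the system. When formulating physics problems as variational problems (as we will do in Chapter 3), the conservation of energy and momentum will arise as constants of integration from first integrals.

There is also a more complicated first integral appearing when f does not (explicitly) depend on t. To find this out, we have to first consider the total derivative $\frac{df}{dt}$. By the chain rule, we have

\[
\begin{aligned}
\frac{df}{dt} &= \frac{\partial f}{\partial t} + \frac{dx}{dt} \frac{\partial f}{\partial x} + \frac{d\dot{x}}{dt} \frac{\partial f}{\partial \dot{x}} \\
&= \frac{\partial f}{\partial t} + \dot{x} \frac{\partial f}{\partial x} + \ddot{x} \frac{\partial f}{\partial \dot{x}}.
\end{aligned}
\]

On the other hand, the Euler-Lagrange equation says that

\[ \frac{\partial f}{\partial x} = \frac{d}{dt} \left( \frac{\partial f}{\partial \dot{x}} \right). \]

Substituting this into our equation for the total derivative gives

\[
\begin{aligned}
\frac{df}{dt} &= \frac{\partial f}{\partial t} + \dot{x} \frac{d}{dt} \left( \frac{\partial f}{\partial \dot{x}} \right) + \ddot{x} \frac{\partial f}{\partial \dot{x}} \\
&= \frac{\partial f}{\partial t} + \frac{d}{dt} \left( \dot{x} \frac{\partial f}{\partial \dot{x}} \right).
\end{aligned}
\]

Then

\[ \frac{d}{dt} \left( f - \dot{x} \frac{\partial f}{\partial \dot{x}} \right) = \frac{\partial f}{\partial t}. \]

So if $\frac{\partial f}{\partial t} = 0$, then we have the first integral

\[ f - \dot{x} \frac{\partial f}{\partial \dot{x}} = \text{constant}. \]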

Example. Consider a light ray travelling in the vertical xz plane inside a medium with refractive index $n(z) = \sqrt{a - bz}$ for positive constants a, b. The phase velocity of light is $v = \frac{c}{n}$.

According to Fermat’s principle, the path minimizes

\[ T = \int_A^B \frac{d\ell}{v}. \]

This is equivalent to minimizing the optical path length

\[ cT = P = \int_A^B n \, d\ell. \]

We specify our path by the function z(x). Then the path element is given by

\[ d\ell = \sqrt{dx^2 + dz^2} = \sqrt{1 + z'(x)^2} \, dx. \]

Then

\[ P[z] = \int_{x_A}^{x_B} n(z) \sqrt{1 + (z')^2} \, dx. \]

Since the integrand does not depend explicitly on x, we have the first integral

\[ k = f - z' \frac{\partial f}{\partial z'} = \frac{n(z)}{\sqrt{1 + (z')^2}} \]

for an integration constant k. Squaring and putting in the value of n gives

\[ (z')^2 = \frac{b}{k^2} (z_0 - z), \]

where $z_0 = (a - k^2)/b$. This is integrable and we obtain

\[ \frac{dz}{\sqrt{z_0 - z}} = \pm \frac{\sqrt{b}}{k} \, dx. \]

So

\[ \sqrt{z_0 - z} = \pm \frac{\sqrt{b}}{2k} (x - x_0), \]

where x0 is our second integration constant. Square it to obtain

\[ z = z_0 - \frac{b}{4k^2} (x - x_0)^2, \]

which is a parabola.
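One way to check the parabolic solution is to evaluate the first integral n(z)/√(1 + (z′)²) along it and confirm that it stays equal to k. The sketch below does this at a handful of points; the numerical values of a, b, k and x0 are arbitrary assumptions chosen so that 0 < k² < a.

```python
import math

a, b, k, x0 = 4.0, 1.0, 1.5, 0.0   # hypothetical constants with 0 < k^2 < a
z0 = (a - k * k) / b

def z(x):
    """The parabolic ray z = z0 - b (x - x0)^2 / (4 k^2)."""
    return z0 - b * (x - x0) ** 2 / (4 * k * k)

def dz(x):
    """Slope z'(x) of the ray."""
    return -b * (x - x0) / (2 * k * k)

def first_integral(x):
    """n(z) / sqrt(1 + z'^2) with n(z) = sqrt(a - b z)."""
    return math.sqrt(a - b * z(x)) / math.sqrt(1 + dz(x) ** 2)

values = [first_integral(x) for x in (-2.0, -0.5, 0.0, 1.0, 3.0)]
print(values)
```

Every entry should equal k up to rounding, which is what conservation of the first integral requires.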

Example (Principle of least action). Mechanics as laid out by Newton was expressed in terms of forces and acceleration. While this is able to describe a lot of phenomena, it is rather unsatisfactory. For one, it is messy and difficult to scale to large systems involving many particles. It is also ugly.

As a result, mechanics was later reformulated in terms of a variational principle. A quantity known as the action is defined for each possible path taken by the particle, and the actual path taken is the one that minimizes the action (technically, it is the path that is a stationary point of the action functional).

The version we will present here is an old version proposed by Maupertuis and Euler. While it sort-of works, it is still cumbersome to work with. The modern version is from Hamilton, who again reformulated the action principle into something that is more powerful and general. This modern version will be discussed in more detail in Chapter 3, and for now we will work with the old version.

The original definition for the action, as proposed by Maupertuis, was mass × velocity × distance. This was given a more precise mathematical definition by Euler. For a particle with constant energy,

\[ E = \frac{1}{2} m v^2 + U(\mathbf{x}), \]

where $v = |\dot{\mathbf{x}}|$. So we have

\[ mv = \sqrt{2m(E - U(\mathbf{x}))}. \]

Hence we can define the action to be

\[ A = \int_A^B \sqrt{2m(E - U(\mathbf{x}))} \, d\ell, \]

where dℓ is the path length element. We minimize this to find the trajectory.

For a particle near the surface of the Earth, under the influence of gravity, U = mgz. So we have

\[ A[z] = \int_A^B \sqrt{2mE - 2m^2 g z} \, \sqrt{1 + (z')^2} \, dx, \]

which is of exactly the same form as the optics problem we just solved. So the result is again a parabola, as expected.

Example (Brachistochrone). The Brachistochrone problem was one of theearliest problems in the calculus of variations. The name comes from the Greekwords brakhistos (“shortest”) and khronos (“time”).

The question is as follows: suppose we have a bead sliding along a frictionlesswire, starting from rest at the origin A. What shape of wire minimizes the timefor the bead to travel to B?

x

y

A

B

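The conservation of energy implies that

\[ \frac{1}{2}mv^2 = mgy. \]

So

\[ v = \sqrt{2gy}. \]

We want to minimize

\[ T = \int \frac{d\ell}{v}. \]

So

\[ T = \frac{1}{\sqrt{2g}} \int \frac{\sqrt{dx^2 + dy^2}}{\sqrt{y}} = \frac{1}{\sqrt{2g}} \int \sqrt{\frac{1 + (y')^2}{y}}\, dx. \]

Since there is no explicit dependence on $x$, we have the first integral

\[ f - y' \frac{\partial f}{\partial y'} = \frac{1}{\sqrt{y(1 + (y')^2)}} = \text{constant}. \]

So the solution is

\[ y(1 + (y')^2) = c \]

for some positive constant $c$. The solution of this ODE is, in parametric form,

\[ x = \frac{c}{2}(\theta - \sin\theta), \qquad y = \frac{c}{2}(1 - \cos\theta). \]

Note that this has $x = y = 0$ at $\theta = 0$. This describes a cycloid.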
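As a quick numerical sanity check (a sketch, not part of the lectures), we can confirm that a cycloid generated by a rolling circle of radius $a$ satisfies $y(1 + (y')^2) = 2a$, so the constant $c$ of the ODE corresponds to $a = c/2$. The helper names `cycloid`, `slope` and `first_integral` are made up for this illustration.

```python
import math

def cycloid(a, theta):
    """Point on the cycloid traced by a circle of radius a (hypothetical helper)."""
    return a * (theta - math.sin(theta)), a * (1.0 - math.cos(theta))

def slope(theta):
    # dy/dx = (dy/dtheta) / (dx/dtheta); the factor a cancels
    return math.sin(theta) / (1.0 - math.cos(theta))

def first_integral(a, theta):
    # Evaluates y (1 + y'^2), which should be constant along the curve
    _, y = cycloid(a, theta)
    return y * (1.0 + slope(theta) ** 2)

a = 1.5
values = [first_integral(a, t) for t in (0.5, 1.0, 2.0, 3.0)]
print(values)  # each entry should be close to 2 * a
```

The value is the same at every sampled $\theta$, as the first integral requires.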



2.3 Constrained variation of functionals

So far, we’ve considered the problem of finding stationary values of $F[x]$ without any constraint on what $x$ could be. However, sometimes there might be some restrictions on the possible values of $x$. For example, we might have a surface in $\mathbb{R}^3$ defined by $g(\mathbf{x}) = 0$. If we want to find the path of shortest length on the surface (i.e. geodesics), then we want to minimize $F[\mathbf{x}]$ subject to the constraint $g(\mathbf{x}(t)) = 0$.

We can again use Lagrange multipliers. Suppose the constraint takes the form $P[x] = c$ for some functional $P$ and constant $c$. The problem we have to solve is then equivalent to finding stationary values (without constraints) of

\[ \Phi_\lambda[x] = F[x] - \lambda(P[x] - c) \]

with respect to the function $x(t)$ and the variable $\lambda$.

Example (Isoperimetric problem). If we have a string of fixed length, what is the maximum area we can enclose with it?

We first argue that the region enclosed by the curve is convex. If it is not, we can “push out” the curve to increase the area enclosed without changing the length. Assuming this, we can split the curve into two parts:

[Figure: a closed convex curve between x = α and x = β, split into an upper branch y2 and a lower branch y1.]

We have $dA = [y_2(x) - y_1(x)]\, dx$. So

\[ A = \int_\alpha^\beta [y_2(x) - y_1(x)]\, dx. \]

Alternatively,

\[ A[y] = \oint y(x)\, dx, \]

and the length is

\[ L[y] = \oint d\ell = \oint \sqrt{1 + (y')^2}\, dx. \]

So we look for stationary points of

\[ \Phi_\lambda[y] = \oint \left[y(x) - \lambda\sqrt{1 + (y')^2}\right] dx + \lambda L. \]

In this case, we can be sure that our boundary terms vanish since there is no boundary.

Since there is no explicit dependence on $x$, we obtain the first integral

\[ f - y' \frac{\partial f}{\partial y'} = \text{constant} = y_0. \]



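So

\[ y_0 = y - \lambda\sqrt{1 + (y')^2} + \frac{\lambda (y')^2}{\sqrt{1 + (y')^2}} = y - \frac{\lambda}{\sqrt{1 + (y')^2}}. \]

So

\begin{align*}
(y - y_0)^2 &= \frac{\lambda^2}{1 + (y')^2} \\
(y')^2 &= \frac{\lambda^2}{(y - y_0)^2} - 1 \\
\frac{(y - y_0)\, y'}{\sqrt{\lambda^2 - (y - y_0)^2}} &= \pm 1 \\
d\left[\sqrt{\lambda^2 - (y - y_0)^2} \pm x\right] &= 0.
\end{align*}

So we have

\[ \lambda^2 - (y - y_0)^2 = (x - x_0)^2, \]

or

\[ (x - x_0)^2 + (y - y_0)^2 = \lambda^2. \]

This is a circle of radius $\lambda$. Since the perimeter of this circle will be $2\pi\lambda$, we must have $\lambda = L/(2\pi)$. So the maximum area is $\pi\lambda^2 = L^2/(4\pi)$.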
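To see the bound $L^2/(4\pi)$ in action, the following sketch compares it with the areas of regular polygons of the same perimeter. The polygon family is only an illustrative choice of competing curves, and the function names are hypothetical.

```python
import math

def regular_polygon_area(n, perimeter):
    """Area of a regular n-gon with the given perimeter."""
    side = perimeter / n
    return n * side ** 2 / (4.0 * math.tan(math.pi / n))

def circle_area(perimeter):
    """Area of the circle with the given circumference, L^2 / (4 pi)."""
    return perimeter ** 2 / (4.0 * math.pi)

L = 1.0
sides = (3, 4, 6, 12, 100)
areas = [regular_polygon_area(n, L) for n in sides]
bound = circle_area(L)
for n, area in zip(sides, areas):
    print(n, area, bound - area)  # the gap shrinks as n grows but stays positive
```

Every polygon encloses less than the circle, and the areas approach the bound from below as the number of sides increases.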

Example (Sturm-Liouville problem). The Sturm-Liouville problem is a very general class of problems. We will develop some very general theory about these problems without going into specific examples. It can be formulated as follows: let $\rho(x)$, $\sigma(x)$ and $w(x)$ be real functions of $x$ defined on $\alpha \leq x \leq \beta$. We will consider the special case where $\rho$ and $w$ are positive on $\alpha < x < \beta$. Our objective is to find stationary points of the functional

\[ F[y] = \int_\alpha^\beta \left(\rho(x)(y')^2 + \sigma(x) y^2\right) dx \]

subject to the condition

\[ G[y] = \int_\alpha^\beta w(x) y^2\, dx = 1. \]

Using the Euler-Lagrange equation, the functional derivatives of $F$ and $G$ are

\[ \frac{\delta F[y]}{\delta y} = 2\left(-(\rho y')' + \sigma y\right), \qquad \frac{\delta G[y]}{\delta y} = 2wy. \]

So the Euler-Lagrange equation of $\Phi_\lambda[y] = F[y] - \lambda(G[y] - 1)$ is

\[ -(\rho y')' + \sigma y - \lambda w y = 0. \]

We can write this as the eigenvalue problem

\[ \mathcal{L} y = \lambda w y, \]

where

\[ \mathcal{L} = -\frac{d}{dx}\left(\rho \frac{d}{dx}\right) + \sigma \]

is the Sturm-Liouville operator. We call this a Sturm-Liouville eigenvalue problem. $w$ is called the weight function.

We can view this problem in a different way. Notice that $\mathcal{L}y = \lambda w y$ is linear in $y$. Hence if $y$ is a solution, then so is $Ay$. But if $G[y] = 1$, then $G[Ay] = A^2$. Hence the condition $G[y] = 1$ is simply a normalization condition. We can get around this problem by asking for the minimum of the functional

\[ \Lambda[y] = \frac{F[y]}{G[y]} \]

instead. It turns out that this $\Lambda$ has some significance. To minimize $\Lambda$, we cannot apply the Euler-Lagrange equations, since $\Lambda$ is not of the form of an integral. However, we can try to vary it directly:

\[ \delta\Lambda = \frac{1}{G}\delta F - \frac{F}{G^2}\delta G = \frac{1}{G}(\delta F - \Lambda\,\delta G). \]

When $\Lambda$ is minimized, we have

\[ \delta\Lambda = 0 \iff \frac{\delta F}{\delta y} = \Lambda \frac{\delta G}{\delta y} \iff \mathcal{L}y = \Lambda w y. \]

So at stationary values of $\Lambda[y]$, $\Lambda$ is the associated Sturm-Liouville eigenvalue.
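A small numerical sketch of this statement, under the assumed special case $\rho = 1$, $\sigma = 0$, $w = 1$ on $[0, \pi]$ with $y(0) = y(\pi) = 0$ (so the eigenvalue problem is $-y'' = \lambda y$ with lowest eigenvalue 1): the quotient $\Lambda$ equals 1 at the eigenfunction $\sin x$ and is larger for a trial function that is not an eigenfunction. The midpoint-rule helper `rayleigh_quotient` is hypothetical.

```python
import math

def rayleigh_quotient(y, dy, a, b, n=2000):
    """Approximate F[y]/G[y] with rho = 1, sigma = 0, w = 1 using the midpoint rule."""
    h = (b - a) / n
    F = 0.0
    G = 0.0
    for i in range(n):
        x = a + (i + 0.5) * h
        F += dy(x) ** 2 * h   # integrand of F: (y')^2
        G += y(x) ** 2 * h    # integrand of G: y^2
    return F / G

# Exact lowest eigenfunction: y = sin x
lam_exact = rayleigh_quotient(math.sin, math.cos, 0.0, math.pi)

# Trial function y = x (pi - x), which satisfies the boundary conditions
lam_trial = rayleigh_quotient(lambda x: x * (math.pi - x),
                              lambda x: math.pi - 2.0 * x,
                              0.0, math.pi)
print(lam_exact, lam_trial)
```

For the trial function the integrals can be done by hand, giving $\Lambda = 10/\pi^2$, slightly above the lowest eigenvalue.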

Example (Geodesics). Suppose that we have a surface in $\mathbb{R}^3$ defined by $g(\mathbf{x}) = 0$, and we want to find the path of shortest distance between two points on the surface. These paths are known as geodesics.

One possible approach is to solve $g(\mathbf{x}) = 0$ directly. For example, if we have a unit sphere, a possible solution is $x = \sin\theta\cos\phi$, $y = \sin\theta\sin\phi$, $z = \cos\theta$. Then the total length of a path would be given by

\[ D[\theta, \phi] = \int_A^B \sqrt{d\theta^2 + \sin^2\theta\, d\phi^2}. \]

We then vary $\theta$ and $\phi$ to minimize $D$ and obtain a geodesic.

Alternatively, we can impose the condition $g(\mathbf{x}(t)) = 0$ with a Lagrange multiplier. However, since we want the constraint to be satisfied for all $t$, we need a Lagrange multiplier function $\lambda(t)$. Then our problem would be to find stationary values of

\[ \Phi[\mathbf{x}, \lambda] = \int_0^1 \left(|\dot{\mathbf{x}}| - \lambda(t)\, g(\mathbf{x}(t))\right) dt. \]

3 Hamilton’s principle

As mentioned before, Lagrange and Hamilton reformulated Newtonian dynamics into a much more robust system based on an action principle.

The first important concept is the idea of a configuration space. This configuration space is a vector space containing generalized coordinates $\xi(t)$ that specify the configuration of the system. The idea is to capture all information about the system in one single vector.

In the simplest case of a single free particle, these generalized coordinates would simply be the coordinates of the position of the particle. If we have two particles given by positions $\mathbf{x}(t) = (x_1, x_2, x_3)$ and $\mathbf{y}(t) = (y_1, y_2, y_3)$, our generalized coordinates might be $\xi(t) = (x_1, x_2, x_3, y_1, y_2, y_3)$. In general, if we have $N$ different free particles, the configuration space has $3N$ dimensions.

The important thing is that the generalized coordinates need not be just the usual Cartesian coordinates. If we are describing a pendulum in a plane, we do not need to specify the $x$ and $y$ coordinates of the mass. Instead, the system can be described by just the angle $\theta$ between the mass and the vertical. So the generalized coordinate is just $\xi(t) = \theta(t)$. This is much more natural to work with and avoids the hassle of imposing constraints on $x$ and $y$.

3.1 The Lagrangian

The concept of generalized coordinates was first introduced by Lagrange in 1788. He then showed that $\xi(t)$ obeys certain complicated ODEs which are determined by the kinetic energy and the potential energy.

In the 1830s, Hamilton made Lagrange’s mechanics much more pleasant. He showed that the solutions of these ODEs are extremal points of a new “action”,

\[ S[\xi] = \int L\, dt, \]

where

\[ L = T - V \]

is the Lagrangian, with $T$ the kinetic energy and $V$ the potential energy.

Law (Hamilton’s principle). The actual path $\xi(t)$ taken by a particle is the path that makes the action $S$ stationary.

Note that $S$ has dimensions $ML^2T^{-1}$, which is the same as the 18th century action (and Planck’s constant).

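Example. Suppose we have 1 particle in Euclidean 3-space. The configuration space is simply the coordinates of the particle in space. We can choose Cartesian coordinates $\mathbf{x}$. Then

\[ T = \frac{1}{2}m|\dot{\mathbf{x}}|^2, \qquad V = V(\mathbf{x}, t) \]

and

\[ S[\mathbf{x}] = \int_{t_A}^{t_B} \left(\frac{1}{2}m|\dot{\mathbf{x}}|^2 - V(\mathbf{x}, t)\right) dt. \]

Then the Lagrangian is

\[ L(\mathbf{x}, \dot{\mathbf{x}}, t) = \frac{1}{2}m|\dot{\mathbf{x}}|^2 - V(\mathbf{x}, t). \]

We apply the Euler-Lagrange equations to obtain

\[ 0 = \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{\mathbf{x}}}\right) - \frac{\partial L}{\partial \mathbf{x}} = m\ddot{\mathbf{x}} + \nabla V. \]

So

\[ m\ddot{\mathbf{x}} = -\nabla V. \]

This is Newton’s law $\mathbf{F} = m\mathbf{a}$ with $\mathbf{F} = -\nabla V$. This shows that Lagrangian mechanics is “the same” as Newton’s law. However, Lagrangian mechanics has the advantage that it does not care what coordinates you use, while Newton’s law requires an inertial frame of reference.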

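Lagrangian mechanics applies even when $V$ is time-dependent. However, if $V$ is independent of time, then so is $L$. Then we can obtain a first integral.

As before, the chain rule gives

\begin{align*}
\frac{dL}{dt} &= \frac{\partial L}{\partial t} + \dot{\mathbf{x}} \cdot \frac{\partial L}{\partial \mathbf{x}} + \ddot{\mathbf{x}} \cdot \frac{\partial L}{\partial \dot{\mathbf{x}}} \\
&= \frac{\partial L}{\partial t} + \dot{\mathbf{x}} \cdot \underbrace{\left(\frac{\partial L}{\partial \mathbf{x}} - \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{\mathbf{x}}}\right)\right)}_{\frac{\delta S}{\delta \mathbf{x}} = 0} + \underbrace{\dot{\mathbf{x}} \cdot \frac{d}{dt}\left(\frac{\partial L}{\partial \dot{\mathbf{x}}}\right) + \ddot{\mathbf{x}} \cdot \frac{\partial L}{\partial \dot{\mathbf{x}}}}_{\frac{d}{dt}\left(\dot{\mathbf{x}} \cdot \frac{\partial L}{\partial \dot{\mathbf{x}}}\right)}.
\end{align*}

So we have

\[ \frac{d}{dt}\left(L - \dot{\mathbf{x}} \cdot \frac{\partial L}{\partial \dot{\mathbf{x}}}\right) = \frac{\partial L}{\partial t}. \]

If $\frac{\partial L}{\partial t} = 0$, then

\[ \dot{\mathbf{x}} \cdot \frac{\partial L}{\partial \dot{\mathbf{x}}} - L = E \]

for some constant $E$. For example, for one particle,

\[ E = m|\dot{\mathbf{x}}|^2 - \frac{1}{2}m|\dot{\mathbf{x}}|^2 + V = T + V = \text{total energy}. \]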
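The conservation of $E = T + V$ for a time-independent potential can be illustrated numerically. The sketch below makes several assumptions that are not in the notes: a single particle in one dimension, the quartic potential $V(x) = x^4/4$, and a velocity Verlet integrator for $m\ddot{x} = -V'(x)$. The energy then stays constant up to the small discretization error of the integrator.

```python
def V(x):
    # Assumed time-independent potential (illustrative choice)
    return 0.25 * x ** 4

def grad_V(x):
    return x ** 3

def energy(m, x, v):
    # E = T + V for a single particle
    return 0.5 * m * v * v + V(x)

def simulate(m=1.0, x=1.0, v=0.0, dt=1e-3, steps=10000):
    """Integrate m x'' = -V'(x) with velocity Verlet and record the energy."""
    energies = [energy(m, x, v)]
    a = -grad_V(x) / m
    for _ in range(steps):
        x += v * dt + 0.5 * a * dt * dt
        a_new = -grad_V(x) / m
        v += 0.5 * (a + a_new) * dt
        a = a_new
        energies.append(energy(m, x, v))
    return energies

energies = simulate()
drift = max(abs(E - energies[0]) for E in energies)
print(energies[0], drift)
```

The recorded drift is tiny compared with the energy itself and shrinks further if the step size `dt` is reduced.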

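Example. Consider a central force field $\mathbf{F} = -\nabla V$, where $V = V(r)$ is independent of time. We use spherical polar coordinates $(r, \theta, \phi)$, where

\begin{align*}
x &= r\sin\theta\cos\phi \\
y &= r\sin\theta\sin\phi \\
z &= r\cos\theta.
\end{align*}

So

\[ T = \frac{1}{2}m|\dot{\mathbf{x}}|^2 = \frac{1}{2}m\left(\dot{r}^2 + r^2(\dot\theta^2 + \sin^2\theta\, \dot\phi^2)\right). \]

So

\[ L = \frac{1}{2}m\dot{r}^2 + \frac{1}{2}mr^2\left(\dot\theta^2 + \sin^2\theta\, \dot\phi^2\right) - V(r). \]

We’ll use the fact that motion is planar (a consequence of angular momentum conservation). So wlog $\theta = \frac{\pi}{2}$. Then

\[ L = \frac{1}{2}m\dot{r}^2 + \frac{1}{2}mr^2\dot\phi^2 - V(r). \]

Then the Euler-Lagrange equations give

\begin{align*}
m\ddot{r} - mr\dot\phi^2 + V'(r) &= 0 \\
\frac{d}{dt}\left(mr^2\dot\phi\right) &= 0.
\end{align*}

From the second equation, we see that $r^2\dot\phi = h$ is a constant (angular momentum per unit mass). Then $\dot\phi = h/r^2$. So

\[ m\ddot{r} - \frac{mh^2}{r^3} + V'(r) = 0. \]

If we let

\[ V_{\mathrm{eff}} = V(r) + \frac{mh^2}{2r^2} \]

be the effective potential, then we have

\[ m\ddot{r} = -V_{\mathrm{eff}}'(r). \]

For example, in a gravitational field, $V(r) = -\frac{GMm}{r}$. Then

\[ V_{\mathrm{eff}} = m\left(-\frac{GM}{r} + \frac{h^2}{2r^2}\right). \]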
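As a short worked illustration (not from the lectures) of how the effective potential is used, we can locate the circular orbit for the gravitational $V_{\mathrm{eff}}$ above. A circular orbit has $\ddot{r} = 0$, so it sits at a stationary point $r_0$ of $V_{\mathrm{eff}}$; the symbol $r_0$ is introduced here only for this example.

```latex
\begin{align*}
V_{\mathrm{eff}}'(r) &= m\left(\frac{GM}{r^2} - \frac{h^2}{r^3}\right) = 0
  \quad\Longrightarrow\quad r_0 = \frac{h^2}{GM}, \\
V_{\mathrm{eff}}''(r_0) &= m\left(-\frac{2GM}{r_0^3} + \frac{3h^2}{r_0^4}\right)
  = m\,\frac{-2GM + 3GM}{r_0^3} = \frac{GMm}{r_0^3} > 0.
\end{align*}
```

Since the second derivative is positive, $r_0$ is a minimum of $V_{\mathrm{eff}}$, so small radial perturbations of the circular orbit oscillate about $r_0$ rather than growing.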

3.2 The Hamiltonian

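In 1833, Hamilton took Lagrangian mechanics further and formulated Hamiltonian mechanics. The idea is to abandon $\dot{\mathbf{x}}$ and use the conjugate momentum $\mathbf{p} = \frac{\partial L}{\partial \dot{\mathbf{x}}}$ instead. Of course, this involves taking the Legendre transform of the Lagrangian to obtain the Hamiltonian.

Definition (Hamiltonian). The Hamiltonian of a system is the Legendre transform of the Lagrangian:

\[ H(\mathbf{x}, \mathbf{p}) = \mathbf{p} \cdot \dot{\mathbf{x}} - L(\mathbf{x}, \dot{\mathbf{x}}), \]

where $\dot{\mathbf{x}}$ is a function of $\mathbf{p}$ that is the solution to $\mathbf{p} = \frac{\partial L}{\partial \dot{\mathbf{x}}}$.

$\mathbf{p}$ is the conjugate momentum of $\mathbf{x}$. The space containing the variables $\mathbf{x}, \mathbf{p}$ is known as the phase space.

Since the Legendre transform is its own inverse, the Lagrangian is the Legendre transform of the Hamiltonian with respect to $\mathbf{p}$. So

\[ L = \mathbf{p} \cdot \dot{\mathbf{x}} - H(\mathbf{x}, \mathbf{p}) \]

with

\[ \dot{\mathbf{x}} = \frac{\partial H}{\partial \mathbf{p}}. \]

Hence we can write the action using the Hamiltonian as

\[ S[\mathbf{x}, \mathbf{p}] = \int (\mathbf{p} \cdot \dot{\mathbf{x}} - H(\mathbf{x}, \mathbf{p}))\, dt. \]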


This is the phase-space form of the action. The Euler-Lagrange equations for this are

\[ \dot{\mathbf{x}} = \frac{\partial H}{\partial \mathbf{p}}, \qquad \dot{\mathbf{p}} = -\frac{\partial H}{\partial \mathbf{x}}. \]

Using the Hamiltonian, the Euler-Lagrange equations put $\mathbf{x}$ and $\mathbf{p}$ on a much more equal footing, and the equations are more symmetric. There are also many useful concepts arising from the Hamiltonian, which are explored in much greater depth in the II Classical Dynamics course.

So what does the Hamiltonian look like? Consider the case of a single particle. The Lagrangian is given by

\[ L = \frac{1}{2}m|\dot{\mathbf{x}}|^2 - V(\mathbf{x}, t). \]

Then the conjugate momentum is

\[ \mathbf{p} = \frac{\partial L}{\partial \dot{\mathbf{x}}} = m\dot{\mathbf{x}}, \]

which happens to coincide with the usual definition of the momentum. However, the conjugate momentum is often something more interesting when we use generalized coordinates. For example, in polar coordinates, the conjugate momentum of the angle is the angular momentum.

Substituting this into the Hamiltonian, we obtain

\begin{align*}
H(\mathbf{x}, \mathbf{p}) &= \mathbf{p} \cdot \frac{\mathbf{p}}{m} - \frac{1}{2}m\left(\frac{\mathbf{p}}{m}\right)^2 + V(\mathbf{x}, t) \\
&= \frac{1}{2m}|\mathbf{p}|^2 + V.
\end{align*}

So $H$ is the total energy, but expressed in terms of $\mathbf{x}, \mathbf{p}$, not $\mathbf{x}, \dot{\mathbf{x}}$.
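As a brief worked illustration (an assumed example, not from the lectures), take a one-dimensional particle in the potential $V(x) = \frac{1}{2}kx^2$ with $k > 0$ a constant introduced for this example. Hamilton's equations then reproduce Newton's law for a harmonic oscillator:

```latex
\begin{align*}
H(x, p) &= \frac{p^2}{2m} + \frac{1}{2}kx^2, \\
\dot{x} &= \frac{\partial H}{\partial p} = \frac{p}{m}, \qquad
\dot{p} = -\frac{\partial H}{\partial x} = -kx, \\
m\ddot{x} &= \dot{p} = -kx.
\end{align*}
```

The single second-order equation of the Lagrangian picture has been traded for two first-order equations in the phase-space variables $(x, p)$.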

3.3 Symmetries and Noether’s theorem

Given

\[ F[x] = \int_\alpha^\beta f(x, \dot{x}, t)\, dt, \]

suppose we change variables by the transformation $t \mapsto t^*(t)$ and $x \mapsto x^*(t^*)$. Then we have a new independent variable and a new function. This gives

\[ F[x] \mapsto F^*[x^*] = \int_{\alpha^*}^{\beta^*} f(x^*, \dot{x}^*, t^*)\, dt^* \]

with $\alpha^* = t^*(\alpha)$ and $\beta^* = t^*(\beta)$.

There are some transformations that are particularly interesting:

Definition (Symmetry). If $F^*[x^*] = F[x]$ for all $x$, $\alpha$ and $\beta$, then the transformation $*$ is a symmetry.

This transformation could be a translation of time, space, or a rotation, or even more fancy stuff. The exact symmetries $F$ has depend on the form of $f$. For example, if $f$ only depends on the magnitudes of $\mathbf{x}$, $\dot{\mathbf{x}}$ and $t$, then rotation of space will be a symmetry.


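Example.

(i) Consider the transformation $t \mapsto t$ and $x \mapsto x + \varepsilon$ for some small $\varepsilon$. Then

\[ F^*[x^*] = \int_\alpha^\beta f(x + \varepsilon, \dot{x}, t)\, dt = \int_\alpha^\beta \left(f(x, \dot{x}, t) + \varepsilon\frac{\partial f}{\partial x}\right) dt \]

by the chain rule. Hence this transformation is a symmetry if $\frac{\partial f}{\partial x} = 0$.

However, we also know that if $\frac{\partial f}{\partial x} = 0$, then we have the first integral

\[ \frac{d}{dt}\left(\frac{\partial f}{\partial \dot{x}}\right) = 0. \]

So $\frac{\partial f}{\partial \dot{x}}$ is a conserved quantity.

(ii) Consider the transformation $t \mapsto t - \varepsilon$. For the sake of sanity, we will also transform $x \mapsto x^*$ such that $x^*(t^*) = x(t)$. Then

\[ F^*[x^*] = \int_\alpha^\beta f(x, \dot{x}, t - \varepsilon)\, dt = \int_\alpha^\beta \left(f(x, \dot{x}, t) - \varepsilon\frac{\partial f}{\partial t}\right) dt. \]

Hence this is a symmetry if $\frac{\partial f}{\partial t} = 0$.

We also know that if $\frac{\partial f}{\partial t} = 0$ is true, then we obtain the first integral

\[ \frac{d}{dt}\left(f - \dot{x}\frac{\partial f}{\partial \dot{x}}\right) = 0. \]

So we have a conserved quantity $f - \dot{x}\frac{\partial f}{\partial \dot{x}}$.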

We see that for each simple symmetry we have above, we can obtain a first integral, which then gives a constant of motion. Noether’s theorem is a powerful generalization of this.

Theorem (Noether’s theorem). For every continuous symmetry of $F[x]$, the solutions (i.e. the stationary points of $F[x]$) will have a corresponding conserved quantity.

What does “continuous symmetry” mean? Intuitively, it is a symmetry we can do “a bit of”. For example, rotation is a continuous symmetry, since we can do a bit of rotation. However, reflection is not, since we cannot reflect by “a bit”. We either reflect or we do not.

Note that continuity is essential. For example, if $f$ is quadratic in $x$ and $\dot{x}$, then $x \mapsto -x$ will be a symmetry. But since it is not continuous, there won’t be a conserved quantity.

Since the theorem requires a continuous symmetry, we can just consider infinitesimally small symmetries and ignore second-order terms. Almost every equation will have some $O(\varepsilon^2)$ that we will not write out.

We will consider symmetries that involve only the $x$ variable. Up to first order, we can write the symmetry as

\[ t \mapsto t, \qquad x(t) \mapsto x(t) + \varepsilon h(t), \]


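for some $h(t)$ representing the symmetry transformation (and $\varepsilon$ a small number).

By saying that this transformation is a symmetry, we mean that when we pick $\varepsilon$ to be any (small) constant number, the functional $F[x]$ does not change, i.e. $\delta F = 0$.

On the other hand, since $x(t)$ is a stationary point of $F[x]$, we know that if $\varepsilon$ is non-constant but vanishes at the end-points, then $\delta F = 0$ as well. We will combine these two pieces of information to find a conserved quantity of the system.

For the moment, we do not assume anything about $\varepsilon$ and see what happens to $F[x]$. Under the transformation, the change in $F[x]$ is given by

\begin{align*}
\delta F &= \int \left(f(x + \varepsilon h, \dot{x} + \varepsilon\dot{h} + \dot\varepsilon h, t) - f(x, \dot{x}, t)\right) dt \\
&= \int \left(\frac{\partial f}{\partial x}\varepsilon h + \frac{\partial f}{\partial \dot{x}}\varepsilon\dot{h} + \frac{\partial f}{\partial \dot{x}}\dot\varepsilon h\right) dt \\
&= \int \varepsilon\left(\frac{\partial f}{\partial x}h + \frac{\partial f}{\partial \dot{x}}\dot{h}\right) dt + \int \dot\varepsilon\left(\frac{\partial f}{\partial \dot{x}}h\right) dt.
\end{align*}

First consider the case where $\varepsilon$ is a constant. Then the second integral vanishes. So we obtain

\[ \varepsilon\int \left(\frac{\partial f}{\partial x}h + \frac{\partial f}{\partial \dot{x}}\dot{h}\right) dt = 0. \]

Since the end-points $\alpha$ and $\beta$ are arbitrary, this requires that

\[ \frac{\partial f}{\partial x}h + \frac{\partial f}{\partial \dot{x}}\dot{h} = 0. \]

Hence we know that

\[ \delta F = \int \dot\varepsilon\left(\frac{\partial f}{\partial \dot{x}}h\right) dt. \]

Then consider a variable $\varepsilon$ that is non-constant but vanishes at end-points. Then we know that since $x$ is a solution, we must have $\delta F = 0$. So we get

\[ \int \dot\varepsilon\left(\frac{\partial f}{\partial \dot{x}}h\right) dt = 0. \]

We can integrate by parts to obtain

\[ \int \varepsilon\frac{d}{dt}\left(\frac{\partial f}{\partial \dot{x}}h\right) dt = 0 \]

for any $\varepsilon$ that vanishes at end-points. Hence we must have

\[ \frac{d}{dt}\left(\frac{\partial f}{\partial \dot{x}}h\right) = 0. \]

So $\frac{\partial f}{\partial \dot{x}}h$ is a conserved quantity.

Obviously, not all symmetries just involve the $x$ variable. For example, we might have a time translation $t \mapsto t - \varepsilon$. However, we can encode this as a transformation of the $x$ variable only, as $x(t) \mapsto x(t - \varepsilon)$.

In general, to find the conserved quantity associated with the symmetry $x(t) \mapsto x(t) + \varepsilon h(t)$, we find the change $\delta F$ assuming that $\varepsilon$ is a function of time as opposed to a constant. Then the coefficient of $\dot\varepsilon$ is the conserved quantity.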


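Example. We can apply this to Hamiltonian mechanics. The motion of the particle is the stationary point of

\[ S[\mathbf{x}, \mathbf{p}] = \int (\mathbf{p} \cdot \dot{\mathbf{x}} - H(\mathbf{x}, \mathbf{p}))\, dt, \]

where

\[ H = \frac{1}{2m}|\mathbf{p}|^2 + V(\mathbf{x}). \]

(i) First consider the case where there is no potential. Since the action depends only on $\dot{\mathbf{x}}$ (or $\mathbf{p}$) and not $\mathbf{x}$ itself, it is invariant under the translation

\[ \mathbf{x} \mapsto \mathbf{x} + \boldsymbol\varepsilon, \qquad \mathbf{p} \mapsto \mathbf{p}. \]

For general $\boldsymbol\varepsilon$ that can vary with time, we have

\begin{align*}
\delta S &= \int \left[\left(\mathbf{p} \cdot (\dot{\mathbf{x}} + \dot{\boldsymbol\varepsilon}) - H(\mathbf{p})\right) - \left(\mathbf{p} \cdot \dot{\mathbf{x}} - H(\mathbf{p})\right)\right] dt \\
&= \int \mathbf{p} \cdot \dot{\boldsymbol\varepsilon}\, dt.
\end{align*}

Hence $\mathbf{p}$ (the momentum) is a constant of motion.

(ii) If the potential has no time-dependence, then the system is invariant under time translation. We’ll skip the tedious computations and just state that time translation invariance implies conservation of $H$ itself, which is the energy.

(iii) The above two results can also be obtained directly from first integrals of the Euler-Lagrange equation. However, we can do something cooler. Suppose that we have a potential $V(|\mathbf{x}|)$ that only depends on radius. Then this has a rotational symmetry.

Choose any favorite axis of rotational symmetry $\boldsymbol\omega$, and make the rotation

\begin{align*}
\mathbf{x} &\mapsto \mathbf{x} + \varepsilon\boldsymbol\omega \times \mathbf{x} \\
\mathbf{p} &\mapsto \mathbf{p} + \varepsilon\boldsymbol\omega \times \mathbf{p}.
\end{align*}

Then our rotation does not affect the radius $|\mathbf{x}|$ and momentum $|\mathbf{p}|$. So the Hamiltonian $H(\mathbf{x}, \mathbf{p})$ is unaffected. Noting that $(\boldsymbol\omega \times \mathbf{p}) \cdot \dot{\mathbf{x}} = 0$, we have

\begin{align*}
\delta S &= \int \left(\mathbf{p} \cdot \frac{d}{dt}(\mathbf{x} + \varepsilon\boldsymbol\omega \times \mathbf{x}) - \mathbf{p} \cdot \dot{\mathbf{x}}\right) dt \\
&= \int \left(\mathbf{p} \cdot \frac{d}{dt}(\varepsilon\boldsymbol\omega \times \mathbf{x})\right) dt \\
&= \int \left(\mathbf{p} \cdot \left[\boldsymbol\omega \times \frac{d}{dt}(\varepsilon\mathbf{x})\right]\right) dt \\
&= \int \left(\mathbf{p} \cdot \left[\boldsymbol\omega \times (\dot\varepsilon\mathbf{x} + \varepsilon\dot{\mathbf{x}})\right]\right) dt \\
&= \int \left(\dot\varepsilon\,\mathbf{p} \cdot (\boldsymbol\omega \times \mathbf{x}) + \varepsilon\,\mathbf{p} \cdot (\boldsymbol\omega \times \dot{\mathbf{x}})\right) dt.
\end{align*}

Since $\mathbf{p}$ is parallel to $\dot{\mathbf{x}}$, we are left with

\begin{align*}
\delta S &= \int \left(\dot\varepsilon\,\mathbf{p} \cdot (\boldsymbol\omega \times \mathbf{x})\right) dt \\
&= \int \dot\varepsilon\,\boldsymbol\omega \cdot (\mathbf{x} \times \mathbf{p})\, dt.
\end{align*}

So $\boldsymbol\omega \cdot (\mathbf{x} \times \mathbf{p})$ is a constant of motion. Since this is true for all $\boldsymbol\omega$, $\mathbf{L} = \mathbf{x} \times \mathbf{p}$ must be a constant of motion, and this is the angular momentum.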
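The conservation of $\mathbf{x} \times \mathbf{p}$ in a central potential can be checked numerically. The sketch below assumes the specific potential $V = -1/|\mathbf{x}|$ with unit mass, and uses a leapfrog (kick-drift-kick) integrator; both choices, and all function names, are illustrative rather than taken from the notes. Because each kick is parallel to $\mathbf{x}$ and each drift is parallel to $\mathbf{p}$, this scheme preserves the angular momentum up to rounding error.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def force(x):
    # -grad V for the assumed central potential V = -1/|x|
    r3 = (x[0] ** 2 + x[1] ** 2 + x[2] ** 2) ** 1.5
    return tuple(-xi / r3 for xi in x)

def simulate(x, p, dt=1e-3, steps=5000, m=1.0):
    """Leapfrog integration of Hamilton's equations, recording x cross p."""
    history = [cross(x, p)]
    for _ in range(steps):
        f = force(x)
        p = tuple(pi + 0.5 * dt * fi for pi, fi in zip(p, f))  # half kick
        x = tuple(xi + dt * pi / m for xi, pi in zip(x, p))     # drift
        f = force(x)
        p = tuple(pi + 0.5 * dt * fi for pi, fi in zip(p, f))  # half kick
        history.append(cross(x, p))
    return history

history = simulate((1.0, 0.0, 0.0), (0.0, 0.8, 0.3))
L0 = history[0]
max_dev = max(max(abs(a - b) for a, b in zip(Lvec, L0)) for Lvec in history)
print(L0, max_dev)
```

All three components of the angular momentum stay fixed, in line with the fact that the conserved quantity holds for every choice of axis $\boldsymbol\omega$.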


4 Multivariate calculus of variations

So far, the function $x(t)$ we are varying is just a function of a single variable $t$. What if we have a more complicated function to consider?

We will consider the most general case $\mathbf{y}(x_1, \cdots, x_m) \in \mathbb{R}^n$ that maps $\mathbb{R}^m \to \mathbb{R}^n$ (we can also view this as $n$ different functions that map $\mathbb{R}^m \to \mathbb{R}$). The functional will be a multiple integral of the form

\[ F[\mathbf{y}] = \int \cdots \int f(\mathbf{y}, \nabla\mathbf{y}, x_1, \cdots, x_m)\, dx_1 \cdots dx_m, \]

where $\nabla\mathbf{y}$ is the second-rank tensor defined as

\[ \nabla\mathbf{y} = \left(\frac{\partial \mathbf{y}}{\partial x_1}, \cdots, \frac{\partial \mathbf{y}}{\partial x_m}\right). \]

In this case, instead of attempting to come up with some complicated generalized Euler-Lagrange equation, it is often a better idea to directly consider variations $\delta\mathbf{y}$ of $\mathbf{y}$. This is best illustrated by example.

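Example (Minimal surfaces in $\mathbb{E}^3$). This is a natural generalization of geodesics. A minimal surface is a surface of least area subject to some boundary conditions. Suppose that $(x, y)$ are good coordinates for a surface $S$, where $(x, y)$ takes values in the domain $D \subseteq \mathbb{R}^2$. Then the surface is defined by $z = h(x, y)$, where $h$ is the height function.

When possible, we will denote partial differentiation by suffixes, i.e. $h_x = \frac{\partial h}{\partial x}$.

Then the area is given by

\[ A[h] = \int_D \sqrt{1 + h_x^2 + h_y^2}\, dA. \]

Consider a variation of $h(x, y)$: $h \mapsto h + \delta h(x, y)$. Then

\begin{align*}
A[h + \delta h] &= \int_D \sqrt{1 + (h_x + (\delta h)_x)^2 + (h_y + (\delta h)_y)^2}\, dA \\
&= A[h] + \int_D \left(\frac{h_x(\delta h)_x + h_y(\delta h)_y}{\sqrt{1 + h_x^2 + h_y^2}} + O(\delta h^2)\right) dA.
\end{align*}

We integrate by parts to obtain

\[ \delta A = -\int_D \delta h\left(\frac{\partial}{\partial x}\left(\frac{h_x}{\sqrt{1 + h_x^2 + h_y^2}}\right) + \frac{\partial}{\partial y}\left(\frac{h_y}{\sqrt{1 + h_x^2 + h_y^2}}\right)\right) dA + O(\delta h^2) \]

plus some boundary terms. So our minimal surface will satisfy

\[ \frac{\partial}{\partial x}\left(\frac{h_x}{\sqrt{1 + h_x^2 + h_y^2}}\right) + \frac{\partial}{\partial y}\left(\frac{h_y}{\sqrt{1 + h_x^2 + h_y^2}}\right) = 0. \]

Simplifying, we have

\[ (1 + h_y^2)h_{xx} + (1 + h_x^2)h_{yy} - 2h_x h_y h_{xy} = 0. \]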


This is a non-linear 2nd-order PDE, the minimal-surface equation. While it is difficult to come up with a fully general solution to this PDE, we can consider some special cases.

– There is an obvious solution

\[ h(x, y) = Ax + By + C, \]

since every term of the equation involves a second derivative and this function is linear. This represents a plane.

– If $|\nabla h|^2 \ll 1$, then $h_x^2$ and $h_y^2$ are small. So we have

\[ h_{xx} + h_{yy} = 0, \]

or

\[ \nabla^2 h = 0. \]

So we end up with the Laplace equation. Hence harmonic functions are (approximately) minimal-area.

– We might want a cylindrically-symmetric solution, i.e. $h(x, y) = z(r)$, where $r = \sqrt{x^2 + y^2}$. Then we are left with an ordinary differential equation

\[ rz'' + z' + z'^3 = 0. \]

The general solution is

\[ z = A^{-1}\cosh^{-1}(Ar) + B, \]

a catenoid.

Alternatively, to obtain this, we can substitute $h(x, y) = z(r)$ into $A[h]$ to get

\[ A[z] = 2\pi\int r\sqrt{1 + (z'(r))^2}\, dr, \]

and we can apply the Euler-Lagrange equation.
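The cylindrically-symmetric case can be checked directly. The sketch below verifies that the catenoid profile $z(r) = A^{-1}\cosh^{-1}(Ar) + B$ (valid for $Ar > 1$) satisfies $rz'' + z' + z'^3 = 0$, using derivatives computed by hand and cross-checked against a finite difference. The function names and sample values are hypothetical.

```python
import math

def z(r, A, B=0.0):
    # Catenoid height profile, defined for A * r > 1
    return math.acosh(A * r) / A + B

def z_prime(r, A):
    # d/dr [acosh(A r) / A] = 1 / sqrt((A r)^2 - 1)
    return 1.0 / math.sqrt((A * r) ** 2 - 1.0)

def z_double_prime(r, A):
    return -A * A * r / ((A * r) ** 2 - 1.0) ** 1.5

def residual(r, A):
    """Left-hand side of r z'' + z' + z'^3 = 0."""
    zp = z_prime(r, A)
    return r * z_double_prime(r, A) + zp + zp ** 3

A = 0.7
radii = (2.0, 3.0, 5.0, 10.0)
residuals = [residual(r, A) for r in radii]

# Finite-difference cross-check of the hand-computed first derivative
h = 1e-5
fd_errors = [abs((z(r + h, A) - z(r - h, A)) / (2 * h) - z_prime(r, A)) for r in radii]
print(residuals, fd_errors)
```

The residual vanishes to rounding error at every sampled radius, for any choice of the constant $B$.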

Example (Small amplitude oscillations of uniform string). Suppose we have a string with uniform constant mass density $\rho$ and uniform tension $T$.

[Figure: a stretched string displaced in the y direction.]

Suppose we pull the line between $x = 0$ and $x = a$ with some tension $T$. Then we set it into motion such that the amplitude is given by $y(x; t)$. Then the kinetic energy is

\[ T = \frac{1}{2}\int_0^a \rho v^2\, dx = \frac{\rho}{2}\int_0^a \dot{y}^2\, dx. \]

The potential energy is the tension times the length. So, for small amplitudes,

\[ V = T\int d\ell = T\int_0^a \sqrt{1 + (y')^2}\, dx \approx Ta + \int_0^a \frac{1}{2}T(y')^2\, dx. \]


Note that $y'$ is the derivative wrt $x$ while $\dot{y}$ is the derivative wrt time.

The $Ta$ term can be seen as the ground-state energy. It is the energy initially stored if there is no oscillation. Since this constant term doesn’t affect where the stationary points lie, we will ignore it. Then the action is given by

\[ S[y] = \iint_0^a \left(\frac{1}{2}\rho\dot{y}^2 - \frac{1}{2}T(y')^2\right) dx\, dt. \]

We apply Hamilton’s principle, which says that we need

\[ \delta S[y] = 0. \]

We have

\[ \delta S[y] = \iint_0^a \left(\rho\dot{y}\frac{\partial}{\partial t}\delta y - Ty'\frac{\partial}{\partial x}\delta y\right) dx\, dt. \]

Integrate by parts to obtain

\[ \delta S[y] = -\iint_0^a \delta y\,(\rho\ddot{y} - Ty'')\, dx\, dt + \text{boundary term}. \]

Assuming that the boundary term vanishes, we will need

\[ \ddot{y} - v^2 y'' = 0, \]

where $v^2 = T/\rho$. This is the wave equation in two dimensions (one of space and one of time). Note that this is a linear PDE, which is a simplification resulting from our assuming the oscillation is small.

The general solution to the wave equation is

\[ y(x, t) = f_+(x - vt) + f_-(x + vt), \]

which is a superposition of a wave travelling rightwards and a wave travelling leftwards.
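As an illustration of the general solution, the sketch below picks two arbitrary profiles (a Gaussian for $f_+$ and a sine for $f_-$, both assumptions made only for this example) and checks by central finite differences that $y_{tt} - v^2 y_{xx}$ is numerically zero.

```python
import math

def y(x, t, v=2.0):
    # Assumed profiles: f_plus is a Gaussian pulse, f_minus is a sine wave
    f_plus = math.exp(-(x - v * t) ** 2)
    f_minus = math.sin(x + v * t)
    return f_plus + f_minus

def wave_residual(x, t, v=2.0, h=1e-3):
    """Central-difference estimate of y_tt - v^2 y_xx at the point (x, t)."""
    y_tt = (y(x, t + h, v) - 2.0 * y(x, t, v) + y(x, t - h, v)) / h ** 2
    y_xx = (y(x + h, t, v) - 2.0 * y(x, t, v) + y(x - h, t, v)) / h ** 2
    return y_tt - v ** 2 * y_xx

points = [(0.0, 0.0), (0.5, 0.1), (-1.0, 0.3), (2.0, 1.0)]
residuals = [wave_residual(x, t) for x, t in points]
print(residuals)  # small values, limited only by the finite-difference error
```

The residuals are of the size of the discretization error, while the individual second derivatives are of order one.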

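Example (Maxwell’s equations). It is possible to obtain Maxwell’s equations from an action principle, where we define a Lagrangian for the electromagnetic field. Note that this is the Lagrangian for the field itself, and there is a separate Lagrangian for particles moving in a field.

We have to first define several quantities. First we have the charges: $\rho$ represents the electric charge density and $\mathbf{J}$ represents the electric current density.

Then we have the potentials: $\phi$ is the electric scalar potential and $\mathbf{A}$ is the magnetic vector potential.

Finally the fields: $\mathbf{E} = -\nabla\phi - \dot{\mathbf{A}}$ is the electric field, and $\mathbf{B} = \nabla \times \mathbf{A}$ is the magnetic field.

We pick convenient units where $c = \varepsilon_0 = \mu_0 = 1$. With these concepts in mind, the action is given by

\[ S[\mathbf{A}, \phi] = \int \left(\frac{1}{2}(|\mathbf{E}|^2 - |\mathbf{B}|^2) + \mathbf{A} \cdot \mathbf{J} - \phi\rho\right) dV\, dt. \]

Varying $\mathbf{A}$ and $\phi$ by $\delta\mathbf{A}$ and $\delta\phi$ respectively, we have

\[ \delta S = \int \left(-\mathbf{E} \cdot \left(\nabla\delta\phi + \frac{\partial}{\partial t}\delta\mathbf{A}\right) - \mathbf{B} \cdot \nabla \times \delta\mathbf{A} + \delta\mathbf{A} \cdot \mathbf{J} - \rho\,\delta\phi\right) dV\, dt. \]

Integrate by parts to obtain

\[ \delta S = \int \left(\delta\mathbf{A} \cdot (\dot{\mathbf{E}} - \nabla \times \mathbf{B} + \mathbf{J}) + \delta\phi\,(\nabla \cdot \mathbf{E} - \rho)\right) dV\, dt. \]

Since the coefficients have to be 0, we must have

\[ \nabla \times \mathbf{B} = \mathbf{J} + \dot{\mathbf{E}}, \qquad \nabla \cdot \mathbf{E} = \rho. \]

Also, the definitions of $\mathbf{E}$ and $\mathbf{B}$ immediately give

\[ \nabla \cdot \mathbf{B} = 0, \qquad \nabla \times \mathbf{E} = -\dot{\mathbf{B}}. \]

These four equations are Maxwell’s equations.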

5 The second variation

5.1 The second variation

So far, we have only looked at the “first derivatives” of functionals. We can identify stationary points, but we don’t know if a given one is a maximum, minimum or a saddle. To distinguish between these, we have to look at the “second derivatives”, or the second variation.

Suppose $x(t) = x_0(t)$ is a solution of

\[ \frac{\delta F[x]}{\delta x(t)} = 0, \]

i.e. $F[x]$ is stationary at $x = x_0$.

To determine what type of stationary point it is, we need to expand $F[x + \delta x]$ to second order in $\delta x$. For convenience, let $\delta x(t) = \varepsilon\xi(t)$ with constant $\varepsilon \ll 1$. We will also only consider functionals of the form

\[ F[x] = \int_\alpha^\beta f(x, \dot{x}, t)\, dt \]

with fixed-end boundary conditions, i.e. $\xi(\alpha) = \xi(\beta) = 0$. We will use both dots ($\dot{x}$) and dashes ($x'$) to denote derivatives.

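We consider a variation $x \mapsto x + \delta x$ and expand the integrand to obtain

\begin{align*}
&f(x + \varepsilon\xi, \dot{x} + \varepsilon\dot\xi, t) - f(x, \dot{x}, t) \\
&= \varepsilon\left(\xi\frac{\partial f}{\partial x} + \dot\xi\frac{\partial f}{\partial \dot{x}}\right) + \frac{\varepsilon^2}{2}\left(\xi^2\frac{\partial^2 f}{\partial x^2} + 2\xi\dot\xi\frac{\partial^2 f}{\partial x\,\partial\dot{x}} + \dot\xi^2\frac{\partial^2 f}{\partial \dot{x}^2}\right) + O(\varepsilon^3).
\end{align*}

Noting that $2\xi\dot\xi = (\xi^2)'$ and integrating by parts, we obtain

\[ = \varepsilon\xi\left[\frac{\partial f}{\partial x} - \frac{d}{dt}\left(\frac{\partial f}{\partial \dot{x}}\right)\right] + \frac{\varepsilon^2}{2}\left\{\xi^2\left[\frac{\partial^2 f}{\partial x^2} - \frac{d}{dt}\left(\frac{\partial^2 f}{\partial x\,\partial\dot{x}}\right)\right] + \dot\xi^2\frac{\partial^2 f}{\partial \dot{x}^2}\right\} \]

plus some boundary terms which vanish. So

\[ F[x + \varepsilon\xi] - F[x] = \int_\alpha^\beta \varepsilon\xi\left[\frac{\partial f}{\partial x} - \frac{d}{dt}\left(\frac{\partial f}{\partial \dot{x}}\right)\right] dt + \frac{\varepsilon^2}{2}\delta^2 F[x, \xi] + O(\varepsilon^3), \]

where

\[ \delta^2 F[x, \xi] = \int_\alpha^\beta \left\{\xi^2\left[\frac{\partial^2 f}{\partial x^2} - \frac{d}{dt}\left(\frac{\partial^2 f}{\partial x\,\partial\dot{x}}\right)\right] + \dot\xi^2\frac{\partial^2 f}{\partial \dot{x}^2}\right\} dt \]

is a functional of both $x(t)$ and $\xi(t)$. This is analogous to the term

\[ \delta\mathbf{x}^T H(\mathbf{x})\,\delta\mathbf{x} \]

appearing in the expansion of a regular function $f(\mathbf{x})$. In the case of normal functions, if $H(\mathbf{x})$ is positive for all $\mathbf{x}$, then $f(\mathbf{x})$ is convex, and the stationary point is hence a global minimum. A similar result holds for functionals.

In this case, if $\delta^2 F[x, \xi] > 0$ for all non-zero $\xi$ and all allowed $x$, then a solution $x_0(t)$ of $\frac{\delta F}{\delta x} = 0$ is an absolute minimum.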


Example (Geodesics in the plane). We previously showed that a straight line is a stationary point for the curve-length functional, but we didn’t show it is in fact the shortest distance! Maybe it is a maximum, and we can get the shortest distance by routing to the moon and back.

Recall that $f = \sqrt{1 + (y')^2}$. Then

\[ \frac{\partial f}{\partial y} = 0, \qquad \frac{\partial f}{\partial y'} = \frac{y'}{\sqrt{1 + (y')^2}}, \qquad \frac{\partial^2 f}{\partial y'^2} = \frac{1}{(1 + (y')^2)^{3/2}}, \]

with the other second derivatives zero. So we have

\[ \delta^2 F[y, \xi] = \int_\alpha^\beta \frac{\dot\xi^2}{(1 + (y')^2)^{3/2}}\, dx > 0. \]

So if we have a stationary function satisfying the boundary conditions, it is an absolute minimum. Since the straight line is a stationary function, it is indeed the minimum.

However, not all functions are convex[citation needed]. We can still ask whether a solution $x_0(t)$ of the Euler-Lagrange equation is a local minimum. For these, we need to consider

\[ \delta^2 F[x_0, \xi] = \int_\alpha^\beta (\rho(t)\dot\xi^2 + \sigma(t)\xi^2)\, dt, \]

where

\[ \rho(t) = \left.\frac{\partial^2 f}{\partial \dot{x}^2}\right|_{x = x_0}, \qquad \sigma(t) = \left[\frac{\partial^2 f}{\partial x^2} - \frac{d}{dt}\left(\frac{\partial^2 f}{\partial x\,\partial\dot{x}}\right)\right]_{x = x_0}. \]

This is of the same form as the Sturm-Liouville problem. For $x_0$ to minimize $F[x]$ locally, we need $\delta^2 F[x_0, \xi] > 0$. A necessary condition for this is

\[ \rho(t) \geq 0, \]

which is the Legendre condition.

The intuition behind this necessary condition is as follows: suppose that $\rho(t)$ is negative in some interval $I \subseteq [\alpha, \beta]$. Then we can find a $\xi(t)$ that makes $\delta^2 F[x_0, \xi]$ negative. We simply have to make $\xi$ zero outside $I$, and small but crazily oscillating inside $I$. Then inside $I$, $\dot\xi^2$ will be very large while $\xi^2$ is kept tiny. So we can make $\delta^2 F[x_0, \xi]$ arbitrarily negative.

Turning the intuition into a formal proof is not difficult but is tedious and will be omitted.

However, this is not a sufficient condition. Even if we had a strict inequality $\rho(t) > 0$ for all $\alpha < t < \beta$, it is still not sufficient.

Of course, a sufficient (but not necessary) condition is $\rho(t) > 0$, $\sigma(t) \geq 0$, but this is not too interesting.

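Example. In the Brachistochrone problem, we have

\[ T[x] \propto \int_\alpha^\beta \sqrt{\frac{1 + \dot{x}^2}{x}}\, dt. \]

Then

\begin{align*}
\rho(t) &= \left.\frac{\partial^2 f}{\partial \dot{x}^2}\right|_{x_0} > 0 \\
\sigma(t) &= \frac{1}{2x^2\sqrt{x(1 + \dot{x}^2)}} > 0.
\end{align*}

So the cycloid does minimize the time $T$.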

5.2 Jacobi condition for local minima of F [x]

Legendre tried to prove that $\rho > 0$ is a sufficient condition for $\delta^2 F > 0$. This is known as the strong Legendre condition. However, he obviously failed, since it is indeed not a sufficient condition. Yet, it turns out that he was close.

Before we get to the actual sufficient condition, we first try to understand why thinking $\rho > 0$ is sufficient isn’t as crazy as it first sounds.

If $\rho > 0$ and $\sigma < 0$, we would want to create a negative $\delta^2 F[x_0, \xi]$ by choosing $\xi$ to be large but slowly varying. Then we will have a very negative $\sigma(t)\xi^2$ and only a small positive $\rho(t)\dot\xi^2$.

The problem is that $\xi$ has to be 0 at the end points $\alpha$ and $\beta$. For $\xi$ to take a large value, it must reach the value from 0, and this requires some variation of $\xi$, thereby inducing some $\dot\xi$. This is not a problem if $\alpha$ and $\beta$ are far apart - we simply slowly climb up to a large value of $\xi$ and then slowly rappel back down, maintaining a low $\dot\xi$ throughout the process. However, it is not unreasonable to assume that as we make the distance $\beta - \alpha$ smaller and smaller, eventually all $\xi$ will lead to a positive $\delta^2 F[x_0, \xi]$, since we cannot reach large values of $\xi$ without having large $\dot\xi$.

It turns out that the intuition is correct. As long as $\alpha$ and $\beta$ are sufficiently close, $\delta^2 F[x_0, \xi]$ will be positive. The derivation of this result is, however, rather roundabout, involving a number of algebraic tricks.

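For a solution $x_0$ to the Euler-Lagrange equation, we have

\[ \delta^2 F[x_0, \xi] = \int_\alpha^\beta \left(\rho(t)\dot\xi^2 + \sigma(t)\xi^2\right) dt, \]

where

\[ \rho(t) = \left.\frac{\partial^2 f}{\partial \dot{x}^2}\right|_{x = x_0}, \qquad \sigma(t) = \left[\frac{\partial^2 f}{\partial x^2} - \frac{d}{dt}\left(\frac{\partial^2 f}{\partial x\,\partial\dot{x}}\right)\right]_{x = x_0}. \]

Assume $\rho(t) > 0$ for $\alpha < t < \beta$ (the strong Legendre condition) and assume boundary conditions $\xi(\alpha) = \xi(\beta) = 0$. When is this sufficient for $\delta^2 F > 0$?

First of all, notice that for any smooth function $w(t)$, we have

\[ 0 = \int_\alpha^\beta (w\xi^2)'\, dt \]

since this is a total derivative and evaluates to $w(\beta)\xi(\beta)^2 - w(\alpha)\xi(\alpha)^2 = 0$. So we have

\[ 0 = \int_\alpha^\beta (2w\xi\dot\xi + \dot{w}\xi^2)\, dt. \]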


This allows us to rewrite $\delta^2 F$ as

\[ \delta^2 F = \int_\alpha^\beta \left(\rho\dot\xi^2 + 2w\xi\dot\xi + (\sigma + \dot{w})\xi^2\right) dt. \]

Now complete the square in $\xi$ and $\dot\xi$. So

\[ \delta^2 F = \int_\alpha^\beta \left[\rho\left(\dot\xi + \frac{w}{\rho}\xi\right)^2 + \left(\sigma + \dot{w} - \frac{w^2}{\rho}\right)\xi^2\right] dt. \]

This is non-negative if

\[ w^2 = \rho(\sigma + \dot{w}). \tag{$*$} \]

So as long as we can find a solution to this equation, we know that $\delta^2 F$ is non-negative. Could it be that $\delta^2 F = 0$? Turns out not. If it were, then $\dot\xi = -\frac{w}{\rho}\xi$. We can solve this to obtain

\[ \xi(t) = C\exp\left(-\int_\alpha^t \frac{w(s)}{\rho(s)}\, ds\right). \]

We know that $\xi(\alpha) = 0$. But $\xi(\alpha) = Ce^0$. So $C = 0$. Hence equality holds only for $\xi = 0$.

So all we need to do is to find a solution to $(*)$, and we are sure that $\delta^2 F > 0$.

Note that this is non-linear in $w$. We can convert this into a linear equation by defining $w$ in terms of a new function $u$ by $w = -\rho\dot{u}/u$. Then $(*)$ becomes

\[ \rho\left(\frac{\dot{u}}{u}\right)^2 = \sigma - \left(\frac{\rho\dot{u}}{u}\right)' = \sigma - \frac{(\rho\dot{u})'}{u} + \rho\left(\frac{\dot{u}}{u}\right)^2. \]

We see that the left and right terms cancel. So we have

\[ -(\rho\dot{u})' + \sigma u = 0. \]

This is the Jacobi accessory equation, a second-order linear ODE.

There is a caveat here. Not every solution $u$ will do. Recall that $u$ is used to produce $w$ via $w = -\rho\dot{u}/u$. Hence within $[\alpha, \beta]$, we cannot have $u = 0$ since we cannot divide by zero. If we can find a non-zero $u(t)$ satisfying the Jacobi accessory equation, then $\delta^2 F > 0$ for $\xi \neq 0$, and hence $x_0$ is a local minimum of $F$.

A suitable solution will always exist for sufficiently small $\beta - \alpha$, but may not exist if $\beta - \alpha$ is too large, as stated at the beginning of the chapter.

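Example (Geodesics on unit sphere). For any curve $C$ on the sphere, we have

\[ L = \int_C \sqrt{d\theta^2 + \sin^2\theta\, d\phi^2}. \]

If $\theta$ is a good parameter of the curve, then

\[ L[\phi] = \int_{\theta_1}^{\theta_2} \sqrt{1 + \sin^2\theta\,(\phi')^2}\, d\theta. \]

Alternatively, if $\phi$ is a good parameter, we have

\[ L[\theta] = \int_{\phi_1}^{\phi_2} \sqrt{(\theta')^2 + \sin^2\theta}\, d\phi. \]

We will look at the second case. We have

\[ f(\theta, \theta') = \sqrt{(\theta')^2 + \sin^2\theta}. \]

So

\[ \frac{\partial f}{\partial \theta} = \frac{\sin\theta\cos\theta}{\sqrt{(\theta')^2 + \sin^2\theta}}, \qquad \frac{\partial f}{\partial \theta'} = \frac{\theta'}{\sqrt{(\theta')^2 + \sin^2\theta}}. \]

Since $\frac{\partial f}{\partial \phi} = 0$, we have the first integral

\[ \text{const} = f - \theta'\frac{\partial f}{\partial \theta'} = \frac{\sin^2\theta}{\sqrt{(\theta')^2 + \sin^2\theta}}. \]

So a solution is

\[ c\sin^2\theta = \sqrt{(\theta')^2 + \sin^2\theta}. \]

Here we need $c \geq 1$ for the equation to make sense.

We will consider the case where $c = 1$ (in fact, we can show that we can always orient our axes such that $c = 1$). This occurs when $\theta' = 0$, i.e. $\theta$ is a constant. Then our first integral gives $\sin^2\theta = \sin\theta$. So $\sin\theta = 1$ and $\theta = \pi/2$. This corresponds to a curve on the equator. (We ignore the case $\sin\theta = 0$ that gives $\theta = 0$, which is a rather silly solution.)

There are two equatorial solutions to the Euler-Lagrange equations. Which, if any, minimizes $L[\theta]$?

We have

\[ \left.\frac{\partial^2 f}{\partial (\theta')^2}\right|_{\theta = \pi/2} = 1 \]

and

\[ \frac{\partial^2 f}{\partial \theta^2} = -1, \qquad \frac{\partial^2 f}{\partial \theta\,\partial\theta'} = 0. \]

So $\rho(\phi) = 1$ and $\sigma(\phi) = -1$. So

\[ \delta^2 F = \int_{\phi_1}^{\phi_2} ((\xi')^2 - \xi^2)\, d\phi. \]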


The Jacobi accessory equation is $u'' + u = 0$. So the general solution is $u \propto \sin\phi - \gamma\cos\phi$. This is equal to zero if $\tan\phi = \gamma$.

Looking at the graph of $\tan\phi$, we see that $\tan$ takes every value once in each interval of length $\pi$. Hence if the domain $\phi_2 - \phi_1$ is greater than $\pi$ (i.e. we go the long way from the first point to the second), it will always contain some value of $\phi$ for which $\tan\phi = \gamma$, so $u$ vanishes somewhere in the domain. So we cannot conclude that the longer path is a local minimum (it is obviously not a global minimum, by definition of longer). We also cannot conclude that it is not a local minimum, since we tested with a sufficient and not necessary condition. On the other hand, if $\phi_2 - \phi_1$ is less than $\pi$, then we will be able to pick a $\gamma$ such that $u$ is non-zero in the domain.
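The last claim can be made concrete. The sketch below (with a hypothetical function name) constructs, for any interval shorter than $\pi$, a solution of $u'' + u = 0$ of the form $u(\phi) = \sin(\phi - \phi_1 + \delta)$ that stays positive on $[\phi_1, \phi_2]$, by centring the interval inside a half-period of the sine. For intervals of length $\pi$ or more it reports that this particular construction is unavailable.

```python
import math

def nonvanishing_jacobi_solution(phi1, phi2):
    """Return u(phi) = sin(phi - phi1 + delta), positive on [phi1, phi2],
    or None if phi2 - phi1 >= pi (this construction then fails)."""
    width = phi2 - phi1
    if width >= math.pi:
        return None
    # Shift so that [phi1, phi2] maps to the middle of (0, pi)
    delta = 0.5 * (math.pi - width)
    return lambda phi: math.sin(phi - phi1 + delta)

u = nonvanishing_jacobi_solution(0.0, 3.0)
samples = [u(3.0 * k / 200) for k in range(201)]
print(min(samples))  # strictly positive, so u has no zero on [0, 3]

too_long = nonvanishing_jacobi_solution(0.0, 3.5)
print(too_long)  # None: the interval is longer than pi
```

Expanding the sine shows that this $u$ is a combination of $\sin\phi$ and $\cos\phi$, so it is of the general form given above.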
