VARIATIONAL METHODS Contentsmath.jhu.edu/~yli/Variational methods.pdfVARIATIONAL METHODS 3 Note that...

89
VARIATIONAL METHODS YI LI Contents 1. Functionals 2 1.1. Introduction 2 1.2. Vector spaces 2 1.3. Functionals 3 1.4. Normed vector spaces 6 1.5. Continuous functionals 7 1.6. Linear functionals 10 2. A fundamental necessary condition for an extremum 11 2.1. ateaux variation 11 2.2. Examples 15 2.3. An optimization problem in production planning 18 2.4. Fr´ echet differential 19 3. Euler-Lagrange necessary condition for an extremum with constraints 20 3.1. Extremum problems 20 3.2. Weak continuity 20 3.3. Euler-Lagrange multiplier theorem for a single constraint 21 3.4. The Euler-Lagrange multiplier theorem for many constraints 27 3.5. Chaplygin’s problem 30 3.6. John multiplier theorem 36 4. Application I: Calculus of variations 37 4.1. Problems with fixed end points 37 4.2. Geodesic curves 48 4.3. Problems with variable end points 50 4.4. Functionals involving several unknown functions 57 4.5. Functionals involving higher-order derivatives 64 4.6. Functionals involving several independent variables 67 5. Application II: Sturm-Liouville eigenvalues 71 5.1. Sturm-Liouville problems 71 5.2. Rayleigh quotient and the lowest eigenvalue 73 5.3. Rayleigh-Ritz method and the lowest eigenvalue 75 5.4. Rayleigh quotient and Higher eigenvalues 77 5.5. The Courant minimax principle 79 5.6. olya’s conjecture 82 6. Second variation in extremum problems 85 6.1. Higher-order variations 85 6.2. Necessary conditions for a local extremum 88 6.3. Sufficient conditions for a local extremum 89 1

Transcript of VARIATIONAL METHODS Contentsmath.jhu.edu/~yli/Variational methods.pdfVARIATIONAL METHODS 3 Note that...

VARIATIONAL METHODS

YI LI

Contents

1. Functionals 21.1. Introduction 21.2. Vector spaces 21.3. Functionals 31.4. Normed vector spaces 61.5. Continuous functionals 71.6. Linear functionals 102. A fundamental necessary condition for an extremum 112.1. Gateaux variation 112.2. Examples 152.3. An optimization problem in production planning 182.4. Frechet differential 193. Euler-Lagrange necessary condition for an extremum with constraints 203.1. Extremum problems 203.2. Weak continuity 203.3. Euler-Lagrange multiplier theorem for a single constraint 213.4. The Euler-Lagrange multiplier theorem for many constraints 273.5. Chaplygin’s problem 303.6. John multiplier theorem 364. Application I: Calculus of variations 374.1. Problems with fixed end points 374.2. Geodesic curves 484.3. Problems with variable end points 504.4. Functionals involving several unknown functions 574.5. Functionals involving higher-order derivatives 644.6. Functionals involving several independent variables 675. Application II: Sturm-Liouville eigenvalues 715.1. Sturm-Liouville problems 715.2. Rayleigh quotient and the lowest eigenvalue 735.3. Rayleigh-Ritz method and the lowest eigenvalue 755.4. Rayleigh quotient and Higher eigenvalues 775.5. The Courant minimax principle 795.6. Polya’s conjecture 826. Second variation in extremum problems 856.1. Higher-order variations 856.2. Necessary conditions for a local extremum 886.3. Sufficient conditions for a local extremum 89

1

2 YI LI

1. Functionals

1.1. Introduction. In the course of Calculus, we study derivatives of functionswith variables of real numbers. For example, the following function

f(x) = x2, x ∈ R

is continuous, minx∈R f(x) = f(0) = 0, and the first derivative of f at x = 0vanishes.

In this course, we shall study “derivatives” of “functions” with variables of func-tions. We will explain the mentioned two terminologies which are usually calledas variations and functionals, respectively. Let X denote the set of all functionsdefined on R and consider a map

F(f) := f(0)2, f ∈X .

Clearly that minf∈X F(f) = 0 = F(0) and (at least) the zero function minimizesthe map F . Then we may ask a natural question on how to define the first “deriv-ative” of the map F so that the first derivative of F vanishes at the zero function.

1.2. Vector spaces. By a vector space (over the set of real numbers R) we mean aset X of elements x, y, z, · · · , referred to as vectors, together with two operationsof addition and multiplication, satisfies the following rules:

(1) x+ y ∈X for any x, y ∈X .(2) ax ∈X for any x ∈X and a ∈ R.(3) x+ y = y + x for any x, y ∈X .(4) (x+ y) + z = x+ (y + z) for any x, y, z ∈X .(5) X contains an element 0, called the zero vector, such that x+ 0 = x for

every x ∈X .(6) For every x ∈X , there is a vector −x ∈X such that x+ (−x) = 0.(7) a(bx) = (ab)x for any a, b ∈ R and any x ∈X .(8) a(x+ y) = ax+ ay for any a ∈ R and any x, y ∈X .(9) (a+ b)x = ax+ bx for any a, b ∈ R and any x ∈X .

(10) 1x = x for all x ∈X .

Example 1.1. (1) R is a vector space. More generally, the n-dimensional Euclideanspace Rn = x = (x1, · · · , xn) : xi ∈ R, 1 ≤ i ≤ n is a vector space.

(2) For a fixed interval I ⊂ R, let X be the set of all real-valued functionsdefined on I. If we define

(φ+ ψ)(x) + φ(x) + ψ(x),(1.2.1)

(aφ)(x) + aφ(x),(1.2.2)

where a ∈ R and φ, ψ ∈X , then X is a vector space.

Example 1.2. LetX ′ = f ∈X : f(0)− f(1) = 1

where the vector space X is given by Example 1.1 with I = [0, 1]. However, theset X ′ is not a vector space. If we choose f(x) = 1 − x and g(x) = 1 − x2, thenf, g ∈X ′ but f + g /∈X ′.

If Y is a subset of a vector space X , then it is a subspace of X if

(a) x+ y ∈ Y for any x, y ∈ Y , and(b) ax ∈ Y for any a ∈ R and x ∈ Y .

VARIATIONAL METHODS 3

Note that Y is itself a vector space.

Example 1.3. For a fixed interval I ⊂ R, let C k(I) be the set of all real-valuedfunctions on I which have continuous derivative of all orders up to and includingk-th order. When I = [a, b], we write C k(I) as C k[a, b].

1.3. Functionals. A functional is a map J from the subset D(J ) of some vectorspace X , the domain of J , to R.

(1) Brachistochrone functional. Consider a smooth curve γ in the (x, y)-plane joining two fixed points P0 = (x0, y0) and P1 = (x1, y1) (assume that y0 > y1).The time T required for a bead to move from P0 down to P1 along γ is given by

(1.3.1) T =

∫ T

0

dt =

∫γ

ds

v,

where s measures the arc length along γ, ds/dt is the rate of change of arc lengthwith respect to time t, and the speed of motion is v = ds/dt.

We assume that the earth’s gravitational acts down along the negative y-direction.Then a bead located at the position (x, y) and sliding along γ under the force ofgravity will have kinetic energy of motion given as 1

2mv2 and potential energy given

as mgy, where m is the mass of the bead. By the conservation of energy, we have

(1.3.2)1

2mv2 +mgy = mgy0,

provided that the bead starts from rest at P0 with zero initial kinetic energy andinitial potential energy equal to mgy0.

Whenγ : y = Y (x), x0 ≤ x ≤ x1,

for some suitable function Y (x), we have

v =√

2g[y0 − Y (x)], ds =√

1 + Y ′(x)2dx.

Hence

(1.3.3) T (Y ) =

∫ x1

x0

√1 + Y ′(x)2

2g[y0 − Y (x)]dx

for any Y in D(T ) = Y ∈ C1[x0, x1] : Y (x0) = y0 and Y (x1) = y1 that is asubset of C1[x0, x1].

(2) Area functional. Consider Chaplygin’s problem on the greatest areathat can be encircled in a given time T by varying the closed path γ flown by anairplane at constant natural speed v0 while a constant wind blows. The airplaneis assumed to be flying in a fixed horizonal plane surface above the surface of theearth which we take to be the (x, y)-coordinate plane. Suppose that a fixed closedpath in the xy-plane parametrically as

γ :

x = X(t),y = Y (t),

for t ∈ [0, T ], where we require that

(1.3.4) X(0) = X(T ), Y (0) = Y (T ).

The area enclosed by γ is given by

(1.3.5) A =1

2

∫ T

0

[X(t)Y ′(t)− Y (t)X ′(t)]dt.

4 YI LI

We let w0 be the constant wind speed with 0 ≤ w0 ≤ v0 and assume that thex-direction coincides with w0. Let α(t) be the steering angle between the positivex-direction and the direction of the axis of the airplane. The absolute velocity ofthe airplane relative to the ground is

(1.3.6)

X ′(t) = v0 · cos[α(t)] + w0,Y ′(t) = v0 · sin[α(t)].

Hence

(1.3.7)

X(t) = x0 + v0

∫ t0

cos[α(τ)]dτ + w0t,

Y (t) = Y0 + v0

∫ t0

sin[α(τ)]dτ,

where we set

(1.3.8) X(0) = x0, Y (0) = y0.

By (1.3.4) we must have

(1.3.9)

∫ T0

cos[α(t)]dt = −w0

v0T,∫ T

0sin[α(t)]dt = 0.

We might also impose the additional constraint that the initial steering angle α(0)be specified as

(1.3.10) α(0) = α0

for some given initial angle α0. Therefore

A(α) =1

2

∫ T

0

v0 · sin[α(t)]

[x0 + w0t+ v0

∫ t

0

cos[α(τ)]dτ

]− [v0 · cos[α(t)] + w0]

[y0 + v0

∫ t

0

sin[α(τ)]dτ

]dt,(1.3.11)

and the domain D(A), a subset of C 0[0, T ], is the set of all continuous functionsα(t) on [0, T ] satisfying (1.3.9) and (1.3.10).

(3) Transit time functional. Consider the transit time of a boat crossing ariver from a fixed initial point on one bank to a specified terminal point on theother bank. For simplicity, we assume that the river has parallel banks, where welet the y-axis coincide with the left bank. The river is ` units wide so that the rightbank coincides with the line x = `.

We consider a river without cross currents so that the current velocity is every-where directed downstream along the y-direction. Furthermore, we assume thatthis downstream current speed w depends only on x as

w = w(x), x ∈ [0, `],

and the boat travels at a constant natural speed v0 relative to the surroundingwater. If the path of the boat is represented by

γ :

x = ξ(t),y = η(t),

for t ∈ [0, T ]. Then

(1.3.12)

dxdt = ξ′(t) = v0 · cos[α(t)],dydt = η′(t) = v0 sin[α(t)] + w(ξ(t)),

VARIATIONAL METHODS 5

where α(t) is the steering angle of the boat measured between the positive x-direction and the direction of the axis of the boat.

The time of transit T of the boat is

(1.3.13) T =

∫ T

0

dt =

∫ `

0

dt

dxdx =

∫ `

0

dx

v0 · cosα,

where α = α(t) = α(x). If

(1.3.14) γ : y = Y (x), x ∈ [0, `],

where the function Y (x) = η(ξ−1(x)). Then

Y ′(x) =η′(t)

ξ′(t)

so that

(1.3.15) Y ′(x) =sinα+ e(x)

cosα, e(x) +

w(x)

v0.

From (1.3.15), we have

(1.3.16) cosα =1− e(x)2√

1− e(x)2 + Y ′(x)2 − e(x)Y ′(x)

and

(1.3.17) T (Y ) =1

v0

∫ `

0

√1− e(x)2 + Y ′(x)2 − e(x)Y ′(x)

1− e(x)2dx.

The domain of the functional defined by (1.3.17) is the vector space C 1[0, `].(4) Cost functional. We consider a company that manufactures and sells some

particular product. We assume that the company has on hand sufficient long-termorders for its product so that it can predict its future sales rate

(1.3.18) S = S(t)

with certainty. If the product is completely durable, it is natural to assume thatthe production rate P (t) and the finished product inventory level I(t) are relatedby

I = P − S.More generally, we allow for some spoilage of the inventory by considering insteadthe equation

(1.3.19) I = P − (S + αI),

where the spoilage proportionality α is a given constant.Finally, we assume that on the basis of the known sales forecast S of (1.3.18) the

company has decided on a desired inventory level i(t), resulting in a correspondingdesired production rate p(t), obtained from (1.3.19) as

(1.3.20) i = p− (S + αi).

Consider the cost functional C defined as

(1.3.21) C +∫ T

0

β2[I(t)− i(t)]2 + [P (t)− p(t)]2

dt,

where β is a fixed constant which the company might want to specify so as to givedifferent relative weights to the unwanted derivations of the inventory level andproduction rate away from their known desired levels.

6 YI LI

Suppose that

(1.3.22) I(0) = I0

holds for some given nonnegative constant I0 with I0 6= i0 = i(0). The differenceI0 − i0 furnishes a measure of the initial disturbance away from the desired state.From (1.3.19) we have

(1.3.23) IP (t) + I(t) = e−αt(I0 +

∫ t

0

eαt[P (τ)− S(τ)]dτ

).

Now the cost functional C given by (1.3.21) can be written as

(1.3.24) C(P ) =

∫ T

0

β2[IP (t)− i(t)]2 + [P (t)− p(t)]2

dt.

The domain of the functional C can be taken to be the vector space C 0[0, T ].

1.4. Normed vector spaces. A vector space X is said to be a normed vectorspace if there is real-valued norm function || · || defined on X which assigns thenumber ||x|| (called the norm of x) to the vector x ∈X such that

(1) ||x|| ≥ 0 whenever x ∈X , and ||x|| = 0 if and only if x = 0;(2) ||ax|| = |a|||x|| for every x ∈X and every α ∈ R;(3) (Triangle inequality) ||x+ y|| ≤ ||x||+ ||y|| for any x, y ∈X .

Example 1.4. For x = (x1, · · · , xn) ∈ Rn, we define

(1.4.1) ||x|| +√

(x1)2 + · · ·+ (xn)2

that is a norm on Rn. The triangle inequality follows from Cauchy’s inequality

(1.4.2)

(n∑i=1

xiyi

)2

(n∑i=1

(xi)2

)(n∑i=1

(yi)2

).

Example 1.5. For φ ∈ C 0[a, b], we define

(1.4.3) ||φ||C 0[a,b] +

(∫ b

a

|φ(x)|2dx

)1/2

that is a norm on C 0[a, b]. The triangle inequality follows from Schwarz’s inequality

(1.4.4)

(∫ b

a

φ(x)ψ(x)dx

)2

(∫ b

a

φ(x)2dx

)(∫ b

a

ψ(x)2dx

).

Another norm is given by

(1.4.5) ||φ||′C 0[a,b] + maxa≤x≤b

|φ(x)|

for any φ ∈ C 0[a, b]. We call (1.4.3) and (1.4.5) the L2-norm and the uniformnorm on C 0[a, b], respectively.

For φ ∈ C k[a, b], we define

(1.4.6) ||φ||′Ck[a,b] +k∑i=0

maxa≤x≤b

∣∣∣φ(i)(x)∣∣∣ .

VARIATIONAL METHODS 7

Note that any subspace Y of any given normed vector space X is itself a normedvector space with the same norm as used on X .

If (X , || · ||) is a normed vector space, we define the ball of radius ρ centered atx to be the set

(1.4.7) Bρ(x) + y ∈X : ||y − x|| < ρ.A subset D of a normed vector space (X , || · ||) is said to be open in X wheneverD contains along with each of its elements x some ball Bρ(x) in X centered at xfor some positive ρ which may depend on x.

(a) A subspace Y of a normed vector space (X , || · ||) is not open in X unlessY is all of X . For example, R is a subspace of R2, but R is not open inR2 under the usual norm.

(b) Any open subspace Y is always considered to be a normed vector spaceitself with the same norm as used in X , and then Y is an open subset ofitself.

1.5. Continuous functionals. Let D be a fixed open set in a given normed vectorspace (X , || · ||), and let J be a functional defined on D. J is said to have thenumber L as its limit at x in D if J (y) is close to L whenever y is close to x (butdistinct from x). That is, J has the limit L at x if for every ε > 0 there is a ballBρ(x) ⊂ D such that

|L− J (y)| < ε

for all y ∈ Bρ(x) \ x. Symbolically we write

(1.5.1) limy→x in X

J (y) = L

whenever J has limit L at x.J is said to be continuous at x in D if J has the limit J (x) at x,

(1.5.2) limy→x in X

J (y) = J (x).

J is said to be continuous on D if J is continuous at each point of D.

Example 1.6. The cost functional (1.3.24) is continuous on C 0[0, T ] with theuniform norm (1.4.5). It suffices to show that for any such fixed P ∈ C 0[0, T ] andfor any given ε > 0 we can find a positive number ρ such that

(1.5.3) |C(P )− C(Q)| < ε

whenever

(1.5.4) 0 ≤ ||P −Q||′C 0[0,T ] < ρ,

where

(1.5.5) ||P −Q||′C 0[0,T ] = maxt∈[0,T ]

|P (t)−Q(t)|.

Since

(1.5.6) C(Q) =

∫ T

0

β2[IQ(t)− i(t)]2 + [Q(t)− p(t)]2

dt,

where

(1.5.7) IQ(t) = e−αt(I0 +

∫ t

0

eατ [Q(τ)− S(τ)]dτ

),

8 YI LI

it follows that

C(P )− C(Q) = β2

∫ T

0

[IP (t)− IQ(t)][IP (t) + IQ(t)− 2i(t)]dt

+

∫ T

0

[P (t)−Q(t)][P (t) +Q(t)− 2p(t)]dt

and hence

|C(P )− C(Q)| ≤ β2

∫ T

0

|IP (t)− IQ(t)||IP (t) + IQ(t)− 2i(t)|dt

+

∫ T

0

|P (t)−Q(t)||P (t) +Q(t)− 2p(t)|dt.(1.5.8)

From

|IP (t) + IQ(t)− 2i(t)| = |2(IP (t)− i(t)) + (IQ(t)− IP (t)|≤ 2|IP (t)− i(t)|+ |IP (t)− IQ(t)|(1.5.9)

and similarly

(1.5.10) |P (t) +Q(t)− 2p(t)| ≤ 2|P (t)− p(t)|+ |P (t)−Q(t)|,

we have

|C(P )− C(Q)| ≤ β2

∫ T

0

|IP (t)− IQ(t)| (2|IP (t)− i(t)|+ |IP (t)− IQ(t)|) dt

+

∫ T

0

|P (t)−Q(t)| (2|P (t)− p(t)|+ |P (t)−Q(t)|) dt.(1.5.11)

By

(1.5.12) |P (t)−Q(t)| ≤ ||P −Q||′C 0[0,T ], |P (t)− p(t)| ≤ ||P − p||′C 0[0,T ],

we get ∫ T

0

|P (t)−Q(t)| (2|P (t)− p(t)|+ |P (t)−Q(t)|) dt

≤∫ T

0

||P −Q||′C 0[0,T ]

(2||P − p||′C 0[0,T ] + ||P −Q||′C 0[0,T ]

)dt(1.5.13)

= ||P −Q||′C 0[0,T ]

(2||P − p||′C 0[0,T ] + ||P −Q||′C 0[0,T ]

)T.

On the other hand, we have

(1.5.14) |IP (t)− IQ(t)| ≤ e−αt∫ t

0

rαt|P (τ)−Q(τ)|dτ ≤ ||P −Q||′C 0[0,T ]T

and

β2

∫ T

0

|IP (t)− IQ(t)| (2|IP (t)− i(t)|+ |IP (t)− IQ(t)|) dt

≤ β2||P −Q||′C 0[0,T ]T(

2||IP − i||′C 0[0,T ] + ||P −Q||′C 0[0,T ]T)T.(1.5.15)

VARIATIONAL METHODS 9

It follows from (1.5.11), (1.5.13), and (1.5.15) that

|C(P )− C(Q)| ≤ ||P −Q||′C 0[0,T ]Tβ2T

(2||IP − i||′C 0[0,T ]

+||P −Q||′C 0[0,T ]T)

+ 2||P − p||′C 0[0,T ] + ||P −Q||′C 0[0,T ]

.(1.5.16)

Letting Q→ P in C 0[0, T ], we have

limQ→P in C 0[0,T ]

C(Q) = C(P ).

Remark 1.7. We give examples showing that the continuity or lack of continuityof a given functional defined on a vector space may depend on the particular normused. This can happen only in infinite dimensional vector spaces, since all normsare equivalent to each other on any finite dimensional vector space.

(i) Let X = C 0[0, 1] equipped with the L2-norm given as

||φ||L2[0,1] +

(∫ 1

0

|φ(t)|2dt)1/2

for any φ ∈X . Consider the functional J defined by

J (φ) + φ(0), φ ∈X .

We show that J fails to be continuous at φ for each vector φ ∈ X . Letψ + φ+ χ. Then

J (ψ)− J (φ) = ψ(0)− φ(0) = χ(0).

If we choose

χn(t) +

1, 0 ≤ t ≤ 1

n ,√2− nt, 1

n ≤ t ≤2n ,

0, 2n ≤ t ≤ 1,

then

||χn||2L2[0,1] =

∫ 1

0

|χn(t)|2dt =

∫ 1n

0

dt+

∫ 2n

1n

(2− nt)dt

=1

n+(

2t− n

2t2) ∣∣∣ 2n

1n

=3

2n.

Consequently, ψn := φ+χn → φ as n→∞ in L2-norm, but J (ψn)−J (φ) =1.

(ii) Let Y = C 0[0, 1] equipped with the uniform norm as

||φ||C 0[0,1] + maxt∈[0,1]

|φ(t)|

for any φ ∈ Y . Consider the same functional J defined by

J (φ) + φ(0), φ ∈ Y .

We show that J is continuous at each vector φ ∈ Y . Indeed,

|J (ψ)− J (φ)| = |ψ(0)− φ(0)| ≤ ||ψ − φ||C 0[0,1] → 0

as ||ψ − φ||C0[0,1] → 0.

10 YI LI

1.6. Linear functionals. A functional J is said to be linear if the domain of Jconsists of some entire vector space X and if J satisfies the linearity relation

(1.6.1) J (ax+ by) = aJ (x) + bJ (y)

for all a, b ∈ R and all vectors x, y ∈X .

Remark 1.8. (a) The cost functional C is not linear.(b) If J is a linear functional, then J (0) = 0. In fact, by (1.6.1), we have

J (0) = J (0 + 0) = J (0) + J (0) = 2J (0).

(c) A linear functional is continuous on its domain X if and only if it is contin-uous at the zero vector in X . The proof is very easy. Suppose first that a linearfunctional J is continuous at 0 ∈ X , then for any given ε > 0 there is a numberρ > 0 such that

|J (y)| = |J (y)− J (0)| < ε

whenever ||y|| = ||y − 0|| < ρ. If x is any vector in X , then

J (z)− J (x) = J (z − x)

holds for all vectors z ∈X and so

|J (z)− J (x)| = |J (z − x)|.

If y := z − x and ||y|| < ρ, then |J (z)−J (x)| = |J (y)| < ε. Thus J is continuousat x ∈X .

(d) A linear functional J is continuous at the zero vector in its domain X if

(1.6.2) |J (x)| ≤ C||x||

for all x ∈X and some fixed constant C depending only on the functional J .(e) Let e1, · · · , en be the unit vectors in Rn given as

e1 = (1, 0, · · · , 0), en = (0, 0, · · · , 1),

where ei has its ith component equal to 1 and all other components equal to 0. Ifx = (x1, · · · , xn) ∈ Rn is any vector, then x can be written as a linear combinationof

x =n∑i=1

xiei.

Suppose that J is a linear functional on Rn. Then for any x = (x1, · · · , xn) ∈ Rn,we have

J (x) = J

(n∑i=1

xiei

)=

n∑i=1

xiJ (ei).

Since

|J (x)| ≤n∑i=1

|xi| · |J (ei)| ≤

(n∑i=1

|xi|2)1/2( n∑

i=1

|J (ei)|2)1/2

=

(n∑i=1

|J (ei)|2)1/2

||x||

by Cauchy’s inequality, it follows from (1.6.2) that J is continuous at the zerovector in Rn and then is continuous on Rn by (c).

VARIATIONAL METHODS 11

Note that every linear functional on a finite dimensional normed vector spaceX is continuous.

(e) However the above result is not true for every normed vector space. Forexample, consider the functional

J (φ) := φ′(0) =d

dx

∣∣∣x=0

φ(x)

on the normed vector space C 1[−1, 1] equipped with the uniform norm defined as||φ|| := maxx∈[−1,1] |φ(x)| for any vector φ ∈ C 1[−1, 1]. Consider the vectors

φk(x) :=1√k

sin(kx), x ∈ [−1, 1], k ∈ N,

in C 1[−1, 1]. For any a, b ∈ R and φ, ψ ∈ C 1[−1, 1], we have

J (aφ+ bψ) =d

dx

∣∣∣x=0

[aφ(x) + bψ(x)] = aφ′(0) + bψ′(0) = aJ (φ) + bJ (ψ)

and then J is linear. On the other hand,

||φk|| = maxx∈[−1,1]

| sin(kx)|√k

=1√k,

|J (φk)| = |√k · cos(kx)|x=0| =

√k,

which imply that limφk→0 in C 1[−1,1] J (φk) = limk→∞ J (φk) = +∞. Hence, J is

not continuous at the zero vector and then is not continuous on C 1[−1, 1].

2. A fundamental necessary condition for an extremum

In this section we introduce the Gateaux1 variation and the Frechet differentialof a functional.

2.1. Gateaux variation. Let D be a fixed nonempty subset of a normed vectorspace X , and let J be a functional defined on D .

Definition 2.1. A vector x∗ ∈ D is said to be a maximum vector in D for Jif J (x) ≤ J (x∗) for all vectors x ∈ D . The vector x∗ in D is a local maximumvector in D for J if there is some ball Bρ(x

∗) in X centered at x∗ such thatJ (x) ≤ J (x∗) for all vectors x ∈ D ∩Bρ(x

∗). If D is an open subset of X , werequire that the ball Bρ(x

∗) to be contained in D . A local minimum vector inD for J is defined similarly.

We say that x∗ is a local extremum vector in D for J if x∗ is either a localmaximum vector or a local minimum vector, and in this case we say that J has alocal extremum at x∗. The functional value J (x∗) is said to be a local extremevalue for J in D .

Definition 2.2. A functional J defined on an open subset D of a normed vectorspace X is said to have a Gateaux variation at a vector x in D whenever thereis a function δJ (x) with values δJ (x;h) defined for all vectors h in X and suchthat

(2.1.1) limε→0

J (x+ εh)− J (x)

ε= δJ (x;h)

1Rene Eugene Gateaux (1889–1914): A French mathematician who is known for the Gateauxderivative. Part of his work has been posthumously published by Paul Levy. Gateaux was killed

during World War I. The word “Gateaux” is pronounced as gah/toh.

12 YI LI

holds for every vector h in X . The functional δJ (x) is called the Gateaux vari-ation of J at x. Note that δJ (x) : X → R.

Theorem 2.3. If a functional J defined on an open set D contained in a normedvector space X has a local extremum at a vector x∗ in D , and if J has a Gateauxvariation at x∗, then the Gateaux variation of J at x∗ must vanish; that is,

(2.1.2) δJ (x∗;h) = 0

for all vectors h in X .

Proof. Without loss of generality, we may assume that x∗ is a local minimum vectorin D for J . If h ∈X , then

J (x∗ + εh)− J (x∗) ≥ 0

holds for all sufficiently small number ε > 0, since x∗ + εh ∈ D by the openness ofD . Hence

J (x∗ + εh)− J (x∗)

ε≥ 0

for all sufficiently small number ε > 0. Hence

lim infε→0+

J (x∗ + εh)− J (x∗)

ε≥ 0.

Similarly,

lim supε→0−

J (x∗ + εh)− J (x∗)

ε≤ 0.

Since J has a Gateaux variation at x∗, it follows that (2.1.2) holds.

Remark 2.4. (1) Let J be a real-valued function defined in some open intervalD = (a, b) ⊂ R. If J is differentiable at x, then J has a variation at x given asδJ (x;h) = J ′(x)h for any h ∈ R. Hence (2.1.2) is equivalent to J ′(x∗) = 0.

(2) Let J be a real-valued function defined in some open region D ⊂ Rn. If Jhas continuous first-order partial derivatives at x, denoted as Jxi(x) + ∂J (x)/∂xifor i = 1, · · · , n, then J has a variation at x given as

(2.1.3) δJ (x;h) =n∑i=1

Jxi(x)hi

for any vector h = (h1, · · · , hn) ∈ Rn. Hence (2.1.2) is equivalent to Jxi(x∗) = 0for i = 1, · · · , n.

(3) (2.1.2) is a necessary condition but may not be a sufficient condition for J (x∗)being a local extreme value for J in D . Indeed, (2.1.2) may hold also at certainnon-extremum vectors x∗ such as saddle points or certain inflection points ofJ in D .

• (Inflection points) Let J (x) = x3, x ∈ R. The variation of J vanishes atx∗ = 0, but, the point x∗ = 0 is not a local extremum vector in R for J .

• (Saddle points) Let J (x) = x22−x2

1, where x = (x1, x2) ∈ R2. The variationof J vanishes at x∗ = (0, 0), but, the point x∗ = (0, 0) is not a localextremum vector in R2 for J .

(4) The limit (2.1.1) is unique if it exists, hence a functional can have at mostone variation at x.

VARIATIONAL METHODS 13

(5) The value of the variation is the ordinary derivative of the function J (x+εh)considered as a function of the real number ε and evaluated at ε = 0; i.e.,

(2.1.4) δJ (x;h) =d

∣∣∣ε=0J (x+ εh).

(6) The variation satisfy the homogeneity relation

(2.1.5) δJ (x; ah) = a · δJ (x;h)

for any a ∈ R.(7) We usually use the symbol “∆x” rather than h to denote the second argument

in the variation δJ (x; ∆x):

(2.1.6) δJ (x; ∆x) = limε→0

J (x+ ε∆x)− J (x)

ε=

d

∣∣∣ε=0J (x+ ε∆x).

(8) Let the functional J be defined on an open set D in a normed vector spaceX by

J (x) = K(x)L(x), x ∈ D ,

where K and L are given functionals on D which are known to have variations ata vector x0 ∈ D . Then

(2.1.7) δJ (x0;h) = K(x0)δL(x0;h) + δK(x0;h)L(x0)

for any vector h ∈X .(9) Let K and L be as in (8) and define

J (x) =K(x)

L(x)

for any x ∈ D for which L(x) 6= 0. If K and L are known to have variations at avector x0 ∈ D at which L(x0) 6= 0, then

(2.1.8) δJ (x0;h) =L(x0)δK(x0;h)−K(x0)δL(x0;h)

L(x0)2

for any vector h ∈X .

Example 2.5. (1) Consider the functional

L(φ) :=

∫ π/2

0

[2φ(x)3 + 9(sin(x))φ(x)2 + 12(sin2(x))φ(x)− cos(x)]dx

for any function φ ∈ C 0[0, π/2]. Since

L(φ+ εψ) =

∫ π/2

0

[2(φ(x) + εψ(x))3 + 9 sin(x)(φ(x) + εψ(x))2

+12 sin2(x)(φ(x) + εψ(x))− cos(x)]dx

= L(φ) +

∫ π/2

0

[6εφ(x)2ψ(x) + 6εφ(x)ψ(x)2 + 2ε3ψ(x)3

+18ε sin(x)φ(x)ψ(x) + 9ε2 sin(x)ψ(x)2 + 12ε sin2(x)ψ(x)]dx,

we obtain

L(φ+ εψ)− L(φ)

ε=

∫ π/2

0

[6φ(x)2ψ(x) + 18 sin(x)φ(x)ψ(x) + 12 sin2(x)ψ(x)]dx

+ ε

∫ π/2

0

[6φ(x)ψ(x)2 + 2εψ(x)3 +9 sin(x)ψ(x)2]dx

14 YI LI

and hence

δL(φ;ψ) =

∫ π/2

0

[6φ(x)2ψ(x) + 18 sin(x)φ(x)ψ(x) + 12 sin2(x)ψ(x)]dx.

If φ∗ ∈ C 0[0, π/2] is a local extremum vector, then by Theorem 2.3 we concludethat

0 = 6φ∗(x)2 + 18 sin(x)φ∗(x) + 12 sin2(x) = 6[φ∗(x) + 2 sin(x)][φ∗(x) + sin(x)].

(2) Consider the functional J defined in Remark 1.7. We have shown that Jis not continuous on C 0[0, 1]. However, J has a Gateaux variation at each vectorφ ∈ C 0[0, 1] as

δJ (φ; ∆φ) = limε→0

J (φ+ ε∆φ)− J (φ)

ε= limε→0

ε∆φ(0)

ε= ∆φ(0).

(3) Let J be defined for any vector x = (x1, x2) ∈ R2 by

J (x) =

x1x

22

x21+x4

2, x1 6= 0,

0, x1 = 0.

From J (0) = 0 we have

δJ (0;h) = limε→0

J (0 + εh)− J (0)

ε=

limε→0

ε2h1h22

ε2h21+ε4h4

2, h1 6= 0,

0, h1 = 0.

=

h22

h1, h1 6= 0,

0, h1 = 0.

However,

lim(x2

2,x2)→0J (x) = lim

x2→0

x42

x42 + x4

2

=1

26= 0.

Thus J has a Gateaux variation at the origin 0 = (0, 0) but is not continuous at0 = (0, 0).

Remark 2.6. A functional J is continuous along each fixed direction at x if

(2.1.9) limε→0J (x+ εh) = J (x)

for each fixed vector h ∈X .

(a) If J has a Gateaux variation at x ∈ X , then J is continuous along eachfixed direction at x. Indeed,

limε→0

[J (x+ εh)− J (x)] = limε→0

J (x+ εh)− J (x)

ε· ε = δJ (x;h) · lim

ε→0ε = 0.

(b) The functional defined in Example 2.5 (3) is continuous along each fixeddirection at the origin 0 = (0, 0). Since the functional J has a Gateauxvariation at 0, it follows from (a) that J is continuous along each fixeddirection at 0. Directly computation shows that

limε→0J (εh) =

limε→0

ε3h1h22

ε2h21+ε4h4

2= limε

εh1h22

h21+ε2h4

2, h1 6= 0,

0, h1 = 0.= 0.

VARIATIONAL METHODS 15

(c) The functional defined in Example 2.5 (2) is continuous along each fixeddirection at the each vector φ ∈ C 0[0, 1]. By Example 2.5 (2), we haveshown that

J (φ+ ε∆φ)− J (φ) = ε∆φ(0)

which tends to 0 as ε→ 0.

2.2. Examples. In this subsection we will calculate the variations of the function-als considered in Subsection 1.3.

(1) Cost functional. Consider the cost functional C defined by (1.3.23) and(1.3.24) on the vector space C 0[0, T ]:

C(P ) =

∫ T

0

β2[IP (t)− i(t)]2 + [P (t)− p(t)]2

dt,

IP (t) = e−αt(I0 +

∫ t

0

eατ [P (τ)− S(τ)]dτ

).

Since

IP+ε∆P (t) = e−αt(I0 +

∫ t

0

eατ [P (τ) + ε∆P (τ)− S(τ)]dτ

)= IP (t) + εe−αt

∫ t

0

eατ∆P (τ)dτ,

it follows that

C(P + ε∆P ) =

∫ T

0

β2[IP+ε∆P (t)− i(t)]2 + [P (t) + ε∆P (t)− p(t)]2

dt

= C(P ) + 2ε

∫ T

0

β2[IP (t)− i(t)]e−αt

∫ t

0

eατ∆P (τ)dτ

+[P (t)− p(t)]∆P (t)dt

+ ε2∫ T

0

β2

[e−αt

∫ t

0

eατ∆P (τ)dτ

]2

+ [∆P (t)]2

dt

and therefore

δC(P ; ∆P ) = 2

∫ T

0

β2[IP (t)− i(t)]e−αt

∫ t

0

eατ∆P (τ)dτ

+[P (t)− p(t)]∆P (t)dt(2.2.1)

for any vector ∆P ∈ C 0[0, T ].(2) Area functional. Recall the area functional defined in (1.3.11):

A(α) =1

2

∫ T

0

v0 · sin[α(t)]

[x0 + w0t+ v0

∫ t

0

cos[α(τ)]dτ

]−[v0 · cos[α(t)] + w0]

[y0 + v0

∫ t

0

sin[α(τ)]dτ

]dt,

16 YI LI

where α ∈ C 0[0, T ]. Then

A(α+ ε∆α)

=1

2

∫ T

0

v0 · sin[α(t) + ε∆α(t)]

[x0 + w0t+ v0

∫ t

0

cos[α(τ) + ε∆α(τ)]dτ

]−[w0 + v0 · cos[α(t) + ε∆α(t)]]

[y0 + v0

∫ t

0

sin[α(τ) + ε∆α(τ)]dτ

]dt

+∫ T

0

f(t, ε)dt

for any ε > 0 and any vectors α,∆α ∈ C 0[0, T ]. Since

(2.2.2)d

∫ T

0

f(t, ε)dt =

∫ T

0

∂f(t, ε)

∂εdt,

it follows that

(2.2.3)d

dεA(α+ ε∆α) =

∫ T

0

∂f(t, ε)

∂εdt.

Now

∂f(t, ε)

∂ε

∣∣∣ε=0

=v0∆α(t)

2

cos[α(t)]

[x0 + w0t+ v0

∫ t

0

cos[α(τ)]dτ

]+ sin[α(t)]

[y0 + v0

∫ t

0

sin[α(τ)]dτ

]+v0

2sin[α(t)]

[−v0

∫ t

0

∆α(τ) · sin[α(τ)]dτ

]−[w0

2+v0

2· cos[α(t)]

] [v0

∫ t

0

∆α(τ) · cos[α(τ)]dτ

]=

v20

2

∫ t

0

cos[α(t)− α(τ)][∆α(t)−∆α(τ)]dτ

− v0w0

2

∫ t

0

∆α(τ) · cos[α(τ)]dτ

+v0

2[(x0 + w0t) cos[α(t)] + y0 sin[α(t)]]∆α(t),

so that

δA(α; ∆α) =v2

0

2

∫ T

0

∫ t

0

cos[α(t)− α(τ)][∆α(t)−∆α(τ)]dτdt

− v0w0

2

∫ T

0

∫ t

0

∆α(τ) · cos[α(τ)]dτ(2.2.4)

+v0

2

∫ T

0

[(x0 + w0t) cos[α(t)] + y0 sin[α(t)]]dt.

(3) Brachistochrone functional and transit time functional. Recall theBrachistochrone functional defined in (1.3.3)

T (Y ) =

∫ x1

x0

√1 + Y ′(x)2

2g[y0 − Y (x)]dx

VARIATIONAL METHODS 17

for any Y ∈ D(T ) = Y ∈ C 1[x0, x1] : Y (x0) = y0 and Y (x1) = y1, and thetransit time functional defined in (1.3.17)

T (Y ) =

∫ `

0

√1− e(x)2 + Y ′(x)2 − e(x)Y ′(x)

v0[1− e(x)2]dx

for any Y ∈ C 1[0, `].Those two functionals are special examples of a wider class of functionals

(2.2.5) J (Y ) =

∫ x1

x0

F (x, Y (x), Y ′(x))dx,

where the function F = F (x, y, z) is a specified given function defined in some openset of R3. For instance, in the case of Brachistochrone functional, we have

(2.2.6) F (x, y, z) =

√1 + z2

2g(y0 − y),

while in the case of transit time functional, we have

(2.2.7) F (x, y, z) =

√1− e(x)2 + z2 − e(x)z

v0[1− e(x)2].

We now assume that the functional J is defined by (2.2.5) for all vectors Y insome open subset D of the normed vector space C 1[x0, x1] with a suitable norm.It is easily to see that(2.2.8)

δJ (Y ; ∆Y ) =

∫ x1

x0

[Fy(x, Y (x), Y ′(x))∆Y (x) + Fz(x, Y (x), Y ′(x))∆Y ′(x)] dx

for any vector Y in the domain D of J and any vector ∆Y ∈ C 1[x0, x1]. Hence,for the Brachistochrone functional, we have

δT (Y ; ∆Y ) =

∫ x1

x0

1

2[y0 − Y (x)]

√1 + Y ′(x)2

2g[y0 − Y (x)]∆Y (x)

+Y ′(x)∆Y ′(x)√

2g[y0 − Y (x)][1 + Y ′(x)2]

dx;(2.2.9)

for the transit time functional, we have

(2.2.10) δT (Y ; ∆Y ) =

∫ `

0

Y ′(x)− e(x)√

1− e(x)2 + Y ′(x)2

v0[1− e(x)2]√

1− e(x)2 + Y ′(x)2∆Y ′(x)dx.

Example 2.7. The functional

A(Y ) = 2π

∫ x1

x0

Y (x)√

1 + Y ′(x)2dx

gives the area of the surface of revolution obtained by rotating the curve γ aboutthe x-axis, where γ is given as γ : y = Y (x) for x ∈ [x0, x1]. Then

δA(Y ; ∆Y ) = 2π

∫ x1

x0

(∆Y

√1 + Y ′(x)2 + Y (x)

Y ′(x)∆Y (x)√1 + Y ′(x)2

)dx

= 2π

∫ x1

x0

1 + Y ′(x)2 + Y (x)Y ′(x)√1 + Y ′(x)2

∆Y (x)dx

18 YI LI

from which we get that a local extremum Y ∗ satisfies 1+Y ∗′(x)2+Y ∗(x)Y ∗′(x) = 0.

2.3. An optimization problem in production planning. If P ∗ is local ex-tremum vector in C 0[0, T ] for C, then by Theorem 2.3 and (2.2.1), we have

0 =

∫ T

0

β2[IP∗(t)− i(t)]e−αt

∫ t

0

eατ∆P (τ)dτ + [P ∗(t)− p(t)]∆P (t)

dt

=

∫ T

0

P ∗(t)− p(t) + β2eαt

∫ T

t

e−ατ [IP∗(τ)− i(τ)]dτ

∆P (t)dt(2.3.1)

for any ∆P ∈ C 0[0, T ], where

IP∗(t) = e−αt(I0 +

∫ t

0

eατ [P ∗(τ)− S(τ)]dτ

).

Hence P ∗ satisfies

(2.3.2) P ∗(t)− p(t) + β2eαt∫ T

t

e−ατ [IP∗(τ)− i(τ)]dτ = 0, t ∈ [0, T ].

Using the following fact that

(2.3.3)d

dt

∫ T

t

h(τ)dτ = −h(t)

for any h ∈ C [0, T ], we conclude that

(2.3.4)d

dt[P ∗(t)− p(t)] = β2[IP∗(t)− i(t)] + α[P ∗(t)− p(t)]

and

d2

dt2[P ∗(t)− p(t)] = α2[P ∗(t)− p(t)]

+ β2

d

dt[IP∗(t)− i(t)] + α[IP∗(t)− i(t)]

.(2.3.5)

On the other hand, (1.3.19) and (1.3.20) imply that

(2.3.6)d

dt[IP∗(t)− i(t)] + α[IP∗(t)− i(t)] = P ∗(t)− p(t),

so that (2.3.6) becomes

(2.3.7)d2

dt2[P ∗(t)− p(t)] = (α2 + β2)[P ∗(t)− p(t)], t ∈ [0, T ].

The general solution of (2.3.7) is

(2.3.8) P ∗(t)− p(t) = Aeγt +Be−γt, γ =√α2 + β2,

for any A,B ∈ R.However, letting t = T in (2.3.3) yields

(2.3.9) P ∗(T )− p(T ) = 0,

and, letting t = 0 in (2.3.5) yields

(2.3.10)d

dt

∣∣∣t=0

[P ∗(t)− p(t)]− α[P ∗(0)− p(0)] = β2(I0 − i(0))

VARIATIONAL METHODS 19

where I0 + IP∗(0) and i0 + i(0) are known constants. The necessary conditions(2.3.9) and (2.3.10) let us determine the constants A and B as

(2.3.11) A =β2(I0 − i0)e−γT

(γ + α)eγT + (γ − α)e−γT, B =

−β2(I0 − i0)eγT

(γ + α)eγT + (γ − α)e−γT.

Plugging (2.3.11) into (2.3.8) we arrive at

(2.3.12) P ∗(t) = p(t) + β2(i0 − I0)eγ(T−t) − e−γ(T−t)

(γ + α)eγT + (γ − α)e−γT.

Furthermore,

IP∗(t) = i(t) + (I0 − i0)(γ + α)eγ(T−t) + (γ − α)e−γ(T−t)

(γ + α)eγT + (γ − α)e−γT,(2.3.13)

C(P ∗) = β2(I0 − i0)2 eγT − e−γT

(γ + α)eγT + (γ − α)e−γT.(2.3.14)

Finally we check that P ∗ gives a minimum value to the cost functional C. Thecomputation in Subsection 2.2, we have

C(P ∗ +Q)− C(P ∗)

+ 2

∫ T

0

β2[IP∗(t)− i(t)]e−αt

∫ t

0

eατQ(τ)dτ + [P (t)− p(t)]Q(t)

dt

+

∫ T

0

β2

[e−αt

∫ t

0

eατQ(τ)dτ

]2

+Q(t)2

dt

for any Q ∈ C 0[0, T ]; using (2.3.1) we conclude that

(2.3.15) C(P ∗ +Q)− C(P ∗) =

∫ T

0

β2

[e−αt

∫ t

0

eατQ(τ)dτ

]2

+Q(t)2

dt ≥ 0

for any Q ∈ C 0[0, T ]. Thus the cost functional C has a minimum in C 0[0, T ] at thevector P ∗.

2.4. Frechet differential. We say that a functional J defined on an open subsetD of a normed vector space X is Frechet differential2 or differential at a vectorx ∈ D whenever there is a continuous linear functional dJ (x) with values dJ (x;h)defined for all vectors h ∈X and for which

(2.4.1) limh→0 in X

J (x+ h)− J (x)− dJ (x;h)

||h||= 0

holds. The continuous linear functional dJ (x) is called the (Frechet) differentialof J at x. If J is differentiable at each vector x ∈ D , we say that J is differentiableon D .

Remark 2.8. (1) If we let

(2.4.2) E(x;h) :=J (x+ h)− J (x)− dJ (x;h)

||h||for any nonzero vector h ∈X , then (2.4.1) is equivalent to

(2.4.3) limh→0 in X

E(x;h) = 0.

2Maurice Frechet (1878–1973): a French mathematician. He made major contributions to thetopology of point sets and introduced the entire concept of metric spaces.

20 YI LI

(2) If a functional J is differentiable at x, then the variation of J at x existsand is equal to the differentiable at x,

(2.4.4) δJ (x;h) = dJ (x;h)

for all h ∈X . In fact,

J (x+ εh)− J (x)

ε=dJ (x; εh) + E(x; εh)||εh||

ε= dJ (x;h) + E(x; εh)||h|| |ε|

ε.

Letting ε→ 0 yields (2.4.4).(3) A functional J may have a variation at a vector x even if J is not differ-

entiable at x. For example, the functional J defined in Remark 1.7 (i) fails to bedifferentiable at each fixed vector φ ∈ C 0[0, 1] equipped with the L2-norm.

3. Euler-Lagrange necessary condition for an extremum withconstraints

3.1. Extremum problems. Let X be a normed vector space and D an opensubset of X . For two functionals J and K which are defined and have variationson D , consider the problem:

(3.1.1) find extremum vectors for J in D [K = k0] := x ∈ D : K(x) = k0 6= ∅,where k0 is some specified fixed number.

Remark 3.1. (1) Note that the set D [K = k0] may not open so that we can notuse Theorem 2.3.

(2) The variation of J need not vanish at a local extremum vector x∗ ∈ D [K =k0] if this set is not open in X .

Example 3.2. Consider

J (x) = x2, K(x) = x2 − 1, x ∈ D = R.

Since D [K = 0] = −1, 1, it follows that J attains its maximum and minimum at−1 and 1. However, the variation of J fails to vanish at each point in D [K = 0].

3.2. Weak continuity. If J is a functional which has a variation on an open setD contained in a normed vector space X , and if for some vector x ∈ D ,

(3.2.1) limy→x in X

δJ (y; ∆x) = δJ (x; ∆x)

holds for every vector ∆x ∈ X , then we say that the variation of J is weaklycontinuous at x.

Remark 3.3. (1) The variation of J is weakly continuous at x is equivalent tosaying that, for each fixed vector ∆x ∈ X , the variation δJ (y; ∆x) is continuousat y = x.

(2) Let J be defined in Remark 2.4 (1), then the variation of J is weaklycontinuous at x if and only if the function J is continuously differentiable at x.

(3) Let J be defined in Remark 2.4 (2), then the variation of J is weaklycontinuous at x if and only if the function J has continuous first-order partialderivatives at x.

(4) There is a functional J such that it has a weakly continuous variation eventhough J is not itself continuous. Consider the functional J defined in Remark 1.7(i). Since δJ (φ; ∆φ) = ∆φ, it follows that the variation of J is weakly continuous,but, we have showed that J is not continuous.

VARIATIONAL METHODS 21

Example 3.4. The variation of the cost functional C is weakly continuous onC 0[0, T ]. From (2.2.1), we have

δC(Q; ∆P )− δC(P ; ∆P ) = 2

∫ T

0

[IQ(t)− IP (t)]e−αt∫ t

0

eατ∆P (τ)dτ

+ 2

∫ T

0

[Q(t)− P (t)]∆P (t)dt

for any vectors Q,P,∆P ∈ C 0[0, T ]. Hence

|δC(Q; ∆P )− δC(P ; ∆P )| ≤ 2β2

∫ T

0

|IQ(t)− IP (t)|e−αt∫ t

0

eατ |∆P (τ)|dτdt

+ 2

∫ T

0

|Q(t)− P (t)||∆P (t)|dt;

however,

|IQ(t)− IP (t)| ≤ ||Q− P ||e−αt∫ t

0

eατdτ = ||Q− P ||C 0[0,T ]1− e−αt

α,

where we used the uniform norm on C 0[0, T ]:

||Q− P ||C 0[0,T ] = maxt∈[0,T ]

|Q(t)− P (t)|.

Consequently,

|δC(Q; ∆P )− δC(P ; ∆P )|

≤ 2

∫ T

0

[β2

(e−αt − e−2αt

α

)∫ t

0

eατ |∆P (τ)|dτ + |∆P (t)|]dt||Q− P ||C 0[0,T ].

Thus

limQ→P in C 0[0,T ]

δC(Q; ∆P ) = δC(P ; ∆P ).

3.3. Euler-Lagrange multiplier theorem for a single constraint. The ex-tremum problem (3.1.1) can be solved by the following

Theorem 3.5. Let J and K be functionals which are defined and have variationson an open subset D of a normed vector space X , and let x∗ be a local extremumvector in D [K = k0] for J , where k0 is any given fixed number for which the setD [K = k0] is nonempty. Assume that both the variation of J and the variation ofK are weakly continuous near x∗. Then at least one of the following two possibilitiesmust hold:

(i) The variation of K at x∗ vanishes identically, i.e.,

(3.3.1) δK(x∗; ∆x) = 0

for every vector ∆x ∈X ; or(ii) The variation of J is a constant multiple of the variation of K at x∗, i.e.,

there is a constant λ such that

(3.3.2) δJ (x∗; ∆x) = λδK(x∗; ∆x)

for every vector ∆x ∈X .

22 YI LI

Example 3.6. Let

J (x) = x2, K(x) = x2 + 2x+3

4, x ∈ D = R.

We shall find extremum vectors in D [K = 0] for J . Note that

(3.3.3) δJ (x; ∆x) = 2x∆x, δK(x; ∆x) = 2(x+ 1)∆x, ∆x ∈ R.

From (3.3.3) it is easily to see that the variations of J and K are weakly continuouson R. Letting δK(x∗; ∆x) = 0, we find that x∗ = −1 /∈ D [K = 0]. Hence we needonly consider the second possibility of Theorem 3.5. By (3.3.2), we have

2x∗∆x = 2λ(x∗ + 1)∆x

or

x∗ − λ(x∗ + 1) = 0.

Hence

x∗ =λ

1− λ, λ 6= 1.

Since K(x∗) = 0, it follows that

λ2 − 2λ− 3 = 0,

from which λ = −1 or λ = 3 and x∗ = −1/2 or x∗ = −3/2.

Example 3.7. We consider the problem of finding the dimensions of the rectanglehaving the smallest perimeter among all rectangles with given fixed are A. Let x1

be the length and x2 the width of any such rectangle. Define

(3.3.4) J (x) = 2(x1 + x2), K(x) = x1x2

for any vector x = (x1, x2) ∈ R2. The problem is then to find a minimum point forthe function J in the open set

(3.3.5) D = x = (x1, x2) ∈ R2 : x1, x2 > 0

subject to the constraint

(3.3.6) K(x) = A.

From

δJ (x) = 2∆x1 + 2∆x2, δK(x; ∆x) = x2∆x1 + x1∆x2,

we find that δK(x∗; ∆x) = 0 implies that x∗ = 0 /∈ D [K = 0] and hence

2∆x1 + 2∆x2 = λ(x∗2∆x1 + x∗1∆x2)

which gives us x∗ = (2/λ, 2/λ). On the other hand, K(x∗ = 0, we conclude that

λ = 2/√A and x∗ = (

√A,√A).

Finally, we shall check that x∗ is a minimum vector in D [K = 0] for J . Choosingany vector ∆x ∈ R2 so that x∗ + ∆x ∈ D [K = 0], we get

A = (x∗1 + ∆x1)(x∗2 + ∆x2) = A+√A(∆x1 + ∆x2) + ∆x1∆x2,

VARIATIONAL METHODS 23

and

J (x∗ + ∆x)− J (x∗) = 2(∆x1 + ∆x2)

= 2

(∆x1 −

√A∆x1√A+ ∆x1

)

=2(∆x1)2

√A+ ∆x1

.

Thus J (x∗ + ∆x) ≥ J (x∗) for any vector x∗ + ∆x ∈ D [K = 0].

Example 3.8. Consider the problem of minimizing the value of the functional

J (φ) =

∫ 2

1

xφ(x)2dx

on the vector space C 0[1, 2] subject to the constraint K(φ) = log 2, where thefunctional K is defined by

K(φ) =

∫ 2

1

φ(x)dx

for any φ ∈ C 0[1, 2]. Since

δJ (φ; ∆φ) = 2

∫ 2

1

xφ(x)∆φ(x)dx, δK(φ; ∆φ) =

∫ 2

1

∆φ(x)dx, ∆φ ∈ C 0[1, 2],

it follows that δK(φ; ∆φ)] = 0 has empty solution so that we need only considerthe second possibility in Theorem 3.5; that is,

2

∫ 2

1

xφ∗(x)∆φ(x)dx = λ

∫ 2

1

∆φ(x)dx

from which we get φ∗(x) = λ/2x for x ∈ [1, 2]. However, K(φ∗) = log 2, we concludethat λ = 2 and

φ∗(x) =1

x, x ∈ [1, 2].

Finally, we shall check that φ∗ is a minimum vector in C 0[1, 2][K = log 2] for J .If φ∗ + ψ ∈ C 0[1, 2][K = log 2], then∫ 2

1

(φ∗(x) + ψ(x))dx = log 2

and hence ∫ 2

1

ψ(x)dx = 0.

Using this and xφ∗(x) ≡ 1 yields

J (φ∗ + ψ)− J (φ∗) =

∫ 2

1

x[φ∗(x) + ψ(x)]2dx−∫ 2

1

xφ∗(x)2dx

=

∫ 2

1

xψ(x)2dx+ 2

∫ 2

1

ψ(x)dx

=

∫ 2

1

xψ(x)2dx.

Thus J (φ∗ + ψ) ≥ J (φ∗) whenever φ∗ + ψ ∈ C 0[1, 2][K = log 2].

24 YI LI

The proof of Theorem 3.5. We assume that equation (3.3.1) fails to hold ingeneral, so that we may choose a fixed vector ∆x ∈X such that

(3.3.7) δK(x∗,∆x) 6= 0.

Letting ∆x = ∆x in (3.3.2 yields

δJ (x∗; ∆x) = λδK(x∗; ∆x)

and then

δJ (x∗; ∆x)δK(x∗; ∆x) = δJ (x∗; ∆x)δK(x∗; ∆x).

By the above motivation, we claim that

(3.3.8) det

(δJ (x∗; ∆x) δJ (x∗; ∆y)δK(x∗; ∆x) δK(x∗; ∆y)

)= 0,

for any vectors ∆x,∆y ∈X . Letting ∆y = ∆x in (3.3.8) we have

(3.3.9) δJ (x∗; ∆x) = λδK(x∗; ∆x)

for all ∆x ∈X , where λ + δJ (x∗; ∆x)/δK(x∗; ∆x).Now we give a proof of (3.3.8). Without loss of generality, we may assume that

∆x,∆y 6= 0. Since x∗ lies in the open set D , we can find sufficiently small numbersα and β so that

x∗ + α∆x+ β∆y ∈ D .

Set

j + J(α, β) = J (x∗ + α∆x+ β∆y),(3.3.10)

k + K(α, β) = K(x∗ + α∆x+ β∆y).(3.3.11)

Since J and K are weakly continuous near x∗, we can find a number ρ > 0 suchthat both the variations of J and K are weakly continuous at each vector in

(3.3.12) U = x∗ + α∆x+ β∆y : (α, β) ∈ Uwhere U = (α, β) ∈ R2 : α2 + β2 < ρ2. Hence the mapping (j, k) maps the discU of the (α, β)-plane into the (j, k)-plane. The origin in the (α, β)-plane maps ontothe point (j0, k0) + (J (x∗), k0)

(i) The functions J and K are continuously differentiable in U and

(3.3.13) det

(∂J∂α

∂J∂β

∂K∂α

∂K∂β

)∣∣∣(α,β)=(0,0)

= det

(δJ (x∗; ∆x) δJ (x∗; ∆y)δK(x∗; ∆x) δK(x∗; ∆y)

).

For example,

∂J(α, β)

∂α= lim

ε→0

J(α+ ε, β)− J(α, β)

ε

= limε→0

J (x∗ + α∆x+ β∆y + ε∆x)− J (x∗ + α∆x+ β∆y)

ε= δJ (x∗ + α∆x+ β∆y; ∆x);

since J is weakly continuous in U , it follows that ∂J/∂α is continuous.(ii) If (3.3.8) does not hold, then we can find nonzero vectors ∆x,∆y ∈ X so

that

(3.3.14) det

(∂J∂α

∂J∂β

∂K∂α

∂K∂β

)∣∣∣(α,β)=(0,0)

6= 0.

VARIATIONAL METHODS 25

According to the inverse function theorem, we have

(3.3.15) α = A(j, k), β = B(j, k)

in some disk V centered at the point (j0, k0) in the (j, k)-plane and satisfy

(3.3.16) J(A(j, k),B(j, k)) = j, K(A(j, k),B(j, k)) = k

for all (j, k) ∈ V . Furthermore,

(3.3.17) A(j0, k0) = B(j0, k0) = 0.

(iii) Let (j∗, k0) ∈ V and consider

(3.3.18) α∗ + A(j∗, k0), β∗ = B(j∗, k0).

From (3.3.15), (3.3.16), and (3.3.18), we conclude that

J (x∗ + α∗∆x+ β∗∆y) = J(α∗, β∗) = J(A(j∗, k0),B(j∗, k0)) = j∗,

K(x∗ + α∗∆x+ β∗∆y) = K(α∗, β∗) = K(A(j∗, k0),B(j∗, k0)) = k0.

Since ∆x,∆y are nonzero, if we choose j∗ sufficiently small to j0 but notequal to j0, we can make x∗+α∗∆x+β∗∆y ∈ D [K = k0]. This contradictsthe fact that x∗ is a local extremum vector in D [K = k0] for J and thereforeproves that (3.3.8) hold for any vectors ∆x,∆y ∈X .

Example 3.9. We consider a problem in investment planning for a person whohas a certain known annual income and some accumulated savings which he hasinvested and which earn him a known annual return. We assume that the totalavailable annual resources for consumption consist of his current annual income,his previous savings, and his current annual return on those savings which wereinvested.

Let S = S(t) denote the savings which are accumulated and invested at time t.We assume first that

(3.3.19) S = I +R− C,

where I is the annual income, R is the annual return generated by savings, and Cis the annual consumption. Set

(3.3.20) S(0) = S0

for a given nonnegative constant S0. We then assume that

(3.3.21) R = αS

for a given positive constant α. From (3.3.20) and (3.3.21) we get

(3.3.22) S − αS = I − C,

which gives us

(3.3.23) S(t) = eαtS0 + eαt∫ t

0

e−ατ [I(τ)− C(τ)]dτ.

Here we assume that the income function I = I(t) is known and the optimizationproblem will involve making a suitable choice for the unknown consumption functionC = C(t).

Finally, we assume that

(3.3.24) S(T ) = ST

26 YI LI

for a given nonnegative constant ST . Evaluating (3.3.23) at t = T yields

(3.3.25)

∫ T

0

e−αtC(t)dt = S0 − e−αTST +

∫ T

0

e−αtI(t)dt.

If we define a functional K on the vector space C 0[0, T ] by

(3.3.26) K(C) +∫ T

0

e−αtC(t)dt,

then (3.3.25) can be rewritten as

(3.3.27) K(C) = S0 − e−αTST +

∫ T

0

e−αtI(t)dt

for any function C ∈ C 0[0, T ].The optimization problem we shall consider now is to maximize the satisfaction

derived from consumption subject to the constraint (3.3.27). In general, we mayconsider some suitable measure of the satisfaction of the form

(3.3.28)

∫ T

0

F (t, C(t))dt,

where F = F (t, C) would be some suitable given function of t and C. For simplicity,we consider the form

(3.3.29) F (t, C) + e−βt log(1 + C)

for any t ≥ 0 and for any C > 0. We define a satisfaction functional S by

(3.3.30) S(C) =

∫ T

0

e−βt log[1 + C(t)]dt

for any C ∈ D = C ∈ C 0[0, T ] : C(t) > 0. If we define a constant k0 by

(3.3.31) k0 + S0 − e−αTST +

∫ T

0

e−αtI(t)dt,

then the problem is to find a maximum vector C∗ in the set D [K = k0] for S. Notethat

δK(C; ∆C) =

∫ T

0

e−αt∆C(t)dt,(3.3.32)

δS(C; ∆C) =

∫ T

0

e−βt

1 + C(t)∆C(t)dt(3.3.33)

for any vector ∆C ∈ C 0[0, T ] and for any C ∈ D . By Theorem 3.5, we must have

δS(C∗; ∆C) = λδK(C∗; ∆C)

for all vectors ∆C ∈ C 0[0, T ]. Thus,

(3.3.34)e−βt

1 + C∗(t)= λe−αt

for all t ∈ [0, T ]. Therefore

(3.3.35) C∗(t) = −1 +1

λe(α−β)t, t ∈ [0, T ].

VARIATIONAL METHODS 27

By the constraint (3.3.27), we get

(3.3.36)1

λ=

(S0 − e−αTST +

∫ T

0

e−αtI(t)dt+1− e−αT

α

1− e−βT

and(3.3.37)

C∗(t) = −1 +

(S0 − e−αTST +

∫ T

0

e−αtI(t)dt+1− e−αT

α

1− e−βTe(α−β)t.

However, we have assumed that C∗ ∈ D so that we should furthermore imposethe conditions

α > β,(3.3.38)

S0 +

∫ T

0

e−αtI(t)dt+1− e−αT

α≥ e−αTST +

1− e−βT

β.(3.3.39)

Now we check that the satisfaction functional S has a maximum in D [K = k0]at the vector C∗ provided that (3.3.38) and (3.3.39) hold. Since D is open, we canchoose any function ∆C ∈ C 0[0, T ] so that C∗ + ∆C ∈ D [K = k0]. Since

S(C∗ + ∆C)− S(C∗) =

∫ T

0

e−βt log

[1 +

∆C(t)

1 + C∗(t)

]dt

=

∫ T

0

e−βt log[1 + λe(β−α)t∆C(t)

]dt,∫ T

0

e−αt∆C(t)dt = k0 −∫ T

0

e−αtC∗(t)dt = 0,

it follows that

S(C∗ + ∆C)− S(C∗) ≤∫ T

0

e−βtλe(β−α)t∆C(t)dt = 0

for any function ∆C ∈ C 0[0, T ] such that C∗ + ∆C ∈ D [K = k0].From (3.3.30) and (3.3.37) we conclude that

S(C∗) =T (β − α)e−βT

β

+

(1− e−βT

β

)[α− ββ

+ logβ

1− e−βT+ log

(k0 +

1− e−αT

α

)].(3.3.40)

Letting k + k0 we find

(3.3.41)∂S(C∗)

∂k=

1

k + [(1− e−αT )/α]

1− e−βT

β= λ.

Hence in this case the Euler-Lagrange multiplier λ gives the rate of change ofthe extreme value S(C∗) with respect to the constraint value k.

3.4. The Euler-Lagrange multiplier theorem for many constraints. LetK1, · · · ,Km be any collection of functionals which are defined and have variations onan open subset D of a normed vector space X , and let D [Ki = ki for i = 1, · · · ,m]denote the subset of D which consists of all vectors x ∈ D which simultaneouslysatisfy all the following constraints:

(3.4.1) K1(x) = k1, · · · ,Km(x) = km.

28 YI LI

Here k1, · · · , km may be any given numbers, and we assume that there is at leastone vector in D which satisfies all the constraints of (3.4.1) so that the set D [Ki =ki for i = 1, · · · ,m] is not empty.

Theorem 3.10. Let J ,K1, · · · ,Km be functionals which are defined and have vari-ations on an open subset D of a normed vector space X , and let x∗ be a localextremum vector in D [Ki = ki for i = 1, · · · ,m] for J , where k1, · · · , km are anygiven fixed numbers for which the set D [Ki = ki for i = 1, · · · ,m] is nonempty.Assume that the variation of J and the variation of each Ki (for i = 1, · · · ,m) areweakly continuous near x∗. Then at least one of the following two possibilities musthold:

(i) The following determinant vanishes identically,

(3.4.2) det

δK1(x∗; ∆x1) δK1(x∗; ∆x2) · · · δK1(x∗; ∆xm)δK2(x∗; ∆x1) δK2(x∗; ∆x2) · · · δK2(x∗; ∆xm)

......

...δKm(x∗; ∆x1) δKm(x∗; ∆x2) · · · δKm(x∗; ∆xm)

= 0

for all vectors ∆x1, · · · ,∆xm ∈X ; or(ii) The variation of J at x∗ is a linear combination of the variations ofK1, · · · ,Km at x∗, i.e., there are constants λ1, · · · , λm such that

(3.4.3) δJ (x∗; ∆x) =

m∑i=1

λiδKi(x∗; ∆x)

holds for every vector ∆x ∈X .

Theorem 3.11. Let J ,K1, · · · ,Km be functionals which are defined on an openset D contained in the normed vector space X = Rn. We assume that

(i) all those functionals are differentiable and weakly continuous on D ,(ii) the set D [Ki = ki for i = 1, · · · ,m] is nonempty for all choices of k1, · · · , km

considered,(iii) J has a local maximum or minimum in D [Ki = ki for i = 1, · · · ,m] at

some vector x∗ and the determinant (3.4.2) is not identically zero. ThenTheorem 3.10 and (2.4.4) imply that there are numbers λ1, · · · , λm suchthat

(3.4.4) dJ (x∗; ∆x) =

m∑i=1

λidKi(x∗; ∆x)

for every vector ∆x ∈ Rn. Any such extremum vector x∗ is written as

(3.4.5) x∗ = x∗(k1, · · · , km)

and assumed that x∗i = x∗i (k) = x∗i (k1, · · · , km), i = 1, · · · , n, are continu-ously differentiable,

(iv) the functional J (x∗(k1, · · · , km)) has continuous first-order partial deriva-tives with respect to k1, · · · , km.

Then

(3.4.6)∂

∂kiJ (x∗(k1, · · · , km)) = λi

for i = 1, · · · ,m.

VARIATIONAL METHODS 29

Proof. By (iii), we have

(3.4.7) x∗j (k + ∆k) = x∗j (k) +

m∑i=1

∂x∗j (k)

∂ki∆ki +Rj(k; ∆k)|∆k|

for j = 1, · · · , n, where

(3.4.8) lim∆k→0 in Rn

Rj(k; ∆k) = 0.

Thus

(3.4.9) x∗(k + ∆k) = x∗(k) +

m∑i=1

∂x∗(k)

∂ki∆ki +R(k; ∆k)|∆k|.

If we denote by ei = (0, · · · , 0, 1, 0, · · · , 0) the i-th unit vector in Rm, then

(3.4.10) x∗(k + ∆kiei) = x∗(k) +∂x∗(k)

∂ki∆ki +R(k; ∆kiei)|∆ki|

for all small nonzero number ∆ki. Consequently,

J (x∗(k + ∆kiei))− J (x∗(k))

∆ki

=dJ (x∗(k); ∂x

∗(k)∂ki

∆ki +R(k; ∆kiei)|∆ki|) + E(x∗(k); ∆x)|∆x|∆ki

= dJ(x∗(k);

∂x∗(k)

∂ki

)+|∆ki|∆ki

dJ (x∗(k);R(k; ∆kiei)) +|∆x|∆kiE(x∗(k); ∆x)

where

∆x +∂x∗(k)

∂xi∆ki +R(k; ∆kiei)|∆ki|.

Letting now ∆ki → 0 yields

∂J (x∗(k))

∂ki= lim

∆ki→0

J (x∗(k + ∆kiei))− J (x∗(k))

∆ki= dJ

(x∗(k);

∂x∗(k)

∂ki

).

Hence

(3.4.11)∂J (x∗(k))

∂ki=

m∑j=1

λjdKj(x∗(k);

∂x∗(k)

∂ki

).

On the other hand, since

Kj(x∗(k + ∆kiei)) =

ki + ∆ki, j = i,

kj , j 6= i,

it follows that

dKj(x∗(k);

∂x∗(k)

∂ki

)= lim

∆ki→0

Kj(x∗(k + ∆kiei))−Kj(x∗(k))

∆ki

=

1, j = i,0, j 6= i.

Plugging it into (3.4.11), we prove (3.4.6).

30 YI LI

3.5. Chaplygin’s problem. This problem is to find a steering control α∗ ∈ C 0[0, T ]with the uniform norm that will alow an airplane to encircle a maximum area intime T while flying at constant natural speed v0 (relative to the surrounding air)and while a constant wind is blowing.

For any α ∈ C 0[0, T ], we define

K1(α) =

∫ T

0

cos[α(t)]dt,(3.5.1)

K2(α) =

∫ T

0

sin[α(t)]dt,(3.5.2)

K3(α) = α(0).(3.5.3)

Then the constraints (1.3.9) and (1.3.10) become

(3.5.4) K1(α) = −w0

v0T, K2(α) = 0, K(α) = α0.

The optimization problem is then to fins an extremum vector α∗ ∈ D [K1 =−w0

v0T, K2 = 0, K3 = α0] for the functional A defined blew, where D = C 0[0, T ].

Recall that

A(α) =1

2

∫ T

0

v0 · sin[α(t)]

[x0 + w0t+ v0

∫ t

0

cos[α(τ)]dτ

]−[v0 · cos[α(t)] + w0]

[y0 + v0

∫ t

0

sin[α(τ)]dτ

]dt.

It is easily to see that

δK1(α; ∆α) = −∫ T

0

sin[α(t)]∆α(t)dt,(3.5.5)

δK2(α; ∆α) =

∫ T

0

cos[α(t)]∆α(t)dt,(3.5.6)

δK3(α; ∆α) = ∆α(0)(3.5.7)

for any vector ∆α ∈ C 0[0, T ].

(a) The determinant

det + det

δK1(α; ∆α1) δK1(α; ∆α2) δK1(α; ∆α3)δK2(α; ∆α1) δK2(α; ∆α2) δK2(α; ∆α3)δK3(α; ∆α1) δK3(α; ∆α2) δK3(α; ∆α3)

does not vanish identically for any vectors ∆α1,∆α2,∆α3 ∈ C 0[0, T ]. Sinceany continuous function on [0, T ] can be approximated by smooth functions,it suffices to check when α is continuously differentiable with dα(t)/dt 6= 0for t ∈ [0, T ]. Without loss generality, we may furthermore assume thatα(T )− α(0) = 2πn for some nonzero integer n. Taking

∆α1(t) = sin[α(t)]dα(t)

dt, ∆α2(t) = cos[α(t)]

dα(t)

dt,

VARIATIONAL METHODS 31

we have

δK1(α; ∆α1) = −∫ T

0

sin2[α(t)]dα(t) = −∫ α(T )

α(0)

sin2 xdx

= −(x

2− sin(2x)

4

) ∣∣∣α(T )

α(0)= −πn,

δK1(α; ∆α2) = −∫ T

0

sin[α(t)] cos[α(t)]dα(t) = −∫ α(T )

α(0)

sinx · cosxdx

= −1

2sin2 x

∣∣∣α(T )

α(0)= 0;

similarly, we have

δK2(α; ∆α1) = 0, δK2(α; ∆α2) = πn.

Hence

det = −π2n2∆α3(0) + πnα(0)

∫ T

0

cos[α(t)− α(0)]∆α3(t)dt.

Choosing ∆α3 = dα/dt yields

det = −π2n2α(0) + πnα(0)

∫ 2πn

0

cosxdx

= −π2n2α(0)

which is nonzero.(b) By Theorem 3.10, there are constants λ1, λ2, λ3 such that

(3.5.8) δA(α∗; ∆α) = λ1δK1(α∗; ∆α) + λ2δK2(α∗; ∆α) + λ3δK3(α∗; ∆α)

for any vector ∆α ∈ C 0[0, T ]. From (2.2.4), we get

v20

2

∫ T

0

∆α(t)

∫ t

0

cos[α∗(t)− α∗(τ)]dτdt

−v20

2

∫ T

0

∫ t

0

cos[α∗(t)− α∗(τ)] +

w0

v0cos[α∗(τ)]

∆α(τ)dτdt

=

∫ T

0

− sin[α∗(t)]

(λ1 +

v0y0

2

)+ cos[α∗(t)]

(λ2 −

v2

2(w0t+ x0)

)∆α(t)dt

+ λ3∆α(0);

a simple calculation shows that

λ3∆α(0) =

∫ T

0

v2

0

∫ t

0

cos[α∗(t)− α∗(τ)]dτ +(λ1 +

v0y0

2

)sin[α∗(t)]

+(−λ2 +

v0x0

2+ v0w0t

)cos[α∗(t)]

∆α(t)dt(3.5.9)

for any function ∆α ∈ C 0[0, T ]. In particular,

0 =

∫ T

0

v2

0

∫ t

0

cos[α∗(t)− α∗(τ)]dτ +(λ1 +

v0y0

2

)sin[α∗(t)]

+(−λ2 +

v0x0

2+ v0w0t

)cos[α∗(t)]

∆α(t)dt(3.5.10)

32 YI LI

for any function ∆α ∈ C 0[0, T ] with ∆α(0) = ∆α(T ) = 0. Using part (c)below, we conclude that

0 = v20

∫ t

0

cos[α∗(t)− α∗(τ)]dτ +(λ1 +

v0y0

2

)sin[α∗(t)]

+(−λ2 +

v0x0

2+ v0w0t

)cos[α∗(t)](3.5.11)

and hence

(3.5.12) λ3 = 0.

If we write α∗ = α∗(v0, w0, T, α0), then Theorem 3.11 implies that

(3.5.13)∂

∂α0A(α∗) = 0.

(c) Du Bois-Reymond’s lemma. Let f(x) be any given continuous real-valued function on [a, b] and suppose that for some nonnegative integern ∈ N ∪ 0 ∫ b

a

f(x)h(x)dx = 0

holds for all functions h ∈ C n[a, b] which vanish at the end points alongwith their derivatives of order up to and including order n,

h(k)(a) = h(k)(b) = 0, k = 0, 1, · · · , n.Then f ≡ 0 on [a, b].

Proof. Without loss of generality, we may show that f vanishes in the openinterval (a, b). Suppose that there is some interior point x∗ ∈ (a, b) forwhich f(x∗) 6= 0. We may assume that f(x∗) > 0, otherwise, we considerthe function −f .

Since f is continuous, it follows that f(x) > 0 for all x ∈ (α, β) anf forsome sufficiently small open interval (α, β) of x∗. Define

h∗(x) =

(x− α)n+1(β − x)n+1, x ∈ [α, β],

0, x ∈ [a, α) ∪ (β, b].

Then h∗ ∈ C n[a, b] and vanishes at the end points x = a and x = b alongwith all its derivatives. The assumption implies that∫ β

α

f(x)(x− α)n+1(β − x)n+1dx = 0.

However, the function f(x)(x − α)n+1(β − x)n+1 > 0 for all x ∈ (α, β),we obtain a contradiction! Hence, the original assumption f(x∗) 6= 0 isimpossible.

(d) Letting t = 0 and t = T in (3.5.11) respectively and using the constraints(3.5.4), we have(

λ1 +v0y0

2

)sinα0 +

(−λ2 +

v0x0

2

)cosα0 = 0,(3.5.14) (

λ1 +v0y0

2

)sin[α∗(T )] +

(−λ2 +

v0x0

2

)cos[α∗(T )] = 0.(3.5.15)

The determine of (3.5.14) and (3.5.15) is

det = cosα0 · sin[α∗(T )]− sinα0 · cos[α∗(T )].

VARIATIONAL METHODS 33

If det 6= 0, then

λ1 = −v0y0

2, λ2 =

v0x0

2;

if det = 0, that is, tanα0 = tan[α∗(T )], or α∗(T ) = α0 + 2πn for somen ∈ N, we have

(3.5.16) λ1 = −v0y0

2− µv2

0 cosα0, λ2 =v0x0

2− µv2

0 sinα0

for any constant µ. Thus (3.5.16) is a general solution for (3.5.14) and(3.5.15). Plugging (3.5.16) into (3.5.11) and putting

(3.5.17) e0 +w0

v0∈ [0, 1),

we arrive at

(3.5.18)

∫ t

0

cos[α∗(t)− α∗(τ)]dτ = µ sin[α∗(t)− α0]− e0t cos[α∗(t)].

On the other hand, from (1.3.7) and the identity

cos[α∗(t)− α∗(τ)] = cos[α∗(t)] cos[α∗(τ)] + sin[α∗(t)] sin[α∗(τ)],

we have(3.5.19)∫ t

0

cos[α∗(t)− α∗(τ)]dτ =

(X(t)− x0

v0− e0t

)cos[α∗(t)] +

(Y (t)− y0

v0

)sin[α∗(t)]

(where (X(t), Y (t)) corresponds to α∗) and then(3.5.20)

(X(t)− x0 + µv0 sinα0) cos[α∗(t)] + (Y (t)− y0 − µv0 cosα0) sin[α∗(t)] = 0.

If we introduce

(3.5.21) R(t) +([X(t)− x0 + µv0 sinα0]2 + [Y (t)− y0 − µv0 cosα0]2

)1/2,

then

X(t)− x0 + µv0 sinα0 = R(t) sin[α∗(t)],(3.5.22)

Y (t)− y0 − µv0 cosα0 = −R(t) cos[α∗(t)].(3.5.23)

Since

R′(t) = sin[α∗(t)]X ′(t)− cos[α∗(t)]Y ′(t)

=Y ′(t)

v0X ′(t)− X ′(t)− w0

v0Y ′(t)

= e0Y′(t)

and R(0) = µv0, it follows that

(3.5.24) R(t) = e0Y (t) + (µv0 − e0y0).

Consequently,

(3.5.25) (X(t)− x0 + µv0 sinα0)2+(Y (t)− y0 − µv0 cosα0)

2= [e0(y−y0)+µv0]2.

The above equation can be rewritten as

(3.5.26)(x− x1)2

a2+

(y − y1)2

b2= 1,

34 YI LI

with

(x, y) = (X(t), Y (t),

(x1, y1) =

(x0 − µv0 sinα0, y0 + µv0

e0 + cosα0

1− e20

),

a = µn01 + e0 cosα0√

1− e20

,

b = µv01 + e0 cosα0

1− e20

.

The area now is given by

(3.5.27) area = πab = πµ2v20

(1 + e cosα0)2

(1− e20)3/2

.

(e) The equation (3.5.26) together with (1.3.4) implies that

(3.5.28) X ′(T ) = X ′(0), Y ′(T ) = Y (0).

Thus

(3.5.29) α∗(T ) = α0 + 2πn

for some integer n. We claim that

(3.5.30) α∗ (1 + e0 cosα∗) + 2 (α∗)2e0 sinα∗ = 0.

Differentiating (3.5.18) twice we have

−∫ t

0

sin[α∗(t)− α∗(τ)]dτα∗(t) + 1

= µ cos[α∗(t)− α0]α∗(t)− e0 cos[α∗(t)] + e0t sin[α∗(t)]α∗(t),

−∫ t

0

sin[α∗(t)− α∗(τ)]dτα∗(t)−∫ t

0

cos[α∗(t)− α∗(τ)]dτ (α∗(t))2

= −µ sin[α∗(t)− α0] (α∗(t))2

+ µ cos[α∗(t)− α0]α(t) + e0 sin[α∗(t)]α∗(t)

+ e0t sin[α∗(t)]α∗(t) + e0 sin[α∗(t)]α∗(t) + e0t cos[α∗(t)] (α∗(t))2.

Resolving the above two integrals and using (3.5.18) again we arrive at(3.5.30). We may rewrite (3.5.30) as

(3.5.31)α∗

α∗=−2α∗e0 sinα∗

1 + e0 cosα∗

or

(3.5.32)d

dtlog α∗ = 2

d

dtlog(1 + e0 cosα∗).

Letting

(3.5.33) α0 + α(0)

yields

(3.5.34) α∗ = α0

(1 + e0 cosα∗

1 + e0 cosα0

)2

.

Therefore

(3.5.35)

∫ α∗

α0

(1 + e0 cosα)2=

α0t

(1 + e0 cosα0)2.

VARIATIONAL METHODS 35

On the other hand, differentiating (3.5.18) and setting t = 0 yields

(3.5.36) α0 =1 + e0 cosα0

µ

which gives us

(3.5.37)

∫ α∗

α0

(1 + e0 cosα)2=

t

µ(1 + e0 cosα0).

(f) We claim that

(3.5.38) I +∫

(1 + e0 cosα)2=

(−e0 sinα

1+e0 cosα + 2√1−e20

arctan(√

1−e01+e0

tan α2

))1− e2

0

.

We first review the basic identities

sinα =2 tan α

2

1 + tan2 α2

, cosα =1− tan2 α

2

1 + tan2 α2

, tanα =2 tan α

2

1− tan2 α2

.

We first show that

(3.5.39) J +∫

1 + e0 cosα=

2√1− e2

0

arctan

(√1− e0

1 + e0tan

α

2

).

To prove (3.5.39) we use the above basic identities to conclude that

J =

∫dα

1 + e01−tan2 α

2

1+tan2 α2

=

∫(1 + tan2 α

2 )dα

(1 + e0) + (1− e0) tan2 α2

=2

1− e0

∫d tan α

2(√1+e01−e0

)2

+ tan2 α2

=2

1− e0

√1− e0

1 + e0arctan

(√1− e0

1 + e0tan

α

2

)=

2√1− e2

0

arctan

(√1− e0

1 + e0tan

α

2

).

Now a basic idea to prove (3.5.38) is to lower the power for 1 + e0 cosα.From(

sinα

1 + e0 cosα

)′=

cosα(1 + e0 cosα) + e0 sin2 α

(1 + e0 cosα)2

=e0 + cosα

(1 + e0 cosα)2=

1e0

1 + e0 cosα+

e20−1e0

(1 + e0 cosα)2

so that

1

(1 + e0 cosα)2=

e0

e20 − 1

(sinα

1 + e0 cosα

)′− 1

e20 − 1

1

1 + e0 cosα

and

J =e0

e20 − 1

sinα

1 + e0 cosα− 1

e20 − 1

J.

36 YI LI

(g) From (3.5.37), (3.5.38), and (3.5.29), we have

(3.5.40) µn =(1− e2

0)1/2T

2πn(1 + e0 cosα0)

for any nonzero integer n. For convenience, we take n to be a positiveinteger n ∈ N and write

µ−n = −µn, n ∈ N.

Then

an =(1− e2

0)v0T

2πn,(3.5.41)

bn =

√1− e2

0v0T

2πn=

an√1− e2

0

,(3.5.42)

A(α∗n) =(1− e2

0)3/2(v0T )2

4πn2.(3.5.43)

Hence the maximum area among all admissible functions when n = 1 andA(α∗1) = A(α∗−1) = (1− e2

0)3/2(v0T )/4π.

3.6. John multiplier theorem. In Theorem 3.10 we considered the set D [Ki =ki for i = 1, · · · ,m], where D is a given open set in a normed vector space X .Now, we may wish to find an extremum vector for a functional J in a constraintset of the type D [Ki ≤ ki for i = 1, · · · ,m].

Theorem 3.12. (John multiplier theorem) Let J ,K1, · · · ,Km be differentiablefunctions on an open subset D of a normed vector space X , and let x∗ be a localminimum vector in D [Ki ≤ ki for i = 1, · · · ,m] for J . Assume also that thedifferential of J and the differential of each Ki are weakly continuous near x∗.Then there are nonnegative constants µ0, µ1, · · · , µm, which do not all vanish, suchthat

(3.6.1) µ0dJ (x∗; ∆x) +

m∑i=1

µidKi(x∗; ∆x) = 0

for all vectors ∆x ∈X and such that

(3.6.2) [Ki(x∗)− ki]µi = 0

for each i = 1, · · · ,m.

Proof. We give a proof when m = 1. Suppose that the vector x∗ is a local minimumvector in D [K1 ≤ k1] for J , i.e., K1(x∗) ≤ k1, and there is some ball Bρ(x

∗) in Xcentered at x∗ such that J (x∗) ≤ J (x) for all vectors x which simultaneously arein Bρ(x

∗) and satisfy K1(x) ≤ k1.We now consider two possible cases: (1) K1(x∗) = k1 and (2) K1(x∗) < k1.(1) K1(x∗) = k1. By Theorem 3.10 we have

µ0δJ (x∗; ∆x) + µ1δK1(x∗; ∆x) = 0

for all vectors ∆x ∈X , and (µ0, µ1) = (0, 1) or (1,−λ). We now claim that λ ≤ 0.According to (3.4.6) we get

λ =∂

∂k1J (x∗(k1))

VARIATIONAL METHODS 37

and hence it suffices to show that ∂J (x∗(k1))/∂k1 ≤ 0. Since x∗ is a local minimumvector in D [K1 ≤ k1] for J , it follows that J (x∗(k1)) is increasing if k1 decreases.

(2) K1(x∗) < k1. By continuity, K1(x) < k1 hold for all vectors x in some ballBρ(x

∗) centered at x∗ and it follows that x∗ is a local minimum vector in D for J .By Theorem 2.3, we have

δJ (x∗; ∆x) = 0

for all vectors ∆x ∈ X . Hence this case can be included also in the previouscase.

4. Application I: Calculus of variations

4.1. Problems with fixed end points. We consider the problem of maximizingor minimizing the value of a functional J defined by

(4.1.1) J (Y ) +∫ x1

x0

F (x, Y (x), Y ′(x))dx

in terms of a given known function F on R3. Here Y ∈ C 1[x0, x1] and

(4.1.2) Y (x0) = y0, Y (x1) = y1

for given constants y0 and y1.Defining functionals K0 and K1 by

(4.1.3) K0(Y ) = Y (x0), K1(Y ) = Y (x1)

for any function Y ∈ C 1[x0, x1], we see that the problem is equivalent to findextremum vectors in D [K0 = y0,K1 = y1] for J (Here D = C 1[x0, x1] with anysuitable norm so that functionals have weakly continuous variations).

Note that

δK0(Y ; ∆Y ) = ∆Y (x0),(4.1.4)

δK1(Y ; ∆Y ) = ∆Y (x1),(4.1.5)

δJ (Y ; ∆Y ) =

∫ x1

x0

[Fy(x, Y (x), Y ′(x))∆Y (x)

+Fz(x, Y (x), Y ′(x))∆Y ′(x)] dx(4.1.6)

for any vectors Y,∆Y ∈ C 1[x0, x1].

(a) The determinant defined in (3.4.2) does not vanish identically for all func-tions ∆Y0,∆Y1 ∈ C 1[x0, x1]. In fact,

det = det

(δK0(Y ; ∆Y0) δK0(Y ; ∆Y1)δK1(Y ; ∆Y0) δK1(Y ; ∆Y1)

)= ∆Y0(x0)∆Y1(x1)−∆Y0(x1)∆Y1(x0);

taking ∆Y0(x) = 1 and ∆Y1(x) = x−x0

x1−x0for x ∈ [x0, x1], we find det = 1 6=

0.(b) By Theorem 3.10 we can find constants λ0, λ1 such that

(4.1.7) δJ (Y ; ∆Y ) = λ0δK0(Y ; ∆Y ) + λ1δK1(Y ; ∆Y )

for any function ∆Y ∈ C 1[x0, x1], and then∫ x1

x0

[Fy(x, Y (x), Y ′(x))∆Y (x) + Fz(x, Y (x), Y ′(x))∆Y ′(x)]dx

= λ0∆Y (x0) + λ1∆Y (x1)(4.1.8)

for any function ∆Y ∈ C 1[x0, x1].

38 YI LI

(c) (Euler-Lagrange equation) We now assume that

(4.1.9) Fz(x, Y (x), Y ′(x)) ∈ C 1[x0, x1]

as a function of x. From

d

dx[Fz(x, Y (x), Y ′(x))∆Y (x)] = Fz(x, Y (x), Y ′(x))∆Y ′(x)

+

[d

dxFz(x, Y (x), Y ′(x))

]∆Y (x),

we have∫ x1

x0

Fz(x, Y (x), Y ′(x))∆Y ′(x)dx

= Fz(x1, Y (x1), Y ′(x1))∆Y (x1)− Fz(x0, Y (x0), Y ′(x0))∆Y (x0)

−∫ x1

x0

[d

dxFz(x, Y (x), Y ′(x))

]∆Y (x)dx.

The above formula, together with (4.1.8), gives∫ x1

x0

[Fy(x, Y (x), Y ′(x))− d

dxFz(x, Y (x), Y ′(x))

]∆Y (x)dx(4.1.10)

= [λ0 + Fz(x0, y0, Y′(x0))] ∆Y (x0) + [λ1 − Fz(x1, y1, Y

′(x1))] ∆Y (x1)

for all vectors ∆Y ∈ C 1[x0, x1]. In particular,

(4.1.11)

∫ x1

x0

[Fy(x, Y (x), Y ′(x))− d

dxFz(x, Y (x), Y ′(x))

]∆Y (x)dx = 0

for all vectors ∆Y ∈ C 1[x0, x1] satisfying ∆Y (x0) = ∆Y (x1) = 0. By DuBois-Reymond’s lemma, we get

(4.1.12) Fy(x, Y (x), Y ′(x))− d

dxFz(x, Y (x), Y ′(x)) = 0

for all x ∈ [x0, x1], provided that (4.1.9) holds. The above equation (4.1.12)is called the Euler-Lagrange equation.

(d) (Du Bois-Reymond, 1879) Du Bois-Reymond derived (4.1.12) without as-suming that the function Fz(x, Y (x), Y ′(x)) is differentiable. Define

(4.1.13) g(x) +∫ x

x0

Fy(ξ, Y (ξ), Y ′(ξ))dξ.

Then g is continuous differentiable and

(4.1.14) g′(x) = Fy(x, Y (x), Y ′(x))

so that

(4.1.15)

∫ x1

x0

Fy(x, Y (x), Y ′(x))∆Y (x)dx = −∫ x1

x0

g(x)∆Y ′(x)dx

for any function ∆Y ∈ C 1[x0, x1] vanishing at the end points x0 and x1.Hence, equation (4.1.8) can be rewritten as

(4.1.16)

∫ x1

x0

[−g(x) + Fz(x, Y (x), Y ′(x))] ∆Y ′(x)dx = 0

for any function ∆Y ∈ C 1[x0, x1] vanishing at the end points x0 and x1.

VARIATIONAL METHODS 39

We claim that the function −g(x) + Fz(x, Y (x), Y ′(x)) is everywhereconstant. Define

(4.1.17) c +1

x1 − x0

∫ x1

x0

[−g(x) + Fz(x, Y (x), Y ′(x))] dx

so that ∫ x1

x0

[−g(x) + Fz(x, Y (x), Y ′(x))− c] ∆Y ′(x)dx = 0

which must hold for all functions ∆Y ∈ C 1[x0, x1] vanishing at the endpoints x = x0 and x = x1. By Du Bois-Reymond’s lemma, it follows that

(4.1.18) − g(x) + Fz(x, Y (x), Y ′(x))− c ≡ 0

for all x ∈ [x0, x1]. From (4.1.18), we see that Fz(x, Y (x), Y ′(x)) is contin-uous differentiable and then we get (4.1.12).

(e) From (4.1.10) and (4.1.12), we have

(4.1.19) λ0 = −Fz(x0, y0, Y′(x0)), λ1 = Fz(x1, y1, Y

′(x1)).

Consequently, Theorem 3.11 shows that

(4.1.20)∂

∂y0J (Y ) = −Fz(x0, y0, Y

′(x0)),∂

∂y1J (Y ) = Fz(x1, y1, Y

′(x1)).

Example 4.1. (Shortest distance between two points/geodesics) This problem isto find a curve y = Y (x), x ∈ [x0, x1], that minimizes the distance between twogiven points P0 = (x0, y0) and P1 = (x1, y1) in R2. Let

J (Y ) +∫ x1

x0

√1 + Y ′(x)2dx

be the length of any such curve y = Y (x). By (4.1.12), we have

d

dx

[Y ′(x)√

1 + Y ′(x)2

]= 0.

This equation can be integrated to give

Y ′(x)√1 + Y ′(x)2

= constant,

and then Y ′(x) ≡ A for some constant A. Finally, we find for the extremumfunction the result Y (x) = Ax + B, where the constants A and B are determinedby

A =y2 − y1

x2 − x1, B =

y1x2 − y2x1

x2 − x1.

Example 4.2. (1) If F = F (y, z) is independent on the first argument x, then

d

dx[F (Y (x), Y ′(x))− Y ′(x)Fz(Y (x), Y ′(x))]

= Fy(Y (x), Y ′(x))Y ′(x) + Fz(Y (x), Y ′(x))Y ′′(x)

− Y ′′(x)Fz(Y (x), Y ′(x))− Y ′(x)d

dxFz(Y (x), Y ′(x))

= Y ′(x)

[Fy(Y (x), Y ′(x))− d

dxFz(Y (x), Y ′(x))

].

40 YI LI

Hence, if Y (x) is any solution of the Euler-Lagrange equation (4.1.12), we find that

(4.1.21)d

dx[F (Y (x), Y ′(x))− Y ′(x)Fz(Y (x), Y ′(x))] = 0,

which is equivalent to

(4.1.22) F (Y (x), Y ′(x))− Y ′(x)Fz(Y (x), Y ′(x)) = C

for some constant C.(2) If F = F (x, z) is independent on the second argument y, then the Euler-

Langrange equation (4.1.12) gives us

(4.1.23)d

dxFz(x, Y

′(x)) = 0,

which is equivalent to

(4.1.24) Fz(x, Y′(x)) = C

for some constant C.

Example 4.3. (Mimimum transit time of a boat) We take P0 = (x0, y0) = (0, 0)and P1 = (x1, y1) = (`, y1), and we then seek a curve γ connecting P0 and P1 givenas

γ : y = Y (x), x ∈ [0, `]

along with the boat can travel from P0 to P1 in minimum time. Recall (1.3.17)that

T (Y ) =

∫ `

0

F (x, Y ′(x))dx

where

F (x, Y ′(x)) =

√1− e(x)2 + Y ′(x)2 − e(x)Y ′(x)

v0[1− e(x)2].

Since

Fz(x, Y′(x)) =

Y ′(x)− e(x)√

1− e(x)2 + Y ′(x)2

v0[1− e(x)2]√

1− e(x)2 + Y ′(x)2,

it follows from (4.1.24) that

Y ′(x)− e(x)√

1− e(x)2 + Y ′(x)2

v0[1− e(x)2]√

1− e(x)2 + Y ′(x)2= C

which can be simplified as

(4.1.25) Y ′(x)2 =[e(x) +A(1− e(x)2)]2

1− 2Ae(x)−A2[1− e(x)2], A = v0C.

Hence

(4.1.26) Y (x) =

∫ x

0

e(ξ) +A[1− e(ξ)2]√1− 2Ae(ξ)−A2[1− e(ξ)2]

by imposing the constraint Y (0) = 0. Therefore

(4.1.27) Tmin =1

v0

∫ `

0

1−Ae(x)√1− 2Ae(x)−A2[1− e(x)2]

dx.

The constant A can be determined by Y (`) = y1.

VARIATIONAL METHODS 41

Example 4.4. (1)(Brachistochrone problem) This problem is to find a minimumvector in D [K0 = y0,K1 = y1] for the functional T of (1.3.3), where D = C 1[x0, x1],and where

K0(Y ) = Y (x0) = y0, K1(Y ) = Y (x1) = y1.

Now

F (Y (x), Y ′(x)) =

√1 + Y ′(x)2

2g[y0 − Y (x)].

From

Fz(Y (x), Y ′(x)) =Y ′(x)√

2g[y0 − Y (x)][1 + Y ′(x)2],

we have from (4.1.22) that√1 + Y ′(x)2

y0 − Y (x)− Y ′(x)2√

[y0 − Y (x)][1 + Y ′(x)2]=√

2gC,

or

(4.1.28) [y0 − Y (x)][1 + Y ′(x)2] = A

with A−1 = 2gC2. Hence

(4.1.29) Y ′(x) = −

√A− [y0 − Y (x)]

y0 − Y (x).

Letting

(4.1.30) y0 − Y (x) = A

[sin

(θ(x)

2

)]2

,

we get

(4.1.31) A

[sin

(θ(x)

2

)]2dθ(x)

dx= 1

and then

(4.1.32) x = x0 +A

2(θ − sin θ).

Thus, we find for any such extremum curve γ that

(4.1.33) γ :

x = x0 + A

2 (θ − sin θ),y = y0 − A

2 (1− cos θ),

for θ ∈ [θ0, θ1]. Consequently,

Tmin =

∫ θ1

0

√(dx/dθ)2 + (dy/dθ)2

2g(y0 − y)dθ

=A

2

∫ θ1

0

√2(1− cos θ)

gA(1− cos θ)dθ(4.1.34)

=

√A

2gθ1.

By constraints

(4.1.35) A(θ1 − sin θ1) = 2(x− x1), A(1− cos θ1) = −2(y1 − y0),

42 YI LI

we can uniquely determine values of A and θ1.(2) (Brachistochrone problem through the earth) Let A and B be two fixed given

points on the surface of the earth, and we let γ be a plane curve connecting A andB passing through the earth’s interior.

Suppose that a tunnel can be dug through the earth from A to B along the pathγ, and we then consider the time of motion T required for a bead to slide withoutfriction through the tunnel from A to B,

(4.1.36) T +∫γ

ds

v,

where s measures arc length along γ, ds/dt is the rate of change of arc length withrespect to time t during the motion, and the instantaneous speed of motion v isgiven by v = ds/dt. We see the particular tunnel γ that yields the least value forT .

By conservation of energy, we have

(4.1.37)1

2mv2 +mg

r2

2ρ= mg

ρ

2,

where r is the distance of the bead from the center of the earth, ρ is the radiusof the earth, and g is the acceleration of the earth’s gravity at the surface of theearth. From (4.1.37), we have

(4.1.38) v =

√g(ρ2 − r2)

ρ.

We place a Cartesian (x, y)-coordinate plane with the origin at the center of theearth and with the positive x-axis passing through the point A, and we then let rand θ be the usual plane polar coordinates of the point (x, y), defined by

(4.1.39) x = r · cos θ, y = r · sin θ.

We then represent the curve γ in terms of polar coordinates as

(4.1.40) γ : r = R(θ) =

x = R(θ) · cos θ,y = R(θ) · sin θ, θ ∈ [0, θ1],

where θ1 is the fixed central angle determined by the given points A and B and isgiven by

(4.1.41) ρθ1 = SAB ,

and SAB is the known arc length between the given points A and B.Since

ds =

√(dx

)2

+

(dy

)2

=

√(R′(θ) · cos θ −R(θ) · sin θ)2

+ (R′(θ) · sin θ +R(θ) · cos θ)2dθ

= =√R(θ)2 +R′(θ)2dθ,

it follows from (4.1.36) and (4.1.38) that

(4.1.42) T =

√ρ

g

∫ θ1

0

√R(θ)2 +R′(θ)2

ρ2 −R(θ)2dθ +

∫ θ1

0

F (R(θ), R′(θ))dθ.

VARIATIONAL METHODS 43

We now seek to minimize the functional (4.1.42) among all R(θ) ∈ C 1[0, θ1]which satisfy the constraints

(4.1.43) R(0) = R(θ1) = ρ.

From (4.1.22), we have

(4.1.44) F (R(θ), R′(θ))−R′(θ)Fz(R(θ), R′(θ)) = C

for some constant C. Consequently,

(4.1.45)

√R(θ)2 +R′(θ)2

ρ2 −R(θ)2− R′(θ)2√

[ρ2R(θ)2][R(θ)2 +R′(θ)2]=

√g

ρC,

which can be simplified as

(4.1.46) [R′(θ)]2 =ρ2

r21

R(θ)2 − r21

ρ2 −R(θ)2R(θ)2,

where

(4.1.47) r1 =ρ√

1 + (ρC2/g).

Hence

(4.1.48) R′(θ) =

−ρr1R(θ)

√R(θ)2−r21ρ2−R(θ)2 , θ ∈ [0, θ1/2],

ρr1R(θ)

√R(θ)2−r21ρ2−R(θ)2 , θ ∈ [θ1/2, θ1].

Letting

(4.1.49) R(θ)2 =ρ2 + r2

1

2+ρ2 − r2

1

2cos

[2ρϕ(θ)

ρ− r1

],

the equation (4.1.48) can be written as

(4.1.50) ϕ′(θ) =(ρ2 + r2

1) + (ρ2 − r21) cos[2ρϕ(θ)/(ρ− r1)]

r1(ρ+ r1)[1− cos[2ρϕ(θ)/(ρ− r1)]],

or −1 +

2ρ2

(ρ2 + r21) + (ρ2 − r2

1) cos[2ρϕ/(ρ− r1)]

ϕ′(θ) =

ρ− r1

r1.

By the initial condition ϕ(0) = 0, we get

(4.1.51) 2ρ2

∫ ϕ

0

(ρ2 + r21) + (ρ2 − r2

1) cos[2ρϕ/(ρ− r1)]= ϕ+

ρ− r1

r1θ.

Using the formula3∫dϕ

a+ b cosϕ=

2√a2 − b2

arctan

√a2 − b2 tan(ϕ/2)

a+ b, a2 > b2,

3Using x = tan(ϕ/2) and cosϕ = 1−x2

1+x2 , we have∫dϕ

a + b cosϕ=

2

a− b

∫dx(√

a+ba−b

)2+ x2

=2

a− b

√a− b

a + btan−1

(√a− b

a + btan

ϕ

2

).

44 YI LI

we arrive at

(4.1.52)r1

ρtan

ρϕ

ρ− r1= tan

(θ +

r1ϕ

ρ− r1

).

Another initial condition ϕ(θ1) = (ρ− r1)π/ρ, we obtain from (4.1.52) that

(4.1.53) θ1 +r1

ρπ = π, r1 = ρ

(1− θ1

π

)= ρ− SAB

π.

As an exercise we can show that

(4.1.54) γ :

x = ρ+r1

2 cosϕ+ ρ−r12 cos ρ+r1ρ−r1ϕ,

y = ρ+r12 sinϕ− ρ−r1

2 sin ρ+r1ρ−r1ϕ,

ϕ ∈ [0, θ1],

and hence

(4.1.55) Tmin = θ1

√ρ(ρ+ r1)

g(ρ− r1)=

√2πSAB − S2

AB

ρg.

Example 4.5. Let P0 = (x0, y0) and P1 = (x1, y1) be two given points in the planewith x0 < x1 and y0, y1 > 0, and let γ be any curve which connects P0 and P1

given asγ : y = Y (x), x ∈ [x0, x1].

The area of the surface obtained by rotating γ about the x-axis is given by

A(Y ) := 2π

∫ x1

x0

Y (x)√

1 + Y ′(x)2dx.

Assume that Y ∈ C 1[x0, x1] with Y (x0) = y0 and Y (x1) = y1 minimizes the area

functional. Since F (y, z) = 2πy√

1 + z2, it follows from (4.1.22) that

C = 2πY (x)√

1 + Y ′(x)2 − Y ′(x)2πY (x)Y ′(x)√

1 + Y ′(x)2=

2πY (x)√1 + Y ′(x)2

.

A general solution, called catenary, of the above ordinary differential equation hasthe form

Y (x) = a · cosh

(x− ba

)for some suitable constants a and b, where cosh(t) = (et + e−t)/2 is the hyperboliccosine function. The resulting surface of revolution is called a catenoid.

(2) (Queen Dido’s problem) Let P0 = (x0, 0) and P1 = (x1, 0) be two fixedpoints on the x-axis with x0 < x1, and let ` be any given fixed length satisfyingx1 − x0 < ` < π

2 (x1 − x0). Let γ be any curve of length ` connecting P0 and P1

given asγ : y = Y (x), x ∈ [x0, x1]

with Y (x) ≥ 0. We will show that a suitable circular arc encloses the greatest areawith the x-axis among all C2-curve of length `. Consider

J (Y ) :=

∫ x1

x0

Y (x)dx,

K0(Y ) := Y (x0),

K1(Y ) := Y (x1),

K2(Y ) :=

∫ x1

x0

√1 + Y ′(x)2dx.

VARIATIONAL METHODS 45

The first functional gives the area of Y enclosed with the x-axis, while the lastfunctional gives the length of Y . The problem now is to find a maximal vector inD [K0 = 0,K1 = 0,K2 = `] for J where D = C 2[x0, x1]. Note that

δJ (Y ; ∆Y ) =

∫ x1

x0

∆Y (x)dx,

δK0(Y ; ∆Y ) = ∆Y (x0),

δK1(Y ; ∆Y ) = ∆Y (x1),

δK2(Y ; ∆Y ) =

∫ x1

x0

Y ′(x)√1 + Y ′(x)2

∆Y ′(x)dx.

Let Y be a extremum vector in D [K0 = 0,K1 = 0,K2 = `]. To apply Theorem3.10, we should prove the determinant

det := det

δK0(Y ; ∆Y0) δK0(Y ; ∆Y1) δK0(Y ; ∆Y2)δK1(Y ; ∆Y0) δK1(Y ; ∆Y1) δK1(Y ; ∆Y2)δK2(Y ; ∆Y0) δK2(Y ; ∆Y1) δK2(Y ; ∆Y2)

does not vanish identically for all vectors ∆Y0,∆Y1,∆Y2 ∈ C 2[x0, x1]. Choosing

∆Y1(x) =

∫ x

x0

√1 + Y ′(x)2dx, ∆Y2(x) = ∆Y1(x) + 1,

then ∆Y ′1(x) = ∆Y ′2(x) =√

1 + Y ′(x)2 for any x ∈ [x0, x1] and

δK2(Y ; ∆Y1) = δK2(Y ; ∆Y2) =

∫ x1

x0

Y ′(x)dx = Y (x1)− Y (x0) = 0

since K0(Y ) = K1(Y ) = 0. Hence we have

det = −δK2(Y ; ∆Y0) · δK1(Y ; ∆Y1)

= −∫ x1

x0

Y ′(x)√1 + Y ′(x)2

∆Y ′0(x)dx ·∫ x1

x0

√1 + Y ′(x)2dx

for any vector ∆Y0 ∈ C 2[x0, x1]. If det ≡ 0, then we must have∫ x1

x0

Y ′(x)√1 + Y ′(x)2

∆Y ′0(x)dx = 0

from which Y ′(x) ≡ 0 and then ` = K2(Y ) = x1 − x0 < `, a contradiction.Therefore, det does not vanish identically. By Theorem 3.10,

δJ (Y ; ∆Y ) = λ0δK0(Y ; ∆Y ) + λ1δK1(Y ; ∆Y ) + λ2δK2(Y ; ∆Y )

for some constants λ0, λ1, and λ2. Thus∫ x1

x0

∆Y (x)dx = λ0∆Y (x0) + λ1∆Y (x1) + λ2

∫ x1

x0

Y ′(x)√1 + Y ′(x)2

∆Y ′(x)dx.

Using the identity

d

dx

[Y ′(x)√

1 + Y ′(x)2∆Y (x)

]=

Y ′(x)√1 + Y ′(x)2

∆Y ′(x) +d

dx

[Y ′(x)√

1 + Y ′(x)2

]∆Y (x)

yields ∫ x1

x0

1 + λ2

d

dx

[Y ′(x)√

1 + Y ′(x)2

]∆Y (x)dx = 0

46 YI LI

for any vector ∆Y ∈ C 2[x0, x1] with ∆Y (x0) = ∆Y (x1) = 0. By Du Bois-Reymond’s lemma, we get

1 + λ2d

dx

[Y ′(x)√

1 + Y ′(x)2

].

It is not hard to prove that

Y ′(x)2 =(x− a)2

c2 − (x− a)2

for some positive constants a and b. Integrating on both sides yields

(x− a)2 + (y − b)2 = c2

for suitable positive constants a, b, and c.(3) The following inequality

(4.1.56)

∫ 1

0

Y ′(x)2dx ≥ π2

4

∫ 1

0

Y (x)2dx

for any Y ∈ C 1[0, 1] with Y (0) = 0 and Y (1) = 1. By approximation, we mayassume that Y ∈ C 2[0, 1]. We now minimize the functional

(4.1.57) J (Y ) :=

∫ 1

0Y ′(x)2dx∫ 1

0Y (x)2dx

in the subset D [K0 = 0,K1 = 1], where D = C 2[0, 1] and

K0(Y ) := Y (0), K(1) := Y (1).

If Y is an extremum vector in D [K0 = 0,K1 = 1] for J , then

1

2

(∫ 1

0

Y (x)2dx

)2

δJ (Y ; ∆Y )

=

(∫ 1

0

Y (x)2dx

)[∫ 1

0

Y ′(x)∆Y ′(x)dx

]−(∫ 1

0

Y ′(x)2dx

)[∫ 1

0

Y (x)∆Y (x)

];

by integration by parts, we have∫ 1

0

Y ′(x)∆Y ′(x)dx = Y ′(1)∆Y (1)− Y ′(0)∆Y (0)−∫ 1

0

Y ′′(x)∆y(x)dx,

and hence

1

2

(∫ 1

0

Y (x)2dx

)2

δJ (Y ; ∆Y )

=

(∫ 1

0

Y (x)2dx

)Y ′(1)∆Y (1)−

(∫ 1

0

Y (x)2dx

)Y ′(0)∆Y (0)

−∫ 1

0

[(∫ 1

0

Y (x)2dx

)Y ′′(x) +

(∫ 1

0

Y ′(x)2dx

)Y (x)

]∆Y (x)dx.

By Theorem 3.10, we have

δJ (Y ; ∆Y ) = λ0δK(Y ; ∆Y ) + λ1δK1(Y ; ∆Y ).

VARIATIONAL METHODS 47

Letting ∆Y (0) = ∆Y (1) = 0 yields∫ 1

0

[(∫ 1

0

Y (x)2dx

)Y ′′(x) +

(∫ 1

0

Y ′(x)2dx

)Y (x)

]∆Y (x)dx = 0.

Consequently,

(4.1.58)

(∫ 1

0

Y (x)2dx

)Y ′′(x) +

(∫ 1

0

Y ′(x)2dx

)Y (x) = 0, x ∈ [0, 1].

Multiplying by Y (x) and then integrating over [0, 1] on both sides of (4.1.58), wearrive at

0 =

(∫ 1

0

Y ′(x)2dx

)(∫ 1

0

Y (x)2dx

)+

(∫ 1

0

Y (x)2dx

)(∫ 1

0

Y (x)Y ′′(x)dx

)=

(∫ 1

0

Y ′(x)2dx

)(∫ 1

0

Y (x)2dx

)+

(∫ 1

0

Y (x)2dx

)[Y (1)Y ′(1)− Y (0)Y ′(0)−

∫ 1

0

Y ′(x)2dx

]= Y ′(1)

(∫ 1

0

Y (x)2dx

).

Since Y (1) = 1, it follows that the integral of Y (x)2 over [0, 1] is nonzero and hence

(4.1.59) Y ′(1) = 1.

Solving the ODE with the initial conditions Y (0) = 0 and Y (1) = 1, we obtain

(4.1.60) Y (x) =sin(ax)

sin a, a :=

√√√√∫ 1

0Y ′(x)2dx∫ 1

0Y (x)2dx

.

From (4.1.59) and

Y ′(x) =a

sin acos(ax),

we have

1 = Y ′(1) =a

sin acos(a) =⇒ a =

π

2+ kπ, k = 0, 1, · · · ,

since a > 0. Then

J (Y ) =(π

2+ kπ

)2∫ 1

0cos(πx/2)2dx∫ 1

0sin(πx/2)2dx

=(π

2+ kπ

)2∫ π/2

0cos(t)2dt∫ π/2

0sin(t)2dt

=(π

2+ kπ

)2

≥ π2

4,

and the minimum of J is

Jmin = J(

sin(π

2x))

=π2

4.

48 YI LI

4.2. Geodesic curves. We consider a given fixed surface S in R3, and we let P0

and P1 be any given fixed points on S.

Problem: Find a curve γ which has the shortest length among allcurves which lie on the surface S and connect P0 and P1.

Any such a curve giving the minimum distance between two fixed points of S iscalled a geodesic curve.

Example 4.6. (Geodesics on a right circular cylinder) Let x, y, z be the usualCartesian coordinates in R3 and then cylindrical coordinates r, θ, u are defined by

x = r · cos θ, y = r · sin θ, z = u.

We assume that the central axis of the cylinder S coincide with the z-axis and thenthe surface S can be given parametrically as

S :

x = a · cos θ,y = a · sin θ,z = u,

θ ∈ [0, 2π], u ∈ R.

Here a is a fixed positive constant which gives the value of the radius of the cylinder.Any curve γ lying on the surface of the cylinder S can be parametrically representedas

(4.2.1) γ :

x = a · cos θ,y = a · sin θ,z = U(θ),

θ ∈ [θ0, θ1]

for some suitable function U(θ). The constants θ0 and θ1 are the θ-coordinatesof the endpoints P0 = (x0, y0, z0) and P1 = (x1, y1, z1) of γ (we may assume thatθ0 ≤ θ1), which satisfy

(4.2.2) x0 = a · cos θ0, y0 = a · sin θ0, x1 = a · cos θ1, y1 = a · sin θ1.

Without loss of generality, we furthermore assume that

(4.2.3) 0 ≤ θ1 − θ0 ≤ π.

The length L of (4.2.1) is given by

(4.2.4) L(γ) +∫γ

∣∣∣∣dγdθ∣∣∣∣ =

∫ θ1

θ0

√(dx

)2

+

(dy

)2

+

(dz

)2

dθ.

Then

(4.2.5) L(U) = L(γ) =

∫ θ1

θ0

√a2 + U ′(θ)2dθ =

∫ θ1

θ0

F (U ′(θ))dθ,

where

(4.2.6) F (w) +√a2 + w2.

The problem of finding the geodesic curve connecting P0 = (x0, y0, z0) and P1 =(x1, y1, z1) on the cylinder can be reduced to the problem of finding the functionU = U(θ) which minimizes the length functional L(U) of (4.2.5) subject to theconstraints

(4.2.7) U(θ0) = z0, U(θ1) = z1.

VARIATIONAL METHODS 49

By (4.1.24), the Euler-Lagrange equation becomes

(4.2.8) C = Fw(U ′(θ)) =U ′(θ)√

a2 + U ′(θ)2

for some constant C, and then

(4.2.9) U(θ) = Aθ +B

for some constants A and B. In this case the curve γ is called a circular helix.By (4.2.7), we have

(4.2.10) A =z1 − z0

θ1 − θ0, B =

θ1z0 − θ0z1

θ1 − θ0

and

(4.2.11) Lmin =

∫ θ1

θ0

√a2 +A2dθ =

√a2(θ1 − θ0)2 + (z1 − z0)2.

On the other hand, from (4.2.2) we have

cos(θ1 − θ0) = cos θ1 cos θ0 + sin θ1 sin θ0 =x0x1 + y0y1

a2, θ1 − θ0 ∈ [0, π].

Example 4.7. (Geodesics on a sphere) Consider spherical polar coordinates r, θ, φby

x = r · sinφ · cos θ,

y = r · sinφ · sin θ,z = r · cosφ,

and then the surface of a sphere S of radius a centered at the origin can be givenparametrically as

(4.2.12) S :

x = a · sinφ · cos θ,y = a · sinφ · sin θ,z = a · cosφ

, φ ∈ [0, π], θ ∈ [0, 2π].

Any curve γ on S can be written as

(4.2.13) γ :

x = a · sinφ · cos Θ(φ),y = a · cosφ · sin Θ(φ),z = a · cosφ,

, φ ∈ [φ0, φ1]

for some suitable function Θ(φ) on γ. The endpoints P0 = (x0, y0, z0) and P1 =(x1, y1, z1) of γ satisfy

(4.2.14) z0 =√x2

0 + y20 + z2

0 ·cosφ0, z1 =√x2

1 + y21 + z2

1 ·cosφ1, φ0, φ1 ∈ [0, π].

The length of γ is

(4.2.15) L(Θ) = L(γ) = a

∫ φ1

φ0

√1 + Θ′(φ)2 sin2 φdθ =

∫ φ1

φ0

F (φ,Θ′(φ))dθ

where

(4.2.16) F (φ,w) = a

√1 + w2 sin2 φ.

50 YI LI

The problem of finding the geodesic curve connecting P0 = (x0, y0, z0) and P1 =(x1, y1, z1) on the sphere can now be reduced to the problem of finding the functionΘ(φ) which minimizes the length functional L(Θ) of (4.2.15) subject to constraints

(4.2.17) tan Θ(φ0) =y0

x0, tan Θ(φ1) =

y1

x0.

By (4.1.24), the Euler-Lagrange equation becomes

(4.2.18) C = Fw(φ,Θ′(φ)) = aΘ′(φ) sin2 φ√

1 + Θ′(φ)2 sin2 φ

and then

(4.2.19) Θ′(φ) =A

sinφ√

sin2 φ−A2

for some constant A. Writing

(4.2.20) A + sinα

yields

(4.2.21) Θ′(φ) =sinα

sinφ√

sin2 φ− sin2 α.

Introducing

(4.2.22) tanφ =1

u,

we have

(4.2.23)dΘ

du= Θ′(φ)

du=

− tanα√1− u2 tan2 α

.

Hence

(4.2.24) Θ + β = cos−1 (u · tanα)

where β is a constant of integration.

4.3. Problems with variable end points. We consider the general problem ofminimizing or maximizing the functional

(4.3.1) J (x1, Y ) =

∫ x1

x0

F (x, Y (x), Y ′(x))dx

among all curves

(4.3.2) γ : y = Y (x), x ∈ [x0, x1]

which satisfy the initial constraint

(4.3.3) Y (x0) = y0

and the terminal constraint

(4.3.4) Φ(x1, Y (x1)) = 0

where Φ = Φ(x, y) is a given function.We take the vector space X as the set of all pairs (x1, Y ), where x1 ∈ R and

Y ∈ C 10 (R). If (x1, Y ), (x∗1, Y

∗) ∈X , then we define

(x1, Y ) + (x∗1, Y∗) + (x1 + x∗1, Y + Y ∗),

VARIATIONAL METHODS 51

which again gives a vector in X . Similarly,we define the product a(x1, Y ) by

a(x1, Y ) = (ax1, aY )

for any a ∈ R and any vector (x1, Y ) ∈ X . Thus the vector space X is well-defined. We equip X with the norm || · ||X defined by

(4.3.5) ||(x1, Y )||X + |x1|+ ||Y ||C 1(R) = |x1|+ maxx∈R|Y (x)|+ max

x∈R|Y ′(x)|

for any vector (x1, Y ) ∈X .The extremum problem for the functional (4.3.1) is to seek a vector (x1, Y ) in

some given open set D ⊂ X that will maximize or minimize in D the functional(4.3.1), where the admissible vectors (x1, Y ) are also required to satisfy (4.3.3) and(4.3.4).

If we define functionals K0 and K1 on D by

K0(x1, Y ) = Y (x0),(4.3.6)

K1(x1, Y ) = Φ(x1, Y (x1)),(4.3.7)

then the extremum problem is to find extremum vectors in D [K0 = y0,K1 = 0] forthe functional (4.3.1).

Since

δJ (x1, Y ; ∆x1,∆Y ) =d

dεJ (x1 + ε∆x1, Y + ε∆Y )

∣∣∣ε=0

,

it follows that

δJ (x1, Y ; ∆x1,∆Y )

=d

∣∣∣ε=0

∫ x1+ε∆x1

x0

F (x, Y (x) + ε∆Y (x), Y ′(x) + ε∆Y ′(x))dx

= F (x1, Y (x1), Y ′(x1))∆x1(4.3.8)

+

∫ x1

x0

[Fy(x, Y (x), Y ′(x))∆Y (x) + Fw(x, Y (x), Y ′(x))∆Y ′(x)] dx

for any vector (∆x1,∆Y ) ∈X . Similarly, we have

(4.3.9) δK0(x1, Y ; ∆x1,∆Y ) = ∆Y (x0),

and

δK1(x1, Y ; ∆x1,∆Y ) =d

dεK1(x1 + ε∆x1, Y + ε∆Y )

∣∣∣ε=0

=d

∣∣∣ε=0

Φ(x1 + ε∆x1, Y (x1 + ε∆x1) + ε∆Y (x1 + ε∆x1))

= Φx(x1, Y (x1))∆x1(4.3.10)

+ Φy(x1, Y (x1)) [Y ′(x1)∆x1 + ∆Y (x1)]

for any vector (∆x1,∆Y ) ∈X .Suppose now that the vector (x1, Y ) is a local extremum vector in D [K0 =

y0,K1 = 0] for J . Then Theorem 3.11 implies that

(4.3.11) δJ (x1, Y ; ∆x1,∆Y ) = λ0δK0(x1, Y ; ∆x1,∆Y ) + λ1δK1(x1, Y ; ∆x1,∆Y )

for suitable constants λ0 and λ1, for all numbers ∆x1, and for all functions ∆Y ∈C 1[x0, x1] ⊂ C 1

0 (R).

52 YI LI

From (4.3.8), (4.3.9), (4.3.10), and (4.3.11), we have

F (x1, Y (x1), Y ′(x1))∆x1

+

∫ x1

x0

[Fy(x, Y (x), Y ′(x))∆Y (x) + Fz(x, Y (x), Y ′(x))∆Y ′(x)] dx

= λ0∆Y (x0) + λ1Φz(x1, Y (x1))∆x1 + λ1Φy(x1, Y (x1)) [Y ′(x1)∆x1 + ∆Y (x1)]

and then, by integration by parts,∫ x1

x0

[Fy(x, Y (x), Y ′(x))− d

dxFz(x, Y (x), Y ′(x))

]∆Y (x)dx

+ Fz(x, Y (x), Y ′(x))∆Y (x)∣∣∣x1

x0

= λ0∆Y (x0) + λ1Φy(x1, Y (x1))∆Y (x1)

+ −F (x1, Y (x1), Y ′(x1)) + λ1 [Φx(x1, Y (x1)) + Y ′(x1)Φy(x1, Y (x1))]∆x1.

Thus ∫ x1

x0

[Fy(x, Y (x), Y ′(x))− d

dxFz(x, Y (x), Y ′(x))

]∆Y (x)dx

= [λ0 + Fz(x0, y0, Y′(x0))] ∆Y (x0)

+ [λ1Φy(x1, Y (x1))− Fz(x1, Y (x1), Y ′(x1))] ∆Y (x1)(4.3.12)

+ −F (x1, Y (x1), Y ′(x1))

+λ1 [Φx(x1, Y (x1)) + Y ′(x1)Φy(x1, Y (x1))]∆x1,

which holds for all vectors (∆x1,∆Y ) ∈X if (x1, Y ) is a local extremum vector inD [K0 = y0,K1 = 0] for J .

(1) If we take ∆x1 = 0 and consider functions ∆Y (x) which vanish at the endpoints x0 and x1, we find

(4.3.13)

∫ x1

x0

[Fy(x, Y (x), Y ′(x))− d

dxFz(x, Y (x), Y ′(x))

]∆Y (x) = 0

which must hold for all continuously differentiable functions ∆Y (x) on[x0, x1] satisfying ∆Y (x0) = ∆Y (x1) = 0. By Du Bois-Reymond’s lemma,we still get (4.1.12), i.e.,

(4.3.14) Fy(x, Y (x), Y ′(x))− d

dxFz(x, Y (x), Y ′(x)) = 0.

Plugging (4.3.14) into (4.3.12) we conclude that

0 = [λ0 + Fz(x0, y0, Y′(x0))] ∆Y (x0)

+ [λ1Φy(x1, Y (x1))− Fz(x1, Y (x1), Y ′(x1))] ∆Y (x1)(4.3.15)

+ −F (x1, Y (x1), Y ′(x1))

+λ1 [Φx(x1, Y (x1)) + Y ′(x1)Φy(x1, Y (x1))]∆x1

which holds for all ∆x1 ∈ R and all continuously differentiable functions∆Y (x) on [x0, x1].

(2) Taking first ∆x1 = 0 and

∆Y (x) =x1 − xx1 − x0

,

VARIATIONAL METHODS 53

we have

(4.3.16) λ0 + Fz(x0, y0, Y′(x0)) = 0.

Taking then ∆x1 = 0 and

∆Y (x) =x− x0

x1 − x0,

we have

(4.3.17) λ1Φy(x1, Y (x1))− Fz(x1, Y (x1), Y ′(x1)) = 0.

Finally, we choose ∆x1 = 1 and use (4.3.16) and (4.3.17), we conclude that

(4.3.18) F (x1, Y (x1), Y ′(x1)) = λ1 [Φx(x1, Y (x1)) + Y ′(x1)Φy(x1, Y (x1))] .

Eliminating λ1 in (4.3.18) by using (4.3.17), we arrive at

Φy(x1, Y (x1))F (x1, Y (x1), Y ′(x1))(4.3.19)

= Fz(x1, Y (x1), Y ′(x1)) [Φx(x1, Y (x1)) + Y ′(x1)Φy(x1, Y (x1))] ,

which is then a natural boundary condition which must hold at thevariable endpoint x = x1 for any local extremum vector (x1, Y ).

(3) (Geometric interpretation of (4.3.19)) Consider the function

(4.3.20) F (x, y, z) + f(x, y)√

1 + z2

for some given function f(x, y). Since

Fz(x, y, z) = f(x, y)z√

1 + z2,

it follows that (except possibly if f(x1, Y (x1)) = 0)

Φy(x1, Y (x1))Y ′(x1)√

1 + Y ′(x1)2

=Y ′(x1)√

1 + Y ′(x1)2[Φx(x1, Y (x1)) + Y ′(x1)Φy(x1, Y (x1))]

and then

(4.3.21) Φy(x1, Y (x1)) = Y ′(x1)Φx(x1, Y (x1)).

From this we claim that the extremum curve γ of (4.3.2) must intersect thegiven curve C : Φ(x, y) = 0 orthogonally. Indeed, the slope y′C of the curveC is given as

y′C = −ΦxΦy

and hence

(4.3.22) Y ′(x1)y′C(x1) + 1 = 0.

Example 4.8. We consider the minimizing the functional

(4.3.23) J (x1, Y ) =

∫ x1

5

√1 + Y ′(x)2

Y (x)dx

among all curves γ given as

γ : y = Y (x), x ∈ [5, x1]

54 YI LI

which join the point P0 = (5, 5) to the line C defined as

(4.3.24) C : y = x− 5.

Then

(4.3.25) F (x, y, z) =

√1 + z2

y, Fz(x, y, z) =

z

y√

1 + z2.

Since F is independent on the first argument x, then (4.3.14) reduces to (4.1.22)and therefore

(4.3.26) Y (x)2[1 + Y ′(x)2] = A2

for some constant A. The differential equation (4.3.26) gives us∫Y (x)Y ′(x)dx√A2 − Y (x)2

= ±∫dx

which implies that −√A2 − Y (x)2 = ±(x+B) and

(4.3.27) Y (x) =√A2 − (x−B)2

for another constant B. Here we have taken the positive square root in (4.3.27) soas to make the condition Y (5) = 5 possible at the point P0. Since Y (5) = 5, wehave

(4.3.28) A2 = 25 + (5−B)2;

on the other hand, (4.3.24) yields

(4.3.29) Y (x1) = x1 − 5;

finally, the natural boundary condition (4.3.19) or (4.3.21) implies that

(4.3.30) Y ′(x1) = −1.

Now, (4.3.27)–(4.3.30) gives us

Y ′(x) =B − x√

A2 − (x−B)2=

B − xY (x)

,

Y ′(x1) =B − x1

x1 − 5= −1;

thus B = 5 and A2 = 25. Consequently,

(4.3.31) Y (x) =√

25− (x− 5)2, x1 = 5 +5√2.

From (4.3.31) and (4.3.23), we have

Jmin =

∫ 5+ 5√2

5

√1 + (x−5)2

25−(x−5)2√25− (x− 5)2

dx = 5

∫ 5+ 5√2

5

dx

x(10− x)

=1

2

∫ 5+ 5√2

5

(1

x− 1

x− 10

)dx =

1

2ln

∣∣∣∣ x

x− 10

∣∣∣∣ ∣∣∣5+ 5√2

5

=1

2ln

1 + 1√2

1− 1√2

=1

2ln(3 + 2

√2).

VARIATIONAL METHODS 55

Example 4.9. (1) (James Bernoulli’s brachistochrone problem) In Example 4.4,if we require

Y (x0) = y0, C : x = x1.

By (4.1.33) we have

(4.3.32) γ :

x = x0 + A

2 (θ − sin θ),y = y0 − A

2 (1− cos θ),θ ∈ [0, θ1].

Since Φ(x, y) = x−x1, the natural boundary condition (4.3.19) and the expressionof Fz(x, Y (x), Y ′(x)) in Example 4.4 show that

0 = Y ′(x1)

or dydθ |θ1 = 0. By (4.3.32) it yields

(4.3.33) θ1 = π, A =2(x1 − x0)

π.

Plugging (4.3.33) into (4.1.34), the minimum time is given by

(4.3.34) Tmin =

√(x1 − x0)π

g

where we assume that x1 ≥ x0.(2) Let P0 = (0, y0) be a given fixed point on the y-axis, and let P1 = (x1, 0)

represent any variable point on the x-axis, with x1 > 0 and y0 > 0. Let γ be anycurve connecting P0 and P1 given as γ : y = Y (x), 0 ≤ x ≤ x1. The area enclosedbetween γ and the coordinate axes is given by

(4.3.35) A(x1, Y ) :=

∫ x1

0

Y (x)dx,

while the surface area generated when γ is rotated about the x-axis is given by

(4.3.36) J (x1, Y ) := 2π

∫ x1

0

Y (x)√

1 + Y ′(x)2dx.

We consider the problem

Given a positive constant A, find a curve in D [K0 = y0,K1 =0,K2 = A] for J , where D = C 1(R+) and

K0(x1, Y ) := Y (0),

K1(x1, Y ) := Y (x1),

K2(x1, Y ) := A(x1, Y ).

Let

F (y, z) := 2πy√

1 + z2, Fz(y, z) =2πyz√1 + z2

.

The variation of J (x1, Y ) is given by

δJ (x1, Y ; ∆x1,∆Y ) = F (Y (x1), Y ′(x1))∆x1 +

∫ x1

0

[Fy(Y (x), Y ′(x))∆Y (x)

+ Fz(Y (x), Y ′(x))∆Y ′(x)] dx

= F (Y (x1), Y ′(x1))∆x1 + Fz(Y (x), y′(x))∆Y (x)∣∣∣x1

0

+

∫ x1

0

[Fy(Y (x), Y ′(x))− d

dxFz(Y (x), Y ′(x))

]∆Y (x)dx.

56 YI LI

By Theorem 3.10, we have

δJ (x1, Y ; ∆x1,∆Y ) =

2∑i=0

λiδKi(x1, Y ; ∆x1,∆Y )

for some constants λi, i = 0, 1, 2. Consequently, by letting ∆x1 = ∆Y (x1) =∆Y (0) = 0,

(4.3.37) Fy(Y (x), Y ′(x))− d

dxFz(Y (x), Y ′(x)) = λ2

which implies

d

dx[F (Y (x), Y ′(x))− Y ′(x)Fz(Y (x), y′(x))− λ2Y (x)] = 0

and hence

(4.3.38) F (Y (x), Y ′(x))− y′(x)Fz(Y (x), Y ′(x)) = λ2Y (x) + C

for some constant. Therefore

Y (x)√1 + Y ′(x)2

= λ2Y (x) + C =⇒ C = 0,

and

(4.3.39) λ2 =1√

1 + Y ′(x)2=⇒ Y (x) = ax+ b

for some constants a and b. By initial conditions, we conclude that

(4.3.40)x

x1+

y

y0= 1, y = Y (x).

(3) Among all curves γ that have length ` and begin and end on the parabolay = x2, it is desired to find such a curve that bounds the greatest possible areabetween itself and the given parabola. If γ∗ is any such extremum curve, then γ∗

must be an appropriate arc of the circle of radius r centered at the point (0,−b),where r is determined by the equation

(4.3.41) 2r

(sin

`

2r

)2

= cos`

2r, 0 <

`

2r< π

and where b is then given as

(4.3.42) b =

√1 + 16r2 − 1

8.

Using the Euler-Lagrange multiplier theorem with three constraints, we can showthat any extremum curve γ given as γ : y = Y (x), x0 ≤ x ≤ x1, where x0, x1 arevariable points, must satisfy

(4.3.43)d

dx

[Y ′(x)√

1 + Y ′(x)2

]= constant, x0 < x < x1

and the natural boundary conditions (see (4.3.19) and (4.3.21))

(4.3.44) 2x0Y′(x0) = 2x1Y

′(x1) = −1

along with the specified constraints

(4.3.45) Y (x0) = x20, Y (x1) = x2

1

VARIATIONAL METHODS 57

and

(4.3.46)

∫ x1

x0

√1 + Y ′(x)2dx = `.

Equation(4.3.43) can be integrated to give

(4.3.47) (x+ a)2 + (y + b) = r2, y = Y (x),

for suitable constants a, b, r. From (4.3.44), (4.3.45), and (4.3.47), we obtain

(4.3.48) x2i + 2axi − b = 0, i = 0, 1.

Since x0 < x1, it follows from (4.3.48) that

(4.3.49) x0 + a = −√b+ a2, x2 + a =

√b+ a2,

and hence, from (4.3.45) and (4.3.47),

(4.3.50) x0 = −x1 < 0, a = 0, x1 =√b = −x0.

Hence, (4.3.47) becomes

(4.3.51) x2 + (y + b)2 = r2,

from which we get

b+ (b+ b)2 = r2 =⇒ b =

√1 + r2 − 1

8.

Finally, (4.3.41) follows from (4.3.46):

` =

∫ √b−√b

rdx√r2 − x2

= 2r

∫ √b0

dx√r2 − x2

=⇒(

sin`

2r

)2

=b

r2,

and

b = r2

(sin

`

2r

)2

=

√1 + 16r2 − 1

8=⇒ 2r

(sin

`

2r

)2

=

√1 + 16r2 − 1

4r;

so√

1 + 16r2 − 1

8r2= 1−

(cos

`

2r

)2

=⇒(

cos`

2r

)2

=

(√1 + 16r2 − 1

4r

)2

and therefore

2r

(sin

`

2r

)2

= cos

(`

2r

).

4.4. Functionals involving several unknown functions. Consider the extremumproblem of a functional J of the form

(4.4.1) J =

∫ x1

x0

F (x, Y1(x), · · · , Yn(x), Y ′1(x), · · · , Y ′n(x))dx,

which depends on n unknown functions Y1, · · · , Yn ∈ C 1[x0, x1], with constraints

(4.4.2) Yi(x0) = ai, Yi(x1) = bi, i = 1, · · · , n,

where a1, · · · , an, b1, · · · , bn are given constants, and F = F (x, y1, · · · , yn, z1, · · · , zn).Set

(4.4.3) Y(x) = (Y1(x), · · · , Yn(x)),

58 YI LI

and

Y(x0) = a = (a1, · · · , an), Y(x1) = b = (b1, · · · , bn).

Then (4.4.1) can be written as

(4.4.4) J (Y) =

∫ x1

x0

F (x,Y(x),Y′(x))dx.

We take the domain D of the functional J to be the entire vector space X whichconsists of all vector functions Y = (Y1, · · · , Yn) whose components Yi ∈ C 1[x0, x1]for all i = 1, · · · , n. We equip X with the norm || · ||X defined by

||Y||X =

n∑i=1

(max

x∈[x0,x1]|Yi(x)|+ max

x∈[x0,x1]|Y ′i (x)|

)for any vector Y = (Y1, · · · , Yn) ∈X .

If we define functionals Ki and Li on X by

(4.4.5) Ki(Y) = Yi(x0), Li(Y) = Yi(x1), i = 1, · · · , n,

then the fixed endpoint extremum problem is to find local extremum vectors inD [Ki = ai,Li = bi, i = 1, · · · , n] for the functional J of (4.4.4), where D = X .Note that

δKi(Y; ∆Y) = ∆Yi(x0),(4.4.6)

δLi(Y; ∆Y) = ∆Yi(x1)(4.4.7)

for any vector ∆Y = (∆Y1, · · · ,∆Yn) ∈X . From

J (Y + ε∆Y) =

∫ x1

x0

F (x,Y(x) + ε∆Y(x),Y′(x) + ε∆Y′(x))dx,

we have

δJ (Y; ∆Y) =

n∑i=1

∫ x1

x0

[Fyi(x,Y(x),Y′(x))∆Yi(x)

+ Fzi(x,Y(x),Y′(x))∆Y ′i (x)] dx

which can be written as

δJ (Y; ∆Y)

=

n∑i=1

∫ x1

x0

[Fyi(x,Y(x),Y′(x))− d

dxFzi(x,Y(x),Y′(x))

]∆Yi(x)dx

+

n∑i=1

[Fwi(x1,Y(x1),Y′(x1))∆Yi(x1)− Fzi(x0,Y(x0),Y′(x0))∆Yi(x0)](4.4.8)

for any vector ∆Y = (∆Y1, · · · ,∆Yn) ∈ X . By Theorem 3.10 (where the secondcase is only true), we get

(4.4.9) δJ (Y; ∆Y) =

n∑i=1

[λiδKi(Y; ∆Y) + µiδLi(Y; ∆Y)]

VARIATIONAL METHODS 59

for some constants λ1, · · · , λn, µ1, · · · , µn and all vectors ∆Y = (∆Y1, · · · ,∆Yn) ∈X . Plugging (4.4.5) and (4.4.8) into (4.4.9) yields

n∑i=1

∫ x1

x0

[Fyi(x,Y(x),Y′(x))− d

dxFzi(x,Y(x),Y′(x))

]∆Yi(x)dx

=

n∑i=1

[Fzi(x0,a,Y′(x0)) + λi] ∆Yi(x0)(4.4.10)

+

n∑i=1

[−Fzi(x1,b,Y′(x1)) + µi] ∆Yi(x1).

If j is any fixed integer (1 ≤ j ≤ n) we can choose each ∆Yi ≡ 0 for all i 6= j sothat ∫ x1

x0

[Fyj (x,Y(x),Y′(x))− d

dxFzj (x,Y(x),Y′(x))

]∆Yj(x)dx

=[Fzj (x0,a,Y

′(x0)) + λj]

∆Yj(x0) +[−Fzj (x1,b,Y

′(x1)) + µj]

∆Yj(x1)

which holds for every function ∆Yj ∈ C 1[x0, x1]. By Du Bois-Reymond’s lemmaand the argument used to derive (4.3.14), we can conclude that the n extremumfunctions Y1(x), · · · , Yn(x) must satisfy

(4.4.11) Fyi(x,Y(x),Y′(x))− d

dxFzi(x,Y(x),Y′(x)) = 0

for all x ∈ [x0, x1] and i = 1, · · · , n.

Example 4.10. Consider n = 2 and

(4.4.12) F (x, y1, y2, z1, z2) = −2y21 + 2y1y2 − z2

1 + z22 .

We seek to minimize or maximize the functional

J (Y) =

∫ x1

x0

F (x, Y1(x), Y2(x), Y ′1(x), Y ′2(x))dx, Y = (Y1, Y2),

among all vectors Y = (Y1, Y2) with Yi ∈ C 1[x0, x1], subject to the fixed endpointconditions

(4.4.13) Y1(x0) = a0, Y2(x0) = a2, Y1(x1) = b1, Y2(x1) = b2.

Since

Fy1(x, y1, y2, z1, z2) = −4y1 + 2y2,

Fy2(x, y1, y2, z1, z2) = 2y1,

Fz1(x, y1, y2, z1, z2) = −2z1,

Fz2(x, y1, y2, z1, z2) = 2z2,

it follows from (4.4.11) that

(4.4.14)

0 = −2Y1(x) + Y2(x) + Y ′′1 (x),0 = Y1(x)− Y ′′2 (x).

Consequently,

(4.4.15) Y(4)2 (x)− 2Y ′′2 (x) + Y2(x) = 0.

Introducing

(4.4.16) u(x) + Y ′′2 (x)− Y2(x),

60 YI LI

we have

u′′(x) = Y(4)2 (x)− Y ′′2 (x) = 2Y ′′2 (x)− Y2(x)− Y ′′2 (x)

= Y ′′1 (x)− Y2(x) = u(x).

Hence

(4.4.17) Y ′′2 (x)− Y2(x) = u(x) = Aex +Be−x

for some constants A and B. If the right-hand side of (4.4.17) is zero, then we haveY2(x) = Cex +De−x. Therefore, we may consider

(4.4.18) Y2(x) = C(x)ex +D(x)e−x

for suitable functions C(x) and D(x). From

Y ′′2 (x) = [C ′′(x) + 2C ′(x) + C(x)] ex + [D′′(x)− 2D′(x) +D(x)] e−x,

we arrive at

(4.4.19) C ′′(x) + 2C ′(x) = A, D′′(x)− 2D′(x) = B.

Solving (4.4.19) implies

C(x) =A

2x+ C0 +

C1

2e−2x,(4.4.20)

D(x) = −B2x+D0 +

D1

2e2x,(4.4.21)

for arbitrary constants C0, C1, D0, D1. Substituting (4.4.20) and (4.4.21) into (4.4.18),we find that

(4.4.22) Y2(x) =

(C0 +

D1

2

)ex +

(C1

2+D0

)e−x +

x

2

(Aex −Be−x

),

and

(4.4.23) Y2(x) =

(C0 +

D0

2+A

)ex +

(C1

2+D0 +B

)e−x +

x

2

(Aex −Be−x

).

Example 4.11. (Hamilton’s principle of stationary action) Netwon’s second law ofmotion states that the total force F = (F1, F2, F3) which acts on a particle locatedat a position y = (y1, y2, y3) will cause a motion of the particle along a curve γgiven as

(4.4.24) γ : y = Y(t),

in accordance with the vector equation

(4.4.25) mY′′ = F

where m is the mass of the particle and t denotes the time.For simplicity, we assume that the total force F is the negative of the gradient

of a real-valued function V = V (y) as

(4.4.26) F = −∇V,where V is called the potential associated with a particle of mass m located at y inthe presence of the force F. For example, the Newtonian gravitational force exertedon a particle of mass m at y due to a body of mass M located at the origin can begiven as

(4.4.27) F = −GMmy

||y||3= − GMmy

(y21 + y2

2 + y23)3/2

= −∇V,

VARIATIONAL METHODS 61

where

(4.4.28) V = −GMm

||y||= − GMm

(y21 + y2

2 + y23)1/2

.

Here G is the Newtonian gravitational constant.The kinetic energy of such a particle moving along a path γ given as (4.4.24) is

denoted as T and is defined by

(4.4.29) T =1

2m||Y′||2 =

1

2m[Y ′1(t)2 + Y ′2(t)2 + Y ′3(t)2

].

From (4.4.26), (4.4.26), and (4.4.29), we have

d

dt(T + V ) = m〈Y′,Y′′〉+ 〈∇V,Y′〉 = 〈Y′,mY′′ +∇V 〉 = 0;

thus the total energy T + V is conserved during the motion of a particle in thepresence of a force F = −∇V .

We consider the motion of a particle beginning at a fixed point a = (a1, a2, a3)at time ta and ending at a point b = (b1, b2, b3) at time tb. Then

(4.4.30) γ : y = Y(t), t ∈ [ta, tb]

and the vector function Y = Y(t) is required to satisfy the endpoint conditions

(4.4.31) Y(ta) = a, Y(tb) = b.

For any continuously differentiable vector function Y = Y(t), consider the actionof a motion of a particle of mass m along such a path γ, following Hamilton, as

(4.4.32) A(Y) =

∫ tb

ta

[1

2m||Y′(t)||2 − V (Y(t))

]dt =

∫ tb

ta

L(Y(t),Y′(t))dt,

where

(4.4.33) L(y, z) = T (z)− V (y) =1

2m||z||2 − V (y)

is called the Lagrangian function of the motion.Hamilton’s principle of least action (for conservative forces) amounts to the

assertion that from among all (actual, determined by (4.4.31), and hypothetical,determined by (4.4.31) and continuously differentiable) motions which begin at aat time ta and end at b at time tb the particle will actually experience that motionwhich minimizes the action; i.e., the actual motion will correspond to the functionY = Y(t) which minimizes the action functional (4.4.32) among all continuouslydifferentiable functions Y = Y(t) which satisfy (4.4.31).

Then the Euler-Lagrange equation implies

(4.4.34) Lyi(Y(t),Y′(t)) =d

dtLzi(Y(t),Y′(t)), i = 1, 2, 3

for t ∈ (ta, tb). Since

Lyi(y, z) = − ∂

∂yiV (y) = Fi(y),

Lzi(Y, z) =1

2m

∂zi||z||2 = mzi,

it follows that

(4.4.35) Fi(Y(t)) =d

dtmY ′i (t) = mY ′′i (t);

62 YI LI

thus F = mY′′.

As an application of Hamilton’s principle, we consider the vibration of beads onan elastic string.

Example 4.12. (Vibration problem)For simplicity, we shall consider a case involv-ing only two beads. The beads are attached to a light elastic string of length 4`which is stretched at a large tension τ between two fixed points. One bead of massm is located at the position 2`, and a heavier bead of mass 2m is located at theposition 3`.

We consider only transverse vibrations in a fixed vertical plane, and we let Y1 =Y1(t) and Y2 = Y2(t) be the perpendicular displacements of the two beads from theequilibrium position of the string.

The kinetic energy of motion of the first bead of mass m is given as

T1 =1

2mY ′1(t)2,

while the kinetic energy of motion of the second bead of mass 2m is

T2 = mY ′2(t)2.

If we neglect the mass of the light string, then the total kinetic energy of thevibrating system is

(4.4.36) T = m

1

2Y ′1(t)2 + Y ′2(t)2

We assume that potential energy of the

system due to the stretchingof the elastic string

is proportional to

the amount bywhich the string has

been stretched

.

The left half of the string is stretched by

√(2`)2 + Y1(t)2 − 2` = 2`

√1 +

(Y1(t)

2`

)2

− 1

≈ 2` · 1

2

(Y1(t)

2`

)2

=Y1(t)2

4`

since we considered only small vibrations. Similarly, the stretch of the portion ofthe string between the two beads and the remaining piece of string are

[Y2(t)− Y1(t)]2

2`, and

Y2(t)2

2`,

respectively. Therefore

(4.4.37) V = τ

Y1(t)2

4`+

[Y2(t)− Y1(t)]2

2`+Y2(t)2

2`

.

The action of the motion during a given time interval [t0, t1] is given by

(4.4.38) A(Y) =

∫ t1

t0

L(Y(t),Y′(t))dt,

VARIATIONAL METHODS 63

where

(4.4.39) L(y, z) = m

(1

2z2

1 + z22

)− τ

[y2

1

4`+

(y2 − y1)2

2`+y2

2

2`

].

Since

Ly1(y, z) = −τ(y1

2`− y2 − y1

`

),

Ly2(y, z) = −τ(y2

`+y2 − y1

`

),

Lz1(y, z) = mz1,

Lz2(y, z) = 2mz2,

we conclude from (4.4.34) that

Y ′′1 (t) =τ

2`m(−3Y1(t) + 2Y2(t)) ,(4.4.40)

Y ′′2 (t) =τ

2`m(Y1(t)− 2Y2(t)) .(4.4.41)

To solving the system (4.4.40)–(4.4.41), consider

(4.4.42) Uα + Y1 + αY2.

Then

U ′′α =τ

2`m(−3Y1 + 2Y2 + αY1 − 2αY2)

=τ(α− 3)

2`m

[Y1 +

2(1− α)

α− 3Y2

];

choosing α such that

α =2(1− α)

α− 3,

which gives us α = 2 or −1, we have

(4.4.43) U ′′α =τ(α− 3)

2`mUα.

Let

ω +τ

2`m.

If α = 2, then

(4.4.44) U ′′2 = −ω2U2, U2(t) = a1 · cos[ω(t− θ1)];

if α = −1, then

(4.4.45) U ′′−1 = −4ω2U−1, U−1(t) = a2 · cos[2ω(t− θ2)].

Using (4.4.42), (4.4.44), and (4.4.45), we can solve Y1(t) and Y2(t).

64 YI LI

4.5. Functionals involving higher-order derivatives. Consider the functional

(4.5.1) J (Y ) =

∫ x1

x0

F (x, Y (x), Y ′(x), Y ′′(x))dx,

where the function F = F (x, y, z, w) is a specified given function defined for allpoints (x, y, z, w) in some open set in R4, and Y ∈ C 2[x0, x1]. We use the norm

||Y ||C 2[x0,x1] =

2∑i=0

maxx∈[x0,x1]

∣∣∣Y (i)(x)∣∣∣

on the vector space C 2[x0, x1]. We also assume that the functional (4.5.1) is definedin some open subset D ⊂ C 2[x0, x1].

We now consider the problem of minimizing or maximizing the functional (4.5.1)subject to the constraints

(4.5.2) Y (x0) = y0, Y (x1) = y1, Y ′(x0) = m0

for given numbers y0, y1, and m0. If we define functionals K1,K2, and K3 by

(4.5.3) K1(Y ) = Y (x0), K2(Y ) = Y (x1), K3(Y ) = Y ′(x0).

for any function Y ∈ C 2[x0, x1], then the problem is to find extremum vectors inD [K1 = y0,K2 = y1,K3 = m0] for the functional (4.5.1). Note that

δK1(Y ; ∆Y ) = ∆Y (x0),

δK2(Y ; ∆Y ) = ∆Y (x1),(4.5.4)

δK3(Y ; ∆Y ) = ∆Y ′(x0)

for any function ∆Y ∈ C 2[x0, x1], and also

δJ (Y ; ∆Y ) =

∫ x1

x0

[Fy(x, Y (x), Y ′(x), Y ′′(x))∆Y (x)

+Fz(x, Y (x), Y ′(x), Y ′′(x))∆Y ′(x)(4.5.5)

+Fw(x, Y (x), Y ′(x), Y ′′(x))∆Y ′′(x)] dx.

To apply Theorem 3.10, we need to show that

det + det

δK1(Y ; ∆Y1) δK1(Y ; ∆Y2) δK1(Y ; ∆Y3)δK2(Y ; ∆Y1) δK2(Y ; ∆Y2) δK2(Y ; ∆Y3)δK3(Y ; ∆Y1) δK3(Y ; ∆Y2) δK3(Y ; ∆Y3)

does not vanish identically. Indeed, we take

∆Y1(x) = 1, ∆Y2(x) =x− x0

x1 − x0, ∆Y3(x) =

(x1 − xx1 − x0

)2

,

then

det =

∣∣∣∣∣∣1 0 11 1 00 1

x1−x0

−2x1−x0

∣∣∣∣∣∣ =−1

x1 − x0< 0.

Consequently, if Y = Y (x) is a local extremum vector in D [K1 = y0,K2 = y1,K3 =m0], where D = C 2[x0, x1], for J , there exist constants λ1, λ2, λ3 such that

(4.5.6) δJ (Y ; ∆Y ) = λ1δK1(Y ; ∆Y ) + λ2δK2(Y ; ∆Y ) + λ3δK3(Y ; ∆Y )

holds for all vectors ∆Y ∈ C 2[x0, x1].

VARIATIONAL METHODS 65

Plugging (4.4.4) and (4.5.5) into (4.5.6) yields

λ1∆Y (x0) + λ2∆Y (x1) + λ3∆Y ′(x0)

=

∫ x1

x0

[Fy(x, Y (x), Y ′(x), Y ′′(x))∆Y (x)

+Fz(x, Y (x), Y ′(x), Y ′′(x))∆Y ′(x) + Fw(x, Y (x), Y ′(x), Y ′′(x))∆Y ′′(x)] dx.(4.5.7)

From ∫ x1

x0

Fz(x, Y (x), Y ′(x), Y ′′(x))∆Y ′(x)dx

= Fz(x1, Y (x1), Y ′(x1), Y ′′(x1))∆Y (x1)

− Fz(x0, Y (x0), Y ′(x0), Y ′′(x0))∆Y (x0)

−∫ x1

x0

∆Y (x)d

dxFz(x, Y (x), Y ′(x), Y ′′(x))dx

and ∫ x1

x0

Fw(x, Y (x), Y ′(x), Y ′′(x))∆Y ′′(x)dx

=

∫ x1

x0

Fw(x, Y (x), Y ′(x), Y ′′(x))d∆Y ′(x)

= Fw(x1, Y (x1), Y ′(x1), Y ′′(x1))∆Y ′(x1)

− Fw(x0, Y (x0), Y ′(x0), Y ′′(x0))∆Y ′(x0)

−∫ x1

x0

d

dxFw(x, Y (x), Y ′(x), Y ′′(x))∆Y ′(x)dx

= Fw(x1, Y (x1), Y ′(x1), Y ′′(x1))∆Y ′(x1)

− Fw(x0, Y (x0), Y ′(x0), Y ′′(x0))∆Y ′(x0)

− ∆Y (x1)d

dxFw(x1, Y (x1), Y ′(x1), Y ′′(x1))

+ ∆Y (x0)d

dxFw(x0, Y (x0), Y ′(x0), Y ′′(x0))

+

∫ x1

x0

∆Y (x)d2

dx2Fw(x, Y (x), Y ′(x), Y ′′(x))dx.

Hence (4.5.6) can be written as∫ x1

x0

[Fy(x, Y (x), Y ′(x), Y ′′(x))− d

dxFz(x, Y (x), Y ′(x), Y ′′(x))

+d2

dx2Fw(x, Y (x), Y ′(x), Y ′′(x))

]∆Y (x)dx

=

[λ1 + Fz(x0, Y (x0), Y ′(x0), Y ′′(x0))− d

dxFw(x0, Y (x0), Y ′(x0), Y ′′(x0))

]∆Y (x0)

+

[λ2 − Fz(x1, Y (x1), Y ′(x1), Y ′′(x1)) +

d

dxFw(x1, Y (x1), Y ′(x1), Y ′′(x1))

]∆Y (x1)

+ [λ3 + Fw(x0, Y (x0), Y ′(x0), Y ′′(x0))] ∆Y ′(x0)

− Fw(x1, Y (x1), Y ′(x1), Y ′′(x1))∆Y ′(x1).

66 YI LI

In particular,∫ x1

x0

[Fy(x, Y (x), Y ′(x), Y ′′(x))− d

dxFz(x, Y (x), Y ′(x), Y ′′(x))

+d2

dx2Fw(x, Y (x), Y ′(x), Y ′′(x))

]∆Y (x)dx = 0,(4.5.8)

for all functions ∆Y ∈ C 2[x0, x1] which satisfy ∆Y (x0) = ∆Y ′(x0) = ∆Y (x1) =∆Y ′(x1) = 0. By Du Bois-Reymond’s lemma, we have

Fy(x, Y (x), Y ′(x), Y ′′(x))− d

dxFz(x, Y (x), Y ′(x), Y ′′(x))

+d2

dx2Fw(x, Y (x), Y ′(x), Y ′′(x)) = 0,(4.5.9)

and then[λ1 + Fz(x0, Y (x0), Y ′(x0), Y ′′(x0))− d

dxFw(x0, Y (x0), Y ′(x0), Y ′′(x0))

]∆Y (x0)

+

[λ2 − Fz(x1, Y (x1), Y ′(x1), Y ′′(x1)) +

d

dxFw(x1, Y (x1), Y ′(x1), Y ′′(x1))

]∆Y (x1)

+ [λ3 + Fw(x0, Y (x0), Y ′(x0), Y ′′(x0))] ∆Y ′(x0)

− Fw(x1, Y (x1), Y ′(x1), Y ′′(x1))∆Y ′(x1) = 0.

Choosing ∆Y to be function such that

∆Y (x0) = ∆Y (x1) = ∆Y ′(x0) = 0, ∆Y ′(x1) 6= 0,

we find

(4.5.10) Fw(x1, y1, Y′(x1), Y ′′(x1)) = 0.

Example 4.13. Consider

(4.5.11) F (x, y, z, w) =w2

(1 + z2)5/2, x0 = 0, x1 = a

and

(4.5.12) Y (0) = b, Y (a) = 0, Y ′(0) = 0.

By (4.5.9), we get

(4.5.13) 5d

dx

(Y ′(x)Y ′′(x)2

[1 + Y ′(x)2]7/2

)+ 2

d2

dx2

(Y ′′(x)

[1 + Y ′(x)2]5/2

)= 0.

The condition (4.5.10) now becomes

(4.5.14) Y ′′(a) = 0.

Integrating (4.5.13) yields

(4.5.15)d

dx

(Y ′′(x)

[1 + Y ′(x)2]5/2

)+

5

2

Y ′(x)Y ′′(x)2

[1 + Y ′(x)2]7/2= A

for some suitable constant A. The equation (4.5.15) can be written as

(4.5.16) Y ′′′(x)− 5

2

Y ′(x)Y ′′(x)2

1 + Y ′(x)2= A[1 + Y ′(x)2]5/2.

VARIATIONAL METHODS 67

4.6. Functionals involving several independent variables. Consider the func-tional

(4.6.1) J (Z) =

∫∫R

F (x, y, Z(x, y), Zx(x, y), Zy(x, y))dxdy, Z ∈ C 1(R),

for some function F = F (x, y, z, w, u), where R is some given fixed bounded openregion in the (x, y)-plane. We assume that

(4.6.2) Z(x, y) = φ(x, y), (x, y) ∈ ∂R,for some given fixed continuous function defined on the boundary of R.

We now consider the problem of minimizing or maximizing the functional (4.6.1)among all functions Z defined on R which satisfy the boundary constraint (4.6.2).We also assume that the domain D of J is some given open subset of the normedvector space C 1(R+ ∂R). We shall use the symbol

D [Z(x, y) = φ(x, y), (x, y) ∈ ∂R]

to denote the subset of D consisting of all vectors Z ∈ D which satisfy the boundaryconstraint (4.6.2) on ∂R. The extremum problem under consideration is to find localextremum vectors in D [Z(x, y) = φ(x, y), (x, y) ∈ ∂R] for J .

If Z∗ is a local minimum vector in D [Z(x, y) = φ(x, y), (x, y) ∈ ∂R] for J , then

(4.6.3) J (Z∗) ≤ J (Z∗ + U)

for all functions U which vanish on the boundary ∂R and which lie in some ballcentered at the zero vector in C 1(R + ∂R). Let C 1

0 (R + ∂R) be the subspace ofC 1(R+ ∂R) consisting of all functions of class C 1 on R+ ∂R which vanish on theboundary ∂R. Consider a new functional J0 defined by

(4.6.4) J0(U) + J (Z∗ + U)

for all vectors U in some ball D0 = Bρ(0) ⊂ C 10 (R + ∂R). Then (4.6.3) can be

written as

(4.6.5) J0(0) ≤ J0(U), U ∈ D0.

By Theorem 2.3, we have

(4.6.6) limε→0

J0(ε∆U)− J0(0)

ε= δJ0(0; ∆U) = 0

for all vectors ∆U ∈ C 10 (R+ ∂R). Thus

(4.6.7) δJ (Z∗; ∆Z) = limε→0

J (Z∗ + ε∆Z)− J (Z∗)

ε= 0

for all vectors ∆Z ∈ C 10 (R+ ∂R).

Note that

δJ (Z; ∆Z) =

∫∫R

[Fz(x, y, Z(x, y), Zx(x, y), Zy(x, y))∆Z(x, y)

+Fw(x, y, Z(x, y), Zx(x, y), Zy(x, y))∆Zx(x, y)(4.6.8)

+Fu(x, y, Z(x, y), Zx(x, y), Zy(x, y))∆Zy(x, y)] dxdy

for any vector Z ∈ D and any vector ∆Z ∈ C 1(R + ∂R). Here ∆Zx = ∂∆Z/∂xand ∆Zy = ∂∆Z/∂y. The Green theorem states that(4.6.9)∫∫

R

fx(x, y)dxdy =

∮∂R

f(x, y)Nxds,

∫∫R

fy(x, y)dxdy =

∮∂R

f(x, y)Nyds

68 YI LI

for any function f ∈ C 1(R+∂R), where N = (Nx, Ny) denotes the exterior-directedunit normal vector on ∂R. Consequently,∫∫

R

Fw∆Zxdxdy = −∫∫

R

(∂

∂xFw

)∆Zdxdy +

∮∂R

Fw∆ZNxds,(4.6.10)∫∫R

Fu∆Zydxdy = −∫∫

R

(∂

∂yFu

)∆Zdxdy +

∮∂R

Fu∆ZNyds.(4.6.11)

Plugging (4.6.10) and (4.6.11) into (4.6.8), we get

δJ (Z; ∆Z) =

∫∫R

[− ∂

∂xFw(x, y, Z(x, y), Zx(x, y), Zy(x, y))

+Fz(x, y, Z(x, y), Zx(x, y), Zy(x, y))

− ∂

∂yFu(x, y, Z(x, y), Zx(x, y), Zy(x, y))

]∆Z(x, y)dxdy(4.6.12)

+

∮∂R

[Fw(x, y, Z(x, y), Zx(x, y), Zy(x, y))Nx

+Fu(x, y, Z(x, y), Zx(x, y), Zy(x, y))Ny] ∆Z(x, y)ds.

By (4.6.7), we find that for any extremum vector Z = Z(x, y) we have

0 =

∫∫R

[− ∂

∂xFw(x, y, Z(x, y), Zx(x, y), Zy(x, y))

+Fz(x, y, Z(x, y), Zx(x, y), Zy(x, y))(4.6.13)

− ∂

∂yFu(x, y, Z(x, y), Zx(x, y), Zy(x, y))

]∆Z(x, y)dxdy

for all functions ∆Z ∈ C 10 (R + ∂R). According to Du Bois-Reymond’s lemma, we

get the following Euler-Lagrange equation

0 = Fz(x, y, Z(x, y), Zx(x, y), Zy(x, y))− ∂

∂xFw(x, y, Z(x, y), Zx(x, y), Zy(x, y))

− ∂

∂yFu(x, y, Z(x, y), Zx(x, y), Zy(x, y)).(4.6.14)

Example 4.14. (Minimal surface) If we take F to be defined by

(4.6.15) F (x, y, z, w, u) +√

1 + w2 + u2,

then

(4.6.16) J (Z) =

∫∫R

√1 + Zx(x, y)2 + Zy(x, y)2dxdy

gives the surface area of the graph of Z in R3. The graph of any extremum functionZ for the surface area functional (), subject to the boundary condition (4.6.2), iscalled a minimal surface. In this case, (4.6.14) becomes

(4.6.17)∂

∂x

Zx√1 + Z2

x + Z2y

+∂

∂y

Zy√1 + Z2

x + Z2y

= 0,

or

(4.6.18) (1 + Z2y)Zxx − 2ZxZyZxy + (1 + Z2

x)Zyy = 0.

The above equation is called the minimal surface equation.

VARIATIONAL METHODS 69

Example 4.15. (Vibrating string) In Example 4.12, we considered the small transverse vibrations of a continuous string; now we consider a vibrating elastic string of uniform cross section.

• Let m denote the total mass of the string.
• Let ℓ denote the length of the quiet string at rest in its equilibrium position.
• We take the quiet string to coincide with the interval [0, ℓ] along the x-axis, and we suppose that the vibrations occur in the (x, z)-plane.
• We assume that the string can be given parametrically as

(4.6.19)   γ : z = Z(x, t),   x ∈ [0, ℓ],

for some suitable function Z = Z(x, t), and impose the constraints

(4.6.20)   Z(0, t) = Z(ℓ, t) = 0

for all t.
• The time interval is given by [t0, t1].
• We shall assume that the potential energy V due to the stretching of the elastic string is proportional to the amount by which the string has been stretched. Hence

(4.6.21)   V = τ [∫_0^ℓ √(1 + Zx(x, t)²) dx − ∫_0^ℓ dx] = τ ∫_0^ℓ [√(1 + Zx(x, t)²) − 1] dx

for some given positive proportionality factor τ which gives a measure of the tension in the string. When Zx is small, the quantity (4.6.21) can be approximated by

(4.6.22)   V0 = (τ/2) ∫_0^ℓ Zx(x, t)² dx.

• Finally, we shall consider only the case of transversal motions in which each piece of the string vibrates up and down. In this case, the kinetic energy T is given by

(4.6.23)   T = (m/2ℓ) ∫_0^ℓ Zt(x, t)² dx.

Indeed, divide the initial interval [0, ℓ] into n subintervals [x0, x1], [x1, x2], · · · , [xn−1, xn] with 0 = x0 < x1 < x2 < · · · < xn−1 < xn = ℓ. The initial (at time t0) mass mi of the ith piece of string is

mi = ((xi − xi−1)/ℓ) m,

and, because the motion is transversal (each piece of the string vibrates only up and down), this mass remains constant during the entire motion, so that mi = ((xi − xi−1)/ℓ) m for all t ∈ [t0, t1] and i = 1, · · · , n. Therefore, with suitable intermediate points x̄i ∈ (xi−1, xi),

T = lim_{n→∞} Σ_{i=1}^n Ti = lim_{n→∞} Σ_{i=1}^n (1/2) mi Zt(x̄i, t)² = lim_{n→∞} (m/2ℓ) Σ_{i=1}^n (xi − xi−1) Zt(x̄i, t)² = (m/2ℓ) ∫_0^ℓ Zt(x, t)² dx.

From (4.6.22) and (4.6.23), the action functional A0 for this model is

(4.6.24)   A0 = ∫_{t0}^{t1} (T − V0) dt = ∫∫_R F(Zx(x, t), Zt(x, t)) dxdt,

where

(4.6.25)   F(p, q) = (1/2)((m/ℓ)q² − τp²),   R = {(x, t) ∈ R² : x ∈ (0, ℓ), t ∈ (t0, t1)}.

Hamilton's principle asserts that the actual motion of the string is described by a function Z = Z(x, t) which furnishes a minimum value to the action functional (4.6.24) among all continuously differentiable functions which coincide with Z(x, t) at t = t0 and at t = t1 and which vanish at the end points x = 0 and x = ℓ (for t ∈ [t0, t1]). By (4.6.14), we get

(4.6.26)   ∂/∂x Fp(Zx(x, t), Zt(x, t)) + ∂/∂t Fq(Zx(x, t), Zt(x, t)) = 0,

and then

(4.6.27)   ρZtt = τZxx

for ρ = m/ℓ.

In some cases, both ρ = ρ(x) and τ = τ(x) become nonnegative functions of x for x ∈ [0, ℓ]. In this case, one can show that, in place of (4.6.27), the small vibrations of the string are governed by

(4.6.28)   ρ(x)Ztt(x, t) = ∂/∂x [τ(x)Zx(x, t)].
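As a numerical illustration (a minimal sketch, not part of the notes), (4.6.28) can be integrated by an explicit finite-difference scheme; the coefficients ρ(x), τ(x), the grid sizes, and the initial shape below are arbitrary choices made for the demonstration:

    import numpy as np

    # Explicit finite differences for rho(x) Z_tt = (tau(x) Z_x)_x on [0, L]
    # with Z(0, t) = Z(L, t) = 0.  The time step satisfies the usual CFL
    # restriction dt < dx / max sqrt(tau/rho).
    L, n, dt, steps = 1.0, 200, 2e-4, 5000
    x = np.linspace(0.0, L, n + 1)
    dx = x[1] - x[0]
    rho = 1.0 + 0.5 * x                     # mass density rho(x) > 0
    tau = 1.0 + x * (L - x)                 # tension tau(x) > 0
    tau_half = 0.5 * (tau[:-1] + tau[1:])   # tau at the midpoints x_{i+1/2}

    Z = np.sin(np.pi * x / L)               # initial shape; Z_t(x, 0) = 0
    Z_prev = Z.copy()
    for _ in range(steps):
        flux = tau_half * np.diff(Z) / dx       # tau Z_x at midpoints
        ztt = np.diff(flux) / dx / rho[1:-1]    # (tau Z_x)_x / rho, interior
        Z_next = Z.copy()
        Z_next[1:-1] = 2 * Z[1:-1] - Z_prev[1:-1] + dt**2 * ztt
        Z_prev, Z = Z, Z_next                   # endpoints stay pinned at 0

    print(f"max |Z| after {steps * dt:.2f} time units: {np.abs(Z).max():.4f}")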

Remark 4.16. Let the boundary of the plane region R be divided into two disjoint parts ∂1R and ∂2R, and let φ be a given function defined on ∂1R. If Z(x, y) minimizes or maximizes the functional J of (4.6.1) among all functions satisfying the boundary condition Z(x, y) = φ(x, y) for all (x, y) ∈ ∂1R, then on ∂2R the extremum function Z must satisfy the natural boundary condition

(4.6.29)   Fw(x, y, Z(x, y), Zx(x, y), Zy(x, y))Nx + Fu(x, y, Z(x, y), Zx(x, y), Zy(x, y))Ny = 0,   (x, y) ∈ ∂2R,

where N = (Nx, Ny) denotes the exterior-directed unit normal vector on ∂R. In this case any such extremum function Z must satisfy the same Euler-Lagrange equation (4.6.14) throughout the entire region R.

Remark 4.17. Let x = (x1, x2, x3) and p = (p1, p2, p3) denote arbitrary points in R³, and let F = F(x1, x2, x3, u, p1, p2, p3) = F(x, u, p) be a given function of the seven real variables x1, x2, x3, u, p1, p2, p3. Define a functional

(4.6.30)   J(U) := ∫_R F(x, U(x), ∇U(x)) dx

for any real-valued function U = U(x) of class C¹ on a given fixed open set R ⊂ R³, where ∇U is the gradient of U given as ∇U(x) = (∂U/∂x1, ∂U/∂x2, ∂U/∂x3) = (Ux1, Ux2, Ux3), and where the integral is a volume integral taken over the region R with dx = dx1dx2dx3. Given a function ψ ∈ C(∂R), consider the set

D := {U ∈ C¹(R + ∂R) : U(x) = ψ(x) for all x ∈ ∂R}.

If U is a local extremum function in D for the functional J, then

(4.6.31)   Fu(x, U(x), ∇U(x)) − Σ_{1≤i≤3} ∂/∂xi Fpi(x, U(x), ∇U(x)) = 0

for all points x ∈ R.

5. Application II: Sturm-Liouville eigenvalues

5.1. Sturm-Liouville problems. The equation (4.6.28) is a special case of the equation

(5.1.1)   d/dx [τ(x) dW(x)/dx] + q(x)W(x) = −λρ(x)W(x),

where the function W = W(x), x ∈ (x0, x1), satisfies certain endpoint conditions at x = x0 and x = x1. The functions τ(x), q(x), and ρ(x) are given, and λ is a parameter called an eigenvalue.

The Sturm-Liouville problem is to study these eigenvalues, and we shall see below that they can be obtained as the solutions of certain extremum problems.

Example 5.1. We solve the equation (4.6.28) subject to the fixed endpoint conditions

(5.1.2)   Z(0, t) = Z(ℓ, t) = 0,   t ≥ 0,

and the initial conditions

(5.1.3)   Z(x, 0) = φ(x),   Zt(x, 0) = ψ(x),   x ∈ [0, ℓ].

If Z(x, t) = X(x)T(t), then

(5.1.4)   (1/T(t)) d²T(t)/dt² = (1/(ρ(x)X(x))) d/dx [τ(x) dX(x)/dx]

for all x ∈ (0, ℓ) and t > 0. Thus, for some constant λ, we have

(5.1.5)   d²T(t)/dt² = −λT(t),   t > 0,
(5.1.6)   d/dx [τ(x) dX(x)/dx] = −λρ(x)X(x),   x ∈ (0, ℓ).

The endpoint conditions (5.1.2) reduce to

(5.1.7)   X(0) = 0,   X(ℓ) = 0.

To avoid the trivial solution to (5.1.6), we impose the additional condition

(5.1.8)   ∫_0^ℓ ρ(x)X(x)² dx > 0.


If τ and ρ are positive constants, then (5.1.6) becomes

(5.1.9)   X″(x) = −(λ/c²)X(x),   c² = τ/ρ,

with general solution given as

(5.1.10)   X(x) = A·sin(√λ x/c) + B·cos(√λ x/c)

for arbitrary constants of integration A and B. The boundary condition X(0) = 0 in (5.1.7) implies that B = 0, and then the remaining boundary condition X(ℓ) = 0 implies that

(5.1.11)   A·sin(√λ ℓ/c) = 0.

Consequently, A = 0 or √λ ℓ/c = nπ for some integer n. Since (5.1.8) rules out the trivial solution, we must have

(5.1.12)   λn = (nπc/ℓ)² = (nπ/ℓ)²(τ/ρ),   n ∈ N.

Thus we find the nontrivial solution given as

(5.1.13)   Xn(x) = sin(nπx/ℓ),

or any (nonzero) constant multiple of this solution. By (5.1.5), we get the most general solution given by

(5.1.14)   Tn(t) = an·cos(√λn t) + bn·sin(√λn t) = an·cos(nπct/ℓ) + bn·sin(nπct/ℓ)

for arbitrary constants of integration an and bn. The resulting product solution Z = XnTn becomes

(5.1.15)   Zn(x, t) = Xn(x)Tn(t) = sin(nπx/ℓ)[an·cos(nπct/ℓ) + bn·sin(nπct/ℓ)].

Clearly, the special solution (5.1.15) will in general not satisfy the initial conditions (5.1.3). However, we can form the sums

(5.1.16)   Z(x, t) = Σ_{n=1}^∞ Zn(x, t) = Σ_{n=1}^∞ sin(nπx/ℓ)[an·cos(nπct/ℓ) + bn·sin(nπct/ℓ)],

with the constants an and bn related to the initial data (5.1.3) by

φ(x) = Σ_{n=1}^∞ an sin(nπx/ℓ),   ψ(x) = Σ_{n=1}^∞ (nπc/ℓ) bn sin(nπx/ℓ).

Expanding φ and ψ in Fourier sine series, the coefficients an and bn are determined by

(5.1.17)   an = (2/ℓ) ∫_0^ℓ φ(x) sin(nπx/ℓ) dx,   bn = (2/(nπc)) ∫_0^ℓ ψ(x) sin(nπx/ℓ) dx,

with c = √(τ/ρ).

The nth special product solution Zn(x, t) represents the nth fundamental vibration of the given string. The eigenfunction Xn(x) = sin(nπx/ℓ) gives the shape of this fundamental vibration at each t, while the function Tn(t) = an·cos(√λn t) + bn·sin(√λn t) causes each point of the string to vary periodically in time with period 2π/√λn = 2ℓ√(ρ/τ)/n.
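A minimal numerical sketch (not part of the notes) that evaluates the series solution (5.1.16) with the coefficients (5.1.17); the string data and the plucked initial shape φ (with ψ = 0) are illustrative choices:

    import numpy as np

    # Series solution (5.1.16) with coefficients (5.1.17).
    ell, tau, rho, N = 1.0, 1.0, 1.0, 50
    c = np.sqrt(tau / rho)
    x = np.linspace(0.0, ell, 401)
    dx = x[1] - x[0]

    phi = np.where(x < 0.5, x, ell - x)   # triangular pluck; phi(0) = phi(ell) = 0
    psi = np.zeros_like(x)                # released from rest

    def sine_coeff(f, n):
        # (2/ell) * integral of f(x) sin(n pi x / ell) dx (Riemann sum)
        return 2.0 / ell * np.sum(f * np.sin(n * np.pi * x / ell)) * dx

    a = np.array([sine_coeff(phi, n) for n in range(1, N + 1)])
    b = np.array([ell / (n * np.pi * c) * sine_coeff(psi, n)
                  for n in range(1, N + 1)])

    def Z(t):
        n = np.arange(1, N + 1)[:, None]
        modes = np.sin(n * np.pi * x / ell) * (
            a[:, None] * np.cos(n * np.pi * c * t / ell)
            + b[:, None] * np.sin(n * np.pi * c * t / ell))
        return modes.sum(axis=0)

    # At t = 0 the truncated series reproduces phi up to a small error.
    print(np.max(np.abs(Z(0.0) - phi)))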


5.2. Rayleigh quotient and the lowest eigenvalue. We shall now consider the Sturm-Liouville problem for the equation

(5.2.1)   d/dx [τ(x) dW(x)/dx] + q(x)W(x) = −λρ(x)W(x),   x ∈ (x0, x1).

We assume that

• the given functions τ and ρ are positive,
• both ρ and q are continuous,
• τ is continuously differentiable, and
• W(x) satisfies the boundary condition

(5.2.2)   W(x0) = W(x1) = 0.

It can be shown that this Sturm-Liouville problem always has infinitely many eigenvalues

(5.2.3)   λ1 < λ2 < · · · ,   lim_{k→∞} λk = +∞.

Furthermore, corresponding to each eigenvalue λn there is an eigenfunction Wn(x) which is a nontrivial solution to the given boundary value problem, and any two eigenfunctions which correspond to the same eigenvalue must be constant multiples of each other.

Multiplying both sides of (5.2.1) by W(x), we get

λρ(x)W(x)² = −W(x) d/dx [τ(x) dW(x)/dx] − q(x)W(x)²,

and then, integrating over [x0, x1] and integrating by parts with the boundary condition (5.2.2),

λ ∫_{x0}^{x1} ρ(x)W(x)² dx = ∫_{x0}^{x1} {τ(x)[dW(x)/dx]² − q(x)W(x)²} dx.

For convenience, we introduce two functionals

(5.2.4)   D(W) := ∫_{x0}^{x1} [τ(x)W′(x)² − q(x)W(x)²] dx,
(5.2.5)   H(W) := ∫_{x0}^{x1} ρ(x)W(x)² dx

for any vector W ∈ C¹₀[x0, x1], the vector space of all continuously differentiable functions on the interval [x0, x1] which vanish at the endpoints. The Rayleigh quotient now is defined as

(5.2.6)   R(W) := D(W)/H(W)

for any nonzero vector W ∈ C¹₀[x0, x1]. If λ is any eigenvalue with corresponding eigenfunction W, then

λ = R(W).

Proposition 5.2. (Rayleigh principle) The lowest eigenvalue λ1 is equal to the minimum value of the Rayleigh quotient R. Namely,

(5.2.7)   λ1 = min_{0≠W∈C¹₀[x0,x1]} R(W).


Proof. (Sketch) Since τ and ρ are positive and ρ, q are continuous, it follows that

D(W) ≥ −‖q‖_{C[x0,x1]} ∫_{x0}^{x1} W(x)² dx,   H(W) ≥ (min_{[x0,x1]} ρ) ∫_{x0}^{x1} W(x)² dx > 0,

and then R(W) is bounded below by −‖q‖_{C[x0,x1]}/min_{[x0,x1]} ρ, from which one can show that R achieves a minimum value on C¹₀[x0, x1].

If W1 ∈ C¹₀[x0, x1] is a minimum vector for the functional R, then

δR(W1; ∆W) = 0

for all vectors ∆W ∈ C¹₀[x0, x1]. From the formula

δR(W; ∆W) = [δD(W; ∆W) − R(W)δH(W; ∆W)]/H(W),

we obtain

δD(W1; ∆W) = R(W1)δH(W1; ∆W)

for all vectors ∆W ∈ C¹₀[x0, x1]. On the other hand,

δD(W; ∆W) = 2 ∫_{x0}^{x1} [τ(x)W′(x)∆W′(x) − q(x)W(x)∆W(x)] dx = −2 ∫_{x0}^{x1} {d/dx [τ(x) dW(x)/dx] + q(x)W(x)} ∆W(x) dx,

δH(W; ∆W) = 2 ∫_{x0}^{x1} ρ(x)W(x)∆W(x) dx

for all functions ∆W ∈ C¹₀[x0, x1]. Hence we obtain

∫_{x0}^{x1} {d/dx [τ(x) dW1(x)/dx] + q(x)W1(x) + λ*ρ(x)W1(x)} ∆W(x) dx = 0

for all functions ∆W ∈ C¹₀[x0, x1], where

λ* := R(W1) = D(W1)/H(W1).

By Du Bois-Reymond's lemma, W1 satisfies the differential equation

d/dx [τ(x) dW1(x)/dx] + q(x)W1(x) = −λ*ρ(x)W1(x)

for x ∈ (x0, x1). Thus λ* = R(W1) is an eigenvalue and W1 is a corresponding eigenfunction (up to an arbitrary multiplicative constant).

If λ is any eigenvalue with corresponding eigenfunction W, then

λ* = R(W1) ≤ R(W) = λ,

since W1 is a minimum vector in C¹₀[x0, x1] for R. Hence λ* = λ1. □

Example 5.3. Consider the Sturm-Liouville problem

W″(x) − xW(x) = −λW(x),   x ∈ (0, 1),

with W(0) = W(1) = 0. According to (5.2.4)–(5.2.6) (here τ = ρ = 1 and q(x) = −x), we have

R(W) = ∫_0^1 [W′(x)² + xW(x)²] dx / ∫_0^1 W(x)² dx = ∫_0^1 W′(x)² dx / ∫_0^1 W(x)² dx + ∫_0^1 xW(x)² dx / ∫_0^1 W(x)² dx,

and since 0 ≤ ∫_0^1 xW(x)² dx ≤ ∫_0^1 W(x)² dx, the lowest eigenvalue λ1 satisfies

min_{W∈C¹₀[0,1]} ∫_0^1 W′(x)² dx / ∫_0^1 W(x)² dx ≤ λ1 = min_{W∈C¹₀[0,1]} R(W) ≤ 1 + min_{W∈C¹₀[0,1]} ∫_0^1 W′(x)² dx / ∫_0^1 W(x)² dx.

If µ1 is the lowest eigenvalue for the problem

W″(x) = −µW(x),   x ∈ (0, 1),

with W(0) = W(1) = 0, then

µ1 ≤ λ1 ≤ 1 + µ1.

Solving this comparison problem yields √µn = nπ for n ∈ N. Consequently, √µ1 = π and then µ1 = π². Hence π² ≤ λ1 ≤ 1 + π².
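A minimal finite-difference check of this bracketing (a sketch, not part of the notes; the grid size is an arbitrary choice):

    import numpy as np

    # Discretize -W'' + x W = lambda W on (0, 1) with W(0) = W(1) = 0
    # (the sign convention under which lambda = R(W) above).
    n = 500
    h = 1.0 / n
    x = np.linspace(h, 1.0 - h, n - 1)      # interior grid points

    main = 2.0 / h**2 + x                   # -W'' stencil plus potential x
    off = -np.ones(n - 2) / h**2
    A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)

    lam1 = np.linalg.eigvalsh(A)[0]
    print(np.pi**2, lam1, 1 + np.pi**2)     # pi^2 <= lam1 <= 1 + pi^2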

5.3. Rayleigh-Ritz method and the lowest eigenvalue. In this section we shall describe another method, called the Rayleigh-Ritz method, which is also based on Rayleigh's principle and which yields upper bounds on the lowest eigenvalue. We shall consider the following Sturm-Liouville problem

d/dx [τ(x) dW(x)/dx] + q(x)W(x) = −λρ(x)W(x),   x ∈ (x0, x1),

with W(x0) = W(x1) = 0. The idea of the Rayleigh-Ritz method is as follows:

(a) Consider the problem of minimizing R = D/H over all functions W in C¹₀[x0, x1|ψ1, · · · , ψn], the subspace spanned by a given collection of fixed functions ψ1, · · · , ψn ∈ C¹₀[x0, x1]. Any such function W can be written as

W = Σ_{i=1}^n ci ψi

for constants c1, · · · , cn. This simpler problem of minimizing the Rayleigh quotient R involves only a suitable choice of the n constants c1, · · · , cn.

(b) Let c = (c1, · · · , cn) and define two real-valued functions by

d(c) = D(Σ_{i=1}^n ci ψi),   h(c) = H(Σ_{i=1}^n ci ψi)

for any n-tuple c = (c1, · · · , cn) ∈ Rⁿ. Then

R(Wc) = d(c)/h(c)

for any nonzero function Wc = Σ_{i=1}^n ci ψi, and so the problem of minimizing the Rayleigh quotient over C¹₀[x0, x1|ψ1, · · · , ψn] is equivalent to the problem of minimizing the ratio d(c)/h(c) over Rⁿ.

(c) If c* = (c*1, · · · , c*n) is a minimum vector in Rⁿ for the ratio d(c)/h(c), then

∂/∂ck [d(c)/h(c)] |_{c=c*} = 0,   k = 1, · · · , n,

from which we find that

(5.3.1)   ∂/∂ck d(c*) = r* ∂/∂ck h(c*)

for k = 1, · · · , n, where r* = d(c*)/h(c*). By the definitions, we have

(5.3.2)   d(c) = Σ_{1≤i,j≤n} aij ci cj,   h(c) = Σ_{1≤i,j≤n} bij ci cj,

where

(5.3.3)   aij = ∫_{x0}^{x1} [τ(x)ψ′i(x)ψ′j(x) − q(x)ψi(x)ψj(x)] dx,
(5.3.4)   bij = ∫_{x0}^{x1} ρ(x)ψi(x)ψj(x) dx.

(d) From (5.3.1) and (5.3.2), we have

(5.3.5)   Σ_{j=1}^n akj c*j = r* Σ_{j=1}^n bkj c*j,

or

(5.3.6)   (A − r*B)c* = 0,

where A = (aij)_{1≤i,j≤n} and B = (bij)_{1≤i,j≤n}. Hence, c* is nonzero if and only if the matrix A − r*B is singular; that is, if and only if r* satisfies the equation

(5.3.7)   det(A − rB) = 0.

Each solution r of (5.3.7) is an eigenvalue for the matrix equation (5.3.6), and the smallest eigenvalue r1 is the minimum value of the Rayleigh quotient D/H over C¹₀[x0, x1|ψ1, · · · , ψn]:

the smallest root of det(A − rB) = 0  ⟺  the minimum value of the Rayleigh quotient D/H over C¹₀[x0, x1|ψ1, · · · , ψn].

Consequently,

(5.3.8)   λ1 ≤ r1.

Example 5.4. Consider the equation

τW″(x) = −λρW(x),   x ∈ (0, ℓ),

with W(0) = W(ℓ) = 0. Here τ and ρ are given positive constants. From (5.1.12), we have

(5.3.9)   λ1 = π²τ/(ℓ²ρ) ≈ 9.87τ/(ℓ²ρ) < 10τ/(ℓ²ρ).

To use the Rayleigh-Ritz method, we define

ψ1(x) = x(ℓ − x),   ψ2(x) = x(ℓ² − x²),   x ∈ [0, ℓ].

Since

A = τ [ ∫_0^ℓ (ℓ − 2x)² dx           ∫_0^ℓ (ℓ − 2x)(ℓ² − 3x²) dx
        ∫_0^ℓ (ℓ − 2x)(ℓ² − 3x²) dx   ∫_0^ℓ (ℓ² − 3x²)² dx ]
  = τℓ³ [ 1/3   ℓ/2
          ℓ/2   4ℓ²/5 ],

B = ρ [ ∫_0^ℓ x²(ℓ − x)² dx           ∫_0^ℓ x²(ℓ − x)(ℓ² − x²) dx
        ∫_0^ℓ x²(ℓ − x)(ℓ² − x²) dx   ∫_0^ℓ x²(ℓ² − x²)² dx ]
  = (ρℓ⁵/5) [ 1/6   ℓ/4
              ℓ/4   8ℓ²/21 ],

it follows that (5.3.7) becomes

ρ²ℓ⁴r² − 52ρℓ²τr + 420τ² = 0,

and then the smallest eigenvalue r1 over C¹₀[0, ℓ|ψ1, ψ2] is

(5.3.10)   r1 = 10τ/(ℓ²ρ).
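As a check (a minimal sketch, not part of the notes), the generalized matrix eigenvalue problem (5.3.6) for these two trial functions can be solved numerically; with ℓ = τ = ρ = 1 it returns the two roots 10 and 42 of the quadratic above, so r1 = 10 ≥ π² = λ1, as (5.3.8) predicts:

    import numpy as np
    from scipy.integrate import quad
    from scipy.linalg import eigh

    # Rayleigh-Ritz for tau W'' = -lambda rho W on (0, 1), ell = tau = rho = 1,
    # with the trial functions psi1, psi2 of Example 5.4.
    psi  = [lambda x: x * (1 - x), lambda x: x * (1 - x**2)]
    dpsi = [lambda x: 1 - 2 * x,   lambda x: 1 - 3 * x**2]

    n = len(psi)
    A = np.zeros((n, n))   # a_ij = integral of psi_i' psi_j'  (q = 0 here)
    B = np.zeros((n, n))   # b_ij = integral of psi_i  psi_j   (rho = 1)
    for i in range(n):
        for j in range(n):
            A[i, j] = quad(lambda x: dpsi[i](x) * dpsi[j](x), 0, 1)[0]
            B[i, j] = quad(lambda x: psi[i](x) * psi[j](x), 0, 1)[0]

    r = eigh(A, B, eigvals_only=True)   # the roots of det(A - r B) = 0
    print(r)                            # approximately [10. 42.]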

5.4. Rayleigh quotient and higher eigenvalues. The lowest eigenvalue λ1 for the Sturm-Liouville problem

(5.4.1)   d/dx [τ(x) dW(x)/dx] + q(x)W(x) = −λρ(x)W(x),   x ∈ (x0, x1),

with W(x0) = W(x1) = 0, can be characterized as

(5.4.2)   λ1 = min_{W∈C¹₀[x0,x1]} R(W) = min_{W∈C¹₀[x0,x1]} D(W)/H(W),

where

D(W) = ∫_{x0}^{x1} [τ(x)W′(x)² − q(x)W(x)²] dx,   H(W) = ∫_{x0}^{x1} ρ(x)W(x)² dx.

Proposition 5.5. The second eigenvalue λ2 can be characterized as

(5.4.3)   λ2 = min { D(W)/H(W) : W ∈ C¹₀[x0, x1] with ∫_{x0}^{x1} ρ(x)W1(x)W(x) dx = 0 },

where W1 is the eigenfunction corresponding to the lowest eigenvalue λ1.

Proof. If W2 is a solution to the extremum problem on the right-hand side, then by Theorem 3.5 we have

δD(W2; ∆W) − R(W2)δH(W2; ∆W) = C ∫_{x0}^{x1} ρ(x)W1(x)∆W(x) dx

for all vectors ∆W ∈ C¹₀[x0, x1]. Thus

∫_{x0}^{x1} {d/dx [τ(x) dW2(x)/dx] + q(x)W2(x) + λ*ρ(x)W2(x) + µρ(x)W1(x)} ∆W(x) dx = 0

for some suitable constant µ and for all vectors ∆W ∈ C¹₀[x0, x1], where

λ* = R(W2) = D(W2)/H(W2).


By Du Bois-Reymond's lemma, W2 satisfies the differential equation

d/dx [τ(x) dW2(x)/dx] + q(x)W2(x) = −λ*ρ(x)W2(x) − µρ(x)W1(x)

for x ∈ (x0, x1). Multiplying both sides by W1 and using the constraint yields

−µ ∫_{x0}^{x1} ρ(x)W1(x)² dx
   = ∫_{x0}^{x1} {d/dx [τ(x) dW2(x)/dx] W1(x) + q(x)W2(x)W1(x)} dx
   = −∫_{x0}^{x1} τ(x) (dW2(x)/dx)(dW1(x)/dx) dx + ∫_{x0}^{x1} q(x)W2(x)W1(x) dx
   = ∫_{x0}^{x1} {d/dx [τ(x) dW1(x)/dx] + q(x)W1(x)} W2(x) dx
   = −λ1 ∫_{x0}^{x1} ρ(x)W1(x)W2(x) dx = 0,

so that µ = 0, and thus W2 satisfies

d/dx [τ(x) dW2(x)/dx] + q(x)W2(x) = −λ*ρ(x)W2(x)

for x ∈ (x0, x1). Hence λ* = R(W2) is an eigenvalue for the Sturm-Liouville problem and W2 (up to an arbitrary multiplicative constant) is a corresponding eigenfunction. Now we claim that

(5.4.4)   λ* = λ2.

Before proving this, we need the following lemma.

Lemma 5.6. Let W* and W** be any two eigenfunctions for the Sturm-Liouville problem

d/dx [τ(x) dW(x)/dx] + q(x)W(x) = −λρ(x)W(x),   x ∈ (x0, x1),

with W(x0) = W(x1) = 0, corresponding to distinct eigenvalues λ* and λ**. Then W* and W** must satisfy the orthogonality condition

∫_{x0}^{x1} ρ(x)W*(x)W**(x) dx = 0.

Proof. From

d/dx [τ(x) dW*(x)/dx] + q(x)W*(x) = −λ*ρ(x)W*(x),

we have

−λ* ∫_{x0}^{x1} ρ(x)W*(x)W**(x) dx = ∫_{x0}^{x1} d/dx [τ(x) dW*(x)/dx] W**(x) dx + ∫_{x0}^{x1} q(x)W*(x)W**(x) dx
                                   = −∫_{x0}^{x1} τ(x) (dW*(x)/dx)(dW**(x)/dx) dx + ∫_{x0}^{x1} q(x)W*(x)W**(x) dx;

similarly,

−λ** ∫_{x0}^{x1} ρ(x)W*(x)W**(x) dx = −∫_{x0}^{x1} τ(x) (dW*(x)/dx)(dW**(x)/dx) dx + ∫_{x0}^{x1} q(x)W*(x)W**(x) dx.

Therefore

(λ* − λ**) ∫_{x0}^{x1} ρ(x)W*(x)W**(x) dx = 0,

from which we conclude that

∫_{x0}^{x1} ρ(x)W*(x)W**(x) dx = 0,

since λ* ≠ λ**. □

We return to the proof of (5.4.4). If λ ≠ λ1 is any eigenvalue with corresponding eigenfunction W, then, by Lemma 5.6,

∫_{x0}^{x1} ρ(x)W1(x)W(x) dx = 0,

so W is admissible in the extremum problem (5.4.3). By the definition (5.4.3), we have

λ* = R(W2) ≤ R(W) = λ

for any eigenvalue λ ≠ λ1. Hence λ* = λ2. □

If we already have the first n − 1 eigenvalues λ1 < λ2 < · · · < λn−1 with corresponding eigenfunctions W1, W2, · · · , Wn−1, then we can show that the nth eigenvalue λn is characterized as

λn = min { R(W) : W ∈ C¹₀[x0, x1], ∫_{x0}^{x1} ρ(x)Wi(x)W(x) dx = 0, i = 1, · · · , n − 1 }.
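A minimal numerical sketch (not from the notes) of this hierarchy for the model problem W″ = −λW on (0, 1): the eigenvalues of the discretized problem approximate λn = n²π², and the discrete eigenvectors satisfy the ρ-weighted orthogonality of Lemma 5.6 (here ρ = 1):

    import numpy as np

    # Discretize W'' = -lambda W on (0, 1), W(0) = W(1) = 0.
    n = 800
    h = 1.0 / n
    A = (np.diag(2.0 * np.ones(n - 1)) - np.diag(np.ones(n - 2), 1)
         - np.diag(np.ones(n - 2), -1)) / h**2

    vals, vecs = np.linalg.eigh(A)
    print(vals[:3])                                  # ~ [pi^2, 4 pi^2, 9 pi^2]
    print([(k * np.pi)**2 for k in (1, 2, 3)])

    # Eigenvectors for distinct eigenvalues are orthogonal (Lemma 5.6).
    print(abs(vecs[:, 0] @ vecs[:, 1]) < 1e-10)      # True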

5.5. The Courant minimax principle. Given φ1, · · · , φn−1 ∈ C¹₀[x0, x1], set

(5.5.1)   C(φ1, · · · , φn−1) = min { R(W) : W ∈ C¹₀[x0, x1], ∫_{x0}^{x1} φi(x)W(x) dx = 0, 1 ≤ i ≤ n − 1 }.

If, in particular, we take

φi(x) = ρ(x)Wi(x)

for i = 1, · · · , n − 1, where W1, · · · , Wn−1 are the first n − 1 eigenfunctions, then

C(ρW1, · · · , ρWn−1) = λn.

Lemma 5.7. For any φ1, · · · , φn−1 ∈ C¹₀[x0, x1] we have

(5.5.2)   C(φ1, · · · , φn−1) ≤ λn.

Proof. By the definition (5.5.1), it suffices to find a function W* ∈ C¹₀[x0, x1], satisfying the n − 1 constraints in (5.5.1), such that

(5.5.3)   R(W*) ≤ λn.

We may consider

(5.5.4)   W* = Σ_{j=1}^n cj Wj


for suitable constants c1, · · · , cn which must be determined so that W* satisfies

0 = ∫_{x0}^{x1} φi(x)W*(x) dx = Σ_{j=1}^n γij cj,   i = 1, · · · , n − 1,

where

γij = ∫_{x0}^{x1} φi(x)Wj(x) dx.

It is well known that such a system of n − 1 linear homogeneous equations in n unknowns always has a nontrivial solution. Hence we can always find suitable constants c1, · · · , cn, not all zero, such that the resulting function W* satisfies the given n − 1 constraints.

By (5.5.4), we get

R(W*) = D(Σ_{j=1}^n cj Wj)/H(Σ_{j=1}^n cj Wj) = Σ_{1≤i,j≤n} ci cj αij / Σ_{1≤i,j≤n} ci cj βij,

where

αij = ∫_{x0}^{x1} [τ(x)W′i(x)W′j(x) − q(x)Wi(x)Wj(x)] dx = −∫_{x0}^{x1} {d/dx [τ(x) dWi(x)/dx] + q(x)Wi(x)} Wj(x) dx,

βij = ∫_{x0}^{x1} ρ(x)Wi(x)Wj(x) dx = δij βii

by Lemma 5.6. Since Wi is the eigenfunction of the Sturm-Liouville problem corresponding to the eigenvalue λi, it follows that

αij = λi βij.

Hence

R(W*) = Σ_{1≤i,j≤n} ci cj λi βij / Σ_{1≤i,j≤n} ci cj βij = Σ_{i=1}^n c²i λi βii / Σ_{i=1}^n c²i βii,

where

βii = ∫_{x0}^{x1} ρ(x)Wi(x)² dx > 0.

Since λ1 < · · · < λn, we get

R(W*) ≤ Σ_{i=1}^n c²i λn βii / Σ_{i=1}^n c²i βii = λn.

Thus we have proved (5.5.3). □

Theorem 5.8. (Courant's minimax principle) The nth eigenvalue λn is equal to the maximum value of the expression C(φ1, · · · , φn−1) over all possible functions φ1, · · · , φn−1 ∈ C¹₀[x0, x1]. That is,

(5.5.5)   λn = max_{φ1,··· ,φn−1 ∈ C¹₀[x0,x1]} C(φ1, · · · , φn−1).


The minimax principle is well suited for the purpose of comparing the eigenvalues of different Sturm-Liouville problems. Let λn and λ*n denote the respective eigenvalues for the problems

d/dx [τ(x) dW(x)/dx] + q(x)W(x) = −λρ(x)W(x),
d/dx [τ*(x) dW(x)/dx] + q*(x)W(x) = −λρ*(x)W(x),

with λ1 < · · · < λn < · · · and λ*1 < · · · < λ*n < · · · , where x ∈ (x0, x1) and W(x0) = W(x1) = 0. Recall the corresponding Rayleigh quotients

R(W) = ∫_{x0}^{x1} [τ(x)W′(x)² − q(x)W(x)²] dx / ∫_{x0}^{x1} ρ(x)W(x)² dx,

R*(W) = ∫_{x0}^{x1} [τ*(x)W′(x)² − q*(x)W(x)²] dx / ∫_{x0}^{x1} ρ*(x)W(x)² dx.

Here τ(x), q(x), ρ(x), τ*(x), q*(x), and ρ*(x) are given functions defined on [x0, x1].

Lemma 5.9. If

(5.5.6)   R(W) ≤ R*(W)

for all W ∈ C¹₀[x0, x1], then

(5.5.7)   λn ≤ λ*n

for all n ∈ N.

Proof. If (5.5.6) holds, then

C(φ1, · · · , φn−1) ≤ C*(φ1, · · · , φn−1)

for all φ1, · · · , φn−1 ∈ C¹₀[x0, x1]. By Theorem 5.8, (5.5.7) follows. □

Remark 5.10. If

(5.5.8)   τ(x) ≤ τ*(x),   q(x) ≥ q*(x),   ρ(x) ≥ ρ*(x),   x ∈ [x0, x1],

then (5.5.6) is valid. If, in particular, we take

(5.5.9)   τ* := max_{x∈[x0,x1]} τ(x) = ‖τ‖_{C[x0,x1]} = τM,
          q* := min_{x∈[x0,x1]} q(x) = qm,
          ρ* := min_{x∈[x0,x1]} ρ(x) = ρm,

then (5.5.8) holds, and we have λn ≤ λ*n. Indeed, with these constant coefficients the comparison problem becomes

W″(x) = −((qm + λ*ρm)/τM) W(x),   x ∈ (x0, x1),

with W(x0) = W(x1) = 0. The general solution has the form

W(x) = an sin[√((qm + λ*nρm)/τM) (x − x0)],

and the eigenvalues λ*n and eigenfunctions W*n(x) are given as

λ*n = −qm/ρm + (n²π²/(x1 − x0)²)(τM/ρm),
W*n(x) = sin[√((qm + λ*nρm)/τM) (x − x0)],   x ∈ [x0, x1].

According to Lemma 5.9, we get

(5.5.10)   λn ≤ −qm/ρm + (n²π²/(x1 − x0)²)(τM/ρm)

for n ∈ N. If we define

τm = min_{x∈[x0,x1]} τ(x),   qM = max_{x∈[x0,x1]} q(x),   ρM = max_{x∈[x0,x1]} ρ(x),

then

(5.5.11)   λn ≥ −qM/ρM + (n²π²/(x1 − x0)²)(τm/ρM)

for n ∈ N. In fact, one can prove

(5.5.12)   λn = n²π² / (∫_{x0}^{x1} √(ρ(x)/τ(x)) dx)² + En

for certain constants En which remain bounded as n → ∞.
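A minimal numerical check (not from the notes) of the two-sided bounds (5.5.10)-(5.5.11) and of the leading term in (5.5.12); the coefficient functions are arbitrary smooth choices with q ≤ 0:

    import numpy as np

    # Discretized problem (tau W')' + q W = -lambda rho W on (0, 1),
    # W(0) = W(1) = 0; illustrative coefficients with q <= 0.
    n = 500
    h = 1.0 / n
    x = np.linspace(0.0, 1.0, n + 1)
    tau = 1.0 + 0.3 * np.sin(np.pi * x)
    q = -0.5 * x
    rho = 1.0 + 0.2 * x

    tau_half = 0.5 * (tau[:-1] + tau[1:])
    # Tridiagonal matrix for -(tau W')' - q W at the interior nodes.
    main = (tau_half[:-1] + tau_half[1:]) / h**2 - q[1:-1]
    off = -tau_half[1:-1] / h**2
    A = np.diag(main) + np.diag(off, 1) + np.diag(off, -1)
    # A v = lambda diag(rho) v, symmetrized by S = diag(rho^(-1/2)).
    S = np.diag(rho[1:-1] ** -0.5)
    lam = np.linalg.eigvalsh(S @ A @ S)

    for k in (1, 2, 5):
        lower = -q.max() / rho.max() + k**2 * np.pi**2 * tau.min() / rho.max()
        upper = -q.min() / rho.min() + k**2 * np.pi**2 * tau.max() / rho.min()
        asym = k**2 * np.pi**2 / (np.sum(np.sqrt(rho / tau)) * h) ** 2
        print(k, lower <= lam[k - 1] <= upper, round(lam[k - 1], 2), round(asym, 2))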

5.6. Polya’s conjecture. We have showed in Example 5.3 that the equation

u′′(x) + λu(x) = 0 in Ω := (0, 1), u = 0 on ∂Ω = 0, 1

has a countable sequence of eigenvalues λj such that

λj = π2j2 = 4π2

(j

ω1|Ω|

)2/1

,

where ω1 = 2π1/2/Γ(1/2) = 2 denotes the volume of the unit disk in R1 (that is,the length of the interval (−1, 1)).

We turn to the higher-dimensional case. Let Ω be a bounded open subset in Rⁿ and consider the Laplace equation with Dirichlet boundary condition

(5.6.1)   ∆u = −λu in Ω,   u = 0 on ∂Ω.

We call λ an eigenvalue of the Laplacian with Dirichlet boundary condition. As in the Sturm-Liouville problem, the problem (5.6.1) has infinitely many eigenvalues

(5.6.2)   0 < λ1 ≤ λ2 ≤ · · · ,   lim_{j→∞} λj = ∞.

The asymptotic notation Pj ∼ Qj means

(5.6.3)   lim_{j→∞} Pj/Qj = 1.

The well-known Weyl asymptotic formula states the following.


Theorem 5.11. (Weyl's asymptotic formula, 1912) Let Ω be a bounded open subset in Rⁿ. Then

(5.6.4)   λj ∼ 4π² (j/(ωn|Ω|))^{2/n}

as j → ∞, where ωn = 2π^{n/2}/(nΓ(n/2)) denotes the volume of the unit disk in Rⁿ and |Ω| is the volume of the domain Ω.

Conjecture 5.12. (Polya, 1960) Let Ω be a bounded open subset in Rⁿ. Then

(5.6.5)   λj ≥ 4π² (j/(ωn|Ω|))^{2/n}.

The above conjecture is still open even for n = 2. The best result up to now is Li-Yau's inequality (5.6.9) below.

Using (5.6.4), one can show that

(5.6.6)   Σ_{1≤j≤k} λj = (n/(n + 2)) (4π²/ωn^{2/n}) |Ω|^{−2/n} k^{(n+2)/n} + Cn (|∂Ω|/|Ω|^{1+1/n}) k^{1+1/n} + o(k^{1+1/n})

as k → ∞, where

(5.6.7)   Cn := √π Γ(2 + n/2)^{1+1/n} / ((1 + n) Γ(3/2 + n/2) Γ(2)^{1/n}).

The second term in (5.6.6) was established under suitable conditions on Ω.

Theorem 5.13. (Li-Yau, 1983) Let Ω be a bounded open subset in Rⁿ. Then

(5.6.8)   Σ_{1≤j≤k} λj ≥ (n/(n + 2)) 4π² k^{(n+2)/n} / (ωn|Ω|)^{2/n}.

Consequently,

(5.6.9)   λj ≥ (n/(n + 2)) 4π² (j/(ωn|Ω|))^{2/n}.

In view of the asymptotic formula (5.6.6), Li-Yau's inequality is sharp.
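A minimal numerical illustration (not from the notes): on the unit square in R² the Dirichlet eigenvalues are known explicitly, λ = π²(p² + q²) with p, q ≥ 1, so (5.6.9), which for n = 2 and |Ω| = 1 reads λj ≥ 2πj, and Polya's bound λj ≥ 4πj (known to hold for tiling domains such as the square) can be checked directly:

    import numpy as np

    # Dirichlet eigenvalues of the unit square: pi^2 (p^2 + q^2), p, q >= 1.
    M = 40
    pq = np.arange(1, M + 1)
    lam = np.sort((np.pi**2 * (pq[:, None]**2 + pq[None, :]**2)).ravel())[:500]

    j = np.arange(1, len(lam) + 1)
    print(bool(np.all(lam >= 2 * np.pi * j)))   # Li-Yau (5.6.9), n = 2: True
    print(bool(np.all(lam >= 4 * np.pi * j)))   # Polya (5.6.5), n = 2: True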

Theorem 5.14. (Lieb, 1980) Let Ω be a bounded open subset in Rⁿ. Then

(5.6.10)   λj ≥ C′n (j/|Ω|)^{2/n}

for some constant C′n that differs from the constant 4π²/ωn^{2/n} by a multiplicative factor.

Theorem 5.15. (Melas, 2002) Let Ω be a bounded open subset in Rⁿ. Then

(5.6.11)   Σ_{1≤j≤k} λj ≥ (n/(n + 2)) 4π² k^{(n+2)/n} / (ωn|Ω|)^{2/n} + Mn k |Ω|/I(Ω)

for some constant Mn depending only on n, where

(5.6.12)   I(Ω) := min_{a∈Rⁿ} ∫_Ω |x − a|² dx.

In dimension 2, we have some improvements on Li-Yau's inequality.


(1) Li-Yau inequality:

(5.6.13)   Σ_{1≤j≤k} λj ≥ 2πk²/|Ω|;

(2) Melas inequality:

(5.6.14)   Σ_{1≤j≤k} λj ≥ 2πk²/|Ω| + (|Ω|/(32 I(Ω))) k;

(3) Kovarik-Vugalter-Weidl inequality (2008): If Ω ⊂ R² is a bounded open subset with C²-boundary ∂Ω, then

(5.6.15)   Σ_{1≤j≤k} λj ≥ 2πk²/|Ω| + α c3 k^{3/2−ε(k)} |Ω|^{−3/2} C(k, |Ω|, ∂Ω) + (1 − α) (|Ω|/(32 I(Ω))) k

for any α ∈ [0, 1], where

ε(k) = 2/√(log₂(2πk/c1)),   c1 = (√(3π)/14)·10^{−11},   c3 = (2^{−3}/(9√2·36)) (2π)^{5/4} c1^{1/4}.

(4) Geisinger-Laptev-Weidl inequality (2010):

(5.6.16)   λj ≥ (12/π)^{1/n} (n/(n + 3)^{1+1/n}) [Γ((3 + n)/2)/Γ(1 + n/2)]^{2/n} 4π² (j/(ωn|Ω|))^{2/n} + 1/l0²,

where

(5.6.17)   l0 := inf_{u∈S^{n−1}} sup_{x∈Ω} l(x, u),
(5.6.18)   l(x, u) := θ(x, u) + θ(x, −u),
(5.6.19)   θ(x, u) := inf{t > 0 : x + tu ∉ Ω}.

In dimension 2, they improved (5.6.16) as

λj/(1 − α) ≥ 10πα^{3/2} j/|Ω| + (15πC/8)(|∂Ω|/|Ω|) √(10πα^{3/2} j/|Ω| + (225π²C²/256)(|∂Ω|²/|Ω|²)) + (225π²C²/128)(|∂Ω|²/|Ω|²)

for any bounded convex subset Ω ⊂ R² and any α ∈ (0, 1), where C is a constant satisfying

C ≥ 11/(9π²) − 3/(20π⁴) − (2/(5π²)) ln(4π/3) > 0.0642.

Using the isoperimetric inequality |∂Ω| ≥ 2π^{1/2}|Ω|^{1/2} and choosing a suitable constant α, they proved that

λ2 > 15.03/|Ω| > 4π/|Ω|,   λ3 > 21.52/|Ω| > 6π/|Ω|,   · · · ,   λ23 > 144.58/|Ω| > 46π/|Ω|

for any bounded convex subset Ω ⊂ R².


6. Second variation in extremum problems

6.1. Higher-order variations. If J is a functional defined on an open subset D of a normed vector space X, then the variation of J at a vector x ∈ D has been defined by

δJ(x; ∆x) = d/dε J(x + ε∆x)|_{ε=0},

provided that the expression J(x + ε∆x) is differentiable with respect to ε at ε = 0 for each vector ∆x ∈ X. We refer to this variation as the first variation of J at x. The nth variation of J at a vector x ∈ D is defined by

(6.1.1)   δⁿJ(x; ∆x) = dⁿ/dεⁿ J(x + ε∆x)|_{ε=0},

provided that the expression J(x + ε∆x) is n times differentiable with respect to ε at ε = 0 for every vector ∆x ∈ X. The nth variation satisfies the homogeneity relation

(6.1.2)   δⁿJ(x; a∆x) = aⁿ δⁿJ(x; ∆x)

for any a ∈ R. In particular, the second variation of J at x is defined as

δ²J(x; ∆x) = d²/dε² J(x + ε∆x)|_{ε=0}

for any vector ∆x ∈ X, and it satisfies the relation

δ²J(x; a∆x) = a² δ²J(x; ∆x)

for any vector ∆x ∈ X and any number a ∈ R.

Example 6.1. (1) If J is an ordinary real-valued function defined in (a, b), then

d/dε J(x + ε∆x) = J′(x + ε∆x)∆x,   d²/dε² J(x + ε∆x) = J″(x + ε∆x)(∆x)²

for any suitable numbers x, ∆x, and ε, from which

δ²J(x; ∆x) = J″(x)(∆x)²

for any ∆x ∈ R.

(2) If J is a real-valued function defined in some open region of Rⁿ, then

d/dε J(x + ε∆x) = Σ_{i=1}^n ∂/∂xi J(x + ε∆x) ∆xi,
d²/dε² J(x + ε∆x) = Σ_{1≤i,j≤n} ∂²/∂xi∂xj J(x + ε∆x) ∆xi∆xj,

from which we find that

δ²J(x; ∆x) = Σ_{1≤i,j≤n} ∂²/∂xi∂xj J(x) ∆xi∆xj

for any vector ∆x = (∆x1, · · · , ∆xn) ∈ Rⁿ.

(3) If the functional J has the form (F = F(t, x, y))

J(Y) = ∫_{t0}^{t1} F(t, Y(t), Y′(t)) dt

for any vector Y ∈ C¹[t0, t1] (or in some given open subset of C¹[t0, t1] relative to some given norm), then

J(Y + ε∆Y) = ∫_{t0}^{t1} F(t, Y(t) + ε∆Y(t), Y′(t) + ε∆Y′(t)) dt,

and

d/dε J(Y + ε∆Y) = ∫_{t0}^{t1} [Fx ∆Y(t) + Fy ∆Y′(t)] dt,
d²/dε² J(Y + ε∆Y) = ∫_{t0}^{t1} {Fxx [∆Y(t)]² + 2Fxy ∆Y(t)∆Y′(t) + Fyy [∆Y′(t)]²} dt,

where the partial derivatives of F are evaluated at (t, Y(t) + ε∆Y(t), Y′(t) + ε∆Y′(t)). Consequently,

(6.1.3)   δ²J(Y; ∆Y) = ∫_{t0}^{t1} {Fxx(t, Y(t), Y′(t))[∆Y(t)]² + 2Fxy(t, Y(t), Y′(t))∆Y(t)∆Y′(t) + Fyy(t, Y(t), Y′(t))[∆Y′(t)]²} dt

for any vector ∆Y ∈ C¹[t0, t1].
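A minimal numerical check (not from the notes) that (6.1.3) agrees with the central-difference formula (6.1.7) established below, for the illustrative choice F(t, x, y) = x²y + y², Y(t) = t², ∆Y(t) = sin t on [0, 1]:

    import numpy as np

    # Compare (6.1.3) with the difference quotient (6.1.7) for
    # F(t, x, y) = x^2 y + y^2, Y(t) = t^2, dY(t) = sin t.
    t = np.linspace(0.0, 1.0, 20001)
    dt = t[1] - t[0]
    Y, Yp = t**2, 2 * t
    dY, dYp = np.sin(t), np.cos(t)

    def J(y, yp):
        F = y**2 * yp + yp**2
        return np.sum((F[:-1] + F[1:]) / 2) * dt      # trapezoid rule

    # (6.1.3) with Fxx = 2y, Fxy = 2x, Fyy = 2 evaluated along (Y, Y').
    g = 2 * Yp * dY**2 + 2 * (2 * Y) * dY * dYp + 2 * dYp**2
    d2J_formula = np.sum((g[:-1] + g[1:]) / 2) * dt

    eps = 1e-4                                        # (6.1.7)
    d2J_diff = (J(Y + eps * dY, Yp + eps * dYp)
                + J(Y - eps * dY, Yp - eps * dYp) - 2 * J(Y, Yp)) / eps**2

    print(d2J_formula, d2J_diff)                      # agree to high accuracy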

Proposition 6.2. Let J be a given functional defined on an open subset D of a normed vector space X and assume that the expression J(x + ε∆x) is n times continuously differentiable with respect to ε for all ε near ε = 0, for a fixed vector x ∈ D and for any vector ∆x ∈ X.

(i) For all small ε, we have

(6.1.4)   J(x + ε∆x) = Σ_{k=0}^n (ε^k/k!) δ^kJ(x; ∆x) + Rn(x; ∆x; ε),

where

(6.1.5)   Rn(x; ∆x; ε) = ∫_0^ε ((ε − σ)^{n−1}/(n − 1)!) [dⁿ/dσⁿ J(x + σ∆x) − δⁿJ(x; ∆x)] dσ.

(ii) For all small ε, we have

(6.1.6)   |Rn(x; ∆x; ε)| ≤ (|ε|ⁿ/n!) max_{|σ|≤|ε|} |dⁿ/dσⁿ J(x + σ∆x) − δⁿJ(x; ∆x)|.

(iii) In particular, the second variation of J can be given as

(6.1.7)   δ²J(x; ∆x) = lim_{ε→0} [J(x + ε∆x) + J(x − ε∆x) − 2J(x)]/ε².


Proof. We notice that (6.1.5) can be written as

Rn(x; ∆x; ε) = ∫_0^ε ((ε − σ)^{n−1}/(n − 1)!) d[d^{n−1}/dσ^{n−1} J(x + σ∆x)] − [∫_0^ε (σ^{n−1}/(n − 1)!) dσ] δⁿJ(x; ∆x)
             = ∫_0^ε ((ε − σ)^{n−1}/(n − 1)!) d[d^{n−1}/dσ^{n−1} J(x + σ∆x)] − (εⁿ/n!) δⁿJ(x; ∆x).

Integrating by parts, we furthermore obtain

Rn(x; ∆x; ε) = −(ε^{n−1}/(n − 1)!) d^{n−1}/dσ^{n−1} J(x + σ∆x)|_{σ=0} + ∫_0^ε ((ε − σ)^{n−2}/(n − 2)!) d^{n−1}/dσ^{n−1} J(x + σ∆x) dσ − (εⁿ/n!) δⁿJ(x; ∆x)
             = −Σ_{k=n−1}^n (ε^k/k!) δ^kJ(x; ∆x) + ∫_0^ε ((ε − σ)^{n−2}/(n − 2)!) d^{n−1}/dσ^{n−1} J(x + σ∆x) dσ.

Continuing this process, we conclude that

Rn(x; ∆x; ε) = −Σ_{k=i}^n (ε^k/k!) δ^kJ(x; ∆x) + ∫_0^ε ((ε − σ)^{i−1}/(i − 1)!) d^i/dσ^i J(x + σ∆x) dσ

for each i = 1, · · · , n − 1. In particular,

Rn(x; ∆x; ε) = −Σ_{k=1}^n (ε^k/k!) δ^kJ(x; ∆x) + ∫_0^ε d/dσ J(x + σ∆x) dσ = −Σ_{k=1}^n (ε^k/k!) δ^kJ(x; ∆x) + J(x + ε∆x) − J(x),

which proves (6.1.4); the estimate (6.1.6) then follows directly from (6.1.5).

To prove (6.1.7), taking n = 2 in (6.1.4) (and using the homogeneity relation (6.1.2) for the direction −∆x) yields

J(x + ε∆x) = J(x) + εδJ(x; ∆x) + (ε²/2)δ²J(x; ∆x) + R2(x; ∆x; ε),
J(x − ε∆x) = J(x) − εδJ(x; ∆x) + (ε²/2)δ²J(x; ∆x) + R2(x; ∆x; −ε).

Thus

[J(x + ε∆x) + J(x − ε∆x) − 2J(x)]/ε² − δ²J(x; ∆x) = [R2(x; ∆x; ε) + R2(x; ∆x; −ε)]/ε².

However, the estimate (6.1.6) gives

|[R2(x; ∆x; ε) + R2(x; ∆x; −ε)]/ε²| ≤ max_{|σ|≤|ε|} |d²/dσ² J(x + σ∆x) − δ²J(x; ∆x)|,

which tends to 0 as ε → 0, because d²J(x + σ∆x)/dσ² is continuous. □

Remark 6.3. If f is a differentiable function in (a, b), then

(6.1.8)   f′(x) = lim_{ε→0+} [f(x + ε) − f(x − ε)]/(2ε)

for any x ∈ (a, b). However, the converse is not in general true; that is, a function satisfying (6.1.8) may fail to be differentiable. For instance, consider the function f(x) = |x| defined in (−1, 1). Then

[f(0 + ε) − f(0 − ε)]/(2ε) = [ε − ε]/(2ε) = 0

for all ε > 0, so the symmetric limit exists at x = 0 although f is not differentiable there. The same caution applies to the symmetric limit (6.1.7).

6.2. Necessary conditions for a local extremum. Let J be a functional defined on an open subset D of a normed vector space X. If x* is a local extremum vector in D for J with

(6.2.1)   δJ(x*; ∆x) = 0

for every vector ∆x ∈ X, and if the expression J(x* + ε∆x) is twice continuously differentiable near ε = 0, then Proposition 6.2 implies that

(6.2.2)   J(x* + ε∆x) − J(x*) = (ε²/2)δ²J(x*; ∆x) + R2(x*; ∆x; ε),

where

(6.2.3)   lim_{ε→0} R2(x*; ∆x; ε)/ε² = 0.

If x* is a local minimum vector in D for J, then

J(x* + ε∆x) − J(x*) ≥ 0

for all small numbers ε, and then

δ²J(x*; ∆x) + (2/ε²)R2(x*; ∆x; ε) ≥ 0

for all small nonzero numbers ε. Letting ε → 0 yields

(6.2.4)   δ²J(x*; ∆x) ≥ 0

for every vector ∆x ∈ X. Similarly, if x* is a local maximum vector in D for J, then

(6.2.5)   δ²J(x*; ∆x) ≤ 0

for every vector ∆x ∈ X.

Example 6.4. Consider

K(Y) = ∫_0^1 [Y(t)]³ dt

for any function Y ∈ C⁰[0, 1] equipped with the norm ‖Y‖ = max_{t∈[0,1]} |Y(t)|. Then

δK(0; ∆Y) = δ²K(0; ∆Y) = 0,

but K(0) = 0 > −δ³ = K(−δ) for every constant function Y ≡ −δ with δ > 0 arbitrarily small, so Y* = 0 is not a local minimum vector for K. This example shows that conditions (6.2.4) and (6.2.5) are not sufficient.
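A minimal numerical check of Example 6.4 (a sketch, not part of the notes), using difference quotients for the first and second variations at Y* = 0:

    import numpy as np

    # K(Y) = integral of Y(t)^3 over [0, 1]; both variations vanish at Y* = 0,
    # yet constant perturbations Y = -delta give K < 0.
    t = np.linspace(0.0, 1.0, 10001)
    dt = t[1] - t[0]

    def K(Y):
        F = Y**3
        return np.sum((F[:-1] + F[1:]) / 2) * dt

    dY = np.sin(np.pi * t)                 # an arbitrary direction
    eps = 1e-3
    first = (K(eps * dY) - K(-eps * dY)) / (2 * eps)
    second = (K(eps * dY) + K(-eps * dY) - 2 * K(0 * dY)) / eps**2
    print(first, second)                   # both ~ 0

    delta = 1e-2
    print(K(-delta * np.ones_like(t)))     # = -delta^3 < 0: no local minimum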


6.3. Sufficient conditions for a local extremum. Let J be a functional defined on an open subset D of a normed vector space X, and consider a vector x* ∈ D for which the first variation of J vanishes:

(6.3.1)   δJ(x*; ∆x) = 0

for every vector ∆x ∈ X. We also assume that the second variation of J is nonnegative at x*:

(6.3.2)   δ²J(x*; ∆x) ≥ 0

for every vector ∆x ∈ X. By Example 6.4, x* need not be a local extremum vector. We seek suitable additional conditions which will be sufficient to guarantee that x* is in fact a local minimum vector in D for J.

(1) Suppose that

(6.3.3)   δ²J(x*; ∆x) > 0

for all nonzero vectors ∆x ∈ X. As in Proposition 6.2, we have

J(x* + ε∆x) − J(x*) = (ε²/2)δ²J(x*; ∆x) + R2(x*; ∆x; ε),

where

lim_{ε→0} R2(x*; ∆x; ε)/ε² = 0.

When X is finite-dimensional, it can be shown that, under conditions (6.3.1), (6.3.2), and (6.3.3), x* is a local minimum vector for J.

(2) If in addition to (6.3.1) J satisfies both

(6.3.4)   J(x* + ∆x) − J(x*) = (1/2)δ²J(x*; ∆x) + E2(x*; ∆x),

where

(6.3.5)   lim_{∆x→0} E2(x*; ∆x)/‖∆x‖²_X = 0,

and

(6.3.6)   δ²J(x*; ∆x) ≥ p‖∆x‖²_X

for some positive constant p and for all small vectors ∆x ∈ X, then

J(x* + ∆x) − J(x*) ≥ ‖∆x‖²_X [p/2 + E2(x*; ∆x)/‖∆x‖²_X] ≥ 0

for all sufficiently small vectors ∆x ∈ X; that is, x* is a local minimum vector in D for J.

Department of Mathematics, Johns Hopkins University, 3400 North Charles Street,Baltimore, MD 21218

E-mail address: [email protected]; [email protected]