CHAPTER FOUR CRITICAL POINT ANALYSIS AND OPTIMIZATION …sswift/Homepage/Teaching/Mathematical...
tim swift 1
CHAPTER FOUR
CRITICAL POINT ANALYSIS AND OPTIMIZATION
4.0 Motivation
We wish to generalize the techniques of critical point analysis, and of local and global
optimization, which you know in the context of the theory of real-valued functions of one
real variable. In this chapter, we will consider critical point analysis and optimization for
real-valued functions of several real variables.
4.1 Critical Points and Local Extrema
We start off with the key definitions, which are straight generalizations of the one-
dimensional ones; for neighbourhoods of points, we just replace open intervals by open
balls.
Definition 4.1.1
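Suppose that $A \subseteq \mathbb{R}^p$, and that $f : A \to \mathbb{R}$. Let $\mathbf{x}_0 \in A$.
Then, $\mathbf{x}_0$ is said to be a local maximum of $f$ if
$(\exists r > 0)(\forall \mathbf{x} \in A)\big((\mathbf{x} \in B_r(\mathbf{x}_0)) \Rightarrow (f(\mathbf{x}) \le f(\mathbf{x}_0))\big)$.
Similarly, $\mathbf{x}_0$ is said to be a local minimum of $f$ if
$(\exists r > 0)(\forall \mathbf{x} \in A)\big((\mathbf{x} \in B_r(\mathbf{x}_0)) \Rightarrow (f(\mathbf{x}) \ge f(\mathbf{x}_0))\big)$.
Finally, $\mathbf{x}_0$ is said to be a local extremum of $f$ if $\mathbf{x}_0$ is either a local maximum of $f$
or a local minimum of $f$.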
Remark 4.1.2
Denote by $\mathrm{Loc.Max.}(f)$ the set of all local maxima of $f$, $\mathrm{Loc.Min.}(f)$ the set of all
local minima of $f$, and $\mathrm{Loc.Extr.}(f)$ the set of all local extrema of $f$.
Then, by definition, we have
$\mathrm{Loc.Extr.}(f) = \mathrm{Loc.Max.}(f) \cup \mathrm{Loc.Min.}(f)$ $(\subseteq A)$.
Definition 4.1.3
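Suppose that $U \subseteq \mathbb{R}^p$ is open, and that $f : U \to \mathbb{R}$ is differentiable.
Then, $\mathbf{x}_0 \in U$ is said to be a critical point of $f$ if
$Df(\mathbf{x}_0) = \mathbf{0}$.
Define
$\mathrm{Crit}(f) = \{\mathbf{x} \in U : Df(\mathbf{x}) = \mathbf{0}\}$ $(\subseteq U)$.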
Remark 4.1.4
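Note that
$\mathrm{Crit}(f) = \left\{\mathbf{x} \in U : (\forall j \in \{1, \dots, p\})\left(\frac{\partial f}{\partial x_j}(\mathbf{x}) = 0\right)\right\}$.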
Just as in the one-dimensional case, we have the following basic theorem connecting
local extrema and critical points for a differentiable function. The result tells us that we
should look among the critical points to find local extrema (but, of course, it does not say
that every critical point is a local extremum!).
Theorem 4.1.5
Suppose that $U \subseteq \mathbb{R}^p$ is open, and that $f : U \to \mathbb{R}$ is differentiable.
Then,
$\mathrm{Loc.Extr.}(f) \subseteq \mathrm{Crit}(f)$.
Remark 4.1.6
The proof of Theorem 4.1.5 is based on the one-dimensional case: fix $\mathbf{h} \in \mathbb{R}^p$ and
consider the function $g ; t \mapsto f(\mathbf{x}_0 + t\mathbf{h})$, defined on an interval containing $0 \in \mathbb{R}$.
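The reduction to one dimension can be spelled out roughly as follows. This is only a sketch of the standard argument, assuming the chain rule from Chapter 3; here $\mathbf{x}_0$ is a local extremum of $f$, and $g$ and $\mathbf{h}$ are as in the remark.

```latex
% Sketch: x_0 a local extremum of f, h in R^p fixed, g(t) = f(x_0 + t h).
% Since U is open, x_0 + t h lies in U for all sufficiently small |t|.
\begin{align*}
  &\text{$t = 0$ is a local extremum of } g
     && \text{(because $\mathbf{x}_0$ is a local extremum of $f$)} \\
  &\Rightarrow\; g'(0) = 0
     && \text{(one-dimensional result)} \\
  &\Rightarrow\; Df(\mathbf{x}_0)\,\mathbf{h} = 0
     && \text{(chain rule: $g'(0) = Df(\mathbf{x}_0)\,\mathbf{h}$)} .
\end{align*}
% As h was arbitrary, Df(x_0) = 0, i.e. x_0 \in Crit(f).
```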
To complement Theorem 4.1.5, we make the following definition.
Definition 4.1.7
Suppose that $U \subseteq \mathbb{R}^p$ is open, and that $f : U \to \mathbb{R}$ is differentiable.
Then, $\mathbf{x}_0 \in U$ is said to be a saddle point of $f$ if $\mathbf{x}_0$ is a critical point of $f$, but $\mathbf{x}_0$ is
not a local extremum of $f$.
Remarks 4.1.8
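Denote by $\mathrm{Sad.Pt.}(f)$ the set of all saddle points of $f$.
Then we have the following:
$\mathrm{Sad.Pt.}(f) = \mathrm{Crit}(f) - \mathrm{Loc.Extr.}(f)$,
and
$\mathrm{Crit}(f) = \mathrm{Loc.Extr.}(f) \cup \mathrm{Sad.Pt.}(f)$
$= (\mathrm{Loc.Max.}(f) \cup \mathrm{Loc.Min.}(f)) \cup \mathrm{Sad.Pt.}(f)$.
Note also the following.
Suppose that $\mathbf{x}_0 \in \mathrm{Crit}(f)$.
Then $\mathbf{x}_0 \in \mathrm{Sad.Pt.}(f)$ if and only if
$(\forall r > 0)\big((\exists \mathbf{x}_+ \in U \cap B_r(\mathbf{x}_0))(f(\mathbf{x}_+) > f(\mathbf{x}_0)) \wedge (\exists \mathbf{x}_- \in U \cap B_r(\mathbf{x}_0))(f(\mathbf{x}_-) < f(\mathbf{x}_0))\big)$,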
i.e., no matter how small an open neighbourhood of the critical point 0x we take, we
can always find two points in this neighbourhood, one of which has a function value
larger than that at 0x , the other having a function value less than that at 0x .
Examples 4.1.9
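(i) Define $f : \mathbb{R}^2 \to \mathbb{R} ; (x, y) \mapsto -x^2 - y^2$.
Then, $(0,0) \in \mathrm{Loc.Max.}(f)$.
(ii) Define $f : \mathbb{R}^2 \to \mathbb{R} ; (x, y) \mapsto x^2 - y^2$.
Then, $(0,0) \in \mathrm{Sad.Pt.}(f)$.
(iii) Define $f : \mathbb{R}^2 \to \mathbb{R} ; (x, y) \mapsto x^2 + y^2$.
Then, $(0,0) \in \mathrm{Loc.Min.}(f)$.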
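As a rough numerical illustration of these three model cases, the following Python sketch compares the values of a function on one small circle around a point with the value at the point itself. The helper name `probe`, the radius and the number of sample points are assumptions made for illustration; sampling a single circle is a heuristic and proves nothing about the whole neighbourhood.

```python
import math

def probe(f, x0, y0, r=1e-3, n=360):
    """Heuristic: compare f on a circle of radius r about (x0, y0) with f(x0, y0)."""
    base = f(x0, y0)
    diffs = [
        f(x0 + r * math.cos(2 * math.pi * k / n),
          y0 + r * math.sin(2 * math.pi * k / n)) - base
        for k in range(n)
    ]
    if all(d < 0 for d in diffs):
        return "looks like a local maximum"
    if all(d > 0 for d in diffs):
        return "looks like a local minimum"
    if any(d > 0 for d in diffs) and any(d < 0 for d in diffs):
        return "looks like a saddle point"
    return "inconclusive"

# The three model functions of Examples 4.1.9, probed at the origin.
print(probe(lambda x, y: -x * x - y * y, 0.0, 0.0))
print(probe(lambda x, y: x * x - y * y, 0.0, 0.0))
print(probe(lambda x, y: x * x + y * y, 0.0, 0.0))
```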
Remark 4.1.10
The examples just given are important, because, as we will see in the next section,
they provide local models for non-degenerate critical points of a function $f : U \to \mathbb{R}$,
where $U \overset{\text{open}}{\subseteq} \mathbb{R}^2$.
4.2 Critical Point Analysis
We will now discuss a means of analyzing critical points, namely their location and
their classification.
Remark 4.2.1
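To locate the critical points of a differentiable function $f : U \to \mathbb{R}$, where $U \overset{\text{open}}{\subseteq} \mathbb{R}^p$,
we need to find the subset $\mathrm{Crit}(f)$ of $U$. In other words, we need to find all
$\mathbf{x} = (x_1, \dots, x_p) \in U$ which satisfy the $p$ simultaneous equations
$\left(\frac{\partial f}{\partial x_1}(\mathbf{x}) = 0\right) \wedge \dots \wedge \left(\frac{\partial f}{\partial x_p}(\mathbf{x}) = 0\right)$.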
Having found all the critical points, we then need to classify each one as a local
maximum, a local minimum or a saddle point. For us, the local extrema will be the most
important, because every global extremum is, of course, a local extremum. Note
however, there are situations where saddle points are also important, e.g., in some areas
of Economics.
Remark 4.2.2
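Recall from Theorem 3.2.8 that, if $\mathbf{x}_0 \in U$ and $\mathbf{h}$ is near $\mathbf{0}$, then
$f(\mathbf{x}_0 + \mathbf{h}) = f(\mathbf{x}_0) + Df(\mathbf{x}_0)\,\mathbf{h} + \frac{1}{2}\,\mathbf{h}^T ((\mathrm{hess}\, f)(\mathbf{x}_0))\,\mathbf{h} + \text{h.o.t.}$
Hence, if $\mathbf{x}_0 \in \mathrm{Crit}(f)$, we may write
$f(\mathbf{x}_0 + \mathbf{h}) - f(\mathbf{x}_0) = \frac{1}{2}\,\mathbf{h}^T ((\mathrm{hess}\, f)(\mathbf{x}_0))\,\mathbf{h} + \text{h.o.t.}$,
so that we might expect that the local behaviour of $f$ near the critical point is
controlled by the value of the hessian matrix at $\mathbf{x}_0$, at least if this matrix is
nonsingular.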
This remark underlies the second derivative test for classifying (nondegenerate)
critical points.
Before we can state the second derivative test, we need the following definitions from
linear algebra. Recall that we use $M^{\mathrm{symm}}_{p \times p}$ to denote the set of all symmetric $p \times p$
matrices with real entries.
Definition 4.2.3
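Let $S \in M^{\mathrm{symm}}_{p \times p}$, and consider the associated quadratic form
$q_S : \mathbb{R}^p \to \mathbb{R} ; \mathbf{h} \mapsto \mathbf{h}^T S \mathbf{h}$.
We say that $S$ is negative definite if
$(\forall \mathbf{h} \in \mathbb{R}^p - \{\mathbf{0}\})(q_S(\mathbf{h}) < 0)$.
We say that $S$ is positive definite if
$(\forall \mathbf{h} \in \mathbb{R}^p - \{\mathbf{0}\})(q_S(\mathbf{h}) > 0)$.
We say that $S$ is indefinite if
$((\exists \mathbf{h}_- \in \mathbb{R}^p)(q_S(\mathbf{h}_-) < 0)) \wedge ((\exists \mathbf{h}_+ \in \mathbb{R}^p)(q_S(\mathbf{h}_+) > 0))$.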
Theorem 4.2.4 (The second derivative test for classifying critical points)
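Suppose that $U \subseteq \mathbb{R}^p$ is open, and that $f : U \to \mathbb{R}$ has continuous second partial
derivatives.
Let $\mathbf{x}_0 \in \mathrm{Crit}(f)$. Then, we have:
( $(\mathrm{hess}\, f)(\mathbf{x}_0)$ negative definite ) $\Rightarrow$ ( $\mathbf{x}_0 \in \mathrm{Loc.Max.}(f)$ ) ;
( $(\mathrm{hess}\, f)(\mathbf{x}_0)$ positive definite ) $\Rightarrow$ ( $\mathbf{x}_0 \in \mathrm{Loc.Min.}(f)$ ) ;
( $(\mathrm{hess}\, f)(\mathbf{x}_0)$ indefinite ) $\Rightarrow$ ( $\mathbf{x}_0 \in \mathrm{Sad.Pt.}(f)$ ) .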
Remark 4.2.5
In order to be able to apply Theorem 4.2.4 in a useful way, we need criteria for
determining whether or not a symmetric $p \times p$ matrix is positive definite, negative
definite or indefinite. We do not consider this problem in general, but deal only with
the cases $p = 1$ and $p = 2$.
Examples 4.2.6
(i) Suppose that $p = 1$.
Let $x_0 \in \mathrm{Crit}(f)$, where $f : (a, b) \to \mathbb{R}$.
Thus, $x = x_0$ is a solution of the equation
$f'(x) = 0$.
Note that $(\mathrm{hess}\, f)(x_0) = f''(x_0)$ (after identifying $M^{\mathrm{symm}}_{1 \times 1} = M_{1 \times 1}$ with $\mathbb{R}$), so
that the associated quadratic form is the function
$q_{f''(x_0)} : \mathbb{R} \to \mathbb{R} ; h \mapsto f''(x_0)\,h^2$.
Hence, Theorem 4.2.4 just reduces to the familiar Theorem 1.5.13:
$(f''(x_0) < 0) \Rightarrow (x_0 \in \mathrm{Loc.Max.}(f))$ ;
$(f''(x_0) > 0) \Rightarrow (x_0 \in \mathrm{Loc.Min.}(f))$ .
Note that the hypothesis in the third possibility of Theorem 4.2.4 cannot arise if
$p = 1$. This is because a $1 \times 1$ symmetric matrix cannot be indefinite. Indeed,
the image of $q_{f''(x_0)} : \mathbb{R} \to \mathbb{R} ; h \mapsto f''(x_0)\,h^2$ cannot contain both a positive real
number and a negative one.
Thus, in the $p = 1$ case, if $x_0 \in \mathrm{Crit}(f)$ and $x_0$ is a nondegenerate critical point,
i.e., if $f''(x_0) \ne 0$, then $x_0$ is automatically a local extremum of $f$.
If, on the other hand, $x_0$ is a degenerate critical point, i.e., if $f''(x_0) = 0$, then
$(\mathrm{hess}\, f)(x_0) = f''(x_0)$ does not provide sufficient information to classify the
critical point. Indeed, as you know, if $f''(x_0) = 0$, then the critical point $x_0$
could be a local maximum, or a local minimum, or neither.
(ii) Suppose now that $p = 2$.
Note that this is the most important case for this module.
Suppose that $U \overset{\text{open}}{\subseteq} \mathbb{R}^2$, and that $f : U \to \mathbb{R} ; (x, y) \mapsto f(x, y)$ has continuous
second partial derivatives.
Let $(x_0, y_0) \in \mathrm{Crit}(f)$. Thus, $(x, y) = (x_0, y_0)$ is a solution of the simultaneous
equations
$(f_x(x, y) = 0) \wedge (f_y(x, y) = 0)$.
Before we consider the hessian matrix, we need the following linear algebra
lemma concerning symmetric $2 \times 2$ matrices.
Lemma
Let $S = \begin{pmatrix} a & b \\ b & c \end{pmatrix} \in M^{\mathrm{symm}}_{2 \times 2}$, so that $\det(S) = ac - b^2$. Then, we have
( $S$ is negative definite ) $\Leftrightarrow$ ( $(\det(S) > 0) \wedge (a < 0)$ ) ;
( $S$ is positive definite ) $\Leftrightarrow$ ( $(\det(S) > 0) \wedge (a > 0)$ ) ;
( $S$ is indefinite ) $\Leftrightarrow$ ( $\det(S) < 0$ ) .
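One way to see why the sign of $\det(S)$ and the sign of $a$ decide the matter is to complete the square in the quadratic form. The following is a sketch of that standard computation, valid when $a \ne 0$; it is offered as a plausibility argument for the lemma, not as the proof intended in the notes.

```latex
% Completing the square in q_S(h) for h = (h_1, h_2), assuming a \neq 0.
\begin{align*}
  q_S(\mathbf{h})
    &= a h_1^2 + 2 b h_1 h_2 + c h_2^2 \\
    &= a \left( h_1 + \frac{b}{a} h_2 \right)^2 + \frac{ac - b^2}{a}\, h_2^2
     = a \left( h_1 + \frac{b}{a} h_2 \right)^2 + \frac{\det(S)}{a}\, h_2^2 .
\end{align*}
% If det(S) > 0 then ac > b^2 >= 0, so a \neq 0 and both coefficients
% have the sign of a: q_S is definite, with the sign of a.
% If det(S) < 0 and a \neq 0, the two coefficients have opposite signs,
% so q_S takes both positive and negative values: S is indefinite.
```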
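We apply this lemma to the symmetric $2 \times 2$ matrix
$(\mathrm{hess}\, f)(x_0, y_0) = \begin{pmatrix} f_{xx}(x_0, y_0) & f_{xy}(x_0, y_0) \\ f_{yx}(x_0, y_0) & f_{yy}(x_0, y_0) \end{pmatrix}$,
thereby arriving at the following version of Theorem 4.2.4 in the case $p = 2$.
Put $\Delta(x_0, y_0) = \det((\mathrm{hess}\, f)(x_0, y_0))$. Then,
( $(\Delta(x_0, y_0) > 0) \wedge (f_{xx}(x_0, y_0) < 0)$ ) $\Rightarrow$ ( $(x_0, y_0) \in \mathrm{Loc.Max.}(f)$ ) ;
( $(\Delta(x_0, y_0) > 0) \wedge (f_{xx}(x_0, y_0) > 0)$ ) $\Rightarrow$ ( $(x_0, y_0) \in \mathrm{Loc.Min.}(f)$ ) ;
( $\Delta(x_0, y_0) < 0$ ) $\Rightarrow$ ( $(x_0, y_0) \in \mathrm{Sad.Pt.}(f)$ ) .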
We repackage the discussion of Example 4.2.6(ii) as an algorithm as follows.
Remark 4.2.7 (Algorithm for critical point analysis for functions of two variables)
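To locate and classify the critical points of $f : U \to \mathbb{R} ; (x, y) \mapsto f(x, y)$, perform the
following three steps.
Step Zero
For a general $(x, y) \in U$, compute
$Df(x, y) = \begin{pmatrix} f_x(x, y) & f_y(x, y) \end{pmatrix}$,
and
$(\mathrm{hess}\, f)(x, y) = \begin{pmatrix} f_{xx}(x, y) & f_{xy}(x, y) \\ f_{yx}(x, y) & f_{yy}(x, y) \end{pmatrix}$.
Step One
Find $\mathrm{Crit}(f) = \{(x, y) \in U : Df(x, y) = \begin{pmatrix} 0 & 0 \end{pmatrix}\}$
$= \{(x, y) \in U : (f_x(x, y) = 0) \wedge (f_y(x, y) = 0)\}$.
Step Two
For each $(x_0, y_0) \in \mathrm{Crit}(f)$, compute the symmetric matrix $(\mathrm{hess}\, f)(x_0, y_0)$,
and also the real number $\Delta(x_0, y_0) = \det((\mathrm{hess}\, f)(x_0, y_0))$.
Then apply the second derivative test as described above to classify each critical
point:
( $(\Delta(x_0, y_0) > 0) \wedge (f_{xx}(x_0, y_0) < 0)$ ) $\Rightarrow$ ( $(x_0, y_0) \in \mathrm{Loc.Max.}(f)$ ) ;
( $(\Delta(x_0, y_0) > 0) \wedge (f_{xx}(x_0, y_0) > 0)$ ) $\Rightarrow$ ( $(x_0, y_0) \in \mathrm{Loc.Min.}(f)$ ) ;
( $\Delta(x_0, y_0) < 0$ ) $\Rightarrow$ ( $(x_0, y_0) \in \mathrm{Sad.Pt.}(f)$ ) .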
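Step Two of the algorithm can be sketched in a few lines of Python. The function name `classify` and its return strings are illustrative assumptions; the caller is expected to supply the second partial derivatives already evaluated at a critical point, and the case of a zero determinant is reported as inconclusive rather than classified.

```python
def classify(fxx, fxy, fyy):
    """Second derivative test for p = 2 at a critical point.

    fxx, fxy, fyy are the entries of the hessian evaluated at the point.
    """
    delta = fxx * fyy - fxy ** 2  # determinant of the hessian
    if delta > 0:
        # delta > 0 forces fxx != 0, so exactly one of the two cases applies
        return "local maximum" if fxx < 0 else "local minimum"
    if delta < 0:
        return "saddle point"
    return "degenerate: test inconclusive"

# Hessians at (0, 0) for the three model functions of Examples 4.1.9.
print(classify(-2.0, 0.0, -2.0))  # -x^2 - y^2
print(classify(2.0, 0.0, -2.0))   # x^2 - y^2
print(classify(2.0, 0.0, 2.0))    # x^2 + y^2
```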
Note that the algorithm described in Remark 4.2.7 does not deal with the possibility
that $\det((\mathrm{hess}\, f)(x_0, y_0)) = 0$. This is the generalization of the one-dimensional
condition $f''(x_0) = 0$. We make the following general definition (cf. Remarks 1.5.14).
Definition 4.2.8
Suppose that $U \subseteq \mathbb{R}^p$ is open, and that $f : U \to \mathbb{R}$ has continuous second partial
derivatives.
The critical point $\mathbf{x}_0$ of $f$ is said to be nondegenerate if the symmetric $p \times p$ matrix
$(\mathrm{hess}\, f)(\mathbf{x}_0)$ is nonsingular, i.e., if
$\det((\mathrm{hess}\, f)(\mathbf{x}_0)) \ne 0$.
The critical point $\mathbf{x}_0$ of $f$ is said to be degenerate if the symmetric $p \times p$ matrix
$(\mathrm{hess}\, f)(\mathbf{x}_0)$ is singular, i.e., if
$\det((\mathrm{hess}\, f)(\mathbf{x}_0)) = 0$.
Remarks 4.2.9
Thus our second derivative test in the $p = 2$ case, i.e., Step Two of the algorithm in
Remark 4.2.7, deals only with the case of nondegenerate critical points.
If a critical point happens to be degenerate, then, just as in the $p = 1$ case, we need
additional information - i.e., information beyond the hessian - to classify the critical
point. Ways of doing this include a consideration of higher partial derivatives or of
the geometry of the graph.
In this module, we will deal, in the main, only with nondegenerate critical points, so
that our algorithm can be followed through right to completion.
4.3 Global Extrema
We make a few brief remarks on the nature of global extrema. The discussion
generalizes that of Section 1.5, where we reviewed the one-dimensional case.
Definition 4.3.1
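Suppose that $A \subseteq \mathbb{R}^p$, and that $f : A \to \mathbb{R}$. Let $\mathbf{x}_0 \in A$.
We say that $\mathbf{x}_0$ is a global maximum of $f$ if
$(\forall \mathbf{x} \in A)(f(\mathbf{x}) \le f(\mathbf{x}_0))$.
Similarly, we say that $\mathbf{x}_0$ is a global minimum of $f$ if
$(\forall \mathbf{x} \in A)(f(\mathbf{x}) \ge f(\mathbf{x}_0))$.
We say that $\mathbf{x}_0$ is a global extremum of $f$ if $\mathbf{x}_0$ is either a global maximum of $f$ or a
global minimum of $f$.
Denote by $\mathrm{Glob.Max.}(f)$ the set of all global maxima of $f$, and $\mathrm{Glob.Min.}(f)$ the
set of all global minima of $f$.
If $\mathbf{x}_0 \in \mathrm{Glob.Max.}(f)$, then $f_{\max} \overset{\text{def.}}{=} f(\mathbf{x}_0)$ is called the maximum value of $f$.
If $\mathbf{x}_0 \in \mathrm{Glob.Min.}(f)$, then $f_{\min} \overset{\text{def.}}{=} f(\mathbf{x}_0)$ is called the minimum value of $f$.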
Remark 4.3.2
By definition, we have (cf. Definition 4.1.1)
$\mathrm{Glob.Max.}(f) \subseteq \mathrm{Loc.Max.}(f)$,
and
$\mathrm{Glob.Min.}(f) \subseteq \mathrm{Loc.Min.}(f)$.
Remark 4.3.3
Note that global extrema of functions on open sets, even bounded open sets, do not
always exist.
Example 4.3.4
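Consider the differentiable function
$f : B_1(\mathbf{0}) \to \mathbb{R} ; \mathbf{x} \mapsto \frac{1}{1 - \|\mathbf{x}\|^2}$.
By inspection, $\mathrm{Glob.Min.}(f) = \{\mathbf{0}\}$, and $f_{\min} = f(\mathbf{0}) = 1$.
However, $\mathrm{Glob.Max.}(f) = \emptyset$, so that $f_{\max}$ does not exist.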
If the domain is a closed and bounded set, then we do have the following useful
existence result for continuous functions. (See Definition 2.3.10 for the definition of a
closed subset, and Definition 2.3.18 for the definition of a bounded subset.)
Theorem 4.3.5 (An existence result for global extrema)
Suppose that $V \subseteq \mathbb{R}^p$ is closed and bounded, and that $f : V \to \mathbb{R}$ is continuous.
Then,
$\mathrm{Glob.Max.}(f) \ne \emptyset$,
and
$\mathrm{Glob.Min.}(f) \ne \emptyset$.
This result provides the following useful way of finding global extrema (cf. Theorem
1.5.18).
Theorem 4.3.6
Suppose that $V \subseteq \mathbb{R}^p$ is closed and bounded, and write $V$ as a disjoint union
$V = U \cup \partial V$ (see Remark 2.3.15). Here, $U = V^0$ is the interior of $V$, and $\partial V$ is the
boundary of $V$.
Suppose further that $f : V \to \mathbb{R}$ is continuous, and that $f|_U : U \to \mathbb{R}$ is differentiable.
Let $\mathbf{x}_0 \in \mathrm{Glob.Max.}(f)$. Then one of the following holds:
(i) $\mathbf{x}_0 \in \partial V$ ;
(ii) $\mathbf{x}_0 \in \mathrm{Loc.Max.}(f|_U)$.
A corresponding statement holds if $\mathbf{x}_0$ is a global minimum of $f$.
Remark 4.3.7
Theorem 4.3.6 provides us with a means of locating, say, $f_{\max}$, for a function
$f : V \to \mathbb{R}$ on a closed and bounded set.
Firstly, we locate the local maxima of the restriction of the function to the interior.
Secondly, we find the local maxima of the restriction of the function to the boundary.
To do this, we can sometimes use the method of constrained optimization (see
Sections 4.4 and 4.5).
Thirdly, we compute the value of the function at all of the points found in the first two
steps. The largest of these values will be the maximum value $f_{\max}$ of the function.
Similarly, we can find the minimum value $f_{\min}$ of the function.
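The three steps can be illustrated numerically on a hypothetical example that does not appear in the notes: $f(x, y) = x^2 + y^2 - x$ on the closed unit disc. In this sketch the interior critical point $(\frac{1}{2}, 0)$ is found by hand from $Df(x, y) = (2x - 1 \;\; 2y)$, while the boundary is simply sampled, which is a crude stand-in for the constrained methods of Sections 4.4 and 4.5.

```python
import math

def f(x, y):
    # Illustrative objective function (an assumption, not taken from the notes).
    return x * x + y * y - x

# Step 1: interior candidates. Df = (2x - 1, 2y) vanishes only at (1/2, 0).
interior = [(0.5, 0.0)]

# Step 2: boundary candidates, here obtained by sampling the unit circle.
n = 3600
boundary = [
    (math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
    for k in range(n)
]

# Step 3: compare the values of f at all candidate points.
candidates = interior + boundary
f_max = max(f(x, y) for x, y in candidates)
f_min = min(f(x, y) for x, y in candidates)
print(f_max, f_min)
```

On the boundary, $f = 1 - \cos\theta$, so the sampled maximum sits near $(-1, 0)$, while the minimum is attained at the interior critical point.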
4.4 Constrained Optimization I - Solving the Constraint Equation
In Section 4.2, we considered the problem of locating and classifying the critical
points of a function $f : U \to \mathbb{R}$, where $U \subseteq \mathbb{R}^p$ is open. This is a crucial step in solving
the problem of optimizing the function $f$, because, as we have seen (cf. Theorem 4.1.5),
the local extrema of $f$ (i.e., the local maxima and the local minima of $f$) are contained
in the set $\mathrm{Crit}(f)$ of critical points of $f$. Moreover, finding the local extrema of $f$ will
lead us to the global extrema (if these exist) (cf. Remarks 4.3.2 and 4.3.3). This process
may be described as unconstrained optimization inasmuch as the domain of the objective
function is the whole of the set $U$, and is not constrained in any way.
Now, in many situations in Applied Mathematics, we are concerned not with
unconstrained optimization, but with constrained optimization: the variables in the
optimization problem are not free to take on any value, but are constrained in some way.
We now set up a description of constrained optimization.
As above, consider a function $f : U \to \mathbb{R}$, where $U$ - the domain of optimization - is
an open subset of $\mathbb{R}^p$. In the theory of optimization, the function $f$ is often called the
objective function. Suppose that the variables $\mathbf{x} = (x_1, \dots, x_p)$ are not free to take just any
value in $U$, but rather they are constrained to lie in some subset $S$ of $U$, which we call
the constraint set. We assume that there exists a constraint function $\varphi : U \to \mathbb{R}$ and a
constraint value $c \in \mathbb{R}$ such that $S$ is the $c$-level set of $\varphi$, i.e., $S = \varphi^{-1}(c) \subseteq U$ (see
Definition 2.5.3). Thus, the constraint equation may be written $\varphi(\mathbf{x}) = c$.
We may summarize the above discussion as follows.
Remark 4.4.1 (The language of constrained optimization)
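$U \overset{\text{open}}{\subseteq} \mathbb{R}^p$    the domain of optimization ;
$f : U \to \mathbb{R}$    the objective function ;
$\varphi : U \to \mathbb{R}$    the constraint function ;
$c \in \mathbb{R}$    the constraint value ;
$S = \varphi^{-1}(c)$    the constraint set ;
$\varphi(\mathbf{x}) = c$    the constraint equation .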
Example 4.4.2
Take $U = \mathbb{R}^p$, $\varphi : U \to \mathbb{R} ; \mathbf{x} \mapsto \|\mathbf{x}\|^2$, and $c = 1$.
The constraint set is then the unit $(p - 1)$-dimensional sphere, $S_1(\mathbf{0})$, sitting inside $U = \mathbb{R}^p$.
Our constrained optimization problem in this case would be: find $f_{\max}$ (or $f_{\min}$) for
an objective function $f : U \to \mathbb{R}$ subject to the condition that the variable $\mathbf{x}$ is
constrained to lie on $S_1(\mathbf{0})$.
Having set up a framework of constrained optimization, the problem now is to
optimize f subject to the imposed constraint. In other words, we have to optimize the
function $f|_S : S \to \mathbb{R}$, the restriction of the objective function $f$ to the constraint set $S$.
Thus, we have:
Remark 4.4.3 (The problem of constrained optimization)
Optimize the restricted function $f|_S : S \to \mathbb{R}$,
i.e., find $(f|_S)_{\max}$
(or $(f|_S)_{\min}$, depending on the type of problem under consideration).
Ideally, we would obtain an ‘explicit’ description of the function $f|_S : S \to \mathbb{R}$, and we
would apply the techniques of unconstrained optimization described in Sections 4.2 and
4.3. In particular, by analogy with the unconstrained case, we might expect that any
solutions to the constrained optimization problem would be among the critical points of
$f|_{S^0}$ or possibly on the boundary of $S$.
One way of proceeding may be described as follows.
Remark 4.4.4 (Solving the constraint equation)
We try to solve the constraint equation, i.e., we use the implicit relation
$\varphi(x_1, \dots, x_p) = c$,
to express $x_p$ (say) as an explicit function of the remaining variables $x_1, \dots, x_{p-1}$,
$x_p = X(x_1, \dots, x_{p-1})$.
Then we could substitute for $x_p$ in $f(x_1, \dots, x_p)$, thereby obtaining a function of
$(p - 1)$ variables, viz.
$\hat{f} : \hat{U} \to \mathbb{R} ; (x_1, \dots, x_{p-1}) \mapsto \hat{f}(x_1, \dots, x_{p-1}) \overset{\text{def.}}{=} f(x_1, \dots, x_{p-1}, X(x_1, \dots, x_{p-1}))$,
(where the domain $\hat{U}$ must be specified).
Finally we could apply our unconstrained optimization techniques (see Sections 4.2
and 4.3) to the unconstrained function $\hat{f} : \hat{U} \to \mathbb{R}$.
Note that this method of solving the constraint equation transforms a constrained
optimization problem in $p$ variables into an unconstrained problem in $(p - 1)$
variables.
If it works, then this method is sometimes the most direct way to proceed.
Example 4.4.5
Suppose that we wish to construct an open cuboidal box which contains a given
volume $V\ \mathrm{m}^3$. Letting the base dimensions be $x_1$ m and $x_2$ m, and the height $x_3$ m,
we see that the volume constraint may be written $x_1 x_2 x_3 = V$. Suppose that we wish
to minimize the amount of material used in the construction of the box, i.e., we wish to
minimize the total surface area $(x_1 x_2 + 2(x_1 + x_2) x_3)\ \mathrm{m}^2$. Using the language
developed above, we have:
domain of optimization    $U = \{(x_1, x_2, x_3) \in \mathbb{R}^3 : x_1, x_2, x_3 > 0\}$ ;
objective function    $f : U \to \mathbb{R} ; (x_1, x_2, x_3) \mapsto x_1 x_2 + 2(x_1 + x_2) x_3$ ;
constraint function    $\varphi : U \to \mathbb{R} ; (x_1, x_2, x_3) \mapsto x_1 x_2 x_3$ ;
constraint value    $c = V$ ;
constraint set    $S = \{(x_1, x_2, x_3) \in U : x_1 x_2 x_3 = V\}$ .
Solving the constraint equation
$x_1 x_2 x_3 = V$
gives us
$x_3 = X(x_1, x_2)$,
where
$X(x_1, x_2) = \frac{V}{x_1 x_2}$.
Hence, defining the open set
$\hat{U} = \{(x_1, x_2) \in \mathbb{R}^2 : x_1, x_2 > 0\}$,
we must minimize the function
$\hat{f} : \hat{U} \to \mathbb{R} ; (x_1, x_2) \mapsto f(x_1, x_2, X(x_1, x_2)) = x_1 x_2 + 2(x_1 + x_2)\frac{V}{x_1 x_2} = x_1 x_2 + 2V\left(\frac{1}{x_1} + \frac{1}{x_2}\right)$.
Now, having solved the constraint, it remains for us to perform an unconstrained
optimization analysis on the function $\hat{f} : \hat{U} \to \mathbb{R}$ of two variables. Thus, we compute
$\mathrm{Crit}(\hat{f})$, etc., as in Section 4.2.
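The remaining analysis can be sketched as follows. Setting $\frac{\partial \hat{f}}{\partial x_1} = x_2 - \frac{2V}{x_1^2}$ and $\frac{\partial \hat{f}}{\partial x_2} = x_1 - \frac{2V}{x_2^2}$ equal to zero gives $x_1 = x_2 = (2V)^{1/3}$. The Python below is a minimal check of that computation together with the second derivative test; the function names are illustrative assumptions, and the volume $V = 4$ used at the end is an arbitrary test value.

```python
def box_critical_point(V):
    """Critical point of f_hat, with the height recovered from x3 = V / (x1 * x2)."""
    x = (2.0 * V) ** (1.0 / 3.0)
    return x, x, V / (x * x)

def f_hat(x1, x2, V):
    """Surface area of the open box after eliminating x3."""
    return x1 * x2 + 2.0 * V * (1.0 / x1 + 1.0 / x2)

def hessian_f_hat(x1, x2, V):
    """Second partial derivatives (f_xx, f_xy, f_yy) of f_hat."""
    return 4.0 * V / x1 ** 3, 1.0, 4.0 * V / x2 ** 3

V = 4.0  # arbitrary test volume
x1, x2, x3 = box_critical_point(V)
fxx, fxy, fyy = hessian_f_hat(x1, x2, V)
delta = fxx * fyy - fxy ** 2
# delta > 0 and fxx > 0 indicate a local minimum of f_hat at (x1, x2).
print((x1, x2, x3), f_hat(x1, x2, V), delta, fxx)
```

Note that this only identifies a local minimum of $\hat{f}$; showing that it is a global minimum needs a further argument.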
Remark 4.4.6
Note that, in the above example, and, of course, in general optimization problems, we
must prove that we actually find a global extremum of $f|_S$. This can be done
explicitly, or by using additional information.
For example, suppose that, for some reason, we know that $f|_S$ has a unique global
extremum. Then, if there exists only one critical point, we must indeed have located
this global extremum.
4.5 Constrained Optimization II - The Method of Lagrange Multipliers
Unfortunately, in general, it is not possible, or not convenient, to solve the constraint
equation $\varphi(x_1, \dots, x_p) = c$ to give one of the variables as an explicit function of the
remaining $(p - 1)$ variables.
Hence, another procedure is required. We now describe such a procedure, namely the
very powerful Method of Lagrange Multipliers. In order to state the result on which the
Method of Lagrange Multipliers is based, it is useful first to introduce the notion of the
gradient vector field.
Definition 4.5.1
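Suppose that $U \overset{\text{open}}{\subseteq} \mathbb{R}^p$, and that $F : U \to \mathbb{R}$ is differentiable.
The gradient of the function $F$ is the vector field $\nabla F : U \to \mathbb{R}^p$ defined by
$\nabla F = \left(\frac{\partial F}{\partial x_1}, \dots, \frac{\partial F}{\partial x_p}\right)$.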
Remarks 4.5.2
On identifying $\mathbb{R}^p$ with the space of $(1 \times p)$-matrices $M_{1 \times p}$, the gradient of
$F : U \to \mathbb{R}$ is nothing more than the derivative of $F$ (cf. Definition 3.1.7).
In particular, observe that, for $\mathbf{x} \in U$, we have $\nabla F(\mathbf{x}) = \mathbf{0}$ if and only if $\mathbf{x} \in \mathrm{Crit}(F)$.
The following result partly explains the geometric significance of the gradient vector
field.
Proposition 4.5.3
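Suppose that $U \overset{\text{open}}{\subseteq} \mathbb{R}^p$, and that $F : U \to \mathbb{R}$ is continuously differentiable.
Let $c \in \mathbb{R}$. Suppose that $\mathbf{x}_0 \in F^{-1}(c)$, and assume that $\nabla F(\mathbf{x}_0) \ne \mathbf{0}$.
Then, in a neighbourhood of $\mathbf{x}_0$, the $c$-level set $F^{-1}(c)$ is a hypersurface (of
dimension equal to $(p - 1)$).
Moreover, $\nabla F(\mathbf{x}_0)$ is a normal vector to $F^{-1}(c)$ at $\mathbf{x}_0$
(i.e., $\nabla F(\mathbf{x}_0)$ is orthogonal to every tangent vector to $F^{-1}(c)$ at $\mathbf{x}_0$).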
The following Theorem, which gives a necessary condition for a constrained local
extremum, is fundamental to the Method of Lagrange Multipliers:
Theorem 4.5.4
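Suppose that $U \overset{\text{open}}{\subseteq} \mathbb{R}^p$, and that $f : U \to \mathbb{R}$ and $\varphi : U \to \mathbb{R}$ are continuously
differentiable functions.
Let $c \in \mathbb{R}$, and denote by $S$ the $c$-level set of $\varphi$, i.e., $S = \varphi^{-1}(c)$. Consider
$f|_S : S \to \mathbb{R}$, the restriction of $f$ to the subset $S$ of $U$.
Let $\mathbf{x}_0 \in S$, and assume that $\mathbf{x}_0 \notin \mathrm{Crit}(\varphi)$, i.e., assume that $\nabla\varphi(\mathbf{x}_0) \ne \mathbf{0}$.
Then, if $\mathbf{x}_0 \in \mathrm{Loc.Extr.}(f|_S)$, there exists $\lambda_0 \in \mathbb{R}$ such that
$\nabla f(\mathbf{x}_0) = \lambda_0 \nabla\varphi(\mathbf{x}_0)$.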
Remark 4.5.5
The conclusion of Theorem 4.5.4 may be interpreted geometrically as follows:
if $f : U \to \mathbb{R}$, when restricted to the hypersurface $S \subseteq U$, has a local extremum at
$\mathbf{x}_0 \in S$, then $\nabla f(\mathbf{x}_0)$ is orthogonal to $S$ at $\mathbf{x}_0$.
Remark 4.5.6
Theorem 4.5.4 provides us with two mutually exclusive possibilities for a local
extremum of the restriction of the objective function to the constraint set.
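If $\mathbf{x}_0 \in \mathrm{Loc.Extr.}(f|_S)$, then precisely one of the following two conditions must hold:
(Con 1) $(\varphi(\mathbf{x}_0) = c) \wedge (\nabla\varphi(\mathbf{x}_0) = \mathbf{0})$ ;
(Con 2) $(\varphi(\mathbf{x}_0) = c) \wedge (\nabla\varphi(\mathbf{x}_0) \ne \mathbf{0}) \wedge ((\exists \lambda_0 \in \mathbb{R})(\nabla f(\mathbf{x}_0) = \lambda_0 \nabla\varphi(\mathbf{x}_0)))$ .
In (Con 2), such a real number $\lambda_0$ is called a Lagrange multiplier. In certain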
applications of the method - e.g., to problems in Economics - the Lagrange multiplier
may be interpreted in a very useful way.
Thus, in order to locate local extremal points of the constrained problem, we need to
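search among points $\mathbf{x}_0 \in U$ satisfying either (Con 1) or (Con 2).
Define subsets $L_1$ and $L_2$ of the domain of optimization $U$ as follows:
$L_1 = \{\mathbf{x} \in U : \mathbf{x} \text{ satisfies (Con 1)}\}$ ;
$L_2 = \{\mathbf{x} \in U : \mathbf{x} \text{ satisfies (Con 2)}\}$ .
We consider $L_1$ and $L_2$ in turn.
We have
$L_1 = \{\mathbf{x} \in U : (\varphi(\mathbf{x}) = c) \wedge (\nabla\varphi(\mathbf{x}) = \mathbf{0})\} = S \cap \mathrm{Crit}(\varphi)$.
Thus, in order to obtain $L_1$, we just find all solutions $\mathbf{x} \in U$ of the simultaneous
equations $\varphi(\mathbf{x}) = c$ and $\nabla\varphi(\mathbf{x}) = \mathbf{0}$. Alternatively, we could find $\mathrm{Crit}(\varphi)$ and then
intersect this with the constraint set $S$. Note that, in general, $L_1$ will be ‘small’.
We have
$L_2 = \{\mathbf{x} \in U : (\varphi(\mathbf{x}) = c) \wedge (\nabla\varphi(\mathbf{x}) \ne \mathbf{0}) \wedge ((\exists \lambda \in \mathbb{R})(\nabla f(\mathbf{x}) = \lambda \nabla\varphi(\mathbf{x})))\}$.
In order to find $L_2$, it is convenient to introduce the lagrangian function
$g : (U - L_1) \times \mathbb{R} \to \mathbb{R}$ defined by
$g(\mathbf{x}, \lambda) = f(\mathbf{x}) - \lambda(\varphi(\mathbf{x}) - c)$,
for all $(\mathbf{x}, \lambda) \in (U - L_1) \times \mathbb{R}$.
Observe that:
$\mathrm{Crit}(g) = \{(\mathbf{x}, \lambda) \in (U - L_1) \times \mathbb{R} : (\nabla f(\mathbf{x}) - \lambda \nabla\varphi(\mathbf{x}) = \mathbf{0}) \wedge (-(\varphi(\mathbf{x}) - c) = 0)\}$
$= \{(\mathbf{x}, \lambda) \in U \times \mathbb{R} : (\varphi(\mathbf{x}) = c) \wedge (\nabla\varphi(\mathbf{x}) \ne \mathbf{0}) \wedge (\nabla f(\mathbf{x}) = \lambda \nabla\varphi(\mathbf{x}))\}$.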
Hence, we see that
$L_2 = \{\mathbf{x} \in U : (\exists \lambda \in \mathbb{R})((\mathbf{x}, \lambda) \in \mathrm{Crit}(g))\}$.
Thus, in order to obtain $L_2$, we can just find all the critical points $(\mathbf{x}, \lambda)$ of the
lagrangian function $g$, and then project out $\mathbf{x}$ from each one.
Observe that, having found $L_1$, we are left with the problem of finding the critical
points of $g : (U - L_1) \times \mathbb{R} \to \mathbb{R}$, a function of $(p + 1)$ variables $(\mathbf{x}, \lambda) = (x_1, \dots, x_p, \lambda)$.
Thus, we might say that the Method of Lagrange Multipliers transforms a constrained
optimization problem in $p$ variables into an unconstrained problem in $(p + 1)$
variables.
The disjoint union $L \overset{\text{def.}}{=} L_1 \cup L_2 \subseteq S$ contains all the critical points of $f|_S : S \to \mathbb{R}$.
Some of these critical points might be local maxima, some might be local minima, but
some might be neither (just as in unconstrained critical point analysis). The next step
is to classify the elements of L .
One possible classification method uses a second derivative test, which involves the
notion of the so-called bordered hessian matrix, but we will not discuss this here.
Another method is to examine the geometry of the situation using level sets of the
objective function (see Examples 4.5.8).
Alternatively, if we had some theoretical result asserting the existence of global
extrema of $f|_S : S \to \mathbb{R}$, then we need only compare the values of the objective
function at the critical points on our list (again, see Examples 4.5.8).
Now, for convenience, we express the Method of Lagrange multipliers, as described in
Remarks 4.5.6, in the form of an algorithm.
Remark 4.5.7 (Algorithm for applying the Method of Lagrange Multipliers)
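The problem is to optimize $f(\mathbf{x})$ subject to $\varphi(\mathbf{x}) = c$. Perform the following five
steps.
Step Zero
Identify the data (cf. Remark 4.4.1):
$U \overset{\text{open}}{\subseteq} \mathbb{R}^p$    the domain of optimization ;
$f : U \to \mathbb{R}$    the objective function ;
$\varphi : U \to \mathbb{R}$    the constraint function ;
$c \in \mathbb{R}$    the constraint value .
Step One
Compute $L_1 = S \cap \mathrm{Crit}(\varphi)$ by finding all $\mathbf{x} \in U$ which satisfy the simultaneous
equations $\varphi(\mathbf{x}) = c$ and $\nabla\varphi(\mathbf{x}) = \mathbf{0}$.
Step Two
(i) Write down the lagrangian function
$g : (U - L_1) \times \mathbb{R} \to \mathbb{R} ; (\mathbf{x}, \lambda) \mapsto f(\mathbf{x}) - \lambda(\varphi(\mathbf{x}) - c)$.
(ii) Find $\mathrm{Crit}(g)$ in the usual way (cf. Section 4.2).
(iii) Compute $L_2$ by projecting out $\mathbf{x}$ from each element $(\mathbf{x}, \lambda)$ of $\mathrm{Crit}(g)$.
Step Three
Write down $L = L_1 \cup L_2$.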
Then L is a subset of S which contains all possible candidates for a solution to
the original constrained optimization problem.
Step Four
If it is known that the problem does have a solution (say from some theoretical
result), then this solution must be an element of the set L constructed in Step
Three. To identify the solution, proceed as follows. For each $\mathbf{x} \in L$, compute
the corresponding value $f(\mathbf{x})$ of the objective function, and see which $\mathbf{x}$ give
the highest or lowest value.
For example, suppose that the constraint set $S$ is closed and bounded. Then,
because the constrained objective function $f|_S : S \to \mathbb{R}$ is continuous, it is
bounded, and, moreover, it attains its global maximum value $(f|_S)_{\max}$ and its
global minimum value $(f|_S)_{\min}$ at some points of $S$. In particular, suppose that
$|L| = 2$. Then one of the elements of $L$ must be the global maximum, and the
other must be the global minimum. However, if $|L| > 2$, then some of the points
in $L$ might not correspond to local extrema of $f|_S : S \to \mathbb{R}$.
Note that, if $S$ is not bounded, then $f|_S$ need not possess a global maximum or
global minimum; in this case, the original problem may not have a solution.
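When a candidate pair $(\mathbf{x}, \lambda)$ has been found by hand in Step Two, it can be sanity-checked numerically by evaluating the gradient of the lagrangian function there. The Python sketch below does this with central differences; the helper `grad_lagrangian`, the step size, and the test problem (optimizing $x_1 + x_2$ on the unit circle) are all assumptions made for illustration and are not taken from the notes.

```python
import math

def grad_lagrangian(f, phi, c, x, lam, eps=1e-6):
    """Central-difference gradient of g(x, lam) = f(x) - lam * (phi(x) - c).

    Returns the partial derivatives with respect to x_1, ..., x_p, then lam.
    """
    def g(point, multiplier):
        return f(point) - multiplier * (phi(point) - c)

    grad = []
    for j in range(len(x)):
        plus = list(x)
        minus = list(x)
        plus[j] += eps
        minus[j] -= eps
        grad.append((g(plus, lam) - g(minus, lam)) / (2 * eps))
    grad.append((g(x, lam + eps) - g(x, lam - eps)) / (2 * eps))
    return grad

# Hypothetical problem: f = x1 + x2 subject to x1^2 + x2^2 = 1.
# Solving by hand gives the candidate x = (s, s), lam = s with s = 1/sqrt(2).
s = 1 / math.sqrt(2)
residual = grad_lagrangian(
    lambda p: p[0] + p[1],
    lambda p: p[0] ** 2 + p[1] ** 2,
    1.0,
    [s, s],
    s,
)
print(residual)  # every entry should be close to zero
```

A residual near zero only confirms that the pair is a critical point of $g$; classifying the point is still the business of Step Four.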
Examples 4.5.8
(i) Problem: maximize $x_1 x_2$ subject to $x_1 + 3x_2 = 2$.
Step Zero
$U = \mathbb{R}^2$ ;
$f : U \to \mathbb{R} ; (x_1, x_2) \mapsto x_1 x_2$ ;
$\varphi : U \to \mathbb{R} ; (x_1, x_2) \mapsto x_1 + 3x_2$ ;
$c = 2$ .
Step One
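We have $\frac{\partial \varphi}{\partial x_1} = 1$ and $\frac{\partial \varphi}{\partial x_2} = 3$, so that $\mathrm{Crit}(\varphi) = \emptyset$.
Hence, $L_1 = S \cap \mathrm{Crit}(\varphi) = \emptyset$.
Step Two
The lagrangian function $g$ is given by
$g : \mathbb{R}^2 \times \mathbb{R} \to \mathbb{R} ; (x_1, x_2, \lambda) \mapsto x_1 x_2 - \lambda(x_1 + 3x_2 - 2)$,
so that $\frac{\partial g}{\partial x_1} = x_2 - \lambda$, $\frac{\partial g}{\partial x_2} = x_1 - 3\lambda$, and $\frac{\partial g}{\partial \lambda} = -(x_1 + 3x_2 - 2)$.
Hence, $(x_1, x_2, \lambda) \in \mathrm{Crit}(g)$ if and only if $(x_1, x_2, \lambda)$ satisfies the following
system of equations:
$\begin{cases} x_2 - \lambda = 0, \\ x_1 - 3\lambda = 0, \\ x_1 + 3x_2 - 2 = 0. \end{cases}$ (*)
(Note that - as expected! - the equation $\frac{\partial g}{\partial \lambda} = 0$ just gives us back the
constraint.)
As can be easily checked - do it! - the system (*) has a unique solution, namely
$(x_1, x_2, \lambda) = (1, \frac{1}{3}, \frac{1}{3})$.
Hence, $L_2 = \{(1, \frac{1}{3})\}$.
Step Three
We have $L = L_1 \cup L_2 = \{(1, \frac{1}{3})\}$.
Step Four
It remains to check whether $(x_1, x_2) = (1, \frac{1}{3})$ is a solution of our original
constrained optimization problem.
(Note that, since the constraint set $S = \{(x_1, x_2) \in \mathbb{R}^2 : x_1 + 3x_2 = 2\}$ is a line, we
can’t use existence arguments based on boundedness.)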
In fact, it is straightforward to check that $(x_1, x_2) = (1, \frac{1}{3}) \in S$ gives the global
maximum of $f|_S : S \to \mathbb{R}$.
One way of doing this is using a geometrical argument. On the same diagram,
draw some appropriate level sets (curves) of $f(x_1, x_2) = x_1 x_2$, together with the
constraint set (line) $x_1 + 3x_2 = 2$. Convince yourself that the point on
$x_1 + 3x_2 = 2$ where $f(x_1, x_2) = x_1 x_2$ takes its maximum value is the unique
point where $x_1 + 3x_2 = 2$ is tangent to a level curve of $f$; this point is indeed
$(x_1, x_2) = (1, \frac{1}{3})$.
Note that this kind of level set analysis is often useful for classifying constrained
extrema.
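For this particular problem the conclusion can also be cross-checked by solving the constraint equation, as in Section 4.4. The following sketch uses the substitution $x_1 = 2 - 3x_2$; the name $\hat{f}$ follows the notation of Remark 4.4.4, and the computation is an independent check rather than part of the Lagrange method itself.

```latex
% Cross-check of Example 4.5.8(i) by eliminating x_1 = 2 - 3 x_2.
\begin{align*}
  \hat{f}(x_2) &= (2 - 3x_2)\,x_2 = 2x_2 - 3x_2^2
               = \tfrac{1}{3} - 3\left(x_2 - \tfrac{1}{3}\right)^2 , \\
  \hat{f}'(x_2) &= 2 - 6x_2 = 0 \iff x_2 = \tfrac{1}{3},
  \qquad \hat{f}''(x_2) = -6 < 0 .
\end{align*}
% Hence \hat{f} has its global maximum value 1/3 at x_2 = 1/3,
% i.e. at (x_1, x_2) = (1, 1/3), in agreement with the Lagrange computation.
```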
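(ii) Problem: minimize $x_1$ subject to $x_1^3 - x_2^2 = 0$.
Step Zero
$U = \mathbb{R}^2$ ;
$f : U \to \mathbb{R} ; (x_1, x_2) \mapsto x_1$ ;
$\varphi : U \to \mathbb{R} ; (x_1, x_2) \mapsto x_1^3 - x_2^2$ ;
$c = 0$ .
Step One
We have $\frac{\partial \varphi}{\partial x_1} = 3x_1^2$ and $\frac{\partial \varphi}{\partial x_2} = -2x_2$, so that $\mathrm{Crit}(\varphi) = \{(0,0)\}$.
Since $S = \{(x_1, x_2) \in \mathbb{R}^2 : x_1^3 - x_2^2 = 0\}$,
we have that $L_1 = S \cap \mathrm{Crit}(\varphi) = \{(0,0)\}$.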
Step Two
The lagrangian function g is given by
$g : (\mathbb{R}^2 - \{(0,0)\}) \times \mathbb{R} \to \mathbb{R} ; (x_1, x_2, \lambda) \mapsto x_1 - \lambda(x_1^3 - x_2^2)$,
so that $\frac{\partial g}{\partial x_1} = 1 - 3\lambda x_1^2$, $\frac{\partial g}{\partial x_2} = 2\lambda x_2$, and $\frac{\partial g}{\partial \lambda} = -(x_1^3 - x_2^2)$.
Hence, $(x_1, x_2, \lambda) \in \mathrm{Crit}(g)$ if and only if $(x_1, x_2, \lambda)$ satisfies the following
system of equations:
$\begin{cases} 1 - 3\lambda x_1^2 = 0, \\ 2\lambda x_2 = 0, \\ x_1^3 - x_2^2 = 0. \end{cases}$ (**)
You will see that the system (**) has no solutions - prove this! - so $L_2 = \emptyset$.
Step Three
We have $L = L_1 \cup L_2 = \{(0,0)\}$.
Step Four
It remains to check whether $(x_1, x_2) = (0,0)$ is a solution of our original
constrained optimization problem. Draw a picture of the constraint set
$S = \{(x_1, x_2) \in \mathbb{R}^2 : x_1^3 - x_2^2 = 0\}$. Can you see why $(x_1, x_2) = (0,0)$ is the
unique solution of the problem of minimizing $x_1$ subject to $(x_1, x_2) \in S$?
(iii) Problem: maximize $x_1^2 - x_2$ subject to $x_1^2 + x_2^2 = 1$.
Step Zero
$U = \mathbb{R}^2$ ;
$f : U \to \mathbb{R} ; (x_1, x_2) \mapsto x_1^2 - x_2$ ;
$\varphi : U \to \mathbb{R} ; (x_1, x_2) \mapsto x_1^2 + x_2^2$ ;
$c = 1$ .
Step One
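We have $\frac{\partial \varphi}{\partial x_1} = 2x_1$ and $\frac{\partial \varphi}{\partial x_2} = 2x_2$, so that $\mathrm{Crit}(\varphi) = \{(0,0)\}$.
However, since $S = \{(x_1, x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 = 1\}$,
we see that $L_1 = S \cap \mathrm{Crit}(\varphi) = \emptyset$.
Step Two
The lagrangian function $g$ is given by
$g : \mathbb{R}^2 \times \mathbb{R} \to \mathbb{R} ; (x_1, x_2, \lambda) \mapsto x_1^2 - x_2 - \lambda(x_1^2 + x_2^2 - 1)$,
so that $\frac{\partial g}{\partial x_1} = 2x_1 - 2\lambda x_1$, $\frac{\partial g}{\partial x_2} = -1 - 2\lambda x_2$, and $\frac{\partial g}{\partial \lambda} = -(x_1^2 + x_2^2 - 1)$.
Hence, $(x_1, x_2, \lambda) \in \mathrm{Crit}(g)$ if and only if $(x_1, x_2, \lambda)$ satisfies the following
system of equations:
$\begin{cases} 2x_1 - 2\lambda x_1 = 0, \\ -1 - 2\lambda x_2 = 0, \\ x_1^2 + x_2^2 - 1 = 0. \end{cases}$ (***)
Solve the system (***).
You should find that $\mathrm{Crit}(g) = \{(0, 1, -\frac{1}{2}), (0, -1, \frac{1}{2}), (\frac{\sqrt{3}}{2}, -\frac{1}{2}, 1), (-\frac{\sqrt{3}}{2}, -\frac{1}{2}, 1)\}$.
Hence, we obtain $L_2 = \{(0, 1), (0, -1), (\frac{\sqrt{3}}{2}, -\frac{1}{2}), (-\frac{\sqrt{3}}{2}, -\frac{1}{2})\}$.
Step Three
We have $L = L_1 \cup L_2 = \{(0, 1), (0, -1), (\frac{\sqrt{3}}{2}, -\frac{1}{2}), (-\frac{\sqrt{3}}{2}, -\frac{1}{2})\}$.
Step Four
Note that the constraint set $S = \{(x_1, x_2) \in \mathbb{R}^2 : x_1^2 + x_2^2 = 1\} = S_1(0,0)$ is the
unit circle, and this is closed and bounded.
Hence the continuous function $f|_S : S \to \mathbb{R}$ attains its global maximum value
$(f|_S)_{\max}$ and its global minimum value $(f|_S)_{\min}$ at some points of $S$.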
A global maximum of $f|_S$ is, in particular, a local maximum of $f|_S$, and so is
an element of the set $L$. Similarly, a global minimum of $f|_S$ is an element of
the set $L$.
To pick out global maxima and global minima, we just compute the value of $f$
at each of the four points in $L$.
We find the following:
$f(0, 1) = -1$ ; $f(0, -1) = 1$ ; $f(\frac{\sqrt{3}}{2}, -\frac{1}{2}) = \frac{5}{4}$ ; $f(-\frac{\sqrt{3}}{2}, -\frac{1}{2}) = \frac{5}{4}$ .
We conclude that $(f|_S)_{\max} = f(\frac{\sqrt{3}}{2}, -\frac{1}{2}) = f(-\frac{\sqrt{3}}{2}, -\frac{1}{2}) = \frac{5}{4}$,
and that $(f|_S)_{\min} = f(0, 1) = -1$.
The nature of the remaining point $(0, -1)$ may be investigated by drawing, in the
same diagram, the constraint set $S$ together with appropriate level curves of $f$.
You should be able to see that $(0, -1)$ is, in fact, a local minimum of $f|_S$.
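The comparison of values in Step Four can be reproduced with a short Python sketch. The candidate list is copied from the set $L$ found above; the additional sampling of the unit circle is an independent sanity check added here for illustration, and the number of sample points is arbitrary.

```python
import math

def f(x1, x2):
    # Objective function of Example 4.5.8(iii).
    return x1 ** 2 - x2

r3 = math.sqrt(3) / 2
L = [(0.0, 1.0), (0.0, -1.0), (r3, -0.5), (-r3, -0.5)]

# Step Four: evaluate f at every candidate and compare.
values = {point: f(*point) for point in L}
f_max = max(values.values())
f_min = min(values.values())

# Independent check: sample f along the unit circle.
n = 10000
sampled = [
    f(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
    for k in range(n)
]
print(f_max, f_min, max(sampled), min(sampled))
```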
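(iv) Problem: Let $a_1, a_2, a_3 \in \mathbb{R}^+$, and consider the ellipsoid
$S = \left\{(x_1, x_2, x_3) \in \mathbb{R}^3 : \frac{x_1^2}{a_1^2} + \frac{x_2^2}{a_2^2} + \frac{x_3^2}{a_3^2} = 1\right\}$.
Find the maximum and minimum values of the sum $x_1 + x_2 + x_3$ for
$(x_1, x_2, x_3) \in S$.
Step Zero
$U = \mathbb{R}^3$ ;
$f : U \to \mathbb{R} ; (x_1, x_2, x_3) \mapsto x_1 + x_2 + x_3$ ;
$\varphi : U \to \mathbb{R} ; (x_1, x_2, x_3) \mapsto \frac{x_1^2}{a_1^2} + \frac{x_2^2}{a_2^2} + \frac{x_3^2}{a_3^2}$ ;
$c = 1$ .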
Step One
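We have $\frac{\partial \varphi}{\partial x_1} = \frac{2x_1}{a_1^2}$, $\frac{\partial \varphi}{\partial x_2} = \frac{2x_2}{a_2^2}$ and $\frac{\partial \varphi}{\partial x_3} = \frac{2x_3}{a_3^2}$, so that $\mathrm{Crit}(\varphi) = \{(0,0,0)\}$.
However, $(0,0,0) \notin S$, so $L_1 = S \cap \mathrm{Crit}(\varphi) = \emptyset$.
Step Two
The lagrangian function $g$ is given by
$g : \mathbb{R}^3 \times \mathbb{R} \to \mathbb{R} ; (x_1, x_2, x_3, \lambda) \mapsto x_1 + x_2 + x_3 - \lambda\left(\frac{x_1^2}{a_1^2} + \frac{x_2^2}{a_2^2} + \frac{x_3^2}{a_3^2} - 1\right)$,
so that $\frac{\partial g}{\partial x_1} = 1 - \frac{2\lambda x_1}{a_1^2}$, $\frac{\partial g}{\partial x_2} = 1 - \frac{2\lambda x_2}{a_2^2}$, $\frac{\partial g}{\partial x_3} = 1 - \frac{2\lambda x_3}{a_3^2}$, and
$\frac{\partial g}{\partial \lambda} = -\left(\frac{x_1^2}{a_1^2} + \frac{x_2^2}{a_2^2} + \frac{x_3^2}{a_3^2} - 1\right)$.
Hence, $(x_1, x_2, x_3, \lambda) \in \mathrm{Crit}(g)$ if and only if $(x_1, x_2, x_3, \lambda)$ satisfies the
following system of equations:
$\begin{cases} 1 - \frac{2\lambda x_1}{a_1^2} = 0, \\ 1 - \frac{2\lambda x_2}{a_2^2} = 0, \\ 1 - \frac{2\lambda x_3}{a_3^2} = 0, \\ \frac{x_1^2}{a_1^2} + \frac{x_2^2}{a_2^2} + \frac{x_3^2}{a_3^2} - 1 = 0. \end{cases}$ (****)
Solving the system (****) yields the two solutions
$(x_1, x_2, x_3, \lambda) = \left(\frac{a_1^2}{q}, \frac{a_2^2}{q}, \frac{a_3^2}{q}, \frac{q}{2}\right)$,
and
$(x_1, x_2, x_3, \lambda) = \left(-\frac{a_1^2}{q}, -\frac{a_2^2}{q}, -\frac{a_3^2}{q}, -\frac{q}{2}\right)$,
where, for convenience, we have defined $q = \sqrt{a_1^2 + a_2^2 + a_3^2}$.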
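Hence, we obtain
$L_2 = \left\{\left(\frac{a_1^2}{q}, \frac{a_2^2}{q}, \frac{a_3^2}{q}\right), \left(-\frac{a_1^2}{q}, -\frac{a_2^2}{q}, -\frac{a_3^2}{q}\right)\right\}$.
Step Three
We have
$L = L_1 \cup L_2 = \left\{\left(\frac{a_1^2}{q}, \frac{a_2^2}{q}, \frac{a_3^2}{q}\right), \left(-\frac{a_1^2}{q}, -\frac{a_2^2}{q}, -\frac{a_3^2}{q}\right)\right\}$.
Step Four
Note that the constraint set $S$ is an ellipsoid, and thus it is closed and bounded.
Hence the continuous function $f|_S : S \to \mathbb{R}$ attains its global maximum value
$(f|_S)_{\max}$ and its global minimum value $(f|_S)_{\min}$ at some points of $S$.
Since $|L| = 2$, one of the elements must be the global maximum of $f|_S$, and the
other must be the global minimum.
We have
$f\left(\frac{a_1^2}{q}, \frac{a_2^2}{q}, \frac{a_3^2}{q}\right) = q$,
and
$f\left(-\frac{a_1^2}{q}, -\frac{a_2^2}{q}, -\frac{a_3^2}{q}\right) = -q$,
so we conclude that the maximum value of the function $x_1 + x_2 + x_3$ on the
ellipsoid $\frac{x_1^2}{a_1^2} + \frac{x_2^2}{a_2^2} + \frac{x_3^2}{a_3^2} = 1$ is equal to $\sqrt{a_1^2 + a_2^2 + a_3^2}$, whilst the minimum
value is equal to $-\sqrt{a_1^2 + a_2^2 + a_3^2}$.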