Chapter 4
Several-variable calculus
4.1 Derivatives of Functions of Several Variables
4.1.1 Functions of Several Variables
• A function f of n variables $(x_1, x_2, \dots, x_n)$ in $\mathbb{R}^n$ is an entity that operates on these variables to produce another real number $y = f(x_1, x_2, \dots, x_n)$.
• $x_1, x_2, \dots, x_n$ are called the independent variables, y the dependent variable.
• We write $f : \mathbb{R}^n \to \mathbb{R}$ to indicate that f maps $\mathbb{R}^n$ (or a domain within $\mathbb{R}^n$) into $\mathbb{R}$.
4.1.2 Geometric Interpretation
For a function of two variables, $f(x, y)$, consider $(x, y)$ as defining a point P in the xy-plane. Let the value of $f(x, y)$ be taken as the length $PP'$ drawn parallel to the z-axis (or the height of point $P'$ above the plane). Then as P moves in the xy-plane, $P'$ maps out a surface in space whose equation is $z = f(x, y)$.
Just as a function of one variable has a graph which is cut only once by each vertical line (constant x), here the surface can only be cut once by each vertical line (constant x and y).
Example: $f(x, y) = 6 - 2x - 3y$
The surface $z = 6 - 2x - 3y$, i.e. $2x + 3y + z = 6$, is a plane which intersects:
the x-axis where y = z = 0, i.e. x = 3;
the y-axis where x = z = 0, i.e. y = 2;
the z-axis where x = y = 0, i.e. z = 6.
Example: $f(x, y) = x^2 - y^2$
In the plane x = 0, there is a maximum at y = 0; in the plane y = 0, there is a minimum at x = 0. The whole surface is shaped like a horse's saddle; and the point (0, 0) is called a saddle point (of which, more later).
[Figure: the surface $z = x^2 - y^2$, with axes x and y.]
Example: $f(x, y) = x^2 + y^2$
The intersection with the plane x = 0 is the parabola $z = y^2$ and with the plane y = 0 is the parabola $z = x^2$. This surface is symmetric about the z-axis, and is a paraboloid (parabolic bowl).
[Figure: the surface $z = x^2 + y^2$, with axes x and y.]
4.1.3 Partial Derivatives
Given a function of several variables, we could choose to hold all but one of these variables fixed at arbitrarily chosen values, thereby obtaining a function of one variable (the remaining one), which could then be differentiated.
Definition
Given a function $f(x_1, \dots, x_n)$ of n variables and an integer k between 1 and n, the partial derivative
\[ \frac{\partial f}{\partial x_k} = f_{x_k} = \partial_{x_k} f \]
of f with respect to the variable $x_k$ is the derivative of f with respect to $x_k$ only, while the remaining n − 1 variables are all held fixed. Explicitly
\[ \frac{\partial f}{\partial x_k} \equiv f_{x_k} = \lim_{\delta x_k \to 0} \left[ \frac{f(x_1, \dots, x_{k-1}, x_k + \delta x_k, x_{k+1}, \dots, x_n) - f(x_1, \dots, x_n)}{\delta x_k} \right], \tag{4.1} \]
itself a function of $(x_1, \dots, x_n)$. In practice the variables held fixed act as constants:
\[ f(x) = 3x^4 + \sin(2x) \;\Rightarrow\; \frac{df}{dx} = 12x^3 + 2\cos(2x), \]
\[ f(x, y) = yx^4 + \sin(yx) \;\Rightarrow\; \frac{\partial f}{\partial x} = 4yx^3 + y\cos(yx). \]
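The limit in (4.1) can be checked numerically. The following is a minimal sketch, not part of the original notes: the helper name `partial` and the sample point are illustrative assumptions, and it uses a central difference (a symmetric variant of the one-sided quotient in the definition) because it is more accurate for a given step size.

```python
import math

def partial(f, point, k, h=1e-6):
    """Central-difference estimate of the partial derivative of f
    with respect to its k-th argument (hypothetical helper)."""
    up = list(point)
    down = list(point)
    up[k] += h      # shift only variable k; all others stay fixed
    down[k] -= h
    return (f(*up) - f(*down)) / (2 * h)

def f(x, y):
    return y * x**4 + math.sin(y * x)

def fx_exact(x, y):
    # The result quoted in the text: 4 y x^3 + y cos(yx)
    return 4 * y * x**3 + y * math.cos(y * x)

x0, y0 = 1.2, 0.7                      # arbitrary sample point
numeric = partial(f, (x0, y0), 0)      # vary x only, hold y fixed
exact = fx_exact(x0, y0)
print(numeric, exact)
```

The two printed values should agree to many decimal places, which illustrates that holding y fixed and differentiating in x is exactly what the formula computes.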
Geometrical interpretation of partial derivatives in the case n = 2
Recall that the graph of f is the surface $z = f(x, y)$ with the z coordinate measured vertically upwards. The cross section of this surface cut by a vertical plane y = constant is a curve whose slope (gradient) is the partial derivative $f_x$ (see figure). Similarly $f_y$ is the slope of the cross section of the graph by a vertical plane x = constant.
One may interpret the partial derivatives $f_x$ and $f_y$ as the slope encountered by "walking" over the surface in the x and y directions respectively.
Remark
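It is obvious from the definition that the partial derivative with respect to a particular variable obeys the same sum, product and quotient rules D II - D IV as the ordinary (single variable) derivative, i.e., if u and v are both functions of $x_1, \dots, x_n$, then, for $k = 1, \dots, n$,
\[ \frac{\partial}{\partial x_k}(u + v) = \frac{\partial u}{\partial x_k} + \frac{\partial v}{\partial x_k}, \tag{4.2} \]
\[ \frac{\partial}{\partial x_k}(uv) = u\frac{\partial v}{\partial x_k} + v\frac{\partial u}{\partial x_k}, \tag{4.3} \]
\[ \frac{\partial}{\partial x_k}\left(\frac{u}{v}\right) = \frac{1}{v^2}\left(v\frac{\partial u}{\partial x_k} - u\frac{\partial v}{\partial x_k}\right) \quad (v \neq 0). \tag{4.4} \]
Corresponding to D I, we have the result that
\[ f(x_1, \dots, x_n) \text{ is independent of } x_k \text{ iff } \frac{\partial f}{\partial x_k} \text{ is zero for all } (x_1, \dots, x_n), \tag{4.5} \]
and the consequent result
\[ f(x_1, \dots, x_n) = \text{constant iff the } n \text{ partial derivatives are all zero for all } (x_1, \dots, x_n). \tag{4.6} \]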
Corresponding to the chain rule D V, we have the result that, if g is a function of $x_1, \dots, x_n$ and f a function of a single variable, then
\[ \frac{\partial}{\partial x_k}\left[f\{g(x_1, \dots, x_n)\}\right] = f'\{g(x_1, \dots, x_n)\}\frac{\partial g}{\partial x_k}. \tag{4.7} \]
For example, if $f(g) = \sin g$ and $g(x, y) = x^2 + xy$ then
\[ f(x, y) = \sin(x^2 + xy), \qquad f_x = (2x + y)\cos(x^2 + xy). \]
A more powerful and very important generalization of the chain rule is coming up later in this chapter.
Example
Calculate the partial derivatives of the functions:
(a) $f(x, y) = x^2 + 2xy^2 + y^3$;
(b) $f(x, y, z) = xz + e^{yz} + \sin(xy)$.
Solution
(a) Holding y constant gives $\partial f/\partial x = 2x + 2y^2 + 0$.
Holding x constant gives $\partial f/\partial y = 0 + 4xy + 3y^2$.
(b) Holding both y and z constant gives $f_x = z + 0 + y\cos(xy)$.
Holding both x and z constant gives $f_y = 0 + ze^{yz} + x\cos(xy)$.
Holding both x and y constant gives $f_z = x + ye^{yz} + 0$.
Example (implicit partial differentiation)
If z is a function of two independent variables x and y, and z satisfies $xz + \ln z = 2x + 3y$, find $\partial z/\partial x$ in terms of x, y and z.
Solution
Differentiating each term in the equation with respect to x, holding y constant, and treating z as a function of x, we obtain
\[ x\frac{\partial z}{\partial x} + z + \frac{1}{z}\frac{\partial z}{\partial x} = 2 \]
so that
\[ \frac{\partial z}{\partial x} = \frac{z(2 - z)}{1 + xz}. \]
4.1.4 Second and Higher Order Partial Derivatives
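Since $\partial f/\partial x = f_x$ and $\partial f/\partial y = f_y$ are themselves functions of x and y, they themselves have partial derivatives, for which we use the notations
\[ \frac{\partial^2 f}{\partial x^2} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right) = (f_x)_x = f_{xx}, \tag{4.8} \]
\[ \frac{\partial^2 f}{\partial y\,\partial x} = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right) = (f_x)_y = f_{xy}, \tag{4.9} \]
\[ \frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right) = (f_y)_x = f_{yx}, \tag{4.10} \]
\[ \frac{\partial^2 f}{\partial y^2} = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial y}\right) = (f_y)_y = f_{yy}. \tag{4.11} \]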
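This notation extends obviously to higher order derivatives and to functions of three or more variables. For obvious reasons, $f_{xy}$ and $f_{yx}$ are called mixed derivatives.
Example
If $f(x, y) = x^4y^2 - x^2y^6$ then
\[ \frac{\partial f}{\partial x} = 4x^3y^2 - 2xy^6, \qquad \frac{\partial f}{\partial y} = 2x^4y - 6x^2y^5, \]
\[ \frac{\partial^2 f}{\partial x^2} = 12x^2y^2 - 2y^6, \qquad \frac{\partial^2 f}{\partial y\,\partial x} = 8x^3y - 12xy^5, \]
\[ \frac{\partial^2 f}{\partial y^2} = 2x^4 - 30x^2y^4, \qquad \frac{\partial^2 f}{\partial x\,\partial y} = 8x^3y - 12xy^5. \]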
Mixed Derivatives Theorem
If $f_x$, $f_y$ and $f_{xy}$ exist and are continuous, then $f_{yx}$ exists and $f_{xy} = f_{yx}$.
We will not prove this theorem (we have not fully defined the word continuous); but for reasonable functions it will always apply. This means that to calculate a mixed derivative we can calculate in either order. For third-order derivatives the mixed derivatives theorem gives $f_{xxy} = f_{xyx} = f_{yxx}$ and so on (check for yourself in the last example).
Example
Verify the Mixed Derivatives Theorem for the function $f(x, y) = xy^3 + x\sin xy$.
Solution
Using the sum, product and chain rules, we see that $f_x = y^3 + \sin xy + xy\cos xy$, and hence that
\[ f_{xy} = (f_x)_y = 3y^2 + x\cos xy + (x\cos xy - x^2y\sin xy) = 3y^2 + 2x\cos xy - x^2y\sin xy. \]
Similarly, $f_y = 3xy^2 + x^2\cos xy$, so $f_{yx} = (f_y)_x = 3y^2 + (2x\cos xy - x^2y\sin xy) = f_{xy}$.
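The equality of the mixed derivatives in this example can also be checked numerically. This is an illustrative sketch only: the helper names `d_dx` and `d_dy`, the step size and the sample point are assumptions introduced here, and finite differences give approximate values rather than a proof.

```python
import math

def f(x, y):
    return x * y**3 + x * math.sin(x * y)

def d_dx(g, h=1e-4):
    """Return a function approximating the x-partial of g (central difference)."""
    return lambda x, y: (g(x + h, y) - g(x - h, y)) / (2 * h)

def d_dy(g, h=1e-4):
    """Return a function approximating the y-partial of g (central difference)."""
    return lambda x, y: (g(x, y + h) - g(x, y - h)) / (2 * h)

f_xy = d_dy(d_dx(f))   # differentiate in x first, then in y
f_yx = d_dx(d_dy(f))   # differentiate in y first, then in x

def mixed_exact(x, y):
    # The closed form found in the solution above
    return 3 * y**2 + 2 * x * math.cos(x * y) - x**2 * y * math.sin(x * y)

x0, y0 = 0.8, 1.3      # arbitrary sample point
print(f_xy(x0, y0), f_yx(x0, y0), mixed_exact(x0, y0))
```

All three printed numbers should agree to several decimal places, whichever order of differentiation is used.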
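Example
In 3 dimensions, the distance r of a point from the origin is given in terms of its Cartesian coordinates x, y and z by $r = \sqrt{x^2 + y^2 + z^2} = (x^2 + y^2 + z^2)^{1/2}$. Show that the function $\phi(x, y, z) = 1/r = (x^2 + y^2 + z^2)^{-1/2}$ obeys Laplace's equation
\[ \frac{\partial^2\phi}{\partial x^2} + \frac{\partial^2\phi}{\partial y^2} + \frac{\partial^2\phi}{\partial z^2} = 0 \quad \text{(except at the origin)}. \]
Solution
By the chain rule,
\[ \frac{\partial\phi}{\partial x} = -\frac{1}{2}(x^2 + y^2 + z^2)^{-3/2}(2x) = -x(x^2 + y^2 + z^2)^{-3/2}. \]
Therefore, by the product and chain rules,
\[ \frac{\partial^2\phi}{\partial x^2} = -(x^2 + y^2 + z^2)^{-3/2} + (-x)\left[-\frac{3}{2}(x^2 + y^2 + z^2)^{-5/2}(2x)\right] = -(x^2 + y^2 + z^2)^{-3/2} + 3x^2(x^2 + y^2 + z^2)^{-5/2}. \]
Similarly, by symmetry,
\[ \frac{\partial^2\phi}{\partial y^2} = -(x^2 + y^2 + z^2)^{-3/2} + 3y^2(x^2 + y^2 + z^2)^{-5/2}, \]
\[ \frac{\partial^2\phi}{\partial z^2} = -(x^2 + y^2 + z^2)^{-3/2} + 3z^2(x^2 + y^2 + z^2)^{-5/2}. \]
Adding the three above equations now gives
\[ \frac{\partial^2\phi}{\partial x^2} + \frac{\partial^2\phi}{\partial y^2} + \frac{\partial^2\phi}{\partial z^2} = -3(x^2 + y^2 + z^2)^{-3/2} + 3(x^2 + y^2 + z^2)(x^2 + y^2 + z^2)^{-5/2} \]
\[ = -3(x^2 + y^2 + z^2)^{-3/2} + 3(x^2 + y^2 + z^2)^{-3/2} = 0. \]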
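As a sanity check on the calculation above, the three second derivatives can be estimated by finite differences and summed. This is a rough numerical sketch; the helper names, the step size and the sample point (1, 2, 2), for which r = 3, are choices made here for illustration.

```python
import math

def phi(x, y, z):
    return 1.0 / math.sqrt(x * x + y * y + z * z)

def second_partial(f, point, k, h=1e-3):
    """Central-difference estimate of the second partial derivative
    of f with respect to its k-th argument."""
    up = list(point)
    down = list(point)
    up[k] += h
    down[k] -= h
    return (f(*up) - 2 * f(*point) + f(*down)) / (h * h)

def laplacian(f, point):
    return sum(second_partial(f, point, k) for k in range(3))

p = (1.0, 2.0, 2.0)                    # r = 3 at this point
lap = laplacian(phi, p)

# Compare one term with the closed form -r^(-3) + 3 x^2 r^(-5)
phi_xx = second_partial(phi, p, 0)
phi_xx_exact = -(3.0 ** -3) + 3 * 1.0**2 * 3.0 ** -5
print(lap, phi_xx, phi_xx_exact)
```

The individual second derivatives are not small, but their sum should be zero up to discretisation and rounding error.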
4.2 Linear Approximations and Tangents
4.2.1 Tangent to Graph of a Function of One Variable
The tangent to the curve $y = f(x)$ at $A = (a, f(a))$ is the straight line through A with slope $f'(a)$, i.e. it has the equation
\[ y = f(a) + (x - a)f'(a). \tag{4.12} \]
NB1. For this line, $dy/dx = f'(a)$ and $y = f(a)$ at x = a.
NB2. The RHS consists of the first two terms of the Taylor expansion of f about x = a (i.e. it is the best linear approximation to $f(x)$ near x = a).
Example
Find the linear approximation to $f(x) = 1 + x^2$ near x = 2.
Solution
If $f(x) = 1 + x^2$ then $f'(x) = 2x$. At the point x = 2 we have f = 5 and $f' = 4$. Therefore the linear approximation is
\[ f(x) \approx 5 + 4(x - 2) = 4x - 3. \]
4.2.2 Tangent Plane to Graph of a Function of Two Variables
By analogy with the above, this is the (best) linear approximation to f near (a, b), as given by the first two terms of the two-variable Taylor series (appendix F). It is the plane whose equation is
\[ z = f(a, b) + (x - a)f_x(a, b) + (y - b)f_y(a, b). \tag{4.13} \]
NB: For this plane, $\partial z/\partial x = f_x(a, b)$, $\partial z/\partial y = f_y(a, b)$ and $z = f(a, b)$ at x = a and y = b, i.e. we have matched the first derivatives and the value of the function at (a, b).
Example
Find the tangent plane to the surface $z = f(x, y) = x^2 + y^2$ near the point x = 1, y = 2.
Solution
If $f(x, y) = x^2 + y^2$ then $f_x = 2x$ and $f_y = 2y$. At the point (1, 2) we have f = 5, $f_x = 2$ and $f_y = 4$. Thus the tangent plane is
\[ z = 5 + (x - 1)2 + (y - 2)4 = 2x + 4y - 5. \]
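To see in what sense the tangent plane is a linear approximation, the sketch below compares the surface with the plane at points approaching (1, 2). The function names and the choice of step sizes are illustrative assumptions, not part of the original notes.

```python
def f(x, y):
    return x**2 + y**2

def tangent_plane(x, y, a=1.0, b=2.0):
    # z = f(a, b) + (x - a) f_x(a, b) + (y - b) f_y(a, b),
    # using f_x = 2x and f_y = 2y for this particular f
    return f(a, b) + (x - a) * 2 * a + (y - b) * 2 * b

for step in (0.1, 0.01, 0.001):
    x, y = 1.0 + step, 2.0 + step
    error = f(x, y) - tangent_plane(x, y)
    # For this f the error is (x - 1)^2 + (y - 2)^2 = 2 * step**2
    print(step, error)
```

Reducing the step by a factor of 10 reduces the error by a factor of 100: the discrepancy is second order in the displacement, which is what "matching the value and first derivatives" means in practice.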
4.3 Directional derivatives and the gradient vector
For $f(x, y)$, $f_x$ and $f_y$ measure the rates of change of f along the x and y directions. How can we calculate the rate of change of f in any direction?
We need to know how much f changes when both x and y change by small amounts. Near x = a and y = b, $f(x, y)$ is approximately given by equation (4.13) for the tangent plane. Let x change by a (vanishingly) small amount dx, and y by dy (i.e. x = a + dx, y = b + dy); then
\[ f(x, y) \approx f(a, b) + (x - a)f_x(a, b) + (y - b)f_y(a, b), \]
\[ f(a + dx, b + dy) \approx f(a, b) + (dx)f_x(a, b) + (dy)f_y(a, b). \]
The change in f is $df = f(a + dx, b + dy) - f(a, b)$, so
\[ df = (dx)f_x(a, b) + (dy)f_y(a, b) = \nabla f \cdot d\mathbf{r}, \]
where we have defined the two dimensional vector representing the change in x and y,
\[ d\mathbf{r} = dx\,\mathbf{i} + dy\,\mathbf{j} = (dx, dy) \]
and the two dimensional gradient vector
\[ \nabla f = \frac{\partial f}{\partial x}\mathbf{i} + \frac{\partial f}{\partial y}\mathbf{j} = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right). \tag{4.14} \]
We can, additionally, write $d\mathbf{r} = \mathbf{u}\,dr$ where $\mathbf{u}$ is a unit vector in the direction of $d\mathbf{r}$ and $dr$ is the magnitude of the change. Then:
\[ df = \nabla f \cdot \mathbf{u}\,dr \]
and so
\[ \text{Rate of change of } f \text{ in the direction of } \mathbf{u} = \frac{df}{dr} = \nabla f \cdot \mathbf{u}. \]
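The above generalises to functions of more than two variables. E.g. for a function of three variables, $f(x, y, z)$, the three-dimensional gradient vector is
\[ \nabla f = \frac{\partial f}{\partial x}\mathbf{i} + \frac{\partial f}{\partial y}\mathbf{j} + \frac{\partial f}{\partial z}\mathbf{k} = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right). \tag{4.15} \]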
4.3.1 Two properties of the gradient
The change df in f due to a change in the position by $d\mathbf{r} = \mathbf{u}\,dr$ is given by
\[ df = \nabla f \cdot d\mathbf{r} = \nabla f \cdot \mathbf{u}\,dr = |\nabla f|\,dr\cos\theta \tag{4.16} \]
where θ is the angle between the vectors $d\mathbf{r}$ and $\nabla f$. We look at cases where $d\mathbf{r}$ is parallel or perpendicular to $\nabla f$.
Property 1. From (4.16) the direction $d\mathbf{r}$ for which df is a maximum is that for which cos θ = 1, or θ = 0, i.e. $d\mathbf{r}$ in the direction of $\nabla f$. Thus
At any point, $\nabla f$ points in the direction in which f is increasing most rapidly and its magnitude $|\nabla f|$ gives this maximum rate of change.
i.e. $\nabla f$ "points uphill".
Property 2. From (4.16), df = 0 corresponds to θ = π/2, when $\nabla f$ and $d\mathbf{r}$ are perpendicular. But df = 0 means that f has not changed, so $d\mathbf{r}$ is along the surface f = constant. Thus
At any point, $\nabla f$ is perpendicular to the surface f = constant through that point.
NB f = constant is a contour of the function f.
For a function of two variables, these two properties are illustrated in the following picture:
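Example
If $f(x, y, z) = z^3 + 3x^2y^2 + \sin z$, find $\nabla f$.
Solution
The three partial derivatives are
\[ \frac{\partial f}{\partial x} = 0 + 6xy^2 + 0 = 6xy^2, \]
\[ \frac{\partial f}{\partial y} = 0 + 6x^2y + 0 = 6x^2y, \]
\[ \frac{\partial f}{\partial z} = 3z^2 + 0 + \cos z = 3z^2 + \cos z, \]
so $\nabla f = (6xy^2,\ 6x^2y,\ 3z^2 + \cos z)$.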
Example
If $f(x, y, z) = x^2 + xy + z$, find $\nabla f$. What is the rate of change of f along the direction $\mathbf{i} + 2\mathbf{j} + 2\mathbf{k}$ at the point P(1, 1, 1)? What is the magnitude of the maximum rate of change of f at this point?
Solution
\[ \nabla f = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right) = (2x + y,\ x,\ 1). \]
Now, at the point P(1, 1, 1), $\nabla f = (3, 1, 1)$. To find the rate of change of f along a vector $\mathbf{v} = (1, 2, 2)$, we need the unit vector along this direction, which is
\[ \hat{\mathbf{v}} = \frac{1}{\sqrt{1^2 + 2^2 + 2^2}}(1, 2, 2) = \left(\frac{1}{3}, \frac{2}{3}, \frac{2}{3}\right). \]
So, the rate of change of f in this direction is
\[ \nabla f \cdot \hat{\mathbf{v}} = (3, 1, 1) \cdot \left(\frac{1}{3}, \frac{2}{3}, \frac{2}{3}\right) = 1 + \frac{2}{3} + \frac{2}{3} = \frac{7}{3}. \]
The maximum rate of change of f at the point P is $|\nabla f| = (11)^{1/2}$.
Example
Find a unit vector perpendicular to the surface $z = x^2 + y^2$ at the point A(1, 2, 5).
Solution A (via the tangent plane)
Earlier, we found the tangent plane to this surface at this point to be
\[ 2x + 4y - z = 5. \]
The vector equation of a plane can be written as $\mathbf{r} \cdot \mathbf{n} = a$ where $\mathbf{r} = (x, y, z)$ and $\mathbf{n}$ is a vector perpendicular to the plane. By inspection, we see that
\[ \mathbf{n} = (2, 4, -1) \]
is such a vector, and so a unit vector in this direction is
\[ \hat{\mathbf{n}} = \frac{1}{\sqrt{2^2 + 4^2 + 1^2}}(2, 4, -1) = \frac{1}{\sqrt{21}}(2, 4, -1). \]
Solution B (treat the surface as a contour of a function of three variables)
The equation of the surface can be written as
\[ x^2 + y^2 - z = 0 \]
so if we define a function $f(x, y, z) = x^2 + y^2 - z$ we can say the surface is the contour of the function f given by
\[ f(x, y, z) = 0 = \text{constant}. \]
We know, from Property 2 above, that $\nabla f$ is perpendicular to the surface f = constant.
\[ \nabla f = (2x, 2y, -1) = (2, 4, -1) \text{ at the point } A(1, 2, 5). \]
So, as before, a unit vector perpendicular to the surface is
\[ \hat{\mathbf{n}} = \frac{1}{\sqrt{2^2 + 4^2 + 1^2}}(2, 4, -1) = \frac{1}{\sqrt{21}}(2, 4, -1). \]
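The last two worked examples (directional derivative, and unit normal via the gradient) reduce to a few vector operations. The following sketch reproduces them; the helper functions `dot`, `norm` and `unit` are small utilities written here for illustration, and the gradients are entered by hand from the solutions above.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def norm(v):
    return math.sqrt(dot(v, v))

def unit(v):
    n = norm(v)
    return tuple(a / n for a in v)

# f(x, y, z) = x^2 + xy + z has gradient (2x + y, x, 1).
def grad_f(x, y, z):
    return (2 * x + y, x, 1.0)

g = grad_f(1.0, 1.0, 1.0)                 # gradient at P(1, 1, 1)
rate = dot(g, unit((1.0, 2.0, 2.0)))      # rate of change along i + 2j + 2k
max_rate = norm(g)                        # largest possible rate of change

# Surface z = x^2 + y^2 treated as the contour F = x^2 + y^2 - z = 0.
def grad_F(x, y, z):
    return (2 * x, 2 * y, -1.0)

n_hat = unit(grad_F(1.0, 2.0, 5.0))       # unit normal at A(1, 2, 5)
print(rate, max_rate, n_hat)
```

The printed rate should match 7/3, the maximum rate should match the square root of 11, and the normal should be (2, 4, -1) divided by the square root of 21.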
4.4 Stationary (Critical) Points of a Function of Two Variables
4.4.1 Definition
For a function of two variables, $f(x, y)$, a stationary point $(x^*, y^*)$ is defined to be a point at which the gradient vector is zero:
\[ \nabla f\big|_{(x^*, y^*)} = (f_x(x^*, y^*), f_y(x^*, y^*)) = (0, 0), \tag{4.17} \]
i.e. both of the partial derivatives $\partial f/\partial x$ and $\partial f/\partial y$ are zero at that point. The value $z^* = f(x^*, y^*)$ of f at $(x^*, y^*)$ is the corresponding stationary value (SV).
4.4.2 Classification of SP’s of a Function of Two Variables
There are three main types of stationary point for a function of two variables: maximum, minimum and saddle points. These are sketched as follows:
Maximum: A local peak in the function. To get a peak, we must ensure that when point (x, y) moves away from $(x^*, y^*)$ a small distance in any direction, the value of $f(x, y)$ always decreases.
Minimum: A local trough in the function. To get a trough, we must ensure that when point (x, y) moves away from $(x^*, y^*)$ a small distance in any direction, the value of $f(x, y)$ always increases.
Saddle point: Looks like a horse's saddle! Moving off in some directions away from $(x^*, y^*)$ leads to an increase in f, while moving off in other directions leads to a decrease in f.
Contours: We can represent the "landscape" of the surface $z = f(x, y)$ by contour lines, which are curves in the (x, y) plane on which $f(x, y)$ takes different constant values. Around a maximum, the value of $f(x, y)$ is always smaller than its value $z^*$ at the maximum. The contours are closed loops around the stationary point. Around a minimum, $f(x, y) > z^*$ and again the contours are closed loops around the stationary point.
The representation of a saddle point by contour lines has the characteristic appearance depicted below. At the level of the saddle there are two contour lines which cross at the saddle. These two crossing contour lines separate two regions in which $f > z^*$ from two regions in which $f < z^*$. Thus, as we move away from the saddle in different directions, there are two pairs of opposite directions in which f stays fixed (along the crossing contour lines), and these directions separate two opposite ranges of direction in which f increases from two opposite ranges of direction in which f decreases.
To investigate what type a given stationary point is, we must look at what values f takes close to this point. Consider the Taylor expansion (from appendix F) of $f(x, y)$ about a point $(x^*, y^*)$:
\[ f(x, y) = f(x^*, y^*) + (x - x^*)f_x(x^*, y^*) + (y - y^*)f_y(x^*, y^*) \]
\[ \quad + \frac{1}{2}(x - x^*)^2 f_{xx}(x^*, y^*) + (x - x^*)(y - y^*)f_{xy}(x^*, y^*) + \frac{1}{2}(y - y^*)^2 f_{yy}(x^*, y^*) \]
\[ \quad + \text{higher order terms}, \]
(NB this matches the first and second derivatives of $f(x, y)$ at the point $(x^*, y^*)$).
Suppose that $(x^*, y^*)$ is a stationary point. Then $f_x(x^*, y^*) = f_y(x^*, y^*) = 0$, and if we label the values
\[ f(x^*, y^*) = z^*, \quad x - x^* = \delta x, \quad y - y^* = \delta y, \quad f_{xx}(x^*, y^*) = A, \quad f_{xy}(x^*, y^*) = B, \quad f_{yy}(x^*, y^*) = C, \tag{4.18} \]
we can rewrite the Taylor series in the form
\[ f(x, y) = z^* + \frac{1}{2}\left[A\,\delta x^2 + 2B\,\delta x\,\delta y + C\,\delta y^2\right] + \text{higher order terms}, \tag{4.19} \]
where it is convenient to write
\[ Q(\delta x, \delta y) = A\,\delta x^2 + 2B\,\delta x\,\delta y + C\,\delta y^2 \tag{4.20} \]
for the quadratic expression in the square brackets.
Let's look at the values of Q around a circle surrounding the stationary point, i.e. let
\[ \delta x = \delta s\cos\theta \quad \text{and} \quad \delta y = \delta s\sin\theta \]
where θ is an angle we can vary. Note that,
• For a minimum, Q will always be positive ($f > z^*$).
• For a maximum, Q will always be negative ($f < z^*$).
• For a saddle, Q will change sign around the circle.
Substituting in, we get:
\[ Q(\delta s\cos\theta, \delta s\sin\theta) = \delta s^2\left(A\cos^2\theta + 2B\cos\theta\sin\theta + C\sin^2\theta\right) = \delta s^2\left[\frac{1}{2}A(1 + \cos 2\theta) + B\sin 2\theta + \frac{1}{2}C(1 - \cos 2\theta)\right]. \]
After a few more trig identities (see Appendix G) we get
\[ Q(\delta s\cos\theta, \delta s\sin\theta) = \delta s^2\left[\frac{1}{2}(A + C) + R\cos(2\theta - \phi)\right] \]
where R > 0 and
\[ R^2 = \frac{1}{4}(A + C)^2 + (B^2 - AC) \]
and the angle φ is such that $\frac{1}{2}(A - C) = R\cos\phi$ and $B = R\sin\phi$.
Consider
\[ \frac{Q}{\delta s^2} = \left[\frac{1}{2}(A + C) + R\cos(2\theta - \phi)\right]. \]
As θ varies, this oscillates with amplitude R about an average value of $\frac{1}{2}(A + C)$. Hence, if
\[ R > \frac{1}{2}|A + C| \]
then the oscillations are large enough to change the sign of Q as θ is varied, giving a saddle point. This condition simplifies to
\[ R^2 > \frac{1}{4}(A + C)^2, \quad \text{i.e.} \quad AC - B^2 < 0. \]
Hence, the condition for a saddle point is:
\[ f_{xx}f_{yy} - f_{xy}^2 < 0 \qquad \text{Condition for a saddle point.} \]
If it is not a saddle point, then it is a maximum or a minimum. We can determine which by looking at the sign of $f_{xx}$ (or $f_{yy}$). Appendix G gives more details. Hence:
\[ f_{xx}f_{yy} - f_{xy}^2 > 0, \quad f_{xx} > 0 \qquad \text{Condition for a minimum.} \]
\[ f_{xx}f_{yy} - f_{xy}^2 > 0, \quad f_{xx} < 0 \qquad \text{Condition for a maximum.} \]
Note that in BOTH cases the function $f_{xx}f_{yy} - f_{xy}^2$ must be POSITIVE at the SP.
Example:
Locate and classify the stationary points of the function $f(x, y) = 12x^3 + y^3 + 12x^2y - 75y$.
Solution
$f_x = 36x^2 + 24xy = 12x(3x + 2y)$, $f_y = 3y^2 + 12x^2 - 75 = 3(4x^2 + y^2 - 25)$. SP's given by $f_x = 0$, $f_y = 0$.
For $f_x = 0$ we have
\[ x(3x + 2y) = 0 \]
so EITHER x = 0 OR 3x + 2y = 0, i.e. $y = -\frac{3}{2}x$.
If x = 0 then $f_y = 3(y^2 - 25) = 0 \Rightarrow y = \pm 5$.
If $y = -\frac{3}{2}x$ then
\[ f_y = 3(4x^2 + y^2 - 25) = 3\left(4x^2 + \frac{9}{4}x^2 - 25\right) = 3\left(\frac{25}{4}x^2 - 25\right) = \frac{75}{4}(x^2 - 4) \]
and so $x = \pm 2$, $y = -\frac{3}{2}x = \mp 3$.
So there are 4 SP's, (0, 5), (0, −5), (2, −3) and (−2, 3), with respective SV's −250, 250, 150 and −150.
The 2nd order partial derivatives are
\[ f_{xx} = 72x + 24y = 24(3x + y), \quad f_{xy} = 24x, \quad f_{yy} = 6y. \]
At (0, 5), $f_{xx} = 120 > 0$, $f_{xy} = 0$, $f_{yy} = 30$, $H^* = f_{xx}f_{yy} - f_{xy}^2 = 3600 > 0$, so this SP is a minimum.
At (0, −5), $f_{xx} = -120 < 0$, $f_{xy} = 0$, $f_{yy} = -30$, $H^* = 3600 > 0$, so this SP is a maximum.
At (2, −3), $f_{xx} = 72$, $f_{xy} = 48$, $f_{yy} = -18$, $H^* = -72 \times 18 - 48^2 < 0$, so this SP is a saddle point.
At (−2, 3), $f_{xx} = -72$, $f_{xy} = -48$, $f_{yy} = 18$, $H^* = -72 \times 18 - 48^2 < 0$, so this SP is a saddle point.
This is a sketch of the contours. For the connectivity, it helps to note the stationary values.
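The classification in this example follows a fixed recipe (evaluate the three second derivatives, form $f_{xx}f_{yy} - f_{xy}^2$, then inspect the signs), so it is easy to mechanise. The sketch below applies the recipe to the four stationary points found above; the function names are illustrative, and the second derivatives are typed in by hand from the solution rather than computed symbolically.

```python
def f(x, y):
    return 12 * x**3 + y**3 + 12 * x**2 * y - 75 * y

def second_derivatives(x, y):
    # f_xx, f_xy, f_yy for the function above
    return 72 * x + 24 * y, 24 * x, 6 * y

def classify(x, y):
    fxx, fxy, fyy = second_derivatives(x, y)
    H = fxx * fyy - fxy**2
    if H < 0:
        return "saddle"
    if H == 0:
        return "degenerate"      # the test is inconclusive in this case
    return "minimum" if fxx > 0 else "maximum"

points = [(0, 5), (0, -5), (2, -3), (-2, 3)]
results = {p: (classify(*p), f(*p)) for p in points}
for p, (kind, value) in results.items():
    print(p, kind, value)
```

The output lists each stationary point with its type and stationary value, which can be compared line by line with the hand calculation.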
4.4.3 Definition: Hessian
The function $f_{xx}f_{yy} - f_{xy}^2$ is called the Hessian $H(x, y)$ of f. It may be written as a 2 × 2 determinant:
\[ H(x, y) = \begin{vmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{vmatrix}. \]
4.4.4 Definition: Degenerate stationary point
A stationary point $(x^*, y^*)$ at which $H^* = H(x^*, y^*) = 0$ is said to be degenerate.
Such stationary points will be excluded from this course. They require further investigation, involving cubic or higher order terms in the Taylor expansion.
4.5 Lagrange Multipliers
4.5.1 Introductory example
Suppose we want to find the area of the smallest circle centred on the origin which touches the line $y = -3x + 4$. The diagram shows three candidate circles: the smallest is too small as it fails to touch the line, the largest too large (we can do better); the ideal circle just touches the line in one place. Note that this means the line is the tangent line to the circle at that point, and the normal to the circle is also normal to the line.
Now we can write the question as: minimise $f(x, y) = \pi(x^2 + y^2)$ such that $g(x, y) = y + 3x - 4 = 0$. Each candidate circle is a contour $f(x, y) =$ constant and the line is of the form $g(x, y) =$ constant, so to make the two normals parallel we put
\[ \nabla f = \lambda\nabla g \]
and we retain the constraint $g(x, y) = 0$.
This procedure yields the candidate points for a maximum or minimum of f given the constraint that g = 0. The quantity λ is called a Lagrange multiplier.
In this case we have $\nabla f = (2\pi x, 2\pi y)$ and $\nabla g = (3, 1)$ so we put $2\pi x = 3\lambda$ and $2\pi y = \lambda$, and the constraint $g(x, y) = 0$ gives
\[ \frac{\lambda}{2\pi} + \frac{9\lambda}{2\pi} - 4 = 0, \qquad \lambda = \frac{4\pi}{5}, \]
and so x = 6/5, y = 2/5.
Hence, the area of the circle = π(36/25 + 4/25) = π(40/25) = 8π/5.
4.5.2 General principle
In general, to minimise or maximise $f(x_1, \dots, x_n)$ subject to a constraint $g(x_1, \dots, x_n) =$ constant, we set
\[ \nabla f = \lambda\nabla g \]
where the unknown λ is called the Lagrange multiplier. $\nabla f = \lambda\nabla g$ and the constraint g = constant give n + 1 equations altogether, enough in principle to solve for the n + 1 unknowns, $x_1, \dots, x_n$ and λ.
Why does this work? With reference to the picture above, we can make the following two comments:
1. The smallest value of f on the contour g = constant is where this contour just touches a contour f = constant. For contours to just touch, the perpendiculars to the contours must be parallel, so $\nabla f = \lambda\nabla g$.
2. To minimise f along the contour g = constant we require only that the component of $\nabla f$ along the contour is zero; we are only interested in changes of f along the contour. So, $\nabla f$ is allowed to have a component perpendicular to the contour and we can set $\nabla f = \lambda\nabla g$ for some unknown λ.
Example 1
Minimize $f(x, y, z) = x^2 + y^2 + z^2$ subject to the constraint $x - 2y + z = 3$.
Solution
We write the constraint condition as $g(x, y, z) = x - 2y + z = 3$ so we can calculate
\[ \nabla f = (2x, 2y, 2z), \qquad \nabla g = (1, -2, 1) \]
and to have $\nabla f = \lambda\nabla g$ requires:
\[ 2x = \lambda; \quad 2y = -2\lambda; \quad 2z = \lambda. \]
We substitute these into the constraint condition:
\[ 0 = x - 2y + z - 3 = \frac{1}{2}\lambda + 2\lambda + \frac{1}{2}\lambda - 3 = 3\lambda - 3 \;\Rightarrow\; \lambda = 1 \]
to determine the point:
\[ (x, y, z) = \left(\frac{1}{2}, -1, \frac{1}{2}\right) \]
at which $f(x, y, z) = \left(\frac{1}{2}\right)^2 + 1^2 + \left(\frac{1}{2}\right)^2 = 3/2$.
Warning: This procedure does not tell us the difference between a minimum and a maximum of f, so you may need to check some other values to verify you have the correct solution. In this case, taking (3, 0, 0) (which satisfies the constraint) gives f = 9 which is greater than the value at our stationary point, so we have found a minimum.
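The steps of Example 1 can be written out as a short calculation, together with the kind of spot check recommended in the warning. This is a sketch under a stated assumption: the function `lagrange_point` is a hypothetical helper that only handles this particular objective ($x^2 + y^2 + z^2$) with a linear constraint, where the Lagrange equations can be solved in closed form.

```python
def f(x, y, z):
    return x**2 + y**2 + z**2

def lagrange_point(n, c):
    """Stationary point of x^2 + y^2 + z^2 subject to n . (x, y, z) = c.

    grad f = lambda * grad g gives (x, y, z) = (lambda / 2) * n, and
    substituting into the constraint gives lambda = 2c / |n|^2.
    """
    lam = 2 * c / sum(a * a for a in n)
    point = tuple(lam * a / 2 for a in n)
    return lam, point

lam, p = lagrange_point((1, -2, 1), 3)
print(lam, p, f(*p))

# Spot check: other points on the plane x - 2y + z = 3 should give larger f.
others = [(3, 0, 0), (0, 0, 3), (1, -1, 0), (0.6, -0.9, 0.6)]
for q in others:
    print(q, f(*q))
```

Every comparison point lies on the constraint plane and gives a larger value of f, which supports the conclusion that the stationary point is a minimum; a handful of samples is a check, not a proof.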
Example 2
Find the maximum area of a rectangle with perimeter P.
Solution
We need to maximise the area $A = xy$ subject to the constraint $g(x, y) = 2x + 2y = P$. So $\nabla A = \lambda\nabla g$ gives
\[ y = 2\lambda, \qquad x = 2\lambda \]
so y = x. Substituting into the constraint, we have
\[ 4x = P, \qquad x = \frac{P}{4} = y. \]
Hence, $A = P^2/16$.
This is a maximum, as can be seen by e.g. choosing x = P/6 and y = P/3 (which satisfies the constraint) which gives $A = P^2/18$.
Extension. If there is more than one constraint, e.g. g = 0, h = 0, then we use more than one Lagrange multiplier: solve $\nabla f = \lambda\nabla g + \mu\nabla h$ for x, y, z, λ and μ subject to g = 0, h = 0.
4.6 The Chain Rule
We have seen (section 4.1.3) the chain rule for $f\{g(x_1, \dots, x_n)\}$.
Consider $f(x, y)$ where x and y are functions of another variable t (i.e. $x(t)$ and $y(t)$). If t increases by Δt then x increases by $\Delta x = \Delta t\,\frac{dx}{dt}$ and y increases by $\Delta y = \Delta t\,\frac{dy}{dt}$ so the change in position is
\[ \Delta\mathbf{r} = (\Delta x, \Delta y) = \Delta t\left(\frac{dx}{dt}, \frac{dy}{dt}\right) = \Delta t\,\frac{d\mathbf{r}}{dt} \]
where
\[ \frac{d\mathbf{r}}{dt} = \left(\frac{dx}{dt}, \frac{dy}{dt}\right) \]
is the rate of change of position with t.
Now, recall from our work on directional derivatives: the change in f for this small change in position is
\[ \Delta f = \nabla f \cdot \Delta\mathbf{r} = \nabla f \cdot \frac{d\mathbf{r}}{dt}\,\Delta t. \]
If we rearrange and let Δt → 0 we obtain the chain rule for a function of two variables, which is,
\[ \frac{df}{dt} = \nabla f \cdot \frac{d\mathbf{r}}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}. \tag{4.21} \]
NB f depends on x and y [so partial derivatives $\partial f/\partial x$, $\partial f/\partial y$] whilst x and y depend on just t [so ordinary derivatives $dx/dt$, $dy/dt$]. Thus f depends on t and has the ordinary derivative $df/dt$ given by the chain rule (4.21).
Example
If $f(x, y) = x^2 + y^2$, where $x = \sin t$, $y = t^3$, then
\[ \frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} = 2x\cos t + 2y\,3t^2 = 2\sin t\cos t + 6t^5. \]
Of course in this simple example we can check the result by substituting for x and y before differentiation to give $f(t) = (\sin t)^2 + (t^3)^2$, so $\frac{df}{dt} = 2\sin t\cos t + 6t^5$ as before.
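The chain rule extends directly to functions of three or more variables, and to include implicit differentiation.
Example
If $f(x, y, z) = \ln(2x - 3y + 4z)$, where $x = e^t$, $y = \ln t$, $z = \cosh t$, then
\[ \frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} + \frac{\partial f}{\partial z}\frac{dz}{dt} \]
\[ = \frac{2e^t}{2x - 3y + 4z} - \frac{3(1/t)}{2x - 3y + 4z} + \frac{4\sinh t}{2x - 3y + 4z} \]
\[ = \frac{2e^t - 3/t + 4\sinh t}{2e^t - 3\ln t + 4\cosh t}. \]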
4.6.1 Extended chain rule
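For $f(x, y)$ suppose that x and y depend on two variables s and t (e.g. polar co-ordinates, $x = s\cos t$, $y = s\sin t$). Then
\[ \frac{\partial\mathbf{r}}{\partial s} = \left(\frac{\partial x}{\partial s}, \frac{\partial y}{\partial s}\right) \quad \text{and} \quad \frac{\partial\mathbf{r}}{\partial t} = \left(\frac{\partial x}{\partial t}, \frac{\partial y}{\partial t}\right) \tag{4.22} \]
are two vectors representing the rate of change of position with s and t respectively.
Changing either s or t changes x and y, so changes f, i.e. producing $\partial f/\partial s$ and $\partial f/\partial t$ according to the extended chain rule
\[ \frac{\partial f}{\partial s} = \nabla f \cdot \frac{\partial\mathbf{r}}{\partial s} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial s}, \tag{4.23} \]
\[ \frac{\partial f}{\partial t} = \nabla f \cdot \frac{\partial\mathbf{r}}{\partial t} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial t}. \tag{4.24} \]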
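Example
$f(x, y) = x^2y^3$, where $x = s - t^2$, $y = s + 2t$. Then
\[ \frac{\partial f}{\partial x} = 2xy^3 \quad \text{and} \quad \frac{\partial f}{\partial y} = 3x^2y^2 \]
and
\[ \frac{\partial f}{\partial s} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial s} = 2xy^3 \cdot 1 + 3x^2y^2 \cdot 1 = xy^2(2y + 3x) = (s - t^2)(s + 2t)^2(5s + 4t - 3t^2), \]
\[ \frac{\partial f}{\partial t} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial t} = 2xy^3(-2t) + 3x^2y^2(2) = 2xy^2(-2ty + 3x) = 2(s - t^2)(s + 2t)^2(3s - 2st - 7t^2). \]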
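The factorised results of this example are easy to mistype, so a numerical cross-check is worthwhile. The sketch below substitutes x and y into f, differentiates the composite by finite differences, and compares with the two closed forms; the helper names and the sample point (s, t) = (1.5, 0.5) are choices made here for illustration.

```python
def f_st(s, t):
    # f(x, y) = x^2 y^3 with x = s - t^2 and y = s + 2t substituted
    x = s - t**2
    y = s + 2 * t
    return x**2 * y**3

def df_ds(s, t):
    # Closed form from the extended chain rule (4.23)
    return (s - t**2) * (s + 2 * t)**2 * (5 * s + 4 * t - 3 * t**2)

def df_dt(s, t):
    # Closed form from the extended chain rule (4.24)
    return 2 * (s - t**2) * (s + 2 * t)**2 * (3 * s - 2 * s * t - 7 * t**2)

def numeric(g, s, t, wrt, h=1e-6):
    """Central-difference partial derivative of g with respect to 's' or 't'."""
    if wrt == "s":
        return (g(s + h, t) - g(s - h, t)) / (2 * h)
    return (g(s, t + h) - g(s, t - h)) / (2 * h)

s0, t0 = 1.5, 0.5
print(df_ds(s0, t0), numeric(f_st, s0, t0, "s"))
print(df_dt(s0, t0), numeric(f_st, s0, t0, "t"))
```

Each printed pair should agree to several decimal places.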
Examples
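(i) If f is a function of x and y, where $x = e^s\cos t$, $y = e^s\sin t$, prove that $\sin t\,\frac{\partial f}{\partial s} + \cos t\,\frac{\partial f}{\partial t} = e^s\frac{\partial f}{\partial y}$.
(ii) If f is a function of z/x and x/y, prove that $x\frac{\partial f}{\partial x} + y\frac{\partial f}{\partial y} + z\frac{\partial f}{\partial z} = 0$.
Solutions
(i) If $x = e^s\cos t$ and $y = e^s\sin t$ then
\[ \frac{\partial x}{\partial s} = e^s\cos t, \quad \frac{\partial y}{\partial s} = e^s\sin t, \quad \frac{\partial x}{\partial t} = -e^s\sin t, \quad \frac{\partial y}{\partial t} = e^s\cos t. \]
It follows that
\[ \frac{\partial f}{\partial s} = \frac{\partial f}{\partial x}e^s\cos t + \frac{\partial f}{\partial y}e^s\sin t, \]
\[ \frac{\partial f}{\partial t} = -\frac{\partial f}{\partial x}e^s\sin t + \frac{\partial f}{\partial y}e^s\cos t. \]
Combining these two equations (multiply the first by sin t, the second by cos t, and add, so that the $\partial f/\partial x$ terms cancel) we have
\[ \sin t\,\frac{\partial f}{\partial s} + \cos t\,\frac{\partial f}{\partial t} = e^s\frac{\partial f}{\partial y}. \]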
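(ii) Let u = z/x and v = x/y. Then $\frac{\partial u}{\partial x} = -z/x^2$, $\frac{\partial u}{\partial y} = 0$, $\frac{\partial u}{\partial z} = 1/x$, $\frac{\partial v}{\partial x} = 1/y$, $\frac{\partial v}{\partial y} = -x/y^2$ and $\frac{\partial v}{\partial z} = 0$.
\[ \frac{\partial f}{\partial x} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial x} = -\frac{\partial f}{\partial u}\frac{z}{x^2} + \frac{\partial f}{\partial v}\frac{1}{y}, \]
\[ \frac{\partial f}{\partial y} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial y} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial y} = -\frac{\partial f}{\partial v}\frac{x}{y^2}, \]
\[ \frac{\partial f}{\partial z} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial z} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial z} = \frac{\partial f}{\partial u}\frac{1}{x}, \]
and so
\[ x\frac{\partial f}{\partial x} + y\frac{\partial f}{\partial y} + z\frac{\partial f}{\partial z} = 0 \]
as required.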
4.6.2 Definition of the Jacobian
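We could write the extended chain rule (4.23) and (4.24) in matrix-vector form as follows:
\[ \begin{pmatrix} \dfrac{\partial f}{\partial s} \\[2ex] \dfrac{\partial f}{\partial t} \end{pmatrix} = \begin{pmatrix} \dfrac{\partial x}{\partial s} & \dfrac{\partial y}{\partial s} \\[2ex] \dfrac{\partial x}{\partial t} & \dfrac{\partial y}{\partial t} \end{pmatrix} \begin{pmatrix} \dfrac{\partial f}{\partial x} \\[2ex] \dfrac{\partial f}{\partial y} \end{pmatrix} \tag{4.25} \]
which leads us naturally to the Jacobian matrix
\[ J = \begin{pmatrix} \dfrac{\partial x}{\partial s} & \dfrac{\partial y}{\partial s} \\[2ex] \dfrac{\partial x}{\partial t} & \dfrac{\partial y}{\partial t} \end{pmatrix} \tag{4.26} \]
whose determinant is the Jacobian of the transformation from x, y to s, t:
\[ \frac{\partial(x, y)}{\partial(s, t)} = \begin{vmatrix} \dfrac{\partial x}{\partial s} & \dfrac{\partial y}{\partial s} \\[2ex] \dfrac{\partial x}{\partial t} & \dfrac{\partial y}{\partial t} \end{vmatrix} = \frac{\partial x}{\partial s}\frac{\partial y}{\partial t} - \frac{\partial y}{\partial s}\frac{\partial x}{\partial t}. \tag{4.27} \]
NB The rows of the matrix J are the vectors $\frac{\partial\mathbf{r}}{\partial s} = \left(\frac{\partial x}{\partial s}, \frac{\partial y}{\partial s}\right)$ and $\frac{\partial\mathbf{r}}{\partial t} = \left(\frac{\partial x}{\partial t}, \frac{\partial y}{\partial t}\right)$ expressing the rate of change of position (x, y) with s and t respectively. Geometrically, the magnitude of the Jacobian is $\left|\frac{\partial(x, y)}{\partial(s, t)}\right| = \left|\frac{\partial\mathbf{r}}{\partial s} \times \frac{\partial\mathbf{r}}{\partial t}\right| = \left|\frac{\partial\mathbf{r}}{\partial s}\right|\left|\frac{\partial\mathbf{r}}{\partial t}\right|\sin\theta$, where θ is the angle between $\frac{\partial\mathbf{r}}{\partial s}$ and $\frac{\partial\mathbf{r}}{\partial t}$. Hence, the magnitude of the Jacobian is the area of the parallelogram whose sides are $\frac{\partial\mathbf{r}}{\partial s}$ and $\frac{\partial\mathbf{r}}{\partial t}$.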
4.6.3 Change of Variables
Suppose we have x and y expressed in terms of two other variables s and t. How would we go about finding expressions for s and t in terms of x and y? Is this always possible?
Carrying out a change of variables.
The procedure for reversing a change of variables is to look for combinations of x and y which eliminate all dependence on one of s and t. This is best shown by example.
Example
If $x = s^2t$ and $y = t^2/s$, find s and t in terms of x and y.
Solution
We begin by looking for a combination of x and y that has no t-dependence. From the first equation, we can write $t = x/s^2$ so we substitute this into the second equation:
\[ y = (x/s^2)^2/s = x^2/s^5 \]
and manipulate this result to give s:
\[ s^5 = x^2/y \;\Rightarrow\; s = x^{2/5}y^{-1/5}. \]
We can then substitute this into either of the definitions to eliminate s; we choose the second:
\[ y = t^2x^{-2/5}y^{1/5} \;\Rightarrow\; t = x^{1/5}y^{2/5} \]
so the full solution is $s = x^{2/5}y^{-1/5}$ and $t = x^{1/5}y^{2/5}$.
Example
If $x = s\cos t$ and $y = s\sin t$, find s and t in terms of x and y.
Solution
Here we start by eliminating t. The simplest way to do this is to use the identity $\sin^2 t + \cos^2 t = 1$:
\[ x^2 + y^2 = s^2\cos^2 t + s^2\sin^2 t = s^2 \;\Rightarrow\; s = (x^2 + y^2)^{1/2}, \]
and to eliminate s we simply divide the two expressions:
\[ y/x = s\sin t/(s\cos t) = \tan t \;\Rightarrow\; t = \tan^{-1}(y/x). \]
Example
If $x = s^2t$ and $y = s^4t^2 + 2s^2t + 4$, express s and t in terms of x and y.
Solution
We try to eliminate t by using the x-equation:
\[ x = s^2t \;\Rightarrow\; t = xs^{-2} \]
in the y-equation:
\[ y = s^4t^2 + 2s^2t + 4 = s^4[xs^{-2}]^2 + 2s^2[xs^{-2}] + 4 = x^2 + 2x + 4 \]
and we find that we have no s-dependence in this equation so we can't rearrange to find s.
In this case it is not possible to determine s and t from values of x and y. Since y can be written in terms of x only, y and x are not independent.
Jacobian and change of variables
In the example above, we could not find s and t from x and y. The critical quantity here is the Jacobian:
\[ \frac{\partial(x, y)}{\partial(s, t)} = \frac{\partial x}{\partial s}\frac{\partial y}{\partial t} - \frac{\partial y}{\partial s}\frac{\partial x}{\partial t} = (2st)(2s^4t + 2s^2) - (4s^3t^2 + 4st)(s^2) = 4s^5t^2 + 4s^3t - 4s^5t^2 - 4s^3t = 0. \]
In general, it is only possible to change variables and change back again if the Jacobian of the transformation is not zero.
When the Jacobian is zero, the area of the parallelogram whose sides are $\frac{\partial\mathbf{r}}{\partial s}$ and $\frac{\partial\mathbf{r}}{\partial t}$ is zero, which means $\frac{\partial\mathbf{r}}{\partial s}$ and $\frac{\partial\mathbf{r}}{\partial t}$ are parallel. Changes in either s or t give changes in the position (x, y) in the same direction, so y and x are not independent, and s and t cannot be uniquely determined from x and y.
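The Jacobian test can be tried on the three changes of variables considered in this section. The sketch below estimates $\partial(x, y)/\partial(s, t)$ by central differences at one sample point; the helper name `jacobian` and the sample point are assumptions made here, and the value $5t^2$ quoted for the first example is worked out from (4.27) for this illustration rather than taken from the notes.

```python
import math

def jacobian(x, y, s, t, h=1e-6):
    """Central-difference estimate of d(x, y)/d(s, t) = x_s y_t - y_s x_t."""
    x_s = (x(s + h, t) - x(s - h, t)) / (2 * h)
    x_t = (x(s, t + h) - x(s, t - h)) / (2 * h)
    y_s = (y(s + h, t) - y(s - h, t)) / (2 * h)
    y_t = (y(s, t + h) - y(s, t - h)) / (2 * h)
    return x_s * y_t - y_s * x_t

s0, t0 = 1.3, 0.7   # arbitrary sample point

# x = s cos t, y = s sin t: the Jacobian is s, non-zero away from s = 0.
j_polar = jacobian(lambda s, t: s * math.cos(t),
                   lambda s, t: s * math.sin(t), s0, t0)

# x = s^2 t, y = t^2 / s: the Jacobian is 5 t^2, so the change is invertible.
j_first = jacobian(lambda s, t: s**2 * t,
                   lambda s, t: t**2 / s, s0, t0)

# x = s^2 t, y = s^4 t^2 + 2 s^2 t + 4: y depends on x alone, Jacobian is 0.
j_dependent = jacobian(lambda s, t: s**2 * t,
                       lambda s, t: s**4 * t**2 + 2 * s**2 * t + 4, s0, t0)

print(j_polar, j_first, j_dependent)
```

The first two values are clearly non-zero, matching the fact that s and t could be recovered in those examples, while the third is zero up to rounding error.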