
Chapter 4

Several-variable calculus

4.1 Derivatives of Functions of Several Variables

4.1.1 Functions of Several Variables

• A function $f$ of $n$ variables $(x_1, x_2, \ldots, x_n)$ in $\mathbb{R}^n$ is an entity that operates on these variables to produce a real number $y = f(x_1, x_2, \ldots, x_n)$.

• $x_1, x_2, \ldots, x_n$ are called the independent variables, and $y$ the dependent variable.

• We write $f : \mathbb{R}^n \to \mathbb{R}$ to indicate that $f$ maps $\mathbb{R}^n$ (or a domain within $\mathbb{R}^n$) into $\mathbb{R}$.

4.1.2 Geometric Interpretation

For a function of two variables, $f(x, y)$, consider $(x, y)$ as defining a point $P$ in the $xy$-plane. Let the value of $f(x, y)$ be taken as the length $PP'$ drawn parallel to the $z$-axis (or the height of the point $P'$ above the plane). Then as $P$ moves in the $xy$-plane, $P'$ maps out a surface in space whose equation is $z = f(x, y)$.

Just as a function of one variable has a graph which is cut only once by each vertical line (constant $x$), here the surface can be cut only once by each vertical line (constant $x$ and $y$).

Example: $f(x, y) = 6 - 2x - 3y$

The surface $z = 6 - 2x - 3y$, i.e. $2x + 3y + z = 6$, is a plane which intersects:

the $x$-axis where $y = z = 0$, i.e. $x = 3$;
the $y$-axis where $x = z = 0$, i.e. $y = 2$;
the $z$-axis where $x = y = 0$, i.e. $z = 6$.

Example: $f(x, y) = x^2 - y^2$

In the plane $x = 0$, there is a maximum at $y = 0$; in the plane $y = 0$, there is a minimum at $x = 0$. The whole surface is shaped like a horse's saddle, and the point $(0, 0)$ is called a saddle point (of which, more later).

[Figure: the surface $z = x^2 - y^2$, with axes $x$ and $y$.]

Example: $f(x, y) = x^2 + y^2$

The intersection with the plane $x = 0$ is the parabola $z = y^2$ and with the plane $y = 0$ is the parabola $z = x^2$. This surface is symmetric about the $z$-axis, and is a paraboloid (parabolic bowl).

[Figure: the surface $z = x^2 + y^2$, with axes $x$ and $y$.]

4.1.3 Partial Derivatives

Given a function of several variables, we could choose to hold all but one of these variables fixed at arbitrarily chosen values, thereby obtaining a function of one variable (the remaining one), which could then be differentiated.

Definition

Given a function $f(x_1, \ldots, x_n)$ of $n$ variables and an integer $k$ between 1 and $n$, the partial derivative

\[ \frac{\partial f}{\partial x_k} = f_{x_k} = \partial_{x_k} f \]

of $f$ with respect to the variable $x_k$ is the derivative of $f$ with respect to $x_k$ only, while the remaining $n - 1$ variables are all held fixed. Explicitly,

\[ \frac{\partial f}{\partial x_k} \equiv f_{x_k} = \lim_{\delta x_k \to 0} \left[ \frac{f(x_1, \ldots, x_{k-1}, x_k + \delta x_k, x_{k+1}, \ldots, x_n) - f(x_1, \ldots, x_n)}{\delta x_k} \right], \tag{4.1} \]

itself a function of $(x_1, \ldots, x_n)$. In practice the variables held fixed act as constants:

\[ f(x) = 3x^4 + \sin(2x) \;\Rightarrow\; \frac{df}{dx} = 12x^3 + 2\cos(2x), \]

\[ f(x, y) = yx^4 + \sin(yx) \;\Rightarrow\; \frac{\partial f}{\partial x} = 4yx^3 + y\cos(yx). \]
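The limit in (4.1) can also be approximated numerically. The following is a minimal sketch in Python, not part of the original notes: the helper name `partial`, the step size `h` and the test point are illustrative assumptions. It uses a central difference (a symmetric variant of the one-sided quotient in (4.1)) and compares the estimate with the analytic result for $f(x, y) = yx^4 + \sin(yx)$.

```python
import math

def partial(f, point, k, h=1e-6):
    """Central-difference estimate of the partial derivative of f with
    respect to its k-th argument (0-based), evaluated at `point`."""
    up = list(point)
    down = list(point)
    up[k] += h      # only the k-th variable is changed;
    down[k] -= h    # all the others are held fixed
    return (f(*up) - f(*down)) / (2 * h)

def f(x, y):
    return y * x**4 + math.sin(y * x)

def fx_exact(x, y):
    # Analytic partial derivative from the text: y is treated as a constant.
    return 4 * y * x**3 + y * math.cos(y * x)

x0, y0 = 1.2, 0.7
print(partial(f, (x0, y0), 0), fx_exact(x0, y0))  # the two values should agree closely
```

Passing `k=1` instead estimates $f_y$ in the same way.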

Geometrical interpretation of partial derivatives in the case n = 2

Recall that the graph of $f$ is the surface $z = f(x, y)$ with the $z$ coordinate measured vertically upwards. The cross section of this surface cut by a vertical plane $y = \text{constant}$ is a curve whose slope (gradient) is the partial derivative $f_x$ (see figure). Similarly, $f_y$ is the slope of the cross section of the graph by a vertical plane $x = \text{constant}$.

One may interpret the partial derivatives $f_x$ and $f_y$ as the slopes encountered by “walking” over the surface in the $x$ and $y$ directions respectively.

Remark

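It is obvious from the definition that the partial derivative with respect to a particular variable obeys the same sum, product and quotient rules D II - D IV as the ordinary (single variable) derivative, i.e., if $u$ and $v$ are both functions of $x_1, \ldots, x_n$, then, for $k = 1, \ldots, n$,

\[ \frac{\partial}{\partial x_k}(u + v) = \frac{\partial u}{\partial x_k} + \frac{\partial v}{\partial x_k}, \tag{4.2} \]

\[ \frac{\partial}{\partial x_k}(uv) = u\frac{\partial v}{\partial x_k} + v\frac{\partial u}{\partial x_k}, \tag{4.3} \]

\[ \frac{\partial}{\partial x_k}\left(\frac{u}{v}\right) = \frac{1}{v^2}\left(v\frac{\partial u}{\partial x_k} - u\frac{\partial v}{\partial x_k}\right) \quad (v \neq 0). \tag{4.4} \]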

Corresponding to D I, we have the result that

$f(x_1, \ldots, x_n)$ is independent of $x_k$ iff $\frac{\partial f}{\partial x_k}$ is zero for all $(x_1, \ldots, x_n)$, (4.5)

and the consequent result

$f(x_1, \ldots, x_n) = \text{constant}$ iff the $n$ partial derivatives are all zero for all $(x_1, \ldots, x_n)$. (4.6)

Corresponding to the chain rule D V, we have the result that, if $g$ is a function of $x_1, \ldots, x_n$ and $f$ a function of a single variable, then

\[ \frac{\partial}{\partial x_k}\left[ f\{g(x_1, \ldots, x_n)\} \right] = f'\{g(x_1, \ldots, x_n)\}\,\frac{\partial g}{\partial x_k}. \tag{4.7} \]

For example, if $f(g) = \sin g$ and $g(x, y) = x^2 + xy$ then

\[ f(x, y) = \sin(x^2 + xy), \qquad f_x = (2x + y)\cos(x^2 + xy). \]

A more powerful and very important generalization of the chain rule is coming up later in this chapter.

Example

Calculate the partial derivatives of the functions:

(a) $f(x, y) = x^2 + 2xy^2 + y^3$;

(b) $f(x, y, z) = xz + e^{yz} + \sin(xy)$.

Solution

(a) Holding $y$ constant gives $\frac{\partial f}{\partial x} = 2x + 2y^2 + 0$.

Holding $x$ constant gives $\frac{\partial f}{\partial y} = 0 + 4xy + 3y^2$.

(b) Holding both $y$ and $z$ constant gives $f_x = z + 0 + y\cos(xy)$.
Holding both $x$ and $z$ constant gives $f_y = 0 + ze^{yz} + x\cos(xy)$.
Holding both $x$ and $y$ constant gives $f_z = x + ye^{yz} + 0$.

Example (implicit partial differentiation)

If $z$ is a function of two independent variables $x$ and $y$, and $z$ satisfies $xz + \ln z = 2x + 3y$, find $\frac{\partial z}{\partial x}$ in terms of $x$, $y$ and $z$.

Solution

Differentiating each term in the equation with respect to $x$, holding $y$ constant, and treating $z$ as a function of $x$, we obtain

\[ x\frac{\partial z}{\partial x} + z + \frac{1}{z}\frac{\partial z}{\partial x} = 2 \]

so that

\[ \frac{\partial z}{\partial x} = \frac{z(2 - z)}{1 + xz}. \]

4.1.4 Second and Higher Order Partial Derivatives

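Since $\frac{\partial f}{\partial x} = f_x$ and $\frac{\partial f}{\partial y} = f_y$ are themselves functions of $x$ and $y$, they themselves have partial derivatives, for which we use the notations

\[ \frac{\partial^2 f}{\partial x^2} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial x}\right) = (f_x)_x = f_{xx}, \tag{4.8} \]

\[ \frac{\partial^2 f}{\partial y\,\partial x} = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial x}\right) = (f_x)_y = f_{xy}, \tag{4.9} \]

\[ \frac{\partial^2 f}{\partial x\,\partial y} = \frac{\partial}{\partial x}\left(\frac{\partial f}{\partial y}\right) = (f_y)_x = f_{yx}, \tag{4.10} \]

\[ \frac{\partial^2 f}{\partial y^2} = \frac{\partial}{\partial y}\left(\frac{\partial f}{\partial y}\right) = (f_y)_y = f_{yy}. \tag{4.11} \]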

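This notation extends obviously to higher order derivatives and to functions of three or more variables. For obvious reasons, $f_{xy}$ and $f_{yx}$ are called mixed derivatives.

Example

If $f(x, y) = x^4y^2 - x^2y^6$ then

\[ \frac{\partial f}{\partial x} = 4x^3y^2 - 2xy^6, \qquad \frac{\partial f}{\partial y} = 2x^4y - 6x^2y^5, \]

\[ \frac{\partial^2 f}{\partial x^2} = 12x^2y^2 - 2y^6, \qquad \frac{\partial^2 f}{\partial y\,\partial x} = 8x^3y - 12xy^5, \]

\[ \frac{\partial^2 f}{\partial y^2} = 2x^4 - 30x^2y^4, \qquad \frac{\partial^2 f}{\partial x\,\partial y} = 8x^3y - 12xy^5. \]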

Mixed Derivatives Theorem

If $f_x$, $f_y$ and $f_{xy}$ exist and are continuous, then $f_{yx}$ exists and $f_{xy} = f_{yx}$.

We will not prove this theorem (we have not fully defined the word continuous); but for reasonable functions it will always apply. This means that to calculate a mixed derivative we can calculate in either order. For third-order derivatives the mixed derivatives theorem gives $f_{xxy} = f_{xyx} = f_{yxx}$ and so on (check for yourself in the last example).

Example

Verify the Mixed Derivatives Theorem for the function $f(x, y) = xy^3 + x\sin xy$.

Solution

Using the sum, product and chain rules, we see that $f_x = y^3 + \sin xy + xy\cos xy$, and hence that

\[ f_{xy} = (f_x)_y = 3y^2 + x\cos xy + (x\cos xy - x^2y\sin xy) = 3y^2 + 2x\cos xy - x^2y\sin xy. \]

Similarly, $f_y = 3xy^2 + x^2\cos xy$, so $f_{yx} = (f_y)_x = 3y^2 + (2x\cos xy - x^2y\sin xy) = f_{xy}$.
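The verification above can be cross-checked numerically. This is a rough sketch only, assuming Python and central differences; the helper names `d_dx` and `d_dy`, the step `h` and the sample point are illustrative choices, not part of the notes. It differentiates the analytic $f_x$ with respect to $y$ and the analytic $f_y$ with respect to $x$, and compares both with the closed form found above.

```python
import math

def fx(x, y):
    # f_x for f(x, y) = x*y**3 + x*sin(x*y), as derived in the solution
    return y**3 + math.sin(x * y) + x * y * math.cos(x * y)

def fy(x, y):
    # f_y for the same function
    return 3 * x * y**2 + x**2 * math.cos(x * y)

def fxy_exact(x, y):
    # Closed form of the mixed derivative from the solution
    return 3 * y**2 + 2 * x * math.cos(x * y) - x**2 * y * math.sin(x * y)

def d_dx(g, x, y, h=1e-6):
    return (g(x + h, y) - g(x - h, y)) / (2 * h)

def d_dy(g, x, y, h=1e-6):
    return (g(x, y + h) - g(x, y - h)) / (2 * h)

x0, y0 = 0.8, 1.5
fxy = d_dy(fx, x0, y0)   # differentiate f_x with respect to y
fyx = d_dx(fy, x0, y0)   # differentiate f_y with respect to x
print(fxy, fyx, fxy_exact(x0, y0))  # all three should agree to several digits
```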

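Example

In 3 dimensions, the distance $r$ of a point from the origin is given in terms of its Cartesian coordinates $x$, $y$ and $z$ by $r = \sqrt{x^2 + y^2 + z^2} = (x^2 + y^2 + z^2)^{1/2}$. Show that the function $\phi(x, y, z) = 1/r = (x^2 + y^2 + z^2)^{-1/2}$ obeys Laplace's equation

\[ \frac{\partial^2 \phi}{\partial x^2} + \frac{\partial^2 \phi}{\partial y^2} + \frac{\partial^2 \phi}{\partial z^2} = 0 \quad \text{(except at the origin)}. \]

Solution

By the chain rule,

\[ \frac{\partial \phi}{\partial x} = -\frac{1}{2}(x^2 + y^2 + z^2)^{-3/2}(2x) = -x(x^2 + y^2 + z^2)^{-3/2}. \]

Therefore, by the product and chain rules,

\[ \frac{\partial^2 \phi}{\partial x^2} = -(x^2 + y^2 + z^2)^{-3/2} + (-x)\left[-\frac{3}{2}(x^2 + y^2 + z^2)^{-5/2}(2x)\right] = -(x^2 + y^2 + z^2)^{-3/2} + 3x^2(x^2 + y^2 + z^2)^{-5/2}. \]

Similarly, by symmetry,

\[ \frac{\partial^2 \phi}{\partial y^2} = -(x^2 + y^2 + z^2)^{-3/2} + 3y^2(x^2 + y^2 + z^2)^{-5/2}, \]

\[ \frac{\partial^2 \phi}{\partial z^2} = -(x^2 + y^2 + z^2)^{-3/2} + 3z^2(x^2 + y^2 + z^2)^{-5/2}. \]

Adding the three above equations now gives

\[ \frac{\partial^2 \phi}{\partial x^2} + \frac{\partial^2 \phi}{\partial y^2} + \frac{\partial^2 \phi}{\partial z^2} = -3(x^2 + y^2 + z^2)^{-3/2} + 3(x^2 + y^2 + z^2)(x^2 + y^2 + z^2)^{-5/2} \]

\[ = -3(x^2 + y^2 + z^2)^{-3/2} + 3(x^2 + y^2 + z^2)^{-3/2} = 0. \]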
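As a sanity check on this result, the sum of second derivatives can be estimated with central second differences. The sketch below is an illustration under stated assumptions (Python, a hypothetical helper `laplacian`, step `h = 1e-3`, and an arbitrarily chosen point away from the origin); it is not a substitute for the derivation.

```python
import math

def phi(x, y, z):
    return 1.0 / math.sqrt(x * x + y * y + z * z)

def laplacian(f, p, h=1e-3):
    """Sum of central second differences of f along each coordinate at p."""
    total = 0.0
    for k in range(len(p)):
        up = list(p)
        down = list(p)
        up[k] += h
        down[k] -= h
        total += (f(*up) - 2.0 * f(*p) + f(*down)) / (h * h)
    return total

# Expect a value very close to zero (limited by truncation and rounding error).
print(laplacian(phi, (1.0, 2.0, 2.0)))
```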


4.2 Linear Approximations and Tangents

4.2.1 Tangent to Graph of a Function of One Variable

The tangent to the curve $y = f(x)$ at $A = (a, f(a))$ is the straight line through $A$ with slope $f'(a)$, i.e. it has the equation

\[ y = f(a) + (x - a)f'(a). \tag{4.12} \]

NB1. For this line, $\frac{dy}{dx} = f'(a)$ and $y = f(a)$ at $x = a$.

NB2. The RHS consists of the first two terms of the Taylor expansion of $f$ about $x = a$ (i.e. it is the best linear approximation to $f(x)$ near $x = a$).

Example

Find the linear approximation to $f(x) = 1 + x^2$ near $x = 2$.

Solution

If $f(x) = 1 + x^2$ then $f'(x) = 2x$. At the point $x = 2$ we have $f = 5$ and $f' = 4$. Therefore the linear approximation is

\[ f(x) \approx 5 + 4(x - 2) = 4x - 3. \]

4.2.2 Tangent Plane to Graph of a Function of Two Variables

By analogy with the above, this is the (best) linear approximation to $f$ near $(a, b)$, as given by the first two terms of the two-variable Taylor series (appendix F). It is the plane whose equation is

\[ z = f(a, b) + (x - a)f_x(a, b) + (y - b)f_y(a, b). \tag{4.13} \]

NB: For this plane, $\frac{\partial z}{\partial x} = f_x(a, b)$, $\frac{\partial z}{\partial y} = f_y(a, b)$ and $z = f(a, b)$ at $x = a$ and $y = b$, i.e. we have matched the first derivatives and the value of the function at $(a, b)$.

Example

Find the tangent plane to the surface $z = f(x, y) = x^2 + y^2$ near the point $x = 1$, $y = 2$.

Solution

If $f(x, y) = x^2 + y^2$ then $f_x = 2x$ and $f_y = 2y$. At the point $(1, 2)$ we have $f = 5$, $f_x = 2$ and $f_y = 4$. Thus the tangent plane is

\[ z = 5 + 2(x - 1) + 4(y - 2) = 2x + 4y - 5. \]
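To see in what sense the tangent plane is a good approximation near $(1, 2)$, one can compare it with the surface at nearby points. The short Python sketch below is illustrative only (the function names and the chosen offsets `d` are assumptions); for this particular $f$ the gap works out algebraically to $2d^2$, so it shrinks quadratically as the point approaches $(1, 2)$.

```python
def f(x, y):
    return x**2 + y**2

def tangent_plane(x, y):
    # Tangent plane at (1, 2) from the worked example: z = 2x + 4y - 5.
    return 2 * x + 4 * y - 5

for d in (0.1, 0.01, 0.001):
    x, y = 1 + d, 2 + d
    gap = f(x, y) - tangent_plane(x, y)
    print(d, gap)   # the gap shrinks like d**2
```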

4.3 Directional derivatives and the gradient vector

For $f(x, y)$, $f_x$ and $f_y$ measure the rates of change of $f$ along the $x$ and $y$ directions. How can we calculate the rate of change of $f$ in any direction?

We need to know how much $f$ changes when both $x$ and $y$ change by small amounts. Near $x = a$ and $y = b$, $f(x, y)$ is approximately given by equation (4.13) for the tangent plane. Let $x$ change by a (vanishingly) small amount $dx$, and $y$ by $dy$ (i.e. $x = a + dx$, $y = b + dy$); then

\[ f(x, y) \approx f(a, b) + (x - a)f_x(a, b) + (y - b)f_y(a, b), \]

\[ f(a + dx, b + dy) \approx f(a, b) + (dx)f_x(a, b) + (dy)f_y(a, b). \]

The change in $f$ is $df = f(a + dx, b + dy) - f(a, b)$, so

\[ df = (dx)f_x(a, b) + (dy)f_y(a, b) = \nabla f \cdot d\mathbf{r}, \]

where we have defined the two dimensional vector representing the change in $x$ and $y$,

\[ d\mathbf{r} = dx\,\mathbf{i} + dy\,\mathbf{j} = (dx, dy) \]

and the two dimensional gradient vector

\[ \nabla f = \frac{\partial f}{\partial x}\mathbf{i} + \frac{\partial f}{\partial y}\mathbf{j} = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}\right). \tag{4.14} \]

We can, additionally, write $d\mathbf{r} = \mathbf{u}\,dr$ where $\mathbf{u}$ is a unit vector in the direction of $d\mathbf{r}$ and $dr$ is the magnitude of the change. Then

\[ df = \nabla f \cdot \mathbf{u}\,dr \]

and so

\[ \text{Rate of change of } f \text{ in the direction of } \mathbf{u} = \frac{df}{dr} = \nabla f \cdot \mathbf{u}. \]

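The above generalises to functions of more than two variables. E.g. for a function of three variables, $f(x, y, z)$, the three-dimensional gradient vector is

\[ \nabla f = \frac{\partial f}{\partial x}\mathbf{i} + \frac{\partial f}{\partial y}\mathbf{j} + \frac{\partial f}{\partial z}\mathbf{k} = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right). \tag{4.15} \]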

4.3.1 Two properties of the gradient

The change $df$ in $f$ due to a change in the position by $d\mathbf{r} = \mathbf{u}\,dr$ is given by

\[ df = \nabla f \cdot d\mathbf{r} = \nabla f \cdot \mathbf{u}\,dr = |\nabla f|\,dr\cos\theta \tag{4.16} \]

where $\theta$ is the angle between the vectors $d\mathbf{r}$ and $\nabla f$. We look at cases where $d\mathbf{r}$ is parallel or perpendicular to $\nabla f$.

Property 1. From (4.16) the direction $d\mathbf{r}$ for which $df$ is a maximum is that for which $\cos\theta = 1$, or $\theta = 0$, i.e. $d\mathbf{r}$ in the direction of $\nabla f$. Thus

At any point, $\nabla f$ points in the direction in which $f$ is increasing most rapidly and its magnitude $|\nabla f|$ gives this maximum rate of change.

i.e. $\nabla f$ “points uphill”.

Property 2. From (4.16), $df = 0$ corresponds to $\theta = \pi/2$, when $\nabla f$ and $d\mathbf{r}$ are perpendicular. But $df = 0$ means that $f$ has not changed, so $d\mathbf{r}$ is along the surface $f = \text{constant}$. Thus

At any point, $\nabla f$ is perpendicular to the surface $f = \text{constant}$ through that point.

NB $f = \text{constant}$ is a contour of the function $f$.

For a function of two variables, these two properties are illustrated in the following picture:

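Example

If $f(x, y, z) = z^3 + 3x^2y^2 + \sin z$, find $\nabla f$.

Solution

The three partial derivatives are

\[ \frac{\partial f}{\partial x} = 0 + 6xy^2 + 0 = 6xy^2, \]

\[ \frac{\partial f}{\partial y} = 0 + 6x^2y + 0 = 6x^2y, \]

\[ \frac{\partial f}{\partial z} = 3z^2 + 0 + \cos z = 3z^2 + \cos z, \]

so $\nabla f = \left(6xy^2, 6x^2y, 3z^2 + \cos z\right)$.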

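Example

If $f(x, y, z) = x^2 + xy + z$, find $\nabla f$. What is the rate of change of $f$ along the direction $\mathbf{i} + 2\mathbf{j} + 2\mathbf{k}$ at the point $P(1, 1, 1)$? What is the magnitude of the maximum rate of change of $f$ at this point?

Solution

\[ \nabla f = \left(\frac{\partial f}{\partial x}, \frac{\partial f}{\partial y}, \frac{\partial f}{\partial z}\right) = (2x + y, x, 1). \]

Now, at the point $P(1, 1, 1)$, $\nabla f = (3, 1, 1)$. To find the rate of change of $f$ along a vector $\mathbf{v} = (1, 2, 2)$, we need the unit vector along this direction, which is

\[ \hat{\mathbf{v}} = \frac{1}{\sqrt{1^2 + 2^2 + 2^2}}(1, 2, 2) = \left(\frac{1}{3}, \frac{2}{3}, \frac{2}{3}\right). \]

So, the rate of change of $f$ in this direction is

\[ \nabla f \cdot \hat{\mathbf{v}} = (3, 1, 1) \cdot \left(\frac{1}{3}, \frac{2}{3}, \frac{2}{3}\right) = 1 + \frac{2}{3} + \frac{2}{3} = \frac{7}{3}. \]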
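The directional-derivative calculation above amounts to normalising the direction vector and taking a dot product. A minimal Python sketch of that recipe follows; the function names are hypothetical and chosen only for illustration.

```python
import math

def grad_f(x, y, z):
    # Gradient of f(x, y, z) = x**2 + x*y + z, as computed in the example.
    return (2 * x + y, x, 1.0)

def directional_derivative(grad, v):
    """Rate of change along v: the gradient dotted with the unit vector v/|v|."""
    norm = math.sqrt(sum(c * c for c in v))
    return sum(g * c for g, c in zip(grad, v)) / norm

g = grad_f(1.0, 1.0, 1.0)
print(directional_derivative(g, (1.0, 2.0, 2.0)))  # compare with 7/3 from the example
```

Because the direction is normalised inside the function, any positive multiple of $(1, 2, 2)$ gives the same answer.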

The maximum rate of change of $f$ at the point $P$ is $|\nabla f| = (11)^{1/2}$.

Example

Find a unit vector perpendicular to the surface $z = x^2 + y^2$ at the point $A(1, 2, 5)$.

Solution A (via the tangent plane)

Earlier, we found the tangent plane to this surface at this point to be

\[ 2x + 4y - z = 5. \]

The vector equation of a plane can be written as $\mathbf{r} \cdot \mathbf{n} = a$ where $\mathbf{r} = (x, y, z)$ and $\mathbf{n}$ is a vector perpendicular to the plane. By inspection, we see that

\[ \mathbf{n} = (2, 4, -1) \]

is such a vector, and so a unit vector in this direction is

\[ \hat{\mathbf{n}} = \frac{1}{\sqrt{2^2 + 4^2 + 1^2}}(2, 4, -1) = \frac{1}{\sqrt{21}}(2, 4, -1). \]

Solution B (treat the surface as a contour of a function of three variables).

The equation of the surface can be written as

\[ x^2 + y^2 - z = 0 \]

so if we define a function $f(x, y, z) = x^2 + y^2 - z$ we can say the surface is the contour of the function $f$ given by

\[ f(x, y, z) = 0 = \text{constant}. \]

We know, from Property 2 above, that $\nabla f$ is perpendicular to the surface $f = \text{constant}$. Here

\[ \nabla f = (2x, 2y, -1) = (2, 4, -1) \text{ at the point } A(1, 2, 5). \]

So, as before, a unit vector perpendicular to the surface is

\[ \hat{\mathbf{n}} = \frac{1}{\sqrt{2^2 + 4^2 + 1^2}}(2, 4, -1) = \frac{1}{\sqrt{21}}(2, 4, -1). \]

4.4 Stationary (Critical) Points of a Function of Two Variables

4.4.1 Definition

For a function of two variables, $f(x, y)$, a stationary point $(x^*, y^*)$ is defined to be a point at which the gradient vector is zero:

\[ \nabla f|_{(x^*, y^*)} = \left(f_x(x^*, y^*), f_y(x^*, y^*)\right) = (0, 0), \tag{4.17} \]

i.e. both of the partial derivatives $\frac{\partial f}{\partial x}$ and $\frac{\partial f}{\partial y}$ are zero at that point. The value $z^* = f(x^*, y^*)$ of $f$ at $(x^*, y^*)$ is the corresponding stationary value (SV).

4.4.2 Classification of SP’s of a Function of Two Variables

There are three main types of stationary point for a function of two variables: maximum, minimum and saddle points. These are sketched as follows:

Maximum: A local peak in the function. To get a peak, we must ensure that when the point $(x, y)$ moves away from $(x^*, y^*)$ a small distance in any direction, the value of $f(x, y)$ always decreases.

Minimum: A local trough in the function. To get a trough, we must ensure that when the point $(x, y)$ moves away from $(x^*, y^*)$ a small distance in any direction, the value of $f(x, y)$ always increases.

Saddle point: Looks like a horse's saddle! Moving off in some directions away from $(x^*, y^*)$ leads to an increase in $f$, while moving off in other directions leads to a decrease in $f$.

Contours: We can represent the “landscape” of the surface $z = f(x, y)$ by contour lines, which are curves in the $(x, y)$ plane on which $f(x, y)$ takes different constant values. Around a maximum, the value of $f(x, y)$ is always smaller than its value $z^*$ at the maximum. The contours are closed loops around the stationary point. Around a minimum, $f(x, y) > z^*$ and again the contours are closed loops around the stationary point.

The representation of a saddle point by contour lines has the characteristic appearance depicted below. At the level of the saddle there are two contour lines which cross at the saddle. These two crossing contour lines separate two regions in which $f > z^*$ from two regions in which $f < z^*$. Thus, as we move away from the saddle in different directions, there are two pairs of opposite directions in which $f$ stays fixed (along the crossing contour lines), and these directions separate two opposite ranges of direction in which $f$ increases from two opposite ranges of direction in which $f$ decreases.

To investigate what type a given stationary point is, we must look at what values $f$ takes close to this point. Consider the Taylor expansion (from appendix F) of $f(x, y)$ about a point $(x^*, y^*)$:

\[ f(x, y) = f(x^*, y^*) + (x - x^*)f_x(x^*, y^*) + (y - y^*)f_y(x^*, y^*) \]

\[ \qquad + \frac{1}{2}(x - x^*)^2 f_{xx}(x^*, y^*) + (x - x^*)(y - y^*)f_{xy}(x^*, y^*) + \frac{1}{2}(y - y^*)^2 f_{yy}(x^*, y^*) \]

\[ \qquad + \text{higher order terms}, \]

(NB this matches the first and second derivatives of $f(x, y)$ at the point $(x^*, y^*)$).

Suppose that $(x^*, y^*)$ is a stationary point. Then $f_x(x^*, y^*) = f_y(x^*, y^*) = 0$, and if we label the values

\[ f(x^*, y^*) = z^*, \quad x - x^* = \delta x, \quad y - y^* = \delta y, \quad f_{xx}(x^*, y^*) = A, \quad f_{xy}(x^*, y^*) = B, \quad f_{yy}(x^*, y^*) = C, \tag{4.18} \]

we can rewrite the Taylor series in the form

\[ f(x, y) = z^* + \frac{1}{2}\left[A\,\delta x^2 + 2B\,\delta x\,\delta y + C\,\delta y^2\right] + \text{higher order terms}, \tag{4.19} \]

where it is convenient to write

\[ Q(\delta x, \delta y) = A\,\delta x^2 + 2B\,\delta x\,\delta y + C\,\delta y^2 \tag{4.20} \]

for the quadratic expression in the square brackets.

Let's look at the values of $Q$ around a circle surrounding the stationary point, i.e. let

\[ \delta x = \delta s\cos\theta \quad \text{and} \quad \delta y = \delta s\sin\theta \]

where $\theta$ is an angle we can vary. Note that,

• For a minimum, $Q$ will always be positive ($f > z^*$).

• For a maximum, $Q$ will always be negative ($f < z^*$).

• For a saddle, $Q$ will change sign around the circle.

Substituting in, we get:

\[ Q(\delta s\cos\theta, \delta s\sin\theta) = \delta s^2\left(A\cos^2\theta + 2B\cos\theta\sin\theta + C\sin^2\theta\right) = \delta s^2\left[\frac{1}{2}A(1 + \cos 2\theta) + B\sin 2\theta + \frac{1}{2}C(1 - \cos 2\theta)\right]. \]

After a few more trig identities (see Appendix G) we get

\[ Q(\delta s\cos\theta, \delta s\sin\theta) = \delta s^2\left[\frac{1}{2}(A + C) + R\cos(2\theta - \phi)\right] \]

where $R > 0$ and

\[ R^2 = \frac{1}{4}(A + C)^2 + (B^2 - AC) \]

and the angle $\phi$ is such that $\frac{1}{2}(A - C) = R\cos\phi$ and $B = R\sin\phi$.

Consider

\[ \frac{Q}{\delta s^2} = \left[\frac{1}{2}(A + C) + R\cos(2\theta - \phi)\right]. \]

As $\theta$ varies, this oscillates with amplitude $R$ about an average value of $\frac{1}{2}(A + C)$. Hence, if

\[ R > \frac{1}{2}|A + C| \]

then the oscillations are large enough to change the sign of $Q$ as $\theta$ is varied, giving a saddle point. This condition simplifies to

\[ R^2 > \frac{1}{4}(A + C)^2, \quad \text{i.e.} \quad AC - B^2 < 0. \]

Hence, the condition for a saddle point is:

\[ f_{xx}f_{yy} - f_{xy}^2 < 0 \qquad \text{Condition for a saddle point.} \]

If it is not a saddle point, then it is a maximum or a minimum. We can determine which by looking at the sign of $f_{xx}$ (or $f_{yy}$). Appendix G gives more details. Hence:

\[ f_{xx}f_{yy} - f_{xy}^2 > 0, \quad f_{xx} > 0 \qquad \text{Condition for a minimum.} \]

\[ f_{xx}f_{yy} - f_{xy}^2 > 0, \quad f_{xx} < 0 \qquad \text{Condition for a maximum.} \]

Note that in BOTH cases the function $f_{xx}f_{yy} - f_{xy}^2$ must be POSITIVE at the SP.

Example:

Locate and classify the stationary points of the function $f(x, y) = 12x^3 + y^3 + 12x^2y - 75y$.

Solution

$f_x = 36x^2 + 24xy = 12x(3x + 2y)$, $f_y = 3y^2 + 12x^2 - 75 = 3(4x^2 + y^2 - 25)$. SP's are given by $f_x = 0$, $f_y = 0$.

For $f_x = 0$ we have

\[ x(3x + 2y) = 0 \]

so EITHER $x = 0$ OR $3x + 2y = 0$, i.e. $y = -\frac{3}{2}x$.

If $x = 0$ then $f_y = 3(y^2 - 25) = 0 \Rightarrow y = \pm 5$.

If $y = -\frac{3}{2}x$ then

\[ f_y = 3\left(4x^2 + y^2 - 25\right) = 3\left(4x^2 + \frac{9}{4}x^2 - 25\right) = 3\left(\frac{25}{4}x^2 - 25\right) = \frac{75}{4}\left(x^2 - 4\right) \]

and so $x = \pm 2$, $y = -\frac{3}{2}x = \mp 3$.

So there are 4 SP's, $(0, 5)$, $(0, -5)$, $(2, -3)$ and $(-2, 3)$, with respective SV's $-250$, $250$, $150$ and $-150$.

The 2nd order partial derivatives are

\[ f_{xx} = 72x + 24y = 24(3x + y), \quad f_{xy} = 24x, \quad f_{yy} = 6y. \]

At $(0, 5)$, $f_{xx} = 120 > 0$, $f_{xy} = 0$, $f_{yy} = 30$, $H^* = f_{xx}f_{yy} - f_{xy}^2 = 3600 > 0$, so this SP is a minimum.

At $(0, -5)$, $f_{xx} = -120 < 0$, $f_{xy} = 0$, $f_{yy} = -30$, $H^* = 3600 > 0$, so this SP is a maximum.

At $(2, -3)$, $f_{xx} = 72$, $f_{xy} = 48$, $f_{yy} = -18$, $H^* = -72 \times 18 - 48^2 < 0$, so this SP is a saddle point.

At $(-2, 3)$, $f_{xx} = -72$, $f_{xy} = -48$, $f_{yy} = 18$, $H^* = -72 \times 18 - 48^2 < 0$, so this SP is a saddle point.

This is a sketch of the contours. For the connectivity, it helps to note the stationary values.
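The classification test used in this example is mechanical enough to write down as a short routine. The sketch below is an illustration in Python (the function names are hypothetical); it hard-codes the second derivatives found above and applies the sign conditions on $f_{xx}f_{yy} - f_{xy}^2$ and $f_{xx}$.

```python
def second_derivatives(x, y):
    # Second derivatives of f(x, y) = 12x^3 + y^3 + 12x^2 y - 75y.
    fxx = 72 * x + 24 * y
    fxy = 24 * x
    fyy = 6 * y
    return fxx, fxy, fyy

def classify(x, y):
    """Classify a stationary point from the sign of fxx*fyy - fxy**2 and of fxx."""
    fxx, fxy, fyy = second_derivatives(x, y)
    h = fxx * fyy - fxy**2
    if h < 0:
        return "saddle"
    if h > 0:
        return "minimum" if fxx > 0 else "maximum"
    return "degenerate"   # h == 0: the second-order test is inconclusive

for point in [(0, 5), (0, -5), (2, -3), (-2, 3)]:
    print(point, classify(*point))
```

Note that the routine only classifies points already known to be stationary; it does not locate them.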

4.4.3 Definition: Hessian

The function $f_{xx}f_{yy} - f_{xy}^2$ is called the Hessian $H(x, y)$ of $f$. It may be written as a $2 \times 2$ determinant:

\[ H(x, y) = \begin{vmatrix} f_{xx} & f_{xy} \\ f_{yx} & f_{yy} \end{vmatrix}. \]

4.4.4 Definition: Degenerate stationary point

A stationary point $(x^*, y^*)$ at which $H^* = H(x^*, y^*) = 0$ is said to be degenerate. Such stationary points will be excluded from this course. They require further investigation, involving cubic or higher order terms in the Taylor expansion.

4.5 Lagrange Multipliers

4.5.1 Introductory example

Suppose we want to find the area of the smallest circle centred on the origin which touches the line $y = -3x + 4$. The diagram shows three candidate circles: the smallest is too small as it fails to touch the line, the largest too large (we can do better); the ideal circle just touches the line in one place. Note that this means the line is the tangent line to the circle at that point, and the normal to the circle is also normal to the line.

Now we can write the question as: minimise $f(x, y) = \pi(x^2 + y^2)$ such that $g(x, y) = y + 3x - 4 = 0$. Each candidate circle is a contour $f(x, y) = \text{constant}$ and the line is of the form $g(x, y) = \text{constant}$, so to make the two normals parallel we put

\[ \nabla f = \lambda\nabla g \]

and we retain the constraint $g(x, y) = 0$.

This procedure locates the candidate points for a maximum or minimum of $f$ given the constraint that $g = 0$. The quantity $\lambda$ is called a Lagrange multiplier.

In this case we have $\nabla f = (2\pi x, 2\pi y)$ and $\nabla g = (3, 1)$ so we put $2\pi x = 3\lambda$ and $2\pi y = \lambda$, and the constraint $g(x, y) = 0$ gives

\[ \frac{\lambda}{2\pi} + \frac{9\lambda}{2\pi} - 4 = 0 \;\Rightarrow\; \lambda = \frac{4\pi}{5}, \]

and so $x = 6/5$, $y = 2/5$.

Hence, the area of the circle is $\pi(36/25 + 4/25) = \pi(40/25) = 8\pi/5$.
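The answer can be checked without Lagrange multipliers by brute force: walk along the line and record the area of the circle through each point. The Python sketch below does this on a grid; the grid range and spacing are arbitrary assumptions, and the scan is only a check on the result, not the method of this section.

```python
import math

def area(x):
    # Area of the circle centred on the origin through the point (x, -3x + 4) on the line.
    y = -3 * x + 4
    return math.pi * (x * x + y * y)

# Scan points along the line and keep the one giving the smallest circle.
xs = [i / 1000 for i in range(-5000, 5001)]
best_x = min(xs, key=area)
print(best_x, -3 * best_x + 4, area(best_x))  # compare with x = 6/5, y = 2/5, area = 8*pi/5
```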


4.5.2 General principle

In general, to minimise or maximise $f(x_1, \ldots, x_n)$ subject to a constraint $g(x_1, \ldots, x_n) = \text{constant}$, we set

\[ \nabla f = \lambda\nabla g \]

where the unknown $\lambda$ is called the Lagrange multiplier. The vector equation $\nabla f = \lambda\nabla g$ and the constraint $g = \text{constant}$ give $n + 1$ equations altogether, enough in principle to solve for the $n + 1$ unknowns, $x_1, \ldots, x_n$ and $\lambda$.

Why does this work? With reference to the picture above, we can make the following two comments:

1. The smallest value of $f$ on the contour $g = \text{constant}$ is where this contour just touches a contour $f = \text{constant}$. For contours to just touch, the perpendiculars to the contours must be parallel, so $\nabla f = \lambda\nabla g$.

2. To minimise $f$ along the contour $g = \text{constant}$ we require only that the component of $\nabla f$ along the contour is zero; we are only interested in changes of $f$ along the contour. So, $\nabla f$ is allowed to have a component perpendicular to the contour and we can set $\nabla f = \lambda\nabla g$ for some unknown $\lambda$.

Example 1

Minimize $f(x, y, z) = x^2 + y^2 + z^2$ subject to the constraint $x - 2y + z = 3$.

Solution

We write the constraint condition as $g(x, y, z) = x - 2y + z = 3$ so we can calculate

\[ \nabla f = (2x, 2y, 2z), \qquad \nabla g = (1, -2, 1), \]

and to have $\nabla f = \lambda\nabla g$ requires:

\[ 2x = \lambda; \quad 2y = -2\lambda; \quad 2z = \lambda. \]

We substitute these into the constraint condition:

\[ 0 = x - 2y + z - 3 = \frac{1}{2}\lambda + 2\lambda + \frac{1}{2}\lambda - 3 = 3\lambda - 3 \;\Rightarrow\; \lambda = 1 \]

to determine the point:

\[ (x, y, z) = \left(\frac{1}{2}, -1, \frac{1}{2}\right) \]

at which $f(x, y, z) = \left(\frac{1}{2}\right)^2 + 1^2 + \left(\frac{1}{2}\right)^2 = 3/2$.

Warning: This procedure does not tell us the difference between a minimum and a maximum of $f$, so you may need to check some other values to verify you have the correct solution. In this case, taking $(3, 0, 0)$ (which satisfies the constraint) gives $f = 9$ which is greater than the value at our stationary point, so we have found a minimum.
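The "check some other values" advice in the warning can be automated. The following Python sketch is an illustration under stated assumptions (a fixed random seed, a sampling box of $[-5, 5]$ for $x$ and $y$, and 10000 samples, all chosen arbitrarily): it samples points on the constraint plane and records the smallest value of $f$ found, which should not fall below the stationary value $3/2$.

```python
import random

def f(x, y, z):
    return x * x + y * y + z * z

random.seed(0)
smallest = float("inf")
for _ in range(10000):
    # Pick x and y freely, then choose z so that x - 2y + z = 3 holds.
    x = random.uniform(-5, 5)
    y = random.uniform(-5, 5)
    z = 3 - x + 2 * y
    smallest = min(smallest, f(x, y, z))

# The smallest sampled value should sit just above the stationary value f = 3/2.
print(smallest, f(0.5, -1.0, 0.5))
```

Sampling like this supports, but does not prove, that the stationary point is a minimum.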

Example 2

Find the maximum area of a rectangle with perimeter $P$.

Solution

We need to maximise the area $A = xy$ subject to the constraint $g(x, y) = 2x + 2y = P$. So $\nabla A = \lambda\nabla g$ gives

\[ y = 2\lambda, \qquad x = 2\lambda, \]

so $y = x$. Substituting into the constraint, we have

\[ 4x = P \;\Rightarrow\; x = \frac{P}{4} = y. \]

Hence, $A = P^2/16$.

This is a maximum, as can be seen by e.g. choosing $x = P/6$ and $y = P/3$ (which satisfies the constraint), which gives $A = P^2/18$.

Extension. If there is more than one constraint, e.g. $g = 0$, $h = 0$, then we use more than one Lagrange multiplier: solve $\nabla f = \lambda\nabla g + \mu\nabla h$ for $x$, $y$, $z$, $\lambda$ and $\mu$ subject to $g = 0$, $h = 0$.

4.6 The Chain Rule

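We have seen (section 4.1.3) the chain rule for $f\{g(x_1, \ldots, x_n)\}$.

Consider $f(x, y)$ where $x$ and $y$ are functions of another variable $t$ (i.e. $x(t)$ and $y(t)$). If $t$ increases by a small amount $\Delta t$ then, to first order in $\Delta t$, $x$ increases by $\Delta x = \Delta t\,\frac{dx}{dt}$ and $y$ increases by $\Delta y = \Delta t\,\frac{dy}{dt}$, so the change in position is

\[ \Delta\mathbf{r} = (\Delta x, \Delta y) = \Delta t\left(\frac{dx}{dt}, \frac{dy}{dt}\right) = \Delta t\,\frac{d\mathbf{r}}{dt} \]

where

\[ \frac{d\mathbf{r}}{dt} = \left(\frac{dx}{dt}, \frac{dy}{dt}\right) \]

is the rate of change of position with $t$.

Now, recall from our work on directional derivatives: the change in $f$ for this small change in position is

\[ \Delta f = \nabla f \cdot \Delta\mathbf{r} = \nabla f \cdot \frac{d\mathbf{r}}{dt}\,\Delta t. \]

If we rearrange and let $\Delta t \to 0$ we obtain the chain rule for a function of two variables, which is

\[ \frac{df}{dt} = \nabla f \cdot \frac{d\mathbf{r}}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt}. \tag{4.21} \]

NB $f$ depends on $x$ and $y$ [so partial derivatives $\frac{\partial f}{\partial x}$, $\frac{\partial f}{\partial y}$] whilst $x$ and $y$ depend on just $t$ [so ordinary derivatives $\frac{dx}{dt}$, $\frac{dy}{dt}$]. Thus $f$ depends on $t$ and has the ordinary derivative $\frac{df}{dt}$ given by the chain rule (4.21).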

Example

If $f(x, y) = x^2 + y^2$, where $x = \sin t$, $y = t^3$, then

\[ \frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} = 2x\cos t + 2y \cdot 3t^2 = 2\sin t\cos t + 6t^5. \]

Of course in this simple example we can check the result by substituting for $x$ and $y$ before differentiation to give $f(t) = (\sin t)^2 + (t^3)^2$, so $\frac{df}{dt} = 2\sin t\cos t + 6t^5$ as before.
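The same check can be done numerically: differentiate the composite function $f(t)$ by a central difference and compare with the chain-rule expression. The Python sketch below is illustrative only; the function names, the point `t0` and the step `h` are assumptions.

```python
import math

def f_of_t(t):
    # Substitute x = sin t and y = t**3 into f(x, y) = x**2 + y**2.
    x, y = math.sin(t), t**3
    return x**2 + y**2

def dfdt_chain(t):
    x, y = math.sin(t), t**3
    # (df/dx)(dx/dt) + (df/dy)(dy/dt), as in (4.21)
    return 2 * x * math.cos(t) + 2 * y * 3 * t**2

t0, h = 0.9, 1e-6
numeric = (f_of_t(t0 + h) - f_of_t(t0 - h)) / (2 * h)
print(numeric, dfdt_chain(t0))  # the two values should agree closely
```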

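The chain rule extends directly to functions of three or more variables, and to include implicit differentiation.

Example

If $f(x, y, z) = \ln(2x - 3y + 4z)$, where $x = e^t$, $y = \ln t$, $z = \cosh t$, then

\[ \frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} + \frac{\partial f}{\partial z}\frac{dz}{dt} \]

\[ = \frac{2e^t}{2x - 3y + 4z} - \frac{3(1/t)}{2x - 3y + 4z} + \frac{4\sinh t}{2x - 3y + 4z} \]

\[ = \frac{2e^t - 3/t + 4\sinh t}{2e^t - 3\ln t + 4\cosh t}. \]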

4.6.1 Extended chain rule

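For $f(x, y)$ suppose that $x$ and $y$ depend on two variables $s$ and $t$ (e.g. polar co-ordinates, $x = s\cos t$, $y = s\sin t$). Then

\[ \frac{\partial\mathbf{r}}{\partial s} = \left(\frac{\partial x}{\partial s}, \frac{\partial y}{\partial s}\right) \quad \text{and} \quad \frac{\partial\mathbf{r}}{\partial t} = \left(\frac{\partial x}{\partial t}, \frac{\partial y}{\partial t}\right) \tag{4.22} \]

are two vectors representing the rate of change of position with $s$ and $t$ respectively.

Changing either $s$ or $t$ changes $x$ and $y$, and so changes $f$, i.e. producing $\frac{\partial f}{\partial s}$ and $\frac{\partial f}{\partial t}$ according to the extended chain rule

\[ \frac{\partial f}{\partial s} = \nabla f \cdot \frac{\partial\mathbf{r}}{\partial s} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial s}, \tag{4.23} \]

\[ \frac{\partial f}{\partial t} = \nabla f \cdot \frac{\partial\mathbf{r}}{\partial t} = \frac{\partial f}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial t}. \tag{4.24} \]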

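Example

Let $f(x, y) = x^2y^3$, where $x = s - t^2$, $y = s + 2t$. Then

\[ \frac{\partial f}{\partial x} = 2xy^3 \quad \text{and} \quad \frac{\partial f}{\partial y} = 3x^2y^2 \]

and

\[ \begin{aligned} \frac{\partial f}{\partial s} &= \frac{\partial f}{\partial x}\frac{\partial x}{\partial s} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial s} \\ &= 2xy^3 \cdot 1 + 3x^2y^2 \cdot 1 \\ &= xy^2(2y + 3x) \\ &= (s - t^2)(s + 2t)^2(5s + 4t - 3t^2), \end{aligned} \]

\[ \begin{aligned} \frac{\partial f}{\partial t} &= \frac{\partial f}{\partial x}\frac{\partial x}{\partial t} + \frac{\partial f}{\partial y}\frac{\partial y}{\partial t} \\ &= 2xy^3(-2t) + 3x^2y^2(2) \\ &= 2xy^2(-2ty + 3x) \\ &= 2(s - t^2)(s + 2t)^2(3s - 2st - 7t^2). \end{aligned} \]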
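The two final expressions can be checked against finite differences of the composite function. The Python sketch below is an illustration only; the sample point `(s0, t0)`, the step `h` and all function names are assumptions made for the check.

```python
def f_of_st(s, t):
    # Substitute x = s - t**2 and y = s + 2t into f(x, y) = x**2 * y**3.
    x, y = s - t**2, s + 2 * t
    return x**2 * y**3

def df_ds(s, t):
    # Final expression for df/ds from the example
    return (s - t**2) * (s + 2 * t)**2 * (5 * s + 4 * t - 3 * t**2)

def df_dt(s, t):
    # Final expression for df/dt from the example
    return 2 * (s - t**2) * (s + 2 * t)**2 * (3 * s - 2 * s * t - 7 * t**2)

s0, t0, h = 1.3, 0.4, 1e-6
num_s = (f_of_st(s0 + h, t0) - f_of_st(s0 - h, t0)) / (2 * h)
num_t = (f_of_st(s0, t0 + h) - f_of_st(s0, t0 - h)) / (2 * h)
print(num_s, df_ds(s0, t0))  # each pair should agree closely
print(num_t, df_dt(s0, t0))
```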

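Examples

(i) If $f$ is a function of $x$ and $y$, where $x = e^s\cos t$, $y = e^s\sin t$, prove that $\sin t\,\frac{\partial f}{\partial s} + \cos t\,\frac{\partial f}{\partial t} = e^s\frac{\partial f}{\partial y}$.

(ii) If $f$ is a function of $z/x$ and $x/y$, prove that $x\frac{\partial f}{\partial x} + y\frac{\partial f}{\partial y} + z\frac{\partial f}{\partial z} = 0$.

Solutions

(i) If $x = e^s\cos t$ and $y = e^s\sin t$ then

\[ \frac{\partial x}{\partial s} = e^s\cos t, \qquad \frac{\partial y}{\partial s} = e^s\sin t, \]

\[ \frac{\partial x}{\partial t} = -e^s\sin t, \qquad \frac{\partial y}{\partial t} = e^s\cos t. \]

It follows that

\[ \frac{\partial f}{\partial s} = \frac{\partial f}{\partial x}e^s\cos t + \frac{\partial f}{\partial y}e^s\sin t, \]

\[ \frac{\partial f}{\partial t} = -\frac{\partial f}{\partial x}e^s\sin t + \frac{\partial f}{\partial y}e^s\cos t. \]

Combining these two equations (multiply the first by $\sin t$, the second by $\cos t$, and add, so that the $\frac{\partial f}{\partial x}$ terms cancel) we have

\[ \sin t\,\frac{\partial f}{\partial s} + \cos t\,\frac{\partial f}{\partial t} = e^s\frac{\partial f}{\partial y}. \]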

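(ii) Let $u = z/x$ and $v = x/y$. Then $\frac{\partial u}{\partial x} = -z/x^2$, $\frac{\partial u}{\partial y} = 0$, $\frac{\partial u}{\partial z} = 1/x$, $\frac{\partial v}{\partial x} = 1/y$, $\frac{\partial v}{\partial y} = -x/y^2$ and $\frac{\partial v}{\partial z} = 0$. Hence

\[ \frac{\partial f}{\partial x} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial x} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial x} = -\frac{\partial f}{\partial u}\frac{z}{x^2} + \frac{\partial f}{\partial v}\frac{1}{y}, \]

\[ \frac{\partial f}{\partial y} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial y} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial y} = -\frac{\partial f}{\partial v}\frac{x}{y^2}, \]

\[ \frac{\partial f}{\partial z} = \frac{\partial f}{\partial u}\frac{\partial u}{\partial z} + \frac{\partial f}{\partial v}\frac{\partial v}{\partial z} = \frac{\partial f}{\partial u}\frac{1}{x}, \]

and so

\[ x\frac{\partial f}{\partial x} + y\frac{\partial f}{\partial y} + z\frac{\partial f}{\partial z} = 0 \]

as required.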

4.6.2 Definition of the Jacobian

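We could write the extended chain rule (4.23), (4.24) in matrix-vector form as follows:

\[ \begin{pmatrix} \dfrac{\partial f}{\partial s} \\[2mm] \dfrac{\partial f}{\partial t} \end{pmatrix} = \begin{pmatrix} \dfrac{\partial x}{\partial s} & \dfrac{\partial y}{\partial s} \\[2mm] \dfrac{\partial x}{\partial t} & \dfrac{\partial y}{\partial t} \end{pmatrix} \begin{pmatrix} \dfrac{\partial f}{\partial x} \\[2mm] \dfrac{\partial f}{\partial y} \end{pmatrix} \tag{4.25} \]

which leads us naturally to the Jacobian matrix

\[ J = \begin{pmatrix} \dfrac{\partial x}{\partial s} & \dfrac{\partial y}{\partial s} \\[2mm] \dfrac{\partial x}{\partial t} & \dfrac{\partial y}{\partial t} \end{pmatrix} \tag{4.26} \]

whose determinant is the Jacobian of the transformation from $x, y$ to $s, t$:

\[ \frac{\partial(x, y)}{\partial(s, t)} = \begin{vmatrix} \dfrac{\partial x}{\partial s} & \dfrac{\partial y}{\partial s} \\[2mm] \dfrac{\partial x}{\partial t} & \dfrac{\partial y}{\partial t} \end{vmatrix} = \frac{\partial x}{\partial s}\frac{\partial y}{\partial t} - \frac{\partial y}{\partial s}\frac{\partial x}{\partial t}. \tag{4.27} \]

NB The rows of the matrix $J$ are the vectors $\frac{\partial\mathbf{r}}{\partial s} = \left(\frac{\partial x}{\partial s}, \frac{\partial y}{\partial s}\right)$ and $\frac{\partial\mathbf{r}}{\partial t} = \left(\frac{\partial x}{\partial t}, \frac{\partial y}{\partial t}\right)$ expressing the rate of change of position $(x, y)$ with $s$ and $t$ respectively. Geometrically, the magnitude of the Jacobian is

\[ \left|\frac{\partial(x, y)}{\partial(s, t)}\right| = \left|\frac{\partial\mathbf{r}}{\partial s} \times \frac{\partial\mathbf{r}}{\partial t}\right| = \left|\frac{\partial\mathbf{r}}{\partial s}\right|\left|\frac{\partial\mathbf{r}}{\partial t}\right|\sin\theta, \]

where $\theta$ is the angle between $\frac{\partial\mathbf{r}}{\partial s}$ and $\frac{\partial\mathbf{r}}{\partial t}$. Hence, the magnitude of the Jacobian is the area of the parallelogram whose sides are $\frac{\partial\mathbf{r}}{\partial s}$ and $\frac{\partial\mathbf{r}}{\partial t}$.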

4.6.3 Change of Variables

Suppose we have $x$ and $y$ expressed in terms of two other variables $s$ and $t$. How would we go about finding expressions for $s$ and $t$ in terms of $x$ and $y$? Is this always possible?

Carrying out a change of variables.

The procedure for reversing a change of variables is to look for combinations of $x$ and $y$ which eliminate all dependence on one of $s$ and $t$. This is best shown by example.

Example

If $x = s^2t$ and $y = t^2/s$, find $s$ and $t$ in terms of $x$ and $y$.

Solution

We begin by looking for a combination of $x$ and $y$ that has no $t$-dependence. From the first equation, we can write $t = x/s^2$ so we substitute this into the second equation:

\[ y = (x/s^2)^2/s = x^2/s^5 \]

and manipulate this result to give $s$:

\[ s^5 = x^2/y \;\Rightarrow\; s = x^{2/5}y^{-1/5}. \]

We can then substitute this into either of the definitions to eliminate $s$; we choose the second:

\[ y = t^2x^{-2/5}y^{1/5} \;\Rightarrow\; t = x^{1/5}y^{2/5} \]

so the full solution is $s = x^{2/5}y^{-1/5}$ and $t = x^{1/5}y^{2/5}$.

Example

If $x = s\cos t$ and $y = s\sin t$, find $s$ and $t$ in terms of $x$ and $y$.

Solution

Here we start by eliminating $t$. The simplest way to do this is to use the identity $\sin^2 t + \cos^2 t = 1$:

\[ x^2 + y^2 = s^2\cos^2 t + s^2\sin^2 t = s^2 \;\Rightarrow\; s = (x^2 + y^2)^{1/2}, \]

and to eliminate $s$ we simply divide the two expressions:

\[ y/x = (s\sin t)/(s\cos t) = \tan t \;\Rightarrow\; t = \tan^{-1}(y/x). \]

Example

If $x = s^2t$ and $y = s^4t^2 + 2s^2t + 4$, express $s$ and $t$ in terms of $x$ and $y$.

Solution

We try to eliminate $t$ by using the $x$-equation:

\[ x = s^2t \;\Rightarrow\; t = xs^{-2} \]

in the $y$-equation:

\[ y = s^4t^2 + 2s^2t + 4 = s^4[xs^{-2}]^2 + 2s^2[xs^{-2}] + 4 = x^2 + 2x + 4 \]

and we find that we have no $s$-dependence in this equation so we can't rearrange to find $s$.

In this case it is not possible to determine $s$ and $t$ from values of $x$ and $y$. Since $y$ can be written in terms of $x$ only, $y$ and $x$ are not independent.

Jacobian and change of variables

In the example above, we could not find $s$ and $t$ from $x$ and $y$. The critical quantity here is the Jacobian:

\[ \frac{\partial(x, y)}{\partial(s, t)} = \frac{\partial x}{\partial s}\frac{\partial y}{\partial t} - \frac{\partial y}{\partial s}\frac{\partial x}{\partial t} = (2st)(2s^4t + 2s^2) - (4s^3t^2 + 4st)(s^2) = 4s^5t^2 + 4s^3t - 4s^5t^2 - 4s^3t = 0. \]

In general, it is only possible to change variables and change back again if the Jacobian of transformation is not zero.

When the Jacobian is zero, the area of the parallelogram whose sides are $\frac{\partial\mathbf{r}}{\partial s}$ and $\frac{\partial\mathbf{r}}{\partial t}$ is zero, which means $\frac{\partial\mathbf{r}}{\partial s}$ and $\frac{\partial\mathbf{r}}{\partial t}$ are parallel. Changes in either $s$ or $t$ give changes in the position $(x, y)$ in the same direction, so $y$ and $x$ are not independent, and $s$ and $t$ cannot be uniquely determined from $x$ and $y$.
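The Jacobian test can be tried numerically on the examples of this section. The sketch below is a rough illustration in Python, assuming central differences; the helper name `jacobian`, the step `h` and the sample point $(s, t) = (2, 0.5)$ are arbitrary choices. For the polar-type change of variables $x = s\cos t$, $y = s\sin t$ the determinant (4.27) evaluates to $s$, so it is non-zero away from $s = 0$; for the degenerate example it vanishes.

```python
import math

def jacobian(x, y, s, t, h=1e-6):
    """Central-difference estimate of d(x, y)/d(s, t) = x_s*y_t - y_s*x_t."""
    x_s = (x(s + h, t) - x(s - h, t)) / (2 * h)
    x_t = (x(s, t + h) - x(s, t - h)) / (2 * h)
    y_s = (y(s + h, t) - y(s - h, t)) / (2 * h)
    y_t = (y(s, t + h) - y(s, t - h)) / (2 * h)
    return x_s * y_t - y_s * x_t

# Polar-type change of variables: the Jacobian equals s.
polar = jacobian(lambda s, t: s * math.cos(t),
                 lambda s, t: s * math.sin(t), 2.0, 0.5)

# The degenerate example, where y depends on x alone.
degenerate = jacobian(lambda s, t: s**2 * t,
                      lambda s, t: s**4 * t**2 + 2 * s**2 * t + 4, 2.0, 0.5)

print(polar, degenerate)  # roughly s = 2 and roughly 0, up to rounding error
```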
