MA20218 Lecture Notes

MA20218 Analysis 2A

Lecture Notes

Roger Moser

Department of Mathematical Sciences

University of Bath

Semester 1, 2014/5

Contents

1 Riemann Integration
1.1 Lower and Upper Riemann Sums
1.2 Criteria for Integrability
1.3 Riemann Sums
1.4 Properties of the Integral
1.5 The Fundamental Theorem of Calculus
1.6 Integration Techniques
1.7 Exchanging Integrals with Limits
1.8 Improper Integrals

2 Analysis in Several Variables
2.1 The Euclidean Space $\mathbb{R}^N$
2.2 Convergence
2.3 Open and Closed Sets
2.4 Continuity
2.5 Norms
2.6 Derivatives
2.7 Higher Order Derivatives
2.8 The Implicit Function Theorem
2.9 The Lagrange Multiplier Rule

Index


Chapter 1

Riemann Integration

Integration deals with two different questions.

Area under a curve. Given an interval $[a, b] \subset \mathbb{R}$ and a function $f : [a, b] \to \mathbb{R}$ (say, continuous), we obtain a curve in the plane described by its graph $\{(x, y) \in \mathbb{R}^2 : a \le x \le b,\ y = f(x)\}$. This curve may be interpreted as a piece of the boundary of a region in the plane. (Typically, the rest of the boundary is taken to be a piece of the x-axis and two vertical line segments; see Fig. 1.0.1.) What is the area of such a region?

Antiderivative. Given an interval $[a, b] \subset \mathbb{R}$ and a function $f : [a, b] \to \mathbb{R}$, is it possible to find another function $F : [a, b] \to \mathbb{R}$ such that $F' = f$ in $(a, b)$? If so, how?

At first, these questions may seem unrelated. But it turns out that there is a deep connection; in fact, they are two sides of the same coin.

1.1 Lower and Upper Riemann Sums

Throughout this chapter, let $a, b \in \mathbb{R}$ with $a < b$. We first want to find the area under a curve given by a function $f : [a, b] \to \mathbb{R}$. The main idea is to divide the interval $[a, b]$ into many small intervals and approximate the region under the curve by rectangles. We can both overestimate and underestimate the area this way, and when we choose increasingly fine subdivisions, we hope that the difference decreases (cf. Fig. 1.1.1).

Definition 1.1.1. A subdivision or partition of $[a, b]$ is a finite sequence $(x_0, x_1, \dots, x_N)$ such that

\[ a = x_0 < x_1 < \dots < x_N = b. \]

If $\Delta = (x_0, x_1, \dots, x_N)$ is a subdivision of $[a, b]$, then $I_n = [x_{n-1}, x_n]$ is called the n-th interval of $\Delta$ for $n = 1, \dots, N$. The number $|I_n| = x_n - x_{n-1}$ is called the length of $I_n$ and

\[ \|\Delta\| = \max\{|I_1|, \dots, |I_N|\} \]

is called the mesh of $\Delta$.

Figure 1.0.1: Area under a curve

Figure 1.1.1: Lower and upper Riemann sums represented in terms of rectangles

Now that we have subdivided the interval, we can calculate the areas of the rectangles for a given function.

Definition 1.1.2. Let $f : [a, b] \to \mathbb{R}$ be a bounded function and $\Delta = (x_0, \dots, x_N)$ a subdivision of $[a, b]$. Let $I_n = [x_{n-1}, x_n]$ be the n-th interval of $\Delta$. Then

\[ L(f, \Delta) = \sum_{n=1}^{N} \inf f(I_n)\,|I_n| \]

is called the lower Riemann sum of $f$ on $\Delta$ and

\[ U(f, \Delta) = \sum_{n=1}^{N} \sup f(I_n)\,|I_n| \]

is called the upper Riemann sum of $f$ on $\Delta$.

As mentioned before, the lower Riemann sum will underestimate the area and the upper Riemann sum will overestimate it in general (see Fig. 1.1.1). This is the motivation for the next definition.
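As a minimal sketch of the lower and upper Riemann sums, the following Python code evaluates both sums exactly with rational arithmetic. It assumes that f is monotonic on each interval of the subdivision, so that the infimum and supremum are attained at the end points; for a general bounded function they cannot be read off from finitely many values. The names `lower_upper_sums` and `uniform_subdivision` are illustrative and do not come from the notes.

```python
from fractions import Fraction


def lower_upper_sums(f, points):
    """Lower and upper Riemann sums of f on the subdivision `points`.

    Assumption: f is monotonic on every interval [x_{n-1}, x_n], so the
    infimum and supremum of f there are its values at the end points.
    """
    lower = upper = Fraction(0)
    for left, right in zip(points, points[1:]):
        values = (f(left), f(right))
        length = right - left
        lower += min(values) * length
        upper += max(values) * length
    return lower, upper


def uniform_subdivision(a, b, n):
    """Subdivision of [a, b] into n intervals of equal length."""
    return [Fraction(a) + Fraction(b - a) * k / n for k in range(n + 1)]


points = uniform_subdivision(0, 1, 4)
low, up = lower_upper_sums(lambda x: x * x, points)
print(low, up)  # 7/32 15/32
```

For f(x) = x^2 on [0, 1] the two sums enclose the value 1/3, which is consistent with the lower sum underestimating and the upper sum overestimating the area.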

Definition 1.1.3. Let $f : [a, b] \to \mathbb{R}$ be bounded. Then

\[ \underline{\int_a^b} f(x)\,dx = \sup\{L(f, \Delta) : \Delta \text{ is a subdivision of } [a, b]\} \]

is the lower Riemann integral of $f$ on $[a, b]$ and

\[ \overline{\int_a^b} f(x)\,dx = \inf\{U(f, \Delta) : \Delta \text{ is a subdivision of } [a, b]\} \]

is the upper Riemann integral of $f$ on $[a, b]$. If

\[ \underline{\int_a^b} f(x)\,dx = \overline{\int_a^b} f(x)\,dx, \]

then we say that $f$ is Riemann integrable (or integrable for short). The common value is then called the Riemann integral (or integral for short) and denoted by

\[ \int_a^b f(x)\,dx. \]


Figure 1.1.2: The integral corresponds to a signed area

Remark 1.1.1. These definitions are for bounded functions on bounded intervals only. Extensions to unbounded functions and unbounded intervals will be discussed later.

Remark 1.1.2. If $f : [a, b] \to \mathbb{R}$ has negative values, it can happen that the upper or lower Riemann integral is negative (or at least some of the terms in the upper and lower Riemann sums are negative). So strictly speaking, what we consider here are signed areas, where the area of a region below the x-axis is considered to be negative (see Fig. 1.1.2).

Before we study integrals, we need to know a few facts about lower and upper Riemann sums.

Lemma 1.1.1. Let $f : [a, b] \to \mathbb{R}$ be a bounded function and let

\[ m = \inf f([a, b]) \quad\text{and}\quad M = \sup f([a, b]). \]

Then for any subdivision $\Delta$ of $[a, b]$,

\[ m(b - a) \le L(f, \Delta) \le U(f, \Delta) \le M(b - a). \]

Proof. Suppose that $\Delta = (x_0, \dots, x_N)$. Let $I_n = [x_{n-1}, x_n]$ be the n-th interval of $\Delta$ and let $m_n = \inf f(I_n)$ and $M_n = \sup f(I_n)$ for $n = 1, \dots, N$. Then $f(I_n) \subset f([a, b])$ and hence

\[ m \le m_n \le M_n \le M \]

for every $n$. Therefore,

\[
\sum_{n=1}^{N} m(x_n - x_{n-1}) \le \sum_{n=1}^{N} m_n(x_n - x_{n-1}) \le \sum_{n=1}^{N} M_n(x_n - x_{n-1}) \le \sum_{n=1}^{N} M(x_n - x_{n-1}).
\]

But

\[ \sum_{n=1}^{N} (x_n - x_{n-1}) = x_N - x_0 = b - a, \]

so we can rewrite these inequalities as

\[ m(b - a) \le L(f, \Delta) \le U(f, \Delta) \le M(b - a), \]

as required.

Definition 1.1.4. Let $\Delta_1 = (x_0, \dots, x_M)$ and $\Delta_2 = (y_0, \dots, y_N)$ be subdivisions of $[a, b]$. If for every $m \in \{0, \dots, M\}$ there exists an $n \in \{0, \dots, N\}$ such that $x_m = y_n$, then $\Delta_2$ is called a refinement of $\Delta_1$.

Definition 1.1.5. Let $f : [a, b] \to \mathbb{R}$ be a bounded function. If $S \subset [a, b]$ is any set, then $\omega(f, S) = \sup f(S) - \inf f(S)$ is called the oscillation of $f$ on $S$. If $\Delta$ is a subdivision of $[a, b]$ with intervals $I_1, \dots, I_N$, then we define

\[ \Omega(f, \Delta) = \sum_{n=1}^{N} \omega(f, I_n)\,|I_n|. \]

Remark 1.1.3. From the formulas for $L(f, \Delta)$, $U(f, \Delta)$, and $\Omega(f, \Delta)$, we immediately obtain

\[ \Omega(f, \Delta) = U(f, \Delta) - L(f, \Delta). \]

Lemma 1.1.2. Let f be a bounded real function on [a, b].

(i) If $\Delta$ and $\Delta'$ are subdivisions of $[a, b]$ and $\Delta'$ refines $\Delta$, then

\[ L(f, \Delta') \ge L(f, \Delta), \quad U(f, \Delta') \le U(f, \Delta), \quad\text{and}\quad \Omega(f, \Delta') \le \Omega(f, \Delta). \]

(ii) For any subdivisions $\Delta_1, \Delta_2$ of $[a, b]$,

\[ L(f, \Delta_1) \le U(f, \Delta_2). \]

(iii) Furthermore,

\[ \underline{\int_a^b} f(x)\,dx \le \overline{\int_a^b} f(x)\,dx. \]

The last statement is consistent with the expectation that the lower Riemann integral underestimates the area, while the upper Riemann integral overestimates it in general.


Proof. (i) If $\Delta'$ refines $\Delta$, then there exists a number $\ell \in \mathbb{N} \cup \{0\}$ such that $\Delta'$ has $\ell$ points more than $\Delta$. We use induction over $\ell$.

If $\ell = 0$, then $\Delta' = \Delta$ and there is nothing to prove. If $\ell = 1$, then we have numbers $x_0, \dots, x_N$ with

\[ a = x_0 < x_1 < \dots < x_N = b \]

such that $\Delta = (x_0, \dots, x_N)$, and we have another number $x^* \in (x_{K-1}, x_K)$ for some $K \in \{1, \dots, N\}$ such that

\[ \Delta' = (x_0, \dots, x_{K-1}, x^*, x_K, \dots, x_N). \]

Write $I_n = [x_{n-1}, x_n]$ for $n = 1, \dots, N$ and

\[ I_- = [x_{K-1}, x^*] \quad\text{and}\quad I_+ = [x^*, x_K]. \]

Define $m_n = \inf f(I_n)$ for $n = 1, \dots, N$ and define

\[ m_- = \inf f(I_-) \quad\text{and}\quad m_+ = \inf f(I_+). \]

Then $m_- \ge m_K$ (as $I_- \subset I_K$) and $m_+ \ge m_K$ (as $I_+ \subset I_K$). Hence

\[
\begin{aligned}
L(f, \Delta) &= \sum_{n=1}^{K-1} m_n |I_n| + m_K |I_K| + \sum_{n=K+1}^{N} m_n |I_n| \\
&= \sum_{n=1}^{K-1} m_n |I_n| + m_K (|I_-| + |I_+|) + \sum_{n=K+1}^{N} m_n |I_n| \\
&\le \sum_{n=1}^{K-1} m_n |I_n| + m_- |I_-| + m_+ |I_+| + \sum_{n=K+1}^{N} m_n |I_n| = L(f, \Delta').
\end{aligned}
\]

The inequality $U(f, \Delta) \ge U(f, \Delta')$ is proved similarly. The inequality $\Omega(f, \Delta) \ge \Omega(f, \Delta')$ follows from the other two.

Finally, for the induction step, consider the case $\ell \ge 2$ and assume that for any subdivision $E$ of $[a, b]$ and any refinement $E'$ of $E$ with fewer than $\ell$ additional points, we have

\[ L(f, E') \ge L(f, E), \quad U(f, E') \le U(f, E), \quad\text{and}\quad \Omega(f, E') \le \Omega(f, E). \]

Choose a point $x^*$ from $\Delta'$ that does not belong to $\Delta$. Define $\Delta''$ to be the subdivision obtained from $\Delta'$ by removing $x^*$. Then $\Delta''$ is a refinement of $\Delta$ with $\ell - 1$ additional points and $\Delta'$ is a refinement of $\Delta''$ with 1 additional point. Hence by the induction hypothesis,

\[ L(f, \Delta') \ge L(f, \Delta'') \ge L(f, \Delta) \]

and

\[ U(f, \Delta') \le U(f, \Delta'') \le U(f, \Delta). \]

It then also follows that $\Omega(f, \Delta') \le \Omega(f, \Delta)$.

(ii) Let $\Delta$ be a common refinement of $\Delta_1$ and $\Delta_2$. Then from (i) and Lemma 1.1.1,

\[ L(f, \Delta_1) \le L(f, \Delta) \le U(f, \Delta) \le U(f, \Delta_2). \]

(iii) For any subdivisions $\Delta_1$ and $\Delta_2$ of $[a, b]$, part (ii) yields

\[ L(f, \Delta_1) \le U(f, \Delta_2). \]

Taking the supremum over all subdivisions $\Delta_1$ while fixing $\Delta_2$, we obtain

\[ \underline{\int_a^b} f(x)\,dx \le U(f, \Delta_2). \]

Now taking the infimum over $\Delta_2$ yields the desired inequality.
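The effect of a refinement can be checked on a small example. The sketch below is only an illustration under the assumption that f is monotonic on each interval (so infimum and supremum are end point values); the helper `lower_upper_sums` and the two subdivisions are chosen here for illustration.

```python
from fractions import Fraction


def lower_upper_sums(f, points):
    # Assumption: f is monotonic on each interval, so inf and sup sit at end points.
    lower = upper = Fraction(0)
    for left, right in zip(points, points[1:]):
        lo, hi = sorted((f(left), f(right)))
        lower += lo * (right - left)
        upper += hi * (right - left)
    return lower, upper


def square(x):
    return x * x


coarse = [Fraction(0), Fraction(1, 2), Fraction(1)]
fine = [Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(1)]  # refines coarse

L_coarse, U_coarse = lower_upper_sums(square, coarse)
L_fine, U_fine = lower_upper_sums(square, fine)

print(L_coarse, L_fine)  # 1/8 9/64
print(U_coarse, U_fine)  # 5/8 37/64
assert L_coarse <= L_fine <= U_fine <= U_coarse
```

Adding the single point 1/4 raises the lower sum and lowers the upper sum, as in the case of one additional point treated in the proof.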

1.2 Criteria for Integrability

Not every bounded function is integrable, and so we need tools to help us decide whether we can integrate a given function.

Example 1.2.1. Let $f : [0, 1] \to \mathbb{R}$ be the function with

\[
f(x) =
\begin{cases}
1 & \text{if } x \in [0, 1] \cap \mathbb{Q}, \\
0 & \text{if } x \in [0, 1] \setminus \mathbb{Q}.
\end{cases}
\]

Consider any subdivision $\Delta$ of $[0, 1]$, say with intervals $I_1, \dots, I_N$. Then each $I_n$, being an interval of positive length, contains both rational and irrational numbers. Therefore, we have $\inf f(I_n) = 0$ and $\sup f(I_n) = 1$ for $n = 1, \dots, N$. It follows that

\[ L(f, \Delta) = \sum_{n=1}^{N} 0 \cdot |I_n| = 0 \quad\text{and}\quad U(f, \Delta) = \sum_{n=1}^{N} 1 \cdot |I_n| = 1, \]

regardless of the subdivision. Hence

\[ \underline{\int_0^1} f(x)\,dx = 0 \quad\text{and}\quad \overline{\int_0^1} f(x)\,dx = 1. \]

In particular, this function is not Riemann integrable.

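Theorem 1.2.1. A bounded function $f : [a, b] \to \mathbb{R}$ is Riemann integrable if, and only if, for every $\varepsilon > 0$ there exists a subdivision $\Delta$ of $[a, b]$ with $\Omega(f, \Delta) < \varepsilon$.

Proof. Suppose that $f$ is Riemann integrable. Then

\[ \int_a^b f(x)\,dx = \sup\{L(f, \Delta) : \Delta \text{ is a subdivision of } [a, b]\} \]

and simultaneously

\[ \int_a^b f(x)\,dx = \inf\{U(f, \Delta) : \Delta \text{ is a subdivision of } [a, b]\}. \]

Thus for any $\varepsilon > 0$, there exist subdivisions $\Delta_1, \Delta_2$ of $[a, b]$ such that

\[ L(f, \Delta_1) > \int_a^b f(x)\,dx - \frac{\varepsilon}{2} \]

and

\[ U(f, \Delta_2) < \int_a^b f(x)\,dx + \frac{\varepsilon}{2}. \]

So $U(f, \Delta_2) - L(f, \Delta_1) < \varepsilon$. Now let $\Delta$ be a common refinement of $\Delta_1$ and $\Delta_2$. Then

\[ \Omega(f, \Delta) = U(f, \Delta) - L(f, \Delta) \le U(f, \Delta_2) - L(f, \Delta_1) < \varepsilon \]

according to Lemma 1.1.2.(i).

Conversely, suppose that for every $\varepsilon > 0$ there exists a subdivision $\Delta$ of $[a, b]$ with $\Omega(f, \Delta) < \varepsilon$. Then, since

\[ \overline{\int_a^b} f(x)\,dx \le U(f, \Delta) \quad\text{and}\quad \underline{\int_a^b} f(x)\,dx \ge L(f, \Delta), \]

it follows that

\[ 0 \le \overline{\int_a^b} f(x)\,dx - \underline{\int_a^b} f(x)\,dx \le U(f, \Delta) - L(f, \Delta) < \varepsilon. \]

But since $\varepsilon$ is an arbitrary positive number, we must have

\[ \underline{\int_a^b} f(x)\,dx = \overline{\int_a^b} f(x)\,dx. \]

That is, the function $f$ is Riemann integrable.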

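Example 1.2.2. We claim that the function $f : [0, 1] \to \mathbb{R}$, $x \mapsto x$, is Riemann integrable with

\[ \int_0^1 x\,dx = \frac{1}{2}. \]

In order to prove this, let $N \in \mathbb{N}$ and consider the subdivision $\Delta_N = (0, \frac{1}{N}, \frac{2}{N}, \dots, \frac{N-1}{N}, 1)$ of $[0, 1]$. For $n = 1, \dots, N$, let $I_n = [\frac{n-1}{N}, \frac{n}{N}]$ and

\[ m_n = \inf f(I_n) = \frac{n-1}{N}, \quad M_n = \sup f(I_n) = \frac{n}{N}. \]

Then

\[ L(f, \Delta_N) = \sum_{n=1}^{N} m_n |I_n| = \sum_{n=1}^{N} \frac{n-1}{N^2} = \frac{(N-1)N}{2N^2} = \frac{1}{2} - \frac{1}{2N} \]

and

\[ U(f, \Delta_N) = \sum_{n=1}^{N} M_n |I_n| = \sum_{n=1}^{N} \frac{n}{N^2} = \frac{N(N+1)}{2N^2} = \frac{1}{2} + \frac{1}{2N}. \]

Hence

\[ \frac{1}{2} - \frac{1}{2N} \le \underline{\int_0^1} x\,dx \le \overline{\int_0^1} x\,dx \le \frac{1}{2} + \frac{1}{2N} \]

for any $N \in \mathbb{N}$. When we let $N \to \infty$, we obtain

\[ \frac{1}{2} \le \underline{\int_0^1} x\,dx \le \overline{\int_0^1} x\,dx \le \frac{1}{2}, \]

which implies the claim.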
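The closed forms for the lower and upper sums in this example can be confirmed for specific values of N. The following Python sketch sums the terms exactly with `fractions.Fraction`; the function name `sums_for_identity` is an illustrative choice.

```python
from fractions import Fraction


def sums_for_identity(N):
    """Lower and upper Riemann sums of f(x) = x on the uniform subdivision of [0, 1]."""
    length = Fraction(1, N)
    # inf f(I_n) = (n - 1)/N and sup f(I_n) = n/N because f is increasing.
    lower = sum(Fraction(n - 1, N) * length for n in range(1, N + 1))
    upper = sum(Fraction(n, N) * length for n in range(1, N + 1))
    return lower, upper


for N in (1, 2, 10, 100):
    lower, upper = sums_for_identity(N)
    assert lower == Fraction(1, 2) - Fraction(1, 2 * N)
    assert upper == Fraction(1, 2) + Fraction(1, 2 * N)
```

The gap between the two sums is exactly 1/N, so it can be made smaller than any given positive number by increasing N.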

The monotonicity is the reason why we can determine the lower and upper Riemann sums quite easily in this example. We can draw similar conclusions for other monotonic functions.

Corollary 1.2.1. Let $f : [a, b] \to \mathbb{R}$ be monotonic (and therefore bounded). Then $f$ is Riemann integrable.

Proof. We only consider the case where $f$ is increasing, as the case of a decreasing function is similar.

Consider a subdivision $\Delta = (x_0, \dots, x_N)$ of $[a, b]$. As usual, let $I_n = [x_{n-1}, x_n]$ for $n = 1, \dots, N$. Because $f$ is increasing, we have $\inf f(I_n) = f(x_{n-1})$ and $\sup f(I_n) = f(x_n)$. Thus

\[
\begin{aligned}
\Omega(f, \Delta) &= \sum_{n=1}^{N} (f(x_n) - f(x_{n-1}))(x_n - x_{n-1}) \\
&\le \|\Delta\| \sum_{n=1}^{N} (f(x_n) - f(x_{n-1})) = \|\Delta\| (f(b) - f(a)).
\end{aligned}
\]

Thus for any $\varepsilon > 0$, we can achieve $\Omega(f, \Delta) < \varepsilon$ by choosing the mesh small enough; more precisely, by choosing $\|\Delta\| < \varepsilon/(f(b) - f(a))$ if $f(b) > f(a)$. (If $f(b) = f(a)$, then $f$ is constant and $\Omega(f, \Delta) = 0$ for every subdivision.) Now Theorem 1.2.1 implies the claim.
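The estimate of the oscillation sum by the mesh times f(b) - f(a) can be tested on a non-uniform subdivision. This is a minimal sketch for an increasing function, with exact rational arithmetic; the function names and the sample subdivision are assumptions made for illustration.

```python
from fractions import Fraction


def oscillation_sum_increasing(f, points):
    # For increasing f, the oscillation on [x_{n-1}, x_n] is f(x_n) - f(x_{n-1}).
    return sum((f(r) - f(l)) * (r - l) for l, r in zip(points, points[1:]))


def mesh(points):
    # Length of the longest interval of the subdivision.
    return max(r - l for l, r in zip(points, points[1:]))


def cube(x):
    return x ** 3


points = [Fraction(0), Fraction(1, 2), Fraction(1), Fraction(7, 4), Fraction(2)]
omega = oscillation_sum_increasing(cube, points)
bound = mesh(points) * (cube(points[-1]) - cube(points[0]))
print(omega, bound)  # 567/128 6
assert omega <= bound
```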


Corollary 1.2.2. Let $f : [a, b] \to \mathbb{R}$ be continuous. Then $f$ is Riemann integrable.

Proof. First we note that $f$ is bounded by the Weierstrass extreme value theorem. Furthermore, by the theorem of uniform continuity, it is uniformly continuous.

We want to use Theorem 1.2.1 again, so fix $\varepsilon > 0$. By the uniform continuity, we can choose $\delta > 0$ such that for all $x, y \in [a, b]$, if $|x - y| < \delta$, then $|f(x) - f(y)| < \varepsilon/(b - a)$.

Let $\Delta$ be a subdivision of $[a, b]$ with mesh $\|\Delta\| < \delta$. Let $I_1, \dots, I_N$ be the intervals of $\Delta$. Then $|I_n| < \delta$ for every $n$, and hence $\omega(f, I_n) < \varepsilon/(b - a)$. Therefore,

\[ \Omega(f, \Delta) = \sum_{n=1}^{N} \omega(f, I_n)\,|I_n| < \frac{\varepsilon}{b - a} \sum_{n=1}^{N} |I_n| = \varepsilon. \]

Now Theorem 1.2.1 implies that $f$ is Riemann integrable.

Lemma 1.2.1. Let $f : [a, b] \to \mathbb{R}$ be a bounded function and write $m = \inf f([a, b])$ and $M = \sup f([a, b])$. Let $\Delta$ be a subdivision of $[a, b]$ with intervals $I_1, \dots, I_N$, and let $\Delta'$ be a subdivision formed by adding one extra point $x^*$ to $\Delta$, say $x^* \in I_n$. Then

\[ L(f, \Delta') \le L(f, \Delta) + (M - m)|I_n| \]

and

\[ U(f, \Delta') \ge U(f, \Delta) - (M - m)|I_n|. \]

Proof. Write $I_n^1$ and $I_n^2$ for the two intervals into which $x^*$ divides $I_n$, so that $|I_n^1| + |I_n^2| = |I_n|$. Then

\[ L(f, \Delta) = \sum_{k=1}^{N} \inf f(I_k)\,|I_k|, \]

whereas

\[ L(f, \Delta') = \inf f(I_n^1)\,|I_n^1| + \inf f(I_n^2)\,|I_n^2| + \sum_{k \ne n} \inf f(I_k)\,|I_k|. \]

When we take the difference, most of the terms cancel. More precisely,

\[
\begin{aligned}
L(f, \Delta') - L(f, \Delta) &= \inf f(I_n^1)\,|I_n^1| + \inf f(I_n^2)\,|I_n^2| - \inf f(I_n)\,|I_n| \\
&\le M|I_n^1| + M|I_n^2| - m|I_n| = (M - m)|I_n|.
\end{aligned}
\]

The first inequality follows. The second inequality is proved similarly.


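Theorem 1.2.2. A bounded function $f : [a, b] \to \mathbb{R}$ is Riemann integrable if, and only if, for every $\varepsilon > 0$ there exists a number $\delta > 0$ such that any subdivision $\Delta$ of $[a, b]$ of mesh $\|\Delta\| < \delta$ will satisfy $\Omega(f, \Delta) < \varepsilon$.

Proof. Let $f : [a, b] \to \mathbb{R}$ be bounded. First suppose that for every $\varepsilon > 0$ there exists a number $\delta > 0$ such that for any subdivision $\Delta$ of $[a, b]$ with $\|\Delta\| < \delta$, we have $\Omega(f, \Delta) < \varepsilon$. There clearly exists a subdivision with $\|\Delta\| < \delta$, so Theorem 1.2.1 immediately shows that $f$ is Riemann integrable.

Conversely, suppose that $f$ is Riemann integrable. Fix $\varepsilon > 0$. Then by Theorem 1.2.1, there exists a subdivision $E$ of $[a, b]$ with $\Omega(f, E) < \varepsilon/2$. Let $N$ be the number of points of $E$ and let $m = \inf f([a, b])$ and $M = \sup f([a, b])$. If $M = m$, then $f$ is constant and $\Omega(f, \Delta) = 0$ for every subdivision, so we may assume that $M > m$.

Now consider an arbitrary subdivision $\Delta$ of $[a, b]$ and consider the common refinement $\Delta'$, obtained by adding all points of $E$ to $\Delta$ (unless they already belong to $\Delta$). Then by Lemma 1.1.2,

\[ \Omega(f, \Delta') \le \Omega(f, E) < \frac{\varepsilon}{2}. \]

On the other hand, the subdivision $\Delta'$ is formed by adding at most $N$ points to $\Delta$. Applying Lemma 1.2.1 each time, and noting that every interval involved has length at most $\|\Delta\|$, we obtain

\[ L(f, \Delta') \le L(f, \Delta) + N(M - m)\|\Delta\| \]

and

\[ U(f, \Delta') \ge U(f, \Delta) - N(M - m)\|\Delta\|. \]

Hence

\[ \Omega(f, \Delta) \le \Omega(f, \Delta') + 2N(M - m)\|\Delta\| < \frac{\varepsilon}{2} + 2N(M - m)\|\Delta\|. \]

Choose

\[ \delta = \frac{\varepsilon}{4N(M - m)}. \]

If $\|\Delta\| < \delta$, it follows that $\Omega(f, \Delta) < \varepsilon$.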

1.3 Riemann Sums

Definition 1.3.1. Suppose that $\Delta = (x_0, \dots, x_N)$ is a subdivision of $[a, b]$ and $\Xi = (\xi_1, \dots, \xi_N)$ is a finite sequence of numbers $\xi_1, \dots, \xi_N$ with

\[ x_{n-1} \le \xi_n \le x_n, \quad n = 1, \dots, N. \]

Then the pair $(\Delta, \Xi)$ is called a tagged subdivision of $[a, b]$. For a bounded function $f : [a, b] \to \mathbb{R}$, the expression

\[ S(f, \Delta, \Xi) = \sum_{n=1}^{N} f(\xi_n)(x_n - x_{n-1}) \]

is called a Riemann sum of $f$.


It is clear that for a Riemann sum with tagged subdivision $(\Delta, \Xi)$ as in this definition, we have

\[ L(f, \Delta) \le S(f, \Delta, \Xi) \le U(f, \Delta). \]

For a Riemann integrable function, the lower and the upper Riemann sums will both be good approximations for the Riemann integral if $\|\Delta\|$ is sufficiently small, by Theorem 1.2.2. It follows that any Riemann sum with subdivision $\Delta$ will also approximate the integral.

Corollary 1.3.1. Let $f : [a, b] \to \mathbb{R}$ be a Riemann integrable function. Then for every $\varepsilon > 0$ there exists a number $\delta > 0$ such that every tagged subdivision $(\Delta, \Xi)$ of $[a, b]$ with $\|\Delta\| < \delta$ will satisfy

\[ \left| \int_a^b f(x)\,dx - S(f, \Delta, \Xi) \right| < \varepsilon. \]

Proof. Fix $\varepsilon > 0$ and invoke Theorem 1.2.2 to find a number $\delta > 0$ such that $\Omega(f, \Delta) < \varepsilon$ for all subdivisions $\Delta$ of $[a, b]$ with $\|\Delta\| < \delta$. We have

\[ L(f, \Delta) \le S(f, \Delta, \Xi) \le U(f, \Delta) \]

as well as

\[ L(f, \Delta) \le \int_a^b f(x)\,dx \le U(f, \Delta). \]

So the numbers

\[ \int_a^b f(x)\,dx \quad\text{and}\quad S(f, \Delta, \Xi) \]

both belong to the interval $[L(f, \Delta), U(f, \Delta)]$ of length less than $\varepsilon$. This implies the desired inequality.

Example 1.3.1. Consider the function $f : [0, 1] \to \mathbb{R}$ given by $f(x) = x^2$. Being continuous, this function is Riemann integrable by Corollary 1.2.2. Now we want to calculate the integral.

Let $N \in \mathbb{N}$ and consider the tagged subdivision $(\Delta_N, \Xi_N)$ with $\Delta_N = (0, \frac{1}{N}, \frac{2}{N}, \dots, 1)$ and $\Xi_N = (\frac{1}{N}, \frac{2}{N}, \dots, 1)$. Then we compute

\[ S(f, \Delta_N, \Xi_N) = \sum_{n=1}^{N} \frac{(n/N)^2}{N} = \frac{1}{N^3} \sum_{n=1}^{N} n^2 = \frac{N(N+1)(2N+1)}{6N^3}, \]

using the well-known formula for the sum of the first $N$ squares. Since $\|\Delta_N\| \to 0$ as $N \to \infty$, we have

\[ \int_0^1 x^2\,dx = \lim_{N \to \infty} \frac{N(N+1)(2N+1)}{6N^3} = \frac{1}{3}. \]
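The Riemann sums of this example can be reproduced directly from the definition of a tagged subdivision. The sketch below uses right end points as tags and exact rational arithmetic; `riemann_sum` and `right_tagged_sum_of_square` are illustrative names.

```python
from fractions import Fraction


def riemann_sum(f, points, tags):
    """Riemann sum S(f, Delta, Xi) for the subdivision `points` and tags `tags`."""
    return sum(f(t) * (r - l) for l, r, t in zip(points, points[1:], tags))


def right_tagged_sum_of_square(N):
    points = [Fraction(n, N) for n in range(N + 1)]
    tags = points[1:]  # tag each interval by its right end point
    return riemann_sum(lambda x: x * x, points, tags)


for N in (1, 5, 10, 200):
    assert right_tagged_sum_of_square(N) == Fraction(N * (N + 1) * (2 * N + 1), 6 * N ** 3)

print(right_tagged_sum_of_square(10))  # 77/200
```

Already for N = 10 the value 77/200 = 0.385 is within about 0.052 of the limit 1/3.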


1.4 Properties of the Integral

Theorem 1.4.1. (i) Let $f, g : [a, b] \to \mathbb{R}$ be Riemann integrable. Then $f + g$ is Riemann integrable and

\[ \int_a^b (f(x) + g(x))\,dx = \int_a^b f(x)\,dx + \int_a^b g(x)\,dx. \]

(ii) Let $f : [a, b] \to \mathbb{R}$ be Riemann integrable and $\alpha \in \mathbb{R}$. Then $\alpha f$ is Riemann integrable and

\[ \int_a^b \alpha f(x)\,dx = \alpha \int_a^b f(x)\,dx. \]

Proof. (i) Consider any subdivision $\Delta$ of $[a, b]$ with intervals $I_1, \dots, I_N$. Then for $n = 1, \dots, N$ and for $x \in I_n$, we have

\[ \inf f(I_n) + \inf g(I_n) \le f(x) + g(x) \le \sup f(I_n) + \sup g(I_n). \]

Hence

\[ \inf f(I_n) + \inf g(I_n) \le \inf (f + g)(I_n) \le \sup (f + g)(I_n) \le \sup f(I_n) + \sup g(I_n). \]

As a consequence, we find that

\[ L(f, \Delta) + L(g, \Delta) \le L(f + g, \Delta) \le U(f + g, \Delta) \le U(f, \Delta) + U(g, \Delta). \]

Let $\varepsilon > 0$. By Theorem 1.2.2, we can choose a number $\delta > 0$ such that whenever $\|\Delta\| < \delta$, it follows that $\Omega(f, \Delta) < \varepsilon/2$ and $\Omega(g, \Delta) < \varepsilon/2$. Hence

\[ U(f + g, \Delta) - L(f + g, \Delta) \le U(f, \Delta) + U(g, \Delta) - (L(f, \Delta) + L(g, \Delta)) < \varepsilon. \]

Using Theorem 1.2.2 again, we conclude that $f + g$ is integrable.

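In order to compute its integral, we first consider an arbitrary tagged subdivision $(\Delta, \Xi)$ of $[a, b]$ with $\Delta = (x_0, \dots, x_N)$ and $\Xi = (\xi_1, \dots, \xi_N)$. We observe that

\[
\begin{aligned}
S(f + g, \Delta, \Xi) &= \sum_{n=1}^{N} (f(\xi_n) + g(\xi_n))(x_n - x_{n-1}) \\
&= \sum_{n=1}^{N} f(\xi_n)(x_n - x_{n-1}) + \sum_{n=1}^{N} g(\xi_n)(x_n - x_{n-1}) \\
&= S(f, \Delta, \Xi) + S(g, \Delta, \Xi).
\end{aligned}
\]

Now consider a sequence of tagged subdivisions $(\Delta_k, \Xi_k)$ of $[a, b]$ such that $\|\Delta_k\| \to 0$ as $k \to \infty$. Then by Corollary 1.3.1 and the algebra of limits theorem, we have

\[
\begin{aligned}
\int_a^b (f(x) + g(x))\,dx &= \lim_{k \to \infty} S(f + g, \Delta_k, \Xi_k) \\
&= \lim_{k \to \infty} (S(f, \Delta_k, \Xi_k) + S(g, \Delta_k, \Xi_k)) \\
&= \lim_{k \to \infty} S(f, \Delta_k, \Xi_k) + \lim_{k \to \infty} S(g, \Delta_k, \Xi_k) \\
&= \int_a^b f(x)\,dx + \int_a^b g(x)\,dx.
\end{aligned}
\]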

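(ii) Suppose first that $\alpha > 0$. Then for any subdivision $\Delta$ of $[a, b]$ with intervals $I_1, \dots, I_N$, we have

\[ \inf (\alpha f)(I_n) = \alpha \inf f(I_n) \quad\text{and}\quad \sup (\alpha f)(I_n) = \alpha \sup f(I_n). \]

Hence $L(\alpha f, \Delta) = \alpha L(f, \Delta)$ and $U(\alpha f, \Delta) = \alpha U(f, \Delta)$. Taking the supremum and the infimum, respectively, we obtain

\[ \underline{\int_a^b} \alpha f(x)\,dx = \alpha \underline{\int_a^b} f(x)\,dx \]

and

\[ \overline{\int_a^b} \alpha f(x)\,dx = \alpha \overline{\int_a^b} f(x)\,dx. \]

If $f$ is integrable, then it follows that

\[ \underline{\int_a^b} \alpha f(x)\,dx = \overline{\int_a^b} \alpha f(x)\,dx = \alpha \int_a^b f(x)\,dx. \]

In the case $\alpha = -1$, we have, using the same notation,

\[ \inf (-f)(I_n) = -\sup f(I_n) \quad\text{and}\quad \sup (-f)(I_n) = -\inf f(I_n). \]

Hence $L(-f, \Delta) = -U(f, \Delta)$ and $U(-f, \Delta) = -L(f, \Delta)$, leading to

\[ \underline{\int_a^b} (-f(x))\,dx = -\overline{\int_a^b} f(x)\,dx \]

and

\[ \overline{\int_a^b} (-f(x))\,dx = -\underline{\int_a^b} f(x)\,dx. \]

If $f$ is integrable, then

\[ \underline{\int_a^b} (-f(x))\,dx = \overline{\int_a^b} (-f(x))\,dx = -\int_a^b f(x)\,dx. \]

Finally, in the case $\alpha < 0$, we can write $\alpha = (-1)|\alpha|$, and the claim follows from the two cases already considered. The case $\alpha = 0$ is trivial.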


Notation. When we have a function $f : [a, b] \to \mathbb{R}$, we are sometimes only interested in its behaviour on a subinterval $[c, d] \subset [a, b]$, in which case we consider the restriction of $f$ to $[c, d]$. We say that $f$ is Riemann integrable on $[c, d]$ if the restriction to $[c, d]$ is Riemann integrable and we write

\[ \int_c^d f(x)\,dx \]

for the corresponding integral.

Theorem 1.4.2. Let $f : [a, b] \to \mathbb{R}$ be a bounded function.

(i) If $a \le c < d \le b$ and $f$ is Riemann integrable, then it is Riemann integrable on $[c, d]$ as well.

(ii) Suppose that $a < c < b$ and $f$ is Riemann integrable on both $[a, c]$ and $[c, b]$. Then it is Riemann integrable on $[a, b]$ and

\[ \int_a^b f(x)\,dx = \int_a^c f(x)\,dx + \int_c^b f(x)\,dx. \tag{1.1} \]

Proof. (i) Let $\varepsilon > 0$. According to Theorem 1.2.1, there exists a subdivision $\Delta$ of $[a, b]$ such that $\Omega(f, \Delta) < \varepsilon$. Let $\Delta'$ be the subdivision of $[a, b]$ obtained by adding the points $c$ and $d$ to $\Delta$ (unless they already belong to $\Delta$). Then by Lemma 1.1.2,

\[ \Omega(f, \Delta') \le \Omega(f, \Delta) < \varepsilon. \]

Say that $\Delta' = (x_0, \dots, x_N)$. Write $I_n = [x_{n-1}, x_n]$ for $n = 1, \dots, N$. There exist some numbers $K, L$ with $1 \le K \le L \le N$, such that $c = x_{K-1}$ and $d = x_L$. Then $E = (x_{K-1}, \dots, x_L)$ is a subdivision of $[c, d]$. We have

\[ \Omega(f, E) = \sum_{n=K}^{L} \omega(f, I_n)\,|I_n| \le \sum_{n=1}^{N} \omega(f, I_n)\,|I_n| = \Omega(f, \Delta') < \varepsilon. \]

Hence $f$ is Riemann integrable on $[c, d]$ by Theorem 1.2.1.

(ii) Let $\varepsilon > 0$. Choose subdivisions $\Delta_1 = (x_0, \dots, x_M)$ of $[a, c]$ and $\Delta_2 = (y_0, \dots, y_N)$ of $[c, b]$ such that

\[ \Omega(f, \Delta_1) < \frac{\varepsilon}{2} \quad\text{and}\quad \Omega(f, \Delta_2) < \frac{\varepsilon}{2}. \]

Let $\Delta = (x_0, \dots, x_M, y_1, \dots, y_N)$. Then $\Delta$ is a subdivision of $[a, b]$ with

\[ L(f, \Delta) = L(f, \Delta_1) + L(f, \Delta_2), \quad U(f, \Delta) = U(f, \Delta_1) + U(f, \Delta_2), \]

and

\[ \Omega(f, \Delta) = \Omega(f, \Delta_1) + \Omega(f, \Delta_2) < \varepsilon. \tag{1.2} \]

Hence $f$ is Riemann integrable on $[a, b]$.

Since

\[ L(f, \Delta_1) \le \int_a^c f(x)\,dx \le U(f, \Delta_1) \]

and

\[ L(f, \Delta_2) \le \int_c^b f(x)\,dx \le U(f, \Delta_2), \]

it also follows that

\[ L(f, \Delta) \le \int_a^c f(x)\,dx + \int_c^b f(x)\,dx \le U(f, \Delta). \]

Because of (1.2), this means that the numbers

\[ \int_a^c f(x)\,dx + \int_c^b f(x)\,dx \quad\text{and}\quad \int_a^b f(x)\,dx \]

both belong to the interval $[L(f, \Delta), U(f, \Delta)]$ of length less than $\varepsilon$. Therefore, we have

\[ \left| \int_a^c f(x)\,dx + \int_c^b f(x)\,dx - \int_a^b f(x)\,dx \right| < \varepsilon. \]

As $\varepsilon$ can be chosen arbitrarily small, we have in fact

\[ \int_a^c f(x)\,dx + \int_c^b f(x)\,dx = \int_a^b f(x)\,dx, \]

as required.

Notation. If $d < c$, define

\[ \int_c^d f(x)\,dx = -\int_d^c f(x)\,dx. \]

Furthermore, define

\[ \int_c^c f(x)\,dx = 0 \]

for any $c \in [a, b]$. Then (1.1) holds for any three numbers $a, b, c$ in the domain of an integrable function.

Theorem 1.4.3. Suppose that $f : [a, b] \to \mathbb{R}$ is a Riemann integrable function and $g : \mathbb{R} \to \mathbb{R}$ is continuous. Then $g \circ f$ is Riemann integrable.


Proof. Let $m = \inf f([a, b])$ and $M = \sup f([a, b])$. Then $g$ is uniformly continuous on $[m, M]$ by the theorem of uniform continuity. Define $\ell = \min g([m, M])$ and $L = \max g([m, M])$, both of which exist by the Weierstrass extreme value theorem.

Let $\varepsilon > 0$. Then by the uniform continuity, there exists a number $\delta > 0$ such that whenever $s, t \in [m, M]$ with $|s - t| < \delta$, we have $|g(s) - g(t)| < \varepsilon$. Therefore, if $I \subset [a, b]$ is an interval with $\omega(f, I) < \delta$, it follows that $\omega(g \circ f, I) \le \varepsilon$.

By Theorem 1.2.1, there exists a subdivision $\Delta$ of $[a, b]$ with $\Omega(f, \Delta) < \varepsilon\delta$. Let $I_1, \dots, I_N$ be the intervals of $\Delta$, which we now divide into two categories. Let $A$ be the set comprising all indices $n \in \{1, \dots, N\}$ such that $\omega(f, I_n) < \delta$, and let $B$ comprise all $n \in \{1, \dots, N\}$ such that $\omega(f, I_n) \ge \delta$. Then

\[ \varepsilon\delta > \Omega(f, \Delta) = \sum_{n=1}^{N} \omega(f, I_n)\,|I_n| \ge \sum_{n \in B} \omega(f, I_n)\,|I_n| \ge \delta \sum_{n \in B} |I_n|. \]

Therefore, we have

\[ \varepsilon > \sum_{n \in B} |I_n|, \]

which implies

\[ \sum_{n \in B} \omega(g \circ f, I_n)\,|I_n| \le (L - \ell) \sum_{n \in B} |I_n| \le (L - \ell)\varepsilon. \]

On the other hand,

\[ \sum_{n \in A} \omega(g \circ f, I_n)\,|I_n| \le \varepsilon \sum_{n \in A} |I_n| \le (b - a)\varepsilon. \]

Therefore,

\[ \Omega(g \circ f, \Delta) = \sum_{n \in A} \omega(g \circ f, I_n)\,|I_n| + \sum_{n \in B} \omega(g \circ f, I_n)\,|I_n| \le (L - \ell + b - a)\varepsilon. \]

The right-hand side can be made arbitrarily small, and thus $g \circ f$ is Riemann integrable by Theorem 1.2.1.

Corollary 1.4.1. Let $f : [a, b] \to \mathbb{R}$ be Riemann integrable. Then $|f|$ is Riemann integrable.

Proof. Apply Theorem 1.4.3 with $g(y) = |y|$.

Corollary 1.4.2. Let $f, g : [a, b] \to \mathbb{R}$ be Riemann integrable functions. Then $fg$ is Riemann integrable.

Proof. If $\varphi : [a, b] \to \mathbb{R}$ is Riemann integrable, then so is $\varphi^2$ by Theorem 1.4.3. Furthermore, the functions $f + g$ and $f - g$ are Riemann integrable by Theorem 1.4.1. Now use the formula

\[ fg = \frac{1}{4}\left((f + g)^2 - (f - g)^2\right) \]

and use Theorem 1.4.1 again.

Theorem 1.4.4. (i) Let $f, g : [a, b] \to \mathbb{R}$ be Riemann integrable functions. Suppose that $f \le g$ on $[a, b]$. Then

\[ \int_a^b f(x)\,dx \le \int_a^b g(x)\,dx. \]

(ii) Let $f : [a, b] \to \mathbb{R}$ be Riemann integrable. Then

\[ \left| \int_a^b f(x)\,dx \right| \le \int_a^b |f(x)|\,dx. \]

Proof. (i) If $f \le g$ on $[a, b]$, then for any interval $I \subset [a, b]$, we have $\inf f(I) \le \inf g(I)$. Hence for any subdivision $\Delta$ of $[a, b]$,

\[ L(f, \Delta) \le L(g, \Delta). \]

Taking the supremum, we obtain

\[ \underline{\int_a^b} f(x)\,dx \le \underline{\int_a^b} g(x)\,dx. \]

Since we have Riemann integrable functions, this implies the desired inequality.

(ii) If

\[ \int_a^b f(x)\,dx \ge 0, \]

then we use the fact that $f \le |f|$ on $[a, b]$, together with part (i). The conclusion is then that

\[ \left| \int_a^b f(x)\,dx \right| = \int_a^b f(x)\,dx \le \int_a^b |f(x)|\,dx. \]

If

\[ \int_a^b f(x)\,dx < 0, \]

then we use the same argument for $-f$, obtaining

\[ \left| \int_a^b f(x)\,dx \right| = -\int_a^b f(x)\,dx = \int_a^b (-f(x))\,dx \le \int_a^b |f(x)|\,dx \]

in this case.


1.5 The Fundamental Theorem of Calculus

This is the section where we draw the link between the two problems at the beginning of the chapter. So far we have calculated areas under a curve. Now we find a connection with differentiation.

Theorem 1.5.1 (First Fundamental Theorem of Calculus). Suppose that $f : [a, b] \to \mathbb{R}$ is Riemann integrable and $F : [a, b] \to \mathbb{R}$ is continuous on $[a, b]$ and differentiable on $(a, b)$ with $F' = f$. Then

\[ \int_a^b f(x)\,dx = F(b) - F(a). \]

That is, given a function $f$, if we can find a continuous function whose derivative is $f$, then we can easily compute the integral of $f$.

Theorem 1.5.2 (Second Fundamental Theorem of Calculus). Suppose that $f : [a, b] \to \mathbb{R}$ is Riemann integrable and $F : [a, b] \to \mathbb{R}$ is defined by

\[ F(x) = \int_a^x f(t)\,dt \]

for $x \in [a, b]$. If $c \in (a, b)$ and $f$ is continuous at $c$, then $F$ is differentiable at $c$ with $F'(c) = f(c)$.

In other words, given a continuous function $f$, we can use the integral to construct a function whose derivative is $f$. So we may regard integration as the reverse of differentiation.

We have one more piece of information about the function $F$ defined in this theorem.

Theorem 1.5.3 (Continuity Theorem). Let $f : [a, b] \to \mathbb{R}$ be Riemann integrable and let $F : [a, b] \to \mathbb{R}$ be the function such that

\[ F(x) = \int_a^x f(t)\,dt \]

for $x \in [a, b]$. Then $F$ is Lipschitz continuous.

We now need to prove all three theorems. We begin with the easiest, which is Theorem 1.5.3.

Proof of Theorem 1.5.3. Let $K = \sup_{x \in [a, b]} |f(x)|$. Then we have $|f| \le K$ in $[a, b]$. Suppose that $x, y \in [a, b]$. If $x \le y$, then by Theorem 1.4.2 and Theorem 1.4.4, we have

\[
\begin{aligned}
|F(y) - F(x)| &= \left| \int_a^y f(t)\,dt - \int_a^x f(t)\,dt \right| \\
&= \left| \int_x^y f(t)\,dt \right| \le \int_x^y |f(t)|\,dt \le K(y - x).
\end{aligned}
\]

If $y < x$, we exchange the roles of $x$ and $y$ and obtain a similar inequality.


Proof of Theorem 1.5.1. Consider a subdivision $\Delta = (x_0, \dots, x_N)$ of $[a, b]$. Then by the mean value theorem, for any $n = 1, \dots, N$, there exists a point $\xi_n \in (x_{n-1}, x_n)$ such that

\[ F(x_n) - F(x_{n-1}) = f(\xi_n)(x_n - x_{n-1}). \]

Let $\Xi = (\xi_1, \dots, \xi_N)$. Then $(\Delta, \Xi)$ is a tagged subdivision of $[a, b]$. Furthermore,

\[ S(f, \Delta, \Xi) = \sum_{n=1}^{N} f(\xi_n)(x_n - x_{n-1}) = \sum_{n=1}^{N} (F(x_n) - F(x_{n-1})) = F(b) - F(a). \]

Now choose a sequence of subdivisions $\Delta_k$ with $\|\Delta_k\| \to 0$ as $k \to \infty$. With the above observation, we find corresponding sequences of tags $\Xi_k$ such that

\[ S(f, \Delta_k, \Xi_k) = F(b) - F(a) \]

for every $k \in \mathbb{N}$. But the left-hand side converges to

\[ \int_a^b f(x)\,dx \]

by Corollary 1.3.1, which proves the desired formula.
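The telescoping step of this proof can be observed exactly in a special case. In the sketch below we take F(x) = x^2 and f(x) = 2x, for which the point supplied by the mean value theorem on an interval is its midpoint; this choice of functions, the subdivision, and the helper names are assumptions made for illustration.

```python
from fractions import Fraction


def F(x):
    return x * x  # a primitive of f


def f(x):
    return 2 * x  # F' = f


def riemann_sum(func, points, tags):
    return sum(func(t) * (r - l) for l, r, t in zip(points, points[1:], tags))


points = [Fraction(0), Fraction(1, 3), Fraction(1, 2), Fraction(2), Fraction(3)]
# For F(x) = x^2 we have F(r) - F(l) = (l + r)(r - l) = f((l + r)/2)(r - l),
# so the mean value theorem point of [l, r] is the midpoint.
tags = [(l + r) / 2 for l, r in zip(points, points[1:])]

total = riemann_sum(f, points, tags)
assert total == F(points[-1]) - F(points[0])
print(total)  # 9
```

The Riemann sum equals F(b) - F(a) for this tagged subdivision regardless of how fine it is, which is exactly the observation used in the proof.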

Proof of Theorem 1.5.2. Suppose that $f$ is continuous at $c$. Fix $\varepsilon > 0$ and choose $\delta > 0$ such that for any $x \in [a, b]$ with $|x - c| < \delta$, we have $|f(x) - f(c)| < \varepsilon$.

Now for $x \in [a, b]$, we have

\[
\begin{aligned}
F(x) - F(c) &= \int_a^x f(t)\,dt - \int_a^c f(t)\,dt \\
&= \int_c^x f(t)\,dt \\
&= \int_c^x (f(c) + f(t) - f(c))\,dt \\
&= f(c)(x - c) + \int_c^x (f(t) - f(c))\,dt.
\end{aligned}
\]

If $x \ne c$ with $|x - c| < \delta$, then

\[ \left| \frac{F(x) - F(c)}{x - c} - f(c) \right| = \left| \int_c^x \frac{f(t) - f(c)}{x - c}\,dt \right| \le \left| \int_c^x \frac{|f(t) - f(c)|}{|x - c|}\,dt \right| \le \varepsilon. \]

That is, we have

\[ \lim_{x \to c} \frac{F(x) - F(c)}{x - c} = f(c), \]

which is exactly what we have to prove.


Definition 1.5.1. Let $I \subset \mathbb{R}$ be an open interval and let $f, F : I \to \mathbb{R}$ be two functions. If $F$ is differentiable in $I$ and $F'(x) = f(x)$ for all $x \in I$, then $F$ is called a primitive for $f$ in $I$.

Remark 1.5.1. The expression ‘antiderivative’ is also common.

Corollary 1.5.1. Let $I \subset \mathbb{R}$ be an open interval and $f : I \to \mathbb{R}$ a continuous function.

(i) Then $f$ has a primitive in $I$.

(ii) If $F$ is a primitive for $f$ in $I$ and $x_0 \in I$ is any point, then there exists a constant $c \in \mathbb{R}$ such that

\[ F(x) = \int_{x_0}^{x} f(t)\,dt + c \]

for all $x \in I$.

Remark 1.5.2. The second statement means that primitives are unique up to a constant.

Proof. (i) Fix $x_0 \in I$ and define

\[ G(x) = \int_{x_0}^{x} f(t)\,dt. \]

Then for any $x > x_0$, Theorem 1.5.2 implies that $G'(x) = f(x)$, as $f$ is continuous.

Now consider $x \le x_0$. Choose a point $x_1 \in I$ with $x_1 < x$ (which exists as $I$ is open). Then

\[ G(x) = \int_{x_1}^{x} f(t)\,dt - \int_{x_1}^{x_0} f(t)\,dt \]

by Theorem 1.4.2. Hence again we have $G'(x) = f(x)$, so $G$ is a primitive for $f$ in $I$.

(ii) Define $G$ as before. If $F$ is another primitive for $f$ in $I$, then consider the function $H = F - G$. Then for any $x \in I$, we have

\[ H'(x) = F'(x) - G'(x) = f(x) - f(x) = 0. \]

A result from MA10207 implies that $H$ is constant, i.e., there exists a constant $c \in \mathbb{R}$ such that $H(x) = c$ for all $x \in I$. That is, we have

\[ F(x) = G(x) + c = \int_{x_0}^{x} f(t)\,dt + c \]

for all $x \in I$.


1.6 Integration Techniques

Most methods to calculate integrals rely on the first fundamental theorem of calculus: in order to integrate a function $f$ over $[a, b]$, we first find a primitive, which we then evaluate at the end points $a$ and $b$.

Example 1.6.1. Let $n \in \mathbb{N}$. What is

\[ \int_a^b x^n\,dx? \]

Define $F(x) = \frac{x^{n+1}}{n+1}$ and check that $F'(x) = x^n$ for all $x \in (a, b)$. Furthermore, this function is continuous on $[a, b]$. So by Theorem 1.5.1,

\[ \int_a^b x^n\,dx = F(b) - F(a) = \frac{b^{n+1}}{n+1} - \frac{a^{n+1}}{n+1}. \]

We have differentiation rules for products and compositions, and these give rise to similar rules for integrals.

Theorem 1.6.1 (Integration by Parts). Let $f, g : [a, b] \to \mathbb{R}$ be Riemann integrable functions. Suppose that $F, G : [a, b] \to \mathbb{R}$ are continuous functions that are primitives of $f$ and $g$, respectively, in $(a, b)$. Then

\[ \int_a^b f(x)G(x)\,dx + \int_a^b F(x)g(x)\,dx = F(b)G(b) - F(a)G(a). \]

Proof. Write $H = FG$ and note that this function is continuous on $[a, b]$ and differentiable in $(a, b)$ with $H' = fG + Fg$ by the product rule. Hence by Theorem 1.5.1,

\[ \int_a^b (f(x)G(x) + F(x)g(x))\,dx = H(b) - H(a) = F(b)G(b) - F(a)G(a). \]

An application of Theorem 1.4.1 now completes the proof.

In practice, this formula is typically used in order to reduce the integral $\int_a^b f(x)G(x)\,dx$ to the hopefully easier expression

\[ F(b)G(b) - F(a)G(a) - \int_a^b F(x)g(x)\,dx. \]

Examples can be found in Exercise 5.1.
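As a short illustration of how the reduction works (this particular example is chosen here and is not taken from the exercise sheets), take $[a, b] = [0, 1]$, $f(x) = e^x$ with primitive $F(x) = e^x$, and $G(x) = x$ with $g(x) = 1$. Assuming the standard fact that $e^x$ is its own derivative, the formula gives:

```latex
\[
\begin{aligned}
\int_0^1 x e^x \,dx
  &= F(1)G(1) - F(0)G(0) - \int_0^1 F(x) g(x)\,dx \\
  &= e \cdot 1 - 1 \cdot 0 - \int_0^1 e^x \,dx \\
  &= e - (e - 1) = 1.
\end{aligned}
\]
```

The remaining integral of $e^x$ is evaluated with Theorem 1.5.1, using $e^x$ again as the primitive.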

Theorem 1.6.2 (Integration by Substitution). Let $I \subset \mathbb{R}$ be an open interval and $f : I \to \mathbb{R}$ a continuous function. Suppose that $u : [a, b] \to I$ is continuous on $[a, b]$ and differentiable in $(a, b)$ with $u'$ continuous and bounded. Extend $u'$ to $[a, b]$ by assigning $u'(a)$ and $u'(b)$ arbitrarily. Then

\[ \int_{u(a)}^{u(b)} f(y)\,dy = \int_a^b f(u(x))\,u'(x)\,dx. \]

Proof. Choose a primitive $F$ for $f$ in $I$ (which is possible by Corollary 1.5.1). Then $F \circ u$ is continuous on $[a, b]$ and differentiable in $(a, b)$ with $(F \circ u)' = (f \circ u)u'$ by the chain rule. Now Theorem 1.5.1 implies that

\[ \int_{u(a)}^{u(b)} f(y)\,dy = F(u(b)) - F(u(a)) \]

and

\[ \int_a^b f(u(x))\,u'(x)\,dx = F(u(b)) - F(u(a)). \]

Hence the two integrals are equal.
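The substitution formula can be checked numerically in a concrete case. The sketch below approximates both sides with midpoint Riemann sums for f = cos and u(x) = x^2 on [0, 1]; the choice of functions, the helper `midpoint_sum`, and the tolerance are assumptions for illustration, and the comparison is only approximate because floating point sums are used.

```python
import math


def midpoint_sum(f, a, b, n=2000):
    """Riemann sum of f on the uniform subdivision of [a, b], tagged at midpoints."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))


def u(x):
    return x * x


def u_prime(x):
    return 2 * x


a, b = 0.0, 1.0
left = midpoint_sum(math.cos, u(a), u(b))                          # integral of f over [u(a), u(b)]
right = midpoint_sum(lambda x: math.cos(u(x)) * u_prime(x), a, b)  # integral of f(u(x)) u'(x)

assert abs(left - right) < 1e-6
assert abs(left - math.sin(1.0)) < 1e-6  # F(u(b)) - F(u(a)) with F = sin
```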

1.7 Exchanging Integrals with Limits

Consider a sequence $(f_k)_{k \in \mathbb{N}}$ of functions $f_k : [a, b] \to \mathbb{R}$. Recall that $f_k$ converges uniformly to a function $f$ on $[a, b]$ if

\[ \forall \varepsilon > 0 \ \exists K \in \mathbb{N} \ \forall k \ge K \ \forall x \in [a, b] : |f_k(x) - f(x)| < \varepsilon. \]

By a result from MA10207, the uniform limit of continuous functions is continuous.

Theorem 1.7.1. Let $(f_k)_{k \in \mathbb{N}}$ be a sequence of Riemann integrable functions on $[a, b]$ converging uniformly to a function $f : [a, b] \to \mathbb{R}$. Then $f$ is Riemann integrable and

\[ \int_a^b f_k(x)\,dx \to \int_a^b f(x)\,dx \]

as $k \to \infty$.

Remark 1.7.1. The conclusion of the theorem can be written in the form

\[ \lim_{k \to \infty} \int_a^b f_k(x)\,dx = \int_a^b \lim_{k \to \infty} f_k(x)\,dx. \]

So we can summarise the theorem as follows: if we have uniform convergence, then we can exchange the integral with the limit.

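Proof. Let $\varepsilon > 0$ and fix a number $K \in \mathbb{N}$ such that for all $k \ge K$ and all $x \in [a,b]$, we have $|f_k(x) - f(x)| < \varepsilon$. Using Theorem 1.2.1, we find a subdivision $\Delta$ of $[a,b]$ such that $\Omega(f_K, \Delta) < \varepsilon$. Let $I_1, \dots, I_N$ be the intervals of $\Delta$. Then for every $n = 1, \dots, N$, we have

\begin{align*}
\omega(f, I_n) &= \sup f(I_n) - \inf f(I_n) \\
&\le \bigl(\sup f_K(I_n) + \varepsilon\bigr) - \bigl(\inf f_K(I_n) - \varepsilon\bigr) \\
&= \omega(f_K, I_n) + 2\varepsilon.
\end{align*}

It follows that

\begin{align*}
\Omega(f, \Delta) = \sum_{n=1}^N \omega(f, I_n)|I_n| &\le \sum_{n=1}^N \omega(f_K, I_n)|I_n| + 2\varepsilon \sum_{n=1}^N |I_n| \\
&= \Omega(f_K, \Delta) + 2\varepsilon(b-a) < \varepsilon(1 + 2b - 2a).
\end{align*}

The right-hand side can be made arbitrarily small. Thus Theorem 1.2.1 implies that $f$ is Riemann integrable.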

Moreover, we have

\begin{align*}
\left| \int_a^b f_k(x)\,dx - \int_a^b f(x)\,dx \right| &= \left| \int_a^b (f_k(x) - f(x))\,dx \right| \\
&\le \int_a^b |f_k(x) - f(x)|\,dx \\
&\le (b-a) \sup_{x\in[a,b]} |f_k(x) - f(x)| \to 0
\end{align*}

as $k \to \infty$ by Theorem 1.4.1 and Theorem 1.4.4. Therefore, we have the desired convergence.
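The uniformity assumption cannot simply be dropped. As an illustration (an example chosen here, not taken from the notes), the functions $f_k(x) = k^2 x e^{-kx}$ on $[0,1]$ converge to $0$ pointwise, yet $\sup f_k = f_k(1/k) = k/e$ is unbounded and $\int_0^1 f_k(x)\,dx = 1 - (1+k)e^{-k} \to 1 \neq 0$. The sketch below evaluates these quantities for one value of $k$; the helper `midpoint` is an assumption for illustration.

```python
import math

def midpoint(func, a, b, n=20_000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / n
    return h * sum(func(a + (i + 0.5) * h) for i in range(n))

def f_k(k, x):
    """Assumed example sequence f_k(x) = k^2 * x * exp(-k x)."""
    return k * k * x * math.exp(-k * x)

k = 50
integral = midpoint(lambda x: f_k(k, x), 0.0, 1.0)   # numerical integral of f_k
closed_form = 1.0 - (1.0 + k) * math.exp(-k)          # exact integral of f_k
value_at_half = f_k(k, 0.5)                           # small: pointwise limit is 0
peak = f_k(k, 1.0 / k)                                # equals k / e: no uniform convergence
```

The integrals stay near 1 while the pointwise limit has integral 0, so the limit and the integral cannot be exchanged here.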

Corollary 1.7.1. If $\sum_{k=1}^\infty f_k$ is a uniformly convergent series of Riemann integrable functions $f_k : [a,b] \to \mathbb{R}$, then the sum of the series is Riemann integrable and

\[ \int_a^b \sum_{k=1}^\infty f_k(x)\,dx = \sum_{k=1}^\infty \int_a^b f_k(x)\,dx. \]

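Corollary 1.7.2. Suppose that $\sum_{k=0}^\infty \alpha_k x^k$ is a power series with radius of convergence $R \in (0,\infty]$ and $-R < a < b < R$. Then

\[ \int_a^b \sum_{k=0}^\infty \alpha_k x^k\,dx = \sum_{k=0}^\infty \frac{\alpha_k}{k+1}\left(b^{k+1} - a^{k+1}\right). \]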

Proof. See exercise sheets.
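For a concrete instance of Corollary 1.7.2, one may take the geometric series $\sum_{k=0}^\infty x^k = 1/(1-x)$ with $R = 1$ (an example chosen here for illustration). The sketch compares the term-by-term sum, truncated after 200 terms, with the integral of $1/(1-x)$ computed from its primitive $-\log(1-x)$.

```python
import math

a, b = -0.5, 0.5   # assumed endpoints with -R < a < b < R, where R = 1

# Term-by-term integration with alpha_k = 1, truncated after 200 terms.
termwise = sum((b ** (k + 1) - a ** (k + 1)) / (k + 1) for k in range(200))

# Direct integral of 1 / (1 - x) over [a, b] via the primitive -log(1 - x).
exact = math.log(1.0 - a) - math.log(1.0 - b)
```

Both numbers approximate $\log 3$.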

Theorem 1.7.2. Let $(f_k)_{k\in\mathbb{N}}$ be a sequence of continuously differentiable functions on $(a,b)$. Suppose that

(i) there exists a number $x_0 \in (a,b)$ such that the sequence $(f_k(x_0))_{k\in\mathbb{N}}$ is convergent, and

(ii) $(f_k')_{k\in\mathbb{N}}$ converges uniformly.

Then there exists a continuously differentiable function $f : (a,b) \to \mathbb{R}$ such that $f_k \to f$ uniformly and $f_k' \to f'$ uniformly as $k \to \infty$.


Proof. Let $y_0 = \lim_{k\to\infty} f_k(x_0)$ and let $g : (a,b) \to \mathbb{R}$ be the uniform limit of $f_k'$. Then $g$ is continuous, being the uniform limit of continuous functions. Define

\[ f(x) = y_0 + \int_{x_0}^x g(t)\,dt, \quad x \in (a,b). \]

Then $f' = g$ by Theorem 1.5.2. Moreover, for any $x \in (a,b)$,

\begin{align*}
|f_k(x) - f(x)| &= \left| f_k(x_0) + \int_{x_0}^x f_k'(t)\,dt - y_0 - \int_{x_0}^x g(t)\,dt \right| \\
&\le |f_k(x_0) - y_0| + \left| \int_{x_0}^x (f_k'(t) - g(t))\,dt \right| \\
&\le |f_k(x_0) - y_0| + \left| \int_{x_0}^x |f_k'(t) - g(t)|\,dt \right| \\
&\le |f_k(x_0) - y_0| + (b-a) \sup_{t\in(a,b)} |f_k'(t) - g(t)| \to 0.
\end{align*}

Hence fk → f uniformly. We already know that f ′k → g = f ′ uniformly.

1.8 Improper Integrals

If we have an unbounded interval or an unbounded function, then the previous theory does not apply. When we think in terms of area under a curve, this means that we have only discussed bounded regions in $\mathbb{R}^2$ so far. But sometimes it is appropriate to assign an area to an unbounded region. This can often be done by taking a limit.

There are two distinct situations that we consider.

(i) Suppose that $f : [a,b] \to \mathbb{R}$ is unbounded, but is Riemann integrable (and in particular bounded) on $[c,b]$ for any $c \in (a,b)$ and the limit

\[ \lim_{c\to a+} \int_c^b f(x)\,dx \]

exists. Then we define

\[ \int_a^b f(x)\,dx = \lim_{c\to a+} \int_c^b f(x)\,dx. \]

The value of $f$ at $a$ is irrelevant here, so we can use the same definition for a function $f : (a,b] \to \mathbb{R}$ (provided that it satisfies the appropriate conditions).

Similarly, if $f : [a,b] \to \mathbb{R}$ or $f : [a,b) \to \mathbb{R}$ is Riemann integrable on $[a,c]$ for any $c \in (a,b)$, then

\[ \int_a^b f(x)\,dx = \lim_{c\to b-} \int_a^c f(x)\,dx, \]

provided that the limit exists.


(ii) Suppose that $f : [a,\infty) \to \mathbb{R}$ is a function that is Riemann integrable on $[a,c]$ for any $c > a$ and the limit

\[ \lim_{c\to\infty} \int_a^c f(x)\,dx \]

exists. Then we define

\[ \int_a^\infty f(x)\,dx = \lim_{c\to\infty} \int_a^c f(x)\,dx. \]

Similarly, if we have a function $f : (-\infty,b] \to \mathbb{R}$ that is Riemann integrable on $[c,b]$ for any $c < b$, then

\[ \int_{-\infty}^b f(x)\,dx = \lim_{c\to-\infty} \int_c^b f(x)\,dx, \]

provided that the limit exists.

In either case, these are called improper integrals.

Example 1.8.1. Consider

\[ \int_0^1 \frac{dx}{\sqrt{x}}. \]

We compute

\[ \int_c^1 \frac{dx}{\sqrt{x}} = 2\sqrt{1} - 2\sqrt{c} \to 2 \]

as $c \to 0+$. Hence

\[ \int_0^1 \frac{dx}{\sqrt{x}} = 2. \]

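Example 1.8.2. Consider

\[ \int_1^\infty \frac{dx}{x}. \]

We have

\[ \int_1^c \frac{dx}{x} = \log c - \log 1 = \log c \to \infty \]

as $c \to \infty$. Hence the limit does not exist in $\mathbb{R}$, and this is not an improper integral in the above sense. Even so, it is customary to write

\[ \int_1^\infty \frac{dx}{x} = \infty. \]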

Suppose that $f : [a,\infty) \to \mathbb{R}$ is a non-negative function that is integrable on $[a,c]$ for any $c > a$. Then the function $F$, given by

\[ F(x) = \int_a^x f(t)\,dt, \]

is increasing. Hence $F(x)$ either converges or diverges to $\infty$ as $x \to \infty$. In either case, it makes sense to write

\[ \int_a^\infty f(x)\,dx = \lim_{x\to\infty} F(x). \]

Theorem 1.8.1 (Integral Test for Convergence of Series). Let $K \in \mathbb{Z}$ and let $f : [K,\infty) \to \mathbb{R}$ be a decreasing, non-negative function. Then the series

\[ \sum_{k=K}^\infty f(k) \]

is convergent if, and only if,

\[ \int_K^\infty f(x)\,dx < \infty. \]

Proof. See Exercise 5.3.

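Example 1.8.3. Let $s \in \mathbb{R}$ and consider the series

\[ \sum_{k=1}^\infty k^s. \]

In order to test convergence, also consider the integral

\[ \int_1^\infty x^s\,dx. \]

We distinguish two cases. If $s = -1$, then the function $x \mapsto x^s$ has the primitive $x \mapsto \log x$ in $(0,\infty)$. Hence

\[ \int_1^\infty x^s\,dx = \lim_{c\to\infty} \log c = \infty. \]

We conclude that

\[ \sum_{k=1}^\infty \frac{1}{k} = \infty. \]

If $s \neq -1$, then the function $x \mapsto x^s$ has the primitive $x \mapsto \frac{x^{s+1}}{s+1}$ in $(0,\infty)$. We have

\[ \int_1^\infty x^s\,dx = \frac{\lim_{c\to\infty} c^{s+1} - 1}{s+1} = \begin{cases} \infty & \text{if } s > -1, \\ -\frac{1}{s+1} & \text{if } s < -1. \end{cases} \]

Hence we have

\[ \sum_{k=1}^\infty k^s = \infty \]

if $s \ge -1$ and

\[ \sum_{k=1}^\infty k^s < \infty \]

if $s < -1$.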
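The comparison between sums and integrals behind the integral test can be observed numerically. The sketch below uses two standard consequences of monotonicity, stated here as assumptions for illustration: for $s = -2$ the partial sums satisfy $\sum_{k=1}^n k^{-2} \le 1 + \int_1^n x^{-2}\,dx = 2 - 1/n$, and for $s = -1$ they satisfy $\sum_{k=1}^n k^{-1} \ge \int_1^{n+1} x^{-1}\,dx = \log(n+1)$.

```python
import math

n = 10_000

# Partial sums of the series for s = -2 and s = -1.
partial_sum_s2 = sum(k ** -2 for k in range(1, n + 1))
partial_sum_s1 = sum(1.0 / k for k in range(1, n + 1))

# Bounds obtained by comparing each term with an integral over a unit interval.
upper_bound_s2 = 2.0 - 1.0 / n          # 1 + integral of x^(-2) over [1, n]
lower_bound_s1 = math.log(n + 1)        # integral of x^(-1) over [1, n + 1]
```

The first bound stays below 2 for every $n$, while the second grows without bound, matching the two cases of the example.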

Chapter 2

Analysis in Several Variables

So far we have studied functions in one variable, i.e., defined on a domain $S \subset \mathbb{R}$ (typically an interval). Most of the concepts seen in MA10207 and in this course, however, have generalisations for functions in several variables. So from now on, we consider domains $S \subset \mathbb{R}^N$ and functions $f : S \to \mathbb{R}$ or $f : S \to \mathbb{R}^M$, where $M, N \in \mathbb{N}$.

First we need to make a few observations about the space RN itself.

2.1 The Euclidean Space RN

Notation. We identify the elements of $\mathbb{R}^N$ with column vectors (real $N \times 1$ matrices). In order to save space, we often make use of the matrix transpose, writing

\[ (x_1, \dots, x_N)^T = \begin{pmatrix} x_1 \\ \vdots \\ x_N \end{pmatrix}. \]

Recall that $\mathbb{R}^N$ has the familiar vector addition and multiplication with scalars, making it a vector space. The Euclidean inner product (or Euclidean scalar product) of $x, y \in \mathbb{R}^N$ is

\[ \langle x, y \rangle = \sum_{n=1}^N x_n y_n = y^T x \]

when $x = (x_1, \dots, x_N)^T$ and $y = (y_1, \dots, y_N)^T$. The Euclidean norm of a vector $x = (x_1, \dots, x_N)^T \in \mathbb{R}^N$ is given by

\[ \|x\| = \sqrt{\langle x, x \rangle} = \sqrt{x_1^2 + \dots + x_N^2}. \]

The following properties are easily proved:

• $\langle \cdot, \cdot \rangle$ is linear in each variable,

• $\|x\| \ge 0$ for all $x \in \mathbb{R}^N$, with equality if, and only if, $x = 0$,

• $\|\alpha x\| = |\alpha|\|x\|$ for all $x \in \mathbb{R}^N$ and $\alpha \in \mathbb{R}$, and

• $\langle x, y \rangle = \langle y, x \rangle$ for all $x, y \in \mathbb{R}^N$.

Figure 2.1.1: Geometric interpretation of the triangle inequality in $\mathbb{R}^2$: the length of the vector $x + y$ is at most the sum of the lengths of $x$ and $y$.

The Cauchy-Schwarz inequality,

| 〈x, y〉 | ≤ ‖x‖‖y‖

for all x, y ∈ RN , is proved in MA20216.

Lemma 2.1.1 (Triangle Inequality). For all x, y ∈ RN ,

‖x+ y‖ ≤ ‖x‖+ ‖y‖.

The triangle inequality is illustrated in Fig. 2.1.1.

Proof. We have

\begin{align*}
\|x + y\|^2 = \langle x + y, x + y \rangle &= \|x\|^2 + 2\langle x, y \rangle + \|y\|^2 \\
&\le \|x\|^2 + 2\|x\|\|y\| + \|y\|^2 = (\|x\| + \|y\|)^2,
\end{align*}

using the Cauchy-Schwarz inequality in the third step. Taking square roots yields the result.


2.2 Convergence

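Definition 2.2.1. A sequence $(x_k)_{k\in\mathbb{N}}$ in $\mathbb{R}^N$ is said to converge to the limit $x_0 \in \mathbb{R}^N$ if

\[ \forall \varepsilon > 0 \ \exists K \in \mathbb{N} \ \forall k \ge K : \|x_k - x_0\| < \varepsilon. \]

If so, we write $x_0 = \lim_{k\to\infty} x_k$ or $x_k \to x_0$ as $k \to \infty$.

Remark 2.2.1. This condition is equivalent to $\lim_{k\to\infty} \|x_k - x_0\| = 0$.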

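Lemma 2.2.1. Let $(x_k)_{k\in\mathbb{N}}$ be a sequence in $\mathbb{R}^N$, where

\[ x_k = \bigl(x_k^{(1)}, \dots, x_k^{(N)}\bigr)^T \]

for every $k \in \mathbb{N}$. Furthermore, let

\[ x_0 = \bigl(x_0^{(1)}, \dots, x_0^{(N)}\bigr)^T \in \mathbb{R}^N. \]

Then $x_0 = \lim_{k\to\infty} x_k$ if, and only if, $x_0^{(n)} = \lim_{k\to\infty} x_k^{(n)}$ for every $n = 1, \dots, N$.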

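Proof. Suppose that $x_0 = \lim_{k\to\infty} x_k$. Then for $n = 1, \dots, N$, we have

\[ \bigl| x_k^{(n)} - x_0^{(n)} \bigr| \le \left( \sum_{m=1}^N \bigl( x_k^{(m)} - x_0^{(m)} \bigr)^2 \right)^{1/2} = \|x_k - x_0\| \to 0. \]

So $x_k^{(n)} \to x_0^{(n)}$ as $k \to \infty$.

Conversely, suppose that $x_0^{(n)} = \lim_{k\to\infty} x_k^{(n)}$ for $n = 1, \dots, N$. Then

\[ \|x_k - x_0\| = \left( \sum_{n=1}^N \bigl( x_k^{(n)} - x_0^{(n)} \bigr)^2 \right)^{1/2} \to 0 \]

as $k \to \infty$ by the algebra of limits in $\mathbb{R}$.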

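Lemma 2.2.2. Let $(x_k)_{k\in\mathbb{N}}$ and $(y_k)_{k\in\mathbb{N}}$ be convergent sequences in $\mathbb{R}^N$ and $(\alpha_k)_{k\in\mathbb{N}}$ a convergent sequence in $\mathbb{R}$. Furthermore, let $x_0 = \lim_{k\to\infty} x_k$, $y_0 = \lim_{k\to\infty} y_k$, and $\alpha_0 = \lim_{k\to\infty} \alpha_k$. Then

(i) $x_0 + y_0 = \lim_{k\to\infty} (x_k + y_k)$,

(ii) $\alpha_0 x_0 = \lim_{k\to\infty} (\alpha_k x_k)$,

(iii) $\langle x_0, y_0 \rangle = \lim_{k\to\infty} \langle x_k, y_k \rangle$, and

(iv) $\|x_0\| = \lim_{k\to\infty} \|x_k\|$.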


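Proof. (i) We have

\[ \|(x_k + y_k) - (x_0 + y_0)\| \le \|x_k - x_0\| + \|y_k - y_0\| \to 0 \]

by the triangle inequality.

(ii) Here we observe that

\begin{align*}
\|\alpha_k x_k - \alpha_0 x_0\| &= \|\alpha_k(x_k - x_0) + (\alpha_k - \alpha_0)x_0\| \\
&\le \|\alpha_k(x_k - x_0)\| + \|(\alpha_k - \alpha_0)x_0\| \\
&= |\alpha_k|\|x_k - x_0\| + |\alpha_k - \alpha_0|\|x_0\|.
\end{align*}

We know that $\|x_k - x_0\| \to 0$ and $|\alpha_k - \alpha_0| \to 0$ as $k \to \infty$. Moreover, we have $|\alpha_k| \to |\alpha_0|$. It follows that $\|\alpha_k x_k - \alpha_0 x_0\| \to 0$ as $k \to \infty$ as well.

(iii) We compute

\begin{align*}
|\langle x_k, y_k \rangle - \langle x_0, y_0 \rangle| &= |\langle x_k - x_0, y_k - y_0 \rangle + \langle x_0, y_k - y_0 \rangle + \langle x_k - x_0, y_0 \rangle| \\
&\le |\langle x_k - x_0, y_k - y_0 \rangle| + |\langle x_0, y_k - y_0 \rangle| + |\langle x_k - x_0, y_0 \rangle| \\
&\le \|x_k - x_0\|\|y_k - y_0\| + \|x_0\|\|y_k - y_0\| + \|x_k - x_0\|\|y_0\|.
\end{align*}

Now we observe that all the terms on the last line tend to 0.

(iv) This is proved in Exercise 6.4.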

Definition 2.2.2. A set $S \subset \mathbb{R}^N$ is bounded if there exists a number $R \ge 0$ such that $\|x\| \le R$ for all $x \in S$. A sequence $(x_k)_{k\in\mathbb{N}}$ is bounded if the set $\{x_k : k \in \mathbb{N}\}$ is bounded.

Theorem 2.2.1 (Bolzano-Weierstrass). Every bounded sequence in $\mathbb{R}^N$ has a convergent subsequence.

Proof. Let $(x_k)_{k\in\mathbb{N}}$ be a bounded sequence in $\mathbb{R}^N$ with

\[ x_k = \bigl(x_k^{(1)}, \dots, x_k^{(N)}\bigr)^T. \]

Note that for any $n = 1, \dots, N$, we have $|x_k^{(n)}| \le \|x_k\|$, so the sequence $(x_k^{(n)})_{k\in\mathbb{N}}$ is bounded in $\mathbb{R}$.

By the Bolzano-Weierstrass theorem in $\mathbb{R}$, there exists a convergent subsequence of $(x_k^{(1)})_{k\in\mathbb{N}}$. That is, there exists an infinite subset $\Lambda_1 \subset \mathbb{N}$ such that $(x_k^{(1)})_{k\in\Lambda_1}$ is convergent. Let $x_0^{(1)}$ denote its limit.

Now $(x_k^{(2)})_{k\in\Lambda_1}$ is a bounded sequence in $\mathbb{R}$ as well. Apply the same arguments to obtain an infinite set $\Lambda_2 \subset \Lambda_1$ such that $(x_k^{(2)})_{k\in\Lambda_2}$ is convergent with limit $x_0^{(2)}$. Note that $(x_k^{(1)})_{k\in\Lambda_2}$, being a subsequence of a convergent sequence, still converges to $x_0^{(1)}$.

Apply the same arguments to the remaining coordinates in turn, constructing infinite subsets $\mathbb{N} \supset \Lambda_1 \supset \dots \supset \Lambda_N$ chosen such that $x_k^{(n)} \to x_0^{(n)}$ as $k \to \infty$ while $k \in \Lambda_n$ for $n = 1, \dots, N$.

Then for $(x_k)_{k\in\Lambda_N}$, all the coordinates converge. Using Lemma 2.2.1, we see that the subsequence is convergent in $\mathbb{R}^N$.

2.3 Open and Closed Sets

Definition 2.3.1. If $x \in \mathbb{R}^N$ and $r > 0$, then

\[ B_r(x) = \{ y \in \mathbb{R}^N : \|x - y\| < r \} \]

is called the open ball with centre $x$ and radius $r$.

Definition 2.3.2. A set $G \subset \mathbb{R}^N$ is called open if for every $x \in G$ there exists an $r > 0$ such that $B_r(x) \subset G$. A set $F \subset \mathbb{R}^N$ is called closed if the complement $\mathbb{R}^N \setminus F$ is open.

Remark 2.3.1. In this context, ‘closed’ is not the same as ‘not open’. There are sets that are neither open nor closed, and there are even some sets that are both open and closed.

Example 2.3.1. For $N = 1$ (i.e., in $\mathbb{R}$), open intervals $(a,b)$ are open and closed intervals $[a,b]$ are closed. A half-open interval $[a,b)$ or $(a,b]$ is neither open nor closed.

Theorem 2.3.1. Let $S \subset \mathbb{R}^N$. Then $S$ is closed if, and only if, it contains the limit of every sequence in $S$ that converges in $\mathbb{R}^N$.

Proof. Suppose that $S$ is closed and consider a sequence $(x_k)_{k\in\mathbb{N}}$ in $S$. For any point $y_0 \in \mathbb{R}^N \setminus S$, there exists a number $r > 0$ with $B_r(y_0) \subset \mathbb{R}^N \setminus S$, since $\mathbb{R}^N \setminus S$ is open. Therefore, we have

\[ \|x_k - y_0\| \ge r \]

for all $k \in \mathbb{N}$ and $y_0$ is certainly not a limit of the sequence. So if a limit exists at all, it must belong to $S$.

Now suppose that $S$ is not closed. Then $\mathbb{R}^N \setminus S$ is not open. Hence there exists a point $x_0 \in \mathbb{R}^N \setminus S$ such that for any $r > 0$, we have $B_r(x_0) \not\subset \mathbb{R}^N \setminus S$. In particular, for any $k \in \mathbb{N}$, we have $B_{1/k}(x_0) \cap S \neq \emptyset$; so we can choose a point $x_k \in B_{1/k}(x_0) \cap S$. Thus we construct a sequence $(x_k)_{k\in\mathbb{N}}$ in $S$ with the property that

\[ \|x_k - x_0\| < \frac{1}{k} \to 0. \]

Hence we have convergence to $x_0$, which is not in $S$.


Definition 2.3.3. A subset of $\mathbb{R}^N$ is called compact if it is closed and bounded.

Compact sets have particularly nice properties. The following is an example.

Corollary 2.3.1. Let $C \subset \mathbb{R}^N$ be a compact set. Then every sequence in $C$ has a subsequence converging to a point of $C$.

Proof. Given a sequence in $C$, Theorem 2.2.1 implies that there exists a convergent subsequence. By Theorem 2.3.1, the limit must belong to $C$.

2.4 Continuity

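Definition 2.4.1. Let $S \subset \mathbb{R}^N$ and $f : S \to \mathbb{R}^M$. For $x_0 \in S$, we say that $f$ is continuous at $x_0$ if

\[ \forall \varepsilon > 0 \ \exists \delta > 0 \ \forall x \in S : \|x - x_0\| < \delta \Rightarrow \|f(x) - f(x_0)\| < \varepsilon. \]

We say that $f$ is continuous if it is continuous at every point of $S$. Finally, we say that $f$ is uniformly continuous if

\[ \forall \varepsilon > 0 \ \exists \delta > 0 \ \forall x, y \in S : \|x - y\| < \delta \Rightarrow \|f(x) - f(y)\| < \varepsilon. \]

Definition 2.4.2. Let $S \subset \mathbb{R}^N$ and $f : S \to \mathbb{R}^M$. For $x_0 \in S$ and $\ell \in \mathbb{R}^M$, we say that $f(x)$ converges to $\ell$ as $x \to x_0$ if

\[ \forall \varepsilon > 0 \ \exists \delta > 0 \ \forall x \in S : 0 < \|x - x_0\| < \delta \Rightarrow \|f(x) - \ell\| < \varepsilon. \]

In this case, we write $f(x) \to \ell$ as $x \to x_0$ or $\ell = \lim_{x\to x_0} f(x)$.

Remark 2.4.1. It follows that $f$ is continuous at $x_0$ if, and only if, $f(x) \to f(x_0)$ as $x \to x_0$.

Theorem 2.4.1. Let $S \subset \mathbb{R}^N$ and $f : S \to \mathbb{R}^M$. Furthermore, let $x_0 \in S$. Then $f$ is continuous at $x_0$ if, and only if, for any sequence $(x_k)_{k\in\mathbb{N}}$ in $S$ converging to $x_0$, the sequence $(f(x_k))_{k\in\mathbb{N}}$ converges to $f(x_0)$ as $k \to \infty$.

Proof. Suppose that $f$ is continuous at $x_0$. Consider a sequence $(x_k)_{k\in\mathbb{N}}$ converging to $x_0$ as $k \to \infty$. Let $\varepsilon > 0$. Then by the continuity, there exists a number $\delta > 0$ such that for all $x \in S$,

\[ \|x - x_0\| < \delta \Rightarrow \|f(x) - f(x_0)\| < \varepsilon. \]

By the convergence, there exists a number $K \in \mathbb{N}$ such that $\|x_k - x_0\| < \delta$ for all $k \ge K$. Then $\|f(x_k) - f(x_0)\| < \varepsilon$ for all $k \ge K$. Hence $f(x_k) \to f(x_0)$ as $k \to \infty$.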


Conversely, suppose that $f$ is not continuous at $x_0$. Then there exists a number $\varepsilon > 0$ such that for all $\delta > 0$ there exists a point $x \in S$ with $\|x - x_0\| < \delta$, but $\|f(x) - f(x_0)\| \ge \varepsilon$. In particular, for any $k \in \mathbb{N}$, we can choose a point $x_k \in S$ with

\[ \|x_k - x_0\| < \frac{1}{k}, \quad \text{but} \quad \|f(x_k) - f(x_0)\| \ge \varepsilon. \]

Then the sequence $(x_k)_{k\in\mathbb{N}}$ evidently converges to $x_0$, but the sequence $(f(x_k))_{k\in\mathbb{N}}$ does not converge to $f(x_0)$.

Corollary 2.4.1. Let $S \subset \mathbb{R}^N$ and $x_0 \in S$.

(i) If $f, g : S \to \mathbb{R}^M$ are both continuous at $x_0$, then $f + g$ is continuous at $x_0$.

(ii) If $f : S \to \mathbb{R}^M$ is continuous at $x_0$ and $\varphi : S \to \mathbb{R}$ is continuous at $x_0$, then $\varphi f$ is continuous at $x_0$.

Proof. Combine Theorem 2.4.1 with Lemma 2.2.2.

Theorem 2.4.2. Let $A \subset \mathbb{R}^N$ and $B \subset \mathbb{R}^M$ and let $x_0 \in A$. Suppose that $f : A \to \mathbb{R}^M$ and $g : B \to \mathbb{R}^K$ are functions with $f(A) \subset B$, such that $f$ is continuous at $x_0$ and $g$ is continuous at $f(x_0)$. Then $g \circ f$ is continuous at $x_0$.

Proof. First note that $g \circ f$ is well-defined by the assumption $f(A) \subset B$.

Let $\varepsilon > 0$. By the continuity of $g$ at $f(x_0)$, there exists a number $\eta > 0$ such that $\|g(y) - g(f(x_0))\| < \varepsilon$ for all $y \in B$ with $\|y - f(x_0)\| < \eta$. By the continuity of $f$ at $x_0$, there exists a number $\delta > 0$ such that $\|f(x) - f(x_0)\| < \eta$ for all $x \in A$ with $\|x - x_0\| < \delta$.

Now if $x \in A$ and $\|x - x_0\| < \delta$, it follows that $\|g(f(x)) - g(f(x_0))\| < \varepsilon$. Hence $g \circ f$ is continuous at $x_0$.

Theorem 2.4.3 (Weierstrass Extreme Value Theorem). Let $C \subset \mathbb{R}^N$ be non-empty and compact. Then any continuous function $f : C \to \mathbb{R}$ is bounded and attains its infimum and supremum.

Proof. Let $\alpha = \sup f(C) \in (-\infty, \infty]$. Then we can choose a sequence $(\alpha_k)_{k\in\mathbb{N}}$ in $f(C)$ such that $\alpha = \lim_{k\to\infty} \alpha_k$. For each $k \in \mathbb{N}$, choose a point $x_k \in C$ with $f(x_k) = \alpha_k$.

By Corollary 2.3.1, there exists a subsequence $(x_{k_j})_{j\in\mathbb{N}}$ converging to a point of $C$, say $x_{k_j} \to x_0 \in C$ as $j \to \infty$. By the continuity of $f$, we now have

\[ f(x_0) = \lim_{j\to\infty} f(x_{k_j}) = \lim_{j\to\infty} \alpha_{k_j} = \lim_{k\to\infty} \alpha_k = \sup f(C). \]

Hence the supremum is attained. It also follows that

\[ \sup f(C) = f(x_0) < \infty. \]

For the infimum, we can use the same arguments. We conclude that the infimum is attained and

\[ \inf f(C) > -\infty. \]

Therefore, we also conclude that $f$ is bounded.

Theorem 2.4.4. If $C \subset \mathbb{R}^N$ is compact, then every continuous function $f : C \to \mathbb{R}^M$ is uniformly continuous.

Proof. Assume, by way of contradiction, that $f$ is continuous but not uniformly so. Then there exists an $\varepsilon > 0$ such that for every $\delta > 0$ there exist two points $x, y \in C$ with $\|x - y\| < \delta$, but $\|f(x) - f(y)\| \ge \varepsilon$. If we fix $\varepsilon$ with this property, then that means in particular that for any $k \in \mathbb{N}$, there exist $x_k, y_k \in C$ such that $\|x_k - y_k\| < \frac{1}{k}$, but $\|f(x_k) - f(y_k)\| \ge \varepsilon$.

Corollary 2.3.1 implies that $(x_k)_{k\in\mathbb{N}}$ has a convergent subsequence with limit in $C$, say $x_{k_j} \to x_0 \in C$. Then

\[ \|y_{k_j} - x_0\| \le \|y_{k_j} - x_{k_j}\| + \|x_{k_j} - x_0\| \to 0, \]

hence we have $y_{k_j} \to x_0$ as well. Thus by the continuity of $f$ and Theorem 2.4.1, we have

\[ f(x_{k_j}) \to f(x_0) \quad \text{and} \quad f(y_{k_j}) \to f(x_0). \]

On the other hand, we have the inequality

\[ \|f(x_{k_j}) - f(y_{k_j})\| \ge \varepsilon, \]

and the two statements contradict each other.

2.5 Norms

Everything that we have done so far in this chapter is based on the idea that we can measure distances in $\mathbb{R}^N$ in terms of the Euclidean norm $\|\cdot\|$. This concept can be generalised.

Definition 2.5.1. A norm on a real vector space $V$ is a map $\|\cdot\|_V : V \to \mathbb{R}$ such that

(i) $\|x\|_V \ge 0$ for all $x \in V$ with equality if, and only if, $x = 0$,

(ii) $\|\alpha x\|_V = |\alpha|\|x\|_V$ for all $x \in V$ and all $\alpha \in \mathbb{R}$, and

(iii) $\|x + y\|_V \le \|x\|_V + \|y\|_V$ for all $x, y \in V$.


Example 2.5.1. We have already seen that the Euclidean norm $\|\cdot\|$ on $\mathbb{R}^N$ satisfies these conditions. Other examples of norms on $\mathbb{R}^N$ include $\|\cdot\|_1$ and $\|\cdot\|_\infty$ with

\[ \|x\|_1 = \sum_{n=1}^N |x_n| \quad \text{and} \quad \|x\|_\infty = \max\{|x_1|, \dots, |x_N|\} \]

for $x = (x_1, \dots, x_N)^T$.

Given a real vector space $V$ with a norm $\|\cdot\|_V$, we can define convergence, balls, and open sets in $V$ analogously to the corresponding concepts in $\mathbb{R}^N$, simply replacing the Euclidean norm by $\|\cdot\|_V$ everywhere. If we have two real vector spaces $V$ and $W$ with norms $\|\cdot\|_V$ and $\|\cdot\|_W$, respectively, then we can also define continuity of a map $f : V \to W$.

Definition 2.5.2. Two norms $\|\cdot\|_1$ and $\|\cdot\|_2$ on a real vector space $V$ are equivalent if there exists a number $C \ge 0$ such that $\|x\|_1 \le C\|x\|_2$ and $\|x\|_2 \le C\|x\|_1$ for all $x \in V$.

Proposition 2.5.1. Let $\|\cdot\|_1$ and $\|\cdot\|_2$ be two equivalent norms on the real vector space $V$. Then a sequence $(x_k)_{k\in\mathbb{N}}$ in $V$ converges to a limit $x_0 \in V$ with respect to the norm $\|\cdot\|_1$ if, and only if, it converges to $x_0$ with respect to $\|\cdot\|_2$.

Proof. Convergence with respect to $\|\cdot\|_1$ means that $\|x_k - x_0\|_1 \to 0$ as $k \to \infty$. But then $\|x_k - x_0\|_2 \le C\|x_k - x_0\|_1 \to 0$ as well, so we have convergence with respect to $\|\cdot\|_2$. The arguments for the converse are the same.

Remark 2.5.1. So equivalent norms give rise to the same notion of convergence. It follows that they also give rise to the same continuous functions.

Theorem 2.5.1. Any two norms on RN are equivalent.

Proof. It suffices to show that any norm $\|\cdot\|_*$ on $\mathbb{R}^N$ is equivalent to the Euclidean norm $\|\cdot\|$. Let $(e_1, \dots, e_N)$ be the standard basis in $\mathbb{R}^N$. Then for $x = (x_1, \dots, x_N)^T \in \mathbb{R}^N$, we have

\begin{align*}
\|x\|_* = \left\| \sum_{n=1}^N x_n e_n \right\|_* &\le \sum_{n=1}^N \|x_n e_n\|_* \\
&= \sum_{n=1}^N |x_n|\|e_n\|_* \le \left( \sum_{n=1}^N x_n^2 \right)^{1/2} \left( \sum_{n=1}^N \|e_n\|_*^2 \right)^{1/2}
\end{align*}

by the triangle inequality, property (ii) in Definition 2.5.1, and the Cauchy-Schwarz inequality. Setting $C_1 = \left( \sum_{n=1}^N \|e_n\|_*^2 \right)^{1/2}$, we obtain

\[ \|x\|_* \le C_1\|x\|. \tag{2.1} \]

Now note that for $x, y \in \mathbb{R}^N$, we have

\[ \bigl| \|x\|_* - \|y\|_* \bigr| \le \|x - y\|_* \le C_1\|x - y\| \]

by the triangle inequality and the first part of this proof (cf. Exercise 6.3). Hence $\|\cdot\|_*$ is a continuous function with respect to the Euclidean norm. Let

\[ S = \{ x \in \mathbb{R}^N : \|x\| = 1 \}, \]

which is a closed and bounded set with respect to $\|\cdot\|$. It follows from Weierstrass’s extreme value theorem (Theorem 2.4.3) that there exists a point $x_0 \in S$ such that $\|x\|_* \ge \|x_0\|_*$ for all $x \in S$. Note that $\|x_0\|_* > 0$ because $x_0 \neq 0$. Let $C_2 = \frac{1}{\|x_0\|_*}$. Then for any $x \neq 0$, we have

\[ \frac{x}{\|x\|} \in S. \]

Hence

\[ \frac{1}{C_2} \le \left\| \frac{x}{\|x\|} \right\|_* = \frac{\|x\|_*}{\|x\|}. \]

That is,

\[ \|x\| \le C_2\|x\|_*. \tag{2.2} \]

This inequality is trivial for $x = 0$.

The equivalence of the norms now follows from the two inequalities (2.1) and (2.2).
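For the three norms of Example 2.5.1 the equivalence constants can be made explicit: the standard chain $\|x\|_\infty \le \|x\| \le \|x\|_1 \le N\|x\|_\infty$ holds for all $x \in \mathbb{R}^N$. The sketch below checks this chain on random sample vectors; the function names and the sampling setup are assumptions for illustration.

```python
import math
import random

def norm1(x):
    """Sum of absolute values of the coordinates."""
    return sum(abs(t) for t in x)

def norm2(x):
    """Euclidean norm."""
    return math.sqrt(sum(t * t for t in x))

def norm_inf(x):
    """Largest absolute value of a coordinate."""
    return max(abs(t) for t in x)

random.seed(0)
N = 5
samples = [[random.uniform(-1.0, 1.0) for _ in range(N)] for _ in range(1000)]

tol = 1e-12  # slack for floating-point rounding
chain_holds = all(
    norm_inf(x) <= norm2(x) + tol
    and norm2(x) <= norm1(x) + tol
    and norm1(x) <= N * norm_inf(x) + tol
    for x in samples
)
```

A random test of this kind only illustrates the inequalities; it does not replace the proof.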

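The following is another example of a norm. We will use this specific norm later.

Definition 2.5.3. Let $\mathrm{Hom}(\mathbb{R}^N, \mathbb{R}^M)$ denote the space of all linear maps $A : \mathbb{R}^N \to \mathbb{R}^M$. The operator norm is the norm $\|\cdot\|$ on $\mathrm{Hom}(\mathbb{R}^N, \mathbb{R}^M)$ defined by

\[ \|A\| = \sup\{ \|Ax\| : x \in \mathbb{R}^N \text{ with } \|x\| \le 1 \}. \]

Remark 2.5.2. For any $A \in \mathrm{Hom}(\mathbb{R}^N, \mathbb{R}^M)$, the operator norm $\|A\|$ is finite. Indeed, if $(a_{mn})_{m,n}$ is the transformation matrix of $A$ with respect to the standard basis, then for all $x \in \mathbb{R}^N$, we have

\begin{align*}
\|Ax\| &= \left( \sum_{m=1}^M \left( \sum_{n=1}^N a_{mn} x_n \right)^2 \right)^{1/2} \\
&\le \left( \sum_{m=1}^M \left( \sum_{n=1}^N a_{mn}^2 \right) \left( \sum_{n=1}^N x_n^2 \right) \right)^{1/2} \\
&= \left( \sum_{m=1}^M \sum_{n=1}^N a_{mn}^2 \right)^{1/2} \|x\|
\end{align*}

by the Cauchy-Schwarz inequality; so

\[ \|A\| \le \left( \sum_{m=1}^M \sum_{n=1}^N a_{mn}^2 \right)^{1/2}. \]

It is checked in Exercise 7.4 that the operator norm is a norm.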
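The bound in Remark 2.5.2 can be compared with a direct estimate of the operator norm. The sketch below treats one assumed $2 \times 2$ matrix and approximates the supremum by sampling unit vectors on the circle (by homogeneity, the supremum over $\|x\| \le 1$ is attained on $\|x\| = 1$). The matrix and the sampling resolution are illustrative choices.

```python
import math

A = [[1.0, 2.0],
     [3.0, 4.0]]   # assumed example matrix

def apply(matrix, x):
    """Matrix-vector product."""
    return [sum(a * t for a, t in zip(row, x)) for row in matrix]

def norm(x):
    """Euclidean norm."""
    return math.sqrt(sum(t * t for t in x))

# Estimate sup ||Ax|| over unit vectors x = (cos t, sin t).
n = 3600
estimate = max(
    norm(apply(A, [math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n)]))
    for i in range(n)
)

# Upper bound from Remark 2.5.2: square root of the sum of squared entries.
bound = math.sqrt(sum(a * a for row in A for a in row))
```

For this matrix the estimate is slightly below the bound $\sqrt{30}$, so the bound is not attained in general.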


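Proposition 2.5.2. (i) If $A \in \mathrm{Hom}(\mathbb{R}^N, \mathbb{R}^M)$ and $x \in \mathbb{R}^N$, then

\[ \|Ax\| \le \|A\|\|x\|. \]

(ii) If $A \in \mathrm{Hom}(\mathbb{R}^M, \mathbb{R}^K)$ and $B \in \mathrm{Hom}(\mathbb{R}^N, \mathbb{R}^M)$, then

\[ \|AB\| \le \|A\|\|B\|. \]

Proof. (i) If $x = 0$, then this is trivial. Otherwise, we have

\[ \|Ax\| = \|x\| \left\| A \frac{x}{\|x\|} \right\| \le \|A\|\|x\|. \]

(ii) For any $x \in \mathbb{R}^N$ with $\|x\| \le 1$, we have

\[ \|ABx\| \le \|A\|\|Bx\| \le \|A\|\|B\|\|x\| \le \|A\|\|B\|. \]

The desired inequality then follows.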

Remark 2.5.3. The space $\mathrm{Hom}(\mathbb{R}^N, \mathbb{R}^M)$ can be identified with $\mathbb{R}^{MN}$ by identifying a linear map with the corresponding matrix. Theorem 2.5.1 says that any two norms are equivalent on $\mathbb{R}^{MN}$, and the same statement can then be made for $\mathrm{Hom}(\mathbb{R}^N, \mathbb{R}^M)$. However, other norms will not satisfy the inequalities from Proposition 2.5.2 in general.

2.6 Derivatives

Recall that for a function $f : (a,b) \to \mathbb{R}$, the derivative of $f$ at a point $x_0 \in (a,b)$, if it exists, is defined by

\[ f'(x_0) = \lim_{x\to x_0} \frac{f(x) - f(x_0)}{x - x_0}. \]

In this form, the definition has no obvious generalisation for functions $S \to \mathbb{R}^M$ with $S \subset \mathbb{R}^N$, because we cannot divide by a vector $x - x_0$ with $x, x_0 \in \mathbb{R}^N$. However, there is another characterisation of the derivative: it determines the best linear approximation of a given function near a given point (cf. Fig. 2.6.1). This concept does have a generalisation to higher dimensions.

For differentiation, we need to work with maps defined on open sets. From now on, $\Omega$ denotes an open subset of $\mathbb{R}^N$.

Definition 2.6.1. Let $x_0 \in \Omega$. A map $f : \Omega \to \mathbb{R}^M$ is Frechet differentiable (or differentiable for short) at $x_0$ if there exists a linear map $A : \mathbb{R}^N \to \mathbb{R}^M$ such that

\[ \frac{f(x) - f(x_0) - A(x - x_0)}{\|x - x_0\|} \to 0 \quad \text{as } x \to x_0. \]

In this case, the linear map $A$ is called the Frechet derivative (or derivative for short) of $f$ at $x_0$ and denoted $Df(x_0)$.


Figure 2.6.1: Linear approximation of a differentiable function f : (a, b)→ R

In the special case $N = M = 1$, we have

\[ \lim_{x\to x_0} \frac{f(x) - f(x_0) - (x - x_0)f'(x_0)}{x - x_0} = \lim_{x\to x_0} \frac{f(x) - f(x_0)}{x - x_0} - f'(x_0) = 0, \]

so

\[ \lim_{x\to x_0} \frac{f(x) - f(x_0) - (x - x_0)f'(x_0)}{|x - x_0|} = 0 \]

as well. Hence the Frechet derivative at $x_0$ is the linear map $Df(x_0) : \mathbb{R} \to \mathbb{R}$ with $Df(x_0)h = f'(x_0)h$.

Remark 2.6.1. Recall the ‘little o’ notation: when we write $g(x) = o(h(x))$ as $x \to x_0$, say, this means that

\[ \lim_{x\to x_0} \frac{g(x)}{h(x)} = 0. \]

For example, we have $\|x - x_0\|^2 = o(\|x - x_0\|)$ as $x \to x_0$. With this notation, the condition for the Frechet derivative can be written in the form

\[ f(x) = f(x_0) + Df(x_0)(x - x_0) + o(\|x - x_0\|) \quad \text{as } x \to x_0. \]

Since the definition of the derivative involves a linear map, the following information is useful.

Lemma 2.6.1. Any linear map $A : \mathbb{R}^N \to \mathbb{R}^M$ is continuous.

Proof. By results from linear algebra, there is an $(M \times N)$-matrix $(a_{mn})_{m,n}$ such that the components of $Ax$ are

\[ \sum_{n=1}^N a_{mn} x_n, \quad m = 1, \dots, M. \]

Thus the claim follows from Lemma 2.2.1 and Theorem 2.4.1.

Proposition 2.6.1. Let $f : \Omega \to \mathbb{R}^M$ be a map that is differentiable at $x_0 \in \Omega$. Then $f$ is continuous at $x_0$.

Proof. We have

\[ f(x) - f(x_0) = Df(x_0)(x - x_0) + o(\|x - x_0\|) \to 0 \quad \text{as } x \to x_0 \]

by Lemma 2.6.1, as required.

We also have a different notion of derivative.

Definition 2.6.2. Let $f : \Omega \to \mathbb{R}^M$ be a map. Let $x_0 \in \Omega$. For $n = 1, \dots, N$, let $e_n = (0, \dots, 0, 1, 0, \dots, 0)^T$ denote the $n$-th standard unit vector in $\mathbb{R}^N$. Then

\[ \frac{\partial f}{\partial x_n}(x_0) = \lim_{h\to 0} \frac{f(x_0 + he_n) - f(x_0)}{h}, \]

if the limit exists, is called the partial derivative of $f$ at $x_0$ with respect to $x_n$. If we write $f(x) = (f_1(x), \dots, f_M(x))^T$ for $x \in \Omega$, then the matrix

\[ Jf(x_0) = \left( \frac{\partial f_m}{\partial x_n}(x_0) \right)_{m,n} \]

is called the Jacobi matrix of $f$ at $x_0$, provided that these partial derivatives exist.

Definition 2.6.3. Suppose that $x_0 \in \Omega$, let $f : \Omega \to \mathbb{R}^M$ be a map, and let $v \in \mathbb{R}^N$. Then

\[ D_v f(x_0) = \lim_{h\to 0} \frac{f(x_0 + hv) - f(x_0)}{h}, \]

if it exists, is called the directional derivative of $f$ at $x_0$ in direction $v$.

Remark 2.6.2. We have $\frac{\partial f}{\partial x_n}(x_0) = D_{e_n} f(x_0)$ if it exists.

Definition 2.6.4. Let $x_0 \in \Omega$. If $f : \Omega \to \mathbb{R}$ is a function such that $Jf(x_0)$ exists, then

\[ \nabla f(x_0) = (Jf(x_0))^T \]

is called the gradient of $f$ at $x_0$.


Because we assume that $f$ is a function with values in $\mathbb{R}$ here, the gradient at a point $x_0$ is a column vector. It has a geometric interpretation: for $x_0 \in \Omega$, the direction of the vector $\nabla f(x_0)$ is the direction of steepest ascent of $f$ at $x_0$, while its length is the rate of change in that direction.

Lemma 2.6.2. Suppose that $f : \Omega \to \mathbb{R}^M$ is differentiable at $x_0 \in \Omega$. Then all the partial derivatives $\frac{\partial f}{\partial x_n}(x_0)$ exist for $n = 1, \dots, N$, and the Jacobi matrix $Jf(x_0)$ is the transformation matrix of the linear map $Df(x_0)$ with respect to the standard bases in $\mathbb{R}^M$ and $\mathbb{R}^N$.

Proof. Let $(e_1, \dots, e_N)$ be the standard basis of $\mathbb{R}^N$ and $(\varepsilon_1, \dots, \varepsilon_M)$ the standard basis of $\mathbb{R}^M$. Let $A = (a_{mn})_{m,n}$ be the transformation matrix of $Df(x_0)$. Fix $n \in \{1, \dots, N\}$ and $m \in \{1, \dots, M\}$. Then for $h \neq 0$ with $|h|$ small enough, we have $x_0 + he_n \in \Omega$, as $\Omega$ is open. Moreover,

\begin{align*}
\left| \frac{f_m(x_0 + he_n) - f_m(x_0)}{h} - a_{mn} \right| &= \left| \left\langle \varepsilon_m, \frac{f(x_0 + he_n) - f(x_0)}{h} - Df(x_0)e_n \right\rangle \right| \\
&\le \left\| \frac{f(x_0 + he_n) - f(x_0) - Df(x_0)(he_n)}{h} \right\| \to 0
\end{align*}

as $h \to 0$. Hence we obtain

\[ \frac{\partial f_m}{\partial x_n}(x_0) = a_{mn} \]

when taking the limit.

Remark 2.6.3. This lemma implies that the derivative $Df(x_0)$, if it exists, is unique.

Remark 2.6.4. If $f$ is differentiable at $x_0$, then for any $v \in \mathbb{R}^N$, we can show with the arguments from Lemma 2.6.2 that $D_v f(x_0) = Df(x_0)v$.

Theorem 2.6.1. Let $x_0 \in \Omega$. If $f : \Omega \to \mathbb{R}^M$ is a map such that all the partial derivatives $\frac{\partial f}{\partial x_1}, \dots, \frac{\partial f}{\partial x_N}$ exist throughout $\Omega$ and are continuous at $x_0$, then $f$ is differentiable at $x_0$.

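Proof. We first consider the case $M = 1$.

Let $r > 0$ such that $B_r(x_0) \subset \Omega$. Let $h = (h_1, \dots, h_N)^T \in B_r(0)$. Consider the function $t \mapsto f(x_0 + te_1)$, which is differentiable in $(-r, r)$ with derivative

\[ \frac{\partial f}{\partial x_1}(x_0 + te_1). \]

By the mean value theorem, there exists a number $\theta_1 \in \mathbb{R}$ with $|\theta_1| \le |h_1|$ such that

\[ f(x_0 + h_1 e_1) = f(x_0) + h_1 \frac{\partial f}{\partial x_1}(x_0 + \theta_1 e_1). \]

Similarly, there exists a number $\theta_2 \in \mathbb{R}$ with $|\theta_2| \le |h_2|$ such that

\begin{align*}
f(x_0 + h_1 e_1 + h_2 e_2) &= f(x_0 + h_1 e_1) + h_2 \frac{\partial f}{\partial x_2}(x_0 + h_1 e_1 + \theta_2 e_2) \\
&= f(x_0) + h_1 \frac{\partial f}{\partial x_1}(x_0 + \theta_1 e_1) + h_2 \frac{\partial f}{\partial x_2}(x_0 + h_1 e_1 + \theta_2 e_2).
\end{align*}

Continuing with the coordinates $x_3, x_4, \dots, x_N$, we obtain $\theta_3, \dots, \theta_N$ with $|\theta_n| \le |h_n|$ for $n = 1, \dots, N$ such that

\begin{align*}
f(x_0 + h) = f(x_0) &+ h_1 \frac{\partial f}{\partial x_1}(x_0 + \theta_1 e_1) \\
&+ \dots + h_N \frac{\partial f}{\partial x_N}(x_0 + h_1 e_1 + \dots + h_{N-1} e_{N-1} + \theta_N e_N).
\end{align*}

Setting

\[ b_n = \frac{\partial f}{\partial x_n}(x_0 + h_1 e_1 + \dots + h_{n-1} e_{n-1} + \theta_n e_n) - \frac{\partial f}{\partial x_n}(x_0), \]

we obtain

\[ f(x_0 + h) = f(x_0) + Jf(x_0)h + \sum_{n=1}^N b_n h_n. \tag{2.3} \]

By the continuity of the partial derivatives, we have $b_n \to 0$ as $h \to 0$. Hence

\[ \sum_{n=1}^N b_n h_n = o(\|h\|). \]

So (2.3) implies that $Df(x_0)$ exists and is represented by the matrix $Jf(x_0)$.

If $M \ge 2$, then we apply these arguments to every component of $f$. The claim then follows in this case as well.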

Definition 2.6.5. A map $f : \Omega \to \mathbb{R}^M$ is called continuously differentiable if it is differentiable throughout $\Omega$ and the map $Df : \Omega \to \mathrm{Hom}(\mathbb{R}^N, \mathbb{R}^M)$ is continuous.

Here continuity is meant with respect to the operator norm on the space $\mathrm{Hom}(\mathbb{R}^N, \mathbb{R}^M)$. However, it follows from Lemma 2.2.1, Theorem 2.4.1, Theorem 2.6.1, and the equivalence of all norms on $\mathrm{Hom}(\mathbb{R}^N, \mathbb{R}^M)$ that $f$ is continuously differentiable if, and only if, all partial derivatives exist and are continuous in $\Omega$.

Example 2.6.1. Let $f : \mathbb{R}^3 \to \mathbb{R}^2$ with

\[ f(x) = \begin{pmatrix} x_1^2 + x_2^2 x_3 \\ x_1 x_2 x_3 \end{pmatrix} \]

for $x \in \mathbb{R}^3$. We compute

\[ Jf(x) = \begin{pmatrix} 2x_1 & 2x_2 x_3 & x_2^2 \\ x_2 x_3 & x_1 x_3 & x_1 x_2 \end{pmatrix}. \]

All of these expressions give rise to continuous functions, hence $f$ is continuously differentiable.
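The Jacobi matrix of Example 2.6.1 can be cross-checked against central difference quotients of the partial derivatives from Definition 2.6.2. The evaluation point $(1, 2, 3)^T$, the step size, and the function names in the sketch below are assumptions for illustration.

```python
def f(x):
    """The map from Example 2.6.1."""
    x1, x2, x3 = x
    return [x1 ** 2 + x2 ** 2 * x3, x1 * x2 * x3]

def jacobian_exact(x):
    """Jacobi matrix computed by hand in Example 2.6.1."""
    x1, x2, x3 = x
    return [[2 * x1, 2 * x2 * x3, x2 ** 2],
            [x2 * x3, x1 * x3, x1 * x2]]

def jacobian_fd(func, x, h=1e-5):
    """Central difference approximation; entry [m][n] approximates d f_m / d x_n."""
    columns = []
    for n in range(len(x)):
        x_plus, x_minus = list(x), list(x)
        x_plus[n] += h
        x_minus[n] -= h
        f_plus, f_minus = func(x_plus), func(x_minus)
        columns.append([(p - q) / (2 * h) for p, q in zip(f_plus, f_minus)])
    # columns[n][m] holds d f_m / d x_n, so transpose to get rows indexed by m.
    return [[columns[n][m] for n in range(len(x))] for m in range(len(columns[0]))]

x0 = [1.0, 2.0, 3.0]
J_exact = jacobian_exact(x0)
J_fd = jacobian_fd(f, x0)
max_error = max(abs(J_exact[m][n] - J_fd[m][n]) for m in range(2) for n in range(3))
```

Agreement of the two matrices is a useful check on hand computations, though it says nothing about continuity of the partial derivatives.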

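Theorem 2.6.2 (Chain Rule). Let $U \subset \mathbb{R}^N$ and $V \subset \mathbb{R}^M$ be open sets. Let $f : U \to \mathbb{R}^M$ and $g : V \to \mathbb{R}^K$ be maps and suppose that $f(U) \subset V$. Let $x_0 \in U$. If $f$ is differentiable at $x_0$ and $g$ is differentiable at $f(x_0)$, then $g \circ f$ is differentiable at $x_0$ with

\[ D(g \circ f)(x_0) = Dg(f(x_0))Df(x_0). \]

Proof. Define

\[ \varphi(x) = f(x) - f(x_0) - Df(x_0)(x - x_0) \]

for $x \in U$ and

\[ \psi(y) = g(y) - g(f(x_0)) - Dg(f(x_0))(y - f(x_0)) \]

for $y \in V$. Then

\[ \lim_{x\to x_0} \frac{\varphi(x)}{\|x - x_0\|} = 0 \tag{2.4} \]

and

\[ \lim_{y\to f(x_0)} \frac{\psi(y)}{\|y - f(x_0)\|} = 0 \tag{2.5} \]

by the definition of the Frechet derivative.

Let $x \in U \setminus \{x_0\}$. If $f(x) = f(x_0)$, then obviously $g(f(x)) - g(f(x_0)) = 0$. Otherwise,

\begin{align*}
g(f(x)) - g(f(x_0)) &= Dg(f(x_0))(f(x) - f(x_0)) + \psi(f(x)) \\
&= Dg(f(x_0))\bigl(Df(x_0)(x - x_0) + \varphi(x)\bigr) + \frac{\psi(f(x))}{\|f(x) - f(x_0)\|}\|f(x) - f(x_0)\| \\
&= Dg(f(x_0))Df(x_0)(x - x_0) + Dg(f(x_0))\varphi(x) \\
&\quad + \frac{\psi(f(x))}{\|f(x) - f(x_0)\|}\|Df(x_0)(x - x_0) + \varphi(x)\|.
\end{align*}

Hence

\begin{align*}
&\frac{\|g(f(x)) - g(f(x_0)) - Dg(f(x_0))Df(x_0)(x - x_0)\|}{\|x - x_0\|} \\
&\quad \le \|Dg(f(x_0))\| \frac{\|\varphi(x)\|}{\|x - x_0\|} + \frac{\|\psi(f(x))\|}{\|f(x) - f(x_0)\|}\left( \|Df(x_0)\| + \frac{\|\varphi(x)\|}{\|x - x_0\|} \right).
\end{align*}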


Figure 2.6.2: A level set of f and the gradient ∇f(x0) perpendicular to it.

Now (2.4) and (2.5) imply that

\[ \lim_{x\to x_0} \frac{\|g(f(x)) - g(f(x_0)) - Dg(f(x_0))Df(x_0)(x - x_0)\|}{\|x - x_0\|} = 0. \]

Thus $Dg(f(x_0))Df(x_0)$ is the Frechet derivative of $g \circ f$ at $x_0$.

If we have a function $f : \Omega \to \mathbb{R}$, then the chain rule can be used to find another geometric interpretation of the gradient $\nabla f$. Suppose that $x_0 \in \Omega$ and $f$ is differentiable at $x_0$. Let $\alpha = f(x_0)$ and consider the level set $S_\alpha = \{x \in \Omega : f(x) = \alpha\}$. Then $\nabla f(x_0)$ is perpendicular to $S_\alpha$ in the following sense (cf. Fig. 2.6.2). Suppose that we have a curve $\gamma : (-r, r) \to S_\alpha$ with $\gamma(0) = x_0$. If the derivative $\gamma'(0)$ exists, then we can interpret it as a tangent vector to $S_\alpha$ at $x_0$. Consider the function $h = f \circ \gamma$. We have $h(t) = \alpha$ for every $t \in (-r, r)$, because $\gamma$ takes values in $S_\alpha$. Using the chain rule, we now compute

\[ 0 = h'(0) = Df(x_0)\gamma'(0) = \langle \nabla f(x_0), \gamma'(0) \rangle. \]

(It is possible, however, that $S_\alpha$ does not have any tangent vectors at $x_0$ except 0.)

Notation. If $x, y \in \mathbb{R}^N$, then we write $[x, y] = \{(1 - t)x + ty : 0 \le t \le 1\}$ for the line segment connecting $x$ and $y$.


Proposition 2.6.2 (Mean Value Inequality). Let $x, y \in \Omega$ with $[x, y] \subset \Omega$. If $f : \Omega \to \mathbb{R}^M$ is continuous on $[x, y]$ and differentiable at every point $z \in [x, y] \setminus \{x, y\}$ with $\|Df(z)\| \le K$, then

\[ \|f(x) - f(y)\| \le K\|x - y\|. \]

Proof. Fix $v \in \mathbb{R}^M$ and define

\[ g(t) = \langle v, f((1 - t)x + ty) \rangle, \quad 0 \le t \le 1. \]

Then by Theorem 2.4.2, the function $g$ is continuous on $[0, 1]$ and by Theorem 2.6.2, it is differentiable in $(0, 1)$ with

\[ g'(t) = \langle v, Df((1 - t)x + ty)(y - x) \rangle. \]

By the mean value theorem, there exists a number $\tau \in (0, 1)$ such that

\[ g(1) - g(0) = g'(\tau), \]

i.e.,

\[ \langle v, f(y) - f(x) \rangle = \langle v, Df((1 - \tau)x + \tau y)(y - x) \rangle \le K\|v\|\|x - y\|. \]

Choose $v = f(y) - f(x)$; then we obtain

\[ \|f(y) - f(x)\|^2 \le K\|f(y) - f(x)\|\|x - y\|. \]

If $f(x) = f(y)$, then there is nothing to prove. Otherwise, we divide by $\|f(y) - f(x)\|$ on both sides to obtain the desired inequality.

Definition 2.6.6. Suppose that $x_0 \in \Omega$. Let $f : \Omega \to \mathbb{R}$ be a function. If $f$ is differentiable at $x_0$ and $Df(x_0) = 0$, then $x_0$ is called a critical point (or stationary point) of $f$.

If there exists a number $r > 0$ such that $f(x_0) \le f(x)$ for all $x \in B_r(x_0)$, then $x_0$ is called a local minimum point of $f$. If there exists a number $r > 0$ such that $f(x_0) \ge f(x)$ for all $x \in B_r(x_0)$, then $x_0$ is called a local maximum point of $f$. If $x_0$ is a critical point of $f$ and neither a local minimum point nor a local maximum point, then it is called a saddle point of $f$.

Proposition 2.6.3. Suppose that $f : \Omega \to \mathbb{R}$ is a function. If $x_0 \in \Omega$ is a local minimum point or local maximum point of $f$ and $f$ is differentiable at $x_0$, then $x_0$ is a critical point of $f$.

Proof. Choose $r > 0$ such that $B_r(x_0) \subset \Omega$. Fix $n \in \{1, \dots, N\}$ and let $e_n$ be the $n$-th standard unit vector in $\mathbb{R}^N$. Consider the function

\[ g(t) = f(x_0 + te_n), \quad t \in (-r, r). \]

This function is differentiable at 0 by the chain rule with

\[ g'(0) = \frac{\partial f}{\partial x_n}(x_0). \]

Moreover, the function $g$ has a local minimum or maximum point at 0. Hence $g'(0) = 0$. It follows that $\frac{\partial f}{\partial x_n}(x_0) = 0$ for $n = 1, \dots, N$, and therefore $Df(x_0) = 0$.
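The converse of Proposition 2.6.3 fails: a critical point need not be a local extremum. A standard illustration (chosen here, not taken from the notes) is $f(x, y) = x^2 - y^2$ on $\mathbb{R}^2$, which has a saddle point at the origin. The sketch below checks that the gradient vanishes there while every ball $B_r(0)$ from a short list of radii contains points with larger and with smaller function values.

```python
def f(x, y):
    """Assumed example function with a saddle point at the origin."""
    return x * x - y * y

def grad_f(x, y):
    """Gradient of f, computed by hand: (2x, -2y)."""
    return (2 * x, -2 * y)

radii = [10.0 ** (-j) for j in range(1, 6)]

# The points (r/2, 0) and (0, r/2) both lie in the open ball B_r(0).
has_larger_values = all(f(r / 2, 0.0) > f(0.0, 0.0) for r in radii)
has_smaller_values = all(f(0.0, r / 2) < f(0.0, 0.0) for r in radii)
```

Since $f(t, 0) = t^2 > 0$ and $f(0, t) = -t^2 < 0$ for every $t \neq 0$, the same holds for all radii, not only the sampled ones.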


2.7 Higher Order Derivatives

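If we have a map $f : \Omega \to \mathbb{R}^M$ such that the partial derivative with respect to $x_j$ (for some $j = 1, \dots, N$) exists throughout $\Omega$, then it may happen that $\frac{\partial f}{\partial x_j}$ itself has a partial derivative, say with respect to $x_i$. Then we write

\[ \frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial}{\partial x_i} \frac{\partial f}{\partial x_j}, \]

or possibly

\[ \frac{\partial^2 f}{\partial x_i^2} = \frac{\partial}{\partial x_i} \frac{\partial f}{\partial x_i} \]

if $i = j$. Even higher order partial derivatives are defined similarly.

If $f$ has a Frechet derivative throughout $\Omega$, then we have a map $Df : \Omega \to \mathrm{Hom}(\mathbb{R}^N, \mathbb{R}^M)$. It may happen that $Df$ has a Frechet derivative at a point $x_0 \in \Omega$. This is then denoted by $D^2 f(x_0)$ and is a linear map $\mathbb{R}^N \to \mathrm{Hom}(\mathbb{R}^N, \mathbb{R}^M)$ (i.e., an element of $\mathrm{Hom}(\mathbb{R}^N, \mathrm{Hom}(\mathbb{R}^N, \mathbb{R}^M))$). Again, we can define even higher derivatives similarly.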

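Definition 2.7.1. Suppose that $f : \Omega \to \mathbb{R}$ is a function with second partial derivatives at a point $x_0 \in \Omega$. Then the matrix

\[ Hf(x_0) = \left( \frac{\partial^2 f}{\partial x_i \partial x_j}(x_0) \right)_{i,j} \]

is called the Hessian (or Hessian matrix) of $f$ at $x_0$.

Theorem 2.7.1 (Symmetry of the Hessian). Let $f : \Omega \to \mathbb{R}$ be a function that has continuous second order partial derivatives in $\Omega$. Then for $i, j = 1, \dots, N$,

\[ \frac{\partial^2 f}{\partial x_i \partial x_j} = \frac{\partial^2 f}{\partial x_j \partial x_i}. \]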

Proof. Let $x_0 \in \Omega$ and choose $r > 0$ such that $B_{2r}(x_0) \subset \Omega$. Let $e_n$ be the $n$-th standard unit vector in $\mathbb{R}^N$. Let $h \in (0, r)$ and consider the two functions

\[ g_1(s) = f(x_0 + s e_i + h e_j) - f(x_0 + s e_i), \quad 0 \le s \le h, \]
\[ g_2(t) = f(x_0 + h e_i + t e_j) - f(x_0 + t e_j), \quad 0 \le t \le h. \]

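Note that

\[ g_1'(s) = \frac{\partial f}{\partial x_i}(x_0 + s e_i + h e_j) - \frac{\partial f}{\partial x_i}(x_0 + s e_i). \]

By the mean value theorem, there exists a number $\sigma_1 \in (0, h)$ such that $g_1(h) - g_1(0) = h g_1'(\sigma_1)$. That is,

\[ f(x_0 + h e_i + h e_j) - f(x_0 + h e_i) - f(x_0 + h e_j) + f(x_0) = h \left( \frac{\partial f}{\partial x_i}(x_0 + \sigma_1 e_i + h e_j) - \frac{\partial f}{\partial x_i}(x_0 + \sigma_1 e_i) \right). \]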

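Applying the mean value theorem to the function $t \mapsto \frac{\partial f}{\partial x_i}(x_0 + \sigma_1 e_i + t e_j)$, we find a number $\tau_1 \in (0, h)$ such that

\[ \frac{\partial f}{\partial x_i}(x_0 + \sigma_1 e_i + h e_j) - \frac{\partial f}{\partial x_i}(x_0 + \sigma_1 e_i) = h \frac{\partial^2 f}{\partial x_j \partial x_i}(x_0 + \sigma_1 e_i + \tau_1 e_j). \]

That is,

\[ \frac{1}{h^2} \bigl( f(x_0 + h e_i + h e_j) - f(x_0 + h e_i) - f(x_0 + h e_j) + f(x_0) \bigr) = \frac{\partial^2 f}{\partial x_j \partial x_i}(x_0 + \sigma_1 e_i + \tau_1 e_j). \]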

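Replacing $g_1$ with $g_2$ and applying the same arguments, we find $\sigma_2, \tau_2 \in (0, h)$ such that

\[ \frac{1}{h^2} \bigl( f(x_0 + h e_i + h e_j) - f(x_0 + h e_i) - f(x_0 + h e_j) + f(x_0) \bigr) = \frac{\partial^2 f}{\partial x_i \partial x_j}(x_0 + \sigma_2 e_i + \tau_2 e_j). \]

Hence

\[ \frac{\partial^2 f}{\partial x_j \partial x_i}(x_0 + \sigma_1 e_i + \tau_1 e_j) = \frac{\partial^2 f}{\partial x_i \partial x_j}(x_0 + \sigma_2 e_i + \tau_2 e_j). \tag{2.6} \]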

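Fix $\varepsilon > 0$. By the continuity of the second partial derivatives, we can choose $h$ so small that we have

\[ \left| \frac{\partial^2 f}{\partial x_j \partial x_i}(x_0 + \sigma_1 e_i + \tau_1 e_j) - \frac{\partial^2 f}{\partial x_j \partial x_i}(x_0) \right| \le \varepsilon \]

and

\[ \left| \frac{\partial^2 f}{\partial x_i \partial x_j}(x_0 + \sigma_2 e_i + \tau_2 e_j) - \frac{\partial^2 f}{\partial x_i \partial x_j}(x_0) \right| \le \varepsilon. \]

Then (2.6) and the triangle inequality imply

\[ \left| \frac{\partial^2 f}{\partial x_j \partial x_i}(x_0) - \frac{\partial^2 f}{\partial x_i \partial x_j}(x_0) \right| \le 2\varepsilon. \]

Since $\varepsilon$ was chosen arbitrarily, this concludes the proof.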
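The second difference quotient from the proof can also be evaluated numerically. The following is a minimal sketch, assuming an illustrative function $f(x, y) = x^3 y^2 + \sin(xy)$ that is not taken from the notes; the helper names `second_difference` and `mixed_partial_exact` are hypothetical. It compares the quotient with the mixed partial derivative computed by hand.

```python
import math

def f(x, y):
    # Illustrative smooth function (an assumption, not from the notes).
    return x**3 * y**2 + math.sin(x * y)

def mixed_partial_exact(x, y):
    # Computed by hand: d/dy f = 2 x^3 y + x cos(xy), hence
    # d/dx d/dy f = 6 x^2 y + cos(xy) - x y sin(xy).
    return 6 * x**2 * y + math.cos(x * y) - x * y * math.sin(x * y)

def second_difference(func, x0, y0, h):
    # (f(x0 + h e_1 + h e_2) - f(x0 + h e_1) - f(x0 + h e_2) + f(x0)) / h^2
    return (func(x0 + h, y0 + h) - func(x0 + h, y0)
            - func(x0, y0 + h) + func(x0, y0)) / h**2

x0, y0 = 0.7, -0.4
for h in (1e-1, 1e-2, 1e-3):
    q = second_difference(f, x0, y0, h)
    # The gap to the exact mixed partial should shrink as h decreases.
    print(h, q, abs(q - mixed_partial_exact(x0, y0)))
```

The quotient is symmetric in the roles of the two coordinates, which is exactly why both orders of differentiation are forced to the same limit in the proof.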


Notation. When working with higher order derivatives, we may have to keep track of many indices, and then it is convenient to use a multi-index notation.

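Let $\alpha = (\alpha_1, \dots, \alpha_N) \in \mathbb{N}_0^N$, where $\mathbb{N}_0 = \mathbb{N} \cup \{0\}$. Then for $x \in \mathbb{R}^N$ we define

\[ x^\alpha = x_1^{\alpha_1} \cdots x_N^{\alpha_N}. \]

Moreover, we set

\[ |\alpha| = \alpha_1 + \dots + \alpha_N \quad \text{and} \quad \alpha! = \alpha_1! \cdots \alpha_N!. \]

If $f : \Omega \to \mathbb{R}$ is a function, then we define

\[ \frac{\partial^{|\alpha|} f}{\partial x^\alpha} = \frac{\partial^{|\alpha|} f}{\partial x_1^{\alpha_1} \cdots \partial x_N^{\alpha_N}}, \]

provided that this partial derivative exists.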
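To make the multi-index notation concrete, here is a small sketch in Python. The function names `order`, `multi_factorial` and `power` are hypothetical helpers chosen for illustration; multi-indices and points are represented as tuples of equal length.

```python
from math import factorial, prod

def order(alpha):
    # |alpha| = alpha_1 + ... + alpha_N
    return sum(alpha)

def multi_factorial(alpha):
    # alpha! = alpha_1! * ... * alpha_N!
    return prod(factorial(a) for a in alpha)

def power(x, alpha):
    # x^alpha = x_1^alpha_1 * ... * x_N^alpha_N
    return prod(xi**a for xi, a in zip(x, alpha))

alpha = (2, 0, 1)
x = (3.0, 5.0, 2.0)
print(order(alpha), multi_factorial(alpha), power(x, alpha))
```

For this choice, $|\alpha| = 3$, $\alpha! = 2! \cdot 0! \cdot 1! = 2$ and $x^\alpha = 3^2 \cdot 5^0 \cdot 2^1 = 18$.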

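Theorem 2.7.2 (Taylor's Theorem). Suppose that $f : \Omega \to \mathbb{R}$ has continuous partial derivatives up to order $m$ throughout $\Omega$. Let $x, y \in \Omega$ such that $[x, y] \subset \Omega$. Then there exists a number $\theta \in (0, 1)$ such that

\[ f(y) = \sum_{|\alpha| \le m-1} \frac{1}{\alpha!} \frac{\partial^{|\alpha|} f}{\partial x^\alpha}(x) (y - x)^\alpha + \sum_{|\alpha| = m} \frac{1}{\alpha!} \frac{\partial^m f}{\partial x^\alpha}((1 - \theta)x + \theta y) (y - x)^\alpha. \]

Moreover,

\[ f(y) = \sum_{|\alpha| \le m} \frac{1}{\alpha!} \frac{\partial^{|\alpha|} f}{\partial x^\alpha}(x) (y - x)^\alpha + o(\|y - x\|^m) \]

as $y \to x$.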

Proof. This follows from Taylor's theorem in one variable, applied to the function $t \mapsto f((1 - t)x + ty)$. See Exercise 10.1 for the details.
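The second statement of Taylor's theorem can be checked numerically in a simple case. The sketch below assumes the illustrative function $f(x, y) = e^x \cos y$ expanded at the origin with $m = 2$ (an example chosen here, not from the notes); the table `derivs` lists its partial derivatives at $(0, 0)$, computed by hand, indexed by multi-index. The names `taylor2` and `remainder_ratio` are hypothetical.

```python
import math
from math import factorial

def f(x, y):
    # Illustrative function (an assumption, not from the notes).
    return math.exp(x) * math.cos(y)

# Partial derivatives of f at (0, 0) for all multi-indices with |alpha| <= 2,
# computed by hand.
derivs = {
    (0, 0): 1.0,
    (1, 0): 1.0, (0, 1): 0.0,
    (2, 0): 1.0, (1, 1): 0.0, (0, 2): -1.0,
}

def taylor2(h1, h2):
    # Sum over |alpha| <= 2 of (1 / alpha!) * (d^alpha f)(0) * h^alpha.
    return sum(d / (factorial(a1) * factorial(a2)) * h1**a1 * h2**a2
               for (a1, a2), d in derivs.items())

def remainder_ratio(t):
    # Remainder divided by ||h||^2 along the direction h = (t, 2t).
    h1, h2 = t, 2 * t
    return (f(h1, h2) - taylor2(h1, h2)) / (h1**2 + h2**2)

for t in (1e-1, 1e-2, 1e-3):
    print(t, remainder_ratio(t))
```

The printed ratios should tend to zero as $t \to 0$, in line with the $o(\|y - x\|^2)$ remainder.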

Recall that given a real $(N \times N)$-matrix $A$, we have a quadratic form $\mathbb{R}^N \to \mathbb{R}$, $x \mapsto \langle x, Ax \rangle$. We say that $A$ is

• positive definite if $\langle x, Ax \rangle > 0$ for all $x \in \mathbb{R}^N \setminus \{0\}$,

• negative definite if $\langle x, Ax \rangle < 0$ for all $x \in \mathbb{R}^N \setminus \{0\}$, and

• indefinite if there exist two points $x_-, x_+ \in \mathbb{R}^N$ with $\langle x_-, Ax_- \rangle < 0$ and $\langle x_+, Ax_+ \rangle > 0$.

For a symmetric matrix (i.e., a matrix with $A^T = A$), positive definite means that all eigenvalues are positive, negative definite means that all eigenvalues are negative, and indefinite means that there are positive and negative eigenvalues.
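For a symmetric $(2 \times 2)$-matrix, the eigenvalue criterion can be applied directly, since the eigenvalues have a closed form. The following sketch assumes the matrix is given by its entries $a, b, c$ as $\begin{pmatrix} a & b \\ b & c \end{pmatrix}$; the function names are hypothetical.

```python
import math

def eigenvalues_sym2(a, b, c):
    # Eigenvalues of the symmetric matrix [[a, b], [b, c]]:
    # (a + c)/2 -/+ sqrt(((a - c)/2)^2 + b^2).
    mean = (a + c) / 2
    radius = math.hypot((a - c) / 2, b)
    return mean - radius, mean + radius

def classify(a, b, c):
    lo, hi = eigenvalues_sym2(a, b, c)
    if lo > 0:
        return "positive definite"
    if hi < 0:
        return "negative definite"
    if lo < 0 < hi:
        return "indefinite"
    # Remaining case: an eigenvalue is zero and none of the three labels applies.
    return "none of the three"

print(classify(2, 0, 3))    # both eigenvalues positive
print(classify(-2, 1, -2))  # eigenvalues -3 and -1
print(classify(1, 2, 1))    # eigenvalues -1 and 3
```

Note that a matrix such as $\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}$, with eigenvalues $0$ and $2$, falls into none of the three classes.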


Corollary 2.7.1 (Second Derivative Test). Let $f : \Omega \to \mathbb{R}$ be a function with continuous partial derivatives up to second order. Let $x_0 \in \Omega$ be a critical point of $f$.

(i) If $Hf(x_0)$ is positive definite, then $x_0$ is a local minimum point of $f$.

(ii) If $Hf(x_0)$ is negative definite, then $x_0$ is a local maximum point of $f$.

(iii) If $Hf(x_0)$ is indefinite, then $x_0$ is a saddle point of $f$.

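Proof. By Taylor's theorem, and because $Df(x_0) = 0$, we can write

\[ f(x) = f(x_0) + \frac{1}{2} \langle x - x_0, Hf(x_0)(x - x_0) \rangle + R(x) \]

for a function $R : \Omega \to \mathbb{R}$ with

\[ \lim_{x \to x_0} \frac{R(x)}{\|x - x_0\|^2} = 0. \]

(i) If $Hf(x_0)$ is positive definite, then all of its eigenvalues are positive. Let $\lambda_0$ be the smallest eigenvalue. Then we have

\[ \langle x - x_0, Hf(x_0)(x - x_0) \rangle \ge \lambda_0 \|x - x_0\|^2 \]

for all $x \in \mathbb{R}^N$. So

\[ f(x) \ge f(x_0) + \|x - x_0\|^2 \left( \frac{\lambda_0}{2} + \frac{R(x)}{\|x - x_0\|^2} \right). \]

If $\|x - x_0\|$ is sufficiently small, then it follows that $f(x) \ge f(x_0)$.

(ii) This is proved analogously.

(iii) If $Hf(x_0)$ is indefinite, then it has a positive eigenvalue $\lambda_+$ and a negative eigenvalue $\lambda_-$. Let $u_+$ and $u_-$, respectively, be corresponding eigenvectors of unit length. Then we have

\[ f(x_0 + t u_+) = f(x_0) + t^2 \left( \frac{\lambda_+}{2} + \frac{R(x_0 + t u_+)}{t^2} \right) \]

for any $t \neq 0$ such that $x_0 + t u_+ \in \Omega$. It follows that $f(x_0 + t u_+) > f(x_0)$ whenever $t \neq 0$ and $|t|$ is sufficiently small. Similarly, using $u_-$ and $\lambda_-$, we see that $f(x_0 + t u_-) < f(x_0)$ whenever $t \neq 0$ and $|t|$ is sufficiently small. Hence $x_0$ is neither a local minimum point nor a local maximum point, and so it is a saddle point.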
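As a worked illustration of the second derivative test, consider the function $f(x, y) = x^3 - 3x + y^2$ on $\mathbb{R}^2$. This example is chosen here for illustration and does not appear in the notes.

```latex
\[
  \nabla f(x, y) = \begin{pmatrix} 3x^2 - 3 \\ 2y \end{pmatrix},
  \qquad
  Hf(x, y) = \begin{pmatrix} 6x & 0 \\ 0 & 2 \end{pmatrix}.
\]
% Critical points: 3x^2 - 3 = 0 and 2y = 0, i.e. (1, 0)^T and (-1, 0)^T.
\[
  Hf(1, 0) = \begin{pmatrix} 6 & 0 \\ 0 & 2 \end{pmatrix}
  \quad \text{(eigenvalues } 6, 2 > 0 \text{: positive definite)},
\]
\[
  Hf(-1, 0) = \begin{pmatrix} -6 & 0 \\ 0 & 2 \end{pmatrix}
  \quad \text{(eigenvalues } -6 < 0 < 2 \text{: indefinite)}.
\]
```

So $(1, 0)^T$ is a local minimum point by part (i), and $(-1, 0)^T$ is a saddle point by part (iii).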

2.8 The Implicit Function Theorem

We now consider functions in two variables. That is, from now on, we have $N = 2$ and we consider an open set $\Omega \subset \mathbb{R}^2$. This is merely to avoid technical complications, and the results do have counterparts in higher dimensions. In two dimensions, it is convenient to denote the coordinates by $(x, y)^T$ rather than $(x_1, x_2)^T$.

This section is about the observation that for a reasonably regular function $f : \Omega \to \mathbb{R}$, the solutions of the equation

\[ f(x) = 0 \]

typically form a curve in $\Omega$. For example, if $f(x) = x_1^2 + x_2^2 - r^2$ with $r \in \mathbb{R}$, then we have a circle of radius $|r|$, except for $r = 0$, where we have a single point that solves the equation.

We may want to use an equation like this in order to define a specific curve in $\mathbb{R}^2$. But if we want to be certain that we actually obtain a curve, we have to worry about degenerate cases like the case $r = 0$ above. Fortunately, we can give conditions that guarantee not only that we have a curve, but even that we locally have the graph of a function. This function is implicitly defined through the equation.

Theorem 2.8.1 (Implicit Function Theorem). Suppose that the function $f : \Omega \to \mathbb{R}$ is continuously differentiable in $\Omega$. Let $(x_0, y_0)^T \in \Omega$ be a point with $f(x_0, y_0) = 0$ and $\frac{\partial f}{\partial y}(x_0, y_0) \neq 0$. Then there exist two numbers $r, s > 0$ such that $(x_0 - r, x_0 + r) \times (y_0 - s, y_0 + s) \subset \Omega$ and there exists a unique function $g : (x_0 - r, x_0 + r) \to (y_0 - s, y_0 + s)$ with $f(x, g(x)) = 0$ for all $x \in (x_0 - r, x_0 + r)$. Furthermore, this function $g$ is differentiable at $x_0$ with

\[ g'(x_0) = -\frac{\frac{\partial f}{\partial x}(x_0, y_0)}{\frac{\partial f}{\partial y}(x_0, y_0)}. \]

Remark 2.8.1. The uniqueness of $g$ implies that $g(x_0) = y_0$. Once we have determined the function $g$, we can apply the theorem for any point $(x, g(x))$ instead of $(x_0, y_0)$ and it follows that $g$ is differentiable with

\[ g'(x) = -\frac{\frac{\partial f}{\partial x}(x, g(x))}{\frac{\partial f}{\partial y}(x, g(x))} \]

for every $x \in (x_0 - r, x_0 + r)$ such that the denominator does not vanish (which is the case at least sufficiently close to $x_0$ by the continuity). The right-hand side is continuous, so $g$ is continuously differentiable.
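For the unit circle, the implicitly defined function and its derivative can be written down explicitly, which gives a quick check of the formula. The point $(3/5, 4/5)^T$ below is chosen for illustration only.

```latex
\[
  f(x, y) = x^2 + y^2 - 1, \qquad
  \frac{\partial f}{\partial x}(x, y) = 2x, \qquad
  \frac{\partial f}{\partial y}(x, y) = 2y.
\]
% At (x_0, y_0) = (3/5, 4/5) we have f = 0 and df/dy = 8/5 \neq 0.
\[
  g(x) = \sqrt{1 - x^2}, \qquad
  g'(x) = -\frac{x}{\sqrt{1 - x^2}}
        = -\frac{2x}{2 g(x)}
        = -\frac{\frac{\partial f}{\partial x}(x, g(x))}{\frac{\partial f}{\partial y}(x, g(x))},
  \qquad
  g'\!\left(\tfrac{3}{5}\right) = -\frac{3}{4}.
\]
```

At the point $(1, 0)^T$, by contrast, $\frac{\partial f}{\partial y}$ vanishes and the theorem does not apply; indeed the circle is not the graph of a function of $x$ near that point.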

Proof. Without loss of generality we may assume that $x_0 = 0$ and $y_0 = 0$, because otherwise we may apply a translation in $\mathbb{R}^2$. Furthermore, we may assume without loss of generality that $\frac{\partial f}{\partial y}(0, 0) > 0$; otherwise we may replace $f$ by $-f$.


Step 1: construct $g$. Define $\alpha = \frac{1}{2} \frac{\partial f}{\partial y}(0, 0)$. By the continuity of the partial derivatives, there exist $r, s > 0$ such that $[-r, r] \times [-s, s] \subset \Omega$ and $\frac{\partial f}{\partial y} > \alpha$ on $(-r, r) \times (-s, s)$. We may further suppose that $r$ is chosen so small that $|f(x, 0)| < \alpha s$ for all $x \in (-r, r)$ because of the continuity of $f(\,\cdot\,, 0)$ and the fact that $f(0, 0) = 0$.

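Fix $x \in (-r, r)$ and consider the function $f(x, \cdot)$, which is continuous on $[-s, s]$ and continuously differentiable in $(-s, s)$ with derivative $\frac{\partial f}{\partial y}(x, \cdot) > \alpha$. Hence the function is strictly increasing in $[-s, s]$. Moreover, since $|f(x, 0)| < \alpha s$, we have

\[ f(x, -s) = f(x, 0) - \int_{-s}^{0} \frac{\partial f}{\partial y}(x, \sigma) \, d\sigma < \alpha s - \alpha s = 0 \]

and

\[ f(x, s) = f(x, 0) + \int_{0}^{s} \frac{\partial f}{\partial y}(x, \sigma) \, d\sigma > -\alpha s + \alpha s = 0. \]

We conclude that there exists a unique number $y \in (-s, s)$ such that $f(x, y) = 0$. We define $g(x) = y$.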

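Step 2: $g$ is continuous at $0$. We claim that there exists a constant $K$ such that

\[ |g(x)| \le K|x| \tag{2.7} \]

for all $x \in (-r, r)$. In order to prove this, fix $x$ and note that

\[ 0 = f(x, g(x)) = f(x, g(x)) - f(0, g(x)) + f(0, g(x)) - f(0, 0). \]

By the mean value theorem, there exist $\rho, \sigma \in (0, 1)$ with

\[ f(x, g(x)) - f(0, g(x)) = x \frac{\partial f}{\partial x}(\rho x, g(x)) \]

and

\[ f(0, g(x)) - f(0, 0) = g(x) \frac{\partial f}{\partial y}(0, \sigma g(x)). \]

So

\[ 0 = x \frac{\partial f}{\partial x}(\rho x, g(x)) + g(x) \frac{\partial f}{\partial y}(0, \sigma g(x)). \]

Recall that $\frac{\partial f}{\partial y} > \alpha$ in $(-r, r) \times (-s, s)$. Furthermore, since $\frac{\partial f}{\partial x}$ is continuous on $[-r, r] \times [-s, s]$, the Weierstrass extreme value theorem implies that there exists a number $C \ge 0$ such that

\[ \left| \frac{\partial f}{\partial x}(x, y) \right| \le C \]

for all $x \in [-r, r]$ and all $y \in [-s, s]$. Hence

\[ |g(x)| = \left| \frac{\frac{\partial f}{\partial x}(\rho x, g(x))}{\frac{\partial f}{\partial y}(0, \sigma g(x))} \right| |x| \le \frac{C}{\alpha} |x|. \]

Choosing $K = C/\alpha$, we obtain (2.7).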


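Step 3: $g$ is differentiable at $0$. Next we want to show that $g$ is differentiable at $0$ with the derivative given in the statement of the theorem. To this end, note that by the differentiability of $f$, we have

\[ f(x, g(x)) = f(0, 0) + Df(0, 0)(x, g(x))^T + R(x) \]

for a function $R : (-r, r) \to \mathbb{R}$ with

\[ \lim_{x \to 0} \frac{R(x)}{\sqrt{x^2 + (g(x))^2}} = 0. \]

But $f(0, 0) = f(x, g(x)) = 0$. So we have

\[ 0 = x \frac{\partial f}{\partial x}(0, 0) + g(x) \frac{\partial f}{\partial y}(0, 0) + R(x). \]

Thus, for $x \neq 0$, we obtain

\[ \frac{g(x)}{x} = -\frac{\frac{\partial f}{\partial x}(0, 0)}{\frac{\partial f}{\partial y}(0, 0)} - \frac{R(x)}{x \frac{\partial f}{\partial y}(0, 0)}. \]

Moreover, we have

\[ \left| \frac{R(x)}{x \frac{\partial f}{\partial y}(0, 0)} \right| \le \frac{|R(x)| \sqrt{1 + \frac{(g(x))^2}{x^2}}}{|x| \sqrt{1 + \frac{(g(x))^2}{x^2}} \left| \frac{\partial f}{\partial y}(0, 0) \right|} \le \frac{|R(x)|}{\sqrt{x^2 + (g(x))^2}} \, \frac{\sqrt{1 + K^2}}{\frac{\partial f}{\partial y}(0, 0)} \to 0 \]

as $x \to 0$ by (2.7). Hence

\[ g'(0) = \lim_{x \to 0} \frac{g(x)}{x} = -\frac{\frac{\partial f}{\partial x}(0, 0)}{\frac{\partial f}{\partial y}(0, 0)}. \]

This completes the proof.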
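The construction of $g$ in Step 1 suggests a simple numerical procedure: for fixed $x$, the function $f(x, \cdot)$ is strictly increasing with a sign change, so its zero can be located by bisection. The following is a minimal sketch under stated assumptions: the circle $f(x, y) = x^2 + y^2 - 1$, the point $(0.6, 0.8)$ and the half-width $s = 0.15$ are illustrative choices, and `solve_y` is a hypothetical helper. The derivative of $g$ is then compared with the formula from the theorem.

```python
def f(x, y):
    # Illustrative example (an assumption): the unit circle.
    return x**2 + y**2 - 1.0

def solve_y(x, y_lo, y_hi, tol=1e-12):
    # Bisection in y. Assumes f(x, .) is strictly increasing on [y_lo, y_hi]
    # with f(x, y_lo) < 0 < f(x, y_hi), as in Step 1 of the proof.
    assert f(x, y_lo) < 0 < f(x, y_hi)
    while y_hi - y_lo > tol:
        mid = 0.5 * (y_lo + y_hi)
        if f(x, mid) < 0:
            y_lo = mid
        else:
            y_hi = mid
    return 0.5 * (y_lo + y_hi)

x0, y0, s = 0.6, 0.8, 0.15

def g(x):
    # Implicitly defined function near x0, with values in (y0 - s, y0 + s).
    return solve_y(x, y0 - s, y0 + s)

h = 1e-5
slope = (g(x0 + h) - g(x0 - h)) / (2 * h)   # central difference for g'(x0)
formula = -(2 * x0) / (2 * y0)              # -(df/dx) / (df/dy) at (x0, y0)
print(g(x0), slope, formula)
```

Bisection is used here only because it mirrors the monotonicity argument of the proof; it is not claimed to be the most efficient way to evaluate $g$.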

2.9 The Lagrange Multiplier Rule

This is another statement about maxima or minima of a given function, but now we look at the problem of finding extrema relative to a side condition.

Theorem 2.9.1 (Lagrange Multiplier Rule). Let $f : \Omega \to \mathbb{R}$ and $g : \Omega \to \mathbb{R}$ be continuously differentiable functions and let

\[ S = \left\{ (x, y)^T \in \Omega : g(x, y) = 0 \right\}. \]

Suppose that $(x_0, y_0)^T \in S$ is a point such that

\[ f(x_0, y_0) \le f(x, y) \]

for all $(x, y)^T \in S$. If $\nabla g(x_0, y_0) \neq 0$, then there exists a number $\lambda \in \mathbb{R}$ such that

\[ \nabla f(x_0, y_0) = \lambda \nabla g(x_0, y_0). \]


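Proof. Since $\nabla g(x_0, y_0) \neq 0$, we have either

\[ \frac{\partial g}{\partial x}(x_0, y_0) \neq 0 \quad \text{or} \quad \frac{\partial g}{\partial y}(x_0, y_0) \neq 0. \]

If $\frac{\partial g}{\partial y}(x_0, y_0) \neq 0$, then we apply the implicit function theorem to find $r, s > 0$ with $[x_0 - r, x_0 + r] \times [y_0 - s, y_0 + s] \subset \Omega$ and a continuously differentiable function $\phi : (x_0 - r, x_0 + r) \to (y_0 - s, y_0 + s)$ such that $g(x, \phi(x)) = 0$ for all $x \in (x_0 - r, x_0 + r)$. Moreover, we have

\[ \phi'(x_0) = -\frac{\frac{\partial g}{\partial x}(x_0, y_0)}{\frac{\partial g}{\partial y}(x_0, y_0)}. \]

Now consider the function $h : (x_0 - r, x_0 + r) \to \mathbb{R}$ with $h(x) = f(x, \phi(x))$. Since $\phi(x_0) = y_0$ and $(x, \phi(x))^T \in S$ for all $x$, it has a minimum at $x_0$, and so by the chain rule,

\[ \begin{aligned} 0 = h'(x_0) &= \frac{\partial f}{\partial x}(x_0, y_0) + \frac{\partial f}{\partial y}(x_0, y_0) \phi'(x_0) \\ &= \frac{\partial f}{\partial x}(x_0, y_0) - \frac{\partial f}{\partial y}(x_0, y_0) \frac{\frac{\partial g}{\partial x}(x_0, y_0)}{\frac{\partial g}{\partial y}(x_0, y_0)}. \end{aligned} \]

Define

\[ \lambda := \frac{\frac{\partial f}{\partial y}(x_0, y_0)}{\frac{\partial g}{\partial y}(x_0, y_0)}. \]

Then the previous identity and the definition of $\lambda$ give

\[ \frac{\partial f}{\partial x}(x_0, y_0) = \lambda \frac{\partial g}{\partial x}(x_0, y_0) \quad \text{and} \quad \frac{\partial f}{\partial y}(x_0, y_0) = \lambda \frac{\partial g}{\partial y}(x_0, y_0), \]

which is another way of writing $\nabla f(x_0, y_0) = \lambda \nabla g(x_0, y_0)$.

If $\frac{\partial g}{\partial x}(x_0, y_0) \neq 0$, then we use the same arguments with the roles of the coordinates $x$ and $y$ exchanged.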

Remark 2.9.1. Theorem 2.9.1 is about a minimum point of $f$ relative to $S$. If we apply the result to $-f$ instead, then we obtain the corresponding statement for maximum points relative to $S$ as well.

Recall that we have a geometric interpretation of $\nabla f$ as a vector in the direction of steepest ascent. Furthermore, we know that $\nabla g(x_0, y_0)$ is perpendicular to the level set $S$ at the point $(x_0, y_0)^T$. So the theorem says that at a minimum point relative to $S$, the direction of steepest ascent of $f$ will be perpendicular to $S$ (cf. Fig. 2.9.1).

Example 2.9.1. Consider the function $f : \mathbb{R}^2 \to \mathbb{R}$ with $f(x, y) = x^2 + y^2$ for $(x, y)^T \in \mathbb{R}^2$. Find a minimum of $f$ relative to the line $\left\{ (x, y)^T \in \mathbb{R}^2 : 2x + y = 5 \right\}$.


Figure 2.9.1: The gradient ∇f(x0) is perpendicular to the level set of g.

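We use the function $g : \mathbb{R}^2 \to \mathbb{R}$ with $g(x, y) = 2x + y - 5$. We compute

\[ \nabla f(x, y) = \begin{pmatrix} 2x \\ 2y \end{pmatrix} \quad \text{and} \quad \nabla g(x, y) = \begin{pmatrix} 2 \\ 1 \end{pmatrix}. \]

Hence we look for three numbers $x_0, y_0, \lambda \in \mathbb{R}$ with

\[ 2x_0 = 2\lambda, \]
\[ 2y_0 = \lambda, \]
\[ 2x_0 + y_0 - 5 = 0. \]

This system is easy to solve: we have $\lambda = 2$, $x_0 = 2$, and $y_0 = 1$. So we have only one candidate for a minimum point: $(2, 1)^T$.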

The question is now whether there exists a minimum relative to the line at all. In order to give an answer, we observe that

\[ f(2, 1) = 5 < 9 = \inf \left\{ f(x, y) : (x, y)^T \in \mathbb{R}^2, \ \sqrt{x^2 + y^2} > 3 \right\}. \]

Hence everything outside of the set $\left\{ (x, y)^T \in \mathbb{R}^2 : \sqrt{x^2 + y^2} \le 3 \right\}$ is irrelevant for the minimisation problem. So we can define

\[ A = \left\{ (x, y)^T \in \mathbb{R}^2 : g(x, y) = 0 \text{ and } \sqrt{x^2 + y^2} \le 3 \right\} \]

and minimise in $A$ instead. This is a closed and bounded set, and a minimiser exists by the Weierstrass extreme value theorem (Theorem 2.4.3). This point must be $(2, 1)^T$ by the previous observations.
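The computation in Example 2.9.1 can be cross-checked numerically. The sketch below solves the Lagrange system by the substitution $x_0 = \lambda$, $y_0 = \lambda/2$ and then compares the candidate with sampled points on the line, parametrised as $(x, 5 - 2x)$. The sampling range and step size are arbitrary illustrative choices.

```python
def f(x, y):
    return x**2 + y**2

def g(x, y):
    return 2 * x + y - 5

# Lagrange system: 2 x0 = 2 lam, 2 y0 = lam, 2 x0 + y0 = 5.
# Substituting x0 = lam and y0 = lam / 2 into the constraint gives (5/2) lam = 5.
lam = 5 / 2.5
x0, y0 = lam, lam / 2

# Cross-check: sample the line y = 5 - 2x for x in [-3, 7] with step 0.01.
samples = [(k / 100, 5 - 2 * (k / 100)) for k in range(-300, 701)]
best = min(samples, key=lambda p: f(*p))

print((x0, y0, lam))     # candidate from the multiplier rule
print(best, f(*best))    # best sampled point on the line and its value
```

On the line, $f(x, 5 - 2x) = 5(x - 2)^2 + 5$, so the sampled minimum agrees with the candidate $(2, 1)^T$ and the value $5$.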

Index

n-th interval, 5

Bolzano-Weierstrass theorem, 36
bounded, 36

Cauchy-Schwarz inequality, 34
chain rule, 48
closed, 37
compact, 38
continuity theorem, 23
continuous, 38
continuous at a point, 38
continuously differentiable, 47
convergence, 35, 38
critical point, 50

directional derivative, 45

equivalent, 41
Euclidean inner product, 33
Euclidean norm, 33

first fundamental theorem of calculus, 23
Frechet derivative, 43
Frechet differentiable, 43
fundamental theorem of calculus, 23

gradient, 45

Hessian, 51

implicit function theorem, 55
improper integral, 30
integral test, 31
integration by parts, 26
integration by substitution, 26

Jacobi matrix, 45

Lagrange multiplier rule, 57
length, 7
local maximum point, 50
local minimum point, 50
lower Riemann integral, 7
lower Riemann sum, 7

mean value inequality, 49
mesh, 7
multi-index notation, 53

norm, 40

open, 37
open ball, 37
operator norm, 42
oscillation, 9

partial derivative, 45
partition, 5
primitive, 25

refinement, 9
Riemann integrable, 7
Riemann integral, 7
Riemann sum, 15

saddle point, 50
second derivative test, 54
second fundamental theorem of calculus, 23
stationary point, 50
subdivision, 5
symmetry of the Hessian, 51

tagged subdivision, 15
Taylor's theorem, 53
triangle inequality, 34

uniformly continuous, 38
upper Riemann integral, 7
upper Riemann sum, 7

Weierstrass extreme value theorem, 39
