Dinakar Ramakrishnan March 26, 2007 - Mathematicsdinakar/Vector Calculus Notes.pdf · Chapter 1...

Notes on Vector Calculus

Dinakar Ramakrishnan

March 26, 2007

Chapter 1

Subsets of Euclidean space, vectorfields, and continuity

Introduction

The aims of this course are the following:

(i) Extend the main results of one-variable Calculus to higher dimensions

(ii) Explore new phenomena which are non-existent in the one-dimensional case

Regarding the first aim, a basic step will be to define notions of continuity and differ-entiability in higher dimensions. These are not as intuitive as in the one-dimensional case.For example, given a function f : R → R, we can say that f is continuous if one can drawits graph Γf without lifting the pen (resp. chalk) off the paper (resp. blackboard). For anyn ≥ 1, we can still define the graph of a function (here called a scalar field) f : Rn → R tobe

Γ(f) := {(x, y) ∈ Rn × R | y = f(x)},where x denotes the vector (x1, . . . , xn) in Rn. Since Rn × R is just Rn+1, we can think ofΓ(f) as a subset of the (n + 1)-dimensional space. But the graph will be n-dimensional,which is hard (for non-constant f)to form a picture of, except possibly for n = 2; even thenit cannot be drawn on a plane like a blackboard or a sheet of paper. So one needs to definebasic notions such as continuity by a more formal method. It will be beneficial to think ofa lot of examples in dimension 2, where one has some chance of forming a mental picture.

1

Integration is also subtle. One is able to integrate nice functions f on closed rectangularboxes R, which naturally generalize the closed interval [a, b] in R, and when there is sphericalsymmetry, also over closed balls in Rn. Here f being nice means that f is bounded on Rand continuous outside a negligible set. But it is problematic to define integrals of evencontinuous functions over arbitrary subsets Y of Rn, even when they are bounded, i.e., canbe enclosed in a rectangular box. However, when Y is compact, i.e., closed and bounded,one can integrate continuous functions f over it, at least when f vanishes on the boundaryof Y .

The second aim is more subtle than the first. Already in the plane, one is interested inline integrals, i.e., integrals over (nice) curves C, of vector fields, i.e., vectors of scalar fields,and we will be interested in knowing when the integrals depend only on the beginning and endpoints of the curve. This leads to the notion of conservative fields, which is very importantalso for various other subjects like Physics and Electrical Engineering. Somehow the pointhere is to not blindly compute such integrals, but to exploit the (beautiful) geometry of thesituation.

This chapter is concerned with defining the basic structures which will be brought tobear in the succeeding chapters. We start by reviewing the real numbers.

1.1 Construction and properties of real numbers

This section intends to give some brief background for some of the basic facts about realnumbers which we will use in this chapter and in some later ones.

Denote by Z the set of integers {0,±1,±2, . . .} and by Q the set of rational numbers{a

b|a, b ∈ Z, b 6= 0}. As usual we identify a

bwith ma

mbfor any integer m 6= 0. To be precise,

the rational numbers are equivalence classes of ordered pairs (a, b) of integers, b 6= 0, with(a, b) being considered equivalent to (c, d) iff ad = bc. The rational numbers can be added,subtracted and multiplied, just like the integers, but in addition we can also divide anyrational number x by another (non-zero) y. The properties make Q into a field (as one saysin Algebra), not to be confused (at all) with a vector field which we will introduce below. Onedefines the absolute value of any rational number x as sgn(x)x, where sgn(x) denotes the signof x. Then it is easy to see that |xy| = |x||y| and |x+y| ≤ |x|+|y| (triangle inequality). Thereare other absolute values one can define on Q, and they satisfy a stronger inequality thanthe triangle inequality. For this reason, some call the one above the archimedean absolutevalue.

The real numbers are not so easy to comprehend, not withstanding the fact that we have

2

been using them at will and with ease, and their construction is quite subtle. It was a cleverploy on the part of mathematicians to call them real numbers as it makes people feel thatthey are real and so easy to understand. Some irrational numbers do come up in geometry,like the ubiquitous π, which is the area of a unit circle, and the quadratic irrational

√2,

which is the length of the hypotenuse of a right triangle with two unit sides. However,many irrational real numbers have no meaning except as limits of nice sequences of rationalnumbers or as points on a continuum.

What constitutes a nice sequence? The basic criterion has to be that the sequence lookslike it has a chance of converging, though we should be able to see that without knowinganything about the limit. The precise condition below was introduced by the nineteenthcentury French mathematician Cauchy, whence the term Cauchy sequence.

Definition. A sequence {x1, x2, . . . , xn, . . .} of rational numbers is Cauchy iff we can find,for every ε > 0 in Q, a positive integer N such that |xn − xm| < ε, for all n,m ≥ N .

Simply put, a sequence is Cauchy iff the terms eventually bunch up around each other.This behavior is clearly necessary to have a limit.

Possibly the simplest (non-constant) Cauchy sequence of rational numbers is { 1n|n ≥ 1}.

This sequence has a limit in Q, namely 0. (Check it!) An example of a sequence which isnot Cauchy is given by {xn}, with xn =

∑nk=1

1k. It is not hard to see that this sequence

diverges; some would say that the limit is +∞, which lies outside Q.

There are many sequences of rational numbers which are Cauchy but do not have a limitin Q. Two such examples {xn|n ≥ 1} are given by the following: (i) xn =

∑nk=1

1k2 ,

and (ii) xn =∑n

k=11k!

. The first sequence converges to π2/6 and the second to e, both ofwhich are not rational (first proved by a mathematician named Lambert around 1766). Infact, these numbers are even transcendental, which means they are not roots of polynomialsf(X) = a0 + a1X + · · · + anXn with a0, a1, . . . , an ∈ Q. Numbers like

√2 and i =

√−1are algebraic numbers, though irrational, and are not transcendental. Of course i does not a“real number”.

Take your favorite decimal expansion such as the one of π = 3.1415926..... This can beviewed as a Cauchy sequence {3, 31/10, 314/100, 3141/1000, ...}; you may check that such asequence is Cauchy no matter what the digits are! Again, for ”most” expansions you cancome up with, the limit is not in Q. Recall that a decimal expansion represents (convergesto) a rational number if and only if it is periodic after some point.

Essentially, the construction of R consists of formally adjoining to Q the missing limits ofCauchy sequences of rational numbers. How can we do this in a logically satisfactory manner?Since every Cauchy sequence should give a real number and every real number should arise

3

from a Cauchy sequence, we start with the set X of all Cauchy sequences of rational numbers.On this set we can even define addition and multiplication by {x1, x2, ..} + {y1, y2, ...} :={x1 + y1, x2 + y2, ..} etc. and also division if the denominator sequence stays away from zero(there is something to be checked here, namely that for example {xn + yn} is again Cauchybut this is not so hard). It seems that X with these operations is already something likethe real numbers but there is a problem: Two sequences might have the same limit, thinkof the zero sequence {0, 0, 0..} and the reciprocal sequence {1/n|n ≥ 1}, and such sequencesshould give the same real number, which is 0 in this case. So the last step is to introduceon X an equivalence relation: Declare x = {xn}, y = {yn} in X to be equivalent if for anyε > 0 in Q, there exists N > 0 such that |xn − yn| < ε for all n > N (Check that this isindeed an equivalence relation). The equivalence classes of sequences in X are declared toconstitute the set R of all “real numbers”. The rational numbers naturally form a subset ofR by viewing q ∈ Q as the class of the constant sequence {q, q, . . . , q, . . . }; note that in thisclass we have many other sequences such as {q +1/n}, {q +2−n}, {q +1/n!}). Besides Q weobtain a lot of new numbers to play with. The real number represented by the sequence (ii)above, for example, is called e. When we say we have an x ∈ R we think of some Cauchysequence {x1, x2, ..} representing x, for example its decimal expansion. But note that ourdefinition of R is in no way tied to the base ten, only to the rational numbers. Now one hasto check that the addition, multiplication and division defined above pass to the equivalenceclasses, i.e. to R. This is a doable exercise. One can also introduce an order on X whichpasses to equivalences classes: {xn} < {yn} if there is some N so that xn < yn for all n > N .So the notion of positivity for rational numbers carries over to the reals. One has the triangleinequality

|x + y| ≤ |x|+ |y|, forall x, y ∈ R.

where |x| = x, resp. −x if x ≥ 0, resp. x < 0.

Here are the key facts about the real numbers on which much of Calculus is based.

Theorem 1 (a) (Completeness of R) Every Cauchy sequence of real numbers has a limitin R.

(b) (Density of the rationals) Every real number is the limit of a Cauchy sequence ofrational numbers.

More precisely, part (a) says that given {xn} ⊂ R so that ∀ ε > 0, ∃ N such that∀ n, m > N, |xn − xm| < ε, then there exists x ∈ R so that ∀ ε > 0, ∃ N such that∀ n > N, |x − xn| < ε. In other words, if we repeat the completion process by which weobtained R from Q (i.e. start with the set of Cauchy sequences of real numbers and pass toequivalence classes) we end up with the real numbers again.

4

Part (b) says that given any real number x and an ε > 0, we can find infinitely manyrational numbers in the interval (x − ε, x + ε). In particular, we have the following veryuseful fact:

Given any pair x, y of real numbers, say with x < y, we can find a rationalnumber z such that x < z < y, regardless of how small x− y is.

Proof of Theorem 1. Part (b) holds by construction, and it suffices to prove part (a).If {xn} ⊂ R is a Cauchy sequence we represent each xn as a Cauchy sequence of rationalnumbers xn,i, say. We can find f(n) ∈ N so that |xn,i − xn,j| < 2−n for i, j ≥ f(n) becausexn,i is Cauchy for fixed n. Then the sequence xn,f(n) ∈ Q is Cauchy. Indeed, given ε maken,m large enough so that 2−n < ε/3, 2−m < ε/3 and |xn − xm| < ε/3. Now unravel themeaning of this last statement. It means that for k large enough we have |xn,k−xm,k| < ε/3.Make k also bigger than f(n) and f(m). Then we have

|xn,f(n) − xm,f(m)| < |xn,f(n) − xn,k|+ |xn,k − xm,k|+ |xm,k − xm,f(m)| < 2−n + ε/3 + 2−m < ε

for large enough n,m. The real number x represented by the Cauchy sequence xn,f(n) isthe limit of the sequence xn. To see this, given ε take n large so that 2−n < ε/2 and|xk,f(k) − xn,f(n)| < ε/2 for k ≥ n. If we also have k > f(n) then

|xk,f(k) − xn,k| ≤ |xk,f(k) − xn,f(n)|+ |xn,f(n) − xn,k| < ε.

By definition this means that |x− xn| < ε.

Done.

The completeness of R has a number of different but equivalent formulations. Here isone of them. Call a subset A of real numbers bounded from above if there is some y ∈ Rsuch that x ≤ y for every x in A. Such a y is called an upper bound for A. We are interestedin knowing if there is a least upper bound. For example, when A is the interval (a, 5) anda < 5, A has a least upper bound, namely the number 5. On the other hand, if a ≥ 5, theset A is empty, and it has no least upper bound. The least upper bound is also called thesupremum, when it exists, and denoted lub or sup. It is easy to see that the least upperbound has to be unique if it exists.

Theorem 2 Let A be a non-empty subset of R which is bounded from above. Then A hasa least upper bound.

Proof. As we have already mentioned, this property is actually equivalent to the completenessof R. Let’s first deduce it from completeness. Since A is bounded from above, there exists,

5

by definition, some upper bound b1 ∈ R. Pick any a1 ∈ A, which exists because A is non-empty. Consider the midpoint z1 = (a1 + b1)/2 between a1 and b1. If z1 is an upper boundof A, put b2 = z1 and a2 = a1. Otherwise, there is an a2 ∈ A so that a2 > z1 and we putb2 = b1. In both cases we have achieved the inequality |b2 − a2| ≤ 1

2|b1 − a1|. Next consider

the mid-point z2 between a2 and b2 and define a3, b3 by the same procedure. Continuingthus “ad infinitum” (to coin a favorite phrase of Fermat), we arrive at two sequences of realnumbers, namely {an|n ≥ 1} and {bn|n ≥ 1}. If we put c = |x1 − y1|, it is easy to see thatby construction,

|an − bn| ≤ c

2n−1,

and that|an − an+1| ≤ |an − bn| ≤ c

2n−1

and that the same holds for |bn − bn+1|. Consequently, both of these sequences are Cauchyand have the same limit z ∈ R, say. Now we claim that z is the lub of A. Indeed, since z isthe limit of the upper bounds bn, it must also be an upper bound. On the other hand, sinceit is also the limit of the numbers an lying in A, any smaller number than z cannot be anupper bound of A. Done.

We only sketch of the proof of the converse. Given a Cauchy sequence {xn} in R considerthe set

A = {y ∈ R| the set {n|xn ≤ y} is finite, possibly empty}.Then A is nonempty because there is N so that |xn − xm| < 1 for n,m ≥ N . If y = xN − 2then {n|xn ≤ y} ⊆ {1, 2, ..., N} is finite, so y ∈ A. One can then show that a least upperbound of A, which will in fact be unique, is also a limit of the sequence {xn}.QED

Finally a word about the geometric representation of real numbers. Real numbersx can be represented in a one-to-one fashion by the points P = P (x) on a line, called thereal line such that the following hold: (i) if x < y, P (y) is situated strictly to the right ofP (x); and (ii) |x− y| is the distance between P (x) and P (y). In particular, for any pair ofreal numbers x, y, the mid-point between P (x) and P (y), which one can find by a ruler andcompass, corresponds to a unique real number z such that |x− y| = 2|x− z| = 2|y − z|. Itis customary to identify the numbers with the corresponding points, and simply write x todenote both. Note that the notions of line and distance here are classical; in modern, set-theory-based mathematics one simply defines a line as some set of points that is in one-to-onecorrespondence with the real numbers.

6

1.2 The norm in Rn

Consider the n-dimensional Euclidean space

Rn = {x = (x1, x2, · · · , xn) |xi ∈ R, ∀i},

equipped with the inner product (also called the scalar product)

〈x, y〉 =n∑

j=1

xjyj,

for all x = (x1, . . . , xn) and y = (y1, . . . , yn) in Rn, and the norm (or length) given by

||x|| = 〈x, x〉1/2 ≥ 0.

Note that 〈x, y〉 is linear in each variable, and that ||x|| = 0 iff x = 0.

Basic Properties:

(i) ||cx|| = |c| ||x||, for all c ∈ R and x ∈ Rn.

(ii) (triangle inequality) ||x + y|| ≤ ||x||+ ||y||, for all x, y ∈ Rn.

(iii) (Cauchy-Schwarz inequality) |〈x, y〉| ≤ ||x||||y||, for all x, y in Rn.

Proof. Part (i) follows from the definition. We claim that part (ii) follows from part (iii).Indeed, by the bilinearity and symmetry of 〈., .〉,

〈x + y, x + y〉 = 〈x, x〉+ 〈y, y〉+ 2〈x, y〉.

By the Cauchy-Schwartz inequality, |〈x, y〉| ≤ ||x||||y||, whence the claim.

It remains to prove part (iii). Since the assertion is trivial if x or y is zero, we may assumethat x 6= 0 and y 6= 0. If w = αx + βy, with α, β ∈ R, we have

(∗) 0 ≤ 〈w, w〉 = α2〈x, x〉+ 2αβ〈x, y〉+ β2〈y, y〉.

Since this holds for all α, β, we are free to choose them. Put

α = 〈y, y〉, β = −〈x, y〉.

7

Dividing (∗) by α, we obtain the inequality

0 ≤ 〈x, x〉〈y, y〉 − 〈x, y〉2.Done.

The scalar product 〈., .〉 and norm ||.|| can both be defined on Cn as well. For this we set

〈x, y〉 =n∑

j=1

xjyj,

where z denotes, for any z ∈ C, the complex conjugate of z. Here 〈x, y〉 is linear in the firstvariable, but conjugate linear in the second, i.e., 〈αx, y〉 = α〈x, y〉, while 〈x, αy〉 = α〈x, y〉,for any complex scalar α. In French such a product will be said to be sesquilinear (meaning“one and a half” linear). In any case, note that 〈x, x〉 is a non-negative real number, whichis positive iff x 6= 0. So again it makes good sense to say that the norm (or length) of anyx ∈ Cn to be ||x|| =

√〈x, x〉. It is a routine exercise to verify that the basic properties (i),

(ii), (iii) above continue to hold in this case.

We may define a sequence of vectors v1, v2, . . . , vm, . . . in Rn a Cauchy sequence iff forevery ε > 0, we can find an N > 0 such that for all m, r > N , ||vm − vr|| < ε. In otherwords, the vectors in the sequence eventually become bunched up together, as tightly as onerequires.

Theorem 3 Rn is complete with respect to the norm || ||.

Idea of Proof. The inequality

|xi| =√

x2i ≤

√x2

1 + · · ·+ x2n = ||x||

shows that the components of any || ||-Cauchy sequence in Rn are ordinary Cauchy sequencesin R. Hence we are reduced to part (a) of Theorem 1.

1.3 Basic Open Sets in Rn

There are (at least) two types of basic “open” sets in Rn, one round, using open balls, andthe other flat, using rectangular boxes.

8

For each a ∈ Rn and r > 0 set

Ba(r) = {x ∈ Rn | ||x− a|| < r},and call it the open ball of radius r and center a; and

Ba(r) = {x ∈ Rn | ||x− a|| ≤ r},the closed ball of radius r and center a. One also defines the sphere

Sa(r) = {x ∈ Rn| ||x− a|| = r}.

A closed rectangular box in Rn is of the form

[a, b] = [a1, b1]× [a2, b2]× · · · × [an, bn] = {x ∈ Rn | xi ∈ [ai, bi], ∀i},for some a = (a1, · · · , an) ∈ Rn and b = (b1, · · · , bn) ∈ Rn.

An open rectangular box is of the form

(a, b) = (a1, b1)× (a2, b2)× · · · × (an, bn) = {x ∈ Rn | ∀i xi ∈ (ai, bi)}.

Definition. A basic open set in R2n is (either) an open ball or an open rectangular box.

You can use either or both. One gets the same answers for our problems below. Inthis vein, observe that every open ball contains an open rectangular box, and conversely.

Remark. It is important to note that given any pair of basic open sets V1, V2, we can finda nonempty basic open set W contained in their intersection V1 ∩ V2 if this intersection isnon-empty.

1.4 Open and closed sets

Given any subset X of Rn, let us denote by Xc the complement Rn −X in Rn. Clearly, thecomplement of the empty set ∅ is all of Rn.

Let A be a subset of Rn and let y be a point in Rn. Then there are three possibilities fory relative to A.

(IP) There exists a basic open set U containing y such that U ⊆ A.

9

(EP) There exists a basic open set U centered at y which lies completely in the complementof A. i.e., in Rn − A.

(BP) Every basic open set centered at y meets both A and Ac.

In case (IP), y is called an interior point of A. In case (EP), y is called an exteriorpoint of A. In case (BP), y is called a boundary point of A. Note that in case (IP)y ∈ A, in case (EP) y 6∈ A, and in case (BP) y may or may not belong to A.)

Definition. A set A in Rn is open if and only if every point of A is an interior point.

Explicitly, this says: “Given any z ∈ A, we can find a basic open set U containing z suchthat U ⊆ A.”

Definition. A ⊆ Rn is closed if its complement is open.

Lemma 1 A subset A of Rn is closed iff it contains all of its boundary points.

Proof. Let y be a boundary point of A. Suppose y is not in A. Then it belongs to Ac,which is open. So, by the definition of an open set, we can find a basic open set U containingy with U ⊆ Ac. Such a U does not meet A, contradicting the condition (BP). So A mustcontain y.

Conversely, suppose A contains all of its boundary points, and consider any z in Ac. Thenz has to be an interior point or a boundary point of Ac. But the latter possibility does notarise as then z would also be a boundary point of A and hence belong to A (by hypothesis).So z is an interior point of Ac. Consequently, Ac is open, as was to be shown.

Examples. (1) Basic open sets are open: Indeed, let y belong to the open ball Ba(r) ={x | ||x − a|| < r}. Then, since || y − a || < r, the number r′ = 1

2(r − || y − a ||) is positive,

and the open ball By(r′) around y is completely contained in Bx0(r). The case of open

rectangular boxes is left as an easy exercise.

(2) The empty set φ and Rn are both open and closed.

Since they are complements of each other, it suffices to check that they are both open,which is clear from the definition.

(3) Let {Wα} be a (possibly infinite and perhaps uncountable) collection of open sets inRn. Then their union W = ∪αWα is also open.

10

Indeed, let y ∈ W . Then y ∈ Wα for some index α, and since Wα is open, there is anopen set V ⊆ Wα containing y. Then we are done as y ∈ V ⊆ Wα ⊆ W .

(4) Let {W1,W2, · · · , Wn} be a finite collection of open sets. Then their intersection

W =n⋂

i=1

Wi is open.

Proof. Let y ∈ W . Then y ∈ Wi, ∀i. Since each Wi is open, we can find a basic open set Vi

such that y ∈ Vi ⊆ Wi. Then, by the remark at the end of the previous section, we can finda basic open set U contained in the intersection of the Vi such that y ∈ U . Done.

Warning. The intersection of an infinite collection of open sets need not be open, alreadyfor n = 1, as shown by the following (counter)example. Put, for each k ≥ 1, Wk =

(− 1k, 1

k

).

Then ∩kWk = {0}, which is not open.

(5) Any finite set of points A = {P1, . . . , Pr} is closed.

Proof. For each j, let Uj denote the complement of Pj (in Rn). Given any z in Uj, we caneasily find a basic open set Vj containing z which avoids Pj. So Uj is open, for each j. Thecomplement of A is simply ∩r

j=1Uj, which is then open by (4).

More generally, one can show, by essentially the same argument, that a finite union ofclosed sets is again closed.

It is important to remember that there are many sets A in Rn which are neither opennor closed. For example, look at the half-closed, half-open interval [0, 1) in R.

1.5 Compact subsets of Rn.

It is easy to check that the closed balls Ba(r) and the closed rectangular boxes [a, b] areindeed closed. But they are more than that. They are also bounded in the obvious sense.This leads to the notion of “compactness”.

Definition. An open covering of a set A in Rn is a collection U = {Vα} of open sets inRn such that

A ⊆ ∪αVα.

In other words, each Vα in the collection is open and any point of A belongs to some Vα.Note that U may be infinite, possibly uncountable.

Clearly, any subset A of Rn admits an open covering. Indeed, we can just take U to bethe singleton {Rn}.

11

A subcovering of an opencovering U = {Vα} of a set A is a subcollection U ′ of U suchthat any point of A belongs to some set in U ′.

Definition. A set A in Rn is compact if and only if any open covering U = {Vα} of Acontains a finite subcovering.

Example of a set which is not compact: Let A = (0, 1) be the open interval in R. Look atU = {W1, W2, · · · } where Wm = ( 1

m, 1− 1

m), for each m. We claim that U is an open covering

of A. Indeed, each Wm is clearly open and moreover, given any number x ∈ A, then x ∈ Wm

for some m. But no finite subcollection can cover A, which can be seen as follows. SupposeA is covered by Wm1 , . . . , Wmr for some r, with m1 < m2 < . . . < mr. Then Wmj

⊂ Wmr foreach j, while the point 1/(mr + 1) belongs to A, but not to Wmr , leading to a contradiction.Hence A is not compact.

Example of a set which is compact: Let A be a finite set ⊆ Rn. Then A is compact.Prove it!

Theorem 4 (Heine–Borel) The closed interval [a, b] in R is compact for any pair of realnumbers a, b with a < b.

Proof. Let U = {Vα} be an open covering of [a, b]. Let us call a subset of [a, b] good ifit can be covered by a finite number of Vα. Put

J = {x ∈ [a, b]| [a, x] is good}.Clearly, a belongs to J , so J is non-empty, and by Theorem 1 of section 2.1, J has a leastupper bound; denote it by z. Since b is an upper bound of J , z ≤ b. We claim that z liesin J . Indeed, pick any open set Vβ in U which contains z. (This is possible because U isan open covering of [a, b] and z lies in this interval.) Then Vβ will contain points to the leftof z; call one of them y. Then, by the definition of lub, y must lie in J and consequently,[a, y] is covered by a finite subcollection {Vα1 , . . . , Vαr} of U . Then [a, z] is covered by U ′ ={Vβ, Vα1 , . . . , Vαr}, which is still finite; so z ∈ J . In fact, z has to be b. Otherwise, the openset Vβ will also contain points to the right of z, and if we call one of them t, say, [a, t] willbe covered by U ′, implying that t lies in J , contradicting the fact that z is an upper boundof J . Thus b lies in J , and the theorem is proved.

Call a subset A in Rn bounded if we can enclose it in a closed rectangular box.

Theorem 5 Let A be a subset of Rn which is closed and bounded. Then A is compact.

12

Corollary 1 Closed balls and spheres in Rn are compact.

Remark. It can be shown that the converse of this theorem is also true, i.e., any compactset in Rn is closed and bounded.

Proof of Theorem 5. The first step is to show that any closed rectangular boxR = [a, b] in Rn is compact: When n = 1, this is just the Heine-Borel theorem. So letn > 1 and assume by induction that the assertion holds in dimension < n. Now we can write

R = [a1, b1]×R′, with R′ = [a2, b2]× . . .× [an, bn].

Let U = {Wα} be an open covering of R. Then, for each y = t × y′ in R with t in [a1, b1]and y′ = (y2, . . . , yn) in R′, there is a open set Wα(y) in U containing this point. By theopenness of Wα(y), we can then find an interval (c1, d1) ⊂ R and an open rectangular box(c′, d′) ⊂ Rn−1 such that t × y ∈ (c1, d1) × (c′, d′) ⊂ Wα(y). Then the collection of the sets(c′, d′) for y′ ∈ R′ and t fixed covers R′. Since R′ is compact by the induction hypothesis,we can find a finite set, call it V ′, of the (c′, d′) whose union covers R′. Let I(t) denote theintersection of the corresponding finite collection of open intervals (c1, d1), which is open(cf. the previous section) and contains t. Then the collection {I(t)} is an open covering of[a1, b1]. By Heine-Borel, we can then extract a finite subcovering, say V , of [a1, b1]. It is noweasy to see that V × V ′ is a finite subcovering of [a, b]. For any (c1, d1) ∈ V and (c′, d′) ∈ V ′we have (c1, d1)×(c′, d′) ⊆ Wα for some α so R is contained in a union of finitely many Wα’s.

The next step is to consider any closed, bounded set A in Rn. Pick any open coveringU of A. Since A is bounded, we can enclose it completely in a closed rectangular box [a, b].Since A is closed, its complement Ac is open. Thus U ∪ {Ac} is an open covering of [a, b].Then, by the first step, a finite subcollection, say {Vα1 , . . . , Vαr , A

c} covers [a, b]. Then, sinceAc ∩ A = ∅, the (finite) collection {Vα1 , . . . , Vαr} covers A. Done.

1.6 Vector fields and continuity

We are interested in functions:f : D → Rm

with D ⊆ Rn, called the domain of f . The image (or range) of f is f(D) ⊂ Rm. Such anf is called a vector field. When m = 1, one says scalar field instead, for obvious reasons.

For each j ≤ m, we write fj(x) for the jth coordinate of f(x). Then f is completelydetermined by the collection of scalar fields {fj|j = 1, . . . ,m}, called the component fields.

13

Definition. Let a ∈ D, b ∈ Rm. Then b is the limit of f(x) as x tends to a, denoted

b = limx→a

f(x),

if the following holds: For any ε > 0 there is a δ > 0 so that for all x ∈ D with ||x−a|| < δwe have ||f(x)− b|| < ε.

This is just the like the definition of limits for functions on the real line but with theabsolute value replaced by the norm. In dimension 1, the existence of a limit is equivalent tohaving a right limit and a left limit, and then having the two limits being equal. In higherdimensions, one can approach a point a from infinitely many directions, and one way tothink of it will be to start with an open neighborhood of a and then shrinking it in manydifferent ways to the point a. So the existence of a limit is more stringent a condition here.

The definition of continuity is now literally the same as in the one variable case:

Definition. Let a be in the domain D. Then f(x) is continuous at x = a if and only iflimx→a

f(x) = f(a).

Remarks:

a) A vector field f is continuous at a iff each component field fj is continuous, for j =1, . . . , m.

b) In these definitions we need not assume that a is an interior point of D. For example,a could be a boundary point of a domain D which is a closed box. In an extreme casea could also be the only point of D in some open ball. In this case continuity becomesan empty condition; every function f is continuous at such a “discrete” point a.

Examples:

(1) Let f(x, y, z) = ((x2 + y2x)z2, xy + yz), D = R3. Then f : R3 → R2 is continuous atany a in Rn.

More generally, any polynomial function is n variables is continuous everywhere. Ratio-

nal functionsP (x1, · · · , xn)

Q(x1, · · · , xn)are continuous at all x = (x1, · · · , xn) where Q(x) 6= 0.

(2) f is a polynomial function of sines, cosines and exponentials.

It is reasonable to ask at this point what all this has to do with open sets and compactsets. We answer this in the following two lemmas. We call a subset of D ⊆ Rn open, resp.closed, if it is the intersection of an open, resp. closed set of Rn with D.

14

Lemma 2 Let f : D → Rm be a vector field. Then f is continuous at every point a ∈ D ifand only if the following holds: For every open set W of Rm, its inverse image f−1(W ) :={x ∈ D|f(x) ∈ W} ⊆ D is open.

Warning: f continuous does not mean that the image of an open set is open. Take, forinstance, the constant function f : R → R, given by f(x) = 0, for all x ∈ R. This is acontinuous map, but any open interval (a, b) in R gets squished to a point, and f((a, b)) isnot open.

Proof. Let W be an open set in Rm and a ∈ f−1(W ). Then choose ε so that Bf(a)(ε) ⊆ Wwhich is possible since W is open. By continuity there is a δ so that f(Ba(δ)∩D) ⊆ Bf(a)(ε) ⊆W which just means that Ba(δ) ∩ D ⊆ f−1(W ). Since a is arbitrary we find that f−1(W )is open. Conversely, if f satisfies this condition then f−1(Bf (a)(ε)) is open since Bf (a)(ε) isopen. Hence we can find a small ball Ba(δ) ∩ D ⊆ f−1(Bf (a)(ε)) around a ∈ f−1(Bf (a)(ε))which implies that f is continuous at a.

Remark: This Lemma shows that the notion of continuity does not depend on the par-ticular norm function used in its definition, only on the collection of open sets defined viathis norm function (recall the equivalent ways of using boxes or balls to define open sets).

Lemma 3 Let f : Rn → Rm be continuous. Then, given any compact set C of Rn, f(C) iscompact.

Proof. Let C be a compact subset of Rn. Pick any open covering U = {Wα} of f(C).Then by the previous lemma, if we put Vα = f−1(Wα), each Vα is an open subset of Rn.Then the collection {Vα} will be an open covering of C. By the compactness of C, we cathen find a finite subcovering {Vα1 , . . . , Vαr} of C. Since each Wα is simply f(Vα), f(C) willbe covered by the finite collection {Wα1 , . . . , Wαr}. Done.

As a consequence, any continuous image of a closed rectangular box or a closed ball or asphere will be compact.

15

Chapter 2

Differentiation in higher dimensions

2.1 The Total Derivative

Recall that if f : R → R is a 1-variable function, and a ∈ R, we say that f is differentiableat x = a if and only if the ratio f(a+h)−f(a)

htends to a finite limit, denoted f ′(a), as h tends

to 0.

There are two possible ways to generalize this for vector fields

f : D → Rm, D ⊆ Rn,

for points a in the interior D0 of D. (The interior of a set X is defined to be the subsetX0 obtained by removing all the boundary points. Since every point of X0 is an interiorpoint, it is open.) The reader seeing this material for the first time will be well advised tostick to vector fields f with domain all of Rn in the beginning. Even in the one dimensionalcase, if a function is defined on a closed interval [a, b], say, then one can properly speak ofdifferentiability only at points in the open interval (a, b).

The first thing one might do is to fix a vector v in Rn and saythat f is differentiablealong v iff the following limit makes sense:

limh→0

1

h(f(a + hv)− f(a)) .

When it does, we write f ′(a; v) for the limit. Note that this definition makes sense because ais an interior point. Indeed, under this hypothesis, D contains a basic open set U containinga, and so a+hv will, for small enough h, fall into U , allowing us to speak of f(a+hv). This

1

derivative behaves exactly like the one variable derivative and has analogous properties. Forexample, we have the following

Theorem 1 (Mean Value Theorem for scalar fields) Suppose f is a scalar field. Assumef ′(a + tv; v) exists for all 0 ≤ t ≤ 1. Then there is a to with 0 ≤ to ≤ 1 for whichf(a + v)− f(a) = f ′(a + t0v; v).

Proof. Put φ(t) = f(a + tv). By hypothesis, φ is differentiable at every t in [0, 1], andφ′(t) = f ′(a + tv; v). By the one variable mean value theorem, there exists a t0 such thatφ′(t0) is φ(1)− φ(0), which equals f(a + v)− f(a). Done.

When v is a unit vector, f ′(a; v) is called the directional derivative of f at a in thedirection of v.

The disadvantage of this construction is that it forces us to study the change of f in onedirection at a time. So we revisit the one-dimensional definition and note that the conditionfor differentiability there is equivalent to requiring that there exists a constant c (= f ′(a)),

such that limh→0

(f(a + h)− f(a)− ch

h

)= 0. If we put L(h) = f ′(a)h, then L : R → R is

clearly a linear map. We generalize this idea in higher dimensions as follows:

Definition. Let f : D → Rm (D ⊆ Rn) be a vector field and a an interior point of D. Thenf is differentiable at x = a if and only if there exists a linear map L : Rn → Rm such that

(∗) limu→0

||f(a + u)− f(a)− L(u)||||u|| = 0.

Note that the norm || · || denotes the length of vectors in Rm in the numerator and in Rn inthe denominator. This should not lead to any confusion, however.

Lemma 1 Such an L, if it exists, is unique.

Proof. Suppose we have L,M : Rn → Rm satisfying (*) at x = a. Then

limu→0

||L(u)−M(u)||||u|| = lim

u→0

||L(u) + f(a)− f(a + u) + (f(a + u)− f(a)−M(u))||||u||

≤ limu→0

||L(u) + f(a)− f(a + u)||||u||

+ limu→0

||f(a + u)− f(a)−M(u)||||u|| = 0.

2

Pick any non-zero v ∈ Rn, and set u = tv, with t ∈ R. Then, the linearity of L,M impliesthat L(tv) = tL(v) and M(tv) = tM(v). Consequently, we have

limt→0

||L(tv)−M(tv)||||tv|| = 0

= limt→0

|t| ||L(v)−M(v)|||t| ||v||

=1

||v|| ||L(v)−M(v)||.

Then L(v)−M(v) must be zero.

Definition. If the limit condition (∗) holds for a linear map L, we call L the total deriva-tive of f at a, and denote it by Taf .

It is mind boggling at first to think of the derivative as a linear map. A natural questionwhich arises immediately is to know what the value of Taf is at any vector v in Rn. We willshow in section 2.3 that this value is precisely f ′(a; v), thus linking the two generalizationsof the one-dimensional derivative.

Sometimes one can guess what the answer should be, and if (*) holds for this choice, thenit must be the derivative by uniqueness. Here are two examples which illustrate this.

(1) Let f be a constant vector field, i.e., there exists a vector w ∈ Rm such thatf(x) = w, for all x in the domain D. Then we claim that f is differentiable at any a ∈ D0

with derivative zero. Indeed, if we put L(u) = 0, for any u ∈ Rn, then (*) is satisfied,because f(a + u)− f(a) = w − w = 0.

(2) Let f be a linear map. Then we claim that f is differentiable everywhere withTaf = f . Indeed, if we put L(u) = f(u), then by the linearity of f , f(a + u)− f(a) = f(u),and so f(a+u)−f(a)−L(u) is zero for any u ∈ Rn. Hence (*) holds trivially for this choiceof L.

Before we leave this section, it will be useful to take note of the following:

Lemma 2 Let f1, . . . , fm be the component (scalar) fields of f . Then f is differentiable ata iff each fi is differentiable at a. Moreover, Tf(v) = (Tf1(v), T f2(v), . . . , T fn(g)).

An easy consequence of this lemma is that, when n = 1, f is differentiable at a iff thefollowing familiar looking limit exists in Rm:

limh→0

f(a + h)− f(a)

h,

3

allowing us to suggestively write f ′(a) instead of Taf . Clearly, f ′(a) is given by the vector(f ′1(a), . . . , f ′m(a)), so that (Taf)(h) = f ′(a)h, for any h ∈ R.

Proof. Let f be differentiable at a. For each v ∈ Rn, write Li(v) for the i-th componentof (Taf)(v). Then Li is clearly linear. Since fi(a + u)− fi(u)− Li(u) is the i-th componentof f(a + u)− f(a)− L(u), the norm of the former is less than or equal to that of the latter.This shows that (*) holds with f replaced by fi and L replaced by Li. So fi is differentiablefor any i. Conversely, suppose each fi differentiable. Put L(v) = ((Taf1)(v), . . . , (Tafm)(v)).Then L is a linear map, and by the triangle inequality,

||f(a + u)− f(a)− L(u)|| ≤m∑

i=1

|fi(a + u)− fi(a)− (Tafi)(u)|.

It follows easily that (*) exists and so f is differentiable at a.

2.2 Partial Derivatives

Let {e1, . . . , en} denote the standard basis of Rn. The directional derivatives along the unitvectors ej are of special importance.

Definition. Let j ≤ n. The jth partial derivative of f at x = a is f ′(a; ej), denoted by∂f

∂xj

(a) or Djf(a).

Just as in the case of the total derivative, it can be shown that∂f

∂xj

(a) exists iff∂fi

∂xj

(a)

exists for each coordinate field fi.

Example: Define f : R3 → R2 by

f(x, y, z) = (exsin(y), zcos(y)).

All the partial derivatives exist at any a = (x0, y0, z0). We will show this for∂f

∂yand leave

it to the reader to check the remaining cases. Note that

1

h(f(a + he2)− f(a)) = (

ex0sin(y0+h) − ex0sin(y0)

h, z0

cos(y0 + h)− cos(y0)

h).

We have to understand the limit as h goes to 0. Then the methods of one variable calculusshow that the right hand side tends to the finite limit (x0cos(y0)e

x0sin(y0),−z0sin(y0)), which

4

is∂f

∂y(a). In effect, the partial derivative with respect to y is calculated like a one variable

derivative, keeping x and z fixed. Let us note without proof that∂f

∂x(a) is (sin(y0)e

x0sin(y0), 0)

and∂f

∂z(a) is (0, cos(y0)).

It is easy to see from the definition that f ′(a; tv) equals tf ′(a; v), for any t ∈ R. Thisfollows as 1

h(f(a + h(tv))− f(a)) = t 1

th(f(a + (ht)v)− f(a)). In particular the Mean Value

Theorem for scalar fields gives fi(a + hv) − f(a) = hf ′i(a + t0hv) = hfi(a + τv) for some0 ≤ τ ≤ h.

We also have the following

Lemma 3 Suppose the derivatives of f along any v ∈ Rn exist near a and are continuousat a. Then

f ′(a; v + v′) = f ′(a; v) + f ′(a; v′),

for all v, v′ in Rn. In particular, the directional derivatives of f are all determined by the npartial derivatives.

We will do this for the scalar fields fi. Notice

fi(a + hv + hv′)− fi(a) = fi(a + hv + hv′)− fi(a + hv) + fi(a + hv)− f(a)

= hfi(a + hv + τv′) + hfi(a + τ ′v)

where here 0 ≤ τ ≤ h and 0 ≤ τ ′ ≤ h. Now dividing by h and taking the limit and h → 0gives f ′i(a; v + v′) for the first expression. The last expression gives a sum of two limits

limh→0f′i(a + hv + τv′) + limh→0f

′i(a + τ ′v; v′).

But this is f ′i(a; v) + f ′i(a; v′).Recall both τ and τ ′ are between 0 and h and so as h goes to0 so do τ and τ ′. Here we have used the continuity of the derivatives of f along any line ina neighborhood of a.

Now pick e1, e2, . . . , en the usual orthogonal basis and recall v =∑

αiei. Then f ′(a; v) =f ′(a;

∑αiei) =

∑αif

′(a; ei). Also the f ′(a; ei) are the partial derivatives. The Lemma nowfollows easily.

In the next section (Theorem 1a) we will show that the conclusion of this lemma remainsvalid without the continuity hypothesis if we assume instead that f has a total derivative ata.

5

The gradient of a scalar field g at an interior point a of its domain in Rn is defined tobe the following vector in Rn:

∇g(a) = grad g(a) =

(∂g

∂x1

(a), . . . ,∂g

∂xn

(a)

),

assuming that the partial derivatives exist at a.

Given a vector field f as above, we can then put together the gradients of its componentfields fi, 1 ≤ i ≤ m, and form the following important matrix, called the Jacobian matrixat a:

Df(a) =

(∂fi

∂xj

(a)

)

1≤i≤m,1≤j≤n

∈ Mmn(R).

The i-th row is given by ∇fi(a), while the j-th column is given by∂f

∂xj

(a). Here we are using

the notation Mmn(R) for the collection of all m × n-matrices with real coefficients. Whenm = n, we will simply write Mn(R).

2.3 The main theorem

In this section we collect the main properties of the total and partial derivatives.

Theorem 2 Let f : D → Rm be a vector field, and a an interior point of its domain D ⊆ Rn.

(a) If f is differentiable at a, then for any vector v in Rn,

(Taf)(v) = f ′(a, v).

In particular, since Taf is linear, we have

f ′(a; αv + βv′) = αf ′(a; v) + βf ′(a; v′),

for all v, v′ in Rn and α, β in R.

(b) Again assume that f is differentiable. Then the matrix of the linear map Taf relativeto the standard bases of Rn, Rm is simply the Jacobian matrix of f at a.

(c) f differentiable at a ⇒ f continuous at a.

(d) Suppose all the partial derivatives of f exist near a and are continuous at a. Then Tafexists.

6

(e) (chain rule) Consider

Rn f−→ Rm g−→ Rh.a 7→ b = f(a)

Suppose f is differentiable at a and g is diffentiable at b = f(a). Then the compositefunction h = g ◦ f is differentiable at a, and moreover,

Tah = Tbg ◦ Taf.

In terms of the Jacobian matrices, this reads as

Dh(a) = Dg(b)Df(a) ∈ Mkn.

(f) (m = 1) Let f , g be scalar fields, differentiable at a. Then

(i) Ta(f + g) = Taf + Tag (additivity)

(ii) Ta(fg) = f(a)Tag + g(a)Taf (product rule)

(iii) Ta(f

g) =

g(a)Taf − f(a)Tag

g(a)2if g(a) 6= 0 (quotient rule)

The following corollary is an immediate consequence of the theorem, which we will makeuse of, in the next chapter on normal vectors and extrema.

Corollary 1 Let g be a scalar field, differentiable at an interior point b of its domain D inRn, and let v be any vector in Rn. Then we have

∇g(b) · v = g′(b; v).

Furthermore, let φ be a function from a subset of R into D ⊆ Rn, differentiable at an interiorpoint a mapping to b. Put h = g ◦ φ. Then h is differentiable at a with

h′(a) = ∇g(b) · φ′(a).

Proof of main theorem. (a) It suffices to show that (Tafi)(v) = fi(a; v) for each i ≤ n.By definition,

limu→0

||fi(a + u)− fi(a)− (Tafi)(u)||||u|| = 0

7

This means that we can write for u = hv, h ∈ R,

limh→0

fi(a + hv)− fi(a)− h(Tafi)(v)

|h|||v|| = 0.

In other words, the limit limh→0

fi(a+hv)−fi(a)h

exists and equals (Tafi)(v). Done.

(b) By part (a), each partial derivative exists at a (since f is assumed to be differentiableat a). The matrix of the linear map Taf is determined by the effect on the standard basisvectors. Let {e′i|1 ≤ i ≤ m} denote the standard basis in Rm. Then we have, by definition,

(Taf)(ej) =m∑

i=1

(Tafi)(ej)e′i =

m∑i=1

∂fi

∂xj

(a)e′i.

The matrix obtained is easily seen to be Df(a).

(c) First we need the following simple

Lemma 4 Let T : Rn → Rm be a linear map. Then, ∃c > 0 such that ||Tv|| ≤ c||v|| for anyv ∈ Rn.

Proof of Lemma. Let A be the matrix of T relative to the standard bases. Put C =

maxj{||T (ej)||}. If v =n∑

j=1

αjej, then

||T (v)|| = ||∑

j

αjT (ej)|| ≤ C

n∑j=1

|αj| · 1

≤ C(n∑

j=1

|αj|2)1/2(n∑

j=1

1)1/2 ≤ C√

n||v||,

by the Cauchy–Schwarz inequality. We are done by setting c = C√

n.

This shows that a linear map is continuous as if ||v − w|| < δ then ||T (v) − T (w)|| =||T (v − w)|| < c||v − w|| < cδ.

(c) Suppose f is differentiable at a. This certainly implies that the limit of the functionf(a + u) − f(a) − (Taf)(u), as u tends to 0 ∈ Rn, is 0 ∈ Rm (from the very definition ofTaf , ||f(a + u) − f(a) − (Taf)(u)|| tends to zero ”faster” than ||u||, in particular it tendsto zero). Since Taf is linear, Taf is continuous (everywhere), so that limu→0(Taf)(u) = 0.Hence limu→0 f(a + u) = f(a) which means that f is continuous at a.

8

(d) By hypothesis, all the partial derivatives exist near a = (a1, . . . , an) and are continuousthere. It suffices to show that each fi is differentiable at a by lemma 2. So we have only toshow that (*) holds with f replaced by fi and L(u) = f ′i(a; u). Write u = (h1, . . . , hn). ByLemma 3, we know that f ′i(a;−) is linear. So

L(u) =n∑

j=1

hj∂fi

∂xj

(a),

and we can write

fi(a + u)− fi(a) =n∑

j=1

(φj(aj + hj)− φj(aj)),

where each φj is a one variable function defined by

φj(t) = fi(a1 + h1, . . . , aj−1 + hj−1, t, aj+1, . . . , an).

By the mean value theorem,

φj(aj + hj)− φj(aj) = hjφ′j(tj) = hj

∂fi

∂xj

(y(j)),

for some tj ∈ [aj, aj + hj], with

y(j) = (a1 + h1, . . . , aj−1 + hj−1, tj, aj+1, . . . , an).

Putting these together, we see that it suffices to show that the following limit is zero:

limu→0

1

||u|| |n∑

j=1

hj

(∂fi

∂xj

(a)− ∂fi

∂xj

(y(j))

)|.

Clearly, |hj| ≤ ||u||, for each j. So it follows, by the triangle inequality, that this limit is

bounded above by the sum over j of limhj→0

| ∂fi

∂xj

(a)− ∂fi

∂xj

(y(j))|, which is zero by the continuity

of the partial derivatives at a. Here we are using the fact that each y(j) approaches a as hj

goes to 0. Done.

Proof of (e) Write L = Taf , M = Tbg, N = M ◦ L. To show: Tah = N .

Define F (x) = f(x)− f(a)− L(x− a), G(y) = g(y)− g(b)−M(y − b) and H(x) = h(x)−h(a)−N(x− a). Then we have

limx→a

||F (x)||||x− a|| = 0 = lim

y→b

||G(y)||||y − b|| .

9

So we need to show:

limx→a

||H(x)||||x− a|| = 0.

ButH(x) = g(f(x))− g(b)−M(L(x− a))

Since L(x− a) = f(x)− f(a)− F (x), we get

H(x) = [g(f(x))− g(b)−M(f(x)− f(a))] + M(F (x)) = G(f(x)) + M(F (x)).

Therefore it suffices to prove:

(i) limx→a

||G(f(x))||||x− a|| = 0 and

(ii) limx→a

||M(F (x))||||x− a|| = 0.

By Lemma 4, we have ||M(F (x))|| ≤ c||F (x)||, for some c > 0. Then||M(F (x))||||x− a|| ≤

c limx→a

||F (x)||||x− a|| = 0, yielding (ii).

On the other hand, we know limy→b

||G(y)||||y − b|| = 0. So we can find, for every ε > 0, a

δ > 0 such that ||G(f(x))|| < ε||f(x) − b|| if ||f(x) − b|| < δ. But since f is continuous,||f(x)− b|| < δ whenever ||x− a|| < δ1, for a small enough δ1 > 0. Hence

||G(f(x))|| < ε||f(x)− b|| = ε||F (x) + L(x− a)||≤ ε||F (x)||+ ε||L(x− a)||,

by the triangle inequality. Since limx→a

||F (x)||||x−a|| is zero, we get

limx→a

||G(f(x))||||x− a|| ≤ ε lim

x→a

||L(x− a)||||x− a|| .

Applying Lemma 4 again, we get ||L(x− a)|| ≤ c′||x− a||, for some c′ > 0. Now (i) followseasily.

(f) (i) We can think of f +g as the composite h = s(f, g) where (f, g)(x) = (f(x), g(x))and s(u, v) = u + v (“sum”). Set b = (f(a), g(a)). Applying (e), we get

Ta(f + g) = Tb(s) ◦ Ta(f, g) = Ta(f) + Ta(g).

10

Done. The proofs of (ii) and (iii) are similar and will be left to the reader.

QED.

Remark. It is important to take note of the fact that a vector field f may be differentiableat a without the partial derivatives being continuous. We have a counterexample alreadywhen n = m = 1 as seen by taking

f(x) = x2sin

(1

x

)if x 6= 0,

and f(0) = 0. This is differentiable everywhere. The only question is at x = 0, where the

relevant limit limh→0

f(h)h

is clearly zero, so that f ′(0) = 0. But for x 6= 0, we have by the

product rule,

f ′(x) = 2xsin

(1

x

)− cos

(1

x

),

which does not tend to f ′(0) = 0 as x goes to 0. So f ′ is not continuous at 0.

2.4 Mixed partial derivatives

Let f be a scalar field, and a an interior point in its domain D ⊆ Rn. For j, k ≤ n, we mayconsider the second partial derivative

∂2f

∂xj∂xk

(a) =∂

∂xj

(∂f

∂xk

)(a),

when it exists. It is called the mixed partial derivative when j 6= k, in which case it is ofinterest to know whether we have the equality

(3.4.1)∂2f

∂xj∂xk

(a) =∂2f

∂xk∂xj

(a).

Proposition 1 Suppose∂2f

∂xj∂xk

and∂2f

∂xk∂xj

both exist near a and are continuous there.

Then the equality (3.4.1) holds.

The proof is similar to the proof of part (d) of Theorem 1.

11

Chapter 3

Tangent spaces, normals and extrema

If S is a surface in 3-space, with a point a ∈ S where S looks smooth, i.e., without anyfold or cusp or self-crossing, we can intuitively define the tangent plane to S at a as follows.Consider a plane Π which lies outside S and bring it closer and closer to S till it touches Snear a at only one point, namely a, without crossing into S. This intuitive picture is evenclearer in the case of finding the tangent line to a smooth curve in the plane at a point. Aslightly different way is to start with a plane Π in R3 which slices S in a small neighborhoodU = Ba(r) of a, and then consider the limit when Π ∩ U shrinks to the point a, and callthe limiting plane the tangent plane at a. In the case of a plane curve arising as the graphof a nice function f , one knows that this process works and gives the slope of the tangentline to be f ′(a). In higher dimensions, if S lies in Rn, one can define tangent vectors byconsidering smooth curves on S through the point a, and when S is a level set of the formf(x1, . . . , xn) = c, one can use the gradient of f at a to define the tangent space. But thismethod fails when the gradient vanishes at a. Finally, it should be noted that one oftendefines the tangent space at a, when it makes sense, in such a way that it becomes a vectorspace. But if one looks at the case of plane curves, the usual tangent line is not a vectorspace as it may not contain the origin 0 (= (0, 0)), unless of course if a = 0. It becomes avector space if we parallel translate it to the origin by subtracting a. It is similar in higherdimensions. One calls a plane in 3-space an affine plane if it does not pass through theorigin.

1

3.1 Tangents to parametrized curves

By a parametrized curve, or simply a curve, in Rn, we mean the image C of a continuousfunction

α : [r, s] → Rn,

where [r, s] is a closed interval in R.

C is called a plane curve, resp. a space curve, if n = 2, resp. n = 3. An exampleof the former, resp.the latter, is the cycloid, resp. the right circular helix, parametrizedby α : [0, 2π] → R2 with α(t) = (t − sin t, 1 − cos t), resp. β : [−3π, 6π] → R3 withβ(t) = (cos t, sin t, t).

Note that a parametrized curve C has an orientation, i.e., a direction; it starts atP = α(r), moves along as t increases from r, and ends at Q = α(s). We call the directionfrom P to Q positive and the one from Q to P negative. It is customary to say that C is aclosed curve if P = Q.

We say that C is differentiable iff α is a differentiable function.

Definition. Let C be a differentiable curve in Rn parametrized by an α as above. Leta = α(t0), with t0 ∈ (r, s). Then α′(t0) is called the tangent vector to C at a (in thepositive direction).

For example, consider the unit circle C in the plane given by α : [0, 2π] → R2, t →(cos t, sin t). Clearly α is differentiable, and we have α′(t) = (− sin t, cos t) for all t ∈ (0, 2π).Consider the point a = (1/2,

√3/2) on the circle; clearly, a = α(π/3). By the above

definition, the tangent vector to C at a is given by α′(π/3) = (−√3/2, 1/2). Let us nowcheck that this agrees with our notion of a tangent from one variable Calculus. To thisend we want to think of C as the graph of a one variable function, but this is not possiblefor all of C. However, we can express the upper and the lower semi-circles as graphs ofsuch functions. And since the point of interest a lies on the upper semi-circle C+, we lookat the function y = f(x) with f(x) =

√1− x2, whose graph is C+. Note that (by the

chain rule) f ′(x) = −x(1− x2)−1/2. The slope of the tangent to C+ at a is then given byf ′(1/2) = −1/

√3. Hence the tangent line to C at a is the line L passing through (1/2,

√3/2)

of slope −1/√

3. This is the unique line passing through a in the direction of the tangentvector (−√3/2, 1/2). (In other words L is the unique line passing through a and parallelto the line joining the origin to (−√3/2, 1/2).) The situation is similar for any plane curvewhich is the graph of a one-variable function f , i.e. of the form α(t) = (t, f(t)). So thedefinition of a tangent vector above is very reasonable.

2

3.2 Tangent Spaces and Normals to Level Sets.

In the last paragraph we have defined the tangent line to a parametrized curve. This is aset defined by a map into Rn. Here we look at different kinds of sets which are defined bymaps from Rn.

Definition. Let f : D → R, D ⊂ Rn, be a scalar field. For each c ∈ R, the level set of fat c is given by

Lc = {x ∈ D | f(x) = c}

Example. f(x) = height from sea level, where x is a point on Mount Wilson (sitting inR3). Lc(f) consists of all points of constant height above sea level, just like those lines on atopographic map.

Definition. Let f : D → R, D ⊂ Rn, be a differentiable scalar field and a ∈ Lc(f). If∇f(a) 6= 0 then we define the tangent space Θa(Lc(f)) to Lc(f) at a to be the vectorspace

Θa(Lc(f)) = {x ∈ Rn | ∇f(a) · x = 0}.

This is a solution set of one nonzero equation in Rn, hence a vector space of dimensionn − 1. One calls a point a ∈ Lc(f) where ∇f(a) 6= 0 a smooth point of Lc(f). The ideabeing that a small neighborhood of a in Lc(f) looks just like a small neighborhod of anypoint in Rn−1 though we won’t make this statement precise.

Definition. Let f , Lc(f) and Θa(Lc(f)) be as in the previous definition. A normal vector(to Lc(f) at a) is a vector v ∈ Rn orthogonal to all vectors in Θa(Lc(f)).

It is clear that the normal vectors are just the scalar multiples of ∇f(a) (verify this!).Tangent vectors point “in the direction of” the level set whereas normal vectors point “asfar away as possible” from the level set.

Our two notions of tangent space are related by the following observation.

Proposition. Let f : D → R be a differentiable scalar field, with D ⊂ Rn, α : R → Lc(f)a curve which is differentiable at t0 ∈ R and so that a = α(t0) is a smooth point of Lc(f).Then α′(t0) ∈ Θa(Lc(f)).

3

Proof. Since α takes values in the level set we have f(α(t)) = c for all t ∈ R. Applyingthe chain rule to h(t) := f(α(t)) (a constant function!) we find

0 = h′(t) = ∇f(a) · α′(t0)which means that α′(t0) ∈ Θa(Lc(f)). QED

In particular, if n = 2 we know that Θa(Lc(f)) has dimension 1. So it must coincidewith the tangent line defined for parametrized curves in the previous section. One can infact show that given a smooth point a ∈ Lc(f) there always is a curve α : R → Lc(f) withα(0) = a and α′(0) any given tangent vector in Θa(Lc(f)). But we won’t prove this here.

Given any subset S of Rn, and a point a ∈ S, one may ask whether there is always a goodnotion of tangent space to S at a. If one starts with the idea that α′(t0) for any parametrizedcurve : α : R → S should be a tangent vector, the tangent vectors might not form a linearspace. Consider the example S = {(x, y) ∈ R2|xy = 0} at a = (0, 0). The only differentiablecurves in S through a (with non-zero gradient at 0) lie on one of the coordinate axes, andthe tangent vectors are those of the form (0, a) or (a, 0). If a set has a “corner” or “cusp”at a ∈ S there might be no differentiable curve with a = α(t0) and α′(t0) 6= 0. ConsiderS = {(x, y) ∈ R2|xy = 0, x ≥ 0, y ≥ 0} and a = (0, 0).

However, the two classes of sets we have studied so far readily generalize and so does thenotion of tangent space for them.

Definition We say that S ⊂ Rn can be differentiably parametrized around a ∈ S ifthere is a bijective, i.e., one-to-one and onto, differentiable function α : Rk → S ⊂ Rn withα(0) = a and so that the linear map T0α has largest possible rank, namely k. The tangentspace to S at a is the simply the image of T0α, a linear subspace of Rn. In particular wemust have k ≤ n.

Definition. Let f : Rn → Rm be a vector field with components (f1, ..., fm). Then forc = (c1, .., cm) ∈ Rm the level set

Lc(f) := {x ∈ Rn|f(x) = c} = Lc1(f1) ∩ ... ∩ Lcm(fm)

is defined as before, and the tangent space

Θa(Lc(f)) := {x ∈ Rn|(Taf)(x) = 0}is defined if Taf has largest possible rank, namely m. This latter condition is equivalent torequiring ∇f1(a), ...,∇fm(a) to be linearly independent. We also have

Θa(Lc(f)) = Θa(Lc1(f1)) ∩ ... ∩Θa(Lcm(fm))

4

and dimRΘa(Lc(f)) = n−m.

Once we have the notion of tangent space, a normal vector is defined as before for bothclasses of examples. Note that n−m is also the (intuitive) dimension of Lc(f).

Examples:

(1) Find a normal to the surface S given by z = x2y2 + y + 1 at (0, 0, 1): Set f(x, y, z) =x2y2 + y − z + 1. Then z = x2y2 + y + 1 iff (x, y, z) lies on the level set L0(f). Since f is apolynomial function, it is differentiable. We have:

∇f(a) = (2xy2, 2x2y + 1,−1)|(0,0,1) = (0, 1,−1),

which is a normal to L0(f) = S. Indeed, as the proof of the Proposition above shows, to benormal to a level set Lc(f) at a point a is the same as being perpendicular to the gradientof f at that point, namely ∇f(a). So a unit normal is given by n = (0, 1/

√2,−1/

√2).

(2) (Physics example) Let P be a unit mass particle at (x, y, z) in R3. Denote by ~Fthe gravitational force on P exerted by a point mass M at the origin. Then Newton’sgravitational law tells us that

~F = −GM

r2

(~r

r

),

where ~r = x~i + y~j + z~k, r = ||~r||, and G the gravitational constant. (Here ~i,~j,~k denotethe standard basis vectors (1, 0, 0), (0, 1, 0), (0, 0, 1).) Put V = −GM/r, which is calledthe gravitational potential. By chain rule, ∂V

∂xequals GM

r2∂r∂x

. Similarly for ∂V∂y

and ∂V∂z

.Therefore

−∇V =GM

r2(∂r

∂x,∂r

∂y,∂r

∂z) =

GM

r2(x

r,y

r,z

r) = ~F .

For c < 0, the level sets Lc(V ) are spheres centered at the origin of radius −GM/c, and ~Fis evidently perpendicular to each of them. Note that F gets stronger as r gets smaller asthe physical intuition would suggest.

(3) Let f = x3 + y3 + z3 and g = x + y + z. The level set S := L10(f) ∩ L4(g) is a curve.The set S ∩{(x, y, z) ∈ R3|x ≥ 0, y ≥ 0, z ≥ 0} is closed and bounded hence compact (verifythis!).

3.3 Relative maxima, relative minima and saddle points.

Let f : D → R be a scalar field, and let a be a point in the interior of D.

5

Definition. The point a = (a1, . . . , an) is a relative maximum (resp. relative mini-mum) iff we can find an open ball Ba(r) completely contained in D such that for every x inBa(r), f(x) ≤ f(a) (resp. f(x) ≥ f(a)).

By a relative extremum we mean a point a which is either a relative maximum or arelative minimum. Some authors call it a local extremum.

Examples:

(1) Fix real numbers b1, b2, . . . , bn, not all zero, and consider f : Rn → R definedby f(x1, x2, . . . , xn) = b1x1 + b2x2 + . . . + bnxn. In vector language, this is described byf(x) = b · x. Then no point in Rn is a relative extremum for this function. More precisely,we claim that, given any a in Rn and any other vector v ∈ Rn, f(x) − f(a) is positive andnegative for all x near a along the direction defined by v. Indeed, if we write x = a + tv, thelinearity of f shows that f(x)− f(a) = tf(v), which changes sign when t does.

(2) Fix a non-zero real number b and consider the scalar field f(x, y) = b(x2 + y2) onR2. It is an easy exercise to see that f has a local extremum at (0, 0), which is a relativemaximum iff b is negative.

(3) Let f : R → R be the function defined by f(x) = x − [x], where [x] denotes thelargest integer which is less than or equal to x, called the integral part of x. (For example,[π] = 3, [

√2] = 1 and [−π] = −4.) Then it is easy to see that f(x) is always non-negative

and that f(n) = 0 for all integers n. Hence the integers n are all relative minima for f . Thegraph of f is continuous except at the integers and is periodic of period 1. Moreover, theimage of f is the half-closed interval [0, 1). We claim that there is no relative maximum.Indeed the function is increasing (with slope 1) on [n, n + 1) for any integer n, any relativemaximum on [n, n + 1) has to be an absolute maximum and this does not exist; f(x) canget arbitrarily close to 1 but can never attain it.

In the first two examples above, f was continuous, even differentiable. In such cases therelative extrema exert an influence on the derivative, which can be used to locate them.

Lemma. If a = (a1, . . . , an) is a relative extremum of a differentiable scalar field f , then∇f(a) is zero.

Proof. For every j = 1, . . . , n, define a one-variable function gj by

gj(t) = f(a1, . . . , aj−1, aj + t, aj+1, . . . , an),

so that gj(0) = f(a). Since f has a relative extremum at a, gj has a local extremum at0. Since f is differentiable, gj is differentiable as well, and we see that by Ma 1a material,g′j(0) = 0 for every j. But g′j(0) is none other than ∂f

∂xj(a). Since∇f(a) = ( ∂f

∂x1(a), . . . ∂f

∂xn(a)),

it must be zero.

6

QED.

Definition. A stationary point of a differentiable scalar field f is a point a where∇f(a) = 0. It is called a saddle point if it is not a relative extremum.

Some call a stationary point a critical point. In the language of the previous section,stationary points are the non-smooth points of the level set. But here we focus on thefunction rather than its level sets, hence the new terminology.

Note that if a is a saddle point, then by definition, every open ball Ba(r) contains pointsx with f(x) ≥ f(a) and points x′ such that f(x′) ≤ f(a).

Examples (contd.):

(3) Let f(x, y) = x2 − y2, for all (x, y) ∈ R2. Being a polynomial, f is differentiableeverywhere, and ∇f = (2x,−2y). So the origin is the unique stationary point. We claimthat it is a saddle point. Indeed, consider any open ball B0(r) centered at 0. If we moveaway from 0 along the y direction (inside B0(r)), the function is z = −y2 and so the originis a local maximum along this direction. But if we move away from 0 in the x direction,we see that the origin is a local minimum along that direction. So f(x, y) − f(0, 0) is bothpositive and negative in B0(r), and this is true for any r > 0. Hence the claim. Note that ifwe graph z = x2− y2 in R3, the picture looks quite a bit like a real saddle around the origin.This is the genesis of the mathematical terminology of a saddle point.

(4) The above example can be jazzed up to higher dimensions as follows: Fix positiveintegers n, k with k < n, and define f : Rn → R by

f(x1, x2, . . . , xn) = x21 + . . . + x2

k − x2k+1 − . . .− x2

n.

Since ∇f = 2(x1, . . . , xk,−xk+1, . . . , xn), the origin is the only stationary point. Again, itis a saddle point by a similar argument.

3.4 The Hessian test for relative max/min

Given a stationary point a of a nice two-variable function (scalar field) f , say with continuoussecond partial derivatives at a, there is a test one can apply to determine whether it is arelative maximum or a relative minimum or neither. This generalizes in a non-obvious waythe second derivative test in the one-variable situation. One crucial difference is that inthat (one-variable) case, one does not require the continuity of the second derivative at a.This added hypothesis is needed in our (2-variable) case because the second mixed partialderivatives, namely ∂2f/∂x∂y and ∂2f/∂y∂x need not be equal at a.

7

First we need to define the Hessian. Suppose we are given a function f : D → R withD ⊂ Rn, and an interior point a of D where the second partial derivatives ∂2f/∂xi∂xj existfor all i, j ≤ n. We put

Hf(a) =

(∂2f

∂xi∂xj

(a)

),

which is clearly an n × n-matrix. It is a symmetric matrix iff all the mixed partialderivatives are independent of the order in which the partial differentiation is performed.The reason symmetry is of interest is that we know by a basic result in Linear Algebra(Ma 1b material) that any (real) symmetric matrix can be diagonalized with all eigenvaluesbeing real. The Hessian determinant of f at a is by definition the determinant of Hf(a),denoted det(Hf(a)).

We will make use of this notion in the plane.

Theorem Let D ⊂ R2, f : D → R a scalar field, and a ∈ D0. Assume that all the secondpartial derivatives of f exist and are continuous at a. Suppose a is a stationary point. Then

(i) If ∂2f/∂x2(a) < 0 and det(Hf(a)) > 0, then a is a relative maximum for f ;

(ii) If ∂2f/∂x2(a) > 0 and det(Hf(a)) > 0, then a is a relative maximum for f ;

(iii) If det(Hf(a)) < 0, a is a saddle point for f .

The test does not say anything if Hf(a) is singular, i.e., if it has zero determinant.

Since the second partial derivatives of f exist and are continuous at a, we know from theprevious chapter that

∂2f

∂x∂y(a) =

∂2f

∂y∂x(a).

This means the matrix Hf(a) is symmetric (and real). So by a theorem in Linear Algebra(Ma1b), we know that we can diagonalize it. In other words, we can find an invertible matrixM such that

M−1Hf(a)M =

(λ1 00 λ2

),

where λ1, λ2 are the eigenvalues. (We can even choose M to be an orthogonal matrix, i.e.,with M−1 being the transpose matrix M t.) Then we have

(D) det(Hf(a)) = λ1λ2,

and

(T ) tr(Hf(a)) = ∂2f/∂x2(a) + ∂2f/∂y2(a) = λ1 + λ2,

8

where tr denotes the trace. (Recall that the determinant and the trace do not change whena matrix is conjugated by another matrix, such as M .)

Now suppose Hf(a) has positive determinant, so that

∂2f/∂x2(a)∂2f/∂y2(a) >(∂2f/∂x∂y(a)

)2 ≥ 0.

In this case, by (D), λ1 and λ2 have the same sign. Using (T), we then see that ∂2f/∂x2(a)and ∂2f/∂y2(a) must have the same sign as λj, j = 1, 2. When this sign is positive (resp.negative), a is a relative minimum (resp. relative maximum) for f .

When the Hessian determinant is negative at a, λ1 and λ2 have opposite sign, whichmeans that f increases in some direction and decreases in another. In other words, a is asaddle point for f .

3.5 Constrained extrema and Lagrange multipliers.

Very often in practice one needs to find the extrema of a scalar field subject to some con-straints imposed by the geometry of the situation. More precisely, one starts with a scalarfield f : D → R, with D ⊂ Rn, and a subset S of the interior of D, and asks what themaxima and minima of f are when restricted to S.

For example, we could consider the saddle function f(x, y) = x2−y2 with D = R2, whichhas no relative extrema, and ask for the constrained extrema relative to

S = unit circle : x2 + y2 = 1

This can be solved explicitly. A good way to do it is to look at the level curves Lc(f). Forc = 0, we get the union of the lines y = x and y = −x. For c 6= 0, we get hyperbolas withfoci (±√c, 0) if c > 0 and (0,±√−c) if c < 0. It is clear by inspection that the constrainedextrema of f on S are at the four points (1, 0), (−1, 0), (0, 1), (0,−1); the former two areconstrained maxima (with value 1) and the latter two are constrained minima (with value−1). If you like a more analytical proof, see below.

The question of finding constrained extrema is very difficult to solve in general. Butluckily, if f is differentiable and if S is given as the intersection of a finite number of levelsets of differentiable functions (with independent gradients), there is a beautiful methoddue to Lagrange to find the extrema, which is efficient at least in low dimensions. Here isLagrange’s result.

9

Theorem 1 Let f be a differentiable scalar field on Rn. Suppose we are given scalarsc1, c2, . . . , cr and differentiable functions g1, g2, . . . , gr on Rn, so that the constraint set S isthe intersection of the level sets Lc1(g1), Lc2(g2), · · · , Lcr(gr). Let a be a point on S where∇g1(a),∇g2(a), · · · ,∇gr(a) are linearly independent. Then we have the following:

If a is a constrained extremum of f on S, then there are scalars λ1, λ2, · · · , λr (called the“Lagrange Multipliers”) such that

∇f(a) = λ1∇g1(a) + · · ·+ λr∇gr(a).

Proof. We use the fact, mentioned for r = 1 in the previous section, that one can alwaysfind a differentiable curve α : R → S with α(0) = a and α′(0) any given tangent vectorin Θa(S) (note that S is a generalized level set and the linear independence of the ∇gi(a)assures that Θa(S) is defined). This is not so easy to prove and we may do this later in thecourse if there is time left.

So if a is a constrained extremum of f on S we take such a curve α and notice that a isalso an extremum of f on the image of α. So the one-variable function h(t) := f(α(t)) hasvanishing derivative at t = 0 which by the chain rule implies

∇f(a) · α′(0) = 0.

Since α′(0) was any given tangent vector we find that ∇f(a) is orthogonal to the wholetangent space (i.e. it is a normal vector). The normal space N , say, has dimension r since itis a solution space of a system of linear equations of rank n− r (picking a basis θ1, ..., θn−r ofΘa(Lc(f)) the system of equations is θi · x = 0). On the other hand ∇gi(a) all lie in N andthey are linearly independent by assumption, so they form a basis of N . Since we’ve foundthat ∇f(a) lies in N , ∇f(a) must be a linear combination of these basis vectors. Done.

QED.

Now let us use (a superspecial case of) this Theorem to check the answer we gave abovefor the problem of finding constrained extrema of f(x, y) = x2− y2 relative to the unit circleS centered at (0, 0). Here r = 1.

Note first that S is the level set L0(g), where g(x, y) = x2 + y2 − 1. We need to solve∇f = λ∇g. Since ∇f = (2x,−2y) and ∇g = (2x, 2y), we are left to solve the simultaneousequations x = λx and −y = λy. For this to happen, x and y cannot both be non-zero,as otherwise we will get λ = 1 from the first equation and λ = −1 from the second! Theycannot both be zero either, as (x, y) is constrained to lie on S and so must satisfy x2+y2 = 1.So, either x = 0, y = ±1, when λ = −1, or x = ±1, y = 0, in which case λ = 1. So thefour candidates for constrained extrema are (0,±1), (±1, 0). One can now check that theyare indeed constrained extrema. We get the same answer as before.

10

We come back to the example of the set S defined as the intersection of the two level setsx3 + y3 + z3 = 10 and x + y + z = 4. Suppose we want to find points on S with maximal orminimal distance from the origin. We need to look at f(x, y, z) = x2 + y2 + z2 and solve thesystem of five equations in five variables

x3 + y3 + z3 =10

x + y + z =4

∇f(x, y, z) = (2x, 2y, 2z) =λ1(3x2, 3y2, 3z2) + λ2(1, 1, 1).

These are nonlinear equations for which there is no simple method like the Gauss Jordanalgorithm for linear equations. It’s a good idea in every such problem to look at symmetries.Our set S is invariant under any permutation of the coordinates x, y, z, so it has a sixfoldsymmetry. Each of x, y, z satisfies the quadratic equation 2t = 3λ1t

2 + λ2 or

t2 − 2

3λ1

t +λ2

3λ1

= 0. (∗)

Let’s check that λ1 = 0 cannot happen. Because if λ1 = 0 we’d have x = y = z = λ2/2 andx + y + z = 3λ2/2 = 4 hence λ2 = 8/3. But x3 + y3 + z3 = 3(4/3)3 6= 10. Contradiction!We’ve also shown here that x = y = z cannot happen. In particular our quadratic equation(*) has two distinct roots, and as you well know it has exactly two roots! So two of x, y, zmust coincide: x = y 6= z, or x = z 6= y or x 6= y = z. Let’s look at the second case. Then2x + y = 4 and y = 4 − 2x which leads to 2x3 + (4 − 2x)3 = 10. This is a cubic equationwhich, luckily, has an obvious solution, namely x = 1 (if you were not so lucky there’d stillbe some solution in R). Now x = 1 leads to (1, 2, 1) = λ1(3, 12, 3) + λ2(1, 1, 1) which weknow how to solve (if there is a solution). We find λ1 = 1/9 and λ2 = 2/3 is a solution. Thevalue of f at (1, 2, 1) is 12 + 22 + 12 = 6.

Now the cubic 2x3 + (4− 2x)3 − 10 = 0 written in standard form is x3 − 8x2 + 16x− 9

and it factors as (x− 1)(x2− 7x + 9). The two roots of the quadratic part are 7±√132

and wecheck that for each of them we find corresponding λ1, λ2 simply by identifying x2 − 7x + 9with (∗). Suppose now we look only at the part of S where all of x, y, z are positive. Then

there is only the possibility x0 := 7−√132

, y0 := 4− 2x = −3+√

13, z0 := 7−√132

. And we have

f(x0, y0, z0) = 53 − 13√

13 ∼ 6.12. This is bigger than 6, so (x0, y0, z0) is a maximum and(1, 2, 1) is a minimum (our function really doesn’t vary all that much). Note that we knowa priori that a maximum and a minimum must exist because the points in S with positivecoordinates form a closed and bounded, hence compact set.

Taking into account the possibilities where the other two coordinates coincide, we’vefound three maxima and three minima of f .

Remark. Suppose f : D → R is a differentiable scalar field, with D ⊂ R, and S a subset ofD. Suppose a is a constrained extremum on S. Then a must be one of the following:

11

(1) a is a relative extremum on S;

(2) a is a boundary point on S.

Gradients and parallelism:

We know how the gradients of a differentiable scalar field f give (non-zero) normal vectorsto the level sets Lc(f) at points a where ∇f(a) 6= 0. This is used in the following instances:

(1) To find points of tangency between the level sets Lc(f) and Lc′(g). In this case, if a isa point of tangency, ∇f(a) and ∇g(a) must be proportional.

(2) To find the extrema with constraints using Lagrange multipliers.

(3) To determine if two lines in R2 meet. (Solution: Two lines L1 and L2 in R2 meet ifand only if their gradients are not proportional.) Similarly, one may want to know iftwo planes in R3 meet, etc.

One can use the following approach to solve problem (3), which generalizes well to higherdimensions.

Let

L1=level set of f(x, y) = a1x + a2y at −a0

L2=level set of g(x, y) = b1x + b2y at −b0

Check if ∇f is proportional to ∇g. (∇f = (a1, a2) and ∇g = (b1, b2).)

L1 and L2 meet if and only if (a1, a2) 6= λ(b1, b2). In other words, det

(a1 a2

b1 b2

)should

not be zero.

There are 3 possibilities for L1 ∩ L2:

(1) L1 ∩ L2 = ∅. They are parallel but not equal, ∇f = λ∇g but L1 6= L2

(2) L1 ∩ L2 = L1 = L2: L1 and L2 are the same.

(3) L1 ∩ L2 is a point: This happens IFF ∇f is not proportional to ∇g IFF

det

(a1 a2

b1 b2

)6= 0.

12

CHAPTER 4 – MULTIPLE INTEGRALS

The objects of this chapter are five-fold. They are:(1) Discuss when scalar-valued functions f can be integrated over closed rectangular

boxes R in Rn; simply put, f is integrable over R iff there is a unique real number,to be denoted IR(f) or

∫R

f when it exists, which is caught between the upperand lower sums relative to any partition P of R.

(2) Show that any continuous function f can be integrated over R.(3) Discuss Fubini’s theorem, which when applicable, allows one to do multiple inte-

grals as iterated integrals, i.e, integrate one variable at a time.(4) Show that bounded functions with negligible sets of discontinuities can be inte-

grated over R.(5) Discuss integrals of continuous functions over general compact sets.

§4.1 Basic notions

We will first discuss the question of integrability of bounded functions on closed rect-angular boxes, and then move on to integration over slightly more general regions.

Recall that in one variable calculus, the integral of a function over an interval [a, b]was defined as the limit, when it exists, of certain sums over finite partitions P of [a, b]as P becomes finer and finer. To try to transport this idea to higher dimensions, we needto generalize the notions of partition and refinement.

In this chapter, R will always denote a closed rectangular box in Rn, written as[a, b] = [a1, b1]× · · · × [an, bn], where aj , bj ,∈ R for all j with aj < bj .

Definition. A partition of R is a finite collection P of subrectangular (closed) boxesS1, S2, . . . , Sr ⊆ R such that

(i) R = ∪rj=1Sj , and

(ii) the interiors of Si and Sj have no intersection for all i 6= j.

Definition. A refinement of a partition P = {Sj}rj=1 of R is another partition P ′ =

{S′k}mk=1 with each S′k contained in some Sj .

It is clear from the definition that given any two partitions P, P ′ of R, we can find athird partition P ′′ which is simultaneously a refinement of P and of P ′.

Now let f be a bounded function on R, and let P = {Sj}rj=1 a partition of R. Then

f is certainly bounded on each Sj , i.e., f(Sj) is a bounded subset of R. It was proved inChapter 2 that every bounded subset of R admits a sup (lowest upper bound) and aninf (greatest lower bound).

Typeset by AMS-TEX

1

2 CHAPTER 4 – MULTIPLE INTEGRALS

Definition. The upper (resp. lower) sum of f over R relative to the partition P ={Sj}r

j=1 is given by

U(f, P ) =r∑

j=1

vol(Sj) sup(f(Sj))

(resp. L(f, P ) =

r∑

j=1

vol(Sj) inf(f(Sj)).)

Here vol(Sj) denotes the volume of Sj . Of course, we have

L(f, P ) ≤ U(f, P )

for all P .More importantly, it is clear from the definition that if P ′ = {S′k}m

k=1 is a refinementof P , then

L(f, P ) ≤ L(f, P ′) and U(f, P ′) ≤ U(f, P ).

PutL(f) = {L(f, P ) | P partition of R} ⊆ R

andU(f) = {U(f, P ) | P partition of R} ⊆ R.

Lemma. L(f) admits a sup, denoted I(f), and U(f) admits an inf, denoted I(f).

Proof. Thanks to the discussion in Chapter 2, all we have to do is show that L(f)(resp. U(f)) is bounded from above (resp. below). So we will be done if we showthat given any two partitions P, P ′ of R, we have L(f, P ) ≤ U(f, P ′) as then L(f)will have U(f, P ′) as an upper bound and U(f) will have L(f, P ) as a lower bound.Choose a third partition P ′′ which refines P and P ′ simultaneously. Then we haveL(f, P ) ≤ L(f, P ′′) ≤ U(f, P ′′) ≤ U(f, P ′). Done. ¤

Definition. A bounded function f on R is integrable iff I(f) = I(f). When such anequality holds, we will simply write I(f) (or IR(f) if the dependence on R needs to bestressed) for I(f) (= I(f)), and call it the integral of f over R. Sometimes we willwrite

I(f) =∫

R

f or∫· · ·

∫

R

f(x1, . . . , xn) dx1 . . . dxn.

Clearly, when n = 1, we get the integral we are familiar with, often written as∫ b1

a1f(x1) dx1.

This definition is hard to understand, and a useful criterion is given by the following

CHAPTER 4 – MULTIPLE INTEGRALS 3

Lemma. f is integrable over R iff for every ε > 0 we can find a partition P of R suchthat

U(f, P )− L(f, P ) < ε.

Proof. If f is integrable, IR(f) is arbitrarily close to the sets of upper and lower sums,and we can certainly find, given any ε > 0, some P such that IR(f)−L(f, P ) < ε/2 andU(f, P )− IR(f) < ε/2. Done in this direction. In the converse direction, since

L(f, P ) ≤ I(f) ≤ I(f) ≤ U(f, P )

for any P , if U(f, P )− L(f, P ) < ε, we must have

I(f)− I(f) < ε.

Since ε is an arbitrary positive number, I(f) must equal I(f), i.e., f is integrable overR.

§4.2 Step functions

The obvious question now is to ask if there are integrable functions. One such exampleis given by the constant function f(x) = c, for all x ∈ R. Then for any partitionP = {Sj}, we have

L(f, P ) = U(f, P ) = c

r∑

j=1

vol(Sj) = c vol(R).

So I(f) = I(f) and∫

Rf = c vol(R).

This can be jazzed up as follows.

Definition. A step function on R is a function f on R which is constant on each ofthe subrectangular boxes Sj of some partition P .

Lemma. Every step function f on R is integrable.

Proof. By definition, there exists a partition P = {Sj}rj=1 of R and scalars {cj} such

that f(x) = cj , if x ∈ Sj . Then, arguing as above, it is clear that for any refinement P ′

of P , we have

L(f, P ′) = U(f, P ′) =r∑

j=1

cj vol(Sj).

Hence, I(f) = I(f). ¤

§4.3 Integrability of continuous functions

The most important bounded functions on R are continuous functions. (Recall fromChapter 1 that every continuous function on a compact set is bounded, and that R iscompact.) The first result of this chapter is given by the following


Theorem. Every continuous function f on a closed rectangular box R is integrable.

Proof. Let S be any closed rectangular box contained in R. Define the span of f on Sto be

spanf (S) = sup(f(S))− inf(f(S)).

A basic result about the span of continuous functions is given by the following:

The Small Span Theorem. For every ε > 0, there exists a partition P = {Sj}rj=1 of

R such that spanf (Sj) < ε, for each j ≤ r.

Let us first see how this implies the integrability of f over R. Recall that, by a Lemmaof section 4.1, it suffices to show that, given any ε > 0, there is a partition P of R suchthat U(f, P ) − L(f, P ) < ε. Now by the small span theorem, we can find a partitionP = {Sj} such that spanf (Sj) < ε′, for all j, where ε′ = ε/ vol(R). Then clearly,

U(f, P )− L(f, P ) < ε′ vol(R) = ε.

Done. ¤It now remains to supply a proof of the small span theorem. We will prove this

by contradiction. Suppose the theorem is false. Then there exists ε0 > 0 such that, forevery partition P = {Sj} of R, spanf (Sj) ≥ ε0 for some j. For simplicity of exposition,we will only treat the case of a rectangle R = [a1, b1] × [a2, b2] in R2. The general caseis very similar, and can be easily carried out along the same lines with a bit of book-keeping. Divide R into four rectangles by subdividing along the bisectors of [a1, b1] and[a2, b2]. Then for one of these four rectangles, call it R1, we must have that for everypartition {Sj} of R1 there is a j so that spanf (Sj) ≥ ε0. Do this again and again, and wefinally end up with an infinite sequence of nested closed rectangles R = R0, R1, R2, . . . ,such that, for every m ≥ 0, the span of f is at least ε0 for any partition of Pm = {Sj,m}of Rm on some Sj,m. Let zm = (xm, ym) denote the southwestern corner of Rm, foreach m ≥ 0. Then the sequence {zm}m≥0 is bounded, and so we may find the leastupper bound (sup) α (resp. β) of xm (resp. ym). Put γ = [α, β]. Then γ ∈ R as thenortheastern corner of R is clearly an upper bound of the zm. Since f is continuous atγ, we can find a non-empty closed rectangular subbox S of R containing γ such thatspanf (S) < ε0. But by construction Rm will have to lie inside S if m is large enough,say for m ≥ m0. This gives a contradiction to the span of f being ≥ ε0 on some openset of every partition of Rm0 . Thus the small span theorem holds for (f, R).

§4.4 Bounded functions with negligible discontinuities

One is very often interested in being able to integrate bounded functions over R whichare continuous except on a very “small” subset. To be precise, we say that a subset Y ofRn has content zero if, for every ε > 0, we can find closed rectangular boxes Q1, . . . , Qm

such that(i) Y ⊆ ∪m

i=1Qi, and(ii)

∑mi=1 vol(Qi) < ε.


Examples. (1) A finite set of points in Rn has content zero. (Proof is obvious!)

(2) Any subset Y of R which contains a non-empty open interval (a, b) does not havecontent zero.

Proof. It suffices to prove that (a, b) has non-zero content for a < b in R. Suppose(a, b) is covered by a finite union of closed intervals Ii, 1 ≤ i ≤ m in R. Then clearly,S :=

∑mi=1 length(Ii) ≥ length(a, b) = b− a. So we can never make S less than b− a.

(3) The line segment L = {x, 0) | a1 < x < b1} in R2 has content zero. (Comparingwith (2), we see that the notion of content is very much dependent on what the ambientspace is, and not just on the set.)

Proof. For any ε > 0, cover L by the single closed rectangle

R ={

(x, y)∣∣∣∣ a1 ≤ x ≤ b1, − ε

4(b1 − a1)≤ y ≤ ε

4(b1 − a1)

}.

Then vol(R) = (b1 − a1) ε2(b1−a1)

= ε2 < ε, and we are done.

The third example leads one to ask if any bounded curve in the plane has contentzero. The best result we can prove here is the following

Proposition. Let ϕ : [a, b] → R be a continuous function. Then the graph Γ of ϕ hascontent zero.

Proof. Note that Γ = {(x, y) ∈ R2 | a ≤ x ≤ b, y = ϕ(x)}. Let ε > 0. By the smallspan theorem, we can find a partition a = t0 < t1 < · · · < tr = b of [a, b] such thatspanϕ([ti−1, ti]) < ε

(b−a) , for every i = 1, . . . , r. Thus the piece of Γ lying between

x = ti−1 and x = ti can be enclosed in a closed rectangle Si of area less than ε(ti−ti−1)(b−a) .

Now consider the collection {Si}1≤i≤r which covers Γ. Then we have

r∑

j=1

area(Sj) <ε

(b− a)

r∑

i=1

(ti − ti−1) = ε. ¤

Theorem. Let f be a bounded function on R which is continuous except on a subset Yof content zero. Then f is integrable on R.

Proof. Let M > 0 be such that |f(x)| ≤ M , for all x ∈ R. Since Y has content zero, wecan find closed subrectangular boxes S1, . . . , Sm of R such that

(i) Y ⊆ ∪mi=1Si, and

(ii)∑m

i=1 vol(Si) < ε4M .


Extend {S1, . . . , Sm} to a partition P = {S1, . . . , Sr}, m < r, of R. Applying the smallspan theorem, we may suppose that Sm+1 . . . , Sr are so chosen that (for each i ≥ m+1)spanf (Si) < ε

2 vol(R) . (We can apply this theorem because f is continuous outside theunion of S1, . . . , Sm.) So we have

U(f, P )− L(f, P ) ≤ 2M

m∑

i=1

vol(Si) +r∑

i=m+1

spanf (Si) vol(Si)

< (2M)(

ε

2M

)+

ε

2 vol(R)

r∑

i=m+1

vol(Si).

But the right hand side is ≤ ε2 + ε

2 = ε, because∑r

i=m+1 vol(Si) ≤ vol(R). ¤

Example. Let R = [0, 1]× [0, 1] be the unit square in R2, and f : R → R the functiondefined by f(x, y) = x + y if x ≤ y and x− y if x ≥ y. Show that f is integrable on R.

Let D = {(x, x) | 0 ≤ x ≤ 1} be the “diagonal” in R. Then D has content zero as it isthe graph of the continuous function ϕ(x) = x, 0 ≤ x ≤ 1. Moreover, f is discontinuousonly on D. So f is continuous on R −D with D of content zero, and consequently f isintegrable on R.

Remark. We can use this theorem to define the integral of a continuous function f onany closed bounded set B in Rn if the boundary of B has content zero. Indeed,in such a case, we may enclose B in a closed rectangular box R and define a function fon R by making it equal f on B and 0 on R−B. Then f will be continuous on all of Rexcept for the boundary of B, which has content zero. So f is integrable on R. Since fis 0 outside B, it is reasonable to set

∫

B

f =∫

R

f .

It is often useful to consider a finer notion than content, called measure. Before givingthis definition recall that a set X is countable iff there is a bijection (or as some wouldsay, one-to-one correspondence, between X and a subset of the set N of natural numbers.Check that Z and Q are countable, while R is not.

A subset Y of Rn is said to have measure zero if, for every ε > 0, we can find acountable collection of closed rectangular boxes Q1, Q2, . . . , Qm, . . . such that

(i) Y ⊆ ∪i≥1Qi, and(ii)

∑i≥1 vol(Qi) < ε.

One can use open rectangular boxes instead of closed ones, and the resulting definitionwill be equivalent.

Examples. (1) A countable set of points in Rn has measure zero.

(2) Any subset Y of R which contains a non-empty open interval (a, b) does not havemeasure zero.


(3) A countable union of lines in [0, 1]× [0, 1] ⊂ R2 has measure zero.

We will state the following result without proof:

Theorem. Let R be a closed rectangular box in Rn, and f a bounded function on Rwhich is continuous except on a subset Y of measure zero. Then f is integrable on R.

§4.5 Fubini’s theorem

So far we have been meticulous in figuring out when a given bounded function f isintegrable on R. But if f is integrable, we have developed no method whatsoever toactually find a way to integrate it except in the really easy case of a step function. Wepropose to ameloriate the situation now by describing a very reasonable and computa-tionally helpful result. We will state it in the plane, but there is a natural analog inhigher dimensions as well. In any case, many of the intricacies of multiple integrationare present already for n = 2, and it is a wise idea to understand this case completely atfirst.

Theorem (Fubini). Let f be a bounded, integrable function on R = [a1, b1]× [a2, b2] ⊆R2. For x in [a1, b1], put A(x) =

∫ b2a2

f(x, y) dy and assume the following

(i) A(x) exists for each x ∈ [a1, b1], i.e., the function y 7→ f(x, y) is integrable on[a2, b2] for any fixed x in [a1, b1];

(ii) A(x) is integrable on [a1, b1].

Then ∫∫

R

f(x, y) dxdy =∫ b1

a1

(∫ b2

a2

f(x, y) dy

)dx.

In other words, once the hypotheses (i) and (ii) are satisfied, we can compute∫

Rf by

performing two 1-dimensional integrals in order. One cannot always reverse the order ofintegration, however, and if one wants to integrate over x first, one needs to assume theobvious analog of the conditions (i), (ii).

Proof. Let P1 = {Bi | 1 ≤ i ≤ `} (resp. P2 = {Cj | 1 ≤ j ≤ m}) be a partition of [a1, b1](resp. [a2, b2]), with Bi, Cj closed intervals in R. Then P = P1 × P2 = {Bi × Cj} is apartition of R. By hypothesis (i), we have

L(fx, P2) ≤ A(x) ≤ U(fx, P2),

where fx is the one-dimensional function y 7→ f(x, y). Then applying hypothesis (ii), weget

L(L(fx, P2), P1) ≤∫ b1

a1

A(x) dx ≤ U(U(fx, P2), P1).


But we have

L(L(fx, P2), P1) =∑

i=1

length(Bi) inf(L(fx, P2)(Bi))

=∑

i=1

length(Bi)m∑

j=1

length(Cj) inf(f(Bi × Cj)) = L(f, P ).

Similarly for the upper sum. Hence L(f, P ) ≤ ∫ b1a1

A(x) dx ≤ U(f, P ). Given any par-tition Q of R, we can find a partition P of the form P1 × P2 which refines Q. ThusL(f,Q) ≤ ∫ b1

a1A(x) dx ≤ U(f,Q), for every partition Q of R. Then by the uniqueness

of∫

Rf , which exists because f is integrable,

∫ b1a1

A(x) dx is forced to be∫

Rf . ¤

Remark. The reason we denote∫ b2

a2f(x, y) dy by A(x) is the following. The double

integral∫∫

Rf(x, y) dxdy is the volume subtended by the graph Γ = {(x, y, f(x, y)) ∈

R3} over the rectangle R. (Note that Γ is a “surface” since f is a function of twovariables.) When we fix x at the same point x0 in [a1, b1], the intersection of the plane{x = x0} with Γ in R3 is a curve, which is none other than the graph Γx0 of fx0 inthe (y, z)-plane shifted to x = x0. The area under Γx0 over the interval [a2, b2] is just∫ b2

a2fx0(y) dy; whence the name A(x0). Note also that as x0 goes from a1 to b1, the whole

volume is swept by the slice of area A(x0).

A natural question to ask at this point is whether the hypotheses (i), (ii) of Fubini’stheorem are satisfied by many functions. The answer is yes, and the prime examples arecontinuous functions.

Theorem. Let f be a continuous function on R = [a1, b1] × [a2, b2] ⊆ R2. Then∫

Rf

can be computed as an iterated integral in either order. To be precise, we have

∫∫

R


a1

[∫ b2

a2

f(x, y) dy

]dx =

∫ b2

a2

[∫ b1

a1

f(x, y) dx

]dy.

Proof. Since f is continuous on (the compact set) R, it is certainly bounded. Let M > 0be such that |f(x, y)| ≤ M . We have also seen that it is integrable. For each x, thefunction y 7→ f(x, y) is integrable on [a2, b2] because of continuity on [a2, b2]. So weget hypothesis (i) of Fubini. To get hypothesis (ii), it suffices to show that A(x) =∫ b2

a2f(x, y) dy is continuous in x. For h small, we have

|A(x + h)−A(x)| =∣∣∣∣∫ b2

a2

(f(x + h, y)− f(x, y)) dy

∣∣∣∣ ≤∫ b2

a2

|f(x + h, y)− f(x, y)| dy.

By the small span theorem we can find a partition {Sj} of R with spanf (Sj) < ε/(b2 −a2). If h is small enough so that (x + h, y) and (x, y) lie in the same box for all y


(which we can achieve since x is fixed and there are only finitely many boxes) we have|f(x+h, y)−f(x, y)| < spanf (Sj) < ε/(b2−a2) where Sj is a box containing both points.Note that this argument also works if (x, y) lies on the vertical boundary between twoboxes: for positive h we land in one box and for negative h in the other. Hence

∫ b2

a2

|f(x + h, y)− f(x, y)| dy < ε

for h sufficiently small. This shows that A(x) is continuous and hence integrable on[a1, b1]. We have now verified both hypotheses of Fubini, and hence

∫∫

R


a1

[∫ b2

a2

f(x, y) dy

]dx.

To prove that∫

Rf is also computable using the iteration in reverse order, all we have to

do is note that by a symmetrical argument, the integral∫ b1

a1f(x, y) dx makes sense and is

continuous in y, hence integrable on [a2, b2]. The Fubini argument then goes through. ¤Remark. We will note the following extension of the theorem above without proof.

Let f be a continuous function on a closed rectangular box R = [a1, b1]×· · ·× [an, bn].Then the integral of f over R is computable as an iterated integral

∫ b1

a1

[· · ·

[∫ bn−1

an−1

[∫ bn

an

f(x1, . . . , xn) dxn

]dxn−1

]. . .

]dx1.

Moreover, we can compute this in any order we want, e.g., integrate over x2 first, thenover x5, then over x1, etc. Note that there are n! possible ways here of permuting theorder of integration.

§4.6 Integration over special regions

Let Z be a compact set in Rn. Since it is bounded, we may enclose it in a closedrectangular box R. If f is a bounded function on Z, we may define an extension f to Rby setting f(x) to be f(x) (resp. 0) for x in Z (resp. in R− Z).

Let us say that f is integrable over Z if f is integrable over R, and put∫

Z

f =∫

R

f .

It is clear that this definition is independent of the choice of R. In §4.4, where weintroduced the notion of content, we remarked that if the boundary of Z had contentzero and if f is continuous, then f would be integrable on R. The same idea easily provesthe following


Theorem. Let Z be a compact subset of Rn such that the boundary of Z has contentzero. Then any function f on Z which is continuous on Z is integrable over Z.

In fact, one can replace content by measure in this Theorem.Now we will analyze the simplest cases of this phenomenon in R2.

Definition. A region of type I in R2 is a set of the form {a ≤ x ≤ b, ϕ1(x) ≤ y ≤ϕ2(x)}, where ϕ1, ϕ2 are continuous functions on [a, b].

A region of type II in R2 is a set of the form {c ≤ y ≤ d, ψ1(y) ≤ x ≤ ψ2(y)},where ψ1, ψ2 are continuous functions on [c, d].

A region of type III in R2 is a subset which is simultaneously of type I and type II.

Remark. Note that a circular region is of type III.

Theorem. Let f be a continuous function on a subset S of R2.(a) Suppose S is a region of type I defined by a ≤ x ≤ b, ϕ1(x) ≤ y ≤ ϕ2(x), with

ϕ1, ϕ2 continuous. Then f is integrable on S and

∫

S

f =∫ b

a

( ∫ ϕ2(x)

ϕ1(x)

f(x, y) dy

)dx.

(b) Suppose S is a region of type II defined by c ≤ y ≤ d, ψ1(y) ≤ x ≤ ψ2(y), withψ1, ψ2 continuous. Then f is integrable on S and

∫

S

f =∫ d

c

( ∫ ψ2(y)

ψ1(x)

f(x, y) dx

)dy.

Proof. We will prove (a) and leave the symmetrical case (b) to the reader.(a) Let R = [a, b] × [c, d], where c, d are chosen so that R contains S. Define f on

R as above (by extension of f by zero outside S). By the Proposition of §4.4, we knowthat the graphs of ϕ1 and ϕ2 are of content zero, since ϕ1, ϕ2 are continuous. Thusthe main theorem of §4.4 implies that f is integrable on R as its set of discontinuities iscontained in the boundary ∂S of S. It remains to prove that

∫S

f (=∫

Rf) is given by the

iterated integral∫ b

a(∫ ϕ2(x)

ϕ1(x)f(x, y) dy) dx. For each x ∈ (a, b), the integral

∫ d

cf(x, y) dy

exists as the set of discontinuities in [c, d] has at most two points. Moreover, the functionx 7→ ∫ d

cf(x, y), dy is integrable on [a, b]. Hence (the proof of) Fubini’s theorem applies

in this context and gives

∫

R

f =∫ b

a

( ∫ d

c

f(x, y) dy

)dx.

Since the inside integral (over y) is none other than∫ ϕ2(x)

ϕ1(x)f(x, y) dy, the assertion of the

theorem follows. ¤


§4.7 Examples

(1) Compute∫

Rf , where R is the closed rectangle [−1, 1] × [2, 3] and f the function

(x, y) 7→ x2y− x cos πy. Since f is continuous on R, we may apply Fubini’s theorem andcompute

∫R

I as the iterated integral

I =∫ 1

−1

( ∫ 3

2

(x2y − x cos πy) dy

)dx.

Recall that in∫ 3

2(x2y − x cosπy) dy, x is treated like a constant, hence equals

x2

∫ 3

2

y dy − x

∫ 3

2

cosπy dy = x2

(32

2− 22

2

)− x

(sin πy

π

)]3

2

=52

x2.

⇒ I =52

∫ 1

−1

x2 dx =52

(x3

3

)]1

−1

=53

.

We could also have computed it in the opposite order to get

I =∫ 3

2

[∫ 1

−1

(x2y − x cos πy) dx

]dy

=∫ 3

2

(y

(x3

3

)]1

−1

− cosπy

(x2

2

)]1

−1

)dy

=∫ 3

2

(2y

3

)dy =

y2

3

]3

2

=53

.

(2) Find the volume of the tetrahedron T in R3 bounded by the planes x = 0, y = 0,z = 0 and x− y − z = −1.

Note first that the base of T is a triangle 4 defined by −1 ≤ x ≤ 0, 0 ≤ y ≤ x + 1.Given any (x, y) in 4, the height of T above it is simply given by z = x− y + 1. Hencewe get by the Theorem of §4.6,

vol(T ) =∫∫

4(x− y + 1) dxdy =

∫ 0

−1

( ∫ x+1

0

(x− y + 1) dy

)dx

=∫ 0

−1

(xy − y2

2+ y

)]x+1

0

dx

=∫ 0

−1

(x + 1)2

xdx =

∫ 1

0

u2

2du =

16

.

(3) Fix a, b > 0, and consider the region S inside the ellipse defined by x2

a2 + y2

b2 = 1 inR2. Compute I =

∫∫S

√a2 − x2 dxdy.


Note that S is a region of type I as we may write it as{−a ≤ x ≤ a, −b

√1− x2

a2≤ y ≤ b

√1− x2

a2

}.

Since the function (x, y) 7→ √a2 − x2 is continuous, we can apply the main theorem of

§4.6. We obtain

I =∫ a

−a

√a2 − x2

( ∫ b−q

1− x2

a2

−bq

1− x2

a2

dy

)dx

=2b

a

∫ a

−a

(a2 − x2) dx =2b

a

(a2x− x3

3

)]a

−a

= 4a2b− 4a2b

3=

8a2b

3.

§4.8 Applications

Let S be a thin plate in R2 with matter distributed with density f(x, y) (= mass/unitarea). The mass of S is given by

m(S) =∫∫

S

f(x, y) dxdy.

The average density ism(S)area

=

∫∫S

f(x, y) dxdy∫∫S

dxdy.

The center of mass of S is given by z = (x, y), where

x =1

m(S)

∫∫

S

x f(x, y) dxdy

andy =

1m(S)

∫∫

S

y f(x, y) dxdy.

When the density is constant, the center of mass is called the centroid of S.Suppose L is a fixed line. For any point (x, y) on S, let δ = δ(x, y) denote the

(perpendicular) distance from (x, y) to L. The moment of inertia about L is givenby

IL =∫∫

S

δ2(x, y) f(x, y) dxdy.

When L is the x-axis (resp. y-axis), it is customary to write Ix (resp. Iy).Note that the center of mass is a linear invariant, while the moment of inertia is

quadratic.An interesting use of the centroid occurs in the computation of volumes of revolutions.

To be precise we have the following


Theorem (Pappus). Let S be a region of type I, i.e., given as {a ≤ x ≤ b, ϕ1(x) ≤y ≤ ϕ2(x)}, with ϕ1, ϕ2 continuous. Suppose that minx ϕ1(x) > 0, so that S lies abovethe x-axis. Denote by V the volume of the solid M obtained by revolving S about thex-axis, and by z = (x, y) the centroid of S. Then

V = 2πy a(S),

where a(S) is the area of S.

Proof. Let Vi denote the volume of the solid obtained by revolving {(x, ϕ1(x) | a ≤ x ≤ b}about the x-axis. Then

Vi = π

∫ b

a

ϕi(x)2 dx.

(This is a result from one-variable calculus.) But clearly, V = V2 − V1. So we have

V = π

∫ b

a

[ϕ2(x)2 − ϕ1(x)2] dx.

On the other hand, we have by the definition of the centroid,

y =1

a(S)

∫∫

S

y dxdy.

Since y is continuous and S a region of type I, we have

y =1

a(S)

∫ b

a

( ∫ ϕ2(x)

ϕ1(x)

y dy

)dx

=1

a(S)

∫ b

a

12 [ϕ2(x)2 − ϕ1(x)2] dx.

The theorem now follows immediately. ¤Examples. (1) Let S be the semi-circular region {−1 ≤ x ≤ 1, 0 ≤ y ≤ √

1− x2}.Compute the centroid of S.

Since S is of type I, we have

a(S) =∫∫

S

dxdy =∫ 1

−1

dx

∫ √1−x2

0

dy

=∫ 1

−1

√1− x2 dx = 2

∫ 1

0

√1− x2 dx.

Put x = sin t, 0 ≤ t ≤ π2 . Then dx = cos t dt and

√1− x2 = cos t. So we get

a(S) = 2∫ π

2

0

cos2 t dt = 2∫ π

2

0

(1 + cos t

2

)dt

= 2[

π

4+

sin 2t

4

]π2

0

]=

π

2.


Of course, we could have directly reasoned by geometry that the area of a semi-circularregion of radius 1 is π

2 .Let z = (x, y) be the centroid.

x =1

a(S)

∫∫

S

x dxdy =2π

∫ 1

−1

( ∫ √1−x2

0

dy

)x dx

=2π

∫ 1

−1

x√

1− x2 dx =2π

∫ π2

−π2

sin t cos2 t dt = 0,

since the integrand is an odd function.Again, the fact that x = 0 can be directly seen by geometry. The key thing is to

compute y. We have

y =2π

∫ 1

−1

dx

( ∫ √1−x2

0

y dy

)=

1π

∫ 1

−1

(1− x2) dx

=1π

(x− x3

3

)]1

−1

=2π− 2

3π=

43π

.

So the centroid of S is (0, 43π ).

(2) Find the volume V of the torus π obtained by revolving about the x-axis a circularregion S of radius r (lying above the x-axis).

The area a(S) is πr2, and the centroid (x, y) is located at the center of S (easy check!).Let b be the distance from the center of S to the x-axis. Then by Pappus’ theorem,

V = 2πy a(S) = 2πb(πr2) = 2π2r2b.

Chapter 5

Line Integrals

A basic problem in higher dimensions is the following. Suppose we are given a function(scalar field) g on Rn and a bounded curve C in Rn, can one define a suitable integral ofg over C? We can try the following at first. Since C is bounded, we can enclose it in aclosed rectangular box R and then define a function g on R to be g on C and 0 outside.Clearly,

∫R

g =∫

Cg, but for n > 1, this integral will usually give only zero because C will in

general have measure zero. (We say in general, because there are space-filling curves.) So wehave to do something different. Before giving the “right” definition, let us review the basicproperties of a curve C in Rn, parametrized by an interval in R.

Recall that a (parametrized) curve C in Rn is the image of a continuous map α : [a, b] →Rn, for some a < b in R. Such a curve is said to be C1 if and only if α′(t) exists andis continuous everywhere on the interval. We like this condition because it implies thatthe curve C has a well defined unit tangent vector T at every point where α′(t) 6= 0, andmoreover, that this T moves continuously as we move along the curve. (If α′(t) 6= 0, we takeT to be α′(t)/||α′(t)||.) In practice, we will be able to allow a finite number of points wherethe curve develops corners. This leads to the definition of a piecewise C1 curve C to be acurve which is a finite union of C1 curves C1, C2, . . . , Cr.

The arc length of a C1 curve C parametrized by α : [a, b] → Rn is defined at any pointt ∈ [a, b] to be

s(t) =

∫ t

a

‖α′(t)‖ dt.

s(b) is called the arc length of the whole curve.

We will now describe the heuristic reason for this definition.

Let Q be a partition of [a, b], i.e., the datum a = t0 < t1 < · · · < tm = b. Let Li denote

1

the line joining α(ti−1) to α(ti) in Rn, ∀ i ≥ 1. Put

`m =m∑

i=1

length(Li),

which equals∑m

i=1 ‖α(ti)− α(ti−1)‖.Clearly, as m is very large, `m provides a good approximation for s(b), and a proper

definition must surely express s(b) as the limit of `m as m tends to infinity.

For each i, we may apply the mean value theorem to α(t) restricted to [ti−1, ti] and finda point zi in that subinterval such that

α(ti)− α(ti−1) = (ti − ti−1)α′(zi).

This yields the equality

`m =m∑

i=1

(ti − ti−1)‖α′(zi)‖.

Since by assumption α′ is continuous, we may let m go to infinity and get

limm→∞

`m =

∫ b

a

‖α′(t)‖ dt.

Examples: (1) C = circle in R2 with radius r and center 0. It is parametrized by α(t) =

(r cos t, r sin t), 0 ≤ t ≤ 2π. Since sin t and cos t are continuously differentiable, C is C1.Moreover, α′(t) = (−r sin t, r cos t), and so ‖α′(t)‖ = r

√(− sin t)2 + (cos t)2 = r. Hence the

arc length of C is∫ 2π

0r dt = 2πr, as expected. (It is always good to do such a simple example

at first, because it tells us that the theory is seemingly on the right track.)

(2) (Helix) Let C in R3 be parametrized by α(t) = (cos t, sin t, t), 0 ≤ t ≤ 4π. Again, α is(certainly) C1, and α′(t) = (− sin t, cos t, 1). So ‖α′(t)‖ =

√(− sin t)2 + (cos t)2 + 1 =

√2,

and the arc length of C is∫ 4π

0

√2 dt = 4π

√2.

(3) Let C be defined by α(t) = (2t, t2, log t) ∈ R3 for t ∈ [14, 2000]. Suppose we need to find

the arc length ` between the points (2, 1, 0) and (4, 4, log 2). Note first that (2, 1, 0) = α(1)and (4, 4, log 2) = α(2). So we have:

` =

∫ 2

1

‖α′(t)‖ dt.

2

Also, α′(t) = (2, 2t, 1t), so ‖α′(t)‖ =

√4 + 4t2 + 1

t2= (2t + 1

t). (Note that since t > 0, we do

not need to take the absolute value while taking the square root.) Hence

` =

∫ 2

1

2t dt +

∫ 2

1

1

tdt = t2

]2

1

+ log t

]2

1

= 3 + log 2.

(4) This example deals with a piecewise C1 curve. Note that the definition of arc length goesover in the obvious way for such curves.

Take C to be the unit (upper) semicircle C1 centered at 0, together with the flat diameterC2, then we can parametrize C1 in the usual way by setting α1(t) = (cos t, sin t), for t in[0, π]. Let us parametrize C2 by a function α2 on [π, 2π] as follows. Write α2(t) = (dt + e, 0)with the requirement α2(π) = (−1, 0) and α2(2π) = (1, 0). Then we need dπ + e = −1 and2dπ + e = 1, which yields d = 2

πand e = −3. So we have: α2(t) = ( 2

πt− 3, 0) for t ∈ [π, 2π].

Clearly, α1 and α2 are both C1, so C is piecewise C1.

Suppose we want to find the arc length `(C) of the whole curve C. It is clear fromgeometry that `(C1) = π and `(C2) = 2, so `(C) = π + 2. We can also derive this fromCalculus. Indeed, ‖α′1(t)‖ = 1 and ‖α′2(t)‖ = ‖ 2

π‖ = 2

π. So

`(C) =

∫ π

0

1 dt +

∫ 2π

π

2

πdt = π + 2.

Now we are ready to give a way to integrate functions (scalar fields) over curves inn-space.

Let g : D → R be a scalar field, with C ⊆ D ⊆ Rn, where C is a C1 curve parametrizedby α : [a, b] → Rn. Then we define the line integral of g over C with respect to arclength to be ∫

C

g ds =

∫ b

a

g(α(t))‖α′(t)‖ dt.

Note that by the definition of s(t), s′(t) is none other than ‖α′(t)‖; so ‖α′(t)‖ dt may bethought of as ds.

There is yet another type of line integral. Suppose f : D → Rn is a vector field withD ⊆ Rn. (Note that the source space and the target space of f have the same dimension.)Then we may define the line integral of f over C to be

∫

C

f · dα =

∫ b

a

f(α(t)) · α′(t) dt.

3

These two integrals are related as follows. Let T denote the unit tangent vector to C at α(t).

Then T = α′(t)‖α′(t)‖ (assuming α′(t) 6= 0). Since ‖α′(t)‖ = s′(t), we can identify T with α′(s)

(= dαds

). Put g(α(t)) = f(α(t)) · T , which defines a scalar field. Then we have

∫

C

f · dα =

∫

C

g(α(t))‖α′(t)‖ dt =

∫

C

g ds.

The following lemma is immediate from the definition.

Lemma. (i)∫

Cf1 · dα +

∫C

f2 · dα =∫

C(f1 + f2) · dα

(ii)∫

C1f · dα1 +

∫C2

f · dα2 =∫

Cf · dα, where C is the union of C1 and C2, parametrized

by α(t) defined by putting together α1 and α2.

It is important to note that each parametrized curve C comes with an orientation (ordirection). This was not important when we calculated the arc length, but it is quiteimportant to be aware of when line integrals of vector fields are involved.

Suppose C is parametrized in two different ways, say by α : [a, b] → Rn and β : [c, d] →Rn. We say that α is equivalent to β, and write α ∼ β, if there exists a change of parameterC1 function u : [a, b] → [c, d] such that

(i) u is onto,

(ii) u′ is never 0, and

(iii) β(u(t)) = α(t), for any t ∈ [a, b].

It is not hard to see that ∫

C

f · dβ = δ

∫

C

f · dα,

where δ is the sign of u′ on [a, b]. For example, δ = −1 when α(t) = β(−t+a+ b), a ≤ t ≤ b.

5.1 An application

Line integrals arise all over the place. We will content ourselves with describing the exampleof a thin wire W in the shape of a C1 curve C parametrized by α : [a, b] → Rn. Suppose themass density of W is given by a function g(x), x = (x1, . . . , xn). Then the total mass ofW is given by

M =

∫

C

g ds.

4

Its center of mass x = (x1, x2, . . . , xn) is given by

xj =1

M

∫

C

xjg ds, 1 ≤ j ≤ n.

When g is a constant field, W is said to be uniform and x is called the centroid.

Suppose L is any fixed line in Rn. Denote by δ(x) the distance from x to L. Then themoment of inertia about L is defined to be

IL =

∫

C

δ2g ds.

Example. Consider the wire W in the shape of C = C1 ∪ C2, with C1 being parametrizedby α1(t) = (cos t, sin t), 0 ≤ t ≤ π, and C2 by α2(t) = (t, 0), −1 ≤ t ≤ 1. Suppose W hasmass density given by g(x, y) =

√x2 + y2. Then g(α1(t)) = 1 and g(α2(t)) = |t|. Also,

‖α′1(t)‖ = 1 and ‖α′2(t)‖ = 1. Thus the total mass is given by

M =

∫ π

0

1 dt−∫ 0

−1

t dt +

∫ 1

0

t dt = π + 1.

The coordinates of the center of mass are given by

x =1

π + 1

(∫ π

0

cos t dt−∫ 0

−1

t2 dt +

∫ 1

0

t2 dt

)= 0,

and

y =1

π + 1

(∫ π

0

sin t dt + 0

)=

2

π + 1.

The distance from (x, y) to the x-axis (resp. y-axis) is simply |y| (resp. |x|). Hence themoment of inertia about the x-axis is

Ix =

∫ π

0

sin2 t dt + 0 =

∫ π

0

(1− cos 2t

2

)dt =

π

2.

The moment of inertia about the y-axis is

Iy =

∫ π

0

cos2 t dt−∫ 0

−1

t3 dt +

∫ 1

0

t3 dt =π

2+

1

2.

5

5.2 Gradient fields

Given a differentiable scalar field g on an open set D in Rn, we may of course consider itsgradient field ∇g. It turns out, as shown below, that under mild hypotheses on ∇g andD, the line integrals

∫C∇g · dα can be evaluated simply. This result is sometimes called

the second fundamental theorem of Calculus for line integrals. Before stating andproving the result, we need to introduce an important concept called connectivity.

An open set S in Rn is (path) connected if and only if we can join any pair of points inS by a piecewise C1 path lying completely in S. For example, open balls (or boxes) and theinterior of an annulus are (path) connected.

Theorem. Let g be a differentiable scalar field with continuous gradient ∇g on an open,(path) connected set D in Rn. Then, for any two points P, Q joined by a piecewise C1 pathC completely lying in D and parametrized by α : [a, b] → Rn with α(a) = P and α(b) = Q,we have ∫

C

∇g · dα = g(Q)− g(P ).

Remark. The stunning thing about this result is that the integral depends only on the endpoints P and Q, and not on the path C connecting them. This is pretty revolutionary, ifyou think about it!

Corollary. Let g,D be as in the Theorem. Then, for any point P in D, and any piecewiseC1 path C connecting P to itself, i.e., with α(a) = P = α(b), we have

∫

C

∇g · dα = 0.

Such a path C whose beginning and end points are the same is called a closed path, andthe line integral of a vector field f over a closed path C is usually denoted

∮C

f · dα.

Proof of Theorem. We will prove it for C a C1 curve, and leave the piecewise C1 extension,which is routine, to the reader. By definition,

∫

C

∇g · dα =

∫ b

a

∇g(α(t)) · α′(t) dt.

Put h(t) = g(α(t)), for all t ∈ [a, b]. Then by the chain rule, which we can apply since bothα and g are differentiable, we have h′(t) = ∇g(α(t)) · α′(t). So

∫

C

∇g · dα =

∫ b

a

h′(t) dt.

6

Note that h′(t) is continuous on (a, b) by the continuity hypothesis on ∇g. For each u ∈[a, b], set G(u) =

∫ u

ah′(t) dt. Then by one variable calculus, G is continuous on [a, b] and

differentiable on (a, b) with G′(u) = h′(u) there. So G − h is constant on (a, b) and hencealso on [a, b] by continuity. Consequently, since G(a) = 0, we have G(b) = g(α(b))− g(α(a)),as needed.

5.3 Criterion for path independence

Let D be an open, connected set in Rn. The natural question raised by the result of theprevious section is to what extent we can characterize vector fields on f on D whose lineintegrals are path independent, i.e., depend only on the end points. Here is the (extremelysatisfying) answer, sometimes called the first fundamental theorem of Calculus for lineintegrals:

Theorem. Let D be a connected, open set in Rn, and let f : D → Rn be a continuous vectorfield. Suppose the line integral of f over any piecewise C1 path C in D depends only on theend points of C. Then there exists a differentiable scalar field ϕ on D, called a potentialfunction of f , such that f = ∇ϕ on D. Moreover, ϕ can be explicitly defined as follows.Fix any point P in D and set

ϕ(x) =

∫ x

P

f · dα,

where the integral is taken over any piecewise C1 path C in D connecting P to x. Then ϕ iswell defined and represents a potential function for f .

Putting the first and second fundamental theorems for line integrals together, we easilyobtain the following useful

Corollary. Let D be an open, connected set in Rn, and let f : D → Rn be a continuousvector field. Then the following properties are equivalent:

(i) f = ∇ϕ, for some potential function ϕ

(ii) The line integrals of f over piecewise C1 curves in D are path independent.

(iii) The line integrals of f over closed, piecewise C1 curves in D are zero.

Definition. A vector field f satisfying any of the (equivalent) properties of Corollary is saidto be conservative.

7

Proof of Theorem. Let ϕ be defined as in Theorem. That it is well defined is a consequenceof the path independence hypothesis for line integrals of f in D. Write f = (f1, f2, . . . , fn),where fj is, for each j ≤ n, the j-th component (scalar) field of f . We have to show: (∀j)

∂ϕ

∂xj

exists and equals fj. (∗)

For each x in D, ∃ r > 0 such that the vector x + hej lies in D whenever |h| ∈ (0, r). Wemay write

ϕ(x + hej)− ϕ(x) =

∫ x+hej

x

f · dα,

where the integral is taken over the line segment C connecting x to x + hej, parametrizedby α(t) = x + thej, for t ∈ [0, 1]. Since α′(t) = hej, we get

ϕ(x + hej)− ϕ(x)

h=

∫ 1

0

f(x + thej) · ej dt,

which equals

∫ 1

0

fj(x + thej) dt =1

h

∫ h

0

fj(x + uej) du

=1

h(g(h)− g(0)),

where g(t) =∫ t

0fj(x + uej) du, ∀ t ∈ (−r, r). Letting h go to zero, we then get the limit

g′(0). Thus ∂ϕ∂xj

(x) exists and equals g′(0). But by construction, g′(0) = fj(x), and so we are

done.

5.4 A necessary condition to be conservative

Not all vector fields f are conservative. It is also quite hard (except in special cases) to checkthat f is conservative. Luckily, we at least have a test which can be used in many cases torule out some f from being conservative.

Theorem. Let f = (f1, . . . , fn) be a C1 vector field on an open, connected set D in Rn.Suppose f is a conservative field. Then we must have (everywhere on D)

∂fi

∂xj

=∂fj

∂xi

, for all i, j.

8

Proof. Suppose f = ∇ϕ, for a potential function ϕ on D, so that fj = ∂ϕ∂xj

, for all j ≤ n.

Since f is differentiable, all the partial derivatives of each fj exist, and we get (∀ i, j)

∂fj

∂xi

=∂

∂xi

(∂ϕ

∂xj

)=

∂2ϕ

∂xi∂xj

,

and∂fi

∂xj

=∂

∂xj

(∂ϕ

∂xi

)=

∂2ϕ

∂xj∂xi

.

Since f is C1, each partial derivative of fi is continuous for every i. Then the mixed partialderivatives ∂2ϕ

∂xi∂xjand ∂2ϕ

∂xj∂xiare also continuous, and must be equal by an earlier theorem.

Examples. (1) Let f : R3 → R3 be the vector field given by f(x, y, z) = (y, x, 0). Denote

by C the oriented curve parametrized by α(t) = (tet, cos t, sin t), for t ∈ [0, π]. Compute theline integral I =

∫C

f · dα.

By definition this line integral I is

∫ π

0

f(α(t)) · α′(t) dt =

∫ π

0

(cos t, tet, 0) · ((1 + t)et,− sin t, cos t) dt

=

∫ π

0

[(1 + t) cos t et − t sin t et] dt.

This integral can be calculated, but with some trouble. It would have been better had wefirst checked to see if f could be conservative, i.e., if there is a function ϕ such that y = ∂ϕ

∂x,

x = ∂ϕ∂y

and 0 = ∂ϕ∂z

. The last equation says that ϕ is independent of z, and the first two can

be satisfied by taking ϕ(x, y, z) = xy, for all (x, y, z) ∈ R3. Thus f = ∇ϕ everywhere onR3, and so by the second fundamental theorem for line integrals, I = ϕ(α(π)) − ϕ(α(0)) =ϕ(πeπ,−1, 0)− ϕ(0, 1, 0) = −πeπ.

(2) (Physics example) Consider the force field f(x, y, z) = (x, y, z) (=xi + yj + zk). Findthe work done in moving a particle along the parabola C = {(x, y, z) ∈ R3 | y = x2, z = 0}from x = −1 to x = 2.

We can parametrize C by the C1 function α(t) = (t, t2, 0). We are interested in those t

9

lying in the interval [−1, 2]. The work done is given by

W =

∫

C

f · dα =

∫ 2

−1

f(α(t)) · α′(t) dt

=

∫ 2

−1

(t, t2, 0) · (1, 2t, 0) dt =

∫ 2

−1

(t + 2t3) dt

=t2

2+

t4

2

]2

−1

= 9.

Alternatively, as in example (1), we could have looked for a potential function ϕ satisfying∂ϕ∂x

= x, ∂ϕ∂y

= y and ∂ϕ∂z

= z. An immediate solution is given by ϕ(x, y, z) = x2+y2+z2

2. So, by

the second fundamental theorem for line integrals,

W = ϕ(α(2))− ϕ(α(−1)) = ϕ(2, 4, 0)− ϕ(−1, 1, 0)

=22 + 42

2− (−1)2 + 12

2= 9.

(3) Let f : R3 → R3 be the vector field sending (x, y, z) to (x2, xy, 1). Determine its lineintegral over the curve C parametrized by α(t) = (1, t, et), for t ∈ [0, 1].

First let us see if f could be conservative. Since f is differentiable with continuous partialderivatives, we can apply the necessary criterion proved in the previous section. Then, for fto be conservative, we would need ∂fi

∂xj=

∂fj

∂xi, for all i, j. Here f1(x, y, z) = x2, f2(x, y, z) = xy

and f3(x, y, z) = 1. (As usual, we are writing (x, y, z) for (x1, x2, x3).) But ∂f1

∂y= 0, while

∂f2

∂x= y. So the criterion fails, and f cannot be conservative. We could have also seen this

directly by trying to find a potential function ϕ(x, y, z). We would have needed ∂ϕ∂x

= x2,∂ϕ∂y

= xy and ∂ϕ∂z

= 1. The first equation gives (by integrating): ϕ(x, y, z) = x3

3+ ψ(y, z),

with ψ independent of x. Then the second equation forces ∂ψ∂y

, which is ∂ϕ∂y

, to equal xy,implying that ψ is not independent of x. So there is no potential function and f is notconservative.

In any case, we can certainly compute the line integral from first principles:∫

C

f · dα =

∫ 1

0

f(1, t, et) · (0, 1, et) dt

=

∫ 1

0

(1, t, 1) · (0, 1, et) dt =

∫ 1

0

(t + et) dt

=

(t2

2+ e2

)]1

0

=1

2+ e− 1 = e− 1

2.

10

Chapter 6

Green’s Theorem in the Plane

0 Introduction

Recall the following special case of a general fact proved in the previouschapter. Let C be a piecewise C1 plane curve, i.e., a curve in R2 definedby a piecewise C1-function

α : [a, b] → R2

with end points α(a) and α(b). Then for any C1 scalar field ϕ defined on aconnected open set U in R2 containing C, we have

∫

C

∇ϕ · dα = ϕ(α(b))− ϕ(α(a)).

In other words, the integral of the gradient of ϕ along C, in the directionstipulated by α, depends only on the difference, measured in the right order,of the values of ϕ on the end points. Note that the two-point set {a, b} isthe boundary of the interval [a, b] in R. A general question which arises is toknow if similar things hold for integrals over surfaces and higher dimensionalobjects, i.e., if the integral of the analog of a gradient, sometimes called anexact differential, over a geometric shape M depends only on the integralof its primitive on the boundary ∂M .

Our object in this chapter is to establish the simplest instance of such aphenomenon for plane regions. First we need some preliminaries.

1

1 Jordan curves

Recall that a curve C parametrized by α : [a, b] → R2 is said to be closed iffα(a) = α(b). It is called a Jordan curve, or a simple closed curve, iff αis piecewise C1 and 1-1 (injective) on the open interval (a, b). Geometrically,this means the curve doesn’t cross itself. Examples of Jordan curves arecircles, ellipses, and in fact all kinds of ovals. The hyperbola defined byα : [0, 1] → R2, x 7→ c

x, is (for any c 6= 0) simple, i.e., it does not intersect

itself, but it is not closed. On the other hand, the clover is closed, butnot simple.

Here is a fantastic result, in some sense more elegant than the ones inCalculus we are trying to establish, due to the French mathematician CamilleJordan; whence the name Jordan curve.

Theorem. Let C be a Jordan curve in R2. Then there exists connected opensets U, V in the plane such that

(i) U, V, C are pairwise mutually disjoint,and

(ii) R2 = U ∪ V ∪ C.

In other words, any Jordan curve C separates the plane into two disjoint,connected regions with C as the common boundary. Such an assertion isobvious for an oval but not (at all) in general. There is unfortunately no waywe can prove this magnificent result in this course. But interested studentscan read a proof in Oswald Veblen’s article, “Theory of plane curves in Non-metrical Analysis situs,” Transactions of the American Math. Society, 6, no.1, 83-98 (1905).

The two regions U and V are called the interior or inside and exterioror outside of C. To distinguish which is which let’s prove

Lemma: In the above situation exactly one of U and V is bounded. Thisis called the interior of C.

Proof. Since [a, b] is compact and α is continuous the curve C = α([a, b])is compact, hence closed and bounded. Pick a disk D(r) of some large radiusr containing C. Then S := R2\D(r) ⊆ U∪V . Clearly S is connected, so anytwo points P, Q ∈ S can be joined by a continuous path β : [0, 1] → S withP = β(0), Q = β(1). We have [0, 1] = β−1(U) ∪ β−1(V ) since S is coveredby U and V . Since β is continuous the sets β−1(U) and β−1(V ) are opensubsets of [0, 1]. If P ∈ U , say, put t0 = sup{t ∈ [0, 1]|β(t) ∈ U} ∈ [0, 1]. If

2

β(t0) ∈ U then, since β−1(U) is open, we find points in a small neighborhoodof t0 mapped to U . If t0 < 1 this would mean we’d find points bigger thant0 mapped into U which contradicts the definition of t0. So if β(t0) ∈ Uthen t0 = 1 and Q = β(1) ∈ U . If, on the other hand, β(t0) ∈ V , thereis an interval of points around t0 mapped to V which also contradicts thedefinition of t0 (we’d find a smaller upper bound in this case). So the onlyconclusion is that if one point P ∈ S lies in U so do all other points Q. ThenS ⊆ U and V ⊆ D(r) so V is bounded.

Recall that a parametrized curve has an orientation. A Jordan curvecan either be oriented counterclockwise or clockwise. We usually orient aJordan curve C so that the interior, V say, lies to the left as we traversethe curve, i.e. we take the counterclockwise orientation. This is also calledthe positive orientation. In fact we could define the counterclockwise orpositive orientation by saying that the interior lies to the left.

Note finally that the term “interior” is a bit confusing here as V is notthe set of interior points of C; but it is the set of interior points of the unionof V and C.

2 Simply connected regions

Let R be a region in the plane whose interior is connected. Then R issaid to be simply connected (or s.c.) iff every Jordan curve C in R canbe continuously deformed to a point without crossing itself in the process.Equivalently, for any Jordan curve C ⊂ R, the interior of C lies completelyin the interior of R.

Examples of s.c. regions:

(1) R = R2

(2) R=interior of a Jordan curve C.

The simplest case of a non-simply connected plane region is theannulus given by {v ∈ R2|c1 < ‖v‖ < c2} with 0 < c1 < c2. The reasonis the hole in the middle which prevents certain Jordan curves from beingcollapsed all the way.

When a region is not simply connected, one often calls it multiply con-nected or m.c. The annulus is often said to be doubly connected because we

3

can cut it into two subregions each of which is simply connected. In general,if a region has m holes, it is said to be (m+1)-connected, for the simple rea-son that we can cut it into m+1, but no smaller, number of simply connectedsubregions.

Sometimes we would need a similar, but more stringent notion. A regionR is said to be convex iff for any pair of points P, Q the line joining P,Qlies entirely in R.

A star-shaped region is simply connected but not convex.

3 Green’s theorem for s.c. plane regions

Recall that if f is a vector field with image in Rn, we can analyze f by itscoordinate fields f1, f2, . . . , fn, which are scalar fields. When n = 2 (resp.n = 3), it is customary notation to use P,Q (resp. P, Q, R) instead of f1, f2

(resp. f1, f2, f3).

Theorem A (Green) Let f = (P, Q) be a C1 vector field on a connected openset Y in R2. Suppose C is a C1 Jordan curve with inside (or interior) Usuch that Φ := C ∪ U lies entirely in Y . Then we have

∫∫

Φ

(∂Q

∂x− ∂P

∂y

)dxdy =

∮

C

f · dα

Here C is the boundary ∂Φ of Φ, and moreover, the integral over C istaken in the positive direction. Note that

f · dα = (P (x, y), Q(x, y)) · (x′(t), y′(t))dt,

and so we are justified in writing

∮

C

f · dα =

∮

C

(Pdx + Qdy).

Given any C1 Jordan curve C with Φ = C ∪ U as above, we can try tofinely subdivide the region using C1 arcs such that Φ is the union of subre-gions Φ1, . . . , Φr with Jordan curves as boundaries and with non-intersectinginsides/interiors, such that each Φj is simultaneously a region of type I and

4

II (see chapter 5). This can almost always be achieved. So we will contentourselves, mostly due to lack of time and preparation on Jordan curves, withproving the theorem only for regions which are of type I and II.

Theorem A follows from the following, seemingly more precise, identities:

(i)∫∫Φ

∂P∂x

dxdy = − ∮C

Pdx

(ii)∫∫Φ

∂Q∂y

dxdy =∮C

Qdy.

Now suppose Φ is of type I, i.e., of the form {a ≤ x ≤ b, ϕ1(x) ≤ y ≤ϕ2(x)}, with a < b and ϕ1, ϕ2 continuous on [a, b]. Then the boundary Chas 4 components given by

C1 : graph of ϕ1(x), a ≤ x ≤ b

C2 : x = b, ϕ1(b) ≤ y ≤ ϕ2(b)

C3 : graph of ϕ2(x), a ≤ x ≤ b

C4 : x = a, ϕ1(a) ≤ y ≤ ϕ2(a).

As C is positively oriented, each Ci is oriented as follows: In C1, traversefrom x = a to x = b; in C2, traverse from y = ϕ1(b) to y = ϕ2(b); in C3, gofrom x = b to x = a; an in C4, go from ϕ2(a) to ϕ1(a).

It is easy to see that

∫

C2

Pdx =

∫

C4

Pdx = 0,

since C2 and C4 are vertical segments allowing no variation in x. Hence wehave

(1)

−∮

C

Pdx = −

∫

C1

Pdx +

∫

C3

Pdx

=

b∫

a

[P (x, φ2(x))− P (x, φ1(x)]dx.

On the other hand, since f is a C1 vector field, ∂P∂y

is continuous, and sinceΦ is a region of type I, we may apply Fubini’s theorem and get

5

(2)∫∫

Φ

∂P

∂ydxdy =

b∫

a

dx

ϕ2(x)∫

ϕ1(x)

∂P

∂y.

Note that x is fixed in the inside integral on the right. We see, by thefundamental theorem of 1-variable Calculus, that

(3)ϕ2(x)∫

ϕ1(x)

∂P

∂y= P (x, ϕ2(x))− P (x, ϕ1(x)).

Putting together (1), (2) and (3), we immediately obtain identity (i).Similarly (ii) can be seen to hold by exploiting the fact that Φ is also of

type II.Hence Theorem A is now proved for a region Φ which is of type I and II.To prove this result for a general region, the basic idea is to cut it into

a finite number of subregions each of which is of type I or of type II. Fora rigorous treatment of the general situation, read Apostol’s MathematicalAnalysis (chap. 10). For a more geometric point of view, look at Spivak’sCalculus on Manifolds.

4 An area formula

The example below will illustrate a very useful consequence of Green’stheorem, namely that the area of the inside of a C1 Jordan curve Ccan be computed as

A =1

2

∮

C

(xdy − ydx). (*)

A proof of this fact is easily supplied by taking the vector field f = (P,Q)in Theorem A to be given by P (x, y) = −y and Q(x, y) = x. Clearly, f isC1 everywhere on R2, and so the theorem is applicable. The identity for Afollows as ∂Q

∂x= −∂P

∂y= 1.

6

Example:

Fix any positive number r, and consider the hypocycloid C parametrizedby

α : [0, 2π] → R2, t 7→ (r cos3 t, r sin3 t).

Then C is easily seen to be a piecewise C1 Jordan curve. Note that it is also

given by x23 + y

23 = r

23 . We have

A =1

2

∮

C

(xdy − ydx) =1

2

2π∫

0

(xy′(t)− yx′(t))dt.

Now, x′(t) = 3r cos2 t(− sin t), and y′(t) = 3r sin2 t cos t. Hence

xy′(t)− yx′(t) = (r cos3 t)(3r sin2 t cos t) + (r sin3 t)(3r cos2 t sin t)

which simplifies to 3r2 sin2 t cos2 t, as sin2 t + cos2 t = 1. So we obtain

A =3r2

2

2π∫

0

(sin 2t

2

)2

dt =3r2

8

2π∫

0

(1− cos 4t

2

)dt,

by using the trigonometric identities sin 2u = 2 sin u cos u and cos 2u = 1 −2 sin2 u. Finally, we then get

A =2r2

16

2π∫

0

(1− cos 4t)dt

=

3r2

16

(t− sin 4t

4

)]2π

0

i.e.,

A =3πr2

8.

5 Green’s theorem for multiply connected

regions

We mentioned earlier that the annulus in the plane is the simplest exampleof a non-simply connected region. But it is not hard to see that we can cutthis region into two pieces, each of which is the interior of a C1 Jordan curve.

7

We may then apply Theorem A of §3 to each piece and deduce statementover the annulus as a consequence. To be precise, pick any point z in R2 andconsider

Φ = Bz(r2)− Bz(r1),

for some real numbers r1, r2 such that 0 < r1 < r2. Here Bz(ri) denotes theclosed disk of radius ri and center z.

Let C1, resp. C2, denote the positively oriented (circular) boundary ofBz(r1), resp. Bz(r2). Let Di be the flat diameter of Bz(ri), i.e., the set{x0−ri ≤ x ≤ x0 +ri, y = y0}, where x0 (resp. y0) denotes the x-coordinate(resp. y-coordinate) of z. Then D2 ∩ Φ splits as a disjoint union of twohorizontal segments D+ = D2 ∩ {x > x0} and D− = D2 ∩ {x < x0}. Denoteby C+

i (resp. C−i ) the upper (resp. lower) half of the circle Ci, for i = 1, 2.

Then Φ = Φ+ ∪ Φ−, where Φ+ (resp. Φ−) is the region bounded by C+ =C+

2 ∪D− ∪ C+1 ∪D+ (resp. C− = C−

2 ∪D+ ∪D−1 ∪D−). We can orient the

piecewise C1 Jordan curves C+ and C− in the positive direction. Let U+, U−

denote the interiors of C+, C−. Then U+ ∩ U− = ∅, and Φ± = C± ∪ U±.Now let f = (P, Q) be a C1-vector field on a connected open set containing

Bz(r2). Then, combining what we get by applying Green’s theorem for s.c.regions to Φ+ and Φ−, we get:

∫∫

Φ

(∂Q

∂x− ∂P

∂y

)dxdy =

∮

C2

(Pdx + Qdy)−∮

C1

(Pdx + Qdy), (*)

where both the line integrals on the right are taken in the positive (counter-clockwise) direction. Note that the minus sign in front of the second lineintegral on the right comes from the need to orient C± positively.

In some sense, this is the key case to understand as any multiply connectedregion can be broken up into a finite union of shapes each of which can becontinuously deformed to an annulus. Arguing as above, we get the following(slightly stronger) result:

Theorem B (Green’s theorem for m.c. plane regions) Let C1, C2, . . . , Cr

be non-intersecting piecewise C1 Jordan curves in the plane with interiorsU1, U2, . . . , Ur such that

(i) U1 ⊃ Ci, ∀i ≥ 2,and

(ii) Ui ∩ Uj = ∅, for all i, j ≥ 2.

8

Put Φ = C1 ∪ U1 − ∪ri=2Ui, which is not simply connected if r ≥ 2. Also

let f = (P, Q) be a C1 vector field on a connected open set S containing Φ.Then we have

∫∫

Φ

(∂Q

∂x− ∂P

∂y

)dxdy =

∮

C1

(Pdx + Qdy)−r∑

i=2

∮

Ci

(Pdx + Qdy)

where each Cj, j ≥ 1 is positively oriented.

Corollary: Let C1, . . . Cr, f be as in Theorem B. In addition, suppose that∂Q∂x

= ∂P∂y

everywhere on S. Then we have

∮

C1

(Pdx + Qdy) =r∑

i=2

∮

Ci

(Pdx + Qdy).

6 The winding number

Let C be an oriented piecewise C1 curve in the plane, and let z = (x0, y0)be a point not lying on C. Then the winding number of C relativeto z is intuitively the number of times C wraps around z in the positivedirection. (If we reverse the orientation of C, the winding number changessign.) Mathematically, this number is defined to be

W (C, z) :=1

2π

∮

C

(−y − y0

r2dx +

x− x0

r2dy

),

where r = ‖(x, y)− z‖ =√

(x− x0)2 + (y − y0)2.When C is parametrized by a piecewise C1 function α : [a, b] → R2, α(t) =

(u(t), v(t)), then it is easy to see that

W (C, z) =1

2π

b∫

a

(u(t)− x0)v′(t)− (v(t)− y0)u

′(t)(u(t)− x0)2 + (v(t)− y0)2

dt

Some write W (α, z) instead of W (C, z).We note the following useful result without proof:

Proposition: Let C be a piecewise C1 closed curve in R2, and let z ∈ R2 bea point not meeting C.

9

(a) W (C, z) ∈ Z.

(b) C Jordan curve ⇒ W (C, z) ∈ {0, 1,−1}.More precisely, in the case of a Jordan curve, W (C, z) is 0 if z is outside C,and it equals ±1 if z is inside C.

The reader is advised to do the easy verification of this Proposition forthe unit circle C. In this case, when z is outside C, the winding number iszero. When it is inside, W (C, z) is 1 or −1 depending on whether or not Cis oriented in the positive (counter-clockwise) direction.

Another fun exercise will be to exhibit, for each integer n, a piecewise C1,closed curve C and a point z not lying on it, such that W (C, z) = n. Ofcourse this cannot happen if C is a Jordan curve if n 6= 0,±1.

Note: If C denotes the curve obtained from C by reversing the orientation,then the mathematical definition does give W (C, z) = −W (C, z).

10

CHAPTER 7

DIV, GRAD, AND CURL

1. The operator ∇ and the gradient:

Recall that the gradient of a differentiable scalar field ϕ on an openset D in Rn is given by the formula:

(1) ∇ϕ =

(∂ϕ

∂x1

,∂ϕ

∂x2

, . . . ,∂ϕ

∂xn

).

It is often convenient to define formally the differential operator invector form as:

(2) ∇ =

(∂

∂x1

,∂

∂x2

, . . . ,∂

∂xn

).

Then we may view the gradient of ϕ, as the notation ∇ϕ suggests,as the result of multiplying the vector ∇ by the scalar field ϕ. Notethat the order of multiplication matters, i.e., ∂ϕ

∂xjis not ϕ ∂

∂xj.

Let us now review a couple of facts about the gradient. For anyj ≤ n, ∂ϕ

∂xjis identically zero on D iff ϕ(x1, x2, . . . , xn) is independent

of xj. Consequently,

(3) ∇ϕ = 0 onD ⇔ ϕ = constant.

Moreover, for any scalar c, we have:

(4) ∇ϕ is normal to the level set Lc(ϕ).

Thus ∇ϕ gives the direction of steepest change of ϕ.

2. Divergence

Let F : D → Rn, D ⊂ Rn, be a differentiable vector field. (Note thatboth spaces are n-dimensional.) Let F1, F2, . . . , Fn be the component(scalar) fields of f . The divergence of F is defined to be

1

2 DIV, GRAD, AND CURL

(5) div(F ) = ∇ · F =n∑

j=1

∂Fj

∂xj

.

This can be reexpressed symbolically in terms of the dot product as

(6) ∇ · F = (∂

∂x1

, . . . ,∂

∂xn

) · (F1, . . . , Fn).

Note that div(F ) is a scalar field.Given any n× n matrix A = (aij), its trace is defined to be:

tr(A) =n∑

i=1

aii.

Then it is easy to see that, if DF denotes the Jacobian matrix of F ,i.e., the n× n-matrix (∂Fi/∂xj), 1 ≤ i, j ≤ n, then

(7) ∇ · F = tr(DF ).

Let ϕ be a twice differentiable scalar field. Then its Laplacian isdefined to be

(8) ∇2ϕ = ∇ · (∇ϕ).

It follows from (1),(5),(6) that

(9) ∇2ϕ =∂2ϕ

∂x21

+∂2ϕ

∂x22

+ · · ·+ ∂2ϕ

∂x2n

.

One says that ϕ is harmonic iff∇2ϕ = 0. Note that we can formallyconsider the dot product

(10) ∇ · ∇ = (∂

∂x1

, . . . ,∂

∂xn

) · ( ∂

∂x1

, . . . ,∂

∂xn

) =n∑

j=1

∂2

∂x2j

.

Then we have

(11) ∇2ϕ = (∇ · ∇)ϕ.

CHAPTER 7 3

Examples of harmonic functions:

(i) D = R2; ϕ(x, y) = ex cos y.Then ∂ϕ

∂x= ex cos y, ∂ϕ

∂y= −ex sin y,

and ∂2ϕ∂x2 = ex cos y, ∂2ϕ

∂y2 = −ex cos y. So, ∇2ϕ = 0.

(ii) D = R2 − {0}. ϕ(x, y) = log(x2 + y2) = 2 log(r).

Then ∂ϕ∂x

= 2xx2+y2 ,

∂ϕ∂y

= 2yx2+y2 ,

∂2ϕ∂x2 = 2(x2+y2)−2x(2x)

(x2+y2)2= −2(x2−y2)

(x2+y2)2, and

∂2ϕ∂y2 = 2(x2+y2)−2y(2y)

(x2+y2)2= 2(x2−y2)

(x2+y2)2. So, ∇2ϕ = 0.

(iii) D = Rn − {0}. ϕ(x1, x2, ..., xn) = (x21 + x2

2 + · · ·+ x2n)α/2 = rα for

some fixed α ∈ R.Then ∂ϕ

∂xi= αrα−1 xi

r= αrα−2xi, and

∂2ϕ∂x2

i= α(α− 2)rα−4xi · xi + αrα−2 · 1.

Hence ∇2φ =∑n

i=1 (α(α− 2)rα−4x2i + αrα−2) = α(α− 2 + n)rα−2.

So φ is harmonic for α = 0 or α = 2− n (α = −1 for n = 3).

3. Cross product in R3

The three-dimensional space is very special in that it admits a vec-tor product, often called the cross product. Let i,j,k denote thestandard basis of R3. Then, for all pairs of vectors v = xi + yj + zkand v′ = x′i + y′j + z′k, the cross product is defined by

(12) v×v′ = det(

i j kx y zx′ y′ z′

)= (yz′−y′z)i−(xz′−x′z)j+(xy′−x′y)k.

Lemma 1. (a) v × v′ = −v′ × v (anti-commutativity)(b) i× j = k, j× k = i, k× i = j(c) v · (v × v′) = v′ · (v × v′) = 0.

Corollary: v × v = 0.

Proof of Lemma (a) v′ × v is obtained by interchanging the secondand third rows of the matrix whose determinant gives v × v′. Thusv′ × v=−v × v′.

(b) i × j = det(

i j k1 0 00 1 0

), which is k as asserted. The other two iden-

tities are similar.(c) v ·(v×v′) = x(yz′−y′z)−y(xz′−x′z)+z(xy′−x′y) = 0. Similarly

for v′ · (v × v′).

Geometrically, v × v′ can, thanks to the Lemma, be interpreted asfollows. Consider the plane P in R3 defined by v,v′. Then v × v′ willlie along the normal to this plane at the origin, and its orientation is


given as follows. Imagine a corkscrew perpendicular to P with its tipat the origin, such that it turns clockwise when we rotate the line Ovtowards Ov′ in the plane P . Then v × v′ will point in the direction inwhich the corkscrew moves perpendicular to P .

Finally the length ||v × v′|| is equal to the area of the parallelogramspanned by v and v′. Indeed this area is equal to the volume of theparallelepiped spanned by v, v′ and a unit vector u = (ux, uy, uz) or-thogonal to v and v′. We can take u = v×v′/||v×v′|| and the (signed)volume equals

det

ux uy uz

x y zx′ y′ z′

=ux(yz′ − y′z)− uy(xz′ − x′z) + uz(xy′ − x′y)

=||v × v′|| · (u2x + u2

y + u2z) = ||v × v′||.

4. Curl of vector fields in R3

Let F : D → R3, D ⊂ R3 be a differentiable vector field. Denote byP ,Q,R its coordinate scalar fields, so that F = P i + Qj + Rk. Thenthe curl of F is defined to be:

(13) curl(F ) = ∇× F = det

(i j k∂

∂x∂

∂y∂∂z

P Q R

).

Note that it makes sense to denote it ∇ × F , as it is formally thecross product of ∇ with f . Explicitly we have

∇×F = (∂R/∂y − ∂Q/∂z) i−(∂R/∂x− ∂P/∂z) j+(∂Q/∂x− ∂P/∂y)k

If the vector field F represents the f low of a fluid, then the curlmeasures how the flow rotates the vectors, whence its name.

Proposition 1. Let h (resp. F ) be a C2 scalar (resp. vector) field.Then

(a): ∇× (∇h) = 0.(b): ∇ · (∇× F ) = 0.

Proof: (a) By definition of gradient and curl,

∇× (∇h) = det

(i j k∂

∂x∂

∂y∂∂z

∂h∂x

∂h∂y

∂h∂z

)

=

(∂2h

∂y∂z− ∂2h

∂z∂y

)i +

(∂2h

∂z∂x− ∂2h

∂x∂z

)j +

(∂2h

∂x∂y− ∂2h

∂y∂x

)k.

CHAPTER 7 5

Since h is C2, its second mixed partial derivatives are independent ofthe order in which the partial derivatives are computed. Thus, ∇ ×(∇fh) = 0.

(b) By the definition of divergence and curl,

∇ · (∇× F ) =

(∂

∂x,

∂

∂y,

∂

∂z

)·(

∂R

∂y− ∂Q

∂z,−∂R

∂x+

∂P

∂z,∂Q

∂x− ∂P

∂y

)

=

(∂2R

∂x∂y− ∂2Q

∂x∂z

)+

(− ∂2R

∂y∂x+

∂2P

∂y∂z

)+

(∂2Q

∂z∂x− ∂2P

∂z∂y

).

Again, since F is C2, ∂2R∂x∂y

= ∂2R∂y∂x

, etc., and we get the assertion.¤

Warning: There exist twice differentiable scalar (resp. vector)fields h (resp. F ), which are not C2, for which (a) (resp. (b)) does nothold.

When the vector field F represents fluid flow, it is often called ir-rotational when its curl is 0. If this flow describes the movement ofwater in a stream, for example, to be irrotational means that a smallboat being pulled by the flow will not rotate about its axis. We willsee later that the condition ∇ × F = 0 occurs naturally in a purelymathematical setting as well.

Examples: (i) Let D = R3 − {0} and F (x, y, z) = y(x2+y2)

i− x(x2+y2)

j.

Show that F is irrotational. Indeed, by the definition of curl,

∇× F = det

(i j k∂

∂x∂

∂y∂∂z

y

(x2+y2)

−x

(x2+y2)0

)

=∂

∂z

(x

x2 + y2

)i+

∂

∂z

(y

x2 + y2

)j+

(∂

∂x

( −x

x2 + y2

)− ∂

∂y

(y

x2 + y2

))k

=

[−(x2 + y2) + 2x2

(x2 + y2)2− (x2 + y2)− 2y2

(x2 + y2)2

]k = 0.

(ii) Let m be any integer 6= 3, D = R3 − {0}, and

F (x, y, z) = 1rm (xi + yj + zk), where r =

√x2 + y2 + z2. Show that F

is not the curl of another vector field. Indeed, suppose F = ∇×G.Then, since F is C1, G will be C2, and by the Proposition provedabove, ∇ · F = ∇ · (∇×G) would be zero. But,

∇ · F =

(∂

∂x,

∂

∂y,

∂

∂z

)·( x

rm,

y

rm,

z

rm

)


=rm − 2x2(m

2)rm−2

r2m+

rm − 2y2(m2)rm−2

r2m+

rm − 2z2(m2)rm−2

r2m

=1

r2m

(3rm −m(x2 + y2 + z2)rm−2

)=

1

rm(3−m).

This is non-zero as m 6= 3. So F is not a curl.

Warning: It may be true that the divergence of F is zero, but Fis still not a curl. In fact this happens in example (ii) above if weallow m = 3. We cannot treat this case, however, without establishingStoke’s theorem.

5. An interpretation of Green’s theorem via the curl

Recall that Green’s theorem for a plane region Φ with boundarya piecewise C1 Jordan curve C says that, given any C1 vector fieldG = (P,Q) on an open set D containing Φ, we have:

(14)

∫ ∫

Φ

(∂Q

∂x− ∂P

∂y

)dx dy =

∮

C

P dx + Qdy.

We will now interpret the term ∂Q∂x− ∂P

∂y. To do that, we think of

the plane as sitting in R3 as {z = 0}, and define a C1 vector field F

on D := {(x, y, z) ∈ R3|(x, y) ∈ D} by setting F (x, y, z) = G(x, y) =P i+Qj. We can interpret this as taking values in R3 by thinking of its

value as P i + Qj + 0k.Then ∇× F = det

(i j k∂

∂x∂

∂y∂∂z

P Q 0

)=

(∂Q∂x− ∂P

∂y

)k,

because ∂P∂z

= ∂Q∂z

= 0. Thus we get:

(15) (∇× F ) · k =∂Q

∂x− ∂P

∂y.

And Green’s theorem becomes:

Theorem 1.∫∫

Φ(∇× F ) · k dx dy =

∮C

P dx + Qdy

6. A criterion for being conservative via the curl

A consequence of the reformulation above of Green’s theorem usingthe curl is the following:

Proposition 1. Let G : D → R2, D ⊂ R2 open and simply con-nected, G = (P, Q), be a C1 vector field. Set F (x, y, z) = G(x, y), forall (x, y, z) ∈ R3 with (x, y) ∈ D. Suppose ∇ × F = 0. Then G isconservative on D.

CHAPTER 7 7

Proof: Since ∇× F = 0, the reformulation in section 5 of Green’stheorem implies that

∮C

P dx + Q dy = 0 for all Jordan curves Ccontained in D. QED

Example: D = R2 − {(x, 0) ∈ R2 | x ≤ 0}, G(x, y) = yx2+y2 i− x

x2+y2 j.

Determine if G is conservative on D:

Again, define F (x, y, z) to be G(x, y) for all (x, y, z) in R3 such that(x, y) ∈ D. Since G is evidently C1, F will be C1 as well. By theProposition above, it will suffice to check if F is irrotational, i.e.,∇ × F = 0, on D × R. This was already shown in Example (i) ofsection 4 of this chapter. So G is conservative.

Chapter 8

Change of Variables, Parametrizations, SurfaceIntegrals

§0. The transformation formula

In evaluating any integral, if the integral depends on an auxiliary function of thevariables involved, it is often a good idea to change variables and try to simplify theintegral. The formula which allows one to pass from the original integral to the newone is called the transformation formula (or change of variables formula). Itshould be noted that certain conditions need to be met before one can achieve this,and we begin by reviewing the one variable situation.

LetD be an open interval, say (a, b), in R , and let ϕ : D → R be a 1-1 , C1 mapping(function) such that ϕ′ 6= 0 on D. Put D∗ = ϕ(D). By the hypothesis on ϕ, it’s eitherincreasing or decreasing everywhere on D. In the former case D∗ = (ϕ(a), ϕ(b)), andin the latter case, D∗ = (ϕ(b), ϕ(a)). Now suppose we have to evaluate the integral

I =

b∫

a

f(ϕ(u))ϕ′(u) du,

for a nice function f. Now put x = ϕ(u), so that dx = ϕ′(u) du. This change ofvariable allows us to express the integral as

I =

ϕ(b)∫

ϕ(a)

f(x) dx = sgn(ϕ′)∫

D∗f(x) dx,

where sgn(ϕ′) denotes the sign of ϕ′ on D. We then get the transformation formula

∫

D

f(ϕ(u))|ϕ′(u)| du =

∫

D∗f(x) dx

This generalizes to higher dimensions as follows:

Theorem Let D be a bounded open set in Rn, ϕ : D → Rn a C1, 1-1 mappingwhose Jacobian determinant det(Dϕ) is everywhere non-vanishing on D, D∗ = ϕ(D),and f an integrable function on D∗. Then we have the transformation formula

∫· · ·

∫

D

f(ϕ(u))| det Dϕ(u)| du1... dun =

∫· · ·

∫

D∗f(x) dx1... dxn.

1

Of course, when n = 1, det Dϕ(u) is simply ϕ′(u), and we recover the old formula.This theorem is quite hard to prove, and we will discuss the 2-dimensional case indetail in §1. In any case, this is one of the most useful things one can learn in Calculus,and one should feel free to (properly) use it as much as possible.

It is helpful to note that if ϕ is linear, i.e., given by a linear transformationwith associated matrix M , then Dϕ(u) is just M . In other words, the Jacobiandeterminant is constant in this case. Note also that ϕ is C1, even C∞, in this case,and moreover ϕ is 1− 1 iff M is invertible, i.e., has non-zero determinant. Hence ϕ is1− 1 iff det Dϕ(u) 6= 0. But this is special to the linear situation. Many of the caseswhere change of variables is used to perform integration are when ϕ is not linear.

§1. The formula in the plane

LetD be an open set in R2. We will call a mapping ϕ : D → R2 as above primitiveif it is either of the form

(P1) (u, v) → (u, g(u, v))

or of the form

(P2) (u, v) → (h(u, v), v),

with ∂g/∂v, ∂h/∂u nowhere vanishing on D. (If ∂g/∂v or ∂h/∂u vanishes at a finiteset of points, the argument below can be easily extended.)

We will now prove the transformation formula when ϕ is a composition of twoprimitive transformations, one of type (P1) and the other of type (P2).

For simplicity, let us assume that the functions ∂g/∂v(u, v) and ∂h/∂u(u, v) arealways positive. (If either of them is negative, it is elementary to modify the argu-ment.) Put D1 = {(h(u, v), v)|(u, v) ∈ D} and D∗ = {(x, g(x, v))|(x, v) ∈ D1}. Byhypothesis, D∗ = ϕ(D).

Enclose D1 in a closed rectangle R, and look at the intersection P of D1 with apartition of R, which is bounded by the lines x = xm,m = 1, 2, ..., and v = vr, r =1, 2, ..., with the subrectangles Rmr being of sides 4x = l and 4v = k. Let R∗,respectively R∗

mr, denote the image of R, respectively Rmr, under (u, v) → (u, g(u, v)).Then each R∗

mr is bounded by the parallel lines x = xm and x = xm + l and by thearcs of the two curves y = g(x, vr) and y = g(x, vr + k). Then we have

area(R∗mr) =

xm+l∫

xm

(g(x, vr + k)− g(x, vr)) dx.

By the mean value theorem of 1-variable integral calculus, we can write

area(R∗mr) = l[g(x′m, vr + k)− g(x′m, vr)],

2

for some point x′m in (xm, xm+l). By the mean value theorem of 1-variable differentialcalculus, we get

area(R∗mr) = lk

∂g

∂v(x′m, v′r),

for some x′m ∈ (xm, xm + l) and v′r ∈ (vr, vr + k). So, for any function f which isintegrable on D∗, we obtain

∫∫

D∗f(x, y) dx dy = lim

P

∑m,r

klf(x′m, g(x′m, v′r))∂g

∂v(x′m, v′r).

The expression on the right tends to the integral∫∫D1

f(x, g(x, v))∂g∂v

(x, v) dx dv. Thus

we get the identity

∫∫D∗

f(x, y) dx dy =∫∫D1


(x, v) dx dv

By applying the same argument, with the roles of x, y (respectively g, h) switched,we obtain

∫∫D1


(x, v) dx dy =∫∫D

f(h(u, v), g(h(u, v), v))∂g∂v

(h(u, v), v)∂h∂u

(u, v) du dv

Since ϕ = g ◦ h we get by chain rule that

det Dϕ(u, v) =∂h

∂u(u, v)

∂g

∂v(h(u, v), v),

which is by hypothesis > 0. Thus we get

∫∫D∗

f(x, y) dx dy =∫∫D

f(ϕ(u, v))| det Dϕ(u, v)| du dv

as asserted in the Theorem.How to do the general case of ϕ? The fact is, we can subdivide D into a finite union

of subregions, on each of which ϕ can be realized as a composition of primitive trans-formations. We refer the interested reader to chapter 3, volume 2, of ”Introductionto Calculus and Analysis” by R. Courant and F. John.

§2. Examples

3

(1) Let Φ = {(x, y) ∈ R2|x2+y2 ≤ a2}, with a > 0. We know that A = area(Φ) = πa2.But let us now do this by using polar coordinates. Put

D = {(r, θ) ∈ R2|0 < r < a, 0 < θ < 2π},

and define ϕ : D → Φ byϕ(r, θ) = (r cos θ, r sin θ).

Then D is a connected, bounded open set, and ϕ is C1, with image D∗ which is thecomplement of a negligible set in Φ; hence any integration over D∗ will be the sameas doing it over Φ. Moreover,

∂ϕ

∂r= (cos θ, sin θ) and

∂ϕ

∂θ= (−r sin θ, r cos θ).

Hence

det(Dϕ) = det

(cos θ −r sin θsin θ r cos θ

)= r,

which is positive on D. So ϕ is 1− 1.By the transformation formula, we get

A =

∫∫

Φ

dx dy =

∫∫

D

| det Dϕ(r, θ)| dr dθ =

a∫

0

2π∫

0

r dr dθ =

=

a∫

0

r dr

2π∫

0

dθ = 2π

a∫

0

r dr = 2πr2

2

]a

0

= πa2.

We can justify the iterated integration above by noting that the coordinate func-tion (r, θ) → r on the open rectangular region D is continuous, thus allowing us touse Fubini.

(2) Compute the integral I =∫ ∫

Rxdxdy where R is the region {(r, φ) | 1 ≤ r ≤

2, 0 ≤ φ ≤ π/4}.

We have I =∫ 2

1

∫ π/4

0r cos(φ)rdrdφ = sin(φ)]π/4

0 · r3

3

]2

1=

√2

2(8/3− 1/3) = 7

√2

6.

(3) Let Φ be the region inside the parallelogram bounded by y = 2x, y = 2x−2, y =x and y = x + 1. Evaluate I =

∫∫Φ

xy dx dy.

The parallelogram is spanned by the vectors (1, 2) and (2, 2), so it seems reasonableto make the linear change of variable

(xy

)=

(1 22 2

)(uv

)

4

since then we’ll have to integrate over the box [0, 1]× [0, 1] in the u− v-region. TheJacobian matrix of this transformation is of course constant and has determinant −2.So we get

I =

∫ 1

0

∫ 1

0

(u + 2v)(2u + 2v)| − 2|dudv = 2

∫ 1

0

∫ 1

0

(2u2 + 6uv + 4v2)dudv

= 2

∫ 1

0

2u3/3 + 3u2v + 4v2u]1

0dv = 2

∫ 1

0

(2/3 + 3v + 4v2)dv

= 2(2v/3 + 3v2/2 + 4v3/3]1

0) = 2(2/3 + 3/2 + 4/3) = 7.

(4) Find the volume of the cylindrical region in R3 defined by

W = {(x, y, z)|x2 + y2 ≤ a2, 0 ≤ z ≤ h},

where a, h are positive real numbers. We need to compute

I = vol(W ) =

∫∫∫

W

dx dy dz.

It is convenient here to introduce the cylindrical coordinates given by the trans-formation

ϕ : D → R3

given by

ϕ(r, θ, z) = (r cos θ, r sin θ, z), where D = {0 < r < a, 0 < θ < 2π, 0 < z < h}.

It is easy to see ϕ is 1-1, C1 and onto the interior D∗ of W. Again, since the boundaryof W is negligible, we may just integrate over D∗. Moreover,

| det Dϕ| = | det

cos θ −r sin θ 0sin θ r cos θ 0

0 0 1

| = r

By the transformation formula,Since the function (r, θ, z) → r is continuous on D, we may apply Fubini and

obtain

I =

h∫

0

a∫

0

2π∫

0

r dr dθ dz = πa2h,

which is what we expected.

(5) Let W be the unit ball B0(1) in R3 with center at the origin and radius 1.Evaluate

5

I =

∫∫∫

W

e(x2+y2+z2)3/2

dx dy dz.

Here it is best to use the spherical coordinates given by the transformation

ψ : D → R3, (ρ, θ, φ) → (ρ sin φ cos θ, ρ sin φ sin θ, ρ cos φ),

where D = {0 < ρ < 1, 0 < θ < 2π, 0 < φ < π}. Then ψ is C1, 1-1 and onto W minusa negligible set (which we can ignore for integration). Moreover

det(Dψ) = det

sin φ cos θ −ρ sin φ sin θ ρ cos φ cos θsin φ sin θ ρ sin φ cos θ ρ cos φ sin θ

cos φ 0 −ρ sin φ

=

= cos φ det

(−ρ sin φ sin θ ρ cos φ cos θρ sin φ cos θ ρ cos φ sin θ

)− ρ sin φ det

(sin φ cos θ −ρ sin φ sin θsin φ sin θ ρ sin φ cos θ

)=

= −ρ2(cos φ)2 sin φ− ρ2(sin φ)3 = −ρ2 sin φ.

Note that sin φ is > 0 on (0, π). Hence | det(Dψ)| = ρ2 sin φ, and we get (by thetransformation formula):

I =

1∫

0

π∫

0

2π∫

0

eρ3

ρ2 sin φ dρ dφ dθ =

1∫

0

eρ3

ρ2 dρ

π∫

0

sin φ dφ

2π∫

0

dθ =

The function (ρ, θ, φ) → eρ3ρ2 sin φ is continuous on D, and so we may apply

Fubini and perform ietrated integration (on D). We obtain

= 2π

1∫

0

eρ3

ρ2 dρ (− cos φ)|π0 =4π

3

1∫

0

eu du,

where u = ρ3

=4π

3eu

]1

0

=4π

3(e− 1).

§3. Parametrizations

Let n, k be positive integers with k ≤ n. A subset Φ of Rn is called a parametrizedk-fold iff there exist a connected region T in Rk together with a C1, 1-1 mapping

ϕ : T → Rn, u → (x1(u), x2(u), ..., xn(u)),

such that ϕ(T ) = Φ.

6

It is called a parametrized surface when k = 2, and a parametrized curve whenk = 1.Example: Let T = {(u, v) ∈ R2|0 ≤ u < 2π,−π

2≤ v < π

2}. Fix a positive number

a, and defineϕ : T → R3 by ϕ(u, v) = (x(u, v), y(u, v), z(u, v)),

where

x(u, v) = a cos u cos v, y(u, v) = a sin u cos v, and z(u, v) = a sin v.

Then

x(u, v)2 + y(u, v)2 + z(u, v)2 = a2(cos2 u cos2 v + sin2 u cosv + sin2 v) =

= a2(cos2 v + sin2 v) = a2.

Also, given (x, y, z) ∈ R3 such that x2 + y2 + z2 = a2 and (x, y, z) 6= (0, 0,±1), wecan find u, v ∈ T such that x = x(u, v), y = y(u, v) and z = z(u, v). So we see thatϕ is C1, 1-1 mapping onto the standard sphere S0(a) of radius a in R3, minus 2points. If we want to integrate over the sphere, removing those two points doesn’tmake a difference because they form a set of content zero.

§4. Surface integrals in R3

Let Φ be a parametrized surface in R3, given by a C1, 1− 1 mapping

ϕ : T → R3, T ⊂ R2, ϕ(u, v) = (x(u, v), y(u, v), z(u, v)).

(When we say ϕ ∈ C1, we mean that it is so on an open set containing T.) Letξ = (x, y, z) be a point on Φ. Consider the curve C1 on Φ passing through ξ on whichv is constant. Then the tangent vector to C1 at ξ is simply given by ∂ϕ

∂u(ξ). Similarly,

we may consider the curve C2 on Φ passing through ξ on which u is constant. Thenthe tangent vector to C2 at ξ is given by ∂ϕ

∂v(ξ). So the surface Φ has a tangent

plane JΦ(ξ) at ξ iff ∂ϕ∂u

and ∂ϕ∂v

are linearly independent there. From now on we willassume this to be the case (see also the discussion of tangent spaces in Ch. 4 of theclass notes). When this happens at every point, we call Φ smooth. (In fact, forintegration purposes, it suffices to know that ∂ϕ

∂uand ∂ϕ

∂vare independent except at a

set {ϕ(u, v)} of content zero.)By the definition of the cross product, there is a natural choice for a normal

vector to Φ at ξ given by:

∂ϕ

∂u(ξ)× ∂ϕ

∂v(ξ).

Definition : Let f be a bounded scalar field on the parametrized surface Φ. Thesurface integral of f over Φ, denoted

∫∫Φ

f dS, is given by the formula

∫∫

Φ

f dS =

∫∫

T

f(ϕ(u, v))||∂ϕ

∂u(ξ)× ∂ϕ

∂v(ξ)|| du dv.

7

We say that f is integrable on Φ if this integral converges. An important specialcase is when f = 1. In this case, we get

area(Φ) =∫∫T

||∂ϕ∂u

(ξ)× ∂ϕ∂v

(ξ)|| du dv

Note that this formula is similar to that for a line integral in that we have to put ina scaling factor which measures how the parametrization changes the (infinitesimal)length of the curve (resp. area of the surface). Note also that unlike the case of curvesthis formula only covers the case of surfaces in R3 rather than in a general Rn.

Example: Find the area of the standard sphere S = S0(a) in R3 given byx2 + y2 + z3 = a2, with a > 0. Recall the parametrization of S from above given by

ϕ : T → R3 , ϕ(u, v) = (x(u, v), y(u, v), z(u, v)),

T = {(u, v) ∈ R2|0 ≤ u < 2π,−π

2≤ v <

π

2},

x(u, v) = a cos u cos v, y(u, v) = a sin u cos v, z(u, v) = a sin v.

So we have

∂ϕ

∂u= (−a sin u cos v, a cos u cos v, 0) and

∂ϕ

∂v= (−a cos u sin v,−a sin u sin v, a cos v).

⇒ ∂ϕ

∂u× ∂ϕ

∂v= det

i j k−a sin u cos v a cos u cos v 0−a cos u sin v −a sin u sin v a cos v

=

= det

(a cos u cos v 0−a sin u sin v a cos v

)i− det

(−a sin u cos v 0−a cos u sin v a cos v

)j+

+ det

(−a sin u cos v a cos u cos v−a cos u sin v −a sin u sin v

)k =

= a2 cos u cos2 vi + a sin u cos2 vj + a2(sin2 u sin v cos v + cos2 u sin v cos v)k =

= a2 cos u cos2 vi + a sin u cos2 vj + a2 sin v cos vk.

⇒ ||∂ϕ

∂u× ∂ϕ

∂v|| = a2(cos2 u cos4 v + sin2 u cos4 v + sin2 v cosv)1/2 =

= a2(cos4 v + sin2 v cos2 v)1/2 = a2| cos v|.

8

⇒ area(S) = a2

2π∫

0

du

π/2∫

−π/2

| cos v| dv = 2πa2

π/2∫

−π/2

cos v dv

⇒ area(S) = 4πa2.

Here is a useful result:

Proposition: Let Φ be a surface in R3 parametrized by a C1, 1-1 function

ϕ : T → R3 of the form ϕ(u, v) = (u, v, h(u, v)).

In other words, Φ is the graph of z = h(x, y). Then for any integrable scalar field fon Φ, we have

∫∫

Φ

f dS =

∫∫

T

f(u, v, h(u, v))

√(∂h

∂u)2 + (

∂h

∂v)2 + 1 du dv.

Proof.

∂ϕ

∂u= (1, 0,

∂h

∂u) and

∂ϕ

∂v= (0, 1,

∂h

∂v).

⇒ ∂ϕ

∂u× ∂ϕ

∂v= det

i j k1 0 ∂h

∂u

0 1 ∂h∂v

= −∂h

∂ui− ∂h

∂vj + k.

⇒ ||∂ϕ

∂u× ∂ϕ

∂v|| =

√(∂h

∂u)2 + (

∂h

∂v)2 + 1.

Now the assertion follows by the definition of∫∫Φ

f dS.

Example. Let Φ be the surface in R3 bounded by the triangle with vertices (1,0,0),(0,1,0) and (0,0,1). Evaluate the surface integral

∫∫Φ

x dS.

Note that Φ is a triangular piece of the plane x + y + z = 1. Hence Φ is parametrizedby

ϕ : T → R3 , ϕ(u, v) = (u, v, h(u, v)),

where h(u, v) = 1− u− v, and T = {0 ≤ v ≤ 1− u, 0 ≤ u ≤ 1}.

∂h

∂u= −1 ,

∂h

∂v= −1, and

√(∂h

∂u)2 + (

∂h

∂v)2 + 1 =

√1 + 1 + 1 =

√3.

9

By the Proposition above, we have:

∫∫

Φ

x dS =√

3

1∫

0

1−u∫

0

u du dv =√

3

1∫

0

u(1− u) du =

√3

6.

There is also a notion of an integral of a vector field over a surface. As in thecase of line integrals this is in fact obtained by integrating a suitable projection ofthe vector field (which is then a scalar field) over the surface. Whereas for curves thenatural direction to project on is the tangent direction, for a surface in R3 one usesthe normal direction to the surface.

Note that a unit normal vector to Φ at ξ = ϕ(u, v) is given by

n =∂ϕ∂u

(ξ)× ∂ϕ∂v

(ξ)

||∂ϕ∂u

(ξ)× ∂ϕ∂v

(ξ)||

and that n = n(u, v) varies with (u, v) ∈ T . This defines a unit vector field on Φcalled the unit normal field.

Definition: Let F be a vector field on Φ. Then the surface integral of F overΦ, denoted

∫∫Φ

F · n ds, is defined by

∫∫

Φ

F · n dS =

∫∫

T

F (ϕ(u, v)) · n(u, v)||∂ϕ

∂u× ∂ϕ

∂v|| du dv.

Again, we say that F is integrable over Φ if this integral converges.

By the definition of n, we have:

∫∫

Φ

F · n dS =

∫∫

T

F (ϕ(u, v)) · (∂ϕ

∂u× ∂ϕ

∂v) du dv.

As in the case of the line integral there is a notation for this integral which doesn’tmake explicit reference to the parametrization ϕ but only to the coordinates (x, y, z)of the ambient space. If F = (P,Q, R) we write

∫∫

Φ

F · n dS =

∫∫

Φ

Pdy ∧ dz + Qdz ∧ dx + Rdx ∧ dy. (1)

Here the notation v ∧ w indicates a product of vectors which is bilinear (i.e.(λ1v1 + λ2v2) ∧ w = λ1(v1 ∧ w) + λ2(v1 ∧ w) and similarly in the other variable) andantisymmetric (i.e. v∧w = −w∧v, in particular v∧v = 0). In this sense ∧ is similarto × on R3 except that v ∧ w does not lie in the same space where v and w lie. Onthe positive side v∧w can be defined for vectors v, w in any vector space. After theselengthy remarks let’s see how this formalism works out in practice. If the surface Φ is

10

parametrized by a function ϕ(u, v) = (x(u, v), y(u, v), z(u, v)) then dy = ∂y∂u

du + ∂y∂v

dvand by the properties of ∧ outlined above we have

dy ∧ dz = (∂y

∂u

∂z

∂v− ∂y

∂v

∂z

∂u)du ∧ dv

which is the x-coordinate of ∂ϕ∂u× ∂ϕ

∂v. So the equation (1) does indeed hold if we

interpret an integral∫

Tfdu ∧ dv (over some region T in R2) as an ordinary double

integral∫

Tfdudv whereas

∫T

fdv ∧ du equals − ∫T

fdudv. All of this suggests thatone should give meaning to dx, dy, dz as elements of some vector space (instead ofbeing pure formal as in the definition of multiple integrals). This can be done and isin fact necessary to a complete development of integration in higher dimensions andon spaces more general than Rn.

11

Chapter 9

The Theorems of Stokes andGauss

1 Stokes’ Theorem

This is a natural generalization of Green’s theorem in the plane to parametrizedsurfaces S in 3-space with boundary the image of a Jordan curve. We saythat S is smooth if every point on it admits a tangent plane.

Theorem (Stokes) Let S be a smooth surface in R3 parametrized by a C2,1 − 1 function ϕ : T → R3, (u, v) 7→ (x(u, v), y(u, v), z(u, v)), where T isa plane region bounded by a piecewise C1 Jordan curve Γ. Denote by C theimage of Γ under ϕ. Let F = (P, Q, R) be a C1 vector field on an open setcontaining S and C. Then we have

∫∫

S

(∇× F) · ndS =

∮

C

F · dα, (*)

where the line integral is taken in the direction inherited via ϕ from thepositive direction of Γ.

Proof. First note that both sides of (*) are linear in F. Since F = P i+Qj+Rk, it suffices to show the following:(i)

∫∫S

(∇× P i) · ndS =∮C

Pdx,

(ii)∫∫S

(∇×Qj) · ndS =∮C

Qdy,

and

1

(iii)∫∫S

(∇×Rk) · ndS =∮C

Rdz.

We will establish (i) and leave the (very similiar) cases (ii) and (iii) as(straight-forward) exercises for the reader.

So let F = P i. Then

(1) ∇× F = det

i j k∂/∂x ∂/∂y ∂/∂z

P 0 0

=

∂P

∂zj− ∂P

∂yk.

Recall

(2) ndS =

(∂ϕ

∂u× ∂ϕ

∂v

)dudv.

We now need the following

Lemma 1: Let A = (a1, a2, a3), B = (b1, b2, b3), and C = (c1, c2, c3). Then

A · (B × C) = det

a1 a2 ac

b1 b2 b3

c1 c2 c3

.

Proof of Lemma 1: By definition,

B × C = det

i j kb1 b2 b3

c1 c2 c3

= i det

(b2 b3

c2 c3

)− j det

(b2 b3

c2 c3

)+ k det

(b2 b3

c2 c3

).

So A · (B × C) = a1 det

(b2 b3

c2 c3

)− a2 det

(b2 b3

c2 c3

)+ a3 det

(b2 b3

c2 c3

).

The assertion follows.

Proof of Stokes’ theorem (contd)Using (1), (2) and Lemma 1, we get

(∇× F) ·(

∂ϕ

∂u× ∂ϕ

∂v

)= det

0 ∂P∂z

∂P∂y

∂x∂u

∂y∂u

∂z∂u

∂x∂v

∂y∂v

∂z∂v

.

2

Explicitly, we have(3)

(∇× F) ·(

∂ϕ

∂u× ∂ϕ

∂v

)= −∂P

∂z

(∂z

∂v

∂x

∂u− ∂z

∂u

∂x

∂v

)− ∂P

∂y

(∂y

∂v

∂x

∂u− ∂y

∂u

∂x

∂v

)

=

(∂P

∂z

∂z

∂u+

∂P

∂y

∂y

∂u

)∂x

∂u−

(∂P

∂z

∂z

∂v+

∂P

∂y

∂y

∂u

)∂x

∂u.

Put p(u, v) = P (ϕ(u, v)). Then by using chain rule, we get

∂p

∂u=

∂P

∂x

∂x

∂u+

∂P

∂y

∂y

∂u+

∂P

∂z

∂z

∂u

and∂p

∂v=

∂P

∂x

∂x

∂v+

∂P

∂y

∂y

∂v+

∂P

∂z

∂z

∂v.

Plugging this information into (3) and simplifying, we get

(4) (∇× F) ·(

∂ϕ

∂u× ∂ϕ

∂v

)=

∂p

∂u

∂x

∂v− ∂p

∂v

∂x

∂u.

Put

(5) P1(u, v) = p(u, v)∂x

∂u

and

Q1(u, v) = p(u, v)∂x

∂v.

It is easy to check, using the product rule, that

(6)∂

∂u

(p∂x

∂v

)− ∂

∂v

(p∂x

∂u

)=

∂p

∂u

∂x

∂v− ∂p

∂v

∂x

∂u.

A word of explanation is needed here. The simple formula has resulted fromour use of the equality

∂2x

∂u∂v=

∂2x

∂v∂u,

3

which is justified because ϕ is C2.

Putting these together, we obtain

(7) (∇× F) ·(

∂ϕ

∂u× ∂ϕ

∂v

)=

∂Q1

∂u− ∂P1

∂v

By the definition of the surface integral of ∇×F over S (see Chapter 9),we get (for F = P i)

(8)

∫∫

S

(∇× F) · ndS =

∫∫

T

(∂Q1

∂u− ∂P1

∂v)dudv.

Applying Green’s theorem in the plane, we get

(9)

∫∫

T

(∂Q1

∂u− ∂P1

∂v

)dudv =

∮

Γ

P1du + Q1dv.

On the other hands, since C = ϕ(Γ), we get

(10)

∮

C

Pdx =

∮

Γ

P (ϕ(u, v))

(∂x

∂u

)du + P (ϕ(u, v))

(∂x

∂v

)dv

=

∮

Γ

P1du + Q1dv.

This proves (i).

2 Examples

(1) Let F(x, y, z) = yezi+xezj+xyezk. Show that the integral of F over anyoriented simple closed curve C which is the boundary of a smooth surface Sis zero.

4

Applying Stokes, we get∮C

F · dα =∫∫S

(∇× F) · ndS. Moreover,

∇× F = det

i j k∂/∂x ∂/∂y ∂/∂zyez xez xyez

= i

(∂

∂y(xyez)− ∂

∂z(xez)

)− j

(∂

∂x(xyez)− ∂

∂x(yez)

)

+ k

(∂

∂x(xez)− ∂

∂y(yez)

)

= i(xez − xez)− j(yez − yez) + k(ez − ez) = 0.

Thus∮C

F · dα = 0.

(2) Let C be the intersection of the plane x + y + z = 1 and the cylinderx2 + y2 = 1 in R3. Orient C such that it corresponds to the positive (coun-terclockwise) direction in the plane z = 0. Use Stokes to compute the lineintegral I =

∮C

F · dα, where F = (−y3, x3,−z3).

To do this, note first that C is the boundary of the surface S definedby z = 1 − x − y, with (x, y) ∈ T := {x2 + y2 ≤ 1}. In other words, S isparametrized by ϕ(u, v) = (u, v, 1 − u − v). Note also that the boundary∂T of T is the unit circle Γ and that ϕ(Γ) = C. It is easy to see that S issmooth, and that ϕ is 1-1 and C2. Also, F is C1. So we may apply Stokesand get: I =

∫∫S

(∇ × F) · ndS, with n = ∂ϕ∂u× ∂ϕ

∂v. In this case, S is the

graph of the function h(x, y) = 1− x− y, and we computed ∂ϕ∂u× ∂ϕ

∂vfor such

a surface in the proof of the Proposition in §4 of Chapter 9. We found that∂ϕ∂u× ∂ϕ

∂v= (−∂h

∂u,−∂h

∂v, 1), which in the present case is (1,1,1). On the other

hand,

∇× F = det

i j k∂/∂x ∂/∂y ∂/∂z−y3 x3 −z3

= i

(∂

∂y(−z3)− ∂

∂z(x3)

)− j

(∂

∂x(−z3)− ∂

∂z(−y3)

)+ k

(∂

∂x(x3)− ∂

∂y(y3)

)

= (3x2 + 3y2)k

5

So, (∇× F) · (∂ϕ∂u× ∂ϕ

∂v

)= 3x2 + 3y2, and we get

I = 3

∫∫

T

(u2 + v2)dudv

= 3

1∫

0

2π∫

0

r2 · rdrdθ (in polar coordinates)

= 6π

1∫

0

r3dr =3π

2.

3 The Divergence Theorem of Gauss

Let S be a smooth surface in R3, parametrized by a C1, 1-1 function ϕ : T →R3, (u, v) 7→ (x(u, v), y(u, v), z(u, v)). Recall that ∂ϕ

∂u× ∂ϕ

∂vgives a normal

vector at any ξ = ϕ(u, v). This defines what one calls an orientation of S,i.e., gives a choice at each point between the two possible normal directions.

A surface S in R3 is said to be closed if it is the boundary of a (3-dimensional) region V in R3. In this case, we write ∂V for S. Note that atany point on a closed surface ∂V , are normal points inward, i.e., towards theinterior of V , and the other points outward.

We say that a smooth surface S is orientable if as we move the inwardnormal along a curve on S and come back to the initial point, then the inwardnormal continues to remain the inward normal. If we take the rectangularstrip and glue the ends to make a cylinder, resp. Mobius band, then theresulting surface is orientable, resp. non-orientable. Note that a smooth,closed surface is orientable, but it has two possible orientations, the firstwith the normal pointing inward, and the second with the normal pointingoutward.

Now we are in a position to state the divergence theorem.

Theorem (Gauss) Let V be a region in R3 with boundary S, a closed surface,oriented by choosing the unit outward normal n to S. Let F = (P, Q,R) bea C1 vector field on V . Then we have

∫∫∫

V

(∇ · F)dxdydz =

∫∫

S

F · ndS (*)

6

We will now prove this for regions of a very special type. Note first thatboth sides of (*) are linear in F, so it suffices to establish the following:

(Gi)∫∫∫V

∂P∂x

dxdydz =∫∫S

P i · ndS

(Gii)∫∫∫V

∂Q∂y

dxdydz =∫∫S

Qj · ndS

and(Giii)

∫∫∫V

∂R∂z

dxdydz =∫∫S

Rk · ndS

We will prove (Giii) under the hypothesis that V is a region of typeI, i.e., given by f1(x, y) ≤ z ≤ f2(x, y), (x, y) ∈ T , where T is a connectedregion in R2, and f1, f2 continuous. In this case, using Fubini, we get

∫∫∫

V

∂R

∂zdxdydz =

∫∫

T

dxdy

f2(x,y)∫

f1(x,y)

∂R

∂zdz

=

∫∫

T

[R(x, y, f2(x, y)−R(x, y, f1(x, y)]dxdy.

S = ∂V is a closed surface whose bottom S1, say, is the graph of z =f1(x, y), (x, y) ∈ T , and whose top S2, say, is the graph of z = f2(x, y), (x, y) ∈T . Let us denote by S3 the complement of S1 ∪ S2 in S, and by nj the unitoutward normal on Sj, 1 ≤ j ≤ 3. Then

∫∫

S

Rk · ndS =

∫∫

S

Rk · n1dS +

∫∫

S

Rk · n2dS,

since k · n3 = 0. (This is because n3 lies in the (x, y)-plane.) As we saw inthe proof of the Proposition in §4 of Chapter 8, we have

n1 =

∂f1

∂xi + ∂f1

∂yj− k√(

∂f1

∂x

)2+

(∂f1

∂y

)2

+ 1

Note that the sign is fixed by the fact that the outward normal n1 musthave negative k-component. In other words,

n1dS =

(∂f1

∂xi +

∂f1

∂yj− k

)dudv and n2dS =

(−∂f1

∂xi− ∂f1

∂yj + k

)dudv.

7

Consequently,

k · n1dS = −dudv, and k · n2dS = dudv

and ∫∫

S1

Rk · n1dS = −∫∫

T

R(u, v, f1(u, v))dudv

and ∫∫

S2

Rk · n1dS =

∫∫

T

R(u, v, f2(u, v))dudv.

Thus∫∫

S

Rk · n1dS =

∫∫

T

[R(u, v, f2(u, v))−R(u, v, f1(u, v))]dudv.

=

∫∫

T

∂R

∂zdxdydz.

This proves (Giii).By a similar argument, (Gii) (resp. (Gi)) holds if V is a region of type

II (resp. type III), i.e., if it is given by g1(x, z) ≤ y ≤ g2(x, z), (x, z) ∈ T1

(resp. h1(y, z) ≤ x ≤ h2(y, z), (y, z) ∈ T2), where T1 (resp. T2) is a connectedregion in R2 and g1, g2 (resp. h1, h2) are continuous.

Thus Gauss’s theorem holds at least when V is of type I, II and IIIsimultaneously. (Check, for example, that the ball in R3 is of this form.)Unfortunately, we have not developed things to a point where we can proveGauss’s theorem without such stupid hypotheses.

4 Examples

(1) Let S = S0(1) be the unit sphere x2 + y2 + z2 = 1 in R3 with center at0 and radius 1. Compute I :

∫∫S

F · ndS, where F = (2x, y2, z2). By Gauss’s

8

theorem, I =∫∫∫V

∇ · Fdxdydz, where V is the unit ball B0(1). We have:

∇ · F =∂

∂x(2x) +

∂

∂y(y2) +

∂

∂z(z2) = 2 + 2y + 2z.

⇒ I = 2

∫∫∫

V

(1 + y + z)dxdydz.

Note that ∫∫∫

V

ydxdydz =

∫∫∫

V

zdxdydz = 0

by the symmtery of V around 0 and the oddness of the function y (resp. z)in the y direction (resp. z direction).

So, I = 2∫∫∫V

dxdydz = 2vol(V ) = 8π3

.

The moral of this problem is that one should always try to first exploitthe symmetry arguments if present, before grunging out the calculation.

(2) Let S be the surface of the cylinder x2 + y2 = 1, −1 < z < 1, andx2 +y2 ≤ 1 when z = ±1. Compute I :=

∫∫S

F ·ndS, where F = (xy2, x2y, y).

Again by Gauss’s theorem, I =∫∫∫V

∇ · Fdxdydz, where V = {x2 + y2 ≤1, −1 ≤ z ≤ 1}. Also,

∇ · F =∂

∂x(xy2) +

∂

∂y(x2y) +

∂

∂z(y) = y2 + x2.

⇒ I =

∫∫∫

V

(x2 + y2)dxdydz =

1∫

−1

dz

∫∫

D0

(x2 + y2)dxdy

,

where D0 = {(x, y) ∈ R2|x2 + y2 ≤ 1}. Using polar coordinates to evaluate

the inside integral we get∫∫D0

(x2 + y2)dxdy =2π∫0

dθ(1∫0

r3dr) = π2. Thus I =

π2

1∫−1

dz = π.

9

Dinakar Ramakrishnan March 26, 2007 - Mathematicsdinakar/Vector Calculus Notes.pdf · Chapter 1...

Documents

Transcript of Dinakar Ramakrishnan March 26, 2007 - Mathematicsdinakar/Vector Calculus Notes.pdf · Chapter 1...