Chapter 1. Point-Set Topology and Calculusweb.hku.hk/~pingyu/6066/LN/LN1_Point-Set Topology... ·...

Chapter 1. Point-Set Topology and Calculus�

This course will review basic mathematical tools used in microeconomics, macroeconomics and

econometrics. Some popular resources on the materials in this course include Rudin (1976), Dixit

(1990), Simon and Blume (1994), Sundaram (1996), de la Fuente (2000), Chiang and Wainwright

(2005), Osborne (2016), the mathematical appendix of Mas-Colell et al. (1995) and the references

therein. Books designed especially for econometrics include Davidson (1994) and Dhrymes (2013)

among others. Our main reference books are Rudin (1976) for Chapter 1, Simon and Blume (1994)

and Sundaram (1996) for Chapter 2-4, and Casella and Berger (2002) for Chapter 5-6.

In this chapter, we will review some basics for point-set topology in the the Euclidean space and

single and multivariable calculus. Key concepts in point-set topology include open, completeness,

compactness, etc. In calculus, we will de�ne limits, continuity and di¤erentiability, and also de�ne

the Riemann integral and the related Riemann-Stieltjes integral.

This chapter includes too much content for logical completeness. Since we use this chapter

only as a prerequisite for the optimization theory in the coming chapters, we will only cover the

topics that are necessary for future developments. Sections, proofs, exercises, remarks or footnotes

indexed by * are optional. Paragraphs started and ended with (**) are also optional. Throughout

the course, I will emphasize understanding of basic concepts and intuition and application of basic

theorems rather than rigorous proofs; you can study the materials in the following order: slides in

class, the main text of the lecture notes (excluding the * part), and the references. Our assignment

and exam are based only on the materials in the slides.

We collect some popular notations here for future reference. "8" means "Any", 9means "Exist","viz." or "i.e." means "that is", "e.g." means "for example", "i¤" means "if and only if", "�" (or ,or :=) means "de�ned as", "=)" means "implies" and "()" means "is equivalent to". � is usedto signal the end of an example, and � the end of a proof. Real numbers are written using lower

case italics, e.g., x. Vectors are de�ned as column vectors and represented using lowercase bold,

e.g., x. Matrices are represented using uppercase bold, e.g., X. Sets are represented by uppercase

italic, e.g., X, and their elements by lower case italic, e.g., x.

1 Sets and Set Operations

This section reviews some basic concepts about set and set operation. We will use them to de�ne

the consumption set and production set. We will also use them in the optimizing decisions of �rms

�Email: [email protected]

1

and consumers.

De�nition 1.1 (Set) A set A is a collection of distinct objects.

An element x in A is denoted as x 2 A. An empty set is often denoted as ;. Elements in a set are notordered and must be distinct, so the following three sets are the same: f1; 2g = f2; 1g = f1; 2; 1g.

Remark 1.1 In mathematics, "collection", "class" and "family" all mean "set". An "object" ina set is often called a "point" although it can be a function de�ned in the following section or any

mathematical object.

The following are elementary set operations:

1. Subset: set A is contained in B.

� A � B: set A is contained in B, and A 6= B.

� A � B: set A is contained in B, and A and B may be equal.

Remark 1.2 Quite often, � means �. To emphasize A 6= B, & is often used. We will use

the convention of � and �.

Remark 1.3 To prove A = B, we need only to show A � B and B � A.

2. Union: A [B = fxjx 2 A or x 2 Bg. All points in either A or B.

Remark 1.4 "j" is read as "such that", and is often used exchangeably with ":" in the liter-ature and this course.

3. Intersection: A \B = fxjx 2 A and x 2 Bg. All points in both A and B.

4. Complement: Ac = fxjx =2 Ag. All points not in A. Here, a total set is implicitly de�ned.

5. Relative Complement: B nA = fx 2 Bjx =2 Ag = B \Ac: all points that are in B, but not inA.

Remark 1.5 B nA is often denoted as B �A, but we will the convention B nA.

The following are useful properties of set operations.

1. Commutatitivity: A [B = B [A and A \B = B \A

2. Associativity: A [ (B [ C) = (A [B) [ C and A \ (B \ C) = (A \B) \ C

3. Distributive Laws: A \ (B [ C) = (A \B) [ (A \ C) and A [ (B \ C) = (A [B) \ (A [ C)

4. De Morgan�s Law: (A [B)c = Ac \Bc and (A \B)c = Ac [Bc.

2

2 Functions

A function (or transformation or mapping) is about relationships between the elements of two sets.

De�nition 2.1 (Function) A function T : X 7�! Y is a rule that associates each element of X

with a unique element of Y ; in other words, for each x 2 X there exists a speci�ed element y 2 Y ,denoted as T (x). x is called the argument or the (independent) variable of T , and T (x) is calledthe value of T at x. X is called the domain of T , and Y the codomain. The set

GT = f(x; y) jx 2 X; y = T (x)g � X � Y

is called the graph of T , where X � Y � f(x; y) jx 2 X; y 2 Y g is the Cartesian product of Xand Y .1 For A � X, the set

T (A) = fT (x) jx 2 Ag � Y

is called the image of A under T , and for B � Y , the set

T�1 (B) = fxjT (x) 2 Bg � X

is called the inverse image of B under T . The set T (X) is called the range of T .

De�nition 2.2 If T (X) = Y the mapping is said to be surjective (or onto), i.e., every elementof the codomain is mapped to by at least one element of the domain; otherwise, T is said to be from

X into Y . If T (x1) = T (x2) =) x1 = x2, then the mapping is said to be injective (or one-to-one/1-1), i.e., every element of the codomain is mapped to by at most one element of the domain.A mapping is bijective (or one-to-one correspondence) if it is both onto and one-to-one.

Remark 2.1 The term function is usually reserved for cases when the codomain is the set of real

numbers. That is why we term utility functions and production functions. The term correspondence

is used for a rule connecting elements of X to elements of Y where the latter are not necessarily

unique. For example, T�1 is a correspondence, but not a function unless T is one-to-one. If

T�1 : T (X)! X is a function, then we call it the inverse function of T . The "correspondence"in one-to-one correspondence is an exceptional use of correspondence.

Remark 2.2 The elements in Y that cannot be achieved by T seem not interesting, so we can kick

them out to make T surjective. In this case, range and codomain mean the same thing. This is why

some books use them interchangeably.

The following is a miscellany of useful facts about mappings.

Theorem 2.1 For a mapping T : X 7�! Y , the following results hold:

1Cartesian means of or relating to the French philosopher René Descartes (1596-1650) - from his Latinized nameCartesius.

3

Figure 1: A & T�1 (T (A)) and T�T�1 (B)

�& B

(i) For a collection fA� � Xg, T (S�A�) =

S� T (A�) ;

(ii) for a collection fB� � Y g, T�1 (S�B�) =

S� T

�1 (B�) ;

(iii) for B � Y , T�1 (Bc) = T�1 (B)c ;

(iv) for A � X, A � T�1 (T (A)) ;

(v) for B � Y , T�T�1 (B)

�� B.

Remark 2.3 (i)-(iii) are easy to understand. (iv) is because T need not be one-to-one, so otherelements in X may have images in T (A). (iv) is because T need not be onto, so the inverse image

of some elements in B may be empty. See Figure 1 for intuitive illustration of (iv) and (v). If T

is bijective, then both (iv) and (v) hold with equality.

De�nition 2.3 (Composite Mapping) Let T : X 7�! Y and U : Y 7�! Z are two mappings.

The composite function (or mapping) U � T : X 7�! Z takes each x 2 X to the element

U (T (x)) 2 Z.

Remark 2.4 For C � Z, (U � T )�1 (C) = T�1�U�1 (C)

�= T�1 � U�1 (C).

3 Point-Set Topology in the Euclidean Space

To de�ne the Euclidean space, we �rst de�ne Rn. Rn is the Cartesian product of R with itself

n times. R1 is the real line; R2 is the plane; R3 is the three-dimensional space. The Euclideanspace is Rn with the Euclidean structure imposed on. To emphasize its Euclidean nature, somemathematicians denote the n-dimensional Euclidean space by En, but we will still use Rn to denoteit with the understanding that the Euclidean structure has been imposed.

4

3.1 Euclidean Spaces

The Euclidean structure is best described by the standard inner product (also known as the dotproduct) on Rn. The inner product of any two real n-vectors x and y is de�ned by

x � y =nXi=1

xiyi = x1y1 + x2y2 + � � �+ xnyn;

where xi and yi are ith coordinates of vectors x and y respectively, and x � y is often written asx0y with x0 meaning the transpose of x or hx;yi. The result is always a real number. The innerproduct of x with itself is always non-negative. This product allows us to de�ne the "length" of a

vector x through square root:

kxk =px � x =

vuut nXi=1

x2i :

This length function satis�es the required properties of a norm and is called the Euclidean normon Rn. Finally, one can use the norm to de�ne a metric (or distance) on Rn by

d (x;y) = kx� yk =

vuut nXi=1

(xi � yi)2:

This distance function is called the Euclidean metric. This distance function (which makes ametric space) is su¢ cient to de�ne all Euclidean geometry, including the dot product. Thus,a real coordinate space together with this Euclidean structure is called Euclidean space. Itsvectors form an inner product space, and a normed vector space, where a vector space (alsocalled a linear space) is a collection of objects called vectors, which may be added together andmultiplied ("scaled") by numbers, called scalars in this context.

Remark 3.1 A space is a set plus some structure on it. So a set imposed a metric, a norm or

an inner product is called a metric space, a normed space or an inner product space, respectively.

A norm must be de�ned on a vector space, so a normed space is also called a normed vector space.

The relationship between these spaces is as follows

Rn � inner product space � normed space � metric space.

Of course, we can naturally de�ne a subspace as a subset of the original space inheriting itsstructure.

The Euclidean norm is also called the L2 norm, or `2 norm. It is equivalent to other Lp norms,

p 2 [1;1], where the Lp norm of x is de�ned as

kxkp =

nXi=1

jxijp!1=p

:

5

Figure 2: Angle in Two-dimensional Euclidean Space

For p = 1 we get the taxicab norm (or the Manhattan norm)Pni=1 jxij, p = 2 corresponds

to the Euclidean norm, and as p approaches 1 the p-norm approaches the in�nity norm or

maximum norm, maxi jxij.At this moment, we need to clarify two points. First, what is the di¤erence between a norm

and an inner product. Second, what does it mean by the L2 norm is equivalent to other Lp

norms. For the �rst question, note that an important inequality in the inner product space is the

Cauchy-Schwarz inequality:jhx;yij � kxk � kyk :

Due to this inequality, we can de�ne

angle(x;y) = arccoshx;yi

kxk � kyk ;

see Figure 2. We assume the value of the angle is chosen to be in the interval [0; �]. If hx;yi = 0,angle(x;y) = �

2 ; we call x is orthogonal to y and denote it as x ? y. Given "orthogonality", wecan de�ne "projection". But for an normed space, we cannot do it.

The second question is more subtle. Formally, two norms k�k1 and k�k2 on a vector space V areequivalent if there exist two positive constants c and C such that 8v 2 V , c k�k2 � k�k1 � C k�k2.One important implication of equivalent norms is that they de�ne the same open sets which we

will explore now.

6

3.2 Open Sets

Open sets are the basic building blocks of the topological structure of Rn. Roughly speaking,topology is about the properties of open sets. To de�ne opens sets, we �rst de�ne open ballswhich form the base for all open sets in the sense that every open set in Rn can be written as aunion of open balls. An n-dimensional open ball (or open sphere) of radius r is the collectionof points of distance less than r from a �xed point in Rn. Explicitly, the open ball with center xand radius r is de�ned by

Br (x) = fy 2 Rnjd(x;y) < rg :

The open ball for n = 1 is called an open interval, and the term open disk is sometimes usedfor n = 2 and sometimes as a synonym for open ball.

De�nition 3.1 (Open Sets) A subset U of Rn is called open if for every x in U there exists an

r > 0 such that Br (x) is contained in U . A neighborhood of the point x is any subset N of Rn

that contains an open ball about x as a subset.

Remark 3.2 A set is open i¤ it contains an open ball around each of its points, i.e., open balls

are indeed the base of all open sets. Intuitively, an open set is "fat".

Remark 3.3 The complement of an open set is called closed.

Remark 3.4 Intuitively speaking, a neighborhood of a point is a set of points containing that pointwhere one can move some amount away from that point without leaving the set. Note that the

neighborhood N need not be an open set itself. If N is open it is called an open neighborhood.Some books require that neighborhoods be open; we will follow this convention.

Remark 3.5 The union of any arbitrary collection of open set fU�g�2A,S�2A U�, is open, where

"arbitrary" means the cardinality of the index set A can be uncountable. Equivalently, the intersec-

tion of any arbitrary collection of closed set fV�g�2A,T�2A V�, is closed. However, the converse

is not correct, i.e., the intersection of an arbitrary collection of open set need not be open although

the intersection of �nite open sets is still open. Equivalently, the union of an arbitrary collection

of closed set need not be closed although the union of �nite closed sets is still closed.

Example 3.1 What isS1n=1[0; 1� 1

n ]? What isT1n=1(0; 1 +

1n)?

Remark 3.6 Given a set [0; 1] � R, it seems that [0; 0:1) is an open set relative to [0; 1]. Generally,given a subset E of Rn, a set U is open relative to E if to each p 2 U there is associated an r > 0such that q 2 U whenever d (p; q) < r and q 2 E. Obviously, any open set in E is an open set in

Rn intersecting E.

De�nition 3.2 (Limit Point) A point p is a limit point of the set E if every neighborhood of

p contains a point q 6= p such that q 2 E.

7

Remark 3.7 Note that a limit point of E need not be in E.

Remark 3.8 If p 2 E and p is not a limit point of E, then p is called an isolated point of E. Ifp is an isolated point of E, then there exists a neighborhood of p not containing other points of E.

Remark 3.9 The closure of E consists of all points in E plus the limit points of E, usually

denoted as E. The point in E is called the closure point or adherent point - "sticking to" aset though not necessarily belonging to it. For a closed set, E = E, i.e., every limit point of E is a

point of E or every point of E is sticked to E.

De�nition 3.3 (Interior Point) A point p 2 E is an interior point of E if there is a neighbor-hood N of p such that N � E. The set of all interior points is called the interior of E, denotedas Eo.

Remark 3.10 Eo � E. For an open set, every point is an interior point, i.e., Eo = E.

Remark 3.11 If p 2 E and p is not an interior point, then p is called a boundary point. Theset of all boundary point is called the boundary of E, denoted as @E. E = Eo [ @E = E [ @E and

Eo = En@E.

Remark 3.12 A boundary point can be approached both from E and from Ec i.e., @E = E \ Ec

or @E consists points p of Rn such that every neighborhood of p contains at least one point of Eand at least one point not of E.

Example 3.2 Let E = f(0; 0)g [ fxj1 � d (x; (0; 0)) < 2g � R2. What are E, Eo, @E, the limitpoints and the isolated points?

3.3 Completeness and Compactness

We will use two key concepts in the �rst-year courses - completeness and compactness. Other

concepts such as separability, connectness, etc, are left for future occasions. For our purpose, we

�rst de�ne the concept "sequence" and its limit.

De�nition 3.4 (Sequence and Subsequence) A sequence fxng1n=1 is a mapping from N, theset of natural numbers, to some range space. Given a sequence fxng, a subsequence of fxng isde�ned as fxnig

1i=1, where n1 < n2 < � � � .

Remark 3.13 A sequence is automatically ordered, but its terms need not be distinct (like a set).Typically the range space is R although can be extended to any other space.

De�nition 3.5 (Convergence and Limit of a Sequence) A sequence fxng1n=1 is said to con-verge if there is a value x 2 R such that 8 " > 0, 9 n0 2 N which may depend on ", such that forall n > n0, jxn � xj < ". x is called the limit of fxng1n=1, and we write xn ! x or lim

n!1xn = x. If

fxng1n=1 does not converge, it is said to diverge.

8

Remark 3.14 Intuitively, limn!1

xn = x means that for n large enough, xn will stay in an arbitrary

small neighborhood of x. This de�nition is similar to the limit point of a set E. Actually, if p is a

limit point of E, then there exists a sequence fxng1n=1 in E such that limn!1

xn = p. Since a set E is

closed i¤ E = E, E is closed i¤ the limit of any converging sequence in E is also in E.

Example 3.3 Does the sequence fxng1n=1 with xn = 1 + (�1)n=n converge?

De�nition 3.6 (Completeness) A metric space (M;d) is complete if every Cauchy sequence inM converges in M . A sequence fxng in a metric space is called a Cauchy sequence if 8 " > 0,9 n0 2 N such that for all m;n > n0, d(xm; xn) < ".

Remark 3.15 Completeness can be used to check the existence of the limit of a sequence without�nd its limit explicitly. A convergent sequence must be Cauchy, but the converse is not true unless

the metric space is complete.

Remark 3.16 A closed subspace of a complete space is complete. Conversely, a complete subset

of a metric space (need not be complete) is closed.

Remark 3.17 Rn is complete, and any closed subset of Rn is complete. A complete inner productspace is called a Hilbert space, so Rn is a Hilbert space. A key feature of R that is not shared byother Hilbert spaces is that it has an ordering.

De�nition 3.7 (Compactness) A set E in a metric space (M;d) is called compact if each ofits open covers has a �nite subcover. Otherwise, it is called non-compact. Explicitly, this meansthat for every arbitrary collection

fU�g�2A

of open subsets of M such that

E �S�2A

U�;

there is a �nite subset J of A such that

E �Si2JUi:

If M is itself compact, (M;d) is said to be a compact space.

Remark 3.18 A closed subset of a compact space is compact, and a �nite union of compact sets

is compact.

Remark 3.19 Compactness can also be characterized by the so-called sequential compactness.Speci�cally, every sequence in E has a convergent subsequence when E is compact.

De�nition 3.8 (Boundedness) A set E is bounded if there is real number r and a point q 2 Esuch that d (p; q) < r for all p 2 E.

9

Remark 3.20 E is bounded means it can be covered by an open ball Br (q) of �nite radius.

Theorem 3.1 (Heine-Borel Theorem) A set E � Rn is compact i¤ it is bounded and closed.

Example 3.4 Show that [0;1) and (0; 1) are not compact by the de�nition of compactness. (Ex-ercise)

4 Single Variable Calculus

In calculus, we study functions with domain being a subset of Rn. The �rst-year economic analysisis typically based on �marginal e¤ect analysis", which is usually captured by the derivative of a

particular function (e.g., marginal utility, marginal cost, marginal revenue, etc.). This is why we

would review properties of continuous and "smooth" functions. The foundation and starting point

of calculus is the concept "limit".

4.1 Limits

We have investigated the limit of a sequence. However, the limit of a sequence need not exist, but

the following two limits, upper limit and lower limit, are always well de�ned. To de�ne these two

concepts, we �rst de�ne supremum and in�mum.

De�nition 4.1 (In�mum and Supremum) For a set A � R, the supremum or least upperbound of A, denoted as supA, is the smallest number y such that y � x for every x 2 A, andthe in�mum or greatest lower bound of A, denoted as inf A, is the largest number y such thatx � y for every x 2 A.

Remark 4.1 Although minA and maxA may not exist, inf A and supA always exist. For example,min

�1n ; n = 1; 2; � � �

does not exist, but inf

�1n ; n = 1; 2; � � �

= 0. Of course, if minA and maxA

exist, then inf A = minA and supA = maxA.

De�nition 4.2 (Upper and Lower Limits) The upper (or superior) limit of a sequence fxng1n=1is

lim supn!1

xn = infnsupm�n

xm:

The lower (or inferior) limit is

lim infn!1

xn = supninfm�n

xm:

Remark 4.2 Alternative notations for upper and lower limits of fxng1n=1 are limn!1xn and lim

n!1xn.

Remark 4.3 The limsup is the eventual upper bound of a sequence. Note that�supm�n xm

1n=1

is

a non-increasing sequence, so its limit always exists although may be +1. Similar arguments applyto liminf. When lim xn = lim xn, then lim

n!1xn exists in the extended real line R � R [ f�1g.

10

We can characterize limn!1

xn and limn!1

xn by the set of subsequential limits.

De�nition 4.3 (Subsequential Limit) If a subsequence of fxng, fxnig, converges, its limit iscalled a subsequential limit of fxng. The set of all possible subsequential limits is denoted as Swhich is a subset of R.

Remark 4.4 For a convergent sequence, S includes only one point, i.e., all subsequences convergeto the same limit.

Theorem 4.1 For a sequence fxng, limn!1

xn = supS and limn!1

xn = inf S, i.e., limn!1

xn is the largest

subsequential limit and limn!1

xn is the smallest subsequential limit.

The limit of a function is similarly de�ned as the limit of sequence.

De�nition 4.4 (Limit of a Function) Let f : [a; b] ! R. For any x 2 [a; b], we claim limt!x

f(t) = y if 8 " > 0, 9 � > 0 which may depend on ", such that jf(t)� yj < " for all t with

jt� xj < �.

Remark 4.5 We can similarly de�ne limt!xf(t) = inf

r>0sup ff(t)j jt� xj < r; t 6= xg, and lim

t!xf(t) =

supr>0

inf ff(t)j jt� xj < r; t 6= xg.

Remark 4.6 For two functions f and g, f(x) = O (g(x)) as x ! a if limx!a

��f(x)g(x)

�� < 1, read as

"f(x) is big-O of g(x)", and f(x) = o (g(x)) as x! a if limx!a

f(x)g(x) = 0, read as "f(x) is little-o (or

small-o) of g(x)", where we assume the ratio f(x)g(x) is well de�ned in taking the limit. So f(x) = O(1)

as x! a means f is bounded in a neighborhood of a, and f(x) = o(1) as x! a means limx!a

f(x) = 0.

We can also de�ne yn = O(xn) and yn = o(xn) for two sequences in a similar way.

4.2 Continuity

De�nition 4.5 (Continuity) Let f : [a; b] ! R. For any x 2 [a; b], f is said to be continuousat x if lim

t!xf(t) = f(x). If f is continuous at every point on [a; b], then f is said to be continuous

on [a; b] and denote f 2 C [a; b].

Remark 4.7 The domain of f can be extended to any set of Rn. Carefully check the de�nition of"limit", x need not stay in the domain of f to de�ne lim

t!xf(t) as long as x is a limit point of f�s

domain. However, to de�ne f to be continuous at x, x must stay in the domain of f .

Remark 4.8 An equivalent de�nition is that f is continuous if the inverse image of every openset is open, where we use the relative topology.

Remark 4.9 Some books de�ne t = x+�, so limt!xf(t) = f(x) is equivalently written as lim

�!0f(x+

�) = f(x). This can be understood as for any sequence �n ! 0, f(x + �n) ! f(x). Such a

notation system can be applied to other de�nitions such as the derivative.

11

Remark 4.10 For x = a or b, we can only de�ne limt#af(t) and lim

t"bf(t). Generally, for any x 2

(a; b), if f(x) = limt"xf(t) � f(x�), we call f is left continuous at x, and if f(x) = lim

t#xf(t) �

f(x+), we call f is right continuous at x, and f(x�) and f(x+) are called the left-hand andright-hand limit of f at x, respectively. If f(x) = f(x+) = f(x�), f is continuous at x.

Remark 4.11 The opposite of continuity is discontinuity. Usually, we will meet two kinds of

discontinuities. If f(x�) and f(x+) exist but are not equal, f is said to have a discontinuity of the�rst kind, or a simple discontinuity, at x. Otherwise the discontinuity is said to be of the second

kind.

Example 4.1 Let f(x) = 1=x, where x 2 R++ � (0;1). 8 " > 0,

jf(x+ �)� f(x)j = �

x (x+ �)< " i¤ � <

"x2

1� "x

and hence the choice of � depends on both " and x. To avoid the dependence on x, we de�ne

uniform continuity.

De�nition 4.6 (Uniform Continuity) Let f : [a; b]! R. If 8 " > 0, 9 � > 0 such that

jx� yj < � =) jf (x)� f (y)j < ":

Theorem 4.2 (Heine-Cantor Theorem) Every continuous function on a compact set is uni-formly continuous.

Another useful continuity concept is so-called Hölder continuity.

De�nition 4.7 (Hölder Continuity) A function f : [a; b]! R is said to beHölder continuousor satisfy a Hölder condition of order � > 0 on [a; b] if there exists a constant M > 0 such that

jf(x)� f(y)j �M jx� yj�

for all x; y 2 [a; b] :

Remark 4.12 Sometimes a Hölder condition of order � is also called a uniform Lipschitz con-dition of order � > 0. Some books call a function satisfying a Hölder condition of order 1 as

Lipschitz continuous and the corresponding M is referred to as a Lipschitz constant.

An important property of a continuous function is the following intermediate value theorem

(IVT).2

Theorem 4.3 (Intermediate Value Theorem) Let f : [a; b] ! R be continuous with f(a) <

f(b). Then for any value M 2 (f(a); f(b)), there is a c 2 (a; b) such that f(c) =M .2Bolzano proved the case with M = 0, so the IVT with M = 0 is also called Bolzano�s Theorem.

12

Figure 3: Intermediate Value Theorem

Remark 4.13 The case with f(a) > f(b) can be similarly stated. Note that c need not be unique;see Figure 3.

Remark 4.14 The intuition of the intermediate value theorem is that the graph of a continuous

function on a closed interval can be drawn without lifting your pencil from the paper.

4.3 Di¤erentiability

De�nition 4.8 (Di¤erentiability) Let f : [a; b]! R. For any x 2 [a; b] form the quotient

�(t) =f(t)� f(x)t� x (a < t < b; t 6= x) ;

and de�ne the derivative of f at x as

f 0(x) = limt!x

�(t)

provided this limit exists. If f 0 is de�ned at a point x, we say f is di¤ erentiable at x. If f 0 isde�ned at every point of [a; b], we say f is di¤ erentiable on [a; b]. If f 0 is further continuous on[a; b], we say f is continuously di¤ erentiable or smooth on [a; b] and denote f 2 C1 [a; b].

Remark 4.15 From this de�nition, the derivative is the limit of local slopes.

Remark 4.16 As in the de�nition of left continuity and right continuity, we can also de�ne theleft derivative of f at x as f 0�(x) = lim

t"xf(t)�f(x)t�x and right derivative as f 0+(x) = lim

t#xf(t)�f(x)t�x .

If f 0�(x) = f0+(x), then f is di¤erentiable at x. At the boundary points of f�s domain, a and b, only

right derivative and left derivative can be de�ned, respectively.

13

Remark 4.17 The notation of f 0(x) is attributed to Newton. The corresponding Leibniz�s notationis dydx or

dfdx(x), where y � f(x). One advantage of Leibniz�s notation is that we can intuitively write

dy = f 0(x)dx. If we want to emphasize that the derivative is taken at a speci�c point, say x0, then

we may write f 0(x0) asdydx

��x=x0

.

Remark 4.18 A function can be continuous, but not di¤erentiable (easy to �nd), or di¤erentiablebut not continuously di¤erentiable (Exercise). For f : [a; b]! R,

continuously di¤erentiable � Lipschitz continuous � �-Hölder continuous � uniformly continuous � continuous.

Also, a Lipschitz continuous function is di¤erentiable almost everywhere and the derivative, if exists,

is bounded by the Lipschitz constant.

Example 4.2 Suppose f(x) = x2. We want to calculate its derivative at point x = 2. Using thede�nition of derivative, we have

f 0(2) = limt!2

f(t)� f(2)t� 2 = lim

t!2

t2 � 22t� 2 = lim

t!2

(t+ 2)(t� 2)t� 2 = lim

t!2(t+ 2) = 4:

While it is useful to use the de�nition of derivative to verify the di¤erentiability of a function

at a point, it is time-consuming to compute derivatives using the de�nition. For particular forms

of functions, it is easier to remember the formula to compute derivatives. The following table

summarizes the derivatives of popular functions.

f(x) f 0(x) f(x) f 0(x)

c 0 exp(x) exp(x)

cx c ax ax ln(a)

x2 2x ln(x) 1=x

xn nxn�1 loga (x) 1= (x ln a)

x�1 �1=x2 sin(x) cos(x)px 1

21px

cos(x) � sin(x)

Table: Derivatives of Popular Functions

Theorem 4.4 Suppose f and g are de�ned on [a; b] and are di¤erentiable at a point x 2 (a; b).Then f + g, fg, and f=g are di¤erentiable at x, and

(i) (Sum Rule) (f + g)0 (x) = f 0(x) + g0(x);

(ii) (Product Rule) (fg)0 (x) = f 0(x)g (x) + f(x)g0(x);

(iii) (Quotient Rule)�fg

�0(x) = f 0(x)g(x)�g0(x)f(x)

g(x)2:

Remark 4.19 In the quotient rule, if f = 1, then we get the reciprocal rule: (1=g)0 (x) =�g0(x)=g(x)2.

14

The sign of the derivative of a function can be used to check whether it is monotone.

De�nition 4.9 (Monotone Functions) A function f is said to be non-decreasing (or in-creasing) if f(y) � f(x) whenever y > x. It is non-increasing (or decreasing) if �f isnondecreasing (increasing). A strictly increasing (strictly decreasing) function changes theabove inequality to be strict. A monotone (or monotonic) function is either non-decreasing ornon-increasing. A strictly monotone function is either strictly increasing or strictly decreasing.

Remark 4.20 To de�ne a monotone function, we must de�ne an order on its domain to makey > x meaningful. This is particularly easy when the domain of f is a subset of R.

Remark 4.21 Some books use "increasing" for our "strictly increasing".

Example 4.3 f(x) = 2x is monotone on R. f(x) = x2 is not monotone on R, but is monotoneon R+ � [0;1).

Theorem 4.5 Let f : [a; b]! R be continuous on [a; b] and di¤erentiable on (a; b).

(i) f 0(x) � 0 for x 2 (a; b) i¤ f(x) is non-decreasing;

(ii) f 0(x) � 0 for x 2 (a; b) i¤ f(x) is non-increasing;

(iii) if f 0(x) > 0 for x 2 (a; b), then f(x) is strictly increasing;

(iv) if f 0(x) < 0 for x 2 (a; b), then f(x) is strictly decreasing.

Remark 4.22 (*) A strictly increasing function f need not have f 0(x) > 0 for any x 2 (a; b).Actually, we can construct some function which is continuous, strictly increasing on [0; 1] but its

derivative vanishes almost everywhere (a.e.) in [0; 1].

The following theorem, known as the "chain rule" for di¤erentiation. It deals with di¤erentiation

of composite functions and is probably the most important theorem about derivatives.

Theorem 4.6 (Chain Rule) If g is a function that is di¤erentiable at a point c and f is a functionthat is di¤erentiable at g(c), then the composite function f�g is di¤erentiable at c, and the derivativeis

(f � g)0 (c) = f 0(g(c)) � g0(c);

or in short, (f � g)0 = (f 0 � g) � g0.

Another important theorem about di¤erentiation is the mean value theorem (MVT).3 The mean

value theorem states, roughly, that given a planar arc between two endpoints, there is at least one

point at which the tangent to the arc is parallel to the secant through its endpoints; see Figure 4.

3Rolle proved the case with f(a) = f(b), so the corresponding MVT is also called Rolle�s Theorem.

15

Figure 4: Mean Value Theorem

Theorem 4.7 (Mean Value Theorem) Let f : [a; b] ! R be continuous on [a; b] and di¤eren-tiable on (a; b). Then there exists some c 2 (a; b) such that

f 0(c) =f(b)� f(a)b� a :

We de�ne derivatives using limits. Actually, we can also �nd limits using derivatives particularly

when the limits involve indeterminate forms.

Theorem 4.8 (L�Hôital�s Rule) Suppose functions f and g are di¤erentiable on an open intervalI except possibly at a point c 2 I. If

limx!c

f(x) = limx!c

g(x) = 0 or �1;

g0(x) 6= 0 for all x 2 I and x 6= c, and limx!c

f 0(x)g0(x) exists, then

limx!c

f(x)

g(x)= limx!c

f 0(x)

g0(x):

Remark 4.23 The di¤erentiation of the numerator and denominator often simpli�es the quotientor converts it to a limit that can be evaluated directly.

Example 4.4 limx!0

sinxx = lim

x!0cosx1 = 1.

16

4.4 Higher-order Derivatives

If f has a derivative f 0 on an interval, and if f 0 is itself di¤erentiable, we denote the derivative of

f 0 by f 00 and call f 00 the second derivative of f . Continuing in this manner, we obtain functions

f; f 0; f 00; f (3); � � � ; f (k);

each of which is the derivative of the preceding one. f (k) is called the kth derivative, or thederivative of order k, of f . In Leibniz�s notation, f (k)(x) = dky

dxk, where y � f(x). In physics,

f 0 and f 00 mean velocity and acceleration, respectively. For a curve, f 0 means its slope and f 00

mean its curvature; that is why monotonicity and concavity of a function (which will be de�ned in

Chapter 3) are related to its �rst and second order derivatives, respectively.

In order for f (k)(x) to exist at a point x, f (k�1)(t) must exist in a neighborhood for x (or in a

one-sided neighborhood, if x is an endpoint of the interval on which f is de�ned), and f (k�1) must

be di¤erentiable at x. Since f (k�1) must exist in a neighborhood of x, f (k�2) must be di¤erentiable

in that neighborhood.

Theorem 4.9 (Taylor�s Theorem) Suppose f : [a; b] ! R, k 2 N, f (k�1) 2 C [a; b], f (k)(t)

exists for any t 2 (a; b). Let �, � be distinct points of [a; b], and de�ne

Pk�1(t) =k�1Xj=0

f (j)(�)

j!(t� �)j :

Then there exists x 2 (�; �)

f(�) = Pk�1 (�) +f (k)(x)

k!(� � �)k:

Remark 4.24 Taylor�s theorem gives an approximation of a k times di¤erentiable function arounda given point by a k-th order Taylor polynomial. The form of the remainder term is attributed

to Lagrange and is called the Lagrange form (or the mean-value form). Another form of the

remainder term takes the integral form:

Rk(�) =

Z �

�

f (k)(t)

(k � 1)!(� � t)k�1dt:

Yet another form is the Peano form of the remainder which writes f(k)(x)k! (��)k as

hf (k)(�)k! + hk(�)

i(��

�)k, where hk(�) satis�es lim�!� hk(�) = 0 or f(�)� Pk (�) = o�j� � �jk

�as � ! �.

Remark 4.25 Note that Taylor�s theorem does not require � close to � although we typically use

it in such a way. Also note that when k = 1, Taylor�s theorem reduces to the mean value theorem.

17

4.5 Integrability

Intuitively, the integral of a function is the area "under" the curve de�ned by the function. Let

f : [a; b]! R. A partition of [a; b] is a �nite sequence P = fxjgnj=0 such that a = x0 < x1 < � � � <xn = b. Each [xi; xi+1] is called a subinterval of the partition. The mesh or norm of a partition

is de�ned to be the length of the longest subinterval, that is,

max fxi � xi�1ji = 1; � � � ; ng :

A tagged partition P (x; t) of an interval [a; b] is a partition together with a �nite sequence ofnumbers t0; � � � ; tn�1 subject to the conditions that for each i, ti 2 [xi; xi+1]. In other words, it is apartition together with a distinguished point of every subinterval. The mesh of a tagged partition

is the same as that of an ordinary partition. The Riemann sum of f with respect to the tagged

partition x0; : : : ; xn together with t0; : : : ; tn�1 is

n�1Xi=0

f(ti) (xi+1 � xi) :

Each term in the sum is the product of the value of the function at a given point and the length of

an interval. Consequently, each term represents the (signed) area of a rectangle with height f(ti)

and width xi+1 � xi. The Riemann sum is the (signed) area of all the rectangles. The Riemannintegral is the limit of the Riemann sums of a function as the partitions get �ner, and is oftendenoted as

R ba f(x)dx. If the limit exists then the function is said to be Riemann integrable. The

Riemann sum can be made as close as desired to the Riemann integral by making the partition �ne

enough.

Riemann integrability can be better characterized by Darboux�s formulation.

Theorem 4.10 Let f : [a; b]! R. For each partition P de�ne

SP f =

n�1Xi=0

Mi (xi+1 � xi) ; sP f =n�1Xi=0

mi (xi+1 � xi) ;

where Mi and mi are the supremum and in�mum of f on [xi; xi+1], and de�ne

Iba (f) = inf

PSP f , Iba (f) = sup

PsP f;

where the in�mum and supremum are taken over all partitions P . If Iba (f) = Iba (f), then f is

Riemann integrable and their common value is the Riemann integralR ba f(x)dx.

Remark 4.26 SP f is called the upper Darboux sum of f with respect to P , and sP f is called

the lower Darboux sum of f with respect to P .

De�nition 4.10 (Antiderivative or Inde�nite Integral) Antiderivative or inde�nite in-tegral of a function f is a di¤erentiable function F whose derivative is equal to the original function

18

Figure 5: Fundamental Theorem of Calculus: Part I

f . This can be stated symbolically as F 0 = f .

The fundamental theorem of calculus is a theorem that links the concept of the derivative

of a function with the concept of the function�s integral. Brie�y, di¤erentiation and integration

are inverse operations. There are two parts to the theorem. Loosely put, the �rst part deals

with the derivative of an antiderivative, while the second part deals with the relationship between

antiderivatives and de�nite integrals.

Theorem 4.11 (Fundamental Theorem of Calculus) Part I: Let f : [a; b]! R be continuous,and F : [a; b]! R be de�ned, for all x 2 [a; b], by

F (x) =

Z x

af(t)dt:

Then, F is uniformly continuous on [a; b], di¤erentiable on the open interval (a; b), and

F 0(x) = f(x)

for all x 2 (a; b).Part II (Newton�Leibniz Axiom): Let f : [a; b] ! R be Riemann integrable, and F : [a; b] ! R

be continuous and F 0(x) = f(x) for all x 2 (a; b). ThenZ b

af(t)dt = F (b)� F (a):

Remark 4.27 Compared with Part I, Part II does not assume that f is continuous. Actually, if we

19

assume f is continuous, Part II is a straightforward corollary of Part I. Part II is often employed

to compute the de�nite integral of a function f for which an antiderivative F is known.

Remark 4.28 When an antiderivative F exists, then there are in�nitely many antiderivatives for

f , obtained by adding an arbitrary constant to F . Also, by Part I of the theorem, antiderivatives

of f always exist when f is continuous.

Remark 4.29 The �rst part of the theorem, sometimes called the �rst fundamental theorem of

calculus, is that the de�nite integration of a function is related to its antiderivative, and can be

reversed by di¤erentiation. This part of the theorem is important also because it guarantees the

existence of antiderivatives for continuous functions. The second part of the theorem, sometimes

called the second fundamental theorem of calculus, is that the de�nite integral of a function can be

computed by using any one of its in�nitely-many antiderivatives. This part of the theorem has key

practical applications because it markedly simpli�es the computation of de�nite integrals.

Integration by parts or partial integration relates the integral of a product of functionsto the integral of their derivative and antiderivative. It is frequently used to transform the anti-

derivative of a product of functions into an antiderivative for which a solution can be more easily

found.

Theorem 4.12 (Integration by Parts) Suppose both F and G : [a; b] ! R are di¤erentiable

functions, F 0 = f , and G0 = g. Then

R baF (x)g(x)dx = F (x)G(x)jba �

R ba f(x)G(x)dx:

Some books writeR baF (x)g(x)dx as

R baF (x)dG(x) and

R ba f(x)G(x)dx as

R baG(x)dF (x). To

understand such notations, we need to de�ne the so-called Riemann-Stieltjes integral. TheRiemann-Stieltjes integral is closely related to the Riemann integral. The Riemann-Stieltjesintegral of a function f : [a; b]! R with respect to a function g : [a; b]! R is denoted byZ b

af(x)dg(x)

and de�ned to be the limit, as the norm of the partition

P = fa = x0 < x1 < � � � < xn = bg

of the interval [a; b] approaches zero, of the approximating sum

S (P; f; g) =

n�1Xi=0

f(ti) (g (xi+1)� g (xi)) ;

where ti 2 [xi; xi+1]. The two functions f and g are respectively called the integrand and the

20

integrator. When g(x) = x, this is exactly the Riemann integral. We can also characterize

Riemann-Stietjes integrability by Darboux sums.

(**) In probability theory, given a random variable X, the expectation or mean of g(X) iswritten as

E [g(X)] �Zg(x)dF (x);

where F (x) is the cumulative distribution function (cdf) of X. When F does not adopt a

density f , we cannot writeRg(x)dF (x) as

Rg(x)f(x)dx, but

Rg(x)dF (x) is still valid even if X is

discrete or a mixture of continuous and discrete.

We further de�ne the Lp, 1 � p � 1, space as�g

��Z jg(x)jp dF (x) = E [jg(X)jp] <1�:

It is a normed space with the norm of f de�ned as kfkp = (E [jg(X)jp])1=p; when p =1, kfk1 =

sup ff(x)g. Especially, the L2 space is a Hilbert space with the inner product of f and g de�ned asZf(x)g(x)dF (x) = E [f(X)g(X)] :

Example 4.5 (Weighted Average Derivative) Suppose E [Y jX = x] = g(x); we are often in-

terested in a weight average derivative of g(x):

� = E

�w(X)

@g(X)

@X

�;

where w(x) is a scalar function. If g(x) = G(x0�), then

� = E

�w(X)

@G(X 0�)

(X 0�)

��

is proportional to � and can be used to estimate � up to scale. If w(x)f(x) is zero on the boundary

of the support of x, say, [a; b], then by Theorem 4.12,

� =

Z b

aw(x)g0(x)f(x)dx =

Z b

aw(x)f(x)dg(x)

= w(x)f(x)g(x)jba �Z b

ag(x) [w(x)f(x)]0 dx = �

Z b

ag(x) [w(x)f(x)]0 dx:

If w(x) = 1, then � = �R ba g(x)f

0(x)dx. In other words, the estimation of average derivative can

be translated to the estimation of average conditional mean which is sometimes easier. (**)

21

5 Multivariable Calculus

When f : E ! Rm, where E is an open set in Rn, we can similarly de�ne whether f is di¤erentiableat a point x 2 E. To understand the derivative of such a general function, note that the derivativef 0(x) in De�nition 4.8 can be equivalently reexpressed as follows: there exists a linear function

f 0(x)h such that

limh!0

r(h)

h� limh!0

f(x+ h)� f(x)� f 0(x)hh

= 0;

where r is for "remainder".

De�nition 5.1 (Di¤erentiability) A function f : E ! Rm is said to be di¤ erentiable at x ifthere exists a linear map J : Rn ! Rm such that

limh!0

kr(h)kRmkhkRn

� limh!0

kf(x+ h)� f(x)� J (h)kkhk = 0;

where J (h) = Jh, and the m�n matrix J is called the Jacobian matrix at x. We write f 0(x) = J.If f is di¤erentiable at every x 2 E, we say f is di¤ erentiable in E.

Remark 5.1 More often, we use Dxf (x) or Df (x) to denote f 0(x). Note that J depends on x asall these notations indicate.

Remark 5.2 When m = n, J is square. Both the matrix and its determinant are referred to as

the Jacobian in literature.

Remark 5.3 The chain rule still holds: If f : E ! Rm is di¤erentiable at x0 2 E, g maps anopen set containing f(E) into Rk, and g is di¤erentiable at f(x0). Then the mapping F of E into

Rk de�ned byF(x) = g(f(x))

is di¤erentiable at x0, and

F0(x0) = g0(f(x0))f

0(x0):

The derivative de�ned above is often called the total derivative of f at x. It may be intriguinghow to �nd J in practice. It turns out that it can be calculated through partial derivatives.

De�nition 5.2 (Partial Derivatives) Let f = (f1; � � � ; fm)0. The partial derivative of fi at xwith respect to the j-th variable is de�ned as

@fi@xj

(x) = limh!0

f(x1; � � � ; xj�1; xj + h; xj+1; � � � ; xn)� f(x1; � � � ; xj�1; xj ; xj+1; � � � ; xn)h

provided the limit exists.

Remark 5.4 @fi@xj

(x) measures how much fi would change when all other variables except xj are

�xed at x and only xj changes a little bit, so it is very useful in economics for "ceteris paribus"

analysis.

22

Remark 5.5 Even if all partial derivatives @fi@xj

(x) exist at a given point x, f need not be (totally)

di¤erentiable, or even continuous in the sense that limh!0

f(x+h) = f(x) (Exercise). On the contrary,

if f is di¤erentiable at x, then all the partial derivatives at x exist, and

J =

0BB@@f1@x1

(x) � � � @f1@xn

(x)...

. . ....

@fm@x1

(x) � � � @fm@xn

(x)

1CCA � @f

@x0(x);

where J depends on x in general.

Remark 5.6 When m = 1, Df (x) is a 1 � n (row) vector; we may intuitively express the totalderivative in the form of total di¤ erential,

dy =@f

@x1(x) dx1 + � � �+

@f

@xn(x) dxn;

or equivalently,dy

dx1=@f

@x1(x) +

@f

@x2(x)

dx2dx1

+ � � �+ @f

@xn(x)

dxndx1

;

where @f@xj

(x) is often denoted as fxj (x). Obviously, the total derivative must take into account of

the change of (x2; � � � ; xn) as x1 changes, which is dramatically di¤erent from the partial derivative.In other words, the path of h! 0 in Rn in the de�nition of total derivative is not restricted whilethe path of h ! 0 in the de�nition of partial derivative @fi=@xj (x) is restricted to be along the

axis (i.e., hj ! 0 and hk = 0 if k 6= j); see Figure 6 for an intuitive illustration of di¤erent pathesof h! 0 in R2 for m = 1.

Remark 5.7 If all partial derivatives exist in a neighborhood of x and are continuous there, thenf is (totally) di¤erentiable in that neighborhood and the total derivative is continuous. In this case,

it is said that f is a C1 function.

In the de�nition of partial derivative, x+ h converges to x along the jth coordinate direction.

In general, we can check the change of f along any direction.

De�nition 5.3 (Directional Derivative) Let f : E ! R, where E is an open set in Rn. Thedirectional derivative of f along a vector v = (v1; � � � ; vn)0 is the function de�ned by the limit

rvf(x) = limh!0

f(x+ hv)� f(x)h

:

Remark 5.8 If the function f is di¤erentiable at x, then the directional derivative exists alongany vector v, and one has

rvf(x) = rf(x) � v

where rf(x) =�@f@x1

(x) ; � � � ; @f@xn (x)�0on the right is called the gradient and � is the dot product.

Note that rf(x) = Df(x)0. Usually, v is normalized such that kvk = 1.

23

0.4 0.3 0.2 0.1 0 0.1 0.2 0.3

0.2

0.1

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

Figure 6: Total Derivative, Partial Derivtive and Directional Derivative: Red Arrow for TotalDerivative, Blue Arrow for Directional Derivative, Black Arrows along y-axis for @f=@x2 (x) andBlack Arrows along x-axis for @f=@x1 (x)

The partial derivative @fi=@xj can be seen as another function de�ned on E and can again

be partially di¤erentiated. To simplify notations, suppose m = 1, i.e., f : E ! R. One maywonder whether the order of di¤erentiation matters, that is, whether @2f

@xj@xi(x) � @

@xj

�@f@xi(x)�

equals @2f@xi@xj

(x) � @@xi

�@f@xj(x)�. This problem is solved in Young�s theorem.

Theorem 5.1 (Young�s Theorem) Let f : E ! R, where E is an open set in Rn. If f hascontinuous second partial derivatives at x, then

@2f

@xj@xi(x) =

@2f

@xi@xj(x):

Remark 5.9 If all mixed second order partial derivatives are continuous at a point (or on a set),f is termed a C2 function at that point (or on that set). Young�s theorem implies that for a C2

function f at x, the Hessian matrix f at x, H (x), is symmetric, where

H (x) �

0BBBBB@@2f@x21

(x) @2f@x1@x2

(x) � � � @2f@x1@xn

(x)

@2f@x2@x1

(x) @2f@x22

(x) � � � @2f@x2@xn

(x)

......

. . ....

@2f@xn@x1

(x) @2f@x22

(x) � � � @2f@x2n

(x)

1CCCCCA =@2f

@x@x0(x);

with the (i; j)th element being @2f@xi@xj

(x). The Hessian matrix of f at x is often denoted as D2xf(x)

24

or D2f(x).

Remark 5.10 As in the case of the Jacobian, the term "Hessian" unfortunately appears to beused both to refer to this matrix and to the determinant of this matrix.

We are now ready to state Taylor�s theorem for multivariate functions. First, we need to

introduce some notations. For � 2 Zn+ and x 2 Rn with Z+ � N [ f0g, de�ne

j�j = �1 + � � �+ �n;�! = �1! � � ��n!;x� = x�11 � � �x�nn :

If all the k-th order partial derivatives of f : Rn ! R are continuous at a 2 Rn, then by Young�stheorem, one can change the order of mixed derivatives at a, so the notation

D�f =@j�jf

@x�11 � � � @x�nn; j�j � k

for the higher order partial derivatives is justi�ed in this situation. The same is true if all the

(k � 1)-th order partial derivatives of f exist in some neighborhood of a and are di¤erentiable ata. Then we say that f is k times di¤erentiable at the point a.

Theorem 5.2 (Taylor�s Theorem for Multivariate Functions) Let f : E ! R be a k timesdi¤erentiable function on an open convex set E � Rn. Then for any two distinct points a and x inE,

f(x) =X

j�j�k�1

D�f (a)

�!(x� a)� +Rk(x);

where

Rk(x) =Xj�j=k

D�f (ca+ (1� c)x) (x� a)�

�!

for some c 2 (0; 1).

Remark 5.11 Rk(x) takes the Lagrange form. We can also state the remainder term in the integralform as

Rk(x) = kXj�j=k

(x� a)�

�!

Z 1

0(1� t)k�1D�f (a+ t (x� a)) dt;

or in the Peano form as Rk(x) =Pj�j=k

hD�f(a)�! + h�(x)

i(x� a)� with lim

x!ah�(x) = 0.

Example 5.1 When k = 1, we get the mean value theorem:

f(x) = f(a) +rf(ca+ (1� c)x) � (x� a)

for some c 2 (0; 1). When k = 2, we can approximate f(x) by a quadratic function in the neigh-borhood of a:

f(a) +rf(a) � (x� a) + 12(x� a)0H (x� a) ;

25

where H is evaluated at ca+ (1� c)x for some c 2 (0; 1).

We can similarly de�ne the multiple integralZ bn

an

� � �Z b1

a1

f(x)dx1 � � � dxn

as in the single variable case. Now, we need to partition each interval [ai; bi], i = 1; � � � ; n, toconstruct subrectangles. In constructing the Riemann sum, replace the width of subintervals by

the volume of subrectangles, and let ti stay in the corresponding subrectangle. Then make the width

of all subintervals uniformly converge to zero and check whether the Riemann sum converges.

Integrals of a function of two variables over a region in R2 are called double integrals, andintegrals of a function of three variables over a region of R3 are called triple integrals. They canbe used to calculate areas and volumes of regions in the plane, respectively.

In single variable calculus, the fundamental theorem of calculus establishes a link between

the derivative and the integral. The link between the derivative and the integral in multivariable

calculus is embodied by the integral theorems of vector calculus: Gradient theorem, Stokes�theorem,

Divergence theorem, and Green�s theorem. In a more advanced study of multivariable calculus, it is

seen that these four theorems are speci�c incarnations of a more general theorem, the generalizedStokes�theorem. These theorems are rarely used in economics, so are neglected in this course.

We �nally state a very useful result that shows how to di¤erentiate under the integral sign.

Theorem 5.3 (Leibniz�s Rule) Let f(x; t) be a function such that both f(x; t) and its partialderivative (@=@t) f (x; t) are continuous in t and x in some region of the (x; t)-plane, including

� (t) � x � � (t), t0 � t � t1. Also suppose that the functions � (t) and � (t) are both continuousand both have continuous derivatives for t0 � t � t1. Then for t0 � t � t1,

F (t) �Z �(t)

�(t)f (x; t) dx

is di¤erentiable and

d

dtF (t) = f (b (t) ; t) b0 (t)� f (a (t) ; t) a0 (t) +

Z b(t)

a(t)ft (x; t) dx:

26

Chapter 1. Point-Set Topology and Calculusweb.hku.hk/~pingyu/6066/LN/LN1_Point-Set Topology... ·...

Documents

Transcript of Chapter 1. Point-Set Topology and Calculusweb.hku.hk/~pingyu/6066/LN/LN1_Point-Set Topology... ·...