[Robert B. Ash] Measure, Integration and Functiona(BookFi.org)

ROBERT B. ASH

Measure, Integration,and Functional Analysis

ROBERT B. ASHUniversity of Illinois

ACADEMIC PRESSNew York and London

COPYRIGHT C 1972, BY ACADEMIC PRESS, INC.ALL RIGHTS RESERVEDNO PART OF THIS BOOK MAY BE REPRODUCED IN ANY FORM,BY PHOTOSTAT, MICROFILM, RETRIEVAL SYSTEM, OR ANYOTHER MEANS, WITHOUT WRITTEN PERMISSION FROMTHE PUBLISHERS.

ACADEMIC PRESS, INC.Ill Fifth Avenue, New York, New York 10003

United Kingdom Edition published byACADEMIC PRESS, INC. (LONDON) LTD.24/28 Oval Road, London NWI 7DD

LIBRARY OF CONGRESS CATALOG CARD NUMBER: 76-159618

AMS (MOS) 1970 Subject Classification: 28-01

PRINTED IN THE UNITED STATES OF AMERICA

Contents

PrefaceSummary of Notation

I Fundamentals of Measure and Integration Theory

1.1 INTRODUCTION

1.2 FIELDS, a-FIELDS, AND MEASURES

1.3 EXTENSION OF MEASURES

1.4 LEBESGUE-STIELTJES MEASURES AND DISTRIBUTION FUNCTIONS

1.5 MEASURABLE FUNCTIONS AND INTEGRATION

1.6 BASIC INTEGRATION THEOREMS

1.7 COMPARISON OF LEBESGUE AND RIEMANN INTEGRALS

2 Further Results in Measure and Integration Theory

2.1 INTRODUCTION

2.2 RADON-NIKODYM THEOREM AND RELATED RESULTS

2.3 APPLICATIONS TO REAL ANALYSIS

2.4 L° SPACES2.5 CONVERGENCE OF SEQUENCES OF MEASURABLE FUNCTIONS

2.6 PRODUCT MEASURES AND FUBINI'S THEOREM

2.7 MEASURES ON INFINITE PRODUCT SPACES

2.8 REFERENCES

V

A CONTENTS

3 Introduction to Functional Analysis

3.1 INTRODUCTION 1133.2 BASIC PROPERTIES OF HILBERT SPACES 1163.3 LINEAR OPERATORS ON NORMED LINEAR SPACES 1273.4 BASIC THEOREMS OF FUNCTIONAL ANALYSIS 1383.5 SOME PROPERTIES OF TOPOLOGICAL VECTOR SPACES 1503.6 REFERENCES 167

4 The Interplay between Measure Theory and Topology

4.1 INTRODUCTION 1684.2 THE DANIELL INTEGRAL 1704.3 MEASURES ON TOPOLOGICAL SPACES 1784.4 MEASURES ON UNCOUNTABLY INFINITE PRODUCT SPACES 1894.5 WEAK CONVERGENCE OF MEASURES 1964.6 REFERENCES 200

Appendix on General Topology

Al INTRODUCTION

A2 CONVERGENCEA3 PRODUCT AND QUOTIENT TOPOLOGIESA4 SEPARATION PROPERTIES AND OTHER WAYS OF CLASSIFYING

TOPOLOGICAL SPACES

AS COMPACTNESSA6 SEMICONTINUOUS FUNCTIONSA7 THE STONE-WEIERSTRASS THEOREMA8 TOPOLOGIES ON FUNCTION SPACESA9 COMPLETE METRIC SPACES AND CATEGORY THEOREMS

A10 UNIFORM SPACES

BIBLIOGRAPHY

Solutions to Problems

Subject Index

201

202

208

211

213

220

223

226

230

234

241

243

279

Preface

The subject matter of this book is fundamental to all areas of mathematicalanalysis and should be accessible to students who are in the early stages oftheir professional training. An undergraduate course in real variables is aprerequisite, and an acquaintance with complex analysis is desirable. How-ever, since no use is made of the Cauchy theory, the exposure to complexvariables need not be extensive. To fully appreciate Chapters 3 and 4 somebackground in elementary point-set topology is essential. The intendedaudience thus consists of mathematics majors, most likely seniors or beginninggraduate students, plus students of engineering and physics who use measuretheory or functional analysis in their work.

The book has been arranged so that it may be used in several ways. Thefirst two chapters present the fundamentals of measure and integration theory,and can serve as the text for a short course in this subject. If time permits,material from Chapter 4, on the interplay between measure theory andtopology, may be added. Chapter 4 is almost independent of Chapter 3; theonly dependence occurs in Theorems 4.3.13 and 4.3.15. If the particulargroup of students has some background in measure theory, Chapter 3 maybe used as a text for an introductory course in functional analysis. There is anAppendix on General Topology, in which those areas of topology that occurin the book are treated. Selections from the appendix and from Chapter 4can be blended into a course covering aspects of topology that are of interest

vu

PREFACE

in analysis. Of course, the entire book may be covered, possibly in a leisurelyone year course.

Problems are given at the end of each section. Fairly detailed solutionsare given to many problems, and instructors may obtain solutions to thoseproblems not worked out in the text by writing the publisher.

" Measure, Integration and Functional Analysis" is actually the first halfof another book. The author's " Real Analysis and Probability" consists ofthe present text and four additional chapters on probability. Thus I have beencareful to include those results that are of particular importance in probability.For example, the generalized product measure theorem (Section 2.6) isdeveloped in such a way that it is immediately applicable to compound experi-ments in probability, where the probability of an event associated with step nof the experiment depends on the result of the first n - I steps. Also, measureson arbitrary product spaces are discussed, and the Kolmogorov extensiontheorem is proved in full generality.

Although the book has a probabilistic flavor and indeed its main purposeis to prepare probability students for later work, nevertheless considerablecare has been taken to make the presentation suitable for the analysis studentwho is not necessarily a probability specialist. The basic training needed forwork in probability theory is quite similar to the background required for thestudy of other areas of modern analysis. There are only a few sections in thebook which can be regarded as specialized, and these sections may be omittedwithout loss of continuity. Specifically, Section 1.4 (Parts 6-10), Section 2.7,and Sections 4.4 and 4.5 may be skipped, although even the nonprobabilistmay encounter this material in his later work.

It is a pleasure to thank Professors Melvin Gardner and Samuel Saslaw,who used the manuscript in their classes and made many helpful suggestions,Mrs. Dee Keel for her expert typing, and the staff at Academic Press for theirencouragement and cooperation.

Summary of Notation

We indicate here the notational conventions to be used throughout thebook. The numbering system is standard; for example, 2.7.4 means Chapter 2,Section 7, Part 4. In the Appendix on General Topology, the letter A is used;thus A6.3 means Section 6, Part 3 of the appendix.

The symbol I will be used to mark the end of a proof.

1 Sets

If A and B are subsets of a set 1, A u B will denote the union of A and B,and A n B the intersection of A and B. The union and intersection of a familyof sets A; are denoted by U; A A. and n; A; A. The complement of A (relative to0) is denoted by Ac.

The statement "B is a subset of A" is denoted by B A; the inclusion neednot be proper, that is, we have A c A for any set A. We also write B c A asA B, to be read "A is an overset (or superset) of B."

The notation A - B will always mean, unless otherwise specified, the setof points that belong to A but not to B. It is referred to as the differencebetween A and B; a proper difference is a set A - B, where B C A.

The symmetric difference between A and B is by definition the union ofA - B and B - A; it is denoted by A AB.

ix

X SUMMARY OF NOTATION

If Al c A2 c - - - and Un°=1 A" - A, we say that the A. form an increasingsequence of sets (increasing to A) and write A. T A. Similarly, if Al ' A2and n 1 A,, = A, we say that the A. form a decreasing sequence of sets(decreasing to A) and write A. I A.

The word "includes" will always imply a subset relation, and the word"contains" a membership relation. Thus if rlf and -9 are collections of sets,"' includes .9 " means that -9 c . Equivalently, we may say that' containsall sets in -9, in other words, each A e.9 is also a member of W.

A countable set is one that is either finite or countably infinite.

2 Real Numbers

The set of real numbers will be denoted by R, and R" will denote n-dimen-sional Euclidean space. In R, the interval (a, b] is defined as {x c- R: a < x 5 b},and (a, oo) as {x c- R: x > a); other types of intervals are defined similarly.If a = (al, ..., a") and b = (b1, ..., b") are points in R", a:!5; b will meana; < b; for all i. The interval (a, b] is defined as {x E R": a; < x; < b;,i = 1, ..., n}, and other types of intervals are defined similarly.

The set of extended real numbers is the two-point compactificationR v {co} u {-co), denoted by R; the set of n-tuples (x1, ..., x"), with eachx; c -,R, is denoted by R". We adopt the following rules of arithmetic in R.

a+oo=co+a=oo, a - oo = -oo + a = -co, aeR,00+oo=co, -00-co=-0o (co - oo is not defined),

co if beR, b>0,b co-co b=(-oo if beR, b<O,a = a = 0, a e R

(00is not definedoo

- 00

0-c0=00.0=0.

The rules are convenient when developing the properties of the abstractLebesgue integral, but it should be emphasized that R is not a field under theseoperations.

Unless otherwise specified (notably in the definition of a positive linearfunctional in Chapter 4), positive means (strictly) greater than zero, and non-negative means greater than or equal to zero.

The set of complex numbers is denoted by C, and the set of n-tuples ofcomplex numbers by C".

SUMMARY OF NOTATION

3 Functions

xJ

iff is a function from S2 to Q' (written as f: Q - S2') and B c C", thepreimage of B under f is given by f -'(B) = {co e S2:f(w) a B). It follows fromthe definition that f -'(U, B,) = U; f -'(B,), f -'(n,B,) = nif -'(Bi),f -'(A - B) = f -'(A) - f -'(B); hence f -'(A`) = [f -'(A)]`. If W is a classof sets, f -'('e) means the collection of sets f -'(B), B e W.

I f f : R -+ R, f is increasing if x < y i mpl ies f (x) < f (y) ; decreasing ifx < yimplies f(x) > fly). Thus, "increasing" and "decreasing" do not have thestrict connotation. If Q - R, n = 1, 2, ..., the f are said to form anincreasing sequence iff fi+1(w) for all n and w; a decreasing sequenceis defined similarly.

If f and g are functions from Q to R, statements such as f 5 g are alwaysinterpreted as holding pointwise, that is, f(co) <g((o) for all co a Q. Similarly,if f,: SI --i R for each i e I, sup; f, is the function whose value at co issup{f,((O): i e 1}.

if f,,f2 , ... form an increasing sequence of functions with limit f[that is,lim".,f"(w) =f(w) for all w], we write f If. (Similarly, f if is used for adecreasing sequence.)

Sometimes, a set such as {co e fl: f(w) < g(w)} is abbreviated as (f:9 g};similarly, the preimage {co a S2: f(w) e B} is written as If e B).

If A c K1, the indicator of A is the function defined by IA(w) = I if w e Aand by 1,,(w) = 0 if co 0 A. The phrase " characteristic function " is often usedin the literature, but we shall not adopt this term here.

If f is a function of two variables x and y, the symbol f(x, ) is used forthe mapping y -+f(x, y) with x fixed.

The composition of two functions X: fl -+ S2' and f: Q' --> 0" is denotedbyfoXorf(X).

If f: S2 -* R, the positive and negative parts of f are defined by fmax(f,0) and f = max(-f, 0), that is,

f.+(w) _ If(w) if f(w) >_ 0,10 if f(w) < 0,

f -(w) - -f (CO) if f(w) : 0,-(0 if f(w)>0.

4 Topology

A metric space is a set i2 with a function d (called a metric) from L x 0to the nonnegative reals, satisfying d(x, y) >- 0, d(x, y) = 0 iff x = y, d(x, y) =

xii SUMMARY OF NOTATION

d(y, x), and d(x, z):!5; d(x, y) + d(y, z). If d(x, y) can be 0 for x:0 y, but dsatisfies the remaining properties, d is called a pseudometric (the term semi-metric is also used in the literature).

A ball (or open ball) in a metric or pseudometric space is a set of the formB(x, r) = {y e fl: d(x, y) < r} where x, the center of the ball, is a point of S2,and r, the radius, is a positive real number. A closed ball is a set of the formB(x,r)={yafl:d(x, y) <r}.

A topological space is a set Q with a collection .°l of subsets of 91, called atopology, such that 0 and Q belong to Jr and .l is closed under finiteintersection and arbitrary union. The members of .0- are called open sets. Abase for is a collection of sets .4 such that each open set is a union of setsin -4.

A neighborhood of a point x e fl is an open set containing x; an over-neighborhood of x is an overset of a neighborhood of x. A base for theneighborhood system at x (or simply a base at x) is a collection 4l of neighbor-hoods of x such that for each neighborhood V of x there is a set U e'fl withU e V. A base for the overneighborhood system at x is defined similarly, withneighborhood replaced by overneighborhood.

If 0 is a topological space, C(S2) denotes the class of continuous real-valued functions, and Cb(.Q) the class of bounded, real-valued, continuousfunctions, on fl. The phrase " lower semicontinuous" is abbreviated LSC, and"upper semicontinuous" is abbreviated USC. Sequences in Q are denoted by{x,,,n= 1,2,...}and nets by{x,,,neD}.

5 Vector Spaces

The terms " vector space " and " linear space " are synonymous. Allvector spaces are over the real or complex field, and the complex field isassumed unless the phrase " real vector space" is used.

A Hamel basis for a vector space L is a maximal linearly independentsubset B of L. (Linear independence means that if xI, . .., x e B, n = 1 , 2, ...and cl, ..., c are scalars, then Fi c; x, = 0 if all c; = 0.) Alternatively, aHamel basis is a linearly independent subset B with the property that eachx e L is a finite linear combination of elements in B. [An orthonormal basis fora Hilbert space (Chapter 3) is a different concept.]

The terms "subspace" and "linear manifold" are synonymous, eachreferring to a subset M of a vector space L that is itself a vector space underthe operations of addition and scalar multiplication in L. If there is a topologyon L and M is a closed set in the topology, then M is called a closed subspace.

SUMMARY OF NOTATION

If B is an arbitrary subset of L, the linear manifold generated by B, denotedby L(B), is the smallest linear manifold containing all elements of B, that is,the collection of finite linear combinations of elements of B. Assuming atopology on L, the space spanned by B, denoted by S(B), is the smallest closedsubspace containing all elements of B. Explicitly, S(B) is the closure of L(B).

6 Zorn's Lemma

A partial ordering on a set S is a relation " < " that is

(1) reflexive: a < a,(2) antisymmetric: if a:5 b and b < a, then a =b, and(3) transitive: if a < b and b < c, then a < c.

(All elements a, b, c belong to S.)

If C c S, C is said to be totally ordered if for all a, b e C, either a:5 b orb:5 a. A totally ordered subset of S is also called a chain in S.

The form of Zorn's lemma that will be used in the text is as follows:

Let S be a set with a partial ordering " <." Assume that every chain C in Shas an upper bound; in other words, there is an element x e S such thatx >_ a for all a e C. Then S has a maximal element, that is, an element m suchthat for each a e S it is not possible to have m::5 a and m 0 a.

1

Fundamentals of Measure and IntegrationTheory

In this chapter we give a self-contained presentation of the basic conceptsof the theory of measure and integration. The principles discussed here andin Chapter 2 will serve as background for the study of probability as well asharmonic analysis, linear space theory, and other areas of mathematics.

1.1 Introduction

It will be convenient to start with a little practice in the algebra of sets.This will serve as a refresher and also as a way of collecting a few resultsthat will often be useful.

Let A,, A2, ... be subsets of a set fl. If A, c A2 c ... and U , An = A,we say that the An form an increasing sequence of sets with limit A, or thatthe An increase to A; we write A. T A. If A, A2 . . . and nn , A. = A,we say that the An form a decreasing sequence of sets with limit A, or thatthe An decrease to A; we write A. . A.

The De Morgan laws, namely, (Un Any= nn An`, (nn An)` = U. An`,imply that

(1) if A I A, then Ac .i Ac; if A J, A, then Ac t Ac.

I

2 1 FUNDAMENTALS OF MEASURE AND INTEGRATION THEORY

It is sometimes useful to write a union of sets as a disjoint union. Thismay be done as follows:

Let A 1, A2, ... be subsets of f2. For each n we have

(2) U,n=1A1=Alu(A1`nA2)v(A1`nA2`nA3)...U(A1'n...An_1 r A,).

Furthermore,

n=1 ('41 n ... n A,- 1 n An).(3) U "0n= I An = UOO

In (2) and (3), the sets on the right are disjoint. If the A. form an increasingsequence, the formulas become

(4) Un=1AI=A1 u(A2_A1)u...tj(An-An-1)

and/(5) Un 1 An = Un 1 (An - A.-,)

(Take A0 as the empty set).

The results (1)-(5) are proved using only the definitions of union, inter-section, and complementation; see Problem 1.

The following set operation will be of particular interest. If A1, A2, .. .are subsets of 0, we define

(6) lim sup A. = nn I Uk n Ak.

Thus co a lim supn An iff for every n, co a Ak for some k >: n, in otherwords,

(7) co e lim supn An iff co e An for infinitely many n.

Also define

(8) Jim inf A. = Un= 1 k=,, Ak.n

Thus co a lim info An iff for some n, co e Ak for all k > n, in other words,

(9) co a lim inf, An iff co e An eventually, that is, for all but finitelymany n.

We call lim sup, An the upper limit of the sequence of sets A andlim inf, An the lower limit. The terminology is, of course, suggested by theanalogous concepts for sequences of real numbers

lim sup x, = inf sup xk,n n k2n

lim inf xn = sup inf xk .n n k-n

1.2 FIELDS, Q-FIELDS, AND MEASURES 3

See Problem 4 for a further development of the analogy.The following facts may be verified (Problem 5):

(10) (lim sup,, A")` = lim inf" A,,'

(11) (lim inf" A,,)` = Jim sup,, A,,'

(12) lim inf" A,, c lim sup,, A,,

(13) If A,,TA thenliminf"A,,=1imsup"A"=A.

In general, if lim inf. A,, = lim sup" A. = A, A is said to be the limit ofthe sequence Al i A2, ... ; we write A = lim,, A,,.

Problems

1. Establish formulas (1)-(5).2. Define sets of real numbers as follows. Let A. _ (-1/n, 1] if n is odd,

and A,, = (-1, 1/n] if n is even. Find lim sup" A. and lim inf. A".3. Let KI = R2, A. the interior of the circle with center at ((-1)"/n, 0) and

radius 1. Find lim sup" A. and lim inf. A,,.4. Let {x"} be a sequence of real numbers, and let A. _ (- co, x"). What is

the connection between lim sup"-,,,, x,, and lim sup,, A. (similarly forlim inf)?

5. Establish formulas (10)-(13).

1.2 Fields, Q-Fields, and Measures

Length, area, and volume, as well as probability are instances of themeasure concept that we are going to discuss. A measure is a set function,that is, an assignment of a number p(A) to each set A in a certain class. Somestructure must be imposed on the class of sets on which p is defined, andprobability considerations provide a good motivation for the type of struc-ture required. If S is a set whose points correspond to the possible outcomesof a random experiment, certain subsets of 0 will be called "events" andassigned a probability. Intuitively, A is an event if the question "Does CObelong to A?" has a definite yes or no answer after the experiment is per-formed (and the outcome corresponds to the point co e fl). Now if we cananswer the question " Is to e A ?" we can certainly answer the question" Is co e A'?", and if, for each i = 1, ..., n, we can decide whether or not CObelongs to A, then we can determine whether or not co belongs to U;=1 A,(and similarly for n, 1 A). Thus it is natural to require that the class of


events be closed under complementation, finite union, and finite intersection;furthermore, since the answer to the question "Is co e S2?" is always "yes,"the entire space 0 should be an event. Closure under countable union andintersection is difficult to justify physically, and perhaps the most convincingreason for requiring it is that a richer mathematical theory is obtained. Weshall have more to say about this point after we give the definition of ameasure. First, we concentrate on the underlying class of sets.

1.2.1 Definitions. Let F be a collection of subsets of a set Q. Then F iscalled a field (the term algebra is also used) if 11 a F and F is closed undercomplementation and finite union, that is,

(a) S e .F.(b) If A e JF then A` e F.(c) IfA,iA2,...,A,, e.F then U10=1AiE.1F.

It follows that F is closed under finite intersection. For if A1, ..., A. e F,then

n n c

n1 Ai =i = U1

Aic) E.F.i =

If (c) is replaced by closure under countable union, that is,

(d) If A1,A2,...e.F,then Ui'IAieF,F is called a a -field (the term a-algebra is also used). Just as above, F' isalso closed under countable intersection.

If .F is a field, a countable union of sets in .F can be expressed as thelimit of an increasing sequence of sets in ,F, and conversely. To see this, notethat if A = Un , A,,, then UA, t A; conversely, if A I A, then A =U , A . This shows that a a-field is a field that is closed under limits ofincreasing sequences.

1.2.2 Examples. The largest a-field of subsets of a fixed set S2 is the collec-tion of all subsets of 0. The smallest a-field consists of the two sets 0 and Q.

Let A be a nonempty proper subset of S2, and let .F = {0, 12, A, A`}.Then .F is the smallest a-field containing A. For if V is a a-field and A e Vr,then by definition of a a-field, Q, 0, and A` belong to V, hence F c 9.But .F is a a-field, for if we form complements or unions of sets in .F, weinvariably obtain sets in .F. Thus .F is a a-field that is included in anya-field containing A, and the result follows.

If Ai, ..., A,, are arbitrary subsets of S2, the smallest a-field containingAi, ..., A. may be described explicitly; see Problem 8.

1.2 FIELDS, a-FIELDS, AND MEASURES 5

If So is a class of sets, the smallest a-field containing the sets of So willbe written as a(.9'), and sometimes called the minimal afield over Y.

Let f1 be the set R of real numbers. Let .F consist of all finite disjointunions of right-semiclosed intervals. (A right-semiclosed interval is a set ofthe form (a, b] = {x: a < x< b}, - oo < a < b < co; by convention we alsocount (a, eo) as right-semiclosed for - oo < a < oo. The convention isnecessary because (- oo, a] belongs to ,F, and if F is to be a field, the com-plement (a, oo) must also belong to .F.) It may be verified that conditions(a)-(c) of 1.2.1 hold; and thus F is a field. But .F is not a a-field; for example,A _ (0, 1 - (1 /n)] E .°r, n = 1, 2, ..., and U' 1 A. _ (0, 1) t .F.

If f is the set R = [- oo, oo] of extended real numbers, then just as above,the collection of finite disjoint unions of right-semiclosed intervals forms afield but not a a-field. Here, the right-semiclosed intervals are sets of the form(a, b] _ {x: a < x < b}, -- oo < a < b < co, and, by convention, the sets[ - oo, b] _ {x: - co < x < b}, - oo < b < oo. (In this case the conventionis necessary because (b, eo] must belong to .F, and therefore the complement[- oo, b] also belongs to .F.)

There is a type of reasoning that occurs so often in problems involvinga-fields that it deserves to be displayed explicitly, as in the following typicalillustration.

If W is a class of subsets of S2 and A e fl, we denote by' n A the class(B n A: B e (}. If the minimal a-field over 1 is a(6) = .F, let us showthat

aA(W n A) = ,F n A,

where aA(W n A) is the minimal a-field of subsets of A over f n A. (In otherwords, A rather than S2 is regarded as the entire space.)

Now W c F, hence t n A e F n A, and it is not hard to verify that9 n A is a s-field of subsets of A. Therefore aA('f n A) c 9 n A.

To establish the reverse inclusion we must show that B n A e a4(' n A)for all B e 9. This is not obvious, so we resort to the following basic reason-ing process, which might be called the good sets principle. Let .9' be the classof good sets, that is, let .9' consist of those sets B e F such that

BcAec4(%'cA).

Since F and aA(l n A) are a-fields, it follows quickly that ,So is a a-field.But W c .9', so that a(W) c .9', hence .F = ,9' and the result follows. Briefly,every set in W is good and the class of good sets forms a a-field; consequently,every set in a('') is good.

One other comment: If f is closed under finite intersection and A E',then 'f n A = {C a W: C e A}. (Observe that if C c A, then C = C n A.)


1.2.3 Definitions and Comments. A measure on a a-field .F is a nonnegative,extended real-valued function p on F such that whenever A,, A2 , ... forma finite or countably infinite collection of disjoint sets in F, we have

p(U An) = Y p(An)n n

If p(S2).= 1, p is called a probability measure.A measure space is a triple (Q, .F, p) where n is a set, F is a a-field of

subsets of S2, and p is a measure on F. If p is a probability measure, (S2, F, p)is called a probability space.

It will be convenient to have a slight generalization of the notion of ameasure on a a-field. Let ,f be afield, p a set function on .F (a map from.F to R). We say that p is countably additive on . if whenever A,, A2, ...form a finite or countably infinite collection of disjoint sets in F whoseunion also belongs to .F (this will always be the case if F is a a-field) we have

p(U An) = Y p(A,3.n n

If this requirement holds only for finite collections of disjoint sets in F, p issaid to be finitely additive on F. To avoid the appearance of terms of theform + oo - oo in the summation, we always assume that + oo and - oocannot both belong to the range of p.

If p is countably additive and p(A) >- 0 for all A e.F, p is called a measureon F, a probability measure if µ(i2) = 1.

Note that countable additivity actually implies finite additivity. For ifp(A) _ + oo for all A e .F, or if p(A) _ - oo for all A E .F, the result isimmediate; therefore assume p(A) finite for some A e .F. By considering thesequence A, 0, 0, ..., we find that p(o) = 0, and finite additivity is nowestablished by considering the sequence A,, ..., An, 0, 0, ..., whereA,, ..., An are disjoint sets in .F.

Although the set function given by p(A) = + oo for all A e .F satisfiesthe definition of a measure, and similarly p(A) = - oo for all A e.F definesa countably additive set function, we shall from now on exclude these cases.Thus by the above discussion, we always have p(Q) = 0.

It is possible to develop a theory of measure with the countable additivityrequirement replaced by the weaker condition of finite additivity. The dis-advantage of doing this is that the resulting mathematical equipment ismuch less powerful. However, a convincing physical justification of countableadditivity has yet to be given. If the probability P(A) of an event A is torepresent the long run relative frequency of A in a sequence of performances

1.2 FIELDS, c-FIELDS, AND MEASURES 7

of a random experiment, P must be a finitely additive set function; but onlyfinitely many measurements can be made in a finite time interval, so countableadditivity is not inevitable on physical grounds. Dubins and Savage (1965)have considered certain problems in stochastic processes using only finitelyadditive set functions, and they assert that for their purposes, finite additivityavoids some of the complications of countable additivity without sacrificingpower or scope. On the other hand, at the present time almost all applica-tions of measure theory in mathematics (and physics and engineering as well)use countable rather than finite additivity, and we shall follow this practicehere.

1.2.4 Examples. Let S2 be any set, and let .F consist of all subsets of 0.Define p(A) as the number of points of A. Thus if A has n members, n =0, 1, 2, ..., then p(A) = n; if A is an infinite set, p(A) = co. The set functionp is a measure on F, called counting measure on S2.

A closely related measure is defined as follows. Let ) = (x1, x2, ...) be afinite or countably infinite set, and let p1, p2 , ... be nonnegative numbers.Take 3F as all subsets of S2, and define

p(A) _ Z pi.xj e A

Thus if A = {xi,, xi2, ...}, then p(A) =pit +pi2 + *. The set function p is ameasure on . and p{xi} = pi, i = 1, 2, .... A probability measure will beobtained if Yi pi = 1; if all pi = 1, then p is counting measure.

Now if A is a subset of R, we try to arrive at a definition of the length of A.If A is an interval (open, closed, or semiclosed) with endpoints a and b,it is reasonable to take the length of A to be p(A) = b - a. If A is a com-plicated set, we may not have any intuition about its length, but we shallsee in Section 1.4 that the requirements that p(a, b] = b - a for all a, b e R,a < b, and that p be a measure, determine p on a large class of sets.

Specifically, p is determined on the collection of Borel sets of R, denotedby I(R) and defined as the smallest ar-field of subsets of R containing allintervals (a, b], a, b e R.

Note that R(R) is guaranteed to exist; it may be described (admittedlyin a rather ethereal way) as the intersection of all a-fields containing theintervals (a, b]. Also, if a a-field contains, say, all open intervals, it mustcontain all intervals (a, b], and conversely. For

(a, b]hn1

(a, b +1)

and (a, b) _ tJ (a, b - 1l.n e=1 n

Thus .f(R) is the smallest a-field containing all open intervals. Similarly wemay replace the intervals (a, b] by other classes of intervals, for instance,

all closed intervals,all intervals [a, b), a, b e R,all intervals (a, oo), a e R,all intervals (a, oo), a e R,all intervals (- oo, b), b e R,all intervals (- oo, b], b e R.

Since a a-field that contains all intervals of a given type contains allintervals of any other type, a(R) may be described as the smallest a-fieldthat contains the class of all intervals of R. Similarly, R (R) is the smallesta-field containing all open sets of R. (To see this, recall that an open set is acountable union of open intervals.) Since a set is open if its complement isclosed, P(R) is the smallest a-field containing all closed sets of R. Finally,if °ro is the field of finite disjoint unions of right-semiclosed intervals (see1.2.2), then -4(R) is the smallest a-field containing the sets of Fo.

Intuitively, we may think of generating the Borel sets by starting withthe intervals and forming complements and countable unions and inter-sections in all possible ways. This idea is made precise in Problem 11.

The class of Borel sets of R, denoted by R(R), is defined as the smallesta-field of subsets of R containing all intervals (a, b), a, b e R. The abovediscussion concerning the replacement of the right-semiclosed intervals byother classes of sets applies equally well to R.

If Ee R(R), R(E) will denote (B a R(R): B c E); this coincides with{A n E: A e R(R)} (see 1.2.2).

We now begin to develop some properties of set functions:

1.2.5 Theorem. Let p be a finitely additive set function on the field

(a) µ(O) = 0.(b) p(AuB)+p(AnB)=p(A)+µ(B)forallA,Be.F.(c) If A, B e .F and B c A, then p(A) = p(B) + p(A -B)

(hence u(A - B) = µ(A) - p(B) if u(B) is finite, and p(B) < p(A) if

p(A - B) >- 0).

(d) If p is nonnegative,

µ(U A) < µ(A;) for all A ..., Ae.F.

If u is a measure,

u (U A) u(An)n=1 n=1

for all A,, A2, ... e . such that U,-=1 An e .F.

PROOF. (a) Pick A e .F such that p(A) is finite; then

p(A) = p(A u 0) = p(A) + µ(s71).

(b) By finite additivity,

u(A) = u(A n B) + p(A - B),p(B) = p(A n B) + p(B - A).

Add the above equations to obtain

p(A) + p(B) = u(A n B) + [p(A - B) + p(B - A) + u(A n B)]=p(AnB)+p(AuB).

(c) We may write A = B u (A - B), hence p(A) = u(B) + p(A - B).(d) We have

n

UAi=A, U(A,`nA2)U(A,`nA2`nA3)Ui=1

u(A,`n...nA'-, nAn)

[see Section 1.1, formula (2)]. The sets on the right are disjoint and

p(Ai`n...nA;-,nA,,):!g p(A,,) by (c).

The case in which p is a measure is handled using the identity (3) of SectionI.I. I

1.2.6 Definitions. A set function u defined on .F is said to be finite if p(A)is finite, that is, not ± co, for each A e .F. If u is finitely additive, it is sufficientto require that µ(i)) be finite; for Q = A U A`, and if p(A) is, say, +oo, so isu(0)

A nonnegative, finitely additive set function p on the field F is said to bea-finite on F if 0 can be written as U' , A,, where the A,, belong to F andp(A,) < oo for all n. [By formula (3) of Section 1.1, the A. may be assumeddisjoint.] We shall see that many properties of finite measures can be extendedquickly to a-finite measures.

It follows from 1.2.5(c) that a nonnegative, finitely additive set function yon a field .F is finite if it is bounded; that is, sup{ p(A) I : A e.fl < oo.


This no longer holds if the nonnegativity assumption is dropped (see Problem4). It is true, however, that a countably additive set function on a a-field isfinite if it is bounded; this will be proved in 2.1.3.

Countably additive set functions have a basic continuity property, whichwe now describe.

1.2.7 Theorem. Let p be a countably additive set function on the a-field _9'.

(a) If A1,A2,...eFand A.TA,then n-'oo.(b) If A1, A2 , .. e F, A,, , A, and µ(A1) is finite [hence is finite

for all n since µ(A1) = µ(A1 - then p(A) as n -+ oo.

The same results hold if F is only assumed to be a field, if we add the hypo-thesis that the limit sets A belong to F. [If A 0 F and p >_ 0, 1.2.5(c) impliesthat increases to a limit in part (a), and decreases to a limit in part (b),but we cannot identify the limit with µ(A).]

PROOF. (a) If oo for some n, then p(A) = µ(A -m +µ(A - A by Ak we find that p(Ak) = oo for allk n, and we are finished. In the same way we eliminate the case in which

oo for some n. Thus we may assume that all are finite.Since the A form an increasing sequence, we may use the identity (5)

of Section 1.1 :

A=A1 u(A2-A1)u...v(A,, -A,,-1)u.. .

Therefore, by 1.2.5(c),

p(A) = µ(A1) + p(A2) - µ(A1) + ... +

lim u(A,,).n-. 00

(b) If A I A, then Al - A. T Al - A, hence µ(A1 - µ(A1 - A) by(a). The result now follows from 1.2.5(c).

We shall frequently encounter situations in which finite additivity of aparticular set function is easily established, but countable additivity is moredifficult. It is useful to have the result that finite additivity plus continuityimplies countable additivity.

1.2.8 Theorem. Let p be a finitely additive set function on the field 9.


(a) Assume that it is continuous from below at each A e .F, that is, ifA1, A2 , ... e .y, A = U I A e F, and A. T A, then p(A) It followsthat p is countably additive on F.

(b) Assume that it is continuous from above at the empty set, that is, ifA1, A2 , ... e .F and A 10, then p is countablyadditive on F.

PRooF. (a) Let A1, A2 , ... be disjoint sets in JV whose union A belongs to9. If B = U"=1 A, then B. T A, hence p(A) by hypothesis. But

p(A) by finite additivity, hence p(A) = µ(A;), thedesired result.

(b) Let A1, A2, ... be disjoint sets in J whose union A belongs to .F,and let B = U7=, A; . By 1.2.5(c), p(A) = p(A - B ); but A - B. 10,so by hypothesis, p(A - 0. Thus p(A), and the result followsas in (a). I

If p, and p2 are measures on the a-field .F, then it =,u1 -- P2 is countablyadditive on F, assuming either it, or P2 is finite-valued. We shall see later(in 2.1.3) that any countably additive set function on a a-field can be expressedas the difference of two measures.

For examples of finitely additive set functions that are not countablyadditive, see Problems 1, 3, and 4.

Problems

1. Let n be a countably infinite set, and let .F consist of all subsets of 0.Define p(A) = 0 if A is finite, p(A) = oo if A is infinite.(a) Show that p is finitely additive but not countably additive.(b) Show that 52 is the limit of an increasing sequence of sets A. with

0 for all n, but p(f2) = oo.2. Let it be counting measure on 0, where S2 is an infinite set. Show that

there is a sequence of sets A 10 with 0.3. Let S2 be a countably infinite set, and let 31' be the field consisting of

all finite subsets of S2 and their complements. If A is finite, set p(A) = 0,and if A' is finite, set p(A) = 1.(a) Show that it is finitely additive but not countably additive on .9.(b) Show that S2 is the limit of an increasing sequence of sets A a ,F

with 0 for all n, but p(Q) = 1.

4. Let .F be the field of finite disjoint unions of right-semiclosed intervalsof R, and define the set function p on ,F as follows.

p(- oo, a] = a, ac-R,

p(a, b] = b - a, a, b e R, a < b,

p(b, oo) = - b, b e R,

p(R) = 0,

(,(;J1 I)i=1

if I, ... , In are disjoint right-semiclosed intervals.

(a) Show that p is finitely additive but not countably additive on F.(b) Show that it is finite but unbounded on F.

5. Let p be a nonnegative, finitely additive set function on the field F. IfA1, AZ , ... are disjoint sets in JIF and Un 1 A. e .F, show that

p(U An) > Y p(A,)n=1 n=1

6. Let f: fI -+ f", and let f be a class of subsets of S2'. Show that

a(f-'(cf)) =f-'(a(`9)),

where f -'(') = (f -'(A): A eW). (Use the good sets principle.)7. If A is a Borel subset of R, show that the smallest a-field of subsets of A

containing the sets open in A (in the relative topology inherited from R)is {Be.V(R):B=A}.

8. Let A1, ..., An be arbitrary subsets of a set Q. Describe (explicitly) thesmallest a-field F containing A1, ..., An. How many sets are there inF? (Give an upper bound that is attainable under certain conditions.)

9. (a) Let' be an arbitrary class of subsets of S2, and let ' be the collec-tion of all finite unions U"= I A , n = 1, 2, ... , where each A. isa finite intersection B1, with B, j a set in 'i or its comple-ment.

Show that T is the minimal field (not a-field) over W.(b) Show that the minimal field can also be described as the collection

2 of all finite disjoint unions Ui=1 A1, where the Al are as above.(c) If .y 1, ..., F. are fields of subsets of S2, show that the smallest

field including .F1, ... , Fn consists of all finite (disjoint) unionsof sets Al m r A. with A, e .F, , 1= 1, ..., n.

10. Let it be a finite measure on the a-field F. If An a .y, n = 1, 2, ... , andA = limn A. (see Section 1.1), show that p(A) = lim, p(An).

1.3 EXTENSION OF MEASURES 13

11. Let '1 be any class of subsets of fl, with 0,!Q e W. Define Wo =for any ordinal a > 0 write, inductively,

', and

lea = (U {HBO : f < a})',

where -9,' denotes the class of all countable unions of differences ofsets in -9.

Let 9 = U{WQ : a < /3,}, where Nt is the first uncountable ordinal,and let g be the minimal a-field over W. Since each Ta c F, we have.' c .F. Also, the Wa increase with a, and' c W. for all a.(a) Show that Y is a a-field (hence .9' =g by minimality of g).(b) If the cardinality of ' is at most c, the cardinality of the reals,

show that card g < c also.

1.3 Extension of Measures

In Section 1.2.4, we discussed the concept of length of a subset of R.The problem was to extend the set function given on intervals by µ(a, b] =b - a to a larger class of sets. If go is the field of finite disjoint unions ofright-semiclosed intervals, there is no problem extending p to go : ifA1, ..., A,, are disjoint right-semiclosed intervals, we set p(Ui=1 A) =Yi=1 p(Ai). The resulting set function on go is finitely additive, but count-able additivity is not clear at this point. Even if we can prove countableadditivity on go, we still have the problem of extending p to the minimala-field over go, namely, the Borel sets.

We are going to consider a generalization of the above problem. Insteadof working only with length, we shall examine set functions given byµ(a, b] = F(b) - F(a) where F is an increasing right-continuous functionfrom R to R. The extension technique to be developed is not restricted toset functions defined on subsets of R; we shall prove a general result concern-ing the extension of a measure from a field g o to the minimal a-field over go.

It will be convenient to consider finite measures at first, and nothing islost if we normalize and work with probability measures:

1.3.1 Lemma. Let go be a field of subsets of a set 0, and let P be a prob-ability measure on F o. Suppose that the sets A1, A2, ... belong to go andincrease to a limit A, and that the sets A1', A2', ... belong to go and increaseto A'. (A and A' need not belong to Fe .) If A A', then

lim limM_ 00 n_00

PROOF. If m is fixed, A. n An' T Am n A' = Am as n - oo, hence

P(Am n A,,') --> P(Am)

by 1.2.7(a). But P(Am n An) < P(A,,') by 1.2.5(c), hence

P(Am) = lim P(Am n AR) < urn P(A,').n- f a-.W

Let m -+ oo to finish the proof.

We are now ready for the first extension of P to a larger class of sets:

1.3.2 Lemma. Let P be a probability measure on the field .moo. Let 9r bethe collection of all limits of increasing sequences of sets in F , that is,A e 9 if there are sets A,, e Fo , n = 1, 2, ..., such that A. T A. (Note that9 can also be described as the collection of all countable unions of sets in.moo ; see 1.2.1.)

Define µ on V as follows. If A,, e .moo , n = 1, 2, ..., A. T A (e 1), setµ(A) = limn-. P(A,,); µ is well defined by 1.3.1, and p = P on .moo. Then:

(a) 0 e I and µ(o) = 0; (1 e 5 and µ(i2) = 1; 0 5 µ(A) < 1 for allAe9.

(b) If G G2 e 1r, then G, u G2, G, n G2 e 9 and µ(G, u G2) +µ'(G1 n G2) = µ(G1) + µ(G2)

(c) If G,, G2 a lr and G, c G2 , then µ(G1) < µ(G2).(d) If G,,E T, n = 1, 2, ... , and G,, T G, then G e'I and µ(G,) -9µ(G).

PROOF. (a) This is clear since µ = P on .moo and P is a probability measure.(b) Let Ant e J F0, Ant T G1; A,,2 a Fo, A,,2 T G2 . We have P(Anl U An2)

+ P(A,, n A,,2) = P(An1) + P(A,,2) by 1.2.5(b); let n -+ oo to complete theargument.

(c) This follows from 1.3.1.(d) Since G is a countable union of sets in .moo, G e V. Now for each n

we can find sets Anm E .moo, m = 1, 2, ..., with Anm T G. as m -+ oo. The situ-ation may be represented schematically as follows:

AllA21

A12 ... Alm T G,

A22 A2m T G2

A,,1 An2... Anm ... TG.

Let Dm = A,m U A2m U . u Amm (the D. form an increasing sequence).The key step in the proof is the observation that

AnmcDmcG. for n<m (1)

and, therefore,P(A"m) < P(D1) < p(Gm) for n< m. (2)

Let m -+ oo in (1) to obtain G,, c Um=, D. c G; then let n -+ oo to concludethat D. T G, hence P(Dm) -+,u(G) by definition of p. Now let m -+ oo in (2)to obtain µ(G,,) < P(Dm) < limm.,, µ(G.); then let n -> oo to con-clude that µ(G") = limm. P(Dm) = µ(G).

We now extend p to the class of all subsets of f ; however, the extensionwill not be countably additive on all subsets, but only on a smaller a-field.The construction depends on properties (a)-(d) of 1.3.2, and not on thefact that p was derived from a probability measure on a field. We expressthis explicitly as follows:

1.3.3 Lemma. Let 5 be a class of subsets of a set 0, µ a nonnegative real-valued set function on W such that IN and p satisfy the four conditions (a)-(d)of 1.3.2. Define, for each A 0,

µ*(A) = inf{p(G): G e T, G c A}.Then :

(a) µ* = µ on 9, 0 < µ*(A) < 1 for all A c f).(b) p*(A U B) + p*(A n B) :!g p*(A) + p*(B); in particular,

µ*(A) + µ*(A`) >- µ*(2) + µ*(0) = µ(Q) + p(O) = 1 by 1.3.2(a).(c) If A c B, then µ*(A) < µ*(B).(d) If A,, T A, then p*(A") -> µ*(A).

PROOF. (a) This is clear from the definition of p* and from 1.3.2(c).(b) If a > 0, choose G1, G2 e T, G, c A, G2 B, such that µ(G,) S

µ*(A) + E/2, µ(G2) < p*(B) + a/2. By 1.3.2(b),

µ*(A) + µ*(B) + s µ(G,) + p(G2) = µ(G, U G2) + µ(G1 n G2)> µ*(A U B) + µ*(A n B).

Since s is arbitrary, the result follows.(c) This follows from the definition of p*.(d) By (c), µ*(A) >_ lim".,, µ*(A"). If e > 0, for each n we may choose

G,, e T, G,, c A , such that

µ(G") < s2-".

Now A = U 1 An e U°= 1 G a wa; hence

u*(A) <_ u* (U Gn) by (c)n-1

_ u U Gn) by (a)n= 1

= lim u(U Gk) by 1.3.2(d).k=1

The proof will be accomplished if we prove that

µ(U G) C u*(An) + e Y=1 i=1

n=1,2, ...

This is true for n = 1, by choice of G1. If it holds for a given n, we apply1.3.2(b) to the sets U"=1 G, and to obtain

n+l n

1n

u(U G1) = u(U Gi) + u (U Gi) n Gn+ 1] .

i=1 1=1 i=1

Now (U 1 Gi) n Gn+1 G. n Gn+1 An n An+1 = An, so that the induc-tion hypothesis yields

>u(nU1 G.) <_ P*(An) +2_,

+ µ*(An+ 1) +e2-<n+ 1> _ u*(An)

1=1 i=1

n+Iu*(An+ 1) + e F 2-`. 1

i=1

Our aim in this section is to prove that a a-finite measure on a field -F.has a unique extension to the minimal a-field over .°Fo. In fact an arbitrarymeasure u on F0 can be extended to a(9o), but the extension is not neces-sarily unique. In proving this more general result (see Problem 3), the follow-ing concept plays a key role.

1.3.4 Definition. An outer measure on S2 is a nonnegative, extended real-valued set function ) on the class of all subsets of S2, satisfying

(a) 1(Q) = 0,

(b) A c B implies 2(A) :S 2(B) (monotonicity),(c) 2(U 1

A < , (countable subadditivity).

The set function u* of 1.3.3 is an outer measure on S2. Parts 1.3.4(a)and (b) follow from 1.3.3(a), 1.3.2(a), and 1.3.3(c), and 1.3.4(c) is proved asfollows:

u*(n= U An) = Jim u*(U Ai) by 1.3.3(d).1 n_00 i=1

< Jim u*(A;) by 1.3.3(b),R

n-+ooj=1

as desired.We now identify a a-field on which u* is countably additive:

1.3.5 Theorem. Under the hypothesis of 1.3.3, let

.;e' ={HcS2:u*(H)+,u*(H`)=1}

(9 c . by 1.3.2(a) and (b) and . i° = {H c Q: u*(H) + u*(H`) < 1) byI.3.3(b)}. Then .° is a a-field and u* is a probability measure on 0.

PROOF. Clearly A' is closed under complementation, and S2 e .° by 1.3.3(a)and 1.3.2(a). If H1, H2 c 0, then by 1.3.3(b),

u*(Hl u H2) + p*(H1 n H2) :: u*(H1) + u*(H2)

and since

(1)

(H1 u H2)` = Hl` n H2`, (H1 n H2)` = Hl` V H2`,

u*(Hl v H2)` + u*(Hj n HZ)` _< u*(HI") + u*(H2`) (2)

If H1, H2 E 0, add (1) and (2); the sum of the left sides is at least 2 by 1.3.3(b),and the sum of the right sides is 2. Thus the sum of the left sides is 2 as well.If a= u*(H1 u H2) + u*(Hi u H2)`, b = u*(H1 n H2) + u*(H, n H2)c, thena + b = 2, hence a:5 I or b:!5; 1. If a:5 1, then a = 1, so b = I also. Con-sequently H1 v H2 e .e and H1 n H2 e . °. We have therefore shown that. is a field. Now equality holds in (I), for if not, the sum of the left sidesof (1) and (2) would be less than the sum of the right sides, a contradiction.Thus p* is finitely additive on Y f.

To show that . is a a-field, let Hn a .)2, n = 1 , 2, ... , H I H; u*(H) +µ*(H`) > 1 by 1.3.3(b). But u*(H) = limn-.0 u*(H,,) by 1.3.3(d), hence forany e > 0, a*(H) < u*(H,,) + e for large n. Since u*(H`) < u*(H,`) for all nby 1.3.3(c), and Hn e ..°, we have u*(H) + u*(H`) < 1 + c. Since a is arbi-trary, He .e, making . a a-field.

Since u* is countably additive by 1.2.8(a). I

We now have our first extension theorem:

1.3.6 Theorem. A finite measure on a field Fo can be extended to a measureon

PROOF. Nothing is lost by considering a probability measure. The result thenfollows from 1.3.1-1.3.5 if we observe that moo c I c -me, hence a(F0) e . r.Thus p* restricted to a(°Fo) is the desired extension.

In fact there is very little difference between a(Fo) and Jr; if B e 0,then B can be expressed as A u N, where A e a(F0) and N is a subset of aset M E a(Fo) with p*(M) = 0. To establish this, we introduce the idea ofcompletion of a measure space.

1.3.7 Definitions. A measure p on a or-field F is said to be complete ifwhenever A e .F and p(A) = 0 we have B e .F for all B e A.

In 1.3.5, p* on . is complete, for if B c A E .-l°, p*(A) = 0, thenp*(B) + p*(B`) < p*(A) + p*(B`) = p*(B`) < 1; thus B c-.*,.

The completion of a measure space (S2, p) is defined as follows. Let.Fµ be the class of sets A u N, where A ranges over .F and N over all subsetsof sets of measure 0 in F.

Now Fµ is a a-field including F, for it is clearly closed under countableunion, and if A u N e F, N e M e ,F, p(M) = 0, then (A u N)` = A` n N` =(A`r M`)u(A`r (N`-M`)) and A`n(N`-M`)=A`n(M-N)cM,so (AuN)`EF,.

We extend p to .F,, by setting u(A u N) = p(A). This is a valid definition,for if A, u N, = A2 u N2 e .°F,,, we have

p(A1) = p(A1 n A2) + p(Al - A2) = p(A1 n A2)

since A, - A2 c N2. Thus p(A,) < p(A2), and by symmetry, p(A1) = p(A2).The measure space (0, p) is called the completion of (S2, F, p), and .Fthe completion of .F relative to p.

Note that the completion is in fact complete, for if M A u N E F uwhere A e .F, p(A) = 0, N e B e F, p(B) = 0, then M c A u B e F,p(A u B) =0; hence M c.f,,.

1.3.8 Theorem. In 1.3.6, (S), A , p*) is the completion of (0, a(PFO), p*).

PROOF. We must show that A' = where .9' = a(JFo). If A e .$f, bydefinition of u*(A) and µ*(A`) we can find sets Gn, Gn' a a(Fo), n = 1, 2,...,with G c A c Gn' and p*(Gn) -+ p*(A), µ*(A). Let G = Un , Gn,G' = n , G,'. Then A = G u (A - G), G e a(.y o), A - G c G' - G eµ*(G' - G) < p*(Gn' - Gn) - 0, so that u*(G' - G) = 0. Thus A e F,,,.

Conversely if B E .Fu then B = A u N, A E g', N c M e g', u*(M) = 0.Since F we have A e ff, and since (0, .yto,.u*) is complete we haveN e. Thus B e ..

To prove the uniqueness of the extension from .moo to R', we need thefollowing basic result:

1.3.9 Monotone Class Theorem. Let Pro be a field of subsets of 0, and W aclass of subsets of CI that is monotone (if An E' and A. T A or An , A, thenA e '). If 9 z) go, then f a(go), the minimal or-field over go.

PROOF. The technique of the proof might be called "boot strapping." LetF = a(9o) and let A be the smallest monotone class containing all sets ofmoo. We show that A = F, in other words, the smallest monotone class andthe smallest a -field over afield coincide. The proof is completed by observingthat AaW.

Fix A e A' and let A'A = {B e A: A r B, A r' BC and A` r B e A}; then.'A is a monotone class. In fact A'A = A'; for if A e go, then Fo c ..ffAsince go is a field, hence A' c '#A by minimality of A; consequently'lfA = ..ff. But this shows that for any B e A we have A n B, A n B`,A` n B e .' for any A e °Fo, so that A'B Fo. Again by minimality of A,A'B=A'.

Now A is a field (for if A, B e A = A'A, then A n B, A n B`,A` n B e A') and a monotone class that is also a field is a a -field (see 1.2.1),hence A is a a-field. Thus g' c A by minimality of Pte, and in fact g' = Abecause g' is a monotone class including Fo .

We now prove the fundamental extension theorem.

1.3.10 Caratheodory Extension Theorem. Let µ be a measure on the field.moo of subsets of CI, and assume that p is a-finite on Fo, so that C1 can bedecomposed as U n , An, where A. e Fo and p(A,) < oo for all n. Then phas a unique extension to a measure on the minimal a-field F over go.

PROOF. Since 5ro is a field, the A. may be taken as disjoint (replace An byA,` n n A'_, n A,,, as in formula (3) of 1.1). Let pn(A) = p(A n An),A e Fo ; then pn is a finite measure on Fo , hence by 1.3.6 it has an extensionpn* to .F. Since A = TLn pn, the set function p* = Y-n pn* is an extension of p,and it is a measure on ,F since the order of summation of any double seriesof nonnegative terms can be reversed.

Now suppose that A is a measure on F and A = p on .°F . Define 1. (A) _(A n An), A e.F. Then A. is a finite measure on `F and A,, = pn = pn* onF., and it follows that An = pn* on F. For'' = {A e.9': ),.(A) = pn*(A)} isa monotone class (by 1.2.7) that contains all sets of .moo. hence ' = JF by1.3.9. But then A = En An = In pn* = p*, proving uniqueness. I

The intuitive idea of constructing a minimal a-field by forming comple-ments and countable unions and intersections in all possible ways suggeststhat if Fo is a field and .F = a(.Fo), sets in .F can be approximated in somesense by sets in F0. The following result formalizes this notion:

1.3.11 Approximation Theorem. Let (11, .F, p) be a measure space, andlet Fo be a field of subsets of fI such that a(.Fo) _ F. Assume that p isa-finite on Fo, and let e > 0 be given. If A e .F and p(A) < oD, there is aset B e .moo such that p(A A B) < e.

PROOF. Let I be the class of all countable unions of sets of F,. The conclu-sion of 1.3.11 holds for any A e QF, by 1.2.7(a). By 1.3.3, if p is finite and A E .F,A can be approximated arbitrarily closely (in the sense of 1.3.11) by a set in V,and therefore 1.3.11 is proved for finite p. In general, let Q be the disjointunion of sets A. a Fo with p(An) < oD, and let pn(C) = p(C n An), C c- .F.

Then pn is a finite measure on .F, hence if A E F, there is a set B. e .Fosuch that pn(A A Bn) < e2 Since

pn(A A Bn) = p((A A B,,) n An)

= p[(A A (Bn n An)) n An] = pn(A A (Bn n An)),

and B. n A. E .moo, we may assume that B. c An. (The observation thatBn n A. E Fo is the point where we use the hypothesis that p is a-finite on,fo, not merely on .F.) If C = Un , Bn, then C n A. = Bn, so that

pn(A A C) = p((A A C) n An) = p((A A Bn) n An) = pn(A A Bn),

hence p(A A C) _ yN pn(A A C) < e. But f "_, Bk - A T C - A as N-* oo,

and A - U,"c=1 Bk J A - C. If A e .F and p(A) < oo, it follows from 1.2.7that p(A A Uk 1 Bk) -> p(A A C) as N -+ oo, hence is less than e for largeenough N. Set B = U Bk a .F0

1.3.12 Example. Let S2 be the rationals, .F° the field of finite disjointunions of right-semiclosed intervals (a, b) = (to E 0: a < co < b}, a, b rational[counting (a, oo) and S2 itself as right-semiclosed; see 1.2.2]. Let F = a(.F0).Then :

(a) F consists of all subsets of 0.(b) If p(A) is the number of points in A (p is counting measure), then p is

a-finite on .F but not on `9`0.(c) There are sets A c =.F of finite measure that cannot be approximated

by sets in .F°, that is, there is no sequence A. E .F° with p(A A A.)-+ 0.(d) If A = 2p, then .. = p on .FO but not on .F.

Thus both the approximation theorem and the Caratheodory extensiontheorem fail in this case.

PROOF. (a) We have {x} = An 1 (x - (1/n), x], and therefore all singletonsare in F. But then all sets are in R since S2 is countable.

(b) Since 0 is a countable union of singletons, p is a-finite on F. Butevery nonempty set in .F° has infinite measure, sop is not a-finite on .F°.

(c) If A is any finite nonempty subset of fl, then p(A n B) = oo for allnonempty B e .Fo, since any nonempty set in F° must contain infinitelymany points not in A.

(d) Since ,l{x) = 2 and p{x} = 1, .I: p on F. But A(A) = p(A) = oo,A e `F° (except for A= 0). 1

Problems

1. Let (S2, .F, p) be a measure space, and let F, be the completion of Frelative to p. If A c S2, define

p0(A) = sup{p(B) : B e F, B c A}, p°(A) = inf{p(B) : B e .F, B A}.

If A e F,,, show that p0(A) = p°(A) = p(A). Conversely, if p0(A) _p°(A) < oo, show that A E .FµF.

2. Show that the monotone class theorem (1.3.9) fails if .F0 is not assumedto be a field.


3. This problem deals with the extension of an arbitrary (not necessarilya-finite) measure on a field.(a) Let 1 be an outer measure on the set S2 (see 1.3.4). We say that the

set E is 1-measurable if

%(A) = i(A n E) + 1(A n E`) for all A c 12.

(The equals sign may be replaced by " > " by subadditivity of A.)If .11 is the class of all 1-measurable sets, show that ..ll is a a-field,and that if E,, E2 , ... are disjoint sets in X whose union is E, andA c S2, we have

1(A n E) = Y 1(A n En).n

(1)

In particular, A is a measure on [Use the definition of 1-measur-ability to show that is a field and that (1) holds for finite se-quences. If E,, E2, ... are disjoint sets in # and F. = U°=1 E. ? E,show that

n

1(A) >_ 1(A n Fn) + 1(A n E`) Y 1(A n E) + 1(A n E),1=1

and then let n -+ oo.](b) Let Ic be a measure on a field Fo of subsets of 0. If A e fl, define

µ*(A) = inf (>.(E):A c U En,n n 111

Show that u* is an outer measure on Cl and that u* = u on Fo.(c) In (b), if f is the class of p*-measurable sets, show that .Fo c .ill.

Thus by (a) and (b), µ may be extended to the minimal a-field over.Fo

(d) In (b), if µ is a-finite on moo, show that (S2, .,lf, Ic*) is the completionof (Cl, a(,o), µ*).

1.4 Lebesgue-Stieltjes Measures and Distribution Functions

We are now in a position to construct a large class of measures on theBorel sets of R. If F is an increasing, right-continuous function from R to R,we set p(a, b] = F(b) - F(a); we then extend p to a finitely additive set func-tion on the field FO(R) of finite disjoint unions of right-semiclosed intervals.If we can show that p is countably additive on O(R), the Caratheodoryextension theorem extends p to G(R).

1.4 LEBESGUE-STIELTJES MEASURES AND DISTRIBUTION FUNCTIONS 23

1.4.1 Definitions. A Lebesgue-Stieltjes measure on R is a measure p onf(R) such that p(1) < oo for each bounded interval I. A distribution functionon R is a map F: R -' R that is increasing [a < b implies F(a) < F(b)] and right-continuous [llmx_,x0 F(x) = F(xo)]. We are going to show that the formulap(a, b] = F(b) - F(a) sets up a one-to-one correspondence between Lebesgue-Stieltjes measures and distribution functions, where two distribution functionsthat differ by a constant are identified.

1.4.2 Theorem. Let p be a Lebesgue-Stieltjes measure on R. Let F:R -> Rbe defined, up to an additive constant, by F(b) - F(a) = p(a, b]. [For example,fix F(O) arbitrarily and set F(x) - F(O) = p(0, x], x > 0; F(0) - F(x) =µ(x,0], x < 0.] Then F is a distribution function.

PROOF. If a < b, then F(b) - F(a) = p(a, b] -> 0. If is a sequence ofpoints such that x, > x2 > -> x, then F(x) p(x, 0 by1.2.7(b). I

Now let F be a distribution function on R. It will be convenient to workin the compact space k, so we extend F to a map of R into R by definingF(oo) = limx, F(x), F(- oo) = 1imx. _ , F(x); the limits exist by monotoni-city. Define p(a, b] = F(b) - F(a), a, b e k, a < b, and let p[- oo, b] = F(b)- F(- oo) = p(- oo, b]; then p is defined on all right-semiclosed intervals ofk (counting [- oo, b] as right-semiclosed; see 1.2.2).

If I,, ..., Ik are disjoint right-semiclosed intervals of R, we definep(U;=, Ij) = Y;_, p(I j). Thus p is extended to the field F (R) of finitedisjoint unions of right-semiclosed intervals of k, and p is finitely additive onwo(k). To show that u is in fact countably additive on .po(R), we make use of

1.2.8(b), as follows:

1.4.3 Lemma. The set function u is countably additive on .FO(R).

PROOF. First assume that F(oo) - F(- oo) < oo, so that p is finite. Let A,,A2, ... be a sequence of sets in y o(R) decreasing to 0. If (a, b] is one of theintervals of A,,, then by right continuity of F, p(a', b] = F(b) - F(a) -F(b) - F(a) = p(a, b] as a' -+ a from above.

Thus we can find sets B e .°.0(R) whose closures B,, (in R) are included inA,,, with p(B,,) approximating p(A,,). If e > 0 is given, the finiteness of pallows us to choose the B,, so that p(A,,) - 82-°. Now n,=, B. = 0,and it follows that nk=, Bk = 0 for sufficiently large n. (Perhaps the easiestway to see this is to note that the sets R - B. form an open covering of the

compact set RR, hence there is a finite subcovering,so that Uk=1(R - Bk) = Rfor some n. Therefore fk=1 Bk = 0.) Now

=.u A,, - k°1B)

+ kl B)

= E1(An - n Bk)k=1

p(U (Ak-Bk)) sincek=1

< p(Ak - Bk) by 1.2.5(d)k=1

< S.

Thus 0.Now if F(oo) - F(- oo) = co, define F(x) IxI < n; FF(x) = F(n),

x >- n; F(-n), x< -n. If p is the set function corresponding toF, , then µ - Y.' , (Problem 5,Section 1.2) so if L 1 co, we are finished. If n 1 co, then

p(A) = limn-oo

W

= lim Yk=1

since the p are finite. Now since 1 p(Ak) < oo, we may write

000 < µ(A) - Y p(Ak)k=1

W= lim Y [pn(Ak) - p(Ak)]

n-ao k=1

< 0 since µ < p.

We now complete the construction of Lebesgue-Stieltjes measures:

1.4.4 Theorem. Let F be a distribution function on R, and let µ(a, b] =F(b) - F(a), a < b. There is a unique extension of p to a Lebesgue-Stieltjesmeasure on R.

PROOF. Extend u to a countably additive set function on .F,(R) as above.Let ,F0(R) be the field of all finite disjoint unions of right-semiclosed intervalsof R [counting (a, oo) as right-semiclosed; see 1.2.2], and extend p to .po(R)

as in the discussion after 1.4.2. [Take p(a, oo) = F(oo) - F(a); p(- oo, b] _F(b) - F(- oc), a, b e R; p(R) = F(oc) - F(- oo); note that there is no otherpossible choice for it on these sets, by 1.2.7(a).] Now the map

if a,beR or if beR, a=-co,(a, oo] -+(a, oo] if a e R or if a = - oo.

sets up a one-to-one, p-preserving correspondence between a subset of °L o(R)(everything in .Fo(R) except sets including intervals of the form [- oo, b])and .po(R). It follows that p is countably additive on .po(R). Furthermore, pis a-finite on .po(R) since p(-n, n] < oo; note that p need not be a-finiteon .FO(K) since the sets (-n, n] do not cover R. By the Caratheodory exten-sion theorem, p has a unique extension to e(R). The extension is a Lebesgue-Stieltjes measure because p(a, b] = F(b) - F(a) < oo for a, b e R, a < b. I

1.4.5 Comments and Examples. If F is a distribution function and p thecorresponding Lebesgue-Stieltjes measure, we have seen that p(a, b] = F(b)- F(a), a < b. The measure of any interval, right-semiclosed or not, may beexpressed in terms of F. For if F(x-) denotes limy_,x_ F(j,), then

(I) p(a, b] = F(b) - F(a), (3) p[a, b] = F(b) - F(a-),(2) p(a, b) = F(b-) - F(a), (4) p[a, b) = F(b-) - F(a-).

(Thus if F is continuous at a and b, all four expressions are equal.) For ex-ample, to prove (2), observe that

p(a, b) = limp (a, b - 1-1= lim [F(b - F(a) = F(b-) - F(a).

n- 00 n n-..0 n

Statement (3) follows because

pfa, b] = lim p(a - 1, bJ = lim[F(b) - F(a -1)] = F(b) - F(a );n_00 n n-'oo LL n

(4) is proved similarly. The proof of (3) works even if a = b, so that p{x} _F(x) - F(x-). Thus

(5) F is continuous at x if p{x} = 0; the magnitude of a discontinuity ofF at x coincides with the measure of {x}.

The following formulas are obtained from (1)-(3) by allowing a toapproach - oo or b to approach + oo.

(6) p(- oo, x] = F(x) - F(- oo), (9) p[x, oo) = F(oo) - F(x ),(7) p(-oo, x) = F(x-) - F(-oo), (10) p(R) = F(oo) - F(-oo).(8) p(x, oo) = F(oo) - F(x),

(The formulas (6), (8), and (10) have already been observed in the proofof 1.4.4.)

If p is finite, then F is bounded; since F may always be adjusted by anadditive constant, nothing is lost in this case if we set F(- co) = 0.

We may now generate a large number of measures on R(R). For example,if f : R R, f >_ 0, and f is integrable (Riemann for now) on any finite interval,then if we fix F(O) arbitrarily and define

xF(x) - F(0) = f0 f(t) di, x > 0;

oF(0) - F(x) = f f (t) dt, x < 0,

x

then F is a (continuous) distribution function and thus gives rise to a Lebes-gue-Stieltjes measure; specifically,

u(a, b] = ff (X)

dx.b

a

In particular, we may take f(x) = 1 for all x, and F(x) = x; then u(a, b) =b - a. The set function u is called the Lebesgue measure on R(R). The com-pletion of R(R) relative to Lebesgue measure is called the class of Lebesguemeasurable sets, written A (R). Thus a Lebesgue measurable set is the unionof a Borel set and a subset of a Borel set of Lebesgue measure 0. The extensionof Lebesgue measure to R(R) is called "Lebesgue measure" also.

Now let u be a Lebesgue-Stieltjes measure that is concentrated on acountable set S = {x1, X2, ...), that is, u(R - S) = 0. [In general if (1), F, ,u)is a measure space and B e .F, we say that p is concentrated on B if p(Q - B)= 0.] In the present case, such a measure is easily constructed: If al, a2, ...are nonnegative numbers and A c R, set p(A) = F{a; : x, e A}; p is a measureon all subsets of R, not merely on the Bore] sets (see 1.2.4). If u(I) < oo foreach bounded interval I, u will be a Lebesgue-Stieltjes measure on .'(R);if Y; a, < oc, p will be a finite measure. The distribution function F corre-sponding to u is continuous on R - S; if a > 0, F has a jump at xof magnitude a.. If x, y e S and no point of S lies between x and y, then F isconstant on [x, y). For if x < b < y, then F(b) - F(x) = p(x, b] = 0.

Now if we take S to be the rational numbers, the above discussion yieldsa monotone function F from R to R that is continuous at each irrationalpoint and discontinuous at each rational point.

If F is an increasing, right-continuous, real-valued function defined on aclosed bounded interval [a, b], there is a corresponding finite measure u onthe Borel subsets of [a, bJ; explicitly, u is determined by the requirement thatp(a', b'J = F(b') - F(a'), a < a' < b' < b. The easiest way to establish the

correspondence is to extend F by defining F(x) = F(b), x >- b ; F(x) = F(a),x< a; then take p as the Lebesgue-Stieltjes measure corresponding to F,restricted to 2[a, b].

We are going to consider Lebesgue-Stieltjes measures and distributionfunctions in Euclidean n-space. First, some terminology:

1.4.6 Definitions and Comments. If a = (al, ..., a"), b = (b1i ..., b") E R",the interval (a, b] is defined as {x = (x1, ... , x") a R": ai < x1:!9 bi for all i =1, ..., n}; (a, oo) is defined as {x a R": x1 > ai for all i = 1, ..., n}, (- co, b]as {x a R": x; < bi for all i = 1, ..., n}; other types of intervals are definedsimilarly. The smallest a-field containing all intervals (a, b], a, b e R", is calledthe class of Borel sets of R", written R(R"). The Borel sets form the minimala-field over many other classes of sets, for example, the open sets, the intervals[a, b), and so on, exactly as in the discussion of the one-dimensional case in1.2.4. The class of Borel sets of R", written f(R"), is defined similarly.

A Lebesgue-Stieltjes measure on R" is a measure p on a(R") such thatp(I) < oo for each bounded interval I.

The notion of a distribution function on R", n >- 2, is more complicatedthan in the one-dimensional case. To see why, assume for simplicity that n = 3,and let p be a finite measure on d(R3). Define

F(x1, X2, x3) = p{w e R3: wl G x1, C02 < X2, 0)3 < X3},

(x1, X2, x3) a R3.

By analogy with the one-dimensional case, we expect that F is a distributionfunction corresponding top [see formula (6) of 1.4.51. This will turn out to becorrect, but the correspondence is no longer by means of the formula p(a, b] _F(b) - F(a). To see this, we compute p(a, b] in terms of F.

Introduce the difference operator A as follows: If G: R'-+ R,Lb,a, G(xl,... , x") is defined as

G(xl, ..., xi-1, b1, xi+1, ..., x") - G(xl,..., xi-1, ai, xi+1, ..., x").

1.4.7. Lemma. If a < b, that is, a, < bi, i = 1, 2, 3, then

(a) p(a, b] = Lb,a, LLb2 aax b3 a3 F(x1, X2, x3), where((b) X3)

= F(b1,, b2, b3) - F(al, b2, b3) - F(bl, a2, b3) - F(b1, b2, a3)+ F(a1, a2, b3) + F(al, b2, a3) + F(bl, a2, a3)

- F(a1, a2, a3)

Thus p(a, b] is not simply F(b) - F(a).

28

PROOF.

1 FUNDAMENTALS OF MEASURE AND INTEGRATION THEORY

(a) Lb3"3 F(x1, x2 , x3)

Similarly,

and

= F(x1, X2, b3) - F(x1, x2 , a3)

X 1 cot 5x2, w3 < b3}- u{w: w1 < x1i w2 < x2, w3 < a3}

= JL{w: w1 C X1, w2 < X2, a3 < w3 <b3}

since a3 < b3.

Z\b2 az Obi a3 F(x1, x2 , x3) _ p{w: (01 < x1,

a3 < w3 < b3}

a2 < w2 < b2 ,

Obla, L\bz az L\b3 a3 F(X1, X2, x3) = p{w: a1 < w1 < b1, a2 < w2 < b2 ,

a3 < w3 < b3}.

(b) L b,a,F(x1,x2,x3)=F(xl,x2,b3)-F(xl,x2,a3),.b2 a20b, a3 F(x 1, x2 , x3) = F(x1, b2, b3) - F(x 1, a2, b3)

- F(x1, b2, a3) + F(x1, a2, a3)

Thus Qbla, Qb, az Qb, a, F(x1, x2 , x3) is the desired expression. I

The extension of 1.4.7 to n dimensions is clear.

1.4.8 Theorem. Let u be a finite measure on P4(R") and define

F(x)=p(-oo,x]=u{w:w;< x,, i= 1,...,n}.If a < b, then

(a) u(a, b] _ Ab,a, ... Lb a" F(x 1 i ... , xa), where

(b) Ob,a, ... Qb.aJ(Xl,...,xa)=Fo-F1+F2-F3+...+(-1)nFai

Fi is the sum of all (") terms of the form F(c1, ... , ca) with Ck = atfor exactly i integers in {1, 2, ..., n}, and ck = bk for the remaining n - iintegers.

PROOF. Apply the computations of 1.4.7. 1

We know that a distribution function on R determines a correspondingLebesgue-Stieltjes measure. This is true in n dimensions if we change thedefinition of increasing.

Let F: R" -- R, and, for a < b, let F(a, b] denote

Lbia, . F(Xl, ... , X").

The function F is said to be increasing if F(a, b] > 0 whenever a < b;F is right-continuous if it is right-continuous in all variables together, in otherwords, for any sequence x' >- x2 >- >- xk z - - -> x we have F(xk) -+ F(x).

An increasing right-continuous F: R"-+ R is said to be a distributionfunction on R. (Note that if F arises from a measure p as in 1.4.8, F is a dis-tribution function.)

If F is a distribution function on R", we set p(a, b] = F(a, b] (this reducesto F(b) - F(a) if n = 1). We are going to show that it has a unique extensionto a Lebesgue-Stieltjes measure on R. The technique of the proof is the samein any dimension, but to avoid cumbersome notation and to capture theessential ideas, we sometimes specialize to the case n = 2. We break the argu-ment into several steps:

(1) If a < a' < b' < b, I = (a, b] is the union of the nine disjoint inter-vals I,, ..., 19 formed by first constraining the first coordinate in one of thefollowing three ways:

a, <x<a1', a,' <x<b,', b1' <x<b1,

and then constraining the second coordinate in one of the following threeways:

a2 < y < a2', a2' < y G b2', b2' < y < b2.

For example, a typical set in the union is

{(x, y): b1' < x < b1, a2 < y S a2,};

in n dimensions we would obtain 3" such sets.The result (1) may be verified by looking at Fig. 1.1.

(2) In (1), F(1) F(1,), hence a < a' < b' < b implies F(a', b']< F(a, b].

This is verified by brute force, using 1.4.8.Now a right-semiclosed interval (a, b] in R" is, by convention, a set of the

form {(x1, ... , x"): a; < x,:5 b, , i = 1, ... , n}, a, b e R", with the provisothat ai < x; < b; can be replaced by a; < x, < b, if a, _ - oo. With thisassumption, the set ,°Fo(R") of finite disjoint unions of right-semiclosed inter-vals is a field. (The corresponding convention in R" is that a, < x; < b; can

b2

t,

02

02 0I'b,G1

X

Figure 1.1.

be replaced by a; < x; < bi if b, = + oo. Both conventions are dictated byconsiderations similar to those of the one-dimensional case; see 1.2.2.)

(3) If a and b belong to R" but not to R", we define F(a, b] as the limit ofF(a,' b'] where a', b' e R", a' decreases to a, and b' increases to b. [The defini-tion is sensible because of the monotonicity property in (2).] Similarly if a e R",beR"-R", we take F(a, b] =limb..,bF(a,b'];ifaeR"-R", beR",F(a,b}=

b].

Thus we define p on right-semiclosed intervals of R"; p extends to a finitelyadditive set function on as in the discussion after 1.4.2. [There is aslight problem here; a given interval I may be expressible as a finite disjointunion of intervals I,, ..., I so that for the extension to be well defined wemust have F(1) = F j=, F(I); but this follows just as in (2).]

(4) The set function p is countably additive on `;N O(R").

First assume that p(R") is finite. If a e R", F(a', b] -- F(a, b} as a' decreasesto a by the right-continuity of F and 1.4.8(b); if a E R" - R", the same resultholds by (3). The argument then proceeds word for word as in 1.4.3.

Now assume p(R") = oo. Then F, restricted to Ck = {x: - k < x; < k,i = I, ..., n), induces a finite-valued set function µk on PFO(R) that is concen-trated on Ck, so that uk(B) = pk(B n Ck), B e .,zo(R"). Since µk < p and pk -+ ,u

on .°Fo(R"), the proof of 1.4.3 applies verbatim.

1.4.9 Theorem. Let F be a distribution function on R, and let µ(a, b] =F(a, b], a, b e R", a S b. There is a unique extension of p to a Lebesgue-Stieltjes measure on R".

PROOF. Repeat the proof of 1.4.4, with appropriate notational changes. Forexample, in extending y to F o(R"), the field of finite disjoint unions of right-semiclosed intervals of R", we take (say for n = 3)

p{(x, y, z): a1 < x < b1, a2 < y < oo, a3 < z < oo} = lim F(a, b].b2, b3-. co

The one-to-one p-preserving correspondence is given by

(a, b] -- (a, b] if a, b e R"or if b e R" and at least one component of

ais -oo;

also, if the interval {(x1, ..., x"): a; < x; <- bi: i = 1, ..., n) has some bt = oo,the corresponding interval in R" has a; < xt < oo. The remainder of the proofis as before. I

1.4.10 Examples. (a) Let F1, F2, ..., F" be distribution functions on R,and define F(xl, ..., x") = F1(x1)F2(x2) ... F"(x"). Then F is a distributionfunction on R" since

"F(a, b] = 11 [F;(bi) - Fi(a)]

i=1

In particular, if F,(xt) = xt, i = 1, ..., n, then each F: corresponds to Lebes-gue measure on J(R). In this case we have F(xl, ..., x") = x1x2 x" andµ(a, b) = F(a, b] = 11;=1 (bi - at). Thus the measure of any rectangular boxis its volume; p is called Lebesgue measure on R(R"). Just as in one dimension,the completion of R(R") relative to Lebesgue measure is called the class ofLebesgue measurable sets in R, written R(R").

(b) Let f be a nonnegative function from R" to R such that

00 00

'... f_0Df(xl,..., x") dx1 ... dx" < oo.f_

(For now, we assume the integration is in the Riemann sense.) Define

F(x) = f f(t) dt,(-a0,x1

that is,

f f ...f f(t1,...,t")dt1...dt".

ao - ao

Then

Ab"an F(x1, ... , x") = J _ ... f_J

f(t1, ..., t") dt1 ... dt",ao 00 .f

and we find by repeating this computation that

b, bF(a,b]= f f f(t,,...,t")dt1 dt".

a,

Thus F is a distribution function. If p is the Lebesgue-Stieltjes measure de-termined by F, we have

p(a, b] = fa,

dx.(a, bl

We have seen that if F is a distribution function on R", there is a uniqueLebesgue-Stieltjes measure determined by p(a, b] = F(a, b), a < b. Also,if p is a finite measure on R(R") and F(x) = p(- co, x], x e R, then F is a dis-tribution function on R" and p(a, b] = F(a, b], a < b. It is possible to associatea distribution function with an arbitrary Lebesgue-Stieltjes measure on R",and thus establish a one-to-one correspondence between Lebesgue-Stieltjesmeasures and distribution functions, provided distribution functions with thesame increments F(a, b], a, b e R", a < b, are identified. The result will notbe needed, and the details are quite tedious and will be omitted.

Problems

1. Let F be the distribution function on R given by F(x) = 0, x < -1;F(x) = 1 +x, -1 <x <0; F(x) =2+x2, 0:5 x <2; F(x) = 9, x2.If p is the Lebesgue-Stieltjes measure corresponding to F, computethe measure of each of the following sets:(a) {2}, (d) [0, #) u (1, 2],(b) [- #, 3), (e) {x: I x I + 2x2 > 1).(c) (-1,0]u(1,2),

2. Let p be a Lebesgue-Stieltjes measure on R corresponding to a contin-uous distribution function.(a) If A is a countable subset of R, show that µ(A) = 0.(b) If p(A) > 0, must A include an open interval?(c) If p(A) > 0 and p(R - A) = 0, must A be dense in R?(d) Do the answers to (b) or (c) change if p is restricted to be Lebesgue

measure?3. If B is a Borel set in R" and a e R", show that a + B = {a + x: x e B}

is a Bore] set, and -B = {-x: x e B) is a Bore] set. (Use the good setsprinciple.)

4. Show that if B c -2(R"), a e R", then a + B E R(R") and p(a + B) = II(B),where p is Lebesgue measure. Thus Lebesgue measure is translation-invariant. (The good sets principle works here also, in conjunctionwith the monotone class theorem.)

5. Let p be a measure on .(R") such that p(a + 1) = p(1) for all a c- R"and all (right-semiclosed) intervals I in R". In other words, p is transla-tion-invariant on intervals. Show that p is a constant times Lebesguemeasure.

6. (A set that is not Lebesgue measurable) Call two real numbers x andy equivalent if x - y is rational. Choose a member of each distinctequivalence class Bx = {y: y - x rational) to form a set A; assume thatthe representatives are chosen so that A c [0, 1]. Establish the following:

(a) If r and s are distinct rational numbers, (r + A) n (s + A) = 0;also R = U{r + A: r rational}.

(b) If A is Lebesgue measurable (so that r + A is Lebesgue measurableby Problem 4), then p(r + A) = 0 for all rational r (p is Lebesguemeasure). Conclude that A cannot be Lebesgue measurable.

The only properties of Lebesgue measure needed in this problem aretranslation-invariance and finiteness on bounded intervals. Therefore,the result implies that there is no translation-invariant measure A(except A 0) on the class of all subsets of R such that A(I) < oo foreach bounded interval I.

7. (The Cantor ternary set) Let E, be the middle third of the intervalto, 1], that is, E, _(-], J); thus x c- [0, 1] - E, iff x can be written internary form using 0 or 2 in the first digit. Let EZ be the union of themiddle thirds of the two intervals that remain after E, is removed, thatis, E2 = (9, 9) u (9, 9); thus x e [0, 1] - (E, v E2) if x can be writ-ten in ternary form using 0 or 2 in the first two digits. Continue theconstruction; let E. be the union of the middle thirds of the intervalsthat remain after E,, ..., E"_, are removed. The Cantor ternary set Cis defined as [0, 1] - U , E" ; thus x c- C if x can be expressed in

34 I FUNDAMENTALS OF MEASURE AND INTEGRATION THEORY

ternary form using only digits 0 and 2. Various topological propertiesof C follow from the definition : C is closed, perfect (every point of C isa limit point of C), and nowhere dense.

Show that C is uncountable and has Lebesgue measure 0.Comment. In the above construction, we have m(E") = (j)(-3)n-1,

n = 1, 2,..., where m is Lebesgue measure. If 0 < a < 1, the proceduremay be altered slightly so that in (E") = a(-})". We then obtain a setC(a), homeomorphic to C, of measure I - a; such sets are called Cantorsets of positive measure.

8. Give an example of a function F: R2 - R such that Fis right-continuousand is increasing in each coordinate separately, but F is not a distribu-tion function on R.

9. A distribution function on R is monotone and thus has only countablymany points of discontinuity. Is this also true for a distribution functiononR",n> 1?

10. (a) Let F and G be distribution functions on R". If F(a, b] = G(a, b]for all a, b e R", a< b, does it follow that F and G differ by aconstant?

(b) Must a distribution function on R" be increasing in each coordinateseparately?

11. If c is the cardinality of the reals, show that there are only c Borel sub-sets of R", but 2` Lebesgue measurable sets.

12. The following result shows that under appropriate conditions, a Borelset can be approximated from below by a compact set, and from aboveby an open set. Problems of this type will be examined systematically inChapter 4.

If u is a a-finite measure on J(R"), show that for each B e R(R"),(a) u(B) = sup{ii(K): K c B, K compact).

If p is in fact a Lebesgue-Stieltjes measure, show that(b) p(B) = inf{p(V): V B, B open).(c) Give an example of a a-finite measure on R(R") that is not a Lebes-

gue-Stieltjes measure and for which (b) fails.

1.5 Measurable Functions and Integration

If./'is a real-valued function defined on a bounded interval [a, b] of reals,we can talk about the Riemann integral off, at least if f is piecewise con-tinuous. We are going to develop a much more general integration process,one that applies to functions from an arbitrary set to the extended reals, pro-vided that certain "measurability" conditions are satisfied.

1.5 MEASURABLE FUNCTIONS AND INTEGRATION 35

Probability considerations may again be used to motivate the concept ofmeasurability. Suppose that (f), .`, P) is a probability space, and that h is afunction from SZ to R. Thus if the outcome of the experiment corresponds tothe point w e 52, we may compute the number h(w). Suppose that we are in-terested in the probability that a< h(w) < b, in other words, we wish tocompute P{w: h(w) a B) where B = [a, b]. For this to be possible, the set{w: h(w) a B} = h-'(B) must belong to the a-field F. If h- l(B) a F for eachinterval B (and hence, as we shall see below, for each Borel set B), then h isa " measurable function," in other words, probabilities of events involvingh can be computed. In the language of probability theory, h is a "randomvariable."

1.5.1 Definitions and Comments. If h: 52, '12, h is measurable relative tothe a-fields Ft of subsets of Sgt, j = 1, 2, iff h-'(A) a for each A 6 F2.

It is sufficient that h-'(A) e. ', for each A e f, where T is a class of sub-sets of S22 such that the minimal a-field over f is F2 . For {A C_ F2 : h-1(A)a F,} is a a-field that contains all sets of', hence coincides with F2. This isanother application of the good sets principle.

The notation h: (521, -5F1) -+ (02, F2) will mean that h: S2, --+ f22, measur-able relative to ffl , and 2 .

If F is a a-field of subsets of 52, (0, F) is sometimes called a measurablespace, and the sets in .F are sometimes called measurable sets.

Notice that measurability of h does not imply that h(A) e F 2 for eachA e . F. For example, if '2 = {0, Q2}, then any h: 521 -+ 02 is measurable,regardless of F,, but if A e .`N, and h(A) is a nonempty proper subset of 02,then h(A) 0 F2. Actually, in measure theory, the inverse image is a muchmore desirable object than the direct image since the basic set operations arepreserved by inverse images but not in general by direct images. Specifically,we have h-'(U, B,) = U1 h-'(B,), h-'(f, B;) = f , h-'(B1), and h-' (B`) _[h -'(B)]'. We also have h(U; A,) = U i h(A;), but h(f, A;) e ni h(A,), andthe inclusion may be proper. Furthermore, h(A`) need not equal (h(A)]`, infact when h is a constant function the two sets are disjoint.

If (52, .9r) is a measurable space and h: 52 -+ R" (or R"), h is said to be Borelmeasurable [on (0, .F)] if h is measurable relative to the a-fields .F and -4,the class of Borel sets. If f) is a Bore] subset of Rk (or Rk) and we use the term" Borcl measurable," we always assume that.F = 9.

A continuous map h from Rk to R" is Borel measurable; for if ' is theclass of open subsets of R", then h-1(A) is open, hence belongs to R(R"),for each A e W.

If A is a subset of R that is not a Borel set (Section 1.4, Problems 6 and 11)and 1A is the indicator of A, that is, IA(w) = 1 for w e A and 0 for w 0 A,then IA is not Borel measurable; for {w: IA(w) = 11 = A 0 R(R).

To show that a function h: S2 - R (or R) is Borel measurable, it is sufficientto show that {(o: h(co) > c} e F for each real c. For if W is the class ofsets {x: x > c}, c E R, then a(') = a(R). Similarly, {w: h(w) > c} can be re-placed by {(o: h(w) > c}, {w: h((o) < c) or {w: h(w) < c}, or equally wellby {w: a < h(w) < b} for all real a and b, and so on.

If (0, .F, fi) is a measure space the terminology "h is Borel measurableon (i2, F, p) " will mean that h is Borel measurable on (Q, F) and p is ameasure on .F.

1.5.2 Definition. Let (S2, .F) be a measurable space, fixed throughout thediscussion. If h: S2 -+ R, his said to be simple iff h is Borel measurable and takeson only finitely many distinct values. Equivalently, h is simple if it can bewritten as a finite sum Y;= I x, IA, where the A, are disjoint sets in .F and IA, isthe indicator of A. ; the xi need not be distinct.

We assume the standard arithmetic of R; if a e R, a + oo = oo, a - oo =

0 (- oo) = 0, co + oo = co, - oo - co = - oo, with commutativity of addi-tion and multiplication. It is then easy to check that sums, differences, prod-ucts, and quotients of simple functions are simple, as long as the operationsare well-defined, in other words we do not try to add + co and - co, divideby 0, or divide oo by oo.

Let p be a measure on R", again fixed throughout the discussion. Ifh: 0 -+ R is Borel measurable we are going to define the abstract Lebesgueintegral of h with respect to p, written as In h dµ, In h(w)p(dw), or J. h(w) dp(co).

1.5.3 Definition of the Integral. First let h be simple, say h = D=I x,'A,where the A. are disjoint sets in Pr. We define

f h dp = xilt(A)n i= I

as long as +oo and -oo do not both appear in the sum; if they do, we saythat the integral does not exist. Strictly speaking, it must be verified that ifh has a different representation, say E;=I y;IBj, then

r S

E xi p(A) = y yj y(B).i=I j=I

(For example, if A = B u C, where B n C = 0, then xIA = xIB + xIc'.)The proof is based on the observation that

r S

h = ZijlAinBl1iL11=I

where z,i = Xi = yi . Thus

X zijp(Ai n Bi) = Y xi Y p(A n Bi)i. j i i

= Y xi 1i(Ai)

= Y yi u(B) by a symmetrical argument.i

If h is nonnegative Borel measurable, definell

fnhdp=sup{fnsdp:s simple, 0<s<h}.

This agrees with the previous definition if h is simple. Furthermore, we mayif we like restricts to be finite-valued.

Notice that according to the definition, the integral of a nonnegativeBorel measurable function always exists; it may be + co.

Finally, if h is an arbitrary Borel measurable function, let h+ = max(h, 0),h- = max(-h, 0), that is,

h+(w) = h(w) if h(co) 0; h+(w) = 0 if h(w) < 0;h-(w) = -h(w) if h((o) < 0; h-(w) = 0 if h(w) > 0.

The function h+ is called the positive part of h, h- the negative part. We havejhl = h+ + h-, h = h+ - h-, and h+ and h- are Borel measurable. Forexample, {w: h+(w) a B) = {w: h(w) >- 0, h(w) a B) u {co: h(w) < 0, 0 e B).The first set is h-'[0, oo] n h-'(B) a ,F; the second is h-'[- oo, 0) if 0 e B,and 0 if 0 t B. Thus (h+)-'(B) a .F for each B e R(R), and similarly for h-.Alternatively, if hl and h2 are Borel measurable, then max(h h2) andmin(hl, h2) are Bore] measurable; to see this, note that

(co: max(h,(w), h2(w)) < c} = {w: hl(w) < c) n {co: h2(w) < c}

and {co: min(hl(w), h2(w) < c) _ {co: hl(w) < c} v {w: h2(w) < c}. It followsthat if h is Borel measurable, so are h+ and h-.

We define

f hdu=fl h+dp - faif this is not of the form+oo - co;

n n nif it is, we say that the integral does not exist. The function h is said to bep-integrable (or simply integrable if p is understood) if In h dµ is finite, thatis, iff fnh+dµ and fnh-dµ are both finite.

If A e F, we define

fA hdy=fRhIAdji.

(The proof that hIA is Borel measurable is similar to the first proof abovethat h+ is Borel measurable.)

If h is a step function from R to R and p is Lebesgue measure, JR h dpagrees with the Riemann integral. However, the integral of h with respect toLebesgue measure exists for many functions that are not Riemann integrable,as we shall see in Section 1.7.

Before examining the properties of the integral, we need to know moreabout Borel measurable functions. One of the basic reasons why such func-tions are useful in anaylsis is that a pointwise limit of Borel measurablefunctions is still Borel measurable.

1.5.4 Theorem. If h1 i h2 , ... are Borel measurable functions from ! to Rand ho(rn) -* h(w) for all to e 0, then h is Bore] measurable.

PROOF. It is sufficient to show that {(o: h(w) > c} a y for each real c. We have

(co: h(w) > c} _ co:{lim hn(w) > c111 11

_ {w: hn(w) is eventually > c + 1 for some r = 1, 2, ...11111

r

00

= rUl {w: hn(w) > c +1

for all but finitely many n

00 1= UIiminf w: hn(w)>c+-r= 1 n r

00 00

, I= U U n co:hk(w)>c+-)e

r=1 n=l k=n r) I

To show that the class of Borel measurable functions is closed underalgebraic operations, we need the following basic approximation theorem.

1.5.5 Theorem. (a) A nonnegative Borel measurable function h is thelimit of an increasing sequence of nonnegative, finite-valued, simple functionshn.

(b) An arbitrary Borel measurable function f is the limit of a sequenceof finite-valued simple functions fn , with I fn I S I f I for all n.

PROOF. (a) Define

hn(w) = k 2 I if k 2I

< h(w) < 2n, k = 1, 2, ..., n2n,

and let hn(w) = n if h(w) >- n. [Or equally well, hn(w) = (k - 1)/2" if(k - 1)/2" < h(w) < k/2", k = 1, 2, ..., n2"; hn(w) = n if h(w) > n; hn(w) = 0if h(w) = 0.] The hn have the desired properties (Problem 1).

(b) Let gn and hn be nonnegative, finite-valued, simple functions withgn T f + and hn T f ; take fn = g" - hn

1.5.6 Theorem. If h, and h2 are Borel measurable functions from 51 to R,so are h, + h2, h1 - h2, h,h2, and h,/h2 [assuming these are well-defined,in other words, h,(w) + h2(w) is never of the form + co - oo and h,(w)/h2(w)is never of the form oo/oo or a/0].

PROOF. As in 1.5.5, let s,n, s2n be finite-valued simple functions with s,n hl,s2n h2. Then stn + S2n --> h1 + h2,

S1 S2nI(h,*o)I(hz*0}-'hIh2,

and

Since

Stn ± S2n,

s2n + (11n)I(s2..=o}

stn h1

S1n S2n I{h, *o} 1(h:*0} ,

h2.

1 -1Sin S2n +

n I(s2"=0}

are simple, the result follows from 1.5.4. 1

We are going to extend 1.5.4 and part of 1.5.6 to Borel measurablefunctions from n to R"; to do this, we need the following useful result.

1.5.7 Lemma. A composition of measurable functions is measurable;specifically, if g : (521, 91) -' (522 , 'F2) and h: (512 , .F2) - (513 , .F3), thenh ° g: (521, Ft) -, (513, 3)-

PROOF. If B e . 3 , then (hog)-'(B) = g-'(h-'(B)) e .Fl. I

Since some books contain the statement "A composition of measurablefunctions need not be measurable," some explanation is called for. If h: R - R,some authors call h " measurable" if the preimage of a Borel set is a Lebesguemeasurable set. We shall call such a function Lebesgue measurable. Notethat every Borel measurable function is Lebesgue measurable, but not con-versely. (Consider the indicator of a Lebesgue measurable set that is not a

Borel set; see Section 1.4, Problem 11.) If g and h are Lebesgue measurable,the composition h o g need not be Lebesgue measurable. For let -4 be theBorel sets, and 9 the Lebesgue measurable sets. If B E .J then h-'(B) e 2;but g-'(h-(B)) is known to belong to 2 only when h-'(B) a -4, so we can-not conclude that (h 0 g) -'(B) E -4. For an explicit example, see Royden(1968, p. 70). If g- '(A) E 2 for all A e ., not just for all A E a, then we arein the situation described in Lemma 1.5.7, and h o g is Lebesgue measurable;similarly, if h is Borel measurable (and g is Lebesgue measurable), then h o gis Lebesgue measurable.

It is rarely necessary to replace Borel measurability of functions fromR to R (or Rk to R") by the slightly more general concept of Lebesguemeasurability; in this book, the only instance is in Section 1.7. The integrationtheory that we are developing works for extended real-valued functions onan arbitrary measure space p). Thus there is no problem in integratingLebesgue measurable functions; set fl = R, F = A

We may now assert that if h, h2, ... are Borel measurable functions fromS2 to R" and h" converges pointwise to h, then h is Borel measurable; further-more, if h, and h2 are Borel measurable functions from Q to R", so are h, + h2and h, - h2, assuming these are well-defined. The reason is that if h(co) =(h, (w), ..., h"((o)) describes a map from S2 to R", Borel measurability of h isequivalent to Bore] measurability of all the component functions hi.

1.5.8 Theorem. Let h: 0 - R"; if pi is the projection map of R" onto R,taking (x,, ..., x") to xi, set hi = pi c h, i = 1, ..., n. Then his Bore] measur-able if hi is Borel measurable for all i = 1, ..., n.

PROOF. Assume h Borel measurable. Since

Pi '{xi:ai<xi<bi}={xaR":a,<xi<bi, - oo<x< oo,which is an interval of R", pi is Borel measurable. Thus

h: (Q, . ) - (R", .9(R")), pi : (R", 9(R")) - (R, R(A)),

and therefore by 1.5.7, hi : (Q, .°r) -' (R, £(R)).Conversely, assume each hi to be Borel measurable. Then

h-'{xeR":a;Sx,Sbi, i=l,...,n}= fl{wEQ:ai:hi(w)5b;}e.`Iz,i=1

and the result follows.

We now proceed to some properties of the integral. In the following result,all functions are assumed Borel measurable from.0 to R.

1.5.9 Theorem. (a) If In h dp exists and c e R, then In ch du exists andequals c fn h du.

(b) If g(uw) >- h(co) for all co, then In g dp > In h dp in the sense thatif fn h du exists and is greater than - oo, then In g dp exists and In g du> in h du; if In g du exists and is less than + oo, then In h du existsand 1, h dp < In g du. Thus if both integrals exist, I,, g dp In h du, whetheror not the integrals are finite.

(c) If In h du exists, then I In h dp l< in I h du.(d) If h >- 0 and B e .31", then J B h du = sup{ f B s du: 0< s< h, s simple).(e) If In h du exists, so does S, h du for each A e F ; if In h du is finite,

then f,, h du is also finite for each A e R'.

PROOF. (a) It is immediate that this holds when h is simple. If h is non-negative and c > 0, then

1

Inch du = sup{ Jns du; 0 < s < ch, s simple}111

S

S=csup{f -du; 0<h, s111

- simple =cI hdu.ne c c n

In general, if h = h+ - h- and c > 0, then (ch)+ = ch+, (ch) = ch henceIn ch dp = c In h+ du - c In h- du by what we have just proved, so thatfnchdu=c fnhdu.Ifc<0,then

(ch)' = -ch-, (ch) _ -ch+,so

inch du= - cIn h- du+cln h+ du=cln hdu.

(b) If g and hare nonnegative and 0 < s < h, s simple, then 0< s < g;hence f n h du < f n g du. In general, h< g implies h+ < g+, h- z g-. Iff n h du > - oo, we have $g - du <- fn h` du < co; hence fn g du exists andequals

fng+dp-Ing-du>Inh+du-In h- du=In h du.

The case in which In g du < oo is handled similarly.(c) We have - u h f <- h 5 1 h I so by (a) and (b), - In I h I du < In h du <

In I h I du and the result follows. (Note that I h is Borel measurable by 1.5.6since h = h+ + h-.)

(d) If 0:9s. h, then f B s du <(5sd:fB h du by (b); hence

IBhdu - sup0 <s<h).

If 0< t < hIB, t simple, then t=111,< h so fn t dp < sup{fn sIB dµ:0 < s < h, s simple}. Take the sup over I to obtain JB h dp < sup{ fB s dµ:0 < s < h, s simple).

(e) This follows from (b) and the fact that (hIA)+ = h+lA < h+,(NIA)- = It h IA < h-. I

Problems

1. Show that the functions proposed in the proof of 1.5.5(a) have the desiredproperties. Show also that if h is bounded, the approximating sequenceconverges to h uniformly on 0.

2. Let f and g be extended real-valued Borel measurable functions on(fl, 9"), and define

h(w)=f(w) if co c- A,

=g((o) if cEA`,where A is a set in .F. Show that h is Borel measurable.

3. If f,, f2, ... are extended real-valued Borel measurable functions on(f', ffl ), n = 1, 2, ..., show that sup" fa and infer f" are Bore] measurable(hence lim sup,,-,,, f" and lim inf.-.o f" are Borel measurable).

4. Let (SZ, .F, p) be a complete measure space. If f: (S2, . ) (S2', F') andg: S2 -. f2', g = f except on a subset of a set A E y with p(A) = 0, showthat g is measurable (relative to F and F').

5. (a) Let f be a function from Rk to R', not necessarily Borel measurable.Show that {x: f is discontinuous at x} is an Fa (a countable unionof closed subsets of Rk), and hence is a Borel set. Does this resulthold in spaces more general than the Euclidean space R"?

(b) Show that there is no function from R to R whose discontinuity setis the irrationals. (In 1.4.5 we constructed a distribution functionwhose discontinuity set was the rationals.)

6. How many Borel measurable functions are there from R" to Rk?7. We have seen that a pointwise limit of measurable functions is measurable.

We may also show that under certain conditions, a pointwise limit ofmeasures is a measure. The following result, known as Steinhaus' lemma,will be needed in the problem: If {ank} is a double sequence of realnumbers satisfying

(i) Ek , a"k = l for all n,(ii) Fk, lank) < c < oo for all n, and

(iii) ank - 0 as n - oo for all k,

there is a sequence {xa}, with x" = 0 or 1 for all n, such that t,, = Y_k S ank Xkfails to converge to a finite or infinite limit.

1.6 BASIC INTEGRATION THEOREMS 43

To prove this, choose positive integers n1 and k1 arbitrarily;having chosen n1, ..., n,, k1, ..., k,, choose n,+1 > n, such thatY-k <k, I a,,,. .k l < -i; this is possible by (iii). Then choose kr+1 > k, suchthat Ek>k,+, an,+,kI < s; this is possible by (ii). Set xk = 0, k2s-1 <k < k2s i Xk = 1, k2s < k < k2s+1, s = 1, 29 .... We may write tn,+, ash, + h2 + h3, where h, is the sum of xk fork < k h2 correspondsto k, < k < kr+1, and h3 to k > kr+1. If r is odd, then xk = 0, k, < k <kr+1 ; hence I t,,,+, I < J. If r is even, then h2 = Y-k,<k5k,+1a,,,+Ik ; henceby (i),

h2=1- l a,,+,k Y an,+lk > i.k:5 k, k>k,+l

Thus tn,+, > I - I h1 I - I h3 l > J, so {tn} cannot converge.

(a) Vitali-Hahn-Saks Theorem. Let (12, F) be a measurable space,and let P., n = 1, 2, ..., be probability measures on .". If P(A)

for all A c-.y, then P is a probability measure on.t ; further-more, if {Bk} is a sequence of sets in F decreasing to 0, thensup,, 10 as k --* oo. [Let A be the disjoint union of setsAk E F ; without loss of generality, assume A = 0 (otherwise addA` to both sides). It is immediate that P is finitely additive, so byProblem 5, Section 1.2, a = Y-k P(Ak) < P(1)) = 1. If a < 1, setank = (1 - a)-1 [P,(Ak) - P(Ak)] and apply Steinhaus' lemma.]

(b) Extend the Vitali-Hahn-Saks theorem to the case where the P.are not necessarily probability measures, but P,,(12) < c < oo for all n.[For further extensions, see Dunford and Schwartz (1958).]

1.6 Basic Integration Theorems

We are now ready to present the main properties of the integral. Theresults in this section will be used many times in the text. As above, (0, .F, p)is a fixed measure space, and all functions to be considered map f to A.

1.6.1 Theorem. Let h be a Borel measurable function such that S a h dpexists. Define .2(B) = S, h dp, B e F. Then 2 is countably additive on F;thus if h >- 0, A is a measure.

PROOF. Let h be a nonnegative simple function Y-°=1 x, I,1,. Then 3(B) _J B h d,u = J,%, x, p(B r A,); since p is countably additive, so is A.

Now let h be nonnegative Borel measurable, and let B = Un , B,,, theB disjoint sets in Ifs is simple and 0 < s:5 h, then

00f sdµ= y- f sdµB n=1 e

by what we have proved for nonnegative simple functions

S f hdµ11=1 sby 1.5.9(b) (or the definition of the integral).

Take the sup over s to obtain, by 1.5.9(d), 2(B) 5 Y' 1Now B a B, hence IB, 5 IB, so by 1.5.9(b), ).(B.)< 1(B). If oo

for some n, we are finished, so assume all finite. Fix n and let e > 0.It follows from 1.5.9(d), 1.5.9(b), and the fact that the maximum of a finitenumber of simple functions is simple that we can find a simple function s,0 5 s 5 h, such that

Now

f sdµ>- f hdµ - l i=1,2,...,n.B, e, n

,1(8,u ...L)Bn)= f ysdµ= f sdµ<U1B1 1=1 B,

by what we have proved for nonnegative simple functions, hence

2(B 1 v ... u Bn) > f h dµ - e = 2(B1) - e.1=1 B, i=1

Since 2(B) >- 1(Ui=, B;) and a is arbitrary, we have00

2(B) ? Y 2(B;).i=1

Finally let h = h+ - h- be an arbitrary Borel measurable function. Then.1(B) = SB h+ dµ - JB h- dµ. Since $n h+ dµ < co or in h- dµ < co, the resultfollows. I

The proof of 1.6.1 shows that A is the difference of two measures A+

and ,1-, where , +(B) = $B h+ dµ, h- dµ; at least one of the measures,% and A- must be finite.

1.6.2 Monotone Convergence Theorem. Let h1, h2, ... form an increasingsequence of nonnegative Borel measurable functions, and let h(w) =limn-,,,, hn(w), co eS2. Then In h dµ -- jn h dµ. (Note that In h dµ increaseswith n by 1.5.9(b); for short, 0 <- h T h implies In h dµ T in h dµ.)

PROOF. By 1.5.9(b), fn h dp < in h d y for all n, hence k = lim,, , in h dp <in h dp. Let 0 : bs(w)}. Then B. T .0 since h T hand s is finite-valued. Now k in h dµ >: JB hn dp by 1.5.9(b), and J B, h du >b J,, s dp by 1.5.9(a) and (b). By 1.6.1 and 1.2.7, S B. s dp - fns dµ, hence(let b -+ 1) k >: in s dp. Take the sup over s to obtain k >: in h dg. I

1.6.3 Additivity Theorem. Let f and g be Bore] measurable, and assume thatf + g is well-defined. If in f dp and fng du exist and In f dµ + in g dp iswell-defined (not of the form + oo - oo or - oo + oo), then

fn (f +g)dµ= fn fdp+ fn gdp.

In particular, if f and g are integrable, so is f + g.

PROOF. If f and g are nonnegative simple functions, this is immediate fromthe definition of the integral. Assume f and g are nonnegative Borel measur-able, and let t,,, u be nonnegative simple functions increasing to f and g,respectively. Then +g.Now In t,, dp+inu,,dpby what we have proved for nonnegative simple functions; hence by 1.6.2,fn(f+g)dp= So f dp+Ingdp.

Now if f 0, g < 0, h = f + g >- 0 (so g must be finite), we have f = h+ (-g); hence In f du = In h du - fag dµ. If fag dp is finite, then in h dp =in f dp + In g dµ, and if in g du = - oo, then since h z 0,

ffdµz-fgdu=oo,contradicting the hypothesis that in f du + Jag' du is well-defined. Similarlyif f > 0, g < 0, h < 0, we obtain In h dµ = In f dp + jag dµ by replacing allfunctions by their negatives. (Explicitly, -g > 0, -f < 0, -h = -f - g >: 0,and the above argument applies.)

Let

El = {w: f(w) ? 0, g(w) ? 0),E2 = (w:.f(w) > 0, g(w) < 0, h(w) >- 0},

E3 = (CO: R (D) >- 0, g(w) < 0, h(w) < 0},

E4 = {w: A CO) < 0, g(w) > 0, h(w) 0),

E5 = (c o: f(w) < 0, g(w) > 0, h(w) < 0},

E6 = {co: f(w) < 0, g(w) < 0}.

46 1 FUNDAMENTALS OF MEASURE AND INTEGRATION.THEORY

The above argument shows that JE, h dµ = JE, f dµ + JE, g dµ. Now in f du =DY 6=1 $E, f dµ, fag dµ = 6 1 JE, g dµ by 1.6.1, so that in f dµ + In g dµ =

6 1 JE, h du, and this equals In h dµ by 1.6.1, if we can show that in h dµexists; that is, in h+ dp and in h dµ are not both infinite.

If this is the case, SE, h+ dµ = JE, h- dµ = oo for some i, j (1.6.1 again),so that JE, h dµ = oo, JE, h dµ = - oo. But then JE, f dµ or JE, g dµ = oo ;hence In f dµ or Jn g dp = oo. (Note that Jn f + dµ > JE; f + dµ.) SimilarlyIn f dµ or In g dµ = - oo, and this is a contradiction.

1.6.4 Corollaries. (a) If hl, h2, ... are nonnegative Borel measurable,

00

.Y h(n)ii= Y hn dµn n=1 n=1 f2

Thus any series of nonnegative Borel measurable functions may be integratedterm by term.

(b) If h is Bore] measurable, h is integrable if I h I is integrable.(c) If g and h are Borel measurable with (g < h, h integrable, then g

is integrable.

PROOF. (a) Yk=1 hk T Ek 1 hk, and the result follows from 1.6.2 and 1.6.3.(b) Since I hI = h+ + h-, this follows from the definition of the integral

and 1.6.3.(c) By 1.5.9(b), i i is integrable, and the result follows from (b) above.

A condition is said to hold almost everywhere with respect to the measureµ (written a.e. [µ] or simply a.e. if p is understood) if there is a set B E F ofp-measure 0 such that the condition holds outside of B. From the point ofview of integration theory, functions that differ only on a set of measure 0may be identified. This is established by the following result.

1.6.5 Theorem. Let f, g, and h be Borel measurable functions.

(a) If f = 0 a.e. [µ], then In f dµ = 0.(b) If g = h a.e. [µ] and In g dµ exists, then so does f n h dµ, and J n g dµ =

In h dµ.

PROOF. (a) If f = z°=1 X11A, is simple, then x; j6 0 implies µ(A1) = 0 byhypothesis, hence In f dµ = 0. If f >- 0 and 0 < s < f, s simple, then s = 0a.e. [µ], hence Jn s dp = 0; thus In f dp = 0. If f =f' -f-, then f + and f-,being less than or equal to I f 1, are 0 a.e. [µ], and the result follows.

(b) Let A = {w: g(w) = h(w)}, B = A`. Then g = g1A + gIB, h =hIA + hIB = gIA + hlB. Since gIB = hIB = 0 except on B, a set of measure0, the result follows from part (a) and 1.6.3. 1

Thus in any integration theorem, we may freely use the phrase "almosteverywhere." For example, if is an increasing sequence of nonnegativeBorel measurable functions converging a.e. to the Borel measurable functionh, then In h,. dµ--. fnh dµ.

Another example: If g and h are Borel measurable and g >- h a.e., thenfn g dp >- fn h dg [in the sense of 1.5.9(b)].

1.6.6 Theorem. Let h be Bore] measurable.

(a) If h is integrable, then h is finite a.e.(b) Ifh - 0and fahdp=0,then h=0a.e.

PROOF. (a) Let A = {(o: h(w) I = oo}. If p(A) > 0, then fn I h I dµ >S, I h I dp = oop(A) = oo, a contradiction.

(b) Let B = {w: h(w) > 0}, B = {w: h(w) >_ 1 /n} T B. We have 0 < hl,. <hIB = h; hence by 1.5.9(b), SB. h dp = 0. But JB h dµ > (l /n)p(BO, so thatp(B) = 0 all 1

The theall functions were nonnegative. This assumption can be relaxed consider-

ably, as we now prove.

1.6.7 Extended Monotone Convergence Theorem. Let 9i, 92, ... , g, h beBorel measurable.

(a) If g >- h for all n, where In h dp > - oo, and g T g, then

fd.u Tfgdj.ng.

b) If g < h for all n, where fa h dp < oo, and g ,[ g, then(

fngdpI. fs: hdp.

PROOF. (a) If In h dp = oo, then by 1.5.9(b), In g dp = oo for all n, andfag dp = oo. Thus assume j a h dp < oo, so that by 1.6.6(a), h is a.e. finite;change h to 0 on the set where it is infinite. Then 0 < g - h T g - h a.e.,hence by 1.6.2, In (g - h) du T JA (g - h) dp. The result follows from 1.6.3.

(We must check that the additivity theorem actually applies. Since In h dp >- oo, In gn dµ and In g du exist and are greater than - oo by 1.5.9(b). Also,In h dy is finite, so that In gn dp - In h dp and In g dp - In h dp are well-defined.)

(b) -gn ? -h, In -h dµ > - oo, and -gn T -g. By part (a), -In gn dpT - Ingdg,soIngndplIngdp

The extended monotone convergence theorem asserts that under appro-priate conditions, the limit of the integrals of a sequence of functions is theintegral of the limit function. More general theorems of this type can beobtained if we replace limits by upper or lower limits. If f,, fZ , ... are func-tions from f) to R, lim inf, fn and lim supn,.,, fn are defined pointwise,that is,

(urn inffn)(co) = sup inf fk(w),n_oo n k_n

(lirnsuPfu)(w) = inf sup fk(w).n_ao n ken

1.6.8 Fatou's Lemma. Let f,, f2, ... , f be Borel measurable.

(a) Iffn >- f for all n, where !n f dp > - oo, then

lim inf f fn dp >- f (1iminff) dp.n_ao n n n-oo

(b) Iffn < f for all n, where !n f dp < oo, then

lim sup f fn du < f (iirnsupin) dµ.n.00 n n n-,OD

PROOF. (a) Let gn = infk,_n fk, g = lim inf f, Then gn >- f for all n,in f dp > - oo, and gn T g. By 1.6.7, In gn dp 11, (lim infn, fn) dµ. Butgn <fn, so

lim f gn du = lim inf f gn dµ < lim inf f fn dp.n n-OD n n

(b) We may write

f (hrn sup fn dy = - f lim inf(-fn) dµn-ao n- oo

>- - lim inf f (-fn) dp by (a)n- oo n

= lim sup f fn dµ.n-+oo n

The following result is one of the "bread and butter" theorems of analysis;it will be used quite often in later chapters.

1.6.9 Dominated Convergence Theorem. If fi, f2, ... , f, g are Borel measur-able, (f,, < g for all n, where g is p-integrable, and fn -f a.e. [p], then f isp-integrable and In fn dp - In f dp.

PROOF. We have I f 1 < g a.e.; hence f is integrable by 1.6.4(c). By 1.6.8,

f (iimi) dp < lim inf f fn dp <- lim sup f fn dµn n

n n-xBy hypothesis, lim inf,,-. f,, = lim sup, fn =f a.e., so all terms of theabove inequality are equal to fn f dµ. I

1.6.10 Corollary. If f, , f2 , ... , f, g are Borel measurable, I fn < g for all n,where `g I P is p-integrable (p > 0, fixed), and fn --+f a.e. [p], then If I P isp-integrable and In fn - f, P dp -> 0 as n --> oo.

PROOF. We have fn I P < Ig I o for all n; so I f I P < Ig P, and therefore I f I"is integrable. Also I fn -f P < (Ifn + if I )P < (2 1g )P, which is integrable,and the result follows from 1.6.9. 1

We have seen in 1.5.9(b) that g 5 h implies In g dp < In h dp, and in factJ,, g dp <- IA h dµ for all A e F. There is a converse to this result.

1.6.11 Theorem. If p is o-finite on g and h are Borel measurable,So g dp and In h dp exist, and fA g dp _< JA h dp for all A e .F, then g < ha.e. [p].

PROOF. It is sufficient to rove this when p is finite. Let

lAn = (co:111lg(co) _ h(w)+n 1, Ih(a)1 _< n111 }.

Then

f hdp> f gdp? f hdp+Ip(An)A. A An n

But

f h dµ f I h I d y S np(A,,) < co,A. A.

and thus we may subtract JA, h dp to obtain 0, hence 0.Therefore µ(U , 0; hence p{a): g(w) > h(w), h(w) finite} = 0. Con-sequently g< h a.e. on {co: h(w) finite}. Clearly, g:5 h everywhere on(co: h(w) = oo}, and by taking C. = {w: h(w) = -oo, g(w) >- -n} we obtain

h dµ> f g dpc chence 0. Thus 0, so that

p{w: g(w) > h(w), h(w) = - oo} = 0.

Therefore g < h a.e. on (co: h(w) = - oo}. I

If g and h are integrable, the proof is simpler. Let B = {co: g(w) > h(w)}.Then I R g dp < JB h dji < JB g dp ; hence all three integrals are equal. Thusby 1.6.3, 0 = SB (g - h) du = In (g - h)IB dµ, with (g - h)IB >- 0. By 1.6.6(b),(g - h)1B = 0 a.e., so that g = h a.e. on B. But g < h on B`, and the resultfollows.

The reader may have noticed that several integration theorems in thissection were proved by starting with nonnegative simple functions andworking up to nonnegative measurable functions and finally to arbitrarymeasurable functions. This technique is quite basic and will often be useful.A good illustration of the method is the following result, which introducesthe notion of a measure-preserving transformation, a key concept in ergodictheory. In fact it is convenient here to start with indicators before proceedingto nonnegative simple functions.

1.6.12 Theorem. Let T: (S2, ,F) - (CIO, F o) be a measurable mapping, andlet it be a measure on F. Define a measure po = pT -' on FO by

p0(A) = p(T -' (A)), A e .Fo .

If DO = 0, 'FO = F, and Elo = p, T is said to preserve the measure p.If f: (f 20 , .moo) - (R, .68(R)) and A e FO, then

fT _ f (T(w)) dp(w) =fA

f (w) dpo(w)A

in the sense that if one of the integrals exists, so does the other, and the twointegrals are equal.

PROOF. If f is an indicator 1B, the desired formula states that

µ(T-'A o T-'B) = pa(A n B),

which is true by definition of p,. If f is a nonnegative simple function" x I then

f f(T(w)) dp(w) _ x; f AI.,(T((9)) dµ(w) by 1.6.3T 'A i=1 T-'

n

Y- xi f IB,(w) dpo(w)i=1 A

by what we have proved for indicators

= fAf(w) dpo(o)) by 1.6.3.

If f is a nonnegative Borel measurable function, let fi, f2 , ... be nonnegativesimple functions increasing to f. Then IT- 'A fa(T(w)) dp(w) = fA fn(co) dpo(w)by what we have proved for simple functions, and the monotone convergencetheorem yields the desired result for f.

Finally, if f =f ' -f - is an arbitrary Borel measurable function, wehave proved that the result holds for f + and f -. If, say, JA f +(w) dpo(w) < oo,then 11- ,A f+(T(w)) dp(w) < cc, and it follows that if one of the integralsexists, so does the other, and the two integrals are equal. I

If one is having difficulty proving a theorem about measurable functionsor integration, it is often helpful to start with indicators and work upward.In fact it is possible to suspect that almost anything can be proved this way,but of course there are exceptions. For example, you will run into troubletrying to prove the proposition "All functions are indicators."

We shall adopt the following terminology: If p is Lebesgue measure andA is a interval [a, b], fA f dp, if it exists, will often be denoted by Pb f(x) dx(or f b,, J" f (x1 i ..., xa) dx1 dx,, if we are integrating functions on R").The endpoints may be deleted from the interval without changing the integral,since the Lebesgue measure of a single point is 0. Iff is integrable with respectto p, then we say that f is Lebesgue integrable. A different notation, such asrab(f), will be used for the Riemann integral off on [a, b].

Problems

The first three problems give conditions under which some of the mostcommonly occurring operations in real analysis may be performed: taking alimit under the integral sign, integrating an infinite series term by term, anddifferentiating under the integral sign.

1. Let f = f(x, y) be a real-valued function of two real variables, defined fora < y < b, c < x < d. Assume that for each x, f (x, ) is a Borel measur-able function of y, and that there is a Borel measurable g: (a, b) - Rsuch that I f (x, y) I < g(y) for all x, y, and Jag(y) dy < oo. If x0 a (c, d)and lim f(x, y) exists for all y a (a, b), show that

lim fbf(x, y) dy = f [limf(x. y)1 dy.x-xo a a J

2. Let J1, J2 , ... be Borel measurable functions on If00

L. f IfnI du < oo,Lln=1

show(( that n 1 J converges a.e. [p] to a finite-valued function, and/rtJ dp = Lrn 1 fn fn d e.

3. Let f = f(x, y) be a real-valued function of two real variables, definedfor a < y < b, c < x < d, such that f is a Borel measurable function of yfor each fixed x. Assume that for each x, f(x, ) is integrable over (a, b)(with respect to Lebesgue measure). Suppose that the partial derivativefi(x, y) off with respect to x exists for all (x, y), and suppose there is aBorel measurable h: (a, b) -> R such that If, (x, y) I < h(y) for all x, y,where $a h(y) dy < oo.

Show that d[$a f (X, y) dy]/dx exists for all x e (c, d), and equalsJQ f1(x, y) dy. (It must be verified that f1(x, ) is Borel measurable foreach x.)

4. If p is a measure on (S2, F) and A1, A2, ... is a sequence of sets in y,use Fatou's lemma to show that

(liminf lim inf p(Ap).

n n_ao

If p is finite, show that

p(lim sup Ap) Z lim sup p(A.).

n 111 n-+ao

Thus if p is finite and A = limp Ap, then p(A) = limn..co (Foranother proof of this, see Section 1.2, Problem 10.)

5. Give an example of a sequence of Lebesgue integrable functions f,converging everywhere to a Lebesgue integrable function f, such that

lim f' fn(x) dx < f" f(x) dx.n_oo -m -oo

Thus the hypotheses of the dominated convergence theorem and Fatou'slemma cannot be dropped.

1.7 COMPARISON OF LEBESGUE AND RIEMANN INTEGRALS 53

6. (a) Show that J; a-` In t dt = lim,,.,,, Ji [1 - (t/n)]° In t dt.(b) Show that Jo e-' In i dt = Ja [1 - (t/n)]° In t dt.

7. If (f), Pr, µ) is the completion of (Q, go, p) and f is a Borel measurablefunction on (Q, fl, show that there is a Borel measurable function gon (fl, such that f = g, except on a subset of a set in go of measure 0.(Start with indicators.)

8. If f is a Borel measurable function from R to R and a e R, show that

f(x)dx= f f(x - a) dx

in the sense that if one integral exists, so does the other, and the two areequal. (Start with indicators.)

1.7 Comparison of Lebesgue and Riemann Integrals

In this section we show that integration with respect to Lebesgue measureis more general than Riemann integration, and we obtain a precise criterionfor Riemann integrability.

Let [a, b] be a bounded closed interval of reals, and let f be a boundedreal-valued function on [a, b], fixed throughout the discussion. If P: a = xo <x1 < < x = b is a partition of [a, b], we may construct the upper and lowersums off relative to P as follows.

Let

Mi = sup{f(y): xi_1 < y :!g xi}, i = 1, ..., n,

mi=inf{f(y):xi_I <y<-x;}, i=1,...,n,and define step functions a and /3, called the upper and lower functionscorresponding to P, by

a(x)=M; if xi-1 <x<xi, i= 1,...,n,/3(x) = mi if xj_1 < x < x1, n

[a(a) and #(a) may be chosen arbitrarily]. The upper and lower sums aregiven by

U(P) _ Y_ M1(xi - xi- I)+i=1

L(P) _ mi(xi - xi- I).i=1

Now we take as a measure space Q = [a, b], ,F = R[a, b], the Lebesguemeasurable subsets of [a, b], u = Lebesgue measure. Since a and / are simplefunctions, we have

b bU(P) = f a du, L(P) = f/3 du.a a

Now let P1, P2, ... be a sequence of partitions of [a, b] such that Pk+lis a refinement of Pk for each k, and such that I Pk I (the length of the largestsubinterval of Pk) approaches 0 as k -+ oo. If ak and Nk are the upper andlower functions corresponding to Pk, then

P2>...al J a2

Thus ak and Nk approach limit functions a and P. If If I is bounded by M,then all I ak and I /3k are bounded by M as well, and the function that isconstant at M is integrable on [a, b] with respect to p, since

p[a,b] =b-a<oo.By the dominated convergence theorem,

b b

lira U(Pk) = lim f ak du = f a dp,k-oc a a

and

6/ b

lim L(PI) = lim f Yk du. = f $ du.k-oo k-oc a a

We shall need one other fact, namely that if x is not an endpoint of anyof the subintervals of the Pk,

f is continuous at x if a(x) =f(x) _ /3(x).

This follows by a standard s-S argument.If limk-,, U(Pk) = limk., L(Pk) = a finite number r, independent of the

particular sequence of partitions, f is said to be Riemann integrable on [a, b],and r = rab(f) is said to be the (value of the) Riemann integral off on [a, b].The above argument shows that f is Riemann integrable if

b 6

f adu= f /3dµ=r,a a

independent of the particular sequence of partitions. If f is Riemann integrable,

rab(f) = f a du = fbP

du.ba a

We are now ready for the main results.

1.7.1 Theorem. Let f be a bounded real-valued function on [a, b].

(a) The function f is Riemann integrable on [a, b] iff f is continuousalmost everywhere on [a, b] (with respect to Lebesgue measure).

(b) If f is Riemann integrable on [a, b], then f is integrable with respectto Lebesgue measure on [a, b], and the two integrals are equal.

PROOF. (a) If f is Riemann integrable,

brab(f)=Jadp= f Pdµ.a a

Since /3 < f< a, 1.6.6(b) applied to a - /3 yields a =f = /3 a.e.; hence f iscontinuous a.e. Conversely, assume f continuous a.e.; then a =f = /3 a.e.Now a and /3 are limits of simple functions, and hence are Bore] measurable.Thus f differs from a measurable function on a subset of a set of measure 0,and therefore f is measurable because of the completeness of the measurespace. (See Section 1.5, Problem 4.) Since f is bounded, it is integrable withrespect to p, and since a =f = /3 a.e., we have

faadp= fafidu= f fdu, (1)

independent of the particular sequence of partitions. Therefore f is Riemannintegrable.

(b) If f is Riemann integrable, then f is continuous a.e. by part (a). Butthen Eq. (1) yields rab(f) = Jb f dµ, as desired. I

Theorem 1.7.1 holds equally well in n dimensions, with [a, b] replaced bya closed bounded interval of R"; the proof is essentially the same.

A somewhat more complicated situation arises with improper integrals;here the interval of integration is infinite or the function f is unbounded.Some results are given in Problem 3.

We have seen that convenient conditions exist that allow the interchangeof limit operations on Lebesgue integrable functions. (For example, seeProblems 1-3 of Section 1.6.) The corresponding results for Riemann integrablefunctions are more complicated, basically because the limit of a sequence ofRiemann integrable functions need not be Riemann integrable, even if theentire sequence is uniformly bounded (see Problem 4). Thus Riemann inte-grability of the limit function must be added as a hypothesis, and this isa serious limitation on the scope of the results.


Problems

1. The function defined on [0, 1] byf(x) = 1 if x is irrational, and f(x) = 0if x is rational, is the standard example of a function that is Lebesgueintegrable (it is 1 a.e.) but not Riemann integrable. But what is wrongwith the following reasoning:

If we consider the behavior of f on the irrationals, f assumes theconstant value I and is therefore continuous. Since the rationals haveLebesgue measure 0, f is therefore continuous almost everywhere andhence is Riemann integrable.

2. Let f be a bounded real-valued function on the bounded closed interval[a, b]. Let F be an increasing right-continuous function on [a, b] withcorresponding Lebesgue-Stieltjes measure ,u (defined on the Borelsubsets of [a, b]).

Define M; , m; , or, and J3 as in Section 1.7, and take

bU(P) _ E M,(F(x1) - F(x,- 1)) = fa du,

L(P) _ Z mi(F(x1) - F(x`-1)) _ fb du,i=1 a

where f; indicates that the integration is over (a, b]. If {Pk) is a sequenceof partitions with I Pk -+0 and Pk+, refining Pk, with ak and fl, theupper and lower functions corresponding to Pk,

b

lim U(Pk) =J

a du,k-x a

Iim L(Pk) =J

/3 du,k-oo a

where a = limk,. ak, /3 = limk-m Nk. If U(Pk) and L(P,,) approach thesame limit rab(f; F) (independent of the particular sequence of partitions),this number is called the Riemann-Stieltjes integral off with respect toF on [a, b], and f is said to be Riemann-Stieltjes integrable with respectto F on [a, b].

(a) Show that f is Riemann-Stieltjes integrable iff f is continuous a.e. [u]on [a, b].

(b) Show that if f is Riemann-Stieltjes integrable, then f is integrablewith respect to the completion of the measure p, and the twointegrals are equal.

3. If f: R -+ R, the improper Riemann integral off may be defined as

r(f) = lim rab(f)

if the limit exists and is finite.(a) Show that if f has an improper Riemann integral, it is continuous

a.e. [Lebesgue measure] on R, but not conversely.(b) If f is nonnegative and has an improper Riemann integral, show

that f is integrable with respect to the completion of Lebesguemeasure, and the two integrals are equal. Give a counterexampleto this result if the nonnegativity hypothesis is dropped.

4. Give an example of a sequence of functions fa on [a, b] such that eachfa is Riemann integrable, I fa I < I for all n, fa -+f everywhere, but f isnot Riemann integrable.

Note: References on measure and integration will be given at the end ofChapter 2.

2

Further Results in Measure andIntegration Theory

2.1 Introduction

This chapter consists of a variety of applications of the basic integrationtheory developed in Chapter 1. Perhaps the most important result is theRadon-Nikodym theorem, which is fundamental in modern probabilitytheory and other parts of analysis. It will be instructive to consider a specialcase of this result before proceeding to the general theory. Suppose that F isa distribution function on R, and assume that F has a jump of magnitude akat the point xk, k = 1, 2, .... Let us subtract out the discontinuities of F;specifically let y, be a measure concentrated on {x x2, ...}, with µ,{xk} = akfor all k, and let F, be a distribution function corresponding to It, ThenG = F - F, is a continuous distribution function, so that the correspondingLebesgue-Stieltjes measure A satisfies A{x} = 0 for all x. Now in any " prac-tical " case, we can write G(x) = J x_ . f (t) dt, x e R, for some nonnegativeBorel measurable function f (the way to find f is to differentiate G). It followsthat A(B) = fB f (x) dx for all B e 99(R). To see this, observe that if 2'(B) =fB f (x) dx, then is a measure on (R) and A'(a, b] = G(b) - G(a); thus A' isthe Lebesgue-Stieltjes measure determined by G; in other words, A' = A.

It is natural to conjecture that if A is a measure on R(R) and A{x} = 0 forall x, then we can write A(B) = f B f (x) dx, B e A (R), for some nonnegative

58

2.1 INTRODUCTION 59

Borel measurable f. However, as found by Lebesgue, the conjecture is falseunless the hypothesis is strengthened. Not only must we assume that J. assignsmeasure 0 to singletons, but in fact we must assume that A is absolutely con-tinuous with respect to Lebesgue measure p, that is, if µ(B) = 0, then .?(B) = 0.In general, ) may be represented as the sum of two measures A, and ).21where A, is absolutely continuous with respect to p and A2 is singular withrespect to p, which means that 22 is concentrated on a set of Lebesguemeasure 0. A simple example of a measure singular with respect to p is onethat is concentrated on a countable set; however, as we shall see, morecomplicated examples exist.

The first step in the development of the general Radon-Nikodym theoremis the Jordan-Hahn decomposition, which represents a countably additiveset function as the difference of two measures.

Let (S), .F, p) be a measure space, h a Borel measurable function such thatSa h dµ exists. If 2(A) = JA h dµ, A e.F, then by 1.6.1, A is a countably additiveset function on .5F. We call 1. the indefinite integral of h (with respect to p).If p is Lebesgue measure and A = [a, x], then 2(A) = f

ah(y) dy, the familiar

indefinite integral of calculus. As we noted after the proof of 1.6.1, a is thedifference of two measures, at least one of which is finite. We are going toshow that any countably additive set function can be represented in this way.First, a preliminary result.

2.1.1 Theorem. Let A be a countably additive extended real-valued set func-tion on the a-field F. of subsets of .0. Then 2 assumes a maximum and aminimum value, that is, there are sets C, D e F such that

1.(C) = sup{2(A) : A e ?} and 2(D) = inf (2(A) : A e F}.

Before giving the proof, let us look at some special cases. If A is a measure,the result is trivial: take C = Q, D = 0. If I is the indefinite integral of h withrespect to it, we may write

2(A)= fA

f hdy+ f hdp.A An(w:h(w)_ O} An{w:h(w)<0}

Thus

f hdp<A(A)< f hdi(.(w: h(w) <0} {w: h(w)>_ 0)

Therefore we may take C = {co : h(w) >_ 0), D = {co : h(w) < 0).

PROOF. First consider the sup. We may assume that A < co, for if 2(A0) = o0we take C = A0. Let A. e FF with sup A, and let A = U , A. e .°F.

60 2 FURTHER RESULTS IN MEASURE AND INTEGRATION THEORY

For each n, we may partition A into 2" disjoint subsets Anm, where eachAnm is of the form A,* n A2* n n A"*, with A,* either A; or A - A;.For example, if n = 3, we have (with intersections written as products)

A = A,A2A3 u A1A2A3' u A,A2'A3 v A1A2'A3' u A1'A2A3

v A1'A2 A3' u A,'A2'A3 u A1'A2'A3', where A;' = A - A.

Let B" = U. {Anm : A(Anm) > 0); set B. = 0 if ).(Anm) < 0 for all m. Noweach A. is a finite union of some of the Anm, hence A(A,,) < A(Bn) by definitionof B. Also, if n' > n, each An.m. is either a subset of a given Anm or disjointfrom it (for example, A,A2'A3' (-- A1A2', and A,A2'A3' is disjoint from A,A2 ,A,'A2 , and A1'A2'). Thus Uk=n Bk can be expressed as a union of B. andsets E disjoint from Bn such that A(E) >- 0. [Note that, for example, if).(A1A2') = A(A,A2'A3) + A(A,A2'A3') > 0, it may happen that A(A,A2'A3)< 0, A(A,A2'A3') > 0, so the sequence {Bn} need not be monotone.] Conse-quently,

2(An) < 2(Bn) < A(U Bk) -> A(U Bk) as r -+ oo by 1.2.7(a).k-n k="

Let C=limn U00 Now Uk fBkJC, and 0<A(Uk" Bk) < oo for all n. By 1.2.7(b), A(Uk n Bk) - A(C) as n -+ oo. Thus

sup A = lim A(A,,) < lim A(U B) = A(C) < sup A;M_ 00 "_ 00 =n

hence A(C) = sup A. The above argument applied to -A yields D c- .F withA(D) = inf A. I

We now prove the main theorem of this section.

2.1.2 Jordan-Hahn Decomposition Theorem. Let A be a countably additiveextended real-valued set function on the a-field F. Define

Ai'(A)=sup{A(B):Be. , BcA),A-(A) = - inf {).(B) : B e .F, B c A).

Then A' and A- are measures on . and A = A+ - A-.

PROOF. We may assume A never takes on the value - co. For if - oo belongsto the range of A, +oo does not, by definition of a countably additive setfunction. Thus -A never takes on the value -co. But (-A)+ = A- and(- A)- = A+, so that if the theorem is proved for -A it holds for A as well.

2.1 INTRODUCTION 61

Let D be a set on which A attains its minimum, as in 2.1.1. Since 1(0) = 0,we have - co < A(D) < 0. We claim that

A(A n D) < 0, A(A n D`) >- 0 for all A e .. (1)

For if A(A n D) > 0, then A(D) = A(A n D) + A(A` n D). Since 1.(D) is finite,so are A(A n D) and A(A` n D); hence A(A` n D) = A(D) - A(A n D) <A(D), contradicting the fact that A(D) = inf A. If A(A n D`) < 0, then1.(D u (A n D`)) = A(D) + A(A n D`) < A(D), a contradiction.

We now show that

A+(A) = A(A n D`), A-(A) = -A(A n D). (2)

The theorem will follow from this. We have, for B e JIF, B = A,

A(B) = ).(B n D) + A(B n D`)< A(B n D`) by (1)

< A(B n D`) + A((A - B) n D`)= A(A n D`).

Thus A+(A) < A(A n D`). But A(A n D`) < A+(A) by definition of A+, provingthe first assertion. Similarly,

A(B) = A(B n D) + A(B n D`)

z A(B n D)A(Br D)+A((A-B)r D)

= A(A n D).

Hence -A-(A) >_ A(A n D). But A(A n D) > -A-(A) by definition of Acompleting the proof. I

2.1.3 Corollaries. Let A be a countably additive extended real-valued setfunction on the o-field..

(a) The set function A is the difference of two measures, at least one ofwhich is finite.

(b) If A is finite (A(A) is never ± oo for any A e F), then A is bounded.(c) There is a set D e .F such that A(A n D) < 0 and A(A n D`) >- 0 for

all A e..(d) If D is any set in .F such that A(A n D) < 0 and A(A n D`) >- 0 for

all A e .F, then A+(A) = A(A n D`) and A-(A) = -A(A n D) for all A e F.(e) If E is another set in F such that A(A n E) < 0 and A(A n E`) > 0

for all A e .9, then 1 A I (D E)=O, where J A I =).+ + A-.

PROOF. (a) If A > - oo, then in 2.1.2, A- is finite; if A < + oo, A+ is finite[see Eq. (2)].

(b) In 2.1.2, A+ and A- are both finite; hence for any A e.y, IA(A)I< A+(S2) + A(S2) < oo.

(c) This follows from (1) of 2.1.2.(d) Repeat the part of the proof of 2.1.2 after Eq. (2).(e) By (d), A+(A) = A(A n D`), A e .F; take A = D n E` to obtain

A+(D n E`) = 0. Also by (d), A+(A) = 2(A n E`), A e F; take A = if n Eto obtain A+(D` n E) = 0. Therefore A+(D A E) = 0. The same argumentusing A-(A) = -A(A n D) _ -A(A n E) shows that A-(D A E) = 0. Theresult follows. I

Corollary 2.1.3(d) is often useful in finding the Jordan-Hahn decomposi-tion of a particular set function (see Problems 1 and 2).

2.1.4 Terminology. We call A+ the upper variation or positive part of A, A-the lower variation or negative part, IA I = A+ + A- the total variation. SinceA = A+ - A-, it follows that I A(A) i < 1 A I (A), A e Pr. For a sharper result,see Problem 4.

Note that if A e F, then I A I (A) = 0 iff 2(B) = 0 for all B e .°F, B c A.The phrase signed measure is sometimes used for the difference of two

measures. By 2.1.3(a), this is synonomous (on a a-field) with countablyadditive set function.

Problems

1. Let P be an arbitrary probability measure on R(R), and let Q be pointmass at 0, that is, Q(B) = 1 if 0 e B, Q(B) = 0 if 0 0 B. Find the Jordan-Hahn decomposition of the signed measure A = P - Q.

2. Let A(A) = fA f dµ, A in the a-field ., where f n f dli exists; thus A is asigned measure on ,F. Show that

A+(A) = fAf+ dµ, (A) = f

Af dp, I A I (A) = fA III dp.

3. If a signed measure A on the a-field .F is the difference of two measuresA, and A2, show that A, >- 1.+, 22 >- A - .

4. Let A be a signed measure on the a-field F. Show that 1A I (A) _sup{y°=, A(Ei) : E E2 , ..., E disjoint measurable subsets of A,n = 1, 2, ...}. Consequently, if A, and A2 are signed measures on F, thenIA.+A21<IAII+IA21

2.2 RADON-NIKODYM THEOREM AND RELATED RESULTS 63

2.2 Radon-Nikodym Theorem and Related Results

If (0, F, p) is a measure space, then 2(A) = JA g dp, A e .F, defines a signedmeasure if jog dµ exists. Furthermore, if A e . and µ(A) = 0, then 2(A) = 0.For g1A = 0 on A`, so that gIA = 0 a.e. [p], and the result follows from1.6.5(a).

If it is a measure on the a-field .F, and ). is a signed measure on .F, wesay that 2 is absolutely continuous with respect to it (notation A<< µ) ifp(A) = 0 implies 1.(A) = 0 (A E F). Thus if A is an indefinite integral withrespect to it, then A < p. The Radon-Nikodym theorem is an assertion inthe converse direction; if A < p (and it is a-finite on .fl, then A is an indefiniteintegral with respect to it. As we shall see, large areas of analysis are basedon this theorem.

2.2.1 Radon-Nikodym Theorem. Let it be a a-finite measure and ) a signedmeasure on the a-field .F of subsets of Q. Assume that A is absolutely con-tinuous with respect to p. Then there is a Borel measurable functiong : Sl -+ R such that

2(A) = SA g dp for all A e F.

If h is another such function, then g = h a.e. [p].

PROOF. The uniqueness statement follows from 1.6.11. We break the existenceproof into several parts.

(a) Assume A and g are finite measures.

Let 9 be the set of all nonnegative µ-integrable functions f such thatSA f dp 5 ),(A) for all A e 3F. Partially order .9' by calling f < g ifff < g a.e. [p].Let c' be a chain (totally ordered subset) of .5", and let s = sup{ f n f dp : f e W).

We can find functions f e f with f n f dp j s, and since W is totally ordered,f,, _<f,+, a.e. for all n. [If f a.e., then f n dµ:!9 Inf. dp <_ Snf,,+, dp;hence by 1.6.6(b), f = f. +I a.e.]

Thus f increases to a limit f a.e., and by the monotone convergencetheorem, In f dp = s.

We show that f is an upper bound of W. Let h be an element of V. Ifh < f, a.e. for some n, then h < f a.e. If h - f a.e. for all n, then h >_ f a.e.;hence In h dp = In f dp = s. But then h = f a.e. and therefore f is an upperbound of W. (We have f e So by definition of .50.)

By Zorn's lemma, there is a maximal element g e .So. We must show that2(A) = IA g dp, A e JT: Let 2,(A) = )(A) - JA g dp, A e F. Then A, is a

measure, )., < p, and A1(S2) < oo. If A, is not identically 0, then A,((2) > 0,hence

µ(12)-kA1(52)<0 for some k>0. (1)

Apply 2.1.3(c) to the signed measure k - kA1 to obtain D e .F such that forallAe.F,

g(AnD)-kA1(AnD)<0, (2)

and

µ(A n D`) - kA,(A n D`) >- 0. (3)

We claim that µ(D) > 0. For if µ(D) = 0, then A(D) = 0 by absolute con-tinuity, and therefore A,(D)= 0 by definition of A1. Take A = 0 in (3) to obtain

0:5 µ(D`) - kA,(D`)

= µ((2) - k21(S2) since p(D) = A1(D) = 0

<0 by (1),

a contradiction. Define h((o) = 1/k, co e D; h(w) = 0, w 0 D. If A e F,

f Ah du = k p(A n D) < A, (A n D) by (2)

< A1(A) = A(A) - f Ag dµ.

Thus fA (h + g) du <_ A(A). But h + g > g on the set D, with µ(D) > 0, con-tradicting the maximality of g. Thus A, __ 0, and the result follows.

(b) Assume µ is a finite measure, A a a-finite measure.

Let n be the disjoint union of sets A. with oo, and letA n = 1, 2, .... By part (a) we find a nonnegative Borel

measurable g with A(A) = A A

u is a finite measure, A an arbitrary measure.

Let l be the class of sets C e F such that A on C (that is, A restricted toFc = {A n C: A E ,F}) is a-finite; note that 0 e', so W is not empty.Let s = sup(p(A): A e c9) and pick C. e f with s. If C = UR 1 C,,,then C E (B by definition of ', and s Z µ(C) ;-> s; hence µ(C) = s.

By part (b), there is a nonnegative g': C -+ R, measurable relative to 3F'c andR(R), such that

A(A n C) = f g' dp for all A e .F.Ar,C

Now consider an arbitrary set A e . .

Case I : Let u(A n C`) > 0. Then A(A n C`) = oo, for if A(A n C`) < oo,then C u (A n C`) a '; hence

s>_p(CU(AUC`))=p(C)+p(AnC`)>p(C)=s,a contradiction.

Case 2: Let p(A n C') = 0. Then A(A n C`) = 0 by absolute continuity.

Thus in either case, A(A n C`) = JA n c- co du. It follows that

A(A)=A(A n C)+A(An C`)= f gdp,A

where g = g' on C, g = oo on C`.

(d) Assume It is a a-finite measure, A an arbitrary measure.

Let 12 be the union of disjoint sets A with oo. By part (c), thereis a nonnegative function g : A. measurable with respect to 'FA,, andR(R), such that A(A n A A e .F. We may write this as2(A n JA g dp where g (co) is taken as 0 for w A.. Thus A(A) _Y. A(A n A.) = ( _ n f A 9n d p = fA 9 dp, where 9 = E. 9

(e) Assume p is a a-finite measure, A an arbitrary signed measure.

Write A = A+ - A- where, say, A- is finite. By part (d), there are non-negative Borel measurable functions gi and 92 such that

A+(A) = fA 9i dp, A -(A) = fA 92 dp, A e .F.

Since A- is finite, 92 is integrable; hence by 1.6.3 and 1.6.6(a), A(A) _JA (91 - 92) dp

2.2.2 Corollaries. Under the hypothesis of 2.2.1,

(a) If A is finite, then g is p-integrable, hence finite a.e. [p].(b) If I A I is a-finite, so that fl can be expressed as a countable union

of sets A such that I AI (An) is finite (equivalently is finite), then g isfinite a.e. [p].

(c) If A is a measure, then g Z 0 a.e. [lt].

PROOF. All results may be obtained by examining the proof of 2.2.1. Alter-natively, we may proceed as follows:

(a) Observe that %(fl) = Jn g dp, finite by hypothesis.(b) By (a), g is finite a.e. [µ] on each A., hence finite a.e. [,u] on fl.(c) Let A = {w: g(co) < 0}; then 0 < .1(A) = f,, g dp < 0. Thus -gIA is a

nonnegative function whose integral is 0, so that gIA = 0 a.c. [p] by 1.6.6(b).Since gIA < 0 on A, we must have u(A) = 0. 1

If .1(A) _ 1A g dp for each A e F, g is called the Radon-Nikodyin derivativeor density of A with respect to p, written d.1/du. If p is Lebesgue measure,then g is often called simply the density of A.

There are converse assertions to 2.2.2(a) and (c). Suppose that

A(A)=.(Agdp,

where fngdp is assumed to exist. If g is p-integrable, then 2 is finite; ifg > 0 a.e. [p], then A >- 0, so that A is a measure. (Note that v-finiteness of itis not assumed.) However, the converse to 2.2.2.(b) is false; if g is finite a.e.[p], I A I need not be a-finite (see Problem 1).

We now consider a property that is in a sense opposite to absolutecontinuity.

2.2.3 Definitions. Let p1 and 112 be measures on the a-field .F. We say thatp, is singular with respect to P2 (written p, 1 P2) if there is a set A e .Fsuch that pl(A) = 0 and µ2(A`) = 0; note p, is singular with respect to P2 ifP2 is singular with respect to pl, so we may say that p1 and P2 are mutuallysingular. If A, and .12 are signed measures on .F, we say that A, and 22 aremutually singular if 11, 1 1 1221.

If p, 1 102, with µ,(A) = p2(A`) = 0, then P2 only assigns positive measureto subsets of A. Thus u2 concentrates its total effect on a set of p,-measure 0;on the other hand, if 112 < P P2 can have no effect on sets of p,-measure 0.

If A is a signed measure with positive part A and negative part A-, wehave A 1 A- by 2.1.3(c) and (d).

Before establishing some facts about absolute continuity and singularity,we need the following lemma. Although the proof is quite simple, the resultis applied very often in analysis, especially in probability theory.

2.2.4 Borel-Cantelli Lemma. If A1, A2, ... e F and Y', oo, thenp(lim sup 0.

PROOF. Recall that lim sup,, A,, = ()n , Uk n Ak; hence

u(lim sup A,) < p( U Ak) for all nn k=n

mIC(Ak) -- 0 as n - oo.

k=n

2.2.5 Lemma. Let p be a measure, and A, and 22 signed measures, on thec-field F.

(a) If A, 1 p and 22 1 p, then A, + 22 IX(b) If A, < p, then I A, I < It, and conversely.(c) If A.,<pand 22lp,then A,122.(d) If A, < p and A, 1 p, then A, - 0.(e) If A, is finite, then A, << p if lim (A).o 2,(A) = 0.

PROOF. (a) Let p(A) = p(B) = 0, IA, I (Ac) = 1221(B`) = 0. Then p(A u B)= 0 and 21(C) = . 2(C) = 0 for every C e .F with C c A` o B`; hence12, + 221 [(A v B)`] = 0.

(b) Let p(A) = 0. If A, '(A) > 0, then (see 2.1.2) A,(B) > 0 for someB = A; since p(B) = 0, this is a contradiction. It follows that A ', and similarlyA, -, is absolutely continuous with respect to p; hence I a., I << Y. (This mayalso be proved using Section 2.1, Problem 4.) The converse is clear.

(c) Let p(A) = 0, 1 12 I (A`) = 0. By (b), I A, I (A) = 0, so I Al I 1 1 221(d) By (c), Al 1 A; hence for some A e ., A, I (A) = I A, I (A`) = 0.

Thus IA,I(Q)=0.(e) If p(A") -+0 implies A,(A,,) -0, and p(A) = 0, set An =-A to conclude

that A, (A) = 0, so 2 < p.

Conversely, let 2, 4 p.If o I ,1, (A) 0 0 we can find, for some e > 0, sets A. e .F with

p(A,,) < 2-" and I A, I (A,,) >_ E for all n. Let A = limn sup A"; by 2.2.4,p(A) = 0. But I A, I (Uk n Ak) ? 1 Al I (An) ? e for all n; hence by 1.2.7(b),1 A, I (A) >_ s, contradicting (b). Thus limU(A) 1A, I (A) = 0, and the resultfollows since I.11(A)1 S 1 A, I (A).

If A, is an indefinite integral with respect top (hence A, < p), then 2.2.5(e)has an easier proof. If 21(A) = fA f dp, A e .F, then

fA If Idp=$AnflfI5n} Ifldp+fAnjlfl>n} If Idu

< np(A) +f Ill dy.(Ill >n}

By 1.6.1 and 1.2.7(b), 1(1!I>nl If I dp may be made less than e/2 for large n,say n >_ N. Fix n = N and take µ(A) < e/2N, so that JA If I dµ < E-

lf it is a a-finite measure and A a signed measure on the a-field., A maybe neither absolutely continuous nor singular with respect to p. However,if I A I is a-finite, the two concepts of absolute continuity and singularity areadequate to describe the relation between A and p, in the sense that A can bewritten as the sum of two signed measures, one absolutely continuous and theother singular with respect to p.

2.2.6 Lebesgue Decomposition Theorem. Let p be a a-finite measure on thea-field ., A a a-finite signed measure (that is, I A I is a-finite). Then A has aunique decomposition as A, + A2 , where A, and A2 are signed measures suchthat A,<,u,221p.

PROOF. First assume A is a a-finite measure. Let m = p + A, also a a-finitemeasure. Then p and A are each absolutely continuous with respect to m;hence by 2.2.1 and 2.2.2(c) there are nonnegative Borel measurable functionsf and g such that µ(A) = JA f dm, A(A) = JA g dm, A e..

Let B = {(o: f (w) > 01, C = B` = {co: f (co) = 0), and define, for eachA e .F,

A 1(A) = A(A n B), A2(A) = 2(A n C).

Thus A, + A2 = A. In fact, A, 4 µ and A2 1 p. To prove A, < p, assumeµ(A) = 0. Then JA f dm = 0, hence f = 0 a.e. (m] on A. But f > 0 on A n B;hence m(A n B) = 0, and consequently A(A n B) = 0; in other words,A,(A) = 0. Thus A, <<,u.

To prove A2 1 µ, observe that A2(B) = 0 and p(B`) = µ(C) = I c 0 dm = 0.Now if A is a a-finite signed measure, the above argument applied to

A+ and A- proves the existence of the desired decomposition.To prove uniqueness, first assume A finite. If A = Al + A2 = Al' + A2',

where A,, Al' 4A, A2, A2' -Lµ, then Al - A,' = A2' - A2 is both absolutelycontinuous and singular with respect top; hence is identically 0 by 2.2.5(d). IfA is a-finite and f) is the disjoint union of sets A. with I A I oo, apply theabove argument to each A. and put the results together to obtain uniquenessof A, and A2 . I

Problems

1. Give an example of a measure p and a nonnegative finite-valued Borelmeasurable function g such that the measure A defined by A(A) = JA g dpis not a-finite.

2. If A(A) = JA g dp, A e ffl', and g is p-integrable, we know that A is finite;in particular, A = {co: g(co) 96 0} has finite 1-measure. Show that A hasa-finite p-measure, that is, it is a countable union of sets of finite A-measure. Give an example to show that p(A) need not be finite.

3. Give an example in which the conclusion of the Radon-Nikodym theoremfails; in other words, 1 < p but there is no Bore] measurable g such that1(A) = SA g dµ for all A e F. Of course p cannot be a-finite.

4. (A chain rule) Let (0, .F, p) be a measure space, and g a nonnegativeBorel measurable function on 0. Define a measure A on F by

1(A) = fA g dµ, A e F.

Show that if f is a Bore] measurable function on 1),

f f d1= f fgdpn

in the sense that if one of the integrals exists, so does the other, and thetwo integrals are equal. (Intuitively, dA/dµ = g, so that d1 = g dµ.)

5. Show that Theorem 2.2.5(e) fails if 1, is not finite.6. (Complex measures) If (fI,.F) is a measurable space, a complex measure

A on is a countably additive complex-valued set function; that is,A =1, + iA2, where A, and 12 are finite signed measures.(a) Define the total variation of A as

2I (A) = sup { t 11(Ei) I : E,, ... , E.

disjoint measurable subsets of A, n = 1, 2, ... I.

Show that I A I is a measure on if. (The definition is consistent withthe earlier notion of total variation of a signed measure; see Section2.1, Problem 4.)

In the discussion below, A's, with various subscripts, denotearbitrary measures (real signed measures or complex measures), andp denotes a nonnegative real measure. We define A 4,u in the usualway; if A e F and p(A) = 0, then A(A) = 0. Define 1, I A2 ifI1, I 1 I22I . Establish the following results.

(b) 1 1 , + 12 I _ < 1 1 1 I + 112 I ; I a1 I = I a I I 1 I for any complex numbera. In particular if A = 1, + i12 is a complex measure, then

I 1 I s 11., I + 112 I ; hence 111 (Q) < co by 2.1.3(b).

(c) If A, 1p and A21p,then d,+A21µ.(d) If A < a, then I A I < a, and conversely.(e) If Al < p and A, 1 a, then Al I A2 .

(f) If A < a and 1. l y, then A=O.(g) If A is finite, then A < p if lim,.(A)-O .I(A) = 0.

2.3 Applications to Real Analysis

We are going to apply the concepts of the previous section to some prob-lems involving functions of a real variable. If [a, b] is a closed bounded inter-val of reals and f: [a, b] -> R, f is said to be absolutely continuous if for eachs > 0 there is a S > 0 such that for all positive integers n and all families(a,, b,), ..., (an, bn) of disjoint open subintervals of [a, b] of total length atmost b, we have

If(bi) -f(ai)l <_ s.i=t

It is immediate that this property holds also for countably infinite familiesof disjoint open intervals of total length at most 6. It also follows from thedefinition that f is continuous.

We can connect absolute continuity of functions with the earlier notion ofabsolute continuity of measures, as follows:

2.3.1 Theorem. Suppose that F and G are distribution functions on [a, b],with corresponding (finite) Lebesgue-Stieltjes measures a1 and a2. Letf = F - G, p = a1 - 92, so that a is a finite signed measure on R(a, b], witha(x, y] = f (y) - f (x), x < y. If m is Lebesgue measure on 9[a, b], thena 4 m if f is absolutely continuous.

PROOF. Assume q < m. If s > 0, by 2.2.5(b) and (e), there is a S > 0 such thatm(A) S S implies a I (A) < s. Thus if (a1, b1), ... , (an , bn) are disjoint openintervals of total length at most S,

n n

I f(bi) - f(ai) I = I alai, bil Ii=1 i=1

_ l a(ai , bi) Ii=1

(Note that p{bi} = 0 since p << m.) Therefore f is absolutely continuous.

2.3 APPLICATIONS TO REAL ANALYSIS 71

Now assume f absolutely continuous; ifs > 0, choose S > 0 as in thedefinition of absolute continuity. If m(A) = 0, we must show that µ(A) = 0.We use Problem 12, Section 1.4:

m(A) = inf{m(V): V z) A, V open},

µi(A) = inf{µi(V): V A, V open), i = 1, 2.

(This problem assumes that the measures are defined on t(R) rather thanP2[a, b]. The easiest way out is to extend all measures to Q(R) by assigningmeasure 0 to R - [a, b].) Since a finite intersection of open sets is open, wecan find a decreasing sequence of open sets such that µ(V.) -+ µ(A) and

m(A) = 0.Choose n large enough so that m(1') < 6; if V. is the disjoint union of the

open intervals (ai, bi), i = 1, 2, ... , then I I < i I µ(ai, bi) 1. But f iscontinuous, hence

p(bi) = µ(bi - 1/n, bi] = [f(bi) - f(bi - 1/n)] = 0.

Therefore

I µ(Vn) I l µ(ai , bill = if(bi) - f(ai) I <- e.i i

Since a is arbitrary and µ(V,) - µ(A), we have µ(A) = 0.

If f: R - R, absolute continuity off is defined exactly as above. If Fand Gare bounded distribution functions on R with corresponding Lebesgue-Stieltjes measures µ, and µz , and f = F - G, µ = i - µZ [a finite signedmeasure on P4(R)], then f is absolutely continuous if µ is absolutely con-tinuous with respect to Lebesgue measure; the proof is the same as in 2.3.1.

Any absolutely continuous function on [a, b] can be represented as thedifference of two absolutely continuous increasing functions. We prove thisin a sequence of steps.

apartition of [a,b],define

n

V(P) = F, I f(xi) -f(xi-1) I .i=1

The sup of V(P) over all partitions of [a, b] is called the variation of f on[a, b], written Vf(a, b), or simply V(a, b) if f is understood. We say that f isof bounded variation on [a, b] if V(a, b) < oo. If a < c < b, a brief argumentshows that V(a, b) = V(a, c) + V(c, b).

2.3.2 Lemma. If f: [a, b] -+ R and f is absolutely continuous on [a, b], then fis of bounded variation on [a, b].

PROOF. Pick any e > 0, and let S > 0 be chosen as in the definition of absolutecontinuity. If P is any partition of [a, b], there is a refinement Q of P consis-ting of subintervals of length less than 6/2. If Q: a = x0 < x, < < x = b,let io = 0, and let it be the largest integer such that x,, - xi,, < b; let i2 be thelargest integer greater than i, such that x,2 - xi, < S, and continue in thisfashion until the process terminates, say with i, = n. Now xi,,- xi,,_, >- 6/2,k = 1, 2,..., r - 1, by construction of Q; hence

2(b - a)r<l+b

M.

By absolute continuity, V(Q) < Me. But V(P) < V(Q) since the refiningprocess can never decrease V; the result follows.

It is immediate that a monotone function F on [a, b] is of boundedvariation : VF(a, b) = I F(b) - F(a) I. Thus if f = F - G, where F and G areincreasing, then f is of bounded variation. The converse is also true.

2.3.3 Lemma. If f: [a, b] --* R and f is of bounded variation on [a, b], thenthere are increasing functions F and G on [a, b] such that f = F - G. If f isabsolutely continuous, F and G may also be taken as absolutely continuous.

PROOF. Let F(x) = Vf(a, x), a 5 x < b; Fis increasing, for if h >- 0, V(a, x + h)- V(a, x) = V(x, x + h) >- 0. If G(x) = F(x) -1(x), then G is also increasing.For if x1 < x2 , then

G(x2) - G(x1) = F(x2) - F(x,) - (f(x2) -f(x1))= V(x1, X2) - (f(x2) -f(x1))z V(x1, x2) - If(x2) -f(xl)IZ 0 by definition of V(x x2).

Now assume f absolutely continuous. If e > 0, choose S > 0 as in thedefinition of absolute continuity. Let (a,, b,), ..., (a., b.) be disjoint openintervals with total length at most S. If Pi is a partition of [ai, bi], i =1, 2, .. ., n, then

n

V(P) < e by absolute continuity off.i=1

Take the sup successively over P1, ..., P. to obtain

V(ai,bi)<e;i=1

in other words,

[F(bi) - F(a)] <- s.i=1

Therefore Fis absolutely continuous. Since sums and differences of absolutelycontinuous functions are absolutely continuous, G is also absolutely con-tinuous.

We have seen that there is a close connection between absolute continuityand indefinite integrals, via the Radon-Nikodym theorem. The connectioncarries over to real analysis, as follows:

2.3.4 Theorem. Let f: [a, b] - R. Then f is absolutely continuous on [a, b] iff is an indefinite integral, that is, iff

x

f (x) - f (a) = f g(t) dt, a< x:9 b,

where g: [a, b] -> R is Bore] measurable and integrable with respect toLebesgue measure.

PROOF. First assume f absolutely continuous. By 2.3.3, it is sufficient toassume f increasing. If u is the Lebesgue-Stieltjes measure corresponding tof, and m is Lebesgue measure, then u < m by 2.3.1. By the Radon-Nikodymtheorem, there is an m-integrable function g such that u(A) = fA g dm for allBorel subsets A of [a, b]. Take A = [a, x] to obtain f (x) - f (a) = f; g(t) dt.

Conversely, assume f (x) - f(a) = J' g(t) dt. It is sufficient to assumeg >- 0 (if not, consider g+ and g- separately). Define u(A) = f,, g dm, A eM[a, b]; then u < m, and if F is a distribution function corresponding to u,Fis absolutely continuous by 2.3.1. But

xF(x) - F(a) = u(a, x] = f g(t) dt = f (x) - f (a).a

Therefore f is absolutely continuous. I

If g is Lebesgue integrable on R, the "if" part of the proof of 2.3.4 showsthat the function defined by f z , g(t) dt, x e R, is absolutely continuous,hence continuous, on R. Another way of proving continuity is to observethat

rg(t)m

dt - f g(t) d t = f_ g(t)I,x. x+h>(t) dt00 00

if h > 0, and this approaches 0 as h-0, by the dominated convergencetheorem.

If f (x) - f (a) = fag(t) dt, a < x < b, and g is continuous at x, then f is

differentiable at x and f'(x) = g(x); the proof given in calculus carries over.

If the continuity hypothesis is dropped, we can prove that f'(x) = g(x) foralmost every x e [a, b]. One approach to this result is via the theory of differen-tiation of measures, which we now describe.

2.3.5 Definition. For the remainder of this section, µ is a signed measure onthe Borel sets of Rk, assumed finite on bounded sets; thus if is is nonnegative,it is a Lebesgue-Stieltjes measure. If in is Lebesgue measure, we define, foreach x e R',

(DO)(x) =1im sup /2(C,), (Dµ)(x) = lim inf P(Cr)

r-o c. m(Cr) - r-O cr m(Cr)

where the Cr range over all open cubes of diameter less than r that contain x.It will be convenient (although not essential) to assume that all cubes haveedges parallel to the coordinate axes.

We say that it is differentiable at x iff bit and Dµ are equal and finite at x;we write (Dp)(x) for the common value. Thus it is differentiable at x iff forevery sequence of open cubes containing x, with the diameter of C.approaching 0, approaches a finite limit, independent of theparticular sequence.

The following result will play an important role:

2.3.6 Lemma. If {C1, ... , is a family of open cubes in R", there is adisjoint subfamily {C,,, ..., CJ such that m(Uj=1 C;) < 3k EP =1 m(C1D).

PROOF. Assume that the diameter of C; decreases with i. Set i, = 1, and take i2to be the smallest index greater than i, such that C12 is disjoint from Ci, ; leti3 be the smallest index greater than i2 such that C., is disjoint from C,, U C,2Continue in this fashion to obtain disjoint sets C,...... C,,. Now for anyj = 1, ... , n, we have C; r C,p # 0 for some i,, < j, for if not, j is not oneof the ip, hence i,, < j < ip+1 for some p (or is < j). But C; r) (Ci, v ... U Cip)is assumed empty, contradicting the definition of ip+1.

If Bp is the open cube with the same center as Ctp and diameter three timesas large, then since C3 r Ctp # 0 and diameter C3 < diameter C,p, we haveC3 c Bp . Therefore,

m(U C;) < m(U Bp) < m(Bp) = 3k m(C,p) Ii=1 p=1 p=1 p=1

We now prove the first differentiation result.

2.3.7 Lemma. Let it be a Lebesgue-Stieltjes measure on the Borel sets of Rk.If µ(A) = 0, then Dp = 0 a.e. [m] on A.

PROOF. If a > 0, let B = {x e A: (Dp)(x) > a). (Note that {x: supc, µ(C,)/m(Cr)> a) is open, and it follows that B is a Bore] set.) Fix r > 0, and let K be acompact subset of B. If x e K, there is an open cube C, of diameter less than rwith x e C,. and p(Cr) > am(C,). By compactness, K is covered by finitelymany of the cubes, say C,_., Ca . If {C,,, ... , C,=} is the subcollection of2.3.6, we have

s 3k 3k k

m(K)< m(U C,)< 3k m(C;p) - p(C;,) = p(U C1p) <_ - p(Kr)=1 p=1 a p=1 a p=1 / a

where Kr = {x e Rk: dist(x, K) < r}. Since r is arbitrary, we have m(K) <-3kp(K)/a < 3kp(A)la = 0. Take the sup over K to obtain, by Problem 12,Section 1.4, m(B) = 0, and since a is arbitrary, it follows that By < 0 a.e. [m]on A. But it >_ 0; hence 0 < Dµ < Dµ, so that Dp = 0 a.e. [m] on A. I

We are going to show that Dµ exists a.e. [m], and to do this the Lebesguedecomposition theorem is helpful. We write µ = µ1 + P21 where p1 4 m,P2 1 m. If I P2 I (A) = 0 and m(A`) = 0, then by 2.3.7, Dp2+ = Dµ2- = 0 a.e.[m] on A; hence a.e. [m] on Rk. Thus Dµ2 = 0 a.e. [m] on R'.

By the Radon-Nikodym theorem, we have µ1(E) = J,, g dm, E E R(Rk), forsome Borel measurable function g. As might be expected intuitively, g is(a.e.) the derivative of p, ; hence Dµ = g a.e. [m].

2.3.8 Theorem. Let p be a signed measure on R(R) that is finite on boundedsets, and let p = p, + P21 where p, << m and µ2 1 m. Then Dp exists a.e. [m]and coincides a.e. [m] with the Radon-Nikodym derivative g = dµ1 fdm.

PROOF. If a e R and C is an open cube of diameter less than r,

p,(C)-am(C)=fCCr

(g-a)dm< f(92a) (g-a)dm.

If A(E) = JE n (gZa} (g - a) dm, E a R(Rk), and A = {g < a), then 2(A) = 0;so by 2.3.7, DA = 0 a.e. [m] on A. But

µ1(C) < a + A(C)m(C)

-m(C)

hence Dp, < a a.e. [m] on A. Therefore, if E. = {x a Rk: g(x) < a < (Dp1)(x)},then m(Ea) = 0. Since { Dp, > g) e U {Ea : a rational), we have Dp, < g a.e.[m]. Replace p, by -p, and g by -g to obtain Pp, >- g a.e. [m]. By 2.2.2(b),g is finite a.e. [m], and the result follows. I

We now return to functions on the real line.

2.3.9 Theorem. Let f: [a, b] -+ R bean increasing function. Then the deriva-tive of f exists at almost every point of [a, b] (with respect to Lebesguemeasure). Thus by 2.3.3, a function of bounded variation is differentiablealmost everywhere.

PROOF. Let µ by the Lebesgue-Stieltjes measure corresponding to f; by 2.3.8,Dµ exists a.e. [m]; we show that Dµ = f' a.e. [m]. If a:5 x 0 and a sequence h 0with all h of the same sign and I [f (x + f (x)]/h - c I >_ a for all n.Assuming all h > 0, we can find numbers k > 0 such that

I

h + k 2

for all n, and since f has only countably many discontinuities, it may beassumed that f is continuous at x + h and x - k.. Thus we conclude thatµ(x - k,,, x + a contradiction.

We now prove the main theorem on absolutely continuous functions.

2.3.10 Theorem. Let f be absolutely continuous on [a, b], with f (x) - f (a) =I' g(t) dt, as in 2.3.4. Then f' = g almost everywhere on [a, b] (Lebesguemeasure). Thus by 2.3.4, f is absolutely continuous ifl f is the integral of itsderivative, that is,

x./ (x) -.f (a) = f .f '(t) dt,4

a<x<b.

PROOF. We may assume g >_ 0 (if not, consider g+ and g-). If µ1(A) = JA g dm,A e P(Rk), then Dµ1 = g a.e. [m] by 2.3.8. But if a < x < y < b, then µ,(x, y]= f(y) - f(x), so that µl is the Lebesgue-Stieltjes measure corresponding to

f. Thus by the proof of 2.3.9, Dµ1 =f' a.e. [m]. I

Problems

1. Let F be a bounded distribution function on R. Use the Lebesgue decom-position theorem to show that F may be represented uniquely (up toadditive constants) as F1 + F2 + F3, where the distribution functionsF; , j = 1, 2, 3 (and the corresponding Lebesgue-Stieltjes measures µ;)have the following properties:(a) F1 is discrete (that is, µl is concentrated on a countable set of points).

(b) F2 is absolutely continuous (P2 is absolutely continuous with respectto Lebesgue measure; see 2.3.1).

(c) F3 is continuous and singular (that is, P3 is singular with respect toLebesgue measure).

2. If f is an increasing function from [a, b] to R, show that Ja f'(x) dx <f (b) - f (a). The inequality may be strict, as Problem 3 shows. (Note thatby 2.3.9, f' exists a.e.; for integration purposes, f' may be definedarbitrarily on the exceptional set of Lebesgue measure 0.)

3. (The Cantor function) Let E,, E2 , ... be the sets removed from [0, 1] toform the Cantor ternary set (see Problem 7, Section 1.4). Define functionsFF : [0, 1] [0, 1] as follows : Let A,, A2 , .... be the subintervalsof U"=, E; , arranged in increasing order. For example, if n = 3,

E1 u E2 u E3 = (`Z17' 27) (V1) (,7 27} U l3, 3)

U (27' 27) U (9' 9) U (21' 27=A,uA2L...uA.,.

Define

F"(0) = 0,

F"(x)=k/2" if xeAk, k=1,2,...,2"-1,F"(1) = 1.

Complete the specification of F" by interpolating linearly. For n = 2, seeFig. 2.1; in this case,

E1 UE2=(9,9)U(3, 3)U(91=A,uA2LA3.

4

1 2 4 5 2 7 8 1

9 9 3 9 9 3 9 9X

Figure 2.1. Approximation to the Cantor function.

Show that F(x) each x, where F, the Cantor function, hasthe following properties:

(a) F is continuous and increasing.(b) F = 0 almost everywhere (Lebesgue measure).(c) F is not absolutely continuous.In fact

(d) F is singular; that is, the corresponding Lebesgue-Stieltjes measurep is singular with respect to Lebesgue measure.

4. Let f be a Lebesgue integrable real-valued function on Rk (or on an opensubset of Rk). If p(E) = 1, f (x) dx, E e 9B(Rk), we know that Dp = f a.e.(Lebesgue measure). If Dp =fat x0, then if C is an open cube containingxo and diam C -> 0, we have p(C)/m(C) -+f (xo); that is,

1

m(C)f c[ f(x) - f(xo)] dx -+ 0 as diam C - 0.

In fact, show that

lf(x)-fxo)I dx-0 as diamC-01 fcM(C)

for almost every x0 . The set of favorable x0 is called the Lebesgue set off.5. This problem relates various concepts discussed in Section 2.3. In all

cases, f is a real-valued function defined on the closed bounded interval[a, b]. Establish the following:

(a) If f is continuous, f need not be of bounded variation.(b) If f is continuous and increasing (hence of bounded variation), f need

not be absolutely continuous.(c) If f satisfies a Lipschitz condition, that is, I f (x) - f (y) I < L I x - y

for some fixed positive number L and all x, y e [a, b], then f isabsolutely continuous.

(d) If f exists everywhere and is bounded, f is absolutely continuous.[It can also be shown that if f' exists everywhere and is Lebesgueintegrable on [a, b], then f is absolutely continuous; see Titchmarsh(1939, p. 368).]

(e) If f is continuous and f' exists everywhere, f need not be absolutelycontinuous [consider f (x) = x2 sin (l/x2), 0 < x <- 1, f(0) = 0].

6. The following problem considers the change of variable formula in amultiple integral. Throughout the problem, T will be a map from V ontoW, where V and W are open subsets of Rk, T is assumed one-to-one,continuously differentiable, with a nonzero Jacobian. Thus T has acontinuously differentiable inverse, by the inverse function theorem of

advanced calculus [see, for example, Apostol (1957, p. 144)]. It alsofollows from standard advanced calculus results that for all x e V,

thE!as (1)

where A(x) is the linear transformation on Rk represented by the Jacobianmatrix of T, evaluated at x. [See Apostol (1957, p. 118).](a) Let A be a nonsingular linear transformation on Rk, and define a

measure 2 on R(Rk) by .1(E) = m(A(E)) where m is Lebesgue measure.Show that 2 = c(A)m for some constant c(A), and in fact c(A) is theabsolute value of the determinant of A. [Use translation-invarianceof Lebesgue measure (Problem 5, Section 1.4) and the fact that anymatrix can be represented as a product of matrices corresponding toelementary row operations.]

Now define a measure µ on M(V) by µ(E) = m(T(E)). By con-tinuity of T, if E > 0, x e V, and C is a sufficiently small open cubecontaining x, then T(C) has diameter less than e, in particular,m(T(C)) < oo. It follows by a brief compactness argument that it is aLebesgue-Stieltjes measure on P4(V).

Our objective is to show that p is differentiable and (Dp)(x) _J(x) I for every x e V, where J(x) = det A(x), the Jacobian of the

transformation T.(b) Show that it suffices to prove that if 0 e V and T(O) = 0, then (Dµ)(0)

= J det A(0) I .(c) Show that it may be assumed without loss of generality that A(0)

is the identity transformation; hence det A(0) = 1.Now given e > 0, choose a e (0, *) such that

I - E<(I - 2a)k<(I +2a)k 0such that if (x ! < 6, then I T(x) - x I < a I x I 1,/_k.

(d) If C is an open cube containing 0 with edge length fi and diameterJk /3 < S, take C1, C2 as open cubes concentric with C, with edgelengths /31 = (1 - 2a)/3 and /32 = (I + 2a)f. Establish the following:

(i) IfxeC,then T(x)eC2.(ii) If x belongs to the boundary of C, then T(x) t C1.

(iii) If x is the center of C, then T(x) E C1.(iv) C 1 - T(C) = C 1 - T(0-

Use a connectedness argument to conclude that C1 c T(C) C C2,and complete the proof that (Dµ)(0) = 1.

(e) If d is any measure on P6(V) and DA < oo on V, show that a isabsolutely continuous with respect to Lebesgue measure. It thereforefollows from Theorem 2.3.8 that

(f)

m(T(E)) = f _ I J(x) I dx, E e P(V).E

[If this is false, find a compact set K and positive integers n and jsuch that m(K) = 0, A(K) > 0, and %(C) < nm(C) for all open cubesC containing a point of K and having diameter less than 11j. Essen-tially, the idea is to cover K by such cubes and conclude that%(K) = 0, a contradiction.]If f is a real-valued Borel measurable function on W, show that

f wf(y) dy = f, f(T(x))1 d(x) I dx

in the sense that if one of the two integrals exists, so does the other,and the two integrals are equal.

7. (Fubini's differentiation theorem) Let ft, f2 , ... be increasing functionsfrom R to R, and assume that for each x, 1 I f (x) converges to afinite number fix). Show that Y I f'(x) =f'(x) almost everywhere(Lebesgue measure).Outline:(a) It suffices to restrict the domain of all functions to (0,1 ] and to assume

all functions nonnegative. Use Fatou's lemma to show thatYa I f ' (x) < f'(x) a.e.; hence f; (x) - 0 a.e.

(b) Choosenl,n2,...such that Ep.,,f;(1) <2-k,k= 1,2,...,andapply

part (a) to the functions gk(x) =f(x) - y;k I f;(x) = fi(x).

2.4 L° Spaces

If (S), F, p) is a measure space and p is a real number with p >- 1, the setof all Borel measurable functions f such that I f I a is µ-integrable has manyimportant properties. In order to fully develop these properties, it will beconvenient to work with complex-valued functions.

2.4.1 Definitions. If (Q, F) is a measurable space, a complex-valued Borelmeasurable function on (0, F) is a mapping f: (0, F) (R2, -4(R2)). Ifp,(x, y) = x and p2(x, y) = y, x, y e R, we may identify p, o f and p2 -f withthe real and imaginary parts off. If µ is a measure on .`-, we define

fn fdp= fn Refdu+i fn Imfdp,

2.4 L" SPACES 81

provided In Ref dp and In Inf dp are both finite. In this case we say that f isp-integrable. Thus in working with complex-valued functions, we do notconsider any cases in which integrals exist but are not finite.

The following result was established earlier for real-valued f [see 1.5.9(c)];it is still valid in the complex case, but the proof must be modified.

2.4.2 Lemma. If f is p-integrable,

fn f d1i I <- f nIf I dp.

PROOF. If f, f dp = re'e, r > 0, then Sn e-'Of dp = r = fn f dµ 1. But if f (co) _p(w)e""') (taking p >- 0), then

fe'e f dp = f peuw-et dp

n n

= I p cos((p - 0) dp since r is realn

<Jn pdp=fn

Many other standard properties of the integral carry over to the complexcase, in particular 1.5.5(b), 1.5.9(a) and (e), 1.6.1, 1.6.3, 1.6.4(b) and (c),1.6.5, 1.6.9, 1.6.10, and 1.7.1. In almost all cases, the result is an immediateconsequence of the fact that integrating a complex-valued function is equiva-lent to integrating the real and imaginary parts separately. Only two theoremsrequire additional comment. To prove that h is integrable if IhI is integrable[1.6.4(b)l, use the fact that IRehl, 11m h1 < Ihl < IRehI + I lm hl.Finally, to prove the dominated convergence theorem (1.6.9), apply the realversion of the theorem to I f -f 1, and note that If. -f 1 < 1 fn I + I f I < 2g.

If p > 0, we define the space L" = Lp(f), .F, p) as the collection of allcomplex-valued Bore] measurable functions f such that In If l' dp < oo. Weset

11111,, =(ffIfl"dlt)lip,

feL".

It follows that for any complex number a, llaf Ilp = lal ilf llp.f e V.We are going to show that L' forms a linear space over the complex field.

The key steps in the proof are the Holder and Minkowski inequalities, whichwe now develop.

2.4.3 Lemma. I f a, h, a, f? > 0, a + fl = 1, then aaba < as + fib.

PROOF. The statement to be proved is equivalent to - log(aa + fb) <a(-log a) + /3(-log b), which holds since -log is convex. [If g has a non-negative second derivative on the interval I e R, then g is convex on I, that isg(ax + fly) <- ag(x) + fg(y), x, y a I, a, fl > 0, a + /3 = 1. To see this, assumex < y and writeg(xx + fly) - ag(x) - /3g(y) = a[g(ax + fly) - g(x)] + f[g(ax + fly) - g(y)]

= a(3(y - x)[g'(u) - g'(v)] for some u, v

with x < it < ax + fly, ax + fly <- v < y. But g'(u) - g'(v) < 0 since g' isincreasing on I.] 1

2.4.4 Corollary. If c, d > 0, p, q > 1, (1 /p) + (1 /q) = 1, then cd < (cP/p) +(dglq)

PROOF. In 2.4.3, let a = I /p, /3 = I /q, a = cP, b = d' .

2.4.5 Holder Inequality. Let I < p < oo, I < q < oo, (1/p) + (1/q) = I. IffeLPand geLq,then fgaL' and Ilfgll1<-IIfIIp1I911q.

PROOF. In 2.4.4, take c= I f (w) I /IIf IIP, d= 190011119111; (the inequality isimmediate if 11f 11, or I1911, = 0). Then

I f(w)9(w) l < If(w) I P + 19(w) 19

JJ11,11911q pllf1IP 9119119

integrate to obtain

In lf9l dlI<1+5=1.11lhIPllg11q p q

When p = q = 2, we obtain1/2

ffIffl dle<- [J'IfI2dPfIgIldII]

and thus, using 2.4.2, we have the Cauchy-Schwarz inequality: If f andg e L2, then fg e Ll and

fn.fgdu :5 [f.Ifl2dµ fn 1912 du]1/2,

where g is the complex conjugate of g. (The reason for replacing g by g is tomake the inequality agree with the Hilbert space result to be discussed inChapter 3.)

2.4 If SPACES

2.4.6 Lemma. If a,b>_0,p>- 1, then (a+b)P<2P-'(aP+ba).

83

PROOF.Leth(x) = d[(a + x)P - 2p-1(aP + x°))Idx = p(a + x)P-' - 2P-lpxP-l;

sincep - 1,

h(x)>0 for a+x>2x, that is, x<a,h(x) = 0 at x = a,h(x) < 0 for x > a.

The maximum therefore occurs at x = a; hence

(a + b)P - 2P-1(a° + bP) < (a + a)P - 2p-1(aP + aP) = 0.

2.4.7 Minkowski Inequality. If f, g e LP (1 I and choose q such that (1/p) + (1 /q) = 1. Then

I f + g I P = If+gI If+gIP-1 <I f I If+gIP-1+ I g I If+gIP-'(1)

Now If +glP-' c-0; for

(p - 1)q =p1/9I

lp 1/PP;

hence

ff[If+gIP-1]9dµ= fnIf+gIPdk<oo.

Since f and g belong to LP and I f +gIP-' a L9, Holder's inequality impliesthat If I If+gIP-' and IgI If+gIp-1 eL1, and

1 /9

f III If+gIP-' dp < IlfllP [f,(If + gIP-1)4dµ IlfIIPIIf+gIIP19, (2)n

faIgI If+gIP-' dp <- IIgIIPIIf+gllp'Q (3)

By Eq. (1), 11f+g11D<(11f11p+11g11p)(Ilf+g11°1Q) Since p-(p/q)=1, theresult follows.

By Minkowski's inequality and the fact that Ilaf 11 P = J al IlfllP for f e LP,LP (1 < p < oo) is a vector space over the complex field. Furthermore, thereis a natural notion of distance in If, by virtue of the fact that II 11P is a semi-norm.

2.4.8 Definitions and Comments. A seminorm on a vector space L (over thereal or complex field) is a real-valued function II I{ on L, with the followingproperties.

IIf II >_ 0,

Il of II = I a I IIf II for each scalar a;consequently, if f = 0, then If II = 0.

Ilf+gll <_ IIfII+II9II(f and g are arbitrary elements of L). If II II is a seminorm with the additionalproperty that if II = 0 implies f = 0, II II is said to be a norm.

Now II II, is a seminorm on L°; the first two properties follow from thedefinition of II IIP, and the last property is a consequence of Minkowski'sinequality.

We can, in effect, change II lip into a norm by passing to equivalence classesas follows.

If f, g a LP(f2, .F, u), define f - g iff f = g a.e. [p]. Then IIf IIP is the samefor all f in a given equivalence class, by 1.6.5(b). Thus if LP is the collectionof equivalence classes, LP becomes a linear space, and II lip is a seminorm onLP. In fact II IIP is a norm, since If IIP = 0 implies f = 0 a.e. [u], by 1.6.6(b).

If II II is a seminorm on a vector space, we have a natural notion of dis-tance: d(f, g) = IIf - gil By definition of seminorm we have

d(f, g) > 0,d(f,g)=0 if f=g,d(f, g) = d(g, f),d(f, h) < d(f, g) + d(g, h).

Thus d has all the properties of a metric, except that d(f, g) = 0 does notnecessarily imply f = g; we call d a pseudometric. If II II is a norm, d isactually a metric. (There is an asymmetry of terminology between seminormand pseudometric, but these terms seem to be most popular, although"pseudonorm" is sometimes used, as is "semimetric."]

One of the first questions that arises in any metric space is the problem ofcompleteness; we ask whether or not Cauchy sequences converge. We aregoing to show that the LP spaces are complete. The following result will beneeded; students of probability are likely to recognize it immediately, but itappears in other parts of analysis as well.

2.4.9 Chebyshev's Inequality. Let f be a nonnegative, extended real-valued,Borel measurable function on (Q, .°F, u). If 0 _ e} < ± fn fP dµ.

2.4 Lp SPACES 85

The following version is often applied in probability. If g is an extended real-valued Borel measurable function on (Q, S) and P is a probability measure on

define

in =J

g dP (assumed finite, so that g is finite a.e. [P]),f2

La2=J (g - m)2dP.

If0<k<oo,

P{w: Ig(w) - ml ko}<k2

This follows from the first version with f = Ig - nil, e = kQ, p = 2.

PROOF.

Jfpdp> f fpdp!sP.{w:f(w) > e}.

0 (w: j(w)ze)

One more auxiliary result will be needed.

2.4.10 Lemma. If 91,92, ... a Lp (p > 0) and Il9k - 9k+lllp < ('j)k, k =1, 2, ..., then {gk} converges a.e.

PROOF. Let Ak = {co: I9k(w) - 9k+1(w)I >-2_k}. Then by 2.4.9,

N(Ak) : 2kpII9k - 9k+111 pp < 2-kp

By 2.2.4, p(lim sup,, A,,)= 0. But ifw 0 lira sup A, then I gk(w) - gk+,((O) l <2_k for large k, so {gk(w)} is a Cauchy sequence of complex numbers, and

therefore converges. I

Now, the main result:

2.4.11 Completeness of Lp, 1 - n1, and let g, = Ingeneral, having chosen 91, ' ' ' ' 9 k and n1.... , nk, let nk+, > nk be such that

Ilfn -fmllp < (31)k+1 for n, in nk+1, and let 9k+1 By 2.4.10, 9kconverges a.e. to a limit function f.

Given e > 0, choose N such that llf, -f 11P < e for n, in > N. Fix n > N-and let in -+ oo through values in the subsequence, that is, let in = nk, k -+ oo.Then

e -liminf Ilfn-fnkllp=liminff Ifn-fnkI°dpk-.co k- OD n

> f lim inf If,, - 9kl° dp by Fatou's lemmak- oo

= fn -flip.Thus f=f-fn+fn, we havefcL°. 12.4.12 Examples and Comments. Let S2 be the positive integers; take .F asall subsets of n, and let µ be counting measure. A real-valued function on Cmay be represented as a sequence of real numbers; we write f = (an, n = 1, 2,...). An integral on this space is really a sum [see Problem I(a)]:

f f dp an,n=1

where the series is interpreted as Z Ian+ - YR I a.- if this is not of the

form + oo - oo (if it is, the integral does not exist). Thus the following casesoccur:

an- < oo. The series diverges to oo and the(1) n= co, _R=integral is oo.

(2) E , an+ < oo, YR 1 an = co. The series diverges to - Co and theintegral is - oo.

(3) Y', an+ < oo, an- < oo. The series is absolutely convergentand the integral equals the sum of the series.

(4) Y , an+ = Co, Y," an- = oo. The series is not absolutely conver-gent; it may or may not converge conditionally. Whether it does or not, theintegral does not exist. Thus when summation is considered from the point ofview of Lebesgue integration theory, series that converge conditionally butnot absolutely are ignored.

If p is changed so that p{n} is a nonnegative numberpn, not necessarily 1 asin the case of counting measure, the same analysis shows that

ao

f dp = E Pnan,a n=1

where the series is interpreted as In I Pn an+- -n 1 Pn an

If f = {an, n = 1, 2, ...} is a sequence of complex numbers and p iscounting measure,

f f dp = an = > Re an + i Im an;n n=1 n=1 n=1

the integral is defined provided 1a.1 < Co.Now let 0 be an arbitrary set, and take 9 as all subsets of fl and p as

counting measure. If f = (f (a), a e Q) is a nonnegative real-valued functionon 0, then [Problem 1(b)]

2.4 Lp SPACES 87

fa dµ = > f(x), (1)n a

where the series is defined as sup{yaeFf(a): Fc S2, Ffinite}. If f(a)> 0 foruncountably many a, then for some b > 0 we have f (a) >- S for infinitelymany x, so that Y_a f (a) = 00-

If the nonnegativity hypothesis is dropped, we apply the above results tof + and f - to again obtain Eq. (1), where the series is interpreted as Lf +(a) -Yaf -(a). If f is complex-valued, Eq. (1) still applies, with the seriesinterpreted as Ya Re f (a) + i Ya Im f (a). The integral is defined providedY-a I f (a) I < 00.

The space Lp(S2, .F, p) will be denoted by 1"(f ); it consists of all complex-valued functions (f (a), a e f) such that f (a) = 0 for all but countably many a,and

Ilf IIP =. If(a)II, < co

If 0 is the set of positive integers, the space P(Q) will be denoted simply by1"; it consists of all sequences f= {a"} of complex numbers such that

IIf II; _ Ianlp < co.n=1

It will be useful to state the Holder and Minkowski inequalities forsums. If f e 1P(O) and g e Iq(f2), where I <p < oo, I < q < oo, (11p) + (1/q) _1, then fg e 1'(S2) and

1/q

If(a)g(a) I <- (z If(a) IP)11p(Y

) Iq)

I g(a

If f, g e P(O), 1 < p < oo, then f + g e lp(Q) and

If(a) + g(a) Ip) 1 /P <

(Y_ I f(a) I

P) 1 /P +I g(a) I p)

"IP.

As in 2.4.5, we obtain the Cauchy-Schwarz inequality for sums from theHolder inequality. If f, g e l2(S2), then fg a 11(52) and

Y_ f(a) g(a) s (E I f(a) 12) "' (Y I g(a) 12)

If in the above discussion we replace S by {1, 2, ..., n}, all convergencedifficulties are eliminated, and all the spaces Ip(S2) coincide with C".

If 0 < p < 1, II II p is not a seminorm on Lp(Q, F, µ). For let A and B bedisjoint sets with a = u(A) and b = p(B) assumed finite and positive. If f = IA,g ='B, then

1/p 1/P

Ilf+gllp=(f If+gl"du)=(fn(IA+10)dic)

=(a+b)1/P,

Ill lip = al/p, Ilgllp = bl/P

But (a+b)1/P>a'1"+b1/Pif a, b > 0, 0<p<1, since (a+x)'-a'-x' isstrictly increasing for r > 1, and has the value 0 when x = 0. Thus the triangleinequality fails. We can, however, describe convergence in LP, 0 0, 0<p<l,which is proved by considering (a + x)P - aP - xP. It follows that

f If +gIPdµ<_ fn IfvPdp+ fn IgIPdP, f,geL", (2)

and therefore d(f, g) = In I f - g I P dp defines a pseudometric on M. In factthe pseudometric is complete (every Cauchy sequence converges); for Eq. (2)implies that if f, g e L", then f + ge LP, so that the proof of 2.4.11 goes through.

If fl is an interval of reals, is the class of Borel sets of Q, and p isLebesgue measure, the space L"(0, .F, µ) will be denoted by Lp(c2). Thus, forexample, L°[a, b] is the set of all complex-valued Borel measurable functionsf on [a, b] such that

6lifil = f lf(x)IP dx < co.

If f is a complex valued Bore] measurable function on (S2, p) andfl, f2, ... e LP(Q, F, p), we say that the sequence f f,) converges to f in LP ifII f, - f II p - 0, that is, if in I f" - f I P dµ -- 0 as n -+ oD. We use the notationf" i 4 f. In Section 2.5, we shall compare various types of convergence ofsequences of measurable functions. We show now that any f e LP is an LP-limitof simple functions.

2.4.13 Theorem. Let f e LP, 0 0, there is a simple functiong e LP such that Ilf - gilp < a; g can be chosen to be finite-valued and tosatisfy I g 1 5 I f 1. Thus the finite-valued simple functions are dense in R.

PROOF. This follows from 1.5.5(b) and 1.6.10. 1

If we specialize to functions on R" and Lebesgue-Stieltjes measures, wemay obtain another basic approximation theorem.

2.4.14 Theorem. Let f e LP(S2, F, p), 0 0, there is a continuousfunction g e LP(S2, F, µ) such that II f - gll p < s; furthermore, g can be chosenso that sup I g 1 S sup If 1. Thus the continuous functions are dense in Y.

2.4 LP SPACES 89

PROOF. By 2.4.13, it suffices to show that an indicator IA in L" can be approxi-mated in the LP sense by a continuous function with absolute value at most 1.Now IA c L" means that u(A) < oo; hence by Problem 12, Section 1.4, there isa closed set C e A and an open set V A such that p(V - C) < e"2-P.Let g be a continuous map of f) into [0, 11 with g = 1 on C and g = 0 on V`(g exists by Urysohn's lemma). Then

fn IIA -gIPdu= f{IA#g} IIA - gl"du.

But {IA $ g} e V- C and IIA - g I _< 2; hence

IIIA-gIIpS2Pg(V- C) <eP.

Since g=g-IA+IA, we have geL".

Theorem 2.4.14 shows that the continuous functions in V do not form aclosed subset, for if they did, every function in LP would be continuous.Equivalently, the continuous functions in LP are not complete, in other words,there are Cauchy sequences of continuous functions in L" that do not con-verge in LP to a continuous limit. Explicit examples may be given withoutmaking use of 2.4.14; see Problem 2.

2.4.15 The Space L. If we wish to define LP spaces for p = co, we mustproceed differently. We define the essential supremum of the real-valuedBore] measurable function g on (S2, .F, u) as

esssup g=inf{cER:p{(O:g(cu)>c}=0)

that is, the smallest number c such that g < c a.e. [p].If f is a complex-valued Borel measurable function on (0, F, p), we define

IIfII =esssup If1.The space L°°(Q, F, u) is the collection of all f such that (If IIA < oo. Thusf e L°° iff f is essentially bounded, that is, bounded outside a set of measure 0.

Now I f + g j < I f I + g -< If II. + (I g l(,° a.e.; hence

Ilf+A. <- 11f 11. + Ilgll.

In particular, f, g e L°° implies f + g e U. The other properties of a seminormare easily checked. Thus Loo is a vector space over the complex field, II 11 ,0 is aseminorm on L°', and becomes a norm if we pass to equivalence classes asbefore.

If f, f,, f2 , ... a L°° and 1,f, -f II,° -> 0, we write

f II,° -+ 0 if there is a set A e F with p(A) = 0 such that f -+funiformly on A`.

For, assume 11f. - f II - 0. Given a positive integer m, Il f" -f II < 1/mfor sufficiently large n; hence I f"(w) - f(w)I < 1/m for almost every ('0, sayfor w 0 Am, where u(Am) = 0. If A = Um=, Am, then p(A) = 0 and f" -+funiformly on A`. Conversely, assume p(A) = 0 and f" -+f uniformly on A`.Given e > 0, 1 f,, - f I < s on A` for sufficiently large n, so that I f" - f 1 5 Ea.e. Thus ll f" - f Ii - E for large enough n, and the result follows.

An identical argument shows that f f.) is a Cauchy sequence in L°(llf"-fmll.-+0asn,m->oo)iffthereisasetAe.Fwithp(A)=0andf"-fm - 0 uniformly on A`.

It is immediate that the Holder inequality still holds when p = 1, q = oo,and we have shown above that the Minkowski inequality holds when p = oo.

To show that L°° is complete, let {f.) be a Cauchy sequence in L°°, and letA be a set of measure 0 such that f"(w) - fm(w) - 0 uniformly for w e A`. Butthen "(w) converges to a limit f (w) for each co E A`, and the convergence isuniform on A`. If we define f (co) = 0 for w E A, we have f e L°° and

Theorem 2.4.13 holds also when p = oo. For if f is a function in L°°, thestandard approximating sequence {f,,) of simple functions (see 1.5.5) con-verges to f uniformly, outside a set of measure 0. However, Theorem 2.4.14fails when p = oo (see Problem 12).

If C is an arbitrary set, F consists of all subsets of 11, and p is countingmeasure, then L°°(fl, ,F, p) is the set of all bounded complex-valued functionsf = (f (a), a e fl), denoted by The essential supremum is simply thesupremum; in other words, Ilf II = sup( I f (a) I : a e fl). If f) is the set ofpositive integers, l°°(Q) is the space of bounded sequences of complex numbers,denoted simply by 1°°.

Problems

1. (a) If f = {a,,, n = 1, 2, ...}, the a are real or complex numbers, and pis counting measure on subsets of the positive integers, show thatla f dp = Y', a", where the sum is interpreted as in 2.4.12.

(b) I f f = (f (a), a e n) is a real- or complex-valued function on thearbitrary set 0, and u is counting measure on subsets of 0, showthat lo f du = Ea f (a), where the sum is interpreted as in 2.4.12.

2. Give an example of functions f, fl, f2, ... from R to [0, 1] such that(a) each f,, is continuous on R,(b) ,,(x) converges to f (x) for all x, f I f (x) I p dx - 0 for

every p e (0, oo), and(c) f is discontinuous at some point of R.

3. For each n = 1, 2, ..., let {a(,"), a2"), ..} be a sequence of complexnumbers.

2.4 L° SPACES 91

(a) If the a(n) are real and 0 5 ak") < ak"+1) for all k and n, show that

00 00

lim E a(n) lim ak").n-g00 k=1 k=1n-'oo

Show that the same conclusion holds if the ak") are complex andI ak"> I < bk for all k and n, where _k= 1 b, < oo.

(b) If the ak") are real and nonnegative, show that

ak")= Y, Y_ akn''k=1 n=1 n=1 k=1

(c) If the a(k") are complex and Y- 1 Yk 1 I ak" I < oo, show that00['n

1 Yk i a(kn) and Yk 1 E 1ak ) both converge to the same

finite number.4. Show that there is equality in the Holder inequality if I f I ° and I g I °

are linearly dependent, that is, iff A I f I ° = B I g I a a.e. for some constantsA and B, not both 0.

5. If f is a complex-valued p-integrable function, show that I f of d.I =fn I f I dp iff arg f is a.e. constant on {w: f (w) # 0}.

6. Show that equality holds in the Cauchy-Schwarz inequality iff f and gare linearly dependent.

7. (a) If 1 < p < oo, show that equality holds in the Minkowski inequalityif Af = Bg a.e. for some nonnegative constants A and B, not both0.

(b) What are the conditions for equality if p = I ?8. If 1 < r < s < oo, and f e LS(S2, .F, u), u finite, show that Ilf 11, < kJI f II,

for some finite positive constant k. Thus LS e L" and LS convergenceimplies L' convergence. (We may take k = 1 if u is a probability measure.)Note that finiteness of u is essential here; if p is Lebesgue measure onR(R) and f (x) = I /x for x 1, f (x) = 0 for x < 1, then f e LZ butf0L'.

9. If It is finite, show that Ilf IIp -i Ilf III as p -* oo. Give an example toshow that this fails if 14(S)) = oo.

10. (Radon-Nikodym theorem, complex case) If p is a or-finite (nonnegative,real) measure, A a complex measure on (S2, and A < p, show thatthere is a complex-valued u-integrable function g such that 2(A) _JA g dp for all A e F. If h is another such function, g = h a.e.

Show also that the Lebesgue decomposition theorem holds if A..is a complex measure and u is a a-finite measure. (See Problem 6,Section 2.2, for properties of complex measures.)

11. (a) Let f be a complex-valued p-integrable function, where u is anonnegative real measure. If S is a closed set of complex numbers

and [l /µ(E)] f E f dµ e S for all measurable sets E such that µ(E) > 0,show that f (w) e S for almost every a). [If D is a closed disk withcenter at z and radius r, and D e S`, take E = f -'(D). Show thatI J E (f - z) dp I < rp(E), and conclude that µ(E) = 0.]

(b) If 2 is a complex measure, then A < I A I by definition of 121 ;hence by the Radon-Nikodym theorem, there is a I A I -integrablecomplex-valued function h such that A(E)= 5 E h d I A I for allE e F. Show that I h I= 1 a.e. [ I A I ]. [Let A, = {co: I h(w) I< r},0 < r < 1, and use the definition of 12I to show that I h I >_ 1 a.e.Use part (a) to show I h I < I a.e.]

(c) Let p be a nonnegative real measure, g a complex-valued p-inte-grable function, and 2(E) = fE g dµ, E e F. If h = d.1/dl A I as inpart (b), show that 12 I (E) = f E lig dµ. (Intuitively, Jig dp =h d l - hh dI l I = I h 12 dill = dI l I . Formally, show that f a fhdI ).I = f a fg dp if f is a bounded, complex-valued, Borel measur-able function, and set f = RE.)

(d) Under the hypothesis of (c), show that

IAI(E)=J IgI dµ for all Ee..

12. Give an example of a bounded real-valued function f on R such thatthere is no sequence of continuous functions f such that if -f IL -> 0.Thus the continuous functions are not dense in L°° (R).

2.5 Convergence of Sequences of Measurable Functions

In the previous section we introduced the notion of LP convergence; weare also familiar with convergence almost everywhere. We now consider othertypes of convergence and make comparisons.

Let f, f f2, ... be complex-valued Bore] measurable functions on(S2, .F, µ). We say that f -> f in measure (or in µ-measure if we wish toemphasize the dependence on p) if for every e > 0, p(w: I f,(w) - f (w) I >- e}

- 0 as n -+ oo. (Notation: When µ is a probability measure, the con-vergence is called convergence in probability.

The first result shows that LP convergence is stronger than convergence inmeasure.

2.5.1 Theorem. If f, f1, f2, ... e LP (0 < p < oo), then f f implies f f.

PROOF. Apply Chebyshev's inequality (2.4.9) to If, - f I. 1

2.5 CONVERGENCE OF SEQUENCES OF MEASURABLE FUNCTIONS 93

The same argument shows that if is a Cauchy sequence in L°, then{ is Cauchy in measure, that is, given e > 0, p{w: I fm(m) I >_ E) - 0as

If f, f,, f2 .... are complex-valued Borel measurable functions on (Q, .9, p),we say that f - f almost uniformly if, given e > 0, there is a set A e F suchthat p(A) < E and f -> f uniformly on A`.

Almost uniform convergence is stronger than both a.e. convergence andconvergence in measure, as we now prove.

2.5.2 Theorem. If f, - f almost uniformly, then f -f in measure and almosteverywhere.

PROOF. If s > 0, let f, -+f uniformly on Ac, with p(A) < e. If 6 > 0, theneventually If, -f I< bon A`, so fl f, f -f I > S} c A. Thus p{ I f - f > S} <p(A) < E, proving convergence in measure.

To prove almost everywhere convergence, choose, for each positive integerk, a set Ak with p(Ak) < 1/k and f --+f uniformly on Ak`. If B = Uk 1

Akc,

then f -+f on B and p(B`) = p(f lk 1 A,):!:-:: p(Ak) - 0 as k -+ oo. Thusp(Bc) = 0 and the result follows. I

The converse to 2.5.2 does not hold in general, as we shall see in 2.5.6(c),but we do have the following result.

2.5.3 Theorem. If {f} is convergent in measure, there is a subsequence con-verging almost uniformly (in particular, a.e. and in measure) to the same limitfunction.

PROOF. First note that { is Cauchy in measure, because if I f - fm I E,

then either If, - f I e/2 or I f - fm I >- e/2. Thus

Pflfn-fmI >E}<p{If.-fI ?2}+µ{If-frI as n,m -> oo.

Now for each positive integer k, choose a positive integer Nk such thatNk+, > Nk for all k and

p{w: If,((o) -f.(w)I > 2-k} <2_k for n, m > Nk.

Pick integers nk Z Nk, k =/1, 2, ...; then if gk =

p{w. I9k(w) - 9k+I/lw)I : 2-k} < 2-k.

Let Ak = { I9k - 9k+1 I >- 2-k}, A = lim sup* Ak. Then p(A) = 0 by 2.2.4; butif co 0 A, then co e Ak for only finitely many k; hence I9k(w) - 9k+, (w) I < 2-k

for large k, and it follows that gk(w) converges to a limit g(w). Since µ(A) = 0we have gk -+ g a.e.

If B,. = Uk At, then µ(Br) < k r µ(A k) < s for large r. If co 0 Br, thengk(w) - gk1(w)I < 2-k, k = r, r + 1, r + 2, .... By the Weierstrass M-test,

9k -+ g uniformly on Br , which proves almost uniform convergence.Now by hypothesis, we have f "+ f for some f, hence --+f. But by 2.5.2,

f = g a.e. (see Problem 1). Thus f, converges almostuniformly to f, completing the proof. I

There is a partial converse to 2.5.2, but before discussing this it will beconvenient to look at a condition equivalent to a.e. convergence:

2.5.4 Lemma. If µ is finite, then f -> f a.e. if for every S > 0,

U {w: Ifk(w) -f(w)I >- S}) 0 as n - oo.k=n

PROOF. Let B, = {w : Ifn(w) - f (w) I >- S}, B,5 = lim sup Bna = f, 1 Uk nBka Now Uk n Bka I Ba ; hence µ(U n Bka) -+ µ(Ba) as n - oo by 1.2.7(b).Now

{w : fn(w) U Baa>o

00

U B1/mM=1

Therefore,

since Ba, c Bat for S1 > S2.

f. -f a.e. if µ(Ba)=0 for all 6>0

if µ UBka -,0G='On

for all S>0. 1

2.5.5 Egoroff's Theorem. If µ is finite and f -+f a.e., then f -+f almosiuniformly. Hence by 2.5.2, if µ is finite, then almost everywhere convergenceimplies convergence in measure.

PROOF. It follows from 2.5.4 that given c > 0 and a positive integer j, forsufficiently large n = n(j), the set Aj = Uk n(j) {Ifk -f I ? 1/f) has measureless than s/2j. If A = U; 1

A, then µ(A) < YT I µ(A j) < s. Also, if S > 0and j is chosen so that 1/j < S, we have, for any k >- n(j) and w e A` (hence

Ico 0 Aj), I fk(w) - f (co) I < 1 /j < S. Thus f -+ f uniformly on K.

2.5 CONVERGENCE OF SEQUENCES OF MEASURABLE FUNCTIONS 95

We now give some examples to illustrate the relations between the varioustypes of convergence. In all cases, we assume that F is the class of Borel setsand p is Lebesgue measure.

2.5.6 Examples. (a) Let .0 = [0, 1] and define

f(x) = Le if 0<-x<n,elsewhere.

Then f" -+ 0 a.e., hence in measure by 2.5.5. But for each p E (0, 00],f,, fails toconverge in R. For if p < oo,

IIf"IID= f 1If"(x)IIdx=o n

and

(b) Let SZ = R, and define

1- if 0<x<en,f(x) = n0 elsewhere.

Then f" -+ 0 uniformly on R, so that f.----+ 0. It follows quickly that f" -+ 0a.e. and in measure. But for each p e (0, oo), f" fails to converge in L°,since Ilf"IIp = npe" -+ Co.

(c) Let fl = [0, oo) and define

f"(x) =1 if n<x<n+1,

n

0 elsewhere.

Then f" -o. 0 a.e. and in measure (as well as in L°, 0 < p < co), but does notconverge almost uniformly. For, if f" - 0 uniformly on A and µ(A`) < e, theneventually f" < I on A; hence if A" = [n, n + (1/n)] we have A n UkZ" Ak = 0for sufficiently large n. Therefore, A` Ukz" Ak, and consequentlyµ(A`) z k " p(Ak) = oo, a contradiction.

(d) Let I = [0, 1], and define

fnm(x) =

m - I m1 if

n<x< m=1,...,n, n=1,2,...,

0 elsewhere.

Then Ilfnml!P =I In --+0, so for each p e (0, oo), the sequence fL1,f21,f22,f31,f32 , f33, ... converges to 0 in L° (hence converges in measure by 2.5.1). Butthe sequence does not converge a.e., hence by 2.5.2, does not converge almostuniformly. To see this, observe that for any x 0, the sequence {fnm(x)} hasinfinitely many zeros and infinitely many ones. Thus the set on which fnmconverges has measure 0. Also, fnm does not converge in L', for if fnm f,then fnm e-, f, hence f = 0 a.e. (see Problem 1). But Ilfnmll, = 1, a contradiction.

Problems

1. If fn converges to both f and gin measure, show that f =g a.e.2. Show that a sequence is Cauchy in measure if it is convergent in measure.3. (a) If p is finite, show that U convergence implies L° convergence for

all p e (0, oo).(b) Show that any real-valued function in L°[a, b], - oo < a < b < oo,

0 < p < oo, can be approximated in L° by a polynomial, in fact bya polynomial with rational coefficients.

4. If it is finite, show that { fn} is Cauchy a.e. (for almost every co, { fn(w)} isa Cauchy sequence) if for every S > 0,

U {co: I ff{w) -fk(w)I >- S} 0 as n -, oo.J,k=n

5. (Extension of the dominated convergence theorem) If I fn I < g for alln = 1, 2, ..., where g is Ep-integrable, and f,2->f, show that f is it-integrableand faf,,d;t-'fnfdp.

2.6 Product Measures and Fubini's Theorem

Lebesgue measure on Rn is in a sense the product of n copies of one-dimensional Lebesgue measure, since the volume of an n-dimensional rect-angular box is the product of the lengths of the sides. In this section wedevelop this idea in a general setting. We shall be interested in two construc-tions. First, suppose that (Q3 , F j, A) is a measure space for j = 1, 2, ... , n.We wish to construct a measure on subsets of .0, x C12 x x Qn such thatthe measure of the "rectangle" A, x A2 x x A. [with each A3 E .f';] isp,(A,)p2(A2) µn(A,). The second construction involves compound experi-ments in probability. Suppose that two observations are made, with the firstobservation resulting in a point co, a Q,, the second in a point w2 a C12. Theprobability that the first observation falls into the set A is, say, µ,(A). Further-more, if the first observation is co, the probability that the second observation

2.6 PRODUCT MEASURES AND FUBINI'S THEOREM 97

falls into B is, say, p(w B), where p(w,, ) is a probability measure definedon F2 for each co, a 52,. The probability that the first observation will belongto A and the second will belong to B should be given by

p(A x B) =JA p(wi, B)pi(dwl),

and we would like to construct a probability measure on subsets of f21 x E12such that p(A x B) is given by this formula for each A e.F, and B e . 2 .

[Intuitively, the probability that the first observation will fall near w, isp,(dw,); given that the first observation is w,, the second observation willfall in B with probability p(w B). Thus p(a),, B)p,(dco,) represents the prob-ability of one possible favorable outcome of the experiment. The totalprobability is found by adding the probabilities of favorable outcomes, inother words, by integrating over A. Reasoning of this type may not appearnatural at this point, since we have not yet talked in detail about probabilitytheory. However, it may serve to indicate the motivation behind the theoremsof this section.]

2.6.1 Definitions. Let F be a a-field of subsets of 52;, j = 1, 2, ... , n, andlet fl = S2, X 02 x x fl,,. A measurable rectangle in 0 is a set A =Al x A2 x x A. where A; e F; for each j = 1, 2, ..., n. The smallesta-field containing the measurable rectangles is called the product a-field,written F, X F2 x x If all .F; coincide with a fixed a-field .F, theproduct a-field is denoted by 39'". Note that in spite of the notation, .F, X F2x x F" is not the Cartesian product of the #j; the Cartesian product is

the set of measurable rectangles, while the product a-field is the minimala-field over the measurable rectangles. Note also that the collection of finitedisjoint unions of measurable rectangles forms a field (see Problem 1).

The next theorem is stated in such a way that both constructions describedabove become special cases.

2.6.2 Product Measure Theorem. Let (S2 'I, p,) be a measure space, withp, a-finite on F and let Q2 be a set with a-field F 2. Assume that for eachco, a 0, we are given a measure p(w,, ) on F2. Assume that p(w,, B),besides being a measure in B for each fixed w, a f2,, is Borel measurable inco, for each fixed B e .F2.. Assume that the p(w, ) are uniformly a-finite;that is, S 2 can be written as UM, B", where for some positive (finite) con-stants k" we have p(w B") < k" for all to, e f2,. [The case in which thep(w -) are uniformly bounded, that is, p(w,, f22) < k < oo for all w is ofcourse included.]


Then there is a unique measure p on F = F, x F2 such that

µ(A x B) = fA µ(w1, B)µ1 (dwl) for all A e .F,, B E F2,

namely,

p(F) = f µ(w1, F(wl))pl (dwl), F e .°F,n,

where F(wl) denotes the section of Fat w, :

F(wl) = {w2 a 522: (w,, (02) a F}.

Furthermore, p is or-finite on F; if µl and all the p(w,, ) are probabilitymeasures, so is µ.

PROOF. First assume that the µ(w,, ) are finite.

(1) If C e .F, then C((01) E 2;F2 for each w, c52,.To prove this, let 8 = {C e F: C(w,) E .F2}. Then W is a o-field since

(Jc)(W1) = U Cn(wl), Cc(wl) = (C(wl))c.n1 n=1

If A e -w,, B E then (A x B)(w,) = B if co, e A and 0 if w, 0 A. Thus fcontains all measurable rectangles; hence W = F.

(2) If C e .F, then u(w,, C(w,)) is Borel measurable in to,To prove this, let 9 be the class of sets in .F for which the conclusion of (2)

holds. If C = A x B, A e .°r1, B e 92, then

µ(w1, C(w,)) _u(w,, B) if co, a A,

µ(w1, 0) = 0 if w1 0 A.

Thus µ(w,, C(wt)) = µ(w B)IA(w,), and is Borel measurable by hypothesis.Therefore measurable rectangles belong to W. If C,, ..., C. are disjointmeasurable rectangles,

'U(wl, (Cic)1) = µ(w1, U C1(wl))

_ Y_ µ(wl, Ci(w,))i=1

is a finite sum of Borel measurable functions, and hence is Borel measurablein co, Thus ' contains the field of finite disjoint unions of measurable rect-

angles. But f is a monotone class, for if C. E', n = 1, 2, ..., and C. T C,then C(w,) hence p(w,, C,,(co,)) -- p(w,, C(w,)). Thus F(col, C(co,)),a limit of measurable functions, is measurable in w,. If C, . C, the sameconclusion holds since the p(w,, ) are finite. Thus W = .F.

(3) Define

p(F) = f p(w1, F(wl))pl(dwl), F e IFn

[the integral exists by (2)]. Then p is a measure on .F, and

p(A x B) = f p(w,, B)µ, (dw,) for all Ac-.F-,, B E F2.A

To prove this, let F,, F2, ... be disjoint sets in F. Then

p U F = f p wl, U Fn(wl))µ1(dwl)n1 Sl, n=1

00

Y_ p(w1, Fn((o1))p1(dw1)-ffl,n=1

_ f p(w1, Fn(w1))p1(dw1)n=1 11,

w

n=1

proving that p is a measure. Now

p(A x B) = f p(w1, (A x B)((o1))p1(dw1)

= f p(wl, B)IA(w1)p1(dw1)n,

= fA p(wl, B)p1(dw1)

[see (2))

as desired.Now assume the p(w1, ) uniformly a-finite. Let 522 = Un 1 B, where the

Bn are disjoint sets in Pre and p(w B.)< kn < oo for all co, E 521. If weset

B) =µ(w,, B n Be), B E .F2 ,

the -) are finite, and the above construction gives a measure onF such that

un'(A x B) = f B)u,(dwl), A e. ,, B E F 2A

= f u((ol, B n Bn)ul(dw1),A

namely,

un'(F) = f un'(w,, F(wl))p,(dw) u(w,, F(w,) n Bn)ul(dw,).

Let it = En , pn'; p has the desired properties.For the uniqueness proof, assume the p((o,, ) to be uniformly a-finite.

If A is a measure on F such that A(A x B) = JA p(w,, B)p1tdco,) for allA e B e F2, then A = p on the field F U of finite disjoint unions ofmeasurable rectangles. Now It is a-finite on .moo, for if S22 = Un , B. withB. C_ F2 and p(c) ,, Bn) < k < oo for all co, and fl, = Um=, Am, where theAm belong to .y, and p,(Am) < oo, then S2, X K22 = U.n-, (Am x and

u(Am x Bn) = f (w,, Bn)p1(dw1) < knu,(Am) < oo.Am

Thus A = it on Jr by the Caratheodory extension theorem.We have just seen that it is a-finite on .moo, hence on F. If it, and all the

p(w,, ) are probability measures, it is immediate that it is also.

2.6.3 Corollary : Classical Product Measure Theorem. Let (fl,, F; , p) be ameasure space for j = 1, 2, with pj a-finite on Ft,. If S2 = S2, X S22, .F _.F, X F21 the set function given by

u(F) = f u2(F(w,)) du,(w,) = f u1(F(w2)) du2(w2)n, nZ

is the unique measure on F such that p(A x B) = p,(A)p2(B) for all A E .FB e .F2. Furthermore, p is a-finite on F, and is a probability measure if it,and p2 are. The measure it is called the product of it, and p2, writtenu=u, x P2 -

PROOF. In 2.6.2, take p(w1, ) = p2 for all co, The second formula for p(F)is obtained by interchanging it, and u2 . I

As a special case, Let 0, = f12 = R, .F, = F 2 = M(R), it, = p2 = Lebes-gue measure. Then F, x F 2 = R(R2) (Problem 2), and p = p, x P2 agreeswith Lebesgue measure on intervals (a, b) = (a,, b,] x (a2, b2]. By the Cara-thbodory extension theorem, It is Lebesgue measure on R(R2), so we haveanother method of constructing two-dimensional Lebesgue measure. We shallgeneralize to n dimensions later in the section.


The integration theory we have developed thus far includes the notion ofa multiple integral on R"; this is simply an integral with respect to n-dimen-sional Lebesgue measure. However, in calculus, integrals of this type areevaluated by computing iterated integrals. The general theorem which justifiesthis process is Fubini's theorem, which is a direct consequence of the productmeasure theorem.

2.6.4 Fubini's Theorem. Assume the hypothesis of the product measuretheorem 2.6.2. Let f: (S2, .F) -+ (R, R(R)).

(a) If f is nonnegative, then Jn2 f(w,, w2)p(w1, dw2) exists and defines aBorel measurable function of wl. Also

fa dp = f ` f f(a) I, w2)µ(w1, dwi))p1(dwi).n n, n2

(b) If So f dp exists (respectively, is finite), then in, f (w1, (02)µ(wl, d(02)exists (respectively, is finite) for µl-almost every w1, and defines a Borelmeasurable function of (o, if it is taken as 0 (or as any Borel measurablefunction of w,) on the exceptional set. Also,

(wl, dw2))il(dwl)fnsz

f dp = f, (L2 f[The notation In, f (w1, w2)p(w1, dw2) indicates that for a fixed w1, the func-tion given by g(w2) = f (wl, (02) is to be integrated with respect to the measureµ(w1,

PROOF. (a) First note that:

(1) For each fixed wl we have f (w ): (02 , .F2) -- (R, E? (R)). In otherwords if f is jointly measurable, that is, measurable relative to the producta-field .7, x 2 , it is measurable in each variable separately. For if B e R(R),{02 : f (w,, w2) a B) = {w2: (W11 (02) of

_1(B)}=f -'(B)(wl) E 2 by part (1)

of the proof of 2.6.2. Thus SO, A(011 w2)p(wl, dw2) exists.Now let IF, FE .F, be an indicator. Then

412 IF(wl, W2)u(wl, d(02) = f IF((,,)((02)µ(wl, dw2) = p(wl, F(wl)),

and this is Borel measurable in w, by part (2) of the proof of 2.6.2. Also

f IF du = p(F) = f p(wl, F(w1))p1(dwl) by 2.6.2n n,

=f f 'F(w1, w2)µ(wl, dw2)µl(dwl)Jn2

Now if f = Yj=, xj!FJ, the Fj disjoint sets in .f, is a nonnegative simplefunction, then

f f (O) 11 (02)p(w1, dw2) _ Y xjp(w1, Fj(w1)),n2 j=1

Borel measurable in w1,and

f f d,u xjf IFj du = xj f f IFf(w1, w2)p(w1, dw2)p1(dw1)f2 j= n j= 1 f2, n2

by what we have proved for indicators

fTl,

f (w1, w2)pwl, dw2)p1(dw1).2

Finally, if f: (0, f) - (R, R(R)), f >_ 0, let 0 <fn T f, f simple. Then

fn f(wl, (02)p(w1, d(02) = "M f- fn(wl, w2)µ(w1, d02),-+n oo 2

which is Borel measurable in wl, and

f f dal = lim f fn dp = lim fn(w1, (02),U(0)1, dw2)N1(dwl)f2 n 00 n n-00 fn1 fn2

by what we have proved for simple functions

= f f f(wl, (02)u(w,, d(02)pl(dwl)fl, n2

using the monotone convergence theorem twice.

This proves (a).

(b) Suppose that fn f - dµ < oo. By (a),

Jf f (wl, w2)p(w1, dw2)µl(dwl) = f dµ < oofn' n2

fn

so that fn2 f (w,, w2)p(w1i dw2) is u,-integrable; hence finite a.e. [µ,]; thus:

(2) For µ,-almost every co, we may write:

Q f(wl, w2)u(w1, dw2) = f f +(wl, w2)µ(w1, dw2)f12 n

- f F(0)1, w2)µ(wl, d(02)-02

If Jn f dµ is finite, both integrals on the right side of (2) are finite a.e. [u,]. Inany event, we may define all integrals in (2) to be 0 (or any other Bore]measurable function of co,) on the exceptional set, and (2) will then be valid

for all w and will define a Bore] measurable function of co,. If we integrate(2) with respect to p we obtain, by (a) and the additivity theorem forintegrals,

J f(wj, w2)u(wl, dw2)ui(dwi) = f f + dk - f f du = f j dun, n2 s: n n

2.6.5 Corollary. If f: (ft, .f) -+ (R, PI(R)) and the iterated integralSO If n2 I P CO 1, (02) 1 u(w, , d(02) P0(0 ,) < then In f du is finite, and thusFubini's theorem applies.

PROOF. By 2.6.4(a), If, If I dg < oo, and thus the hypothesis of 2.6.4(b) issatisfied. I

As a special case, we obtain the following classical result.

2.6.6 Classical Fubini Theorem. Let it = S2, x 02, .F = .°F, x F, u =p, x p.2, where uj is a a-finite measure on .rr;, f = 1, 2. If f is a Borelmeasurable function on (Cl, F) such that fnf du exists, then

f f du = f fn f due dptn n,

= f f f du, dµ2 by symmetry.n2 n,

PROOF. Apply 2.6.4 with u(wl, -) = 112 for all wt. I

Note that by 2.6.5, if 10, In, If I dµ2 du, < oo (or In. In. If I du t due < 00),the iterated integration formula 2.6.6 holds.

In 2.6.4(b), if we wish to define fn2 f((o w2)u(w,, dw2) in a completelyarbitrary fashion on the exceptional set where the integral does not exist,and still produce a Bore] measurable function of to,, we should assume that(Q F,, u,) is a complete measure space. The situation is as follows. We haveh: (0 .F,) -- (R, R(R)), where h is the above integral, taken as 0 on theexceptional set A. We set g(w,) = h((D,), w, # A; g(w,) = q(w,) arbitrary,w, e A (q not necessarily Borel measurable). If B is a Borel subset of R, then

{w, : g((o,) e B} = [A` n {w,: h(w,) e B)] t [A n {w,: q(co,) e B}].

The first set of the union belongs to .F,, and the second is a subset of A,with u,(A) = 0, and hence belongs to .F, by completeness. Thus g is Bore!measurable.


In the classical Fubini theorem, if we want to define jfl f (w1, w2) dµ2((02)and in, P(1)1 I w2) dµ1((o,) in a completely arbitrary fashion on the exceptionalsets, we should assume completeness of both spaces ((21, F, p,) andAl 6"2' µ2)

The product measure theorem and Fubini's theorem may be extended ton factors, as follows:

2.6.7 Theorem. Let .F; be a a-field of subsets of 52,, j = 1, ... , n. Let µ, bea a-finite measure on F1, and, for each (co, ..., w;) c-01 x x52; , letµ(w,, ... , cop B), B e Fjt1i be a measure on .F;+t (j = 1, 2, .. ., n - 1).Assume the µ(w,, ... , (o;, ) to be uniformly a-finite, and assume thatµ(w ... , wJ, C) is measurable: (521 x x S2; , F, x x .FJ) - (R, R(R))for each fixed C e .F;+,.

Let 52 = )1 x x Stn , .F _ .F, x x .F..

(a) There is a unique measure µ on .°F such that for each measurablerectangle A, x x An e .I",

µ(A 1 x x An)

= f µt(dwl) f µ(w1, dw2)Al Az

...f-1 *0 ... , wn-2 , dwn- I) f µ(w1, ... , wn- 1, do),).

A

[Note that the last factor on the right is µ(w1, ..., co,,-,An).] The measure µis a-finite on .F, and is a probability measure if µ1 and all the µ(w1, ..., w;, )are probability measures.

(b) Let f: (52, .`F) -* (R, R(R)). If f > 0, then

f f dµ = fStµl(dwl) f µ(w1, d(02) ... lfl(wt, ... , on-2 , dwn- 1)n in, nz n

w1, ... , wn)µ(wt, ... , wn- t, dwn), (1)fn .

where, after the integration with respect to µ(w1, ... , w;, ) is performed(j = n - 1, n - 2, ... , 1), the result is a Borel measurable function of(w , w;).

If 1, f dµ exists (respectively, is finite), then Eq. (1) holds in the sense thatfor each j = n - 1, n - 2, ... , 1, the integral with respect to µ((o 1, ... , wj, )exists (respectively, is finite) except for ((o ... , wi) in a set of 0,where Aj is the measure determined [see (a)) by µ, and the measuresu(w1, ), ..., µ(w1, ..., ('0j_ I, ). If the integral is defined on the exceptional

set as 0 [or any Borel measurable function on the space (S2, x ... x f1;,Jr, x x ,Fj)j, it becomes Borel measurable in (wl, ... co).

PROOF. By 2.6.2 and 2.6.4, the result holds for n = 2. Assuming that (a) and(b) hold up to n - I factors, we consider the n-dimensional case. By theinduction hypothesis, there is a unique measure An-, on F, x 12 x . X

such that for all A, E.F,, ..., E

dw2))..-,(A, x ... x f µl(dwl) fA2A, A2

... J µ(w,, ... , (On-2 , dwn- 1)

and An_, is a-finite. By the n = 2 case, there is a unique measure p on(.F, x x x F (which equals F, x x Fn; see Problem 3) suchthat for each A x

x I p(wl, ..., (on-1, An) dAn-1(w,, ..., wn-1)A

IA(w1, ... wn-1, An)S2, x ... x S2 _ ,

dAn-1(w1, ... , wn-1). (2)

If A is a measurable rectangle A, x x then IA(wl, ...) (On-1) =IA,(w,) thus (2) becomes, with the aid of the inductionhypothesis on (b),

,u(A, x ... x An) = f µl(dwl)A,

µ(w ..., wn-1+ An)p(w1, ..., wn-2, dwn-1)

which proves the existence of the desired measure p on F. To show that µ isa-finite on .F o, the field of finite disjoint unions of measurable rectangles,and consequently p is unique, let f2, = U , AJr, j = 1, ..., n, whereµ(w ..., w;_1, Air) < k;r < co for all co,, ..., wj_1, j = 2, ..., n, andP,(A,r) = krr < co. Then

co

fl = U (A,,, x A2i2 x ... X Ani),

with

p(A,i, x A2,2 x ... x k,i,k2i2 ... kni < co.

This proves (a).


To prove (b), note that the measure p constructed in (a) is determined byAn-1 and the measures µ(W1, ... , on_1, ). Thus by the n = 2 case.

J fdp = fJf (W 1, ..., Wn)µ(W1, ..., Wn-1, dWn)

W,1)

where the inner integral is Borel measurable in (w1, ..., Wn-I), or becomes soafter adjustment on a set of fin-I -measure 0. The desired result now followsby the induction hypothesis.

2.6.8 Comments. (a) If we take f = IF in formula (1) of 2.6.7(b), we obtainan explicit formula for p(F), F e °F, namely,

p(F) = Jf1µ1(d01)J

µ(0I, d(02) .J

i.!(01, ... , on-2, dwn- 1), s12

f1IF(W 1, ... , 0n)µ(01, ... , on - 1, don)

(b) We obtain the classical product measure and Fubini theorems bytaking µ(W1, ..., cop ) = p;+1, J = 1, 2, ..., n - 1 (with pj+l a-finite). Weobtain a unique measure p on .F such that on measurable rectangles,

µ(A1 x ... X A") = 1z1(A1)µ2(A2) ... µ"(A")

If f: (S2, F) - (R, V(R)) and f - 0 or In f dµ exists, then

f f dp = $d,J

dµ2 ...J f dun

n a, nz nn

and by symmetry, the integration may be performed in any order. Themeasure p is called the product of p1. ..., p", written p = p, x x p,,.In particular, if each p; is Lebesgue measure on R(R), then p, x x pn isLebesgue measure on R(R"), just as in the discussion after 2.6.3.

Problems

1. Show that the collection of finite disjoint unions of measurable rectanglesin S21 x x Stn forms a field.

2. Show that -V(R") = R(R) x x 4(R) (n times).3. If.gf1, ... , F,, are arbitrary a-fields, show that

(F I x ... X ,F"-1) x -n = F1 x'2 X ... X F'n.

4. Let u be the product of the a-finite measures ut and 112 . If C E 'I X .F2,show that the following are equivalent:(a) u(C) = 0,(b) u2(C(w1)) = 0 for u,-almost all co, a fl,,(c) u1(C(w2)) =0 for 102-almost all w2 "12-

5. In Problem 4, let (Q', be the completion of (f2, .F, u), andassume It,, P2 complete. If B E .y', show that B(w1) a .F2 for p,-almostall co, e f2, [and B(w2) e .F, for u2-almost all w2 a (12). Give an examplein which B((o,) 0 F2 for some w1 E Q.

6. (a) Let f2, = f22 = the set of positive integers, .F, = .''2 = all subsets,u1 = 92 = counting measure, f (n, n) = n, f(n, n + 1) = -n, n = 1,2, ..., f(i,j)=0 ifj#i or i+1. Show that fn, fn2f due du,=0,f n2 f fl, f du, dµ2 = oo. (Fubini's theorem fails since the integral offwith respect to p, x P2 does not exist.)

(b) Let f2, = 112 = R, .F, _ 2 = R(R), u1 = Lebesgue measure,92 = counting measure. Let A = {(w1, (02): w, = (02} E X F' 2 .Show that

J J'A due edit1 = f u2(A(w1)) dui(w1) = oo,

S1, S12 11i

but

fn2 fn 'A du1 due = f u1(A(w2)) du2(w2) = 0., n2

[Fubini's theorem fails since u2 is not a-finite; the product measuretheorem fails also since in, u2(F(w,)) dp,(w,) and

fn2 p1(F(w2)) du2(w2) do not agree on F, x .F2.]

7.f Let n, = f22 = the first uncountable ordinal, .F, = .'2 = all subsets,f2 = f2, x'12 , .F = .;z, x F2 . Assume the continuum hypothesis,which identifies f2, and f22 with [0, 1].(a) If f is any function from f2, (or from a subset of f21) to [0, 1] and

G = {(x, y): x c- Q1, y = f (x)} is the graph of f, show that G e .F.(b) Let C, = {(x, y) a f2: y < x}, C2 = {(x, y) E f2: y > x). If B a Cl or

B - C2, show that B e .F. (The relation y < x refers to the orderingof y and x as ordinals, not as real numbers.)

(c) Show that ,F consists of all subsets of 0.8. Show that a measurable function of one variable is jointly measurable.

Specifically, if g: (Q .F,) -+ (f2', F') and we define f: 1, x 122 - f2' byf (w1, (02) = g(w,), then f is measurable relative to Pr, x .qf2 and .F',regardless of the nature of F2 .

t Rao, B. V., Bull. Amer. Math. Soc. 75, 614 (1969).


9. Give an example of a function f: [0. 1] x [0, 1] ->' [0, 1] such that(a) f (x, y) is Bore) measurable in y for each fixed x and Borel measurable

in x for each fixed y,(b) f is not jointly measurable, that is, f is not measurable relative to the

product a-field -4[0, 1] x 9[0, 1], and(c) f u (J f (x, y) dy) dx and f o (jo f (x, y) dx) dy exist but are unequal.

(One example is suggested by Problem 7.)

2.7 Measures on Infinite Product Spaces

The n-dimensional product measure theorem formalizes the notion of ann-stage random experiment, where the probability of an event associated withthe nth stage depends on the result of the first n - I trials. It will be convenientlater to have a single probability space which is adequate to handle n-stageexperiments for n arbitrarily large (not fixed in advance). Such a space can beconstructed if the product measure theorem can be extended to infinitelymany dimensions. Our first task is to construct the product of infinitely manya-fields.

2.7.1 Definitions. For each j = 1, 2, ..., let (Q;, F) be a measurable space.Let f = 1100, 12;, the set of all sequences (w,, (02 , ...) such that w; a 0j,j=1,2,.... If B"crlj_,Qj,we define

B"={wED:(w,,...,cw")eB"}.

The set B. is called the cylinder with base B"; the cylinder is said to bemeasurable if B" a fl F F. If B" = A, x x A,,, where A; c f1i for each i,B" is called a rectangle, a measurable rectangle if A i e .°F, for each i.

A cylinder with an n-dimensional base may always be regarded as havinga higher dimensional base. For example, if

B={wE(I:(w,,w2,w3)EB3),then

B = {w6f):(w1,w2,w3)EB3, (04Ei24)= {w E fl: (w1 . 0)2, w3 , (04) a B3 X f14).

It follows that the measurable cylinders form a field. It is also true that finitedisjoint unions of measurable rectangles form a field; the argument is thesame as in Problem 1 of Section 2.6.

The minimal a-field over the measurable cylinders is called the product ofthe a-fields .F;, written flj 1 F ; rl , F; is also the minimal a-field over

2.7 MEASURES ON INFINITE PRODUCT SPACES 109

the measurable rectangles (see Problem 1). If all .F, coincide with a fixeda-field F, then H, .`, is denoted by F', and if all S2, coincide with a fixedset S, f 1 S2, is denoted by Sr'.

The infinite-dimensional version of the product measure theorem will beused only for probability measures, and is therefore stated in that context.(In fact the construction to be described below runs into trouble for non-probability measures.)

2.7.2 Theorem. Let (S2, , .F), j = 1, 2, ..., be arbitrary measurable spaces;let i2 = f

1S2,' .F = fly 1 ., . Suppose that we are given an arbitrary prob-

ability measure P, on .F,, and for each j = 1, 2, ... and each (w...... co)e 92, x x S2, we are given a probability measure P(w,, ..., w, , ) on.°F,+,. Assume that P(w1, ..., w,, C) is measurable: Gl_1 52,, f_1-F,)-+(R, R(R)) for each fixed C e .v,+

If B" e j % F; , define

P"(B") = fn,PI(dwi)

Jn2P(wi, do),.) ... N(01' ... , (0"-2 , dui"-1)

J w")P(w1, ..., w"-1, dw").an

Note that P,, is a probability measure on f;=1 ,F, by 2.6.7 and 2.6.8(a).There is a unique probability measure P on .F such that for all n, P agrees

with P on n-dimensional cylinders, that is, P{w e S2: (w1, ..., w") e B") _P"(B") for all n=1,2,...and all B"efj=1,F;.

PROOF. Any measurable cylinder can be represented in the form B. ={w c -0 : w,;) a B"} fot some n and some B" a f;=1.F,; define P(B") =P"(B"). We must show that P is well-defined on measurable cylinders. For sup-pose that B can also be expressed as {co e Q: (w,, ..., (9 ) e C'j where Cm e

j=1 .F,; we must show that P"(B") = Pm(Cm). Say m < n; then (co,, ... , co,,,)E C' iff (w 1, ... , co,,) e B", hence B" = C' x S2,"+, x x K2,,. It follows fromthe definition of P,, that P(Cm). (The fact that the P((O ..., w, , )are probability measures is used here.)

Since P. is a measure on fl ., F,, it is immediate that P is finitely addi-tive on the field .Fo of measurable cylinders. If we can show that P is con-tinuous from above at the empty set, 1.2.8(b) implies that P is countablyadditive on .F0, and the Caratheodory extension theorem extends P to aprobability measure on fl. t, F,; by construction, P agrees with P,, onn-dimensional cylinders.

Let (B., n = n, , n2, ... } be a sequence of measurable cylinders decreasingto 0 (we may assume n, < n2 < , and in fact nothing is lost if we taken; = i for all i). Assume lim"-,, P(B") > 0. Then for each n > 1,

P(B") = fn 9n1)(w1)P1(dwl),,

where

w"-1, dwn)9(1)(w1) = f P(w1, dw2) ...fan

4nx an

Since B"+1 = B", it follows that Brt+1 = B" x hence

IB.,+,(w,, ..., wn+1) IB.,(w1, ..., w").

Therefore gn1 (co,) decreases as n increases (co, fixed); say g,(,1) (w1)-h,((o1).By the extended monotone convergence theorem (or the dominated conver-gence theorem), P(B") -+ In, h,(w,)P,(dw1). If lim"-,, P(B") > 0, then h1((O1')> 0 for some w,' a 52,. In fact co,' E B', for if not, IB"(w1', (02, ..., Co.) = 0for all n; hence g,(,1)(w,') = 0 for all n, and h,(w1') = 0, a contradiction.

Now for each n > 2,

9n"(wl') =J 9n2)(w2)P((9 ,', dw2),

lZ

where

9(w2) =J

P(w1', w2 , dw3)n2) L3

5 'B-(C4)1" w2, ..., w")P(w,...... wn-,, dww)an

As above, j h2(w2); hence

h2(w2)P(wt', dw2)9(1)(w,') -' L

Since gn1)(w1') -> h1(w1') > 0, we have h2(w2') > 0 for some (02' E 522, and asabove we have (w1', w2') e B2.

The process may be repeated inductively to obtain points co,', w2', ..._such that for each n, (w,', ..., co.') e B. But then (w1', w2', ...) a n"'=1 B.

0, a contradiction. This proves the existence of the desired probabilitymeasure P. If Q is another such probability measure, then P = Q on measur-able cylinders, hence P = Q on .F by the uniqueness part of the Caratheodoryextension theorem. I

2.7 MEASURES ON INFINITE PRODUCT SPACES ill

The classical product measure theorem extends as follows:

2.7.3 Corollary. For each j = 1, 2, ... , let P,) be an arbitrary prob-ability space. Let 0 = 52; , .F _ f 1 ,F j . There is a unique prob-ability measure P on F such that

P{w a 52: co, e A1, ..., co E f j=1P;(A;)

for all n = 1, 2, ... and all A; e F j , j = 1, 2, .... We call P the product ofthe Pj, and write P = f 1 P.

PROOF. In 2.7.2, take P(cvli ..., (oj, B) = Pj+1(B), B e j+1. ThenP (A1 x x f;=1 P;(A j), and thus the probability measure P of 2.7.2has the desired properties. If Q is another such probability measure, thenP = Q on the field of finite disjoint unions of measurable rectangles; henceP = Q on W by the Caratheodory extension theorem. I

Problems

1. Show that f; 1,F j is the minimal a-field over the measurable rectangles.

2. Let ,F = R(R); show that the following sets belong to F°° :(a) {x a R°°: sup,, x < a},(b) {xeR°°:Yn IIxxl <a),(c) {x e R': lim W x exists and is finite},(d) {x a R': lim sup,,.. , x < a},(e) {x a R'°: Y"= 1 xk = 0 for at least one n > 0}.

3. Let F be a c-field of subsets of a set S, and assume F is countablygenerated, that is, there is a sequence of sets A1, A2, ... in F such thatthe smallest a-field containing the A; is 9. Show that 9' is also count-ably generated. In particular, .(R)°° is countably generated; take the A;as intervals with rational endpoints.

4. How many sets are there in R(R)'?5. Define f: R°° -+ R as follows:

the smallest positive integer n

such that x1 + - +x,2: 1,Ax 1, x2 , ) = if such an n exists,

0o if I foralln.

Show that f: (R00, .4(R)°°) -+ (R, 9(R)).


2.8 References

The presentation in Chapters 1 and 2 has been strongly influenced byseveral sources. The first systematic presentation of measure theory appearedin Halmos (1950). Halmos achieves slightly greater generality at the expenseof technical complications by replacing a-fields by a-rings. (A a-ring is aclass of sets closed under differences and countable unions.) However, a-fieldswill be completely adequate for our purposes. The first account-of measuretheory specifically oriented toward probability was given by Loeve (1955).Several useful refinements were made by Royden (1963), Neveu (1965), andRudin (1966). Neveu's book emphasizes probability while Rudin's book isparticularly helpful as a preparation for work in harmonic analysis.

For further properties of finitely additive set functions, and a developmentof integration theory for functions with values in a Banach space, see Dunfordand Schwartz (1958).

3

Introduction to Functional Analysis

3.1 Introduction

An important part of analysis consists of the study of vector spaces en-dowed with an additional structure of some kind. In Chapter 2, for example,we studied the vector space L"(fl, .F, p). If I < p < oo, the seminorm II II,allowed us to talk about such notions as distance, convergence, and complete-ness.

In this chapter, we look at various structures that can be defined on vectorspaces. The most general concept studied is that of a topological vector space,which is a vector space endowed with a topology compatible with the alge-braic operations, that is, the topology makes vector addition and scalar multi-plication continuous. Special cases are Banach and Hilbert spaces. In aBanach space there is a notion of length of a vector, and in a Hilbert space,length is in turn determined by a "dot product" of vectors. Hilbert spacesare a natural generalization of finite-dimensional Euclidean spaces.

We now list the spaces we are going to study. The term "vector space"will always mean vector space over the complex field C; "real vector space"indicates that the scalar field is R; no other fields will be considered.

3.1.1 Definitions. Let L be a vector space. A senrinorin on L is a functionII II from L to the nonnegative reals satisfying

Ilaxll = jai Ilxlj for all aeC, xcL,!x+yll-IlxlI+IYII for all x, y c L.

113

114 3 INTRODUCTION TO FUNCTIONAL ANALYSIS

The first property is called absolute homogeneity, the second subadditivity.Note that absolute homogeneity implies that 11011 = 0. (We use the samesymbol for the zero vector and the zero scalar.) If, in addition, llxll = 0implies that x = 0, the seminorm is called a norm on L and L is said to be anormed linear space.

If II II is a seminorm on L, and d(x, y) = Ilx - Yll, x, y E L, d is a pseudo-metric on L; a metric if 11 II is a norm. A Banach.space is a complete normedlinear space, that is, relative to the metric d induced by the norm, everyCauchy sequence converges.

An inner product on L is a function from L x L to C, denoted by (x, y) -<x, y>, satisfying

<ax + by, z> = a<x, z> + b<y, z> for all a,beC, x,y,zeL,<x, y> _ <y, x> for all x, y e L

<x,x>0 for all xEL,<x,x>=0 if and only if x=0

(the over-bar indicates complex conjugation). A vector space endowed withan inner product is called an inner product space or pre-Hilbert space. If L isan inner product space, Ilxll = (<x, x>)112 defines a norm on L; this is a conse-quence of the Cauchy-Schwarz inequality, to be proved in Section 3.2. If,with this norm, L is complete, L is said to be a Hilbert space. Thus a Hilbertspace is a Banach space whose norm is determined by an inner product.

Finally, a topological vector space is a vector space L with a topology suchthat addition and scalar multiplication are continuous, in other words, themappings

(x, y)-->x+y of L x L into Land

(a, x) -. ax of C x L into Lare continuous, with the product topology on L x L and C x L. In manybooks, the topology is required to be Hausdorfl, but we find it more convenientnot to make this assumption.

A Banach space is a topological vector space with the topology inducedby the metric d(x, y) = 11x - yll. For if xn -+ x and y y, then

+Y )II s Ilx. - xll + IIY

if a,, - a and x,, - x, then

11a. x. - ax it < 11a. x. - a xll + Il a x - axll

= Ia.I Ilx,, - xll + Ia - aI IlxII-0.

3.1 INTRODUCTION 115

The above definitions remain unchanged if L is a real vector space, ex-cept of course that C is replaced by R. Also, we may drop the complex con-jugate in the symmetry requirement for inner product and simply write(x, y> = (y, x> for all x, y E L.

3.1.2 Examples. (a) If (0, .,c, p) is a measure space and I < p:!5; oo, II II vis a seminorm on the vector space L°(S2, F, µ). If we pass to equivalenceclasses by identifying functions that agree a.e. [p], we obtain L°(S2, °F, P), aBanach space (see 2.4). When p = 2, the norm II IIP is determined by an innerproduct

(.f, g> _ Jnf4 dp

Hence L2(S2, .F, p) is a Hilbert space.If .F consists of all subsets of f and p is counting measure, then f = g a.e.

[p] implies f =_ g. Thus it is not necessary to pass to equivalence classes;L°(S2, .F, p) is a Banach space, denoted for simplicity by I°(S2).

By 2.4.12, if I < p < co, then P(S2) consists of all functions f =(f (a), a E S2) from f to C such that f(a) = 0 for all but countably many a,and If IIP = I f(c) ° < oo. When p = 2, the norm on 12(52) is induced bythe inner product

<f, 9> = Y-f(a)9(«)a

When p = oo, the situation is slightly different. The space 1°°(f2) is thecollection of all bounded complex-valued functions on 0, with the sup norm

If II = sup{1 f(x) I: X C -Q).

Similarly, if 0 is a topological space and L is the class of all boundedcontinuous complex-valued functions on S2, then L is a Banach space under thesup norm, for we may verify directly that the sup norm is actually a norm, orequally well we may use the fact that L c Thus we need only checkcompleteness, and this follows because a uniform limit of continuous func-tions is continuous.

(b) Let c be the set of all convergent sequences of complex numbers, andput the sup norm on c; if f = n >_ 1) E c, then

If 11 =sup( n = 1, 2, ...}.

Again, to show that c is a Banach space we need only establish completeness.Let be a Cauchy sequence in c; if ff = k > 1}, then

limk., a,,,, exists from each n since f E c, and bk = ank exists,

uniformly in k, since I ank -a k I Ilfn -fmll - 0 as n, nm -* oo. By thestandard double limit theorem,

lim lim ank = lim lim an,.n-oo k-oo k-co n-oo

In particular, limk-a, bk exists, so if f = {bk, k > 11, then f e c. But Ilfn -f II =supk I ank - by 0 as n - oo since ank --> bk uniformly in k. This provescompleteness.

(c) Let L be the collection of all complex valued functions on S, where Sis an arbitrary set. Put the topology of pointwise convergence on L, so that asequence or net { fn} of functions in L converges to the function f e L if andonly if fn(x) --+f(x) for each x e S. (See the appendix on general topology,Section Al, for properties of nets.) With this topology, L is a topologicalvector space. To show that addition is continuous, observe that if fn -).f andgn -+ g pointwise, then f,, + gn --),f + g pointwise. Similarly if an a C, n = 1,2, ..., a" - a, and J,, f pointwise, then an fn of pointwise.

3.2 Basic Properties of Hilbert Spaces

Hilbert spaces are a natural generalization of finite-dimensional Euclideanspaces in the sense that many of the familiar geometric results in R" carryover. First recall the definition of the inner product (or "dot product ")on R": If x = (x...... x") and y = (y,, ..., yn), then <x, y> = y;=, x; y; .(This becomes Y;_, x y; in the space C" of all n-tuples of complex numbers.)The length of a vector in R" is given by IIxII = (<x, x>)1"2 = , x2)'2, andthe distance between two points of R" is d(x, y) = Ilx - YII In order to showthat d is a metric, the triangle inequality must be established; this in turnfollows from the Cauchy-Schwarz inequality I <x, y> I <_ IIxII IIYII In fact theCauchy-Schwarz inequality holds in any inner product space, as we nowprove:

3.2.1 Cauchy-Schwarz Inequality. If L is an inner product space, and IIxII =(<x, x>)112, x e L, then

I <x, Y> I < Ilx!I IIYI! for all x, y e L.

Equality holds if x and y are linearly dependent.

PROOF. For any a e C,

0<<x+ay,x+ay>_<x+ay,x>+<x+ay,ay>_ <x, x> + a<y, x> + i<x, y> + I a 12<Y, y>.

3.2 BASIC PROPERTIES OF HILBERT SPACES 117

Set a = - <x, y>l (y, y> (if <y, y> = 0, then y = 0 and the result is trivial).Since <y, x> _ <x, y>, we have

0<<x,x>-2 I<x,y>12+ I<x,Y>I2<Y, Y> <Y, Y>

proving the inequality.Since <x + ay, x + ay> = 0 if x + ay = 0, equality holds it x and y are

linearly dependent. I

3.2.2 Corollary. If L is an inner product space and 11.x1! = (<x,x>)"2,

x c- L, then 11 II is a norm on L.

PROOF. It is immediate that Ilxli ? 0, Ilaxll = I a I Ilxll, and Ilxll = 0 iff x = 0.Now

I1x+Y112 = <x+y,X+Y> = ilx112+ IIY112+<x,y>+<Y,x>

= IIx112+ IIY112+2Re<x,y>

<- IIxI12 + 1ly112 + 211x11 IIYII by 3.2.1.

Therefore Ilx + y112 <- (11x11 + IIYIl)2. 1

3.2.3 Corollary. An inner product is (jointly) continuous in both variables,that is, x,, -+ x, y y implies <x, y> can be a net aswell as a sequence).

PROOF.

<X., <x, Y> I = I <x,, , Y. - Y> + <x. - x, Y> I

-< llyn - YII + iix - xli IIYII

But by subadditivity of the norm,

111x.11 - llxii 1 < llx. - x11- 0;

hence 11xii. It follows that <x,,, y.> <x, y>. I

by 3.2.1.

The computation of 3.2.2 establishes the following result, which saysgeometrically that the sum of the squares of the lengths of the diagonals of aparallelogram is twice the sum of the squares of the lengths of the sides:

3.2.4 Parallelogram Law. In an inner product space,

IIx + y112 + IIx - YII2 = 2(11X112 + 11y112).

PROOF.

IIx + YII2 = IIXII2 + IIYII2 + 2 Re<x, y>,

and

IIx-YII2=IIXI12+IIYII2-2Re<x,y>. I

Now suppose that x,, ..., x are mutually perpendicular unit vectors inRk, k > n. If x is an arbitrary vector in Rk, we try to approximate x by a linearcombination Y;=, ajxj. The reader may recall that E;_, ajxj will be closestto x in the sense of Euclidean distance when aj = <x, xj>. This result holds inan arbitrary inner product space.

3.2.5 Definition. Two elements x and y in an inner product space L aresaid to be orthogonal or perpendicular if <x, y> = 0. If B c L, B is said to beorthogonal iff <x, y> = 0 for all x, y E B such that x:0 y; B is orthonormalif it is orthogonal and IIXII = 1 for all x c- B.

The computation of 3.2.2 shows that if x,, x2 , ..., x,, are orthogonal, thePythagorean relation holds: x+`12 = D°_' IIx;112.

3.2.6 Theorem. If {x ..., x,j is an orthonormal set in the inner productspace L, and x c- L,

n

x- Y-ajxj is minimized when a j = <x, xj>, j = 1, ..., n.II j=1 11

PROOF.

IIx a xjII2 = ` x - JY la. x j , x - kI' ak xk

\ n` n

IIX112 ak<X, Xk> - aj<xj, x>k=1 j=1

+ K ajxj, E akxk ).j =1 k=1 /

3.2 BASIC PROPERTIES OF IIILBERT SPACES 119

The last term on the right is >. 1 I a, 12 since the x; are orthonormal. Further-more, -a,<x, x,> - a,<xl , x> + I a;1

2 = _I <x, x,> 12 + I a,; - <x, x,> 12.Thus

0<2 n nx- a,x, 11X112- YI<x,x;>I2+ Yla;-<X,x;>12, (1)

.1=1 .1=1 J=1

so that we can do no better than to take a, = <x, x,>. I

The above computation establishes the following important inequality.

3.2.7 Bessel's Inequality. If B is an arbitrary orthonormal subset of the innerproduct space L and x is an element of L, then

11X112>- I<x,y>12.yeB

In other words, <x, y> = 0 for all but countably many y e B, say y = x1,x...... and

IIX112>_E I<x,x,>12.

Equality holds if El= 1 <x, x,>x, -> x as n -+ oo.

PROOF. If x1, ..., xn E B, set <x, x,> in Eq. (1) of 3.2.6 to obtainIl x - Yi = 1 <x, xl>xi II 2 = 11 X 112 - Li= 1 I <x, X j> 12 > 0. I

We now consider another basic geometric idea, that of projection. If Mis a subspace of Rn and x is any vector in Rn, x can be resolved into a compo-nent in M and a component perpendicular to M. In other words, x = y + zwhere y e M and z is orthogonal to every vector in M. Before generalizing toan arbitrary space, we indicate some terminology.

3.2.8 Definitions. A subspace or linear manifold of a vector space L is a sub-set M of L that is also a vector space; that is, M is closed under addition andscalar multiplication. If L is a topological vector space, M is said to be aclosed subspace of L if M is a subspace and is also a closed set in the topologyof L.

A subset M of the vector space L is said to be convex if for all x, y e M,we have ax + (I - a)y e M for all real a e [0, 1].

The key fact that we need is that if M is a closed convex subset of theHilbert space H and x is an arbitrary point of H, there is a unique point of Mclosest to x.

3.2.9 Theorem. Let M he a nonempty closed convex subset of the Hilbertspace H. If x e H, there is a unique element yo e M such that

IIx - yoll = inf{IIx-Y1l:yEM}.

PROOF. Let d = inf{llx - Y1l : y e M}, and pick points yl, Y21 ... C_ M withIIx - y.11 - d as n ce ; we show that is a Cauchy sequence.

Since Jlu + vlJ 2 + flu - x112 = 211u112 + 211vIl2 for all u, v e H by the parallel-ogram law 3.2.4, we may set u = yR - x, v = ym - x to obtain

IIYn+Ym-2x112+ IIY.-Yn112 =2IIY,,-xII2+211ym-x112

or

IIYn-Ym112=211Y,,-xII2+211Ym-xII2-4111(y,,+Ym)-xII2.

Since z(y,, + y,,) e M by convexity, 11 (y + ym) - xII2 >- d2, and it followsthat IIy,, - Y,,11 0 as n, m -> oo.

By completeness of H, y,, approaches a limit yo, hence IIx -fix - Yo11 . But then JJx - yo J1 = d, and yo e M since M is closed; this finishesthe existence part of the proof.

To prove uniqueness, let yo, zo e M, with IIx - Yoli = IIx - zoll = d. Inthe parallelogram law, take u = yo - x, v = zo - x, to obtain

IIYo+zo-2x112+llyo-z0112=211Yo-x112+211zo-x1f2=4d2.

But IIYo + zo - 2x112 = 411R(Yo + zo) - x112 >- 4d2; hence IIYo - zoll = 0, soYo=zo I

If M is a closed subspace of H, the element yo found in 3.2.9 is called theprojection of x on M. The following result helps to justify this terminology.

3.2.10 Theorem. Let M be a closed subspace of the Hilbert space H, andyo an element of M. Then

lix -- Yo 11 = inf{ IIx - Yil : y e M} iff x - Ye 1 M,

that is, <x - yo, y> = 0 for all y e M.

PROOF. Assume x - yo 1 M. If y e M, then

I1x-Y112 = IIx - Yo - (y-Yo)112

= IIx-Yo112 + IIY-Yo112-2Re<x-Yo,Y-Yo>= IIx-YoII2+ IIY-YoII2 since y - yoeM

IIx - YoII2

Therefore, JJx - Yoll = inf{Ilx - ylj : y e M}.

Conversely, assume Ilx - Yol! = inf{Ilx - yll : y e M}. Let y e M and letc be an arbitrary complex number. Then yo + cy e M since M is a subspace,hence Ilx - Yo - cyll Z llx - Yoll. But

I1x-Yo-cY112 = llx-Yo112+ IcI211Y112-2Re<x-yo,cy>;

hence

IcII11Y112 -2Re<x-Yo,cy>>0.

Take c = b<x - yo, y>, b real. Then <x - yo, cy> = b I <x - yo, y> 12. ThusI <x - Yo' Y> 12[b211Y112 - 2b] > 0. But the expression in square brackets isnegative if b is positive and sufficiently close to 0; hence <x - yo, y> = 0. 1

We may give still another way of characterizing the projection of xon M.

3.2.11 Projection Theorem. Let M be a closed subspace of the Hilbertspace H. If x e H, then x has a unique representation x = y + z wherey e M and z 1 M. Furthermore, y is the projection of x on M.

PROOF. Let yo be the projection of x on M, and take y = yo, z = x - yo.By 3.2.10, z 1 M, proving the existence of the desired representation. Toprove uniqueness, let x = y + z = y' + z' where y, y' e M, z, z' 1 M. Theny - y' e M since M is a subspace, and y - y' 1 M since y - y' = z' - z.Thus y - y' is orthogonal to itself, hence y = y'. But then z = z', provinguniqueness. I

If M is any subset of H, the set M1 = {x e H: x I M} is a closed subspaceby definition of the inner product and 3.2.3. If M is a closed subspace, Mlis called the orthogonal complement of H, and the projection theorem is ex-pressed by saying that H is the orthogonal direct sum of M and Ml, writtenH=M(B M1.

In R", it is possible to construct an orthonormal basis, that is, a set{x1, .... , x"} of n mutually perpendicular unit vectors. Any vector x in R"may then be represented as x <x, x.>x;, so that <x, x.> is the com-ponent of x in the direction of x; . We are now able to generalize this idea toan arbitrary Hilbert space. The following terminology will be used.

3.2.12 Definitions. If B is a subset of the topological vector space L, thespace spanned by B, denoted by S(B), is the smallest closed subspace of Lcontaining all elements of B. If L(B) is the linear manifold generated by B,

that is, L(B) consists of all elements F"., a; xi, a E C, Xt c B, i = 1, ... , n,n = 1, 2, ... , then S(B) = L(B).

If B is a subset of the Hilbert space H, B is said to be an orthonormalbasis for H if B is a maximal orthonormal subset of H, in other words, B isnot a proper subset of any other orthonormal subset of H. An orthonormalset B c H is maximal if S(B) = H, and there are several other conditionsequivalent to this, as we now prove.

3.2.13 Theorem. Let B = {x8, a e 1} be an orthonormal subset of the Hilbertspace H. The following conditions are equivalent:

(a) B is an orthonormal basis.(b) B is a "complete orthonormal set," that is, the only x c- H such that

xlBisx=0.(c) B spans H, that is, S(B) = H.(d) For all x e H, x = & <x, xa)xa. (Let us explain this notation. By

3.2.7, <x, xa) = 0 for all but countably many xa , say for x,, x2 , ... ; theassertion is that Y j=, <x, x;)x; -1, x, and this holds regardless of the order inwhich the xj are listed.)

(e) For all x, y e H, <x, y) <x, xa)<xa, y).(f) For all x e H, JJxJJ 2 = 1]a J <x, xa) 1 2.

Condition (f) [and sometimes (e) as well] is referred to as the Parsevalrelation.

PROOF. (a) implies (b): If x 1 B, x :A 0, let y = x/JlxIl. Then B v {y} is anorthonormal set, contradicting the maximality of B.

(b) implies (c): If x e H, write x = y + z where y e S(B) and z 1 S(B)(see 3.2.11). By (b), z = 0; hence x e S(B).

(c) implies (d): Since S(B) = L(B), given x e H and e > 0 there is a finiteset F c I and complex numbers a8, a e F, such that

x - Y as x,, < e.aeF

By 3.2.6, if G is any finite subset of I such that F c G,

x - E <x, xa)xa 11 -< I x - aY as xa II where as = 0 for a 0 F

- x - y as xQ l < s.aeF

Thus if x,, x2 , ... is any ordering of the points xQ E B for which <x, x8> # 0,lix - Ei=, <x, x;>x; < e for sufficiently large n, as desired.

(d) implies (e): This is immediate from 3.2.3.(e) implies (f): Set x = y in (e).(f) implies (a): Let C be an orthonormal set with B c C, B 0 C. If x e C,

x 1 B, we have IIxI12 = a I <x, x2> 2 = 0 since by orthonormality of C, x isorthogonal to everything in B. This is a contradiction because llxll = 1 forall

3.2.14 Corollary. Let B = {x,,, a e I} be an orthonormal subset of H, notnecessarily a basis.

(a) B is an orthonormal basis for S(B). [Note that S(B) is a closed sub-space of H, hence is itself a Hilbert space with the same inner product.]

(b) If x e H and y is the projection of x on S(B), then

y = <x, x.>xa

[see 3.2.13(d) for the interpretation of the series].

PROOF. (a) Let x e S(B), x 1 B; then x I L(B). If y e S(B), let y,, y2, ...e L(B) with y,, -+ y. Since <x, 0 for all n, we have <x, y> = 0 by 3.2.3.Thus x 1 S(B), so that (x, x> = 0, hence x = 0. The result follows from3.2.13(b).

(b) By part (a) and 3.2.13(d), y = Y,,(y, x >x8 . But x - y 1 S(B) by3.2.11, hence <x, x8> = <y, xa> for all a. I

A standard application of Zorn's lemma shows that every Hilbert spacehas an orthonormal basis; an additional argument shows that any twoorthonormal bases have the same cardinality (see Problem 5). This fact maybe used to classify all possible Hilbert spaces, as follows:

3.2.15 Theorem. Let S be an arbitrary set, and let H be a Hilbert space withan orthonormal basis B having the same cardinality as S. Then there is anisometric isomorphism (a one-to-one-onto, linear, norm-preserving map)between H and 12(S).

PROOF. We may write B = {x., a e S}. If x e H, 3.2.13(d) then gives x =1. <x, xz>xa, where Y I <x, xa> 12 = 11x112 < oo by 3.2.13(f). The mapx - (<x, xe>, a E S) of H into 12(S) is therefore norm-preserving; since itis also linear, it must be one-to-one. To show that the map is onto, consider any

collection of complex numbers as , a e S, with I as 12 < 00. Say as = 0except for a = a,, a, ..., and let x = a a,,,xa,. [The series converges to anelement of H because of the following fact, which occurs often enough to bestated separately: If {y,, Y2, ...} is an orthonormal subset of H, the series>; c, y, converges to some element of H if y; I c1I 2 < oo. To see this, ob-serve that 1117" c; ZIIY;I12 = ;_" c3 2; thus the partialsums form a Cauchy sequence iff yj I c; 12 < co.]

Since the x,, are orthonormal, it follows that <x, xa> = aQ for all a, so thatxmaps onto (a,,,aeS).I

We may also characterize Hilbert spaces that are separable, that is, havea countable dense set.

3.2.16 Theorem. A Hilbert space H is separable if it has a countable ortho-normal basis. If the orthonormal basis has n elements, H is isometricallyisomorphic to C"; if the orthonormal basis is infinite, H is isometrically iso-morphic to 12, that is, 12(S) with S = {1, 2, ...}.

PROOF. Let B be an orthonormal basis for H. Now IIx - Yll2 = 11x112 + Ily112= 2 for all x, y e B, x 5e y, hence the balls A,, = {y: Ily - x11 < j;}, x e B, aredisjoint. If D is dense in H, D must contain a point in each A,, so that if Bis uncountable, D must be also, and therefore H cannot be separable.

Now assume B is a countable set {x,, X2, ...). If U is a nonempty opensubset of H (= S(B) = L(B)], U contains an element of the form Y_;_, a; x;with the a; a C; in fact the a; may be assumed to be rational, in other words,to have rational numbers as real and imaginary parts. Thus

a; x; : n = 1, 2, ... , the a3 rational}D= (jY,,-1

is a countable dense set, so that H is separable. The remaining statements ofthe theorem follow from 3.2.15. 1

A linear norm-preserving map from one Hilbert space to another auto-matically preserves inner products; this is a consequence of the followingproposition:

3.2.17 Polarization Identity. In any inner product space,

4<x, y> = IIx + YII2 - IIx - YII2 + illx + iYI12 - illx - iy112.

PROOF.

IIx+y112=I1Xll2+11y112+2Re<x,y>IIx-y112=IIXII2+I1Y112-2Re<x,y>

Ilx + iy112 = IIxii2 + IIYI12 + 2 Re<x, iy>

IIx - iyl[2 = IIXI12 + 1ly112 - 2 Re<x, iy>

But Re<x, iy> = Re[-i<x, y>] = Im<x, y>, and the result follows.

Problems

1. In the Hilbert space 12(S), show that the elements e,,, a e S, form anorthonormal basis, where

e2(s) = r0, s 96 a,l1, s=a.

2. (a) If A is an arbitrary subset of the Hilbert space H, show thatA1' = S(A).

(b) If M is a linear manifold of H, show that M is dense in H ifMl = {0}.

3. Let x,, ... , x" be elements of a Hilbert space. Show that the xi arelinearly dependent if the Gramian (the determinant of the inner prod-ucts <x;, x;>, i, j = 1, ..., n) is 0.

4. (Grain-Schmidt process) Let B = {x,, x2 , ...} be a countable linearlyindependent subset of the Hilbert space H. Define e, = x,/IIx, II ; havingchosen orthonormal elements e,, ..., en, let yn+, be the projection ofxn+, on the space spanned by e,, ..., en:

n

Y.+, _ Y <xn+,, ei>e,

Definei=,

X'1+1 - Yn+,en+, =

iIxn+,_Y..,,,*

(a) Show that L{e,, ..., en} = L{x ..., xn} for all n, hence xn+, #Yn+, and the process is well defined.

(b) Show that the e" form an orthonormal basis for S(B).

Comments. Consider the space H = L2(-1, 1); if we take xn(t) = t",n = 0, 1, ..., the Gram-Schmidt process yields the Legendre polynomialsen(t) = an d"[(t2 - 1)"]/dt", where a" is chosen so that Ilenll = 1. Similarly,if in L2(- oo, oo) we take xn(t) = t"e-121'2, n = 0, 1, ..., we obtain theHermite polynomials en(t) = an(-1)"e`2 d"(e-`2)/dt".

5. (a) Show that every Hilbert space has an orthonormal basis.(b) Show that any two orthonormal bases have the same cardinality.

6. Let U be an open subset of the complex plane, and let H(U) be the collec-tion of all functions f analytic on U such that

11f1I2= f f If(x+iy)I2dxdy<c.

If we define

U

<f, g> = f f f(x +.iy)g(x + iy) dx dy, f, g e H(U),U

H(U) becomes an inner product space.(a) If K is a compact subset of U and f e H(U), show that

sup{I f(z) I : z e K} < 11f 111,[n_ (lo

where do is the Euclidean distance from K to the complement ofU. Therefore convergence in H(U) implies uniform convergence oncompact subsets of U. (If z e K, the Cauchy integral formulayields

zf(z) = (2n)-` f

n

f(z + re") d8, r < do.0

Integrate this equation with respect to r, 0 < r <_ d < do. Notealso that if U is the entire plane, we may take do = oo, and it followsthat H(C) = {0}.)

(b) Show that H(U) is complete, and hence is a Hilbert space.7. (a) If f is analytic on the unit disk D = {z: I zI < 1) with Taylor

expansion J(z) o a" z", show that

-00sup

fz"If(reie)I2dO=os.<i 2n o "o

It follows that if H2 is the collection of all functions f analytic onD such that

zaN2(f) = sup

I

f If(reie)I2 dO < oo,os.<12n o

then H2, with norm N, is a pre-Hilbert space.(b) If f e H2, show that

f f I f(x+iy)I2dxdy<nN2(f);D

3.3 LINEAR OPERATORS ON NORMED LINEAR SPACES 127

hence H2 c H(D) and convergence in HZ implies convergence inH(D).

(c) If f"(z) = z", n = 0, 1, ..., show that f" - 0 in H(D) but not inH2.

(d) Show that H2 is complete, and hence is a Hilbert space. [By (a),H2 is isometrically isomorphic to a subspace of 12.]

(e) If en(z) = z", n = 0, 1, ..., show that the e" form an orthonormalbasis for H2

(f) If en(z) = [(2n + 2)/27r]'/2z", n = 0, 1, ..., show that the e" form anorthonormal basis for H(D).

8. Let M be a closed convex subset of the Hilbert space H, and yo an ele-ment of M. If x e H, show that

IIx - YoII = inf{jjx - YII : y e M}if

Re<x - yo, y - yo> < 0 for all y c- M.

9. (a) If g is a continuous complex-valued function on [0, 2n1 withg(0) = g(2n), use the Stone-Weierstrass theorem to show that gcan be uniformly approximated by trigonometric polynomialsYk= _n ck e`kt. Conclude that the trigonometric polynomials aredense in L2[0, 2n].

(b) if f e L2[0, 21r], show that the Fourier series Z' an e'"t, an =(1 /2n) Jo" f (t)e-'"t dt, converges to f in L2, that is,

f2"

0J(t)- I akeik'

k= -ndt-s0 as n ->oo.

2

(c) Show that {e`"t/J2n, n = 0, ± 1, ±2, ...} yields an orthonormalbasis for L2[0, 2n].

10. (a) Give an example to show that if M is a nonempty, closed, but notconvex subset of a Hilbert space H, there need not be an elementof minimum norm in M. Thus the convexity hypothesis cannot bedropped from Theorem 3.2.9, even if we restrict ourselves toexistence and forget about uniqueness.

(b) Show that convexity is not necessary in the existence part of 3.2.9if H is finite-dimensional.

3.3 Linear Operators on Normed Linear Spaces

The idea of a linear transformation from one Euclidean space to anotheris familiar. If A is a linear map from R" to R'", then A is completely specifiedby giving its values on a basis e1, ..., en : A(Y-°=1 cr e;) _ °=1 c; A(e;);

furthermore, A is always continuous. If elements of R" and R' are representedby column vectors, A is represented by an m x n matrix. If n = m, so that Ais a linear transformation on R", A is one-to-one if it is onto, and if A-1exists, it is always continuous (as well as linear).

Linear transformations on infinite-dimensional spaces have many featuresnot found on the finite-dimensional case, as we shall see.

In this section, we study mappings A from one normed linear space Lto another.such space M. The mapping A will be a linear operator, that is,A(ax + by) = aA(x) + bA(y) for all x, y e L, a, b e C. We use the symbol

11 11 for the norm on both spaces; no confusion should result. Linear operatorscan of course be defined on arbitrary vector spaces, but in this section, it isalways understood that the domain and range are normed.

Linearity does not imply continuity; to study this idea, we introduce anew concept.

3.3.1 Definitions and Comments. If A is a linear operator, the norm of A isdefined by:

(a) 11A 11 = sup{IlAxll : x e L, Ilxll 5 1}. We may express IIAII in two otherways.

(b) IIAII = sup{IlAxll : x e L, Ilxll = 1}.

(c) IIAII = sup(IlAxll/llxll : x e L, x # 0).

To see this, note that (b) <_ (a) is clear; if x 96 0, then IlAxll/Ilxll = IIA(x/Ilxll)ll,and xlllxll has norm 1; hence (c):5 (b). Finally if Ilxll <_ 1, x:96 0, then IlAxll <-IlAxlllllxll, so (a):5 (c).

It follows from (c) that IlAxll 5 IIAII llxll, and in fact IIAII is the smallestnumber k such that II Ax II 5 kllxll for all x e L.

The linear operator A is said to be bounded if 11 All < oo. Boundedness isoften easy to check, a very fortunate circumstance because we can show thatboundedness is equivalent to continuity.

3.3.2 Theorem. A linear operator A is continuous if it is bounded.

PROOF. If A is bounded and {x"} is a sequence in L converging to 0, thenIIAx"II S IIAII llx"II 0. (We use here the fact that the mapping x--* Nxll of Linto the nonnegative reals is continuous; this follows because I llxll - llYll 15

11x - YII.) Thus A is continuous at 0, and therefore, by linearity, is con-tinuous everywhere. On the other hand, if A is unbounded, we can find ele-ments x" e L with Ilx"II < 1 and IIAx,,ll - oo. Let y,, = x"/IIAx"II ; then y" 0,but II Ay" II = 1 for all n, hence Ay" does not converge to 0. Consequently, Ais discontinuous. I

We are going to show that the set of all bounded linear operators fromL to M is itself a normed linear space, but first we consider some examples:

3.3.3 Examples. (a) Let L = C[a, b], the set of all continuous complex-valued functions on the closed bounded interval [a, b] of reals. Put the supnorm on L:

IIx11 = sup{ I x(t) I : a < t < b}, x e C[a, b].

With this norm, L is a Banach space [see 3.1.2(a)]. Let K = K(s, t) be acontinuous complex-valued function on [a, b] x [a, b], and define a linearoperator on L by

(Ax)(s) =J

K(s, t)x(t) dt, a < s < b.a

(By the dominated convergence theorem, Ax actually belongs to L.) We showthat A is bounded:

II A x 11 = sup f K(s, t)x(t) dtbs

las<b a

< sup lx(t) I sup fb I K(s, t) I d ta<t5b

.:5'

:5b a

bIIxllMax

f IK(s, t) I dt=a5s5b a

(note that fa I K(s, t) I dt is a continuous function of s, by the dominated con-vergence theorem). Thus

bIIAII <_ max f IK(s, t) I dt < oo.a<s5b a

In fact IIAII = max. 5.,5b fa I K(s, t) I dt (see Problem 3).(b) Let L = 1°(S), where S is the set of all integers and I 5 p < oo.

Define a linear operator T on L by (Tf)a =f., 1, n e S; T is called thesided shift or the bilateral shift. It follows from the definition that T is one-to-one onto, and (T -'f )a = fa _,, n e S. Also, 11 Tf 11 = 11.f11 for all f e L; hence

11 T -.111 = II f 11 for all f e L, so that T is an isometric isomorphism of L withitself. ([n particular, 11 T11 = 11 T-'11 = 1.)

If we replace S by the positive integers and define T as above, the resultingoperator is called the one-sided or unilateral shift. The one-sided shift is ontowith norm 1, but is not one-to-one; T(f1, f2, ...)= (f2, f3 . )

The shift operators we have defined are shifts to the left. We may also de-fine shifts to the right; in the two-sided case we take (Af)a = n e S (sothat A = T-'). In the one-sided case, we set (Af) n >_ 2; (Af), =0;

thus A(f1,f2 , ...) = (0,f1,f2 , ...).The operator A is one-to-one but not onto;A(L) is the closed subspace of L consisting of those sequences whose firstcoordinate is 0.

(c) Assume that the Banach space L is the direct sum of the two closedsubspaces M and N, in other words each x e L can be represented in a uniqueway as y + z for some y e M, z e N. (We have already encountered this situa-tion with L a Hilbert space, M a closed subspace, N = M'.) We define a linearoperator P on L by

Px=y.

P is called a projection; specifically, P is the projection of L on M. P has thefollowing properties:

(1) P is idempotent; that is, P2 = P, where P2 is the composition of Pwith itself.

(2) P is continuous.

Property (1) follows from the definition of P; property (2) will be proved later,as a consequence of the closed graph theorem (see after 3.4.16).

Conversely, let P be a continuous idempotent linear operator on L.Define

M={xeL:Px=x}, N={xEL:Px=0}.

Then we can prove that M and N are closed subspaces, L is the direct sum ofM and N, and P is the projection of L on M.

By continuity of P, M and N are closed subspaces. If x e L, then x = Px+ (I - P)x = y + z where y e M, z e N. Since M n N = {0}, L is the directsum of M and N. Furthermore, Px = y by definition of y, so that Pis the projection of L on M.

If f is a linear operator from a vector space L to the scalar field, f is calleda linear functional. (The norm of a scalar b is taken as I b 1.) Considerableinsight is gained about normed linear spaces by studying ways of representingcontinuous linear functionals on such spaces. We give some examples:

3.3.4 Representations of Continuous Linear Functionals. (a) Let f be a con-tinuous linear functional on the Hilbert space H. We show that there is aunique element y e H such that

f(x)=<x,y) for all xeH.

This is one of several results called the Riesz representation theorem.If the desired y exists, it must be unique, for if <x, y) = <x, z> for all

x, then y - z is orthogonal to everything in H (including itself), so y = z.

To prove existence, let N be the null space of f, that is, N ={x e H: f(x) = 0}. If N' = {0}, then N = H by the projection theorem; hence

f = 0, and we may take y = 0. Thus assume we have an element u e N' withu 0. Then u 0 N, and if we define z = u/f (u), we have z e Nl and f (z) = 1.

If x e H and f (x) = a, then

x = (x - az) + az, with x - az E N, az 1 N.

If y = z/Ilzll2, then

<x, y> = <x - az, y> + a<z, y>

= a(z, y> since y I N

= a =.f (x)

as desired.The above argument shows that if f is not identically 0, then N' is one-

dimensional. For if x e N' and f(x) = a, then x - az e N n N', hencex = az. Therefore N' = {az: a e Q.

Notice also that if 11fII is the norm off, considered as a linear operator,then

111'11 = IIyII

For I f(x) I = I<x; y>I <- Ilxll Ilyll for all x, hence 11f 11 < IIYII; but I f(y)I =<y, Y> I = Ilyll 11Y11, so 11f 11 t 11Y11.

Now consider the space H* of all continuous linear functionals on H;H* is a vector space under the usual operations of addition and scalarmultiplication. If to each f E H* we associate the element of H given by theRiesz representation theorem, we obtain a map qi: H* -,. H that is one-to-oneonto, norm-preserving, and conjugate linear; that is,

0(af+bg) =do(f)+b0(g), f,gEH*, a,bEC.

[Note that if f(x) = <x, y> for all x, then af(x) = <x, ay>.j Such a map iscalled a conjugate isometry.

(b) Let f be a continuous linear functional on l (= l°(S), where S isthe set of positive integers), I < p < oo. We show that if q is defined by(1/p) + (1/q) = 1, there is a unique element y = (yl, y2, ...) a 14 such that

Furthermore,

AX) = Y_ xk ykk=1

for all x e !°.

ao 1/q

111 l1=IIY11=(E lYklq)k= 1

To prove this, let en be the sequence in 1P defined by en(j) = 0, j 96 n;en(n) = 1. If x e 1P, then llx - Yk=, xkekliP = Ek n+, IxkIP->0; hencex = J:k 1 xk ek where the series converges in 1P. By continuity off,

f(x) = k xkyk, where yk =f(ek) (1)00k=1

Now write yk in polar form, that is, yk = rk eiek, rk > 0. Let

by (1),

But

n n

Al.) _ rk-1e-£Bkrkei0"= Y_ IA (2)k=1 k=1

If(Zn)I C 11f11 I1Zn11

n 1/P

= IIJ II Y_ IYkI1q-11P)k=1

llfll( IYkI9)1/Pk-1

By Eq. (2),n

1/4

Ek1IYk19)G Ilfll;

hence y e 19 and IIYII <_ If II. To prove that If Il < Ilyll, observe that Eq. (1)is of the form f (x) = J .Q xy du where Q is the positive integers and µ is count-ing measure. By Holder's inequality, If(x)I <- Ilxll IIYII, so that 11f 11 <_ IIYII

Finally, we prove uniqueness. If y, z e 14 and f(x) = Y-k 1 xkYk =k 1 xkZk for all x e 1P, then g(x) = Yk 1 xk(yk - Zk) = 0 for all x e P. The

above argument with f replaced by g shows that Ilgll = Ily - zll; henceliY - zll = 0, and thus y = z.

(c) Let f be a continuous linear functional on I. We show that there isa unique element y e F such that

AX) = E xk yk for all x e 11.k=1

The argument of (b) may be repeated up to Eq. (1). In this case, however,if yk=rke'Bk,k= 1, 2, ..., we take

zn = (0, ..., 0, e - ie°, 0, ...), witha-'e

in position n.

Thus by (1),` e'e = I Yf(zn) = e

But

If(zn)I s Ilfil Ilz.ll = Of 11.

Therefore I y I <_ llf II for all n, so that y e F and IIYII <_ IIf II. But by (1),

I f(x)I < (suPlYki) Ixkl = 11A Ilxii;k k=l

hence ilfll _< IIYII Uniqueness is proved as in (b).

In 3.3.4(b) and (c), the map f -, y of (1")* to 14 [q = c in 3.3.4(c)] is linear,and is one-to-one by uniqueness of y. To show that it is onto, observe thatany linear functional of the form f(x) = YA , xk yk , x e 1", y e 1", satisfiesIIIII IIYII by the analysis of 3.3.4(b) and (c). Therefore f is continuous, sothat every y e 19 is the image of some f e (1")*. Since If II = Ilyll, we have anisometric isomorphism of (I")* and 14.

If we replace yk by h, we obtain the result that there is a unique y e 1qsuch that f (x) = Yk 1 xk yk for all x e 1". This makes the map f -+ y a con-jugate isometry rather than an isometric isomorphism. If p = 2, then q = 2also, and thus we have another proof of the Riesz representation theoremfor separable Hilbert spaces (see 3.2.16). In fact, essentially the same argu-ment may be used in an arbitrary Hilbert space if the e are replaced by anarbitrary orthonormal basis. (Other examples of representation of continuouslinear functionals are given in Problems 10 and 11.)

We now show that the set of bounded linear operators from one normedlinear space to another can be made into a normed linear space.

3.3.5 Theorem. Let L and M be normed linear spaces, and let [L, M] bethe collection of all bounded linear operators from L to M. The operatornorm defined by 3.3.1 is a norm on [L, MI, and if M is complete, then [L, M]is complete. In particular, the set L* of all continuous linear functionals on Lis a Banach space (whether or not L is complete).

PROOF. It follows from 3.3.1 that IIA ll > 0 and ilaA II = Pal IlA ll for allA e [L, M] and a e C. Also by 3.3.1, if 11 All = 0, then Ax = 0 for all x e L,hence A = 0. If A, B e [L, M], then again by 3.3.1,

IIA + BII = sup{II(A + B)xll : x e L, Ilxll < 1}.

Since II(A + B)xll :5 IlAxll + IlBxll for all x,

iiA + BII s IIAil + IIBll

and it follows that [L, M] is a vector space and the operator norm is in facta norm on [L, M].

Now let A1, A2 , ... be a Cauchy sequence in [L, M]. Then

II(A"-Am)xli S IIA"-A,"II Ilxll-r0 as n,m->oo. (1)

Therefore {A" x} is a Cauchy sequence in M for each x e L, hence A. convergespointwise on L to an operator A. Since the A. are linear, so is A (observethat Ajax + by) = aA" x + bA" y, and let n -+ oo). Now given e > 0, chooseN such that 11 A,, - Amll < s for n, m >- N. Fix n >- N and let m - co in Eq. (1)to conclude that II (A" - A)xll <- e IlxiI for n >- N; therefore IIA" - A II - 0 asn -a oo. Since 11A 11 IIA - A"II + IIA"II, we have A e [L, M] and A. -> A inthe operator norm.

In the above proof we have talked about two different types of conver-gence of sequences of operators.

3.3.6 Definitions and Comments. Let A, A1i A2, ... e [L, M]. We say thatA" converges uniformly to A if IIA" - A II - 0 (notation: A"-" . A). SinceII(A" - A)xll < IIA" - All Ilxll, uniform operator convergence means thatA"x-. Ax, uniformly for llxll <_ 1 (or equally well for llxll <- k, k any positivereal number).

We say that A. converges strongly to A (notation: A"-4 A) if A" x -+ Axfor each x e L. Thus strong operator convergence is pointwise convergenceon all of L.

Uniform convergence implies strong convergence, but not conversely.For example, let {e1, e2 , ...) be an orthonormal set in a Hilbert space, andlet A. x = <x, e">, n = 1, 2, .... Then A"+ 0 by Bessel's inequality, but A.does not converge uniformly to 0. In fact 11 A. e"II = 1 for all n, hence IIA"ll = 1.

There is an important property of finite-dimensional spaces that we arenow in a position to discuss. In the previous section, we regarded C" as aHilbert space, so that if x e C", the norm of x was taken as the Euclideannorm Ilxll = (Ei=1 f x, z)l"z. The metric associated with this norm yieldsthe standard topology on C". However, we may put various other norms onC", for example, the LP norm Ilxll, = (E,=1 I xi I P)1"P, 1 < p < oo, or thesup norm llxll = max(I xl 1, ..., I x" 1). Since the space is finite-dimensional,there are no convergence difficulties and all elements have finite norm.Fortunately, the proliferation of norms causes no confusion because allnorms on a given finite-dimensional space induce the same topology. The proofof this result is outlined in Problems 6 and 7.

Problems

1. Let f be a linear functional on the normed linear space L. If f is notidentically 0, show that the following are equivalent:

(a) f is continuous.(b) The null space N =f -'{0} is closed.(c) N is not dense in L.(d) f is bounded on some neighborhood of 0.[To prove that (c) implies (d), show that if B(x, e) r N = 0, and f isunbounded on B(0, s), then f(B(O, e)) = C. In particular, there is apoint z e B(0, r) such that f(z) = -f(x), and this leads to a con-tradiction.]

2. Show that any infinite-dimensional normed linear space had a dis-continuous linear functional. (Let e1, e2, ... be an infinite sequence oflinearly independent elements such that 1 for all n. Define jrappropriately on the e and extend f to the whole space using linearity.)

3. In Example 3.3.3(a), show that IIAII = maxa5ssb fb I K(s, t)1 dt. [IfJ ,b I K(s, t) J dt assumes a maximum at the point u, and K(u, t) = r(t)e'etr0,let z(t) = r(t)e-t90). Let x1, x2, ... be continuous functions such thatlx,,(t) 1 < 1 for all n and t, and f o I xa(t) - z(t) I dt - 0 as n - co. SinceII Ax,, II < II A II and (Axa)(s) K(s, t)z(t) di as n - oo, it follows thatfQ K(u,')I dt:5 IIAII.]

The same argument, with integrals replaced by sums, shows that ifA is a matrix operator on Ca, with sup norm, IIAII = max1 s ts Y;=' I a;; I .

4. Let M be a linear manifold in the normed linear space L. Denote by[x] the coset of x modulo M, that is, {x + y: y e M}. Define

II [x] II = inf{lLvll : y e [x]);

note that II [x] II = inf{ Ilx - z11 : z e M} = dist(x, M). Show that theabove formula defines a seminorm on the quotient space L/M, a normif M is closed.

5. If L is a Banach space and M is a closed subspace, show that L/M isa Banach space.

6. (a) Let A be a linear operator from L to M. Show that A is one-to-onewith A-' continuous on its domain A(L), if there is a finite numberm > 0 such that IlAxll ? m Ilxll for all x e L.

(b) Let 111), and 11 JJ2 be norms on the linear space L. Show that thenorms induce the same topology if there are finite numbersm, M > 0 such that

mllxll1:5 11x112 <Ml141 for all xeL.

(This may be done using part (a), or it may be shown directly that, forexample, if 11x111 <_ (l/m)IIx112 for all x, then the topology induced by1111 , is weaker than (that is, included in) the topology induced by 11112.)

7. Let L be a finite-dimensional normed linear space, with basis e1, ... , e".Let 11 111 be any norm on L, and define

n 1/2 n

11x112 = ( lxil2} , where x = Y_ xiei.=1 i=1

(a) Show that for some positive real number k, 11x111 <_ k 11x112 for allxeL.

(b) Show that for some positive real number in, 11x111 ? in on{x: hIxll2 = 1}. [By (a), the map x -+ 11x!11 is continuous in thetopology induced by II 112 ]

(c) Show that 11x111 ? in lIx211 for all x e L, and conclude that all normson a finite-dimensional space induce the same topology.

(d) Let M be an arbitrary normed linear space, and let L be a finite-dimensional subspace of M. Show that L is closed in M.

8. (Riesz lemma) Let M he a closed proper subspace of the normedlinear space L. Show that if 0 < S < 1, there is an element x6 e L suchthat 11x611 = 1 and Ilx - x611 ? S for all x e M. [Choose x1 0 M, andlet d = dist(x1, M) > 0 since M is closed. Choose x0 e M such thatIIx1 - x011 <- d/S, and set x6 = (x1 - xo)/11x1 -

9. Let L be a normed linear space. Show that the following are equivalent:(a) L is finite-dimensional.(b) L is topologically isomorphic to a Euclidean space C" (or R" if L

is a real vector space), that is, there is a one-to-one, onto, linear,bicontinuous map between the two spaces.

(c) L is locally compact (every point of L has a neighborhood whoseclosure is compact).

(d) Every closed bounded subset of L is compact.(e) The set {x e L: llxll = 1} is compact.(f) The set {x e L: IIx11 = 1} is totally bounded, that is, can be covered

by a finite number of open balls of any preassigned radius.[Problem 7 shows that (a) implies (b); (b) implies (c), (d) implies (e),and (e) implies (f) are obvious, and (c) implies (d) is easy. To provethat (f) implies (a), use Problem 8.]

10. Let c be the space of convergent sequences of complex numbers [see3.1.2(b)]. If f is a continuous linear functional on c, show that there isa unique element y = (yo , Y1, . . .) a l1 such that for all x e c,

AX) = lI m )(0 k1 xk A

Furthermore, 11111 = I YO - Y_k = I Yk I +Y-k'=1 I Yk I If co is the closedsubspace of c consisting of those sequences converging to 0, the repre-sentation of a continuous linear functional on co is simpler: f(x) =Y-k I xk Yk 1 11111 = Y_k I j Yk . Thus co* is isometrically isomorphic to 1'.

11. Let (f', .F,, p) be a measure space.

(a) Assume it finite, I <p < oo, (1/p) + (1/q) = 1. If f is a continuouslinear functional on L° = L°(Q, .5F, p) show that there is an clementy e Lq such that

AX) = 5 xy tip for all x e B'.n

Furthermore, IIfII = IIYIIq If y1 is another such function, showthat y = YI a.e. [p]. [Define ).(A) = f (IA), A e , , and apply theRadon-Nikodym theorem.]

(b) Drop the finiteness assumption on It. If A e .F and p(A) < ce,part (a) applied to (A, ,F n A, p) provides an essentially uniqueYA e L4 such that y,, = 0 on A` and

f (xIA) =J

xyA dp for all x e L°;a

also, IIYAIIq is the norm of the restriction off to L°(A), so

IIYAIIq < Ilfll

(i) If p(A) < oo and p(B) < oo, show that YA = yB a.e. [p] onA n B. Thus yA B may be obtained by piecing togetherYA and yB.

(ii) Let n = 1, 2, ... , be sets with k = sup{IIYAIIqA e .F, p(A) < oo} < II! 11. Show that YA converges in Lq toa limit function y, and y is essentially independent of theparticular sequence Furthermore, IIYIIq 5 II! I1

(iii) Show that f(x) = fn xy dp for all x e L°. Since IIYIIq 5 I!.f 11by (ii) and IIf1 S IIYIIq by Holder's inequality, we haveIf 11 = IIYIIq Thus the result of (a) holds for arbitrary it.

(c) Prove (a) with it finite, p = 1, q = oo.(d) Prove (a) with p o-finite, p = 1, q = co. [For an extension of this

result, see Kelley and Namioka (1963, Problem 14M).1It follows that there is an isometric isomorphism of (LP)* and Lq ifI <p < oo, 1 < q < oo, (I /p) + (I /q) = I ; if it is o-finite, this is truealso forp=1,q=co.

3.4 Basic Theorems of Functional Analysis

Almost every area of functional analysis leans heavily on at least one ofthe three basic results of this section : the Hahn-Banach theorem, the uniformboundedness principle, and the open mapping theorem. We are going toestablish these results and discuss applications.

We first consider an extension problem. If f is a linear functional definedon a subspace M of a vector space L, there is no difficulty in extending f to alinear functional on all of L; simply extend a Hamel basis of M to a Hamelbasis for L, define f arbitrarily on the basis vectors not belonging to M,and extend by linearity. However, if L is normed and we require that theextension of f have the same norm as the original functional, the problembecomes more difficult. We first prove a preliminary result.

3.4.1 Lemma. Let L be a real vector space, not necessarily normed, andlet p be a map from L to R satisfying

p(x+y)_<p(x)+p(y) for all x,yeLp(ax) = ap(x) for all x e L and all a > 0.

[The first property is called subadditivity, the second positive-homogeneity.Note that positive-homogeneity implies that p(O) = 0 [set x = 0 to obtainp(O) = ap(0) for all a > 0]. A subadditive, positive-homogeneous map issometimes called a sublinear functional.]

Let M be a subspace of L and g a linear functional defined on M such thatg(x) < p(x) for all x e M. Let x0 be a fixed element of L. For any real numberc, the following are equivalent:

(1) g(x) + Ac < p(x + Axo) for all x e M and all .1 a R.(2) -p(-x - xo) - g(x) < c< p(x + xo) - g(x) for all x e M.

Furthermore, there is a real number c satisfying (2), and hence (1).

PROOF. To prove that (1) implies (2), first set ) = 1, and then set A = -1and replace x by -x [note g(-x) = -g(x) by linearity]. Conversely, if (2)holds and 2 > 0, replace x by x/.l in the right-hand inequality of (2); if A < 0,replace x by x/i in the left-hand inequality. In either case, the positive homo-geneity of p yields (1). If ). = 0, (1) is true by hypothesis.

To produce the desired c, let x and y be arbitrary elements of M. Then

g(x) - g(y) = g(x - y) <_ p(x - y)< p(x + x0) + p(-y - x0) by subadditivity.

3.4 BASIC THEOREMS OF FUNCTIONAL ANALYSIS 139

It follows that

sup[ -P(-y - x0) - g(y)] < inf [P(x + x0) - g(x)].yEM xEM

Any c between the sup and the inf will work. I

We may now prove the main extension theorem.

3.4.2 Hahn-Banach Theorem. Let p be a subadditive, positive-homogeneousfunctional on the real linear space L, and g a linear functional on the sub-space M, with g < p on M. There is a linear functional f on L such that f = gon M and f < p on all of L.

PROOF. If x0 M, consider the subspace M1 = L(M u {xo}), consisting ofall elements x + Axo, x E M, A e R. We may extend g to a linear functionalon M1 by defining gl(x + Axo) = g(x) + Ac, where c is any real number. Ifwe choose c to satisfy (1) of 3.4.1, then gl :!5-: p on M1.

Now let ' be the collection of all pairs (h, H) where h is an extension ofg to the subspace H z) M, and h <p on H. Partially order l by (h1, H1) <(h2, H2) if H1 c H2 and h1 = h2 on H1; then every chain in W has an upperbound (consider the union of all subspaces in the chain). By Zorn's lemma, '

has a maximal element (f, F). If F # L, the first part of the proof yields anextension off to a larger subspace, contradicting maximality. I

There is a version of the Hahn-Banach theorem for complex spaces.First, we observe that if L is a vector space over C, L is automatically a vectorspace over R, since we may restrict scalar multiplication to real scalars. Forexample, C" is an n-dimensional space over C, with basis vectors (1, 0,..., 0),..., (0, ... , 0, 1). If C" is regarded as a vector space over R, it becomes2n-dimensional, with basis vectors

(1,0,...,0),...,(0,...0, 1), (i,0,...,0),...,(0,...,0,i).Now iff is a linear functional on L, with ft = Ref, f2 = Imf, then ft

and f2 are linear functionals on L', where L' is L regarded as a vector spaceover R. Also, for all x e L,

f(ix) =fi(ix) + if2(ix).

But

f(ix) = if(x) _ -f2(x) + ifl(x)

Thus f1(ix) _ -f2(x), JZ(ix) =fl(x); consequently

f(X) =f1(X) - 1f1(IX) =f2(IX) + i 2(X)

Therefore f is determined by f1 (or by f2). Conversely, let f1 be a linearfunctional on L'. Then f1 is a map from L to R such that f1(ax + by) =af1(x) + bf1(y) for all x, y e L and all a, b e R. Define f(x) = f1(x) - if1(ix),x e L. It follows that f is a linear functional on L (and f1 = Ref). For f1 isadditive, hence so is f, and if a, b e R, we have

f ((a + ib)x) = f1 (ax + ibx) - if, (- bx + iax)

= af1(x) + bf1(ix) + ibf, (x) - iaf1(ix)

= (a -i ib)[f1(x) - if1(ix)]

= (a + ib) f (x).

Note that homogeneity of f1 does not immediately imply homogeneity of f.For f1(..x) = Af1(x) for real .1 but not in general for complex A. [For example,let L = C, f (x) = x, f1(x) = Re x.]

We now prove the complex version of the Hahn-Banach theorem.

3.4.3 Theorem. Let L be a vector space over C, and p a seminorm on L.If g is a linear functional on the subspace M, and IgI < p on M, there is alinear functional f on L such that f = g on M and If I < p on L.

PROOF. Since p is a seminorm, it is subadditive and absolutely homogeneous,and hence positive-homogeneous. If g1 = Reg, then g1 < I g I <p; so by3.4.2, there is an extension of g1 to a linear map f1 of L to R such that f1 = g1on M and f1 0. Then

If(x)j = r =f(e-'Bx)= f1(e-iex) since r is real< p(e-i°x) since f1 < p on L= p(x) by absolute homogeneity.

3.4.4 Corollary. Let g be a continuous linear functional on the subspace Mof the normed linear space L. There is an extension of g to a continuous linearfunctional f on L such that Ilfll = Ilgll.

PROOF. Let p(x) = Ilgll Ilxhl; then p is a seminorm on L and IgI <p on Mby definition of Ilgll. The result follows from 3.4.3.

A direct application of the Hahn-Banach theorem is the result that in anormed linear space, there are enough continuous linear functionals to dis-tinguish points; in other words, if x y, there is a continuous linear functionalf such that f(x) 96f(y). We now prove this, along with other related results.

3.4.5 Theorem. Let M be a subspace of the normed linear space L, and letL* be the collection of all continuous linear functionals on L.

(a) If x0 0 M, there is an f e L* such that f = 0 on M, f (xo) = 1, and 11111= 1/d, where d is the distance from x0 to M.

(b) xo e M iff every f e L* that vanishes on M also vanishes at xo .(c) If x0 0, there is an f e L* such that Of 11 = I and f(xo) = Ilxoll ;

thus the maximum value of I f (x) I 111x11, x 0, is achieved at x0. In particular,if x 0 y, there is an f e L* such that f (x) sE fly).

PROOF. (a) First note that L(M u {xo}) is the set of all elements y = x + axo,x e M, a e C, and since xo 0 M, a is uniquely determined by y. Define f onN = L(M V {x0}) by f(x + axo) = a; f is linear, and furthermore, 11111 = 1/d,as we now prove. By 3.3.1 we have

11111 =sup{I1(Y)l .YEN, Y*O)IIYII 111

sup{II l a l ll.xEM, aeC, x&0 or a 0x+axo

=sup( lal xeM, ac-C, a960}IIx + axoll

111

since f(y) = 0 when a = 0. Now

I a l1 1 for some z e M;

xIl Ilxo - zllIlx + axoll[lx' + -

u

hence 11111 = (inf{Ilxo - zll : z e M})-' = 1/d < oo. The result now followsfrom 3.4.4.

(b) This is immediate from (a).(c) Apply (a) with M = {0}, to obtain g e L* with g(xo) = I and 11911 =

1/Ilxoll; set f = Ilxollg 1

The Hahn-Banach theorem is basic in the study of the concept of re-flexivity, which we now discuss. Let L be a normed linear space, and L* the

set of continuous linear functionals on L; L* is sometimes called the conjugatespace of L. By 3.3.5, L* is a Banach space, so that we may talk about L**,the conjugate space of L*, or the second conjugate space of L. We may identifyL with a subspace of L** as follows: If x e L, we define x** e L** by

x**(f) =f(x), f e L*.

If 11f, - f II --+ 0, then f .(x) -+ f (x); hence x** is in fact a continuous linearfunctional on L*. Let us examine the map x - x** of L into L**.

3.4.6 Theorem. Define h: L -+ L** by h(x) = x**. Then h is an isometric iso-morphism of L and the subspace h(L) of L**; therefore, if x e L, we have, by3.3.1, IIxII = Ilx**II = sup{If(x)I : fe L*, 11f 11 s 1}.

PROOF: To show that h is linear, we write

(h(ax + by)](f) =f(ax + by) = af(x) + bf(y) [ah(x)](f) + [bh(y)](f).

We now prove that h is norm-preserving (Ilh(x)II = Ilxll for all x e L); conse-quently, h is one-to-one. If x e L, I [h(x)](f) I = I f(x) I < JJxJJ 11f 11, and

hence Jlh(x)JJ < JJxJJ. On the other hand, by 3.4.5(c), there is an f e L* suchthat II f ll =1 and I f(x) I = Ilxil. Thus

sup{I [h(x)](f) I : f e L*, 11f 11 = I} z IIxII

so that Jlh(x)IJ >- IIxII, and consequently 11h(x)II = IIxII. 1

If h(L) = L**, L is said to be reflexive. Note that L** is complete by 3.3.5so by 3.4.6, a reflexive normed linear space is necessarily complete. We shallnow consider some examples.

3.4.7 Examples. (a) Every Hilbert space is reflexive. For if >G is the con-jugate isometry of 3.3.4(a), H* becomes a Hilbert space if we take <f, g> =</i(g), ir(f)). Thus if q e H**, we have, for some g e H*, q(f) _ <f, g> _<O(g), 0(f)> = f (x), where x = di(g). Therefore q = h(x).

(b) If 1 < p < oo, 1° is reflexive. For by 3.3.4(b), (IP)* is isometricallyisomorphic to 19, where (1/p) + (1/q) = 1. Thus if t e (1P)** we have t(y) __k , yk Zk , y e 14, where z is an element of (14)* =1p. But then t = h(z).

Essentially the same argument, with the aid of Problem 11 of Section 3.3,shows that if (0, F, p) is an arbitrary measure space L°(i2, .F, p) is reflexivefort<p<oo.

(c) The space 11 is not reflexive. This depends on the following result:

3.4.8 Theorem. If L is a normed linear space and L* is separable, so is L.Thus if L is reflexive and separable (so that L** is separable), then so is L*.

PROOF. Let fl, f2, ... form a countable dense subset of (f e L*: 11f II = 1 }(note that any subset of a separable metric space is separable). Since Ilf"II = 1,we can find points x" e L with IIx"II = land I f"(x")I >_ I for all n. Let M bethe space spanned by the x" ; we claim that M = L. If not, 3.4.5(a) yieldsan f e L* with f = 0 on M and 11f ll = 1. But then I < f"(x") I = I f"(xn) -.f(x") I :5 Il.f" -.f II for all n, contradicting the assumption that { f1, fZ , ...} isdense. I

To return to 3.4.7(c), we note that l' is separable since {x ell : xk = 0 forall but finitely many k} is dense. [If x e !1 and P) _ (x1, ... , x", 0, 0, ...)then x(") -+ x in I1.] But (11)* = I°° by 3.3.4(c), and this space is not separable.For if S = {x e I': xk = 0 or I for all k}, then Sis uncountable and lix - yll = I

for all x, y e S, x y. Thus the sets Bx = {y c- 1': Ily - xII < i}, x e S, forman uncountable family of disjoint open sets. If there were a countable denseset D, there would be at least one point of D in each Bx, a contradiction. Thus11 is not reflexive.

We now consider the second basic result of this section. Suppose that theA,, i belonging to the arbitrary index set I, are bounded linear operators fromL to M, where L and M are normed linear spaces. The uniform boundednessprinciple asserts that if L is complete and the A; are pointwise bounded, that is,sup{IIA, x11 : i e I} < oo for each x e L, then the Ai are uniformly bounded,that is, sup{IIA,II : I e I} < oo. Completeness of L is essential; to see this, letL be the set of all sequences x = (x1, x2 , ...) of complex numbers such thatxk = 0 for all but finitely many k, with the IP norm, I <p < oo. Take M = C,and A"x = nx", n = 1, 2, .... For any x, A"x = 0 for sufficiently large n, sothe A. are pointwise bounded, although IIA"II = n -+ oo

The proof that we shall give uses the Baire category theorem, one versionof which states if a complete metric space is a countable union of closed sets,one of the sets must have a nonempty interior. (See the appendix on generaltopology, Theorem A9.2.)

3.4.9 Principle of Uniform Boundedness. Let A,, i e I, be bounded linearoperators from the Banach space L to the normed linear space M. If the A, arepointwise bounded, they are uniformly bounded.

PROOF. Let C" _ {x e L: sups IIA,xII <- n}, n = 1, 2..... Since the A, arepointwise bounded, UM 1 C" = L, and since the A, are continuous, each C. is

closed. By the Baire category theorem, for some n there is a closed ballB = {x E L: IIx - x011 <_ r} c C.. Now if Ilyll <_ 1 and i E I, we have

IIA1yll = 1 IIA1zli where z = ryr

I 1

- IIA1(x0 + z)II + - IIA1x0Ilr r

2n since x0 + z and x0 belong tor

Thus II A, 11 <_ 2n/r. I

The uniform boundedness principle is used in an important way in thestudy of weak convergence, a concept that we now describe.

3.4.10 Definitions and Comments. The sequence (or net) in the normedlinear space L is said to converge weakly to x e L iff f (x.) --).f(x) for everyf c- L* (notation: x). Convergence in the metric of L (IIx - xll -> 0) willbe called strong convergence and will be written simply as x,, -+ x. [In thisterminology, strong convergence of the sequence of linear operators A,, tothe linear operator A means strong convergence of A,,x to Ax for each x(see 3.3.6).]

For each x0 E L, consider the collection of sets

U(xo) = {x e L : I f,(x) - f,(xo) I < e1, n},

whereWe form a topology by taking the U(xo) as a base for the neighborhood

system at x0. (There is no difficulty checking that the required axioms for aneighborhood system are satisfied.) The resulting (Hausdorff) topology iscalled the weak topology on L; it follows from the definition that convergencerelative to the weak topology coincides with weak convergence. (The topologyinduced by the norm of L will be called the strong topology.)

It follows from the definitions that strong convergence implies weakconvergence, hence every set that is open in the weak topology is open in thestrong topology (and similarly for closed sets). To see that the converse doesnot hold, let {e1, e2 , ...} be an infinite orthonormal sequence in a Hilbertspace H. If x e H, then x> -+ 0 as n -> oo by Bessel's inequality, hence

But 11e,,11 = 1, so e,, does not converge strongly to 0.In a Euclidean space, strong and weak convergence coincide. For if x _

a,,,e, + + akfl ek converges weakly to x = ate, + + ak ek (where the e1

are basis vectors for Ck), let f be a continuous linear functional that is 1 ate;and Oat thatThus x - x strongly.

The weak topology is an important example of a product topology (seethe appendix on general topology, Section A3). For L may be identified (see3.4.6) with a subset h(L) of the set C` of all complex-valued functions onL*, and the weak topology is the product topology, also called the "topologyof pointwise convergence," on CL*, relativized to h(L).

We give a few properties of weak convergence.

3.4.11 Theorem. (a) A weakly convergent sequence is bounded, that is,sup. Ilxnll < oo. In fact if x."'+ x0, then Ilxoll <_ lim infs. IIx.N

(b) If A is a bounded linear operator from L to M and the net {x,) con-verges weakly to x0 in L, then Ax, converges weakly to Axo in M.

(c) If the net {x,} converges weakly to x0, then x0 belongs to the subspacespanned by the x; .

(d) If M is a linear manifold of L, the closures of M relative to the weakand strong topologies coincide. Consequently if M is closed in the strongtopology, it is closed in the weak topology.

PROOF. (a) The xn may be regarded as continuous linear functionals on L*with x (f) =f(x) (see 3.4.6). The x are pointwise bounded on L* sincef(xn) -f(xo); hence by the uniform boundedness principle, sup. Ilxnll < oo.Also, if f s L*,

If(xo)I = lim l lim inf I f(xn)I <_ 11f 11 lim inf IIx,,IIn- 00 n_ OD n-oo

Since 1x011 = sup{lf(xo)l :.1E L*, II! II < l}, we may conclude that 11x011 -<lim inf. , Ilxnll

(b) If g e M*, define f= g o A; since A is continuous, we have f e L*,so that f (x,) -+ f(xo), that is, g(Ax,) -+ g(Axo). But g is arbitrary ; henceAx, "'+ Axo.

(c) If this is false, 3.4.5(a) yields an f e L* with f (xo) = I and f (x,) = 0,contradicting x,-!4 x0.

(d) The strong closure is always a subset of the weak closure, so assumex belongs to the weak closure of M. Let {x,) be a net in M with x; "'-* x. By (c),x is a strong limit of a net of finite linear combinations of the x,. But thesefinite linear combinations belong to M since M is a subspace; hence x belongsto the strong closure. I

In order to characterize weak convergence in specific spaces, the followingresult is useful.

3.4.12 Theorem. Let E be a subset of L* such that S(E) = L*, and letbe a sequence in L. If x0aL,then x0iffsup, lIxnIIfor allfe E.

PROOF. The "only if" part follows from 3.4.11(a) and the definition of weakconvergence. For the "if" part, let f e L*, and choose elements fk a L(E) withIf -fk1I - 0. Then

If(xn) -f(x0)I I f(xn) -fk(xn)I + I fk(xn) -fk(xo)I + I fk(x0) -f(x0)I

-< Ilf-All IIX-II + Ifk(xn)-fk(X0)I + Ilfk-.fII 11X011-

Since the x are bounded in norm, given s > 0, we may choose k such that theright-hand side is at most I fk(XO) I + s; but fk(xo) as n -. o0since fk e L(E), and since a is arbitrary, the result follows. I

We now describe weak convergence in IP and L'.

3.4.13 Theorem. Assume I <p < oo.(a) Let X. = (xi1, X.2, ...) a IP, n = 1, 2, .... If z e l', then

xII < oo and Xnk -24 Zk as n -+ oo for each k.(b) Let x,, e LP(fl, Pte, p), n = 1, 2, ..., where µ is assumed finite. If

z e LP(11, F, Ic), then x,, -- z if sup,, oo and JA x du -* JA z dp for eachA e F. (It will often be convenient to blur the distinction between L' and L',and treat the elements of L' as functions rather than equivalence classes.)

PROOF. (a) Define fk e (lP)* by fk(x) = xk ; then fk corresponds to the se-quence in l4 with a one in position k and zeros elsewhere [see 3.3.4(b)]. TakeE = {fl, f2, ...} and apply 3.4.12.

(b) For each A e F, definef4 e (LP)* by f,(x) = fA x dp; fA correspondsto the indicator function 1A a L9 (see Problem 11, Section 3.3). Take E as theset of all fA, A e F; E spans (LP)* because the simple functions are densein L", and the result follows from 3.4.12. 1

We now consider the third basic result, the open mapping theorem. Thiswill allow us to conclude that under certain conditions, the inverse of a one-to-one continuous linear operator is continuous. We cannot make this assertionin general, as the following example shows: Let L be the set of all continuouscomplex-valued functions x on [0, 1] such that x(0) = 0, M = {x e L: x has acontinuous derivative on [0, fl); put the sup norm on L and M. If A is definedby (Ax)(t) = Jo x(s) ds, 0 < t < 1, then A is a one-to-one, bounded, linear

operator from L onto M. But A-' is discontinuous; for example, if x(t) =sin nt, then (1 - cos nt)/n, so that y -+ 0 in M, but x hasno limit in L. A hypothesis under which continuity of the inverse holds is thecompleteness of both spaces L and M. In the above example, M is notcomplete; for example, a sequence of polynomials may converge uniformlyto a continuous function without a continuous derivative (in fact to a con-tinuous nowhere differentiable function).

We now state the third basic theorem.

3.4.14 Open Mapping Theorem. Let A be a bounded linear operator fromthe Banach space L onto the Banach space M. Then A is an open map, thatis, if D is an open subset of L, then A(D) is an open subset of M. Conse-quently if A is also one-to-one, then A-' is bounded.

PROOF. We defer this until the next section (see 3.5.11) since it will not be muchmore difficult to prove a more general statement about mappings of topo-logical vector spaces.

The open mapping theorem allows us to prove the closed graph theorem,which is often useful in proving that a particular linear operator is bounded.

First we observe that if L and M are normed linear spaces, we maydefine a norm on the product space L x M by II(x, y)II = (11x11' + IIYIIP)"P,x e= L, y e M, where p is any fixed real number in [1, oo). For if x, x' e L,Y, Y e M,

(Ilx + x'I1P + IIY +Y'NP)!'P < [(Ilxll + Ilx'll)P + (IIYII + IIY'll)P]11P

<- (11x11' + IIYIIP)1'P + 0141P + IIY'11P)1 '

by Minkowski's inequality applied to R2.

Therefore the triangle inequality is satisfied and we have defined a norm onL x M (the other requirements for a norm are immediate). Furthermore,(x., (x, y) if x,, -+ x and y -+ y; thus regardless of the value of p, thenorm on L x M induces the product topology; also, if L and M are complete,so is L x M. [The same result is obtained using the analog of the L°° norm,that is, 11(x, y)Ii = max(IIxII,

3.4.15 Definition. Let A be a linear operator from L to M, where L and Mare normed linear spaces. We say that A is closed if the graph G(A) ={(x, Ax): x e L} is a closed subset of L x M. Equivalently, A is closed if thefollowing condition holds:

If x e L, x -+ x, and Ax,, -+ y, then (x, y) c G(A); in other words, y = Ax.This formulation shows that every bounded linear operator is closed. Theconverse holds if L and M are Banach spaces.

3.4.16 Closed Graph Theorem. If A is a closed linear operator from theBanach space L to the Banach space M, then A is bounded.

PROOF. Since G(A) is a closed subspace of L x M, it is a Banach space.Define P: G(A) -+ L by P(x, Ax) = x. Then P is linear and maps onto L, andIIP(x, Ax) II = II x II 11(x, Ax) II ; hence Il Pll <_ 1 so that P is bounded. [Alterna-tively, if (x,, (x, y), then x -+ x, proving continuity of P.] Similarly,the linear operator Q : G(A) -+ M given by Q(x, Ax) = Axis bounded. If P(x, Ax)= 0, then x = Ax = 0, so P is one-to-one. By 3.4.14, P-1 is bounded, andsince A = Q o P-1, A is bounded.

As an application of the closed graph theorem, we show that if P is theprojection of a Banach space L on a closed subspace M, then P is continuous[see 3.3.3(c)]. Let {x} be a sequence of points in L with x -+ x, and assumePx,, converges to the element y e M. Recall that in defining a projection opera-tor it is assumed that L is the direct sum of closed subspaces M and N; thus

x,, y z xx y y that y = Px, proving P closed. By 3.4.16,

P is continuous.

Problems

1. Show that a subadditive, absolutely homogeneous functional on avector space must be nonnegative, and hence a seminorm. Give an ex-ample of a subadditive, positive-homogeneous functional that fails tobe nonnegative.

2. Let (S), F, µ) be a measure space, and assume .F is countably generated,that is, there is a countable set W (-- F such that aff) = F. (Note thatthe minimal field Fo over f is also countable; see Problem 9, Section1.2.) If y is a-finite on Pro and 1 0, show that L'(0, .9, p) is not reflexive.

3. If L and M are normed linear spaces and [L, M] is complete, show thatM must be complete.

4. Let A e [L, M], where L and M are normed linear spaces. The adjointof A is an operator A* : M* L*, defined as follows: If f e M* we take(A*f)(x) = f(Ax), x e L. Establish the following results:

(a) IIA*II = IIAII(b) (aA + bB)* = aA* + bB* for all a, b e C, A, B e [L, M].(c) If A e [L, M], B E [M, N], then (BA)* = A*B*, where BA is the

composition of A and B.(d) If A e [L, M], A maps onto M, and A-' exists and belongs to

[M, L], then (A-')* = (A*)-'.5. Define the annihilator of the subset K of the normed linear space L as

Kl = f f e L*: f (x) = 0 for all x E K}. Similarly, if J c L*, define J' ={x e L: f (x) = 0 for all f e J}. If A e [L, M], we denote by N(A) the nullspace {x e L: Ax = 0}, and by R(A) the closure of the range of A, thatis, the closure of {Ax: x e L}. Establish the following:

(a) For any K c L, K'1 = S(K), the space spanned by K.(b) R(A)' = N(A*) and R(A) = N(A*)l.(c) R(A) = M if A* is one-to-one.(d) R(A*)1 = N(A).(e) For any J c L*, S(J) e J- - ; S(J) = J11 if L is reflexive.(f) R(A*) c N(A)'; R(A*) = N(A)' if L is reflexive.(g) If R(A*) = L*, then A is one-to-one; the converse holds if L is

reflexive.6. Consider the Hahn-Banach theorem 3.4.2, with the additional assump-

tion that L is a normed linear space (or more generally, a topologicalvector space) and p is continuous at 0; hence continuous on all of Lsince I p(x) - p(y) I < p(x - y). Show that if L is separable, the theoremmay be proved without Zorn's lemma. It follows that 3.4.3 and 3.4.4 donot require Zorn's lemma under the above hypothesis.

7. If po is a finitely additive, nonnegative real-valued set function on afield .moo of subsets of a set Q, use the Hahn-Banach theorem to showthat po has an extension to a finitely additive, nonnegative real-valuedset function on the class of all subsets of fl. Thus in one respect, at least,finite additivity is superior to countable additivity.

8. If the sequence (or net) of bounded linear operators A,, on a Banachspace converges strongly to the (necessarily linear) operator A, show thatA is bounded; in fact

IIAII <_ lim inf IIAnll < sup IIAII < oon- rn

9. Let {Ai, i e I} be a family of continuous linear operators from theBanach space L to the normed linear space M. Assume the A, areweakly bounded, that is, supi I f (Ai x) I < co for all x e L and all f e M*.Show that the Ai are uniformly bounded, that is, sup; IIAiII < co.

10. (a) I f the elements x, , i e 1, belong to the normed linear space L, andsup, Jf (x;) I < oo for each f e L*, show that sup, Ilxill < oo.

(b) If A is a linear operator from the normed linear space L to thenormed linear space M, and foA is continuous for each fe M*,show that A is continuous.

11. Let L, M, and N be normed linear spaces, with L or M complete, andlet B: L x M -+ N be a bilinear form, that is, B(x, y) is linear in x foreach fixed y, and linear in y for each fixed x. If for each f e N*, f (B(x, y))is continuous in x for each fixed y, and continuous in y for each fixed x,show that B is bounded, that is,

sup{ B(x, y) I : Ilxll, IIYII < 1} < oo.

Equivalently, for some positive constant k we have I B(x, y) < klixll IIYIIfor all x, y.

12. Give an example of a closed unbounded operator from one normedlinear space to another.

13. Let A be a bounded linear operator from the Banach space L onto theBanach space M. Stow that there is a positive number k such that foreach y e M there is an x e L with y = Ax and Ilxll <_ kllyll This result issometimes called the solvability theorem.

3.5 Some Properties of Topological Vector Spaces

Recall that a topological vector space is a vector space L with a topologythat makes addition and scalar multiplication continuous; that is, the maps(x, y) -+ x + y of L x L -> L and (a, x) ax of C x L -> L are continuous,with the product topology on L x L and C x L. In this section we look atsome of the important properties of such spaces.

One cannot put an arbitrary topology on a vector space L and obtain atopological vector space. For example, consider the discrete topology (allsubsets of L are open); with this topology, L is not a topological vector space(exclude the trivial case L = {0}). To see this, pick a nonzero x e L, andlet a = 1/n, n = 1, 2, .... If scalar multiplication were continuous, we wouldhave a x -> 0 x = 0; since {0} is open, this implies that a x = 0 for suffi-ciently large n, a contradiction.

3.5 SOME PROPERTIES OF TOPOLOGICAL VECTOR SPACES 151

The following result is helpful in checking whether a given topology makesL into a topological vector space. {Throughout this section, we use the follow-ing notation: If A and B are subsets of L,

A+B={x+y:xEA, yEB},A-B={x-y:xeA, yEB).

IfxeL,x+B={x +y:yeB}x-B={x-y:yeB}.]

3.5.1 Theorem. Let L be a topological vector space, and let all be a baseof overneighborhoods at 0 (the sets in * are overneighborhoods of 0, andfor each neighborhood V of 0 there is a set U e all with U e V). Then

(a) if U, V e all, there is a set W e W such that W c U n V;(b) if U e* then for each x e L there is a S > 0 such that ax c- U for

all complex numbers a with I al < S;(c) for each U e all, there is a set V e ?l with V + V e U.

Conversely, if all is a collection of subsets of L satisfying (a)-(c), and inaddition,

(d) each U e all is circled, in other words, if x c- U, then ax r= U wheneveraI < 1,

there is a unique topology that makes L a topological vector space with all abase of overneighborhoods at 0.

PROOF. (a) This follows since the intersection of two neighborhoods of 0is a neighborhood of 0.

(b) By continuity of scalar multiplication, ax - 0 as a -> 0, with x fixed;hence ax e U for sufficiently small I al.

(c) By continuity of the map (x, y) - x + y at x = y = 0, there are setsV V2 e all such that if x e V, and y e V2, then x + y e U; in other words,V, + V2 e U. By (a), there is a set V E all with V c VI n V2 ; then V + V cV1 + V2 C-_ U.

Conversely, let all satisfy (a)-(d). Take all as an overneighborhood baseat 0, and, for each x e L, x + all = {x + U: U e all} as an overneighborhhoodbase at x. Thus '(x) = { V c L: V c x + U for some U e all} is the pro-posed class of overneighborhoods of x. Now:

(i) If V e 1'(x), then x e V. This follows because 0 belongs to eachU e all, by (b).

e Vi,(ii) If V1, V2 E *(x), then V1 n V2 e *^(x). For if x + U.i = 1, 2, by (a) there is a set W e all with W c U1 n U2. Thenx + W c (x + U,) n (x + U2) c V1 n V2.

(iii) If V E *(x) and V c W, then W e 'V(.x). This is immediate from thedefinition of V(x).

(iv) If V e 'fi(x), there is a set W e *'(x) such that W c V andV e 'V(y) for each y e W. For if x + U c V, by (c) there is a setTeall with T+TcU. If W=x+T andy t +Tcx+T+Tcx+ Uc V, hence Ve'r(y).

Thus the axioms of a system of overneighborhoods are satisfied, so thereis a topology on L with all an overneighborhood base at 0. By definition ofthe *(x), we have x,, - x if x - x -* 0; hence if the addition operation(x, y) -* x + y is continuous at (0, 0), it is continuous everywhere. Butcontinuity of addition at (0, 0) is immediate from (c).

Finally, we look at continuity of the multiplication operation (a, x) ---I' ax.By (b), the map is continuous at a = 0 for each fixed x, hence, as above,continuous at any a for each fixed x. If U e all and a e C, choose a positiveinteger n Z I a I and apply (c) inductively to obtain V e au such that V + .. +V (n times) is a subset of U. (Note, for example, that V + V + V c V + V+V+ V since 0 e V.) If n V = {ny: y e V}, then n V e V+ + V c U; hence

by (d), aV c U, so that multiplication is continuous at x = 0 for each fixeda, and thus continuous at any x for each fixed a. Now if U e all, then by (d), ifa 1 < 1 and x e U, then ax a U, so that multiplication is jointly continuous

at (0, 0). Thus if x --* x and a -* a, then (a - a)(x - x) 0, and by the aboveanalysis, a x -* ax, ax,, - ax. It follows that a x - ax.

Finally, to prove uniqueness of the topology, observe that if a& is a baseof overneighborhoods at 0, continuity of addition and subtraction makes themap y -+ x + y, y e L, a homeomorphism, and therefore *'(x) is a system ofoverneighborhoods of x.

We have seen that a seminorm on a vector space L induces a topologyunder which L becomes a topological vector space. We now show that asimilar result holds if instead of a single seminorm we use an entire family.

3.5.2 Theorem. Let {pi, i e I} be a family of seminorms on the vector spaceL. Let all be the class of all finite intersections of sets {x e L: pi(x) < bi}where i e 1, bi > 0. Then all is a neighborhood base for a topology thatmakes L a topological vector space. This topology is the weakest, making allthe p, continuous, and for a net in L, x -+ x if pi(x - x) - 0 for eachieI.

PROOF. We show that the conditions of 3.5.1 are satisfied. Condition (a)follows because an intersection of two finite intersections of sets {x: pi(x) < S,}is a finite intersection of such sets. If x e L, then p,(ax) = I a I pi(x), which isless than S, if I a I is sufficiently small; this proves (b). To prove (c), supposethat U = n,= I{x: pi(x) < S;}, and let V = ni= I{x: pi(x) < S} where 0 < S< mini S; . If y, z e V, then p,(y + z) 5 pi(y) + p,(z) < S,, hence y + z e U.Finally, each U e all is circled since pi(ax) = la Ip,(x), proving (d). If x e U =n,.,{z: p,(z) < S,} and y e V =n" I{z: pi(z) < minj[S; - pi(x)ii, thenp,(x + y) <- pi(x) + pi(y) < p,(x) + S, - pi(x) = S; ; hence x + V c U. ThusU is an overneighborhood of each of its points, so the sets in a& are open andall is a neighborhood base, not merely an overneighborhood base.

Now continuity of p, at 0 is equivalent to continuity everywhere[ lp,(x) - pi(y) 1 5 pi(x - y)] and the sets U e* are open in any topology forwhich the pi are continuous at 0. This proves that the given topology is theweakest, making the pi continuous. Finally, x - x if x - x - 0 (see theproof of 3.5.1), and x - x -+ 0 if for each i, p,(x - x) 0, by definition ofall.

The topology induced by a family of seminorms is locally convex; thatis, for each x e L, the family of convex overneighborhoods forms a baseof overneighborhoods at x. This follows because the sets {x: pi(x) < S,} areconvex. We shall prove later (see 3.5.7) that every locally convex topologyis generated by a family of seminorms.

If for each x 96 0 there is an i e I such that pi(x) # 0, the topology isHausdorff. For if x # y and p,(x - y) = S > 0, then {z: p,(z - x) < 6/2)and {z: p,(z - y) < 6/2) are disjoint neighborhoods of x and y.

3.5.3 Examples. (a) Let L be a vector space of complex-valued functionson a topological space Q. For each compact subset K of f), define PK(x) =sup{ I x(t) I : t e K). In the (Hausdorff) topology induced by the seminormspK, convergence means uniform convergence on all compact subsets of .0.If we restrict the K to finite subsets of f), we obtain the topology of pointwiseconvergence. In general, if the K are restricted to a class W of subsets of fl,we obtain the topology of uniform convergence on sets in W.

(b) Let L = C°° [a, b], the collection of all infinitely differentiable com-plex-valued functions on the closed bounded interval [a, b] a R. For each n,let p (x) = sup{ I x("'(t) I : a < t < b} where P) is the nth derivative of x. Inthe topology induced by the p,,, convergence means uniform convergence ofall derivatives.

We now examine convex sets in more detail.

3.5.4 Definitions. Let K be a subset of the vector space L. Then K is said tobe radial at x if K contains a line segment through x in each direction, inother words, if y e L, there is a b > 0 such that x + Ay e K for 0:5 A < 6.(If K is radial at x, x is sometimes called an internal point of K.) If K is con-vex and radial at 0, the Minkowski functional of K is defined as

p(x) = inf (r > O: x e K).111

Intuitively, p(x) is the factor by which x has to be shrunk in order to reachthe boundary of K. The Minkowski functional has the following properties.

3.5.5 Lemma. Let K be a convex subset of L, radial at 0, and let p be theMinkowski functional of K.

(a) The functional p is sublinear, that is, subadditive and positive-homogeneous.

(b) {x e L: p(x) < l} is the radial kernel of K, defined by rad ker K ={x e K: K is radial at x}; also, K c {x e L: p(x) < 1}.

(c) If K is circled, then p is a seminorm.(d) If L is a topological vector space and 0 belongs to the interior K°

of K, then p is continuous, K = {x e L: p(x) < I}, and K° = {x e L: p(x) < 1);hence {x e L: p(x) = 1) is the boundary of K.

PROOF. (a) If x/r e K and y/s e K, then

x+ y __ r x+S

ye K by convexity.r+s r+sr r+ss

Thus p(x + y) < r + s; take the inf over r, then over s, to obtain p(x + y) <-p(x) + p(y), proving subadditivity. Positive homogeneity follows from thedefinition of p.

(b) If p(x) < 1, then sx e K for some s > 1; hence x e K by convexity.[Write x = (1/s)(sx) + [1 - (1/s)]0.] Now if p(x) 0; hencep(x + Ax) < 1 by definition of p. Thus p(x) < (I + A)-1 < 1. The last state-ment follows from the definition of p.

(c) if x/r e K and a e C, a 96 0, then ax/ I a I r e K since K is circled ;consequently p(ax) < I a I r. Take the inf over r to obtain p(ax) < I a Ip(x).Replace x by x/a to obtainp(x) < j a f p(x/a) or, with b = 1 la, p(bx) f b I p(x).

Now p(0) = 0 since 0 e K, and the result follows.(d) Since 0 e K° there is a neighborhood U of 0 such that U e K. If

A > 0 and y e AU, that is, y = Ax for some x e U, then p(y) = Ap(x) < A.

[Note x e K implies p(x) < 1, by (b).] Thus p is continuous at the origin,and therefore continuous everywhere by subadditivity.

1}.Since p is continuous, {x: p(x) < 1} is closed, so by (b), K c {x: &) :5But if 0 < A < 1 and p(x) < 1, then p(Ax) < 1; hence AX E K. If A -* 1, thenAx x; therefore p(x) < 1 implies x e K.

Again, by continuity of p, {x: p(x) < 1} is open, and hence is a subset ofK°. But if p(x) >_ 1, then by considering {x with A > I we see that x is a limitof a sequence of points not in K, hence x 0 K°. I

Before characterizing locally convex spaces, we need the followingresult.

3.5.6 Lemma. If U is a neighborhood of 0 in the topological vector space L,there is a circled neighborhood V of 0 with V c U, and a closed circled over-neighborhood W of 0 with W c U. If L is locally convex, V and W can betaken as convex.

PROOF. Choose T e 'W and S > 0 such that aT (-- U for I a l < 6, and takeV = U{aT: I a 1< b}. Now if A c L, we claim that

A= n{A+N: Ne.'}, (1)

where .K is the family of all neighborhoods of 0. [This may be written asA = n{A - N: N e X); note that N e .' if - N e .N' since the map y - -yis a homeomorphism.] For if x e A and N e X, then x + N is a neighbor-hood of x; hence (x+N)nA 00. Ifye(x+N)nA, then xey-N. Ifx0A,then (x+N)r A=0forsomeNe.,V;hencexoA-N.

Now if U is a neighborhood of 0, let V, be a circled neighborhood of 0with V, + V, c U. By (1), V, e V, + V,, and since V, is circled, so is V,.Thus we may take W = V, V.

In the locally convex case, we may as well assume U convex [the interiorof a convex set is convex, by 3.5.5(d) and the fact that a translation of aconvex set is convex]. If V2 is a circled neighborhood included in U, the con-vex hull

r n

.I,>0, YA,=1, n=1,2,...111+=1

the smallest convex overset of V2, is also included in U. Since V2 is circledso is V2 , and therefore so is (V2)°. (If x + N c P2, N e X, and I a l < 1, thenax + aN c V2 .) Thus we may take V = (V2)°.

Finally, if V3 is a circled convex neighborhood of 0 with V3 + V3 e U,then W = V3 is closed, circled, and convex, and by (1), W c V3 + V3 c U.

3.5.7 Theorem. If L is a locally convex topological vector space, the topologyof L is generated, in the sense of 3.5.2, by a family of seminorms. Specifically,if 3?1 is the collection of all circled convex neighborhoods of 0, the Minkowskifunctionals pu of the sets U e W are the desired seminorms.

PROOF. By 3.5.5(c), the Pu are seminorms, and 3.5.6, iW is a base at 0 for thetopology of L. By 3.5.5(d), for each U e ?l we have U = {x: pu(x) < 1}, andit follows that the topology of L is the same as the topology induced by thePu I

The fact that the Minkowski functional is sublinear suggests that theHahn-Banach theorem may provide useful information. The next resultillustrates this idea.

3.5.8 Theorem. Let K be a convex subset of the real vector space L; assumeK is radial at 0 and has Minkowski functional p.

(a) If f is a linear functional on L, then f < 1 on K iff f < p on L.(b) If g is a linear functional on the subspace M, and g< 1 on K n M,

then g may be extended to a linear functional f on L such that f < 1 on K.(c) If in addition K is circled and L is a complex vector space, and g is a

linear functional on the subspace M with I g I < 1 on K n M, then g may beextended to a linear functional f on L such that I f I < 1 on K.

(d) A continuous linear functional g on a subspace M of a locally convextopological vector space L may be extended to a continuous linear functionalon L.

PROOF. (a) If f < p on L, then f < 1 on K by 3.5.5(b). Conversely, assumef < 1 on K. If x/r a K, then f(x/r) < 1, so f(x) <- r. Take the inf over r toobtain f (x) < p(x).

(b) By (a), g < p on M, so by the Hahn-Banach theorem, g extends toa linear functional f with f < p on L. By (a), f <- 1 on K.

(c) Let L' be L regarded as a real vector space; by (b), g, = Re g may beextended to a linear function f, on L' with f, :5 1 on K. If fix) = f,(x) - if,(ix),then (as in 3.4.3) f is a linear functional on L, and on M we havef(x) = g,(x)- ig, (ix) = g(x). Finally, if f (x) = re19, r >- 0, then I f (x) I = r = f (e-!9x) _

f,(e-19x) since risreal. IfxeK,thene-1Axe KsinceKiscircled,so I f(x)1 <- 1.(d) By continuity, g must be bounded on some neighborhood of 0 in M,

so that by 3.5.6 there is a convex circled neighborhood K of 0 in L such thatgI is bounded on K n M. By (c), g has an extension to a linear functional

f on L with I f I bounded on K. But if If 1 <- r on K, then, given S > 0, wehave If I < 6 on the neighborhood (b/r)K, proving continuity.


In 3.4.14 we promised a general version of the open mapping theorem,and we now consider this question. The following technical lemma is the keystep.

3.5.9 Lemma. Let A be a continuous linear operator from L to M, where Lis a complete, metrizable topological vector space and M is a Hausdorff topo-logical vector space. Assume that for every neighborhood W of 0 in L, A(W)is an overneighborhood of 0 in M. Then in fact A(W) is an overneighborhoodof 0 in M.

PROOF. Denote by °?l the collection of overneighborhoods of 0 in L, and by'V the corresponding collection in M. It suffices to show that if U and V areneighborhoods of 0 in L, then A(U) c A(U + V). For then given the neigh-borhood W we may find a closed circled neighborhood U with U + U + U eW; then [see 3.5.6, Eq. (1)] U + U c U + U + U c W. Now A(U), whichbelongs to Y," by hypothesis, is a subset of A(U + U) by the result to be estab-lished, and this in turn is a subset of A(W), as desired.

Thus let U and V be neighborhoods of 0 in L. Since L is metrizable it hasa countable base of neighborhoods at 0, say U1, U2,.... It may be assumedthat Un+1 + Un+, c Un for all n [3.5.1(c)], and that U1 = U, U2 + U2 cU r) V.

Let y e A(U); we try to find elements xn e Un such that if yn = Axn, then,for all n= 1,2,...,

Yl + ... +y,, - Y e A(Un+1) (1)

Since y e A(U1) (=A(U)) and y + A(U2) is an overneighborhood of y,there is an element y1 e A(U1) n (y + A(U2)). Thus y1 = Ax, for some x,e U1, and y1 - y e A(U2). If x1, ... , xn have been found such that y1 + +yn - y e A(Un+1), then since y - (y1 + + yn) + A(Un+2) is an over-neighborhood of y - (y, + ' + yn), there is an element yn+, e A(Un+1) n(y - (y1 + + yn) + A(Un+2)), so that we have Xn+1 a Un+,, Y.+]=Axn+1, and y1 + + yn+, - y e A(U,,+2), completing the induction.

Let z , = then + * * ' ' +Un+k C U. (Note Un+k-1 + Un+k C Un+k-1 + Un+k-1 C Un+k-2, and pro-ceed backward.) Since the Un form a base at 0, (zn) is a Cauchy sequence in L;hence z,, converges to some x e L by completeness. [By the appendix on generaltopology, Theorem A 10.9, the metric d of L may be assumed invariant, thatis, d(x, y) = d(x + z, y + z) for all z. We know that zn,-z,, -,, 0 as n,m-+co;

hence by invariance, d(zm , zn) = d(zm - z., 0) -* 0, so that {zn} is Cauchy.]Since

it follows that x e U + V. If we can show that y = Ax, then y e A(U + V) andwe have finished.

If y 0 Ax, the Hausdorff hypothesis for M yields a closed circled W' E 'fi'such that Ax + W' and y + W' are disjoint; let W be a closed set in 11 suchthat W + W c W'. By continuity of A there is a set W, E 0l such that A(W,)c W. Since the Un form a base at 0, Un c W, for large n; hence A(U,,) c--A(WI) c W; since W is closed, we have

A(U,,) c W for sufficiently large n. (2)

Ifn<m,m n m

A Exj -y= Eyj-y+ y- yjj=1 j-1 j=n+l

E A(Un+1) + A(Un+l + ... + Um) by (1).

By (2), A(Y; ,x)Let m -- oo ; by continuity of A we find Ax - y E W + W C W' = W' CW' + W', contradicting the disjointness of Ax + W' and y + W'.

Under the hypothesis of 3.5.9, if W e K, then by continuity of A there is aset U,, in the countable base at 0 in L such that A(U,) c V. By 3.5.9, A(U,)e Yl^for all n; hence M also has a countable base at 0. It is shown in the appendixon general topology (Theorem A10.8) that the existence of a countable baseat 0 in a topological vector space is equivalent to pseudo-metrizability. Thusunder the hypothesis of 3.5.9, M, being Hausdorff, is metrizable.

One more preliminary result is needed.

3.5.10 Lemma. Let A be a linear operator from the topological vector spaceL to the topological vector space M. Assume A(L) is of the second category inM; that is, A(L) cannot be expressed as a countable union of nowhere densesets. If U is an overneighborhood of 0 in L, then A(U) is an overneighborhoodof 0 in M.

PROOF. Let 'W and 'V be the overneighborhoods of 0 in L and M, respectively.If U e O?1, choose a circled V e /1 with V+ V c U. By 3.5.1(b) applied to V

we have L = Un i nV, hence A(L) = Un=, nA(V). Since A(L) is of the sec-ond category, nA(V) [= nA(V)] has a nonempty interior for some n, henceA(V) has a nonempty interior. If B is a nonempty open subset of A(V), then

OeB-BcA(V)-A(V)cA(V)-A(V)= A(V) + A(V) since V, hence A(V), is circled

c A(U).

But B - B = U{x - B: x e B}, an open set; hence A(U) e

3.5.11 Open Mapping Theorem. Let Land M be metrizable topological vectorspaces, with L complete. If A is a continuous linear operator from L to Mand A(L) is of the second category in M, then A(L) = M and A is an openmap.

PROOF. Take 0ll and Y'' as in 3.5.9 and 3.5.10. Let U be any set in ?l ; by 3.5.9and 3.5.10, A(U) a 'Y'. If y c M, then y e nA(U) for some n by 3.5.1(b);hence y = A(nx) for some x e U, proving that A(L) = M. If G is open inL and x0 a G, choose V e °l1 such that xo + V c G. Then A(V) e 1 andAx. + A(V) e A(G), proving that A(G) is open.

3.5.12 Corollary. If A is a continuous linear operator from L onto M, whereL and M are complete metrizable topological vector spaces, then A is anopen map.

PROOF. By completeness of M, A(L) (=M) is of category 2 in M, and theresult follows from 3.5.11. 1

We conclude this section with a discussion of separation theorems andtheir applications. If K, and K2 are disjoint convex subsets of R3, there is aplane P such that K, is on one side of P and K2 is on the other side. Now Pcan be described as {x e R3 : f (X) = c} for some linear functional f and realnumber c; hence we have f(x) < c for all x in one of the two convex sets, andf(x) >- c for all x in the other set. We are going to consider generalizations ofthis idea.

The following theorem is the fundamental result of this type.

3.5.13 Basic Separation Theorem. Let K, and K2 be disjoint, nonemptyconvex subsets of the real vector space L, and assume that K, has at least one

internal point. There is a linear functional f on L separating K, and K2, thatis, f 0 0 and f(x) < f(y) for all x e K, and y e K2 [for short, f(K,) < f(K2)].

PROOF. First assume that 0 is an internal point of K,. Pick an element z e K2;then -z is an internal point of -z + K, c K, - K2 ; hence 0 is an internalpoint of the convex set K = z + K, - K2. If z e K, then K, n K2 Qf ,

contradicting the hypothesis; therefore z 0 K; so if p is the Minkowskifunctional of K, we have p(z) > I by 3.5.5(b). Define a linear functional g onthe subspace M = (),z: A e R} by g(Az) _ A. If a > 0, then g(az) = a = a(l) <ap(z) = p(az), and if a < 0, then g(az) = a < 0 <- p(az). By the Hahn-Banachtheorem, g extends to a (nontrivial) linear functional f with f < p on L. By3.5.8(a), f S I on K. Since f(z) = g(z) = 1, f(K,) < f(K2).

If x is an internal point of K,, then 0 is an internal point of -x + K,, and-x + K, is disjoint from -x + K2. If f(-x + Kl) < f(-x + K2), thenf(KI) -<f(K2) I

If L is a complex vector space, the discussion after 3.4.2 shows that thereis a nontrivial linear functional f on L such that Ref(x) < Re f(y) for allx e K y e K2 ; this is what we shall mean by "f separates K, and K2 " inthe complex case.

3.5.14 Corollaries. Assume the hypothesis of 3.5.13.(a) If L is a topological vector space and K, has nonempty interior, the

linear functional f constructed in 3.5.13 is continuous.(b) If L is locally convex, and K, and K2 disjoint convex subsets of L,

with K, closed and K2 compact, there is a continuous linear functional f on Lstrongly separating K, and K2, that is,.f(x) < c, < c2 < f(y) for some realnumbers c, and c2 and all x e K1, y e K2. In particular if L is Hausdorll' andx, y e L, x y, there is a continuous linear functional f on L with f (x) 96f(y).

(c) If L is locally convex, Ma closed subspace of L, and x0 e M, there is acontinuous linear functional f on L such that f = 0 on M and f(x°) 96 0.

PROOF. (a) Since K,° 96 0, the interior and internal points of K, coincide, by3.5.5(b) and (d). Thus in the proof of 3.5.13, 0 is an interior point of K, sothat K° = {x: p(x) < l} by 3.5.5(d). Since f < p on Lit follows that f < 1 onsome neighborhood U of 0 (for example, U = K°); since U may be assumedcircled, f(-x) < 1 for all x e U; hence I f 1 < 1 on U. But then If < 6 onthe neighborhood 5U, proving continuity.

(b) Let K = K2 - K,; K is convex, and is also closed since the sum of acompact set A and a closed set B in a topological vector space is closed. (If

then y,,,-+z- x, hencez - x e B since B is closed.) Now 0 0 K by disjointness of K, and K2 ; so bylocal convexity, there is a convex circled neighborhood V of 0 such thatV n K = 0. By 3.5.13 and part (a) above there is an f e L* separating V andK, say f( V) < c < f(K). Now f cannot be identically 0 on V, for if so f- 0on L by 3.5.1(b). Thus there is an x e V with f(x) # 0, and since V is circled,we may assume f (x) > 0. Thus c must be greater than 0; hence f(K2)f(K1) + c, c > 0, as desired.

(c) Let K, = M, K2 = {x0}, and apply (b) to obtain fe L* with f(M) <C' < c2 < f(xo). Since f is bounded above on the subspace M, f must be iden-tically 0 on M. [If f(x) 96 0, consider f(Ax) for arbitrarily large A.] But then.f (x0) >- c2 > 0. 1

If we adopt the above definition of separation in complex vector spaces,all parts of 3.5.14 extend immediately to the complex case.

Separation theorems may be applied effectively in the study of weaktopologies. In 3.4.10 we defined the weak topology on a normed linear space;the definition is identical for an arbitrary topological vector space L. Specifi-cally, for each f e L*, p1(x) = f (x) I defines a seminorm on L. The locallyconvex topology induced by the seminorms pr, f e L*, is called the weaktopology on L. By 3.5.1 and 3.5.2, a base at x0 for the weak topology consistsof finite intersections of sets of the form {x: p f(x - x0) < s}, so in the case ofa normed linear space we obtain the topology defined in 3.4.10.

There is a dual topology defined on L*; if x e L, then px(f) = I f(x) Idefines a seminorm on V. The locally convex topology induced by the semi-norms px is called the weak* topology on V.

By 3.5.2, the weak topology is the weakest topology on L making eachf e L* continuous, so the weak topology is weaker than the original topologyof L. Convergence of x to x in the weak topology means f (x,,) -+f(x) foreach f e V. The weak* topology on L* is the weakest topology making allevaluation maps f-+f(x) continuous. Convergence of f to fin the weak*topology means f (x) -+f(x) for all x e L; thus weak* convergence is simplypointwise convergence, so if L is a normed linear space, the weak* topologyis weaker than the norm topology on V.

We have observed in 3.4.10 that the weak topology is an example of aproduct topology. Since weak* convergence is pointwise convergence, theweak* topology is the product topology on the set C` of all complex-valuedfunctions on L, relativized to L*.

In distinguishing between the weak topology and the original topologyon L, it will be convenient to call the original topology the strong topology.

By the above discussion, a weakly closed subset of L is closed in the originaltopology. Under certain conditions there is a converse statement:

3.5.15 Theorem. Let L be a locally convex topological vector space. If K is aconvex subset of L, then K is strongly closed in L iff it is weakly closed.

PROOF. Assume K strongly closed. If y 0 K, then by 3.5.14(b) there are realnumbers c, and c2 and an f e L* with Re f(x) < c, < c2 < Re f(y) for allx e K. But then W = {x e L: I f (x) - f (y) I < c2 - c, j is a weak neighborhoodof y, and if xeK, we have If(x - y)I >- IRe f(y) - Re f(x)I c2-c1;therefore W n K = 0, proving K` weakly open.

For the remainder of this section we consider normed linear spaces. Theclosed unit ball f f: If II <- 1} of L* is never compact (in the strong topology)unless L (hence L*) is finite-dimensional; see Problem 9, Section 3.3. However,it is always compact in the weak* topology, as we now prove:

3.5.16 Banach-Alaoglu Theorem. If L is a normed linear space and B ={ f e L*: if II < 11, then B is compact in the weak* topology.

PROOF. If f e B, then f(x) I < if II Ilxll _< Ilxjj for all x e L. If I(x) is the set(z e C: I zI <_ Ilxlj), then B c fl{1(x): x e L), the set of all functions definedon L such that f(x) e 1(x) for each x e L. By the Tychonoff theorem (see theappendix on general topology, Theorem A5.4), fl{I(x): x e L) is compact inthe product topology (the topology of pointwise convergence).

Let be a net in B; by the above discussion there is a subnet convergingpointwise to a complex-valued function f on L, and since the f, are linear sois f. Now I f(x)I and it I

f

is L the weak topologyon to the topology on the

is weakly compact. Conversely, weak compactness of the closedunit ball implies reflexivity. To prove this we need the following auxiliaryresults:

3.5.17 Lemma. Let L be a topological vector space.

(a) If f is a linear functional on L, f is continuous relative to the weaktopology iff f is continuous relative to the strong topology, that is, iff f e V.

(b) If g is a linear functional on L*, g is continuous relative to the weak*topology iff there is an x e L such that g(f) =f(x) for all f e L*.

PROOF. (a) If f e L* and x converges weakly to x, then f(x)definition of the weak topology, so f is weakly continuous. Conversely, iff is continuous relative to the weak topology and x converges strongly to x,then x,, converges weakly to x, hence f(x) Thus f e V.

(b) The "if" part follows from the definition of the weak* topology, soassume g weak* continuous. A basic neighborhood of 0 in the weak* topologyis of the form U=ni=1{feL*: If(x;)I <b;,i=1,...,n}forsome x,,...,x e L. 61, ... , S > 0, n = 1, 2, ... ; thus there is a U of this type such that

g(f) I < I forallfe U. By linearity, I f(x;)I <b;lMforalliimplies Ig(f)I <11M. In particular, if f (x;) = 0 for all i = 1, ... , n, then I g(f) ( < I /M, and sinceAl is arbitrary, g(f) = 0. But this implies, by a standard algebraic result (seeProblem 15) that g(f) = Yl=1 c;f(x;) for some complex numbers c1, ..., c,,.Setx=Y;_,clxi.

3.5.18 Lemma. Let L be a normed linear space, and let h be the isometricisomorphism of L into L** (see 3.4.6). Let B be the closed unit ball in L,B** the closed unit ball in L**. Then h(B) is dense in B** relative to the weak*topology. Furthermore, h(L) is dense in L**.

PROOF. Let M be the weak* closure of h(B), and let x** E B**, x** 0 M. By3.5.16, B** is weak* compact (and convex); so by 3.5.14(b), there is a linearfunctional g on L** such that g is continuous relative to the weak* topologyand Re g(Al) 5 c, < c2 < Re g(x**). Since M is a subspace (in particular iscircled), g(y**) I < cl for all y** e M. [If y** e M and g(y**) = rei°, thenI g(y**) = g(e-Fey**) = r = Re g(e-rOy**) < c,.] By 3.5.17(b), g is of theform g(y**) = y**(f) for some f e V.

If y E B, and y** = h(y) is the corresponding element of h(B) c M, thenIf(y)I = Iy**(f)I - c,; hence

IlfII s c, < c2 < Reg(x**) < Ix**(f)1 < Ilx**II 11f 11 :g 1181,

a contradiction. Consequently M = B**.Finally, if y** e L**, y** 96 0, then y**/Ily**II e B**, which we have just

seen is the weak* closure of h(B). Thus y**/Ily**II belongs to the weak*closure of h(L), which is a subspace. The result follows. I

We now characterize reflexive normed linear spaces.

3.5.19 Theorem. A normed linear space L is reflexive if its closed unit ballB is weakly compact.

PROOF. The "only if" part has been pointed out after 3.5.16, so assume Bweakly compact. The isometric isomorphism h of L into L** is a homeo-morphism of the weak topology of L and the weak* topology of h(L) [notethat x"- + x iff f (x,,) --+f(x) for all f E L*; in other words, if h(x")-"' . h(x)J.Thus h(B) is a weak* compact (hence closed) subset of B**, and is dense inB** by 3.5.18. It follows that h(B) = B**, so as in 3.5.18, h(L) = L**, provingreflexivity.

3.5.20 Corollaries. (a) If M is a closed subspace of the reflexive normedlinear space L, then M is reflexive.

(b) If L is a Banach space, L is reflexive if L* is reflexive.

PROOF. (a) If B is the closed unit ball in L, then B n M is the closed unitball in M. By hypothesis, B n M is strongly closed, hence weakly closed by3.5.15. But B is weakly compact by 3.5.19, so B n M is weakly compact.Again by 3.5.19, M is reflexive.

(b) Assume L reflexive. The weak* topology on L* then coincides withits weak topology, hence by 3.5.16, the closed unit ball of L* is weakly coin-pact, and consequently L* is reflexive by 3.5.19. (Completeness of L is notused in this part.)

If L* is reflexive, so is L** by the first part of the proof. Since L is complete,so is h(L); thus h(L) is a closed subspace of L**, and the result follows from(a). I

Problems

1. Let W be the family of overneighborhoods of 0 in a topological vectorspace L. Show that L is Hausdorff iff (}{U: U E °ll} = {0}.

2. If f is a linear functional on the topological vector space L, with f notidentically 0, show that the four conditions of Problem 1, Section 3.3,are still equivalent.

3. If L is a finite-dimensional Hausdorff topological vector space, show thatL is topologically isomorphic to C" for some n. Show also that a finite-dimensional subspace of an arbitrary Hausdorff topological vector spaceis always closed. [It can be shown that for a Hausdorff topologicalvector space, finite-dimensionality is equivalent to local compactness;see Kelley and Namioka (1963, Theorem 7.8).]

4. (a) Let K be a convex set, radial at 0, in the vector space L. If g is anynonnegative sublinear functional on L such that {x c- L: g(x) < 1} cK c {x e L: g(x) < 1}, show that g is the Minkowski functional ofK. [Thus by 3.5.5(d), if 0 e K°, then the Minkowski functionals ofK°, K, and K coincide.)

(b) Use part (a) to find the Minkowski functionals of each of thefollowing subsets of R2:

<_ 1}.(i) {(x, y): IxI + I Y1(ii) {(X, Y): -1<x<1, -1<y<1}(iii) {(x, y): x2 + y2 < 1}.

(c) If K, and K2 are convex and radial at 0, with Minkowski functionalsand p2, show that the Minkowski functional of K, n K2 is

max(p1, P2)5. Let L be the set of all complex-valued functions on [0, 1], with the topol-

ogy of pointwise convergence. Then L is a locally convex topologicalvector space, with topology induced by the seminorms p,,(f) = I f(x)I,f e L, where x ranges over [0, 1]; see 3.5.3(a). Show that L is not metriz-able. (The same result holds even if we restrict L to the continuous com-plex-valued functions on [0, 1].)

6. Let L be a locally convex topological vector space. Show (without usingthe theory of uniform spaces given in the appendix on general topology,Section A10) that the following conditions are equivalent:

(a) L is pseudometrizable, that is, there is a pseudometric inducingthe topology of L.

(b) L has a countable base at 0 (hence at every point of L).(c) The topology of L is generated by a countable family of semi-

norms.7. Let L be the collection of analytic functions on the unit disk D =

{z: IzI < 1}, with the topology of uniform convergence on compactsubsets [see 3.5.3(a)]. Show that L is metrizable but not normable.

8. Let L = LP[0, 1) where 0 0.]

(a) Let f be a continuous linear functional on L. If fix) = 1, show thatx can be written as y + z where d(y, 0) = d(z, 0) = # d(x, 0).

(b) With f as in part (a), show that there are functions x,, X2, ... e Lsuch that 0) -- 0 as n - oo, but I f (x.) I > I for all n. Concludethat the only continuous linear functional on L is the zero functional.[In part (a), either I f(Y) I ? I or I f(z) I > 1; if, say, I f(Y) I #,take x, = 2y and proceed inductively.]

9. Let L be a topological vector space. Show that there is a nontrivialcontinuous linear functional on L iff there is a proper convex subset ofL with nonempty interior.

10. Let B1 and B2 be (not necessarily convex) nonempty subsets of thetopological vector space L, with B1° 96 0. If f is a linear functional on

L that separates B, and B2, show that f is continuous. [This gives anotherproof of 3.5.14(a).]

11. Let K be a convex subset of a locally convex topological vector space L.Show that the closure of K in the weak topology coincides with theclosure in the strong topology.

12. Let K be a convex set, radial at 0, in the real vector space L. Let x be apoint of L and M a subspace of L such that x + M does not contain anyinternal points of K. Show that there is a hyperplane H (that is, H isthe null space of a nontrivial linear functional on L) such that M c Hand x + H contains no internal points of K. [Define g on L(M u {x})by g(y + ax) = a, and extend using the Hahn-Banach theorem.]

13. (a) If L is a topological vector space and 91, 2 are topologies on Lsuch that ,% , c 9 2 and L is complete and metrizable under both, and 9-,2, show that J,

(b) If L is a Banach space under two norms 11 11, and II 112, and, forsome k > 0, llxll S klfxll2 for all x e L, show that for some r > 0,IIx112 < rllxll, for all x e L; in other words (Problem 6, Section 3.3)the norms induce the same topology.

14. (a) (Closed graph theorem) Let A be a linear map from L to M,where L and M are complete metrizable topological vector spaces.Assume A is closed; in other words, the graph of A is a closedsubset of L x M, with the product topology. Show that A is con-tinuous.

(b) If A is a continuous linear operator from L to M where L and Mare topological vector spaces and M is Hausdorff, show that Ais closed.

15. Let g, f1, ... , f" be linear functionals on the vector space L. If N denotesnull space and n;= I N(f;) e N(g), show that g is a linear combinationof the f, .

16. Let L be a normed linear space. If the weak and strong topologies coin-cide on L, show that L is finite-dimensional, as follows:(a) The unit ball B = {x: IIxli < 1) is strongly open, hence weakly

open by hypothesis. Thus we can find . e L* and b1, ...,S" > 0 such that {x: l f;(# < S,, i = 1, ..., n} c B. Show that(with N(f) denoting the null space off) (),!=I N(f;) = (0).

(b) Define T(x) = (f, (x), ... , f"(x)) ; by (a), T is a one-to-one map of Lonto a subspace M of C". If {y...... yk) is a basis for M and x; =T - ' (y;), j = 1, ... , k, show that {x,, ... , xk} is a basis for L,hence L is finite-dimensional.

17. Let L be a separable normed linear space. If B is the closed unit ball inL*, show that B is metrizable in the weak* topology. Thus since B iscompact by 3.5.16, every sequence in B (hence every norm-bounded

3.6 REFERENCES 167

sequence in L*) has a weak* convergent subsequence. Similarly, if L isreflexive, the closed unit ball of L is metrizable in the weak topology.(Let {x,, X21 ...} be a countable dense subset of

f,

and set

,,JJ {{w 1 If(xn) - g(Xn)1

f, g EL*.d(f, g) - =1 2" 1 + ((((x.) - g(xn)I

Show that weak* convergence implies d-convergence.)18. Show that a reflexive Banach space is weakly complete; in other words

every weak Cauchy sequence [{ f (x )} is Cauchy for each f e L*] convergesweakly.

19. A subset B of a topological vector space L is said to be absorbed by asubset A if B c aA for sufficiently large I a I ; B is said to be boundedif it is absorbed by every neighborhood of 0.(a) Show that B is bounded if for each sequence of points x e B and

each sequence of complex numbers .1 -+ 0, we have A. x -+ 0.(b) A bornivore in a topological vector space L is a convex circled set

that absorbs every bounded set; L is bornological if L is locallyconvex and every bornivore is an overneighborhood of 0. Show thatevery metrizable locally convex space is bornological.

(c) Let T be a linear operator from L to M, where L is bornologicaland M is locally convex. If T maps bounded sets into boundedsets, show that T is continuous. (The converse is true for arbitraryL and M.)

(d) If L is a Hausdorff topological vector space, show that there is nobounded subspace of L except {0}.

3.6 References

There is a vast literature on functional analysis, and we only give a fewrepresentative titles. Readable introductory treatments are given in Liusternikand Sobolev (1961), Taylor (1958), Bachman and Narici (1966), and Halmos(1951); the last deals exclusively with Hilbert spaces. Among the more ad-vanced treatments, Dunford and Schwartz (1958, 1963, 1970) emphasizenormed spaces, Kelley and Namioka (1963) and Schaefer (1966) emphasizetopological vector spaces. Yosida (1968) gives a broad survey of applicationsto differential equations, semigroup theory, and other areas of analysis.

4

The Interplay between Measure Theoryand Topology

4.1 Introduction

A connection between measure theory and topology is established whena a-field .F is defined in terms of topological properties. In the most commonsituation, we have a topological space fl, and F is taken as the smallest a-fieldcontaining all open sets of Q. If this is done, there is a natural connectionbetween measure-theoretic and topological questions. For example, if µ is ameasure on , and A e .F, we may ask whether A can be approximated by acompact subset. In other words, we wish to know if u(A) = sup{µ(K):K a compact subset of A}. As another example, we may ask whether afunction in L°(S2, .07, p) can be approximated by a continuous function. Oneformulation of this is to ask whether the continuous functions are dense inP.

In this chapter we investigate questions of this type. The results in thefirst two sections are not topological, but they serve as basic tools in the laterdevelopment. We first consider a result that is a companion to the monotoneclass theorem:

4.1.1 Definition. Let -9 be a class of subsets of a set 0. Then -9 is said to be aDynkin system (D-system for short) if the following conditions hold.

168

4.1 INTRODUCTION 169

(a) QE -'.(b) If A, B e 2, B c A, then A - B e 9. Thus -9 is closed under proper

differences.(c) If A1,A2,...e9andA,, TA, then AeO.

Note that by (a) and (b), -9 is closed under complementation; hence by (c), -9is a monotone class. If -9 is closed under finite union (or closed under finiteintersection), then g is a field, and hence a a-field (see 1.2.1).

4.1.2 Dynkin System Theorem. Let .9' be a class of subsets of 0, and assume9' closed under finite intersection. If -9 is a Dynkin system and -9 .9', then-9 includes the minimal a-field F = a(9').

PROOF. Let -9o be the smallest D-system including 9. We show that -9Q = F,in other words the smallest D-system and the smallest a-field over a classclosed under finite intersection coincide. Since -9o c -9, the result will follow.

Now -9o (-- .;t since F is a D-system. To show that .F c .9o , let ' ={A e -9o : A r B e -90 for alI B e .9'}. Then 91' c ' since .9' is closed underfinite intersection, and since -9o is a D-system, so is W. Thus .90 a' , hence-9o=W.

Now let " _ {C ego : C n D c-90 for all D e 3o). The result .90 = Wimplies that 51 T, and since 'e' is a D-system we have .9o c W', hence0o = V. It follows that -90 is closed under finite intersection; by 4.1.1, -90 isa a-field, so that F c go. 1

If Y is a field and .,# is the smallest monotone class including So, thenAf = a(.9') (see 1.3.9). In the Dynkin system theorem, we have a weakerhypothesis on .9' (it is closed under finite intersection but need not be a field)but a stronger hypothesis on the class of sets including . 9' (9 is a Dynkinsystem, not merely a monotone class).

4.1.3 Corollary. Let 9 be a class of subsets of 0, and let µl and P2 be finitemeasures on a(.9'). Assume 0 E . 9' and . 9' is closed under finite intersection.If µ, = µ2 on So, then µl = µ2 on a(.9').

PRoor. Let g be the collection of sets A e a(.9') such that µ,(A) = 102(A). Then2 is a D-system and 59' c .9, hence g = a(.9') by 4.1.2. 1

4.1.4 Corollary. Let .9' be a class of subsets of 0; assume that S2 e .9' and . 9'is closed under finite intersection. Let H be a vector space of real-valuedfunctions on fl, such that IA E H for each A E .9'.

170 4 THE INTERPLAY BETWEEN MEASURE THEORY AND TOPOLOGY

Suppose that whenever f f2 , ... are nonnegative functions in H, I f 1 <M < co for all n, and f, T f, the limit function f belongs to H. Then IA e Hfor all A c a(.9').

PROOF. Let _q = {A c fl: IA a H}; then ,9'c -9 and 2 is a D-system. For ifA,Bc ,BcA,then IA_B=IA-JBeH,sothat A-Be -9. If A. e-9 andA T A, then IA T IA, hence I. e H by hypothesis; thus A e -9. The result nowfollows from 4.1.2. 1

4.2 The Daniell Integral

One of the basic properties of the integral is linearity; if f and g are p-integrable functions and a and b are real or complex numbers, then

fn (af+bg)dp=a$n fdp+b fng dp.

Thus the integral may be regarded as a linear functional on the vector spaceof integrable functions. This idea may be used as the basis for a differentapproach to integration theory. Instead of beginning with a measure andconstructing the corresponding integral, we start with a given linear functionalE on a vector space. Under appropriate hypotheses, we extend E to a largerspace, and finally we show that there is a measure u such that E is in fact theintegral with respect to p.

We first fix the notation to be used.

4.2.1 Notation. In this section, L will denote a vector space of real-valuedfunctions on a set f ; L is assumed closed under the lattice operations, in otherwords if f, g e L and f vg = max(f, g), f A g= min(f, g), then f v g, f A g e L.There are several familiar examples of such spaces; L can be the class ofcontinuous real-valued functions on a given topological space, or equally well,the bounded continuous real-valued functions. Another possibility is to takeL as the collection of all real-valued functions on a given set.

The letter E will denote a positive linear functional on L, that is, a linearmap E from L to R such that f'>_ 0 implies E(f) >- 0. This implies that E ismonotone, that is, f < g implies E(f) <- E(g).

If H is any class of functions from 0 to R (or R), H+ will denotef f e H : f > 0). The collection of functions f: S2 -+ R of the form lim f wherethe f form an increasing sequence of functions in L+, will be denoted by L';if the f are allowed to form an increasing net in L+, the resulting class isdenoted by L. (See the appendix on general topology, Section Al, for adiscussion of nets.)

4.2 THE DANIELL INTEGRAL 171

If H is as above, a(H) is defined as the smallest a-field of subsets of 12making every function in H Bore] measurable, that is, the minimal a-fieldcontaining all sets f -'(B), where f ranges over H and B over the Borel sets.If 9 is a class of subsets of K2, a(f) as usual denotes the minimal or-field overT.

The following will be assumed throughout:

Hypothesis A: If the functions fn form a sequence in L decreasing to 0,then E(fn) decreases to 0. Equivalently, if the fn belong to L and increase tof e L, then E(fn) increases to E(f).

We are also going to carry through a parallel development under thefollowing assumption:

Hypothesis B: If the functions fn form a net in L decreasing to 0, thenE(fn) decreases to 0 (with the equivalent statement just as in hypothesis A).

Hypothesis A is always assumed in the statements of theorems. Corre-sponding results under hypothesis B will be added in brackets.

4.2.2 Lemma. Let {fm} and { fn'} be sequences in L increasing to f and f',respectively, with f < f' (f and f need not belong to L). Then

Iim E(fm) <lim E(f;,).M n

Hence E may be extended to L' by defining E(limn fn) = limn E(fn). [Underhypothesis B (and with "sequence" replaced by "net" in the above state-ment), E may be extended to L" in the same fashion.]

PROOF. As n --> 00,Jm A./n' I Jm Af' =Jme hence limn E(fn') limn E(fm Afn') =E(fm). Let m -), oo to finish the proof. [Under hypothesis B, the proof is thesame, with sequences replaced by nets.] I

We now study the extension of E.

4.2.3 Lemma. The extension of E to L' has the following properties:

(a) 0< E(f)<o forallfeL'.(b) Iff,geL',f<g,then E(f)<E(g).(c) life L' and 0:!9c< oo, then cf e L' and E(cf) = cE(f).(d) Iff,geL',thenf+g,fvg,fngeL'and

E(f i g)=E(fvg)+E(fng)=E(f)+E(g).

(e) If If,,) is a sequence in L' increasing to f, then f e L' and increasesto E(f).

[Under hypothesis B, the extension of E to L" has exactly the same properties;" sequence " is replaced by " net " in (e).]

PROOF. Properties (a)-(c) follow from 4.2.2. To prove (d), let g a L+, withfn T f g,, ? g. Then

E(fn v gn) + E(fn A g,,) = E[(f,, v gn) + (fn A g,,)]by linearity

= E(f,, + gn)(the sum of two numbers isthe larger plus the smaller)

= E(fn) + E(gn)by linearity.

Let n --> co to obtain (d). To prove (e), let J,,m e L, f,,, If. as in -,, oo. Ifg,,, = f,m v .. vfmm, the gm form an increasing sequence in L+, and

f.. :5: gm:! fm for n<m; (1)

hence

E(fnm) < E(gm) < E(fm), n < in. (2)

In (l) let in - oo to obtain fn < limm gm < limmfm =f; then let n -+ oo toobtain gm If, hence E(gm) I E(f). Now let in oo in (2) to find that E(fn) SE(f) < limm E(fm); then let n --> oo to establish that limn E(f)

[Under hypothesis B, (a)-(d) are proved as above, but (e) is more com-plicated. Let {fn} be a net in L" with f I f; for each n there is a net If..) in Vwith f,m T fn , so that f = sup,,, mfnm Let D consist of all finite sets of pairs(n, m). Direct D by inclusion, and define gm = (n, m) a M}, ME D.The g,K form a monotone net in e and gm If-, therefore f e L" and, by 4.2.2,E(gm) I E(f ). Now for each M, E(gM) < sup,,,,. E(f,, m), and sincef,,m < f, forall in, it follows that E(gm) < sup,, E(.) = lim,, E(fn). But M is arbitrary,hence E(f) < lim,, The reverse inclusion holds because f', t f, and theproof is complete.]

We now begin the construction of a measure derived from the linear func-tional E. Fron now on we assume that all constant functions belong to L and,as a normalization, E(l) = 1 (hence E(c) = c for all c).

4.2.4 Lemma. Let 9 = {G c 0: IG cL'} and define µ(G) = E(IG), G E mar.Then 9 satisfies the four conditions of 1.3.2, namely:

(a) 0, n e QF, u(o) = 0, µ(0) = 1, 0 < µ(A) < 1 for all A e 99.(b) If G G2 e W, then G, u G2, G, n G2 c 9r, and

µ(G, u G2) +µ(G, n G2) = µ(GI) + µ(G2).

(c) If G,, G2 e 91 and G, c G2, then µ(G,) < µ(G2).(d) If G a g, n = 1, 2,... and G I G, then G E T and p(Ga) j p(G).

Thus by 1.3.3 and 1.3.5, u*(A) = inf{µ(G): G e 9, G e A} is a probabilitymeasure on the a-field W = {H c 0: µ*(H) + p*(H`) = 1), and p* = p on 9.

[Under hypothesis B, we take 9 = {G a !Q: Ic e L"} and replace sequencesby nets in 4.2.4(d). The classIi then has exactly the same properties.]

PROOF. Part (a) follows since L contains the constant functions and E(c) = c.Part (b) follows from 4.2.3(d) with f = IG, , g = IG2 . Part (c) follows from4.2.3(b), and part (d) from 4.2.3(e). [The proof under hypothesis B is identi-cal.] I

We now investigate Borel measurability relative to the a-field a(T).

4.2.5 Lemma. 1f f E L' and a e R, then {w: f (c)) > a} e g. Hence

f (A, aM) --> (R, 9(R)).

[The same result holds for f e L" under hypothesis B.]

PROOF. Let f,, L+, f, If Then (fn - a)+ = (f - a) v 0 e L+ c L', and(f, - a)+ j (f - a)+ By 4.2.3(e) (or the definition of L'), (f - a)+ E L'; hencek(f - a)+ e L' for each positive integer k, by 4.2.3(c). But as k -+ oo,

1 A k(f-a)+ I I(f>);so by 4.2.3(e), I{ f>a) e L', and therefore {f > a) e 9. [Under hypothesis B, theproof is the same, with { fa} a net instead of a sequence.] I

4.2.6 Lemma. The a-fields a(L), a(L'), and a(ir) are identical. [Under hypo-thesis B we only have a(V) = a(f) and a(L) c a(L").]

PROOF. By 4.2.5, a(gr) makes every function in L' measurable; hence a(E) ca(gr). If G E 9r, then IG e L'; hence G = (IG = 1) a a(L'); therefore a(sr) e a(L').[Under hypothesis B, a(L") = a(W) by the same argument.]

174 4 THE INTERIsLAY BETWEEN MEASURE THEORY AND TOPOLOGY

If f e L, then f = f + -f-, where f +, f - e L' c :L'. Since f + and f- area(L')-measurable, so is f. In other words, u(L') makes every function in LBore] measurable, so that a(L) c a(L'). [Under hypothesis B, a(L) c a(L") bythe same argument.] Now if f e L', then f is the limit of a sequencef a L, andsince the f are a(L)-measurable, so is f. [This fails under hypothesis B becausethe limit of a net of measurable functions need not be measurable; seeProblem 1.] Thus a(L') e a(L). I

Now by definition of the set function p* (see 4.2.4) we have, for allA c Sl,

p*(A) = inf{E(IG): G e QV, G A)

=inf{E(f):f=IGE L', f>IA}> inf{E(f): fe L', f> IA}.

In fact equality holds.

4.2.7 Lemma. For any A c 0, p*(A) = inf{E(f): f e L', f > IA). [The result isthe same under hypothesis B, with L' replaced by L".)

PROOF. Let f e L', f >- IA. If 0 < a < 1, then A e (f > a), which belongs to 9by 4.2.5. Thus p*(A) < p{ f > a) = E(I f>a)). But since f >- 0 we have f ZaI(f>Q); hence E(I(f>Q)) <E(f)la. Let a-+ 1 to conclude that p*(A) S E(f).[The proof is the same under hypothesis B.]

We may now prove that p* is a measure on a(l):

4.2.8 Lemma. If .e = {H c fl: p*(H) + p*(H`) = 1), then 9r c f e, hencea(19) c 0. [The result is the same under hypothesis B.]

PROOF. Let G e T; since 1c e L', there are functions f, e e with f, T 1G. Then

y*(G) = p(G) = E(IG) = lim E(f) by 4.2.3(e).n

By 4.2.7, p*(G`) = inf{E(f ): f e L', f But if fn < IC', then 1 - fn >- Ian ;since I - fn > 0, we have I - fn e L' c L'; hence

p*(G`) < inf E(1 - I - lim E(f) = 1 - E(IG).n n

Thus p*(G) + p*(G`) < 1; since p*(G) + p*(G`) is always at least I by 1.3.3(b),we have G E .-Y. [Under hypothesis B, the proof is the same, with { fn} a netinstead of a sequence.] I

We now prove the main theorem; for clarity we state the results underhypotheses A and B separately (in 4.2.9 and 4.2.10).

4.2.9 Daniel! Representation Theorem. Let L be a vector space of real-valuedfunctions on the set 92; assume that L contains the constant functions and isclosed under the lattice operations. Let E be a Daniell integral on L, that is,a positive linear functional on L such that E(fa) J. 0 for each sequence offunctions f, e L with f 10; assume that E(1) = 1.

Then there is a unique probability measure P on a(L) (=a(L') = a(T) by4.2.6) such that each f e L is P-integrable and E(f) = In f dP.

PROOF. Let P he the restriction of p* to a(L). By 4.2.6 and 4.2.8, P is a proba-bility measure. If G e 19, then

E(I0) = p(G) = p*(G) = P(G) = f IG 0.n

Now if f e L', we define

k-1hn = l{(k-I)/2^<f5k/2^) + nl(f>,)n

k=1 2

Since I(a< f:5 b) = I(f>a) - I{ f>b} for a < b, and {f > a}, {f > b} e W by 4.2.5, itfollows that In h dP. But the h form a sequence of nonnegativesimple functions increasing to f [see 1.5.5(a)], so by the monotone convergencetheorem, E(f) = In f dP.

Now let feL; f = f + -f-, where f +, f - c -L' c L'. Then E(f) _E ( f ') - E(f -) _ ,$n f + d P - fn f d P = f n f dP. (Since f +, f - e L, the in-tegrals are finite.)

This establishes the existence of the desired probability measure P.If P' is another such measure, then So fdP = 5 n f dP' for all f a L,and hencefor all f e L', by the monotone convergence theorem. Set f = Ia, G e W1, toshow that P = P on W. Now Ir is closed under finite intersection by 4.2.4(b);hence by 4.1.3, P = P' on a(lr), proving uniqueness. I

4.2.10 Theorem. Let L be a vector space of real-valued functions on the seti2; assume that L contains the constant functions and is closed under thelattice operations. Let E be a positive linear functional on L such thatE(,,) 10 for each net of functions f e L with f 10; assume that E(1) = I.

Then there is a unique probability measure P on o(L") (=a(W) by 4.2.6)such that:

(a) Each f e L is P-integrable and E(f) = In f dP.(b) If is a net of sets in g and Ga TG, then G e and P(G) T P(G).

PROOF. Let P be the restriction of M* to a(L'). The proof that P satisfies (a)is done exactly as in 4.2.9, with L' replaced by L'; P satisfies (b) by 4.2.4(d).

The uniqueness part cannot be done exactly as in 4.2.9 because themonotone convergence theorem fails in general for nets (see Problem 2). LetP' be a probability measure satisfying (a) and (b). If f e L', there is a net offunctions fa e L+ with fa If. We define

1 n2"hna = 2n,I1 I{Ia> n = 1, 2, ... .

If (k - 1)/2" < fa(w) < k/2n, k = 1, 2, ..., n2", then hna(w) =(k - 1)/2", and iffa(co) > n, then hna(w) = n. Thus the hna, n = 1, 2, ..., are in fact the standardsequence of nonnegative simple functions increasing to fa [see 1.5.5(a)].Similarly, if

1 nr22"

h" 2n L I{j> j2-")j=1

the h" are nonnegative simple functions increasing to f. Now

1 n2"f h" dP' y P'{f > j2-"}a 2nj=I

1 o2"E lim P'{ fa > j2-"} by (b)' 2n j=1 a

1 n2"=lim- Y P'{fa>j2r"}a 2" j=1

since the sum on j is finite

= lim f hna dP';a n

but

f f dP' = lim f ha dP'n n n

by the monotone convergence theorem

= lim lim f hna dP'n a n

= lim lim f hna dP'a n a

since hna is monotone in each variable,so that "lim" may be replaced by "sup"

=lim f fadP'a a

by the monotone convergence theorem

4.2 THE DANIELL INTEGRAL

Equation continues

= limJ

fa dP by (a)a a

= fnf dP

by the above argument with P' replaced by P.

177

Set f = Ic, G e 9r, to show that P = P' on 9; hence, as in 4.2.9, on a(w).

The following approximation theorem will be helpful in the next section.

4.2.11 Theorem. Assume the hypothesis of 4.2.9, and in addition assume thatL is closed under limits of uniformly convergent sequences. Let

9'={G-=0:G={f>0) for some feL}.Then :

(a) 9' = 9r.

(b) If A e 6(L), then P(A) = inf{P(G): G e,r', G :DA}.(c) If G e W, then P(G) = sup{E(f): fe L, f < Ic}.

PROOF. (a) We have 5r' c Ir by 4.2.5. Conversely, suppose G e 1, and letf E L+ with f, T Ic (e L'). Set f = 2:1 1 2-nf . Since 0 <f,,:!5; 1, the series isuniformly convergent, hence f e L. But

{f>0)= U 1}=G.n=1

Consequently, G e 9'.

(b) This is immediate from (a) and the fact that P = u* on Q(L).(c) If f e L+, f < Ic, then E(f) < E(IG) = P(G). Conversely, let G E 1,

with f e L, f T 1G. Then P(G) = E(!) = limn sup henceP(G) < sup{E(f): fe e, f < IG}.

Problems

1. Give an example to show that the limit of a net of measurable functionsneed not be measurable.

2. Give an example of a net of nonnegative Borel measurable functions faincreasing to a Bore] measurable function f, with lima f fa du -0 f f du.

3. Let L be the class of real-valued continuous functions on [0, 1], and letE(f) be the Riemann integral off. Show that E is a Daniel] integral on L,and show that o(L) _ -4[0, 1] and P is Lebesgue measure.


4.3 Measures on Topological Spaces

We are now in a position to obtain precise results on the interplay betweenmeasure theory and topology.

4.3.1 Definitions and Comments. Let fI be a normal topological space (0 isHausdorff, and if A and B are disjoint closed subsets of Q, there are disjointopen sets U and V with A c U and B c V). The basic property of normalspaces that we need is Urysohn's lemma: If A and B are disjoint closed subsetsof 0, there is a continuous function f: r2 -> [0, 11 such that f = 0 on A andf = 1 on B. Other standard results are that every compact Hausdorrf spaceis normal, and every metric space is normal.

The class of Borel sets of 0, denoted by R(fl) or simply by.4, is defined asthe smallest a-field of subsets of r2 containing the open (or equally well theclosed) sets. The class of Baire sets of 0, denoted by V(fl) or simply by sad,is defined as the smallest a-field of subsets of S1 making all continuous real-valued functions (Borel) measurable, that is, sat is the minimal Q-field contain-ing all sets f -'(B) where B ranges over R(R) and f ranges over the classC(f2) of continuous maps from f to R. Note that sd is the smallest a-fieldmaking all bounded continuous functions measurable. For let .F be a a-fieldthat makes all bounded continuous functions measurable. If f e C(r2), thenf + An is a bounded continuous function and f + An If + as n -+ oo. Thusf + (and similarly f -) is .F-measurable, hence f =f' -f - is .F-measurable.Thus d c F, as desired.

The class of bounded continuous real-valued functions on 0 will be denotedby Cb(Q).

If V is an open subset of R and f e C(Q), then f -'(V) is open in 11, hencef -(V) e 2(c2). But the sets f -'(V) generate si(Q), since any a-field con-taining the sets f -'(V) for all open sets V must contain the sets f -'(B) for allBorel sets B. (Problem 6 of Section 1.2 may be used to give a formal proof,with ' taken as the class of open sets.) It follows that W(r) a R(r2).

An F, set in rZ is a countable union of closed sets, and a G. set is a count-able intersection of open sets.

4.3.2 Theorem. Let r2 be a normal topological space. Then sl(r2) is the mini-mal a-field containing the open F, sets (or equally well, the minimal a-fieldcontaining the closed G6 sets).

PROOF. Let f e C(Q); then If > a) = Un , If >- a + (1/n)) is an open F,, set.As above, the sets {f> a}, a e R, f e C(O), generate V; hence .4 is included

4.3 MEASURES ON TOPOLOGICAL SPACES 179

in the minimal o-field ,Y over the F, sets. Conversely, let H = Un , F,,, F.closed, be an open F. set. By Urysohn's lemma, there are functions f" e C(f))with onF,,.Iff=Y', 2-"f", then feC(Q),0<f<1,and{f>0}=Un ,{f">0}= H.Thus He.4,sothat . c.4.

4.3.3 Corollary. If fl is a normal topological space, the open F, sets are pre-cisely the sets (f > 0) where f e Cb(fl), f >_ 0.

PROOF. By the argument of 4.3.2. 1

4.3.4 Corollary. If Cl is a metric space, then .sd(Cl) = R(i2).

PROOF. If F is a closed subset of Cl, then F is a G.

1F = n {c o: dist(w, F) < - ;00

"=i n

hence every open subset of Cl is an F, . The result now follows from 4.3.2. 1

Corollary 4.3.4 has a direct proof that avoids use of 4.3.2; see Problem 1.We may obtain some additional information about the open F, sets of a

normal space.

4.3.5 Lemma. Let A be an open F. set in the normal space S2, so that by 4.3.3,A = (f > 0) where f C- Cb(fZ), f > 0. Then IA is the limit of an increasingsequence of continuous functions.

PROOF. We have (f > 01 = Un , {f> 1/n), and by Urysohn's lemma there arefunctions f" e C(C) with 0< .fn :g 1, f" = 0 on (f = 0),f. = 1 on {f > I /n}. If9" = max(f , ... ,./"), then 9" T Itt> o) .

The Daniell theory now gives us a basic approximation theorem.

4.3.6 Theorem. Let P be any probability measure on .W(f2), where Cl is anormal topological space. If A e d, then

(a) P(A) = inf{P(V): V n A, Van open F. set},(b) P(A) = sup{P(C): C c A, C a closed Ga set}.

PROOF. Let L = Cb(f) and define E(f) = In f dP, f e L. [Note that a(L) = ,V,so each f e L is d-measurable; furthermore, since f is bounded, In f dP isfinite. Thus E is well-defined.] Now E is a positive linear functional on L, andby the dominated convergence theorem, E is a Daniell integral. By 4.2.11(b),

P(A) = inf{P(G): G e 9r', G z A},

where

9'={Gc0:G={f>0} forsomefeL}.By 4.3.3, T' is the class of open F, sets, proving (a). Part (b) follows uponapplying (a) to the complement of A. I

4.3.7 Corollary. If f2 is a metric space, and P is a probability measure onR(f ), then for each A e R(f2),

(a) P(A) = inf{P(V): V A, V open},(b) P(A) = sup{P(C): C c A, C closed}.

PROOF. In a metric space, every closed set is a Gs and every open set is an F,(see 4.3.4); the result follows from 4.3.6. 1

Under additional hypotheses on f2, we obtain approximations by compactsubsets.

4.3.8 Theorem. Let ) be a complete separable metric space (sometimescalled a "Polish space"). If P is a probability measure on R(fl), then foreach A e R(f2),

P(A) = sup{P(K): K compact subset of A}.

PROOF. By 4.3.7, the approximation property holds with "compact" replacedby "closed." We are going to show that if e > 0, there is a compact set K. suchthat P(Ke) > 1 - s. This implies the theorem, for if C is closed, then C n K, iscompact, and P(C) - P(C n K,) = P(C - K,) < P(c2 - K,) < e.

Since fl is separable, there is a countable dense set {w1, w2, ...}. LetB(cv,,, r} (respectively, B(w,,, r)) be the open (respectively, closed) ball withcenter at w and radius r. Then for every r > 0, Q = Un 1 B(w,,, r) so thatUk=, B(Wk, 1/n) T f2 as m.-oo (n fixed). Thus given e > 0 and a positiveinteger n, there is a positive integer m(n) such that P(Uk 1 B(wk, 1/n)) z1 - e2` for all m Z m(n).

4.3 MEASURES ON TOPOLOGICAL SPACES

Let K, = nni Uk("1 B(uwk, 1/n). Then K, is closed, and

X00

'PI [Cok'!m() _

B1 c m

P(K,`) L_J) < Y s2-" = s.

n=1 k=1 n n=1

181

Therefore P(K,) >- I - s.It remains to show that K, is compact. Let {x1ix2, ...} be a sequence in K,.

Then xP E no1

Uk "i R04, 1/n) for all p; hence x,, e Uk`=11 B(wk, 1) for allp. We conclude that for some k,, xP E B(wk,, 1) for infinitely many p, say, forp c- T,, an infinite subset of the positive integers. But xP E B(wk, #) forall p, in particular for all p e T1; hence for some k2 , XP E B(cok,, 1) r B((vk2, #)for infinitely many p e TI, say, for p e T2 e T1. Continue inductively toobtain integers k,, k2, ... and infinite sets TI T2 such that

Ix1,E f BLwk',J

for all peT,.j=1 LL 1

Pick pi e Ti, i = 1, 2, ..., with PI < P2 < -. Then if j < i, we have xP,,x,,, E B(cokj , 1 /j), so d(xp, , XP) < 2/j - 0 as j - oo. Thus {xp,) is a Cauchysequence, hence converges (to a point of K, since K, is closed). Therefore {xp}has a subsequence converging to a point of K,, so K, is compact. I

We now apply the Daniell theory to obtain theorems on representationof positive linear functionals in a topological context.

4.3.9 Theorem. Let n be a compact Hausdorff space, and let E be a positivelinear functional on C(Q), with E(l) = 1. There is a unique probability meas-ure P on sd(Q) such that E(f) = In f dP for all f e C(CZ).

PROOF. Let L = C(Q). If fn e L, fn 10, then f -+0 uniformly (this is Dini'stheorem). For given b > 0, we have fl! = Un I {f,, < S}; hence by compactness,

N

0 = U {f, < S} for some Nn=1

= { fN < S} by montonicity of {fj.

Thus n >_ N implies 0 < f"(w) < fN(co) 0 is given, eventually 0 < f" < S, so 0 < E(f") < E(S) = S.Therefore E(fn) 10, hence E is a Daniell integral. The result follows from4.2.9. 1

A somewhat different result is obtained if we use the Daniell theory withhypothesis B.

4.3.10 Theorem. Let S2 be a compact Hausdorff space, and let E be apositive linear functional on C(Q), with E(l) = 1. There is a unique proba-bility measure P on R(S2) such that

(a) E(f) = fa f dP for all f e C(O), and(b) for all A e R(S2),

P(A) = inf{P(V): V z) A, V open)or equivalently,

P(A) = sup{P(K) : K (-_ A, K compact).

(" Compact" may be replaced by " closed " since n is compact Hausdorff.)

PROOF. Let L = C(c2). If {f, n e D} is a net in L and f 10, then, as in 4.3.9, forany 6 > 0 we have fl = U j E F { f j < S} for some finite set F c D. If N e D andN z j for all j e F, then n = { fN < S) by monotonicity of the net. Thus n >_ Nimplies 0 < f < fN, < S, and it follows as in 4.3.9 that 10.

By 4.2.10 there is a probability measure P on a(L") = a(lr) such thatE(f) = fn f dP for all f e L. But in fact ## is the class of open sets, so thata(lr) = R(Q). For if f e L", there is a net of nonnegative continuous functionsf T f; hence for each real a, {f > a) = U f f, > a) is an open set. Thus ifG e 9, then IG e L", so that G = (Ia > 0) is open. Conversely if G is open andco e G, there is a continuous function f.: Q -+ [0, 11 such that fjw) = 1 andf,,, = 0 on G`. Thus Ic = supw.fu so that if for each finite set F c G we definegF = max{ fu, : w e F}, and direct the sets F by inclusion, we obtain a mono-tone net of nonnegative continuous functions increasing to I.. ThereforeIGeL", so that Ge4.

Thus we have established the existence of a probability measure P on9(0) satisfying part (a) of 4.3.10; part (b) follows since P = µ* on a(W), andV is the class of open sets.

To prove uniqueness, let P' be another probability measure satisfying (a)and (b) of 4.3. 10. If we can show that P' satisfies 4.2.10(b), it will follow fromthe uniqueness part of 4.2.10 that P' = P. Thus let be a net of open setswith G T G; since G is the union of the G,,, G is open. By hypothesis, givenS > 0, there is a compact K e G such that P'(G) s P'(K) + S. Now G. UK` T G U K` = fl; hence by compactness and the monotonicity ofGm U K` = SZ for some m, so that K e G.. Consequently,

P'(G) T P'(G).

Property (b) of 4.3.10 is often referred to as the "regularity" of P. Sincethe word "regular" is used in so many different ways in the literature, let usstate exactly what it will mean for us.

4.3.11 Definitions. If p is a measure on .(S)), where S2 is a normal topologi-cal space, p is said to be regular if for each A E R(S2),

p(A) = inf{p(V): V A, V open)

and

p(A) = sup{p(C): C c A, C closed).

Either one of these conditions implies the other if p is finite, and if in addition,.0 is a compact Hausdorff space, we obtain property (b) of 4.3.10.

If p = p+ - jr is a finite signed measure on N(Q), S2 normal, we say thatp is regular iff p+ and p- are regular (equivalently, if the total variation (p I isregular).

The following result connects 4.3.9 and 4.3.10.

4.3.12 Theorem. If P is a probability measure on sa1(S2), fl compact Haus-dorff, then P has a unique extension to a regular probability measure oneJ(1).

PROOF. Let E(f) = In f dP, f E C(i2). Then E is a positive linear functional onL = C(S2), and thus (see the proof of 4.3.10) if f f.) is a net in L decreasing to 0,then j 0. By 4.3.10 there is a unique regular probability measure P' on'(t2) such that In f dP = fn f dP' for all f e L. But each f in L is measurable:(S), d) - (R, V(R)), hence by 1.5.5, In f dP' is determined by the values of P'on Baire sets. Thus the condition that f a f dP = In f dP' for all f e L isequivalent to P = P' on W(Q), by the uniqueness part of 4.3.9. 1

In 4.3.9 and 4.3.10, the assumption E(l) = 1 is just a normalization, andif it is dropped, the results are the same, except that "unique probabilitymeasure" is replaced by "unique finite measure." Similarly, 4.3.12 appliesequally well to finite measures.

Now let fl be a compact Hausdorff space, and consider L = C(fl) as avector space over the reals; L is a Banach space with the sup norm. If E is apositive linear functional on L, we can show that E is continuous, and thiswill allow us to generalize 4.3.9 and 4.3.10 by giving representation theoremsfor continuous linear functionals on C(O). To prove continuity of E, note thatif f e L and If 11 _< 1, then -1 < f(co) < 1 for all co; hence -E(l) :5 E(f)

E(l), that is, I E(f) I < E(1). Therefore IIEII < E(l); in fact IIEII = E(l), as maybe seen by considering the function that is identically 1.

The representation theorem we are about to prove will involve integrationwith respect to a finite signed measure y = p+ - u-; the integral is defined inthe obvious way, namely,

f of dp = f .f dµ+ - f of d9 ,

assuming the right side is well-defined.

4.3.13 Theorem. Let E be a continuous linear functional on C(SI), SI compactHausdorff.

(a) There is a unique finite signed measure p on d(Q) such that E(f) _In fdpfor all feCO).

(b) There is a unique regular finite signed measure A on R(Q) such thatE(f) = In f dA for all f c- C(SI).

Furthermore, any finite signed measure on .01(SI) has a unique extension to aregular finite signed measure on .(SI); in particular A is the unique extensionof p.

PROOF. The existence of the desired signed measures y and A will follow from4.3.9 and 4.3.10 if we show that E is the difference of two positive linearfunctionals E+ and E-. If f Z 0, f e C(SI), define

E+(f) = sup(E(g): 0 <g < f, g e C(S))}.

If f < M, then E+(f) 5 MIIEII < oo by continuity of E; hence E+ is finite. If0:!g g ; < f;, i = 1, 2, with all functions in C(Q), then

E(9i) + E(g2) = E(g, + 92) <- E+(.f, +f2).

Take the sup over g, and g2 to obtain E+(f,) + E+(f2) < E+(f, +f2).Now if O :g g<.f, +.f2 ,.fi,.f2 >: 0, define g, = 9 A f,, 92 = 0V(9 -f1)

Then 0 < g, <-.fi, 0 <- 92:5f2, and 9=9,+92. Thus E(g) = E(g,) +E(92) :!g E+(fl) + E+(f2); hence E+(f, +f2) < E+(f,) + E+(f2), and conse-quently

E+(f, +f2) _ E+(fi) + E+(f2) (1)

Clearly E+(af) = aE+(f) if a 0. Thus E+ extends to a positive linearfunctional on C(SI). Specifically, if f =f + -f -, take E+(f) = E+(f+) -E+(f-). If f can also be represented as g - h, where g and h are nonnegative

functions in C(c2), then E+(f +) - E''(f -) = E+(g) - E+(h) by (1), so theextension is well-defined. If f >- 0, f c C(Q), define

E-(f) = -inf{E(g): 0 <g < f, g e C((2)} = (-E)+(f).

By the above argument, E- extends to a positive linear functional on C(Q).Now

E-(f) _ -inf{E(f) - E(f -g): 0 <g < f, g e C((2)}

-E(f) + sup(E(h): 0 < h < f, h e C(Q)}

_ -E(f) + E+(f).

Thus E=E+-E-.To prove that p is unique, assume that t is a finite signed measure on

.W(S2) such that In f dr = 0 for all f e C(Q). Then So f dT+ = In f dT- for allf e C(Q), hence T+ = t- (so t = 0) by 4.3.9. Uniqueness of A is provedsimilarly, using 4.3.10.

Now if p is any finite signed measure on .4(Q), E(f) = fn f dp defines acontinuous linear functional on C(Q), so by what we have just proved thereis a unique regular finite signed measure A on 94(Q) such that Sa f du =In f dl for all f e C(Q). But as in 4.3.12, this condition is equivalent to p = Aon .W(Q), so p has a unique extension to a regular finite signed measure Aon 94(Q). 1

Theorems 4.3.9, 4.3.10, and 4.3.13 are referred to as versions of the Rieszrepresentation theorem. [This name is also given to the somewhat differentresult 3.3.4(a).] A result of this type for complex-valued functions and com-plex measures is given in Problem 6.

We are going to show that II E 11 = I µ (i2) A 1 (Q) in 4.3.13. To do thiswe need a result on the approximation of Borel measurable functions bycontinuous functions.

4.3.14 Theorem. Consider the measure space (S2, F, p), where Q is a normaltopological space, 51-' = 94(Q), and p is a regular measure on F. If 0 0, and f e L"(Q, F, p), there is a continuous complex-valued functiong e L°(S2, .°r, p) such that Ilf - gllp < e; furthermore, g can be chosen so thatsup I g I <_ sup I f 1. Thus the continuous functions are dense in L".

PROOF. This is done exactly as in 2.4.14, except that the application of Prob-lem 12, Section 1.4, is now replaced by the hypothesis that p is regular. I

4.3.15 Theorem. In the Riesz representation theorem 4.3.13, IIEII = I N I (Q) _I21(L)).

PROOF. If f e C(S2), E(f) = fa f d2 = Jn f d l+ - faf d2 -; hence

IE(f)I < fnIll d2++fnIJI

d2 = fn IfI dI21 <Ilfll IA I(Q),

where II II is the sup norm. Thus IIEII < IAI(Q)Now let A,, ... , A,, be disjoint measurable subsets of S2, and define

f = L;=, MA,, where x; = 1 if 2(A) > 0, and x; = -1 if 2(A) < 0. By4.3.14, if s > 0, there is a continuous function g such that Ilgll <- 1 andJn If - g J dI 2I < e; hence E(g) = Ja g d2 > f a f dl - s. But

fnf d2 =1I2(A) I

hence

IIEII >- E(g) >- Y I2(A;)l - c.

But Y;_, 12(A;) J may be taken arbitrarily close to 12I (b2) (see Problem 4,Section 2.1); hence IIEII >- J 2I (S2) Since 12I (f) = I p I (fl), the result follows. I

If S2 is compact Hausdorff and M(0) is the collection of finite signedmeasures on d(L) (or equally well the regular finite signed measures on2(f2)), Theorems 4.3.13 and 4.3.15 show that the map E -+,u (or E -+ A) is anisometric isomorphism of the conjugate space of C(S2) and M(il), where thenorm of an element u e M(fl) is taken as I u I (0).

We close this section with some further results on approximation bycontinuous functions.

4.3.16 Theorem. Let u be a regular finite measure on normal. If fis a complex-valued Borel measurable function on S2 and 6 > 0, there is acontinuous complex-valued function g on S2 such that

u{w:f(w) # g(o)} < 6.

Furthermore, it is possible to choose g so that sup I g 1 - n,the h" are nonnegative simple functions increasing to f. Let f" = h" - h"_,,

n = 1, 2, ... (with ho = 0), so that f = Y , f". Note that f" has only twopossible values, 0 and 2-". If A" _ {f" # 01, let C. be a closed subset of A,,and V,, an open overset of A. such that µ(V" - C") < 62-". Since S2 is normal,there is a continuous g": S2 -+ [0, 1] such that g" = I on C. and g" = 0 off V".If g = Yn , 2-"g", then by the Weierstrass M-test, g is a continuous map ofn into [0, 1]. We claim that if w 0 U' , (V" - C"), a set of measure less than 6,then f(w) = g(w). To see this, observe that for each n, w e C. or co 0 V.. Ifco e C,, c A,,, then 2-"g"(co) = 2-" = f"(co), and if co 0 V", then 2-"g"(co) = 0 ="(w) since co 0 A,,.

This proves the existence of g when 0 < f _ "} =11 + f2, where f, is bounded and µ{ f2 0} = µ{ I f I > n}, whichcan be made less than 612 for sufficiently large n. If g is continuous aidµ{ f, # g} < 6/2, then µ{ f g} < 6, as desired.

Finally, if If I < M < oo, and g approximates f as above, define gl(w) _g((9) if I g(w) l < M, g1(w) = Mg(w)I I g(co) I if I g(w) I > M. Then g, is con-tinuous, I g, I < M, and f(w) = g(w) implies I g(w) I < M; hence gl(w) _g(w) = f (w). Therefore µ{ f 96 g1} < p{f # g} < 3, completing the proof.

4.3.17 Corollaries. Assume the hypothesis of 4.3.16.

(a) There is a sequence of continuous complex-valued functions f" on 0converging to f a.e. [µ], with If. I 0, there is a closed set C c S2 and a continuous complex-valued function g on S2 such that µ(C) ? µ(S2) - e and f = g on C, hence therestriction off to C is continuous. If p has the additional property that µ(A) =sup{p(K): K e A, K compact) for each A e 64(Q), then C may be taken ascompact.

PROOF. (a) By 4.3.16, there is a continuous function f" such that If. I < M =sup If I and µ{ f" 96 f } < 2 -". If A,, = {f, f) and A = lim sup" A" , thenp(A) = 0 by the Bore[-Cantelli lemma. But if co 0 A, then f(w) =f(w) forsufficiently large n.

(b) By 4.3.16, there is a continuous g such that µ{ f # g} < e/2. By regu-larity of µ, there is a closed set C e f f = g} with µ(C) >- µ{ f = g} - e/2. Theset C has the desired properties. The proof under the assumption of approxi-mation by compact subsets is the same, with C compact rather than closed.

Corollary 4.3.17(b) is called Lusin's theorem.

Problems

I . Let F be a closed subset of the metric space Q. Define f (w) = e-'(°" F)where d(w, F) = inf{d(w, y): y e F}. Show that the f are continuous andf 1 IF. Use this to give a direct proof (avoiding 4.3.2) that in a metricspace, the Baire and Bore] sets coincide.

2. Give an example of a measure space (0, #;, µ), where 12 is a metric spaceand .F = R(S2), such that for some A e .F, u(A) 96 sup{p(K): K c A, Kcompact}.

3. In 4.3.14, assume in addition that S2 is locally compact, and that µ(A) _sup{p(K): K compact subset of A} for all Borel sets A. Show that thecontinuous functions with compact support (that is, the continuousfunctions that vanish outside a compact subset) are dense in L°(S2, W, g),0 < p < co. Also, as in 4.3.14, if f e LP is approximated by the continuousfunction g with compact support, g may be chosen so that supIgI 5supifI.

4. (a) Let fl be a normal topological space, and let H be the smallest classof real-valued functions on a that contains the continuous functionsand is closed under pointwise limits of monotone sequences. Showthat H is the class of Baire measurable functions, that is, H consistsof all f: (0, sd) - (R, R(R)) (use 4.1.4).

(b) If H is as in part (a) and a(H) is the smallest a-field 5 of subsets of nmaking all functions in H measurable (relative to 5 and R(R)),show that a(H) = s?(f); hence a(H) is the same as a(C(a)).

5. Let S2 be a normal topological space, and let Ko be the class of all con-tinuous real-valued functions on Q. Having defined K. for all ordinals jfless than the ordinal a, define

K,, =U{K,,:/3<a}',

where, for any class D of real-valued functions, D' means the class of allreal-valued functions that are pointwise limits of monotone sequencesin D.

Let

K = U{KQ: a < /fi},

where /3, is the first uncountable ordinal. Show that K is the class ofBaire measurable functions.

6. Let E be a continuous linear functional on the space C(Q) of all con-tinuous complex-valued functions on the compact Hausdorff space Q.Show that there is a unique complex measure p on d(S2) such thatE(f) = !Q f du for all f e C(a), and a unique regular complex measure Aon R(Q) such that E(f) = fn f dA. for all f e C(fl). Furthermore, IIEII =

4.4 MEASURES ON UNCOUNTABLY INFINITE PRODUCT SPACES 189

p I (Q) = 12I (S2), so that C(S2)* is isometrically isomorphic to the space ofcomplex measures on .4(S2), or equally well to the space of regular complexmeasures on R (Q). (If p = p, + iµ2 is a complex measure, in f dp is de-fined as J0 f dµ, + '10 f 4142 1

provided f is integrable with respect to p1and p2. Also, p is said to be regular if p, and p2 are regular; sinceI P1 1, I P2 I < I p 1 <- 191 I + I P2 1, this is equivalent to regularity of I PSee Section 2.2, Problem 6, and Section 2.4, Problems 10 and 11 for thebasic properties of complex measures.)

7. Let M(S), 9) be the collection of finite signed measures on a a-field .9of subsets of 0. Show that if we take JIM 11 = l y I (fl), it e M(S), JF),M(f), f) becomes a Banach space. (A similar result holds for the collec-tion of complex measures.)

4.4 Measures on Uncountably Infinite Product Spaces

In Section 2.7 we considered probability measures on countably infiniteproduct spaces. The result may be extended to uncountable products if certaintopological assumptions are made about the individual factor spaces.

The product of uncountably many a-fields is formed in essentially thesame way as in the countable case.

4.4.1 Definitions and Comments. For t in the arbitrary index set T, let(Q, Jr,) be a measurable space. Let Fit E T Q, be the set of all functionsw = (w(t), t e T) on T such that w(t) e !Q, for each t e T. If t1, ..., t" e T andB" e fl_ , fl,, , we define the set B"(t,, ... , t") as {w a f , E TO,: (w(tl), ....w(t")) e B"}. We call B"(t,, ... , t") the cylinder with base B" at (t,, ... , t"); thecylinder is said to be measurable if B" e fi=, If B" = B, x x B,,, thecylinder is called a rectangle, a measurable rectangle if B; a i = 1, ... , n.Note that if all Q, = Q, then Fit E T 0, = S T, the set of all functions from Ttoo.

For example, let T = [0, 1], Q, = R for all t e T, B2 = ((u, v): u > 3,1 <v<2). Then

B2(4, j)_(xaRT:x(f)>3, 1 <x(j)<2}[see Fig. 4.1, where x, a B2(f, *) and x2 0 B2(-, i)].

Exactly as in 2.7.1, the measurable cylinders form a field, as do the finitedisjoint unions of measurable rectangles. The minimal a-field over themeasurable cylinders is denoted by lit c T F,, and called the product of theo-fields .F,. If Q, = S and 9, = So for all t, fl, c T 9, is denoted by ,9T Againas in 2.7.1, fl, E,. 9, is also the minimal a-field over the measurable rectangles.

3

2

0

x,

32 4

Figure 4.1.

We now consider the problem of constructing probability measures onH, E r , . The approach will be as follows: Let v = {t1, ..., be a finitesubset of T, where t, < t2 < < t . (If T is not a subset of R, some fixedtotal ordering is put on T.) Assume that for each such v we are given aprobability measure P on J];_, . ',,; is to represent P{w e f, e T ill:(w(t1), ... , w(tn)) e B}. We shall require that the P be "consistent"; to seewhat kind of consistency is needed, consider an example.

Suppose T is the set of positive integers and 0, = R, F, = R(R) for all t.Suppose we know P1 2345(B5) = P{w: (w,, w2, w3, w4i ws) E B5) for allB5 a R(R5). Then P{w: (w2, cu3) e B2) = P{uw (w1, (02, (03, w4, (05) e Rx B2 x R2} = P12345(R x B2 x R2), B2 a R(R2). Thus once probabilities ofsets involving the first five coordinates are specified, probabilities of sets in-volving (C02, w3) [as well as (Co,, (03 , w4), and so on], are determined. Thusthe original specification of P23 must agree with the measure induced fromP12345 We are going to show that under appropriate topological assump-tions, a consistent family of probability measures P determines a uniqueprobability measure on fl, E T ."T, .

Now to formalize: If v = {t,, ..., to}, t, < .. < t , the space (fir 52,,,

H1=1 .F,) is denoted by If u = (ti,, ..., tik) is a nonempty subset of vand y = (y(t1), ... , y(tn)) a fl,,, the k-tuple (y(til),... , y(tik)) is denoted by yn .Similarly if w = (w(t), t e T) belongs to fit E T S2 the notation to, will beused for (w(t1), ..., w(tn)).

If PE is a probability measure on the projection of P on F. is theprobability measure n defined by

[n,,(P,,)](B) = e a,,: yn e B), B e .F..


Similarly, if Q is a probability measure on fl E T F , the projection of Qon SS is defined by

r -7 l[7ru(Q)](B) = Q(w c- 11 Sgt: w e B} = Q(B(v)), B E F,,.

I tET 1111

We need one preliminary result.

4.4.2 Theorem. For each n = 1, 2, ..., suppose that .y" is the class of Borelsets of a separable metric space fl.. Let n = H. fl., with the product topology,and let F = a(Q). [Note that S2 is metrizable, so that the Baire and Borelsets of S2 coincide; we may take

d"(x", y.)d(x, y) _-

n 2" 1 + d"(x", y")

where d" is the metric of fl,,. Also, if each fl. is complete, so is S2.]Then F is the product a-field f" F.

PROOF. The sets {w E S2: w, e A,, ... , (0" E A"}, n = 1, 2, ... , where the A;range over the countable base for S2i (recall that separability and secondcountability are equivalent in metric spaces), form a countable base for 0.Since the sets are measurable rectangles, it follows that every open subset of 0belongs to f" F"; hence .F c [" F.. On the other hand, for a fixed positiveinteger i let' = (Be R(S2;): {co e n: w; E B} E F}. Then ct° is a u-field con-taining the open sets of 0;, hence ' = R(f). In other words, every measur-able rectangle with one-dimensional base belongs to F. Since an arbitrarymeasurable rectangle is a finite intersection of such sets, it follows that

We are now ready for the main result.

4.4.3 Kolmogorov Extension Theorem. For each t in the arbitrary index set T,let 0, be a complete, separable metric space, and F, the Borel sets of Sgt .

Assume that for each finite nonempty subset v of T, we are given a prob-ability measure P,, on Assume the P are consistent, that is, P.for each nonempty u t= v.

Then there is a unique probability measure P on .F = fit c T F, such that7ct,(P) = P,, for all v.

PROOF. We define the hoped-for measure on measurable cylinders byP(B"(v)) = B" a F,, .

We must show that this definition makes sense since a given measurablecylinder can be represented in several ways. For example, if all S2, = R andB2 = (- oo, 3) x (4, 5), then

B2(tl, 12) = {w: (0(t,) < 3, 4 < w(12) < 5)

_ {w: w(t,) < 3, 4 < w(t2) < 5, w(13) a R}

= B3(t,, t2 i 1, ) where B3 = (- oo, 3) x (4, 5) x R.

It is sufficient to consider dual representation of the same measurable cylinderin the form B"(v) = B"(u) where k < n and u e v. But then

P"(B") = by the consistency hypothesis

= PJy e 52,,: y,, e Bk) by definition of projection.

But the assumption B"(v) = B'(u) implies that if y e Q,,, then y e B" ify a Bk, hence P"(B") =

P is well-defined on measurable cylinders; the class .moo of measur-able cylinders forms a field, and a(.Fo) = ,F.

Now if A,, ..., A. are disjoint sets in moo, we may write (by introducingextra factors as in the above example) A; = B;"(v), i = 1, ..., m, wherev = {t,, ..., r"} is fixed and the B;", i = 1, ..., m, are disjoint sets in SF,Thus

P(`U A;) = PI `U1B;"(v))

mn

+

UB,Pv=1

by definition of P

P is a measure

m

_ P(A;) again by definition of P.

Therefore P is finitely additive on fo . To show that P is countably additiveon .tea , we must verify that P is continuous from above at 0 and invoke1.2.8(b). The Caratheodory extension theorem (I.3.10) then extends P to F.

Let Ak, k = 1, 2, ... be a sequence of measurable cylinders decreasingto 0. If P(Ak) does not approach 0, we have, for some e > 0, P(A,,) >_ e > 0for all k. Suppose Ak = B"k(vk); by tacking on extra factors, we may assumethat the numbers nk and the sets vk increase with k.

By 4.4.2, each S2"k is a complete, separable metric space and `Fvk =*010.

It follows from 4.3.8 that we can find a compact set C"k c B"k such thatPvk(B"k - C"k) < e/2k+1. Define Ak' = C"k(vk) e Ak. Then P(Ak - Ak')

P,,,, (B "k - C-) < e/2k+'. In this way we approximate the given cylindersby cylinders with compact bases.

Now takeDk=A1'nA2'n...nAk'cA, nA2n...n,gk=Ak.

Then

P(Ak - Dk) = Pi Ak U Aj) < > P(Ak (-i A;`)1=1 1=1

k k

Y P(A, - A;) < e/21+1 < e/2.1=1 1=1

Since Dk c: Ak' c Ak, P(Ak - Dk) = P(A,,) - P(Dk), consequently P(Dk) >P(Ak) - e/2. In particular, Dk is not empty.

Now pick xkeDk,k=1,2,....SayA1'=t") =C"'(v1)(noteall Dk c A 1'). Consider the sequence

(L 1 (X2, X2,), 3xt,, ... , xt",), (x,,, ... , xr",), (4,, ... , x ),

that is, x/'/,, x2,,Since the x', belong to C"', a compact subset of 1 0,,, we have a convergent

subsequence x,,;" approaching some x,,, e C"'. If A2' = C"2(v2) (so Dk c A2'for k z 2), consider the sequence X11 1, e,", ... e C"2 (eventually), andextract a convergent subsequence xv2" -+ e C"2.

Note that (x'2"), , = x12"; as n -+ oo, the left side approaches andsince {r2n} is a subsequence of {r1 }, the right side approaches x,,,. Hence(xU2)V, = x,,,.

Continue in this fashion; at step i we have a subsequence

x';" -- x,,, e C"', and (x,,)f = for j < i.Pick any co e fl: E T 0, such that x,, for all j = 1, 2, ... (such a

choice is possible since (x,,,)vj = j < i). Then co,,, e C"' for each j; hence

we nA,'c nA,=Qf,j=1 j=t

a contradiction. Thus P extends to a measure on F, and by construction,P for all v.

Finally, if P and Q are two probability measures on .` such thatfor all finite v c T, then for any B" e

P(B"(v)) = [ir,,(P)](B") = Q(B"(v)).

Thus P and Q agree on measurable cylinders, and hence on Sr by the unique-ness part of the Caratheodory extension theorem.

Problems

In Problems 1-7, (S) .F,) is a measurable space for each t e T, and

1. Let p, be the projection map of () onto SZ that is, p,(w) = w(t). If (S, 9o)is a measurable space and P. S -+ S2, show that f is measurable iff p, o f ismeasurable for all t.

2. If all n, = R, all F, = R(R), and T is a (nonempty) subset of R, howmany sets are there in F?

3. Show that if B e y , then membership in B is determined by a countablenumber of coordinates, that is, there is a countable set Tae T and a setB0 E FT, = j 1, E T. 'F, such that co e B if CT,, E B0 , where OT0 =(w(t), t E To).

4. If T is an open interval of reals and fl, = R (or R), F, = .1(R) (or R(R))for all t, use Problem 3 to show that the following sets do not belong to ,F:(a) {co: co is continuous at to}, where to is a fixed element of T.(b) {w: SUP. t;b w(t) < c}, where cc R and [a, b] c T

5. Assume each Q, is a compact metric space, with .F, the Baire (= Borel)subsets of 0, . Then by the Tychonoff theorem, f2 is compact in thetopology of pointwise convergence. Show that F is the class of Baire setsof fl, in other words, .<J, E T ,it) = [: E T d411), as follows:(a) If A = {co e n: w(t) E F}, F a closed subset of Q,, show that A =

{w a ): f(co) = 0) for some f c- C(S)). Conclude that F a .ra1(S2).(b) Use the Stone-Weierstrass theorem to show that the functions

f e C(Q) depending on only one coordinate [that is, f(w) = g,(co(t))for some t, where g, e C(S2,)] generate an algebra that is dense inC(S2). Conclude that 4(S2) c ,F.

6. Assume T is an open interval of reals, and (f2 F,) = (R, R(R)) for all t;thus n = RT, .f = A(R)T. Let A = {co e 92: co is continuous at t0), whereto is a fixed point of T.(a) Show that A is an F,a (a countable intersection of F, sets); hence

A e R(f2).(b) Show that A 0 d(fl), so that we have an example of a compact

Hausdorff space in which the Baire and Borel sets do not coincide.7. (Alternative proof of the Kolmogorov extension theoremt) Assume the

hypothesis of 4.4.3, with the stronger condition that each fl, is a compactmetric space. Put the topology of pointwise convergence on Q.(a) Let A be the set of functions in C(S2) that depend on only a finite

number of coordinates; that is, there is a finite set v c T and a con-

t Nelson. E., Ann. of Math. 69, 630 (1959).

tinuous g: 12 -+ R such that f(w) = w e 12. Use the Stone-Weierstrass theorem to show that A is dense in C(12).

(b) If f e A, define E(f) = fag dP,,. Show that E is well-defined and ex-tends uniquely to a positive linear functional on C(Q). Since E(l) = 1,the Riesz representation theorem 4.3.9 yields a unique probabilitymeasure P on An) (= f I, E T .F, by Problem 5) such that E(f) _fn f dP for all f e C(Q).

(c) Let v be a fixed finite subset of T, and let H be the collection of allfunctions g: (fl, F.,) --+ (R, E(R)) such that if f(w) = co e 12,then J, f dP = in g Show that IA e H for each open setA e 12,,, and then show that H contains all bounded Borel measur-able functions on (Q., ,F.). Conclude that P,,. (Uniquenessof P is proved as in 4.4.3.)

8. The metric space Q is said to be Borel equivalent to a subset of the metricspace Q' i1T there is a one-to-one map f: Q -+ Q' such that E = f (Q) e R(12)and f and f -' are Bore] measurable. [Measurability of f -' means thatf -' : (E, R(E)) -+ (0, R(Q)).](a) Let Q he a complete separable metric space with metric d. (It may

be assumed without loss of generality that d(x, y) < 1 for all x, y,since the metric d' = d1(1 + d) induces the same topology as d andis also complete.] Denote by [0, 1]°° the space of all sequences ofreal numbers with components in [0, 1], with the topology of point-wise convergence. (This space is metrizable; explicitly, we may takethe metric as

°° I I xn - yn I

do(X, y) = nE 2n 1 + Ix,,

If D = {w,, (02 , ...) is a countable dense subset of 12, definef: 12 --+ [0, 11' by f (w) = {d(w, wn), n = 1, 2, ... ). Show that f is con-tinuous and one-to-one, with a continuous inverse.

(b) Show thatf(1)) is a Borel subset of [0, 1]°°; thus n is Borel equivalentto a subset of [0, 1]'.

(c) Let S = {0, 1}, and let 6" consist of all subsets of S. Show that thereis a map g: ([0, 1), 4[0, 1)) --+ (S°°, .9"°) such that g is one-to-one,g[0, 1) e Y', and both g and g`' are measurable.

(d) Show that ([0, 1], R[0, 1]) and (S°°, S'°) are equivalent, that is,there is a map h: [0, 1] -+ S°° such that h is one-to-one onto, and hand h-' are measurable.

(e) If (fl, Pn) is equivalent to (Se, 9n), with associated map hn,n = 1, 2, ..., show that (ln 12n, H. ,F.) is equivalent to (fln Sn,r l. .9'n). Thus by (d), ([0, 1]°°, (-4[0, 11)°°) is equivalent to (S'°, Y').

Now by 4.4.2, (9[0, 1 ])°° is the minimal a-field over the open setsof [0, 1]Z, that is, (-4[0, 1])°° is the class of Borel sets _4([0, 1]°°).It follows from these results that if 0 is a complete, separable metricspace, S1 is Borel equivalent to a (Bore]) subset of [0, 1].

4.5 Weak Convergence of Measures

By the Riesz representation theorem, a continuous linear functional onC(S2), where S2 is compact Hausdorff, may be identified with a regular finitesigned measure on R(f2). Thus if {p,,} is a sequence of such measures, weak*convergence of the sequence to the measure p means that In f dµ - In f dpfor all f e C(O). In this section we investigate this type of convergence in asomewhat different context; 91 will be a metric space, not necessarily compact,and all measures will be nonnegative. The results form the starting point forthe study of the central limit theorem of probability.

4.5.1 Theorem. Let p, Pi, 102, ... be finite measures on the Borel sets of ametric space Q. The following conditions are equivalent:

(a) $ n f dun -> Jn f dp for all bounded continuous f: 51- R.(b) lim inf, In f dun >_ fa f dp for all bounded lower semicontinuous

f.n->R.(b') urn sups . oo in f dun < Jn f du for all bounded upper semicontinuous

f: S2 - R.(c) 5 n f dun -f n f du for all bounded f: (s2, (fz)) - (R, R(R)) such that

f is continuous a.e. [p].(d) lim inf, pn(A) >_ p(A) for every open set A (-_ S2, and un(Q) -' u(n)(d') lim sup, u (A) 5 u(A) for every closed set A c f , and

inn0 -.41).(e) u,,(A) - u(A) for every A e.4(Q) such that p(aA) = 0 (aA denotes

the boundary of A).

PROOF. (a) implies (b): If g < f and g is bounded continuous,

liminf f fdp,,_liminf f by (a).n-oo 12 n

But since f is lower semicontinuous (LSC), it is the limit of a sequence of con-tinuous functions, and if If < M, all functions in the sequence can also betaken less than or equal to M in absolute value. (See the appendix ongeneral topology, Section A6, for the basic properties of semicontinuous

4.5 WEAK CONVERGENCE OF MEASURES 197

functions.) Thus if we take the sup over g in the above equation, we obtain(b).

(b) is equivalent to (b'): Note that f is LSC if -f is upper semicon-tinuous (USC).

(b) implies (c): Let f be the lower envelope off (the sup of all LSCfunctions g such that g < f) and f the upper envelope (the inf of all USCfunctions g such that g >: f). Since f(x) = lim infy.x f(y) and f (x) = limsup.,-.,f (y), continuity off at x implies f (x) =f(x) =J(x). Furthermore, f isLSC and f is USC. Thus if f is bounded and continuous a.e. [µ],

f f dµ = f f dµ 5 Jim inf f f dµn by (b)n s1 n-oo n

Slim inf f f dµn since f 5 fn

S lim sup f f dµnn- 00 a

Slim sup f f dµn since f < fn-c n

S fn fdi by (b')

= f f dp, proving (c).

(c) implies (d): Clearly (c) implies (a), which in turn implies (b). If Ais open, then IA is LSC, so by (b), lim inf,,-. pn(A) >- µ(A). Now In - 1, sopn(Q) -+ fi(Q) by (c)-

(d) is equivalent to (d'): Take complements.(d) implies (e): Let A° be the interior of A, A the closure of A. Then

iim sup pn(A) S lim sup pn(A) S µ(A) by (d')n- oo n- 00

= µ(A) by hypothesis.Also, using (d),

lim inf pn(A) >_ lim inf pn(A°) > p(A°) = p(A).n- ao n- cc

(e) implies (a) : Let f be a bounded continuous function on Q. If If < M,let A = {c a R: p(f -1{c)) # 0}; A is countable since the f -'{c} are disjointand p is finite. Construct a partition of [- M, MI, say - M = to < t, << tj = M, with t; 0 A, i = 0, 1, ..., j (M may be increased if necessary).

If B, = {x: t; 5 f(x) < ti.,.1}, i = 0, 1, ..., j - 1, it follows from (e) thati-^1 J-1/_ tipn(Bi) - Y- ti y(Bi)i=0 i=0

[Since f -'(ti, 1.+i) is open, of -' [ti, ti, I) c.f -'{ti, ti+i}, and {if -'{ti, ti+i} =0 since ti, ti+, 0 A.] Now

fafdp -fnfdp SJ J dju. - L ti pn(Bi)

S2 2 i=0

+ I'Y- ti p,,(B.) - y_ ti p(B.)It=0 i=0

[tiIt(Bi)+ - Jn f d4

The first term on the right may be written as

1

j_1E J

(f(x) - ti)i=0 Bt

and this is bounded by maxi(ti+i - which can be made arbitrarilysmall by choice of the partition since p (S1) -+ p(fl) < oo by (e). The thirdterm on the right is bounded by max.(ti+i - ti)p(S2), which can also be madearbitrarily small. The second term approaches 0 as n -+ co, proving (a). I

4.5.2 Comments. Another condition equivalent to those of 4.5.1 is thatIn f dp - fn f du for all bounded uniformly continuous f: it -* R (see Problem1).

The proof of 4.5.1 works equally well if the sequence is replaced bya net.

The convergence described in 4.5.1 is sometimes called weak or vagueconvergence of measures. We shall write

p p are defined on e(R), there are correspondingdistribution functions F and F on R. We may relate convergence of measuresto convergence of distribution functions.

4.5.3 Definition. A continuity point of a distribution function F on R is apoint x e R such that F is continuous at x, or ± oo (thus by convention, ooand - oo are continuity points).

4.5.4 Theorem. Let p, Al, P2' ... be finite measures on £(R), with corre-sponding distribution functions F, F,, F2, .... The following are equivalent:

(a) p w' p.(b) F (a, b] - F(a, b] at all continuity points a, b of F, where F(a, b] =

F(b) - F(a), F(co) = lim, . F(x), F(-oo) = limx.._,,, F(x).

4.5 WEAK CONVERGENCE OF MEASURES 199

If all distribution functions are 0 at - oo, condition (b) is equivalent to thestatement that F,,(x) -+ F(x) at all points x E R at which F is continuous, andF (oo) --, F(oo).

PROOF. (a) implies (b): If a and b are continuity points of F in R, then(a, b] is a Borel set whose boundary has p-measure 0. By 4.5.1(e), µn(a, b] -+µ(a, b], that is, F,,(a, b] -+ F(a, b]. If a = - oo, the argument is the same, andif b = oo, then (a, oo) is a Borel set whose boundary has µ-measure 0, and theproof proceeds as before.

(b) implies (a): Let A be an open subset of R; write A as the disjointunion of open intervals I1, 12, .... Then

lim inf µn(A) = lim inf Z .U.(")n- '0 k

Y lim inf Pn(Ik) by Fatou's lemma.k n-co

Let E > 0 be given. For each k, let Ik' be a right-semiclosed subinterval of Iksuch that the endpoints of Ik' are continuity points of F, and u(Ik') >- µ(Ik)- c2-k; the Ik' can be chosen since F has only countably many discontinuities.Then

Thus

lim inf µn(Ik) >- lim inf µn(Ik) = P(IO') by (b).n- 00 n-oo

n- oo k k

liminfµn(A) - Y_ p(Ik) - Y_µ(Ik)-e=µ(A)-e.

Since a is arbitrary, we have µn '-+,u by 4.5.1(d). I

Condition (b) of 4.5.4 is sometimes called weak convergence of thesequence {Fn} to F, written Fn - F.

Problems

1. (a) If F is a closed subset of the metric space Q, show that IF is the limitof a decreasing sequence of uniformly continuous functions fn,with O!gf.<I for all n.

(b) Show that in 4.5.1, µn-4µ if fn f dµn - fn f dµ for all boundeduniformly continuous P. S2 - R.

2. Show that in 4.5.1, µn-" µ if µn(A) - µ(A) for all open sets A such thatµ(OA) = 0.

3. Let f be a locally compact metric space. If A c Q, A is said to be boundedif A c K for some compact set K.

If p, pi, P2' ... are measures on R(fl) that are finite on bounded Borelsets, we say that p,, p if in f dp -+ f a f dp for all continuous functionsf: 0 -+ R with compact support. (The support off is defined by supp f ={x: f(x) 0 0); note that f has compact support if f vanishes outside acompact set.)

Show that the following conditions are equivalent:

(a) pn w* p.(b) p (A) -> p(A) for all bounded Borel sets A with µ(8A) = 0.(c) 5- f dµ, -> f n f dp for all bounded f:.0 -. R such that f has compact

support and is continuous a.e. [p].

4.6 References

The approach to the Daniell integral and the Riesz representation theoremgiven in Sections 4.2 and 4.3 is based on Neveu (1965). If the hypothesis thatL contains the constant functions is dropped in the development of the Daniellintegral, the situation becomes more complicated. However, a representationtheorem somewhat similar to 4.2.9 may be proved under the assumption thatf e L implies I A f c- L. For a detailed proof of this result (known as Stone'stheorem), see Royden (1968, Chapter 13).

There are other versions of the Riesz representation theorem appropriatein a locally compact Hausdorff space Q. If E is a positive linear functionalon the set CjS2) of continuous real-valued functions on f2 with compactsupport, there is a unique regular measure p on the Borel sets such thatE(f) = fn f dp for each fin QS2). Here, regularity means that p is finite oncompact sets, y(A) = i-if{p(V): V A, V open) for all Borel sets A, andp(A) = sup(p(K): K e A, K compact) for all Borel sets A which are eitheropen or of finite measure. Also, if E is a continuous linear functional on theBanach space C0(Q) of continuous real-valued functions on 12 that vanishat oo (f vanishes at oo if, given e > 0, there is a compact set K such thatIf I < s off K), there is a unique regular finite signed measure on the Bore]sets such that E(f) = in f dp for all f e C0(Q); furthermore, 1(E11 = I P1 ((2).As in the compact case, the result may be extended to complex-valuedfunctions and complex measures, as in Problem 6, Section 4.3. Proofs aregiven in Rudin (1966).

In probability theory, there is usually no reason to replace compact bylocally compact spaces in order to take advantage of a more general versionof the Riesz representation theorem. For if we have a random experimentwhose outcomes are represented by points in a locally compact space, wesimply compactify the space.

For an account of the theory of weak convergence of measures, withapplications to probability, see Billingsley (1968) and Parthasarathy (1967).

Appendix on General Topology

Al Introduction

The reader is assumed to be familiar with elementary set theory, includingbasic properties of ordinal and cardinal numbers [see, for example, Halmos(1960)]. Also, an undergraduate course in point-set topology is assumed;Simmons (1963) is a suitable text for such a course. In this appendix, we shallconcentrate on aspects of general topology that are useful in functionalanalysis and probability. A good reference for collateral reading is Dugundji(1966).

Before proceeding, we mention one result, which, although usuallycovered in a first course in topology, deserves to be stated explicitly becauseof its fundamental role in the construction of topological vector spaces (see3.5.1). Throughout the appendix, a neighborhood of a point x is an open setcontaining x, an overneighborhood of x is an overset of a neighborhood of x.

Al.l Theorem. Let S2 be a set, and suppose that for each x e n, we are givena nonempty collection V(x) of subsets of f) satisfying the following:

(a) x e each U e 'K(x).(b) If U1, U2 E'Y''(x), then U1 n U2 E 7''(x).(c) If U e y''(x) and U c V, then V e '4''(x).

201

202 APPENDIX ON GENERAL TOPOLOGY

(d) If U e 'V (x), there is a set V e 'Y'-(x) such that V c U and U E 'K(y) foreach y e V.

Then there is a unique topology on fl such that for each x, Y'(x) is the systemof overneighborhoods of x.

PROOF. If such a topology exists, a set U will be open if U is an overneighbor-hood of each of its points, that is, U e 1"(x) for each x e U. Thus it suffices toshow that .f = { U (-_ Q: U e 71(x) for each x e U} is a topology, and for eachx, the overneighborhood system . V (x) coincides with 'V(x). If U, V e .l,then by (b), U n V E 1'(x) for each x e U n V, so U n V e J. If U, e .f foreach i e I, then (c) implies that U{U;: i e I} E ; (c) yields S2 e 9' as well.Since 0 e 9- trivially, .fT is a topology.

If U E .K(x), there is a set V e with x e V c U. But then U e 'V (x) by(c). Conversely, if U e *(x), let W = {x' e U: U e Y''(x)}. If x' a W, then by(d), there is a set V E *(X') with V c U and U e 11(y) for each y e V. Butthen V c W, so by (c), W e Y' (x'); consequently, W e by definition of I,and furthermore x e W by (a). Thus if U e Y''(x), there is a set W e .T withx e W c U; hence U e .K(x).

For the remainder of the appendix, V(x) will always stand for the collec-tion of neighborhoods of x.

A2 Convergence

If Q is a metric space, the topology of f) can be described entirely interms of convergence of sequences. For example, a subset A of S2 is closediff whenever n = 1, 2, ...} is a sequence of points in A and x - x, we havex e A. Also, x e.4, the closure of A, if there is a sequence of points in A con-verging to x. This result does not generalize to arbitrary topological spaces.

A2.1 Example. Let a be the first uncountable ordinal, and let S2 be the set ofall ordinals less than or equal to a (recall that for ordinals, a < b meansa e b). Put the order topology on 0; this topology has as a base the setsS2 n (a, b) = {x e 92: a < x < b} where a and b are arbitrary ordinals.

We show that a belongs to the closure of f2 - {a}, but no sequence inS2 - {a} converges to a. If U is a neighborhood of a, then for some a, b, wehave a r: S2 n (a, b) a U. Now a < a < b; hence a is countable, and thereforeso is a + 1 . Thus a + 1 E U n (S2 - {a}), proving that U E 0 - {a}. But ifx,, n 0 - {a}, n = 1, 2, ..., then each x is countable; hence so is c = sup x,,.

A2 CONVERGENCE 203

Since c < a, V = (c, a] is a neighborhood of a and x is never in V, so that thesequence cannot possibly converge to or.

In order to describe an arbitrary topology in terms of convergence, wemust consider objects more general than sequences. A sequence is a functionon the positive integers; the desired generalization, called a "net" or " gener-alized sequence," is a function on a directed set.

A2.2 Definitions. A directed set is a set D on which there is defineda preordering (a reflexive and transitive relation), denoted by <-,

with the property that whenever a, b c- D, there is a c e D with a < c andb < c. A net in a topological space f is a function from a directed set D intoQ. A net will be denoted by {x, n e D} or simply by The net is saidto converge to the point x if for every neighborhood U of x, the net is even-tually in U, that is, there is an no e D such that x e U for all n e D such thatn>-n0.

The basic relations between convergence and topology in metric spacescan now be generalized to arbitrary topological spaces.

A2.3 Theorem. Let A be a subset of the topological space fi.

(a) A point xe S2 belongs to A if there is a net in A such that x - x.(b) A is closed if for every net in A such that x,, -+ x, we have x c- A.(c) A point x e r) is a cluster point of A (that is, every neighborhood of

x contains a point of A other than x) if there is a net in A - {x} convergingto X.

PROOF. (a) If x e A and x - x, let U E IW(x). Then x is eventually in U,in particular U n A 0, and thus x e A. Conversely, if x e A, then for eachU e *(x), choose xv e U n A. Then {xv} becomes a net in A (with U:!5; V ifU=> V) and xv -x.

(b) Suppose that A is closed. If x e A, x - x, then by (a), x e A; thusby hypothesis, x e A. Conversely, if A is not closed, let x e A - A. By (a) wecan find x a A, x,, x. Since x 0 A, the result follows.

(c) If x e A - {x}, x -> x, then if U E W(x), x is eventually in U, henceU n (A - {x}) 96 0, and thus x is a cluster point of A. Conversely, if x is acluster point of A, then for each U e Qe'(x), choose xv c- U n (A - {x}); the xvform a net in A - {x} converging to x.

A2.4 Comments. The reason that sequences are not adequate in A2.3 is thatin choosing a point xv in each neighborhood U of x, we are in general forcedto make uncountably many choices. This would not be necessary if S2 were

first countable. (A first countable space is one with a countable base of neigh-borhoods at each point; that, is, if x e S2, there are countably many neighbor-hoods V1, VZ , ... of x such that for each neighborhood U of x, we haveV. c U for some n; by replacing V. by fl : i V;, we may assume without lossof generality that Vn+1 (-- V. for all n.) In A2.3, the proof will go through if wesimply choose x e V for n = 1, 2, .... Thus in a first countable space,sequences are adequate to describe the topology; in other words, A2.3 holdswith "net" replaced by "sequence."

A criterion for openness in terms of nets may also be given: The set Vis open if for every net in 92 such that x --> x e V, we have x e V even-tually. [The " only if " part follows from the definition of convergence; for the"if" part, apply A2.3(b) to V`.]

A2.5 Definitions. Let {x,,, n e D} be a net, and suppose that we are givena directed set E and a map k -* nk of E into D. Then {x,,,,, k e E } is called asubnet of {x,,, n e D}, provided that " as k becomes large, so does nk" ; that is,given no a D, there is a ko e E such that k >_ ko implies nk >_ no. If E = D = thepositive integers, we obtain the usual notion of a subsequence.

If {xn , n e D} is a net in the topological space .0, the point x e Q is calledan accumulation point of the net if for each neighborhood U of x, x isfrequently in U; in other words, given n e D, there is an m e D with m z nand xa U.

A2.6 Theorem. Let {xn , n e D} be a net in the topological space Q. If x e S2,x is an accumulation point of (xn) if there is a subnet {xnk , k e E) convergingto X.

PROOF. If xnk - x, U c- °I1(x) and n c- D, then for some ko a E, we havexnk E U for k >_ ko. But by definition of subnet, there is a k1 E E such thatk _ k1 implies k n. Thus jfk _ koand k _ k1,we have nk>:nand

xx an accumulation point of {xn, n e D). Let E be the

collection of pairs (n, U), where n e D, U is a neighborhood of x, andx E U. Direct E by setting (n, U) < (m, V) if n < m and U V. If k =(m, V) e E let nk = m. Given n and U, then if k = (m, V) >_ (n, U), we havenk = m >_ n, so that {xnk , k c- E} is a subnet of {xn , n c- D). Now if U isa neigh-borhood of x, then x e U for some n e D. If k = (m, V) z (n, U), thenxnk = x, e V c U; therefore x. I

For some purposes, it is more convenient to specify convergence in atopological space by means of filters rather than nets. If {xn, n e D) is a net

A2 CONVERGENCE 205

in fl, and a e D, let Ta = {n a D: n >_ a}, and let x(T,) be the set of all x,,,n > a. The x(T0), a e D, are called the tails of the net. The collection sd oftails is an example of a filterbase, which we now define.

A2.7 Definitions and Comments. Let sd be a nonempty family of subsets of aset r). Then sd is called a filterbase in fl if

(a) each U e sd is nonempty;(b) if U, V e sd, there is a W e sd with W e U n V.

If F is a nonempty family of subsets of 0 such that

(c) each U E .°F is nonempty,(d) if U, V e F, then U n V e P, and(e) if U c.F and U c V, then V e Pr,

then .F is called a filter in n. If sd is a filterbase, then F = {U c S2: U e Vfor some V e d) is a filter, called the filter generated by sd. If sd is the collec-tion of neighborhoods of a given point x in a topological space, sd is a filter-base, and the filter generated by sd is the system of overneighborhoods of x.

A filterbase sd in a topological space fl is said to converge to the point x(notation sd x) if for each U E 611(x) there is a set A e sd such that A C U.A filter .F in ) is said to converge to x if each U e 611(x) belongs to F. Thusa filterbase sd converges to x if the filter generated by sd converges to X.

If {xa , ii a D} is a net, then x -- x if for each U c-611(x) we have x(Ta) c Ufor some a e D, that is, x -+ x if the associated filterbase converges to x.

Convergence in a topological space may be described using filterbasesinstead of nets. The analog of Theorem A2.3 is the following:

A2.8 Theorem. Let B be a subset of the topological space Q.

(a) A point x E S2 belongs to B if there is a filterbase din B such that-+ X.(b) B is closed if for every filterbase sd in B such that .4 ---, x, we have

xaB.(c) A point x E n is a cluster point of B if there is a filterbase in B - (x)

converging to x.

PROOF. (a) If sd -+ x and U E 011(x), then A c U for some A E sd, in partic-ular, U n B 0 0; thus x e B. Conversely, if x e B, then U n B 96 Q for eachU e 611(x). Let sd be the collection of sets U n B, U e 611(x). Then sd is a filter-base in Band sd -- x.


(b) If B is closed, W is a filterbase in B, and sl --+ x, then x e B by (a),hence x e B by hypothesis. Conversely, if B is not closed and x e B - B, by(a) there is a filterbase sad in B with sad -+ x. Since x 0 B, the result follows.

(c) If there is such a filterbase a and U e °h(x), then U A for someA e .W; in particular, U n (B - {x}) 96 0, so x is a cluster point of B. Conver-sely, if x is a cluster point of B, let a consist of all sets U n (B - {x}),U e 1&(x). Then d is a filterbase in B - {x} and d -+ x. I

If fl is first countable, the filterbases in A2.8 may be formed using acountable system of neighborhoods of x, so that in a first countable space,the topology may be described by filterbases containing countably many sets.

A2.9 Definitions. The filterbase 9 is said to be subordinate to the filterbase sadif for each A e d there is a B e.4 with B c A; this means that the filtergenerated by d is included in the filter generated by B.

If k e E} is a subnet of {x, n e D}, the filterbase determined by thesubnet is subordinate to the filterbase determined by the original net. For ifno e D, there is a ko e E such that k - ko implies nk >_ no. Therefore

k>-ko} c{x,,:neD, nzno}.

If sat' is a filterbase in the topological space 0, the point x e r) is called anaccumulation point of sat if U n A # 0 for all U e &Ii(x) and all A e W, inother words, x e A for all A e sad. We may now prove the analog of TheoremA2.6.

A2.10 Theorem. Let sat be a filterbase in the topological space 0. If x e 0,x is an accumulation point of d if there is a filterbase -4 subordinate to d witha x; in other words, some overfilter of d converges to x.

PROOF. If -4 is subordinate to d and 1--+ x, let U e 61l(x), A e sat. ThenU => B and A B, for some B, B, a a; hence U n A B n B,, which isnonempty since .4 is a filterbase. Therefore x e A. Conversely, if x is an accu-mulation point of sad, let .4 consist of all sets U n A, Ue °le(x), A e sad. Thensat c 9 (take U = 91), hence 9 is subordinate to a; since a -+ x, the resultfollows. I

A2.11 Definition. An ultrafilter is a maximal filter, that is, a filter included inno properly larger filter. (By Zorn's lemma, every filter is included in anultrafilter.)

A2 CONVERGENCE

A2.12 Theorem. Let .F be a filter in the set Sl.

207

(a) .F is an ultrafilter if for each A c Sl we have A E .F or A` a F.(b) If .F is an ultrafilter and p: Sl - Sl', the filter T generated by the

filterbase p(F) = {p(F): F e .F} is an ultrafilter in il'.(c) If S is a topological space and .F is an ultrafilter in 0, F converges

to each of its accumulation points.

PROOF. (a) If ,F is an ultrafilter and A 0 F, necessarily A n B = 0 for someB e F. For if not, let .4 consist of all sets A n B, B e .F; then sad is a filterbasegenerating a filter larger than .F. But A n B = 0 implies B c A°; henceA` e .F. Conversely, if the condition is satisfied, let F be included in thefilter W. If A e W and A 0 F, then A` a .F c T, a contradiction sinceAnA`=0.

(b) Let A c SY; by (a), eitherp-1(A) e IF orp-1(A`) e .F. Ifp^1(A) a .F,then A pp-1(A) e p(.F); hence A e T. Similarly, if p-1(A°) a .F, thenA` e T. By (a), 9 is an ultrafilter.

(c) Let x be an accumulation point of .F . If U e R1(x) and U # .F, thenU° e F by (a). But U n U° = 0, contradicting the fact that x is an accumu-lation point of .F.

We have associated with each net {x,,, n e D} the filterbase {x(T4): a e D}of tails of the net, and have seen that convergence of the net is equivalent toconvergence of the filterbase. We now prove a converse result.

A2.13 Theorem. If sad is a filterbase in the set fl, there is a net in a such thatthe collection of tails of the net coincides with sad.

PROOF. Let D be all ordered pairs (a, A) where a e A and A e sad; define(a, A) S (b, B) iff B e A. If (a, A) and (b, B) belong to D, choose C e andwith C e A n B; for any c e C we have (c, C) >: (a, A) and (c, C) Z(b, B), hence D is directed. If we set X(., A) = a we obtain a net in Sl withx(T(a, A)) = A. I

We conclude this section with a characterization of continuity.

A2.14 Theorem. Let f: it -+ Q', where 0 and fl' are topological spaces. Thefollowing are equivalent:

(a) The function f is continuous on Sl; that is, f -1(V) is open in Sl when-ever V is open in Sl'.


(b) For every net in Q converging to the point x e S2, the net {f(x).

(c) For every filterbase d in fZ converging to the point x c- ), the filter-base f (sad) converges to f (x).

PROOF. Let be a net and 4 a filterbase such that the tails of the netcoincide with the elements of the filterbase. If, say, x(TQ) = A E s4, thenf (A) _ { f n e D, n >_ a}. Thus the tails of the net { f coincide withthe elements off (.a?). It follows that (b) and (c) are equivalent.

If f is continuous and x -+ x, let V be a neighborhood of fl x). Thenf -'(V) is a neighborhood of x; hence x,, is eventually in f -'(V), so thatf (x.) is eventually in V. Thus (a) implies (b). Conversely, if (b) holds and C isclosed in if, let be a net in f -'(C) converging to x. Then f(x)(b), and since C is closed we have f (x) e C by A2.3(b). Thus x e f -'(C), hencef -'(C) is closed, proving continuity off. I

A3 Product and Quotient Topologies

In the Euclidean plane R2, a base for the topology may be formed fromsets U x V, where U and V are open subsets of R; in fact U and V can betaken to be open intervals, so that U x V is an open rectangle. If

{(xn , y.), n=1,2,...)

is a sequence in R2, then (x., (x, y) if x -> x and y -+ y, that is, con-vergence in R2 is "pointwise" or "coordinatewise" convergence.

In general, given an arbitrary collection of topological spaces S2i, i E 1, letQ be the cartesian product fi'EI f ,, which is the collection of all families(x;, i e I) ; that is, all functions on I such that x, a 0, for each i. We shallplace a topology on S2 such that convergence in the topology coincides withpointwise convergence.

A3.l Definition. The product topology (also called the topology of point wiseconvergence) on S2 = fl; E r f2, has as a base all sets of the form

{xE):x1keUIk, k=1,...,n)

where the Uik are open in S and n is an arbitrary positive integer. (Since theintersection of two sets of this type is a set of this type, the sets do in factform a base.)

A3 PRODUCT AND QUOTIENT TOPOLOGIES 209

If pi is the projection of f2 onto f2i, the product topology is the weakesttopology making each pi continuous; in other words, the product topology isincluded in any topology that makes each pi continuous.

The product topology has the following properties:

A3.2 Theorem. Let 0 =F11 E 1 fl,, with the product topology.

(a) If {x("), n c D} is a net in 0 and x e f2, then x(") -+ x iff x;") -+ xi foreach i.

(b) A map f from a topological space L20 into n is continuous if pi of iscontinuous for each i.

(c) If fi: f2o --+ 1i , i e I, and we define f: 1o -> 0 by f (x) = (fi(x), i e I ),then f is continuous iff each fi is continuous.

(d) The projections pi are open maps of f2 onto L2,.

PROOF. (a) If x(") --+ x, then x;") = pi(xl)) --+ pi(x) = xi by continuity ofthe pi. Conversely, assume x;"> - xi for each i. Let

V= {ye0:yikE Uik, k = 1,...,r},

be a basic neighborhood of x. Since xik E Uik , there is an nk e D withx(,k) E Uik for n >_ nk. Therefore, if n e D and n >_ nk for all k = 1, ..., r, wehave x e V, so that xt"i -+ X.

(b) The "only if" part follows by continuity of the pi. Conversely,assume each pi of continuous. If x(") -+ x, then pi(f (x("))) -+ pi(f (x)) by hypo-thesis; hence f(P)) -f(x) by (a).

(c) We have fi = pi o f, so (b) applies.(d) The result follows from the observation that

{1i if i # any ik,pi{x a S2: xik E Uik , k = 1, ... , n}

Uik if i = ik for some k.

Note that if 0 is the collection of all functions from a topological space Sto a topological space T, then 0 = fii E 1 Qi, where I = S and 1i = T for all i.If {f,,) is a net of functions from S to T, and f: S --+ T, then f" -+f in the pro-duct topology iff f"(s) --+f(s) for all s E S.

We now consider quotient spaces.

A3.3 Definition. Let 0o be a topological space, and p a map of 0, onto aset f2. The identification topology on 0 is the strongest topology making pcontinuous, that is, the open subsets of S2 are the sets U such that p-1(U) is


open in no. When n has the identification topology, it is called an identifica-tion space, and p is called an identification map.

A quotient topology may be regarded as a particular identificationtopology. Let R be an equivalence relation on the topological space no,S2o/R the set of equivalence classes, and p: 0..0 - S2o/R the canonical projec-tion: p(x) = [x], the equivalence class containing x. The quotient space of S2oby R is S2o/R with the identification topology determined by p.

In fact, any identification space may be regarded as a quotient space. Tosee this, we need two preliminary results.

A3.4 Lemma. If n has the identification topology determined by p: 0o - S2,then g: S2 -> Q, is continuous iff g o p: 12o -- Q, is continuous.

PROOF. The "only if" part follows from the continuity of p. If g e p is con-tinuous and V is open in n1, then (g o p)-1(V) = p-1(g-1(V)) is open in no.By definition of the identification topology, g-1(V) is open in 0. 1

A3.5 Lemma. Let p: CIO -+ 0 be an identification, and let h: no -1, S21 becontinuous. Assume that h -p-' is single-valued, in other words, p(x) _p(y) implies h(x) = h(y). Then h c p'1 is continuous.

PROOF. Since h = (h o p-1) o p, the result follows from A3.4. (Note thatP_' is defined on all of 0 since p is onto.)

A3.6 Theorem. Let f: no - fi be an identification. Define an equivalencerelation R on no by calling x and y equivalent iff f(x) = fly). Let p be thecanonical projection of no onto S2o/R. Then S2o/R, with the quotient topology,is homeomorphic to n.

PROOF. We have f(x) = f(y) if p(x) = p(y), so by A3.5, fop-1 and p of -1are both continuous. Since these functions are inverses of each other, theydefine a homeomorphism of n and S2o/R. I

The following result gives conditions under which a given topology arisesfrom an identification.

A3.7 Theorem. Let p be a map of 00 onto 0. If p is continuous and eitheropen or closed, it is an identification, that is, the identification topology on r)determined by p coincides with the original topology on U.

A4 SEPARATION PROPERTIES 211

PROOF. Since p is continuous and the identification topology is the largest onemaking p continuous, the original topology is included in the identificationtopology. If p is an open map and U is an open subset of 0 in the identifica-tion topology, p-'(U) is open in fl,, hence p(p-'(U)) = U is open in theoriginal topology. If p is a closed map, the same argument applies, with"open" replaced by "closed."

A4 Separation Properties and Other Ways of Classifying Topological Spaces

Topological spaces may be classified as to how well disjoint sets may beseparated, as follows. (The results of this section are generally discussed in afirst course in topology, and will not be proved.)

A4.1 Definitions. A topological space S2 is said to be a To space if givenany two distinct points x and y, at least one point has a neighborhood notcontaining the other; f2 is a T, space if each point has a neighborhood notcontaining the other; .0 is a T2 (or Hausdorff) space if x and y have disjointneighborhoods. This is equivalent to uniqueness of limits of nets (or filter-bases). Also, C is said to be a T3 (or regular) space if i2 is T2 and for eachclosed set C and point x 0 C, there are disjoint open sets U and V with x e Uand C c V; it is said to be a T4 (or normal) space if i2 is T2 and for each pairof disjoint closed sets A and B there are disjoint open sets U and V withAcUand BcV.

It follows from the definitions that Ti implies T;_1i i = 4, 3, 2, 1. Also, theT1 property is equivalent to the statement that {x} is closed for each x. Thespace f) is regular if it is Hausdorff and for each open set U and pointx e U, there is a V c- 0&(x) with V c U. The space S is normal if it is Haus-dorff and for each closed set A and open set U A, there is an open set Vwith AcVc VcU.

A metric space is T4, for if A and B are disjoint closed sets, we maytake U = {x: d(x, A) - d(x, B) < 0}, V = {x: d(x, A) - d(x, B) > 0}, whered(x, A) = inf{d(x, y) : y e A}.

A4.2 Urysohn's Lemma. Let S) be a Hausdorff space. Then 0 is normal iffor each pair of disjoint closed sets A and B, there is a continuous functionf: 0 -+ [0, 1] with f = 0 on A and f = 1 on B.

A4.3 Tietze Extension Theorem. Let t) be a Hausdorff space. Then S2 isnormal iff for every closed set A c C1 and every continuous real-valued

function f defined on A, f has an extension to a continuous real-valuedfunction F on n. Furthermore, if if I < c (respectively, If 1 < c) on A, then1FI can be taken less than c (respectively, less than or equal to c) on Q.

A4.4 Theorem. Let A be a closed subset of the normal space n. There is acontinuous f: t -> [0, 1] such that A =f -1{O} if A is a Gd , that is, a countableintersection of open sets.

A4.5 Definitions and Comments. A topological space n is second countableif there is a countable base for the topology, first countable if there is acountable base at each point (see A2.4). Second countability implies firstcountability but not conversely. Any metric space is first countable.

If 0 is second countable, it is necessarily separable, that is, there is acountable dense subset of Q. Furthermore, if n is second countable, it isLindelof, that is, for every family of open sets V1, i e I, such that Ut V1 = S2,there is a countable subfamily whose union is Q (for short, every open cover-ing of 0 has a countable subcovering).

In a metric space, the separable, second countable, and Lindelof proper-ties are equivalent as follows. Second countability always implies separabilityand Lindelof. If n is separable with a countable dense set {x1, x2 , ...), thenthe balls B(x;, r) = {y a f : d(y, xi) < r}, i = 1, 2,..., r rational, form acount-able base. If n is Lindelof, the cover by balls B(x, 1/n), x e S2, has a countablesubcover {B(x,,;, 1/n), i = 1, 2, ...}, and the sets B(x,,;, 1/n), i, n = 1, 2, ...,form a countable base.

This result implies that any space that is separable but not second count-able (or not Lindelof), or Lindelof but not second countable (or not separable)cannot be metrizable, that is, there is no metric whose topology coincideswith the original one.

A4.6 Definitions and Comments. A topological space .0 is said to be com-pletely regular if Q is Hausdorff and for each x e n and closed set C 0with x 0 C, there is a continuous f: fZ [0, 1] such that f (x) = 1 and f = 0 onC.

If A is a subset of the normal space n, then A, with the relative topology, iscompletely regular (this follows quickly from Urysohn's lemma). Also,complete regularity implies regularity. Thus complete regularity is in betweenregularity and normality; for this reason, completely regular spaces are some-times called T34 spaces.

Now if CIO is a Hausdorff space and F is the family of continuous mapsf: no -' [0, 1 ], let n = fl (If : f e , r) where each If = [0, 11. Let e:.00 -+ f bethe evaluation map, that is, e(x) = (f (x), f e ,F).

A5 COMPACTNESS 213

With the product topology on fl, e is continuous, and e is one-to-one if Fdistinguishes points, in other words, given x, y e ao, x :A y, there is an f e Fsuch that f (x) 56f (y). Finally, if .F distinguishes points from closed sets, thatis, if 1Q, is completely regular, then e is an open map of !no onto e(flo) c Q.[If is a net and e(x) x ++ x, there is a neighborhood U of xsuch that x is not eventually in U; that is, given m, there is an n >_ m withx 0 U. Choose f e °F with f (x) = 1 and f = 0 on flo - U. Then for eachm, we have f (x,,) = 0 for some n z m, so that f (x.) ++f (x). But thene(x) a contradiction.]

It follows that if f2o is completely regular, it is homeomorphic to a subsetof a normal space. [Since 0 is a product of Hausdorff spaces, it is Hausdorff;the Tychonoff theorem, to be proved later, shows that f2 is compact, andhence normal (see A5.3(d) and A5.4).]

Since e(Q0) is determined completely by F, we may say that the continu-ous functions are adequate to describe the topology of flo.

A5 Compactness

The notion of compactness appears in virtually all areas of mathema-tics. The original compactness result was the Heine-Borel theorem: If[0, 1] c U; Vi, where the Vi are open subsets of R, then in fact [0, 1) iscovered by finitely many V, , that is, [0, 1 ] c Uk= 1 V.,, for some V;. , ... , V;..In general, we have the following definition:

A5.1 Definition. The topological space f2 is compact if every open coveringof fl has a finite subcovering.

There are several ways of expressing this idea.

A5.2 Theorem. If fl is a topological space, the following are equivalent:

(a) f2 is compact.(b) Each family of closed sets C, c f2 with the finite intersection property

(all finite intersections of the Ci are nonempty) has nonempty intersection.Equivalently, for every family of closed subsets of 0 with empty intersection,there is a finite subfamily with empty intersection.

(c) Every net in f2 has an accumulation point in 0; in other words(by A2.6), every net in f2 has a subnet converging to a point of fl.

(d) Every filterbase in fl has an accumulation point in f2, that is (byA2.10), every filterbase in fl has a convergent filterbase subordinate to it, orequally well, every filter in f2 has an overfilter converging to a point of 0.

(e) Every ultrafilter in 0 converges to a point of Q.


PROOF. Parts (a) and (b) are equivalent by the duality between open andclosed sets. If is a net and sad is a filterbase whose elements coincide withthe tails of the net, then the accumulation points of the net and of the filter-base coincide, so that (c) and (d) are equivalent. Now (d) implies (e) byA2.12(c), and (e) implies (d) since every filter is included in an ultrafilter. Toprove that (b) implies (d), observe that if sd is a filterbase, the sets A e s:1,hence the sets A, A c sad, have the finite intersection property, so by (b), thereis a point x E n{A: A c-.4}. Finally, we prove that (d) implies (b). If theclosed sets C; have the finite intersection property, the finite intersections ofthe C; form a filterbase, which by (d) has an accumulation point x. But then xbelongs to all C,. I

It is important to note that if A c Q, the statement that every covering ofA by sets open in f has a finite subcovering is equivalent to the statementthat every covering of A by sets open in A (in the relative topology inheritedfrom S2) has a finite subcovering. Thus when we talk about a compact subsetof a topological space, there is no ambiguity.

The following results follow quickly from the definition of compactness.

A5.3 Theorem. (a) If S2 is compact and f is continuous on 11, then f (Q) iscompact.

(b) A closed subset C of a compact space ) is compact.(c) If A and B are disjoint compact subsets of the Hausdorff space Q,

there are disjoint open sets U and V such that A c U and B e V. In particular(take A = {x}), a compact subset of a Hausdorff space is closed.

(d) A compact Hausdorff space is normal.(e) If A is a compact subset of the regular space 0, and A is a subset of

the open set U, there is an open set V with A c V e V e U.

PROOF. (a) This is immediate from the definition of compactness.(b) If C is covered by sets U open in Q, the sets U together with a -- C

cover 0. By compactness there is a finite subcover.(c) If x 0 B and y e B, there are disjoint neighborhoods U,(x) and V(y)

of x and y. The V(y) cover B; hence there is a finite subcover V(y), i = 1, ... ,n. Then U' =n7 I and V' = U'=1 V(y;) are disjoint open sets withx e U' and B c V'. If we repeat the process for each x e A, we obtain disjointopen sets U(x) and V(x) as above. The U(x) cover A; hence there is a finitesubcover U(X), i = 1, ... , m. Take U = U"' I U(x,), v = n,,"= 1 V(x,).

(d) If A and B are disjoint closed sets, they are compact by (b); the resultthen follows from (c).

A5 COMPACTNESS 215

(e) If x e A, regularity yields an open set V(x) with x e V(x) andV(x) c U. The V(x) cover A; so for some x...... xn, we have

n n n

A c U V(xi) c U V(xi) = U V(xi) C U.i=1 i=1 i=1

The following is possibly the most important compactness result.

A5.4 Tychonoff Theorem. If S2i is compact for each i e 1, then ill _ fl , iiiis compact in the product topology.

PROOF. Let 97 be an ultrafilter in 0. If pi is the projection of i2 onto Ui,then by A2.12(b), pi(.F) is a filterbase that generates an ultrafilter in fli. Byhypothesis, pi(F) converges to some xi a Di, and it follows that ." -+ x =(xi, i e 1). [To see this, observe that if sal is a filterbase in i2 and {x"} is a netwhose tails are the elements of a, then the tails of {pi(x")} are the elements ofpi(d). Since x" -+ x if pi(x") -+ pi(x) for all i, by A3.2(a), it follows thatsd - x iff pi(d) -' pi(x) for all i.] The Tychonoff theorem now follows fromA5.2(e).

The following result is often used to infer that the inverse of a particularone-to-one continuous map is continuous.

A5.5 Theorem. Let f: i2 - i2, where i2 is compact, 01 is Hausdorff, and f iscontinuous. Then f is a closed map; consequently iff is one-to-one onto, it is ahomeomorphism.

PROOF. By A5.3(a), (b) and (c). I

A5.6 Corollary. Let p: i2o - 0 be an identification, and let h: ffo - ill becontinuous; assume hop-' is single valued, and hence continuous byA3.5. Assume also that f)o is compact (hence so is i2 because p is onto) andf2, is Hausdorff. If h op-1 is one-to-one onto, it is a homeomorphism.

PROOF. Apply A5.5 with f = h o `. I

Corollary A5.6 is frequently applied in constructing quotient spaces. Forexample, if one pair of opposite edges of a rectangle are identified, we obtaina cylinder. Formally, let 12 = {(x, y): 0 < x < 1, 0:5 y < 1). Define an

equivalence relation R on I2 by specifying that (0, y) be equivalent to (1, y),0 < y < 1, with the other equivalence classes consisting of single points.

Let h(x, y) = (ei2Rx, y), (x, y) ,E I2 ; h maps I2 onto a cylinder C. If p is thecanonical projection of I2 onto I2/R, then A5.6 implies that h o p-1 is ahomeomorphism of I2/R and C.

In some situations, for example in metric spaces, there are alternative waysof expressing the idea of compactness.

A5.7 Definition. A topological space 12 is said to be countably compact ifevery countable open covering of 0 has a finite subcover.

A5.8 Theorem. For any topological space S2, the following properties (a)-(d) are equivalent, and each implies (e). If S2 is T1, then all five properties areequivalent.

(a) Q is countably compact.(b) Each sequence of closed subsets of 92 with the finite intersection

property has nonempty intersection.(c) Every sequence in fl has an accumulation point.(d) Every countable filterbase in f) has an accumulation point.(e) Every infinite subset of fl has a cluster point.

PROOF. The equivalence of (a), (b), and (d) is proved exactly as in A5.2,and (d) implies (c) because the tails of a sequence form a countable filterbase.To prove that (c) implies (d), let sad = {A1, A2, ...} be a countable filterbase,and choose x e n"=1 A; , n = 1, 2. .... If x is an accumulation point of

and U e °IE(x), then for each n there is an m >- n such that a U, henceU n nm 1 A i 0 0. It follows that U n A # 0 for all n, and consequently xis an accumulation point of W. To prove that (c) implies (e), pick a sequence

of distinct points from the infinite set A and observe that if x is anaccumulation point of the sequence, then x is a cluster point of A. Finally, weshow that (e) implies (a) if S2 is T1. Say U1, U2, ... form a countable opencovering of S2 with no finite subcover. Choose x1 0 Ul ; having chosen dis-tinct x1, ... , xk with x; 0 U/= I Ui , j = 1, ..., k, then xl,... , xk all belong tosome finite union Ui=1 Ui, n k + 1; choose Xk+10 Ui=I U1 (hencexk+I 11 U1 U. and x1, ..., xk+I are distinct). In this way we form aninfinite set A = {x1, X2, ...} with no cluster point. For if x is such a point,x belongs to U for some n. Since n is T1, there is a set U e V(x) such thatU c U and xi 0 U, i = 1, 2, ..., n - 1 (unless xi = x). Since Xk 0 U. fork >- n, U contains no point of A distinct from x. I

A5 COMPACTNESS 217

A5.9 Definitions and Comments. The topological space Q is said to besequentially compact iff every sequence in ) has a convergent subsequence.By A5.8(c), sequential compactness implies countable compactness. In afirst countable space, countable and sequential compactness are equivalent.For if x is an accumulation point of the sequence and V1, V2, ... (withV,,.., = V. for all n) form a countable base at x, then for each k we mayfind nk >_ k such that e Vk. Thus we have a subsequence converging to x.

A5.10 Theorem. In a second countable space or a metric space, compactness,countable compactness, and sequential compactness are equivalent.

PROOF. A second countable space is Lindelof (see A4.5), so compactness andcountable compactness are equivalent. It is first countable, so countablecompactness and sequential compactness are equivalent; this result holds ina metric space also, because a metric space is first countable.

Now a sequentially compact metric space fl is totally bounded, that is,for each e > 0, 0 can be covered by finitely many balls of radius e. (If not,inductively pick x1, x2, ... with xi+1 0 U;=, B(xi, s); then can have noconvergent subsequence.) Thus for each positive integer n, ) can be coveredby finitely many balls l 1n), i = 1, 2, ..., k,,. If { U; , j e J} is an arbitraryopen covering of SZ, for each ball I/n) we choose, if possible, a set URiof the covering such that B(x,,i, 1/n). If x e fl, then x belongs to a ballB(x, a) included in some U;; hence x e B(x,,i, 1/n) a B(x, e) c U. for somen and i; therefore x e Thus the U,i form a countable subcover, and 1),which is countably compact, must in fact be compact. I

Note that a compact metric space is Lindelof, hence (see A4.5) is secondcountable and separable.

A5.11 Definition. A Hausdorff space is said to be locally compact if eachx E !Q has a relatively compact neighborhood, that is, a neighborhood whoseclosure is compact. (Its follows that a compact Hausdorff space is locallycompact.)

A5.12 Theorem. The following are equivalent, for a Hausdorff space i2:

(a) D is locally compact.(b) For each x e S2 and U e all(x), there is a relatively compact open set V

with x e V c V c U. (Thus a locally compact space is regular; furthermore,the relatively compact open sets form a base for the topology.)

(c) If K is compact, U is open, and K c U, there is a relatively compactopen set V with K c V c V c U.

PROOF. It is immediate that (c) implies (a), and (b) implies (c) is proved byapplying (b) to each point of K and using compactness. To prove that (a)implies (b), let x belong to the open set U. By (a), there is a neighborhood V1of x such that K = V1 is compact. Now K is compact Hausdorff, and henceregular, and x e U n V1, which is open in f2, and hence open in K. Thus(see A4. 1) there is a set W open in C1 such that x e W n K and the closure ofW n K in K, namely, W n K, is a subset of U n V1.

Now xe Wn V1 and Wn V1 c WnKcU, so V=Wn V1 is thedesired relatively compact neighborhood.

The following properties of locally compact spaces are often useful:

A5.13 Theorem. (a) Let S2 be a locally compact Hausdorff space. IfK c U c 52, with K compact and U open, there is a continuous f:.0 -+ [0, 1]such that f = 0 on K and f = 1 on 0 - U. In particular, a locally compactHausdorff space is completely regular.

(b) Let S2 be locally compact Hausdorff, or, more generally, com-pletely regular. If A and B are disjoint subsets of i2 with A compact and Bclosed, there is a continuous f: 11 -+ [0, 1] such that f = 0 on A and f = 1 on B.

(c) Let S2 be locally compact Hausdorff, and let A c U c 0, with Acompact and U open. Then there are sets B and V with A c V c B e U,where V is open and a-compact (a countable union of compact sets) and B iscompact and is also a G6 (a countable intersection of open sets). Consequently(take A = {x}) the a-compact open sets form a base for the topology.

PROOF. (a) Let K c V c V c U, with V open, V compact [see A5.12(c)).P is normal, so there is a continuous g: V-+ [0, 11 with g = 0 on K, g = 1 onT7- V. Define! = g on V. f = 1 on Q- V. (On V- V, g = 1 so! is well-defined.) Now f is continuous on 0 (look at preimages of closed sets), so f isthe desired function.

(b) By complete regularity, for each x e A, there is a continuousf,,: 0 -+ [0, 1] with f,(x) = 0, fx = 1 on B. By compactness,

n

A C U {x: fx,(x) < Hi=1

for some x1,...,xn.Let ;then g=1onBand0<g<IonA,so f = max(2g - 1, 0) is the desired function.

A5 COMPACTNESS 219

(c) By A.5.12(c), we may assume without loss of generality that U isincluded in a compact set K. By (b) there is a continuous f: Q -+ [0, 11 withf = 0 on A, f = I off U. Let V = {x: f(x) < ];), B = {x: f(x) < 1}. Then V isopen, B is closed, a.id A c V c B c U c K; hence B is compact. Now B is aGa because

B= n{x:f(x)<1+and V= U{x:f(x)<K,n=1 2 n n=1 2 n

so V is a-compact. (In fact V is a countable union of compact Ga's.) I

A5.14 Corollary. Let C be completely regular, A a compact Ga subset of Q.There is a continuous f: cZ - [0, 11 such that f -1{0} = A.

PROOF. If A = n', Un , the U" open, by A5.13(b) there are continuousfn : S2 - [0, 1 ] wi th fn = 0 on A and fn = 1 off Un . Let f =Yn 1 f, 2-n. I

A5.15 Theorem. If 0 is Hausdorff, the following are equivalent:

(a) 0 is locally compact Lindelof.(b) KI can be expressed as UOO 1 Un where U. is a compact subset of

Un+1 for each n.(c) rl is locally compact and a-compact.

PROOF. (a) implies (b): The relatively compact open sets form a base byA5.12(b), in particular they cover Q. Extract a countable subcover { V1, V2 , ...}using Lindelof, and let U1 = V1, and for n > 1, U. = Vn u W,, where W. is arelatively compact open set and Wn =) U"-' Vj.

(h) implies (c): We have g = Un 1 U", proving a-compactness. If x e 0,then x e U. for some n, with U. compact, proving local compactness.

(c) implies (a): If 0 = Un 1 Kn, K. compact, and {U;} is an opencovering of Q, extract a finite subcovering of each Kn, and put the setstogether to form a countable subcovering of 0. 1

The Urysohn metrization theorem, which we shall not prove, states thatfor a second countable space Q, metrizability and regularity are equivalent.This result yields the following corollary to A5.15.

A5.16 Theorem. If n is locally compact Hausdorff, then S2 is second count-able if f) is metrizable and a-compact.

Proof. If n is second countable, it is metrizable by A5.12(b) and the Urysohnmetrization theorem. Also, f) is Lindelof (see A4.5), hence is a-compact byA5.15. Conversely, if f) is metrizable and a-compact, then f) is Lindelof byA5.15, hence second countable (see A4.5).

Finally, we consider the important one point compactjcation.

A5.17 Theorem. Let 0 be locally compact Hausdorff, and let fl* = 0 U {oo},where oo stands for any element not belonging to II. Put the following topo-logy on fl*: U is open in n* if U is open in .0 or U is the complement (in Q*)of a compact subset of .0. Then:

(a) If U c 0, U is open in it if U is open in S2*; thus S2* induces theoriginal topology on 0.

(b) fl* is compact Hausdorff.

PROOF. Part (a) follows from the definition of the topology. To prove (b), let{ U3 be an open covering of fl*. Then oo belongs to some U, and the remainingUj cover the compact set a* - U;, so the cover reduces to a finite subcover.Since f is Hausdorff, distinct points of it have disjoint neighborhoods. Ifx e 0, let V be an open subset of it with x e V and V c Q, where V, theclosure of V in d2, is compact [see A 5.12(b)]. Then V and Q* - V are dis-joint neighborhoods of x and co.

A6 Semicontinuous Functions

If f,, f2, ... are continuous maps from the topological space .0 to theextended reals R, and f (x) increases to a limit f (x) for each x, f need not becontinuous; however, f is lower semicontinuous. Functions of this type playan important role in many aspects of analysis and probability.

A6.1 Definition. Let .0 be a topological space. The function f: ) -+A is saidto be lower semicontinuous (LSC) on Q if {x a S2: f(x) > a} is open in f foreach a e R, upper semicontinuous (USC) on 1Z iff {x a fl f (x) < a} is open inn for each a e R. Thus f is LSC if -f is USC. Note that f is continuous if itis both LSC and USC.

We have the following criterion for semicontinuity.

A6.2 Theorem. The function f is LSC on S2 if, for each net convergingto a point x e n, we have lim inf f f (x), where lim inf f means

A6 SEMICONTINUOUS FUNCTIONS 221

sup,, infk, f (xk). Hence f is USC iff lim sup f f(x) whenever x.-x. (Ina first countable space, "net" may be replaced by "sequence.")

PROOF. Let f be LSC. If x - x and b < f (x), then x e f `(b,(b, oo], an opensubset of S2, hence eventually x,, e.f -'(b, oo], that is, b eventually.Thus Jim inf f f (x). Conversely if x --+ x implies

_f(x),we show that V = {x: f(x) > a} is open. Let x,, --> x, where f(x) > a. Thenlim inf,,f(x,,) > a, hence f (x,,) > a eventually, that is, x e V eventually.Thus (see A2.4) V is open. I

We now prove a few properties of semicontinuous functions.

A6.3 Theorem. Let f be LSC on the compact space fl. Then f attains itsinfimum. (Hence iff is USC on the compact space S2, f attains its supremum.)

PROOF. If b = inff, there is a sequence of points x,, a 0 with f (x,,) -+ b. Bycompactness, we have a subnet x,,, converging to some x e 92. Since f is LSC,lim infk f (x.,) >_ f (x). But f (x,,,,) -+ b, so that f (x) < b; consequently f (x) _b. I

A6.4 Theorem. If f, is LSC on 12 for each i e I, then sup; f, is LSC; if I isfinite, then min; fj is LSC. (Hence iff; is USC for each i, then inf; f, is USC,and if I is finite, then max, f, is USC.)

PROOF. Let f = sup; f,; then {x: f(x) > a) = Ui E j {x: f,(x) > a); hence{x: f (x) > a) is open. If g = min(fl, f2 , ... , then

{x: g(x) > a) =n {x:f,(x) > a)I=1

is open. I

A6.5 Theorem. Let f: S2 -+ R, SZ any topological space, f arbitrary. Define

f (x) = lim inf.f(y), x e n;

that is,y-.x

f (x) = sup inf f (y),V yeV

where V ranges over all neighborhoods of x. [If S2 is a metric space, thenf(x) = suP.= I,2, ... infd(x,,),11 f(y)]

Then f is LSC on 0 and f 5 f; furthermore if g is LSC on Sl and g < f, theng < f. Thus f, called the lower envelope off, is the sup of all LSC functions thatare less than or equal to f (there is always at least one such function, namelythe function constant at - oo).

Similarly, if J(x) = lim sup,.x f (y) = infy sup,. y f(y), then f, the upperenvelope off, is USC andf >- f; in fact f is the inf of all USC functions that aregreater than or equal to f.

PROOF. It suffices to consider f. Let be a net in Sl with x -, x andlim inf f (x,,) f (x,.) >- inff(y),

so

f (x) = sup inf f (y) _ lim infy, g(y) > g(x) since g isLSC. [If sups infy. v g(y) < b < g(x), then for each V pick xy e V withg(xv) < b. If Vl < V2 means that V2 c V1, the xy form a net converging to x,while lim infy g(xy) < b < g(x), contradicting A6.2.]

It can be shown that if Sl is completely regular, every LSC function on itis the sup of a family of continuous functions. If Q is a metric space, thefamily can be assumed countable, as we now prove.

A6.6 Theorem. Let Sl be a metric space, f a LSC function on C. There is asequence of continuous functions f.: Sl --* R such that f T f. (Thus if f isUSC, there is a sequence of continuous functions f If) If if I < M < co, thef may be chosen so that I M for all n.

PROOF [following Hausdorff (1962)]. First assume f t 0 and finite-valued.If d is the metric on n, define g(x) = inf{f(z) + td(x, z): z e Sl}, where t > 0is fixed; then 0 < g < f since g(x) f (x) + td(x, x) = f (x).

If x, y e Sl, then f(z) + td(x, z) < f(z) + td(y, z) + td(x, y). Take theinf over z to obtain g(x) < g(y) + td(x, y). By symmetry,

I9(x) - 9(y)I < td(x, y),

hence g is continuous on Q.

A7 THE STONE-WEIERSTRASS THEOREM 223

Now set t = n; in other words let inf{f(z) + nd(x, z): z E S2}.Then 0 < f,, T h < f. But given e > 0, for each n we can choose z e S2 suchthat

f .(x) + s > f (z.) + nd(x, nd(x, z.).

But f ,,(x) + E < f (x) + e, and it follows that d(x, 0. Since f is LSC,lim inf,, . f f (x) ; thus f >f(x) - e eventually. But now

f (x) - 2E for large enough n.

It follows that 0 <f T f. If If 1 < M < oo, then f + M is LSC and nonnega-tive; if 0 f+M, then f. and lffI <M.

In general, let h(x) = in + arctan x, x e R; then h is an order-preservinghomeomorphism of R and [0, n]. If f is LSC, then h -f is finite-valued, LSC,and nonnegative, so that we can find continuous functions g such that

hof. Letf. Tf

A7 The Stone-Weierstrass Theorem

In this section, we consider the family C(12) of continuous real-valuedfunctions on a compact Hausdorff space Q. The set C(S1) becomes a Banachspace under the sup norm IIf II = sup{I f(x)I : x E C2}. If A c C((I), A is calleda subalgebra of C(S2) if for all f,, f2 e A, a, b e R, we have afl + bf2 e A andfl f2 e A. Thus A is a vector space under addition and scalar multiplication,and a ring under ordinary multiplication. The main theorem of this sectiongives conditions under which A is dense in C(S2). We assume throughout thatS2 has at least two points; the Stone-Weierstrass theorem will be seen to betrivial if S2 has only a single point.

A7.1 Lemma. The function given by f(x) = IxI, -1 < x:!5; 1, can be uni-formly approximated by polynomials having no constant term.

PROOF. If 0 < y < a <- 1, then 0 < y < y + J(a2 - y2) <- a. (For y2 - 2 y+ 2a - a2 decreases when 0 < y < 1, and its value when y = a is 0.) If wedefine yo = 0, i = y,, + I(a2 - the above argument shows that0 < y,, 5 yrt+, < a for all n. Let n -+ oo to obtain y. - y, where y = y +J(a2 - y2); hence y = a.

Thus, changing the notation, definePO(X)

= 0,Pn+,(x) =(x2

-P.2(x)),

- 1 -< x < I.

By induction, pn is a polynomial having no constant term, and 0 <- pn(x) <-(x2)'12 = Ixi. Since the domain [-1, 1] is compact andpn converges

monotonically to a continuous limit p, the convergence is uniform by Dini'stheorem (see 4.3.9 for the details).

A7.2 Lemma. Let A be a subset of C(S2) that is closed under the latticeoperations; that is, if f, g c- A, then max(f, g) e A and min(f, g) E A.

If f E C(Q) and f can be approximated at each pair of points by functionsin A [if x 96 y, there is a sequence of functions fn e A with f,(x) -+f(x) andf.(y) -fly)], then f e A.

PROOF. Given S > 0, x, y E 0, there is a function e A with If -fxyl f (z) - S). Since x E U(x, y), the U(x, y) cover f) for eachfixed y; by compactness, D = U1=1 U(xj, y) for some x,, ..., x,.

Let fy = min{f,y: 1 < i:5 n} e A by hypothesis, and let

n

V(y) = n V(x,, y)i=1

If xc fl, then xc U(x;,y)for some i,sof,(x)<f,y(x)<f(x)+6;if xcV(y),then ff,y(x) > f (x) - S for all i, hence fy(x) > f (x) - b.

But the V(y) cover S2 [note y e V(y)] so by compactness, S2 = Um I V(yi) forsome yl, ..., y.. Define fa = max{ fy, : 1 <- i:!5; m} e A by hypothesis.

If x e 12, then fa(x) <f(x) + 6; since x c- V(yi) for some i, we have fa(x) >f (x) - S. Thus 11f -fall <- S and, since 6 is arbitrary, it follows that f e A. 1

A7.3 Lemma. Let A be a subalgebra of C(S)), and assume that A is a closedsubset of C(SI). Then f e A implies If I e A; also, f g E A implies max(f, g) e Aand min(f, g) e A.

PROOF. If g e A and p is a polynomial with no constant term, then p c g e A.If f e A and Ilf II = M, set g =f/M. Then Ig(x)I < 1 for all x, so by A7.1,Ilp ° 9 - 191 II can be made arbitrarily small by appropriate choice of p;consequently IgI, hence If 1, belongs to A.

Nowf + g = max(f, g)+min(f, g), If - gI = max(f, g) - min(f, g). Addi-tion and then subtraction of these equations completes the proof. I

A7.4 Lemma. Let A be a subalgebra of C(S2) that separates points; that is,if x y, there is an f e A with f (x) f (y). Assume that for each x c- S2, thereis an f e A with f (x) 0 0.

A7 THE STONE-WEIERSTRASS THEOREM 225

If x, y e 92, x # y, and r and s are arbitrary real numbers, there is anf e A withf(x) = r,f(y) =s.

PROOF. If this is not the case, then for all f1i f2 e A, the equations

af1(x) + bf2(x) = r and af1(y) + bf2(y) = s

have no solution for a and b. Thus the determinant fl(x)f2(y) - f1(y)f2(x)is 0. In particular, if fl =f, f2 =f2' we obtain f(x)f2(y) = f (y)f 2(x), orf (x) f (y)(f (y) - f (x)) = 0 for all f e A. Thus if g e A and g(x) # g(y), we haveg(x)g(y) = 0. If, say, g(x) = 0, then (takingf1 = g in the above determinant)g(x)f2(y) = g(y)f2(x) for all f2 a A. Since g(x) = 0 we must have g(y) # 0;hence f2(x) = 0 for all f2 a A, contradicting the hypothesis. The argument issymmetrical if g(y) = 0. 1

A7.5 Stone- Weierstrass Theorem. Let A be a subalgebra of C(S2) thatseparates points.

(a) If for each x e 0, there is an f e A with f (x) # 0 (in particular, if theconstant functions belong to A), then A = C(1); thus A is (uniformly) densein C(S2).

(b) If all f e A vanish at a particular x e !Q (there can be at most one suchpoint since A separates points), then A = {g a C(Q): g(x) = 01.

PROOF. (a) Since A is an algebra, so is A; by A7.3, A is closed under thelattice operations. Let g e C(S2); if x, y e S2, x # y, then by A7.4 there is anf e A e A such that f(x) = r = g(x) and f(y) = s = g(y). Thus g can be ap-proximated at each pair of points by functions in A. By A7.2, g c -A; inother words, A = C(S2).

(b) If f (x) = 0 for all f e C(Q), let B be the algebra generated by A andthe constant functions; thus B = { f + c: f e A, c a constant function}. Then Bsatisfies the hypothesis of (a), so B = C(S2). Now if S > 0, g e C(Q), and g(x) =0, then for some f e A and some c we have Ilg - (1 + c)II < 6/2. But g(x) =f(x) = 0; hence Icy < 6/2, and therefore 11g -III < S. Thus g e A, so that{g a C(S2); g(x) = 0} c A. But the reverse inclusion holds by hypothesis,completing the proof. I

A7.6 Corollary. The Stone-Weierstrass theorem holds equally well withC(Q) taken as the collection of all continuous complex-valued functions on 0,if we add the hypothesis that whenever f e A, the complex conjugate f alsobelongs to A.

PROOF. (a) Let B c A be the set of functions in A that assume only realvalues. If x # y, there is an f =f, + if2 e A with f(x) # fly). Thus eitherf,(x) #f,(y) orf2(x) #f2(y) Nowf, = (f + f)/2 andf2 = (f - f)/2i belong toA by hypothesis; hence fl, f2 E B since f, and f2 are real-valued. Thus Bseparates points. Furthermore, if x e 0, then f(x) # 0 for some f e A; butIf I2 = ff a B, so B satisfies the hypothesis of A7.5(a). If f = fl + if2 e C(Q),then f, and f2 can be uniformly approximated by functions in B; hence f canbe uniformly approximated by functions in A.

(b) This follows from (a) exactly as in A7.5.

If 0 is the unit disk in the complex plane and A is the collection of allanalytic functions on S2, then A is a closed subalgebra of C(52) that separatespoints and contains the constant functions; but A # C(Q). Thus the hypothesisf e A implies f E A cannot be dropped in A7.6.

A8 Topologies on Function Spaces

In this section we examine spaces consisting of functions from an arbitraryset 52 to another set 521. Some structure will be imposed on 52, ; it can be ametric space or, more generally, a gauge space, which we now define.

A8.1 Definitions and Comments. A gauge space is a space 52, whosetopology is determined by a family -9 = -9(01) of pseudometrics; thus asubbase for the topology is formed by the sets Bd(x, b) = (y a 521: d(x, y) < S),x e 52,, S > 0, d e .9. [A pseudometric has all the properties of a metricexcept that d(x, y) can be 0 for x :A y.] Convergence of x to x in 521 meansd(x,,, all de-9.

Since Bd,(x, S) n BdZ(x, S) = Bd(x, S), where d is the pseudometricmax(d,, d2), it follows that the sets Bd. (x, S), d' = max(di...... di ), n = 1,2, ... , the dik a _Q, form a base for the topology. Denote by -9 + =9 ' (rl,) thecollection of all pseudometrics d+.

The gauge space 01 is Hausdorff if whenever x # y, there is a d e 0 withd(x, y) # 0; in this case, -9 is said to be separating.

The Hausdorff gauge spaces coincide with the completely regular spaces.To see this, let x belong to the open set U in the gauge space 52,. If de 9+and Bd(x, S) e U, letf(y) = min(l, d(x, y)/S). Then f is continuous, 0:5f:5 1,f(x) = 0, and f = 1 off U. Conversely, if 0, is completely regular, it can beembedded in a product of closed unit intervals (see A4.6). Now a product ofgauge spaces is a gauge space [take d(x, y) = di(xi, yi), where the d; are

A8 TOPOLOGIES ON FUNCTION SPACES 227

pseudometrics for the ith coordinate space], and a subspace of a gauge spaceis a gauge space, and the result follows.

If -9 consists of countably many pseudometrics dl, d2, ... , let

00

d = min(d", 2-");"=1

the single pseudometric d incudes the topology of 521; hence S21 is pseudo-metrizable (metrizable if Q, is Hausdori).

We now discuss function space topologies.

A8.2 Definitions and Comments. Let F(12, 521) be the collection of allfunctions from the set 0 to the gauge space C. If d is a family of subsets of52, the topology ,,, of uniform convergence on members of d has as subbasethe sets

(g e F(12, 121): sup d(g(x), f (x)) < 8},xeA

Aed, b>0, de-9.

Convergence relative to .d means uniform convergence on each A e sad.For example, if W = {12}, we obtain the topology of uniform convergence

on 52, also called the uniform topology; if fl is a topological space and sadconsists of all compact subsets, we obtain the topology of uniform convergenceon compact sets. If sd is the collection of all singletons {x}, x e S2, we have thetopology of pointwise convergence.

Adaptations of standard proofs in metric spaces show the following:

A8.3 Theorem. If C(1), (21) is the collection of all continuous maps from thetopological space Ci to the gauge space 01, then C(12,121) is closed in F(12,121)relative to the topology of uniform convergence on 52. In other words, auniform limit of continuous functions is continuous.

PROOF. If f" e C(52,121), f" -+f uniformly on 12, x0 a 92, and d e ..9, then

d(f(x),f(x0)) <_ d(f(x),f"(x)) + d(,,(x),f"(x0)) + d(f (x0),f(x0))

The result follows just as in the metric space proof. I

A8.4 Theorem. If f is a continuous mapping from the compact gauge space12 to the gauge space Q1i then f is uniformly continuous; that is, if d e 3(521)and s > 0, there is a d+ e -9 +(1)) and a 8 > 0 such that if x1i x2 a 12 andd+(x1, x2) < S, then d(f(x1), f(x2)) < e.

PROOF. If f is not uniformly continuous, there is an e > 0 and e e 2(Sfl) suchthat for all S > 0, d+ a 2+(S2), there are points x, y e S2 with d+(x, y) < (5but e(f (x), fly)) >_ 8. If we choose such (x, y) for each S > 0 and d e 2(12), weobtain a net with d(x,,, 0 but e for all n.(Take (61 i d) >_ (62, d') iff S1 5 a2 .) By compactness we find a subnet with

yn,.) -> (xo, yo) for some xo, yo a fl; thus f(x,,,) -->f(xo), f(Y,,k) -'f(Yo)by continuity. But 0, so d(xo, yo) = 0 and consequently,e(f(xo), f(yo)) = 0 by continuity off, a contradiction. I

We now prove a basic compactness theorem in function spaces.

A8.5 Arzela-Aseoli Theorem. Let 0 be a compact topological space, S21a Hausdorff gauge space, and G c C(12, f21), with the uniform topology.Then G is compact iff the following three conditions are satisfied:

(a) G is closed,

(b) {g(x): g e G} is a relatively compact subset of S21 for each x e 12, and(c) G is equicontinuous at each point of 92; that is, if e > 0, dc- 2(S)1),

x0 a 92, there is a neighborhood U of x0 such that if x e U, then

d(g(x), g(xo)) < e

for all geG.

PROOF. We first note two facts about equicontinuity.

(1) If M e F(0, S21), where Q is a topological space and l1 is a gaugespace, and M is equicontinuous at x0, the closure of M in the topology ofpointwise convergence is also equicontinuous at x0.

(2) If M is equicontinuous at all x E S2, then on M, the topology ofpointwise convergence coincides with the topology of uniform convergenceon compact subsets.

To prove (1), let f e M, f - f pointwise; if d e Q(S21), we have

d(f(x),f(x0)) < d(1f(x),f,(xo)) + d(1f(xo),.f(xo))

If S > 0, the third term on the right will eventually be less than 6/3 by thepointwise convergence, and the second term will be less than 6/3 for x in someneighborhood U of x0, by equicontinuity. If x e U, the first term is even-tually less than 6/3 by pointwise convergence, and the result follows.

To prove (2), let f, e M, f f pointwise, and let K be a compact subset ofn; fix 6 > 0 and d e 2' (S21). If x c- K, equicontinuity yields a neighborhood

A8 TOPOLOGIES ON FUNCTION SPACES 229

U(x) such that y c U(x) implies d(f,(y), f (x)) < 6/3 for all n. By compactness,K c U'= 1

U(x;) for some x1, ..., x,. Then

d(f(x),f.(x)) < d(f(x),f(x;)) + d(f(x1),f,(xi)) + d(f.(x1),f (x)).

If x e K, then x e U(x;) for some i; thus the third term on the right is less than6/3 for all n, so that the first term is less than or equal to 6/3. The second termis eventually less than 6/3 by pointwise convergence, and it follows thatf -+f uniformly on K.

Now assume (a)-(c) hold. Since G fXEn{g(x): g e G}, which is point-wise compact by (b) and the Tychonoff theorem A5.4, the pointwise closureGo of G is pointwise compact. Thus if is a net in G, there is a subnetconverging pointwise to some g e Go. By (c) and (1), Go is equicontinuous ateach point of f2; hence by (2), the subnet converges uniformly to g. But g iscontinuous by A8.3; hence g e G by (a).

Conversely, assume G compact. Since S21 is Hausdorff, so is C(f2, f )[as well as F(f, S21)], hence G is closed, proving (a). The map g -+ g(x) of Ginto f is continuous, and (b) follows from A5.3(a). Finally, if G is notequicontinuous at x, there is an E > 0 and a d e k(ill) such that for eachneighborhood U of x there is an xu e U and gu e G with d(gv(x), g11(x11)) >_ E.If U >_ V means U V, the gu form a net in G, so there is a subnet converginguniformly to a limit g e G. But xu --> x; hence g11(xu) - g(x), a contradiction.

[The last step follows from the fact that the map (x, g) -, g(x) of fl x G intof21 is continuous. To see this, let x -> x, and g -+g uniformly on fl; ifd e .9(S21), then

d(gn(xn), g(x)) <_ d(gn(xn), g(xn)) + d(g(x ), g(x)).

The first term approaches 0 by uniform convergence, and the second termapproaches 0 by continuity of g.]

If G satisfies (b) and (c) of A8.5, but not necessarily (a), the closure of Gin the uniform topology satisfies all three hypotheses, and hence is compact.In the special case when S21 is the set of complex numbers, C(C2, .01) = C(f) isa Banach space, and in particular, compactness and sequential compactnessare equivalent. We thus obtain the most familiar form of the Arzela-Ascolitheorem: If f1, f2 , ... is a sequence of continuous complex-valued functionson the compact space fl, and if the f are pointwise bounded and equicon-tinuous, there is a uniformly convergent subsequence. In fact the f must beuniformly bounded by the continuity of the map (x, g) -- g(x) referred toabove.

A9 Complete Metric Spaces and Category Theorems

A9.1 Definitions and Comments. A metric space (or the associated metricd) is said to be complete iff each Cauchy sequence (that is, each sequencesuch that d(x", xm) --> 0 as n, m - oo) converges to a point in the space.

Any compact metric space is complete since a Cauchy sequence with a con-vergent subsequence converges.

It is important to recognize that completeness is not a topological property.In other words, two metrics, only one of which is complete, may be equivalent,that is, induce the same topology. As an example, the chordal metric d' on thecomplex plane [d'(z1, z2) is the Euclidean distance between the stereographicprojections of z, and z2 on the Riemann sphere] is not complete, but theequivalent Euclidean metric d is complete. Note also that the sequence z" = nis d'-Cauchy but not d-Cauchy, so the notion of a Cauchy sequence is nottopological.

The subset A of the topological space 0 is said to be nowhere dense if theinterior of A is empty, in other words, if the complement of A is dense. Aset B c 0 is said to be of category 1 in (1 if B can be expressed as a countableunion of nowhere dense subsets of a; otherwise B is of category 2 in Q.

The following result is the best known category theorem.

A9.2 Baire Category Theorem. Let (S), d) be a complete metric space. If A.is closed in (I for each n = 1, 2, ..., and U"'=1 A. = S2, then the interiorA"° is nonempty for some n. Therefore i2 is of category 2 in itself.

PROOF. Assume A"° = 0 for all n. Then Al # f), and since S2 - A, is open,there is a ball B(x1, S1) c 0 - A, with 0 < a1 < 1. Now B(x1i 2a1) ¢ A2since A2° = 0; therefore B(x1,161) - A2 is a nonempty open set, henceincludes a ball B(x2, l52) with 0 < b2 < 1. Inductively, we find B(xn, an),0<b,,<2-", such that B(xn, S") c B(x"_1, 8"_1) - A" [in particular,B(x", 8n)rAn=0]. Now ifn<m,

m-1 m-1d(xn a x,,,):5 F d(xi a xi,. 1) < Y, 2 -'-+ 0 as n, m --' oo,

i=n i=n

so by completeness, xn converges to some x e 0. Since xk a B(xn, 48") fork > n, we have x e B(xn, Sn) for all n, hence x cannot belong to any An, a con-'tradiction.

In A9.2, if for each n = 1, 2, ..., U. is an open dense subset of S2, thenn,

=

1U.0 0, for if the intersection is empty, then 0 is the union of the

nowhere dense sets f) - Un. We prove a stronger result:

A9 COMPLETE METRIC SPACES AND CATEGORY THEOREMS 231

A9.3 Theorem. If U. is an open dense subset of the complete metric space 12(n = 1, 2, ...), then n 1 U is dense.

PROOF. Consider any ball B. Then B is a closed subset of the complete metricspace n, hence B is complete. Now U. n B is dense in B since B is open;hence U n B (and therefore U n B) is dense in B. By the remark after A9.2,(n 1 Un) n Bo 0. Thus n' 1 U. meets every closed ball, hence everyopen ball, hence every nonempty open set. I

Other important spaces satisfy the conclusion of A9.3:

A9.4 Theorem. If U is an open dense subset of the locally compact Haus-dorff space fl (n = 1, 2, ...), then nn 1 U. is dense.

PROOF. Let U be a nonempty open subset of f1; we must show thatU n nn

1U,,# 0. Since U1 is dense, U n U1 # 0, and by local compactness

(see A5.12), there is a nonempty, relatively compact, open set V1 with V1 cU n U1.

Now V1 n U2 0 so we may choose a nonempty, relatively compact,open set V2 with V2 c V1 n U2. Inductively we select nonempty, relativelycompact, open sets V. with V. c

V decrease with it, they have the finite intersection property, andsince all V are subsets of the compact set V1. n-, V, # 0. If x belongs tothis intersection, then x e U n n 1 U..

A9.5 Definition. The topological space f2 is said to be a Baire space if theintersection of countably many open dense subsets of n is dense in f2. ByA9.3 and A9.4, complete metric spaces and locally compact Hausdorffspaces are Baire spaces.

A9.6 Theorem. (a) SZ is a Baire space if for each A of category 1 in Q,A° = 0; that is, fl - A is dense.

(b) Let f2 be a Baire space. If fZ = Un 1 A. where the A. are closed,then 0 0 for some n. Thus f2 is of category 2 in itself.

PROOF. (a) Assume fZ is a Baire space, and let A = UH 1 A,,, where the Anare nowhere dense. Then K) - A = n 1 (0 - A.) D n 1(Q - A.), which isdense by hypothesis.

Conversely, if every set of category 1 in fZ has empty interior, let U. bedense in f2, n = 1, 2..... Then El - U,, is (closed and) nowhere dense, so

Un 1 (S2 - U") is of category 1 in S2. By hypothesis, the complement of thisset, namely (1 , U., is dense.

(b) If A"° = 0 for all n, then fl is of category 1 in itself, contradicting(a). I

There are spaces that cannot be expressed as a countable union of now-where dense sets, but are not Baire spaces. (As an example, consider thefree union of a Baire space and a non-Baire space.)

A9.7 Definition. The topological space Q is said to be topologically completeiff there is a complete metric don 0 that induces the given topology.

The following sequence of results is aimed at characterizing topologicallycomplete spaces.

A9.8 Theorem. Let (0, d) be a complete metric space. If U is an open subsetof S2, U is topologically complete.

PROOF. If x, y e U, let d'(x, y) = d(x, y) + If(x) -f(y)I, where f (x) =[d(x, i) - U)]-'. Then d' is a metric on U, and d-convergence is equivalentto d'-convergence by continuity off. Thus d' induces the original topology. If{x"} is d'-Cauchy, it is d-Cauchy; hence it converges (relative to d) to somex e !Q. But x e U; for if not, then f(x") -+f(x) = oo, contradicting {x"}d'-Cauchy. Since d and d' are equivalent on U, x,, converges to x relative tod'. I

A9.9 Theorem. Let (0, d) be a complete metric space, and U., n = 1, 2, ... ,open subsets of n. If A = l , U", that is, if A is a G5, then A, with therelative topology, is topologically complete.

PROOF. Let d,, be a complete metric for U,, with d,, equivalent to d [d,, existsby A9.8; we may assume that d,, < 1, for if not replace d,, by d"/(1 + d").]Define g: A B = fl 1 U; by g(x) = (x, x, ...), and define a metric d° on Bby d°(x, y)= Y;°_ 2-`d,(x,, y;). Then d° induces the product topology on B,and a sequence {P)} in B is d°-Cauchy if {x;")} is d; Cauchy for each i. Itfollows that d° is complete. (Thus we have shown that a countable product oftopologically complete spaces is topologically complete.)

Now g is a homeomorphism of A and g(A) c B, and in fact g(A) is closedin B. For if (x", x", ...) - y e B, then for each i we have d,(x", y,) - 0 asn - oo, so that x" -+ y, . But then y, is the same for all i, say y, = yo. Since

A9 COMPLETE METRIC SPACES AND CATEGORY THEOREMS 233

yi e U; for each i we have yo e n? , U; =A, hence y = (yo, yo, ...) eg(A),proving g(A) closed. Since B is complete, so is g(A), and since g is a homeo-morphism, A is topologically complete. I

It follows from A9.9 that the irrational numbers are topologically com-plete. In fact, a complete metric for the set of irrationals in (0, 1) is given byd(.a, a2 ..., b, b2, ...) = 1/n, where n is the first integer for which a 0 b (inthe decimal expansion).

A9.10 Theorem. Let A be a topologically complete subspace of the Haus-dorfl space 0. If A is dense in Q, then A is a Ga .

PROOF. Let U. = {x a S2: for some neighborhood V of x, the diameter ofV n A is less than 1/n}. (The diameter is calculated using a particular com-plete metric d for A.) If x e U,,, then x c- V c U,,, so U,, is open in Q. We aregoing to show that A =no , U,,.

Let x e A, and let Vo = {y e A: d(x, y) < 1/4n). Then Vo is open in A;hence Vo = V n A, where V is open in S2. Now x e Yo c V, and if y, z e Vo,we have d(y, z) < d(y, x) + d(x, z) < 1/2n, so that Vo has diameter less thanor equal to 1/2n < 1/n. It follows that xe nn , U,,.

Conversely, assume x e nn , Un. Since A is dense, there is a net xi e Awith xi -+ x. For each n, let V be a neighborhood of x such that V n A hasdiameter less than 1/n. Eventually xi e V, and it follows that for someindex i,, we have d(xi, x;) < 1 In for i, j >: in. (In other words, the net isCauchy.) It may be assumed that i,,< in,, for all n; then the xi form aCauchy sequence which converges to a limit y e A by completeness. Since S2is Hausdorfi, we have y = x; hence x e A.

We now prove the main theorem on topological completeness.

A9.11 Theorem. Let n be a metrizable topological space. Then S2 is topolog-ically complete if it is an absolute Ga; in other words, 0 is a G,, in everymetric space in which it is topologically embedded.

PROOF. If S2 is topologically complete and is embedded in the metric spaceQ,, then 0 is dense in 31; so by A9.10, S2 is a Ga in 31. Thus 12 can be expressedas nR , W,,, where W = U,, n I2, U,, open in 52,. But 0 is a closed subset ofthe metric space 52,, so S2 is a G,, in Q, (rl = nn , {x a S2, : d(x, S2) < 1/n)).It follows that S2 is a Ga in 52,.


Conversely, let fl be an absolute G. Embed (I in its completion (weassume as known the standard process of completing a metric space byforming equivalence classes of Cauchy sequences). By A9.9, f) is topologicallycomplete. I

We now establish topological completeness for a wide class of spaces.

A9.12 Theorem. A locally compact metric space is topologically complete.

PROOF. If Q is such a space and KI is its completion, then 4 is open in S2. For ifx e ), there is, by local compactness, an open (in S2) set V with x e V n S2 andV n (I compact. In fact V c S2, proving D open. For if y e V and y 0.0,then for each Ue °W(y), U n V n f A0 since 0 is dense; consequently,U n V n fl # 0. The sets U n V n f, U e all(y), form a filterbase -4 inV n fl converging to y, and since S1 is Hausdorff, y is the only possible accu-mulation point of .411. But y 0 V n 0, contradicting compactness of V n S2[see A5.2(d)]. By A9.8, f2 is topologically complete.

A10 Uniform Spaces

We now give an alternative way of describing gauge spaces.

A10.1 Definitions and Comments. Let V be a relation on the set fl, that is,a subset of fl x (I. Then V is called a connector if the diagonal

D = {(x, x): x e 0}

is a subset of V. A nonempty collection . of connectors is called a uniformityif

(a) for all VL, V2 e .., there is a We Y with W c VL n V2 (in otherwords W is a filterbase), and

(b) for each V e, there is a W e . with WW -' c V.

[The relation WW-' is the composition of the two relations W and W -';that is, (x, z) e WW -' if for some y e Q we have (x, y) e W -' and (y, z) a W;(x, y) e W ' means (y, x) e W.]

If in addition, .*' is a filter, that is, if V e .e and V c W, then W e .y,then .*' is called a uniform structure. In particular, if _* is a uniformity, thefilter generated by ..£° is a uniform structure.

A10 UNIFORM SPACES 235

As an example, let -9 be a family of pseudometrics own, and let W consistof all sets of the form V = {(x, y): d+(x, y) <S}, b > 0, d+ e _Q+; then . is auniformity on fl. [To see that condition (b) is satisfied, let

W = {(x, y): d+(x, y) <S/2};

then WW - 1 c V by the triangle inequality.]By analogy with the pseudometric case, if V belongs to the uniformity W,

we say that x and y are V-close if (x, y) e V.Two uniformities W, and .7°2 are said to be equivalent if they generate

the same uniform structure; in other words, given V2 a -W2, there is aV, e *, with V, c V2 and given W, a .W,, there is a W2 E 2 with W2 c W1.Two uniform structures that are equivalent must be equal; hence there isexactly one uniform structure in each equivalence class of uniformities.

If Y is a uniform structure and V, W e W, WW -1 c V, then if D is thediagonal, we have

W-1 = DW-1 cWW-1 V,

hence W c V -1, so that V a .. But then

V n V e (and V n V-1 V).

Now U = V n V is symmetric, that is, U = U -1. Thus if . is a uniformstructure, the symmetric sets in .; form a uniformity that generates 0, sothat every uniformity is equivalent to a uniformity containing only symmetricsets.

We are going to show later that every uniform structure is generated bya uniformity corresponding to a family of pseudometrics. As a preliminary,we establish the following result:

A10.2 Metrization Lemma. Let U" , n = 0, 1, 2, ..., be a sequence of sub-sets of 0 x S2 such that each U,, is a connector, U0 = 0 x f), and U,3+1(= U"+1U"+lU"+1) e U,, for all n. Then there is a function d from S2 x S2to the nonnegative reals, such that d satisfies the triangle inequality andU. c {(x, y): d(x, y) < 2-") c U"_1 for all n = 1, 2. .... If each U. is sym-metric, there is a pseudometric satisfying this condition.

PROOF. Define f(x, y) = 0 if (x, y) e U. for all n; f(x, y) = # if (x, y) a U0 -U1 ; f (x, y) = I if (x, y) e U, - U2, and in general, f (x, y) = 2-n if(x, y) e U"_, - U". Since U"+, c U.+, c U,,, f is well-defined on C2 x fl;furthermore,

(x, y) e U,, if f (x, y) <2 -". (1)

If x, y e 0, let d(x, y) = inf Y"=0 f(xi, xi+1), where the inf is taken overall finite sequences x0, x1, ..., xn+1 with xo = x, x"+1 = Y

Since a chain from x to y followed by a chain from y to z yields a chainfrom x to z, d satisfies the triangle inequality, and since d(x, y) <- f(x, y),U. c {(x, y): d(x, y) < 2 -") by (1). If the U. are symmetric, then so is d;hence d is a pseudometric.

Now we claim that

n

f(xo, xn+1) <- 2 Y f(xi, xi+1) for any xo, ... , xn+1 (2)i=O

This is clear for n = 0, so assume the inequality for all integers up ton - 1. Let a = Y"=0f(xi, xi+1) If a = 0, then (xi, xi+1) e n o Un, andsince Ui3 c Ui_1 we have (xo, xn+1) e U. for all i, and there is nothing furtherto prove. Thus assume a > 0.

Let k be the largest integer such that Yk=o f(xi, x;+1) < a12 [iff(xo, x1) >a/2, take k = 0]. Then Y+=o f(xi, xi+1) > a/2, so y"=k+1f(xi, xi+1) <_ a/2.By induction hypothesis, f(xo, xk) < 2 Ek=o f(xi, xi+1) < 2a/2 = a and(xk+l , xn+1) C 2 Y-i=k+ls(xi, xi+l) 2a/2 = a Also,

n

f(xk, xk+1) ! Y f(xi, xi+l) = a.i=0

Let m be the smallest nonnegative integer such that 2-' :!9 a. If m _< 2,then (2) holds automatically since f <_ 3; thus assume m > 2. Then 2-' :5;a < 2`(in-1), so f(xo, xk) < a < 2-(rn-1), and therefore by (1), (xo, xk)[and similarly, (Xk, xk+l), (xk+1, xn+1)] belongs to Uni_1. Thus (xo, xn+1) eUn3,_1 C which implies that f(xo, xn+1) < 2-("'-2); that is,

f(x0, x"+1) :2-(m-1) < 2a,

proving (2).Finally, if d(x, y) < 2-", then for any chain from x to y, (2) yields (with

x0 = x, xn+1 = y)f(x, y) <_ 2d(x, y) < 2-("-1), so (x, y) e Un-1. 1

A10.3 Metrization Theorem. If ,° is a uniformity, the following conditionsare equivalent:

(a) . is pseudometrizable; that is, there is an equivalent uniformitygenerated by a single pseudometric. (See the example in A10.1 for the con-struction of the uniformity corresponding to a family of pseudometrics.)

(b) . has a countable base; that is, there are countably many sets Vin the uniform structure generated by .; ' such that if U e A, then V" c Ufor some n.

PROOF. If .' is pseudometrizable with pseudometric d, the sets

{(x, y): d(x, y) < r},

r rational, form a countable base. Conversely, assume Al has a countable baseVo, V1, V2, .... We may assume that the V" are symmetric (if not, considerV. n V.'), V. c Vn+1 for all n (if not, consider n:=o V;), and V. = 0 x I.Take U0 = V0, U1 a symmetric set in the uniform structure .°F generatedby . such that U13 c V1 (c Uo); let U2 be a symmetric set in suchthat U2' r-- V2 n U1. In general, let U. be a symmetric set in ..y such thatU"3 c V" n U"_1. (To see that the U" exist, let W. be a symmetric set in Fwith W"2 C V" n U"_ I. If U. is a symmetric set in F such that U"2 W",then U"3 = U"3D c U,,4 c W"2 C V" n U"_1. Also, since U. (- UU3 c V",the U,, form a base for .*°.)

By A10.2, there is a pseudometric d such that U. c {(x, y): d(x, y) < 2-"} eU"_1i n = 1, 2, .... It follows that . is equivalent to the uniformity of d. I

We now construct a topology arising from a uniformity.

A10.4 Definitions and Comments. Let . be a uniformity. The topologyinduced by or is defined by specifying the system Yl'(x) of overneighborhoodsof each point x. If V e -°, let V(x) = {y: (x, y) e V}. We take $ (x) as theclass of all oversets of the sets V(x), V e . °.

It is immediate that the first three axioms for an overneighborhood systemare satisfied (see A1.1). We verify the last axiom [A1.l(d)]. If V e .*', let Ube a symmetric set in the uniform structure generated by Y e such that U2 C V.If W e .3t° and W c U, we claim that W(y) c V(x) for each y e W(x). Forif z e W(y), then (y, z) e W; but also (x, y) e W, and hence (x, z) E W' cU 2 c V, so z e V(x), as asserted. Thus V(x) e V(y) for ally e W(x) C V(x), asrequired in A I. I (d).

It follows that there is a unique topology having (for each x) the setsV(x), V e .y°, as a base for the overneighborhoods of x. Furthermore, anyuniformity equivalent to W will induce the same topology. A topologicalspace whose topology is determined in this way is called a uniform space.

In particular, let .9 be a family of pseudometrics on 0, and let .-t° consistof all sets V = {(x, y): d+(x, y) < S}, S > 0, d+ E then

V(x)={y:d+(x,y)<S}.

Thus the topology induced by Y e coincides with the gauge space topologydetermined by -9 (see A8.1). In other words, every gauge space is a uniformspace. Conversely, every uniform space is a gauge space, as we now prove.

A10.5 Theorem. Let 3F be a uniform structure, and let A' be the uniformitygenerated by the pseudometrics d such that the sets Vda = {(x, y): d(x, y) < 8}belong to .F for all b > 0. Then F and .*' are equivalent, so that the topologyinduced by :" is identical to the gauge space topology determined by thepseudometrics d.

PROOF. It follows from the definition of .*' that . c .F. Now if U is asymmetric set in F, set U0 = S2 x f2, U, = U, U2 a symmetric set in F suchthat U23 c U1, and in general, let U. be a symmetric set in .°F such that

C By A10.2, there is a pseudometric don d2 with

U. C Vd.2-^ C Un-1

for all n. Since .F is a uniform structure, V2 - e .F for all n, hence V db a F'for all S > 0. But U2 C_ Vd, 114 c U1 = U, proving the equivalence of .F and.r. I

Since uniform spaces and gauge spaces coincide, the concept of uniformcontinuity, defined previously (see A8.4) for mappings of one gauge space toanother, may now be translated into the language of uniform spaces. Iff. 12 -- 521, where f2 and n, are uniform spaces with uniformities ?l and f-,f is uniformly continuous iff given V e V, there is a U e'W such that (x, y) e Uimplies (f(x), f(y)) e V.

We now consider separation properties in uniform spaces.

A10.6 Theorem. Let f) be a uniform space, with uniformity .*'. The follow-ing are equivalent:

(a) (1{ V: V e 0} is the diagonal D.(b) C1 is T2.(c) f)isT1.(d) S2 is To.

PROOF. If (a) holds and x 96 y, then (x, y) 0 V for some V e Y. If W is asymmetric set in the uniform structure generated by .', and W2 c V, thenW(x) and W(y) are disjoint overneighborhoods of x and y, proving (b). If (d)holds and x y, there is a set V e .W such that y 0 V(x) [or a set W e ay withx 0 W(y)]. But then (x, y) t V for some V e *, proving (a). I

Finally, we discuss topological groups and topological vector spaces asuniform spaces.

A topological group is a group on which there is defined a topology whichmakes the group operations continuous (x,, -* x, y - y implies x y -+ xy;


x -+ x implies xK ' -+ x '). Familiar examples are the integers, with ordinaryaddition and the discrete topology; the unit circle {z: Izi = 1) in the complexplane, with multiplication of complex numbers and the Euclidean topology;all nonsingular n x n matrices of complex numbers, with matrix multiplica-tion and the Euclidean topology (on R"2). If a = Ili S , where the Di aretopological groups, then n is a topological group with the product topologyif multiplication is defined by xy = (x,y;, i e I).

A10.7 Theorem. Let n be a topological group, and let .*' consist of allsets VN = {(x, y) E 11 x 0: yx-' e N}, where N ranges over all overneigh-borhoods of the identity element e in Q. Then -ye is a uniformity, and thetopology induced by .f° coincides with the original topology. In particular,if 12 is Hausdorif, it is completely regular.

PROOF. The diagonal is a subset of each VN, and Vs n VM = VN n M, so onlythe last condition [A10.1(b)] for a uniformity need be checked. Letf(x, y) _xy-I, a continuous map of 92 x 0 into g (by definition of a topologicalgroup). If N is an overneighborhood of e, there is an overneighborhood Mof e such that f(M x M) c N.

We claim that VM VM' c VN. For if (x, y) e VM' and (y, z) a VM, then(y, x) e VM; hence xy-' e M and zy-' a M. But zx-' = (zy-')(yx-') =(zy-')(xy-')-' ef(M x M) c N. Consequently, (x, z) e VN, so ,Y is auniformity.

Let U be an overneighborhood of the point x in the original topology. IfN = Ux-' ={yx-':ye U), then VN(x)={y:(x,y)eVN}={y:yx-'eN)=Nx = U; therefore U is an overneighborhood of x in the topology induced by

Conversely, let U be an overneighborhood of x in the uniform spacetopology; then VN(x) c U for some overneighborhood N of e. But VN(x) =Nx, and the map y - yx, carrying N onto Nx, is a homeomorphism of S2with itself. Thus Nx, hence U, is an overneighborhood of x in the originaltopology. I

A10.8 Theorem. Let f) be a topological group, with uniformity .;t° asdefined in A 10.7. The following are equivalent:

(a) . is pseudometrizable (see A10.3).(b) S2 is pseudometrizable; that is, there is a pseudometric that induces

the given topology.(c) f2 has a countable base of neighborhoods at the identity e (hence

at every x, since the map N - Nx sets up a one-to-one correspondence be-tween neighborhoods of e and neighborhoods of x).


PROOF. We obtain (a) implies (b) by the definition of the topology induced by,e (see A10.4). Since every pseudometric space is first countable, it followsthat (b) implies (c). Finally, if (c) holds and the sets N1, N2, ... form a count-able base at e, then by definition of Ye, the sets VN,, VNz .... form a countablebase for 0; hence, by A10.3, 0 is pseudometrizable. I

A topological vector space is a vector space with a topology that makesaddition and scalar multiplication continuous (see 3.5). In particular, atopological vector space is an abelian topological group under addition. Theuniformity -ye consists of sets of the form VN = {(x, y): y - x e N), where N isan overneighborhood of 0. Theorem A10.8 shows that a topological vectorspace is pseudometrizable if there is a countable base at 0.

The following fact is needed in the proof of the open mapping theorem (see3.5.9):

A10.9 Theorem. If L is a pseudometrizable topological vector space, there isan invariant pseudometric [d(x, y) = d(x + z, y + z) for all z] that inducesthe topology of L.

PROOF. The pseudometric may be constructed by the method given in A10.2and A10.3, and furthermore, the symmetric sets needed in A10.3 may betaken as sets in the uniformity Jr itself rather than in the uniform struc-ture generated by Ye. For if N is an overneighborhood of 0, so is -N ={-x: x e N) since the map x -p -x is a homeomorphism. Thus M =N n (- N) is an overneighborhood of 0. But then VM is symmetric andVMcVN.

Now by definition of VM , we have (x, y) e V. iff (x + z, y + z) e VM forall z, and it follows from the proof of A10.2 that the pseudometric d isinvariant.

Bibliography

Apostol, T.M., " Mathematical Analysis." Addison-Wesley, Reading, Massachusetts,1957.

Bachman, G., and Narici, L., " Functional Analysis." Academic Press, New York, 1966.Billingsley, P., "Convergence of Probability Measures." Wiley, New York, 1968.Dubins, L., and Savage, L., "How to Gamble If You Must." McGraw-Hill, New York,

1965.Dugundji, J., "Topology." Allyn and Bacon, Boston, 1966.Dunford, N., and Schwartz, J. T., " Linear Operators." Wiley (Interscience), New York,

Part 1, 1958; Part 2, 1963; Part 3, 1970.Halmos, P. R., " Measure Theory." Van Nostrand, Princeton, New Jersey, 1950.Halmos, P. R., " Introduction to Hilbert Space." Chelsea, New York, 1951.Halmos, P. R., " Naive Set Theory." Van Nostrand, Princeton, New Jersey, 1960.Hausdorff, F., " Set Theory." Chelsea, New York, 1962.Kelley, J. L., and Namioka, I., " Linear Topological Spaces." Van Nostrand, Princeton,

New Jersey, 1963.Liusternik, L., and Sobolev, V., "Elements of Functional Analysis." Ungar, New York,

1961.LoBve, M., "Probability Theory." Van Nostrand, Princeton, New Jersey, 1955; 2nd ed.,

1960; 3rd ed., 1963.Neveu, J., " Mathematical Foundations of the Calculus of Probability." Holden-Day,

San Francisco, 1965.Parthasarathy, K., "Probability Measures on Metric Spaces." Academic Press, New York,

1967.Royden, H. L. "Real Analysis." Macmillan, New York, 1963; 2nd ed., 1968.Rudin, W., " Real and Complex Analysis." McGraw-Hill, New York, 1966.Schaefer, H., " Topological Vector Spaces." Macmillan, New York, 1966.Simmons, G., "Introduction to Topology and Modern Analysis." McGraw-Hill, New

York, 1963.Taylor, A. E., "Introduction to Functional Analysis." Wiley, New York, 1958.Titchmarsh, E. C., "The Theory of Functions." Oxford Univ. Press, London and New

York, 1939.Yosida, K., "Functional Analysis." Springer-Verlag, Berlin and New York, 1968.

241

Solutions to Problems

Chapter 1

Section 1.1

2. We have iim sup. An = (-1, 1 ], lim inf,, A = {0}._3. Using lim sup,, An = {w: co e A. for infinitely many n}, lim infra An

{co: co e An for all but finitely many n), we obtain

liminfAn ={(x,y): x2 +y2 < 1},

lim sup A. ={(x, y): x2 + y2 < 1) -{(0, 1), (0, -1)}.n

4. If x = lim sup., x., then lim supra An is either (- co, x) or (- oo, x].For if y e An for infinitely many n, then xn >y for infinitely many n;hence x >- y. Thus lim sup. An c (- oo, x]. But if y < x, then xn > y forinfinitely many n, so y e lira sup. An. Thus (- co, x) c lim sup. A., andthe result follows. The same result is valid for lim inf; the above analysisapplies, with "eventually" replacing "for infinitely many n."

243

244

Section 1.2

SOLUTIONS TO PROBLEMS

4. (a) If - oo < a < b < c < oo, then µ(a, c] = µ(a, b] + µ(b, c], andp(a, co) = p(a, b] + µ(b, oo); finite additivity follows quickly. IfA" = (- oo, n], then A" I R, but µ(A") = n+). µ(R) = 0. Thus µ isnot continuous from below, hence not countably additive.

(b) Finiteness of p follows from the definition; since p(- oo, n] -> co,i is unbounded.

5. We have p(U1

A,) >_ u(U°=1 A,) = Y; I p(A;) for all n; let n -- ooto obtain the desired result.

8. The minimal a-field F (which is also the minimal field) consists of thecollection ig of all (finite) unions of sets of the form B1 n B2 n - n B.,where B. is either A, or A,`. For any a-field containing A1, ..., A. mustcontain all sets in 9r ; hence c .F. But Ir is a a-field; hence F c 9r.Since there are 2" disjoint sets of the form B1 n n B", and each suchset may or may not be included in a typical set in .F, .F has at most22" members. The upper bound is attained if all sets B1 n - - n B. arenonempty.

9. (a) As in Problem 8, any field over ' must contain all sets in (; henceF. But W is a field; hence F e W. For if A; = nj=1 B,j, then

(Ui=1 A:)` = ni=1 U;=1 B`'J, which belongs to W because of thedistributive law A n (B v C) = (A n B) v (A n C).

(b) Note that the complement of a finite intersection ni=1 B;j belongsto -9; for example, if B1i B2 a W, then

(B1 n B2")"

=B1`uB2

= (B1` n B2) U (B1` - B2c) u (B2 n B1) v (B2 r B1`) a !R.

Now :a is closed under finite intersection by the distributive law,and it follows from this and the above remark that -9 is closedunder complementation and is therefore a field. Just as in theproof that F = 9r, we find that F = -9.

(c) This is immediate from (a) and (b).11. (a) Let A,, e ..', n - 1, 2, .... Then A,, belongs to some W,,., and we

may assume a1 < a2 < , so '',,, c 'a2 c Let a = sup" of. <

F'1. Then all Wa" a W., hence all A. e (ga. Thus U" An e Wa+1 e be,so U,, A e .9. If A e 91, then A belongs to some WQ; hence A` e1ea+1 C Y.

(b) We have card 'a S c for all a. For this is true for a = 0, by hypo-thesis. If it is true for all 1 < a, then Up,,, Wa has cardinality at

CHAPTER I 245

most (card a)c = c. Now if -9 has cardinality c, then _9' hascardinality at most cx° = (2"0)x0 = 2'° = c. Thus card Wa < c. Itfollows that Up1 W. has cardinality at most c.

Section 1.3

3. (a) Since A(O) = 0 we have n e ..#, and A' is clearly closed undercomplementation. If E, Fe . ' and A c 0, then

A[An(EvF)]=A[An(EuF)nE]+ ALA n (E U F) n E'] since E E .'

=A(AnE)+A(AnFt E`).Thus

A[An(EuF)]+A[An(EuF)`]=A(AnE)+A(AnE`nF)+A(AnE`nF)= A(A n E) + A(A n E`) since Fe.ff= A(A) since E e M.

This proves that ,' is a field. Also, if E and F are disjoint we have

A[An(EvF)]A[An(EuF)nEI+A[An(EuF)nE`]=A(Ar-E)+A(Ar-FnE`)=A(AnE)+A(AnF) since EnF=Q.

Now if the E are disjoint sets in ..1 and F. = Ui=1 E, T E, then

2(A) = A(A m F,,) + A(A r FCC)since F,, belongs to the field A'

A(AnF;)+2(Ar E`)since E` c Fn` and A is monotone

n

A(AnE;)+2(Ar E`)i=1by what we have proved above.

Since n is arbitrary,00

A(A) I A(A n En) + A(A n E")n=1

A(A n E) + A(A n E') by countable subadditivity of A.

Thus E e . #, proving that . f' is a a-field.

246 SOLUTIONS TO PROBLEMS

Now A(A n E) + I(A n E`) > .1(A) by subadditivity, hence

00

A(A) = Z A(A n En) + ).(A n E`).n=1

Replace A by A n E to obtain I(A n E) ).(A n En), as de-sired.

(b) All properties are immediate except for countable subadditivity. IfA = U- 1 A., we must show that u*(A) < Y 1 µ*(A,), andwe may assume that p*(A,,) < oo for all n. Given e > 0, we maychoose sets Enk e Fo with A. c Uk Enk and µ(E"k) < µ*(An) +e2-". Then A e Un, k Enk and En, k µ(Enk) < [.,n µ*(An) + e. Thusµ*(A) < Yn µ*(An) + e, a arbitrary.

Now if A e ., then µ*(A) < µ(A) by definition of µ*, and ifA c Un E,,, E. e .moo, then µ(A) < >Jn µ(E,) by 1.2.5 and 1.3.1.Take the infimum over all such coverings of A to obtain µ(A) <µ*(A); hence µ* = µ on .Fo.

(c) If Fe Fo , A c fl, we must show that µ*(A) >_ µ*(A n F) ++ µ*(A n F`); we may assume µ*(A) < oo. Given E > 0, there aresets En e Fo with A e U. E. and E"= 1 u*(A) + e. Now

µ*(A n F) < (U (En n F))

µ(EnnF)

by monotonicity

n

since p* is countably subadditive and µ* = µ on'0

Similarly,

µ*(A n F°) < y µ(E,, n F°).n

Thus

µ*(A n F) + p*(A n F°) < Y_µ(E,) < µ*(A) + s,

and the result follows.(d) If A = B u N, where B e a(Fo), N c M e a(.Fo), µ*(M) = 0, then

B e 4 {note F0 e 4 and .,# is a or-field, so a(.°F0) c 4]. Also, anyset C with p*(C) = 0 belongs to 4 by definition of p*-measur-ability; hence A e Therefore the completion of o(F0) is includedin 4.

Now assume y u-finite on Fo , and let A c-./#. If (I is the disjointunion of sets A. e JFo with µ(A,) < oo, then by definition of p*, thereis a set B. a a(.Fo) such that A n A. e B. and µ*(B, - (A n A.))= 0.

CHAPTER I 247

[Note that if A 0 .off we obtain only µ*(B") = µ*(A n A"); howeverif A e M (so that A n A" also belongs to M), we have

µ*(B"- (AnA"))=A*(B")-p*(AnA")=0.]

If B = U" B", then B e a(FO), A c B, and u*(B - A) = 0. Thisargument applied to A` yields a set C e a(.F0) with C c A andµ*(A - C) = 0. Therefore A = C u (A - C) with C eA-CcB-Cea(.Fo), and u*(B - C) = u*(B - A) + u*(A - C)= 0. Thus A belongs to the completion of a(. 0) relative to p*.

Section 1.4

1. Using the formulas of 1.4.5, the following results are obtained:(a) 3, (b) 8.5, (c) 5,(d) 7.25, (e) p(}, co) + p(- co, -D = 7.25.

4. Let Ck = {x a R": -k < x,:5 k, i = 1, ..., n}. Then µ is finite on Ck;hence the Borel subsets B of Ck such that µ(a + B) =µ(B) form a mono-tone class including the field of finite disjoint unions of right-semiclosedintervals in Ck ; hence all Borel subsets of Ck belong to the class (see1.2.2). If B e J(R"), then B n Ck I B; hence a + (B n Ck) T a + B, andit follows that µ(a + B) = µ(B).

Now ifBe!(R"), then B=AvC,Ae (R"), CaDe9J(R"),withµ(D) = 0. Thus a + B = (a + A) u (a + C), and, by Problem 3,a + A e J(R"), a + C c a + D e R(R"). By what we have proved above,µ(A + D) = u(D) = 0; hence a + B e.4(R") and µ(a + B) = µ(B).

5. Let A be the unit cube {x a R": 0 < x; < 1, i = 1, ..., n}, and let c= µ(A).For any positive integer, r, we may divide each edge of A into r equalparts, so that A is decomposed into r" subcubes A1, ..., A,, each withvolume r-". By translation-invariance, µ(A,) is the same for all i, so if )is Lebesgue measure, we have

p(A) = r-"µ(A) = r-"c = r-"c.?(A) = c11(A), i = 1, ..., r".

Now any subinterval I of the unit cube can be expressed as the limitof an increasing sequence of sets Bk, where each BA, is a finite disjointunion of subcubes of the above type. Thus y = cA on subintervals of theunit cube, and hence on all Borel subsets of the unit cube by the Cara-theodory extension theorem. Since R" is a countable disjoint union ofcubes, it follows that p = c A on Q(R").

6. (a) If r + x1 = s + X2, x1, x2 a A, then x1 is equivalent to x2 , so thatx1 = x2 since A was constructed by taking one member from eachdistinct Bx. Thus r = s, a contradiction.

If x e R, then x e B.,; if y is the member of B,, that belongs to A,then x - y is a rational number r, hence x e r + A.

(b) If 0< r< 1, then r + A c [0, 2]; thus

Y(µ(r + A): 0 < r < 1, r rational)

= µ(U{r + A: 0 < r < I, r rational}) by (a)

<p[0,2]<00.

But p(r + A) = y(A) by Problem 4; hence u(r + A) must be 0 forall r. Since R is a countable union of sets r + A by (a), µ(R) = 0, acontradiction.

8. LetF(x,y)=1 ifx+y>0;F(x,y)=0ifx+y<0.Ifa1 =0,b1 =1,a2 = -1, b2 = 0, then

Lb,a, Obi., F(x, Y) = F(bl, b2) - F(al, b2)- F(bl, a2) + F(al, a2)

=1-1-1+0=-1;hence F is not a distribution function. Other examples: F(x, y) _max(x, y), F(x, y) = [x + y], the largest integer less than or equal tox+y.

12. (a) First assume that u is finite. Let' be the class of subsets of R"having the desired property; we show that '' is a monotone class.For let B. a If, B. T B. Let K" be a compact subset of B. withµ(B") < p(KK) + s, e > 0 preassigned. By replacing K" by Ui=1 K,,we may assume the K. form an increasing sequence. Then µ(B) _

u(B") < µ(K") + e, so that

µ(B) = sup{µ(K): K a compact subset of B},

and B e W. If B. a W, B" . B, let K" be a compact subset of B" suchthat y(B") < p(K,.) + e2 -", and set K =n, 1 K.. Then

µ(B)-p(K)=p(B-K)5 (OB - Kr)) <- 1N(Bn-Ka):5 e;n=1 n=1

thus B e f. Therefore' is a monotone class containing all finitedisjoint unions of right-semiclosed intervals (a right-semiclosedinterval is the limit of an increasing sequence of compact intervals).Hence W contains all Borel sets.

If p is a-finite, each B e !(R") is the limit of an increasingsequence of sets B, of finite measure. Each B, can be approximatedfrom within by compact sets [apply the previous argument to themeasure given by ui(A) = p(A n B;), A a R(R")], and the aboveargument that ' is closed under limits of increasing sequencesshows that B E W.

CHAPTER 1 249

(b) We have p(B) < inf{p(V): V B, V open}

< inf{p(W): W B, W = K`, K compact}.

If p is finite, this equals p(B) by (a) applied to B`, and the resultfollows.

Now assume p is an arbitrary Lebesgue-Stieltjes measure,and write R" = Uk 1 Bk, where the Bk are disjoint bounded sets;then B. c Ck for some bounded open set C,. The measure pk(A) _p(A n Ck), A e R(R"), is finite; hence if B is a Borel subset of Bkand e > 0, there is an open set Wk B such that pk(Wk) < pk(B)+ e2-k. Now Wk n Ck is an open set V. and B n Ck = B sinceB e Bk a Ck; hence p(Vk) < p(B) + s2`k. For any A e M(R"), letV. be an open set with Vk z) A n Bk and p(Vk) < p(A n Bk) + e2-k.Then V = Uk 1

Vk is open, V A, and p(V) < ik 1 p(Vk) <p(A) + e.

(c) Construct a measure p on M(R) as follows. Let p be concentratedon S = {1/n: n = 1, 2,...) and take p{1/n} = 1/n for all n. SinceR = U 1 {l/n} u S` and p(S`) = 0, p is a-finite. Since

00

1

µ10,I] = Y- - = 00,n=1 n

p is not a Lebesgue-Stieltjes measure. Now p{0} = 0, but if V is anopen set containing 0, we have

p(V) > p(-s, e) for some s

>_ 1 for some rk=r k

=00.

Thus (b) fails. (Another example: Let p(A) be the number ofrational points in A.)

Section 1.5

2. If B e R(R),

{w:h(co)eB} ={coeA:h(co)eB} u{(vaA`:h(c))eB}

= [A of -1(B)] u [A` n g- 1(B)]

which belongs to F since f and g are Borel measurable.5. (a) {x: f is discontinuous at x} = Un 1 D, where D" = {x a Rk: for all

S > 0, there exist x1i x2 e Rk such that Ix1 --- xJ < S and 1x2 - xI < S,but If(XI) -f(X2)1 >- 1/n). We show that the D. are closed. Let {xa}

SOLUTIONS TO PROBLEMS

be a sequence of points in D. with xQ - x. If 6 > 0 and N ={y: Iy - xI < b), then x,, e N for large a, and since X. a D", there arepoints x,1 and xa2 e N such that I f(xa1) -f(xa2)I >- 1/n. ThusIx", - x1 < b, Ix.2 - x1 < 8, but f(xal) - f(xa2)I > 1/n, so thatxe D".

The result is true for a function from an arbitrary topologicalspace S to a metric space (T, d). Take D,, = {x e S: for every neigh-borhood N of x, there exist x1, x2 e N such that d (f (xl), f (X2)) >I /n}. (the above proof goes through with "sequence" replaced by" net.")

In fact T may be a uniform space (see the appendix on generaltopology, Section A10). If -9 is the associated family of pseudo-metrics, we take D. = {x e S: for every neighborhood of N of x,there exist x1, x2 e N and d e 2 such that d(f(x1), f(x2)) >- 1/n) andproceed as above.

The result is false if no assumptions are made about the topologyof the range space. For example, let S2 = (1, 2, 3}, with open sets 0,fl, and {1}. Define P. Q -. f2 by f(1) =f(3) = 1, f(2) = 2. Then theset of discontinuities is {3}, which is not an F,,.

(b) This follows from part (a) because the irrationals I cannot be express-ed as a countable union of closed sets C,,. For if this were possible,then each C,, would have empty interior since every nonempty openset contains rational points. But then I is of category 1 in R, andsince Q = R - I is of category 1 in R, it follows that R is of category1 in itself, contradicting the Baire category theorem.

6. By Problem 11, Section 1.4, there are c Bore] subsets of R"; hence thereare only c simple functions on R". Since a Borel measurable function isthe limit of a sequence of simple functions, there are c"0 = c Borelmeasurable functions from R" to R. By 1.5.8, there are only c Borelmeasurable functions from R" to Rk.

7. (a) Since the P,, are measures, J:k P"(Ak) = P"(S2) = 1, and it followsquickly that the a"k satisfy the hypotheses of Steinhaus' lemma. If{x"} is the sequence given by the lemma, let S = {k: xk = 1) and let Bbe the union of the sets Ak, k e S. Then

[P.(B)l

t" = 1 L. [P,,(Ak) - P(Ak)J = l - P(Ak)J1 - CC keS l - a k e S

and it follows that t" converges, a contradiction. Thus P is a prob-ability measure. If Bk e .F, Bk j 0, then given e > 0, we haveP(Ak) < e for large k, say for k >- ko. Thus P"(Ak0) < e for largen, say for n >- no. Since the Ak decrease, we have supnzrtp P"(Ak) < efork >- ko, and since Ak 10, there is a k1 such that for n = 1, 2, ... ,no - 1, P,,(Ak) < e fork > k1. Thus sup" P"(Ak) < e, k >- max(ko, k1).

CHAPTER I 251

(b) Without loss of generality, assume 1 for all n. Add a point(call it oo) to the space and set 1 - 1 - P(fl) =P{oo}. The P are now probability measures, and the result followsfrom part (a).

Section 1.6

2. In Y-1 Ifnl dp = Y' 1 fn Ifnl dp < oo; hence Y , is integrableand therefore finite a.e. Thus Y , f converges a.e. to a finite-valuedfunction g.

Let g _ Yk=, fk. Then Yk 1 Ifkl, an integrable function. Bythe dominated convergence theorem, In g dp ffl g dp, that is,

00

> f. fn f fn du.n=1 f2 fln=1

3. Let xo a (c, d), and let xn -+ xo , x 96 xo . Then

b b

(Xn

1

-XO) [f-f(x., y) dy - faf(xO,Y)dy]

= f b rf(xn , Y) - f (xo , Y)]dY.

a L xn - XO

By the mean value theorem,

f(xn, Y) -f(x0, Y) =f1(An, Y)xn - xo

for some A = between x and xo . By hypothesis, Ifi(2n, Y)I 5 h(y),where h is integrable, and the result now follows from the dominatedconvergence theorem (since y) - f(xo, y)]/[x - x0] -f1(xo, y),f,(x, ) is Borel measurable for each x).

8. Let it be Lebesgue measure. If f is an indicator IB, B e M(R), the resultto be proved states that p(B) = p(a + B), which holds by translation-invariance of it (Problem 4, Section 1.4). The passage to nonnegativesimple functions, nonnegative measurable functions, and arbitrarymeasurable functions is done as in 1.6.12.

Section 1.7

2. (a) If f is Riemann-Stieltjes integrable, a =f = /3 a.e. [p] as in 1.7.1(a).Thus the set of discontinuities off is a subset of a set of p-measure 0,together with the endpoints of the subintervals of the Pk. Take adifferent sequence of partitions having the original endpoints as

interior points to conclude that f is continuous a.e. [µ]. Conversely,if f is continuous a.e. [p], then a =f = /3 a.e. [p]. [The result that fcontinuous at x implies a(x) =f(x) = /3(x) is true even if x is anendpoint.] As in 1.7.1(a), f is Riemann-Stieltjes integrable.

(b) This is done exactly as in 1.7.1(b).3. (a) By definition of the improper Riemann integral, f must be Riemann

integrable (hence continuous a.e.) on each bounded interval, and theresult follows. For the counterexample to the converse, take f (x) = 1,n< x <n + 1, n an even integer; f(x) _ -1, n < x < n + 1, n anodd integer. Then the limit of rob(f) does not exist. (Alternatively,take f (x) identically 1; then rab(f) -> + oo as a -+ - oo, b -+ oo.)

(b) Define

fi(x) =(.1(x) if -n<x<n,

elsewhere.

Then f T f; hence f is measurable relative to the completed a-field;also, In f,, dp I In f dp by the monotone convergence theorem. ButIn f dµ = r_,,,,,(f) by 1.7.1(b), and r_,,.,,(f) r(f) by hypothesis;the result follows.

For the counterexample, take

(- 1)n

f(X) n+1'

Section 2.1

Chapter 2

2. Let D = {c o: f(w) < 0}; then 1.(A n D) < 0, 2(A n D`) z 0 for all A e F.By 2.1.3(d),

n<x<n+l, n=0,1,...,

0, X<0.

Wehaver(f)=1-+++- --,butfRJfldµ=l+j+}+...=oo,

so that f is not Lebesgue integrable on R.

2+(A) = A(A n D°) = f f + dµA

CHAPTER 2

since f + =f on D` and f + = 0 on D. Similarly,

2(A)= - A(A n D) = fA f - dµ

since f -f on D, and f - = 0 on D`. The result follows.4. If El, ..., E. are disjoint sets in F, with all E. c A,

n n n

12(E1) I = I +(Ei) - 1(E1)1 <- [A+(Ei) + 2 (E1)]i=t

= IAI(UE;) <- IAI(A).

Thus the sup of the terms Yi=1 1A(E)I is at most I2I(A). But

12I(A) = 2+(A) + A -(A)

= A(A n D`) - A(A n D)

= IA(A n D`)I + I1(A n D)I

since A(A n D`) >- 0 and A(A n D).-5 0

= I1(E1)I + IA(E2)I

and the result follows.

Section 2.2

253

2. 1/n},n=1, 1,2, ...,sothatA=Un'=1A,,. Now

00 > f I9I dµ zA n

hence p(An) < oc. For the example, let p be Lebesgue measure on P1(R),and let g(x) be any strictly positive integrable function, such as g(x) _e- I"l. In this case, A = R, so that p(A) = oo.

4. If f is an indicator IA, the result is true by hypothesis. If f is a non-negative simple function Y;=1 x j IA, , the A; disjoint sets in F, then

gdp= x; f 1 ,gdpJ

fd2= x;1(A;)= >x;Jn n ;_1 J=i Aj i=1

= fig dy by the additivity theorem.

If f is a nonnegative Borel measurable function, let fl, f2, ... be non-negative simple functions increasing to f. By what we have just proved

J a fn d2 = fn fn g dµ; hence Ja f d l= In fg dp by the monotone conver-gence theorem. Finally, if f is an arbitrary Borel measurable function,write f = f - f -. By what we have just proved,

f f+d2= f f+gdp, f f-dl= f f-gdµn n n n

and the result follows from the additivity theorem.6. (a) In the definition of IAI, we may assume without loss of generality that

the E1 partition A. If A is the disjoint union of sets A1, A2, ... a .r,then

n n ao n Co

11(E) I= i A(Ej r Ai)I < E Y_ 12(Ej n A,) Ij=1 j=1 1i=1 j=1 i=1

00 n 00

= E 12(Ej n Ai)I s IAI(A).i=1 j=1 i=1

Thus IAI(A) 121(A). Now to show the reverse inequality, wemay assume I2I(A) < oo; hence I21(A) < IAI(A) < co. For each i,there is a partition {E11, ..., Ein,} of A. such that

aY I A(Ei) I > I A I (A) - , E> 0 preassigned.

j = 1 2'

Then for any n,n ni n

I A I (A) ? E E 12(Eij) I? E 12 I (A) - a.i=1 j=1 1=1

Since n and a are arbitrary, the result follows.(b) If E1, ..., En are disjoint measurable subsets of A,

n n n

1(21 + 22)(E1) I < F.. 121(E1) I + Y- 122(E,) Ii=1 i=1 i=1

< 121 I (A) + 122 I (A),

proving IA1 +' 21 -< IA1I + 1121; IaAI = lal 121 is immediate from thedefinition of total variation.

(c) If µ(A) = 0 and 1211(Ai°) = 0, i = 1, 2, then µ(A1 u A2) = 0 and by(b), lei + 221(Alc .\ A20):!9 1A1I(A1`) + I22I(A2`) = 0-

(d) This has been established when A is real (see 2.2.5), so assume A com-plex, say, 2 = Al + i22. If µ(A) = 0, then A1(A) =22(A) = 0; henceA <<,u implies Al 4p and d2 4.u. By 2.2.5(b), I A I I 4 p, 1221 410;hence by (b), IAI 4 p. The converse is clear since I2(A)I <- 121(A).

(e) The proof is the same as in 2.2.5(c).(f) See 2.2.5(d).

CHAPTER 2 255

(g) The "if" part is done as in 2.2.5(e); for the "only if" part, letp(A,,) -> 0. Since JAI 0

<liminf fb `f(x+h)-f(x)l dxh-'o a L h

+=limionfhi

fbf(x)dx - fa+f(x)dx]h

[definef(x) = f(b), x > b; f(x) = f(a), x < a]

< lim inf 1 [hf (b + h) - hf(a)] since f is increasingh-o h

= f(b) - f(a).Alternatively, let It be the Lebesgue--Stieltjes measure corresponding tof (adjusted so as to be right continuous), and let m be Lebesgue measure.Write µ = i +,u2, where p1 < m and P2 1 m. By 2.3.8 and 2.3.9,

fb b

f'(x)dx= ff'dma a

= bfDµdma

= f b dm1 dm = N1(a, b] < µ(a, b] =f(b) -f(a)

6. (a) Since A is linear and m is translation-invariant, so is ..; hence byProblem 5, Section 1.4, A = c(A)m for some constant c(A). Now ifA, and A2 are linear transformations on R', then

m(A,A2 E) = c(A1)m(A2 E) = c(A1)c(A2)m(E)

and

m(A,A2 E) = c(A1A2)m(E);

hence

c(A1A2) = c(A1)c(A2)

Since det(A,A2) = det A, det A2, it suffices to assume that A cor-responds to an elementary row operation. Now if Q is the unit cube{x: 0 < xi 5 1 ,i=1, ..., k}, then m(Q) = 1; hence c(A) = m(A(Q)).If e1, ..., ek is the standard basis for Rk, then A falls into one of thefollowing three categories:

(1) Ae1 = ej, Ae; = e;, Aek = ek, k i or j. Then c(A) = 1Idet AI.

=

(2) Ae! = kej, Ae; = e;, j i. Then c(A) = Iki _ Idet AI.(3) Ae1 = e; + e;, Aek = ek, k 0 i. Then det A = 1 and c(A) is I

also, by the following argument. We may assume i = 1, j = 2.Then

ff k

A(Q)={AEa;e;:0<a;<1, i=l,...,k}111111

i=1f( k

={y= Yb;et:0<b;<1, i&2, b,<b2<b1+1l` 1= 1 111

If B, _ {y a A(Q): b2 < 1), B2 = {y a A(Q): b2 > 1}, andB2 - e2 = {y - e2:y a B2}, then B, n (B2 - e2) = Q ; for ify e B,, then b1 < b2 and if y + e2 a B2 c A(Q), then b2 + 1 <-b, + 1. Therefore,

c(A) = m(A(Q)) = m(B1) + m(B2) = m(B1) + m(B2 - e2)

= m(B, U (B2 - e2)) = m(Q) = 1.

(b) Fix x e V and define S(y) = T(x + y) - T(x), y e {z - x: z e V).Then if C is a cube containing 0, we have

(c)

T(x+C)={T(x+y):yeC}=T(x)+{S(y):yeC}= T(x) + S(C).

Therefore, m(T(x + C)) = m(S(C)), so that differentiability of Tat xis equivalent to differentiability of S at 0, and the derivatives, if theyexist, are equal. Also, the Jacobian matrix of S at 0 coincides withthe Jacobian matrix of T at x, and the result follows.Let A = A(0), and define S(x) = A-'(T(x)), x e V; the Jacobianmatrix of S at 0 is the identity matrix, and S(0) = A-'(0) = 0. If weshow that the measure given by m(S(E)), E e 2(V), is differentiable

CHAPTER 2 257

at 0 with derivative 1, then

m(S(C)) - I

I <a

m(C) Idet AI

for sufficiently small cubes C containing 0. Thus by (a), m(T(C)) _m(AS(C)) = Idet Alm(S(C)); hence

m(T(C)) I I m(S(C)) Im(C) - I det Alm(C)

1 l det A I< e,

so that t is differentiable at 0 with derivative ldet Al.(d) (i) If x e C, then lxi < Jk f3 < S; hence I T(x) - xl < afl _

(QZ - $). Therefore T(x) e C2 .(ii) If x e 8C, then I T(x) - xl < afi as above, and a/3 = :(p - fi1).

Therefore T(x) C,.(iii) We have IT(x) - xl < a/3, and

Of < 1

1-2a-1-jR'-2P"and the result follows.

(iv) If y e C, - T(C) but y e T(C), then y e T(8C), contradicting(ii).

Now C, is the disjoint union of the sets C, n T(C) and C, - T(C);the first set is open since T is an open map, and the second set is openby (iv). By (iii), C, n T(C) # 0, so by connectedness of C1, we haveC, = C, n T(C), that is, C, c T(C). Also, T(C) c C2 by (i).

Therefore m(CI) < m(T(C)) < m(C2), that is,

(I - 2a)'13' < m(T(C)) < (I + 2a)kjJk.

Thus I - s < (1 - 2a)k < m(T(C))/m(C) < (I + 2a)k 0. By Problem 12, Section 1.4, and thefact that

00

E= U En( sup I(Cr) < niln, i = I C,, r 0, and A(C) < nm(C) for all open cubes C con-taining a point of K and having diameter less than 1/j. If r, > 0, choosean open set D c K such that m(D) < e.

(f)

Now partition Rk into disjoint (partially closed) cubes B of dia-meter less than I/j and small enough so that if B n K 96 0, thenB e D. If the cubes that meet K are B1, ..., B, we may find opencubes C, z) Bi, I < i < t, such that m(Ci) < 2m(B,) and diam C-, <I /j. Then

2(K) < ,l(Bi) < 1(C;) < n m(Ci) < 2n m(B;)i=1 i= r=1 =1

< 2nm(D) < 2ns.

Since a is arbitrary, 1(K) = 0, a contradiction.

If f is an indicator IB, let E = T -'(B), so that B = T(E). Then

fW f(y) dy = m(T(E)) and fV f(T(x)) IJ(x)I dx = fE IJ(x)I dx,

and the result follows from part (e). The usual passage to simplefunctions, nonnegative measurable functions and arbitrary measur-able functions completes the proof.

Section 2.4

1. (a) If f = {al, ... , a, , 0, 0,...), then fn f dp = Yk=1 ak by definition ofthe integral of a simple function. If f = {an, n = 1, 2, ...}, with alla >- 0, fn fdp = 1 a,, by the result for simple functions and themonotone convergence theorem. If the a are real numbers, thenfnfdp=fnf+dp-fnf-dp=Y' la.+-Y l an- if this isnot of the form + oo - oc. Finally, if the a are complex,

f dp= Rea.+iIma,,n- =1 n=1

provided Re f and Im f are integrable; since IRe aJ, IIm anI <-lanl <- IRe anI + IIm anl, this is equivalent to En 1 oo.

(b) If f(a) = 0 except for a e F, F finite, than in f dy = &f(x) bydefinition of the integral of a simple function. If f >- 0, thenIn fdp >- I, f dp for all finite F; hence in fdp >- Y. f (a). If f (a) > 0for uncountably many a, then Y. f (a) = oo ; hence In fdp = ooalso. If f (a) > 0 for only countably many a, then in fdp = Y. f (a)by the monotone convergence theorem. The remainder of the proofis as in part (a).

8. Apply Holder's inequality with g = 1, f replaced by If jr, p = s/r,1/q = I - r/s, to obtain

CHAPTER 2 259

I"dµ<(fIfI"Pdµ)11p(f I dµ)1/9

Therefore If 11, < If II as desired.9. We have fn If l" dp <_ Ilf 11 xµ(S2), so lim supp-,, If IIp <- IIIII,,. Now

lets > 0, A = {co: I f(w)l >- I!f li 0 - s) (assuming if 11w < oo). Then

fi1 Ifl"dµ>-fA

If l"dµ - (II! II.-E)"µ(A)

Since p(A) > 0 by definition of IIIII, we have lim infp_. Iif IIp >- IIf II M

If Ilf II = co, let A = {w: I f(w)I >- M} and show that

lim inf IIf IIp ? M;p-oo

since M can be arbitrarily large, the result follows.If p(S2) = oo, it is still true that lim infp.,, Illllp ? Ilfil"',; for if

p(A) = co in the above argument, then IIf IIp = oo for all p < oo. How-ever, if p is Lebesgue measure on p(R) and f(x) = 1 for n < x <n + (1 /n), n = 1, 2, ... , and f (x) = 0 elsewhere, then 11f 11p = oo forp<oo,butllfll.=1.

11. (a) We have I JE (f - z) dpi < fE l f - zl dp. Butw a Eimpliesf(w)e D;hence If(w) - zl < r; and thus J E If- zl 4:5 rp(E). If p(E) > 0,then

1 d pp(E) fE f µ -

f '(f - z) dµ r;=1

p(E)

hence [l /µ(E)] JE f dµ e D c S`, a contradiction. Thereforep(E) = 0, that is, µ{w: f(w) a D} = 0. Since {w: f(w) 0 S} is acountable union of sets f '(D), the result follows.

(b) Let p = iA I ; if El, ..., En are disjoint measurable subsets of A

I2(E;)I = I f h dp < f I h I dpj=1 j=1 E1 j=1 Ef

n

< r E µ(E j) < rp(A,).j=1

Thus µ(A,) < rµ(A,), and since 0 < r < 1, we have µ(A,) = 0. IfA = (co: Ih(w)l < 1} = U {A,: 0 < r < 1, r rational}, then µ(A) = 0,so that Ihi >- I a.e.

Now if p(E) > 0, then [ 1 /µ(E) ] JE h dµ = A(E)/µ(E) e S, whereS = (z e C: lzl < 1). By (a), h(w) e S for almost every co, soIhI < 1 a.e. [121].

(c) If E e ,°F, In IE h dl2I = JE h dJAI = A(E) by (b); also, I. IEg dµ =J, g dp = 2(E) by definition of A. It follows immediately thatIn fh diAI = f n fg du when f is a complex-valued simple function. Iff is a bounded, complex-valued Borel measurable function, by1.5.5(b), there are simple functions ff -+f with I f,, <_ Ill. By thedominated convergence theorem, In fh diAl = In fg dµ. If f = hIE,we obtain 121(E) = f E hg dµ.

(d) In (c), 121(E) >- 0 for all E; hence hg > 0 a.e. [p] by 1.6.11. But ifg((o) = Ig(w)le'O() and h(w) = e`*(W), then e'(9-v) = 1 a.e. on{g :A 0}, so that hg = Ig1 a.e., as desired.

12. If I(a, b) can be approximated in L°° by continuous functions, let0 < E 0, there are points x E (a, a + 5)and y e (a - 5, a) such that I 1 - f (x)I < s and If() I < e. Consequently,lim supra, f(x) >- 1 - e and lim infra- f(x) < e, contradicting con-tinuity off.

Section 2.5

4. Let Bjkb = (I fj -fkl >- 5), Ba = nn 1UTJ, Bjka Then

oo

U Bjka } Baj,k=n

and the proof proceeds just as in 2.5.4.5. Let (f,,,) be a subsequence converging a.e., necessarily to f by Problem 1.

By 1.6.9, f is p-integrable. Now if In f du +- In f dp, then for somee > 0, we have I In f, dp - 1, f dp >- s for n in some subsequence {mk).But we may then extract a subsequence {f,,) of { f a.e.,so that In f,, du - In f du by 1.6.9, a contradiction.

Section 2.6

4. By Fubini's theorem,

1

Jn,NZ(C(wi)) dpI((o 1).N(C) = ff Ic dµ = in,

[jig dµ2] dpl =(

n

Similarly, p(C) = JRi pl(C((02)) dp2((02). The result follows since f >- 0,fnf=0implies f=0a.e.

CHAPTER 2

7. (a) Let

261

Ank={xefl': n <f(x)<n

( k-1 kBnk = y e 522:

n< y< n)

(n = 1, 2, ... , k = 1, 2, ... , n; where k = n, include the right end-point as well). Then

CO nG= lI I U (Ank x Bnk) E .F.

n=1 k=1

If f is only defined on a subset of 521, replace S21 by the domain off inthe definition of Ank .

(b) Assume B c C1. Each x e S21 is countable and (x, y) e B impliesy < x; hence there are only countably many points Yx1, Yx2 , ... C-.02such that (x, yxn) e B. (If there are only finitely many points yx1,yx., take yxk = y,,, for k Z n.)Thus B = U 1 G., where G is the graph of the function fn definedby fn(x) = yxn . By part (a), B e.. [Note that yxn a K12, which may beidentified with [0, 11; thus (a) applies.] If B e C2, each y E 522 iscountable, and (x, y) e B implies x < y, so there are only countablymany points xyn c-01 such that (xyn, y) a B. The result follows asabove.

(c) If F e 0, then F = (F n C1) u (F n C2) and the result follows frompart (b).

9. Assuming the continuum hypothesis, we may replace [0, 1 ] by the firstuncountable ordinal Y1. Thus we may take 521 = C12 = Y1, F1 = 2 = theimage of the Borel sets under the correspondence of [0, 1 ] with Y1, andµl = µ2 = Lebesgue measure. Let f =1c, where C = {(x, y) e S21 x Q2:y < x) and the ordering " 5 " is taken in fl, not [0, 1]. For each x,(y: f(x, y) = 1) is countable, and for each y, {x: f(x, y) = 0} is count-able; it follows that f is measurable in each coordinate separately.Now fo f (x, y) dy = 0 for all x; hence So [ S' f(x, y) dy] dx = 0. ButS o [1 - f (x, y)] dx = 0 for ally; hence 10 ' [J , f(x, y) dx] dy = 1. It followsthat f is not jointly measurable, for if so, the iterated integrals would beequal by Fubini's theorem.

Section 2.7

1. Let T be the smallest a-field containing the measurable rectangles. ThenI cfly

1Fj since a measurable rectangle is a measurable cylinder. But

the class of sets A =F11= I 52; such that {w a 52: (w1, ..., a1n) e A } e I is a

a-field that contains the measurable rectangles of IIj=1 Qj, and hencecontains all sets in f;= 1 Fj. Thus all measurable cylinders belong to T,so Flit I y; W.

5.

r k n ll{xeR°°:f(x)=n}={x: xi<1, k=1,2,...,n-1, Y_ x,>_1}111 i=1 i=1 1111

if n=1,2,...,n

{xeR°°:f(x)=oo)= {x: x;<1, n=1,2,...).i = i

111111

In each case we have a finite or countable intersection of measurablecylinders.

Chapter 3

Section 3.2

6. (a) Let z e K, a compact subset of U. If r < do, a standard applicationof the Cauchy integral formula yields

1 2n

f(z) = 2n fof (z + re") dB.

Thus if0<d<d0,d2 d 1 a 2n

f(z) 2 = f orf(z) dr =2n

for dr f. f(z + rei°) d6,

or

2n d

f(z) = ,nd2 fB=of=of(z + reie)r dr dO

=n d2 f f f(x+iy)dxdy

where D is the disk with center at z and radius r. An application ofthe Cauchy-Schwarz inequality to the functions 1 and f shows that

If(z)I _<nd2

(nd2)'1211f 11,

as desired.

CHAPTER 3 263

(b) If f , f2 , ..., is a Cauchy sequence in H(U), part (a) shows that fnconverges uniformly on compact subsets to a function f analytic onU. But H(U) c L2(f , .F, p) where 12 = U, .°r is the class of Borelsets, and y is Lebesgue measure; hence fn converges in L2 to afunction g e H(U). Since a subsequence of converges tog a.e.,we have f = g a.e. Therefore f e H(U) and fn -+f in L2, that is, in theH(U) norm.

7. (a) If 0<r<1,1

fo If(Yero)12 dO = 1

fo

an rnern°am rme-im° d©

2n o 27r o n

1 2n

Y an am rn+m f ei(n-m)° dOJ21r n, m 0

since the Taylor seriesconverges uniformly on compactsubsets of DCO

E Ian12Y

2n,

n=0

which increases to Y' O Ian12 as r increases to 1.

(b) f f If(x + iy)I 2dxdy = fordr f 2.If(rei°)12 dO

D

l< 21rN2(f) f r dr = irN2(f).0

2n

(c) f f Ifn(x + iy 1 2 dx dy= fo fo

r2nr dr d8 = {2n + 2) -' 0,D

but

f2nlfn(YeiB)I2dO = I f2nr2n dO = r2n;

2n 2n

hence N(fn) = 1 for all n.(d) If { fn} is a Cauchy sequence in H2, part (b) shows that { fn} is

Cauchy in H(D). By Problem 6, fn converges uniformly on compactsubsets and in H(D) to a function f analytic on D. Now if 0 <r0 < 1,e > 0, the Cauchy property in H2 gives

1 2n

I f, If(Ye°) -fmrei°12 d©S e

for r < ro and n, m exceeding some integer N(e). Let m - oo; sincefm -- f uniformly for Izi 5 ro,

2n

2nfo

lfn(re'B) - f(re'B) 12 dO < s, r < ro, n >- N(s).

Since ro may be chosen arbitrarily close to I, N 2(f" - f) < e forn >- N(F), proving completeness.

(e) Since e" corresponds to (0, ... , 0, 1, 0, ...), with the I in positionn, in the isometric isomorphism between H2 and a subspace of 12,the e" are orthonormal. Now if f c- H2, with Taylor coefficients a,n = 0, 1, ... , then <f, en> = an , again by the isometric isomorphism.Thus

N2(f) _ Y lanl2 = Y I<f, e">I2n=0 n=0

(f)

Fz

and the result follows from 3.2.13(f).

<en, em> = if iy)em(x + iy) dx dy = f f en(rete)em(reie)r dr dOD D

[(2n + 2)(2m + fo2"f

o

1

r n+me'("-m)er dr dO

0, n :A m,1, n=m.

Thus the en are orthonormal. Now if f e H(D) with

f(z)=Eanz",00"=0

then

2n ro

fo fo f (re'a)em(reie)r dr dO

OD 2" ro (2m + 2)1/2 ame_ E a. f f r"ey"e rme-r dr dOn-o 0 0 2n

since the Taylor series converges uniformly on compactsubsets of D

2m + 2 1/2 r0 2m+2 2 1/2 2m+2

-am( 2R ) 2m+22n=am(2m+2) ro

CHAPTER 3

ro-'1 n..=0 0 0

a 1227C

n4o 2n + 2

The result now follows from 3.2.13(f).9. (a) Let g be a continuous complex-valued function on [0, 2n] with

g(0) = g(2n). Then g(t) = h(e'r), where h(z) is continuous on{z e C: z 1). By the Stone-Weierstrass theorem, h can beuniformly approximated by functions of the form _k- Forthe algebra generated by z", n = 0, ± 1, ±2, ..., separates points,contains the constant functions, and contains the complex con-jugate of each of its members since 2 = 1/z for IzI = 1.

Thus g(t) can be uniformly approximated (hence approximatedin L2) by trigonometric polynomials Y-k cke'k'. Since any con-tinuous function on [0, 27z] can be approximated in L2 by acontinuous function with g(O) = g(2n), and the continuous func-tions are dense in L2, the trigonometric polynomials are dense inL2

265

Now feis integrable on D (by the Cauchy-Schwarz inequality),so we may let ro -I and invoke the dominated convergencetheorem to obtain

{{'

1/2(T2"m +2)

But the same argument with replaced by f shows that

frro

IIf IIH(D) = lim J Jrneinerme-imer dr dO

(b) By 3.2.6, S 20n If(,)k"= cke'k`I2 dt is minimized when ck = ak =

(1/2n) f o" f (t)e-'k` dt. Furthermore, some sequence of trigonometricpolynomials converges to f in L2 since the trigonometric poly-nomials are dense. The result follows.

(c) This follows from part (a) and 3.2.13(c), or, equally well, from part(b) and 3.2.13(d).

10. (a) Let be an infinite orthonormal subset of H. Take M ={x1, x2 i ...}, where x = (I + l n = 1, 2, .... To show thatM is closed, we compute, for n 0 m,

Ilxn-x.112=

121 2=(1+n) +(i+»t) >2.

Thus if y" e M, y" -+ y, then y" =y eventually, so y e M. SinceIIx"112 = I +(I In), M has no element of minimum norm.

(b) Let M be a nonempty closed subset of the finite-dimensional spaceH. If x e H and N =M n {y: IIx - yll < n}, then N 0 for somen. Since y IIx - yIi, y e N, is continuous and N is compact,inf{IIx - YII : Y e N} = IIx -Yoll for some yo e N c M. But the infover N is the same as the inf over M; for if y e M, y 0 N, then11x - yo 11 <- n < II x -YII. Note that yo need not be unique; forexample, let H = R, M = (- 1, 1), x = 0.

Section 3.3

3. Since fa IK(s, t)I di is continuous in s, it assumes a maximum at somepoint u e [a, b]. If K(u, t) = r(t)e10" , r(t) > 0, let z(t) = e- "(0. Letx,, x2 , ... be a sequence in C [a, b] such that fa IX"(t) - z(t)l dt -+0;we may assume that Ix"(t)I < 1 for all n and t (see 2.4.14). Since K isbounded,

f 6K(s, t)z(t) dt lim bK(s, t)x"(t) dt I = lim I(Ax")(s)I < IIAII" n-.

fIII

a co

Set s = u to obtain

as desired.7. (a) If x e L, then

11x111=

n

x;e;

fbIK(u,t)Idt<IIAII,

a

Ix,l Ile.hli < (niax IIe,l11) Ix,l

But >i=1 IX,I < 11nkkX112, so we may take k = max; 11e,111.(b) Let S = {x: lix112 = 1}. Since (L, II 112) is isometrically isomorphic

to C", S is compact in the norm 11 112. Now the map x - 11x111 is acontinuous real-valued function on (L, 11 Ill), and by part (a), thetopology induced by II II1 is weaker than the topology induced by11

112. Thus the map is continuous on (L, II 112); hence it attains aminimum on S, necessarily positive since x e S implies x 96 0.

(c) If x e L, x 96 0, let y = x/ 11x112; then IIYIl1 > M11YI12 by (b); hence11 x1I1 >: rn11xIl2 . By (a) and Problem 6(b), 11 11, and 11 112 induce thesame topology.

(d) By the above results, the map T: D= 1 x; e; - (x1, ... , x") is aone-to-one onto, linear, bicontinuous map of L and C" [note thatI1Y_:=1 x,112 is the Euclidean norm of (x1i ..., x") in C"]. If y; e L,

CHAPTER 3 267

y j --> y e M, then y ; - y k - 0 as j, k -* oo; hence T(y; - yk) =Ty; - Tyk -> 0. Thus {T),,} is a Cauchy sequence in C". If Ty; -zE C", then y' - ->T-1zeL.

9. For (a) implies (b), see Problem 7; if (c) holds, then {x: Ilxl1 < c} iscompact for small enough e > 0; hence every closed ball is compact(note that the map x -+ kx is a homeomorphism). But any closedbounded set is a subset of a closed ball, and hence is compact.

To prove that (f) implies (a), choose x1 e L such that IIx1 I{Suppose we have chosen x1i...,xkeL such that IIxi11 = I andI1xi - x;ll >- I for i, j = 1, ..., k, i j. If L is not finite-dimensional,then S{xl, ..., xk} is a proper subspace of L, necessarily closed, byProblem 7(d). By Problem 8, we can find xk+1 eL with Ilxk+111 = 1 andIlxi - xk+l II >- 1, i = 1, ..., k. The sequence x1, x2 , ... satisfies IIxn11 = 1for all n, but Ilxn - x,n11 ? -1 for n 0 m; hence the unit sphere cannotpossibly be covered by a finite number of balls of radius less than 4.

11. (a) Define ).(A) = f (IA), A e F. If A1i A2, ... are disjoint sets inwhose union is A, then ).(A) = Y? 1 f(IA,) since f is continuous

and"=1 IA, L' > IA . [Note that

n ao

I 1 > IA, - IAI ° dµ = E µ(Ai) 0Sl i=1 i=n+1

by finiteness of p.] Thus is a complex measure on F. If u(A) = 0,then IA = 0 a.e. [µ], so we may write IA L°+ 0 and use thecontinuity of f to obtain .1(A) = 0. By the Radon-Nikodymtheorem, we have ).(A) = JA y du for some p-integrable y. ThusAx) = J. xy dy when x is an indicator; hence when x is a simplefunction. Since f is continuous, y is p-integrable, and the finite-valued simple functions are dense in L°, the result holds when x is abounded Bore] measurable function.

Now let y1i Y2, ... be nonnegative, finite-valued, simple func-tions increasing to lyl. Then

IIY"Ilq = f Yn9 dk f IYI dp = f yn-'e-iargyy dpn n n

since is bounded

Il.f1111 Y9 - I li° = I;f 1111 Y.11gl° since (q - 1)p = q.

Thus IlYnllq < If II ; hence by the monotone convergence theorem,IIYIIq <- ii! 11; in particular, y e Lq. But now Holder's inequality andthe fact that finite-valued simple functions are dense in L° yieldAx) = fn xy dµ for all x e V. Holder's inequality also gives

Ilf11 < IIYIIq; hence IIf II = IIYIIq. If y, is another such function, theng(x) = In x(y - y,) dy = 0 for all x e LP. By the above argument,IIY - Y, IIq=0, soy=y, a.e. [µ].

(b) (i) If, say, yA - yB > 0 on the set F c A n B, let x = IF; thenxIA = xIB; hence Sn x(yA - yB) dp = 0, that is,

f (YA-YB)dp=0.

But then µ(F) = 0.(ii) Since YA a.e. on A. we have

11YA. I q -< II f dµ.A,,, - A

Since approaches k4 as n - oo, so does IIYA v Amllqand it follows that dy - 0 as n -, oo. By sym-metry, we may interchange m and n to obtain

f f ff2 Am - A A,,,

as n, m -' oo. Thus YAn converges in LQ to a limit y, and since[If 11 for all n, IIYIIq < Ilf II. If is another sequence

of sets with k, the above argument with A, replacedby B. shows that IIYA - YB -+ y also.

(iii) Let A e,, µ(A) < co. In (ii) we may take all A,, A, sothat YA = YA a.e. on A; hence y = yA a.e. on A. Thus ifx = IA, then f (X) = f (xIA) = Sn xyA dy = in xy dµ. It fol-lows that f (x) = Sn xy dµ if x is simple. [If µ(B) = oo, then xmust be 0 on B since x e LP.] Since y e V, the continuity offand Holder's inequality extend this result to all x e LP.

(c) The argument of (a) yields a µ-integrable y such that f (x) =In xy dyfor all bounded Borel measurable x. Let B = {(o : ly(w)I >- k); then

kp(B) < f IYI du = f IBe- iargyydpB n

=f(IBe-`a`gy):5 IIYII IIIBII, = IIfll,(B).

Thus if k > Ilf II, we have p(B) = 0, proving that y e L°° andIIYII < If 11. As in (a), we obtain f(x) =Snxydp for all xeL',III II = IIYII ., and y is essentially unique.

(d) Part (i) of (b) holds, with the same proof. Now if 12 is the union ofdisjoint sets A,,, with oo, define y on .0 by taking y = yAnon A,,. Since II for all n, we have y e L°° and IIYII. -<

CHAPTER 3

IIf II . If x e L1, then Y,."=1 XI" + x; hence

269

f(x)_ f f xydp= f xydp.n=1 n=1 S2 n=1 A. n

Since IIf II <_ ll llby Holder's inequality (with p = 1, q = oo), the

result follows.

Section 3.4

3. Let {yn} be a Cauchy sequence in M, and let xo be any element of Lwith norm 1. By 3.4.5 (c), there is an f e L* with IIf II = I and f (xo) =11 xo ll = I ; we define A,, a [L, M ] by A. x = f (x)yn . Then ll (A. - A,n)xIl =If(x)l Ily,, - ymll <- Ilyn - y,nll llxIl, so that IIAn - A,nll s Ilyn - ynll --> 0.By hypothesis, the An converge uniformly to some A e [L, M ]; there-fore I(yn - Axoll = IIA,,xo - AxoII <- IIAn - All -+ 0.

7. Let L be the set of all scalar-valued functions on 0, with sup norm,and let M be the subspace of L consisting of simple functions

x = 2i Xi IA3 ,

where the Ai are disjoint sets in .. Define g on M by

g(x) = Y1 x1 yo(Ai).

Now Ig(x)I <- maxi (xil Ei yo(A)< maxi Ixilµo(Q) = po(Q)IIxll; henceIlgll <- po(c) < eo. By the Hahn-Banach theorem,g has an extension to acontinuous linear functional f on L, with If II = Ilgll Define µ(A) =f(IA), A c 0. Since f is linear, p is finitely additive, and since f is anextension of g, p is an extension of yo. Now if µ(A) < 0, then

f(IAc) = µ(A°) = p(O) - p(A) > µ(S]!).

But IIIAJ = 1, so that If II > p(Sl), a contradiction.8. Since An x -+ Ax for each x, sup,, IIAnxII < co. By the uniform bounded-

ness principle, sup,, II An II = M < oo ; hence

IlAxll = limllAnxll = lim inf IIAnxII"_00

Ilxll lim inf IIA,,II 5 Mllxlln-.ao

12. Let L be the set of all complex-valued functions x on [0, 1 ] with a con-tinuous derivative x', M the set of all continuous complex-valued func-tions on [0, 1 ], with the sup norm on L and M. If Ax =x', x e L, then Ais a linear map of L onto M, and A is closed. For if x -' x and x,,' y,then since convergence relative to the sup norm is uniform convergence,

!o x"'(s) ds -+ Jo y(s) ds = z(t). Thus xn(t) - xn(0) -- z(t); hence x(t)- x(O) = z(t). Therefore x' = z' = y. But A is unbounded, for if xn(t) _sin nt, then Ilx"II = 1, IlAxnll oo.

13. By the open mapping theorem, A is open; hence A{x a L: IIxII < 1} is aneighborhood of 0 in M; say {y e M: IIYII < b} c A{x e L: Ilxll < 1}. Ify e M, y 0 0, then Sy/2IIYII has norm less than 6, hence equals Az forsome z e L, Ilzll < 1. If x = (2IIYII/8) z, then Ax = y and Ilxll < 2IIyII/8.Thus we may take k = 2/6.

Section 3.5

3. Let xl, ..., x,, be a basis for L, and define A: C" -* L by A(al, ..., an) =alxl + + anxn. It is immediate that A is one-to-one onto, linear, andcontinuous; we must also show that A-' is continuous. If L is one-dimensional, then A-'(ax) = a, so the null space of A-' is {0}, which isclosed by the Hausdorff hypothesis. By Problem 2, A-' is continuous. Ifdim L = n + 1 and the result holds up to dimension n, let f be a non-trivial linear functional on L. Then [see 3.3.4(a)] N =f -'(0) is a maximalproper subspace of L; hence by the induction hypothesis, it is topologi-cally isomorphic to C". It follows that N is closed in L; the argument isthe same as in Problem 7(d), Section 3.3.

By Problem 2, every linear functional on L is continuous, so if pi isis the projection of C" on the ith coordinate space, p; -A-' is con-tinuous for each i, hence A -' is continuous.

5. If L were metrizable, it would have a countable base at 0, which can beassumed to be of the form

Un={feL:If(x)I <Sn for all xeAn},

where Sn > 0 and An is a finite subset of [0, 1 ], n = 1, 2, .... Let y be apoint in [0, 1 ] not belonging to Un I An, and let U = {f e L: I f(y)I < 1).Then U is a neighborhood of 0 in L but no U. c U, for there are (con-tinuous) functions f with f = 0 on A. butf(y) = 1.

6. (a) implies (b): The sets {x: d(x, 0) < r}, r rational, form a base at 0.(b) implies (c): The base at 0 may be assumed to consist of circledconvex sets, and the proof of 3.5.7 shows that the correspondingMinkowski functionals generate the topology of L.(c) implies (a) : Up, p2, ... are seminorms generating the topology ofL, define

d(x, y) = E 1 pn(x - Y)"=12"1+pn(x-y)'

CHAPTER 3 271

It is easily checked that d is a pseudometric, and d(x;, x) - 0 ifpn(x; - x) -+ 0 for each n, in other words, iff x1 --+ x in the topology of L.

7. Let K" = {z: Izl < 1 - 1/n), n = 1, 2, .... If K is any compact subset ofD, K c K. for large n; hence sup{I f(z)I : z e K) < sup{If(z)j: z e K"} =p"(f). Thus the sets U. = {feL: p"(f) < 1/n}, n = 1, 2, ..., form acountable base at 0, so by Problem 6, L is metrizable.

Suppose a single norm II II induces the topology of L. By continuityof pn, for each n there is a 8" > 0 such that 11f 11 < 6. implies p"(f) < I ;hence If II < 1 implies p"(f) < l/5,,. Therefore W = {feL: p"(f) < 1/6.for all n} is an overneighborhood of 0.

For each n let z,, be a point in Knt1 but not in K", and let f" e U" suchthat I 1/S"+1 i a choice off" is possible because, for example, if

In<Izol<Iz"I<-I-n+I'

then (z/zo)' approaches 0 uniformly on K. as k - oo, but approaches coat z,,.

Now pn+1(.f") ? If"(z")I > 1/b"+1 , and since the U. decrease with nand fn e U", we have, for any k, fn e Uk for all n >- k; consequentlyfn -> 0. But then f" e W for large n; hence pn+1(f") < a contradic-tion.

8. (a) We may write x = XICO, r) + XI[r, 1 y + z, and

d(x, 0) = f0 Ix(t)I" dt = d(y, 0) + d(z, 0)

since [0, r) n [r, 1 ] = 0. Now d(y, 0) _ fo I xI" dt, which is con-tinuous in r, approaches 0 as r -+ 0, and approaches d(x, 0) asr -> 1. By the intermediate value theorem, d(y, 0) = d(z, 0) _. d(x, 0) for some choice of r.

(b) By part (a) we can find yl with d(y1 i 0) d(x, 0) and I f (yl) I ? #.Let x1 = 2y1; then I f(x1)I >- 1 and d(x1, 0) = 2" d(y1i 0) =2p-1 d(x, 0). Having chosen x1 i ..., x,, with I f (x1) I >_ 1 andd(x1, 0) =2""- " d(x, 0), i = 1, ... , n, apply part (a) to x" to obtainxn+1 with If(x,,+1)I z 1 and d(x"+1, 0) =2"-1 d(x,,, 0) =2(n+1)("-1' d(x, 0). Since p < 1, d(x", 0) -> 0 as n -+ oo, as desired.

13. (a) Let i be the identity map from (L, l2) to (L, 1). Since 1 2 ,i is continuous, and by the open mapping theorem, 9-2 ell.

(b) If fT is the topology induced by II 11j, .1 = 1, 2, then 9-1 a 107"2 byhypothesis, and the result follows from part (a).

15. Define T: L -+ C" by T(x) = (fl(x), ... , f,,(x)) and define h: T(L) --. C"by h(Tx) = g(x). Then h is well-defined; for if Tx1 = Tx2, then f1(x1) =

f;(x2) for all i, so g(x1) = g(x2) by hypothesis. Since h is linear on

T(L), it may be extended to a linear functional on C", necessarily of theform h(y1, ..., y") = c1y1 + + c" y,,. Thus g(x) = h(Tx) = c1 fl(x)

all xeL.16. (a) Assume fi(x) = 0 for all i. If k is any real number, f;(kx) = 0 for all

i, hence J(kxlI < 1. Since k is arbitrary, x must be 0.(h) Since the yJ are linearly independent and T -1 is one-to-one, the x;

are linearly independent. If x e L, then Tx = Yi=1 c; y, for somec1, ... , ci,; hence x = Y;= 1 ci x; .

Chapter 4

Section 4.2

2. Let Sl be the first uncountable ordinal, with F the class of all subsets offl, and µ(A) = 0 if A is countable, µ(A) = oo if A is uncountable. Define,for each o (c 0, f,#o) = I if co < a; fa((o) = 0 if co > a. Then fa T f wheref = 1, and J, f, dp = 0 for all a since fa is the indicator of a countable set.But 1, f dp = oo, so the monotone convergence theorem fails.

Section 4.3

4. (a) First note that H is a vector space. For if g is continuous, f f e H:f + g e H} contains the continuous functions and is closed underpointwise limits of monotone sequences, and hence coincides withH. Thus if f e H and g is continuous, then f + g e H. A repetition ofthis argument (notice the bootstrapping technique) shows that ifg e H, then {f e H: f + g e H} = H; hence H is closed under addi-tion. Now if a is real, then {f e H: of e H) = H by the same argu-ment; hence H is closed under scalar multiplication.

Let 9 be the open F. sets. Then So is closed under finite inter-section, and IA e H for all A e Y. (By 4.3.5, IA is the limit of anincreasing sequence of continuous functions.) By 4.1.4, IA e H forall A e a($") [=d(fl) by 4.3.21. The usual passage to nonnegativesimple functions, nonnegative measurable functions, and arbitrarymeasurable functions shows that all Baire measurable functionsbelong to H. But the class of Baire measurable functions contains thecontinuous functions and is closed under pointwise limits of mono-tone sequences; hence H is the class of Baire measurable functions.

(b) All functions in H are Baire measurable by part (a), hence a(H) a sad.But if A e sad, then IA is Baire measurable, so IA e H by part (a).

CHAPTER 4 273

But then IA is a(H)-measurable by definition of a(H), henceA E a(H).

5. Let H be the class of Baire measurable functions. Then Ko c H, and ifKK c H for all /3 < or, then Ka c H since H is closed under monotonelimits. Thus K c H. But K contains the continuous functions and isclosed under pointwise limits of monotone sequences. For iffl,fz , ... EKand f,eKa.,n=1,2,...,where a"<f1.If c =sup. a"<N1,then f" E Ka for all n; hence f e Ka+1 c K. By minimality of H (Problem4), H c K.

6. Existence of it and A follows from 4.3.13 applied to the real and imaginaryparts of E. Now if J0 f dµ = 0 for all f e C(S2), then Sag dµ1 = Jag dµ2 =0for all real-valued continuous functions on [2; hence Pi = µ2 = 0 by4.3.13. This proves uniqueness of µ (and similarly, uniqueness of A).

Now if f e C(S2), then In f dA = In fh dill for some complex-valuedBorel measurable h of absolute value 1; hence IE(f)l <-(n IfIdiAl;therefore IIE II -< IAI(f)) On the other hand, let E1, ..., E. be disjoint Borelsubsets of 0, with A(E) = rJe'01, j = 1, ..., n. Define f = I;=1 e-i° IE,and, for a given S > 0, let g be a continuous complex-valued function on0 such that lgl < 1 and fn If - gI diAl < S (see 4.3.14). Now

d2_ rJ = I A(E) 1,frf

J=1 J=1

hence

I E(g) I = fngds, >_I f fdA I-S= A(EJ)I -S.U J=1

Since S is arbitrary, it follows that IIE II ? I2I(0). But just as in4.3.13, A is the unique extension of p, so JAI(n) = Ipl(fl), completing theproof.

Section 4.4

2. Since there are c Borel subsets of R", there are only c measurable rec-tangles in 9. Thus (Problem 11, Section 1.2) card 9 < c. But card.F >_ c (consider {co: co, = a), a e R); hence card 9 = c.

3. Let W be the class of sets in F for which the conclusion holds. If A E ',then co e A` iff OT0 0 B0, that is, if wTa E Bo`; thus A` E W. If A. E c6',n = 1, 2, ... , with co e A. if wT E BTU, let To = Un 1 T", a countablesubset of T. Then co c- U' , A" if (or(, c BT., where BT. = (y E 0T0 :Yr. E BT" for some n). Now {y E r'To : YT" E BTU} = pn 1(BT"), where p" isthe projection of S2To onto Or.. Thus p": (0T0, -FT)) --* (ClT,,, -T"), so

that BT. E .F r" , and consequently UM L, A. E W. Therefore' is a a-fieldcontaining the measurable cylinders, so ' _ .F.

4. (a) If the set A in question belongs to .F let To and Bo be as in Problem3; assume without loss of generality that to E To. Since To is count-able, there is a sequence t" to with t" 0 To. Define x(t) = 0,t E To; x(t) = 1, t To. Now if w(t) = 0 for all t, then co e A, hencewTo E Bo. But WTO = Xr,,, so that x e A. Since x is discontinuous atto, this is a contradiction.

(b) Again assume the set A belongs to .F and let To and Bo be as inProblem 3. Since To is countable, we can find s e T with s 0 To.Define x(t) = c - 1, t s; x(s) = c + 1, and let w(t) = c - I for allt. Then w e A and w(t) = x(t) for t E To ; hence x e A, a contradic-tion.

5. (a) Let F= n , V" , V,, open. Let g" be a continuous function from fl,to [0, 1 ] with g" = 0 on F, g" = I on V" Let g = Y' , 2-"g,,. Theng is continuous and g-'{0} = F. Define f: 0 -> R by f(w) = g(w(t)).Then f e C(92) and {w: f(w) = 0) = A; therefore A e But if %is the class of subsets B of f2, such that {w a 0: w(t) a B) E _d(S2),then ' is a a-field containing the closed sets; hence ' = .F,. Itfollows that all measurable rectangles belong to d(fl); henceF c sa1(S2).

(b) If x, y e fl, x # y, then for some t, we have x(t) 96 y(t). Let g be acontinuous function from ), to R with g(x(t)) # g(y(t)). Definef (co) = g(w(t)), co e n. Then f e C(S2), f depends on only one co-ordinate, and f(x) o fly). Since constant functions trivially dependon only one coordinate, the Stone-Weierstrass theorem implies thatthe algebra A generated by the functions in C(S) depending on onecoordinate is dense in C(Q). [The algebra A may be describedexplicitly as the collection of finite sums of finite products of func-tions depending on one coordinate.]

Now if f(w) = g (w(t)), g e C(.Q,), then f = g o p, where p, isthe projection of Q onto Q. Thus every function in A is measurablerelative to F and R(R). But each function in C(f)) is a uniform(hence pointwise) limit of a sequence of functions in A, so that .Fmakes every function in C(S2) measurable. Therefore d(Q) c F.

8. (a) Since d is continuous, so is f. If co # w', let (co,,*} be a sequence in Dwith w"* - w. Eventually d(w, w"*) 96 d(w', w"*); hence f is one-to-one. If f (w") -f (w), but d(w", w) > S > 0 for all n, let co* be anelement of D such that d(w, co*) < 6/2. Since f (Con) -+ f (w), we haved(w", co*) -> d(w, co*) as n -+ co, and therefore d(w", (o) < d(w", w*)+ d(w*, co) < S for large n, a contradiction. Thus f has a continu-ous inverse.

CHAPTER 4 275

(b) By part (a), f is a homeomorphism of !Q and f (S2) c [0, 1]'. Since i2is complete, it is a G,, in any space in which it is topologically embed-ded (see the appendix on general topology, Theorem A9.11). Thusf()) is a Ga in [0, 1 ]', in particular a Borel set.

(c) Let r e [0, 1) with binary expansion 0. a,a2 (to avoid ambiguity,do not use expansions that terminate in all ones). Define g(r) =(a,, a2, ...); g is then one-to-one. If k/2" is a dyadic rational numberwith binary expansion O.a1 .. a.00 , then

['k k+1g[

'k2" {yeS"0:yi =a,,...,y"=a"}.

Thus g maps finite disjoint unions of dyadic rational intervals ontomeasurable cylinders, and since g[0, 1) = S' - {y e S°°: yn iseventually 1) = S `° - a countable set, we have g[0, 1) a Y ' and gand g-1 are measurable.

(d) Let A = {y e S o0: yn is eventually 1), B = {y e S °°: y c- g[0, 1) andg-1(y) is rational}, where g is the map of part (c). Let q be a one-to-one correspondence of the rationals in [0, 1] and A u B. Defineh : [0, 1 ] -. S' by

h(x) = g(x) if x is irrational,q(x) if x is rational.

Then h has the desired properties.(e) Define h: fln O2 -+11n Sn by h(w1, ()2 , ...) _ (M(all), h2(w2), ...).

The mapping h yields the desired equivalence.

Section 4.5

3. Assume µ"Z g, and let A be a bounded Borel set whose boundary hasp-measure 0. Let V be a bounded open set with V A, and let Gj ={x e V: d(x, A) < 1/j}. If fj is a continuous map from it to [0, 1 ] such thatfj = 1 on A and fj = 0 off Gj [see A5.13(b)], then

lµ"(A)-p(A)I -<I fn(UA-fj)dO"I +I ffjdp"- ffjdµ

+I f (fj-IA)dpi. (1)

Since fj - IA is 0 on A and also on Gj`, the third term on the right sideof (1) is bounded by p(Gj - A):!-< p(Gj - A°). Similarly, the first termis bounded by p (G j - A°). The second term approaches 0 as n --* co sincethe support offj is a subset of the compact set G.

Now let gjk be a continuous function from S2 to [0, 1], with suppg jk c V, such that as k -+ oo,

gjk {tip-A°}

To verify that the gjk exist, note that Gj - A° = nn 1 U., where U. ={x e V: d(x, Gi - A°) < 1/n}. Let g.. be a continuous function from C1 to[0, 1] such that g 'n = I on Cj - A°, gj. = 0 off U., and set gjk =min(gj1, ... , gjk). Now

µn(Gj-A°)=fnlGj-AOdun<- fngjkdpn-' fngjkdµ as n -oo.

Thus

lim sup µn(G j - A°) < f gjk dp.n- 00 n

Since supp gjk c V and µ(V) < oo, we may let k -+ oo and invoke themonotone convergence theorem to obtain

lim sup un(Gj - A°) S p(Gj - A°).n-ao

It follows from (1) and the accompanying remarks that

lim sup I µn(A) - µ(A)I < 2p(G j - A°).n-I ao

As j -+ oo we have G j - A° 1 A - A° = 8A, and since µ(8A) = 0 we con-clude that µn(A) -> µ(A), proving that (a) implies (b).

Assume p .(A) -+ µ(A) for all bounded Borel sets with µ(8A) = 0, andlet f be a bounded function from 0 to R, continuous a.e. [µ] withsuppf c K, K compact. Let V and W be bounded open sets such thatK c V c V c W [see A5.12(c)]. Now

v= n {xeW:d(x, V)<8}= n Wa;6>0 6>0

the Wa are open and

8W1 c {x a W: d(x, VV) = S}.

Thus the Ws have disjoint boundaries and µ(W) < oo, so µ(8W1) = 0 forsome 8. Therefore we may assume without loss of generality that we haveK e V, with V a bounded open set and µ(8V) = 0.

Now if A c V, the interior of A is the same relative to V as to theentire space Q since V is open. The closure of A relative to V is given byAv = A n V; hence the boundary of A relative to V is 81, A = (8A) n V.

CHAPTER 4 277

If A is a Borel subset of V and µ(8y A) = 0, then µ[(8A) n V] = 0;also

(8A)nV°cAnV°cV-V=V-V°=BV;hence µ[(8A) n V` ] = 0, so that u(8A) = 0. Thus by hypothesis,P(A). By 4.5.1, if and µ' denote the restrictions of µ and p to V, wehave un' w, µ'. Since f restricted to V is still bounded and continuous a.e.[µ], we have f y f du.'- Jy f du', that is, fn f du. -+ Jn f dµ. This provesthat (b) implies (c); (c) implies (a) is immediate.

Subject Index

A

Absolute continuityof functions, 70of measures, 59, 63

Absolute G,, 233Absolute homogeneity, 114Absorption of one set by another, 167Accumulation point

of filterbase, 206of net, 204

Additivity theorem for integrals, 45Adjoint of linear operator, 149Algebra of sets, 4Almost everywhere, 46Annihilator of subset of normed linear

space, 149Approximation

of Baire or Borel sets by closed,compact, or open sets, 34, 179-183

by continuous functions, 88, 185-188by simple functions, 38, 88, 90

Arzela-Ascoli theorem, 228

B

Baire category theorem, 230Baire sets, 178Baire space, 231Banach space, 114Banach-Alaoglu theorem, 162Bessel's inequality, 119Bilinear form, 150Borel-Cantelli lemma, 66Borel equivalence, 195Borel measurable functions, 35, 36

complex-valued, 80properties of, 38-40

Borel sets, 7, 27, 178Bornivore, 167Bornological space, 167Bounded linear operators, 128

weakly, 150Bounded set, in topological vector space.

167Bounded variation, 71

279

280

C

c, space of convergent sequences ofcomplex numbers, 115, 136

Cantor function, 77, 78Cantor sets, 33, 34Caratheodory extension theorem, 19Cardinality arguments, 13, 34, 42Cauchy in measure, 93Cauchy-Schwarz inequality, 82, 116

for sums, 87Chain rule, 69Change of variable formula for multiple

integrals, 78Chebyshev's inequality, 84Circled set, 151Closed graph theorem, 148, 166Closed linear operator, 147

unbounded, 150Cluster point, 203Compact topological space, 213

countably, 216locally, 217relatively, 217sequentially, 217a, 218

Compactification, one-point, 220Complete metric space, 230Complete orthonormal set, 122Completeness of L° spaces, 85, 90Completion of measure space, 18Complex measure, 69Composition of measurable functions, 39Conjugate isometry, 131Conjugate linear map, 131Conjugate space, 142

second,142Consistent probability measures, 190, 191Continuity of countably additive set

functions, 10, 11Continuity point of distribution function,

198Continuous functions dense in L°, 88, 185,

188Continuous linear functionals, 130, 135

extension of, 140, 156representations of, 130-133, 136, 137,

184-186,188space of, 131, 141

SUBJECT INDEX

Convergenceof filterbases, 205of nets, 203in normed linear spaces

strong, 144weak, 144

of sequences of linear operators, 134strong, 134, 144, 149uniform, 134

of sequences of measurable functions,92ff.

almost everywhere, 47almost uniform, 93in L", 88in L`°, 89in measure, 92in probability, 92

Convergence theorems for integrals, 44, 47,49

Convex sets, 119in topological vector spaces, 154ff.

Countably additive set function, 6, 43, 62expressed as difference of measures, 11,

44, 61Counting measure, 7Cylinder, 108, 189

D

Daniell integral, 170ff., 175Daniell representation theorem, 175Decreasing sequence of sets, IDe Morgan laws, IDensity, 66Derivative

of function of bounded variation, 76of signed measure, 74Radon-Nikodym, 66

Difference operator, 27Differentiation

under integral sign, 52of measures, 74ff.

Dini's theorem, 181Directed set, 203Discontinuous linear functional, 135Discrete distribution function, 76Distribution function, 23, 29

decomposition of, 76Dominated convergence theorem, 49

extension of, 96Dynkin system, 168, 169

SUBJECT INDEX

E

Egoroff's theorem, 94Equicontinuity, 228Extension of finitely additive set functions,

149Extension theorems for measures, 13ff., 18,

19, 22, 183, 184

F

F, set, 42, 178Fatou's lemma, 48Field of sets, 4Filter, 205Filterbase, 205

subordinate, 206Finitely additive set function, 6

not countably additive, 11, 12a-finite, 9

First countable space; 204, 212Fubini's theorem, 101, 104

classical, 103, 106Functional analysis, 113ff.

basic theorems of, 138ff.

G

Go set, 178Gauge space, 226, 237Good sets principle, 5Gram-Schmidt process, 125Gramian, 125

H

Hahn-Banach theorem, 139, 140, 149Hausdorff space, 211Heine-Borel theorem, 213Hermite polynomials, 125Hilbert spaces, 114, 116ff.

classification of, 123, 124separable, 124, 133

Holder inequality, 82for sums, 87

I

Identification topology, 209

281

Increasing sequence of sets, IIndicator, 35Inner product, 114

space, 114Integrable function, 37, 81Integral, 36ff.

as countably additive set function, 43indefinite, 59, 73

Integration of series, 46, 52Internal point, 154Isometric isomorphism, 123, 133, 137, 142,

163, 186, 189

J

Jordan-Hahn decomposition theorem, 60

K

Kolmogorov extension theorem, 191, 194

L

L' spaces, 80ff.completeness of, 85continuous linear functionals on, 131-

133,137,165/', l'(S2), 87L°°, 891°°, , (c1), 90

Lattice operations, 170Lebesgue decomposition theorem, 68, 76Lebesgue integrable function, 51Lebesgue integral

abstract, 36ff.comparison with Riemann integral, 53

Lebesgue measurable function, 39Lebesgue measurable sets, 26, 31, 33, 54Lebesgue measure, 26, 31, 100, 106Lebesgue set, 78Lebesgue-Stieltjes measure, 23, 27Legendre polynomials, 125Lim inf (lower limit), 2Lim sup (upper limit), 2Limit

under integral sign, 52of sequence of sets, 3, 12, 52

282

Lindelof space, 212Linear functionals, 130

continuous, 130, see also Continuouslinear functionals

positive, 170Linear manifold, 119

generated by set, 121Linear operator(s), 127ff.

bounded,128closed, 147continuous, 128with discontinuous inverse, 147idempotent, 130range and null space of, 149spaces of, 133

Lipschitz condition, 78Locally compact space, 217Locally convex topological vector space,

153characterization of, 156

Lusin's theorem, 187

M

Measurable cylinder, 108, 189Measurable function, 35

Bore], 35jointly, 101, 107Lebesgue, 39

Measurable rectangle, 97, 108, 189Measurable sets and spaces, 35Measure(s), 6

absolutely continuous, 59, 63complete, 18complex, 69extension of, 13ff., 183, 184on field, 6finite, 9on infinite product spaces, 108ff., 189ff.Lebesgue, 26, 31, 100, 106l.,ebesgue-Stieltjes, 23, 27outer, 16, 22probability, 6product, 97, 100, 104, 106, 109, 111regular, 183, 189a-finite, 9signed, 62singular, 59, 66spaces of, 186, 189

SUBJECT INDEX

on topological spaces, 178ff.uniformly a-finite, 97

Measure-preserving transformation, 50Minkowski functional, 154Minkowski inequality, 83

for sums, 87Monotone class theorem, 19Monotone convergence theorem, 44

extended, 47Monotone set function, 16

N

Negative partof countably additive set function, 62of function, 37

Neighborhood, 201Net, 203Norm(s), 84, 114

on finite-dimensional space, 134inducing same topology, 134-136of linear operator, 128, 133

Normal topological space, 178, 211Normed linear space, 114

linear operators on, 127ff.

0

Open mapping theorem, 147, 159Orthogonal complement, 121Orthogonal direct sum, 121Orthogonal elements, 118Orthogonal set, 118Orthonormal basis, 122Orthonormal set, 118

complete, 122Outer measure, 16, 22Overneighborhood, 201

P

Parallelogram law, 118Parseval relation, 122Polarization identity, 124Polish space, 180Positive homogeneity, 138Positive linear functional, 170

SUBJECT INDEX

Positive partof countably additive set function, 62of function, 37

Pre-Hilbert space, 114Probability measure, 6Product measure theorem, 97, 104

classical, 100, 106, 111infinite-dimensional, 109, 111

Product a-field, 97, 108, 189Product topology, 208Projection, 119, 120, 130, 148Projection theorem, 121Pseudometric, 84, 226Pseudonorm,84Pythagorean relation, 118

Q

Quotient space, 135Quotient topology, 210

R

Radial, 154Radial kernel, 154Radon-Nikodym derivative, 66Radon-Nikodym theorem, 63Rectangle, 96, 97, 108, 189Reflexivity, 142, 163, 164Regular measure, 183, 189Regular topological space, 211

completely, 212Riemann integral, 53-57Riemann-Stieltjes integral, 56Riesz lemma, 136Riesz representation theorem, 130, 133,

181, 182, 184-186, 188

S

Second countable space, 212Section of set, 98Semicontinuous functions, 220Semimetric, 84Seminorm(s), 84, 113

family of, generating locally convextopology, 153

Separable Hilbert spaces, 124, 133

Separable topological spaces, 212Separation properties for topological

spaces, 211Separation theorems, 159-161

strong, 160Set function, 3

countably additive, 6finite, 9finitely additive, 6

Shift operator, 129one-sided (unilateral), 129two-sided (bilateral), 129

a-field (a-algebra), 4countably generated, 111, 148minimal, 5, 12

Simple functions, 36dense in Lo, 88, 90

Singular distribution function, 77, 78Singular measures, 66Solvability theorem, 150Space spanned by set, 121Steinhaus' lemma, 42Stone's theorem, 200Stone-Weierstrass theorem, 225Strong convergence

in normed linear space, 144of operators, 134,144, 149

Strong topology, 144, 161Subadditivity, 114, 138

countable, 16Sublinear functional, 138Subnet, 204Subspace, 119

closed, 119

T

T, spaces, 211, 212Tails of net, 205Tietze extension theorem, 211Topological isomorphism, 136Topological spaces, 201ff,

measures on, 178ff.Topological vector space, 114, 150ff.

locally convex, 153Topologically complete space, 232Topology

of pointwise convergence, 208, 227of uniform convergence, 153, 227

Total variation, 62, 69

283

284

Translation-invariance of Lebesguemeasure, 33

Tychonoff theorem, 215

U

Ultrafilter, 206Uniform boundedness principle, 143Uniform convergence of operators, 134Uniform space, 237Uniform structure, 234Uniformity, 234Urysohn metrization theorem, 219Urysohn's lemma, 211

V

Vague (=weak) convergence of measures,198

SUBJECT INDEX

Variationbounded,71of function, 71lower, 62total, 62, 69upper, 62

Vitali-Hahn-Saks theorem, 43

W

Weak and weak* compactness, 162-164Weak convergence, 144, 161

of distribution functions, 199of measures, 196-199

Weak* convergence and weak* topology,161

Weak topology, 144, 161

[Robert B. Ash] Measure, Integration and Functiona(BookFi.org)

Documents

Transcript of [Robert B. Ash] Measure, Integration and Functiona(BookFi.org)