Measure and Integration · 2020. 5. 18. · Measure and Integration is a foundational course,...

Measure and Integration

Xue-Mei LiImperial College London

May 18, 2020

Contents

1 Introduction 7

1.1 Module Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.2 Course Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7

1.3 Prologue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.4 The mass transport problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8

1.5 Coin tossing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

1.6 Inadequacy of Riemann Integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

1.7 Observables of Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10

2 Measurable sets, measurable functions and measures 11

2.1 Set Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12

2.2 Measurable Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.2.1 Countability of σ-algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

2.2.2 Fundamental Examples and Exercises . . . . . . . . . . . . . . . . . . . . . . . 14

2.3 Borel σ-algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2.3.1 Background material* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

3 Measures 19

3.1 Basics of Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

3.2 Construction of measures from pre-measures . . . . . . . . . . . . . . . . . . . . . . . 21

3.2.1 Outer Measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21

3

CONTENTS 4

3.2.2 Extension of pre-measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

3.2.3 Proof for Caratheodory Theorems . . . . . . . . . . . . . . . . . . . . . . . . . 24

3.2.4 Construction of Lebesgue-Stieljes / Lebesgue measure . . . . . . . . . . . . . . 28

3.2.5 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

3.3 Lebesgue Measure on Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

3.4 Uniqueness of Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

3.4.1 π − λ theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

3.5 Measure determining sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

4 Measurable functions 39

4.1 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

4.1.1 Simple functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

4.1.2 Product σ-algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42

4.2 Properties of measurable functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

4.2.1 Measurable functions on the real line . . . . . . . . . . . . . . . . . . . . . . . 44

4.3 Factorisation Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

4.4 Appendix* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

4.4.1 Littlewood’s three principles . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

4.4.2 Product σ-algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48

4.4.3 The Borel σ-algebra on the Wiener Space . . . . . . . . . . . . . . . . . . . . . 48

5 Integration 51

5.1 Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

5.2 Outline of the construction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

5.3 Integral of simple functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

5.4 Integration of non-negative functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

5.5 Integrable functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

5.6 Further limit theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

CONTENTS 5

5.7 Examples and exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

5.8 The Monotone Class Theorem for Functions . . . . . . . . . . . . . . . . . . . . . . . . 66

5.8.1 Lebesgue Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

5.9 The pushed forward measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

5.10 Appendix: Riemann integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

5.11 Appendix: Riemann-Stieltjes integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

5.12 Appendix: Functions with bounded variation . . . . . . . . . . . . . . . . . . . . . . . 71

6 Product measures and Fubini’ Theorem 73

6.1 Product measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

6.1.1 Sections of subsets of product spaces . . . . . . . . . . . . . . . . . . . . . . . 78

6.2 Fubini’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

7 Radon-Nikodym Theorem 83

7.1 Singular and absolutely continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

7.2 Signed measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84

7.3 Radon-Nikodym Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86

8 Conditional Expectations 91

8.1 Conditional expectation and probabilities . . . . . . . . . . . . . . . . . . . . . . . . . 91

8.1.1 Conditioning on a random variable . . . . . . . . . . . . . . . . . . . . . . . . . 93

8.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94

8.3 Lp spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

8.3.1 Mode of convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

8.3.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99

8.3.3 Conditional expectation as orthogonal projection . . . . . . . . . . . . . . . . . 100

8.3.4 Useful Inequalities* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

8.4 Conditioning probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102

8.4.1 Finite σ-algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103

CONTENTS 6

9 Mastery Material: Ergodic Theory 105

9.1 Basic Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105

9.1.1 Example: circle rotation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106

Chapter 1

Introduction

1.1 Module Information

Prerequisite Real Analysis (M2PM1), Metric Spaces and Topology (M2PM5), and basic Probability.Measure and Integration is a foundational course, underlies analysis modules. It brings together manyconcepts previously taught separately, for example integration and taking expectation, reconciling dis-crete random variables with continuous random variables.

Contents. Measurable sets, sigma-algebras, Measurable functions, Measures. Integration with re-spect to measures, Lp spaces, Modes of convergences, basic important convergence theorems (Domi-nated convergence theorem, Fatou’s lemma etc), useful inequalities, properties of integrals.

This module is related to: Probability theory, Applied probability, Markov processes, Fourier analysisand theory of distributions, Functional analysis, ODE’s, Ergodic theory, Probability theory, StochasticFiltering, Applied Stochastic Processes.

1.2 Course Structure

Assessment

• CW 1. (5%) 05/11/2019, Deadline: Tuesday 4pm (submit to U/G office)

• CW 2. (5%) 03/12/2019, Deadline: Tuesday 4pm (submit to U/G office)

• May Exam (90%): 2 hour exam (year 3), 2.5-hour exam (year 4/Msc).

Problem sheets.. Weekly (not assessed, they are the best material for preparing for the exam.)

7

1.3. PROLOGUE 8

Office Hour. Wednesdays: 11:00-12:00

Problem class: Wednesday 30 October 10am-12am, room 658.

References.

• Real Analysis by H. L. Royden, Third edition

• Real Analysis, Modern Techniques and their applications, by G. B. Folland

• Measure Theory by P. R. Halmos

• Measure and Integration, by R. Schilling, second edition.

How to use this notes? A ‘remark’ is material which is helpful with our understanding, which may ormay not be covered in the lectures. An ‘exercise’ may appear in the weekly problem sheet, is in the notefro one or more of the following reasons: (1) the conclusion is interesting; (2) the proof is representative,(3) proving it improve our understanding.

If a section /paragraph is given a ∗ sign, it is not covered in class (or not ‘officially’ covered in class).They are in the notes for the background or considered to be useful to further our understanding.

1.3 Prologue

Measure theory is not only a fundamental tool, they are fundamental concepts.

1.4 The mass transport problem

Monge was concerned with the cost of moving material from a mine to a construction site, Monge’sproblem (1781) is: What is the optimal way to transport a pile of sand?

Let us think of µ as the original sand mass distribution, a measure µ. We also think of the target massas a measure ν. A map T is a transport map if it pushes µ to ν. Find a map x 7→ T (x) such that the cost

1.5. COIN TOSSING 9

(distance) ∫R|x− T (x)|µ(dx)

is minimised among all transport maps. We may also use other cost functions, e.g. kinetic energy or takeinto consideration of the geometry of the space.

Example 1.1 Shift of a block of mass of height one of uniform density on [0, n] to [1, n+ 1]. There aremore than one transport maps: T (x) = x+ 1 fro any x ∈ [0, n];

T (x) =

n+ 1− x, x ∈ [0, 1)

x, x ∈ [1, n].

Example 1.2 Can one find a map shifting 1 unit of mass at location 0 to location 1 and −1 of equalmass?

No such map exists. The measures are µ = δ0, ν = 12δ1 + 1

2δ−1.

Kontorovich’s formulation (1940’s). Find a measure on Ω× Ω that minimises∫|x− y|µ(dx, dy)

and has marginals µ, ν respectively, i.e. µ(Ω×B) = ν(B), µ(A×Ω) = µ(A). This is called a transportplan.

Find a transport plan for Example 2. (Hint: this is a measure on (0, 0), (1, 0), (0, 1), (1, 1).

Optimize the transport cost involved in distribution of pastries from bakeries to coffee shops. E.g.Bakery 1 delivers 50 pastries to Cafe 1, 30 each to Cafe shops 2 and 3, Bakery 2 delivers 20 to Cafe 2,30 to cafe 3.

1.5 Coin tossing

We will need to understand what does it mean by taking limits of functions and measures.

Example 1.3 Toss a fair coin 1000 times, note down 1 if head and −1 if tail.

Law of large numbers. the average value is expected to be 0 (Converges almost surely).

Central Limit Theorems. The probability distributions of 1√n

∑ni=1Xi is approximately given by

the Gaussian curve. The convergence of the random variable is in weak sense.

1.6. INADEQUACY OF RIEMANN INTEGRALS 10

1.6 Inadequacy of Riemann Integrals

Let f : [a, b]→ R be a bounded function.

Proposition 1.6.1 A bounded function f : [a, b] → R is Riemann integrable if and only if the set ofdiscontinuity of f has Lebesque measure zero.

Example 1.4 A not Riemann integrable function is f1 : [0, 1] → R with f1(x) = 0 for x ∈ Q andf1(x) = 1 ∈ [0, 1] \ Q. Thus U(f) = 1 and L(f) = 0. Same with f2 : [0, 1] → R with f2(x) = 1 forx ∈ Q and f2(x) = 0 ∈ [0, 1] \Q.

There are more irrational points than rational points, perhaps∫ 1

0 f1(x)dx can be defined to be 1? Is thereis a quantitative statement about there are more irrational numbers than rational numbers?

There exists a sequence of bounded Riemann integrable functions fn such that fn(x) → f(x) forevery x ∈ [0, 1]. But f(x) is not Riemann integrable.

Example 1.5 Let Q = q1, q2, . . . be a enumeration of rational numbers. Define fn(x) = 1 if x ∈q1, q2, . . . , qn and set fn(x) = 0 otherwsie. Then

∫fn(x)dx = 1, and f(x) is the Dirichlet function

which is not Riemann integrable.

1.7 Observables of Random Variables

Let (Ω,F ,P) be a probability space, X be a real valued random variable. If P(X = i) = pi with∑i pi = 1,

EX =∑i

piP(X = i)

If X is a random variable with standard Gaussian normal distribution. Then either or

EX =

∫y

1√2πf(x)dx.

Both expectations are integrals∫ydµ where µ is the probability measure X∗dP, the pushed forward

measure given byX . Both are given also by∫

ΩXdP. These concepts use integration theory with respectto a measure.

Chapter 2

Measurable sets, measurable functionsand measures

What kind of subsets of the plane or the real line can be measured? The basic sets are: intervals on R;triangles, squares, polygons, squares on R2. Measuring the length of a circle is a mathematical problem,they are accomplished by ‘exhaustions’ (by the ancient Greeks 5th BC, the Achimedes of Syracus : 2-3BC), or taking limit in the modern language, c.f. the definition of definite integrals.

Given a set X , the collection of its subsets is its power set. We denote it by 2X . If a subset A canbe measured, this ought to lead to some understanding of its complement Ac = X \ A. What kind ofsubsets can be measured? What axioms should they satisfy? If A,B can be measured, it is natural to bepostulate that A ∪ B can be measured. This means we want our collections of measurable sets to be aBoolean algebra.

To build up more useful concepts, we begin with sets we can measure, gradually build up more mea-surable sets by taking complements, unions, and approximations. The classical construction of Lebesguemeasure has this flavour (for example ‘outer measures’ originated from over measurements.)

Typical spaces are: (1) finite set, X = 1, . . . , n; (2) X = Rn or X = Zn; (3) Some manifold, forexample X = Sn, the n-dimensional sphere, (4) X = T , the torus; (5) the Hilbert space X = L2([0, 1]),or (6) X = `2.

Suppose we drop a pin on a surface of area 1, how frequently does the pint fall within an an areamarked by A? With what do we measure it? –can we define a ‘uniform measure’? Let us begin with R.

Example 2.1 We want to assign a number to every subset of R so that the measure of a half open intervalis its length, and such that it is translation invariant, i.e. µ(A + t) = µ(A) for any t ∈ R and any Borelset A where

A+ t = a+ t : a ∈ A

11

2.1. SET ALGEBRA 12

is the translate of B by the number t.

Does there exist a translation invariant countably additive measure on R which assign an intervalits length? The answer is not. Let us define an equivalent relation: x ∼ y if and only if x − y ∈ Q.Using Axiom of choice we can choose the representative of the equivalent relations to be in (0, 1]. Thisquotient set V = R/ ∼ as a subset of (0, 1] is called the Vitali set. If q1 6= q2 are two rational numbers,the translates V + q1 and V + q2 are disjoint. Let A = Q ∩ (−1, 1], then

(0, 1] ⊂ ∩q∈A(A+ q) ⊂ [−1, 2].

The second inequality is trivial. Take y ∈ (0, 1], there exists q ∈ (−1, 1]∩Q = A such that y = q+ [y],where [y] is the representative of y. Thus y ∈ supq∈A(V − q) and the left hand side inclusion is proved.What can be the measure of V . Suppose it is λ. By translation invariance, (A + q) has measure λ. Thecountable number of copies of it has measure λ · ∞. But then 1 ≤ λ · ∞ ≤ 3, which is impossible forany λ ∈ [0,∞].

2.1 Set Algebra

Denote by Ac the complement of a subset A.

de Morgan’s identities:(∩A∈ΛAα)c = (∪A ∈ Λα)c

(∪A∈ΛAα)c = (∩A ∈ Λα)c

This holds for any number of sets, not necessarily countable number of.

Distribution Law

A ∩ (B ∪ C) = (A ∩B) ∪ (A ∩ C), A ∪ (B ∩ C) = (A ∪B) ∩ (A ∪ C).

Distribution law holds for also for infinite number of sets. For example

(⋃i

Ai) ∩ (⋃i

Bj) =⋃i,j

(Ai ∩Bj).

Limits If An is an increasing sequence of subsets, we set

limn→∞

An = ∪∞n=1An.

Similarly if An is a decreasing sequence of subsets, we define

limn→∞

An = ∩∞n=1An.

Let f : X → Y be a map, define

f−1(A) = x ∈ X : f(x) ∈ A.

Taking pre-image respects set operations:

2.2. MEASURABLE SPACE 13

Lemma 2.1.1f−1

(⋃Bi

)=⋃f−1(Bi),

f−1(⋂

Bi

)=⋂f−1(Bi),

f−1(A \B) = f−1(A) \ f−1(B).

2.2 Measurable Space

Let X be a set. We study collections of subsets of X . Let φ denote the empty set.

Definition 2.2.1 A collection of subsets of X , denoted by A, is a (Boolean) algebra if the followingproperty holds.

• (Non-empty ) φ ∈ A.

• (closed under complements) If B ∈ A then Bc ∈ A;

• (closed under finite unions) If B ∈ A, C ∈ A, then B ∪ C ∈ A.

The assumption φ ∈ A can be equally replaced by the condition that A is not empty.

We would like to be able to take limits.

Definition 2.2.2 A σ-algebra over X , denoted by F , is a collection of subsets of X such that:

• non-empty: φ ∈ F .

• closed under complements: If B ∈ F , then Bc ∈ F .

• closed under countable unions: If we have a sequence Bn with Bn ∈ F , then⋃∞n=1Bn ∈ F .

Elements of F are called measurable sets and (X ,F) is said to be a measurable space.

By de Morgan’s identities, if Bn ∈ F then ∩∞n=1Bn ∈ F . If A,B ∈ F then so does A \B.

The smallest σ-algebra is φ,X. If A ⊂ X , F = φ,X , A,Ac is a σ-algebra.

Proposition 2.2.1 Intersection of any number of σ-algebras is a σ-algebra. If C is any collection ofsubsets of a set X , then there always exists a smallest σ-algebra containing C.

This is straightforward to prove and left as an exercise. The intersection of σ-algebras is often denotedby ∧. For example, F1 ∧ F2 denote the intersection of F1 and F2.


Definition 2.2.3 If C is a collection of subsets, the smallest σ-algebra generated by it is called the σ-algebra generated by C. This is denoted by σC of σ(C).

Remark 2.2.2 Suppose F = σ(C) this does not mean every measurable set is of the form ∪∞i=1Ci withCi ∈ C. Taking for example C to be the collection (−∞, a], where a ∈ R, then σ(C) = B(R),however if Ci = (−∞, ai] are elements of C, then ∪∞i=1Ci is an interval of the form (−∞, a) or (−∞, a]

with a = supai, i ≥ 0, and the Borel set −1, 1, for instance, is not of this form.

2.2.1 Countability of σ-algebras

If X is a set, with a finite partition. by which we mean there exists A1, . . . , An disjoint with X =

∪nk=1Ak. Let F = σA1, . . . , An. Then F is generated. by unions of sets from Ai. There are 2n suchchoices: we can include or not include a particular set and every element of F will coming from such aunion, and so the σ-algebra consists of a finite number of elements.

We will see below that the σ-algebra with non-finite number of elements is uncountable. So a σ-algebra is either generated by a partition or has uncountable number of elements.

Proposition 2.2.3 A σ-algebra F is either finite or uncountable.

Proof will be released later in November.

2.2.2 Fundamental Examples and Exercises

Proposition 2.2.4 If f : Ω → X is a map, and (X ,B) a measurable space. Then f pulls back theσ-algebra G to a σ-algebra on Ω. This is the collection of pre-images

σ(f) := f−1(A) : A ∈ B

is a σ-algebra. This is called the σ-algebra generated by f , and also called the pre-image σ-algebra.

Proof. Exercise.

Example 2.2 If f : (X,B)→ (R,B(R) is of the form

f(x) =

n∑j=1

aj1Aj (x),

where Aj are pairwise disjoint subsets of X . Then σ(f) is generated by the finite collection of setsA1, . . . , An.


Example 2.3 To see the role played by the assumption that Aj are disjoint, we give an example whereAj are not disjoint. Set for example

f(x) = 1[1,2] + 31[1,3] + 41(−2,1].

Then

f(x) =

4, x ∈ (−2, 1) ∪ (1, 2]

3, x ∈ (2, 3]

8, x = 1

0, otherwise .

The σ-algebra generated by f is given by the partition:

1, (−2, 1) ∪ (1, 2], (2, 3], (−∞,−2] ∪ (3,∞).

If f(x) = 1[1,2] + 31[1,3], then σ(f) is generated by the collection of subsets:

[1, 2], (2, 3], (−∞, 1) ∪ (3,∞), φ,R.

Example 2.4 If F is a σ-algebra, and A ⊂ X then A ∩B : B ∈ F is again a σ-algebra.

Exercise 2.2.1 Given two σ-algebrasF1 andF2, we denote byF1∨F2 the smallest σ-algebra containingboth F1 and F2.

Show that F1 ∨ F2 can equivalently be characterised by the expressions:

• F1 ∨ F2 = σA ∪B |A ∈ F1 and B ∈ F2,

• F1 ∨ F2 = σA ∩B |A ∈ F1 and B ∈ F2.

Definition 2.2.4 A collection of subsets E is an elementary family if

• φ ∈ E ;

• If A,B ∈ E , then A ∩B ∈ E ;

• if A ∈ E then Ac is the union of disjoint sets from E .

Exercise 2.2.2 Show that if E is an elementary family, then the collection A of finite disjoint unions ofmembers of E is an algebra.

Proof If A,B ∈ E , then Bc = ∪nl=1Bl where Bl ∈ E . So A \B = A∩Bc = ∪nl=1(A∩Bl) ∈ A. Also,A ∪B = (A \B) ∪B ∈ A. By induction if Ai ∈ E then ∪ni=1Ai ∈ A. This means that the finite unionof sets from the elementary family is in A and A is closed under taking unions.

2.3. BOREL σ-ALGEBRAS 16

We show A is closed under complement. Let A = ∪nj=1Aj ∈ A where Aj ∈ E are disjoint. ThenAcj =

∑njm=1A

mj where Amj ∈ E . Since Ac = ∩nj=1(Acj), by associativity, so Ac is a finite union of

intersections of sets from E , therefore in A.

We call the following sets of the following form half open intervals: (a, b], φ or (a,∞), (−∞, a) andφ, where a, b are real numbers.

Example 2.5 The collection of unions of a finite number of disjoint half open intervals is an algebra.

This follows from taking E to be the collection of half open intervals and use the earlier exercise.

2.3 Borel σ-algebras

Definition 2.3.1 IfX is a complete separable metric space , the smallest σ-algebra generated by the opensets is called Borel σ-algebra, it is denoted by B(X ).

If X is a separable metric space, the metric topology satisfies the second axiom of countability, i.e. thereexists a countable base. This countable base generates B(X).

Example 2.6 We know the following about B(R), the Borel σ-algebra over R.

1. If a ∈ R, a = ∩∞n=1(a− 1n , a+ 1

n) ∈ B(R).

2. Any countable subsets of R is a Borel set.

3. Any intervals of any form is a Borel set.

4. B(R) is generated by the set of open intervals. This follows from the proposition below.

Proposition 2.3.1 Every open set of R is a countable union of disjoint open sets, this sets of disjoint setsis unique up to ordering.

Proof Denote by A the collection of open intervals. Let O be an open set. Let x ∈ O. Then x iscontained in an open interval, the interval is a subset of O. The set of the left end points of such intervalhas an infimum. Let us call it ax. Let bx denote the supremum of the right end of the intervals. ThenIx = (ax, bx) is the maximal interval such that x ∈ Ix ⊂ O Given x, y ∈ O either they are disjoint orthey overlap in which case they are the same. Pick up a rational number qx from Ix. Then the rationalnumbers are all distinct. So Ix : x ∈ O contains only a countable number of distinct open intervals.

2.3. BOREL σ-ALGEBRAS 17

2.3.1 Background material*

A topological space (X , T ) consists of a set X equipped with a topology T , i.e. a subset T ⊂ 2X suchthat:

• φ ∈ T and X ∈ T .

• If A0, A1, . . . , AN ⊂ T , then⋂Nn=0An ∈ T .

• If A ⊂ T , then⋃A∈AA ∈ T .

In other words, T is closed under arbitrary unions and finite intersections. Elements of T are called opensets. A function between topological spaces is continuous if the pre-images of open sets are open sets.

Given a topological space (X , T ), we define B(X ) to be the smallest σ-algebra on X containing T .This particular σ-algebra is called the Borel σ-algebra of X . In other words, the Borel σ-algebra is thesmallest σ-algebra such that all open sets are measurable. We denote by Bb(X ) the (Banach) space of allBorel-measurable and bounded functions from X to R equipped with the norm

‖ϕ‖∞ = supx∈X|ϕ(x)| . (2.1)

We denote by Cb(X ) the (Banach) space of all continuous and bounded functions from X to R equippedwith the same norm as in (2.1).

The open sets of a metric space gives a topology on the metric space. Borel σ-algebras on a completeseparable metric space is specially nice.

When discussing Borel σ-algebras, we will always assume that X is a complete separable metricspace.

A metric space is compact if any covering of it by open sets has a sub-covering of finite open sets. Adiscrete metric space (whose subsets are all open sets ) is compact if and only if it is finite. (e.g. Z andN with the usual distance d(x, y) = |x − y| is not compact.) A subset of a metric space is compact if itis compact as a metric space with the induced metric. It is relatively compact if its closure is compact.A metric space is sequentially compact if every sequence of its elements has a convergent sub-sequence(with limit in the metric space of course). It is totally bounded if for any ε > 0 it has a finite covering byopen balls of side ε. A metric space is complete if every Cauchy sequence converges.

It is a theorem that a metric space is compact if and only if it is complete and totally bounded. Ametric space is compact if and only if it is sequentially compact.

A subset of a metric space is relatively compact if it is sequentially compact (the limit may not needto belong to the subset).

Chapter 3

Measures

3.1 Basics of Measures

A measure assigns a number to every measurable set.

Definition 3.1.1 A measure µ on the measurable space (X ,F) is a map µ : F → [0,∞] with the fol-lowing properties;

• µ(φ) = 0.

• (σ-additive Propertices) If Ann>0 is a countable collection of elements of F that are all disjoint,then one has

µ(⋃n>0

An) =∑n>0

µ(An).

The triple (X ,F , µ) is called a measure space.

Definition 3.1.2 • We say µ is a finite measure if µ(X ) <∞.

• µ is σ-finite if there exists A1 ⊂ A2 ⊂ . . . such that µ(Ai) <∞ and X = ∪∞i=1Aj .

This is equivalent to the statement that there exists an increasing sequence of measurable sets Xnwith µ(Xn) <∞ such that X = ∪∞n=1Xn. The sequence is called an exhausting sequence.

Convention. From now on we study only σ-finite measures.

Example 3.1 If X = 1, 2, . . . , n, let F = 2X . Then (X ,F) is a σ-algebra. If we set µ(i) = ai, thisdefines a measure.

Proposition 3.1.1 Let (Ω,F) be a measurable space. Then

19

3.1. BASICS OF MEASURES 20

1. If A,B ∈ F and A ⊂ B then µ(B \A) = µ(B)− µ(A).

2. Monotonicity. If A ⊂ B then µ(A) ≤ µ(B).

3. (Finite) additivity. If A1, A2 ∈ F and A1 ∩A2 = φ then µ(A1 ∪A2) = µ(A1) + µ(A2).

4. Continuity from below. If An ∈ F , An increases, then µ(∪∞n=1An) = limn→∞ µ(An).

5. Continuity from above. If An ∈ F , An decreases and µ(A1) <∞, then

µ(∩∞n=1An) = limn→∞

µ(An).

Proof Claim 1 follows from that B = A ∪ (B \ A) is a disjoint union whenever A ⊂ B. Henceµ(B) = µ(A) + µ(B \A).

Claim 2 follows by claim 1, claim 3 follows from taking Ai = φ for i ≥ 3.

We prove claim 4. If A1 ⊂ A2 ⊂ A3 ⊂ ..., set A0 = φ, A = ∪∞i=1Ai Also set

B1 = A1, B2 = A2 \A1, B3 = A3 \A2, . . . ,

and so on. Then Bi are pairwise disjoint and ∪∞i=1Bi = ∪∞i=1Ai = A. Then,

µ(A) =∞∑i=1

µ(Bi) = limn→∞

n∑i=1

µ(Bi) = limn→∞

n∑i=1

[µ(Ai)− µ(Ai−1)] = limn→∞

µ(An).

Prove claim 5. We assume that µ(A1) <∞ and proceed as for claim 3.

Example 3.2 1. Let (X ,G) be a measure space and x ∈ X . The δ measure at x is defined by:

δx(A) =

1, if x ∈ A,0, if x 6∈ A.

2. A discrete (probability) measure P on a countable space Ω = ω1, ω2, . . . is a sequence ofnumbers pi ∈ [0, 1] (summing up to 1) and

P (A) =

∞∑i=1

piδωi(A).

3. Let Y be a random variable on a two state space X = 0, 1 with distribution P(Y = 0) = p andP(Y = 1) = 1− p. Then the probability distribution of Y is a measure on X given by:

PY := p δ0 + (1− p) δ1.

3.2. CONSTRUCTION OF MEASURES FROM PRE-MEASURES 21

Exercise 3.1.1 If µ : F → [0,∞] satisfies µ(φ) = 0, finite additive property, and continuity from below,then it is a measure.

We consider which subset of F determines a measure uniquely.

Example 3.3 The interval [0, 1] equipped with its Borel σ-algebra and the Lebesgue measure is a prob-ability space.

Example 3.4 The half-line R+ equipped with the measure

P(A) =

∫Ae−x dx

is a probability space. In such a situation, where the measure has a density with respect to Lebesguemeasure, we will also use the short-hand notation P(dx) = e−x dx.

Example 3.5 Given a ∈ Ω, the measure δa defined by

δa(A) =

1 if a ∈ A,0 otherwise.

is a probability measure.

3.2 Construction of measures from pre-measures

Definition 3.2.1 Given a measure space (M,F , µ), F is said to be complete with respect to a µ if itcontains every subset of its null sets. The completion of F with respect to µ, denoted by F , is thesmallest σ-algebra augmented with all subsets of null sets of F . We can then extend µ to F by setµ(F ) = 0 whenever F is a subset of a null set from F . This is called the completion of µ, the extensionis said to be a complete measure.

A measurable set on a measure space is said to be a ‘null set’ if it has measure zero.

LetAi be two collections of subsets, %i : Ai → [0,∞]. We say %2 is an extension of %1 ifA2 containsA1 and %1(A) = %2(A) whenever A ∈ A1.

3.2.1 Outer Measure

Definition 3.2.2 A function µ∗ : 2X → [0,∞] is said to be an outer measure over X if we have:

(1) µ∗(φ) = 0,


(2 ) Monotonicity. If A ⊂ B then µ∗(A) ≤ µ∗(B).

(3) countable sub-additivity.

µ∗(∪∞n=1An) ≤∞∑n=1

µ∗(An).

Exercise 3.2.1 • Let µ∗(φ) = 0 and µ∗(A) = ∞ if A ⊂ X is not empty. Show that µ∗ is anouter-measure.

• If an outer measure µ∗ satisfies that µ∗(A ∪B) = µ∗(A) + µ∗(B) for any A,B ∈ F where F is aσ-algebra, what can you say about µ∗?

Definition 3.2.3 Let A be a set of subsets containing φ,X . A map A → [0,∞) is a pre-measure if

• %(φ) = 0, and

• for any An ∈ A with ∪∞n=1An ∈ A,

%(∪∞n=1An) =∞∑j=1

%(An).

Outer measures can be obtained from any suitable function on a given collection of subsets, by cov-ering a set with sets from this collection. A collection of set Aα, α ∈ Λ is said to be a cover of a set Eif E ⊂ ∪α∈ΛAα. We are interested in covers consisting of a countable number of elements.

Proposition 3.2.1 Let A be a non-empty collection of subsets with φ ∈ A,X ∈ A. Let % : A → [0,∞]

be a function such that %(φ) = 0. We define a function on 2X as below. For any E ⊂ X ,

µ∗(E) = inf

∞∑j=1

%(Aj) : E ⊂ ∪∞j=1Aj , Aj ∈ A

. (3.1)

Then µ∗ is an outer measure.

Proof (1) Since every set is covered by X, the definition makes sense, and µ∗(φ) = 0. (2) If A ⊂ B,then every cover of B is a cover A and so µ∗(B) ≥ µ∗(A).

(3) To show countable sub-additivity, take any sequence of subsets Aj . For any ε > 0 there existBkj ∈ A with Aj ⊂ ∪kBk

j such that

µ∗(Aj) ≥∞∑k=1

%(Bkj )− ε

2j.


Summing over j,∞∑j=1

µ∗(Aj) ≥∞∑j=1

∞∑k=1

%(Bkj )− ε.

Since Bkj is a cover of ∪∞j=1Aj ,

µ∗(∪∞j=1Aj) ≤∞∑j=1

∞∑k=1

%(Bkj ).

Hence, ∑j

µ∗(Aj) ≥ µ∗(∪∞j=1Aj)− ε

for any ε > 0. Take ε→ 0 to see µ∗(∪∞j=1Aj) ≤∑

j µ∗(Aj), thus completing the proof.

How do we obtain additive from sub-additive? Let us single out sets that a strong additive property.Take a set A, we have two collections of subsets of A and the collection of subsets of Ac. If we take onefrom each collection, we want to prove it is additive.

Definition 3.2.4 A subset A of X is said to be µ∗-measurable (in the sense of Caratheodory) if for anyset B ⊂ X , we have

µ∗(B) = µ∗(B ∩A) + µ∗(B ∩Ac). (3.2)

We have already sub-additivity, the non-trivial part of (3.2) is

µ∗(B) ≥ µ∗(B ∩A) + µ∗(B ∩Ac). (3.3)

Theorem 3.2.2 [Caratheodory Theorem] If µ∗ is an outer measure on X and G the collection of µ∗-measurable sets, then

(1) G is a σ-algebra.

(2) The restriction of µ∗ to G is a measure.

(3) G is complete w.r.t. µ∗.

Proof The proof is deferred to §3.2.3.

3.2.2 Extension of pre-measures

Proposition 3.2.3 If A is an algebra, and % is a pre-measure on A, we define µ∗ by (3.1). Then thefollowing statements hold.


1. Every set in A is µ∗-measurable.

2. µ∗ and % agree on A.

In particular, µ∗ is a measure on σ(A).


Proposition 3.2.4 * If % is a pre-measure on an algebra A, we define µ∗ by (3.1) and % is σ-finite, thenµ∗ is its unique extension to G and therefore to σ(A).


3.2.3 Proof for Caratheodory Theorems

We return to prove some of the theorems and propositions given earlier.,

Theorem 3.2.2 (Caratheodory Theorem) If µ∗ is an outer measure on X , then the collection G ofµ∗-measurable sets is a σ-algebra, the restriction of µ∗ to G is a measure, and G is complete w.r.t. µ∗.

Proof Throughout the proof let us write µ = µ∗ for simplicity. (1) φ ∈ G and µ∗(φ) = 0. (2) Since theroles played by A and Ac are symmetric, A ∈ G implies that Ac ∈ G.

(3) Suppose A,F ∈ G, we show A ∪ F ∈ G.

Take any B ∈ X . We apply (3.2) first with A ∈ G and then with F ∈ G:

µ(B) = µ(B ∩A) + µ(B ∩Ac)= µ(B ∩A ∩ F ) + µ(B ∩A ∩ F c) + µ(B ∩Ac ∩ F ) + µ(B ∩Ac ∩ F c).

We useA∪F = (A∩F )∪(A∩F c)∪(Ac∩F ), as well as (A∪F )c = Ac∩F c, and apply sub-additivity

µ(B) = µ(B ∩A ∩ F ) + µ(B ∩A ∩ F )c + µ(B ∩Ac ∩ F ) + µ(B ∩Ac ∩ F c)≥ µ(B ∩ (A ∪ F )) + µ(B ∩ (A ∪ F )c).

This shows that A ∪ F ∈ G. Also, if A,F ∈ G are disjoint,

µ(A ∪ F ) = µ((A ∪ F ) ∩ F ) + µ((A ∪ F ) ∩ F c) = µ(F ) + µ(A).

(4) We show countable additivity. Let An ∈ G, we show A = ∪∞n=1An ∈ G. Set Fn = ∪nj=1Aj .Since G is stable under finite unions and under taking the complement, we may assume that the An are


disjoint. Then for any B ∈ X , using Aj ∈ G,

µ(B ∩ Fn) = µ(B ∩ Fn ∩An) + µ(B ∩ Fn ∩ (An)c)

= µ(B ∩An) + µ(B ∩ Fn−1)

= µ(B ∩An) + µ(B ∩ Fn−1 ∩An−1) + µ(B ∩ Fn−1 ∩Acn−1)

= µ(B ∩An) + µ(B ∩An−1) + µ(B ∩ Fn−2) = . . .

=

n∑j=1

µ(B ∩Aj).

Thus using Fn ∈ G,

µ(B) = µ(B ∩ Fn) + µ(B ∩ (Fn)c) =

n∑j=1

µ(B ∩Aj) + µ(B ∩ (Fn)c)

≥n∑j=1

µ(B ∩Aj) + µ(B ∩Ac).

Taking n→∞, then using ∪(B ∩Aj) = B ∩A and sub-additivity,

µ(B) ≥∞∑j=1

µ(B ∩Aj) + µ(B ∩Ac)

≥ µ(B ∩A) + µ(B ∩Ac) ≥ µ(B).

Hence A ∈ G and∑∞

j=1 µ(B ∩ Aj) = µ(B ∩ A) for any B ∈ G. Take B = A to conclude theσ-additivity. We proved that G is a σ-algebra and µ is a measure on it.

(5) Finally if µ(A) = 0 and F ⊂ A, then for any B ⊂ X ,

µ(B) ≤ µ(B ∩ F ) + µ(B ∩ (F )c) ≤ 0 + µ(B).

Hence all the inequalities above are equalities,

µ(B) = µ(B ∩ F ) + µ(B ∩ (F )c)

and F ∈ G. Therefore G contains all null sets of µ and µ(F ) = 0.

In the sequel, we shall assume A to be an algebra. In that case, we can use the following, convenientlemma:

Lemma 3.2.5 LetA be an algebra and % : A → [0,∞] an additive function. In words, for all A,B ∈ Adisjoint, we have:

%(A ∪B) = %(A) + %(B).

Then % is


1. monotone: for any A,B ∈ A with A ⊂ B, %(A) ≤ %(B)

2. sub-additive: for any A1, . . . , An ∈ A, %(⋃ni=1) ≤

∑ni=1 %(Ai)

Proof By additivity for all A,B ∈ A with A ⊂ B, we have

%(B) = %(A) + %(A \B) ≥ %(A),

which yields monotonicity. Now, for A1, . . . , An ∈ A, we set

Bi = Ai \

i−1⋃j=1

Aj

∈ A, i = 1, . . . , n,

so that the Bi are disjoint. Hence, using successively the additivity and monotonicity of %,

%

(n⋃i=1

Ai

)= %

(n⋃i=1

Bi

)=

n∑i=1

% (Bi) ≤n∑i=1

% (Ai) ,

which yields the requested sub-additivity.

Proposition3.2.3. If % is a pre-measure on an algebra A, we define µ∗ by (3.1). Then:

1. µ∗ = % on A.

2. every set in A is µ∗-measurable.

In particular, µ∗ is a measure on σ(A).

Proof (1) Let B ∈ A, then B is a cover for itself and µ∗(B) ≤ %(B). As before write µ = µ∗ forsimplicity. We now prove the reverse inequality. Suppose that B ⊂ ∪nAn where An ∈ A. ThenB = ∪n(B ∩ An). Now B ∩ An ∈ A, and B ∈ A, we apply σ-additivity for pre-measures, andmonotonicity,

%(B) =∞∑n=1

%(B ∩An) ≤∞∑n=1

%(An).

Take infimum over all covers,

%(B) ≤ inf∞∑n=1

%(An) = µ(B).

We have showed that % and µ agree on A.

(2) We now show that any A ∈ A is µ∗-measurable. Take any B ⊂ X , we want to show

µ∗(B) ≥ µ∗(A ∩B) + µ∗(Ac ∩B), (3.4)


For any ε > 0 choose a cover Bj of B from A with

µ∗(B) ≥∞∑j=1

%(Bj)− ε.

We first use A ∈ A ⊂ G, % = µ∗ on A and σ sub-additivity of µ, and

∞∑j=1

%(Bj) =∞∑j=1

%(A ∩Bj) +∞∑j=1

%(Ac ∩Bj)

≥ µ∗(A ∩ ∪jBj) + µ∗(Ac ∩ ∪jBj)≥ µ∗(A ∩B) + µ∗(Ac ∩B).

In the last step we use ∪jBj ⊃ B and the monotonicity of the outer measure. Therefore

µ∗(B) ≥ µ∗(A ∩B) + µ∗(Ac ∩B)− ε.

Taking ε→ 0, we obtain

µ∗(B) ≥ µ∗(A ∩B) + µ∗(Ac ∩B),

so that B is µ∗-measurable (the reverse inequality follows by sub-additivity).

Proposition 3.2.6 * Let % be a σ-finite pre-measure on an algebra A, and define µ∗ by (3.1). Then µ∗ isthe unique measure extending % to G, and therefore the unique measure on σ(A), extending %.

Proof Step 1. We first show that if ν is another measure on the µ∗-measurable set G extending %, thenν ≤ µ∗. Indeed if E ∈ G, E ⊂ ∪Aj where Aj ∈ A, then by sub-additivity,

ν(E) ≤ ν(∪Aj) ≤∑j

ν(Aj) =∑j

%(Aj).

This holds for any covering of E by sets from A, hence

ν(E) ≤ µ∗(E).

Step 2. Suppose that E ∈ G is such that µ∗(E) < ∞, we show that µ∗(E) = ν(E). We can find acover Aj of E fromA with

∑j µ∗(Aj) ≤ µ∗(E) + ε. We may and will assume that the cover is disjoint.

Set A = ∪∞j=1Aj . Since E ⊂ A,

µ∗(A \ E) = µ∗(A)− µ∗(E) ≤∑j

µ∗(Aj)− µ∗(E) < ε. (3.5)


Also, since E ⊂ A,µ∗(E) ≤ µ∗(A)≤

∑j

µ∗(Aj)

agree on A=

∑j

ν(Aj) = ν(A)

= ν(E) + ν(A \ E)

ν≤µ∗≤ ν(E) + µ∗(A \ E) ≤ ν(E) + ε,

where we have used (3.5) in the last inequality. Taking ε→ 0, we see that µ∗(E) = ν(E).

Step 3. Let % be σ-finite. By the assumption there exists an increasing sequence of sets Xn from Asuch that %(Xn) <∞ and

⋃n≥1Xn = X . Then using step 2 we can complete the proof (exercise).

3.2.4 Construction of Lebesgue-Stieljes / Lebesgue measure

Definition 3.2.5 A collection of subsets E is an elementary family if

• φ ∈ E ;

• If A,B ∈ E , then A ∩B ∈ E ;

• if A ∈ E then Ac is a finite union of disjoint sets from E .

Exercise 3.2.2 Show that if E is an elementary family, then the collection A of finite disjoint unions ofmembers of E is an algebra.

In this section, by half open interval we mean sets of the form (a, b], (c,∞), (−∞, b], or φ, wherea, b, c are finite numbers. For simplicity we write a generic half interval as (c, d], where c ∈ R∪ −∞.Also d is a real number, or∞. In the latter case by (c,∞] we really mean (c,∞] ∩ R.

Definition 3.2.6 Henceforth, in this section, let E denote the collection of half open intervals and let Abe the collection of finite unions of disjoint half open intervals.

We have seen that

Proposition 3.2.7 A is an algebra.

Remark 3.2.8 Every element of A can be written as finite disjoint union of maximal intervals (thismeans the interval cannot be extended within A):

A =n⋃k=1

(ak, bk], (3.6)


this is unique if we order them by bk < ak+1. Let us call this the canonical representation. If A is notbounded from above, bn =∞, by (an, bn] we mean (an, bn] ∩ R.

Let F : R → R be an increasing and right continuous function. Then it has at most a countablenumber of discontinuities, all of which are of jump type (i.e. left and right limits exist). We define

F (∞) = limx→∞

F (x), F (−∞) = limx→−∞

F (x). (3.7)

Definition 3.2.7 For A =⋃nj=1(aj , bj ] ∈ A, with bj < aj+1 for j = 1, . . . , n, we set

%(A) =

n∑j=1

(F (bj)− F (aj)). (3.8)

We also set %(φ) = 0.

A partition of a set F is a collection of disjoint subsets of A whose union is A.

Remark 3.2.9 This definition is independent of the expression for A. Indeed, suppose that A = (c, d] iswritten also as A =

⋃nj=1(aj , bj ], then up to a re-ordering,

c = a1 < b1 = a2 < b2 = a3 < · · · < bn = d.

This gives a telescopic sum,

n∑j=1

(F (bj)− F (aj)) = F (bn)− F (a1) = F (d)− F (c).

So when A is a single interval, there is no ambiguity in the definition of %.

If A =∑p

k=1(ck, dk], a canonical expression, so each interval is a maximal interval, then there ispartition Λk, k = 1, . . . , p of 1, . . . , n such that∑

j∈Λk

(aj , bj ] = (ck, dk].

In fact, it suffices to consider

Λk := j = 1, . . . , n : (aj , bj ] ⊂ (ck, dk].

Hencen∑j=1

%((aj , bj ]) =

p∑k=1

∑j∈Λk

%(aj , bj ] =

p∑k=1

%((ck, bk]).

Proposition 3.2.10 If F : R→ R is an increasing and right continuous function, then % defined by (3.8)is a pre-measure on A.


Proof We have %(φ) = 0, so it remains to prove σ-additivity. We start by showing that % is additive. LetA,B ∈ A be disjoint. Then there exists Ej ∈ E such that

A =n⋃j=1

Ej , B =m⋃

j=n+1

Ej .

Then

%(A ∪B) =

m∑j=1

%(Ej) =

n∑j=1

%(Ej) +

m∑j=n+1

%(Ej) = %(A) + %(B).

Hence % is indeed additive. By Lemma 3.2.5, we in particular deduce that % is monotone and sub-additive.

Step 2. Let now (Aj)j≥1 be a sequence of disjoint elements of A such that

A :=⋃j≥1

Aj

is in A. Using monotonicity followed by finite additivity,

%(A) ≥ %(

n⋃j=1

Aj) =

n∑j=1

%(Aj).

Since this holds for every n,

%(A) ≥∞∑j=1

%(Aj).

Step 3. We now prove the reverse inequality. Since A ∈ A,

A =

n⋃k=1

(ak, bk]. (3.9)

We show that %(A) =∑

j=1 %(Aj). As argued before, in Remark 3.2.9, by additivity of % we reduce thisto the case A is one single half open interval.

Step 4. We first assume A = (c, d] is bounded. Since, for all j ≥ 1, Aj is a finite union ofbounded disjoint half open intervals, the whole collection of these intervals is countable and we write it(ak, bk], k ≥ 1. In particular, we have

∞⋃j=1

Aj =

∞⋃k=1

(ak, bk], and∞∑j=1

%(Aj) = %(∞⋃j=1

Aj) =∞∑k=1

%((ak, bk]).

The idea is to approximate (c, d] by a compact interval [c + δ, d] and (ak, bk] by a larger open interval(ak, bk + δk), we can then choose a finite cover and use finite additivity.

Let ε > 0 such that ε < d− c. By the right-continuity of F , we may choose δ > 0 so that

F (c+ δ)− F (c) < ε.


Similarly, for all k ≥ 1, we choose a δk so that

F (bk + δk)− F (bk) ≤ 2−kε.

The compact set [c+δ, d] is covered by the sets (ak, bk] and therefore by the larger open sets (ak, bk+δk).We can choose a finite covering, so that for some N ,

[c+ δ, d] ⊂ ∪Nk=1(ak, bk + δk).

By monotonicity and sub-additivity of %,

%((c, d]) = F (d)− F (c) = F (d)− F (c+ δ) + F (c+ δ)− F (c) ≤ %([c+ δ, d]) + ε

≤ %(∪Nk=1(ak, bk + δk)

)+ ε

≤N∑k=1

%((ak, bk]) +N∑k=1

ε

2k+ ε

≤∞∑j=1

%(Aj) + 2ε.

( Since ∪Nk=1(ak, bk + δk) ⊂ ∪Nk=1(ak, bk + δk] ⊂ ∪jAj .)

Taking ε→ 0, we conclude that

%(A) = %((c, d]) ≤∑j

%(Aj)

as requested. Together with step 2, we have proved

%(A) =∑j

%(Aj)

for A a bounded half open interval.

Step 5. Now assume A = (−∞, b]. Then, for every n with −n < b,

%(A) = F (b)− F (−∞) = %((−n, b]) + F (−n)− F (−∞).

Note that

(−n, b] = (−n, b] ∩∞⋃j=1

Aj =∞⋃j=1

(−n, b] ∩Aj .

Since (−n, b] ∩Aj are disjoint half open sets, we use step 4 to conclude that

%((−n, b]) =

∞∑j=1

%((−n, b] ∩Aj).


Hence

%(A) =∞∑j=1

%((−n, b] ∩Aj) + F (−n)− F (−∞) ≤∞∑j=1

%(Aj) + F (−n)− F (−∞).

Taking n→∞, since F (−n)→ F (−∞) we have proved the reverse inequality. The same can be shownfor A = (a,∞). We have completed the proof for the σ-additivity for the case where the countabledisjoint union is in A.

As in (3.1), we define

µ∗(E) = inf

∞∑j=1

%(Aj) : E ⊂∞⋃j=1

Aj , Aj ∈ A

.

Since each A ∈ A is a finite disjoint union of members in E , by re-arranging we have

µ∗(E) = inf

∞∑j=1

%(Ej) : E ⊂∞⋃j=1

Ej , Ej ∈ E

= inf

∞∑j=1

%((aj , bj)) : E ⊂∞⋃j=1

(aj , bj)

.

Theorem 3.2.11 There exists a complete measure µF on the Borel σ-algebra such that µF = % on A.This is the unique measure on B(R) that agrees with % on A.

Proof This follows from the Caratheodory Theorems.

Definition 3.2.8 • The measure µF constructed in the theorem above is called Lebesgue-Stieljesmeasure associated to F .

• If F is the identity map, this is called the Lebesgue measure on R and will be denoted by λ.

Remark 3.2.12 If A = ∪∞j=1(aj , bj ] is a disjoint union, then µF (A) =∑∞

j=1 %((aj , bj ]). This followsfrom that the union is a Borel set, and σ-additive property fo the measure. The subscript F will beomitted if there is no risk of confusion.

Theorem 3.2.13 If B is a µ∗-measurable set, then

µF (B) = infµF (O) : B ⊂ O, O is open= supµF (D) : D ⊂ B, D is closed.

This holds in particular if B is a Borel measurable set.


Proof This is left as exercise.

Recall that B∆A = (B \A)⋃

(A \B).

Remark 3.2.14 If B is µ∗-measurable set with finite measure, then for every ε > 0 there exists a set Awhich is a finite union of open intervals, such that µF (B∆A) < ε.

3.2.5 Examples

Example 3.6 Let b ∈ R, then

µF ((a, b)) = µF ((a, b])− µF (b).

Observe that µF (b) 6= 0 precisely when b is a discontinuity of F . Indeed,

µF (b) = µF (∩∞n=1(b−1

n, b]) = lim

n→∞µF ((b− 1

n, b]) = F (b)− lim

n→∞F (b− 1

n) = F (b)−F (b−).

Example 3.7 Since F (x) = x is continuous, any countable set is Borel measurable and has Lebesguemeasure zero.

Example 3.8 Let

F (x) =

a, x < 0,

a+ 3, x ≥ 0.

Then F (x) = 3δ0. It is clear that for any interval,

µF ((a, b]) =

3, if 0 ∈ (a, b],

0, if 0 6∈ (a, b],

To see that for any Borel set B,

µF (B) =

3, if 0 ∈ B,0, if 0 6∈ B,

we use the uniqueness of extensions of % to the Borel σ-algebra and the fact that µF and 3δ0 agree onA.

Example 3.9 If Y is a random variable on a probability space (Ω,F ,P), set F (x) = P(Y ≤ x). ThenF is right continuous and increasing, it determines a Lebesgue-Stieljes measure µF . Since F (−∞) = 0,Also,

µF ((a, b]) = F (b)− F (a) = P(a < Y ≤ b).

Definition 3.2.9 1. A map f from one measurable space (X1,B1) to another measurable space (X2,B2)

is said to be measurable if f−1(B) ∈ B1 for any B ∈ B2. (In short , thia is f−1(B2) ⊂ B1.)

3.3. LEBESGUE MEASURE ON RN 34

2. We will call a function f between two metric spaces /topological spaces (Borel) measurable if thepre-image of a Borel set is a Borel set.

3. We also use Borel measurable for a measurable function with values in a topological space, and thetopological space is given the Borel σ-algebra.

Definition 3.2.10 Let (X1,B1) and (X2,B2) be measurable spaces, and µ1 a measure on (X1,B1). Letf : X1 → X2 be a measurable function. We define a measure µ2 on B2 by setting

µ2(A) = µ1(f−1(A)) = µ1(x : f(x) ∈ A).

This is called the pushed forward measure and denoted by f∗(µ1) or sometimes as µ−1.

Exercise 3.2.3 Show that f∗(µ1) is a measure.

If X : Ω→ R, we recognise X∗P as the probability distribution of the random variable. For example, ifX : Ω→ 0, 1, then X∗P(0) = P(X = 0) and X∗P = P(X = 0)δ0 + P(X = 1)δ1.

Exercise 3.2.4 Let f : Ω → X and suppose that f−1(A) ∈ F for every open set A. Then f−1(A) ∈ Ffor every Borel set A.

Proof Define F0 = A ∈ B(X ) | f−1(A) ∈ F. It is a σ-algebra (check!, c.f. Lemma 2.1.1) andcontains open sets of X and thus contains all Borel sets.

Proposition 3.2.15 Let X and Y be two topological spaces and let f : X → Y be continuous. Then f is(Borel) measurable.

Proof Since f is continuous, f−1(A) is open for every open set A,. This means if A is open thenf−1(A) ∈ B(X ). Set F0 = A ∈ B(Y) | f−1(A) ∈ B(X ). Then F0 contains all open sets. It is easyto check F0 is a σ-algebra. Thus F0 contains the σ-algebra generated by open sets which is the Borelσ-algebra B(Y), concluding f is Borel measurable.

3.3 Lebesgue Measure on Rn

By a similar construction, we obtain a Lebesgue measure on Rn. We do not get into details.

Definition 3.3.1 There is a unique measure λ on B(Rn) such that, for all a1, . . . , an, b1, . . . , bn ∈ Rwith ai ≤ bi,

λ((a1, b1]× · · · × (an, bn]) = Πni=1(bi − ai).

This is called the Lebesgue measure on Rn, it extends to the completion of B(Rn).

Remark 3.3.1 The Lebesgue measure is invariant under translations and rotations.

3.4. UNIQUENESS OF EXTENSIONS 35

3.4 Uniqueness of Extensions

We often know that a property holds for a sub-collection of a σ-algebra, want to show it holds for everyset in the σ-algebra. The difficulty with this is the σ-algebras are in generally complicated. If we have afinite partition of X , the σ-algebra contains a finite number of sets. Otherwise it has uncountable numberof elements. See §2.2.1.

There are a number of powerful techniques, one of them is the π − λ theorem.

3.4.1 π − λ theorem

Definition 3.4.1 A non-empty collection of sub-setC is a π-system ifA,B ∈ C implies thatA∩B ∈ C.

Definition 3.4.2 A collection C of sub-sets is called a λ-system if

1. Ω ∈ C

2. If A,B ∈ C and A ⊂ B then B \A ∈ C.

3. If An ∈ C, An increases to A, then A ∈ C.

A σ-algebra is a π-system and a λ system.

Proposition 3.4.1 (not given in class) 1. Show that the intersection of any number of λ-systems is aλ-systems. Therefore given any collections of subsets there exists a small λ-system containing it.

2. If a λ system A is also a π-system, then it is a σ-algebra.

Proof Part (1) is trivial. To see a set which is simultaneously a π-system and a λ-system is a σ-algebra, remark that taking unions is obtained from taking intersections and complements. In turn,taking countable unions is obtained from taking finite unions and monotone limits, whence the claim.

Theorem 3.4.2 (not given in class) If C is a π-system, then the smallest λ-system generated by C, λ(C),is a σ-algebra, and λ(C) = σ(C).

Proof FirstlyF1 : = A : A ∩ C ⊂ λ(C)

= A ∈ λ(C) : A ∩ F ∈ λ(C),∀F ∈ C

is a λ-system containing C, which implies F1 = λ(C). Indeed, let F ∈ C, then (1) Ω ∩ F = F ∈ C, andso Ω ∈ F1; (2) If A ⊂ B, both in F1,

(B \A) ∩ F = B ∩ F \ (A ∩ F ) ∈ F1

3.5. MEASURE DETERMINING SETS 36

(3).If An ∈ F1, An ∩ F ∈ λ(C), so (⋃An) ∩ F =

⋃n(An ∩ F ) ∈ λ(C).

Similarly,F2 : = A : A ∩ λ(C) ⊂ λ(C)

= A ∈ λ(C) : A ∩ E ∈ λ(C),∀E ∈ λ(C)

is a λ-system containing C, and therefore F2 = λ(C). Thus λ(C) is both a λ-system and a π-system,and is therefore a σ-algebra. Since it contains C, we obtain σ(C) ⊂ λ(C). But, since any σ-algebra is aλ-system, we also have λ(C) ⊂ σ(C), and the equality follows.

We present now a very useful lemma:

Lemma 3.4.3 [π − λ Theorem.] If a λ-system contains a π-system C, then it contains the σ-algebragenerated by C.

3.5 Measure determining sets

Exercise 3.5.1 Let µ be a measure on a measurable space (X ,F). Show the following two statementsare the equivalent.

• There exists a sequence of set An with ∪An = X and µ(An) <∞.

• There exists an increasing sequence of set An with ∪An = X and µ(An) <∞.

Just set Cn = ∪nk=1Ak.

Exercise 3.5.2 Show that if µ, ν are σ-finite measures on a measurable space (X ,F), then there exists acommon sequence of Cn with both µ(Cn) <∞ and ν(Cn) <∞ for each n.

Proof. Let An, Bn be two sequences of sets increasing to X , and such that µ(An), ν(Bn) finite for everyn. By distribution law, X = ∪i,j(Ai ∩Bj). It is clear both µ, ν are finite on Ai ∩Bj .

Theorem 3.5.1 Suppose that (Ω,F) is a measurable space, and C is a π-system generating F . Let µand ν be two measures which agree on C.

1. If µ(Ω) = ν(Ω) <∞, then µ = ν

2. More generally, if there exists an increasing sequence of subsets Ωk ∈ C, such that Ω = ∪k≥1Ωk

and µ(Ωk) = ν(Ωk) <∞ for all k ≥ 1, then µ = ν.


Proof (1) First assume that µ(Ω) = ν(Ω) <∞. Let

G = A ∈ F : µ(A) = ν(A).

Then Ω ∈ G by assumption. Moreover, if An is an increasing sequence of measurable sets,

µ(∞⋃n=1

An) = limn→∞

µ(An) = limn→∞

ν(An) = ν(∞⋃n=1

An)

Thus, G is closed under taking lower limit. Let A ⊂ B, A,B ∈ G, then by additive property,

µ(B \A) = µ(B)− µ(A) = ν(B)− ν(A) = ν(B \A).

This means B \ A ∈ G, and therefore G is a λ-system containing a π-system generating F . By theπ − λ-Theorem, G ⊃ F and µ = ν on F , which proves the first point.

(2) For the second point, let Fk = A ∩ Ωk : A ∈ F denote the trace σ-algebras on Ωk, and denoteby µk and νk the restrictions to Ωk of the measures µ and ν:

∀A ∈ F , µk(A) = µ(A ∩ Ωk), νk(A) = ν(A ∩ Ωk).

Applying the first point to µk and νk, we deduce that µk = νk. Therefore, by lower contiunity ofmeasures, we obtain, for all A ∈ F ,

µ(A) = limk→∞

µ(A ∩ Ωk) = limk→∞

ν(A ∩ Ωk) = ν(A),

completing the proof.

Example 3.10 Let Ω = 1, 2, 3, 4. Let G = 1, 2, 1, 3, 3, 4, which generates the discreteσ-algebra F ( the power set), but not a π-system. The measure µ and ν agree on G, not on F .

µ(1) = 1/6, µ(2) = 2/6, µ(3) = 1/6, µ(4) = 2/6,

ν(1) = 2/6, ν(2) = 1/6, ν(3) = 0, ν(4) = 3/6.

Theorem 3.5.2 If A ∈ B(R), so does t+A. The Lebesgue measure is translation invariant on the Borelσ-algebra.

Proof LetC = A ∈ B(R) : t+A ∈ B(R), ∀t ∈ R.

Since t+ Ω = Ω, Ω ∈ C. If A ⊂ B with A,B ∈ C, so t+B ∈ B(R), t+A ∈ B(R),

t+ (B \A) = x+ t : x ∈ B, x 6∈ A = (t+B) \ (t+A) ∈ B(R).


If An ∈ C is an increasing sequence,⋃nAn ∈ B(R), and t+An ∈ B(R), so

t+⋃n

An =⋃n

(t+An) ∈ B(R).

Thus⋃nAn ∈ C By π − λ theorem, any shift of a Borel set is a Borel set.

Now we show µ(t+B) = µ(B). Let

A = A ∈ B(R) : λ(t+A) = λ(A), ∀t ∈ R.

Then A contains the π-system of finite unions of disjoint half intervals. Now Ω ∈ A. If A ⊂ B withA,B ∈ A,

t+B \A = (t+B) \ (t+A),

t+A ⊂ t+B, hence

µ(t+B \A) = µ(t+B)− µ(t+A) = µ(B)− µ(A) = µ(B \A).

Thus B \A ∈ A.

If An ∈ A is an increasing sequence, so is t+An, and⋃n(t+An) = t+

⋃nAn,

µ(t+∞⋃n=1

An) = limn→∞

µ(t+An) = limn→∞

µ(An) = µ(⋃n

An).

Thus A is a λ system, concluding the proof.

Proposition 3.5.3 Two non-trivial translation-invariant Borel measures on R which assign a finite mea-sure to bounded intervals are constant multiples of each other.

Proof Let µ and ν be translation invariant measures. Then, for all n, p, q ≥ 1,

µ((0, 1]) = qµ((0,1

q]), µ([0, p/q]) = pµ((0,

1

q]) = (p/q)µ((0, 1]).

Similarly, ν([0, n)) = (p/q)ν((0, 1]). The measurements ν((0, 1]) and ν((0, 1]) are non-zero, for other-wise the measure is trivial. Set c = µ([0, 1)]/ν([0, 1).

Then µ([0, p/q]) = (p/q)µ([0, 1)] = cν([0, p/q)). Hence, µ(I) = cν(I) for all I ∈ C, where Cdenotes the collection of all half-open intervals with finite rational length. Now, C is a π-system andσ(C) = B(R). Moreover, we have ∪k≥1(−k, k] = R, and µ and cν assign the same finite mass to(−k, k] ∈ C for all k ≥ 1. By Theorem 3.5.1, we deduce that µ = cν.

Chapter 4

Measurable functions

A map f from one measurable space (Ω,F) to another measurable space (X ,G) is said to be measurableif the pre-image of any measurable set is a measurable set. We will call a function f between twotopological spaces (Borel) measurable if f−1(A) is a Borel set for every Borel set A.

Let (X ,B) be a measurable space and Ω a set. A function f : Ω→ X is always measurable when Ω

is given the the pre-image σ-algebra σ(f), this is the smallest σ-algebra on Ω such that f : Ω → X ismeasurable.

4.1 Examples

Example 4.1 If X is a discrete space, let the power set be its σ-algebra. Let Y be another space with aσ-algebra. Then any function f : X → Y is measurable.

If f is identically 1 on a set A and zero everywhere else, it is called the indicator function of A. Wedenote it by 1A:

1A(ω) =

1, if ω ∈ A0, if ω 6∈ A.

The indicator function 1A of a set is a measurable function if and only if A is a measurable set.

Example 4.2 If f : E → R takes only a finite number of values a1, . . . , ak. Then, f is measurableif and only if f = ai is measurable. Also, f can be written in the following form: f =

∑ki=1 ai1Aj .

(Just take Aj = x : f(x) = aj).

39

4.1. EXAMPLES 40

4.1.1 Simple functions

A simple functions is a measurable function taking on a finite number of distinct values. It has rep-resentation as linear combinations of indicator functions of measurable sets. Representations are notnecessarily unique. For example take the domain space to be R, let A1 = [1, 2], A2 = (2, 6], andA3 = (−∞, 1) ∪ (4,∞). Then, f = 21A1 + 31A2 + 01A3 . We often do not include the last term withzero value, but sometimes it is included for the convenience that A1 ∪A2 ∪A3 = R. The representationis not unique,

f = 21A1∩Q + 31A2∩Q + 21A1∩Qc + 31A2∩Qc .

Also,

f = 21([1,3] + 1(2,3] + 31(3,6].

let us now give one popular definition for simple functions.

Definition 4.1.1 Let (X ,F) be a measurable space. A function of the form f(x) =∑N

i=1 ai1Ai(x)

where ai ∈ R and Ai ∈ Fi is said to be a simple function.

Proposition 4.1.1 If Ai are measurable sets. Then f =∑∞

i=1 ai1Ai is measurable.

Proof If B is a Borel set,

f−1(A) = ∪i:ai∈BAi.

This is a countable union and belongs to F if every Ai ∈ F .

Remark 4.1.2 Suppose that f =∑∞

i=1 ai1Ai is measurable. If the numbers ai are distinct, and Aiare disjoint sets, thenAi is necessarily measurable. Indeed, f−1(ai) = Ai. So for f to be measurable,Ai must belong to F .

If a1 = a2, we can conclude f−1(a1) = A1 ∪ A2 is measurable, but we may not have any furtherinformation to conclude the measurability of the sets A1 and A2.

Definition 4.1.2 Suppose that a simple function f takes the following non-zero values ai, i = 1, . . . , N.Then

f(x) =n∑i=1

ai1f−1(ai)

is said to be the canonical representation for f . Set Ai = f−1(ai), then Ai are pairwise disjoint mea-surable sets and

⋃Ni=1Ai = X .

4.1. EXAMPLES 41

Exercise 4.1.1 Write the following function in the form∑∞

i=1 ai1Ai where Ai are pairwise disjoint.

f(x) =

2, x ∈ A1 = (−2π,−π]

3, x ∈ A2 = [1, 4) ∪ (7, 8)

1

2, x ∈ A3 = (4, 6]

5, x ∈ A4 = (9,∞) ∩Q0, otherwise

Solution: The canonical representation is

f(x) = 21A1 + 31A2 +1

21A3 + 51A4 .

We see that A1, A2, A3, A4, (∪4i=1Ai)

c is a partition of X .

Example 4.3 A step function f : R → R is a piecewise constant function. Steps functions with a finitenumber of steps are simple functions.

Theorem 4.1.3 Let (X ,G) be a measurable space. Then,

• Every measurable function f : X → R is the pointwise limit of simple functions:

f(x) = limn→∞

fn(x),

where fn ∈ E(G). Furthermore we can arrange so that |fn| ≤ |f |.

( If f is bounded, the convergence is uniform.)

• If fn ≥ 0 we may choose fn to be non-negative and increasing so f = supn≥1 fn.

Proof Step 1. We prove the second claim first. Let us consider dyadic partitions of the interval [0, n].Let us cut each length 1 interval into 2n pieces of equal length (this is called a dyadic partition). Thedyadic partitions have an advantage: any partition points from current level contains all partition pointsfrom the previous levels.

Let us define the measurable sets:

Anj =

x :

j

2n< f(x) ≤ j + 1

2n

, 0 ≤ j ≤ 22n − 1.

We approximate f with the value j2n on Anj and define

gn =

n(22n−1)∑j=0

j

2n1Anj .

4.1. EXAMPLES 42

Then on x : 0 ≤ f(x) < n, |fn(x)− f(x)| ≤ 12n , so

limn→∞

fn(x) = f(x).

Also fn increases with n and fn(x) ≤ f(x) for every x.

Step 2. For the first claim, we write f = f+ − f− where

f+ = max(f, 0) f− = max(−f, 0).

Let (f+)n be an non-decreasing sequence of simple functions with limit f+. Similarly, Let (f−)n be annon-decreasing sequence of simple functions with limit f−. Set

gn = (f+)n − (f−)n.

Then|gn(x)| = (f+)n(x) + (f−)n(x) ≤ f+(x) + f−(x) = |f(x)|.

For every x ∈ f−1([−n, n]),

|f(x)− gn(x)| = |(f(x)− (f+)n + (f−)n)(x)|≤ |f+(x)− (f+)n(x)|+ |f−(x)− (f−)n(x)|

≤ 2

2n.

This means that fro every x, limn→∞ gn(x)→ f(x). If |f | ≤M ≤ n0 is bounded, then for n ≥ n0,

|f(x)− gn(x)| ≤ 2

2n,

showing the uniform convergence.

4.1.2 Product σ-algebras

To understand the algebra of measurable functions, we introduce product σ-algebras.

Definition 4.1.3 Let (X ,F) and (Y,G) be two measurable spaces. We define the product σ-algebra(also called tensor σ-algebra) on the product space X × Y to be that generated by product sets

F1 ⊗F2 = σA×B : A ∈ F , B ∈ G.

This definition extends to any number of measurable spaces. We give the definition for product of acountable number of measurable spaces.

4.2. PROPERTIES OF MEASURABLE FUNCTIONS 43

Definition 4.1.4 Let (En,Fn) be measurable spaces. The tensor or product σ-algebra on the productspace Π∞n=1En is

σ

ΠnAn, : An ∈ Fn.

Remark 4.1.4 If X is a separable metric space, the metric topology satisfies the second axiom of count-ability, i.e. there exists a countable base. This countable base generates B(X). Let (Xα, α ∈ I) beseparable metric spaces. The product topology on XI is the coarsest topology such that the projectionsare continuous, it is generated by sets of the form Π−1

α (Aα) where α ∈ I and Aα are open sets of Xα.The coordinate mappings are measurable with respect to the Borel σ algebra on the product space and⊗αB(Xα) ⊂ B(Πα∈IXα).

Theorem 4.1.5 Let (Xn) be separable metric spaces and set X = Π∞i=1Xi. Then

B(X) = ⊗∞n=1B(Xn).

For a proof see Theorem 1.10 in the book by Parthasarathy.

4.2 Properties of measurable functions

Proposition 4.2.1 Letf1 : (X ,F) → (X1,F1) and f2 : (X ,F) → (X2,F2) be measurable functions.Then

ψ = (f1, f2) : X → X1 ×X2

is measurable, when the target space is given by the product σ algebra F1 ⊗F2.

Proof Apply Proposition 4.2.3, it is sufficient to know that h−1(A × B) where A,∈ F1 and B ∈ F2 ismeasurable. But,

ψ−1(A×B) = f−11 (A) ∩ f−1

2 (B) ∈ F ,


Proposition 4.2.2 If f : (X1,F1) → (X2,F2) and g : (X2,F2) → (X3,F3) are measurable functionsthen so f g : X1 → X3 is measurable.

Proof Let A ∈ F3 then f−1(A) ∈ F2. Furthermore,

(f g)−1(A) = x ∈ X1 : f g ∈ A = x ∈ X1 : g ∈ f−1(A) ∈ F1

using g is measurable.

Proposition 4.2.3 Let (Ω,F) and (X ,B) be measurable spaces. Suppose that B is generated by C andf : Ω→ X . If f−1(A) ∈ F whenever A ∈ C, then f is measurable.


Proof B′ = A ∈ B : f−1(A) ∈ F ⊃ C is a σ-algebra, check with Lemma 2.1.1, it must agree withB = σ(C).

Recall that If X,Y are metric spaces and if f : X → Y is a continuous, then f is Borel measurable.(Since continuity means that the pre-image of an open set is an open set. Since B(Y ) is generated byopen sets which forms a π-system

C := A ∈ B(Y ) : f−1(A) ∈ B(X)

contains all open sets, C = B(Y ).)

4.2.1 Measurable functions on the real line

Definition 4.2.1 If f : Ω → R (or talking values in a complete separable metric space) with the Borelσ-algebra, and (Ω,F ,P) a probability space, we tend to call f a random variable.

The Borel σ-algebra of R is generated by the set of open intervals, equally they are generated by theset of all closed intervals. We state some of these below, they follow straightforwardly from Proposition4.2.3.

Proposition 4.2.4 Let (X,B) be a measurable space. A function f : X → R is Borel measurable if oneof the following conditions hold:

1. for each real number a, x : f(x) > a is a measurable set;

2. for each real number a, x : f(x) < a is a measurable set;

3. for each real number a, x : f(x) ≥ a is a measurable set;

4. for each real number a, x : f(x) ≤ a is a measurable set;

For the first just note that sets of the form (a,∞) generates the Borel σ-algebra, apply Proposition 4.2.3to. conclude. Similarly the other statements chan be shown.

Proposition 4.2.5 Let (X,F) be a measurable space. If f, g : X → R are measurable functions, thenso are the following functions.

1. f + g

2. fg,

3. max(f, g)


4. f+ = max(f, 0), f− = max(−f, 0)

5. |f | = f+ + f−.

Proof Firstly (f, g) : X → R2 is Borel measurable. Set ψ : X → R2 by

ψ(x) = (f(x), g(x))

and define ϕ : R2 → R by ϕ(x, y) = x+ y, a continuous function. Then f + g = ϕ ψ is a compositionof measurable function, so measurable. Choosing ϕ(x, y) = xy, for the analogous proof for fg. Nowfor any a > 0, max(f, g) < a = f < a ∩ g < a ∈ F , so max(f, g) is measurable. Part 4 followsfrom part 3, and 5 follows part 1 and part 4.

Note max(f, g) = (f+g)+|f−g|2 .

Proposition 4.2.6 Let (X,B) be a measurable space. If fn : X → R be measurable functions, so arethe following:

1. supn≥1 fn,

2. infn≥1 fn;

3. lim sup fn,

4. lim inf fn.

Proof Firstlysupnfn > a = ∪nfn ≥ a,

belongs to F , as each fn ≥ a does (the above is x : infn≥1 fn(x) < a = ∪nx : fn(x) < a), sosupn≥1 fn is measurable. Also,

infn≥1

fn < a = ∩nfn < a ∈ F ,

showing that infn≥1 fn is measurable. Since

lim supn→∞

fn = infn

supk≥n

fk, lim inf fn. = supn

supk≥n

fk,

they are both measurable.

Proposition 4.2.7 If fn is a sequence of Borel measurable functions with f = limn→∞ fn(x) exists atevery x, then f is a Borel measurable function.


Example 4.4 If (X ,F) and (Y,G) are measurable spaces. Suppose that µ is a measure. Let f : X → Yand g : X → Y . Let A = f 6= g.

1. Suppose both f and g are measurable, then f − g is a measurable function, consequently, A =

f − g 6= 0 is a measurable set.

2. Suppose that F is complete with respect to µ. Suppose A is contained in a null measurable set.Since F is complete, then A is measurable and µ(A) = 0. Then f is measurable implies that g ismeasurable. Indeed, for any B ∈ G,

g ∈ B = (f ∈ B ∩Ac)⋃

(g ∈ B ∩A) ∈ F .

We use that (g ∈ B ∩ A) is contained in a null set and therefore measurable, Ac and f ∈ Bafre measurable.

Definition 4.2.2 If f = g on a set of full measure, we say f = g almost surely. This is abbreviated byf = g a.s. or f = g a.e.

Example 4.5 (This is not intended to be delivered in class) If fn is a sequence of Borel measurablefunctions. Set

f(x) =

limn→∞

fn(x), if the limit exists at x,

0, otherwise

Then f is measurable.

Proof SetU := x : lim

n→∞fn(x) exists = x : lim sup

n→∞fn(x) = lim inf

n→∞fn(x)

Since lim supn→∞ fn(x) and lim infn→∞ fn(x) are both measurable, U is a measurable set. Define

Z := (lim inf fn, lim sup fn).

( Let ∆ = (b, b) : b ∈ R denote the the diagonal subset of R2, then U = Z−1(∆).)

Note that f(x) = 0 on U c. Then for any a < 0,

x : limn→∞

fn(x) < a = x : lim supn→∞

fn(x) < a,

which is measurable since lim supn→∞ fn is measurable. Let a ≥ 0, then

x : limn→∞

fn(x) < a = x : lim supn→∞

fn(x) < a ∪ U,

which is also a measurable set.

In analysis we frequently encounter Lebesgue measurable sets and functions, this refers to the σ-algebra

4.3. FACTORISATION LEMMA 47

4.3 Factorisation Lemma

Lemma 4.3.1 (The factorization Lemma) Let (X ,G) be a measurable space and Ω a set. Let Y :

Ω → X be a measurable function and consider the measurable space (Ω, σ(Y )). Then X : Ω → R isσ(Y )-measurable if and only if there exists a measurable function f : Ω→ R such that X = f Y .

Proof ( not to be delivered in class) Recall that σ(Y ) = Y −1(B) : B ∈ G). Suppose that f : X → Ris measurable, since Y is measurable with respect to σ(Y )/G, then the composition of the measurablefunctions f Y is measurable.

For the converse, suppose that X is σ(Y )-measurable. We first assume X = 1A where A ∈ σ(Y ).Then A = Y −1(B) for some B ∈ G and

1B Y = 1ω:Y (ω)∈B = X.

So X is the composition of the measurable function 1B with Y .

Suppose that X takes only a countable number of values (an) and write An = X−1(an). Since Xis σ(Y )-measurable, there exist sets Bn ∈ G such that Y −1(Bn) = An. Define now the sets

Cn = Bn \⋃p<n

Bp.

These sets are disjoint and one has again Y −1(Cn) = An. Setting f(x) = an for x ∈ Cn and f(x) = 0

for x ∈ X \⋃nCn, we see that f has the required property.

Suppose that X ≥ 0, we may approximate it with an increasing sequence of simple functions Xn.By the paragraph above, there exists gn : X → R simple functions such that Xn = gn(Y ). In fact, gnis an increasing sequence and let g = supn gn. Since g is a pointwise limit of measurable functions, g isalso measurable. We note g(Y (ω)) = supn gn(Y (ω)) = supnXn(ω)) = X(ω). We complete the proofwith X = X+ −X−.

4.4 Appendix*

4.4.1 Littlewood’s three principles

Every Borel measurable set on R is nearly a finite union of intervals. Every (measurable) function isnearly continuous, every convergent sequence of measurable functions is nearly uniform convergent.

4.4. APPENDIX* 48

4.4.2 Product σ-algebras

Let I be an arbitrary index and let (Eα,Fα), α ∈ I be a family of measurable spaces. The tensorσ-algebra, also called the product σ-algebra, on E = Πα∈IEα is defined to be

⊗α∈IFα = σπ−1α (Aα) : Aα ∈ Fα, α ∈ I.

Here πα : E = Πα∈IEα → Eα is the projection (also called the coordinate map). If x = (xα, α ∈ I)

is an element of E, πα(x) = xα. The tensor σ-algebra is the smallest one such that for all α ∈ I , themapping

πα : (E,⊗α∈IFα)→ (Eα,Fα)

is measurable.

Let πi : Rn → R be the projection to the ith component. The tensor σ-algebra ⊗nB(R) is

⊗nB(R) = σπ−1i (A) : A ∈ B(R).

For example π−11 (A) = A× R× · · · × R.

4.4.3 The Borel σ-algebra on the Wiener Space

The set of all maps from [0, 1] to Rd can be denoted as the product space (Rd)[0,1]. The tensor σ-algebra⊗[0,1]B(Rd) is the smallest space such that the projections πt where πt(σ) = σ(t) is measurable. Theproduct topology is the smallest topology such that the projections are continuous. Hence the Borelσ-algebra of the product topology is larger than the tensor σ-algebra.

Let W d = C([0, T ],Rd) be the space of continuous functions from [0, 1] to Rn. It is a Banach spacewith the supremum norm:

‖g‖W := supt∈[0,1]

‖g(t)‖.

Any measurable set in the tensor σ-algebra is determined by the a countable number of projections. Theaction to determine whether a function is continuous or not cannot be determined by a countable numberof operations. Hence W d is not a measurable set in the tensor σ-algebra. (once we know a function iscontinuous, the function can be determined by its values on a countable dense set.)

Example 4.6 A sample continuous real valued stochastic process (Xt, 0 ≤ t ≤ 1) on a probability space(Ω,F , P ) can be considered as a function, denoted by X , from (Ω,F) to W :

X(ω)(t) = Xt(ω).

For each time t, and a ∈ Rn, ω : |Xt(ω)− a| < ε belongs to F as Xt is measurable.

4.4. APPENDIX* 49

We show that the function X : Ω → W is Borel measurable. Take f ∈ W and the open ball in theBanach space W centred at f with radius ε > 0. Its pre-image by X is:

ω : sup0≤t≤1

|Xt(ω)− fti | < ε = ω : sup0≤ti≤1,ti∈Q

|Xti(ω)− fti | < ε

= ∩0≤ti≤1,ti∈Qω : |Xti(ω)− fti | < ε.

Since for each i, ω : |Xti(ω) − fti | < ε ∈ F , the intersection of the countable number of such setsbelongs to F .

Chapter 5

Integration

5.1 Integration

Throughout this chapter (X ,F , µ) is a measure space with a σ-finite measure µ. (Previously I assumethe measure is complete, this assumption is not needed for defining integration.)

By the integral of a Borel measurable function f : X → R, we mean a value assigned to this function.Let us call this value I(f). So I : H → R is a function on a set of functionsH. For I to qualify to be anintegral it should have some properties which we describe below.

Property 1 (IP) Let H be a class of Borel measurable functions f : X → R. Suppose that to everyf ∈ H, we assign a number or∞ which we denote by I(f). We say I satisfies the IP if the followingholds:

1. (Linearity) If f, g ∈ H and a, b ∈ R, we have

I(af + bg) = aI(f) + bI(g).

2. (Monotonicity) If f ≤ g then I(f) ≤ I(g)

Notation. We use∫f for the integral of f . To emphasize the measure used we also use

∫fdµ for

the same integral. Sometime we emphasize the domain of integration as follows∫X fdµ. Sometimes we

explicitly express the argument of integration by a dummy variable:∫f(x)µ(dx), this is also written as∫

f(x)dµ(x). Also if A is a measurable set, we set∫A f =

∫f1A etc.

51

5.2. OUTLINE OF THE CONSTRUCTION 52

5.2 Outline of the construction

The procedure for defining the integral of a function f : X → R w.r.t. a measure µ is as follows.

1. If f =∑N

i=1 ai1Ai is a simple function (the assumption that Ai are measurable is included in thedefinition), we define ∫

Xf dµ =

N∑i=1

ai µ(Ai).

2. If f is a positive function we define ∫Xf dµ = sup

g

∫g dµ

where the supremum is taken over simple functions g with 0 ≤ g ≤ f . We say f is integrable if∫f dµ <∞.

3. Let f : X → R be a measurable function. Let f+ = max(f(x), 0) and f−(x) = max(−f(x), 0)

be the positive and negative parts of f , then f = f+ − f−. If both f+ and f− have finite integralswe say f is integrable and define∫

Xf dµ =

∫Xf+ dµ−

∫Xf− dµ.

The set of integrable functions is denoted by L1. Observe that in this case∫|f |dµ =

∫f+dµ+

∫f−dµ.

5.3 Integral of simple functions

Let us denote by E the set of measurable functions

E =

n∑i=1

ci1Ai : Ai ∈ F , ci ∈ R, n = 1, 2, . . .

.

Elements of E are called simple functions. We also denote by E+ the set of positive simple functions:

E+ =

n∑i=1

ci1Ai : Ai ∈ F , ci > 0, n = 1, 2, . . .

.

A simple function can have more than one representations. Take for example

f(x) =

1, x ∈ [0, 1]

1, x ∈ (2, 3]

5.3. INTEGRAL OF SIMPLE FUNCTIONS 53

Thenf(x) = 1[0,1](x) + 1[2,3](x) = 1[0,1)(x) + 11(x) + 1[2,3](x).

There exists a unique canonical representation, up to re-arranging the orders of the sets. By the canonicalform we mean

f =

n∑i=1

ai1f−1(ai),

where ai are the set of non-zero values of f (they are of course distinct numbers), this is unique up tore-ordering.

Exercise 5.3.1 A simple functions is a measurable function taking only a finite number of values, ameasurable function taking only a finite number of values is a simple function.

Proof If f is a measurable function taking only a finite number of non-zero values c1, . . . , cn, then itcan be expressed in the form of

∑ni=1 ci1f−1(ci) and f−1(ci) is measurable.

We show the converse: f =∑n

i=1 ai1Ai only takes a finite number of values.

Remark 5.3.1 An easier way to show that f takes only a finitely many values is as below: f is a linearcombination of n indicator functions. Since an indicator function takes only value 0 or 1, such a functionthus can take at most 2n values.

We next show this in at hard way, the proof itself has some value. First suppose that

f = a11A1 + a21A2 .

SetB1 = A1 \A2, B2 = A2 \A1, B3 = A1 ∩A2.

Thenf = a11B1 + a21B2 + (a1 + a2)1B3 ,

so f takes at most the following values: 0, a1, a2, a1 + a2.

We now prove by induction to show that every f of the form∑n

i=1 ai1Ai can be written in the form

f =

m∑i=j

bj1Bj ,

where Bj are disjoint. We have shown this holds for n ≤ 2. Suppose this holds for k ≤ n. Then,

f =

n+1∑i=1

ai1Ai =

m∑i=j

bj1Bj + an+11An+1 ,


where Bi are disjoint. SetCj = Bj \An+1, j = 1, . . . ,m,

Cm+j = Bj ∩An+1, C2m+1 = An+1 \ (n⋃j=1

Bm).

Then Cj are disjojnt sets and

f =m∑j=1

bj1Cj + an+11C2m+1 +m∑k=1

(bk + an+1)1Bk∩An+1 .

Thus f can be written as a finite linear combination of indicator functions of disjoint sets, among otherthings this also shows that f takes a finite number of values.

Exercise 5.3.2 1. If f, g are simple functions, so are f + g and fg. In particular E is a vector space.

2. If f is a simple function, f+, f− are both simple functions.

Proof Let us denote by ai the distinct values of f and bj the distinct values for g. We write

f =n∑i=1

ai1f−1(ai), g =m∑j=1

bj1f−1(bj),

SetEi = f−1(ai), Fj = g−1(bj).

Then (Ei ∩ Fj) ∩ (Ek ∩ Fl) = φ if (i, j) 6= (k, l). It is clear that

f + g =∑

1≤i≤n,1≤j≤m(ai + bj)1Ei∩Fj.

showing f + g is a simple function. The proof for fg is similar. It is clear that f+ = max(f, 0),f− = min(f, 0) are also simple functions. They are obtained by flipping ai to max(ai, 0) or max(−ai, 0).

Definition 5.3.1 Let f =∑n

i=1 ai1Ai , in the canonical representation. Suppose that µ(Ai) < ∞ forevery i, we then set

I(f) =n∑i=1

aiµ(Ai).

If in addition f ∈ E+, then ai > 0, we may remove the assumption µ(Ai) = ∞, I(f) is still definednon-ambiguously.

Of course I(f) takes the value ∞ if and only if one of the Ai’s has infinite measure. Observe that iff = 0 almost surely then I(f) = 0.


Lemma 5.3.2 If f ∈ E is represented as

f =M∑j=1

dj1Bj

where Bi are pairwise disjoint, then

I(f) =

M∑i=1

diµ(Bi).

Proof Observe that ⋃j:dj=ai

Bj = Ai.

Also µ(Ai) =∑j:dj=ai µ(Bj) by additive property of measures. Hence

I(f) =

n∑i=1

aiµ(Ai) =

n∑i=1

ai∑

j:dj=ai

µ(Bj)

=M∑j=1

djµ(Bj),

completing the proof. (It is clear if µ(Ai) =∞ then one of the µ(Bj) =∞.)

We show next that I has the IP.

Proposition 5.3.3 Let f, g ∈ E , vanishing outside of a set of finite measure. Then

1. for any a, b ∈ R,I(af + bg) = aI(f) + bI(g).

2. If f ≤ g a.e., then I(f) ≤ I(g).

Proof Let Ai ( respectively Bi) be the sets in the canonical representation for f (respectively forg). Let A0 = f−1(0) and B0 = f−1(0). Then, Ai ∩ Bj forms a finite collection of disjoint measurablesets, denote this collection by Ei : i ≤ N. Then

f =

N∑i=1

ai1Ei , g =

N∑i=1

bi1Ei

and

af + bg =

N∑i=1

(a ai + b bi)1Ei .

5.4. INTEGRATION OF NON-NEGATIVE FUNCTIONS 56

Use Lemma 5.3.2, we see that

I(af + bg) =

N∑i=1

(a ai + b bi)µ(Ei) = aI(f) + bI(g).

Next if f ≤ g a.e. then f > g is a measurable set (by comparing their values on Ei as above), and

I(f)− I(g) = I(f − g) ≥ 0.

This proved the monotonicity.

Example 5.1 The Dirichlet function f can be written as f = 1Qc . Its integral w.r.t. the Lebesguemeasure on [0, 1] is λ(Qc ∩ [0, 1]) = 1.

Example 5.2 Let a, b be real numbers. Consider the measure space ([a, b],B([a, b]). Let µ be the theLebesgue measure on [a, b]. It is determined by its value on sets of the form A = ∪ni=1(ci, di) (union ofdisjoint intervals),

µ(A) =n∑i

(di − ci).

Consider a = t0 < t1 < · · · < tn = b. Let

f(x) = f010(x) +

n−1∑j=0

aj1(tj ,tj+1](x).

Since µ(0) = 0, ∫ b

af(x)dx =

n−1∑j=0

aj(tj+1 − tj).

5.4 Integration of non-negative functions

We will allow a non-negative function to take values in the extended real line. We say f is measurableif A = f < ∞ is measurable and f−1(A) ∈ B(R) for any A ∈ B(R). The principle is as follows: iff = ∞ on a set of non-zero measure, we should assign∞ to its integral; if f < ∞ almost surely, weignore those values that are∞.

Definition 5.4.1 If f : X → [0,∞] is a measurable function, we define the integral of f to be∫fdµ = supI(g) : 0 ≤ g ≤ f, g ∈ E.

We say f is integrable if∫fdµ <∞.


Remark 5.4.1 If µ is a finite measure, then a bounded measurable function is integrable. A special caseis when µ is the Lebesgue measure restricted to any bounded interval.

Definition 5.4.2 If A is a measurable set and f1A is integrable, we define∫Af dµ =

∫f 1A dµ.

Proposition 5.4.2 1. If f ∈ E+ then I(f) =∫fdµ.

2. the integral satisfies monotonicity.

Proof Firstly, let f ∈ E+. We may take g = f ∈ E+, so

supI(g) : g ≤ f, g ∈ E+ ≥ I(f).

Next, by monotonicity for I , if g ∈ E and g ≤ f , then I(g) ≤ I(f). Hence

supI(g) : g ≤ f, g ∈ E+ ≤ I(f).

This proves that I(f) =∫f dµ for f ∈ E+. Now if f1 ≤ f2 are non-negative and measurable, then∫

f1 dµ = supI(g) : g ≤ f1, g ∈ E+ ≤ supI(g) : g ≤ f2, g ∈ E+ =

∫f2 dµ

proving the monotonicity.

We cannot directly prove the linearity and try to obtain this from that of simple functions by takinglimits.

Definition 5.4.3 We say fn → f a.e. if there exists a null set Asuch that fn → f everywhere outside ofthe null set A.

Theorem 5.4.3 (Fatou’s lemma) Let fn be a sequence of non-negative measurable functions that con-verges almost surely on a set E to a function f , then∫

fdµ ≤ lim infn→∞

∫fndµ.

Proof By the definition we may assume that fn → f everywhere. It is sufficient to show that for anyg ∈ E+ with g ≤ f , ∫

g dµ ≤ lim infn→∞

∫fn dµ.

Step 1. Suppose that∫g <∞, set

X0 = x : g(x) > 0.


Since g takes only a finite number of values µ(X0) <∞ andM = max g <∞ ( We only need to restrictto X0 as

∫X0g =

∫X g.)

Fix ε > 0. Since g ≤ f = limn→∞ fn, for any x there exists N(x, ε) s.t. for n > N(x, ε),

fn(x) ≥ (1− ε)g(x).

LetAk = x ∈ X0 : fn(x) > (1− ε)g(x), ∀n ≥ k.

Then An is an increasing sequence and ∪nAn = X0. By the continuity property of measures,

limn→∞

µ(X0 \An) = 0.

Then for n ≥ k,

(1− ε)∫Ak

g ≤∫Ak

fn ≤∫fn.

Also,

(1− ε)∫Ak

g = (1− ε)∫g − (1− ε)

∫Ack

g.

Hence

(1− ε)∫g − (1− ε)

∫Ack

g ≤∫Ak

fn.

Since g ≤M , we may take ε to 0, ∫g −

∫Ack

g ≤ lim infn→∞

∫fn,

for every n. Take k →∞ to conclude∫g ≤ lim infn→∞

∫fn.

Step 2. Suppose that∫g =∞, since g take only a finite number of values, and g ≥ 0, it is necessary

that there exists a set of infinite measure on which g takes a positive value a > 0. Denote this set by A.Set

Ak = x : fn(x) ≥ 1

2a, ∀n ≥ k.

Then An is increasing. Since limn→∞ fn ≥ g, we have ∪nAn = A and limn→∞ µ(An) = µ(A) = ∞.For all n ≥ k, ∫

fn ≥∫

1

2a1Ak =

1

2aµ(Ak).

Thus,

lim infn→∞

∫fn ≥

1

2aµ(Ak).

Take k →∞ lim infn→∞ fn ≥ ∞, this completes the proof.


Theorem 5.4.4 (Monotone convergence theorem) If fn is a sequence of non-negative measurable func-tions converging almost everywhere to f . Suppose that fn ≤ f for all n. Then∫

limnfn = lim

n

∫fn.

Proof If 0 ≤ g ≤ fn then 0 ≤ g ≤ f , hence for every n,∫fn ≤

∫f .

lim supn

∫fn ≤

∫f.

By Fatou’s lemma,

lim inf∫fn ≥

∫limnfn =

∫f,


The definition of integrals for non-negative functions using the value supI(g) : 0 ≤ g ≤ f, g ∈ Eis not easy to use. The monotone convergence theorem allow us to use a sequence. Choose fn ∈ E withfn ↑ f (this exists by Theorem 4.1.3), then∫

fdµ = limn→∞

∫fn.

The name ‘Monotone convergence theorem’ comes from the following Corollary.

Corollary 5.4.5 If 0 ≤ f1 ≤ f2 ≤ . . . is a sequence of increasing real valued measurable functions,and let f = supn fn. Then f is integrable if and only if supn

∫fn <∞, and then∫

supnfn = sup

n

∫fn.

This means in particular that

Corollary 5.4.6 If fn is a sequence of increasing simple functions with limit f , then∫f = lim

n→∞

∫fn.

Proposition 5.4.7 If f, g are non-negative and measurable, then the following holds:(1)[Linearity] for any a, b ≥ 0, ∫

(af + bg) = a

∫f + b

∫g.

(2) [Monotonicity] If f ≥ g then∫f ≥

∫g.

(3)∫f = 0 if and only if f = 0 a.e.


Proof (1) Note that linearity holds on E . Now we approximate f, g with positive simple increasingsequence of functions. Take fn ∈ E+, gn ∈ E+ such that

fn ↑ f, gn ↑ g.

with fn ≤ f and gn ≤ g. Then∫(afn + bgn)dµ =

∫fndµ+

∫bgndµ,

We apply the monotone convergence theorem to take the limit inside the integrals, proving the linearity.

(2) It is sufficient to show f ≥ 0 a.e. then∫f ≥ 0. This is true as the simple functions approximating

f is non-negative.

(3) Let f ≥ 0 a.e. If f = 0 a.e. it is clear that∫fdµ = 0 (e.g. use monotonicity.)

Suppose that∫f = 0. Let An = f ≥ 1

n. Then f > 0 = ∪∞n=1An.

µ(An) =

∫1Andµ ≤

∫1Annfdµ ≤ n

∫fdµ = 0.

Thus

µ(f > 0 ≤∞∑n=1

µ(An) = 0,

concluding the proof.

These shows that we can exchange summation (over a finite number of functions) with integration.For a sum of non-negative functions, this holds also with infinitely many summands, as to be explainedbelow.

Corollary 5.4.8 (Exchange of order of summation and integration) Suppose that fn is a sequence ofnon-negative measurable functions, then∫ ∞∑

n=1

fn =∞∑n=1

∫fn.

Proof By Proposition 5.4.7, finite sum can be taken out of the integral. Fatou’s lemma allow us to takethe limit. ∫ ∞∑

n=1

fn =

∫limN→∞

N∑n=1

fnMonotone

= limN→∞

∫ N∑n=1

fn

linearity= lim

N→∞

N∑n=1

∫fn =

∞∑n=1

∫fn,

concluding the proof.

5.5. INTEGRABLE FUNCTIONS 61

5.5 Integrable functions

Definition 5.5.1 Let f : X → R be measurable. If both f+ and f− are integrable we say that f isintegrable, in which case we define its integral to be:∫

f =

∫f+ −

∫f−.

Note that f is integrable if and only if |f | is integrable. A bounded measurable function is integrablewith respect to a finite measure.

Definition 5.5.2 We denote by the class of integrable functions by L1. The notation L1(X ) is used toemphasize the space. Also L1(µ), L1(X ;µ) are used to emphasize the measure.

Proposition 5.5.1 If f, g are measurable functions. Then

(1) (linearity) For any a, b ∈ R, ∫(af + bg) = a

∫f + b

∫g.

(2) (Monotonicity) If g ≤ f a.e., then∫g ≤

∫f .

(3)∫|f | = 0 if and only if f = 0 almost surely.

Proof Part one follows from splitting f = f+ − f− and g = g+ − g− and the same property fornon-negative functions. If g ≤ f then g − f ≥ 0, so

∫(g − f) ≥ 0. Monotonicity now follows from the

linearity. The last one follows from Proposition 5.4.7 and use the property |f | ≥ 0.

Exercise 5.5.1 If f, g are real valued integrable functions, show the following statements hold:

1. If µ(A) = 0 then∫A f dµ = 0.

2. If∫A f = 0 for every measurable set A then f = 0 µ almost-everywhere.

3. |∫f | ≤

∫|f |

Hint: For part (2), use Proposition 5.4.7 and choose a suitable A.

Example 5.3 Recall that for x0 ∈ X , the Dirac measure at x0 is defined by:

δx0(A) =

0, if x0 ∈ A1, if x0 6∈ A

= 1A(x0).

Let f : X → R be measurable. Then f is integrable and∫X f dδx0 = f(x0).

5.6. FURTHER LIMIT THEOREMS 62

Example 5.4 Let p : X → [0,∞) be integrable. Set

µp(A) :=

∫Ap(x)µ(dx).

Then µp defines a measure.

Proof Firstly µp(φ) =∫φ p µ = 0. Next if A ∩B = φ, 1A∩B = 1A + 1B , So

µp(A ∪B) =

∫(1A + 1B)pdµ = µp(A) + µp(B).

Finally, if An is an increasing sequence of measurable sets with union A,

limn→∞

1An = 1∪∞n=1An= 1A,

showing that

µp(A) =

∫1Ap dµ =

∫limn→∞

1An pdµ = limn→∞

∫1Anp dµ = lim

n→∞µp(An).

Hence the σ-additive property holds.

Questions.Let x0 ∈ R. Can one find a function p : R→ R such that, for all A ∈ B(R), δx0(A) =

∫A p dλ?

Can one find a function p : R→ R such that, for all A ∈ B(R), λ(A) =∫A p dδx0?

Exercise 5.5.2 The answer are no and no. Prove it.

Two measures µ and ν on Ω are mutually singular if there are disjoint sets A1 and A2 such thatΩ = A1

⋃A2 with µ(A1) = 0 and ν(A2) = 0. This is denoted by µ ⊥ ν. The Lebesgue measure and

any Dirac measures δx are singular.

Definition 5.5.3 We say that µ is absolutely continuous with respect to ν if whenever ν(A) = 0, wehave µ(A) = 0. We write µ ν.

Later we will see that if µ and ν are two finite measures such that µ ν then we can found a a densityfunction p such that µ is given by integration of p with respect to ν. This is called the Radon-NikodymTheorem.

5.6 Further limit theorems

Theorem 5.6.1 (Fatou’s Lemma) Suppose that fn is a sequence of non-negative measurable functions,then ∫

lim infn→∞

fn dµ ≤ lim infn→∞

∫fn dµ.

5.6. FURTHER LIMIT THEOREMS 63

Proof Firstly,lim infn→∞

fn = limk→∞

infm≥k

fm.

Observe that infm≥k fm is an increasing sequence and infm≥k fm ≤ fk. So∫lim infn→∞

fn =

∫limk→∞

infm≥k

fm

= lim infk→∞

∫infm≥k

fm ≤ lim infk→∞

∫fk.

This completes the proof.

Theorem 5.6.2 (Dominated convergence theorem) Suppose that fn is a sequence of measurable func-tions converging to a function f almost everywhere. Suppose that there exists an integrable function gsuch that |fn| ≤ |g|, then

(1)

limn→∞

∫fn dµ =

∫limn→∞

fn dµ.

(2)

limn→∞

∫|fn − f | dµ = 0.

Proof Since |fn| ≤ g and g is integrable then so is fn, also f+n ≤ |g| and f−n ≤ |g|. By the monotone

convergence theorem,

limn→∞

∫f+n dµ =

∫limn→∞

f+n dµ, lim

n→∞

∫f−n dµ =

∫limn→∞

f−n dµ,

and proving (1). For (2) note that |fn − f | → 0 a.s. and

|fn − f | ≤ 2|f |.

So by part (1),

limn→∞

∫|fn − f | dµ =

∫limn→∞

|fn − f |dµ = 0.

Exercise 5.6.1 Prove part (2) of the dominated convergence theorem.

Definition 5.6.1 A sequence of measurable functions fn is said to converge to f in L1 if

limn→∞

∫|fn − f | dµ = 0.

5.7. EXAMPLES AND EXERCISES 64

Exercise 5.6.2 Show that if fn → f in L1 then,

limn→∞

∫fn dµ =

∫f dµ.

Definition 5.6.2 1. A measurable functions with the property∫|f |p <∞ are said to be in Lp where

1 ≤ p <∞.

2. If there exists a measurable function f such that∫|fn − f |pdµ → 0, then we say fn converges to

f is Lp, in which case f is also in Lp.

5.7 Examples and exercises

Exercise 5.7.1 If a bounded function f : [a, b]→ R is Riemann integrable, then it is Lebesgue measur-able and Lebesgue integrable. Furthermore, the integrals have the common value:∫

[a,b]f(x)dµ =

∫ b

af(x)dx.

Hint. Check that the integrals agree on functions of the form:

n∑i=1

αi1(ci,di)

where (ci, di) are disjoint intervals.

Let ∆ be the dyadic partition of [a, b]: tj = (b−a)2n . Consider

fn(t) =∑j

supf(x) : x ∈ [ti, ti+1]1(ti,ti+1](t).

Then fn is measurable and converges to f a.e.

Exercise 5.7.2 1. Show that if for two measures µ and ν,∫fdµ =

∫fdν for all f ∈ E , then L1(µ) =

L1(ν), and∫fdµ =

∫fdν for all integrable functions.

2. LetE be a measurable set and µE the restriction of µ to the trace σ-algebraFE = E∩A : A ∈ F.Show that ∫

Efdµ =

∫XfdµE .

Also f ∈ L1(µE) if and only if f1E ∈ L1(µ).

5.7. EXAMPLES AND EXERCISES 65

(1) By the construction of integrals,∫fdµ =

∫fdν for all non-negative functions. Thus

∫|f |dµ =∫

|f |dν, which are either both finite or both infinite. This implies that L1(µ) = L1(ν) and the integralsare the same for they are defined by the integrals of their positive and negative parts.

(2) If g is elementary, so is g1E and∫g1Edµ =

∫gdµE for all elementary g ∈ E . Let fn be an

increasing sequence of simple functions converging to f , then fn1E ↑ f1E . Hence∫Efdµ = lim

n→∞

∫fn1E dµ = lim

n→∞

∫fn dµE =

∫fdµE .

Exercise 5.7.3 Suppose that p : [a, b]→ R is a non-negative continuous function. Set

F (x) = F (a) +

∫ x

ap(t)dt.

Let f : [a, b]→ R be a continuous function. Describe the Lebesgue-Stilejes measure µF and the integral∫f dµF

in terms of Riemann integral.

Sketch. Let A ⊂ B([a, b]). Then,

∫1A dµF = µF (A) =

∫ d

cp(t)dt =

∫[a,b]

1A p(t) dt.

By linearity for any simple function f ,∫fdµF =

∫ b

af(t)p(t)dt.

By Fatou’s lemma we can pass this to non-negative functions, and then by linearity to integrable func-tions: ∫

[a,b]fdµF =

∫ b

af(t)p(t)dt.

Example 5.5 Let fn = n1[0, 1n

] Then, fn → 0 everywhere except at 0.

∫fndλ =

∫ n

0dx = 1.

Hence we cannot exchange the order of taking limit with taking integration in this case.

5.8. THE MONOTONE CLASS THEOREM FOR FUNCTIONS 66

5.8 The Monotone Class Theorem for Functions

Proposition 5.8.1 (The Monotone Class Theorem for Functions) Let (Ω,F) be a measure space. LetC be a π system with σ(C) = F . Let H be a vector space of functions from Ω to R, with the followingproperty:

1. 1 ∈ H, and 1A ∈ H for every A ∈ C.

2. (Monotone class property) If fn ∈ H is an increasing sequence of non-negative functions withf(x) = supn f(x) finite for every x (resp. with f = supn f bounded), then f ∈ H.

ThenH contains the set of all real valued (resp. all bounded ) F-measurable functions.

Proof LetJ = B ∈ F : 1B ∈ H.

By assumption, 1 = 1Ω ∈ H, and J ⊃ C ∪ Ω. If A ⊂ B, A,B ∈ J then 1B\A = 1B − 1A ∈ H bythe vector space property of H. Thus B \ A ∈ J . Let An ∈ J be an increasing sequence of sets then bycondition 2,

1∪∞n=1An= lim

n→∞1An ∈ H.

Hence ∪∞n=1An ∈ J and J is a λ-system. It therefore contains F . This means all indictor functions areinH and simple functions are in J .

Suppose H is closed under monotone limit of functions with bounded limit. If f is bounded positiveand measurable there is an sequence of positive simple functions fn converging to f , thus f ∈ H. If f isnot positive let f = f+ − f− to conclude.

Suppose H is closed under monotone limit of functions with finite limit. If f is positive and measur-able there is an sequence of positive simple functions fn converging to f , thus f ∈ H. Again, if f is notpositive let f = f+ − f− to conclude.

Remark 5.8.2 From this we see a meta theorem (means the theorem is likely to hold, of course one needsto give a proof): If a property concerning integration of functions holds for all indicator functions, then itholds for all measurable functions (use Fatou’s lemma to obtain the monotone property) or it holds for allbounded measurable functions ( use dominated convergence theorem to obtain the monotone property).

5.8.1 Lebesgue Integration

In this section let X = R, F the completion of the Borel σ algebra, i.e. consists of Lebesgue measurablesets, and µ the Lebesgue measure λ. The restriction of λ to any measurable subset is also denoted by thesame letter.

5.9. THE PUSHED FORWARD MEASURE 67

A function f : R→ R is said to be Lebesgue measurable if it is measurable with respect to F/B(R).Borel measurable functions are of course Lebesgue measurable. In this section by a measurable functionwe refer to a Lebesgue measurable function.

Definition 5.8.1 Integrals with respect to a Lebesgue measure, are called Lebesgue integrals.

Exercise 5.8.1 Let I be a Lebesgue measurable set. Suppose that f : I → R is bounded.

1. If f is Lebesgue measurable, show that

inf∫

h dλ, h ≥ f, h ∈ E .

= sup∫

g dλ : g ≤ f, g ∈ E

(5.1)

2. Show that if (5.1) holds, then f is Lebesgue measurable.

5.9 The pushed forward measure

Suppose we are given two measurable spaces (X ,A) and (Y,B). Let µ be a measure on X and f : X →Y be a measurable function. Denote by f∗µ the pushed forward (induced) measure on (Y,B), i.e.

f∗(µ)(B) = µx : f(x) ∈ B.

The right hand side µ(f−1(B)), so we given B the measure of its pre-image.

Proposition 5.9.1 Let ϕ : Y → R a measurable function, we have∫Xϕ f dµ =

∫Yϕ d(f∗µ).

This is in the sense that ϕ is integrable with respect to f∗µ if and only if ϕ f is integrable with respectto µ.

Proof This holds for indicator functions of measurable sets by the definition of pushed forward measures.Apply monotone convergence theorem on both sides to see that those functions with the desired propertyis a monotone class. Apply the monotone class theorem to conclude.

Let (Ω,F ,P) be a probability measure. Let (S,G) be a measurable space. A measurable functionfrom X : Ω → S is called a random variable, S is called its state space. If ϕ : S → R and X : Ω → S

are measurable, then

E[ϕ(X)] :=

∫Ωϕ X dP =

∫Sϕ dX∗(P).

Exercise 5.9.1 If both Y , Y ′ are real valued integrable functions on (Ω,F ,P) with∫A Y =

∫A Y

′ forevery measurable set A, then Y = Y ′ almost surely.

5.10. APPENDIX: RIEMANN INTEGRALS 68

Proof Let A = Y > Y ′. By symmetry it is sufficient to proves that P(A) = 0. Thus we only needto prove the following statement: for any Y ≥ 0,

∫A Y = 0 for any A ∈ F , then Y = 0 almost surely.

For this just note that if Y 6= 0 = Y > 0 has positive measure then for some a > 0, Y > a haspositive measure (for otherwise P(Y > 0) = limn→∞ P(Y > 1

n) = 0), and then∫AY dP ≥ aP(Y > a) > 0,

this contradicts the assumption. Hence µ(Y 6= 0) = 0 ,completting the proof.

Example 5.6 Let f : X → Y . Given a measure on Y , can we use f and µ to define a measure on Xin a meaningful way? The answer is no in general. There is no good sensible way for this construction,for otherwise you should be able to pull back a direct measure, what that would be if f : 1, 2 → Rf(1) = 0 and f(2) = 0? If f : R → R is injective, one can define a measure from a measure on thetarget space, it would be the pushed forward measure for the inverse map.

Definition 5.9.1 If f is a measurable map from X to X , we say µ is invariant by f if f∗(µ) = µ.

Example 5.7 Let δx be the Dirac measure on Rn. Then for any transformation T : Rn → Rn, T∗(δx) =

δTx. The δ-measure is not invariant under rotations (unless x = 0), nor by translation.

5.10 Appendix: Riemann integrals

If a function f : [a, b]→ R is Riemann integrable, it is Lebesque integrable and the two integrals agree.In this and the next section we review Riemann integrals and Riemann-Stieltjes integrals.

I do not intend to cover this in class. However you might find this helpful for comparing severalnotions of integration theory.

We know how to compute the areas of a triangle and a polygon. We then define the integral underneatha continuous curve by rectangle approximation. If y = f(x), x ≥ 0 is the curve, the approximation leadsto∫ t

0 f(s)ds, this is the Riemann integral∫ ba f(x)dx. (Riemann integrals are covered in M2PM1: Real

Analysis)

The Riemann integral of f is defined as follows. Take a partition ∆ : a = x0 < a1 < · · · < an = b.Let Mi and mi be respectively the supremum and infimum values of f on the interval [ai−1, ai]. Define

U(∆, f) =

n∑i=1

Mi(ai − ai−1), L(∆, g) =

n∑i=1

mi(ai − ai−1).

U(f) = inf∆U(∆, f), L(f) = sup

∆U(∆, f).

5.11. APPENDIX: RIEMANN-STIELTJES INTEGRALS 69

Definition 5.10.1 A bounded function f : [a, b] → R is Riemann integrable, if U(f) = L(f) and thecommon value is the Riemann integral of f on [a, b].

A continuous function is Riemann integrable.

Theorem 5.10.1 A bounded function f is Riemann integrable if and only if for every ε > 0 there existsa partition such that U(f,∆)− L(f,∆) < ε.

Observe that the oscillation of f on [ai−1, ai] is Osc(f) = Mi −mi,

U(f,∆)− L(f,∆) =∑

Osc(f, [ai−1, ai])(ai − ai−1).

Define the Riemann sum:

R(f,∆) =n∑i=1

f(x∗i )(ai − ai−1).

Theorem 5.10.2 Suppose a bounded function f : [a, b]→ R is Riemann integrable. If ∆n is a sequenceof partitions with mesh goes to zero, then their Riemann sum R(f,∆n) converges to

∫ ba f(x)dx.

This can be proved using Theorem 5.10.1.

Remark 5.10.3 If f : [0,∞)→ R is measurable and Riemann integrable on every interval [0, n] wheren ∈ N, then f is Lebesgue integrable on [0,∞) if and only if

limN→∞

∫ N

0|f(x)|dx <∞.

If so,∫∞

0 f(x)dx =∫

[0,∞) fdλ.

The concept ‘Lebesgue integrable’ does not allow cancellation, while improper Riemann integrationallows it. sin(x)

x is improper Riemann integrable, bot not Lebesgue integrable.

5.11 Appendix: Riemann-Stieltjes integrals

Riemann-Stieltjes integrals aree defined in parallel to Riemann integrals. Suppose : f [a, b] → R is abounded function and g a monotone increasing function on [a, b], for example g(x) = x in which wecase we are back to Riemann integrals.

Write ∆gi = g(ai)− g(ai−1). Define

U(∆, f, g) =

n∑i=1

Mi∆gi, L(∆, f, g) =

n∑i=1

mi∆gi.

5.11. APPENDIX: RIEMANN-STIELTJES INTEGRALS 70

U(f, g) = inf∆U(∆, f, g), L(f, g) = sup

∆U(∆, f, g).

Definition 5.11.1 If U(f, g) = L(f, g), we say f is Riemann-Stieltjes integrable (with respect to g), wedenote f ∈ D(g). The common value is denoted by

∫ ba fdg.

Theorem 5.11.1 (Integrability Criterion) Suppose f is a bounded function and g a monotone increas-ing function on [a, b]. Then f is Riemann-Stieltjes integrable if and only if for any ε > 0, there exists apartition ∆ of [a, b] such that

U(∆, f, g)− L(∆, f, g) < ε.

Corollary 5.11.2 Suppose f is a bounded function with only a finite number of discontinuity points, andg a monotone increasing function on [a, b], continuous at every point where f is discontinuous. Then fis Riemann-Stieltjes integrable.

Proposition 5.11.3 Suppose that f, f1, f2 : [a, b] → R are bounded, Riemann-Stieltjes integrable withrespect to a monotone increasing function. g : [a, b]→ R. Let c ∈ R.

1. If f ∈ D(g), then for any c ∈ R, cf ∈ D(g)and∫ ba (cf)dg = c

∫ ba fdg.

2. f1 + f2 ∈ D(g), and∫ ba (f1 + f2)dg =

∫ ba f1dg +

∫ ba dg.

3. If f1 ≤ f2 then∫ ba f1dg ≤

∫ ba f2dg.

4. If f ∈ D(g) and c > 0 is a constant, then f ∈ D(cg) and∫ ba fd(cg) = c

∫ ba fdg.

5. If f ∈ D(g) ∩ D(g′), then f ∈ D(g + g′),∫ b

afd(g + g′) =

∫ b

afdg +

∫ b

addg′.

Definition 5.11.2 If g = g1 − g2, where monotone increasing functions we define∫fd(g1 − g2) =

∫fdg1 −

∫fdg2.

Riemann-Stieljes integrals do not exists if the integrator and the integrand have the same point ofdiscontinuity. That

∫ ba fdF,

∫ cb fdF exist in Riemann-Stieljes sense do not necessarily imply that

∫ ca fdF

is Riemann-Stieljes integrable.

Proposition 5.11.4 If g : [a, b] → R is continuous and F : [a, b] → R has bounded variation then theRiemann-Stieljes integral

∫ ba gdF and

∫ ba Fdg exist. If F is furthermore absolutely continuous and the

integral equals to the corresponding Lebesgue integral:∫ b

agdF =

∫ b

agF ′dx.

5.12. APPENDIX: FUNCTIONS WITH BOUNDED VARIATION 71

5.12 Appendix: Functions with bounded variation

The most commonly used integrators are functions of bounded variations. Functions of bounded varia-tions are differences of increasing functions.

Given an interval [a, b], define P ([a, b]) to be the set of all partitions of [a, b]:

P ([a, b]) = t = (t0, t1, . . . , tn) : a = t0 < t1 < · · · < tn−1 < tn = b, n ∈ N.

Let ∆t = max0≤i≤n−1(ti+1 − ti) be the modulus of the partition. Write

∑[u,v]⊂∆

|f(u)− f(v)| =n∑j=1

|f(tj)− f(tj−1)|.

Definition 5.12.1 The total variation of a function f on [a, b] is defined by

|f |TV ([a, b]) = sup∆∈P ([a,b])

∑[u,v]⊂∆

|f(u)− f(v)|.

If this number is finite we say that f has bounded total variation over [a, b].

The function x sin( 1x) does not have bounded variation over any interval containing 0.

Note that f could have bounded total variation over [a, b] without having a bounded variation over R(e.f. f(x) = sinx).

Theorem 5.12.1 A. function is of bounded variation on [a, b] if and only if f is the difference of twomonotone real-valued functions on [a, b].

Proof Let f = f1 − f2, where f, g are increasing functions, then

|f |TV ([a, b]) = sup∆∈P ([a,b])

∑[u,v]⊂∆

|f1(u)− f1(v)− (f2(u)− f2(v))|.

For f : [a, b]→ R, define a function:

|f |TV (x) = |f |TV ([a, x]).

Example 5.8 1. If f : R→ R is an increasing function, |f |TV (x) = f(x)− f(−∞).

2. If f is locally Lipschitz continuous, then f is of bounded variation on any finite time interval.

3. The set of bounded variation function form a vector space.

5.12. APPENDIX: FUNCTIONS WITH BOUNDED VARIATION 72

If f : R → R is an increasing then it has left and right derivatives at every point and there are only acountable number of points of discontinuity. Furthermore f is differentiable almost everywhere.

Theorem 5.12.2 Let F : R→ R+ be a function of bounded variation, right continuous with F (−∞) =

0. Then

1. F determines a Borel measure µ on R.

2. The Borel measure µ, with distribution function F , is absolutely continuous with respect to theLebesgue measure dx if and only if

F (x) =

∫ x

0F ′(t)dt.

3. It is singular with respect to dx if and only if F ′ = 0.

Chapter 6

Product measures and Fubini’ Theorem

Throughout this section let (X ,F , µ) and (Y,G, ν) be two measure spaces. A (measurable) product setis of the form A×B where A ∈ F and B ∈ G. Denote by C the collection of products measurable set.

C = A×B : A ∈ F , B ∈ G.

The product σ-algebra is generated by such sets.

F ⊗ G = σ(C).

Can we assign a measure to F ⊗ G that is consistent to µ and ν?

If X = a1, . . . , an, we use the power set σ-algebra. If we assign any non-negative values to ai,say pi, this determines a measure on 2X (no relations is needed on pi, as intersections of singletonsare emptysets, the union of two singletons is no longer a singleton). The power σ-algebra has precisely2n elements. Every measure on the finite space X , with the power σ-algebra, is determined uniquely bytheir values of the singleton sub-sets.

Let Y = b1, . . . , bm, then X × Y = (ai, bj) : 1 ≤ i ≤ n, 1 ≤ j ≤ m, and every element ofX ×Y is contained in 2X ⊗ 2Y . Thus 2X ⊗ 2Y = 2X×Y . Hence by assigning a value to every (ai, bj),this is a product set, this determine a measure on the product σ-algebra. Given µ on X and ν on Y , wedefine the product measure by

µ× ν((ai, bj)) = µ(ai)ν(bj).

Similarly if X is given by a partition A1, . . . , An, and Y is given a partition B1, . . . , Bm, and ifwe set F = σ(A1, . . . , An) and G = σ(B1, . . . , Bm). Then there are 2n and 2m elements in F andG respectively. If we assign values µ(Ai), this determine a measure on X (because the intersections oftheAi are emptyset, no relation between these numbers are neeeded for extending these to every elementof F). Every measure on F is obtained in this way. Also F ⊗ G = σ(Ai × Bj). It is clear that

73

6.1. PRODUCT MEASURES 74

Ai×Bj is a partition of X ×Y . Thus this is totally analogous to the finite state spaces discussed earlier.Furthermore given µ, ν on F and G respectively, the product measure on F ⊗ G will be defined by

µ× G(Ai ×Bj) = µ(Ai)ν(Bj).

If F and G are not finite σ-algebras, they are uncountable, the construction for the product measurebecomes much more involved.

6.1 Product measures

Recall the definition of elementary family in Definition 2.2.4.

Lemma 6.1.1 The collection C of product sets is an elementary family. The collection A of a finitenumber of disjoint union of product sets is an algebra.

Proof The empty set is also in C. Also,

(A×B) ∩ (A′ ×B′) = (A ∩A′)× (B ∩B′), (A×B)c = (X ×Bc) ∪ (Ac ×B).

Thus, C is closed under taking complement, and the complement of a set is the disjoint union of a finitenumber of product sets. By exercise 2.2.2, the collection of finite number of disjoint union of sets fromC is an algebra.

Let us define a function µ × ν : A → [0,∞] by the following rules: If E = ∪nj=1(Ai × Bj) whereAi ∈ F and Bi ∈ G, then we want to define

µ× ν(E) =n∑j=1

µ(Ai)ν(Bi). (6.1)

We must show that this is independent of the expression of E into disjoint unions of product sets.

Lemma 6.1.2 Let E = A × B ∈ C. Suppose that E = ∪∞j=1(Aj × Bj) where Ai ∈ F and Bi ∈ G,where Aj ×Bj are disjoint sets. then

µ(A)ν(B) =∞∑j=1

µ(Aj)ν(Bi).

Proof Observe that 1A×B(x, y) = 1A(x)1B(y) for x ∈ X , B ∈ Y . Also,

1A×B =∞∑i=1

1Ai×Bi =∞∑i=1

1Ai1Bi .


Holding y fixed, we may integrate 1A(x)1B(y) =∑∞

i=1 1Ai(x)1Bi(y) with respect to µ.∫X1A(x)1B(y) dµ(x) =

∫X

∞∑i=1

1Ai(x)1Bi(y)dµ(y).

Use Corollary 5.4.8 for exchanging the summation and integration,

µ(A)1B =∞∑i=1

µ(Ai)1Bi .

We use the convention0 · ∞ = 0.

( So if µ(A) = ∞, then the left hand side is ∞ if y ∈ B and 0 otherwise. The right hand side isinterpreted in the same way. Observe that Bi ⊂ B and Ai ⊂ A.)

Integrate both sides from the above line w.r.t. ν we obtain µ(A)ν(B) =∑∞

i=1 µ(Ai)ν(Bi), complet-ting the proof.

Lemma 6.1.3 Suppose that⋃∞i=1(Ai ×Bi),

⋃∞j=1(Cj ×Dj) are disjoint unions and

∞⋃i=1

(Ai ×Bi) =∞⋃j=1

(Cj ×Dj),

then∞∑i=1

µ(Ai)ν(Bi) =∞∑j=1

µ(Cj)ν(Dj).

Proof Let E =⋃∞j=1(Ai ×Bi). Then,

∞⋃i=1

(Ai ×Bi) = (∞⋃j=1

(Ai ×Bi)) ∩ E

=∞⋃i=1

∞⋃j=1

(Ai ×Bi) ∩ (Cj ×Dj)

=

∞⋃i=1

∞⋃j=1

(Ai ∩ Cj)× (Bi ∩Dj).

The last is also a union of disjoint sets.

For each i, we apply the previous lemma to

Ei = Ai ×Bi =

∞⋃j=1

(Ai ∩ Cj)× (Bi ∩Dj)


to see

µ(Ai ×Bi) =

∞∑j=1

µ(Ai ∩ Cj)ν(Bi ∩Dj)

So,∞∑i=1

µ(Ai ×Bi) =

∞∑i=1

∞∑j=1

µ(Ai ∩ Cj)ν(Bi ∩Dj).

Similarly∞∑j=1

µ(Cj ×Dj) =∞∑i=1

∞∑j=1

µ(Ai ∩ Cj)ν(Bi ∩Dj).

This finishes the proof.

The map (6.1) defines a pre-measure on A. Firstly µ × ν(φ) = 0, secondly it has finite additiveproperty on A. Furthermore, suppose that E =

⋃∞i=1Ei, where Ei ∈ A are disjoint, with the property

that E ∈ A, we show that µ(E) =∑∞

i=1 µ(Ei). On one hand, each Ei is a finite disjoint union ofproduct sets, we pull these product sets , that composes of Ei, together: they are all disjoint. Relabelthese disjoint product sets we see E =

⋃∞i=1Ai ×Bi. Re-arrange it, we see that µ(E) =

∑µi(Ei).

With this we define an outer measure µ∗, by (3.1):

µ∗(E) = inf

∞∑j=1

µ× ν(Ej) : E ⊂∞⋃j=1

Ej , Ej ∈ A

.

By the Caratheodory theorem, the outer measure is a measure on F × G and agree with µ × ν on A.This measure on F ⊗ G will be called the product measure. If µ and ν are σ-finite, there exists a uniquemeasure such that the measure of product sets is the product of the measures on each factor.

If µ, ν are finite, so is µ×ν(X ×Y) = µ(X )ν(Y) <∞. If they are both σ-finite, so is µ×ν. Indeed,there exist An ↑ X and Bn ↑ Y such that µ(An) < ∞, ν(Bn) < ∞ for every n. Now Ai × bj is anincreasing sequence converging to X × Y with finite measure, hence µ× ν is σ-finite.

Remark: Since a countable union of measures of A is a countable union of product sets,

µ∗(E) = inf

∞∑j=1

µ(Aj)ν(Aj) : E ⊂∞⋃j=1

Aj ×Bj , Aj ∈ F , Bj ∈ F ∈ A

.

Remark 6.1.4 Similar constructions can be used to obtain a product measure on a finite number ofcopies of product spaces and their product σ-algebras.

Let us take for example three measure spaces (Xi,Fi, µi). Recall that

⊗3i=1Fi = σ(

Π3i=1Ai : Ai ∈ Fi, i = 1, 2, 3

).


We define µ1 × µ2 × µ3 by

µ1 × µ2 × µ3(A1 ×A2 ×A3) = Π3i=1µi(Ai).

Proposition 6.1.5 Let (Xi,Fi, µi) be measurable spaces. We have,

(F1 ⊗F2)⊗F3 = ⊗3i=1Fi.

If µi are σ-finite, then

(µ1 × µ2)× µ3 = µ1 × µ2 × µ3.

Proof It is clear that,

Π3i=1Ai : Ai ∈ Fi, i = 1, 2, 3

⊂ E × C : E ∈ F1 ⊗F2, C ∈ F3. Hence

⊗3i=1Fi ⊂ (F1 ⊗F2)⊗F3.

On the other hand, for any C ∈ F3,

E × C : E ∈ F1 ⊗F2 ⊂ ⊗3i=1Fi.

so

E × C : E ∈ F1 ⊗F2, C ∈ F3 ⊂ ∪C∈F3E × C : E ∈ ⊗2i=1Fi ⊂ ⊗3

i=1Fi.

Thus

(F1 ⊗F2)⊗F3 = σ(E × C : E ∈ F1 ⊗F2, C ∈ F3) ⊂ ⊗3i=1Fi.

Now define µ1 × µ2 on F1 ⊗ F2. Then, (µ1 × µ2) × µ3 and µ1 × µ2 × µ3 agree on the product sets.These product set determines σ-finite measures.

Example 6.1 Let λ be the Lebesgue measure, the completion of the product measure on R×R is calledthe Lebesgue measure on R2. This is denoted by the same letter or by λ2. Our question is: givenf : R2 → R measurable, what is ∫

R2fdλ2?

Remark 6.1.6 If each factor (X ,F , µ) and (Y,G, ν) are complete, (i.e. the σ-algebra is complete witherespect to the given measure, The product σ-algebra is often NOT complete with respect to the productmeasure. We can of course complete the product σ-algebra under the product measure.

Exercise 6.1.1 Give an example of a Borel measure on R2 (i.e. on the Borel σ-algebra) that is not aproduct measure.


6.1.1 Sections of subsets of product spaces

If E ⊂ X × Y , for x ∈ X , we define the section

Ex = y ∈ Y : (x, y) ∈ E.

Similarly for y ∈ Y we defineEy = x ∈ X : (x, y) ∈ E.

Proposition 6.1.7 If E ∈ F ⊗ G, then

1. Ex ∈ G for every x ∈ X and ν(Ex) is measurable.

2. Ey ∈ F for every y ∈ Y and µ(Ey) is measurable.

3. Also,

µ× ν(E) =

∫Xν(Ex)µ(dx) =

∫Yµ(Ey)ν(dy).

Proof (1) We first assume that the measures are finite. Let

D = E ∈ F ⊗ G : conclusion holds.

Observe that

(A×B)x =

B, x ∈ A,φ, x 6∈ A.

Take µ measure of the above,

ν((A×B)x) =

ν(B), x ∈ A,0, x 6∈ A.

Integrate w.r.t. x, we see

ν((A×B)x)µ(dx) = ν(B)µ(A) = µ× ν(A×B).

A similar conclusion holds for the y-sections. This means D contains the set of products sets, which is aπ-system.

If A ⊂ B, A,B ∈ D then

(A \B)x = Ax \Bx ∈ G (A \B)y = Ay \By ∈ F .∫ν((A \B)x)µ(dx) =

∫ν(Ax)dµ(x)−

∫ν(Bx)dµ(y),

6.2. FUBINI’S THEOREM 79

A similar conclusion for the y-sections. Hence A \B ∈ D. If An ∈ D is an increasing sequence,

(⋃n

An)x =⋃

(An)x ∈ G.

Thus D is a λ system and thus equals F ⊗ G, as it contains a generating set which is a π-system. Thisconcludes the finite measure case.

(2) Suppose that the measures are σ-finite, we takeAi×Bi increasing with finite measure, then applythe proposition to E ∩ (Ai ×Bi), we see that

(E ∩ (Ai ×Bi))x =

Ex ∩Bi, if x ∈ Aiφ, otherwise,

is measurable for every x, then so is Ex = ∪i(E∩ (Ai×Bi))x. Similarly conclusions for the y-sections.Since,

µ× ν(E ∩ (Ai ×Bi)) =

∫Xν((E ∩ (Ai ×Bi))x)µ(dx) =

∫Yµ((E ∩ (Ai ×Bi))yx)ν(dy),

we may take i→∞ and use the monotone convergence theorem to conclude.

Proposition 6.1.8 If f : X × Y → R is measurable, then so are the following functions:

fx = f(x, ·) : CY → R, fy = f(·, y) : X → R

are measurable for every x ∈ X and for every y ∈ Y .

Proof For any B ∈ B(R),

(fy)−1(B) = x : f(x, y) ∈ B = (f−1(B))y ∈ F ,(fx)−1(B) = y : f(x, y) ∈ B = (f−1(B))x ∈ G.

This conclusion follows from Proposition 6.1.7.

6.2 Fubini’s Theorem

Theorem 6.2.1 (The Fubini-Tonelli Theorem) Let µ and ν be σ-finite measures.

(1) Suppose f : X×Y → [0,∞] is measurable. Then g(x) =∫fx(y)dν(y) and h(y) =

∫fy(x)dµ(x)

are both non-negative and measurable. Furthermore,∫fd(µ× ν) =

∫X

(∫Yf(x, y)dν(y)

)dµ(x) =

∫Y

(∫Xf(x, y)dµ(x)

)dν(y). (6.2)


(2) Suppose f ∈ L1(µ× ν), then

(a) fx ∈ L1(ν) for a.e. x

(b) fy ∈ L1(µ) for a.e. y ∈ Y .

(c) The almost everywhere defined functions g1(x) =∫fx(y)dν(y) ∈ L1(µ) and g2(y) =∫

fy(x)dµ(x) ∈ L1(ν) are both integrable.

(d) Furthermore, (6.2) holds.

Proof This theorem holds for characteristic functions, and therefore holds for simple functions by non-linearity. We prove (1) first. Let fn ∈ E+ such that fn ↑ f . Then

gn(x) =

∫fn(x, y)dν(y) ↑ g(x) =

∫f(x, y)dν(y),

hn(y) =

∫fn(x, y)dµ(x) ↑ h(y) =

∫f(x, y)dµ(x).

So g, h are measurable and∫fnd(µ× ν) =

∫X

(∫Yfn(x, y)dν(y)

)dµ(x) =

∫Y

(∫Xfn(x, y)dµ(x)

)dν(y).

Apply the monotone convergence theorem to conclude (1).

Also,∫fd(µ × ν) < ∞ implies that

∫Y fn(x, y)dν(y) is finite for a.e. x and

∫X fn(x, y)dµ(x) is

finite for a.e. y.

For a general integrable function f , apply this to f+ and f−, and use the statement above to conclude.

Remark 6.2.2 For simplicity we remove the brackets and write∫X

∫Yf(x, y)dν(y)dµ(x) =

∫X

(∫Yf(x, y)dν(y)

)dµ(x),∫

X

∫Yf(x, y)dν(y)dµ(x) =

∫Y

(∫Xf(x, y)dν(y)

)dµ(x).

Corollary 6.2.3 Fubini’s theorem holds if we replace µ × ν by its completion, provided that µ, ν arecomplete.

To see this we note that if f is measurable with respect to the completion of F ⊗ G, then there existsan F ⊗ G -measurable function f such that f = f except on a null set E, which by enlarging we mayassume to be F × G measurable. Then µ(Ex) = 0 for a.e. x, ν(Ey) = 0 for a.e. y. Since µ, ν arecomplete ,m they are measurable. Thus, we may apply Fubini’s theorem to f ,∫

X×Yf dµ× ν =

∫X×Y

f dµ× ν =

∫ ∫f(x, y) dµ(x)dν(y) =

∫ ∫f(x, y) dν(y)dµ(x).


We know that f(x, ·) is measurable for almost every x and f(x, ·) differs at most on Ex, a null set.Similarly we conclude f(·, y) is measurable a..e. y, and also they are in L1, the conclusion follows.

Exercise 6.2.1 Let X = 1, . . . , N be a finite state space. Write down the the Fubini-Tonelli Theoremspecific to this case.

Given a measure µ on X × Y . We say µi are the marginals of µ if

µ(X ×B) = µ2(B), µ(A× Y) = µ1(A), ∀A ∈ F , B ∈ G.

We also say that µ is a coupling of µ1 and µ2.

Exercise 6.2.2 Given an example of a Borel measure µ on R2, two Borel measures µi on R such that µis not the product measure µ1×µ2, but µi are marginals of µ. Given another set of examples of measuressuch that µ = µ1 × µ2 but µi are not the Lebesgue measure.

Note that if f is a non-negative measurable function, µ(A) :=∫A f dµ is a measure.

Exercise 6.2.3 For f : R → R given by f(x) = x2 + 1 and λ the Lebesgue measure, compute thepushed forward measure. f∗λ and compute

∫[1,2] e

√x−1 d(f∗λ).

Exercise 6.2.4 Let f(x, y) = (x2−y2)(x2+y2)2

for x ∈ [0, 1] and y ∈ (0, 1], also f(0, 0) = 0. Show thatf 6∈ L1([0, 1]× [0, 1]).

Chapter 7

Radon-Nikodym Theorem

In this chapter we compare two measure, discuss the concepts of singular and absolutely continuity.

7.1 Singular and absolutely continuity

Definition 7.1.1 We say that µ is absolutely continuous with respect to ν if whenever ν(A) = 0, wehave µ(A) = 0.

We write µ ν. Given any two finite measures µ1, µ2 we can found a reference measure (e.g. µ =

µ1 + µ2, such that µ1 µ and µ2 µ).

Definition 7.1.2 Two measures µ and ν on Ω are mutually singular if there are disjoint sets A1 and A2

such that Ω = A1⋃A2 such that µ(A1) = 0 and ν(A2) = 0. This is denoted by µ ⊥ ν.

The Lebesgue measure and any Dirac measures δx are singular.

Exercise 7.1.1 If ν is a measure on (X ,G) and p : X → [0,∞) is an integrable function. Show that

µ(A) =

∫Ap(y)ν(dy)

defines a measure and µ ν. We write µ = pν.

Example 7.1 The Gaussian measure N(0, 1) on R1 is absolutely continuous w.r.t. the Lebesgue mea-sure.

N(0, 1)(A) =

∫A

1√2πe−|x|22 dx.

83

7.2. SIGNED MEASURE 84

Example 7.2 Let Ω = [0, 1] and P the Lebesgue measure . Define Q(A) = 2∫A x

2dx so

dQ

dP(x) = 2x2.

Define Q1 bydQ1

dP= 21[0, 1

2].

Then Q1 P and P is not absolutely continuous with respect to Q1. Define Q2 by

dQ2

dP= 21[ 1

2,1].

The two measures Q1 and Q2 are singular.

Exercise 7.1.2 Let ν be a finite measure. Sow that ν µ if and only if the following holds: for anyε > 0 there exists δ > 0 such that whenever µ(E) < δ then ν(E) < ε.

Exercise 7.1.3 Show that if f ∈ L1(µ) is a non-negative function, then for every ε > 0 there existsδ > 0 such that whenever µ(A) < δ we have ∫

Efdµ < ε.

Note that ν(A) =∫A dµ defines a measure with ν µ.

Exercise 7.1.4 Give examples of pairs of measure µ, ν with µ ν and µ ⊥ ν.

7.2 Signed measure

A signed measure µ is a function on sets taking values in [−∞,∞] and satisfying the properties of ameasure: µ(φ) = 0, if Aj are disjoint measurable sets, then ν(∪∞j=1Aj) =

∑∞j=1 ν(Aj), in addition we

assume that if taking an infinity value, µ takes at most one of the values from ±∞.

If f = f+ − f− is measurable, with either∫f+ < ∞ or

∫f− < ∞, then µ(A) :=

∫A f dµ is a

signed measure.

A measurable set P is said to be positive for µ if for any F ⊂ P , we have µ(F ) ≥ 0. In other words µrestricts to a (positive) measure on F . A measurable set P is said to be negative for µ if fro any F ⊂ N ,we have µ(F ) ≤ 0.

Exercise 7.2.1 If Pn is a sequence of positive sets for µ, then so is ∪∞n=1Pn.

Note that each Pn \⋃n−1k=1 Pn is positive for µ.

7.2. SIGNED MEASURE 85

Theorem 7.2.1 (The Hahn decomposition theorem) If µ is a signed measure on (X ,F), then thereexists a partition of X : X = P ∪N and P ∩N = φ, such that P is positive and N is negative, for µ.

If X = P ′ ∪N ′ is another such partition then µ(P∆P ′) = 0 and µ(N∆N ′) = 0.

Theorem 7.2.2 (Jordan decomposition theorem) If µ is a signed measure on (X ,F), then there existunique positive measures µ+ and µ− such that

µ = µ+ − µ−, µ+ ⊥ µ−.

Proof Take µ+ = µ|P , µ− = µ|N where X = P ∪N as in Hahn decomposition theorem. To see it isunique suppose µ = Q1 −Q2 with Q1 ⊥ Q2, then X = E ∪ F , disjoint E,F , where Q1(E2) = 0 andQ2(E1) = 0. Then E ∪ F is another Hahan decompostion, by its uniqueness we conclude. We can

talk about a signed measure to be absolutely continuous with respect to a (positive ) measure.

Definition 7.2.1 If µ is a signed measure on (X ,F), we set |µ| = µ+ + µ−, this is called the totalvariation of µ. If ν is a (positive ) measure, we say µ ν if µ(A) = 0 whenever ν(A) = 0 where A isa measurable set.

Exercise 7.2.2 Show that µ ν if and only if |µ| ν if and only if µ+ ν and µ− ν.

Definition 7.2.2 If µ is a signed measure we set L1(µ) = L1(µ+)∩L1(µ−), and for f ∈ L1(µ), we set∫fdµ =

∫fdµ+ −

∫fdµ−.

Lemma 7.2.3 Suppose that µ and ν are finite (positive) measures then µ ⊥ ν or there exists ε > 0 anda measurable set E such that ν(E) > 0 and E is a positive set for µ− εν.

Proof Suppose that µ is non-trivial. By Hahn decomposition theorem, there exists a partition X =

Pk ∪Nk such that Pk is a positive set and Nk a negative set for the signed measure µ− 1kν.

Let P =⋃k Pk and N = ∩kNk. Then N , a subset of Nk, is negative for every µ− 1

kν, in particular

µ(N)− 1

kν(N) ≤ 0.

Taking k → ∞, we see µ(N) ≤ 1kν(N) = 0. If µ, ν are not mutually singular, µ must charge the

complement of N which is P . Then, since P is a countable union of Pk’s we have µ(Pn) > 0 for somen. We take E = Pn on which µ− 1

nν > 0.

7.3. RADON-NIKODYM THEOREM 86

7.3 Radon-Nikodym Theorem

Terminology: A signed measure is σ-finite if both µ+ and µ− are.Use with caution the following (for it might be confused with, I used this in proofs.) We use µ ≤ ν toindicate that µ(A) ≤ ν(A) for every measurable set A.

Theorem 7.3.1 (Lebesgue-Radon-Nikodym Theorem) Let µ and ν be two σ-finite measures, µ is asigned measure while ν is a positive measure, on a space (X ,F). Then,

(1) [Lebesgue decomposition] there exist, uniquely, σ-finite measures with the property that µ = µ1 +

µ0, µ1 ⊥ ν, µ0 ν.

(2) [Radon-Nikodym Theorem] There exists a measurable function D : X → R such that µ1(A) =∫ADdν. Any two such functions agree ν almost surely.

(3) If ν and µ are finite, then D is finite a.s. and integrable.

(3) If µ is a positive measure then f ≥ 0.

Definition 7.3.1 We denote D by dµdν and call it the Radon-Nikodym derivative of µ with respect to ν.

Proof Step 1. We first assume that µ and ν are finite positive measures. Let us define

H =

f : X → [0,∞] :

∫Afdν ≤ µ(A) for all A ∈ F

.

We choose the ‘largest f’ fromH as follows. Let

M = sup∫Xfdν : f ∈ H

.

This is possible sinceH is not empty: f ≡ 0 ∈ H. Also any f ∈ H has finite integral∫fdν ≤ µ(X ) so

M <∞.

Next we see, If f, g ∈ H then max(f, g) ∈ H. Let E = f < g.∫A

max(f, g)dν =

∫E∩A

gdν +

∫E∩Ac

fdν ≤ µ(A ∩ E) + µ(A ∩ Ec) = µ(A).

There exists a sequence fn such that,∫fndν increases and

limn

∫fndµ = M.

Setgn = maxf1, . . . , fn ∈ H.


This is an increasing sequence and ∫Xfn ≤

∫Xgn ≤M.

By the monotone convergence theorem,∫

supn gndµ = limn

∫gndµ = M .

Set

D = limngn,

Then D ∈ H. Indeed, it is measurable and for any A ∈ F ,∫Agdν =

∫1Agdν = lim

n

∫1A gndν ≤ µ(A),

We show that µ − fdν is singular w.r.t. ν. If not there exists ε > 0 and E measurable such thatν(E) > 0 with µ− fdν − ε1Edν > 0 on E, i.e. for every A ∈ F ,∫

A∩E(f + ε1E)dν ≤ µ(A ∩ E).

Then∫A

((f + ε1E))dν =

∫A∩E

((f + ε1E))dν +

∫A∩Ec

fdν ≤ µ(A ∩ E) + µ(A ∩ Ec) = µ(A)

and (f + ε1E) ∈ H and its integrable is larger than M , contradicts with M is the largest integral amongall f ∈ H.

Uniqueness: if µ = µ0 + fdν = µ0 + gdν, then µ0− µ0 + (f − g)dν = 0. Then µ0− µ0 = (f − g)ν

is both singular and absoultely continuous w.r.t. ν. It must then vanish. This means µ0 = µ0 and∫A(f − g)dν = 0 for every A ∈ F and f = g a.e. w.r.t. ν.

Step 2. Suppose µ and ν are σ-finite (positive) measures,. Then there exists Ej disjoint with X =

∪jEj µ(Ei) <∞ and ν(Ej) <∞. Let µj = µ|Ej and νj = ν|Ej . By previous argument,

µj = µj0 + fjdνj

We may extend fj from Ej to X by setting it to be zero on Ecj . Set µ =∑µj(E ∩ ·) and f =

∑j fj .

Then∑

j µj is singular w.r.t. fdν. Now

µ(A) =∑j

µj(A ∩ Ej) =∑j

νj0(A) +

∫A

∑j

fjdν = ν0(A) +

∫Afdν.

Step 3. Finally suppose µ is a signed measure, proceed as above.


Exercise 7.3.1 (1) In the proof, why did we not take the density to be D(x) = supf∈H f(x)?1

(2) If we have a finite state space and discrete σ-algebra, construct the function D directly.

(3) If one has a finite σ-algebra, construct this function D directly.

We use ‘essentially unique’ to say that if D1 and D2 are two possible choices for the density of µwith respect to ν, then the set ω |D1(ω) 6= D2(ω), is of ν-measure 0.

Exercise 7.3.2 Define a Boral measure on R by µ(A) =∫A p(x)dx + 1A(2π) where p is a continuous

function vanishing everywhere on the complement of [−1, 1]. Is µ absolutely continuous with respect tothe Lebesgue measure dx? Write down its Lebesgue decomposition w.r.t. dx

Proposition 7.3.2 Let µ1, µ2, µ3 be σ -finite measures, µ2, µ3 are positive measures.

1. Suppose µ1 µ2. If g ∈ L1(µ1) then g dµ1dµ2∈ L1(µ2) and∫

gdµ1 =

∫gdµ1

dµ2dµ2.

2. If µ1 µ2, µ2 µ3 then µ1 µ3, and

dµ1

dµ3=dµ1

dµ2

dµ2

dµ3, µ3 − almost surely.

Proof (1) If g = 1A and µ1(A) <∞, we know by the definition,

µ1(A) =

∫A

dµ1

dµ2dµ2.

So the claim holds. This extends to simple functions by linearity, to non-negative functions by themonotone convergence theorem and to any L1 functions by linearity again.

(2) The transitivity of the absolute continuity is clear(check!). Use part (1), first suppose that dµ1dµ2∈

L1(µ2), we see

µ1(A) =

∫A

dµ1

dµ2dµ2

g=dµ1dµ2=

∫A

dµ1

dµ2

dµ2

dµ3dµ3.

Then we assume µ1 is a positive σ-finite measure. break down X = ∪Ej where µ1(Ej) <∞ andEj aredisjoint measurable sets. We apply the argument, on each Ej , then sum them up. For a signed measurewe break it down to m1 = µ+ − µ− and to each we apply the above conclusion.

1Hint: Let A be a non-measurable set, let us consider the collections of functions indexed by A as follows:

fy(x) =

1, x = y

0, x 6= y= 1y(x).

Is supy∈A f(y) measurable?


Definition 7.3.2 If µ ν and ν µ we say that they are equivalent. In this case they have the sameset of measure zero’s.

Proposition 7.3.3 If µ and ν are equivalent then dµdν = 1

dνdµ

a.e.

Chapter 8

Conditional Expectations

Let us take a probability space (Ω,F ,P). The σ-algebra F represents the set of all observation andmeasurements one can make on a physical system. From observations ( representing partial information)we would like to make predictions on the value of an observables, an observable is a measurable function( a random variable). The observed information is represented by a sub-σ-algebra G. Suppose G = σ(Y )

where Y is a random variable (this represents information obtained by for example an experiment). If arandom variable X : Ω → R is furthermore measurable w.r.t. G, then there exists a Borel measurablefunction ϕ : R→ R, such that X = ϕ(Y ), thus the observable X is determined by the information in G(i.e. by Y ). If X is not measurable, can we predict its value by G, what is the best approximation for it?

We denote by EX or E(X) the integral of X , called the (mathematical) expectation of X:

EX =

∫ΩXdP.

This is the best predicted value for X given F . If X is real valued, EX minimizes, among all constants,the value minc E(X − c)2.

8.1 Conditional expectation and probabilities

Definition 8.1.1 Let X be a real-valued random variable on some probability space (Ω,F ,P) such thatE|X| < ∞ and let F ′ be a sub σ-algebra of F . Then the conditional expectation of X with respect toF ′ is a F ′-measurable random variable X ′ such that∫

AX(ω) P(dω) =

∫AX ′(ω) P(dω) , (8.1)

for every A ∈ F ′. We denote this by X ′ = E(X | F ′).

91

8.1. CONDITIONAL EXPECTATION AND PROBABILITIES 92

Proposition 8.1.1 With the notations as above, the conditional expectation X ′ = E(X | F ′) exists andis essentially unique.

Proof Denote by P also the restriction of P to F ′ and define the measure Q on (Ω,F ′) by

Q(A) =

∫AX(ω) P(dω), ∀A ∈ F ′.

Then Q is a measure on (Ω,G), absolutely continuous with respect to ν. Also Q(Ω) =∫XdP < ∞

so it is a finite measure. Its density with respect to P , given by the Radon-Nikodym theorem, is thena G-measurable functions and is the required conditional expectation. The uniqueness follows from theuniqueness statement in the Radon-Nikodym theorem.

Using the Monotone Class Theorem we obtain:

Proposition 8.1.2 For all bounded G-measurable function g.∫ΩgX dP =

∫ΩgE(X|G) dP. (8.2)

Example 8.1 • If F ′ = φ,Ω, verify that E(X|F ′) = EX . Remember that X ′ being φ,Ω-measurable means that the pre-image under X ′ of an arbitrary Borel set is either φ or Ω, so theconditional expectation is a constant.

• If X is G measurable, then E(X|G) = X a.e.

Example 8.2 If the only information we know is whether a certain event B happened or not, then itshould by now be intuitively clear that the conditional expectation of a random variable X with respectto this information is given by E(X|B), if B happened and by E(X|Bc) if B didn’t happen. It isa straightforward exercise that the conditional expectation of X with respect to the σ-algebra FB =

φ,Ω, B,Bc is indeed given by

E(X | FB)(ω) =

E(X|B) if ω ∈ B

E(X|Bc) otherwise.

Proof. Denote by Y the right hand side. Then, E(1BY ) = E (E(X|B)1B) = E(X1B). SimilarlyE(1BcY ) = E(1BcX), and hence also E(X) = E(Y ).

If Y is a measurable function, we denote

E(X|Y ) = E(X|σ(Y )). (8.3)

Exercise 8.1.1 Suppose that A1, . . . , An is a partition of Ω: they are disjoint and their union is Ω, alsoP(Ai) > 0 for every i. Let F ′ be generated by them. Then

E(X|F ′) =n∑i=1

E(X|Ai)1Ai .

8.1. CONDITIONAL EXPECTATION AND PROBABILITIES 93

If Y takes only a countable values si with positive probability, then Y = si is a partition of Ω and

E(X|Y ) =∑i

E(X|Y = si)1Y=si .

8.1.1 Conditioning on a random variable

In many situations, we will describe F ′ as the σ-algebra generated by another random variable Y :

Ω → X . By the factorization lemma there exists ϕ : X → R such that E(X|Y ) = ϕ(Y ) a.s. If twofunctions ϕ and ϕ′ satisfy this relation, then ϕ = ϕ′ on a set of B(X ) of full measure (measured by thelaw of Y which is denoted by PY ). To see this just note that if A = ω : ϕ(Y (ω) = ϕ(Y (ω) andB = x : ϕ(x) = ϕ(x), then A = Y −1(B) and PY (B) = P(Y −1(B)). And we do know that by theuniqueness fo the conditional expectation, P

(ω : ϕ(Y (ω)) = ϕ(Y (ω)

)= 1.

Notation. We denote by E(X|Y = y) the measurable function ϕ : X → R such that E(X|Y ) =

ϕ(Y ).

E(X|Y ) = ψ(Y ) a.s. If two functions ϕ and ϕ satisfy this relation, by the uniqueness fo the condi-tional expectation,

P(ω : ϕ(Y (ω)) = ϕ(Y (ω)

)= 1.

Write Y∗µ for the pushed forward measure on C. Then

(Y∗µ)(ϕ = ϕ′) = 1.

(Just note that if A = ω : ϕ(Y (ω)) = ϕ(Y (ω)) and B = x : ϕ(x) = ϕ(x), then A = Y −1(B) and(Y∗µ)(B) = P(Y −1(B)) by the definition).

Definition 8.1.2 We denote by E(X|Y = y) the measurable function ϕ : X → R s.t. E(X|Y ) = ϕ(Y ).

Example 8.3 Take a measurable function Y : Ω→ E = y1, y2, . . . , yn. Define

Ai = Y −1(yi) = ω : Y (ω) = yi.

Then σ(Y ) = σAi and Ai ∩Aj = φ if i 6= j. Let X ∈ L1(Ω,F , P ). Define

ϕ(yi) =

E(1AiX)

P (Ai), if P (Ai) 6= 0

0, if P (Ai) = 0.

Define

E(X|Ai) =E(X1Ai)

P (Ai), if P (Ai) 6= 0,

otherwise define it to be zero. Then

ϕ(y) =n∑i=1

E(X|Ai)1Ai(y), ϕ(Y (ω)) =n∑i=1

E(X|Ai) 1Y=yi.

8.2. PROPERTIES 94

Then E(X|Y )(ω) = ϕ(Y (ω)). Indeed for any Ak ∈ σ(Y ),∫Ak

ϕ(Y )dP =

∫Ak

(n∑i=1

E(X|Ai) 1Ai

)dP =

∫Ak

E(X|Ak) dP =

∫Ak

XdP.

We could also define the function E(X|Y = y) as follows.

Definition 8.1.3 Define PY (A) = P(Y ∈ A), the probability distribution of Y . If X takes value in R,then E(X |Y = y) is any random function such that for any B Borel measurable,∫

BE(X |Y = y) PY (dy) =

∫ω:Y (ω)∈B

XdP.

When conditioning on a number of variables Y1, . . . , Yn, the following notation is also used:

E(X|Y1, . . . , Yn) = E(X|σ(Y1) ∨ · · · ∨ σ(Yn)). (8.4)

8.2 Properties

Conditional expectation has very nice properties, they behave almost like integration with respect to a(family of ) measures.

Proposition 8.2.1 • (linearity) If X,Y are integrable and a, b are numbers, then

E(aX + bY |G) = aE(X|G) + bE(Y |G), a.s.

• (positivity preserving) If X ≥ 0 and is integrable then E(X|G) ≥ 0 a.s.

Proof The linearity follows from uniqueness of conditional expectations. (ii) is a direct consequence ofthe Radon-Nikodyme theorem.

Definition 8.2.1 1. We say B is independent of G if P(A ∩B) = P(A)P(B) for every A ∈ G.

2. We say two σ-algebras F and G are independent is any A ∈ F is independent of any B ∈ G.

3. Two random variables are independent if σ(X) is independent of σ(Y ).

4. A random variable is said to be independent of a σ-algebra G, if σ(Y ) and G are independent.

Proposition 8.2.2 1. E(X) = E(E(X|F ′)).

2. If X is F ′ measurable, then E(X|F ′) = X .

8.2. PROPERTIES 95

3. (Tower property) If G1 ⊂ G2,

E(X|G1) = E(E(X|G1)|G2)) = E(E(X|G2)|G1)).

4. (With any information, the best estimate is expectation) If X is independent of G,

E(X|G) = E(X), a.e.

Proof (1) To see this set A = Ω in equation (8.1.1). (2) follows from uniqueness.

(3) If G1 ⊂ G2, E(X|G1) is G2-measurable, so E(X|G1) = E(E(X|G1)|G2). One can also show bythe definition that

E(X|G1) = E(E(X|G2)|G1)).

(4) For any A ∈ G,∫A E(X|G)dP = E(X1A) = E(X)P (A) =

∫A E(X) dP . Since EX ∈ G, the

conclusion follows.

Exercise 8.2.1 Let X,Y ∈ L1(Ω,F , P ) and G a sub-σ-algebra of F . If X is G measurable, XY ∈ L1

thenE(XY |G) = XE(Y |G).

This means we take out ‘what is known’.

Proposition 8.2.3 Let Xn ∈ L1(Ω,F , P ) and G a sub-σ-algebra of F .

1. (conditional monotone convergence) If fn is a sequence of non-negative random variables, suchthat fn ↑ f a.s., then

E(fn|G) ↑ E(f |G), a.s.

2. (Conditional bounded convergence Theorem. ) If |Xn| ≤ g ∈ L1 then

E(Xn|G)→ E(X|G).

3. (Conditional Fatou’s lemma ) If Xn ≥ 0, E(lim infn→∞Xn|G) ≤ lim infn→∞ E(Xn|G).

Proof (1) Apply monotone convergence theorem to fn1Aand E(fn1A) where A ∈ G. Note E(fn|G) isalso non-negative and increasing and has a limit F . Then∫

AfndP ↑

∫AfdP,

∫A

E(fn|G)dP ↑∫AFdP,

Since∫A fndP =

∫A E(fn|G)dP ,

∫A fDP =

∫A FdP and E(f |G) = limn→∞ E(fn|G).

(2) check it holds for X = 1B where B ∈ G and apply Monotone class theorem.

(3) Apply Fatou’s lemma.

8.3. LP SPACES 96

Proposition 8.2.4 Suppose that σ(X) ∨ G is independent of A, then E(X|A ∨ G) = E(X|G).

Proof Let A ∈ A, B ∈ G, ∫A∩B

XdP = E(X1B)P (A),∫A∩B

EX | GdP =

∫A

EX1B|GdP = P (A)E(X1B).

Since A ∩B forms a π-system, and we can prove that

C = D ∈ G ∨ A :

∫DXdP =

∫D

EX|GdP

is a λ system. Hence C = G ∨ A.

8.3 Lp spaces

8.3.1 Mode of convergence

Wo wrap up we note the commonly used notions of convergence. Let (X ,F , µ) be a measure space withσ-finite measure, and fn, f : X → R be Borel measurable functions.

Definition 8.3.1 We say fn converges to f in Lp if∫|fn − f |pdµ→ 0.

Definition 8.3.2 We say fn converges to f in measure if for any ε > 0,

limn→∞

µ(x : |fn(x)− f(x)| > ε) = 0.

Definition 8.3.3 We say fn converges to f a.s. if there exists a null set N and for any x 6∈ N ,

limn→∞

fn(x) = f(x).

Remark 8.3.1 If fn → f in Lp it converges in measure.

Let (X ,B, µ) be a measure space. Let f : X → Rd, denote by |f | is norm in Rd. We define

‖f‖Lp =

(∫|f |p)dµ

) 1p

.

8.3. LP SPACES 97

Definition 8.3.4 Two functions f, g : Ω→ R are considered to be in the same equivalent class if f = g

almost surely.

For 1 ≤ p <∞, the Lp space is the space of equivalent class of measurable |f |p ∈ L1.

Lp = f : X → Rd : measurable,∫X|f |p dµ <∞.

Note that∫|f |pdµ = 0 if and only if |f | = 0 almost surely. That Lp is a vector space is clear from the

elementary inequality |a+ b|p ≤ Cp|a|p + Cp|b|p. Define

‖f‖Lp =

(∫E|f(x)|pdµ(x)

) 1p

.

Then ‖f‖ is a norm which follows from the following:

Theorem 8.3.2 (Minkowski’ inequality) Let f, g ∈ Lp where 1 ≤ p <∞. Then, f + g ∈ Lp and

‖f + g‖Lp ≤ ‖f‖Lp + ‖g‖Lp .

We also use ‖f‖p for ‖f‖Lp .

Proposition 8.3.3 The Lp spaces are vector spaces, they are Banach space under the norm ‖f‖Lp .

This means for every Cauchy sequence xn ∈ X , there exists x ∈ X such that

limn→∞

‖xn − x‖p = 0.

Proof A normed linear is complete if and only if any absolutely convergent sequence is convergent. Letfn ∈ Lp with

∑∞n=1 ‖fn‖p = M <∞. We define

gn(x) =

n∑k=1

|fk(x)|.

Then supn ‖gn‖p ≤M and by the dominated convergence theorem,

limn→∞

∫|gn|pdµ =

∫limn→∞

|gn|pdµ =

∫ ∞∑k=1

|fk(x)|p dµ.

Hence the integral is finite and∑∞

k=1 |fk(x)| <∞ a.e. Then

f(x) =

∞∑k=1

fk(x),

this converges almost everywhere. Set fn(x) =∑n

k=1 fk(x). Then |fn(x)| ≤∑∞

k=1 |fk(x)| ∈ Lp, bythe dominated convergence theorem, (since |fn − f |p ≤ 2

∑∞k=1 |fk(x)| ∈ Lp) |fn − f |p converges in

L1, hence fn → f in Lp. This completes the proof.

8.3. LP SPACES 98

Theorem 8.3.4 Let 1 ≤ p < ∞. If f ∈ Lp(X ) then there exists a sequence of simple functions fn suchthat

limn→∞

∫X|fn − f |pdµ = 0.

Proof We may assume that f ≥ 0 (standard methods passes to f = f+−f−). Let fn ≤ f be a sequenceof non-negative increasing simple functions converging to f (this exists, which theorem?). Define

gn = fp − (f − fn)p.

Then gn → fp, and gn is non-negative and increasing , by the monotone convergence theorem,

limn→∞

∫ (fp − (f − fn)p

)dµ =

∫fpdµ

i.e.

limn→∞

∫(f − fn)pdµ = 0,


Corollary 8.3.5 Let 1 ≤ p <∞, then Lp(Rn) is a separable metric space.

Sketch proof. Simple functions∑n

i=1 ai1Ai where Ai = Πni=1[c,di] with ai, ci, di ∈ Q is a countable

dense subset of Lp, hence Lp is separable.

Theorem 8.3.6 Let 1 ≤ p < ∞. The space C∞c (Rn) the space of C∞ functions on Rn with compactsupport is dense in Lp(Rn).

Definition 8.3.5 A function ϕ : Rd → R is convex if

ϕ(tx+ (1− t)y) ≤ tϕ(x) + (1− t)ϕ(y)

for any x, y ∈ Rd and for any t ∈ [0, 1].

The following are convex functions: ϕ(x) = |x|, |x|p, p > 1. If d = 1, ex is convex. Jensen’s Inequalitystates that If ϕ is twice differentiable f is convex if and only if ϕ

′′ ≥ 0. Example of convex functionsare: ϕ(x) = xp for p > 1, ϕ(x) = ex.

Theorem 8.3.7 (Jensen’s Inequality) If ϕ is a convex function then for all integrable functions f and µa probability measure

ϕ

(∫fdµ

)≤∫ϕ(f)dµ.

8.3. LP SPACES 99

Proof It is a fact that if ϕ is convex then there exists a constant c ∈ R such that

ϕ(y)− ϕ(∫

fdµ

)≥ c

(y −

∫fdµ

).

Take y = f(x),

ϕ(f(x))− ϕ(∫

fdµ

)≥ c

(f(x)−

∫fdµ

).

Integrate this over with respect to the probability measure µ to conclude.

Example 8.4 Let x1, . . . xn be points in R and suppose that µ =∑n

i=1 λiδxi is a probability measureon R. Then apply Jensen to f(x) = x and any convex ϕ:

ϕ(

n∑i=1

λixi) ≤n∑i=1

λiϕ(xi).

Theorem 8.3.8 (Holder’s inequality (Cauchy-Schwarz if p = q = 2)) Let 1 ≤ p, q ≤ ∞. If f ∈Lp, g ∈ Lq with 1

p +′ f1q = 1 (with convention 1∞ = 0), then fg ∈ L1 and .

∫|fg|dµ ≤

(∫|f |pdµ

) 1p(∫|g|qdµ

) 1q

,1

p+

1

q= 1.

If µ is a finite measure, by Holder inequality, if p < q we have Lp ⊂ Lq.

8.3.2

Proposition 8.3.9 Conditional Jensen’s Inequality. Let ϕ : Rd → R be a convex function. Then

ϕ (E(X|G)) ≤ E(ϕ(X)|G).

For p ≥ 1, ‖E(X|G)‖p ≤ ‖X‖p.

Lemma 8.3.10 If E|Xn −X| → 0, then E|E(Xn|G)− E(X|G)| → 0.

Proof By Jensen’s inequality, |E(Xn −X∣∣G)| ≤ E(|Xn −X|

∣∣G), the latter converges in L1,

EE(|Xn −X|∣∣G) = E|Xn −X| → 0,


8.3. LP SPACES 100

8.3.3 Conditional expectation as orthogonal projection

We will learn shortly that Lp, the space of Lp integrable and F-measurable functions, is a closed vectorspace under the norm Lp, and simple functions are dense in Lp. Furthermore L2 is a Hilbert space, wedenote it by L2(Ω,F , P ). If G is a sub-σ algebra, the sub-space of L2(Ω,F , P ) that are measurable withrespect to G, which we denote by L2(Ω,G, P ) is a closed sub-space of L2.

Since L2(Ω,F , P ) is a Hilbert space and L2(Ω,G, P ) is a closed subspace of L2, let π denote theorthogonal projection defined by the projection theorem ( §II.2 Functional Analysis [?]),

π : L2(Ω,F , P )→ L2(Ω,G, P ).

Theorem 8.3.11 If X ∈ L2(Ω,F , P ) then E(X|G) = π(X).

Proof * Let f ∈ L2(Ω,F , P ). Then for any h ∈ L2(Ω,G, P ),

〈f − πf, h〉L2(Ω,F ,P ) = 0.

This is,∫

Ω fhdP =∫

Ω πfhdP .

Let A ∈ G and take h = 1A to see that πf = E(f |G).

f − πf

πf

f

Corollary 8.3.12 E(X) is the constant that minimizes E(X − c)2 and E(X|G) minimizes E(X − Y )2

where Y is a G-measurable random variable.

Remark 8.3.13 The projection πX is the unique element of L2(Ω,G, P ) such that

E|X − πX|2 = minY ∈L2(Ω,G,P )

E|X − Y |2.

Remark 8.3.14 The conditional expectation defines on L2 can be extended to L1 to give the conditionalexpectation, as defined earlier. Indeed, simple functions and bounded functions are in L2. Let f ∈ L1

with f ≥ 0. Let fn be a sequence of bounded functions (increasing with n) converging to f pointwise.Then πfn is well defined and ∫

AfndP =

∫AπfndP, A ∈ G.

Since fn increases with n and is positive; so does πfn and limn→∞ πfn exists. We may exchange limitsand integration: ∫

AfdP = lim

n→∞

∫AfndP =

∫A

limn→∞

π(fn)dP.

8.3. LP SPACES 101

Thus we defineE(f |G) = lim

n→∞π(fn)

For f ∈ L1 let f = f+ − f− and define E(f |G) = E(f+|G)− E(f−|G).

8.3.4 Useful Inequalities*

Exercise 8.3.1 (Chebychev’s inequality/ Markov inequality) If X is an Lp random variable then fora ≥ 0,

P (|X| ≥ a) ≤ 1

aE|X|p.

Exercises

Exercise 8.3.2 Suppose that Y1, Y2 be random variables taking values in X1 and X2 respectively. LetZ = E(X|σ(Y1)∨σ(Y2)). Show that the statement that a σ(Y1)∨σ(Y2)-measurable random variable Zis the conditional expectation of X with respect to σ(Y1) ∨ σ(Y2) is equivalent to one of the followingstatements

1. For any A ∈ σ(Y1) and B ∈ σ(Y2),∫A∩B

Z dP =

∫A∩B

X dP,

2. For any gi : Xi → R Borel measurable and bounded,

E (g1(Y1)g2(Y2)Z) = E (g1(Y1)g2(Y2)X) ,

Exercise 8.3.3 Let h : E ×E → R be an integrable function on a metric space E. Let X,Y be randomvariables with state space E such that h(X,Y ) ∈ L1. Let H(y) = E (h(X, y)). Then

E(h(X,Y )|σ(Y )) = H(Y ).

Exercise 8.3.4 Let X : Ω → X1 be F ′ ⊂ F measurable and let Y : Ω → X2 be independent ofF ′. Let Φ : X1 × X2 → R be measurable and such that E|Φ(X,Y )| < ∞. For x ∈ X1 defineg(x) = E[Φ(x, Y )] =

∫Ω Φ(x, Y (ω)

)P(dω). Then

E(

Φ(X,Y )∣∣∣ F ′) = g(X).

Proof Let gi : Xi → R be bounded measurable. Let Φ(x, y) = g1(x)g2(y).

E(g1(X)E(g2(Y )|F ′)

)= g1(X)E

(g2(Y )‖F ′

)= g1(X)Eg2(Y ) = Φ(X).

8.4. CONDITIONING PROBABILITY 102

So g1(X)g2(Y ) ∈ H. where H def=

Φ : E(Φ(X,Y )

∣∣ F ′) = Φ(X)

. The earlier statementshows that if A× B ∈ C, where C = A× B : A ∈ B(X1), B ∈ B(X2). That C is a π system followsfrom A × B ∩ C × D = (A ∩ C) × (B ∩ D). Finally, the π-system C generates B(X1) × B(X2) bythe π − λ theorem, and then we conclude the statement holds for any bounded measurable functions Φ,by approximating Φ with simple functions, and thusH contains all bounded Borel measurable functionsfrom X1 ×X2 → R.

Exercise 8.3.5 Prove that if (X,Y ) has joint density p(x, y), then for any Borel measurable set A andx ∈ R,

P (X ∈ A|Y = y) =

∫A p(x, y)dx∫∞−∞ p(x, y)dx

.

8.4 Conditioning probability

We define similarly the concept of conditional probability.

Definition 8.4.1 Let B ∈ F , define

P(B|F ′) := E(1B|F ′).

This is called the conditional probability of B given F ′.

Remark 8.4.1 ** For every B ∈ F , this identity P(B | F ′) = E(1B | F ′) holds outside of a null set. IfBi is a sequence of disjoint sets, there is a common set of null sets for every Bi, and so

P(∞⋃i=1

Bi | F ′) = E(∑

i

1Bi | F ′)

=∑i

P(B|F ′)

holds a.e. To make P(· | F ′) a probability measure, the σ-additive property must hold for all sequencesof disjoint unions, that is an unaccountable number of sequences in general, and one has to be able tochoose a common set of null sets for all of them. If this can be done we have a family of probabilitymeasures P(·|F ′)(ω), called regular conditional probabilities, and then

E(X|F ′)(ω) =

∫ΩX(ω′)P(dω′|F ′)(ω), a.e..

See Thm 7.1 in ‘Probability measures on metric spaces’ by K.R. Parthasarthy: regular conditional prob-abilities exist for complete separable metric spaces and their Borel σ-algebras.

8.4. CONDITIONING PROBABILITY 103

8.4.1 Finite σ-algebras

For finite σ-algebras, we can construct a family of probability measures Pω, so the conditional expecta-tions is integration w.r.t. this family, by which we mean E(X|G)(ω) =

∫XdP omega.

Let A,B ∈ F . If P (B) > 0. define

P (A|B) =P (A ∩B)

P (B).

Let X : Ω→ R, define,

E(X|B) =E(X1B)

P (B).

If B1, . . . , Bn is a partition of Ω with P (Bi) > 0 for all i, we define G = σBi, i = 1, . . . , n. Thenfor ω ∈ Bi,

E(X|G)(ω) = E(X|Bi)(ω).

Fix ω. For A ∈ F , define

P (A|G)(ω) =

n∑i=1

P (A|Bi)1Bi(ω).

This is a probability measure, denoted by P (dω′|G)(ω). And

E(X|G)(ω) =

∫X(ω′)P (dω′|G)(ω).

For a large class of general σ-algebra, we can indeed construct a family of probability measures,called conditional probabilities.

Chapter 9

Mastery Material: Ergodic Theory

The mastery material are basics of ergodic theory. These are: definitions of measure preserving trans-formations, invariant sets, ergodic measures, Poincare’s Recurrence Theorem and Birkhoff’s ergodicTheorem (and their proofs). These can be found in many textbooks.

• An Introduction to Infinite Ergodic Theory, by Jon Aaronson, published by the Americam Mathe-matical Society. In library, 517.518.1AAR

• Ergodic Theory and Dynamical systems by Yves Coudene (Chapter 2)

https://link.springer.com/content/pdf/10.1007

• Introduction to ergodic theory, by Peter Walters

• Ergodic theory and information by Billingsley.

For your convenience, some basic concepts are given below, this is not the full scope of the readingmaterial. Please refer to the references for a comprehensive study.

9.1 Basic Concepts

Let (X ,F , µ) be a measure space with the measure σ-finite.

Definition 9.1.1 A map T : X → X is a measure preserving transformation (map) if the pushed forwardmeasure by T is the same as µ, i.e.

T∗µ = T.

105

9.1. BASIC CONCEPTS 106

In other words µ(T−1(A)) = µ(A) for every measurable set A. We also say T preserves µ, also µ isinvariant under T , and also µ is an invariant measure for T .

A Dirac measure δa is invariant under T if T leaves the point a fixed.

Definition 9.1.2 1. The invariant σ-algebra of a transformation T is

I = A ∈ F : T−1(A) = A.

2. A function f : X → R is invariant under T if f(Tx) = f(x) for every x ∈ X .

It is easy to see that I is a σ-algebra.

Theorem 9.1.1 (Poincare Recurrence Theorem) Let (X ,F , µ) be a finite measure space and let T :

X → X be a measure preserving transformation. Let A ∈ F . Then for almost every point in A, thereexists some n ≥ 1 (and hence an infinitely many n) such that Tn(x) ∈ A.

If µ is a probability measure the celebrated Birkhoff’s ergodic theorem states that the time average equalsto spatial average.

Theorem 9.1.2 (Birkhoff’s ergodic theorem) Let (X ,F , µ) be a probability space and T preservesmeasure. Then for any f ∈ L1,

limn→∞

1

n

n∑j=1

f(T jx) = E(f |I)(x), a.e.

A transformation is said to be non-singular if T∗µ µ. Let T : X → X preserves a probabilitymeasure µ. We say T is ergodic (we also say µ is ergodic) if whenever A ∈ I then µ(A) = 0 orµ(Ac) = 0.

9.1.1 Example: circle rotation

The Lebesgue measure on the unit circle S1 is defined to be the pushed forward measure on the Lebesguemeasure on [0, 1) to S1 by the map α 7→ e2παi. A rotation is x 7→ xe2πθi where θ ∈ R.

We are going to actually identify the unit circle S1 with the quotient space R/Z, so x ∼ y if x = n+y,where n is an integer. So S1 is [0, 1] with 0 and 1 glued together. From now on this is what we use.

Definition 9.1.3 Let θ ∈ S1 define Tθ : S1 → S1 by Tθ(α) = θ + α mod 1. This is a called a rotationof the circle.

The Lebesgue measure is invariant under any circle rotation.

Measure and Integration · 2020. 5. 18. · Measure and Integration is a foundational course,...

Documents

Transcript of Measure and Integration · 2020. 5. 18. · Measure and Integration is a foundational course,...