How to measure what counts - integration monitoring and indicators
Measure and Integration · 2020. 5. 18. · Measure and Integration is a foundational course,...
Transcript of Measure and Integration · 2020. 5. 18. · Measure and Integration is a foundational course,...
Measure and Integration
Xue-Mei LiImperial College London
May 18, 2020
Contents
1 Introduction 7
1.1 Module Information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2 Course Structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.3 Prologue . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.4 The mass transport problem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.5 Coin tossing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
1.6 Inadequacy of Riemann Integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.7 Observables of Random Variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
2 Measurable sets, measurable functions and measures 11
2.1 Set Algebra . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2 Measurable Space . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2.1 Countability of σ-algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2.2 Fundamental Examples and Exercises . . . . . . . . . . . . . . . . . . . . . . . 14
2.3 Borel σ-algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.3.1 Background material* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
3 Measures 19
3.1 Basics of Measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
3.2 Construction of measures from pre-measures . . . . . . . . . . . . . . . . . . . . . . . 21
3.2.1 Outer Measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
3
CONTENTS 4
3.2.2 Extension of pre-measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23
3.2.3 Proof for Caratheodory Theorems . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.2.4 Construction of Lebesgue-Stieljes / Lebesgue measure . . . . . . . . . . . . . . 28
3.2.5 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33
3.3 Lebesgue Measure on Rn . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.4 Uniqueness of Extensions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.4.1 π − λ theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
3.5 Measure determining sets . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4 Measurable functions 39
4.1 Examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.1.1 Simple functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.1.2 Product σ-algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.2 Properties of measurable functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
4.2.1 Measurable functions on the real line . . . . . . . . . . . . . . . . . . . . . . . 44
4.3 Factorisation Lemma . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.4 Appendix* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.4.1 Littlewood’s three principles . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47
4.4.2 Product σ-algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
4.4.3 The Borel σ-algebra on the Wiener Space . . . . . . . . . . . . . . . . . . . . . 48
5 Integration 51
5.1 Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51
5.2 Outline of the construction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5.3 Integral of simple functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52
5.4 Integration of non-negative functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.5 Integrable functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
5.6 Further limit theorems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
CONTENTS 5
5.7 Examples and exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64
5.8 The Monotone Class Theorem for Functions . . . . . . . . . . . . . . . . . . . . . . . . 66
5.8.1 Lebesgue Integration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
5.9 The pushed forward measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67
5.10 Appendix: Riemann integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68
5.11 Appendix: Riemann-Stieltjes integrals . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
5.12 Appendix: Functions with bounded variation . . . . . . . . . . . . . . . . . . . . . . . 71
6 Product measures and Fubini’ Theorem 73
6.1 Product measures . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74
6.1.1 Sections of subsets of product spaces . . . . . . . . . . . . . . . . . . . . . . . 78
6.2 Fubini’s Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79
7 Radon-Nikodym Theorem 83
7.1 Singular and absolutely continuity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
7.2 Signed measure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
7.3 Radon-Nikodym Theorem . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
8 Conditional Expectations 91
8.1 Conditional expectation and probabilities . . . . . . . . . . . . . . . . . . . . . . . . . 91
8.1.1 Conditioning on a random variable . . . . . . . . . . . . . . . . . . . . . . . . . 93
8.2 Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 94
8.3 Lp spaces . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
8.3.1 Mode of convergence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
8.3.2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
8.3.3 Conditional expectation as orthogonal projection . . . . . . . . . . . . . . . . . 100
8.3.4 Useful Inequalities* . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
8.4 Conditioning probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
8.4.1 Finite σ-algebras . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
CONTENTS 6
9 Mastery Material: Ergodic Theory 105
9.1 Basic Concepts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
9.1.1 Example: circle rotation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
Chapter 1
Introduction
1.1 Module Information
Prerequisite Real Analysis (M2PM1), Metric Spaces and Topology (M2PM5), and basic Probability.Measure and Integration is a foundational course, underlies analysis modules. It brings together manyconcepts previously taught separately, for example integration and taking expectation, reconciling dis-crete random variables with continuous random variables.
Contents. Measurable sets, sigma-algebras, Measurable functions, Measures. Integration with re-spect to measures, Lp spaces, Modes of convergences, basic important convergence theorems (Domi-nated convergence theorem, Fatou’s lemma etc), useful inequalities, properties of integrals.
This module is related to: Probability theory, Applied probability, Markov processes, Fourier analysisand theory of distributions, Functional analysis, ODE’s, Ergodic theory, Probability theory, StochasticFiltering, Applied Stochastic Processes.
1.2 Course Structure
Assessment
• CW 1. (5%) 05/11/2019, Deadline: Tuesday 4pm (submit to U/G office)
• CW 2. (5%) 03/12/2019, Deadline: Tuesday 4pm (submit to U/G office)
• May Exam (90%): 2 hour exam (year 3), 2.5-hour exam (year 4/Msc).
Problem sheets.. Weekly (not assessed, they are the best material for preparing for the exam.)
7
1.3. PROLOGUE 8
Office Hour. Wednesdays: 11:00-12:00
Problem class: Wednesday 30 October 10am-12am, room 658.
References.
• Real Analysis by H. L. Royden, Third edition
• Real Analysis, Modern Techniques and their applications, by G. B. Folland
• Measure Theory by P. R. Halmos
• Measure and Integration, by R. Schilling, second edition.
How to use this notes? A ‘remark’ is material which is helpful with our understanding, which may ormay not be covered in the lectures. An ‘exercise’ may appear in the weekly problem sheet, is in the notefro one or more of the following reasons: (1) the conclusion is interesting; (2) the proof is representative,(3) proving it improve our understanding.
If a section /paragraph is given a ∗ sign, it is not covered in class (or not ‘officially’ covered in class).They are in the notes for the background or considered to be useful to further our understanding.
1.3 Prologue
Measure theory is not only a fundamental tool, they are fundamental concepts.
1.4 The mass transport problem
Monge was concerned with the cost of moving material from a mine to a construction site, Monge’sproblem (1781) is: What is the optimal way to transport a pile of sand?
Let us think of µ as the original sand mass distribution, a measure µ. We also think of the target massas a measure ν. A map T is a transport map if it pushes µ to ν. Find a map x 7→ T (x) such that the cost
1.5. COIN TOSSING 9
(distance) ∫R|x− T (x)|µ(dx)
is minimised among all transport maps. We may also use other cost functions, e.g. kinetic energy or takeinto consideration of the geometry of the space.
Example 1.1 Shift of a block of mass of height one of uniform density on [0, n] to [1, n+ 1]. There aremore than one transport maps: T (x) = x+ 1 fro any x ∈ [0, n];
T (x) =
n+ 1− x, x ∈ [0, 1)
x, x ∈ [1, n].
Example 1.2 Can one find a map shifting 1 unit of mass at location 0 to location 1 and −1 of equalmass?
No such map exists. The measures are µ = δ0, ν = 12δ1 + 1
2δ−1.
Kontorovich’s formulation (1940’s). Find a measure on Ω× Ω that minimises∫|x− y|µ(dx, dy)
and has marginals µ, ν respectively, i.e. µ(Ω×B) = ν(B), µ(A×Ω) = µ(A). This is called a transportplan.
Find a transport plan for Example 2. (Hint: this is a measure on (0, 0), (1, 0), (0, 1), (1, 1).
Optimize the transport cost involved in distribution of pastries from bakeries to coffee shops. E.g.Bakery 1 delivers 50 pastries to Cafe 1, 30 each to Cafe shops 2 and 3, Bakery 2 delivers 20 to Cafe 2,30 to cafe 3.
1.5 Coin tossing
We will need to understand what does it mean by taking limits of functions and measures.
Example 1.3 Toss a fair coin 1000 times, note down 1 if head and −1 if tail.
Law of large numbers. the average value is expected to be 0 (Converges almost surely).
Central Limit Theorems. The probability distributions of 1√n
∑ni=1Xi is approximately given by
the Gaussian curve. The convergence of the random variable is in weak sense.
1.6. INADEQUACY OF RIEMANN INTEGRALS 10
1.6 Inadequacy of Riemann Integrals
Let f : [a, b]→ R be a bounded function.
Proposition 1.6.1 A bounded function f : [a, b] → R is Riemann integrable if and only if the set ofdiscontinuity of f has Lebesque measure zero.
Example 1.4 A not Riemann integrable function is f1 : [0, 1] → R with f1(x) = 0 for x ∈ Q andf1(x) = 1 ∈ [0, 1] \ Q. Thus U(f) = 1 and L(f) = 0. Same with f2 : [0, 1] → R with f2(x) = 1 forx ∈ Q and f2(x) = 0 ∈ [0, 1] \Q.
There are more irrational points than rational points, perhaps∫ 1
0 f1(x)dx can be defined to be 1? Is thereis a quantitative statement about there are more irrational numbers than rational numbers?
There exists a sequence of bounded Riemann integrable functions fn such that fn(x) → f(x) forevery x ∈ [0, 1]. But f(x) is not Riemann integrable.
Example 1.5 Let Q = q1, q2, . . . be a enumeration of rational numbers. Define fn(x) = 1 if x ∈q1, q2, . . . , qn and set fn(x) = 0 otherwsie. Then
∫fn(x)dx = 1, and f(x) is the Dirichlet function
which is not Riemann integrable.
1.7 Observables of Random Variables
Let (Ω,F ,P) be a probability space, X be a real valued random variable. If P(X = i) = pi with∑i pi = 1,
EX =∑i
piP(X = i)
If X is a random variable with standard Gaussian normal distribution. Then either or
EX =
∫y
1√2πf(x)dx.
Both expectations are integrals∫ydµ where µ is the probability measure X∗dP, the pushed forward
measure given byX . Both are given also by∫
ΩXdP. These concepts use integration theory with respectto a measure.
Chapter 2
Measurable sets, measurable functionsand measures
What kind of subsets of the plane or the real line can be measured? The basic sets are: intervals on R;triangles, squares, polygons, squares on R2. Measuring the length of a circle is a mathematical problem,they are accomplished by ‘exhaustions’ (by the ancient Greeks 5th BC, the Achimedes of Syracus : 2-3BC), or taking limit in the modern language, c.f. the definition of definite integrals.
Given a set X , the collection of its subsets is its power set. We denote it by 2X . If a subset A canbe measured, this ought to lead to some understanding of its complement Ac = X \ A. What kind ofsubsets can be measured? What axioms should they satisfy? If A,B can be measured, it is natural to bepostulate that A ∪ B can be measured. This means we want our collections of measurable sets to be aBoolean algebra.
To build up more useful concepts, we begin with sets we can measure, gradually build up more mea-surable sets by taking complements, unions, and approximations. The classical construction of Lebesguemeasure has this flavour (for example ‘outer measures’ originated from over measurements.)
Typical spaces are: (1) finite set, X = 1, . . . , n; (2) X = Rn or X = Zn; (3) Some manifold, forexample X = Sn, the n-dimensional sphere, (4) X = T , the torus; (5) the Hilbert space X = L2([0, 1]),or (6) X = `2.
Suppose we drop a pin on a surface of area 1, how frequently does the pint fall within an an areamarked by A? With what do we measure it? –can we define a ‘uniform measure’? Let us begin with R.
Example 2.1 We want to assign a number to every subset of R so that the measure of a half open intervalis its length, and such that it is translation invariant, i.e. µ(A + t) = µ(A) for any t ∈ R and any Borelset A where
A+ t = a+ t : a ∈ A
11
2.1. SET ALGEBRA 12
is the translate of B by the number t.
Does there exist a translation invariant countably additive measure on R which assign an intervalits length? The answer is not. Let us define an equivalent relation: x ∼ y if and only if x − y ∈ Q.Using Axiom of choice we can choose the representative of the equivalent relations to be in (0, 1]. Thisquotient set V = R/ ∼ as a subset of (0, 1] is called the Vitali set. If q1 6= q2 are two rational numbers,the translates V + q1 and V + q2 are disjoint. Let A = Q ∩ (−1, 1], then
(0, 1] ⊂ ∩q∈A(A+ q) ⊂ [−1, 2].
The second inequality is trivial. Take y ∈ (0, 1], there exists q ∈ (−1, 1]∩Q = A such that y = q+ [y],where [y] is the representative of y. Thus y ∈ supq∈A(V − q) and the left hand side inclusion is proved.What can be the measure of V . Suppose it is λ. By translation invariance, (A + q) has measure λ. Thecountable number of copies of it has measure λ · ∞. But then 1 ≤ λ · ∞ ≤ 3, which is impossible forany λ ∈ [0,∞].
2.1 Set Algebra
Denote by Ac the complement of a subset A.
de Morgan’s identities:(∩A∈ΛAα)c = (∪A ∈ Λα)c
(∪A∈ΛAα)c = (∩A ∈ Λα)c
This holds for any number of sets, not necessarily countable number of.
Distribution Law
A ∩ (B ∪ C) = (A ∩B) ∪ (A ∩ C), A ∪ (B ∩ C) = (A ∪B) ∩ (A ∪ C).
Distribution law holds for also for infinite number of sets. For example
(⋃i
Ai) ∩ (⋃i
Bj) =⋃i,j
(Ai ∩Bj).
Limits If An is an increasing sequence of subsets, we set
limn→∞
An = ∪∞n=1An.
Similarly if An is a decreasing sequence of subsets, we define
limn→∞
An = ∩∞n=1An.
Let f : X → Y be a map, define
f−1(A) = x ∈ X : f(x) ∈ A.
Taking pre-image respects set operations:
2.2. MEASURABLE SPACE 13
Lemma 2.1.1f−1
(⋃Bi
)=⋃f−1(Bi),
f−1(⋂
Bi
)=⋂f−1(Bi),
f−1(A \B) = f−1(A) \ f−1(B).
2.2 Measurable Space
Let X be a set. We study collections of subsets of X . Let φ denote the empty set.
Definition 2.2.1 A collection of subsets of X , denoted by A, is a (Boolean) algebra if the followingproperty holds.
• (Non-empty ) φ ∈ A.
• (closed under complements) If B ∈ A then Bc ∈ A;
• (closed under finite unions) If B ∈ A, C ∈ A, then B ∪ C ∈ A.
The assumption φ ∈ A can be equally replaced by the condition that A is not empty.
We would like to be able to take limits.
Definition 2.2.2 A σ-algebra over X , denoted by F , is a collection of subsets of X such that:
• non-empty: φ ∈ F .
• closed under complements: If B ∈ F , then Bc ∈ F .
• closed under countable unions: If we have a sequence Bn with Bn ∈ F , then⋃∞n=1Bn ∈ F .
Elements of F are called measurable sets and (X ,F) is said to be a measurable space.
By de Morgan’s identities, if Bn ∈ F then ∩∞n=1Bn ∈ F . If A,B ∈ F then so does A \B.
The smallest σ-algebra is φ,X. If A ⊂ X , F = φ,X , A,Ac is a σ-algebra.
Proposition 2.2.1 Intersection of any number of σ-algebras is a σ-algebra. If C is any collection ofsubsets of a set X , then there always exists a smallest σ-algebra containing C.
This is straightforward to prove and left as an exercise. The intersection of σ-algebras is often denotedby ∧. For example, F1 ∧ F2 denote the intersection of F1 and F2.
2.2. MEASURABLE SPACE 14
Definition 2.2.3 If C is a collection of subsets, the smallest σ-algebra generated by it is called the σ-algebra generated by C. This is denoted by σC of σ(C).
Remark 2.2.2 Suppose F = σ(C) this does not mean every measurable set is of the form ∪∞i=1Ci withCi ∈ C. Taking for example C to be the collection (−∞, a], where a ∈ R, then σ(C) = B(R),however if Ci = (−∞, ai] are elements of C, then ∪∞i=1Ci is an interval of the form (−∞, a) or (−∞, a]
with a = supai, i ≥ 0, and the Borel set −1, 1, for instance, is not of this form.
2.2.1 Countability of σ-algebras
If X is a set, with a finite partition. by which we mean there exists A1, . . . , An disjoint with X =
∪nk=1Ak. Let F = σA1, . . . , An. Then F is generated. by unions of sets from Ai. There are 2n suchchoices: we can include or not include a particular set and every element of F will coming from such aunion, and so the σ-algebra consists of a finite number of elements.
We will see below that the σ-algebra with non-finite number of elements is uncountable. So a σ-algebra is either generated by a partition or has uncountable number of elements.
Proposition 2.2.3 A σ-algebra F is either finite or uncountable.
Proof will be released later in November.
2.2.2 Fundamental Examples and Exercises
Proposition 2.2.4 If f : Ω → X is a map, and (X ,B) a measurable space. Then f pulls back theσ-algebra G to a σ-algebra on Ω. This is the collection of pre-images
σ(f) := f−1(A) : A ∈ B
is a σ-algebra. This is called the σ-algebra generated by f , and also called the pre-image σ-algebra.
Proof. Exercise.
Example 2.2 If f : (X,B)→ (R,B(R) is of the form
f(x) =
n∑j=1
aj1Aj (x),
where Aj are pairwise disjoint subsets of X . Then σ(f) is generated by the finite collection of setsA1, . . . , An.
2.2. MEASURABLE SPACE 15
Example 2.3 To see the role played by the assumption that Aj are disjoint, we give an example whereAj are not disjoint. Set for example
f(x) = 1[1,2] + 31[1,3] + 41(−2,1].
Then
f(x) =
4, x ∈ (−2, 1) ∪ (1, 2]
3, x ∈ (2, 3]
8, x = 1
0, otherwise .
The σ-algebra generated by f is given by the partition:
1, (−2, 1) ∪ (1, 2], (2, 3], (−∞,−2] ∪ (3,∞).
If f(x) = 1[1,2] + 31[1,3], then σ(f) is generated by the collection of subsets:
[1, 2], (2, 3], (−∞, 1) ∪ (3,∞), φ,R.
Example 2.4 If F is a σ-algebra, and A ⊂ X then A ∩B : B ∈ F is again a σ-algebra.
Exercise 2.2.1 Given two σ-algebrasF1 andF2, we denote byF1∨F2 the smallest σ-algebra containingboth F1 and F2.
Show that F1 ∨ F2 can equivalently be characterised by the expressions:
• F1 ∨ F2 = σA ∪B |A ∈ F1 and B ∈ F2,
• F1 ∨ F2 = σA ∩B |A ∈ F1 and B ∈ F2.
Definition 2.2.4 A collection of subsets E is an elementary family if
• φ ∈ E ;
• If A,B ∈ E , then A ∩B ∈ E ;
• if A ∈ E then Ac is the union of disjoint sets from E .
Exercise 2.2.2 Show that if E is an elementary family, then the collection A of finite disjoint unions ofmembers of E is an algebra.
Proof If A,B ∈ E , then Bc = ∪nl=1Bl where Bl ∈ E . So A \B = A∩Bc = ∪nl=1(A∩Bl) ∈ A. Also,A ∪B = (A \B) ∪B ∈ A. By induction if Ai ∈ E then ∪ni=1Ai ∈ A. This means that the finite unionof sets from the elementary family is in A and A is closed under taking unions.
2.3. BOREL σ-ALGEBRAS 16
We show A is closed under complement. Let A = ∪nj=1Aj ∈ A where Aj ∈ E are disjoint. ThenAcj =
∑njm=1A
mj where Amj ∈ E . Since Ac = ∩nj=1(Acj), by associativity, so Ac is a finite union of
intersections of sets from E , therefore in A.
We call the following sets of the following form half open intervals: (a, b], φ or (a,∞), (−∞, a) andφ, where a, b are real numbers.
Example 2.5 The collection of unions of a finite number of disjoint half open intervals is an algebra.
This follows from taking E to be the collection of half open intervals and use the earlier exercise.
2.3 Borel σ-algebras
Definition 2.3.1 IfX is a complete separable metric space , the smallest σ-algebra generated by the opensets is called Borel σ-algebra, it is denoted by B(X ).
If X is a separable metric space, the metric topology satisfies the second axiom of countability, i.e. thereexists a countable base. This countable base generates B(X).
Example 2.6 We know the following about B(R), the Borel σ-algebra over R.
1. If a ∈ R, a = ∩∞n=1(a− 1n , a+ 1
n) ∈ B(R).
2. Any countable subsets of R is a Borel set.
3. Any intervals of any form is a Borel set.
4. B(R) is generated by the set of open intervals. This follows from the proposition below.
Proposition 2.3.1 Every open set of R is a countable union of disjoint open sets, this sets of disjoint setsis unique up to ordering.
Proof Denote by A the collection of open intervals. Let O be an open set. Let x ∈ O. Then x iscontained in an open interval, the interval is a subset of O. The set of the left end points of such intervalhas an infimum. Let us call it ax. Let bx denote the supremum of the right end of the intervals. ThenIx = (ax, bx) is the maximal interval such that x ∈ Ix ⊂ O Given x, y ∈ O either they are disjoint orthey overlap in which case they are the same. Pick up a rational number qx from Ix. Then the rationalnumbers are all distinct. So Ix : x ∈ O contains only a countable number of distinct open intervals.
2.3. BOREL σ-ALGEBRAS 17
2.3.1 Background material*
A topological space (X , T ) consists of a set X equipped with a topology T , i.e. a subset T ⊂ 2X suchthat:
• φ ∈ T and X ∈ T .
• If A0, A1, . . . , AN ⊂ T , then⋂Nn=0An ∈ T .
• If A ⊂ T , then⋃A∈AA ∈ T .
In other words, T is closed under arbitrary unions and finite intersections. Elements of T are called opensets. A function between topological spaces is continuous if the pre-images of open sets are open sets.
Given a topological space (X , T ), we define B(X ) to be the smallest σ-algebra on X containing T .This particular σ-algebra is called the Borel σ-algebra of X . In other words, the Borel σ-algebra is thesmallest σ-algebra such that all open sets are measurable. We denote by Bb(X ) the (Banach) space of allBorel-measurable and bounded functions from X to R equipped with the norm
‖ϕ‖∞ = supx∈X|ϕ(x)| . (2.1)
We denote by Cb(X ) the (Banach) space of all continuous and bounded functions from X to R equippedwith the same norm as in (2.1).
The open sets of a metric space gives a topology on the metric space. Borel σ-algebras on a completeseparable metric space is specially nice.
When discussing Borel σ-algebras, we will always assume that X is a complete separable metricspace.
A metric space is compact if any covering of it by open sets has a sub-covering of finite open sets. Adiscrete metric space (whose subsets are all open sets ) is compact if and only if it is finite. (e.g. Z andN with the usual distance d(x, y) = |x − y| is not compact.) A subset of a metric space is compact if itis compact as a metric space with the induced metric. It is relatively compact if its closure is compact.A metric space is sequentially compact if every sequence of its elements has a convergent sub-sequence(with limit in the metric space of course). It is totally bounded if for any ε > 0 it has a finite covering byopen balls of side ε. A metric space is complete if every Cauchy sequence converges.
It is a theorem that a metric space is compact if and only if it is complete and totally bounded. Ametric space is compact if and only if it is sequentially compact.
A subset of a metric space is relatively compact if it is sequentially compact (the limit may not needto belong to the subset).
Chapter 3
Measures
3.1 Basics of Measures
A measure assigns a number to every measurable set.
Definition 3.1.1 A measure µ on the measurable space (X ,F) is a map µ : F → [0,∞] with the fol-lowing properties;
• µ(φ) = 0.
• (σ-additive Propertices) If Ann>0 is a countable collection of elements of F that are all disjoint,then one has
µ(⋃n>0
An) =∑n>0
µ(An).
The triple (X ,F , µ) is called a measure space.
Definition 3.1.2 • We say µ is a finite measure if µ(X ) <∞.
• µ is σ-finite if there exists A1 ⊂ A2 ⊂ . . . such that µ(Ai) <∞ and X = ∪∞i=1Aj .
This is equivalent to the statement that there exists an increasing sequence of measurable sets Xnwith µ(Xn) <∞ such that X = ∪∞n=1Xn. The sequence is called an exhausting sequence.
Convention. From now on we study only σ-finite measures.
Example 3.1 If X = 1, 2, . . . , n, let F = 2X . Then (X ,F) is a σ-algebra. If we set µ(i) = ai, thisdefines a measure.
Proposition 3.1.1 Let (Ω,F) be a measurable space. Then
19
3.1. BASICS OF MEASURES 20
1. If A,B ∈ F and A ⊂ B then µ(B \A) = µ(B)− µ(A).
2. Monotonicity. If A ⊂ B then µ(A) ≤ µ(B).
3. (Finite) additivity. If A1, A2 ∈ F and A1 ∩A2 = φ then µ(A1 ∪A2) = µ(A1) + µ(A2).
4. Continuity from below. If An ∈ F , An increases, then µ(∪∞n=1An) = limn→∞ µ(An).
5. Continuity from above. If An ∈ F , An decreases and µ(A1) <∞, then
µ(∩∞n=1An) = limn→∞
µ(An).
Proof Claim 1 follows from that B = A ∪ (B \ A) is a disjoint union whenever A ⊂ B. Henceµ(B) = µ(A) + µ(B \A).
Claim 2 follows by claim 1, claim 3 follows from taking Ai = φ for i ≥ 3.
We prove claim 4. If A1 ⊂ A2 ⊂ A3 ⊂ ..., set A0 = φ, A = ∪∞i=1Ai Also set
B1 = A1, B2 = A2 \A1, B3 = A3 \A2, . . . ,
and so on. Then Bi are pairwise disjoint and ∪∞i=1Bi = ∪∞i=1Ai = A. Then,
µ(A) =∞∑i=1
µ(Bi) = limn→∞
n∑i=1
µ(Bi) = limn→∞
n∑i=1
[µ(Ai)− µ(Ai−1)] = limn→∞
µ(An).
Prove claim 5. We assume that µ(A1) <∞ and proceed as for claim 3.
Example 3.2 1. Let (X ,G) be a measure space and x ∈ X . The δ measure at x is defined by:
δx(A) =
1, if x ∈ A,0, if x 6∈ A.
2. A discrete (probability) measure P on a countable space Ω = ω1, ω2, . . . is a sequence ofnumbers pi ∈ [0, 1] (summing up to 1) and
P (A) =
∞∑i=1
piδωi(A).
3. Let Y be a random variable on a two state space X = 0, 1 with distribution P(Y = 0) = p andP(Y = 1) = 1− p. Then the probability distribution of Y is a measure on X given by:
PY := p δ0 + (1− p) δ1.
3.2. CONSTRUCTION OF MEASURES FROM PRE-MEASURES 21
Exercise 3.1.1 If µ : F → [0,∞] satisfies µ(φ) = 0, finite additive property, and continuity from below,then it is a measure.
We consider which subset of F determines a measure uniquely.
Example 3.3 The interval [0, 1] equipped with its Borel σ-algebra and the Lebesgue measure is a prob-ability space.
Example 3.4 The half-line R+ equipped with the measure
P(A) =
∫Ae−x dx
is a probability space. In such a situation, where the measure has a density with respect to Lebesguemeasure, we will also use the short-hand notation P(dx) = e−x dx.
Example 3.5 Given a ∈ Ω, the measure δa defined by
δa(A) =
1 if a ∈ A,0 otherwise.
is a probability measure.
3.2 Construction of measures from pre-measures
Definition 3.2.1 Given a measure space (M,F , µ), F is said to be complete with respect to a µ if itcontains every subset of its null sets. The completion of F with respect to µ, denoted by F , is thesmallest σ-algebra augmented with all subsets of null sets of F . We can then extend µ to F by setµ(F ) = 0 whenever F is a subset of a null set from F . This is called the completion of µ, the extensionis said to be a complete measure.
A measurable set on a measure space is said to be a ‘null set’ if it has measure zero.
LetAi be two collections of subsets, %i : Ai → [0,∞]. We say %2 is an extension of %1 ifA2 containsA1 and %1(A) = %2(A) whenever A ∈ A1.
3.2.1 Outer Measure
Definition 3.2.2 A function µ∗ : 2X → [0,∞] is said to be an outer measure over X if we have:
(1) µ∗(φ) = 0,
3.2. CONSTRUCTION OF MEASURES FROM PRE-MEASURES 22
(2 ) Monotonicity. If A ⊂ B then µ∗(A) ≤ µ∗(B).
(3) countable sub-additivity.
µ∗(∪∞n=1An) ≤∞∑n=1
µ∗(An).
Exercise 3.2.1 • Let µ∗(φ) = 0 and µ∗(A) = ∞ if A ⊂ X is not empty. Show that µ∗ is anouter-measure.
• If an outer measure µ∗ satisfies that µ∗(A ∪B) = µ∗(A) + µ∗(B) for any A,B ∈ F where F is aσ-algebra, what can you say about µ∗?
Definition 3.2.3 Let A be a set of subsets containing φ,X . A map A → [0,∞) is a pre-measure if
• %(φ) = 0, and
• for any An ∈ A with ∪∞n=1An ∈ A,
%(∪∞n=1An) =∞∑j=1
%(An).
Outer measures can be obtained from any suitable function on a given collection of subsets, by cov-ering a set with sets from this collection. A collection of set Aα, α ∈ Λ is said to be a cover of a set Eif E ⊂ ∪α∈ΛAα. We are interested in covers consisting of a countable number of elements.
Proposition 3.2.1 Let A be a non-empty collection of subsets with φ ∈ A,X ∈ A. Let % : A → [0,∞]
be a function such that %(φ) = 0. We define a function on 2X as below. For any E ⊂ X ,
µ∗(E) = inf
∞∑j=1
%(Aj) : E ⊂ ∪∞j=1Aj , Aj ∈ A
. (3.1)
Then µ∗ is an outer measure.
Proof (1) Since every set is covered by X, the definition makes sense, and µ∗(φ) = 0. (2) If A ⊂ B,then every cover of B is a cover A and so µ∗(B) ≥ µ∗(A).
(3) To show countable sub-additivity, take any sequence of subsets Aj . For any ε > 0 there existBkj ∈ A with Aj ⊂ ∪kBk
j such that
µ∗(Aj) ≥∞∑k=1
%(Bkj )− ε
2j.
3.2. CONSTRUCTION OF MEASURES FROM PRE-MEASURES 23
Summing over j,∞∑j=1
µ∗(Aj) ≥∞∑j=1
∞∑k=1
%(Bkj )− ε.
Since Bkj is a cover of ∪∞j=1Aj ,
µ∗(∪∞j=1Aj) ≤∞∑j=1
∞∑k=1
%(Bkj ).
Hence, ∑j
µ∗(Aj) ≥ µ∗(∪∞j=1Aj)− ε
for any ε > 0. Take ε→ 0 to see µ∗(∪∞j=1Aj) ≤∑
j µ∗(Aj), thus completing the proof.
How do we obtain additive from sub-additive? Let us single out sets that a strong additive property.Take a set A, we have two collections of subsets of A and the collection of subsets of Ac. If we take onefrom each collection, we want to prove it is additive.
Definition 3.2.4 A subset A of X is said to be µ∗-measurable (in the sense of Caratheodory) if for anyset B ⊂ X , we have
µ∗(B) = µ∗(B ∩A) + µ∗(B ∩Ac). (3.2)
We have already sub-additivity, the non-trivial part of (3.2) is
µ∗(B) ≥ µ∗(B ∩A) + µ∗(B ∩Ac). (3.3)
Theorem 3.2.2 [Caratheodory Theorem] If µ∗ is an outer measure on X and G the collection of µ∗-measurable sets, then
(1) G is a σ-algebra.
(2) The restriction of µ∗ to G is a measure.
(3) G is complete w.r.t. µ∗.
Proof The proof is deferred to §3.2.3.
3.2.2 Extension of pre-measures
Proposition 3.2.3 If A is an algebra, and % is a pre-measure on A, we define µ∗ by (3.1). Then thefollowing statements hold.
3.2. CONSTRUCTION OF MEASURES FROM PRE-MEASURES 24
1. Every set in A is µ∗-measurable.
2. µ∗ and % agree on A.
In particular, µ∗ is a measure on σ(A).
Proof The proof is deferred to §3.2.3.
Proposition 3.2.4 * If % is a pre-measure on an algebra A, we define µ∗ by (3.1) and % is σ-finite, thenµ∗ is its unique extension to G and therefore to σ(A).
Proof The proof is deferred to §3.2.3.
3.2.3 Proof for Caratheodory Theorems
We return to prove some of the theorems and propositions given earlier.,
Theorem 3.2.2 (Caratheodory Theorem) If µ∗ is an outer measure on X , then the collection G ofµ∗-measurable sets is a σ-algebra, the restriction of µ∗ to G is a measure, and G is complete w.r.t. µ∗.
Proof Throughout the proof let us write µ = µ∗ for simplicity. (1) φ ∈ G and µ∗(φ) = 0. (2) Since theroles played by A and Ac are symmetric, A ∈ G implies that Ac ∈ G.
(3) Suppose A,F ∈ G, we show A ∪ F ∈ G.
Take any B ∈ X . We apply (3.2) first with A ∈ G and then with F ∈ G:
µ(B) = µ(B ∩A) + µ(B ∩Ac)= µ(B ∩A ∩ F ) + µ(B ∩A ∩ F c) + µ(B ∩Ac ∩ F ) + µ(B ∩Ac ∩ F c).
We useA∪F = (A∩F )∪(A∩F c)∪(Ac∩F ), as well as (A∪F )c = Ac∩F c, and apply sub-additivity
µ(B) = µ(B ∩A ∩ F ) + µ(B ∩A ∩ F )c + µ(B ∩Ac ∩ F ) + µ(B ∩Ac ∩ F c)≥ µ(B ∩ (A ∪ F )) + µ(B ∩ (A ∪ F )c).
This shows that A ∪ F ∈ G. Also, if A,F ∈ G are disjoint,
µ(A ∪ F ) = µ((A ∪ F ) ∩ F ) + µ((A ∪ F ) ∩ F c) = µ(F ) + µ(A).
(4) We show countable additivity. Let An ∈ G, we show A = ∪∞n=1An ∈ G. Set Fn = ∪nj=1Aj .Since G is stable under finite unions and under taking the complement, we may assume that the An are
3.2. CONSTRUCTION OF MEASURES FROM PRE-MEASURES 25
disjoint. Then for any B ∈ X , using Aj ∈ G,
µ(B ∩ Fn) = µ(B ∩ Fn ∩An) + µ(B ∩ Fn ∩ (An)c)
= µ(B ∩An) + µ(B ∩ Fn−1)
= µ(B ∩An) + µ(B ∩ Fn−1 ∩An−1) + µ(B ∩ Fn−1 ∩Acn−1)
= µ(B ∩An) + µ(B ∩An−1) + µ(B ∩ Fn−2) = . . .
=
n∑j=1
µ(B ∩Aj).
Thus using Fn ∈ G,
µ(B) = µ(B ∩ Fn) + µ(B ∩ (Fn)c) =
n∑j=1
µ(B ∩Aj) + µ(B ∩ (Fn)c)
≥n∑j=1
µ(B ∩Aj) + µ(B ∩Ac).
Taking n→∞, then using ∪(B ∩Aj) = B ∩A and sub-additivity,
µ(B) ≥∞∑j=1
µ(B ∩Aj) + µ(B ∩Ac)
≥ µ(B ∩A) + µ(B ∩Ac) ≥ µ(B).
Hence A ∈ G and∑∞
j=1 µ(B ∩ Aj) = µ(B ∩ A) for any B ∈ G. Take B = A to conclude theσ-additivity. We proved that G is a σ-algebra and µ is a measure on it.
(5) Finally if µ(A) = 0 and F ⊂ A, then for any B ⊂ X ,
µ(B) ≤ µ(B ∩ F ) + µ(B ∩ (F )c) ≤ 0 + µ(B).
Hence all the inequalities above are equalities,
µ(B) = µ(B ∩ F ) + µ(B ∩ (F )c)
and F ∈ G. Therefore G contains all null sets of µ and µ(F ) = 0.
In the sequel, we shall assume A to be an algebra. In that case, we can use the following, convenientlemma:
Lemma 3.2.5 LetA be an algebra and % : A → [0,∞] an additive function. In words, for all A,B ∈ Adisjoint, we have:
%(A ∪B) = %(A) + %(B).
Then % is
3.2. CONSTRUCTION OF MEASURES FROM PRE-MEASURES 26
1. monotone: for any A,B ∈ A with A ⊂ B, %(A) ≤ %(B)
2. sub-additive: for any A1, . . . , An ∈ A, %(⋃ni=1) ≤
∑ni=1 %(Ai)
Proof By additivity for all A,B ∈ A with A ⊂ B, we have
%(B) = %(A) + %(A \B) ≥ %(A),
which yields monotonicity. Now, for A1, . . . , An ∈ A, we set
Bi = Ai \
i−1⋃j=1
Aj
∈ A, i = 1, . . . , n,
so that the Bi are disjoint. Hence, using successively the additivity and monotonicity of %,
%
(n⋃i=1
Ai
)= %
(n⋃i=1
Bi
)=
n∑i=1
% (Bi) ≤n∑i=1
% (Ai) ,
which yields the requested sub-additivity.
Proposition3.2.3. If % is a pre-measure on an algebra A, we define µ∗ by (3.1). Then:
1. µ∗ = % on A.
2. every set in A is µ∗-measurable.
In particular, µ∗ is a measure on σ(A).
Proof (1) Let B ∈ A, then B is a cover for itself and µ∗(B) ≤ %(B). As before write µ = µ∗ forsimplicity. We now prove the reverse inequality. Suppose that B ⊂ ∪nAn where An ∈ A. ThenB = ∪n(B ∩ An). Now B ∩ An ∈ A, and B ∈ A, we apply σ-additivity for pre-measures, andmonotonicity,
%(B) =∞∑n=1
%(B ∩An) ≤∞∑n=1
%(An).
Take infimum over all covers,
%(B) ≤ inf∞∑n=1
%(An) = µ(B).
We have showed that % and µ agree on A.
(2) We now show that any A ∈ A is µ∗-measurable. Take any B ⊂ X , we want to show
µ∗(B) ≥ µ∗(A ∩B) + µ∗(Ac ∩B), (3.4)
3.2. CONSTRUCTION OF MEASURES FROM PRE-MEASURES 27
For any ε > 0 choose a cover Bj of B from A with
µ∗(B) ≥∞∑j=1
%(Bj)− ε.
We first use A ∈ A ⊂ G, % = µ∗ on A and σ sub-additivity of µ, and
∞∑j=1
%(Bj) =∞∑j=1
%(A ∩Bj) +∞∑j=1
%(Ac ∩Bj)
≥ µ∗(A ∩ ∪jBj) + µ∗(Ac ∩ ∪jBj)≥ µ∗(A ∩B) + µ∗(Ac ∩B).
In the last step we use ∪jBj ⊃ B and the monotonicity of the outer measure. Therefore
µ∗(B) ≥ µ∗(A ∩B) + µ∗(Ac ∩B)− ε.
Taking ε→ 0, we obtain
µ∗(B) ≥ µ∗(A ∩B) + µ∗(Ac ∩B),
so that B is µ∗-measurable (the reverse inequality follows by sub-additivity).
Proposition 3.2.6 * Let % be a σ-finite pre-measure on an algebra A, and define µ∗ by (3.1). Then µ∗ isthe unique measure extending % to G, and therefore the unique measure on σ(A), extending %.
Proof Step 1. We first show that if ν is another measure on the µ∗-measurable set G extending %, thenν ≤ µ∗. Indeed if E ∈ G, E ⊂ ∪Aj where Aj ∈ A, then by sub-additivity,
ν(E) ≤ ν(∪Aj) ≤∑j
ν(Aj) =∑j
%(Aj).
This holds for any covering of E by sets from A, hence
ν(E) ≤ µ∗(E).
Step 2. Suppose that E ∈ G is such that µ∗(E) < ∞, we show that µ∗(E) = ν(E). We can find acover Aj of E fromA with
∑j µ∗(Aj) ≤ µ∗(E) + ε. We may and will assume that the cover is disjoint.
Set A = ∪∞j=1Aj . Since E ⊂ A,
µ∗(A \ E) = µ∗(A)− µ∗(E) ≤∑j
µ∗(Aj)− µ∗(E) < ε. (3.5)
3.2. CONSTRUCTION OF MEASURES FROM PRE-MEASURES 28
Also, since E ⊂ A,µ∗(E) ≤ µ∗(A)≤
∑j
µ∗(Aj)
agree on A=
∑j
ν(Aj) = ν(A)
= ν(E) + ν(A \ E)
ν≤µ∗≤ ν(E) + µ∗(A \ E) ≤ ν(E) + ε,
where we have used (3.5) in the last inequality. Taking ε→ 0, we see that µ∗(E) = ν(E).
Step 3. Let % be σ-finite. By the assumption there exists an increasing sequence of sets Xn from Asuch that %(Xn) <∞ and
⋃n≥1Xn = X . Then using step 2 we can complete the proof (exercise).
3.2.4 Construction of Lebesgue-Stieljes / Lebesgue measure
Definition 3.2.5 A collection of subsets E is an elementary family if
• φ ∈ E ;
• If A,B ∈ E , then A ∩B ∈ E ;
• if A ∈ E then Ac is a finite union of disjoint sets from E .
Exercise 3.2.2 Show that if E is an elementary family, then the collection A of finite disjoint unions ofmembers of E is an algebra.
In this section, by half open interval we mean sets of the form (a, b], (c,∞), (−∞, b], or φ, wherea, b, c are finite numbers. For simplicity we write a generic half interval as (c, d], where c ∈ R∪ −∞.Also d is a real number, or∞. In the latter case by (c,∞] we really mean (c,∞] ∩ R.
Definition 3.2.6 Henceforth, in this section, let E denote the collection of half open intervals and let Abe the collection of finite unions of disjoint half open intervals.
We have seen that
Proposition 3.2.7 A is an algebra.
Remark 3.2.8 Every element of A can be written as finite disjoint union of maximal intervals (thismeans the interval cannot be extended within A):
A =n⋃k=1
(ak, bk], (3.6)
3.2. CONSTRUCTION OF MEASURES FROM PRE-MEASURES 29
this is unique if we order them by bk < ak+1. Let us call this the canonical representation. If A is notbounded from above, bn =∞, by (an, bn] we mean (an, bn] ∩ R.
Let F : R → R be an increasing and right continuous function. Then it has at most a countablenumber of discontinuities, all of which are of jump type (i.e. left and right limits exist). We define
F (∞) = limx→∞
F (x), F (−∞) = limx→−∞
F (x). (3.7)
Definition 3.2.7 For A =⋃nj=1(aj , bj ] ∈ A, with bj < aj+1 for j = 1, . . . , n, we set
%(A) =
n∑j=1
(F (bj)− F (aj)). (3.8)
We also set %(φ) = 0.
A partition of a set F is a collection of disjoint subsets of A whose union is A.
Remark 3.2.9 This definition is independent of the expression for A. Indeed, suppose that A = (c, d] iswritten also as A =
⋃nj=1(aj , bj ], then up to a re-ordering,
c = a1 < b1 = a2 < b2 = a3 < · · · < bn = d.
This gives a telescopic sum,
n∑j=1
(F (bj)− F (aj)) = F (bn)− F (a1) = F (d)− F (c).
So when A is a single interval, there is no ambiguity in the definition of %.
If A =∑p
k=1(ck, dk], a canonical expression, so each interval is a maximal interval, then there ispartition Λk, k = 1, . . . , p of 1, . . . , n such that∑
j∈Λk
(aj , bj ] = (ck, dk].
In fact, it suffices to consider
Λk := j = 1, . . . , n : (aj , bj ] ⊂ (ck, dk].
Hencen∑j=1
%((aj , bj ]) =
p∑k=1
∑j∈Λk
%(aj , bj ] =
p∑k=1
%((ck, bk]).
Proposition 3.2.10 If F : R→ R is an increasing and right continuous function, then % defined by (3.8)is a pre-measure on A.
3.2. CONSTRUCTION OF MEASURES FROM PRE-MEASURES 30
Proof We have %(φ) = 0, so it remains to prove σ-additivity. We start by showing that % is additive. LetA,B ∈ A be disjoint. Then there exists Ej ∈ E such that
A =n⋃j=1
Ej , B =m⋃
j=n+1
Ej .
Then
%(A ∪B) =
m∑j=1
%(Ej) =
n∑j=1
%(Ej) +
m∑j=n+1
%(Ej) = %(A) + %(B).
Hence % is indeed additive. By Lemma 3.2.5, we in particular deduce that % is monotone and sub-additive.
Step 2. Let now (Aj)j≥1 be a sequence of disjoint elements of A such that
A :=⋃j≥1
Aj
is in A. Using monotonicity followed by finite additivity,
%(A) ≥ %(
n⋃j=1
Aj) =
n∑j=1
%(Aj).
Since this holds for every n,
%(A) ≥∞∑j=1
%(Aj).
Step 3. We now prove the reverse inequality. Since A ∈ A,
A =
n⋃k=1
(ak, bk]. (3.9)
We show that %(A) =∑
j=1 %(Aj). As argued before, in Remark 3.2.9, by additivity of % we reduce thisto the case A is one single half open interval.
Step 4. We first assume A = (c, d] is bounded. Since, for all j ≥ 1, Aj is a finite union ofbounded disjoint half open intervals, the whole collection of these intervals is countable and we write it(ak, bk], k ≥ 1. In particular, we have
∞⋃j=1
Aj =
∞⋃k=1
(ak, bk], and∞∑j=1
%(Aj) = %(∞⋃j=1
Aj) =∞∑k=1
%((ak, bk]).
The idea is to approximate (c, d] by a compact interval [c + δ, d] and (ak, bk] by a larger open interval(ak, bk + δk), we can then choose a finite cover and use finite additivity.
Let ε > 0 such that ε < d− c. By the right-continuity of F , we may choose δ > 0 so that
F (c+ δ)− F (c) < ε.
3.2. CONSTRUCTION OF MEASURES FROM PRE-MEASURES 31
Similarly, for all k ≥ 1, we choose a δk so that
F (bk + δk)− F (bk) ≤ 2−kε.
The compact set [c+δ, d] is covered by the sets (ak, bk] and therefore by the larger open sets (ak, bk+δk).We can choose a finite covering, so that for some N ,
[c+ δ, d] ⊂ ∪Nk=1(ak, bk + δk).
By monotonicity and sub-additivity of %,
%((c, d]) = F (d)− F (c) = F (d)− F (c+ δ) + F (c+ δ)− F (c) ≤ %([c+ δ, d]) + ε
≤ %(∪Nk=1(ak, bk + δk)
)+ ε
≤N∑k=1
%((ak, bk]) +N∑k=1
ε
2k+ ε
≤∞∑j=1
%(Aj) + 2ε.
( Since ∪Nk=1(ak, bk + δk) ⊂ ∪Nk=1(ak, bk + δk] ⊂ ∪jAj .)
Taking ε→ 0, we conclude that
%(A) = %((c, d]) ≤∑j
%(Aj)
as requested. Together with step 2, we have proved
%(A) =∑j
%(Aj)
for A a bounded half open interval.
Step 5. Now assume A = (−∞, b]. Then, for every n with −n < b,
%(A) = F (b)− F (−∞) = %((−n, b]) + F (−n)− F (−∞).
Note that
(−n, b] = (−n, b] ∩∞⋃j=1
Aj =∞⋃j=1
(−n, b] ∩Aj .
Since (−n, b] ∩Aj are disjoint half open sets, we use step 4 to conclude that
%((−n, b]) =
∞∑j=1
%((−n, b] ∩Aj).
3.2. CONSTRUCTION OF MEASURES FROM PRE-MEASURES 32
Hence
%(A) =∞∑j=1
%((−n, b] ∩Aj) + F (−n)− F (−∞) ≤∞∑j=1
%(Aj) + F (−n)− F (−∞).
Taking n→∞, since F (−n)→ F (−∞) we have proved the reverse inequality. The same can be shownfor A = (a,∞). We have completed the proof for the σ-additivity for the case where the countabledisjoint union is in A.
As in (3.1), we define
µ∗(E) = inf
∞∑j=1
%(Aj) : E ⊂∞⋃j=1
Aj , Aj ∈ A
.
Since each A ∈ A is a finite disjoint union of members in E , by re-arranging we have
µ∗(E) = inf
∞∑j=1
%(Ej) : E ⊂∞⋃j=1
Ej , Ej ∈ E
= inf
∞∑j=1
%((aj , bj)) : E ⊂∞⋃j=1
(aj , bj)
.
Theorem 3.2.11 There exists a complete measure µF on the Borel σ-algebra such that µF = % on A.This is the unique measure on B(R) that agrees with % on A.
Proof This follows from the Caratheodory Theorems.
Definition 3.2.8 • The measure µF constructed in the theorem above is called Lebesgue-Stieljesmeasure associated to F .
• If F is the identity map, this is called the Lebesgue measure on R and will be denoted by λ.
Remark 3.2.12 If A = ∪∞j=1(aj , bj ] is a disjoint union, then µF (A) =∑∞
j=1 %((aj , bj ]). This followsfrom that the union is a Borel set, and σ-additive property fo the measure. The subscript F will beomitted if there is no risk of confusion.
Theorem 3.2.13 If B is a µ∗-measurable set, then
µF (B) = infµF (O) : B ⊂ O, O is open= supµF (D) : D ⊂ B, D is closed.
This holds in particular if B is a Borel measurable set.
3.2. CONSTRUCTION OF MEASURES FROM PRE-MEASURES 33
Proof This is left as exercise.
Recall that B∆A = (B \A)⋃
(A \B).
Remark 3.2.14 If B is µ∗-measurable set with finite measure, then for every ε > 0 there exists a set Awhich is a finite union of open intervals, such that µF (B∆A) < ε.
3.2.5 Examples
Example 3.6 Let b ∈ R, then
µF ((a, b)) = µF ((a, b])− µF (b).
Observe that µF (b) 6= 0 precisely when b is a discontinuity of F . Indeed,
µF (b) = µF (∩∞n=1(b−1
n, b]) = lim
n→∞µF ((b− 1
n, b]) = F (b)− lim
n→∞F (b− 1
n) = F (b)−F (b−).
Example 3.7 Since F (x) = x is continuous, any countable set is Borel measurable and has Lebesguemeasure zero.
Example 3.8 Let
F (x) =
a, x < 0,
a+ 3, x ≥ 0.
Then F (x) = 3δ0. It is clear that for any interval,
µF ((a, b]) =
3, if 0 ∈ (a, b],
0, if 0 6∈ (a, b],
To see that for any Borel set B,
µF (B) =
3, if 0 ∈ B,0, if 0 6∈ B,
we use the uniqueness of extensions of % to the Borel σ-algebra and the fact that µF and 3δ0 agree onA.
Example 3.9 If Y is a random variable on a probability space (Ω,F ,P), set F (x) = P(Y ≤ x). ThenF is right continuous and increasing, it determines a Lebesgue-Stieljes measure µF . Since F (−∞) = 0,Also,
µF ((a, b]) = F (b)− F (a) = P(a < Y ≤ b).
Definition 3.2.9 1. A map f from one measurable space (X1,B1) to another measurable space (X2,B2)
is said to be measurable if f−1(B) ∈ B1 for any B ∈ B2. (In short , thia is f−1(B2) ⊂ B1.)
3.3. LEBESGUE MEASURE ON RN 34
2. We will call a function f between two metric spaces /topological spaces (Borel) measurable if thepre-image of a Borel set is a Borel set.
3. We also use Borel measurable for a measurable function with values in a topological space, and thetopological space is given the Borel σ-algebra.
Definition 3.2.10 Let (X1,B1) and (X2,B2) be measurable spaces, and µ1 a measure on (X1,B1). Letf : X1 → X2 be a measurable function. We define a measure µ2 on B2 by setting
µ2(A) = µ1(f−1(A)) = µ1(x : f(x) ∈ A).
This is called the pushed forward measure and denoted by f∗(µ1) or sometimes as µ−1.
Exercise 3.2.3 Show that f∗(µ1) is a measure.
If X : Ω→ R, we recognise X∗P as the probability distribution of the random variable. For example, ifX : Ω→ 0, 1, then X∗P(0) = P(X = 0) and X∗P = P(X = 0)δ0 + P(X = 1)δ1.
Exercise 3.2.4 Let f : Ω → X and suppose that f−1(A) ∈ F for every open set A. Then f−1(A) ∈ Ffor every Borel set A.
Proof Define F0 = A ∈ B(X ) | f−1(A) ∈ F. It is a σ-algebra (check!, c.f. Lemma 2.1.1) andcontains open sets of X and thus contains all Borel sets.
Proposition 3.2.15 Let X and Y be two topological spaces and let f : X → Y be continuous. Then f is(Borel) measurable.
Proof Since f is continuous, f−1(A) is open for every open set A,. This means if A is open thenf−1(A) ∈ B(X ). Set F0 = A ∈ B(Y) | f−1(A) ∈ B(X ). Then F0 contains all open sets. It is easyto check F0 is a σ-algebra. Thus F0 contains the σ-algebra generated by open sets which is the Borelσ-algebra B(Y), concluding f is Borel measurable.
3.3 Lebesgue Measure on Rn
By a similar construction, we obtain a Lebesgue measure on Rn. We do not get into details.
Definition 3.3.1 There is a unique measure λ on B(Rn) such that, for all a1, . . . , an, b1, . . . , bn ∈ Rwith ai ≤ bi,
λ((a1, b1]× · · · × (an, bn]) = Πni=1(bi − ai).
This is called the Lebesgue measure on Rn, it extends to the completion of B(Rn).
Remark 3.3.1 The Lebesgue measure is invariant under translations and rotations.
3.4. UNIQUENESS OF EXTENSIONS 35
3.4 Uniqueness of Extensions
We often know that a property holds for a sub-collection of a σ-algebra, want to show it holds for everyset in the σ-algebra. The difficulty with this is the σ-algebras are in generally complicated. If we have afinite partition of X , the σ-algebra contains a finite number of sets. Otherwise it has uncountable numberof elements. See §2.2.1.
There are a number of powerful techniques, one of them is the π − λ theorem.
3.4.1 π − λ theorem
Definition 3.4.1 A non-empty collection of sub-setC is a π-system ifA,B ∈ C implies thatA∩B ∈ C.
Definition 3.4.2 A collection C of sub-sets is called a λ-system if
1. Ω ∈ C
2. If A,B ∈ C and A ⊂ B then B \A ∈ C.
3. If An ∈ C, An increases to A, then A ∈ C.
A σ-algebra is a π-system and a λ system.
Proposition 3.4.1 (not given in class) 1. Show that the intersection of any number of λ-systems is aλ-systems. Therefore given any collections of subsets there exists a small λ-system containing it.
2. If a λ system A is also a π-system, then it is a σ-algebra.
Proof Part (1) is trivial. To see a set which is simultaneously a π-system and a λ-system is a σ-algebra, remark that taking unions is obtained from taking intersections and complements. In turn,taking countable unions is obtained from taking finite unions and monotone limits, whence the claim.
Theorem 3.4.2 (not given in class) If C is a π-system, then the smallest λ-system generated by C, λ(C),is a σ-algebra, and λ(C) = σ(C).
Proof FirstlyF1 : = A : A ∩ C ⊂ λ(C)
= A ∈ λ(C) : A ∩ F ∈ λ(C),∀F ∈ C
is a λ-system containing C, which implies F1 = λ(C). Indeed, let F ∈ C, then (1) Ω ∩ F = F ∈ C, andso Ω ∈ F1; (2) If A ⊂ B, both in F1,
(B \A) ∩ F = B ∩ F \ (A ∩ F ) ∈ F1
3.5. MEASURE DETERMINING SETS 36
(3).If An ∈ F1, An ∩ F ∈ λ(C), so (⋃An) ∩ F =
⋃n(An ∩ F ) ∈ λ(C).
Similarly,F2 : = A : A ∩ λ(C) ⊂ λ(C)
= A ∈ λ(C) : A ∩ E ∈ λ(C),∀E ∈ λ(C)
is a λ-system containing C, and therefore F2 = λ(C). Thus λ(C) is both a λ-system and a π-system,and is therefore a σ-algebra. Since it contains C, we obtain σ(C) ⊂ λ(C). But, since any σ-algebra is aλ-system, we also have λ(C) ⊂ σ(C), and the equality follows.
We present now a very useful lemma:
Lemma 3.4.3 [π − λ Theorem.] If a λ-system contains a π-system C, then it contains the σ-algebragenerated by C.
3.5 Measure determining sets
Exercise 3.5.1 Let µ be a measure on a measurable space (X ,F). Show the following two statementsare the equivalent.
• There exists a sequence of set An with ∪An = X and µ(An) <∞.
• There exists an increasing sequence of set An with ∪An = X and µ(An) <∞.
Just set Cn = ∪nk=1Ak.
Exercise 3.5.2 Show that if µ, ν are σ-finite measures on a measurable space (X ,F), then there exists acommon sequence of Cn with both µ(Cn) <∞ and ν(Cn) <∞ for each n.
Proof. Let An, Bn be two sequences of sets increasing to X , and such that µ(An), ν(Bn) finite for everyn. By distribution law, X = ∪i,j(Ai ∩Bj). It is clear both µ, ν are finite on Ai ∩Bj .
Theorem 3.5.1 Suppose that (Ω,F) is a measurable space, and C is a π-system generating F . Let µand ν be two measures which agree on C.
1. If µ(Ω) = ν(Ω) <∞, then µ = ν
2. More generally, if there exists an increasing sequence of subsets Ωk ∈ C, such that Ω = ∪k≥1Ωk
and µ(Ωk) = ν(Ωk) <∞ for all k ≥ 1, then µ = ν.
3.5. MEASURE DETERMINING SETS 37
Proof (1) First assume that µ(Ω) = ν(Ω) <∞. Let
G = A ∈ F : µ(A) = ν(A).
Then Ω ∈ G by assumption. Moreover, if An is an increasing sequence of measurable sets,
µ(∞⋃n=1
An) = limn→∞
µ(An) = limn→∞
ν(An) = ν(∞⋃n=1
An)
Thus, G is closed under taking lower limit. Let A ⊂ B, A,B ∈ G, then by additive property,
µ(B \A) = µ(B)− µ(A) = ν(B)− ν(A) = ν(B \A).
This means B \ A ∈ G, and therefore G is a λ-system containing a π-system generating F . By theπ − λ-Theorem, G ⊃ F and µ = ν on F , which proves the first point.
(2) For the second point, let Fk = A ∩ Ωk : A ∈ F denote the trace σ-algebras on Ωk, and denoteby µk and νk the restrictions to Ωk of the measures µ and ν:
∀A ∈ F , µk(A) = µ(A ∩ Ωk), νk(A) = ν(A ∩ Ωk).
Applying the first point to µk and νk, we deduce that µk = νk. Therefore, by lower contiunity ofmeasures, we obtain, for all A ∈ F ,
µ(A) = limk→∞
µ(A ∩ Ωk) = limk→∞
ν(A ∩ Ωk) = ν(A),
completing the proof.
Example 3.10 Let Ω = 1, 2, 3, 4. Let G = 1, 2, 1, 3, 3, 4, which generates the discreteσ-algebra F ( the power set), but not a π-system. The measure µ and ν agree on G, not on F .
µ(1) = 1/6, µ(2) = 2/6, µ(3) = 1/6, µ(4) = 2/6,
ν(1) = 2/6, ν(2) = 1/6, ν(3) = 0, ν(4) = 3/6.
Theorem 3.5.2 If A ∈ B(R), so does t+A. The Lebesgue measure is translation invariant on the Borelσ-algebra.
Proof LetC = A ∈ B(R) : t+A ∈ B(R), ∀t ∈ R.
Since t+ Ω = Ω, Ω ∈ C. If A ⊂ B with A,B ∈ C, so t+B ∈ B(R), t+A ∈ B(R),
t+ (B \A) = x+ t : x ∈ B, x 6∈ A = (t+B) \ (t+A) ∈ B(R).
3.5. MEASURE DETERMINING SETS 38
If An ∈ C is an increasing sequence,⋃nAn ∈ B(R), and t+An ∈ B(R), so
t+⋃n
An =⋃n
(t+An) ∈ B(R).
Thus⋃nAn ∈ C By π − λ theorem, any shift of a Borel set is a Borel set.
Now we show µ(t+B) = µ(B). Let
A = A ∈ B(R) : λ(t+A) = λ(A), ∀t ∈ R.
Then A contains the π-system of finite unions of disjoint half intervals. Now Ω ∈ A. If A ⊂ B withA,B ∈ A,
t+B \A = (t+B) \ (t+A),
t+A ⊂ t+B, hence
µ(t+B \A) = µ(t+B)− µ(t+A) = µ(B)− µ(A) = µ(B \A).
Thus B \A ∈ A.
If An ∈ A is an increasing sequence, so is t+An, and⋃n(t+An) = t+
⋃nAn,
µ(t+∞⋃n=1
An) = limn→∞
µ(t+An) = limn→∞
µ(An) = µ(⋃n
An).
Thus A is a λ system, concluding the proof.
Proposition 3.5.3 Two non-trivial translation-invariant Borel measures on R which assign a finite mea-sure to bounded intervals are constant multiples of each other.
Proof Let µ and ν be translation invariant measures. Then, for all n, p, q ≥ 1,
µ((0, 1]) = qµ((0,1
q]), µ([0, p/q]) = pµ((0,
1
q]) = (p/q)µ((0, 1]).
Similarly, ν([0, n)) = (p/q)ν((0, 1]). The measurements ν((0, 1]) and ν((0, 1]) are non-zero, for other-wise the measure is trivial. Set c = µ([0, 1)]/ν([0, 1).
Then µ([0, p/q]) = (p/q)µ([0, 1)] = cν([0, p/q)). Hence, µ(I) = cν(I) for all I ∈ C, where Cdenotes the collection of all half-open intervals with finite rational length. Now, C is a π-system andσ(C) = B(R). Moreover, we have ∪k≥1(−k, k] = R, and µ and cν assign the same finite mass to(−k, k] ∈ C for all k ≥ 1. By Theorem 3.5.1, we deduce that µ = cν.
Chapter 4
Measurable functions
A map f from one measurable space (Ω,F) to another measurable space (X ,G) is said to be measurableif the pre-image of any measurable set is a measurable set. We will call a function f between twotopological spaces (Borel) measurable if f−1(A) is a Borel set for every Borel set A.
Let (X ,B) be a measurable space and Ω a set. A function f : Ω→ X is always measurable when Ω
is given the the pre-image σ-algebra σ(f), this is the smallest σ-algebra on Ω such that f : Ω → X ismeasurable.
4.1 Examples
Example 4.1 If X is a discrete space, let the power set be its σ-algebra. Let Y be another space with aσ-algebra. Then any function f : X → Y is measurable.
If f is identically 1 on a set A and zero everywhere else, it is called the indicator function of A. Wedenote it by 1A:
1A(ω) =
1, if ω ∈ A0, if ω 6∈ A.
The indicator function 1A of a set is a measurable function if and only if A is a measurable set.
Example 4.2 If f : E → R takes only a finite number of values a1, . . . , ak. Then, f is measurableif and only if f = ai is measurable. Also, f can be written in the following form: f =
∑ki=1 ai1Aj .
(Just take Aj = x : f(x) = aj).
39
4.1. EXAMPLES 40
4.1.1 Simple functions
A simple functions is a measurable function taking on a finite number of distinct values. It has rep-resentation as linear combinations of indicator functions of measurable sets. Representations are notnecessarily unique. For example take the domain space to be R, let A1 = [1, 2], A2 = (2, 6], andA3 = (−∞, 1) ∪ (4,∞). Then, f = 21A1 + 31A2 + 01A3 . We often do not include the last term withzero value, but sometimes it is included for the convenience that A1 ∪A2 ∪A3 = R. The representationis not unique,
f = 21A1∩Q + 31A2∩Q + 21A1∩Qc + 31A2∩Qc .
Also,
f = 21([1,3] + 1(2,3] + 31(3,6].
let us now give one popular definition for simple functions.
Definition 4.1.1 Let (X ,F) be a measurable space. A function of the form f(x) =∑N
i=1 ai1Ai(x)
where ai ∈ R and Ai ∈ Fi is said to be a simple function.
Proposition 4.1.1 If Ai are measurable sets. Then f =∑∞
i=1 ai1Ai is measurable.
Proof If B is a Borel set,
f−1(A) = ∪i:ai∈BAi.
This is a countable union and belongs to F if every Ai ∈ F .
Remark 4.1.2 Suppose that f =∑∞
i=1 ai1Ai is measurable. If the numbers ai are distinct, and Aiare disjoint sets, thenAi is necessarily measurable. Indeed, f−1(ai) = Ai. So for f to be measurable,Ai must belong to F .
If a1 = a2, we can conclude f−1(a1) = A1 ∪ A2 is measurable, but we may not have any furtherinformation to conclude the measurability of the sets A1 and A2.
Definition 4.1.2 Suppose that a simple function f takes the following non-zero values ai, i = 1, . . . , N.Then
f(x) =n∑i=1
ai1f−1(ai)
is said to be the canonical representation for f . Set Ai = f−1(ai), then Ai are pairwise disjoint mea-surable sets and
⋃Ni=1Ai = X .
4.1. EXAMPLES 41
Exercise 4.1.1 Write the following function in the form∑∞
i=1 ai1Ai where Ai are pairwise disjoint.
f(x) =
2, x ∈ A1 = (−2π,−π]
3, x ∈ A2 = [1, 4) ∪ (7, 8)
1
2, x ∈ A3 = (4, 6]
5, x ∈ A4 = (9,∞) ∩Q0, otherwise
Solution: The canonical representation is
f(x) = 21A1 + 31A2 +1
21A3 + 51A4 .
We see that A1, A2, A3, A4, (∪4i=1Ai)
c is a partition of X .
Example 4.3 A step function f : R → R is a piecewise constant function. Steps functions with a finitenumber of steps are simple functions.
Theorem 4.1.3 Let (X ,G) be a measurable space. Then,
• Every measurable function f : X → R is the pointwise limit of simple functions:
f(x) = limn→∞
fn(x),
where fn ∈ E(G). Furthermore we can arrange so that |fn| ≤ |f |.
( If f is bounded, the convergence is uniform.)
• If fn ≥ 0 we may choose fn to be non-negative and increasing so f = supn≥1 fn.
Proof Step 1. We prove the second claim first. Let us consider dyadic partitions of the interval [0, n].Let us cut each length 1 interval into 2n pieces of equal length (this is called a dyadic partition). Thedyadic partitions have an advantage: any partition points from current level contains all partition pointsfrom the previous levels.
Let us define the measurable sets:
Anj =
x :
j
2n< f(x) ≤ j + 1
2n
, 0 ≤ j ≤ 22n − 1.
We approximate f with the value j2n on Anj and define
gn =
n(22n−1)∑j=0
j
2n1Anj .
4.1. EXAMPLES 42
Then on x : 0 ≤ f(x) < n, |fn(x)− f(x)| ≤ 12n , so
limn→∞
fn(x) = f(x).
Also fn increases with n and fn(x) ≤ f(x) for every x.
Step 2. For the first claim, we write f = f+ − f− where
f+ = max(f, 0) f− = max(−f, 0).
Let (f+)n be an non-decreasing sequence of simple functions with limit f+. Similarly, Let (f−)n be annon-decreasing sequence of simple functions with limit f−. Set
gn = (f+)n − (f−)n.
Then|gn(x)| = (f+)n(x) + (f−)n(x) ≤ f+(x) + f−(x) = |f(x)|.
For every x ∈ f−1([−n, n]),
|f(x)− gn(x)| = |(f(x)− (f+)n + (f−)n)(x)|≤ |f+(x)− (f+)n(x)|+ |f−(x)− (f−)n(x)|
≤ 2
2n.
This means that fro every x, limn→∞ gn(x)→ f(x). If |f | ≤M ≤ n0 is bounded, then for n ≥ n0,
|f(x)− gn(x)| ≤ 2
2n,
showing the uniform convergence.
4.1.2 Product σ-algebras
To understand the algebra of measurable functions, we introduce product σ-algebras.
Definition 4.1.3 Let (X ,F) and (Y,G) be two measurable spaces. We define the product σ-algebra(also called tensor σ-algebra) on the product space X × Y to be that generated by product sets
F1 ⊗F2 = σA×B : A ∈ F , B ∈ G.
This definition extends to any number of measurable spaces. We give the definition for product of acountable number of measurable spaces.
4.2. PROPERTIES OF MEASURABLE FUNCTIONS 43
Definition 4.1.4 Let (En,Fn) be measurable spaces. The tensor or product σ-algebra on the productspace Π∞n=1En is
σ
ΠnAn, : An ∈ Fn.
Remark 4.1.4 If X is a separable metric space, the metric topology satisfies the second axiom of count-ability, i.e. there exists a countable base. This countable base generates B(X). Let (Xα, α ∈ I) beseparable metric spaces. The product topology on XI is the coarsest topology such that the projectionsare continuous, it is generated by sets of the form Π−1
α (Aα) where α ∈ I and Aα are open sets of Xα.The coordinate mappings are measurable with respect to the Borel σ algebra on the product space and⊗αB(Xα) ⊂ B(Πα∈IXα).
Theorem 4.1.5 Let (Xn) be separable metric spaces and set X = Π∞i=1Xi. Then
B(X) = ⊗∞n=1B(Xn).
For a proof see Theorem 1.10 in the book by Parthasarathy.
4.2 Properties of measurable functions
Proposition 4.2.1 Letf1 : (X ,F) → (X1,F1) and f2 : (X ,F) → (X2,F2) be measurable functions.Then
ψ = (f1, f2) : X → X1 ×X2
is measurable, when the target space is given by the product σ algebra F1 ⊗F2.
Proof Apply Proposition 4.2.3, it is sufficient to know that h−1(A × B) where A,∈ F1 and B ∈ F2 ismeasurable. But,
ψ−1(A×B) = f−11 (A) ∩ f−1
2 (B) ∈ F ,
completing the proof.
Proposition 4.2.2 If f : (X1,F1) → (X2,F2) and g : (X2,F2) → (X3,F3) are measurable functionsthen so f g : X1 → X3 is measurable.
Proof Let A ∈ F3 then f−1(A) ∈ F2. Furthermore,
(f g)−1(A) = x ∈ X1 : f g ∈ A = x ∈ X1 : g ∈ f−1(A) ∈ F1
using g is measurable.
Proposition 4.2.3 Let (Ω,F) and (X ,B) be measurable spaces. Suppose that B is generated by C andf : Ω→ X . If f−1(A) ∈ F whenever A ∈ C, then f is measurable.
4.2. PROPERTIES OF MEASURABLE FUNCTIONS 44
Proof B′ = A ∈ B : f−1(A) ∈ F ⊃ C is a σ-algebra, check with Lemma 2.1.1, it must agree withB = σ(C).
Recall that If X,Y are metric spaces and if f : X → Y is a continuous, then f is Borel measurable.(Since continuity means that the pre-image of an open set is an open set. Since B(Y ) is generated byopen sets which forms a π-system
C := A ∈ B(Y ) : f−1(A) ∈ B(X)
contains all open sets, C = B(Y ).)
4.2.1 Measurable functions on the real line
Definition 4.2.1 If f : Ω → R (or talking values in a complete separable metric space) with the Borelσ-algebra, and (Ω,F ,P) a probability space, we tend to call f a random variable.
The Borel σ-algebra of R is generated by the set of open intervals, equally they are generated by theset of all closed intervals. We state some of these below, they follow straightforwardly from Proposition4.2.3.
Proposition 4.2.4 Let (X,B) be a measurable space. A function f : X → R is Borel measurable if oneof the following conditions hold:
1. for each real number a, x : f(x) > a is a measurable set;
2. for each real number a, x : f(x) < a is a measurable set;
3. for each real number a, x : f(x) ≥ a is a measurable set;
4. for each real number a, x : f(x) ≤ a is a measurable set;
For the first just note that sets of the form (a,∞) generates the Borel σ-algebra, apply Proposition 4.2.3to. conclude. Similarly the other statements chan be shown.
Proposition 4.2.5 Let (X,F) be a measurable space. If f, g : X → R are measurable functions, thenso are the following functions.
1. f + g
2. fg,
3. max(f, g)
4.2. PROPERTIES OF MEASURABLE FUNCTIONS 45
4. f+ = max(f, 0), f− = max(−f, 0)
5. |f | = f+ + f−.
Proof Firstly (f, g) : X → R2 is Borel measurable. Set ψ : X → R2 by
ψ(x) = (f(x), g(x))
and define ϕ : R2 → R by ϕ(x, y) = x+ y, a continuous function. Then f + g = ϕ ψ is a compositionof measurable function, so measurable. Choosing ϕ(x, y) = xy, for the analogous proof for fg. Nowfor any a > 0, max(f, g) < a = f < a ∩ g < a ∈ F , so max(f, g) is measurable. Part 4 followsfrom part 3, and 5 follows part 1 and part 4.
Note max(f, g) = (f+g)+|f−g|2 .
Proposition 4.2.6 Let (X,B) be a measurable space. If fn : X → R be measurable functions, so arethe following:
1. supn≥1 fn,
2. infn≥1 fn;
3. lim sup fn,
4. lim inf fn.
Proof Firstlysupnfn > a = ∪nfn ≥ a,
belongs to F , as each fn ≥ a does (the above is x : infn≥1 fn(x) < a = ∪nx : fn(x) < a), sosupn≥1 fn is measurable. Also,
infn≥1
fn < a = ∩nfn < a ∈ F ,
showing that infn≥1 fn is measurable. Since
lim supn→∞
fn = infn
supk≥n
fk, lim inf fn. = supn
supk≥n
fk,
they are both measurable.
Proposition 4.2.7 If fn is a sequence of Borel measurable functions with f = limn→∞ fn(x) exists atevery x, then f is a Borel measurable function.
4.2. PROPERTIES OF MEASURABLE FUNCTIONS 46
Example 4.4 If (X ,F) and (Y,G) are measurable spaces. Suppose that µ is a measure. Let f : X → Yand g : X → Y . Let A = f 6= g.
1. Suppose both f and g are measurable, then f − g is a measurable function, consequently, A =
f − g 6= 0 is a measurable set.
2. Suppose that F is complete with respect to µ. Suppose A is contained in a null measurable set.Since F is complete, then A is measurable and µ(A) = 0. Then f is measurable implies that g ismeasurable. Indeed, for any B ∈ G,
g ∈ B = (f ∈ B ∩Ac)⋃
(g ∈ B ∩A) ∈ F .
We use that (g ∈ B ∩ A) is contained in a null set and therefore measurable, Ac and f ∈ Bafre measurable.
Definition 4.2.2 If f = g on a set of full measure, we say f = g almost surely. This is abbreviated byf = g a.s. or f = g a.e.
Example 4.5 (This is not intended to be delivered in class) If fn is a sequence of Borel measurablefunctions. Set
f(x) =
limn→∞
fn(x), if the limit exists at x,
0, otherwise
Then f is measurable.
Proof SetU := x : lim
n→∞fn(x) exists = x : lim sup
n→∞fn(x) = lim inf
n→∞fn(x)
Since lim supn→∞ fn(x) and lim infn→∞ fn(x) are both measurable, U is a measurable set. Define
Z := (lim inf fn, lim sup fn).
( Let ∆ = (b, b) : b ∈ R denote the the diagonal subset of R2, then U = Z−1(∆).)
Note that f(x) = 0 on U c. Then for any a < 0,
x : limn→∞
fn(x) < a = x : lim supn→∞
fn(x) < a,
which is measurable since lim supn→∞ fn is measurable. Let a ≥ 0, then
x : limn→∞
fn(x) < a = x : lim supn→∞
fn(x) < a ∪ U,
which is also a measurable set.
In analysis we frequently encounter Lebesgue measurable sets and functions, this refers to the σ-algebra
4.3. FACTORISATION LEMMA 47
4.3 Factorisation Lemma
Lemma 4.3.1 (The factorization Lemma) Let (X ,G) be a measurable space and Ω a set. Let Y :
Ω → X be a measurable function and consider the measurable space (Ω, σ(Y )). Then X : Ω → R isσ(Y )-measurable if and only if there exists a measurable function f : Ω→ R such that X = f Y .
Proof ( not to be delivered in class) Recall that σ(Y ) = Y −1(B) : B ∈ G). Suppose that f : X → Ris measurable, since Y is measurable with respect to σ(Y )/G, then the composition of the measurablefunctions f Y is measurable.
For the converse, suppose that X is σ(Y )-measurable. We first assume X = 1A where A ∈ σ(Y ).Then A = Y −1(B) for some B ∈ G and
1B Y = 1ω:Y (ω)∈B = X.
So X is the composition of the measurable function 1B with Y .
Suppose that X takes only a countable number of values (an) and write An = X−1(an). Since Xis σ(Y )-measurable, there exist sets Bn ∈ G such that Y −1(Bn) = An. Define now the sets
Cn = Bn \⋃p<n
Bp.
These sets are disjoint and one has again Y −1(Cn) = An. Setting f(x) = an for x ∈ Cn and f(x) = 0
for x ∈ X \⋃nCn, we see that f has the required property.
Suppose that X ≥ 0, we may approximate it with an increasing sequence of simple functions Xn.By the paragraph above, there exists gn : X → R simple functions such that Xn = gn(Y ). In fact, gnis an increasing sequence and let g = supn gn. Since g is a pointwise limit of measurable functions, g isalso measurable. We note g(Y (ω)) = supn gn(Y (ω)) = supnXn(ω)) = X(ω). We complete the proofwith X = X+ −X−.
4.4 Appendix*
4.4.1 Littlewood’s three principles
Every Borel measurable set on R is nearly a finite union of intervals. Every (measurable) function isnearly continuous, every convergent sequence of measurable functions is nearly uniform convergent.
4.4. APPENDIX* 48
4.4.2 Product σ-algebras
Let I be an arbitrary index and let (Eα,Fα), α ∈ I be a family of measurable spaces. The tensorσ-algebra, also called the product σ-algebra, on E = Πα∈IEα is defined to be
⊗α∈IFα = σπ−1α (Aα) : Aα ∈ Fα, α ∈ I.
Here πα : E = Πα∈IEα → Eα is the projection (also called the coordinate map). If x = (xα, α ∈ I)
is an element of E, πα(x) = xα. The tensor σ-algebra is the smallest one such that for all α ∈ I , themapping
πα : (E,⊗α∈IFα)→ (Eα,Fα)
is measurable.
Let πi : Rn → R be the projection to the ith component. The tensor σ-algebra ⊗nB(R) is
⊗nB(R) = σπ−1i (A) : A ∈ B(R).
For example π−11 (A) = A× R× · · · × R.
4.4.3 The Borel σ-algebra on the Wiener Space
The set of all maps from [0, 1] to Rd can be denoted as the product space (Rd)[0,1]. The tensor σ-algebra⊗[0,1]B(Rd) is the smallest space such that the projections πt where πt(σ) = σ(t) is measurable. Theproduct topology is the smallest topology such that the projections are continuous. Hence the Borelσ-algebra of the product topology is larger than the tensor σ-algebra.
Let W d = C([0, T ],Rd) be the space of continuous functions from [0, 1] to Rn. It is a Banach spacewith the supremum norm:
‖g‖W := supt∈[0,1]
‖g(t)‖.
Any measurable set in the tensor σ-algebra is determined by the a countable number of projections. Theaction to determine whether a function is continuous or not cannot be determined by a countable numberof operations. Hence W d is not a measurable set in the tensor σ-algebra. (once we know a function iscontinuous, the function can be determined by its values on a countable dense set.)
Example 4.6 A sample continuous real valued stochastic process (Xt, 0 ≤ t ≤ 1) on a probability space(Ω,F , P ) can be considered as a function, denoted by X , from (Ω,F) to W :
X(ω)(t) = Xt(ω).
For each time t, and a ∈ Rn, ω : |Xt(ω)− a| < ε belongs to F as Xt is measurable.
4.4. APPENDIX* 49
We show that the function X : Ω → W is Borel measurable. Take f ∈ W and the open ball in theBanach space W centred at f with radius ε > 0. Its pre-image by X is:
ω : sup0≤t≤1
|Xt(ω)− fti | < ε = ω : sup0≤ti≤1,ti∈Q
|Xti(ω)− fti | < ε
= ∩0≤ti≤1,ti∈Qω : |Xti(ω)− fti | < ε.
Since for each i, ω : |Xti(ω) − fti | < ε ∈ F , the intersection of the countable number of such setsbelongs to F .
Chapter 5
Integration
5.1 Integration
Throughout this chapter (X ,F , µ) is a measure space with a σ-finite measure µ. (Previously I assumethe measure is complete, this assumption is not needed for defining integration.)
By the integral of a Borel measurable function f : X → R, we mean a value assigned to this function.Let us call this value I(f). So I : H → R is a function on a set of functionsH. For I to qualify to be anintegral it should have some properties which we describe below.
Property 1 (IP) Let H be a class of Borel measurable functions f : X → R. Suppose that to everyf ∈ H, we assign a number or∞ which we denote by I(f). We say I satisfies the IP if the followingholds:
1. (Linearity) If f, g ∈ H and a, b ∈ R, we have
I(af + bg) = aI(f) + bI(g).
2. (Monotonicity) If f ≤ g then I(f) ≤ I(g)
Notation. We use∫f for the integral of f . To emphasize the measure used we also use
∫fdµ for
the same integral. Sometime we emphasize the domain of integration as follows∫X fdµ. Sometimes we
explicitly express the argument of integration by a dummy variable:∫f(x)µ(dx), this is also written as∫
f(x)dµ(x). Also if A is a measurable set, we set∫A f =
∫f1A etc.
51
5.2. OUTLINE OF THE CONSTRUCTION 52
5.2 Outline of the construction
The procedure for defining the integral of a function f : X → R w.r.t. a measure µ is as follows.
1. If f =∑N
i=1 ai1Ai is a simple function (the assumption that Ai are measurable is included in thedefinition), we define ∫
Xf dµ =
N∑i=1
ai µ(Ai).
2. If f is a positive function we define ∫Xf dµ = sup
g
∫g dµ
where the supremum is taken over simple functions g with 0 ≤ g ≤ f . We say f is integrable if∫f dµ <∞.
3. Let f : X → R be a measurable function. Let f+ = max(f(x), 0) and f−(x) = max(−f(x), 0)
be the positive and negative parts of f , then f = f+ − f−. If both f+ and f− have finite integralswe say f is integrable and define∫
Xf dµ =
∫Xf+ dµ−
∫Xf− dµ.
The set of integrable functions is denoted by L1. Observe that in this case∫|f |dµ =
∫f+dµ+
∫f−dµ.
5.3 Integral of simple functions
Let us denote by E the set of measurable functions
E =
n∑i=1
ci1Ai : Ai ∈ F , ci ∈ R, n = 1, 2, . . .
.
Elements of E are called simple functions. We also denote by E+ the set of positive simple functions:
E+ =
n∑i=1
ci1Ai : Ai ∈ F , ci > 0, n = 1, 2, . . .
.
A simple function can have more than one representations. Take for example
f(x) =
1, x ∈ [0, 1]
1, x ∈ (2, 3]
5.3. INTEGRAL OF SIMPLE FUNCTIONS 53
Thenf(x) = 1[0,1](x) + 1[2,3](x) = 1[0,1)(x) + 11(x) + 1[2,3](x).
There exists a unique canonical representation, up to re-arranging the orders of the sets. By the canonicalform we mean
f =
n∑i=1
ai1f−1(ai),
where ai are the set of non-zero values of f (they are of course distinct numbers), this is unique up tore-ordering.
Exercise 5.3.1 A simple functions is a measurable function taking only a finite number of values, ameasurable function taking only a finite number of values is a simple function.
Proof If f is a measurable function taking only a finite number of non-zero values c1, . . . , cn, then itcan be expressed in the form of
∑ni=1 ci1f−1(ci) and f−1(ci) is measurable.
We show the converse: f =∑n
i=1 ai1Ai only takes a finite number of values.
Remark 5.3.1 An easier way to show that f takes only a finitely many values is as below: f is a linearcombination of n indicator functions. Since an indicator function takes only value 0 or 1, such a functionthus can take at most 2n values.
We next show this in at hard way, the proof itself has some value. First suppose that
f = a11A1 + a21A2 .
SetB1 = A1 \A2, B2 = A2 \A1, B3 = A1 ∩A2.
Thenf = a11B1 + a21B2 + (a1 + a2)1B3 ,
so f takes at most the following values: 0, a1, a2, a1 + a2.
We now prove by induction to show that every f of the form∑n
i=1 ai1Ai can be written in the form
f =
m∑i=j
bj1Bj ,
where Bj are disjoint. We have shown this holds for n ≤ 2. Suppose this holds for k ≤ n. Then,
f =
n+1∑i=1
ai1Ai =
m∑i=j
bj1Bj + an+11An+1 ,
5.3. INTEGRAL OF SIMPLE FUNCTIONS 54
where Bi are disjoint. SetCj = Bj \An+1, j = 1, . . . ,m,
Cm+j = Bj ∩An+1, C2m+1 = An+1 \ (n⋃j=1
Bm).
Then Cj are disjojnt sets and
f =m∑j=1
bj1Cj + an+11C2m+1 +m∑k=1
(bk + an+1)1Bk∩An+1 .
Thus f can be written as a finite linear combination of indicator functions of disjoint sets, among otherthings this also shows that f takes a finite number of values.
Exercise 5.3.2 1. If f, g are simple functions, so are f + g and fg. In particular E is a vector space.
2. If f is a simple function, f+, f− are both simple functions.
Proof Let us denote by ai the distinct values of f and bj the distinct values for g. We write
f =n∑i=1
ai1f−1(ai), g =m∑j=1
bj1f−1(bj),
SetEi = f−1(ai), Fj = g−1(bj).
Then (Ei ∩ Fj) ∩ (Ek ∩ Fl) = φ if (i, j) 6= (k, l). It is clear that
f + g =∑
1≤i≤n,1≤j≤m(ai + bj)1Ei∩Fj.
showing f + g is a simple function. The proof for fg is similar. It is clear that f+ = max(f, 0),f− = min(f, 0) are also simple functions. They are obtained by flipping ai to max(ai, 0) or max(−ai, 0).
Definition 5.3.1 Let f =∑n
i=1 ai1Ai , in the canonical representation. Suppose that µ(Ai) < ∞ forevery i, we then set
I(f) =n∑i=1
aiµ(Ai).
If in addition f ∈ E+, then ai > 0, we may remove the assumption µ(Ai) = ∞, I(f) is still definednon-ambiguously.
Of course I(f) takes the value ∞ if and only if one of the Ai’s has infinite measure. Observe that iff = 0 almost surely then I(f) = 0.
5.3. INTEGRAL OF SIMPLE FUNCTIONS 55
Lemma 5.3.2 If f ∈ E is represented as
f =M∑j=1
dj1Bj
where Bi are pairwise disjoint, then
I(f) =
M∑i=1
diµ(Bi).
Proof Observe that ⋃j:dj=ai
Bj = Ai.
Also µ(Ai) =∑j:dj=ai µ(Bj) by additive property of measures. Hence
I(f) =
n∑i=1
aiµ(Ai) =
n∑i=1
ai∑
j:dj=ai
µ(Bj)
=M∑j=1
djµ(Bj),
completing the proof. (It is clear if µ(Ai) =∞ then one of the µ(Bj) =∞.)
We show next that I has the IP.
Proposition 5.3.3 Let f, g ∈ E , vanishing outside of a set of finite measure. Then
1. for any a, b ∈ R,I(af + bg) = aI(f) + bI(g).
2. If f ≤ g a.e., then I(f) ≤ I(g).
Proof Let Ai ( respectively Bi) be the sets in the canonical representation for f (respectively forg). Let A0 = f−1(0) and B0 = f−1(0). Then, Ai ∩ Bj forms a finite collection of disjoint measurablesets, denote this collection by Ei : i ≤ N. Then
f =
N∑i=1
ai1Ei , g =
N∑i=1
bi1Ei
and
af + bg =
N∑i=1
(a ai + b bi)1Ei .
5.4. INTEGRATION OF NON-NEGATIVE FUNCTIONS 56
Use Lemma 5.3.2, we see that
I(af + bg) =
N∑i=1
(a ai + b bi)µ(Ei) = aI(f) + bI(g).
Next if f ≤ g a.e. then f > g is a measurable set (by comparing their values on Ei as above), and
I(f)− I(g) = I(f − g) ≥ 0.
This proved the monotonicity.
Example 5.1 The Dirichlet function f can be written as f = 1Qc . Its integral w.r.t. the Lebesguemeasure on [0, 1] is λ(Qc ∩ [0, 1]) = 1.
Example 5.2 Let a, b be real numbers. Consider the measure space ([a, b],B([a, b]). Let µ be the theLebesgue measure on [a, b]. It is determined by its value on sets of the form A = ∪ni=1(ci, di) (union ofdisjoint intervals),
µ(A) =n∑i
(di − ci).
Consider a = t0 < t1 < · · · < tn = b. Let
f(x) = f010(x) +
n−1∑j=0
aj1(tj ,tj+1](x).
Since µ(0) = 0, ∫ b
af(x)dx =
n−1∑j=0
aj(tj+1 − tj).
5.4 Integration of non-negative functions
We will allow a non-negative function to take values in the extended real line. We say f is measurableif A = f < ∞ is measurable and f−1(A) ∈ B(R) for any A ∈ B(R). The principle is as follows: iff = ∞ on a set of non-zero measure, we should assign∞ to its integral; if f < ∞ almost surely, weignore those values that are∞.
Definition 5.4.1 If f : X → [0,∞] is a measurable function, we define the integral of f to be∫fdµ = supI(g) : 0 ≤ g ≤ f, g ∈ E.
We say f is integrable if∫fdµ <∞.
5.4. INTEGRATION OF NON-NEGATIVE FUNCTIONS 57
Remark 5.4.1 If µ is a finite measure, then a bounded measurable function is integrable. A special caseis when µ is the Lebesgue measure restricted to any bounded interval.
Definition 5.4.2 If A is a measurable set and f1A is integrable, we define∫Af dµ =
∫f 1A dµ.
Proposition 5.4.2 1. If f ∈ E+ then I(f) =∫fdµ.
2. the integral satisfies monotonicity.
Proof Firstly, let f ∈ E+. We may take g = f ∈ E+, so
supI(g) : g ≤ f, g ∈ E+ ≥ I(f).
Next, by monotonicity for I , if g ∈ E and g ≤ f , then I(g) ≤ I(f). Hence
supI(g) : g ≤ f, g ∈ E+ ≤ I(f).
This proves that I(f) =∫f dµ for f ∈ E+. Now if f1 ≤ f2 are non-negative and measurable, then∫
f1 dµ = supI(g) : g ≤ f1, g ∈ E+ ≤ supI(g) : g ≤ f2, g ∈ E+ =
∫f2 dµ
proving the monotonicity.
We cannot directly prove the linearity and try to obtain this from that of simple functions by takinglimits.
Definition 5.4.3 We say fn → f a.e. if there exists a null set Asuch that fn → f everywhere outside ofthe null set A.
Theorem 5.4.3 (Fatou’s lemma) Let fn be a sequence of non-negative measurable functions that con-verges almost surely on a set E to a function f , then∫
fdµ ≤ lim infn→∞
∫fndµ.
Proof By the definition we may assume that fn → f everywhere. It is sufficient to show that for anyg ∈ E+ with g ≤ f , ∫
g dµ ≤ lim infn→∞
∫fn dµ.
Step 1. Suppose that∫g <∞, set
X0 = x : g(x) > 0.
5.4. INTEGRATION OF NON-NEGATIVE FUNCTIONS 58
Since g takes only a finite number of values µ(X0) <∞ andM = max g <∞ ( We only need to restrictto X0 as
∫X0g =
∫X g.)
Fix ε > 0. Since g ≤ f = limn→∞ fn, for any x there exists N(x, ε) s.t. for n > N(x, ε),
fn(x) ≥ (1− ε)g(x).
LetAk = x ∈ X0 : fn(x) > (1− ε)g(x), ∀n ≥ k.
Then An is an increasing sequence and ∪nAn = X0. By the continuity property of measures,
limn→∞
µ(X0 \An) = 0.
Then for n ≥ k,
(1− ε)∫Ak
g ≤∫Ak
fn ≤∫fn.
Also,
(1− ε)∫Ak
g = (1− ε)∫g − (1− ε)
∫Ack
g.
Hence
(1− ε)∫g − (1− ε)
∫Ack
g ≤∫Ak
fn.
Since g ≤M , we may take ε to 0, ∫g −
∫Ack
g ≤ lim infn→∞
∫fn,
for every n. Take k →∞ to conclude∫g ≤ lim infn→∞
∫fn.
Step 2. Suppose that∫g =∞, since g take only a finite number of values, and g ≥ 0, it is necessary
that there exists a set of infinite measure on which g takes a positive value a > 0. Denote this set by A.Set
Ak = x : fn(x) ≥ 1
2a, ∀n ≥ k.
Then An is increasing. Since limn→∞ fn ≥ g, we have ∪nAn = A and limn→∞ µ(An) = µ(A) = ∞.For all n ≥ k, ∫
fn ≥∫
1
2a1Ak =
1
2aµ(Ak).
Thus,
lim infn→∞
∫fn ≥
1
2aµ(Ak).
Take k →∞ lim infn→∞ fn ≥ ∞, this completes the proof.
5.4. INTEGRATION OF NON-NEGATIVE FUNCTIONS 59
Theorem 5.4.4 (Monotone convergence theorem) If fn is a sequence of non-negative measurable func-tions converging almost everywhere to f . Suppose that fn ≤ f for all n. Then∫
limnfn = lim
n
∫fn.
Proof If 0 ≤ g ≤ fn then 0 ≤ g ≤ f , hence for every n,∫fn ≤
∫f .
lim supn
∫fn ≤
∫f.
By Fatou’s lemma,
lim inf∫fn ≥
∫limnfn =
∫f,
completing the proof.
The definition of integrals for non-negative functions using the value supI(g) : 0 ≤ g ≤ f, g ∈ Eis not easy to use. The monotone convergence theorem allow us to use a sequence. Choose fn ∈ E withfn ↑ f (this exists by Theorem 4.1.3), then∫
fdµ = limn→∞
∫fn.
The name ‘Monotone convergence theorem’ comes from the following Corollary.
Corollary 5.4.5 If 0 ≤ f1 ≤ f2 ≤ . . . is a sequence of increasing real valued measurable functions,and let f = supn fn. Then f is integrable if and only if supn
∫fn <∞, and then∫
supnfn = sup
n
∫fn.
This means in particular that
Corollary 5.4.6 If fn is a sequence of increasing simple functions with limit f , then∫f = lim
n→∞
∫fn.
Proposition 5.4.7 If f, g are non-negative and measurable, then the following holds:(1)[Linearity] for any a, b ≥ 0, ∫
(af + bg) = a
∫f + b
∫g.
(2) [Monotonicity] If f ≥ g then∫f ≥
∫g.
(3)∫f = 0 if and only if f = 0 a.e.
5.4. INTEGRATION OF NON-NEGATIVE FUNCTIONS 60
Proof (1) Note that linearity holds on E . Now we approximate f, g with positive simple increasingsequence of functions. Take fn ∈ E+, gn ∈ E+ such that
fn ↑ f, gn ↑ g.
with fn ≤ f and gn ≤ g. Then∫(afn + bgn)dµ =
∫fndµ+
∫bgndµ,
We apply the monotone convergence theorem to take the limit inside the integrals, proving the linearity.
(2) It is sufficient to show f ≥ 0 a.e. then∫f ≥ 0. This is true as the simple functions approximating
f is non-negative.
(3) Let f ≥ 0 a.e. If f = 0 a.e. it is clear that∫fdµ = 0 (e.g. use monotonicity.)
Suppose that∫f = 0. Let An = f ≥ 1
n. Then f > 0 = ∪∞n=1An.
µ(An) =
∫1Andµ ≤
∫1Annfdµ ≤ n
∫fdµ = 0.
Thus
µ(f > 0 ≤∞∑n=1
µ(An) = 0,
concluding the proof.
These shows that we can exchange summation (over a finite number of functions) with integration.For a sum of non-negative functions, this holds also with infinitely many summands, as to be explainedbelow.
Corollary 5.4.8 (Exchange of order of summation and integration) Suppose that fn is a sequence ofnon-negative measurable functions, then∫ ∞∑
n=1
fn =∞∑n=1
∫fn.
Proof By Proposition 5.4.7, finite sum can be taken out of the integral. Fatou’s lemma allow us to takethe limit. ∫ ∞∑
n=1
fn =
∫limN→∞
N∑n=1
fnMonotone
= limN→∞
∫ N∑n=1
fn
linearity= lim
N→∞
N∑n=1
∫fn =
∞∑n=1
∫fn,
concluding the proof.
5.5. INTEGRABLE FUNCTIONS 61
5.5 Integrable functions
Definition 5.5.1 Let f : X → R be measurable. If both f+ and f− are integrable we say that f isintegrable, in which case we define its integral to be:∫
f =
∫f+ −
∫f−.
Note that f is integrable if and only if |f | is integrable. A bounded measurable function is integrablewith respect to a finite measure.
Definition 5.5.2 We denote by the class of integrable functions by L1. The notation L1(X ) is used toemphasize the space. Also L1(µ), L1(X ;µ) are used to emphasize the measure.
Proposition 5.5.1 If f, g are measurable functions. Then
(1) (linearity) For any a, b ∈ R, ∫(af + bg) = a
∫f + b
∫g.
(2) (Monotonicity) If g ≤ f a.e., then∫g ≤
∫f .
(3)∫|f | = 0 if and only if f = 0 almost surely.
Proof Part one follows from splitting f = f+ − f− and g = g+ − g− and the same property fornon-negative functions. If g ≤ f then g − f ≥ 0, so
∫(g − f) ≥ 0. Monotonicity now follows from the
linearity. The last one follows from Proposition 5.4.7 and use the property |f | ≥ 0.
Exercise 5.5.1 If f, g are real valued integrable functions, show the following statements hold:
1. If µ(A) = 0 then∫A f dµ = 0.
2. If∫A f = 0 for every measurable set A then f = 0 µ almost-everywhere.
3. |∫f | ≤
∫|f |
Hint: For part (2), use Proposition 5.4.7 and choose a suitable A.
Example 5.3 Recall that for x0 ∈ X , the Dirac measure at x0 is defined by:
δx0(A) =
0, if x0 ∈ A1, if x0 6∈ A
= 1A(x0).
Let f : X → R be measurable. Then f is integrable and∫X f dδx0 = f(x0).
5.6. FURTHER LIMIT THEOREMS 62
Example 5.4 Let p : X → [0,∞) be integrable. Set
µp(A) :=
∫Ap(x)µ(dx).
Then µp defines a measure.
Proof Firstly µp(φ) =∫φ p µ = 0. Next if A ∩B = φ, 1A∩B = 1A + 1B , So
µp(A ∪B) =
∫(1A + 1B)pdµ = µp(A) + µp(B).
Finally, if An is an increasing sequence of measurable sets with union A,
limn→∞
1An = 1∪∞n=1An= 1A,
showing that
µp(A) =
∫1Ap dµ =
∫limn→∞
1An pdµ = limn→∞
∫1Anp dµ = lim
n→∞µp(An).
Hence the σ-additive property holds.
Questions.Let x0 ∈ R. Can one find a function p : R→ R such that, for all A ∈ B(R), δx0(A) =
∫A p dλ?
Can one find a function p : R→ R such that, for all A ∈ B(R), λ(A) =∫A p dδx0?
Exercise 5.5.2 The answer are no and no. Prove it.
Two measures µ and ν on Ω are mutually singular if there are disjoint sets A1 and A2 such thatΩ = A1
⋃A2 with µ(A1) = 0 and ν(A2) = 0. This is denoted by µ ⊥ ν. The Lebesgue measure and
any Dirac measures δx are singular.
Definition 5.5.3 We say that µ is absolutely continuous with respect to ν if whenever ν(A) = 0, wehave µ(A) = 0. We write µ ν.
Later we will see that if µ and ν are two finite measures such that µ ν then we can found a a densityfunction p such that µ is given by integration of p with respect to ν. This is called the Radon-NikodymTheorem.
5.6 Further limit theorems
Theorem 5.6.1 (Fatou’s Lemma) Suppose that fn is a sequence of non-negative measurable functions,then ∫
lim infn→∞
fn dµ ≤ lim infn→∞
∫fn dµ.
5.6. FURTHER LIMIT THEOREMS 63
Proof Firstly,lim infn→∞
fn = limk→∞
infm≥k
fm.
Observe that infm≥k fm is an increasing sequence and infm≥k fm ≤ fk. So∫lim infn→∞
fn =
∫limk→∞
infm≥k
fm
= lim infk→∞
∫infm≥k
fm ≤ lim infk→∞
∫fk.
This completes the proof.
Theorem 5.6.2 (Dominated convergence theorem) Suppose that fn is a sequence of measurable func-tions converging to a function f almost everywhere. Suppose that there exists an integrable function gsuch that |fn| ≤ |g|, then
(1)
limn→∞
∫fn dµ =
∫limn→∞
fn dµ.
(2)
limn→∞
∫|fn − f | dµ = 0.
Proof Since |fn| ≤ g and g is integrable then so is fn, also f+n ≤ |g| and f−n ≤ |g|. By the monotone
convergence theorem,
limn→∞
∫f+n dµ =
∫limn→∞
f+n dµ, lim
n→∞
∫f−n dµ =
∫limn→∞
f−n dµ,
and proving (1). For (2) note that |fn − f | → 0 a.s. and
|fn − f | ≤ 2|f |.
So by part (1),
limn→∞
∫|fn − f | dµ =
∫limn→∞
|fn − f |dµ = 0.
Exercise 5.6.1 Prove part (2) of the dominated convergence theorem.
Definition 5.6.1 A sequence of measurable functions fn is said to converge to f in L1 if
limn→∞
∫|fn − f | dµ = 0.
5.7. EXAMPLES AND EXERCISES 64
Exercise 5.6.2 Show that if fn → f in L1 then,
limn→∞
∫fn dµ =
∫f dµ.
Definition 5.6.2 1. A measurable functions with the property∫|f |p <∞ are said to be in Lp where
1 ≤ p <∞.
2. If there exists a measurable function f such that∫|fn − f |pdµ → 0, then we say fn converges to
f is Lp, in which case f is also in Lp.
5.7 Examples and exercises
Exercise 5.7.1 If a bounded function f : [a, b]→ R is Riemann integrable, then it is Lebesgue measur-able and Lebesgue integrable. Furthermore, the integrals have the common value:∫
[a,b]f(x)dµ =
∫ b
af(x)dx.
Hint. Check that the integrals agree on functions of the form:
n∑i=1
αi1(ci,di)
where (ci, di) are disjoint intervals.
Let ∆ be the dyadic partition of [a, b]: tj = (b−a)2n . Consider
fn(t) =∑j
supf(x) : x ∈ [ti, ti+1]1(ti,ti+1](t).
Then fn is measurable and converges to f a.e.
Exercise 5.7.2 1. Show that if for two measures µ and ν,∫fdµ =
∫fdν for all f ∈ E , then L1(µ) =
L1(ν), and∫fdµ =
∫fdν for all integrable functions.
2. LetE be a measurable set and µE the restriction of µ to the trace σ-algebraFE = E∩A : A ∈ F.Show that ∫
Efdµ =
∫XfdµE .
Also f ∈ L1(µE) if and only if f1E ∈ L1(µ).
5.7. EXAMPLES AND EXERCISES 65
(1) By the construction of integrals,∫fdµ =
∫fdν for all non-negative functions. Thus
∫|f |dµ =∫
|f |dν, which are either both finite or both infinite. This implies that L1(µ) = L1(ν) and the integralsare the same for they are defined by the integrals of their positive and negative parts.
(2) If g is elementary, so is g1E and∫g1Edµ =
∫gdµE for all elementary g ∈ E . Let fn be an
increasing sequence of simple functions converging to f , then fn1E ↑ f1E . Hence∫Efdµ = lim
n→∞
∫fn1E dµ = lim
n→∞
∫fn dµE =
∫fdµE .
Exercise 5.7.3 Suppose that p : [a, b]→ R is a non-negative continuous function. Set
F (x) = F (a) +
∫ x
ap(t)dt.
Let f : [a, b]→ R be a continuous function. Describe the Lebesgue-Stilejes measure µF and the integral∫f dµF
in terms of Riemann integral.
Sketch. Let A ⊂ B([a, b]). Then,
∫1A dµF = µF (A) =
∫ d
cp(t)dt =
∫[a,b]
1A p(t) dt.
By linearity for any simple function f ,∫fdµF =
∫ b
af(t)p(t)dt.
By Fatou’s lemma we can pass this to non-negative functions, and then by linearity to integrable func-tions: ∫
[a,b]fdµF =
∫ b
af(t)p(t)dt.
Example 5.5 Let fn = n1[0, 1n
] Then, fn → 0 everywhere except at 0.
∫fndλ =
∫ n
0dx = 1.
Hence we cannot exchange the order of taking limit with taking integration in this case.
5.8. THE MONOTONE CLASS THEOREM FOR FUNCTIONS 66
5.8 The Monotone Class Theorem for Functions
Proposition 5.8.1 (The Monotone Class Theorem for Functions) Let (Ω,F) be a measure space. LetC be a π system with σ(C) = F . Let H be a vector space of functions from Ω to R, with the followingproperty:
1. 1 ∈ H, and 1A ∈ H for every A ∈ C.
2. (Monotone class property) If fn ∈ H is an increasing sequence of non-negative functions withf(x) = supn f(x) finite for every x (resp. with f = supn f bounded), then f ∈ H.
ThenH contains the set of all real valued (resp. all bounded ) F-measurable functions.
Proof LetJ = B ∈ F : 1B ∈ H.
By assumption, 1 = 1Ω ∈ H, and J ⊃ C ∪ Ω. If A ⊂ B, A,B ∈ J then 1B\A = 1B − 1A ∈ H bythe vector space property of H. Thus B \ A ∈ J . Let An ∈ J be an increasing sequence of sets then bycondition 2,
1∪∞n=1An= lim
n→∞1An ∈ H.
Hence ∪∞n=1An ∈ J and J is a λ-system. It therefore contains F . This means all indictor functions areinH and simple functions are in J .
Suppose H is closed under monotone limit of functions with bounded limit. If f is bounded positiveand measurable there is an sequence of positive simple functions fn converging to f , thus f ∈ H. If f isnot positive let f = f+ − f− to conclude.
Suppose H is closed under monotone limit of functions with finite limit. If f is positive and measur-able there is an sequence of positive simple functions fn converging to f , thus f ∈ H. Again, if f is notpositive let f = f+ − f− to conclude.
Remark 5.8.2 From this we see a meta theorem (means the theorem is likely to hold, of course one needsto give a proof): If a property concerning integration of functions holds for all indicator functions, then itholds for all measurable functions (use Fatou’s lemma to obtain the monotone property) or it holds for allbounded measurable functions ( use dominated convergence theorem to obtain the monotone property).
5.8.1 Lebesgue Integration
In this section let X = R, F the completion of the Borel σ algebra, i.e. consists of Lebesgue measurablesets, and µ the Lebesgue measure λ. The restriction of λ to any measurable subset is also denoted by thesame letter.
5.9. THE PUSHED FORWARD MEASURE 67
A function f : R→ R is said to be Lebesgue measurable if it is measurable with respect to F/B(R).Borel measurable functions are of course Lebesgue measurable. In this section by a measurable functionwe refer to a Lebesgue measurable function.
Definition 5.8.1 Integrals with respect to a Lebesgue measure, are called Lebesgue integrals.
Exercise 5.8.1 Let I be a Lebesgue measurable set. Suppose that f : I → R is bounded.
1. If f is Lebesgue measurable, show that
inf∫
h dλ, h ≥ f, h ∈ E .
= sup∫
g dλ : g ≤ f, g ∈ E
(5.1)
2. Show that if (5.1) holds, then f is Lebesgue measurable.
5.9 The pushed forward measure
Suppose we are given two measurable spaces (X ,A) and (Y,B). Let µ be a measure on X and f : X →Y be a measurable function. Denote by f∗µ the pushed forward (induced) measure on (Y,B), i.e.
f∗(µ)(B) = µx : f(x) ∈ B.
The right hand side µ(f−1(B)), so we given B the measure of its pre-image.
Proposition 5.9.1 Let ϕ : Y → R a measurable function, we have∫Xϕ f dµ =
∫Yϕ d(f∗µ).
This is in the sense that ϕ is integrable with respect to f∗µ if and only if ϕ f is integrable with respectto µ.
Proof This holds for indicator functions of measurable sets by the definition of pushed forward measures.Apply monotone convergence theorem on both sides to see that those functions with the desired propertyis a monotone class. Apply the monotone class theorem to conclude.
Let (Ω,F ,P) be a probability measure. Let (S,G) be a measurable space. A measurable functionfrom X : Ω → S is called a random variable, S is called its state space. If ϕ : S → R and X : Ω → S
are measurable, then
E[ϕ(X)] :=
∫Ωϕ X dP =
∫Sϕ dX∗(P).
Exercise 5.9.1 If both Y , Y ′ are real valued integrable functions on (Ω,F ,P) with∫A Y =
∫A Y
′ forevery measurable set A, then Y = Y ′ almost surely.
5.10. APPENDIX: RIEMANN INTEGRALS 68
Proof Let A = Y > Y ′. By symmetry it is sufficient to proves that P(A) = 0. Thus we only needto prove the following statement: for any Y ≥ 0,
∫A Y = 0 for any A ∈ F , then Y = 0 almost surely.
For this just note that if Y 6= 0 = Y > 0 has positive measure then for some a > 0, Y > a haspositive measure (for otherwise P(Y > 0) = limn→∞ P(Y > 1
n) = 0), and then∫AY dP ≥ aP(Y > a) > 0,
this contradicts the assumption. Hence µ(Y 6= 0) = 0 ,completting the proof.
Example 5.6 Let f : X → Y . Given a measure on Y , can we use f and µ to define a measure on Xin a meaningful way? The answer is no in general. There is no good sensible way for this construction,for otherwise you should be able to pull back a direct measure, what that would be if f : 1, 2 → Rf(1) = 0 and f(2) = 0? If f : R → R is injective, one can define a measure from a measure on thetarget space, it would be the pushed forward measure for the inverse map.
Definition 5.9.1 If f is a measurable map from X to X , we say µ is invariant by f if f∗(µ) = µ.
Example 5.7 Let δx be the Dirac measure on Rn. Then for any transformation T : Rn → Rn, T∗(δx) =
δTx. The δ-measure is not invariant under rotations (unless x = 0), nor by translation.
5.10 Appendix: Riemann integrals
If a function f : [a, b]→ R is Riemann integrable, it is Lebesque integrable and the two integrals agree.In this and the next section we review Riemann integrals and Riemann-Stieltjes integrals.
I do not intend to cover this in class. However you might find this helpful for comparing severalnotions of integration theory.
We know how to compute the areas of a triangle and a polygon. We then define the integral underneatha continuous curve by rectangle approximation. If y = f(x), x ≥ 0 is the curve, the approximation leadsto∫ t
0 f(s)ds, this is the Riemann integral∫ ba f(x)dx. (Riemann integrals are covered in M2PM1: Real
Analysis)
The Riemann integral of f is defined as follows. Take a partition ∆ : a = x0 < a1 < · · · < an = b.Let Mi and mi be respectively the supremum and infimum values of f on the interval [ai−1, ai]. Define
U(∆, f) =
n∑i=1
Mi(ai − ai−1), L(∆, g) =
n∑i=1
mi(ai − ai−1).
U(f) = inf∆U(∆, f), L(f) = sup
∆U(∆, f).
5.11. APPENDIX: RIEMANN-STIELTJES INTEGRALS 69
Definition 5.10.1 A bounded function f : [a, b] → R is Riemann integrable, if U(f) = L(f) and thecommon value is the Riemann integral of f on [a, b].
A continuous function is Riemann integrable.
Theorem 5.10.1 A bounded function f is Riemann integrable if and only if for every ε > 0 there existsa partition such that U(f,∆)− L(f,∆) < ε.
Observe that the oscillation of f on [ai−1, ai] is Osc(f) = Mi −mi,
U(f,∆)− L(f,∆) =∑
Osc(f, [ai−1, ai])(ai − ai−1).
Define the Riemann sum:
R(f,∆) =n∑i=1
f(x∗i )(ai − ai−1).
Theorem 5.10.2 Suppose a bounded function f : [a, b]→ R is Riemann integrable. If ∆n is a sequenceof partitions with mesh goes to zero, then their Riemann sum R(f,∆n) converges to
∫ ba f(x)dx.
This can be proved using Theorem 5.10.1.
Remark 5.10.3 If f : [0,∞)→ R is measurable and Riemann integrable on every interval [0, n] wheren ∈ N, then f is Lebesgue integrable on [0,∞) if and only if
limN→∞
∫ N
0|f(x)|dx <∞.
If so,∫∞
0 f(x)dx =∫
[0,∞) fdλ.
The concept ‘Lebesgue integrable’ does not allow cancellation, while improper Riemann integrationallows it. sin(x)
x is improper Riemann integrable, bot not Lebesgue integrable.
5.11 Appendix: Riemann-Stieltjes integrals
Riemann-Stieltjes integrals aree defined in parallel to Riemann integrals. Suppose : f [a, b] → R is abounded function and g a monotone increasing function on [a, b], for example g(x) = x in which wecase we are back to Riemann integrals.
Write ∆gi = g(ai)− g(ai−1). Define
U(∆, f, g) =
n∑i=1
Mi∆gi, L(∆, f, g) =
n∑i=1
mi∆gi.
5.11. APPENDIX: RIEMANN-STIELTJES INTEGRALS 70
U(f, g) = inf∆U(∆, f, g), L(f, g) = sup
∆U(∆, f, g).
Definition 5.11.1 If U(f, g) = L(f, g), we say f is Riemann-Stieltjes integrable (with respect to g), wedenote f ∈ D(g). The common value is denoted by
∫ ba fdg.
Theorem 5.11.1 (Integrability Criterion) Suppose f is a bounded function and g a monotone increas-ing function on [a, b]. Then f is Riemann-Stieltjes integrable if and only if for any ε > 0, there exists apartition ∆ of [a, b] such that
U(∆, f, g)− L(∆, f, g) < ε.
Corollary 5.11.2 Suppose f is a bounded function with only a finite number of discontinuity points, andg a monotone increasing function on [a, b], continuous at every point where f is discontinuous. Then fis Riemann-Stieltjes integrable.
Proposition 5.11.3 Suppose that f, f1, f2 : [a, b] → R are bounded, Riemann-Stieltjes integrable withrespect to a monotone increasing function. g : [a, b]→ R. Let c ∈ R.
1. If f ∈ D(g), then for any c ∈ R, cf ∈ D(g)and∫ ba (cf)dg = c
∫ ba fdg.
2. f1 + f2 ∈ D(g), and∫ ba (f1 + f2)dg =
∫ ba f1dg +
∫ ba dg.
3. If f1 ≤ f2 then∫ ba f1dg ≤
∫ ba f2dg.
4. If f ∈ D(g) and c > 0 is a constant, then f ∈ D(cg) and∫ ba fd(cg) = c
∫ ba fdg.
5. If f ∈ D(g) ∩ D(g′), then f ∈ D(g + g′),∫ b
afd(g + g′) =
∫ b
afdg +
∫ b
addg′.
Definition 5.11.2 If g = g1 − g2, where monotone increasing functions we define∫fd(g1 − g2) =
∫fdg1 −
∫fdg2.
Riemann-Stieljes integrals do not exists if the integrator and the integrand have the same point ofdiscontinuity. That
∫ ba fdF,
∫ cb fdF exist in Riemann-Stieljes sense do not necessarily imply that
∫ ca fdF
is Riemann-Stieljes integrable.
Proposition 5.11.4 If g : [a, b] → R is continuous and F : [a, b] → R has bounded variation then theRiemann-Stieljes integral
∫ ba gdF and
∫ ba Fdg exist. If F is furthermore absolutely continuous and the
integral equals to the corresponding Lebesgue integral:∫ b
agdF =
∫ b
agF ′dx.
5.12. APPENDIX: FUNCTIONS WITH BOUNDED VARIATION 71
5.12 Appendix: Functions with bounded variation
The most commonly used integrators are functions of bounded variations. Functions of bounded varia-tions are differences of increasing functions.
Given an interval [a, b], define P ([a, b]) to be the set of all partitions of [a, b]:
P ([a, b]) = t = (t0, t1, . . . , tn) : a = t0 < t1 < · · · < tn−1 < tn = b, n ∈ N.
Let ∆t = max0≤i≤n−1(ti+1 − ti) be the modulus of the partition. Write
∑[u,v]⊂∆
|f(u)− f(v)| =n∑j=1
|f(tj)− f(tj−1)|.
Definition 5.12.1 The total variation of a function f on [a, b] is defined by
|f |TV ([a, b]) = sup∆∈P ([a,b])
∑[u,v]⊂∆
|f(u)− f(v)|.
If this number is finite we say that f has bounded total variation over [a, b].
The function x sin( 1x) does not have bounded variation over any interval containing 0.
Note that f could have bounded total variation over [a, b] without having a bounded variation over R(e.f. f(x) = sinx).
Theorem 5.12.1 A. function is of bounded variation on [a, b] if and only if f is the difference of twomonotone real-valued functions on [a, b].
Proof Let f = f1 − f2, where f, g are increasing functions, then
|f |TV ([a, b]) = sup∆∈P ([a,b])
∑[u,v]⊂∆
|f1(u)− f1(v)− (f2(u)− f2(v))|.
For f : [a, b]→ R, define a function:
|f |TV (x) = |f |TV ([a, x]).
Example 5.8 1. If f : R→ R is an increasing function, |f |TV (x) = f(x)− f(−∞).
2. If f is locally Lipschitz continuous, then f is of bounded variation on any finite time interval.
3. The set of bounded variation function form a vector space.
5.12. APPENDIX: FUNCTIONS WITH BOUNDED VARIATION 72
If f : R → R is an increasing then it has left and right derivatives at every point and there are only acountable number of points of discontinuity. Furthermore f is differentiable almost everywhere.
Theorem 5.12.2 Let F : R→ R+ be a function of bounded variation, right continuous with F (−∞) =
0. Then
1. F determines a Borel measure µ on R.
2. The Borel measure µ, with distribution function F , is absolutely continuous with respect to theLebesgue measure dx if and only if
F (x) =
∫ x
0F ′(t)dt.
3. It is singular with respect to dx if and only if F ′ = 0.
Chapter 6
Product measures and Fubini’ Theorem
Throughout this section let (X ,F , µ) and (Y,G, ν) be two measure spaces. A (measurable) product setis of the form A×B where A ∈ F and B ∈ G. Denote by C the collection of products measurable set.
C = A×B : A ∈ F , B ∈ G.
The product σ-algebra is generated by such sets.
F ⊗ G = σ(C).
Can we assign a measure to F ⊗ G that is consistent to µ and ν?
If X = a1, . . . , an, we use the power set σ-algebra. If we assign any non-negative values to ai,say pi, this determines a measure on 2X (no relations is needed on pi, as intersections of singletonsare emptysets, the union of two singletons is no longer a singleton). The power σ-algebra has precisely2n elements. Every measure on the finite space X , with the power σ-algebra, is determined uniquely bytheir values of the singleton sub-sets.
Let Y = b1, . . . , bm, then X × Y = (ai, bj) : 1 ≤ i ≤ n, 1 ≤ j ≤ m, and every element ofX ×Y is contained in 2X ⊗ 2Y . Thus 2X ⊗ 2Y = 2X×Y . Hence by assigning a value to every (ai, bj),this is a product set, this determine a measure on the product σ-algebra. Given µ on X and ν on Y , wedefine the product measure by
µ× ν((ai, bj)) = µ(ai)ν(bj).
Similarly if X is given by a partition A1, . . . , An, and Y is given a partition B1, . . . , Bm, and ifwe set F = σ(A1, . . . , An) and G = σ(B1, . . . , Bm). Then there are 2n and 2m elements in F andG respectively. If we assign values µ(Ai), this determine a measure on X (because the intersections oftheAi are emptyset, no relation between these numbers are neeeded for extending these to every elementof F). Every measure on F is obtained in this way. Also F ⊗ G = σ(Ai × Bj). It is clear that
73
6.1. PRODUCT MEASURES 74
Ai×Bj is a partition of X ×Y . Thus this is totally analogous to the finite state spaces discussed earlier.Furthermore given µ, ν on F and G respectively, the product measure on F ⊗ G will be defined by
µ× G(Ai ×Bj) = µ(Ai)ν(Bj).
If F and G are not finite σ-algebras, they are uncountable, the construction for the product measurebecomes much more involved.
6.1 Product measures
Recall the definition of elementary family in Definition 2.2.4.
Lemma 6.1.1 The collection C of product sets is an elementary family. The collection A of a finitenumber of disjoint union of product sets is an algebra.
Proof The empty set is also in C. Also,
(A×B) ∩ (A′ ×B′) = (A ∩A′)× (B ∩B′), (A×B)c = (X ×Bc) ∪ (Ac ×B).
Thus, C is closed under taking complement, and the complement of a set is the disjoint union of a finitenumber of product sets. By exercise 2.2.2, the collection of finite number of disjoint union of sets fromC is an algebra.
Let us define a function µ × ν : A → [0,∞] by the following rules: If E = ∪nj=1(Ai × Bj) whereAi ∈ F and Bi ∈ G, then we want to define
µ× ν(E) =n∑j=1
µ(Ai)ν(Bi). (6.1)
We must show that this is independent of the expression of E into disjoint unions of product sets.
Lemma 6.1.2 Let E = A × B ∈ C. Suppose that E = ∪∞j=1(Aj × Bj) where Ai ∈ F and Bi ∈ G,where Aj ×Bj are disjoint sets. then
µ(A)ν(B) =∞∑j=1
µ(Aj)ν(Bi).
Proof Observe that 1A×B(x, y) = 1A(x)1B(y) for x ∈ X , B ∈ Y . Also,
1A×B =∞∑i=1
1Ai×Bi =∞∑i=1
1Ai1Bi .
6.1. PRODUCT MEASURES 75
Holding y fixed, we may integrate 1A(x)1B(y) =∑∞
i=1 1Ai(x)1Bi(y) with respect to µ.∫X1A(x)1B(y) dµ(x) =
∫X
∞∑i=1
1Ai(x)1Bi(y)dµ(y).
Use Corollary 5.4.8 for exchanging the summation and integration,
µ(A)1B =∞∑i=1
µ(Ai)1Bi .
We use the convention0 · ∞ = 0.
( So if µ(A) = ∞, then the left hand side is ∞ if y ∈ B and 0 otherwise. The right hand side isinterpreted in the same way. Observe that Bi ⊂ B and Ai ⊂ A.)
Integrate both sides from the above line w.r.t. ν we obtain µ(A)ν(B) =∑∞
i=1 µ(Ai)ν(Bi), complet-ting the proof.
Lemma 6.1.3 Suppose that⋃∞i=1(Ai ×Bi),
⋃∞j=1(Cj ×Dj) are disjoint unions and
∞⋃i=1
(Ai ×Bi) =∞⋃j=1
(Cj ×Dj),
then∞∑i=1
µ(Ai)ν(Bi) =∞∑j=1
µ(Cj)ν(Dj).
Proof Let E =⋃∞j=1(Ai ×Bi). Then,
∞⋃i=1
(Ai ×Bi) = (∞⋃j=1
(Ai ×Bi)) ∩ E
=∞⋃i=1
∞⋃j=1
(Ai ×Bi) ∩ (Cj ×Dj)
=
∞⋃i=1
∞⋃j=1
(Ai ∩ Cj)× (Bi ∩Dj).
The last is also a union of disjoint sets.
For each i, we apply the previous lemma to
Ei = Ai ×Bi =
∞⋃j=1
(Ai ∩ Cj)× (Bi ∩Dj)
6.1. PRODUCT MEASURES 76
to see
µ(Ai ×Bi) =
∞∑j=1
µ(Ai ∩ Cj)ν(Bi ∩Dj)
So,∞∑i=1
µ(Ai ×Bi) =
∞∑i=1
∞∑j=1
µ(Ai ∩ Cj)ν(Bi ∩Dj).
Similarly∞∑j=1
µ(Cj ×Dj) =∞∑i=1
∞∑j=1
µ(Ai ∩ Cj)ν(Bi ∩Dj).
This finishes the proof.
The map (6.1) defines a pre-measure on A. Firstly µ × ν(φ) = 0, secondly it has finite additiveproperty on A. Furthermore, suppose that E =
⋃∞i=1Ei, where Ei ∈ A are disjoint, with the property
that E ∈ A, we show that µ(E) =∑∞
i=1 µ(Ei). On one hand, each Ei is a finite disjoint union ofproduct sets, we pull these product sets , that composes of Ei, together: they are all disjoint. Relabelthese disjoint product sets we see E =
⋃∞i=1Ai ×Bi. Re-arrange it, we see that µ(E) =
∑µi(Ei).
With this we define an outer measure µ∗, by (3.1):
µ∗(E) = inf
∞∑j=1
µ× ν(Ej) : E ⊂∞⋃j=1
Ej , Ej ∈ A
.
By the Caratheodory theorem, the outer measure is a measure on F × G and agree with µ × ν on A.This measure on F ⊗ G will be called the product measure. If µ and ν are σ-finite, there exists a uniquemeasure such that the measure of product sets is the product of the measures on each factor.
If µ, ν are finite, so is µ×ν(X ×Y) = µ(X )ν(Y) <∞. If they are both σ-finite, so is µ×ν. Indeed,there exist An ↑ X and Bn ↑ Y such that µ(An) < ∞, ν(Bn) < ∞ for every n. Now Ai × bj is anincreasing sequence converging to X × Y with finite measure, hence µ× ν is σ-finite.
Remark: Since a countable union of measures of A is a countable union of product sets,
µ∗(E) = inf
∞∑j=1
µ(Aj)ν(Aj) : E ⊂∞⋃j=1
Aj ×Bj , Aj ∈ F , Bj ∈ F ∈ A
.
Remark 6.1.4 Similar constructions can be used to obtain a product measure on a finite number ofcopies of product spaces and their product σ-algebras.
Let us take for example three measure spaces (Xi,Fi, µi). Recall that
⊗3i=1Fi = σ(
Π3i=1Ai : Ai ∈ Fi, i = 1, 2, 3
).
6.1. PRODUCT MEASURES 77
We define µ1 × µ2 × µ3 by
µ1 × µ2 × µ3(A1 ×A2 ×A3) = Π3i=1µi(Ai).
Proposition 6.1.5 Let (Xi,Fi, µi) be measurable spaces. We have,
(F1 ⊗F2)⊗F3 = ⊗3i=1Fi.
If µi are σ-finite, then
(µ1 × µ2)× µ3 = µ1 × µ2 × µ3.
Proof It is clear that,
Π3i=1Ai : Ai ∈ Fi, i = 1, 2, 3
⊂ E × C : E ∈ F1 ⊗F2, C ∈ F3. Hence
⊗3i=1Fi ⊂ (F1 ⊗F2)⊗F3.
On the other hand, for any C ∈ F3,
E × C : E ∈ F1 ⊗F2 ⊂ ⊗3i=1Fi.
so
E × C : E ∈ F1 ⊗F2, C ∈ F3 ⊂ ∪C∈F3E × C : E ∈ ⊗2i=1Fi ⊂ ⊗3
i=1Fi.
Thus
(F1 ⊗F2)⊗F3 = σ(E × C : E ∈ F1 ⊗F2, C ∈ F3) ⊂ ⊗3i=1Fi.
Now define µ1 × µ2 on F1 ⊗ F2. Then, (µ1 × µ2) × µ3 and µ1 × µ2 × µ3 agree on the product sets.These product set determines σ-finite measures.
Example 6.1 Let λ be the Lebesgue measure, the completion of the product measure on R×R is calledthe Lebesgue measure on R2. This is denoted by the same letter or by λ2. Our question is: givenf : R2 → R measurable, what is ∫
R2fdλ2?
Remark 6.1.6 If each factor (X ,F , µ) and (Y,G, ν) are complete, (i.e. the σ-algebra is complete witherespect to the given measure, The product σ-algebra is often NOT complete with respect to the productmeasure. We can of course complete the product σ-algebra under the product measure.
Exercise 6.1.1 Give an example of a Borel measure on R2 (i.e. on the Borel σ-algebra) that is not aproduct measure.
6.1. PRODUCT MEASURES 78
6.1.1 Sections of subsets of product spaces
If E ⊂ X × Y , for x ∈ X , we define the section
Ex = y ∈ Y : (x, y) ∈ E.
Similarly for y ∈ Y we defineEy = x ∈ X : (x, y) ∈ E.
Proposition 6.1.7 If E ∈ F ⊗ G, then
1. Ex ∈ G for every x ∈ X and ν(Ex) is measurable.
2. Ey ∈ F for every y ∈ Y and µ(Ey) is measurable.
3. Also,
µ× ν(E) =
∫Xν(Ex)µ(dx) =
∫Yµ(Ey)ν(dy).
Proof (1) We first assume that the measures are finite. Let
D = E ∈ F ⊗ G : conclusion holds.
Observe that
(A×B)x =
B, x ∈ A,φ, x 6∈ A.
Take µ measure of the above,
ν((A×B)x) =
ν(B), x ∈ A,0, x 6∈ A.
Integrate w.r.t. x, we see
ν((A×B)x)µ(dx) = ν(B)µ(A) = µ× ν(A×B).
A similar conclusion holds for the y-sections. This means D contains the set of products sets, which is aπ-system.
If A ⊂ B, A,B ∈ D then
(A \B)x = Ax \Bx ∈ G (A \B)y = Ay \By ∈ F .∫ν((A \B)x)µ(dx) =
∫ν(Ax)dµ(x)−
∫ν(Bx)dµ(y),
6.2. FUBINI’S THEOREM 79
A similar conclusion for the y-sections. Hence A \B ∈ D. If An ∈ D is an increasing sequence,
(⋃n
An)x =⋃
(An)x ∈ G.
Thus D is a λ system and thus equals F ⊗ G, as it contains a generating set which is a π-system. Thisconcludes the finite measure case.
(2) Suppose that the measures are σ-finite, we takeAi×Bi increasing with finite measure, then applythe proposition to E ∩ (Ai ×Bi), we see that
(E ∩ (Ai ×Bi))x =
Ex ∩Bi, if x ∈ Aiφ, otherwise,
is measurable for every x, then so is Ex = ∪i(E∩ (Ai×Bi))x. Similarly conclusions for the y-sections.Since,
µ× ν(E ∩ (Ai ×Bi)) =
∫Xν((E ∩ (Ai ×Bi))x)µ(dx) =
∫Yµ((E ∩ (Ai ×Bi))yx)ν(dy),
we may take i→∞ and use the monotone convergence theorem to conclude.
Proposition 6.1.8 If f : X × Y → R is measurable, then so are the following functions:
fx = f(x, ·) : CY → R, fy = f(·, y) : X → R
are measurable for every x ∈ X and for every y ∈ Y .
Proof For any B ∈ B(R),
(fy)−1(B) = x : f(x, y) ∈ B = (f−1(B))y ∈ F ,(fx)−1(B) = y : f(x, y) ∈ B = (f−1(B))x ∈ G.
This conclusion follows from Proposition 6.1.7.
6.2 Fubini’s Theorem
Theorem 6.2.1 (The Fubini-Tonelli Theorem) Let µ and ν be σ-finite measures.
(1) Suppose f : X×Y → [0,∞] is measurable. Then g(x) =∫fx(y)dν(y) and h(y) =
∫fy(x)dµ(x)
are both non-negative and measurable. Furthermore,∫fd(µ× ν) =
∫X
(∫Yf(x, y)dν(y)
)dµ(x) =
∫Y
(∫Xf(x, y)dµ(x)
)dν(y). (6.2)
6.2. FUBINI’S THEOREM 80
(2) Suppose f ∈ L1(µ× ν), then
(a) fx ∈ L1(ν) for a.e. x
(b) fy ∈ L1(µ) for a.e. y ∈ Y .
(c) The almost everywhere defined functions g1(x) =∫fx(y)dν(y) ∈ L1(µ) and g2(y) =∫
fy(x)dµ(x) ∈ L1(ν) are both integrable.
(d) Furthermore, (6.2) holds.
Proof This theorem holds for characteristic functions, and therefore holds for simple functions by non-linearity. We prove (1) first. Let fn ∈ E+ such that fn ↑ f . Then
gn(x) =
∫fn(x, y)dν(y) ↑ g(x) =
∫f(x, y)dν(y),
hn(y) =
∫fn(x, y)dµ(x) ↑ h(y) =
∫f(x, y)dµ(x).
So g, h are measurable and∫fnd(µ× ν) =
∫X
(∫Yfn(x, y)dν(y)
)dµ(x) =
∫Y
(∫Xfn(x, y)dµ(x)
)dν(y).
Apply the monotone convergence theorem to conclude (1).
Also,∫fd(µ × ν) < ∞ implies that
∫Y fn(x, y)dν(y) is finite for a.e. x and
∫X fn(x, y)dµ(x) is
finite for a.e. y.
For a general integrable function f , apply this to f+ and f−, and use the statement above to conclude.
Remark 6.2.2 For simplicity we remove the brackets and write∫X
∫Yf(x, y)dν(y)dµ(x) =
∫X
(∫Yf(x, y)dν(y)
)dµ(x),∫
X
∫Yf(x, y)dν(y)dµ(x) =
∫Y
(∫Xf(x, y)dν(y)
)dµ(x).
Corollary 6.2.3 Fubini’s theorem holds if we replace µ × ν by its completion, provided that µ, ν arecomplete.
To see this we note that if f is measurable with respect to the completion of F ⊗ G, then there existsan F ⊗ G -measurable function f such that f = f except on a null set E, which by enlarging we mayassume to be F × G measurable. Then µ(Ex) = 0 for a.e. x, ν(Ey) = 0 for a.e. y. Since µ, ν arecomplete ,m they are measurable. Thus, we may apply Fubini’s theorem to f ,∫
X×Yf dµ× ν =
∫X×Y
f dµ× ν =
∫ ∫f(x, y) dµ(x)dν(y) =
∫ ∫f(x, y) dν(y)dµ(x).
6.2. FUBINI’S THEOREM 81
We know that f(x, ·) is measurable for almost every x and f(x, ·) differs at most on Ex, a null set.Similarly we conclude f(·, y) is measurable a..e. y, and also they are in L1, the conclusion follows.
Exercise 6.2.1 Let X = 1, . . . , N be a finite state space. Write down the the Fubini-Tonelli Theoremspecific to this case.
Given a measure µ on X × Y . We say µi are the marginals of µ if
µ(X ×B) = µ2(B), µ(A× Y) = µ1(A), ∀A ∈ F , B ∈ G.
We also say that µ is a coupling of µ1 and µ2.
Exercise 6.2.2 Given an example of a Borel measure µ on R2, two Borel measures µi on R such that µis not the product measure µ1×µ2, but µi are marginals of µ. Given another set of examples of measuressuch that µ = µ1 × µ2 but µi are not the Lebesgue measure.
Note that if f is a non-negative measurable function, µ(A) :=∫A f dµ is a measure.
Exercise 6.2.3 For f : R → R given by f(x) = x2 + 1 and λ the Lebesgue measure, compute thepushed forward measure. f∗λ and compute
∫[1,2] e
√x−1 d(f∗λ).
Exercise 6.2.4 Let f(x, y) = (x2−y2)(x2+y2)2
for x ∈ [0, 1] and y ∈ (0, 1], also f(0, 0) = 0. Show thatf 6∈ L1([0, 1]× [0, 1]).
Chapter 7
Radon-Nikodym Theorem
In this chapter we compare two measure, discuss the concepts of singular and absolutely continuity.
7.1 Singular and absolutely continuity
Definition 7.1.1 We say that µ is absolutely continuous with respect to ν if whenever ν(A) = 0, wehave µ(A) = 0.
We write µ ν. Given any two finite measures µ1, µ2 we can found a reference measure (e.g. µ =
µ1 + µ2, such that µ1 µ and µ2 µ).
Definition 7.1.2 Two measures µ and ν on Ω are mutually singular if there are disjoint sets A1 and A2
such that Ω = A1⋃A2 such that µ(A1) = 0 and ν(A2) = 0. This is denoted by µ ⊥ ν.
The Lebesgue measure and any Dirac measures δx are singular.
Exercise 7.1.1 If ν is a measure on (X ,G) and p : X → [0,∞) is an integrable function. Show that
µ(A) =
∫Ap(y)ν(dy)
defines a measure and µ ν. We write µ = pν.
Example 7.1 The Gaussian measure N(0, 1) on R1 is absolutely continuous w.r.t. the Lebesgue mea-sure.
N(0, 1)(A) =
∫A
1√2πe−|x|22 dx.
83
7.2. SIGNED MEASURE 84
Example 7.2 Let Ω = [0, 1] and P the Lebesgue measure . Define Q(A) = 2∫A x
2dx so
dQ
dP(x) = 2x2.
Define Q1 bydQ1
dP= 21[0, 1
2].
Then Q1 P and P is not absolutely continuous with respect to Q1. Define Q2 by
dQ2
dP= 21[ 1
2,1].
The two measures Q1 and Q2 are singular.
Exercise 7.1.2 Let ν be a finite measure. Sow that ν µ if and only if the following holds: for anyε > 0 there exists δ > 0 such that whenever µ(E) < δ then ν(E) < ε.
Exercise 7.1.3 Show that if f ∈ L1(µ) is a non-negative function, then for every ε > 0 there existsδ > 0 such that whenever µ(A) < δ we have ∫
Efdµ < ε.
Note that ν(A) =∫A dµ defines a measure with ν µ.
Exercise 7.1.4 Give examples of pairs of measure µ, ν with µ ν and µ ⊥ ν.
7.2 Signed measure
A signed measure µ is a function on sets taking values in [−∞,∞] and satisfying the properties of ameasure: µ(φ) = 0, if Aj are disjoint measurable sets, then ν(∪∞j=1Aj) =
∑∞j=1 ν(Aj), in addition we
assume that if taking an infinity value, µ takes at most one of the values from ±∞.
If f = f+ − f− is measurable, with either∫f+ < ∞ or
∫f− < ∞, then µ(A) :=
∫A f dµ is a
signed measure.
A measurable set P is said to be positive for µ if for any F ⊂ P , we have µ(F ) ≥ 0. In other words µrestricts to a (positive) measure on F . A measurable set P is said to be negative for µ if fro any F ⊂ N ,we have µ(F ) ≤ 0.
Exercise 7.2.1 If Pn is a sequence of positive sets for µ, then so is ∪∞n=1Pn.
Note that each Pn \⋃n−1k=1 Pn is positive for µ.
7.2. SIGNED MEASURE 85
Theorem 7.2.1 (The Hahn decomposition theorem) If µ is a signed measure on (X ,F), then thereexists a partition of X : X = P ∪N and P ∩N = φ, such that P is positive and N is negative, for µ.
If X = P ′ ∪N ′ is another such partition then µ(P∆P ′) = 0 and µ(N∆N ′) = 0.
Theorem 7.2.2 (Jordan decomposition theorem) If µ is a signed measure on (X ,F), then there existunique positive measures µ+ and µ− such that
µ = µ+ − µ−, µ+ ⊥ µ−.
Proof Take µ+ = µ|P , µ− = µ|N where X = P ∪N as in Hahn decomposition theorem. To see it isunique suppose µ = Q1 −Q2 with Q1 ⊥ Q2, then X = E ∪ F , disjoint E,F , where Q1(E2) = 0 andQ2(E1) = 0. Then E ∪ F is another Hahan decompostion, by its uniqueness we conclude. We can
talk about a signed measure to be absolutely continuous with respect to a (positive ) measure.
Definition 7.2.1 If µ is a signed measure on (X ,F), we set |µ| = µ+ + µ−, this is called the totalvariation of µ. If ν is a (positive ) measure, we say µ ν if µ(A) = 0 whenever ν(A) = 0 where A isa measurable set.
Exercise 7.2.2 Show that µ ν if and only if |µ| ν if and only if µ+ ν and µ− ν.
Definition 7.2.2 If µ is a signed measure we set L1(µ) = L1(µ+)∩L1(µ−), and for f ∈ L1(µ), we set∫fdµ =
∫fdµ+ −
∫fdµ−.
Lemma 7.2.3 Suppose that µ and ν are finite (positive) measures then µ ⊥ ν or there exists ε > 0 anda measurable set E such that ν(E) > 0 and E is a positive set for µ− εν.
Proof Suppose that µ is non-trivial. By Hahn decomposition theorem, there exists a partition X =
Pk ∪Nk such that Pk is a positive set and Nk a negative set for the signed measure µ− 1kν.
Let P =⋃k Pk and N = ∩kNk. Then N , a subset of Nk, is negative for every µ− 1
kν, in particular
µ(N)− 1
kν(N) ≤ 0.
Taking k → ∞, we see µ(N) ≤ 1kν(N) = 0. If µ, ν are not mutually singular, µ must charge the
complement of N which is P . Then, since P is a countable union of Pk’s we have µ(Pn) > 0 for somen. We take E = Pn on which µ− 1
nν > 0.
7.3. RADON-NIKODYM THEOREM 86
7.3 Radon-Nikodym Theorem
Terminology: A signed measure is σ-finite if both µ+ and µ− are.Use with caution the following (for it might be confused with, I used this in proofs.) We use µ ≤ ν toindicate that µ(A) ≤ ν(A) for every measurable set A.
Theorem 7.3.1 (Lebesgue-Radon-Nikodym Theorem) Let µ and ν be two σ-finite measures, µ is asigned measure while ν is a positive measure, on a space (X ,F). Then,
(1) [Lebesgue decomposition] there exist, uniquely, σ-finite measures with the property that µ = µ1 +
µ0, µ1 ⊥ ν, µ0 ν.
(2) [Radon-Nikodym Theorem] There exists a measurable function D : X → R such that µ1(A) =∫ADdν. Any two such functions agree ν almost surely.
(3) If ν and µ are finite, then D is finite a.s. and integrable.
(3) If µ is a positive measure then f ≥ 0.
Definition 7.3.1 We denote D by dµdν and call it the Radon-Nikodym derivative of µ with respect to ν.
Proof Step 1. We first assume that µ and ν are finite positive measures. Let us define
H =
f : X → [0,∞] :
∫Afdν ≤ µ(A) for all A ∈ F
.
We choose the ‘largest f’ fromH as follows. Let
M = sup∫Xfdν : f ∈ H
.
This is possible sinceH is not empty: f ≡ 0 ∈ H. Also any f ∈ H has finite integral∫fdν ≤ µ(X ) so
M <∞.
Next we see, If f, g ∈ H then max(f, g) ∈ H. Let E = f < g.∫A
max(f, g)dν =
∫E∩A
gdν +
∫E∩Ac
fdν ≤ µ(A ∩ E) + µ(A ∩ Ec) = µ(A).
There exists a sequence fn such that,∫fndν increases and
limn
∫fndµ = M.
Setgn = maxf1, . . . , fn ∈ H.
7.3. RADON-NIKODYM THEOREM 87
This is an increasing sequence and ∫Xfn ≤
∫Xgn ≤M.
By the monotone convergence theorem,∫
supn gndµ = limn
∫gndµ = M .
Set
D = limngn,
Then D ∈ H. Indeed, it is measurable and for any A ∈ F ,∫Agdν =
∫1Agdν = lim
n
∫1A gndν ≤ µ(A),
We show that µ − fdν is singular w.r.t. ν. If not there exists ε > 0 and E measurable such thatν(E) > 0 with µ− fdν − ε1Edν > 0 on E, i.e. for every A ∈ F ,∫
A∩E(f + ε1E)dν ≤ µ(A ∩ E).
Then∫A
((f + ε1E))dν =
∫A∩E
((f + ε1E))dν +
∫A∩Ec
fdν ≤ µ(A ∩ E) + µ(A ∩ Ec) = µ(A)
and (f + ε1E) ∈ H and its integrable is larger than M , contradicts with M is the largest integral amongall f ∈ H.
Uniqueness: if µ = µ0 + fdν = µ0 + gdν, then µ0− µ0 + (f − g)dν = 0. Then µ0− µ0 = (f − g)ν
is both singular and absoultely continuous w.r.t. ν. It must then vanish. This means µ0 = µ0 and∫A(f − g)dν = 0 for every A ∈ F and f = g a.e. w.r.t. ν.
Step 2. Suppose µ and ν are σ-finite (positive) measures,. Then there exists Ej disjoint with X =
∪jEj µ(Ei) <∞ and ν(Ej) <∞. Let µj = µ|Ej and νj = ν|Ej . By previous argument,
µj = µj0 + fjdνj
We may extend fj from Ej to X by setting it to be zero on Ecj . Set µ =∑µj(E ∩ ·) and f =
∑j fj .
Then∑
j µj is singular w.r.t. fdν. Now
µ(A) =∑j
µj(A ∩ Ej) =∑j
νj0(A) +
∫A
∑j
fjdν = ν0(A) +
∫Afdν.
Step 3. Finally suppose µ is a signed measure, proceed as above.
7.3. RADON-NIKODYM THEOREM 88
Exercise 7.3.1 (1) In the proof, why did we not take the density to be D(x) = supf∈H f(x)?1
(2) If we have a finite state space and discrete σ-algebra, construct the function D directly.
(3) If one has a finite σ-algebra, construct this function D directly.
We use ‘essentially unique’ to say that if D1 and D2 are two possible choices for the density of µwith respect to ν, then the set ω |D1(ω) 6= D2(ω), is of ν-measure 0.
Exercise 7.3.2 Define a Boral measure on R by µ(A) =∫A p(x)dx + 1A(2π) where p is a continuous
function vanishing everywhere on the complement of [−1, 1]. Is µ absolutely continuous with respect tothe Lebesgue measure dx? Write down its Lebesgue decomposition w.r.t. dx
Proposition 7.3.2 Let µ1, µ2, µ3 be σ -finite measures, µ2, µ3 are positive measures.
1. Suppose µ1 µ2. If g ∈ L1(µ1) then g dµ1dµ2∈ L1(µ2) and∫
gdµ1 =
∫gdµ1
dµ2dµ2.
2. If µ1 µ2, µ2 µ3 then µ1 µ3, and
dµ1
dµ3=dµ1
dµ2
dµ2
dµ3, µ3 − almost surely.
Proof (1) If g = 1A and µ1(A) <∞, we know by the definition,
µ1(A) =
∫A
dµ1
dµ2dµ2.
So the claim holds. This extends to simple functions by linearity, to non-negative functions by themonotone convergence theorem and to any L1 functions by linearity again.
(2) The transitivity of the absolute continuity is clear(check!). Use part (1), first suppose that dµ1dµ2∈
L1(µ2), we see
µ1(A) =
∫A
dµ1
dµ2dµ2
g=dµ1dµ2=
∫A
dµ1
dµ2
dµ2
dµ3dµ3.
Then we assume µ1 is a positive σ-finite measure. break down X = ∪Ej where µ1(Ej) <∞ andEj aredisjoint measurable sets. We apply the argument, on each Ej , then sum them up. For a signed measurewe break it down to m1 = µ+ − µ− and to each we apply the above conclusion.
1Hint: Let A be a non-measurable set, let us consider the collections of functions indexed by A as follows:
fy(x) =
1, x = y
0, x 6= y= 1y(x).
Is supy∈A f(y) measurable?
7.3. RADON-NIKODYM THEOREM 89
Definition 7.3.2 If µ ν and ν µ we say that they are equivalent. In this case they have the sameset of measure zero’s.
Proposition 7.3.3 If µ and ν are equivalent then dµdν = 1
dνdµ
a.e.
Chapter 8
Conditional Expectations
Let us take a probability space (Ω,F ,P). The σ-algebra F represents the set of all observation andmeasurements one can make on a physical system. From observations ( representing partial information)we would like to make predictions on the value of an observables, an observable is a measurable function( a random variable). The observed information is represented by a sub-σ-algebra G. Suppose G = σ(Y )
where Y is a random variable (this represents information obtained by for example an experiment). If arandom variable X : Ω → R is furthermore measurable w.r.t. G, then there exists a Borel measurablefunction ϕ : R→ R, such that X = ϕ(Y ), thus the observable X is determined by the information in G(i.e. by Y ). If X is not measurable, can we predict its value by G, what is the best approximation for it?
We denote by EX or E(X) the integral of X , called the (mathematical) expectation of X:
EX =
∫ΩXdP.
This is the best predicted value for X given F . If X is real valued, EX minimizes, among all constants,the value minc E(X − c)2.
8.1 Conditional expectation and probabilities
Definition 8.1.1 Let X be a real-valued random variable on some probability space (Ω,F ,P) such thatE|X| < ∞ and let F ′ be a sub σ-algebra of F . Then the conditional expectation of X with respect toF ′ is a F ′-measurable random variable X ′ such that∫
AX(ω) P(dω) =
∫AX ′(ω) P(dω) , (8.1)
for every A ∈ F ′. We denote this by X ′ = E(X | F ′).
91
8.1. CONDITIONAL EXPECTATION AND PROBABILITIES 92
Proposition 8.1.1 With the notations as above, the conditional expectation X ′ = E(X | F ′) exists andis essentially unique.
Proof Denote by P also the restriction of P to F ′ and define the measure Q on (Ω,F ′) by
Q(A) =
∫AX(ω) P(dω), ∀A ∈ F ′.
Then Q is a measure on (Ω,G), absolutely continuous with respect to ν. Also Q(Ω) =∫XdP < ∞
so it is a finite measure. Its density with respect to P , given by the Radon-Nikodym theorem, is thena G-measurable functions and is the required conditional expectation. The uniqueness follows from theuniqueness statement in the Radon-Nikodym theorem.
Using the Monotone Class Theorem we obtain:
Proposition 8.1.2 For all bounded G-measurable function g.∫ΩgX dP =
∫ΩgE(X|G) dP. (8.2)
Example 8.1 • If F ′ = φ,Ω, verify that E(X|F ′) = EX . Remember that X ′ being φ,Ω-measurable means that the pre-image under X ′ of an arbitrary Borel set is either φ or Ω, so theconditional expectation is a constant.
• If X is G measurable, then E(X|G) = X a.e.
Example 8.2 If the only information we know is whether a certain event B happened or not, then itshould by now be intuitively clear that the conditional expectation of a random variable X with respectto this information is given by E(X|B), if B happened and by E(X|Bc) if B didn’t happen. It isa straightforward exercise that the conditional expectation of X with respect to the σ-algebra FB =
φ,Ω, B,Bc is indeed given by
E(X | FB)(ω) =
E(X|B) if ω ∈ B
E(X|Bc) otherwise.
Proof. Denote by Y the right hand side. Then, E(1BY ) = E (E(X|B)1B) = E(X1B). SimilarlyE(1BcY ) = E(1BcX), and hence also E(X) = E(Y ).
If Y is a measurable function, we denote
E(X|Y ) = E(X|σ(Y )). (8.3)
Exercise 8.1.1 Suppose that A1, . . . , An is a partition of Ω: they are disjoint and their union is Ω, alsoP(Ai) > 0 for every i. Let F ′ be generated by them. Then
E(X|F ′) =n∑i=1
E(X|Ai)1Ai .
8.1. CONDITIONAL EXPECTATION AND PROBABILITIES 93
If Y takes only a countable values si with positive probability, then Y = si is a partition of Ω and
E(X|Y ) =∑i
E(X|Y = si)1Y=si .
8.1.1 Conditioning on a random variable
In many situations, we will describe F ′ as the σ-algebra generated by another random variable Y :
Ω → X . By the factorization lemma there exists ϕ : X → R such that E(X|Y ) = ϕ(Y ) a.s. If twofunctions ϕ and ϕ′ satisfy this relation, then ϕ = ϕ′ on a set of B(X ) of full measure (measured by thelaw of Y which is denoted by PY ). To see this just note that if A = ω : ϕ(Y (ω) = ϕ(Y (ω) andB = x : ϕ(x) = ϕ(x), then A = Y −1(B) and PY (B) = P(Y −1(B)). And we do know that by theuniqueness fo the conditional expectation, P
(ω : ϕ(Y (ω)) = ϕ(Y (ω)
)= 1.
Notation. We denote by E(X|Y = y) the measurable function ϕ : X → R such that E(X|Y ) =
ϕ(Y ).
E(X|Y ) = ψ(Y ) a.s. If two functions ϕ and ϕ satisfy this relation, by the uniqueness fo the condi-tional expectation,
P(ω : ϕ(Y (ω)) = ϕ(Y (ω)
)= 1.
Write Y∗µ for the pushed forward measure on C. Then
(Y∗µ)(ϕ = ϕ′) = 1.
(Just note that if A = ω : ϕ(Y (ω)) = ϕ(Y (ω)) and B = x : ϕ(x) = ϕ(x), then A = Y −1(B) and(Y∗µ)(B) = P(Y −1(B)) by the definition).
Definition 8.1.2 We denote by E(X|Y = y) the measurable function ϕ : X → R s.t. E(X|Y ) = ϕ(Y ).
Example 8.3 Take a measurable function Y : Ω→ E = y1, y2, . . . , yn. Define
Ai = Y −1(yi) = ω : Y (ω) = yi.
Then σ(Y ) = σAi and Ai ∩Aj = φ if i 6= j. Let X ∈ L1(Ω,F , P ). Define
ϕ(yi) =
E(1AiX)
P (Ai), if P (Ai) 6= 0
0, if P (Ai) = 0.
Define
E(X|Ai) =E(X1Ai)
P (Ai), if P (Ai) 6= 0,
otherwise define it to be zero. Then
ϕ(y) =n∑i=1
E(X|Ai)1Ai(y), ϕ(Y (ω)) =n∑i=1
E(X|Ai) 1Y=yi.
8.2. PROPERTIES 94
Then E(X|Y )(ω) = ϕ(Y (ω)). Indeed for any Ak ∈ σ(Y ),∫Ak
ϕ(Y )dP =
∫Ak
(n∑i=1
E(X|Ai) 1Ai
)dP =
∫Ak
E(X|Ak) dP =
∫Ak
XdP.
We could also define the function E(X|Y = y) as follows.
Definition 8.1.3 Define PY (A) = P(Y ∈ A), the probability distribution of Y . If X takes value in R,then E(X |Y = y) is any random function such that for any B Borel measurable,∫
BE(X |Y = y) PY (dy) =
∫ω:Y (ω)∈B
XdP.
When conditioning on a number of variables Y1, . . . , Yn, the following notation is also used:
E(X|Y1, . . . , Yn) = E(X|σ(Y1) ∨ · · · ∨ σ(Yn)). (8.4)
8.2 Properties
Conditional expectation has very nice properties, they behave almost like integration with respect to a(family of ) measures.
Proposition 8.2.1 • (linearity) If X,Y are integrable and a, b are numbers, then
E(aX + bY |G) = aE(X|G) + bE(Y |G), a.s.
• (positivity preserving) If X ≥ 0 and is integrable then E(X|G) ≥ 0 a.s.
Proof The linearity follows from uniqueness of conditional expectations. (ii) is a direct consequence ofthe Radon-Nikodyme theorem.
Definition 8.2.1 1. We say B is independent of G if P(A ∩B) = P(A)P(B) for every A ∈ G.
2. We say two σ-algebras F and G are independent is any A ∈ F is independent of any B ∈ G.
3. Two random variables are independent if σ(X) is independent of σ(Y ).
4. A random variable is said to be independent of a σ-algebra G, if σ(Y ) and G are independent.
Proposition 8.2.2 1. E(X) = E(E(X|F ′)).
2. If X is F ′ measurable, then E(X|F ′) = X .
8.2. PROPERTIES 95
3. (Tower property) If G1 ⊂ G2,
E(X|G1) = E(E(X|G1)|G2)) = E(E(X|G2)|G1)).
4. (With any information, the best estimate is expectation) If X is independent of G,
E(X|G) = E(X), a.e.
Proof (1) To see this set A = Ω in equation (8.1.1). (2) follows from uniqueness.
(3) If G1 ⊂ G2, E(X|G1) is G2-measurable, so E(X|G1) = E(E(X|G1)|G2). One can also show bythe definition that
E(X|G1) = E(E(X|G2)|G1)).
(4) For any A ∈ G,∫A E(X|G)dP = E(X1A) = E(X)P (A) =
∫A E(X) dP . Since EX ∈ G, the
conclusion follows.
Exercise 8.2.1 Let X,Y ∈ L1(Ω,F , P ) and G a sub-σ-algebra of F . If X is G measurable, XY ∈ L1
thenE(XY |G) = XE(Y |G).
This means we take out ‘what is known’.
Proposition 8.2.3 Let Xn ∈ L1(Ω,F , P ) and G a sub-σ-algebra of F .
1. (conditional monotone convergence) If fn is a sequence of non-negative random variables, suchthat fn ↑ f a.s., then
E(fn|G) ↑ E(f |G), a.s.
2. (Conditional bounded convergence Theorem. ) If |Xn| ≤ g ∈ L1 then
E(Xn|G)→ E(X|G).
3. (Conditional Fatou’s lemma ) If Xn ≥ 0, E(lim infn→∞Xn|G) ≤ lim infn→∞ E(Xn|G).
Proof (1) Apply monotone convergence theorem to fn1Aand E(fn1A) where A ∈ G. Note E(fn|G) isalso non-negative and increasing and has a limit F . Then∫
AfndP ↑
∫AfdP,
∫A
E(fn|G)dP ↑∫AFdP,
Since∫A fndP =
∫A E(fn|G)dP ,
∫A fDP =
∫A FdP and E(f |G) = limn→∞ E(fn|G).
(2) check it holds for X = 1B where B ∈ G and apply Monotone class theorem.
(3) Apply Fatou’s lemma.
8.3. LP SPACES 96
Proposition 8.2.4 Suppose that σ(X) ∨ G is independent of A, then E(X|A ∨ G) = E(X|G).
Proof Let A ∈ A, B ∈ G, ∫A∩B
XdP = E(X1B)P (A),∫A∩B
EX | GdP =
∫A
EX1B|GdP = P (A)E(X1B).
Since A ∩B forms a π-system, and we can prove that
C = D ∈ G ∨ A :
∫DXdP =
∫D
EX|GdP
is a λ system. Hence C = G ∨ A.
8.3 Lp spaces
8.3.1 Mode of convergence
Wo wrap up we note the commonly used notions of convergence. Let (X ,F , µ) be a measure space withσ-finite measure, and fn, f : X → R be Borel measurable functions.
Definition 8.3.1 We say fn converges to f in Lp if∫|fn − f |pdµ→ 0.
Definition 8.3.2 We say fn converges to f in measure if for any ε > 0,
limn→∞
µ(x : |fn(x)− f(x)| > ε) = 0.
Definition 8.3.3 We say fn converges to f a.s. if there exists a null set N and for any x 6∈ N ,
limn→∞
fn(x) = f(x).
Remark 8.3.1 If fn → f in Lp it converges in measure.
Let (X ,B, µ) be a measure space. Let f : X → Rd, denote by |f | is norm in Rd. We define
‖f‖Lp =
(∫|f |p)dµ
) 1p
.
8.3. LP SPACES 97
Definition 8.3.4 Two functions f, g : Ω→ R are considered to be in the same equivalent class if f = g
almost surely.
For 1 ≤ p <∞, the Lp space is the space of equivalent class of measurable |f |p ∈ L1.
Lp = f : X → Rd : measurable,∫X|f |p dµ <∞.
Note that∫|f |pdµ = 0 if and only if |f | = 0 almost surely. That Lp is a vector space is clear from the
elementary inequality |a+ b|p ≤ Cp|a|p + Cp|b|p. Define
‖f‖Lp =
(∫E|f(x)|pdµ(x)
) 1p
.
Then ‖f‖ is a norm which follows from the following:
Theorem 8.3.2 (Minkowski’ inequality) Let f, g ∈ Lp where 1 ≤ p <∞. Then, f + g ∈ Lp and
‖f + g‖Lp ≤ ‖f‖Lp + ‖g‖Lp .
We also use ‖f‖p for ‖f‖Lp .
Proposition 8.3.3 The Lp spaces are vector spaces, they are Banach space under the norm ‖f‖Lp .
This means for every Cauchy sequence xn ∈ X , there exists x ∈ X such that
limn→∞
‖xn − x‖p = 0.
Proof A normed linear is complete if and only if any absolutely convergent sequence is convergent. Letfn ∈ Lp with
∑∞n=1 ‖fn‖p = M <∞. We define
gn(x) =
n∑k=1
|fk(x)|.
Then supn ‖gn‖p ≤M and by the dominated convergence theorem,
limn→∞
∫|gn|pdµ =
∫limn→∞
|gn|pdµ =
∫ ∞∑k=1
|fk(x)|p dµ.
Hence the integral is finite and∑∞
k=1 |fk(x)| <∞ a.e. Then
f(x) =
∞∑k=1
fk(x),
this converges almost everywhere. Set fn(x) =∑n
k=1 fk(x). Then |fn(x)| ≤∑∞
k=1 |fk(x)| ∈ Lp, bythe dominated convergence theorem, (since |fn − f |p ≤ 2
∑∞k=1 |fk(x)| ∈ Lp) |fn − f |p converges in
L1, hence fn → f in Lp. This completes the proof.
8.3. LP SPACES 98
Theorem 8.3.4 Let 1 ≤ p < ∞. If f ∈ Lp(X ) then there exists a sequence of simple functions fn suchthat
limn→∞
∫X|fn − f |pdµ = 0.
Proof We may assume that f ≥ 0 (standard methods passes to f = f+−f−). Let fn ≤ f be a sequenceof non-negative increasing simple functions converging to f (this exists, which theorem?). Define
gn = fp − (f − fn)p.
Then gn → fp, and gn is non-negative and increasing , by the monotone convergence theorem,
limn→∞
∫ (fp − (f − fn)p
)dµ =
∫fpdµ
i.e.
limn→∞
∫(f − fn)pdµ = 0,
completing the proof.
Corollary 8.3.5 Let 1 ≤ p <∞, then Lp(Rn) is a separable metric space.
Sketch proof. Simple functions∑n
i=1 ai1Ai where Ai = Πni=1[c,di] with ai, ci, di ∈ Q is a countable
dense subset of Lp, hence Lp is separable.
Theorem 8.3.6 Let 1 ≤ p < ∞. The space C∞c (Rn) the space of C∞ functions on Rn with compactsupport is dense in Lp(Rn).
Definition 8.3.5 A function ϕ : Rd → R is convex if
ϕ(tx+ (1− t)y) ≤ tϕ(x) + (1− t)ϕ(y)
for any x, y ∈ Rd and for any t ∈ [0, 1].
The following are convex functions: ϕ(x) = |x|, |x|p, p > 1. If d = 1, ex is convex. Jensen’s Inequalitystates that If ϕ is twice differentiable f is convex if and only if ϕ
′′ ≥ 0. Example of convex functionsare: ϕ(x) = xp for p > 1, ϕ(x) = ex.
Theorem 8.3.7 (Jensen’s Inequality) If ϕ is a convex function then for all integrable functions f and µa probability measure
ϕ
(∫fdµ
)≤∫ϕ(f)dµ.
8.3. LP SPACES 99
Proof It is a fact that if ϕ is convex then there exists a constant c ∈ R such that
ϕ(y)− ϕ(∫
fdµ
)≥ c
(y −
∫fdµ
).
Take y = f(x),
ϕ(f(x))− ϕ(∫
fdµ
)≥ c
(f(x)−
∫fdµ
).
Integrate this over with respect to the probability measure µ to conclude.
Example 8.4 Let x1, . . . xn be points in R and suppose that µ =∑n
i=1 λiδxi is a probability measureon R. Then apply Jensen to f(x) = x and any convex ϕ:
ϕ(
n∑i=1
λixi) ≤n∑i=1
λiϕ(xi).
Theorem 8.3.8 (Holder’s inequality (Cauchy-Schwarz if p = q = 2)) Let 1 ≤ p, q ≤ ∞. If f ∈Lp, g ∈ Lq with 1
p +′ f1q = 1 (with convention 1∞ = 0), then fg ∈ L1 and .
∫|fg|dµ ≤
(∫|f |pdµ
) 1p(∫|g|qdµ
) 1q
,1
p+
1
q= 1.
If µ is a finite measure, by Holder inequality, if p < q we have Lp ⊂ Lq.
8.3.2
Proposition 8.3.9 Conditional Jensen’s Inequality. Let ϕ : Rd → R be a convex function. Then
ϕ (E(X|G)) ≤ E(ϕ(X)|G).
For p ≥ 1, ‖E(X|G)‖p ≤ ‖X‖p.
Lemma 8.3.10 If E|Xn −X| → 0, then E|E(Xn|G)− E(X|G)| → 0.
Proof By Jensen’s inequality, |E(Xn −X∣∣G)| ≤ E(|Xn −X|
∣∣G), the latter converges in L1,
EE(|Xn −X|∣∣G) = E|Xn −X| → 0,
completing the proof.
8.3. LP SPACES 100
8.3.3 Conditional expectation as orthogonal projection
We will learn shortly that Lp, the space of Lp integrable and F-measurable functions, is a closed vectorspace under the norm Lp, and simple functions are dense in Lp. Furthermore L2 is a Hilbert space, wedenote it by L2(Ω,F , P ). If G is a sub-σ algebra, the sub-space of L2(Ω,F , P ) that are measurable withrespect to G, which we denote by L2(Ω,G, P ) is a closed sub-space of L2.
Since L2(Ω,F , P ) is a Hilbert space and L2(Ω,G, P ) is a closed subspace of L2, let π denote theorthogonal projection defined by the projection theorem ( §II.2 Functional Analysis [?]),
π : L2(Ω,F , P )→ L2(Ω,G, P ).
Theorem 8.3.11 If X ∈ L2(Ω,F , P ) then E(X|G) = π(X).
Proof * Let f ∈ L2(Ω,F , P ). Then for any h ∈ L2(Ω,G, P ),
〈f − πf, h〉L2(Ω,F ,P ) = 0.
This is,∫
Ω fhdP =∫
Ω πfhdP .
Let A ∈ G and take h = 1A to see that πf = E(f |G).
f − πf
πf
f
Corollary 8.3.12 E(X) is the constant that minimizes E(X − c)2 and E(X|G) minimizes E(X − Y )2
where Y is a G-measurable random variable.
Remark 8.3.13 The projection πX is the unique element of L2(Ω,G, P ) such that
E|X − πX|2 = minY ∈L2(Ω,G,P )
E|X − Y |2.
Remark 8.3.14 The conditional expectation defines on L2 can be extended to L1 to give the conditionalexpectation, as defined earlier. Indeed, simple functions and bounded functions are in L2. Let f ∈ L1
with f ≥ 0. Let fn be a sequence of bounded functions (increasing with n) converging to f pointwise.Then πfn is well defined and ∫
AfndP =
∫AπfndP, A ∈ G.
Since fn increases with n and is positive; so does πfn and limn→∞ πfn exists. We may exchange limitsand integration: ∫
AfdP = lim
n→∞
∫AfndP =
∫A
limn→∞
π(fn)dP.
8.3. LP SPACES 101
Thus we defineE(f |G) = lim
n→∞π(fn)
For f ∈ L1 let f = f+ − f− and define E(f |G) = E(f+|G)− E(f−|G).
8.3.4 Useful Inequalities*
Exercise 8.3.1 (Chebychev’s inequality/ Markov inequality) If X is an Lp random variable then fora ≥ 0,
P (|X| ≥ a) ≤ 1
aE|X|p.
Exercises
Exercise 8.3.2 Suppose that Y1, Y2 be random variables taking values in X1 and X2 respectively. LetZ = E(X|σ(Y1)∨σ(Y2)). Show that the statement that a σ(Y1)∨σ(Y2)-measurable random variable Zis the conditional expectation of X with respect to σ(Y1) ∨ σ(Y2) is equivalent to one of the followingstatements
1. For any A ∈ σ(Y1) and B ∈ σ(Y2),∫A∩B
Z dP =
∫A∩B
X dP,
2. For any gi : Xi → R Borel measurable and bounded,
E (g1(Y1)g2(Y2)Z) = E (g1(Y1)g2(Y2)X) ,
Exercise 8.3.3 Let h : E ×E → R be an integrable function on a metric space E. Let X,Y be randomvariables with state space E such that h(X,Y ) ∈ L1. Let H(y) = E (h(X, y)). Then
E(h(X,Y )|σ(Y )) = H(Y ).
Exercise 8.3.4 Let X : Ω → X1 be F ′ ⊂ F measurable and let Y : Ω → X2 be independent ofF ′. Let Φ : X1 × X2 → R be measurable and such that E|Φ(X,Y )| < ∞. For x ∈ X1 defineg(x) = E[Φ(x, Y )] =
∫Ω Φ(x, Y (ω)
)P(dω). Then
E(
Φ(X,Y )∣∣∣ F ′) = g(X).
Proof Let gi : Xi → R be bounded measurable. Let Φ(x, y) = g1(x)g2(y).
E(g1(X)E(g2(Y )|F ′)
)= g1(X)E
(g2(Y )‖F ′
)= g1(X)Eg2(Y ) = Φ(X).
8.4. CONDITIONING PROBABILITY 102
So g1(X)g2(Y ) ∈ H. where H def=
Φ : E(Φ(X,Y )
∣∣ F ′) = Φ(X)
. The earlier statementshows that if A× B ∈ C, where C = A× B : A ∈ B(X1), B ∈ B(X2). That C is a π system followsfrom A × B ∩ C × D = (A ∩ C) × (B ∩ D). Finally, the π-system C generates B(X1) × B(X2) bythe π − λ theorem, and then we conclude the statement holds for any bounded measurable functions Φ,by approximating Φ with simple functions, and thusH contains all bounded Borel measurable functionsfrom X1 ×X2 → R.
Exercise 8.3.5 Prove that if (X,Y ) has joint density p(x, y), then for any Borel measurable set A andx ∈ R,
P (X ∈ A|Y = y) =
∫A p(x, y)dx∫∞−∞ p(x, y)dx
.
8.4 Conditioning probability
We define similarly the concept of conditional probability.
Definition 8.4.1 Let B ∈ F , define
P(B|F ′) := E(1B|F ′).
This is called the conditional probability of B given F ′.
Remark 8.4.1 ** For every B ∈ F , this identity P(B | F ′) = E(1B | F ′) holds outside of a null set. IfBi is a sequence of disjoint sets, there is a common set of null sets for every Bi, and so
P(∞⋃i=1
Bi | F ′) = E(∑
i
1Bi | F ′)
=∑i
P(B|F ′)
holds a.e. To make P(· | F ′) a probability measure, the σ-additive property must hold for all sequencesof disjoint unions, that is an unaccountable number of sequences in general, and one has to be able tochoose a common set of null sets for all of them. If this can be done we have a family of probabilitymeasures P(·|F ′)(ω), called regular conditional probabilities, and then
E(X|F ′)(ω) =
∫ΩX(ω′)P(dω′|F ′)(ω), a.e..
See Thm 7.1 in ‘Probability measures on metric spaces’ by K.R. Parthasarthy: regular conditional prob-abilities exist for complete separable metric spaces and their Borel σ-algebras.
8.4. CONDITIONING PROBABILITY 103
8.4.1 Finite σ-algebras
For finite σ-algebras, we can construct a family of probability measures Pω, so the conditional expecta-tions is integration w.r.t. this family, by which we mean E(X|G)(ω) =
∫XdP omega.
Let A,B ∈ F . If P (B) > 0. define
P (A|B) =P (A ∩B)
P (B).
Let X : Ω→ R, define,
E(X|B) =E(X1B)
P (B).
If B1, . . . , Bn is a partition of Ω with P (Bi) > 0 for all i, we define G = σBi, i = 1, . . . , n. Thenfor ω ∈ Bi,
E(X|G)(ω) = E(X|Bi)(ω).
Fix ω. For A ∈ F , define
P (A|G)(ω) =
n∑i=1
P (A|Bi)1Bi(ω).
This is a probability measure, denoted by P (dω′|G)(ω). And
E(X|G)(ω) =
∫X(ω′)P (dω′|G)(ω).
For a large class of general σ-algebra, we can indeed construct a family of probability measures,called conditional probabilities.
Chapter 9
Mastery Material: Ergodic Theory
The mastery material are basics of ergodic theory. These are: definitions of measure preserving trans-formations, invariant sets, ergodic measures, Poincare’s Recurrence Theorem and Birkhoff’s ergodicTheorem (and their proofs). These can be found in many textbooks.
• An Introduction to Infinite Ergodic Theory, by Jon Aaronson, published by the Americam Mathe-matical Society. In library, 517.518.1AAR
• Ergodic Theory and Dynamical systems by Yves Coudene (Chapter 2)
https://link.springer.com/content/pdf/10.1007
• Introduction to ergodic theory, by Peter Walters
• Ergodic theory and information by Billingsley.
For your convenience, some basic concepts are given below, this is not the full scope of the readingmaterial. Please refer to the references for a comprehensive study.
9.1 Basic Concepts
Let (X ,F , µ) be a measure space with the measure σ-finite.
Definition 9.1.1 A map T : X → X is a measure preserving transformation (map) if the pushed forwardmeasure by T is the same as µ, i.e.
T∗µ = T.
105
9.1. BASIC CONCEPTS 106
In other words µ(T−1(A)) = µ(A) for every measurable set A. We also say T preserves µ, also µ isinvariant under T , and also µ is an invariant measure for T .
A Dirac measure δa is invariant under T if T leaves the point a fixed.
Definition 9.1.2 1. The invariant σ-algebra of a transformation T is
I = A ∈ F : T−1(A) = A.
2. A function f : X → R is invariant under T if f(Tx) = f(x) for every x ∈ X .
It is easy to see that I is a σ-algebra.
Theorem 9.1.1 (Poincare Recurrence Theorem) Let (X ,F , µ) be a finite measure space and let T :
X → X be a measure preserving transformation. Let A ∈ F . Then for almost every point in A, thereexists some n ≥ 1 (and hence an infinitely many n) such that Tn(x) ∈ A.
If µ is a probability measure the celebrated Birkhoff’s ergodic theorem states that the time average equalsto spatial average.
Theorem 9.1.2 (Birkhoff’s ergodic theorem) Let (X ,F , µ) be a probability space and T preservesmeasure. Then for any f ∈ L1,
limn→∞
1
n
n∑j=1
f(T jx) = E(f |I)(x), a.e.
A transformation is said to be non-singular if T∗µ µ. Let T : X → X preserves a probabilitymeasure µ. We say T is ergodic (we also say µ is ergodic) if whenever A ∈ I then µ(A) = 0 orµ(Ac) = 0.
9.1.1 Example: circle rotation
The Lebesgue measure on the unit circle S1 is defined to be the pushed forward measure on the Lebesguemeasure on [0, 1) to S1 by the map α 7→ e2παi. A rotation is x 7→ xe2πθi where θ ∈ R.
We are going to actually identify the unit circle S1 with the quotient space R/Z, so x ∼ y if x = n+y,where n is an integer. So S1 is [0, 1] with 0 and 1 glued together. From now on this is what we use.
Definition 9.1.3 Let θ ∈ S1 define Tθ : S1 → S1 by Tθ(α) = θ + α mod 1. This is a called a rotationof the circle.
The Lebesgue measure is invariant under any circle rotation.